Community Packages


FS#68684 - [aspell-pl] Wrong encoding of dictionary file

Attached to Project: Community Packages
Opened by Krzysztof Miernik (kmiernik) - Friday, 20 November 2020, 11:44 GMT
Last edited by Andreas Radke (AndyRTR) - Friday, 20 November 2020, 16:57 GMT
Task Type: Bug Report
Category: Packages
Status: Assigned
Assigned To: Johannes Löthberg (demize)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 0%
Votes: 0
Private: No

Details

Description:
The file pl_PL.dic is supposed to be in UTF-8 encoding (that is what uchardet reports), but it is not. Any program using this dictionary marks words containing special characters as misspelled. The hunspell-pl package is based on the same dictionary (https://sjp.pl/slownik/ort/), but its pl_PL.dic file is properly converted to UTF-8.
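
One way to check the encoding claim independently of uchardet is to try a strict UTF-8 decode of the raw bytes. The following is only a minimal sketch: the path `pl_PL.dic` is a placeholder (the installed location is not given in this report), and the helper names are made up for illustration.

```python
from pathlib import Path

# Placeholder path (assumption): adjust to wherever the package installs the file.
DICT_PATH = Path("pl_PL.dic")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def first_invalid_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__" and DICT_PATH.exists():
    raw = DICT_PATH.read_bytes()
    if is_valid_utf8(raw):
        print("file decodes as UTF-8")
    else:
        print("not valid UTF-8, first bad byte at offset", first_invalid_offset(raw))
```

A file that fails this check is in some other encoding; which one cannot be determined from the decode failure alone.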


Additional info:
* package version(s): 20201011-1


Steps to reproduce:
* Use the spell checker in any program that uses aspell (Firefox, Thunderbird, ...)
* Enter a word containing special Polish characters

Comment by Johannes Löthberg (demize) - Saturday, 21 November 2020, 13:09 GMT
Both Firefox and Thunderbird use hunspell dictionaries, not aspell. Have you actually tried it with a program that uses aspell?

Grabbing some random Polish text from Wikipedia and piping it into `aspell -l pl -a` seems to work fine as it is, so it would be interesting to see an actual reproduction case.
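
A reproduction along those lines could be scripted roughly as follows. This is a sketch, not the exact test that was run: it assumes `aspell` and the Polish dictionary are installed, that the input is passed as UTF-8, and that the output follows the Ispell-compatible pipe format (`&` or `#` lines for unknown words). The function names are hypothetical.

```python
import shutil
import subprocess


def misspelled_words(pipe_output: str) -> list:
    """Extract flagged words from Ispell-style pipe output.

    Lines starting with "& " (suggestions available) or "# " (no suggestions)
    carry the unknown word as their second field.
    """
    words = []
    for line in pipe_output.splitlines():
        if line.startswith(("& ", "# ")):
            words.append(line.split()[1])
    return words


def check_with_aspell(text: str, lang: str = "pl") -> list:
    """Pipe text through `aspell -l <lang> -a` and return the flagged words."""
    # A leading "^" tells pipe mode to treat the line as text, not as a command.
    payload = "".join("^" + line + "\n" for line in text.splitlines())
    result = subprocess.run(
        ["aspell", "-l", lang, "-a"],
        input=payload,
        capture_output=True,
        text=True,
        encoding="utf-8",  # assumption: input and output handled as UTF-8
        check=True,
    )
    return misspelled_words(result.stdout)


if __name__ == "__main__" and shutil.which("aspell"):
    print(check_with_aspell("Zażółć gęślą jaźń"))
```

If correctly spelled words with Polish diacritics show up in the returned list, that would point at the aspell dictionary rather than at the calling program.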
Comment by Krzysztof Miernik (kmiernik) - Saturday, 21 November 2020, 14:17 GMT
I had to remove aspell completely to make spell checking work correctly. I had both aspell and hunspell installed; now I have only the latter and it works, so I assumed that this was due to encoding. In fact, the pl_PL.dic file opened in any editor (vim, gedit, ...) shows wrong symbols in places where Polish letters should be. The hunspell version looks OK by visual inspection.
