FS#68684 - [aspell-pl] Wrong encoding of dictionary file

Attached to Project: Community Packages
Opened by Krzysztof Miernik (kmiernik) - Friday, 20 November 2020, 11:44 GMT
Last edited by Toolybird (Toolybird) - Thursday, 04 May 2023, 07:43 GMT
Task Type Bug Report
Category Packages
Status Closed
Assigned To Johannes Löthberg (demize)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


The file pl_PL.dic is reported as UTF-8 by uchardet, but its contents are not actually UTF-8. Any program using this dictionary marks input words containing special characters as misspelled. The hunspell-pl package is based on the same dictionary (https://sjp.pl/slownik/ort/), but its pl_PL.dic file is properly converted to UTF-8.

Additional info:
* package version(s): 20201011-1

Steps to reproduce:
* Use spell checker in any program using aspell (Firefox, Thunderbird, ...)
* Enter a word containing special Polish characters

Closed by Toolybird (Toolybird)
Thursday, 04 May 2023, 07:43 GMT
Reason for closing: None
Additional comments about closing: Without a reproducer, it's hard to identify a bug. Either way, it would seem to be an upstream issue. (Note: sjp-aspell6-pl-6.0_20230501-0.tar.bz2 is available upstream).
Comment by Johannes Löthberg (demize) - Saturday, 21 November 2020, 13:09 GMT
Both Firefox and Thunderbird use hunspell dictionaries, not aspell. Have you actually tried it with a program that uses aspell?

Grabbing some random Polish text from Wikipedia and piping it into `aspell -l pl -a` seems to work fine as it is, so it would be interesting to see an actual reproduction case.
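
For anyone wanting to script such a reproduction, here is a minimal sketch, assuming aspell and the Polish dictionary are installed. The helper names `run_aspell` and `misspelled_words` are hypothetical, and passing `--encoding=utf-8` is an assumption on top of the command above (without it, aspell follows the locale). The parser relies on the Ispell-compatible pipe format, in which flagged words are reported on lines starting with `&` or `#`.

```python
import subprocess
from typing import List


def misspelled_words(pipe_output: str) -> List[str]:
    """Extract flagged words from Ispell-style pipe-mode output.

    Lines starting with '&' (miss, with suggestions) or '#' (miss, no
    suggestions) carry the original word as their second field.
    """
    words = []
    for line in pipe_output.splitlines():
        if line.startswith(("&", "#")):
            fields = line.split()
            if len(fields) >= 2:
                words.append(fields[1])
    return words


def run_aspell(text: str, lang: str = "pl") -> str:
    """Pipe text through `aspell -l <lang> -a` and return the raw output.

    Assumes aspell is on PATH. Each input line is prefixed with '^' so
    that pipe mode never interprets its first character as a command.
    """
    escaped = "\n".join("^" + line for line in text.splitlines()) + "\n"
    result = subprocess.run(
        ["aspell", "-l", lang, "--encoding=utf-8", "-a"],
        input=escaped.encode("utf-8"),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical sample; any Polish text with diacritics will do.
    sample = "żółw źrebię łódź"
    print(misspelled_words(run_aspell(sample)))
```

If correctly spelled words with diacritics show up in the printed list, that would be the kind of reproduction case asked for here.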
Comment by Krzysztof Miernik (kmiernik) - Saturday, 21 November 2020, 14:17 GMT
I had to remove aspell completely to make spell checking work correctly. I had both aspell and hunspell installed; now I have only the latter and it works, so I assumed that this was due to encoding. In fact, the pl_PL.dic file, opened in any editor (vim, gedit, ...), shows wrong symbols in places where Polish letters should be. The hunspell version looks OK by visual inspection.
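
Visual inspection in an editor depends on the editor's own encoding guess, so a byte-level check may be more reliable. The following is a minimal sketch, not a definitive diagnostic: the function names `guess_encoding` and `check_file` are hypothetical, and the path to pl_PL.dic has to be supplied by the caller since the report does not state it.

```python
import sys


def guess_encoding(data: bytes) -> str:
    """Classify a byte string as 'ascii', 'utf-8', or 'not utf-8'.

    Pure ASCII is reported separately because it is valid in both UTF-8
    and legacy 8-bit encodings, so it says nothing about the diacritics.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"


def check_file(path: str) -> str:
    """Read a file in binary mode and classify its contents."""
    with open(path, "rb") as fh:
        return guess_encoding(fh.read())


if __name__ == "__main__":
    # Usage: python check_encoding.py /path/to/pl_PL.dic
    print(check_file(sys.argv[1]))
```

A result of "not utf-8" would support the encoding theory. It would not by itself show that aspell misbehaves, since a dictionary may legitimately be stored in a legacy encoding that the spell checker is configured to expect.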