FS#68481 - [hunspell-de] German dictionary doesn't work in multi-dictionary mode with UTF8-encoded dicts
Attached to Project:
Arch Linux
Opened by Mikhail Skorzhinskii (rasmi) - Friday, 30 October 2020, 16:29 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 17 November 2020, 19:41 GMT
Details
When a user tries to use the German (DE) dictionary together with other
dictionaries encoded in UTF-8, hunspell fails with the following error:
# hunspell -d ru_RU,de_DE
error - iconv: ISO8859-1 -> UTF-8
This happens because the German dictionary is encoded in ISO8859-1. This is discussed in more detail here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=864864. After applying the same suggestion (basically re-encoding the German dictionary into UTF-8), the problem is gone.
I've filed a bug about this on the hunspell GitHub, https://github.com/hunspell/hunspell/issues/688, and also wrote a personal letter to the German dictionary maintainer. Unfortunately, I haven't received any answer yet.
Would it be possible to encode the German (and possibly other) dictionaries into UTF-8 in Arch Linux? Fixing this in the distribution is possibly a bad decision, but I see no other way to improve the user experience given that upstream is reluctant to fix it.
Package versions:
extra/hunspell-de 20161207-4
aur/hunspell-ru-aot 0.4.5-1
# hunspell -d de_DE,ru_RU
Note the different order of the dictionaries on the command line. To fix this, the dictionary .aff files need to be fixed. Each .aff file contains a line that sets the file encoding explicitly:
SET ISO8859-1
After the file has been re-encoded to UTF-8, this line should read:
SET UTF-8
For example:
sed -i 's/SET ISO8859-1/SET UTF-8/' de_DE.aff
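Note that the sed command above only changes the declared encoding; the file contents have to be re-encoded as well. A minimal sketch of both steps together, assuming GNU sed and iconv are available and that the dictionary consists of a NAME.aff and NAME.dic pair in one directory. The function name dict_to_utf8 is hypothetical and not part of hunspell or the Arch package:

```shell
# Hypothetical helper, not part of hunspell or any package.
# Usage: dict_to_utf8 DIR NAME    e.g. dict_to_utf8 . de_DE
dict_to_utf8() {
    dir=$1
    name=$2
    aff="$dir/$name.aff"

    # Only convert when the .aff file still declares ISO8859-1, so that
    # running the function twice does not double-encode the files.
    LC_ALL=C grep -q '^SET ISO8859-1' "$aff" || return 0

    for f in "$aff" "$dir/$name.dic"; do
        # Convert into a temporary file first; keep the original on failure.
        iconv -f ISO-8859-1 -t UTF-8 "$f" > "$f.tmp" || { rm -f "$f.tmp"; return 1; }
        mv "$f.tmp" "$f"
    done

    # Update the declared encoding to match the new file contents.
    sed -i 's/^SET ISO8859-1/SET UTF-8/' "$aff"
}
```

It is probably safest to run this on a copy of the dictionary files rather than directly on the installed ones.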
# hunspell -d de_DE,ru_RU
Hunspell 1.7.0
hallo
*
привет
error - iconv: ISO8859-1 -> UTF-8
*
Combining the English and German dictionaries can also cause trouble. Example:
# hunspell -d de_DE,en_GB
Hunspell 1.7.0
fünfundfünfzig
& fünfundfünfzig 1 0: fünfundfünfzig
Changing the "SET ISO8859-1" line fixes this problem.
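Since other dictionaries may be affected in the same way, it can help to list which .aff files still declare something other than UTF-8. A rough sketch, assuming the dictionaries live in /usr/share/hunspell (an assumption, pass another directory if needed); the function name list_non_utf8_affs is made up for illustration:

```shell
# Hypothetical helper: print every .aff file whose SET line is not UTF-8.
# Usage: list_non_utf8_affs [DIR]   (DIR defaults to /usr/share/hunspell, an assumption)
list_non_utf8_affs() {
    dir=${1:-/usr/share/hunspell}
    for aff in "$dir"/*.aff; do
        # Skip the unexpanded pattern when the directory has no .aff files.
        [ -e "$aff" ] || continue
        # Look at the first SET line only; strip a trailing CR if present.
        LC_ALL=C awk '$1 == "SET" {
            sub(/\r$/, "", $2)
            if ($2 != "UTF-8") print FILENAME ": " $2
            exit
        }' "$aff"
    done
}
```

Files without any SET line are not reported by this sketch.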
Please report back if this has been solved now. For the future please also try nuspell that may some day replace hunspell.
> For the future please also try nuspell that may some day replace hunspell.
Wow, that is a discovery for me. I played with it a little today. Judging by the project description, it looks very promising from my perspective. I will do much more with it later.