FS#45158 - [hunspell-en] broken links and encoding in en_*.aff

Attached to Project: Arch Linux
Opened by Rodrigo Rivas Costa (rodrigorc) - Sunday, 31 May 2015, 19:14 GMT
Last edited by Andreas Radke (AndyRTR) - Thursday, 04 June 2015, 19:33 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Andreas Radke (AndyRTR)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description:

Some of the files /usr/share/hunspell/*.aff from this package are wrong.

Many "en_*.aff" files are symbolic links to "en_GB.aff", but the latter does not exist.

Moreover, the ones that do exist contain "SET UTF8" in the header instead of the correct "SET UTF-8".

Additional info:

* hunspell-en-2015.05.18-1

Steps to reproduce:

Issue #1:

$ cat /usr/share/hunspell/en_*.aff > /dev/null

cat: /usr/share/hunspell/en_AG.aff: No such file or directory
cat: /usr/share/hunspell/en_AU.aff: No such file or directory
...

Issue #2:

$ hunspell
error: unknown encoding UTF8: using iso88591 as fallback
...
Hunspell 1.3.3
^C
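
Both symptoms above can also be checked without starting hunspell. The following is only a minimal sketch: the function name check_hunspell_dir is hypothetical, and it assumes the dictionaries live in /usr/share/hunspell as in the report.

```shell
# Hypothetical helper (not part of the package): report dangling en_*.aff
# symlinks and .aff files whose header still says "SET UTF8".
check_hunspell_dir() {
    dir=${1:-/usr/share/hunspell}
    for f in "$dir"/en_*.aff; do
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            # A symlink whose target does not resolve (Issue #1)
            echo "dangling: $f -> $(readlink "$f")"
        elif [ -f "$f" ] && grep -q '^SET UTF8' "$f"; then
            # A readable file with the unhyphenated encoding name (Issue #2)
            echo "bad encoding: $f"
        fi
    done
}
```

Calling check_hunspell_dir with no argument inspects /usr/share/hunspell; no output means neither problem was found.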

Closed by Andreas Radke (AndyRTR)
Thursday, 04 June 2015, 19:33 GMT
Reason for closing: Fixed
Comment by Olivier Mehani (shtrom) - Monday, 01 June 2015, 04:49 GMT
Same issue here. Could en_GB.* symlinks be added to en_GB-large.*?

This should solve the problem seamlessly.
Comment by Rodrigo Rivas Costa (rodrigorc) - Monday, 01 June 2015, 08:03 GMT
> Could en_GB.* symlinks be added to en_GB-large.*?

Well, you can just do

# ln -s en_GB-large.aff en_GB.aff ; ln -s en_GB-large.dic en_GB.dic

and be happy about it. The UTF8 vs UTF-8 issue still remains, however.

Curiously, the files en_GB-large.aff and en_US.aff are identical (the .dic files are not, of course):

$ md5sum en_US.aff en_GB-large.aff
8fe5adefda9a24fe9296ec9eaf6d515e en_US.aff
8fe5adefda9a24fe9296ec9eaf6d515e en_GB-large.aff
Comment by Olivier Mehani (shtrom) - Monday, 01 June 2015, 08:09 GMT
> Well, you can just do
> # ln -s en_GB-large.aff en_GB.aff ; ln -s en_GB-large.dic en_GB.dic

This is what I did, and this fixed the problem, but the package is still broken.
Comment by Charles Bos (Chazza) - Tuesday, 02 June 2015, 15:14 GMT
  • Field changed: Percent Complete (100% → 0%)
Links are not fixed. en_GB.aff and en_GB.dic still do not exist.
Comment by Charles Bos (Chazza) - Tuesday, 02 June 2015, 15:57 GMT
I think you need to add en_GB to en_GB_aliases in the PKGBUILD.
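
I have not checked how the PKGBUILD actually uses en_GB_aliases, so the following is only a rough sketch of the idea and not the real packaging code. The function name link_aliases and its argument order are hypothetical; the alias names in the usage line are the ones mentioned in this report.

```shell
# Hypothetical sketch, not taken from the PKGBUILD: create .aff/.dic
# symlinks for each alias, all pointing at one real dictionary.
link_aliases() {
    dir=$1      # directory holding the dictionaries
    target=$2   # base name of the real dictionary, e.g. en_GB-large
    shift 2
    for lang in "$@"; do
        # Never link the target to itself
        [ "$lang" = "$target" ] && continue
        ln -sf "$target.aff" "$dir/$lang.aff"
        ln -sf "$target.dic" "$dir/$lang.dic"
    done
}
```

For example, link_aliases "$pkgdir/usr/share/hunspell" en_GB-large en_AG en_AU en_GB would make en_GB.aff and en_GB.dic exist as links, so the other aliases that point at en_GB.* resolve too.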
Comment by Charles Bos (Chazza) - Wednesday, 03 June 2015, 08:23 GMT
Regarding the UTF8 -> UTF-8 issue, you could fix that with something like this:

find "$pkgdir" -type f -name "*.aff" -exec sed -i "s/SET UTF8/SET UTF-8/g" {} \;
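
A slightly more defensive variant of the same idea, as a sketch only: the function name fix_aff_encoding is hypothetical, and it assumes GNU sed (for -i), which is what Arch ships. The pattern is anchored to the start of the line so only the header directive is touched.

```shell
# Hypothetical wrapper around the find/sed fix: rewrite the "SET UTF8"
# header to "SET UTF-8" in every regular .aff file below the given tree.
fix_aff_encoding() {
    # $1: directory tree to process, e.g. "$pkgdir" inside package()
    find "$1" -type f -name '*.aff' \
        -exec sed -i 's/^SET UTF8/SET UTF-8/' {} +
}
```

Because of -type f, symlinked .aff files are skipped, so sed -i does not replace them with regular files. Running it twice is harmless, since an already corrected "SET UTF-8" line no longer matches.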
Comment by Andreas Radke (AndyRTR) - Thursday, 04 June 2015, 19:33 GMT
UTF-8 encoding fixed.
