FS#12888 - "sort" in core/coreutils-6.12-1 is broken with UTF-8 locales
Attached to Project: Arch Linux
Opened by Daniel Thaler (danielthaler) - Thursday, 22 January 2009, 10:01 GMT
Last edited by Andreas Radke (AndyRTR) - Tuesday, 24 February 2009, 18:26 GMT
Details
On my system I have LANG="en_US.UTF-8". Running sort results in:

> sort: sort.c:1150: inittables_mb: Assertion `mblength != (size_t)-1 && mblength != (size_t)-2' failed.
> Aborted

No sorting takes place. If I call sort like this:

$ LANG="en_US" sort textfile.txt

sorting works as expected. Since sort is used by a number of scripts and utilities on the system (mkinitcpio, for example), this is not acceptable as a workaround.

This problem is NOT an upstream problem; the bug is in Arch's coreutils-i18n.patch, which is applied by the PKGBUILD.
Closed by Andreas Radke (AndyRTR)
Tuesday, 24 February 2009, 18:26 GMT
Reason for closing: Won't fix
Additional comments about closing: please ask for reopening if the bug persists in coreutils 7.1 with any locale setting.
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE=C
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
So far I can pipe everything to sort without an error. Please give an example and show your output of "locale".
LANG=en_US.utf8
LANGUAGE=
LC_COLLATE=C
LC_TIME=de_DE
No other localization-related environment variables are set.
I've just played around with this some more and I have found that the actual problem appears to be the mismatch between LC_TIME and LANG.
If I set LC_TIME=de_DE.UTF-8 and LANG=en_US.UTF-8, it works. It also works if neither is UTF-8. sort breaks only if one of the two is UTF-8 and the other isn't.
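The condition described above can be expressed as a small check. This is only a sketch under assumptions: `check_locale_mix` and `is_utf8` are hypothetical helper names, the check looks only at the codeset suffix of the locale names (".UTF-8" or ".utf8", both spellings used in this report), and it does not consult installed locales or the coreutils version.

```shell
#!/bin/sh
# Hypothetical helpers, not part of coreutils or Arch.
# Flags the setup reported here: LANG and LC_TIME both set, with
# exactly one of them naming a UTF-8 locale. LC_ALL, if set, would
# override both and is deliberately ignored by this sketch.

# is_utf8 NAME: succeed if the locale name carries a UTF-8 codeset
# (accepts both ".UTF-8" and ".utf8" spellings, plus any @modifier).
is_utf8() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_locale_mix LANG_VALUE LC_TIME_VALUE
# Prints "mixed" if exactly one value is UTF-8, otherwise "ok".
# An empty LC_TIME inherits LANG, so it cannot mismatch.
check_locale_mix() {
    lang_value=$1
    time_value=$2
    if [ -z "$time_value" ]; then
        echo ok
        return 0
    fi
    if is_utf8 "$lang_value"; then lang_utf8=1; else lang_utf8=0; fi
    if is_utf8 "$time_value"; then time_utf8=1; else time_utf8=0; fi
    if [ "$lang_utf8" != "$time_utf8" ]; then
        echo mixed
    else
        echo ok
    fi
}

# Check the current environment.
check_locale_mix "${LANG:-}" "${LC_TIME:-}"
```

For the setup in this report, `check_locale_mix en_US.utf8 de_DE` prints `mixed`, while `check_locale_mix en_US.UTF-8 de_DE.UTF-8` prints `ok`.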
(Aside: the intent of this setup is to get system messages in English, since most translations annoy me, while using familiar dates and times.)
Side note: would you mind documenting this on this page, for other users:
http://wiki.archlinux.org/index.php/Locale
Just mention that there is a bug when LC_TIME and LANG use different encodings.
The failing assertion doesn't exist in the vanilla source; it is added by that patch.
I also just tested the LANG=en_US.UTF-8, LC_TIME=de_DE setup on my desktop which is running Gentoo and it worked fine.
For one thing, you should always use UTF-8 locales. For another, you can set LANGUAGE=en to get messages in English. Here is how it works:
LANG sets the default locale; it can be overridden for individual categories by the specific LC_xxx variables. LC_ALL overrides everything.
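The precedence just described can be sketched as a lookup. This is an illustration, not a real tool: `effective_locale` is a hypothetical function name, and it assumes the standard order (LC_ALL, then the category's own LC_xxx variable, then LANG, with empty values treated as unset and the C locale as the final fallback). The real `locale` command reports the values actually in effect.

```shell
#!/bin/sh
# Hypothetical helper illustrating locale precedence:
# LC_ALL wins, then the category's own LC_xxx variable, then LANG;
# if all are unset or empty, the C (POSIX) locale is used.

# effective_locale CATEGORY   (e.g. LC_TIME, LC_CTYPE, LC_MESSAGES)
effective_locale() {
    case "$1" in
        ''|*[!A-Z_]*) return 1 ;;   # reject anything but A-Z and _
        LC_*) ;;                    # accept LC_* names only
        *) return 1 ;;
    esac
    # Indirect read of the category variable, e.g. ${LC_TIME:-}
    eval "category_value=\${$1:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' C
    fi
}

# Print the resolved value for a few categories, similar to `locale`.
for category in LC_CTYPE LC_TIME LC_COLLATE LC_MESSAGES; do
    printf '%s=%s\n' "$category" "$(effective_locale "$category")"
done
```

With the reporter's LANG=en_US.utf8 and LC_TIME=de_DE, `effective_locale LC_CTYPE` prints `en_US.utf8` while `effective_locale LC_TIME` prints `de_DE`, so the character-type and time categories end up with different encodings.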
LANGUAGE is the variable that controls the messages (gettext). It's a colon-separated list, so you can set LANGUAGE=de:fr:en and the first available language will be used.
If LANGUAGE is not set, gettext falls back to the locale in effect for messages (LC_ALL, then LC_MESSAGES, then LANG).
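A minimal sketch of how a LANGUAGE list is consumed, assuming a simplified model: `pick_language` is a hypothetical function, and the "available" languages are passed in as arguments instead of being looked up as installed message catalogs. Real gettext also tries less specific variants (e.g. de_DE, then de) and ignores LANGUAGE when the locale is C.

```shell
#!/bin/sh
# Hypothetical sketch: entries of a LANGUAGE value such as "de:fr:en"
# are tried left to right; the first one that is available wins.

# pick_language LANGUAGE_VALUE AVAILABLE_LANG...
# Prints the first matching entry; returns 1 if none matches.
# The body runs in a subshell so the IFS change stays local.
pick_language() (
    wanted=$1
    shift
    IFS=:                       # split the LANGUAGE value on colons
    for lang in $wanted; do
        for have in "$@"; do
            if [ "$lang" = "$have" ]; then
                printf '%s\n' "$lang"
                exit 0
            fi
        done
    done
    exit 1
)
```

With LANGUAGE=en as suggested here, `pick_language en de en` prints `en`; with `pick_language de:fr:en fr en` the first entry is skipped and `fr` is printed.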
Long story short, I think you need:
LANG=de_DE.UTF-8
LANGUAGE=en
and optionally LC_COLLATE=C (personally I don't like it)
Anyway, as far as I'm concerned, the bug is fixed (for me) by changing my locale settings so that I'm no longer mixing UTF-8 and non-UTF-8 locales.
If you don't want or intend to modify the patch (or delegate the problem to upstream), this bug could be closed.