FS#10435 - LC_COLLATE="C" should be removed from /etc/profile

Attached to Project: Arch Linux
Opened by Heiko Baums (cyberpatrol) - Saturday, 17 May 2008, 10:28 GMT
Last edited by Aaron Griffin (phrakture) - Thursday, 20 November 2008, 20:34 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Aaron Griffin (phrakture)
Architecture All
Severity Medium
Priority Normal
Reported Version 2007.08-2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

I had a problem with the sort order of the ls output and of the menu items in KMenu. They were sorted in ASCII order and case-sensitively, instead of alphabetically and case-insensitively as expected and as defined by my chosen locale.

This is because /etc/profile sets the variable LC_COLLATE to "C", independently of the variable LOCALE in /etc/rc.conf.

So the line `LC_COLLATE="C"` should be removed from /etc/profile. If someone needs an LC_COLLATE different from the one set by the chosen locale, it can and should be set manually in /etc/profile or /etc/profile.d/locale.sh, depending on the outcome of FS#10428.
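A manual override of that kind might look like the fragment below. This is only a sketch: the actual contents of /etc/profile.d/locale.sh are not shown in this report, and the value de_DE.UTF-8 is just an example locale that has to be generated on the system before it takes effect.

```shell
# Hypothetical fragment for /etc/profile or /etc/profile.d/locale.sh.
# Assumption: the de_DE.UTF-8 locale has been generated on this system.
# Overrides the collation order only; all other LC_* categories are untouched.
LC_COLLATE="de_DE.UTF-8"
export LC_COLLATE
```

Running `locale` in a new login shell should then report the chosen value for LC_COLLATE, provided LC_ALL is not set.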

At the very least, http://wiki.archlinux.org/index.php/Configuring_locales should explain that LC_COLLATE="C" is set in /etc/profile, that this can be removed or changed, and what effect it has.

Incidentally, LC_COLLATE is not necessary to get and keep the system running. If a script needs LC_COLLATE="C" for whatever reason, the command that needs it should be prefixed with 'LC_COLLATE="C"' in the script, so that the setting is changed only for that command.
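The per-command prefix could be sketched as follows. The helper name sort_bytewise is made up for this illustration. LC_ALL is cleared in the same prefix because a non-empty LC_ALL would otherwise take precedence over LC_COLLATE.

```shell
# Hypothetical helper: sort standard input in plain byte order,
# regardless of the locale the calling script runs in.
sort_bytewise() {
    # The assignments apply to this single sort invocation only;
    # the environment of the rest of the script is left unchanged.
    LC_ALL= LC_COLLATE=C sort
}

# In byte order, upper-case letters sort before lower-case letters.
result=$(printf 'b\nA\na\nB\n' | sort_bytewise)
printf '%s\n' "$result"
```

With the "C" collation this prints A, B, a, b on separate lines, which is exactly the ordering described at the top of this report.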

Additional info:
* package version(s)
filesystem 2008.03-2
* config and/or log files etc.
/etc/profile

Closed by Aaron Griffin (phrakture)
Thursday, 20 November 2008, 20:34 GMT
Reason for closing: Won't fix
Additional comments about closing: Can't please all of the people all of the time. Trying to cover the common case, added docs for the edge case
Comment by Kerin Millar (kerframil) - Friday, 10 October 2008, 19:14 GMT
I'll play devil's advocate here, and present a counter-argument: that LC_COLLATE="C" should not necessarily be dropped as a default. At least, if it is dropped, the potential pitfalls should be adequately addressed either by the documentation or by informative comments within the configuration file. This matter was touched upon by a Gentoo Linux bug recently that readers of this bug may find interesting:

http://bugs.gentoo.org/show_bug.cgi?id=208082

As mentioned within the bug, there are various applications that are unduly sensitive to locale settings and, in some cases, exhibit broken behaviour when dealing with a UTF-8 based collation order. Some examples are provided in the opening comment and in comment #20 and there are probably more that have yet to be identified.

For these reasons, Gentoo actually went in the other direction and now suggests defining LC_COLLATE="C" although, unfortunately, the documentation doesn't go into the rationale behind this change (unlike the bug!). A formative example of how the issue might be adequately presented to the user is demonstrated in the opening comment of the bug:

"Note: Locale settings can sometimes cause unexpected behavior in utilities that use glibc's regular expressions library, like sed and grep. Setting LC_COLLATE=C can prevent such unexpected behavior without impacting the rest of your localization ..."
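A small sketch of the kind of regular expression this note is about: under the "C" collation, the range [a-z] covers only the lower-case ASCII letters, whereas under a locale-aware collation it may also match other characters, depending on the locale and the glibc version. The function name matches_lower_range is made up for this illustration.

```shell
# Hypothetical helper: succeeds if the argument contains a character
# in the range [a-z], evaluated with the "C" collation order.
matches_lower_range() {
    # LC_ALL is cleared so that the LC_COLLATE prefix is not overridden.
    printf '%s\n' "$1" | LC_ALL= LC_COLLATE=C grep -q '[a-z]'
}

matches_lower_range b && echo "b is in [a-z]"
matches_lower_range B || echo "B is not in [a-z]"
```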
Comment by Heiko Baums (cyberpatrol) - Friday, 10 October 2008, 19:50 GMT
I don't agree on this point. As a user, I need a sort order for my locale, and I don't want the lower-case characters sorted after the upper-case characters, as LC_COLLATE="C" does. I need an alphabetical order.

If individual programs don't work correctly with LC_COLLATE values other than "C", then it's a bug in those programs, and a bug report should be filed with their upstream developers, or with the glibc developers if this is really a glibc issue. Or these programs should set LC_COLLATE="C" themselves in their own environment, but not system-wide. And, by the way, I haven't had a problem so far with LC_COLLATE="de_DE.UTF-8", which I'm using.
Comment by Kerin Millar (kerframil) - Friday, 10 October 2008, 21:20 GMT
Oh, I agree that bugs should be fixed upstream and realise that there is a danger of "masking" valid bugs by retaining this setting (although I think it's fair to speculate that there are a great many users that would not want to have to deal with these bugs in any shape or form). In any case, there are some genuine pitfalls in terms of how things work right now and my reason for posting was to point out that the issue cuts both ways. Unfortunately, locale handling seems to be a very poorly understood topic among a large cross-section of the Linux userbase - and unfortunately that goes for many developers too. That LC_COLLATE="C" is set now seems to me to be a pragmatic compromise although I accept your (differing) point of view and do understand your argument for dropping it.

Ultimately, whether it stays or goes, I think it is important that users are made fully aware of the aforementioned pitfalls so that they are able to make an informed decision as to whether it is a matter they think they need to address or not; this is the exact sentiment that I was trying to express in the Gentoo bug. I notice that you have since commented on it and concur that it was not addressed in an ideal fashion in that particular case.
Comment by Aaron Griffin (phrakture) - Friday, 17 October 2008, 19:01 GMT
I think the way it is now is ideal. We can't please everyone, but LC_COLLATE="C" pleases most people. If you need to change the setting, you can easily edit /etc/profile - nothing is hardcoded in a binary or anything like that. Text files in /etc are made to be edited if the default is not good enough for you.

Also note that if you think something should be in a wiki page, then edit it yourself. That's why it's a publicly editable wiki!
I will edit it for you...
Comment by Aaron Griffin (phrakture) - Friday, 17 October 2008, 19:05 GMT
