FS#8877 - Setting LESSCHARSET in /etc/profile is broken and just plain wrong

Attached to Project: Arch Linux
Opened by Dan McGee (toofishes) - Sunday, 09 December 2007, 08:07 GMT
Last edited by Roman Kyrylych (Romashka) - Saturday, 09 February 2008, 09:18 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Aaron Griffin (phrakture)
Architecture: All
Severity: Medium
Priority: Normal
Reported Version: 2007.08-2
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Can anyone explain why we still have settings in /etc/profile that shouldn't be there? This particular report is being filed for the LESSCHARSET="latin1" setting, which breaks a whole bunch of stuff in my pager, including the example below:

Before:
commit e360bebf713b6b03768c62de8b94ddf9350b0953
Author: <81><97><82><89><81><84><81><97><81>�<81>�<81><93> <email@example.com>
Date: Wed Dec 5 18:24:26 2007 +0900


After:
commit e360bebf713b6b03768c62de8b94ddf9350b0953
Author: しらいしななこ <email@example.com>
Date: Wed Dec 5 18:24:26 2007 +0900

When the majority of people probably want to use a UTF-8 locale, this is broken behavior.

Closed by  Roman Kyrylych (Romashka)
Saturday, 09 February 2008, 09:18 GMT
Reason for closing:  Fixed
Comment by Attila (attila) - Sunday, 09 December 2007, 12:04 GMT
I noticed this too, but I don't know who uses which $LANG, and I can understand that this is a problem for the devs. Perhaps this (which I have in my own file in /etc/profile.d) can solve it:
if echo "$LANG" | grep -qiE 'utf-?8'; then export LESSCHARSET="utf-8"; else export LESSCHARSET="latin1"; fi
I am sure there are better solutions, so please take this only as an example.
Comment by Dan McGee (toofishes) - Sunday, 09 December 2007, 17:05 GMT
From the manpage:

If neither LESSCHARSET nor LESSCHARDEF is set, but any of the strings
"UTF-8", "UTF8", "utf-8" or "utf8" is found in the LC_ALL, LC_CTYPE or
LANG environment variables, then the default character set is utf-8.

If that string is not found, but your system supports the setlocale
interface, less will use setlocale to determine the character set.
setlocale is controlled by setting the LANG or LC_CTYPE environment
variables.

Finally, if the setlocale interface is also not available, the default
character set is latin1.

It seems like it really isn't necessary to set it at all, as less/more can detect it on their own. Your above solution would also be an issue if the default system locale is one language, but a user's locale (set after the if magic) was a different one.
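The lookup order quoted from the manpage can be sketched as a small shell function. This is only an illustration of the documented order, not code taken from less: the function name `guess_less_charset` is hypothetical, LESSCHARDEF is ignored for brevity, an empty LESSCHARSET is treated as unset, and the setlocale step is represented by a placeholder string rather than a real call.

```shell
# Hypothetical helper: mimics the lookup order quoted from the less manpage.
# It is not code from less itself, and it ignores LESSCHARDEF.
guess_less_charset() {
    # 1. An explicit, non-empty LESSCHARSET always wins.
    if [ -n "${LESSCHARSET:-}" ]; then
        printf '%s\n' "$LESSCHARSET"
        return
    fi
    # 2. Otherwise look for a UTF-8 marker in the locale variables.
    for value in "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"; do
        case "$value" in
            *UTF-8*|*UTF8*|*utf-8*|*utf8*)
                printf 'utf-8\n'
                return
                ;;
        esac
    done
    # 3. Otherwise less would ask setlocale(); shown here as a placeholder.
    printf 'from-setlocale\n'
}
```

Called with LANG=en_US.utf8 and no LESSCHARSET, the function reports utf-8; with LESSCHARSET=latin1 exported, as /etc/profile currently does, the locale variables are never consulted.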
Comment by Attila (attila) - Monday, 10 December 2007, 06:04 GMT
This sounds nice, but it doesn't work for me. A text file containing "äÄ üÜ öÖ ß", viewed with LESSCHARSET="latin1" and LANG=de_DE.utf8, shows up as "ä�<84> ü�<9C> ö�<96> �<9F>". But you are right that my example only covers my own situation, without mixing different locales.
Comment by Dan McGee (toofishes) - Monday, 10 December 2007, 06:27 GMT
You left out some important details: is your text file saved as latin1 or utf8?

For me, a text file saved as utf8 displays fine in less with the following settings:
LANG=en_US.utf8
LESSCHARSET not set

And as a second check, if "export LESSCHARSET=latin1" is done, then it shows up broken as in your example above. So my guess is that the text file you ran your tests with was not UTF-8 encoded.

In either case, enforcing a default latin1 charset seems silly when the setlocale() interfaces are available to less.
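One way to settle the "latin1 or utf8?" question for a given file is to ask iconv to decode it as UTF-8 and see whether that succeeds. The sketch below assumes an iconv command is installed; the helper name `is_utf8_file` and the sample file are made up for illustration.

```shell
# Hypothetical helper: succeeds only if the file decodes cleanly as UTF-8.
is_utf8_file() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Build a sample file holding "äÄ" as UTF-8 bytes (octal escapes).
sample=$(mktemp)
printf '\303\244\303\204\n' > "$sample"

if is_utf8_file "$sample"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8 (perhaps latin1)"
fi
rm -f "$sample"
```

A file that passes this check but still looks broken in less points at the pager's charset setting rather than at the file.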
Comment by Attila (attila) - Monday, 10 December 2007, 06:49 GMT
Okay, you are right: I have "-encoding utf-8" in my joerc; sorry, I forgot to mention this. I can also confirm that with LESSCHARSET="" everything works fine, so less behaves as described in the manpage. Does this mean that an empty LESSCHARSET could be the default in /etc/profile and everything would work better? That would be nice, because it is a minimal change. :-)
Comment by Aaron Griffin (phrakture) - Monday, 10 December 2007, 17:29 GMT
So the verdict? Remove this from /etc/profile?
Comment by Roman Kyrylych (Romashka) - Monday, 10 December 2007, 17:52 GMT
+10 for removing LESSCHARSET="latin1"
Comment by Dan McGee (toofishes) - Monday, 10 December 2007, 17:58 GMT
I think removing it would be the most prudent option; if users really want it set, it is something they can do on their own. Maybe leave it in there but commented out by default? I'd rather just kill it completely, though, so as not to confuse anyone.
Comment by Attila (attila) - Monday, 10 December 2007, 17:59 GMT
I suggest it too, and to feel better about doing this: openSUSE 10.1, openSUSE 10.3 and Debian 4.0 show an empty string as the result of "echo $LESSCHARSET", and a "grep -r -i lesscharset *" in /etc finds nothing. Thanks, Dan, for pointing this out.
Comment by Roman Kyrylych (Romashka) - Monday, 10 December 2007, 18:03 GMT
According to the man page, it seems safe to remove it.
