FS#4418 - Broken ncurses applications with UTF8 locale
Attached to Project: Arch Linux
Opened by Michal Krenek (Mikos) - Sunday, 09 April 2006, 12:12 GMT
Last edited by Paul Mattal (paul) - Saturday, 13 September 2008, 11:41 GMT
Details
Hello,
I want to switch to a UTF8 locale, but there are some really painful bugs in Arch Linux. If I set my locale in /etc/rc.conf to cs_CZ.utf8 (the console font is lat2-16, but this is not limited to the console; it also fails in xterm, rxvt-unicode and konsole with a unicode-capable font), I can read and write in UTF8 without problems, but every ncurses application is badly broken, for example mc, nano, links, elinks, etc. Many people have exactly the same problem; you can look at http://bbs.archlinux.org/viewtopic.php?p=155897 for examples. It is really pissing me off, because Arch Linux seems to be the only distribution I have tried that has broken UTF8 support. Even old Red Hat 9 had working UTF8 support. I have tried it even with a fresh Arch Linux install, with the same results. Btw. I have looked at the mc PKGBUILD and there is a UTF8-related patch applied. ncurses also seems to be compiled with unicode support. So this is really strange and I don't know what to do...
There are two versions of the ncurses library provided, one with widechar support and one without. Perhaps mc (and others) are linking to the one without widechar support.
As a simple test, try copying the widechar lib over the non-widechar one and re-running mc.
# cp /usr/lib/libncursesw.so.5.5 /lib/libncurses.so.5.5
If something breaks because of this, you can fix things by re-installing ncurses from pacman:
# pacman -Sy ncurses
Let me know if this helps anything or not.
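A less invasive check than overwriting the library is to look at what each binary is actually linked against. The following is only a sketch: the helper name classify_ncurses and the binary paths are assumptions for illustration, and it relies on ldd being available.

```shell
#!/bin/sh
# Hypothetical helper: reads ldd output on stdin and prints
#   "wide"   if libncursesw is linked,
#   "narrow" if only plain libncurses is linked,
#   "none"   if neither appears.
classify_ncurses() {
    awk '
        /libncursesw\.so/ { wide = 1 }
        /libncurses\.so/  { narrow = 1 }
        END {
            if (wide)        print "wide"
            else if (narrow) print "narrow"
            else             print "none"
        }'
}

# Example paths only; adjust them to the binaries on your system.
for bin in /usr/bin/mc /usr/bin/nano; do
    [ -x "$bin" ] || continue
    printf '%s: %s\n' "$bin" "$(ldd "$bin" 2>/dev/null | classify_ncurses)"
done
```

A result of "narrow" for an application that misbehaves in a UTF8 locale would be consistent with the theory above, although it does not prove it.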
There are also other related libraries in ncurses: libform.so.5.5/libformw.so.5.5, libmenu.so.5.5/libmenuw.so.5.5 and libpanel.so.5.5/libpanelw.so.5.5
Should I try it from a LiveCD? Or is it too dangerous? Well, I can back up these files before overwriting them, so it is not too dangerous...
# cp /usr/lib/libncursesw.so.5.5 /lib/libncurses.so.5.5
Unfortunately it doesn't work.
Now, for example, the localized legend in nano isn't broken anymore (I can see UTF8 two-byte characters in the legend without problems), but I still can't type localized two-byte UTF8 characters (they are badly broken and nano behaves strangely if I try). In mc and elinks, dialogs aren't scattered anymore, but localised menu entries still aren't right (instead of localized UTF8 two-byte characters I see two strange ASCII characters).
- nano has fixed UTF8 chars in the legend. But typing UTF8 chars is broken.
- mc is still completely broken (dialogs are scattered) in urxvt (rxvt-unicode) and also in the console.
- elinks has fixed dialogs and menus in urxvt (but as I said, instead of two-byte UTF8 characters I see bad ASCII characters; this seems to be an encoding problem with elinks - maybe elinks is always using ISO-8859-2?). But in the console, dialogs and menus are completely broken (scattered like the dialogs in mc).
Btw. I have also overwritten libform.so.5.5, libmenu.so.5.5 and libpanel.so.5.5 with their widechar counterparts, but this has no effect.
010_all_coreutils-stty-utf8.p... (1.7 KiB)
Btw. why is the coreutils package in Arch Linux so old (it is definitely a version more than a year old, maybe even older)? Other distributions (for example Gentoo, but even Debian testing) use a much newer version (5.94)... this seems to be against the Arch way...
I don't know if it is a clean solution, but this symlink could be created in the ncurses PKGBUILD, because some apps expect it.
Further, the current utf8 patch does not apply correctly. Gentoo has a more recent patch that applies cleanly and works. It can be had from e.g. http://gd.tuwien.ac.at/opsys/linux/gentoo/distfiles/mc-4.6.1-utf8-r1.patch.bz2
http://aur.archlinux.org/packages.php?do_Details=1&ID=4346
UTF-8 compatible package: http://aur.archlinux.org/packages.php?do_Details=1&ID=8169
Is there any problem with dropping 'libncurses' altogether in favor of the 'w' version?
By that, I mean, compile ncurses by default with utf8 enabled.
Does this break anything? How do other distros do this?
I have the latest Mandriva ISO and can check the status of Cyrillic character support there in UTF-8 and non-UTF-8 locales, and how ncurses is used (when I get some free time).
* set up a temporary repo
* build ncurses with unicode support ONLY (mv libncursesw libncurses, heh)
* rebuild all applications that require ncurses
Does this sound like a good idea, or no?
I'll install everything first, so as not to bork everyone else's applications... but I probably can't sufficiently test all locales I should test
More on this later
http://lkml.org/lkml/2007/4/2/189
http://lkml.org/lkml/2007/4/2/422
It seems to me we need to preserve both unicode and non-unicode support in the ncurses package, unless we plan to remove support for all non-unicode locales.
It also appears (someone correct me if I'm wrong) that apps really should be wide-character aware in order to function properly. For instance, if an app links against ncurses but gets ncursesw, and is using strlen() all over the place, things will be ugly.
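To illustrate the strlen() concern: strlen() counts bytes, while a UTF8 string can hold fewer characters than bytes, so layout code that treats the byte count as a width will be off. The sketch below uses two hypothetical helpers, byte_len and char_len, and assumes UTF8-encoded input. Note that even the code point count is only an approximation of screen columns, since combining and double-width characters are not handled.

```shell
#!/bin/sh
# Hypothetical helper: number of bytes in the argument,
# which is what strlen() would report for the same string.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Hypothetical helper: number of UTF-8 code points, obtained by
# dropping continuation bytes (0x80-0xBF) before counting bytes.
char_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}

# "P" followed by U+0159 (r with caron), written as octal escapes
# so the example does not depend on the encoding of this file.
s=$(printf 'P\305\231')
printf 'bytes: %s, characters: %s\n' "$(byte_len "$s")" "$(char_len "$s")"
```

An application that pads or truncates menu entries by byte count would misalign every line containing such characters, which would fit the scattered dialogs described earlier in this report.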
So it appears that the remaining approach here is to make sure individual packages which actually support UTF8 are built linking to ncursesw, and that this has been done for at least some packages already.
So my question is: what next? Are there other packages that need tweaking/rebuilding? ncmpc? Others?
I will do so on or around Mon 9/15 if there's no objection raised here.