FS#4418 - Broken ncurses applications with UTF8 locale

Attached to Project: Arch Linux
Opened by Michal Krenek (Mikos) - Sunday, 09 April 2006, 12:12 GMT
Last edited by Paul Mattal (paul) - Saturday, 13 September 2008, 11:41 GMT
Task Type: Bug Report
Category: System
Status: Closed
Assigned To: Paul Mattal (paul), Aaron Griffin (phrakture), Roman Kyrylych (Romashka)
Architecture: All
Severity: High
Priority: High
Reported Version: 0.7.1 Noodle
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Hello,
I want to switch to a UTF8 locale, but there are some really painful bugs in Arch Linux. If I set my locale in /etc/rc.conf to cs_CZ.utf8, I can read and write UTF8 without problems, but every ncurses application is badly broken, for example mc, nano, links, elinks, etc. My console font is lat2-16, but the problem is not limited to the console: it also happens in xterm, rxvt-unicode and konsole with a Unicode-capable font. Many people have exactly the same problem; see http://bbs.archlinux.org/viewtopic.php?p=155897 for examples.

It is really pissing me off, because of all the distributions I have tried, Arch Linux seems to be the only one with broken UTF8 support. Even old Red Hat 9 had working UTF8 support. I have even tried a fresh Arch Linux install, with the same results.

Btw. I have looked at the mc PKGBUILD and there is a UTF8-related patch applied. Also, ncurses seems to be compiled with Unicode support. So this is really strange and I don't know what to do...
Closed by  Paul Mattal (paul)
Saturday, 13 September 2008, 11:41 GMT
Reason for closing:  Fixed
Comment by Tobias Powalowski (tpowa) - Monday, 10 April 2006, 08:21 GMT
Calm down a bit; words like "pisses me off" are not really nice.
Comment by Michal Krenek (Mikos) - Monday, 10 April 2006, 11:32 GMT
Sorry, I didn't want to be crude. I really like Arch Linux. I was just fed up because I don't know what to do. Again, I am really sorry...
Comment by Judd Vinet (judd) - Monday, 10 April 2006, 16:59 GMT
Hi Mikos,

There are two versions of the ncurses library provided, one with widechar support and one without. Perhaps mc (and others) are linking to the one without widechar support.

As a simple test, try copying the widechar lib over the non-widechar one and re-running mc.

# cp /usr/lib/libncursesw.so.5.5 /lib/libncurses.so.5.5

If something breaks because of this, you can fix things by re-installing ncurses from pacman:

# pacman -Sy ncurses


Let me know if this helps anything or not.
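
One way to check the hypothesis above without overwriting anything is to look at which ncurses library a binary is actually linked against. The following is a minimal sketch, not part of the original report: the helper name ncurses_flavor is hypothetical, and it only classifies the text that ldd prints.

```sh
# Hypothetical helper (not from the report): classify `ldd` output read
# from stdin by the ncurses flavour it mentions.
#   wide   -> libncursesw is listed
#   narrow -> only the non-widechar libncurses is listed
#   none   -> no ncurses library is listed
ncurses_flavor() {
    awk '
        /libncursesw\.so/ { wide = 1 }
        /libncurses\.so/  { narrow = 1 }
        END {
            if (wide)        print "wide"
            else if (narrow) print "narrow"
            else             print "none"
        }
    '
}
```

Assuming mc is installed and in PATH, it could be used as: ldd "$(command -v mc)" | ncurses_flavor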
Comment by Michal Krenek (Mikos) - Monday, 10 April 2006, 19:20 GMT
I have tried it now, but it is not possible to copy /usr/lib/libncursesw.so.5.5 to /lib/libncurses.so.5.5. When I try, I am suddenly logged out of the terminal (I have tried it in xterm and also in the console) and the file is not overwritten. I don't know how this is possible; maybe because readline depends on ncurses?

There are also other related libraries in ncurses: libform.so.5.5/libformw.so.5.5, libmenu.so.5.5/libmenuw.so.5.5 and libpanel.so.5.5/libpanelw.so.5.5

Should I try it from a LiveCD? Or is it too dangerous? Well, I can back up these files before overwriting them, so it is not too dangerous...
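
A possible explanation for the sudden logout, offered as an assumption rather than something confirmed in this report: cp writes into the existing destination file in place, so a running shell that has the library mapped (for example through readline) can crash while the copy is in progress. Copying to a temporary name in the same directory and then renaming avoids writing into the file that is in use. The sketch below keeps a backup first, as suggested above; the helper name replace_atomically and the .bak and .new suffixes are hypothetical.

```sh
# Hypothetical helper: replace DST with a copy of SRC without writing into
# the existing DST file. The old DST is kept as DST.bak.
replace_atomically() {
    src=$1
    dst=$2
    tmp="$dst.new.$$"              # temporary name in the same directory

    if [ -e "$dst" ]; then
        cp -p "$dst" "$dst.bak" || return 1   # back up the current file
    fi
    cp "$src" "$tmp" || return 1   # write the new content to a new file
    mv -f "$tmp" "$dst"            # rename over DST; the old file is not written to
}
```

For the test described in this thread, that would be (as root): replace_atomically /usr/lib/libncursesw.so.5.5 /lib/libncurses.so.5.5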
Comment by Olli (Blackhouse) - Tuesday, 11 April 2006, 13:02 GMT
I have tried the above:
# cp /usr/lib/libncursesw.so.5.5 /lib/libncurses.so.5.5

Unfortunately it doesn't work.
Comment by Michal Krenek (Mikos) - Tuesday, 11 April 2006, 16:24 GMT
Well, the information I wrote in my previous comment was wrong. After a reboot (or maybe re-logging is sufficient, I don't know; I only noticed it after the reboot) I see some differences! So /lib/libncurses.so.5.5 was overwritten by /usr/lib/libncursesw.so.5.5 in the end (I have checked the md5sums now).

Now, for example, the localized legend in nano isn't broken anymore (I can see two-byte UTF8 characters in the legend without problems), but I still can't type localized two-byte UTF8 characters (they are badly broken and nano behaves strangely if I try). In mc and elinks, dialogs aren't scattered anymore, but localized menu entries still aren't right (instead of each localized two-byte UTF8 character I see two strange ASCII characters).
Comment by Michal Krenek (Mikos) - Tuesday, 11 April 2006, 16:45 GMT
Again, I was wrong. So this is what I see:

- nano has fixed UTF8 chars in the legend, but typing UTF8 chars is broken.
- mc is still completely broken (dialogs are scattered) in urxvt (rxvt-unicode) and also in the console.
- elinks has fixed dialogs and menus in urxvt (but, as I said, instead of two-byte UTF8 characters I see bad ASCII characters; this seems to be an encoding problem with elinks, maybe elinks is always using ISO-8859-2?). But in the console, dialogs and menus are completely broken (scattered like the dialogs in mc).

Btw. I have also overwritten libform.so.5.5, libmenu.so.5.5 and libpanel.so.5.5 with their widechar counterparts, but this had no effect.
Comment by Alexander Baldeck (kth5) - Wednesday, 12 April 2006, 08:17 GMT
We needed to patch coreutils on ppc to fix this issue. I attached the files for review.
Comment by Michal Krenek (Mikos) - Thursday, 13 April 2006, 00:46 GMT
I have added these patches to the coreutils PKGBUILD and compiled it without problems, but after installing it and relogging, I don't see any change. My problems are exactly the same as before. Maybe these patches solve another problem with utf8?

Btw. why is the coreutils package in Arch Linux so old (the version is definitely more than a year old, maybe even older)? Other distributions (for example Gentoo, but even Debian testing) use a much newer version (5.94)... this seems to be against the Arch way...
Comment by Jan Blazek (appolito) - Tuesday, 11 July 2006, 14:02 GMT
Hi, I've been compiling ncmpc (http://hem.bredband.net/kaw/ncmpc/) with ncursesw support. I noticed that the configure script was searching for the ncursesw/ncurses.h header, which didn't exist. So I made a symlink /usr/include/ncursesw.h to /usr/include/ncurses.h. ncmpc then built successfully and works well.
I don't know if it is a clean solution, but this symlink could be created in the ncurses PKGBUILD, because some apps expect it.
Comment by Roman Kyrylych (Romashka) - Monday, 21 August 2006, 08:04 GMT
Any progress, Judd?
Comment by Ash (Thikasabrik) - Sunday, 27 August 2006, 13:45 GMT
I had a look at Gentoo's mc ebuild and built a working version. The main point is that it should be compiled with --with-screen=slang and (naturally) needs to depend on slang (with utf8 support) for the included patch to work at all.
Further, the current utf8 patch does not apply correctly. Gentoo has a more recent patch that applies cleanly and works. It can be had from e.g. http://gd.tuwien.ac.at/opsys/linux/gentoo/distfiles/mc-4.6.1-utf8-r1.patch.bz2
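
As a rough illustration of the change described above, a build() function in the mc PKGBUILD might look like the sketch below. Only --with-screen=slang and the patch file name come from this comment; the source directory name, the patch level (-p1), the install prefix and the way the patch is unpacked are assumptions, and the package would also need slang (with utf8 support) as a dependency.

```sh
# Sketch of a PKGBUILD build() function for mc with the slang screen library.
# Assumptions: mc-4.6.1 unpacks to mc-4.6.1/, the patch applies with -p1,
# and the .patch.bz2 file has been placed in $startdir/src.
build() {
    cd "$startdir/src/mc-4.6.1" || return 1

    # Apply the Gentoo UTF-8 patch named in the comment above.
    bzcat "$startdir/src/mc-4.6.1-utf8-r1.patch.bz2" | patch -Np1 || return 1

    # Use slang instead of ncurses as the screen library.
    ./configure --prefix=/usr --with-screen=slang || return 1
    make || return 1
    make DESTDIR="$startdir/pkg" install
}
```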
Comment by Roman Kyrylych (Romashka) - Monday, 28 August 2006, 06:21 GMT
There are working mc-utf8 and slang utf-8 in Community already. But IMHO mc-utf8 still needs some small fixes.
Comment by Roman Kyrylych (Romashka) - Saturday, 09 September 2006, 10:39 GMT
There are also patches from SUSE: http://www.suse.de/~nadvornik/mc.html
Comment by Roman Kyrylych (Romashka) - Tuesday, 07 November 2006, 15:11 GMT
Nano 2.0 is out finally! Now with UTF-8 support. Please update it.
Comment by Roman Kyrylych (Romashka) - Friday, 10 November 2006, 09:01 GMT
Comment by Roman Kyrylych (Romashka) - Monday, 25 December 2006, 13:04 GMT
Ncurses 5.6 improves support for Unicode.
Comment by Roman Kyrylych (Romashka) - Tuesday, 23 January 2007, 10:35 GMT
The fix for dialog package is described in http://bugs.archlinux.org/task/6233 (closed by me just to keep everything here).
UTF-8 compatible package: http://aur.archlinux.org/packages.php?do_Details=1&ID=8169
Comment by Sergej Pupykin (sergej) - Wednesday, 31 January 2007, 12:52 GMT
Please add the attached ru-utf.map.gz to the kbd package.
Comment by Sergej Pupykin (sergej) - Wednesday, 31 January 2007, 17:24 GMT
PS: I added kbd-ru-keymaps to [community].
Comment by Aaron Griffin (phrakture) - Friday, 09 February 2007, 15:45 GMT
I need a hand with some investigation on this:
Is there any problem with dropping 'libncurses' altogether in favor of the 'w' version?
By that, I mean, compile ncurses by default with utf8 enabled.

Does this break anything? How do other distros do this?
Comment by Roman Kyrylych (Romashka) - Friday, 09 February 2007, 16:03 GMT
Does ncursesw allow using non-UTF-8 locales?
I have the latest Mandriva ISO, so I may check the status of Cyrillic character support there in UTF-8 and non-UTF-8 locales, and how ncurses is used (when I get some free time).
Comment by Roman Kyrylych (Romashka) - Wednesday, 14 March 2007, 19:14 GMT
One of the worst things is that cfdisk crashes with my uk_UA.UTF-8 locale :(
Comment by Aaron Griffin (phrakture) - Wednesday, 14 March 2007, 19:36 GMT
I'm going to propose the following, but I have no idea what effects it will have (Try It And See)

* set up a temporary repo
* build ncurses with unicode support ONLY (mv libncursesw libncurses, heh)
* rebuild all applications that require ncurses

Does this sound like a good idea, or no?
I'll install everything first, so as not to bork everyone else's applications... but I probably can't sufficiently test all locales I should test

More on this later
Comment by Roman Kyrylych (Romashka) - Wednesday, 14 March 2007, 19:47 GMT
I guess with Unicode _only_, other locales won't be supported, which is not very good (unless we drop all locales except the UTF-8 ones).
Comment by Roman Kyrylych (Romashka) - Wednesday, 04 April 2007, 17:03 GMT
hehe, not very related to this bugreport, but still interesting:
http://lkml.org/lkml/2007/4/2/189
http://lkml.org/lkml/2007/4/2/422
Comment by SKOCDOPOLE Tomas (skocdopolet) - Wednesday, 12 September 2007, 19:41 GMT
Any news?
Comment by Sven Salzwedel (sasv) - Friday, 28 September 2007, 09:21 GMT
Hi, I think #7477 is somehow related to the whole discussion. All man messages in msgs/ are latin1 or other non-UTF-8 locales. This breaks man messages on de_DE.UTF-8 too. It's sad that in 2007 utf8 isn't supported throughout the system (not saying here that it's Arch peoples' fault :-). I'd even vote for dropping non-UTF-8 locale support in general ... but I'm just a user :-)
Comment by Paul Mattal (paul) - Sunday, 30 December 2007, 04:19 GMT
I'm coming to this bug late to the party, as the new ncurses package maintainer.

It seems to me we need to preserve both unicode and non-unicode support in the ncurses package, unless we plan to remove support for all non-unicode locales.

It also appears (someone correct me if I'm wrong) that apps really should be wide-character aware in order to function properly. For instance, if an app links against ncurses but gets ncursesw, and is using strlen() all over the place, things will be ugly.
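
To make the strlen() concern concrete: strlen() counts bytes, while a UTF-8 string usually has fewer characters than bytes, so any layout computed from the byte count will be off. The sketch below shows the difference from the shell. The helper names byte_len and codepoint_len are hypothetical, and counting code points is itself only an approximation of display width (wide and combining characters differ).

```sh
# Number of bytes in the argument: what strlen() would report in C.
byte_len() {
    printf '%s' "$1" | LC_ALL=C wc -c | tr -d ' '
}

# Number of UTF-8 code points: drop continuation bytes (0x80-0xBF, octal
# 200-277) and count the bytes that remain.
codepoint_len() {
    printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | LC_ALL=C wc -c | tr -d ' '
}
```

For the Czech word "čeština", byte_len reports 9 and codepoint_len reports 7, because "č" and "š" each take two bytes in UTF-8.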

So it appears that the remaining approach here is to make sure individual packages which actually support UTF8 are built linking to ncursesw, and that this has been done for at least some packages already.

So my question is: what next? are there other packages that need tweaking/rebuilding? ncmpc? others?
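
One way to look for further rebuild candidates is to scan a directory of executables for those that still pull in the non-widechar libncurses. This is only a sketch: the function name list_narrow_ncurses is hypothetical, it relies on ldd being available, and ldd should only be run on trusted binaries. A hit means the binary links the narrow library, not that the application itself is UTF-8 aware.

```sh
# Print every executable in DIR whose ldd output mentions libncurses but
# not libncursesw. Files that ldd cannot inspect are skipped.
list_narrow_ncurses() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -x "$f" ] || continue
        deps=$(ldd "$f" 2>/dev/null) || continue
        case $deps in
            *libncursesw.so*) ;;                      # already uses the wide library
            *libncurses.so*)  printf '%s\n' "$f" ;;   # still uses the narrow library
        esac
    done
}
```

Example: list_narrow_ncurses /usr/bin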
Comment by Roman Kyrylych (Romashka) - Tuesday, 08 January 2008, 01:14 GMT
Hm, IIRC cfdisk was broken for me the last time I tried (with the uk_UA.utf8 locale); I cannot remember more now.
Comment by Paul Mattal (paul) - Saturday, 12 January 2008, 03:29 GMT
I've just rebuilt ncmpc in extra to see if that helps. cfdisk in util-linux-ng has been rebuilt recently, so I don't think it's a problem.
Comment by Greg (dolby) - Wednesday, 18 June 2008, 14:55 GMT
mc in extra now supports utf8
Comment by Paul Mattal (paul) - Friday, 12 September 2008, 22:34 GMT
Can anyone point to other packages we should rebuild here? If not, I'm inclined to close this bug and solve remaining ncurses issues individually.

I will do so on or around Mon 9/15 if there's no objection raised here.
Comment by Allan McRae (Allan) - Saturday, 13 September 2008, 01:58 GMT
I rebuilt everything against ncursesw, so hopefully this fixed most of these bugs.
