FS#7477 - man package and utf8
Attached to Project:
Arch Linux
Opened by Sergej Pupykin (sergej) - Wednesday, 20 June 2007, 16:40 GMT
Last edited by Andreas Radke (AndyRTR) - Thursday, 26 February 2009, 19:23 GMT
Details
Description:
man cannot display Russian error messages (for example "man page not found") in UTF-8.

Additional info:
* package version - 1.6e-2

Steps to reproduce:
- set the ru_RU.UTF-8 locale
- man qweqweqwe

Fix:
Please iconv mess.ru to UTF-8.

--- beginning of PKGBUILD
cd $startdir/src/$pkgname-$pkgver
iconv -f koi8-r -t utf-8 msgs/mess.ru > /tmp/mess.ru
mv /tmp/mess.ru msgs/
echo "$ codeset=UTF-8" > msgs/mess.ru.codeset
patch -Np1 -i ../man-troff.patch || return 1
---

Maybe some other languages need the same fix.
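As a rough illustration of what the proposed iconv step does, the sketch below converts a tiny stand-in file from KOI8-R to UTF-8 and dumps the bytes before and after. The file names and the sample word are hypothetical; the real mess.ru from the man sources is not used here, and the example assumes an iconv that knows the koi8-r charset.

```shell
# Hypothetical stand-in for msgs/mess.ru: the word "нет" stored as KOI8-R bytes.
workdir=$(mktemp -d)
printf '\316\305\324\n' > "$workdir/mess.sample"

# Same conversion as in the proposed PKGBUILD fragment, on the sample file.
iconv -f koi8-r -t utf-8 "$workdir/mess.sample" > "$workdir/mess.sample.utf8"

# KOI8-R uses one byte per Cyrillic letter (ce c5 d4, then the newline 0a).
od -An -tx1 "$workdir/mess.sample"

# UTF-8 uses two bytes per Cyrillic letter (d0 bd d0 b5 d1 82, then 0a).
od -An -tx1 "$workdir/mess.sample.utf8"
```

If the second dump still showed single bytes per letter, the catalogue would not have been converted and a UTF-8 terminal would render it as garbage.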
FS#9130.
* remove '-Tlatin1' from /etc/man.conf
* unset LESSCHARSET
Does this solve the problem at all? Or is this actually an upstream bug with their internal messages?
This does not work:
zcat /usr/share/man/ru/man1/ls.1.gz | nroff -mandoc -c -Tutf8
zcat /usr/share/man/ru/man1/ls.1.gz | nroff -mandoc -c
zcat /usr/share/man/ru/man1/ls.1.gz | nroff -mandoc
zcat /usr/share/man/ru/man1/ls.1.gz | nroff
/usr/share/man/ru/man1/ls.1.gz is owned by man-pages-ru and is UTF-8 encoded.
groff 1.19.2-4
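The claim that a page is UTF-8 encoded can be checked independently of groff. The sketch below is one way to do that, assuming an iconv that rejects invalid input with a non-zero exit status (glibc's does). The function name page_is_utf8 and the two generated test pages are hypothetical stand-ins, not files from man-pages-ru.

```shell
# Succeeds if the gzip-compressed file decodes cleanly as UTF-8.
# Converting UTF-8 to UTF-8 changes nothing, but iconv fails on invalid sequences.
page_is_utf8() {
    gzip -dc "$1" | iconv -f utf-8 -t utf-8 >/dev/null 2>&1
}

# Hypothetical stand-in pages: one UTF-8, one KOI8-R.
workdir=$(mktemp -d)
printf '.TH TEST 1\n\320\275\320\265\321\202\n' | gzip -c > "$workdir/utf8.1.gz"
printf '.TH TEST 1\n\316\305\324\n' | gzip -c > "$workdir/koi8r.1.gz"

for page in "$workdir"/*.1.gz; do
    if page_is_utf8 "$page"; then
        echo "$page: valid UTF-8"
    else
        echo "$page: not valid UTF-8"
    fi
done
```

On a real system the same function could be pointed at /usr/share/man/ru/man1/ls.1.gz. A page that passes this check but still renders wrongly points at the formatter pipeline rather than at the page itself.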
Secondly, could you explain to me what "doesn't work" means? Does it display wrongly, or is it keyboard related (the recent groff changes were related to some UTF-8 sigils being wrong)?
"Doesn't work" means it displays wrongly (the characters are in the wrong encoding).
Does that sound correct?
http://cvs.fedora.redhat.com/viewvc/rpms/man/F-10/man-1.6b-i18n_nroff.patch?revision=1.3&view=markup (Fedora also has some more promising patches...)
So I propose to stay as simple as possible, while still correct (where "correct" means "never producing unreadable garbage"). Stay with Man, but completely drop support for translated man messages and translated manual pages. Dropping translated man messages is done with the "+lang none" switch, and the patch below disables support for translated manual pages:
diff -ur man-1.6f.orig/src/manpath.c man-1.6f/src/manpath.c
--- man-1.6f.orig/src/manpath.c 2006-08-04 03:18:33.000000000 +0600
+++ man-1.6f/src/manpath.c 2008-09-13 15:16:29.000000000 +0600
@@ -279,17 +279,6 @@
if (alt_system) {
add_to_list(dir, alt_system_name, perrs);
} else {
- /* We cannot use "lang = setlocale(LC_MESSAGES, NULL)" or so:
- the return value of setlocale is an opaque string. */
- /* POSIX prescribes the order: LC_ALL, LC_MESSAGES, LANG */
- if((lang = getenv("LC_ALL")) != NULL)
- split2(dir, lang, add_to_mandirlist_x, perrs);
- if((lang = getenv("LC_MESSAGES")) != NULL)
- split2(dir, lang, add_to_mandirlist_x, perrs);
- if((lang = getenv("LANG")) != NULL)
- split2(dir, lang, add_to_mandirlist_x, perrs);
- if((lang = getenv("LANGUAGE")) != NULL)
- split2(dir, lang, add_to_mandirlist_x, perrs);
add_to_mandirlist_x(dir, 0, perrs);
}
}
--
Alexander E. Patrakov
A native speaker of Russian
The person who added Man-DB to Linux From Scratch
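For readers who want to see which environment variables the removed block consulted, here is a minimal shell sketch that prints them in the same order as the deleted C code (LC_ALL, LC_MESSAGES, LANG, LANGUAGE). The function name candidate_langs is hypothetical. Unlike the C code, which tests for non-NULL, this sketch also skips variables that are set but empty, and it does not reproduce what split2 does with each value.

```shell
# Print each locale variable the removed code looked at, in lookup order.
# Unset or empty variables are skipped.
candidate_langs() {
    for var in LC_ALL LC_MESSAGES LANG LANGUAGE; do
        # Indirect expansion in plain POSIX sh: read the variable named by $var.
        eval "val=\${$var-}"
        if [ -n "$val" ]; then
            printf '%s=%s\n' "$var" "$val"
        fi
    done
    return 0
}

# Show what the current environment would have contributed.
candidate_langs
```

With the patch applied, none of these values influence the search path any more, so only the untranslated manual directories are used.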
#TROFF /usr/bin/groff -Tps -mandoc -c
TROFF /usr/bin/groff-utf8 -Tutf8 -mandoc -c
#NROFF /usr/bin/nroff -mandoc -c
NROFF /usr/bin/groff-utf8 -Tutf8 -mandoc -c
The only flaw of this method is the warning lines in man's output (try "man mplayer").
http://www.linuxfromscratch.org/lfs/view/6.4/chapter06/man-db.html
If the situation has improved, I'd like to close this one.
There are two ways to fix this:
1. Use groff-1.18.x with the Debian patch and recompile Man-DB with the --enable-multibyte switch. Then it will use the -Tascii8 device.
2. Use groff-1.20.1 and recompile Man-DB with the --enable-multibyte switch. Then it will use preconv. However, this feature is not official yet.
The problem is that --enable-multibyte is used for two different purposes: it lets Man-DB know that Groff accepts -Tascii8 and -Tnippon (which is true for Debian-patched Groff, but not for groff-1.20.1, but these switches are never used when preconv exists), and also lets Man-DB know that Russian manual pages are in KOI8-R.
LFS only recently switched to the combination of Man-DB + non-Debian groff, and it took us a while to figure out why this can work at all. See http://www.mailinglistarchive.com/lfs-dev@linuxfromscratch.org/msg45678.html
OTOH, since you obviously don't test anything yourself (otherwise you'd immediately notice this deficiency of Man-DB build in testing), I ask you to refrain from any fragile solutions. Use plain old Man with my patch.
I don't think going back to the original man is a good idea, as man-db is the way we want to go. I'd rather get the Russian man pages fixed in man-db. In fact, reading the thread you linked, it indicates this is a font problem more than anything else - is that true?
Passing --enable-mb-groff to man-db's configure (not --enable-multibyte; that was the name of the corresponding groff configure option, so I assume that this is a typo) is a reasonable workaround for your current problems. I have tested this with groff 1.20.1 and Russian pages (and in general Russian is one of the cases I test). In response to a private mail from Matthew Burgess (who wrote the linked lfs-dev post), I improved man-db's configure script to consider the presence of preconv sufficient to autodetect this; this fix will be in man-db 2.5.4.
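Before rebuilding, a packager may want to know whether the installed groff ships preconv at all. The sketch below is only an illustration of that check in plain shell; it is not man-db's actual configure test, and the function name have_preconv is hypothetical.

```shell
# Return success if an executable file named preconv exists in the given
# colon-separated search path (defaults to $PATH).
have_preconv() {
    search_path=${1:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -x "$dir/preconv" ] && [ ! -d "$dir/preconv" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

if have_preconv; then
    echo "preconv found"
else
    echo "preconv not found"
fi
```

According to the comment above, man-db 2.5.4 and later detect this on their own, so a check like this is only relevant for older releases, where --enable-mb-groff has to be passed by hand.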
I suggest applying revision 1023 from http://bazaar.launchpad.net/~cjwatson/man-db/trunk, which will correct a transliteration problem also reported by Matthew Burgess (the hyphenation bug in the linked lfs-dev post). Revisions 1021 and 1024 from the same URL would probably be a good idea too; the former fixes page sorting so that Russian pages display in preference to English if you're in a Russian locale, and the latter simplifies the pipeline for pages encoded in UTF-8.
The only remaining problem I know about with the combination of man-db 2.5.4 (once released) and groff 1.20.1 is that CJK manual pages will not be correctly word-wrapped. This is precisely the problem that means that Debian has not yet upgraded to groff 1.20.1 (the absence of both kinsoku shori support and knowledge of CJK double-width characters in groff); I'm working on that. The text itself appears to display fine and should be readable if you don't mind the poor wrapping, though.