FS#7477 - man package and utf8

Attached to Project: Arch Linux
Opened by Sergej Pupykin (sergej) - Wednesday, 20 June 2007, 16:40 GMT
Last edited by Andreas Radke (AndyRTR) - Thursday, 26 February 2009, 19:23 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Aaron Griffin (phrakture), Andreas Radke (AndyRTR), Roman Kyrylych (Romashka)
Architecture: All
Severity: Low
Priority: Normal
Reported Version: 2007.05 Duke
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 18
Private: No

Details

Description:
man cannot display Russian error messages (for example, "man page not found") in UTF-8.

Additional info:
* package version - 1.6e-2

Steps to reproduce:
- set ru_RU.UTF-8 locale
- man qweqweqwe

Fix:
Please convert mess.ru to UTF-8 with iconv:
--- beginning of PKGBUILD
cd $startdir/src/$pkgname-$pkgver
iconv -f koi8-r -t utf-8 msgs/mess.ru > /tmp/mess.ru
mv /tmp/mess.ru msgs/
echo "$ codeset=UTF-8" > msgs/mess.ru.codeset
patch -Np1 -i ../man-troff.patch || return 1
---

Maybe some other languages need the same fix.
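
The re-encoding step in the fix above can be illustrated in Python. This is only a sketch of what `iconv -f koi8-r -t utf-8` does to the bytes; the function name `koi8r_to_utf8` and the sample word are made up for the example and are not part of the man package.

```python
def koi8r_to_utf8(raw: bytes) -> bytes:
    """Re-encode KOI8-R bytes as UTF-8 (what `iconv -f koi8-r -t utf-8` does)."""
    return raw.decode("koi8-r").encode("utf-8")


# Hypothetical sample: the Russian word "нет" as it would be stored in KOI8-R.
sample = "нет".encode("koi8-r")
converted = koi8r_to_utf8(sample)

# KOI8-R uses one byte per Cyrillic letter, UTF-8 uses two.
print(len(sample), len(converted))  # prints: 3 6
```

The byte count doubling is a quick sanity check that a message file really was converted.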
This task depends upon
 FS#9130 - replace man with man-db 

Closed by  Andreas Radke (AndyRTR)
Thursday, 26 February 2009, 19:23 GMT
Reason for closing:  Fixed
Comment by Sven Salzwedel (sasv) - Friday, 28 September 2007, 09:19 GMT
This happens also with de_DE.utf8. man should be updated ...
Comment by Emilio Pavia (emix) - Thursday, 01 November 2007, 10:43 GMT
man should be replaced by man-db (http://www.nongnu.org/man-db/) which supports utf8.
Comment by Sergej Pupykin (sergej) - Thursday, 01 November 2007, 11:02 GMT
and which is already packaged in community - http://aur.archlinux.org/packages.php?do_Details=1&ID=9343
Comment by Aaron Griffin (phrakture) - Tuesday, 08 January 2008, 13:25 GMT
Hmmm would someone mind opening a separate FR for replacing man with man-db? Just so I can keep track?
Comment by Roman Kyrylych (Romashka) - Tuesday, 08 January 2008, 13:45 GMT
Done, FS#9130.
Comment by Aaron Griffin (phrakture) - Thursday, 10 January 2008, 07:00 GMT
Quick question. If you do the following:
* remove '-Tlatin1' from /etc/man.conf
* unset LESSCHARSET

does this solve the problem at all? Or is this actually an upstream bug with their internal messages?
Comment by Sergej Pupykin (sergej) - Wednesday, 30 January 2008, 11:51 GMT
This works: zcat /usr/share/man/ru/man1/ls.1.gz | iconv -t koi8-r | nroff -mandoc -c -Tlatin1 | iconv -f koi8

This does not work:
zcat /usr/share/man/ru/man1/ls.1.gz | nroff -mandoc -c -Tutf8
zcat /usr/share/man/ru/man1/ls.1.gz | nroff -mandoc -c
zcat /usr/share/man/ru/man1/ls.1.gz | nroff -mandoc
zcat /usr/share/man/ru/man1/ls.1.gz | nroff

/usr/share/man/ru/man1/ls.1.gz owned by man-pages-ru and it is utf8 encoded
Comment by Sergej Pupykin (sergej) - Wednesday, 30 January 2008, 11:52 GMT
$ pacman -Q groff
groff 1.19.2-4
Comment by Aaron Griffin (phrakture) - Wednesday, 30 January 2008, 17:13 GMT
Quick question - can you try your long iconv line WITHOUT -Tlatin1? groff should detect the encoding of your terminal by itself.

Secondly, could you explain to me what "doesn't work" means? Does it display wrongly, or is it keyboard related? (The recent groff changes were related to some utf8 sigils being wrong.)
Comment by Roman Kyrylych (Romashka) - Wednesday, 30 January 2008, 17:17 GMT
That long iconv line fails without -Tlatin1.
"Doesn't work" means it displays wrongly (characters are in the wrong encoding).
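
As an illustration of what "characters in the wrong encoding" can look like, here is a small Python sketch. It assumes one common failure mode, UTF-8 bytes being interpreted as Latin-1; the thread does not confirm that this is exactly what groff does, and the sample word is arbitrary.

```python
text = "файл"  # arbitrary Cyrillic sample, not taken from a real man page

utf8_bytes = text.encode("utf-8")

# Misread the UTF-8 bytes as Latin-1: each byte becomes one character.
garbled = utf8_bytes.decode("latin-1")
print(ascii(garbled))  # escaped so it prints safely on any terminal

# No information is lost; undoing the wrong decoding recovers the text.
recovered = garbled.encode("latin-1").decode("utf-8")
assert recovered == text
```

Each Cyrillic letter turns into two unrelated Latin-1 characters, which is why the output is unreadable even though the underlying bytes are intact.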
Comment by JM (fijam) - Sunday, 30 March 2008, 17:11 GMT
There is the same problem with pl_PL.utf
Comment by Alois Nespor (anespor) - Monday, 31 March 2008, 06:48 GMT
czech - cs_CZ.utf also
Comment by a (matteo.gazzoni) - Sunday, 15 June 2008, 20:00 GMT
Using en_US.utf8 causes the same issue! Solved by adding -Tlatin1 to NROFF and NEQN in man.conf, as reported here: http://bugs.archlinux.org/task/9555. Is there a good reason for stripping -Tlatin1 in the update from 1.6f-1 to 1.6f-2?
Comment by a (matteo.gazzoni) - Monday, 16 June 2008, 00:09 GMT
I've just seen that -Tlatin1 causes other encoding issues (e.g. <B7> chars); LC_ALL="C" seems to be the only workaround...
Comment by Glenn Matthys (RedShift) - Friday, 05 December 2008, 23:20 GMT
What's the status of this issue?
Comment by Aaron Griffin (phrakture) - Friday, 05 December 2008, 23:35 GMT
Status: This was mostly Roman's bag, but I imagine replacing man with man-db will solve all of this.

Does that sound correct?
Comment by Andreas Radke (AndyRTR) - Monday, 22 December 2008, 18:24 GMT
Could someone who is affected please try applying this patch:

http://cvs.fedora.redhat.com/viewvc/rpms/man/F-10/man-1.6b-i18n_nroff.patch?revision=1.3&view=markup (Fedora also has some more promising patches...)
Comment by Alexander E. Patrakov (patrakov) - Saturday, 27 December 2008, 17:26 GMT
IMHO, the whole business with translated manual pages is too patchy and shaky. Besides, some of the translated manual pages installed by the packages are in UTF-8, while others are in the local 8-bit encoding. Replacing Man with Man-DB won't magically solve this, as Man-DB needs correctly sorted manual pages in order to work correctly. And in order to work for Russian, it needs Groff with the "ascii8" device (i.e., Debian patch).

So I propose to stay as simple as possible, while still correct (where "correct" means "never producing unreadable garbage"). Stay with Man, but completely drop support for translated man messages and translated manual pages. Dropping translated man messages is done with the "+lang none" switch, and the patch below disables support for translated manual pages:

diff -ur man-1.6f.orig/src/manpath.c man-1.6f/src/manpath.c
--- man-1.6f.orig/src/manpath.c 2006-08-04 03:18:33.000000000 +0600
+++ man-1.6f/src/manpath.c 2008-09-13 15:16:29.000000000 +0600
@@ -279,17 +279,6 @@
     if (alt_system) {
         add_to_list(dir, alt_system_name, perrs);
     } else {
-        /* We cannot use "lang = setlocale(LC_MESSAGES, NULL)" or so:
-           the return value of setlocale is an opaque string. */
-        /* POSIX prescribes the order: LC_ALL, LC_MESSAGES, LANG */
-        if((lang = getenv("LC_ALL")) != NULL)
-            split2(dir, lang, add_to_mandirlist_x, perrs);
-        if((lang = getenv("LC_MESSAGES")) != NULL)
-            split2(dir, lang, add_to_mandirlist_x, perrs);
-        if((lang = getenv("LANG")) != NULL)
-            split2(dir, lang, add_to_mandirlist_x, perrs);
-        if((lang = getenv("LANGUAGE")) != NULL)
-            split2(dir, lang, add_to_mandirlist_x, perrs);
         add_to_mandirlist_x(dir, 0, perrs);
     }
 }


--
Alexander E. Patrakov
A native speaker of Russian
The person who added Man-DB to Linux From Scratch
Comment by Glenn Matthys (RedShift) - Saturday, 27 December 2008, 17:36 GMT
Alexander's suggestion is a bit extreme, but Arch used to disable National Language Support where it was possible. Maybe we should do this for the manpages and revisit this issue when the linux community has decided it should do a massive cleanup of the manpages that are currently being used.
Comment by Lyman Li (lyman) - Thursday, 29 January 2009, 10:29 GMT
AFAIK the simplest way to solve this issue is to replace groff with groff-utf8 and update man.conf like this:

#TROFF /usr/bin/groff -Tps -mandoc -c
TROFF /usr/bin/groff-utf8 -Tutf8 -mandoc -c
#NROFF /usr/bin/nroff -mandoc -c
NROFF /usr/bin/groff-utf8 -Tutf8 -mandoc -c

The only flaw of this method is the warning lines in man's output (try "man mplayer").
Comment by Alexander E. Patrakov (patrakov) - Thursday, 29 January 2009, 14:43 GMT
groff-utf8 is no longer needed with groff-1.20.1, because there is preconv (aka "groff -D encoding"). However, both groff-utf8 and "groff -D encoding" work only when the encoding of the manual page is known (and is UTF-8 in the case of groff-utf8). Until we sort all manual pages in all packages by the encoding, no solution except disabling translated manual pages completely is going to work.
Comment by Alexander E. Patrakov (patrakov) - Thursday, 29 January 2009, 14:46 GMT
A concrete example: shadow installs UTF-8 encoded manual pages into /usr/share/man/ru, while mplayer installs KOI8-R encoded manual page there. Any sane man setup expects one encoding (not the mix of two) for one directory.
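
To check whether a directory really contains such a mix, something like the following Python sketch could be used. It is only an illustration: the function names are invented here, the check merely separates "valid UTF-8" from "anything else", and the demo runs on a throwaway directory with two made-up pages rather than on /usr/share/man/ru.

```python
import gzip
import tempfile
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def audit_man_dir(root):
    """Map each *.gz page under root (relative path) to 'utf-8' or 'not utf-8'."""
    report = {}
    for path in sorted(Path(root).rglob("*.gz")):
        with gzip.open(path, "rb") as fh:
            data = fh.read()
        key = path.relative_to(root).as_posix()
        report[key] = "utf-8" if is_valid_utf8(data) else "not utf-8"
    return report


# Demo on a temporary directory with two made-up pages, one per encoding.
with tempfile.TemporaryDirectory() as tmp:
    man1 = Path(tmp) / "man1"
    man1.mkdir()
    with gzip.open(man1 / "a.1.gz", "wb") as fh:
        fh.write("тест".encode("utf-8"))
    with gzip.open(man1 / "b.1.gz", "wb") as fh:
        fh.write("тест".encode("koi8-r"))
    report = audit_man_dir(tmp)

print(report)
```

Pointing `audit_man_dir` at a real locale directory would list which packages ship pages that are not UTF-8.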
Comment by Bogdan Szczurek (thebodzio) - Thursday, 29 January 2009, 15:14 GMT
I concur! It's the same situation with Polish man pages.
Comment by Andreas Radke (AndyRTR) - Wednesday, 18 February 2009, 22:29 GMT
Please test with man-db from testing. If there are still broken man pages, please report them for each pkg in a separate report. They can be fixed with the convert-mans script. See here for more:

http://www.linuxfromscratch.org/lfs/view/6.4/chapter06/man-db.html

If the situation has improved, I'd like to close this one.
Comment by Alexander E. Patrakov (patrakov) - Monday, 23 February 2009, 07:20 GMT
The situation is very bad, because man-db has not been properly compiled to work with groff-1.20.1. Instead of Russian letters, I get unreadable gibberish consisting of accented Latin letters.

There are two ways to fix this:

1. Use groff-1.18.x with debian patch and recompile Man-DB with the --enable-multibyte switch. Then it will use the -Tascii8 device.

2. Use groff-1.20.1 and recompile Man-DB with the --enable-multibyte switch. Then it will use preconv. However, this feature is not official yet.

The problem is that --enable-multibyte is used for two different purposes: it lets Man-DB know that Groff accepts -Tascii8 and -Tnippon (which is true for Debian-patched Groff, but not for groff-1.20.1, but these switches are never used when preconv exists), and also lets Man-DB know that Russian manual pages are in KOI8-R.

LFS only recently switched to the combination of Man-DB + non-Debian groff, and it took us a while to figure out why this can work at all. See http://www.mailinglistarchive.com/lfs-dev@linuxfromscratch.org/msg45678.html

OTOH, since you obviously don't test anything yourself (otherwise you'd immediately notice this deficiency of Man-DB build in testing), I ask you to refrain from any fragile solutions. Use plain old Man with my patch.
Comment by Aaron Griffin (phrakture) - Monday, 23 February 2009, 16:38 GMT
The last comment was a little uncalled for. We all tested this; we just don't speak Russian or use Russian locales, so we have not tested that avenue.

I don't think going back to the original man is a good idea, as man-db is the way we want to go. I'd rather get the Russian man pages fixed in man-db. In fact, the thread you linked indicates this is a font problem more than anything else. Is that true?
Comment by Colin Watson (cjwatson) - Monday, 23 February 2009, 20:57 GMT
Alexander is very much mistaken that man-db requires pages to be sorted by encoding. In fact, man-db includes code to guess the encoding of a page on the fly. Obviously this is not absolutely reliable, but in practice it is fairly easy to tell the difference between UTF-8 and one single legacy encoding (KOI8-R in the case of Russian pages) - just try decoding it as UTF-8 and if that fails then assume it's KOI8-R - so this usually works just fine. Just dump all your Russian pages into /usr/share/man/ru/ and as long as they're all UTF-8 or KOI8-R rather than some random other encoding or a mix of encodings then you'll be fine. This was done intentionally to simplify migration to UTF-8 manual pages for distributions.
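
The fallback described above (try UTF-8 first, otherwise assume the single legacy encoding) can be sketched in a few lines of Python. This is not man-db's actual code; the function name `decode_page` and its `legacy` parameter are invented for the illustration.

```python
def decode_page(data: bytes, legacy: str = "koi8-r"):
    """Return (text, encoding): try UTF-8 first, else fall back to `legacy`."""
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so assume the one legacy encoding for this language.
        return data.decode(legacy), legacy


# The same made-up page content stored in both encodings decodes to the same text.
for raw in ("тест".encode("utf-8"), "тест".encode("koi8-r")):
    text, encoding = decode_page(raw)
    print(encoding, ascii(text))
```

The guess works in practice because runs of KOI8-R Cyrillic letters almost never form valid UTF-8 sequences, so the first decode fails quickly on legacy pages.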

Passing --enable-mb-groff to man-db's configure (not --enable-multibyte; that was the name of the corresponding groff configure option, so I assume that this is a typo) is a reasonable workaround for your current problems. I have tested this with groff 1.20.1 and Russian pages (and in general Russian is one of the cases I test). In response to a private mail from Matthew Burgess (who wrote the linked lfs-dev post), I improved man-db's configure script to consider the presence of preconv sufficient to autodetect this; this fix will be in man-db 2.5.4.

I suggest applying revision 1023 from http://bazaar.launchpad.net/~cjwatson/man-db/trunk, which will correct a transliteration problem also reported by Matthew Burgess (the hyphenation bug in the linked lfs-dev post). Revisions 1021 and 1024 from the same URL would probably be a good idea too; the former fixes page sorting so that Russian pages display in preference to English if you're in a Russian locale, and the latter simplifies the pipeline for pages encoded in UTF-8.

The only remaining problem I know about with the combination of man-db 2.5.4 (once released) and groff 1.20.1 is that CJK manual pages will not be correctly word-wrapped. This is precisely the problem that means that Debian has not yet upgraded to groff 1.20.1 (the absence of both kinsoku shori support and knowledge of CJK double-width characters in groff); I'm working on that. The text itself appears to display fine and should be readable if you don't mind the poor wrapping, though.
Comment by Colin Watson (cjwatson) - Tuesday, 24 February 2009, 02:33 GMT
I've released man-db 2.5.4 now, so you probably might as well use that rather than backporting patches or fiddling with configure options.
Comment by Andreas Radke (AndyRTR) - Tuesday, 24 February 2009, 16:23 GMT
man-db 2.5.4 is in testing. Please report your results.
Comment by Alexander E. Patrakov (patrakov) - Thursday, 26 February 2009, 04:58 GMT
man-db-2.5.4-1-i686 works for me. Thanks, and please don't lose this achievement!
Comment by Alois Nespor (anespor) - Thursday, 26 February 2009, 07:53 GMT
cs_CZ utf8, works! Thanks!
