FS#6437 - Pacman 3 progress bar- issue with UTF-8 chars

Attached to Project: Pacman
Opened by Dan McGee (toofishes) - Saturday, 17 February 2007, 07:13 GMT
Last edited by Dan McGee (toofishes) - Monday, 18 February 2008, 02:45 GMT
Task Type: Bug Report
Category: Output
Status: Closed
Assigned To: Aaron Griffin (phrakture), Xavier (shining), Dan McGee (toofishes)
Architecture: All
Severity: Low
Priority: Normal
Reported Version: 0.7.2 Gimmick
Due in Version: 3.1.2
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

As described by Pierre Schmitz <pierre@archlinux.de> on the pacman-dev ML:

It seems as if the function strlen only counts the number of bytes within a
given string and not the number of characters. In UTF-8 a character can take
from one to four bytes.

The attached screenshot shows that in trans.c:337 the length is calculated
incorrectly (because it counts the ü twice). When replacing ü with ue
everything is right.
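
To illustrate the report, here is a minimal sketch that contrasts strlen() with a count of UTF-8 code points. The helper names utf8_codepoints and utf8_extra_bytes are hypothetical and do not come from pacman, and the code assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count UTF-8 code points. Every byte except a continuation byte
 * (bit pattern 10xxxxxx) starts a new character.
 * Assumption: the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Number of bytes strlen() reports beyond the character count.
 * This is how far a byte-based width calculation is off. */
size_t utf8_extra_bytes(const char *s)
{
    return strlen(s) - utf8_codepoints(s);
}
```

For the string "für" encoded as UTF-8, strlen() returns 4 while utf8_codepoints() returns 3, which is the one-column error described above. With "fuer" both counts agree.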

Closed by  Dan McGee (toofishes)
Monday, 18 February 2008, 02:45 GMT
Reason for closing:  Fixed
Additional comments about closing:  Fixed in commit 2374c81e55abc0f7252fad7eb53d2b75bb33f750
Comment by Dan McGee (toofishes) - Saturday, 17 February 2007, 07:13 GMT
Ok. I know where this problem stems from, and more or less how to fix
it. However, it will require a bit of changing to get it to output
right, and I don't want to make these kinds of changes right before we
release. It has to do with printf() being byte-oriented output, so
we need to do some conversion to wchar_t-based output in cases like
this, where character counts are important and they aren't guaranteed
to match up with the byte counts.

This bug should never go the other way (make the progress bar line too
long), so I don't want to call this a critical bug. Are you fine with
putting it off until the first 'bugfix' release?
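
The wchar_t conversion mentioned in this comment could look roughly like the sketch below. It is not the code from the eventual fix. It assumes a POSIX system, where mbstowcs() with a NULL destination returns the number of wide characters the string would convert to, and it assumes the caller has already selected a UTF-8 locale with setlocale(LC_ALL, ""). The names mb_char_count and padding_for are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Number of characters in a multibyte string under the current locale.
 * POSIX: with a NULL destination, mbstowcs() only counts.
 * If the string is not valid in this locale, fall back to the byte count. */
size_t mb_char_count(const char *s)
{
    size_t n = mbstowcs(NULL, s, 0);
    return (n == (size_t)-1) ? strlen(s) : n;
}

/* Spaces needed after `label` so that it fills `width` characters.
 * Using strlen() here instead would under-pad any multibyte label. */
size_t padding_for(const char *label, size_t width)
{
    size_t len = mb_char_count(label);
    return (len >= width) ? 0 : width - len;
}
```

This counts characters, not terminal columns. Double-width characters (CJK, for example) would additionally need something like POSIX wcswidth() on the converted string.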
Comment by Dan McGee (toofishes) - Tuesday, 13 March 2007, 16:19 GMT
Found another issue: this deals with the indentprint function:

$ sudo ./src/pacman/pacman.static -Qi glibc
Название : glibc
Версия : 2.5-6
URL : http://www.gnu.org/software/libc
Лицензия : Не указано
Группы : Не указано
Обеспечивает : Не указано
Зависит от : kernel-headers>=2.6.20
Удаляет.................: Не указано
Требуется пакетами......: a52dec aalib acpid alsa-lib alsa-oss
...........................................attr audiofile bash bftpd bin86
...........................................binutils bison bzip2 cabextract
...........................................ccache cdparanoia cdrkit codecs
coreutils cpio cracklib ctags cvs
db dbh device-mapper dhcpcd
diffutils dosfstools dvd+rw-tools
e2fsprogs ed eventlog expat faac
faad2 fakeroot fbset file findutils
flac fribidi fuse gcc gdbm
gen-init-cpio glib glib2 gphoto2
grep gzip hdparm indent iproute

Because indent is counting bytes, we get way off on anything other than single-byte Latin characters.

Edit: the tracker compresses the spaces. I've added dots to a few lines to replace spaces to show what I mean, but since this is for me anyway I think I'll remember.
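
As a rough illustration of the indent problem, the sketch below wraps a word list under a label and measures the indent in characters instead of bytes. It is not pacman's indentprint function. The names label_chars and print_wrapped are hypothetical, and the code assumes valid UTF-8 input with one terminal column per character.

```c
#include <stdio.h>
#include <stddef.h>

/* Characters in a UTF-8 string: skip continuation bytes (10xxxxxx).
 * Assumption: valid UTF-8, one column per character. */
size_t label_chars(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Print `label` followed by `words`, wrapping at `cols` columns.
 * Continuation lines are indented to line up under the first word.
 * Returns the number of lines written. */
int print_wrapped(FILE *out, const char *label,
                  const char *const *words, size_t nwords, size_t cols)
{
    size_t indent = label_chars(label); /* characters, not strlen() bytes */
    size_t col = indent;
    int lines = 1;

    fputs(label, out);
    for (size_t i = 0; i < nwords; i++) {
        size_t w = label_chars(words[i]);
        if (col > indent && col + w + 1 > cols) {
            /* Word does not fit: start a new, indented line. */
            fprintf(out, "\n%*s", (int)indent, "");
            col = indent;
            lines++;
        }
        if (col > indent) {
            fputc(' ', out);
            col++;
        }
        fputs(words[i], out);
        col += w;
    }
    fputc('\n', out);
    return lines;
}
```

With a byte-based indent, a Cyrillic label of 11 characters would be measured as roughly twice that, which pushes the wrapped lines far to the right as in the output above.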
Comment by Roman Kyrylych (Romashka) - Tuesday, 13 March 2007, 16:30 GMT
Why the hell doesn't Flyspray support <pre></pre>-like tags to display code correctly? Errr...
I suggest using pastebin with time set to "month" or "unlimited" for things where preformatted text is important.
Comment by Dan McGee (toofishes) - Tuesday, 13 March 2007, 16:33 GMT
Yeah, I could have done that had I been thinking, but no big deal. Easier to just give the steps to reproduce:

$ export LANG=ru_RU.utf8
$ pacman -Qi glibc
Comment by Sébastien (sebcactus) - Thursday, 03 May 2007, 11:02 GMT
I don't know if it's linked to UTF-8 too, but my bar is too long (see the screenshot) when downloading, whereas it is OK when installing/upgrading.
It looks like a problem with the percentages.
Comment by Aaron Griffin (phrakture) - Friday, 28 September 2007, 02:04 GMT
Chantry submitted a patch for this that computes the real length in the output code.

Dan, Chantry, can you guys confirm?
Comment by Stephen Wilkinson (sw8511) - Tuesday, 16 October 2007, 00:27 GMT
Sometimes, while downloading a package, instead of a single line of download progress (with #) I get several lines (see attached image).
I don't know if it's related to the above, but it falls under the title of this bug report! I have LANG=en_GB.utf8
Comment by Aaron Griffin (phrakture) - Tuesday, 23 October 2007, 20:07 GMT
Stephen, that is a length computation issue, yes, but somewhat different. Look at the ETA time and the percentage completion - 4 hours 43 minutes until complete, and -522% complete.

Are you, by chance, using a proxy? I remember the calcs screwing up on one proxy I used. If so, is there any way for me to access the proxy for testing?
Comment by Stephen Wilkinson (sw8511) - Wednesday, 24 October 2007, 21:33 GMT
Hi Aaron,
Thanks very much for your reply - I did have an http proxy environment variable set in my .bashrc, and it was invalid too (I should have removed it months ago!), so I removed it and now the progress bar in pacman appears to work as before. The problem wasn't 100% repeatable, but I've run pacman -Sy a few times and it seems to be OK now.
Thanks a lot for pointing this out to me.
Steve
Comment by Xavier (shining) - Sunday, 11 November 2007, 22:32 GMT
Hm, I only see this bug report now. The little patch I made against 3.0 indeed has a lot to do with the original issue reported by Dan:
http://www.archlinux.org/pipermail/pacman-dev/2007-September/009325.html
I didn't touch the lib/libalpm/trans.c code at all, only src/pacman/log.c, so maybe that change needs to be done in several other places.

I couldn't submit a patch against 3.1 because the code changed too much (the needpad stuff was removed).
I had high hopes about the idea in Dan's head :)
http://www.archlinux.org/pipermail/pacman-dev/2007-October/009552.html
Comment by Xavier (shining) - Sunday, 17 February 2008, 22:50 GMT
My comment above was a bit off. I didn't realize there was a src/pacman/trans.c before, which was moved to callback.c
So the attached patch deals precisely with the part Dan mentioned back then (src/pacman/trans.c:337).
