FS#15845 - pacman malloc() memory corruption after latest update
Attached to Project:
Pacman
Opened by Michael C (afterlife) - Friday, 07 August 2009, 16:28 GMT
Last edited by Xavier (shining) - Monday, 07 September 2009, 10:56 GMT
Opened by Michael C (afterlife) - Friday, 07 August 2009, 16:28 GMT
Last edited by Xavier (shining) - Monday, 07 September 2009, 10:56 GMT
|
Details
Summary and Info:
pacman fails after latest upgrade (07/08/2009). IBM x31 Centrino 1.6GHz Linux 2.6.30-ARCH #1 SMP PREEMPT Fri Jul 31 18:10:38 UTC 2009 i686 Intel(R) Pentium(R) M processor 1600MHz GenuineIntel GNU/Linux Steps to Reproduce: issue "pacman -Syu" accept pacmans suggestion to update itself, observe it crash and burn after pacman has updated itself. Output: (...some output ommited) Total Download Size: 88.82 MB Total Installed Size: 253.36 MB Proceed with installation? [Y/n] y :: Retrieving packages from core... filesystem-2009.07-... 12.6K 60.8K/s 00:00:00 [#####################] 100% gawk-3.1.7-1-i686 944.0K 181.1K/s 00:00:05 [#####################] 100% udev-141-5-i686 235.3K 169.0K/s 00:00:01 [#####################] 100% initscripts-2009.07... 39.0K 108.1K/s 00:00:00 [#####################] 100% licenses-2.5-1-i686 78.6K 138.3K/s 00:00:01 [#####################] 100% mkinitcpio-0.5.26-1-any 15.7K 75.7K/s 00:00:00 [#####################] 100% psmisc-22.8-1-i686 73.5K 136.0K/s 00:00:01 [#####################] 100% readline-6.0.004-1-i686 389.3K 176.0K/s 00:00:02 [#####################] 100% *** glibc detected *** /usr/bin/pacman: malloc(): memory corruption: 0xb7e300b8 *** ======= Backtrace: ========= /lib/libc.so.6[0xb7d589b1] /lib/libc.so.6[0xb7d5b58a] /lib/libc.so.6(__libc_malloc+0x5f)[0xb7d5d24f] /lib/libc.so.6[0xb7d1495e] /lib/libc.so.6[0xb7d12b1c] /lib/libc.so.6[0xb7d12451] /lib/libc.so.6(dcgettext+0x43)[0xb7d112f3] /lib/libc.so.6(__strerror_r+0x51)[0xb7d60881] /lib/libc.so.6(strerror+0x35)[0xb7d60795] /usr/lib/libalpm.so.4[0xb802a651] /usr/lib/libalpm.so.4[0xb802ab55] /usr/lib/libalpm.so.4[0xb802acb0] /usr/lib/libalpm.so.4[0xb8031200] /usr/lib/libalpm.so.4[0xb80338fa] /usr/lib/libalpm.so.4(alpm_trans_commit+0x48)[0xb8033998] /usr/bin/pacman[0x8051475] /usr/bin/pacman[0x8051a07] /usr/bin/pacman[0x804e945] /lib/libc.so.6(__libc_start_main+0xe6)[0xb7d03a36] /usr/bin/pacman[0x804b4b1] ======= Memory map: ======== 08048000-08058000 r-xp 00000000 08:02 93482 /usr/bin/pacman 08058000-08059000 rw-p 00010000 08:02 93482 /usr/bin/pacman 08cda000-08f2b000 rw-p 00000000 00:00 0 [heap] b7900000-b7921000 rw-p 00000000 00:00 0 b7921000-b7a00000 ---p 00000000 00:00 0 b7aba000-b7ad7000 r-xp 00000000 08:02 35382 /usr/lib/libgcc_s.so.1 b7ad7000-b7ad8000 rw-p 0001c000 08:02 35382 /usr/lib/libgcc_s.so.1 b7ad8000-b7ae8000 r-xp 00000000 08:02 34375 /lib/libresolv-2.10.1.so b7ae8000-b7ae9000 r--p 00010000 08:02 34375 /lib/libresolv-2.10.1.so b7ae9000-b7aea000 rw-p 00011000 08:02 34375 /lib/libresolv-2.10.1.so b7aea000-b7aec000 rw-p 00000000 00:00 0 b7aec000-b7af6000 r-xp 00000000 08:02 34378 /lib/libnss_files-2.10.1.so:00 0 b7ca7000-b7ce9000 r-xp 00000000 08:02 90772 /usr/lib/libssl.so.0.9.8 b7ce9000-b7ced000 rw-p 00041000 08:02 90772 /usr/lib/libssl.so.0.9.8 b7ced000-b7e2d000 r-xp 00000000 08:02 34391 /lib/libc-2.10.1.so b7e2d000-b7e2f000 r--p 00140000 08:02 34391 /lib/libc-2.10.1.so b7e2f000-b7e30000 rw-p 00142000 08:02 34391 /lib/libc-2.10.1.so b7e30000-b7e33000 rw-p 00000000 00:00 0 b7e33000-b7e46000 r-xp 00000000 08:02 35318 /usr/lib/libz.so.1.2.3.3 b7e46000-b7e47000 rw-p 00012000 08:02 35318 /usr/lib/libz.so.1.2.3.3 b7e47000-b7e56000 r-xp 00000000 08:02 35342 /lib/libbz2.so.1.0.4 b7e56000-b7e57000 rw-p 0000f000 08:02 35342 /lib/libbz2.so.1.0.4 b7e57000-b7e77000 r-xp 00000000 08:02 36556 /usr/lib/liblzma.so.0.0.0 b7e77000-b7e78000 rw-p 0001f000 08:02 36556 /usr/lib/liblzma.so.0.0.0 b7e78000-b7e79000 rw-p 00000000 00:00 0 b7e79000-b7fb2000 r-xp 00000000 08:02 90770 /usr/lib/libcrypto.so.0.9.8 b7fb2000-b7fc8000 rw-p 00139000 08:02 90770 /usr/lib/libcrypto.so.0.9.8 b7fc8000-b7fcb000 rw-p 00000000 00:00 0 b7fcb000-b7fcf000 r-xp 00000000 08:02 34510 /lib/libattr.so.1.1.0 b7fcf000-b7fd0000 rw-p 00003000 08:02 34510 /lib/libattr.so.1.1.0 b7fd0000-b7fd6000 r-xp 00000000 08:02 34526 /lib/libacl.so.1.1.0 b7fd6000-b7fd7000 rw-p 00005000 08:02 34526 /lib/libacl.so.1.1.0 b7fd7000-b8008000 r-xp 00000000 08:02 90787 /usr/lib/libarchive.so.2.7.0 b8008000-b8009000 rw-p 00031000 08:02 90787 /usr/lib/libarchive.so.2.7.0 b8009000-b800a000 rw-p 00000000 00:00 0 b800a000-b8017000 r-xp 00000000 08:02 33469 /usr/lib/libfetch.so b8017000-b8018000 rw-p 0000c000 08:02 33469 /usr/lib/libfetch.so b8018000-b8019000 rw-p 00000000 00:00 0 b8019000-b803a000 r-xp 00000000 08:02 33473 /usr/lib/libalpm.so.4.0.0 b803a000-b803b000 rw-p 00020000 08:02 33473 /usr/lib/libalpm.so.4.0.0 b803b000-b803c000 rw-p 00000000 00:00 0 b8040000-b8041000 rw-p 00000000 00:00 0 b8041000-b8046000 r-xp 00000000 08:02 34376 /lib/libnss_dns-2.10.1.so b8046000-b8047000 r--p 00004000 08:02 34376 /lib/libnss_dns-2.10.1.so b8047000-b8048000 rw-p 00005000 08:02 34376 /lib/libnss_dns-2.10.1.so b8048000-b804b000 rw-p 00000000 00:00 0 b804b000-b804c000 r-xp 00000000 00:00 0 [vdso] b804c000-b8068000 r-xp 00000000 08:02 34390 /lib/ld-2.10.1.so b8068000-b8069000 r--p 0001b000 08:02 34390 /lib/ld-2.10.1.so b8069000-b806a000 rw-p 0001c000 08:02 34390 /lib/ld-2.10.1.so bfb5d000-bfb72000 rw-p 00000000 00:00 0 [stack] /usr/bin/yaourt: line 568: 6039 Aborted sudo $@ Checking the integrity of the local database in /var/lib/pacman/ I will help any way I can with additional information upon request. |
This task depends upon
Closed by Xavier (shining)
Monday, 07 September 2009, 10:56 GMT
Reason for closing: Fixed
Additional comments about closing: fixed with libfetch 2.25
Monday, 07 September 2009, 10:56 GMT
Reason for closing: Fixed
Additional comments about closing: fixed with libfetch 2.25
Xavier- want to give him those instructions on how to get a debug build again, or can we repro this quite easily ourselves?
but it seems different from the original issue reported in
FS#10773and download_internal did not change much either : move to libfetch, and addition of sigaction (commit 43f9eb1aa95)
it never segfaults when using --noprogressbar ?
and when using attached libfetch pkgbuild?
Still segfaults I will try using the attached libfetch now and report back.
$ gdb
(gdb) run pacman -Syu --noprogressbar
... segfault should happen here ...
(gdb) bt full
As Dan said, it might actually be helpful to do some exploration of the memory with gdb when the segfault happens.
Because the line where the segfault happens is not the problem.
btw: the fancy dump is because the "MALLOC_CHECK_" env variable is set to 3. The bug should not be trigger if MALLOC_CHECK_ is not defined or is 0.
The two reporters, Michael and Konstantin, seem to also use this mirror. So it looks like a requirement. Did you manage to get a segfault using another mirror? Or even the same mirror with ftp (I didn't try that yet).
And I never got a fancy dump, when I had MALLOC_CHECK_ unset. Do you both have MALLOC_CHECK_ set?
Finally, my segfault never happens between downloads, always during one download. But the reported segfaults are always between two dowloads. This is what I find the most strange. You both never got segfaults during one download?
does this mean buggy http server on mirror.aarnet.edu.au , or buggy libfetch http code, or both? I don't know
* Segfault only reproducible on i686, not x86_64. Interesting. However, both architectures give me the "appears to be truncated" warning, although only i686 segfaults.
* I'm thinking the problem is in a memory allocation call we make, but haven't pinpointed it yet. In internal_download(), we allocate memory a few times- a call to fetchParseURL() (in libfetch), CALLOC in get_tempfile(), CALLOC in get_destfile(), malloc a bunch in getservbyname (in libc) in fetch_default_port() (libfetch), also in fetch_connect(), http_cmd() via vasprintf(), http_request() via setlocale() and timegm(), cb_dl_progress() for the strdup() call. OK, so we do call malloc()/calloc() more than I thought.
Now explain to me why I can't ever get a reasonable awesome glibc/malloc dump like in the original report:
$ sudo MALLOC_CHECK_=3 ./src/pacman/.libs/lt-pacman -U http://mirror.aarnet.edu.au/pub/archlinux/extra/os/i686/qt-4.5.2-5-i686.pkg.tar.gz
error: segmentation fault 2.3M 112.9K/s 00:03:48 [-------C o o o o o o o o o o o o o o o o o o o o o o o o o o ] 8%
Internal pacman error: Segmentation fault.
Please submit a full bug report with --debug if appropriate.
I'm having trouble using GDB but i'll post up on the forum for that
and hopefully get a dump from the original machine which caused me to
file this bug. Thanks!
[New Thread 0xb7ba76c0 (LWP 10427)]
[Switching to Thread 0xb7ba76c0 (LWP 10427)]
*** glibc detected *** /home/dmcgee/projects/pacman/src/pacman/.libs/lt-pacman: malloc(): memory corruption: 0xb7d500b8 ***
Program received signal SIGINT, Interrupt.
And that memory:
0xb7d500a8 <initial_quick+456>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xb7d500b0 <initial_quick+464>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xb7d500b8 <initial_quick+472>: 0xb0 0x03 0xd5 0xb7 0x00 0x00 0x00 0x00
0xb7d500c0 <initial_quick+480>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
You can see the 4 corrupted bytes in there (guessing it should all be zeroes). These don't make a whole lot of sense to me in any format (chars, ints, unsigned, etc.).
The interesting part is this location was located B7D50380 - B7D500B8 = 712 bytes before the address glibc labels "main_arena" (at address 0xb7d50380 for me this time), where it does much of its memory allocation out of. It is roughly 1100 bytes long. I'm guessing there is some busted pointer arithmetic along the way, and we have stepped outside of our bounds and written somewhere we shouldn't have.
I have this core dump saved; as long as I don't do a recompile I should be able to garner more info out of this if other people have thoughts. Note that most of the symbols visible on the stack are in far different places (aka the heap): dlf at 0x9d6b4d0, localf at 0x9d6ba88, fileurl at 0x9d69cd0. Potentially interesting is the following:
(gdb) f 26
#26 0xb7f61440 in download_internal (url=<value optimized out>, localpath=<value optimized out>, mtimeold=0, mtimenew=0x0) at dload.c:237
237 _alpm_log(PM_LOG_ERROR, _("%s appears to be truncated: %jd/%jd bytes\n"),
(gdb) p &buffer
$3 = (char (*)[10240]) 0xbf83ea8c
However, that is a hell of a long way away, so I'm not putting much faith in it.
As I wrap up this long comment, I might have gotten somewhere with valgrind...stay tuned.
My line numbers will be a bit off and will need to be adjusted from the current git master due to the following local changes:
diff --git a/lib/libalpm/dload.c b/lib/libalpm/dload.c
index a06cb46..3f0a0c9 100644
--- a/lib/libalpm/dload.c
+++ b/lib/libalpm/dload.c
@@ -158,6 +158,7 @@ static int download_internal(const char *url, const char *localpath,
* something along those lines. Store the old signal handler first. */
new_action.sa_handler = SIG_IGN;
sigemptyset(&new_action.sa_mask);
+ new_action.sa_flags = 0;
sigaction(SIGPIPE, NULL, &old_action);
sigaction(SIGPIPE, &new_action, NULL);
@@ -227,6 +228,9 @@ static int download_internal(const char *url, const char *localpath,
handle->dlcb(filename, dl_thisfile, ust.size);
}
}
+ fprintf(stderr, "nread: %u\n", (unsigned int)nread);
+ fprintf(stderr, "fetchErr: %d\n", fetchLastErrCode);
+ fprintf(stderr, "fetchErr: %s\n", fetchLastErrString);
/* did the transfer complete normally? */
if (ust.size != -1 && dl_thisfile < ust.size) {
Regards,
Mike.
(gdb) run -Syubash-4.0# gdb pacman
GNU gdb 6.8
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(no debugging symbols found)
Starting program: /usr/bin/pacman -Syu
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
:: Synchronizing package databases...
[New Thread 0xb7c6aab0 (LWP 4295)]
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
core is up to date
extra is up to date
community is up to date
archlinuxfr is up to date
:: Starting full system upgrade...
resolving dependencies...
looking for inter-conflicts...
Targets (28): libgl-7.5-2 ati-dri-7.5-2 cryptsetup-1.0.7-1
filesystem-2009.07-1 nss-3.12.3.1-1 xulrunner-1.9.1.2-1
firefox-3.5.2-1 gawk-3.1.7-1 ghostscript-8.70-1
imagemagick-6.5.4.8-1 udev-141-5 initscripts-2009.07-3
libgsf-1.14.15-1 licenses-2.5-1 mesa-7.5-2
mkinitcpio-0.5.26-1 psmisc-22.8-1 xdg-utils-1.0.2-2
cdparanoia-10.2-2 libvisual-0.4.0-2 libtheora-1.0-1
gstreamer0.10-base-plugins-0.10.23-2 qt-4.5.2-5
readline-6.0.004-1 shadow-4.1.4.2-1 syslog-ng-3.0.3-2
wget-1.11.4-2 xorg-server-1.6.3-2
Total Download Size: 89.36 MB
Total Installed Size: 254.58 MB
Proceed with installation? [Y/n] y
:: Retrieving packages from core...
cryptsetup-1.0.7-1-i686 558.3K 333.5K/s 00:00:02 [#####################] 100%
filesystem-2009.07-... 12.6K 120.1K/s 00:00:00 [#####################] 100%
gawk-3.1.7-1-i686 944.0K 395.8K/s 00:00:02 [#####################] 100%
udev-141-5-i686 235.3K 262.9K/s 00:00:01 [#####################] 100%
initscripts-2009.07... 39.0K 121.9K/s 00:00:00 [#####################] 100%
licenses-2.5-1-i686 78.6K 161.5K/s 00:00:00 [#####################] 100%
mkinitcpio-0.5.26-1-any 15.7K 68.9K/s 00:00:00 [#####################] 100%
psmisc-22.8-1-i686 73.5K 185.4K/s 00:00:00 [#####################] 100%
readline-6.0.004-1-i686 389.3K 320.9K/s 00:00:01 [#####################] 100%
shadow-4.1.4.2-1-i686 190.0K 206.3K/s 00:00:04 [###------------------] 16%
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7c6aab0 (LWP 4295)]
0xb7d44a76 in memcpy () from /lib/libc.so.6
(gdb) bt full
#0 0xb7d44a76 in memcpy () from /lib/libc.so.6
No symbol table info available.
#1 0x08823e50 in ?? ()
No symbol table info available.
#2 0xb7ff6b74 in http_readfn () from /usr/lib/libfetch.so
No symbol table info available.
#3 0xb7feecea in fetchIO_read () from /usr/lib/libfetch.so
No symbol table info available.
#4 0xb800c5d8 in download_internal () from /usr/lib/libalpm.so.4
No symbol table info available.
#5 0xb800cb55 in _alpm_download_single_file () from /usr/lib/libalpm.so.4
No symbol table info available.
#6 0xb800ccb0 in _alpm_download_files () from /usr/lib/libalpm.so.4
No symbol table info available.
#7 0xb8013200 in _alpm_sync_commit () from /usr/lib/libalpm.so.4
No symbol table info available.
#8 0xb80158fa in _alpm_trans_commit () from /usr/lib/libalpm.so.4
No symbol table info available.
#9 0xb8015998 in alpm_trans_commit () from /usr/lib/libalpm.so.4
No symbol table info available.
#10 0x08051475 in ?? ()
No symbol table info available.
#11 0xbfb3087c in ?? ()
---Type <return> to continue, or q <return> to quit---
No symbol table info available.
#12 0x08056e21 in ?? ()
No symbol table info available.
#13 0x00000005 in ?? ()
No symbol table info available.
#14 0xb8006e0b in _alpm_db_get_pkgfromcache () from /usr/lib/libalpm.so.4
No symbol table info available.
#15 0x08051a07 in ?? ()
No symbol table info available.
#16 0x085f9d40 in ?? ()
No symbol table info available.
#17 0x085f9dd0 in ?? ()
No symbol table info available.
#18 0x00000005 in ?? ()
No symbol table info available.
#19 0xb801c15c in ?? () from /usr/lib/libalpm.so.4
No symbol table info available.
#20 0x00000000 in ?? ()
No symbol table info available.
(gdb)
Moving the mirror from "Server = http://mirror.aarnet.edu.au/pub/archlinux/$repo/os/i686" to another
'resolved' this problem. Could something be so out-of-whack on their end as to crash pacman (perhaps check
their package integrity)? Someone should
inform them ASAP that in a limited number of situations this can occour.
Anyway, even if their server is the problem, we can't let pacman crash or potentially be vulnerable due to it, especially since pacman runs as root.
Running it inside valgrind shows the same errors.
And the dummy malloc I added after the download also causes segfault.
dload-backtrace (1.1 KiB)
@Xavier: Interesting, a more simple testcase, the problem seems that are inside fetchIO_read() -> execute it an wait for the crash.
bt.txt (0.9 KiB)
putting a printf before, to show values
printf("%p %p %d\n", (char *)buf + pos, io->buf + io->bufpos, l);
memcpy((char *)buf + pos, io->buf + io->bufpos, l);
will show, in normal cases:
0xbfe68b74 0x8fc9008 512
0xbfe68b74 0x8fc9008 512
0xbfe68b74 0x8fc9008 123
...
0xbfe68b8f 0x8fc9008 485
0xbfe68b74 0x8fc9008 512
0xbfe68b74 0x8fc9008 443
but in some point (aprox ~5126 calls to fetchIO_read)
the second number become incremented every time, and finally boom!:
0xbfe68b74 0x8fc91c3 512
0xbfe68b74 0x8fc93c3 512
0xbfe68b74 0x8fc95c3 512
...
0xbfe68b74 0x8fe9bc3 512
0xbfe68b74 0x8fe9dc3 512
0xbfe68b74 0x8fe9fc3 512
Segmentation fault
I guess, must be managed in another way but i don't have now much time to investigate now. Probably this will help other people.
now this has been fixed upstream, and we have a shiny new libfetch 2.25 in testing, hopefully soon in core!
So if anyone else can confirm the segfault has been fixed, we can finally close this.