FS#18644 - pacman -Sy sometimes freezes mid-sync in uninterruptible sleep
Attached to Project:
Pacman
Opened by Isaac Dupree (idupree) - Thursday, 11 March 2010, 21:01 GMT
Last edited by Dan McGee (toofishes) - Tuesday, 15 February 2011, 23:05 GMT
Details
Summary and Info:
`sudo pacman -Sy` sometimes freezes mid-update in state 'D' -- "uninterruptible sleep", according to 'top'. It's using 0% CPU and very little RAM. But it can't be killed even by kill -9, and it prevents system suspend-to-ram from succeeding... it seems I have to shut down my computer in order to kill it, but then booting, removing db.lck, and running `sudo pacman -Syy` fixes things.

Steps to Reproduce:
Rarely. Don't know how. Be using my system for a while. I'm using some of my swap space. I upgraded my system a few days ago and haven't shut down since. This time, the only upgrades that look possibly at all related to pacman are:
upgraded openssl (0.9.8m-1 -> 0.9.8m-2)
upgraded readline (6.1.001-1 -> 6.1.002-1)
upgraded shadow (4.1.4.2-1 -> 4.1.4.2-2)
upgraded sudo (1.7.2p4-1 -> 1.7.2p5-1)

Any debugging advice? (Especially while my system with this dead-pacman is still running? This issue happened to me once before, also. Unfortunately GDB isn't installed at the moment.)

I seem to remember that last time, I got a backtrace that looked something like the one in this comment http://bugs.archlinux.org/task/16210#comment49650 , but I can't remember how I did it and I might be remembering wrong.
Closed by Dan McGee (toofishes)
Tuesday, 15 February 2011, 23:05 GMT
Reason for closing: Duplicate
Additional comments about closing: FS#15369
I could always run pacman with --debug from now on, in case I hit the bug... (although --debug seemed to slow pacman down a bit).
I've installed gdb now, though I'm not sure it'll help anything.
Is this process state a legitimate thing under the Linux kernel, or is its existence a kernel bug?
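On the process-state question: 'D' (uninterruptible sleep) is a normal Linux task state, typically seen briefly while a process waits inside the kernel, most often on I/O. It only becomes a problem when the wait never completes. As a minimal sketch, assuming a Linux /proc filesystem, the C helpers below read the same state letter that 'top' shows from /proc/<pid>/stat. The function names are made up for illustration and are not part of pacman or libalpm.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the one-letter state from a /proc/<pid>/stat
 * line of the form "pid (comm) state ...". The command name may itself
 * contain ')' so we search for the LAST closing parenthesis.
 * Returns '\0' if the line does not look like a stat line. */
static char parse_stat_state(const char *line)
{
    const char *p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* Hypothetical helper: read the state letter for a given pid.
 * Returns '\0' if /proc/<pid>/stat cannot be opened or parsed. */
static char read_proc_state(long pid)
{
    char path[64];
    char line[1024];
    char state = '\0';
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%ld/stat", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '\0';
    if (fgets(line, sizeof(line), fp) != NULL)
        state = parse_stat_state(line);
    fclose(fp);
    return state;
}
```

Calling read_proc_state() with the pid of the hung pacman would be expected to return 'D'. On kernels that expose it, /proc/<pid>/wchan may also name the kernel function the process is sleeping in, which could hint at what it is waiting for.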
Could it be if a server is in the middle of updating its package lists, that pacman gets confused by wrong data?
I noticed an analogous situation by watching pacman -Sy running normally, both on its console and through 'top'. I believe the bug happens right after pacman has finished downloading one of the package lists (e.g. 'extra' or 'community'): it pauses and goes into state "D". When there's no bug, that lasts just a moment, but the couple of times the bug hit, that amount of time has been "forever" :-)
I ran pacman -Syy under valgrind (sudo valgrind --track-origins=yes pacman -Syy) and the only issue it found occurred before downloading any of the three package lists (pacman -Sy gives the same valgrind result):
:: Synchronizing package databases...
==18566== Syscall param rt_sigaction(act->sa_flags) points to uninitialised byte(s)
==18566== at 0x507B1CE: __libc_sigaction (in /lib/libc-2.11.1.so)
==18566== by 0x4E37B5A: download_internal (in /usr/lib/libalpm.so.4.0.3)
==18566== by 0x4E38229: _alpm_download_single_file (in /usr/lib/libalpm.so.4.0.3)
==18566== by 0x4E31D77: alpm_db_update (in /usr/lib/libalpm.so.4.0.3)
==18566== by 0x409C86: ??? (in /usr/bin/pacman)
==18566== by 0x406E8B: ??? (in /usr/bin/pacman)
==18566== by 0x5067B6C: (below main) (in /lib/libc-2.11.1.so)
==18566== Address 0x7feffd648 is on thread 1's stack
==18566== Uninitialised value was created by a stack allocation
==18566== at 0x4E377D1: download_internal (in /usr/lib/libalpm.so.4.0.3)
==18566==
core is up to date
extra is up to date
community is up to date
OK, can you think of anything else to try? I suppose I could set a loop going of pacman -Sy all the time, to see if I get a bug, though that seems like it'd be rather abusing the mirror-servers :-/ And it might be worth seeking out a kernel/Unix expert to tell me what that state "D" could mean... If any other of my programs hung in this way, I'd suspect a kernel issue more, but it's only been pacman...
Also, that valgrind uninitialized-bytes issue seems mighty suspicious.
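For reference, this kind of valgrind warning typically appears when a struct sigaction is declared on the stack and only some of its fields are set before the sigaction() call. Whether that is exactly what download_internal does is an assumption here, since the libalpm source is not shown in this report. A minimal C sketch of the usual remedy, with a hypothetical helper name, zeroes the struct and sets sa_flags explicitly:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper (not libalpm code): install `handler` for `signum`
 * using a fully initialised struct sigaction, so that no field, including
 * sa_flags, is passed to the kernel with stack garbage in it.
 * Returns 0 on success, -1 on error (as sigaction() does). */
static int install_handler(int signum, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));   /* clear every field, including padding */
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);     /* block no extra signals in the handler */
    sa.sa_flags = 0;              /* explicit: no SA_RESTART, SA_SIGINFO, ... */

    return sigaction(signum, &sa, NULL);
}
```

For example, install_handler(SIGPIPE, SIG_IGN) would ignore SIGPIPE for the duration of a download without leaving sa_flags undefined.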