FS#18334 - [kernel26] suspend to disk breaks the system (process segfault)
Attached to Project:
Arch Linux
Opened by Adrian C. (anrxc) - Monday, 15 February 2010, 03:33 GMT
Last edited by Andrea Scarpino (BaSh) - Thursday, 05 August 2010, 18:11 GMT
Details
Hi,
I am not sure where to file this: kernel26, pm-utils, uswsusp, or some other package. I am tagging it as kernel26 because my last hibernation 7 days ago worked fine, and kernel26 was upgraded in the meantime while uswsusp and pm-utils were not.

I have been using software suspend (uswsusp) for a long time now. At the moment pm-suspend calling s2ram still works, as usual. However, pm-hibernate calling s2disk does not. The machine wakes up properly from hibernation but the system is broken: I cannot start any new processes because they crash, and sometimes even resumed processes segfault; I had an X11 crash, Emacs crashed...

Since I cannot start any new process, it is hard to collect any useful data. The only thing I extracted from the logs are those segfaults; below are some examples:

Additional info:
Feb 15 04:19:04 katana kernel: zsh[6376]: segfault at 0 ip b787bc02 sp bfa380b0 error 6 in ld-2.11.1.so[b7869000+1c000]
Feb 15 04:19:25 katana kernel: sudo[6411]: segfault at 0 ip b7785c02 sp bfba6690 error 6 in ld-2.11.1.so[b7773000+1c000]
Closed by Andrea Scarpino (BaSh)
Thursday, 05 August 2010, 18:11 GMT
Reason for closing: Fixed
Additional comments about closing: kernel26 2.6.34.2-2
ls: symbol lookup error: /lib/libc.so.6: undefined symbol: error_print_progname, version GLIBC_2.0
Every 3 months it breaks all over again, and in the meantime, while it does work, you simply cannot use it because it is like playing the lottery with your work, your data and your hardware. You never know when it will break again. So to your earlier question I say again that I did reboot, because I shut down 99% of the time as I don't dare hibernate. It is 2010, and for two years I have been in a constant battle with hibernation: system freeze on resume, no keyboard on resume, graphics corruption on resume, system broken on resume...
One could say that once it works you can stop upgrading, and that the rolling release model is to blame. But on the other hand you keep upgrading because of other software or drivers that are in a poor state.
But I don't have much hope; I'm almost certain it's not a uswsusp problem. The memory gets corrupted for some reason, and so far KMS looks like the culprit (judging from other bug reports I have found now). Unfortunately Arch now forces KMS on Intel graphics and I am left without options to test this:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=534422
https://bugzilla.redhat.com/show_bug.cgi?id=524905
ps: relocation error: /lib/libnss_files.so.2: symbol fgets_unlocked, version GLIBC_2.1 not defined in file libc.so.6 with link time reference
This corruption is probably caused by KMS. If that is so, I understand you cannot do much unless there is an existing patch available that addresses the exact issue. At least this report is a point of reference for any Intel owner who suffers from the same problem. I have the GM965 chipset and have been using KMS for some time now; this problem started with kernel26 2.6.32.8 for me.
Main upstream bug report is apparently this one: http://bugzilla.kernel.org/show_bug.cgi?id=13811
i thought it was related to me having btrfs as my root partition, but it doesn't seem so. no FS corruption... luckily.
from logs:
Jun 26 18:51:18 extofem-n0 kernel: PM: restore of devices complete after 936.999 msecs
Jun 26 18:51:18 extofem-n0 kernel: PM: Image restored successfully.
Jun 26 18:51:18 extofem-n0 kernel: Restarting tasks ...
Jun 26 18:51:18 extofem-n0 kernel: hald[1474]: segfault at fffffffe ip b755d000 sp bfe42f9c error 6 in libc-2.12.so[b749c000+145000]
Jun 26 18:51:18 extofem-n0 kernel: udevd[1288]: segfault at ffffffea ip b7793000 sp bfa994cc error 6 in libc-2.12.so[b76d2000+145000]
..........
with many other fails after that, including bash/anything that tries to start up. same error in libc-2.12.so for everything.
Jun 30 00:06:55 dimich kernel: fbrun[21342]: segfault at b73d8968 ip b7791410 sp bfe34e74 error 7 in ld-2.12.so[b7788000+1c000]
Jun 30 00:06:57 dimich kernel: fbrun[21343]: segfault at b73c5968 ip b777e410 sp bffaac44 error 7 in ld-2.12.so[b7775000+1c000]
Jul 2 00:58:17 dimich kernel: udevd[2899]: segfault at d10a439 ip b7772c22 sp bfa5690c error 4 in libc-2.12.so[b7706000+145000]
Jul 2 00:58:18 dimich kernel: acpid[3181]: segfault at d153439 ip b77bbc22 sp bfd92144 error 4 in libc-2.12.so[b774f000+145000]
Jul 2 00:58:20 dimich kernel: tilda[3427]: segfault at c99d439 ip b7005c22 sp bfa0923c error 4 in libc-2.12.so[b6f99000+145000]
Jul 3 23:57:42 dimich kernel: less[6921]: segfault at 69f30106 ip b76c2403 sp bff2563c error 6 in libc-2.12.so[b75ef000+145000]
(See detailed log in attachment).
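For anyone comparing logs like the ones above, here is a rough Python sketch that pulls the fields out of a kernel segfault line and computes the offset of the faulting instruction inside the mapped object (ip minus the base address in brackets). The function name `decode` and the regex are my own, not part of any tool mentioned here, and the error bits are read according to the x86 page fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode), so treat it as an assumption that the lines come from an x86 kernel in this exact format.

```python
import re

# Hypothetical helper: matches x86 kernel segfault lines in the format
# seen in this report. The trailing "in object[base+size]" part is optional.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode(line):
    """Return a dict describing one segfault line, or None if it is not one."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)  # assumed to be the x86 page fault error code
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "object": m["obj"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction inside the mapped object.
        info["offset"] = int(m["ip"], 16) - int(m["base"], 16)
    return info


if __name__ == "__main__":
    sample = (
        "Feb 15 04:19:04 katana kernel: zsh[6376]: segfault at 0 ip b787bc02 "
        "sp bfa380b0 error 6 in ld-2.11.1.so[b7869000+1c000]"
    )
    info = decode(sample)
    print(info["comm"], info["object"], hex(info["offset"]),
          info["cause"], info["access"], info["mode"])
```

Run on the two lines from the original report, this gives the same offset (0x12c02) in ld-2.11.1.so for both zsh and sudo, and the three "error 4" lines from Jul 2 likewise share one offset in libc-2.12.so, so unrelated processes are faulting at the same instruction in the same shared file. That would fit the idea of corrupted memory after resume, though it does not prove it.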
Unfortunately the fix only went into 2.6.35-rc4, so hibernation is still not working in Arch with the latest kernel26 2.6.34.1... I can't wait to have power management again; this was a very ugly bug.