FS#33431 - [linux] 3.7.x - 3.10.x fails to resume from s2ram (Thinkpad T43)

Attached to Project: Arch Linux
Opened by Jonas Heinrich (onny) - Friday, 18 January 2013, 11:43 GMT
Last edited by Tobias Powalowski (tpowa) - Tuesday, 06 August 2013, 06:31 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture i686
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
Since kernel version 3.7.1 from the testing repository, my Thinkpad T43 fails to resume from suspend to RAM. The screen remains blank; only the sleep LED stops blinking and the HDD wakes up.
I compiled the latest stable kernel, 3.7.3, with PM_DEBUG and PM_TRACE enabled (see attached log) and tried to debug it following this documentation: http://www.kernel.org/doc/Documentation/power/s2ram.txt . But in dmesg, "Magic number" appears without further information:

[ 0.997657] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 1.003574] PM: Hibernation image not present or could not be loaded.
[ 1.003591] registered taskstats version 1
[ 1.004037] Magic number: 13:859:428
[ 1.004129] rtc_cmos 00:06: setting system clock to 2013-01-18 11:25:17 UTC (1358508317)
[ 1.004267] Freeing unused kernel memory: 552k freed
[ 1.004586] Write protecting the kernel text: 4080k
[ 1.004613] Write protecting the kernel read-only data: 1248k
[ 1.016965] systemd-udevd[40]: starting version 197
[ 1.071171] ACPI: bus type usb registered

Does anyone know of further debugging steps?

Additional info:
* package version(s)
pm-utils 1.4.1-5

* config and/or log files etc.


Steps to reproduce:
pm-suspend

Closed by  Tobias Powalowski (tpowa)
Tuesday, 06 August 2013, 06:31 GMT
Reason for closing:  Fixed
Additional comments about closing:  3.10.5
Comment by Jonas Heinrich (onny) - Friday, 18 January 2013, 11:50 GMT
cat /proc/acpi/wakeup
   wakeup (0.4 KiB)
Comment by Jan (medhefgo) - Tuesday, 22 January 2013, 11:55 GMT
Did you try running "cat /sys/power/pm_trace_dev_match" later when all modules you use are loaded? The doc says that the hash may not match if the module in question hasn't been loaded yet.
Comment by Jonas Heinrich (onny) - Wednesday, 23 January 2013, 16:39 GMT
@medhefgo: I guess this file only exists if I rebuild my kernel with the debug options enabled. Currently I'm running 3.7.4 (testing), where this bug is still present.
I'll check for that file after I have recompiled the kernel and report back here. Thanks for the hint.
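
The check suggested above can be wrapped so that it also behaves sensibly on a kernel where the file is missing. This is only a minimal sketch for a POSIX shell: the function name show_dev_match and its optional path argument are invented for illustration, and the note about CONFIG_PM_TRACE in the message is an assumption about why the file would be absent.

```shell
# Print the device names that match the hash saved by pm_trace.
# The optional argument overrides the sysfs path (hypothetical, for testing).
show_dev_match() {
    f=${1:-/sys/power/pm_trace_dev_match}
    if [ ! -r "$f" ]; then
        echo "not available: $f (kernel built without CONFIG_PM_TRACE?)"
        return 1
    fi
    # Read the content instead of testing the file size, because sysfs
    # files can report a nonzero size even when a read returns nothing.
    out=$(cat "$f")
    if [ -n "$out" ]; then
        printf '%s\n' "$out"
    else
        echo "no device matched (file is empty)"
    fi
}
```

Running show_dev_match once right after boot and again after all modules are loaded would show whether a late-loaded driver changes the result.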
Comment by Jonas Heinrich (onny) - Saturday, 26 January 2013, 09:17 GMT
Tried it again with 3.7.4 and the following config:

onny@onny> cat config | grep CONFIG_PM
CONFIG_PM_STD_PARTITION=""
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_AUTOSLEEP=y
CONFIG_PM_WAKELOCKS=y
CONFIG_PM_WAKELOCKS_LIMIT=100
CONFIG_PM_WAKELOCKS_GC=y
CONFIG_PM_RUNTIME=y
CONFIG_PM=y
CONFIG_PM_DEBUG=y
CONFIG_PM_ADVANCED_DEBUG=y
# CONFIG_PM_TEST_SUSPEND is not set
CONFIG_PM_SLEEP_DEBUG=y
CONFIG_PM_TRACE=y
CONFIG_PM_TRACE_RTC=y
CONFIG_PMBUS=m
# CONFIG_PM_DEVFREQ is not set

I ran the script mentioned here: http://www.kernel.org/doc/Documentation/power/s2ram.txt . After the reboot (wakeup from suspend failed, of course), dmesg now has these interesting lines:

[ 0.995334] registered taskstats version 1
[ 0.995779] Magic number: 13:938:121
[ 0.995782] hash matches drivers/base/power/main.c:515
[ 0.995874] rtc_cmos 00:06: setting system clock to 2013-01-26 09:07:07 UTC (1359191227)

/sys/power/pm_trace_dev_match is still empty :(
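
For anyone repeating this, the procedure from the linked s2ram.txt can be sketched roughly as follows. The function names trace_suspend and pm_trace_hits are invented here. The sketch assumes a root shell and a kernel with CONFIG_PM_TRACE=y. Because pm_trace stores its hash in the RTC, the system clock will be wrong after the next boot.

```shell
# Arm pm_trace and suspend to RAM. Requires root and CONFIG_PM_TRACE=y.
# Defined only, never called here: calling it suspends the machine.
trace_suspend() {
    sync
    echo 1 > /sys/power/pm_trace
    echo mem > /sys/power/state
}

# After the forced reboot, keep only the lines pm_trace left in the log.
# Reads kernel log text on stdin, for example: dmesg | pm_trace_hits
pm_trace_hits() {
    grep -E 'Magic number:|hash matches'
}
```

A "hash matches" line pointing at a source file, as in the output above, names the last trace point reached before the hang. It does not by itself identify the device, which is what pm_trace_dev_match is meant to add.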
Comment by Jan (medhefgo) - Saturday, 26 January 2013, 12:20 GMT
I guess you'll have to resort to bisecting the kernel to find the culprit commit...
Comment by Jonas Heinrich (onny) - Saturday, 26 January 2013, 15:06 GMT
@medhefgo: Is there a good method to do that? I guess I should install an x32 arch on my root server to compile the kernels there instead of on my old T43 ...
Should I git clone old revisions of the linux PKGBUILD from the testing repository?
Comment by Jonas Heinrich (onny) - Wednesday, 06 February 2013, 21:42 GMT
I've tested several kernels and the regression should be between 3.6.11 and 3.7-rc1. Besides that, kernel 3.8-rc6 also fails to resume from suspend on my notebook. Any way to further bisect the kernel and to find the culprit commit?
Comment by Jan (medhefgo) - Thursday, 07 February 2013, 13:45 GMT
Well, git bisect is your friend here. You could try speeding it up a little by restricting the bisection to drivers/base/power in hope that the culprit is somewhere there. Otherwise a full bisection would be your only choice.
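
A path-restricted bisection along those lines might be started as in the sketch below. The helper start_bisect and its DRY_RUN switch are made up for illustration. It has to be run inside a clone of the mainline kernel tree. The tags in the usage comment are an assumption: v3.6.11 exists only on the stable branch, so v3.6 stands in as the "good" end and should be confirmed good first.

```shell
# Start a bisection limited to one subtree of the kernel source.
# Arguments: bad ref, good ref, path to restrict the search to.
# With DRY_RUN set to a non-empty value the command is printed, not run.
start_bisect() {
    bad=$1
    good=$2
    path=$3
    ${DRY_RUN:+echo} git bisect start "$bad" "$good" -- "$path"
}

# Usage (inside the kernel git tree):
#   start_bisect v3.7-rc1 v3.6 drivers/base/power
# Then build and boot each revision git checks out, test suspend/resume,
# and report the result with "git bisect good" or "git bisect bad".
# "git bisect reset" returns to the original checkout when finished.
```

If the restricted run ends without a plausible culprit, the same command without the path argument performs the full bisection.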
Comment by Tobias Powalowski (tpowa) - Wednesday, 27 February 2013, 11:25 GMT
Status on 3.8?
Comment by Jonas Heinrich (onny) - Thursday, 28 February 2013, 03:43 GMT
Still present, but I was able to narrow down the cause of this bug: http://linux-kernel.2935.n7.nabble.com/Bisected-3-7-rc1-can-t-resume-td602326.html
Comment by Jonas Heinrich (onny) - Friday, 03 May 2013, 11:21 GMT
Bug still present with kernel 3.9 stable. Wrote a more minimal patch (see attachment) and also submitted it to the kernel devs concerning this issue. Hope they can fix this soon :/
Comment by Jonas Heinrich (onny) - Thursday, 04 July 2013, 21:56 GMT
Bug still present in 3.10. Applying the patch solves the issue.
Comment by Jonas Heinrich (onny) - Monday, 08 July 2013, 14:09 GMT
This bug might be related: https://bugs.archlinux.org/task/33516
Maybe the notebooks Thinkpad X60 and Samsung X20 are also affected by this regression.

I was able to check the notebooks Thinkpad R52 and T40 but both work fine using the latest stable kernel.

I attached a file with dmidecode output.
