FS#37720 - [linux] Kernel panic on resume from suspend (3.12 only)

Attached to Project: Arch Linux
Opened by Maël Kerbiriou (Piezoid) - Sunday, 10 November 2013, 15:21 GMT
Last edited by Gerardo Exequiel Pozzi (djgera) - Tuesday, 21 January 2014, 00:28 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:
Kernel panic when resuming from suspend.
The "freezer" and "devices" modes of /sys/power/pm_test work fine, but "platform" causes the kernel panic.

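For reference, a minimal sketch of how one pm_test level can be exercised from a root shell. It assumes a kernel built with CONFIG_PM_DEBUG, so that /sys/power/pm_test exists. The function name run_pm_test and its optional second argument (an alternative directory, useful for dry runs) are illustrative and not part of the report.

```shell
# Run one test suspend cycle at the given pm_test level, then restore normal mode.
# $1: pm_test level (freezer, devices, platform, processors or core)
# $2: power directory, defaults to /sys/power (hypothetical override for dry runs)
run_pm_test() {
    level=$1
    power_dir=${2:-/sys/power}

    # Select how far the suspend sequence should go before it is rolled back.
    echo "$level" > "$power_dir/pm_test" || return 1

    # Start the (test) suspend to RAM.
    echo mem > "$power_dir/state"
    status=$?

    # Switch back to normal suspend behaviour.
    echo none > "$power_dir/pm_test"
    return "$status"
}
```

Running `run_pm_test devices` and then `run_pm_test platform` as root would walk through the levels compared above.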

Additional info:
* package version(s)
- linux 3.12-1 causes the problem.
- linux 3.11.6-1 was working fine.
* config and/or log files etc.
The kernel panic mentions the synaptic and ps2 drivers. I don't know how to capture better debug information (pstore remains empty).


Steps to reproduce:
# systemctl suspend

Closed by  Gerardo Exequiel Pozzi (djgera)
Tuesday, 21 January 2014, 00:28 GMT
Reason for closing:  Fixed
Additional comments about closing:  Tested with 3.12.8-1, problem solved !
Comment by Maël Kerbiriou (Piezoid) - Sunday, 10 November 2013, 16:11 GMT
Kernel panics are quite random...
I managed to get a "platform" suspend without a kernel panic (first boot in the syslog, with resumedelay=30). I got some ACPI errors:
ACPI Error: Method parse/execution failed
ACPI Error: [_T_0] Namespace lookup failure, AE_ALREADY_EXISTS

Then I rebooted to try to reproduce that, but I got a kernel panic, this time partially logged by journald:
BUG: unable to handle kernel paging request at ffffc900175a0014
[...]
Call Trace:
[<ffffffffa05cc525>] ? rtsx_pci_card_detect+0x55/0x1d0 [rtsx_pci]
[...]

User space had enough time to wake up.

The following reboots are less interesting because the kernel errors were not logged.
   syslog.log (514.4 KiB)
Comment by Francis Moreau (fmoreau) - Sunday, 17 November 2013, 13:11 GMT
I have a similar issue when resuming: userspace resumes fine, but soon afterwards I get a black screen. Last time, instead of the black screen, I got a kernel panic, which can be seen here: http://imgur.com/f5uWFTY

I reported this to LKML: https://lkml.org/lkml/2013/11/17/21

Thanks.
Comment by Maël Kerbiriou (Piezoid) - Monday, 18 November 2013, 17:13 GMT
Comment by Francis Moreau (fmoreau) - Monday, 18 November 2013, 21:56 GMT
Not yet. But I'm first going to install a new Arch Linux system on a USB stick in order to do some testing and preserve my current system.

After that I'll try the previous kernel and, if it works, I'll try to run a git bisect session.
Comment by Bastien Traverse (Neitsab) - Thursday, 05 December 2013, 16:14 GMT
I have the exact same problem as @fmoreau and went through the LKML thread (well, its archived version from http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=1817516;page=1;sb=post_latest_reply;so=ASC;mh=25;list=linux). Penultimate message (http://www.gossamer-threads.com/lists/linux/kernel/1825540#1825540) offers a patch that should work around the issue with the rtsx driver (it happens that I also have a Clevo laptop, W540EU in my case).

Question for @fmoreau: what is the temporary solution while waiting for this patch to be integrated into the kernel? Downgrade to 3.11.6, blacklist the rtsx_pci module (it appears I have two entries for a Realtek device: one of unassigned class that uses the rtsx driver/module, and one for the Ethernet controller using r8169), or build a kernel integrating this patch myself (this would be the first time I dig into kernel compiling, so I'm not so eager to follow this solution)?
Thanks
Comment by Francis Moreau (fmoreau) - Thursday, 05 December 2013, 19:55 GMT
Well, the easiest workaround is to simply unload the bogus modules (rtsx_pci_ms and memstick). As I understand it, they're only used for accessing SD cards. After that, suspend/resume cycles should work.

Sadly, the driver's developer doesn't seem to care about this bug. Maybe you could confirm on LKML that you're also hit by it; that might help get things done.
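
A rough sketch of that workaround, together with the blacklisting option raised in the question, might look like the following. The module names come from this thread; the function names, the optional directory argument and the file name blacklist-<module>.conf are illustrative assumptions. Both steps need root on a real system.

```shell
# Unload the card-reader modules for the current boot only.
# rtsx_pci_ms is listed first because it depends on memstick.
unload_card_reader_modules() {
    modprobe -r rtsx_pci_ms memstick
}

# Keep a module from being loaded automatically on later boots.
# $1: module name, e.g. rtsx_pci
# $2: modprobe configuration directory, defaults to /etc/modprobe.d
#     (hypothetical override, handy for trying this out in a scratch directory)
blacklist_module() {
    module=$1
    conf_dir=${2:-/etc/modprobe.d}
    # The file name is arbitrary; modprobe reads every *.conf file in the directory.
    printf 'blacklist %s\n' "$module" > "$conf_dir/blacklist-$module.conf"
}
```

Calling `blacklist_module rtsx_pci` as root writes a one-line file containing `blacklist rtsx_pci`. Deleting that file undoes the change once a fixed kernel is installed.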
Comment by Bastien Traverse (Neitsab) - Thursday, 05 December 2013, 23:03 GMT
Thanks for your quick reply, this is what I did. In my case I simply blacklisted the rtsx_pci module (the only one loaded) and suspend to RAM worked as intended. Now it resumes without a flaw, which is nice.
Joining the LKML thread and adding my use case was my first intention. But given the complexity of the procedures routinely asked for there, my likely inability to provide relevant info (the bisect thing killed me ;-)), and my lack of knowledge of how to present the case, I preferred contacting you here and subscribing to the Arch bug. Although I saw the developer/maintainer's contact info in the output of modinfo rtsx_pci (that's Wei WANG, right?), I didn't know if emailing him was an acceptable way to further this issue...

I'm preparing a mail to the LKML thread, but what info should I include for my report to be useful?
Thanks
Comment by Francis Moreau (fmoreau) - Saturday, 07 December 2013, 20:35 GMT
I think simply saying that you're also affected by this bug is enough. That might give more importance to this bug.

Most of the information needed to debug and fix it has already been dug out, so I don't think you'll be asked to provide more.

Thanks
Comment by Francis Moreau (fmoreau) - Thursday, 16 January 2014, 08:40 GMT
This should be fixed in 3.12.8 by the commit named "mfd: rtsx_pcr: Disable interrupts before cancelling delayed works"
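
To check whether a machine already runs a kernel at or above that version, something along these lines could be used. It is only a sketch: the function name kernel_at_least is made up for illustration, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Succeed if the kernel release in $1 is at least the version in $2.
# $1: a release string such as the output of `uname -r` (e.g. 3.12.8-1-ARCH)
# $2: the minimum version wanted (e.g. 3.12.8)
kernel_at_least() {
    have=${1%%-*}   # drop the package suffix: 3.12.8-1-ARCH -> 3.12.8
    want=$2
    # Under version sort, the lower of the two versions comes first.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

For instance, `kernel_at_least "$(uname -r)" 3.12.8 && echo "fix should be included"`.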
Comment by Maël Kerbiriou (Piezoid) - Friday, 17 January 2014, 13:56 GMT
Nice! Thanks for all the digging, to you and to the kernel developers.
Comment by Iván Perdomo (katratxo) - Friday, 17 January 2014, 14:23 GMT
Thanks for your support on this issue.

Cheers.
Comment by Bastien Traverse (Neitsab) - Friday, 17 January 2014, 14:55 GMT
Another thank you for your involvement and follow-up. Great news.
