FS#69702 - [linux][linux-zen] 5.11, dell_wmi_sysman causes unbootable system

Attached to Project: Arch Linux
Opened by Borislav Gerassimov (slimmer) - Friday, 19 February 2021, 08:32 GMT
Last edited by Jan Alexander Steffens (heftig) - Wednesday, 31 March 2021, 08:13 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Jan Alexander Steffens (heftig)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
With both of the kernels named in the title at version 5.11, the system won't boot unless the dell_wmi_sysman module is blacklisted. With the module blacklisted, it boots and works fine. Tested on a Dell Latitude E5570 with the latest updates (including the latest firmware from fwupd).
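
For reference, a minimal sketch of two common ways to keep the module from loading. The helper name write_sysman_blacklist and the file name dell-wmi-sysman.conf are illustrative choices, not something taken from this report; the "blacklist" directive of modprobe.d and the module_blacklist= kernel parameter are standard.

```shell
#!/bin/sh
# Sketch only: write a modprobe.d entry that stops dell_wmi_sysman from being
# auto-loaded. Helper name and file name are hypothetical.
write_sysman_blacklist() {
    conf_dir="${1:-/etc/modprobe.d}"   # the default directory needs root
    printf 'blacklist dell_wmi_sysman\n' > "$conf_dir/dell-wmi-sysman.conf"
}

# Usage (as root):
#   write_sysman_blacklist
#
# If the machine cannot boot far enough to edit files, the module can instead
# be skipped for a single boot by appending this to the kernel command line
# in the boot loader:
#   module_blacklist=dell_wmi_sysman
```

A "blacklist" entry in modprobe.d only prevents automatic loading, so the module can still be loaded by hand with modprobe afterwards. The module_blacklist= kernel parameter blocks every load attempt for that boot.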

Additional info:
Affected kernels:
linux 5.11.arch2-1
linux-zen 5.11.zen2-2

Steps to reproduce:
1. Install linux/linux-zen from testing (5.11)
2. Reboot.
3. Enjoy a general protection fault :)

Closed by  Jan Alexander Steffens (heftig)
Wednesday, 31 March 2021, 08:13 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 5.11.11.arch1-1
Comment by Borislav Gerassimov (slimmer) - Friday, 19 February 2021, 08:37 GMT
The failed boot.
Comment by Borislav Gerassimov (slimmer) - Friday, 19 February 2021, 08:58 GMT
Looking at another bug report, I decided to test with linux-mainline-5.11-1 from the miffe unofficial repository. The system boots fine, BUT there is no dell_wmi_sysman module, which is probably not enabled in the config file.
Unfortunately I've got no time to investigate this further at the moment.
Comment by loqs (loqs) - Friday, 19 February 2021, 20:32 GMT
Comment by Borislav Gerassimov (slimmer) - Saturday, 20 February 2021, 20:00 GMT
@loqs - Wasn't this merged just before 5.11-rc6, according to this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=215164bfb7144c5890dd8021ff06e486939862d4
or am I reading something wrong? :)
Comment by loqs (loqs) - Saturday, 20 February 2021, 20:26 GMT
No, you are right, I should have double-checked. There is [1], but it does not mention a null pointer dereference.
Perhaps your bug has not been reported upstream yet?

[1] https://lore.kernel.org/platform-driver-x86/20210218191723.20030-1-mario.limonciello%40dell.com/
Comment by Borislav Gerassimov (slimmer) - Saturday, 20 February 2021, 20:33 GMT
Maybe. I saw that patch yesterday too. There is also some activity around Dell drivers in the linux-next as far as I saw.
The problem is... I haven't compiled a kernel since... my Slackware times (2000) and this patching stuff is a bit "distant". If I have time on Monday, I will try to build the Arch kernel with this patch and see if it makes any difference.
Comment by Borislav Gerassimov (slimmer) - Monday, 22 February 2021, 12:55 GMT
I've just built linux-next-git 20210222 and tried booting it but the exact same problem persists...

So I created this: https://bugzilla.kernel.org/show_bug.cgi?id=211895
Comment by loqs (loqs) - Monday, 22 February 2021, 20:27 GMT
If you do not hear anything in a few weeks you could try the platform-driver-x86@vger.kernel.org mailing list mentioned in the output below:

perl scripts/get_maintainer.pl drivers/platform/x86/dell-wmi-sysman/sysman.c
Divya Bharathi <divya.bharathi@dell.com> (maintainer:DELL WMI SYSMAN DRIVER)
Mario Limonciello <mario.limonciello@dell.com> (maintainer:DELL WMI SYSMAN DRIVER)
Prasanth Ksr <prasanth.ksr@dell.com> (maintainer:DELL WMI SYSMAN DRIVER)
Hans de Goede <hdegoede@redhat.com> (maintainer:X86 PLATFORM DRIVERS)
Mark Gross <mgross@linux.intel.com> (maintainer:X86 PLATFORM DRIVERS)
platform-driver-x86@vger.kernel.org (open list:DELL WMI SYSMAN DRIVER)
linux-kernel@vger.kernel.org (open list)
Comment by Hans de Goede (hansdegoede) - Saturday, 20 March 2021, 14:40 GMT
I've prepared and posted a set of patches which deal with various problems with error-exit path cleanups and general robustness of the dell-wmi-sysman driver:

"https://lore.kernel.org/platform-driver-x86/20210320143429.76047-1-hdegoede@redhat.com/T/#t"

Note it is not entirely clear to me what is going on here, so I'm not sure if these patches fix things but hopefully they will help.

What would be helpful, independent of testing the patches, is if someone could boot a 5.11 kernel with dell-wmi-sysman blacklisted to avoid the problem.

And then:

1. Switch to a text-console
2. ssh into the machine and run dmesg -w
3. ssh into the machine a second time and run: "sudo modprobe dell_wmi_sysman dyndbg"

And then collect log info from the "dmesg -w" and in case there are log messages on the text-console which did not make it into the ssh dmesg -w output, make a picture of those.

And if you are capable of building your own kernels then testing the patches would be great too of course (save the emails in "raw" format and then "git am" them).
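
The "git am" step could look roughly like the sketch below. The helper name apply_patch_series and the example paths are made up for illustration; only "git am" and its --abort option are standard git.

```shell
#!/bin/sh
# Sketch only: apply saved raw patch emails to a kernel source tree.
# apply_patch_series is a hypothetical helper, not part of git or the kernel.
apply_patch_series() {
    tree="$1"    # path to the kernel git checkout
    shift        # remaining arguments: patch files (absolute paths), in series order
    if ! git -C "$tree" am "$@"; then
        # Leave the tree clean if any patch fails to apply.
        git -C "$tree" am --abort
        return 1
    fi
}

# Usage (paths are examples only):
#   apply_patch_series "$HOME/src/linux" "$HOME/patches"/*.eml
```

Patch files are passed as absolute paths because "git -C" changes directory before the file names are resolved.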
Comment by Hans de Goede (hansdegoede) - Sunday, 21 March 2021, 12:24 GMT
Thanks to testing by a Fedora user I now have confirmation that my patches fix this.

I've posted a v2 of the patches, adding one more robustness fix and dropping one patch which needs more testing:

"https://lore.kernel.org/platform-driver-x86/20210321115901.35072-1-hdegoede@redhat.com/T/#t"

I'll work on getting these merged by Linus and then also on getting them added to the stable kernel series (I'm the drivers/platform/x86 maintainer). In the meantime, it would be best if distros carry the v2 patch series as downstream patches.
Comment by Borislav Gerassimov (slimmer) - Tuesday, 23 March 2021, 13:38 GMT
Just FYI: This is fixed in Linux Next 20210323 - I've just tested it - the laptop boots fine without needing the workaround (it's stuck on reboot, but that's another issue, maybe - have no time to debug now).
Comment by loqs (loqs) - Friday, 26 March 2021, 21:45 GMT
@slimmer can you try applying dell-wmi-sysman.patch, which is the seven-patch series mentioned above with the path changed so it applies to 5.11.Y?
Included as a separate patch is the other commit to the module from linux-next that is not part of the series.

If you do not want to build a patched kernel I can build it for you.
Comment by Borislav Gerassimov (slimmer) - Saturday, 27 March 2021, 08:44 GMT
Sure, but maybe later in the evening, tomorrow, or Monday at worst. If you need results sooner, I can test a kernel (or two - I guess we should try with the first patch, then with both?) built by you this evening :)
Comment by loqs (loqs) - Saturday, 27 March 2021, 12:12 GMT
https://drive.google.com/file/d/1MrVtbOizGECAtnMSniCGojqj4_l24uTC/view?usp=sharing linux-5.11.10.arch1-1.1-x86_64.pkg.tar.zst

This just contains the first patch.
Comment by Borislav Gerassimov (slimmer) - Saturday, 27 March 2021, 18:40 GMT
@loqs I'm writing from an E5570 booted with this kernel (thanks for the build). Everything seems normal, nothing out of the ordinary in journalctl, just (as before):

[ 6.758172] acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
[ 6.762750] acpi PNP0C14:03: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)

Modprobing gives this (I guess expected) result:

modprobe: ERROR: could not insert 'dell_wmi_sysman': No such device

The system works as expected.
Comment by Constantine (Hi-Angel) - Wednesday, 31 March 2021, 06:58 GMT
5.11.11 kernel with all the mentioned patches to dell_wmi_sysman is in the repo now.
Comment by Borislav Gerassimov (slimmer) - Wednesday, 31 March 2021, 07:17 GMT
Yes, I've just tried both kernels above - everything is back to normal. I will request closure. Thanks to all involved!
