Arch Linux

FS#75666 - Massive Kernel issues with 5.19 and Thinkpad X1 Carbon Gen10

Attached to Project: Arch Linux
Opened by Benedict Schlüter (Kakashiy) - Monday, 22 August 2022, 07:35 GMT
Last edited by Toolybird (Toolybird) - Wednesday, 24 August 2022, 06:48 GMT
Task Type Bug Report
Category Kernel
Status Assigned
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
David Runge (dvzrv)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 7
Private No

Details

Description: Since the update to 5.19, I have had many issues with the kernel. I cannot locate the root cause, but something is badly destabilizing the kernel, which in turn makes other parts of the system fail. From what I can tell, it is somehow related to the Intel drivers.

Sometimes the laptop doesn't boot correctly (NULL pointer dereference and only a black screen), sometimes there are only errors in dmesg, and sometimes unrelated system calls fail and Firefox doesn't load web pages.

I have multiple dmesg kernel logs of this behavior.

Additional info:
* 5.19.2-arch1-2

Steps to reproduce:
Boot a recent kernel on a Thinkpad X1 Carbon Gen10 (i7-1260P, 32 GB RAM, 1 TB SSD, FM350-GL).

Comment by mindless_canary (mindless_canary) - Monday, 22 August 2022, 08:11 GMT
I can confirm this on my T480 (Intel Core i5 8250U with Intel UHD Graphics 620, 16GB RAM).
I basically can't use the linux kernel package and had to switch to linux-lts last Monday (15.08.). Sometimes the kernel panics with a NULL pointer dereference while GDM is loading; other times it happens a few minutes after logging in.

```
Aug 15 15:03:31 elanor.local kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Aug 15 15:03:31 elanor.local kernel: #PF: supervisor instruction fetch in kernel mode
Aug 15 15:03:31 elanor.local kernel: #PF: error_code(0x0010) - not-present page
```

I can't provide any more logs, because there simply aren't any apart from the above, and I can't reproduce anything on a TTY. It's absolutely frustrating.

Comment by Benedict Schlüter (Kakashiy) - Monday, 22 August 2022, 08:16 GMT
Yep, I also tried a custom kernel, but it seems linux-firmware hasn't been updated to ship GuC 70.XXX (https://www.phoronix.com/news/GuC-Firmware-ADL-P-Linux-5.19). At least the custom kernel reports GuC 69.XXX. Not sure if I did something wrong or if the firmware package hasn't been upgraded.
Comment by AK (Andreaskem) - Monday, 22 August 2022, 11:15 GMT
GuC 69 is supposed to be supported. The corresponding commit landed in 5.19-rc8:

https://lwn.net/Articles/902348/

Daniele Ceraolo Spurio (1):
drm/i915/guc: support v69 in parallel to v70
Comment by Jozsef Lazar (joelazar) - Monday, 22 August 2022, 14:59 GMT
I'm experiencing similar trouble with my Thinkpad X1 Carbon Gen 9. I got a kernel panic with 5.19.3 today:

```
Aug 22 15:02:15 thinkpad-x1 kernel: usb 3-6: USB disconnect, device number 4
Aug 22 15:02:15 thinkpad-x1 kernel: usb 3-6.1: USB disconnect, device number 6
Aug 22 15:02:15 thinkpad-x1 kernel: usb 3-6.2: USB disconnect, device number 7
Aug 22 15:02:15 thinkpad-x1 kernel: usb 3-6.3: USB disconnect, device number 8
Aug 22 15:02:16 thinkpad-x1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000380
Aug 22 15:02:16 thinkpad-x1 kernel: #PF: supervisor write access in kernel mode
Aug 22 15:02:16 thinkpad-x1 kernel: #PF: error_code(0x0002) - not-present page
Aug 22 15:02:16 thinkpad-x1 kernel: PGD 0 P4D 0
Aug 22 15:02:16 thinkpad-x1 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Aug 22 15:02:16 thinkpad-x1 kernel: CPU: 0 PID: 52605 Comm: kworker/0:0 Tainted: G U OE 5.19.3-zen1-1-zen #1 2e8831c2f52fc0efe7a851816bca15ed55b4761c
Aug 22 15:02:16 thinkpad-x1 kernel: Hardware name: LENOVO 20XWCTO1WW/20XWCTO1WW, BIOS N32ET76W (1.52 ) 04/08/2022
Aug 22 15:02:16 thinkpad-x1 kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Aug 22 15:02:16 thinkpad-x1 kernel: RIP: 0010:queue_work_on+0x19/0x50
Aug 22 15:02:16 thinkpad-x1 kernel: Code: 00 0f 0b e9 d8 fc ff ff 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 <f0> 48 0f ba 2a 00 73 15 31 c9 80 e7 02 74 06 fb 0f 1f 44 00 00 89
Aug 22 15:02:16 thinkpad-x1 kernel: RSP: 0018:ffffb59e8601fe28 EFLAGS: 00010006
Aug 22 15:02:16 thinkpad-x1 kernel: RAX: 0000000000000206 RBX: 0000000000000206 RCX: 000000000b9b2a00
Aug 22 15:02:16 thinkpad-x1 kernel: RDX: 0000000000000380 RSI: ffff945a80051000 RDI: 0000000000000140
Aug 22 15:02:16 thinkpad-x1 kernel: RBP: ffff945ac3c101a8 R08: 0000000000000000 R09: 0000000000000000
Aug 22 15:02:16 thinkpad-x1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff945a81edb810
Aug 22 15:02:16 thinkpad-x1 kernel: R13: ffff945ac3c101e8 R14: 0000000000000000 R15: ffff945c2246b6c0
Aug 22 15:02:16 thinkpad-x1 kernel: FS: 0000000000000000(0000) GS:ffff945dcf600000(0000) knlGS:0000000000000000
Aug 22 15:02:16 thinkpad-x1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 22 15:02:16 thinkpad-x1 kernel: CR2: 0000000000000380 CR3: 000000015e14c001 CR4: 0000000000f70ef0
Aug 22 15:02:16 thinkpad-x1 kernel: PKRU: 55555554
Aug 22 15:02:16 thinkpad-x1 kernel: Call Trace:
Aug 22 15:02:16 thinkpad-x1 kernel: <TASK>
Aug 22 15:02:16 thinkpad-x1 kernel: ucsi_acpi_notify+0xa8/0xb9 [ucsi_acpi aa80e2b9fc348e78bb20c12eb17c1bdafc3ceb45]
Aug 22 15:02:16 thinkpad-x1 kernel: acpi_ev_notify_dispatch+0x4b/0x63
Aug 22 15:02:16 thinkpad-x1 kernel: acpi_os_execute_deferred+0x17/0x30
Aug 22 15:02:16 thinkpad-x1 kernel: process_one_work+0x255/0x410
Aug 22 15:02:16 thinkpad-x1 kernel: worker_thread+0x55/0x4d0
Aug 22 15:02:16 thinkpad-x1 kernel: ? process_one_work+0x410/0x410
Aug 22 15:02:16 thinkpad-x1 kernel: kthread+0x13c/0x160
Aug 22 15:02:16 thinkpad-x1 kernel: ? kthread_complete_and_exit+0x20/0x20
Aug 22 15:02:16 thinkpad-x1 kernel: ret_from_fork+0x1f/0x30
Aug 22 15:02:16 thinkpad-x1 kernel: </TASK>
Aug 22 15:02:16 thinkpad-x1 kernel: Modules linked in: iptable_mangle iptable_raw xt_connmark xt_mark ip6table_mangle ip6table_raw wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel uinput rfco>
Aug 22 15:02:16 thinkpad-x1 kernel: soundwire_intel soundwire_generic_allocation soundwire_cadence vfat snd_sof_intel_hda fat snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core>
Aug 22 15:02:16 thinkpad-x1 kernel: processor_thermal_device_pci_legacy i2c_hid processor_thermal_device cfg80211 processor_thermal_rfim intel_lpss_pci processor_thermal_mbox mac_hid intel_lpss processor_thermal_rapl intel_hid int3400_thermal intel_rapl_common rfkill idma64 in>
Aug 22 15:02:16 thinkpad-x1 kernel: Unloaded tainted modules: pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1 acpi_cpufreq():1>
Aug 22 15:02:16 thinkpad-x1 kernel: acpi_cpufreq():1 pcc_cpufreq():1 pcc_cpufreq():1 acpi_cpufreq():1 fjes():1 acpi_cpufreq():1
Aug 22 15:02:16 thinkpad-x1 kernel: CR2: 0000000000000380
Aug 22 15:02:16 thinkpad-x1 kernel: ---[ end trace 0000000000000000 ]---
Aug 22 15:02:16 thinkpad-x1 kernel: RIP: 0010:queue_work_on+0x19/0x50
Aug 22 15:02:16 thinkpad-x1 kernel: Code: 00 0f 0b e9 d8 fc ff ff 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 53 9c 58 0f 1f 40 00 48 89 c3 fa 0f 1f 44 00 00 <f0> 48 0f ba 2a 00 73 15 31 c9 80 e7 02 74 06 fb 0f 1f 44 00 00 89
Aug 22 15:02:16 thinkpad-x1 kernel: RSP: 0018:ffffb59e8601fe28 EFLAGS: 00010006
Aug 22 15:02:16 thinkpad-x1 kernel: RAX: 0000000000000206 RBX: 0000000000000206 RCX: 000000000b9b2a00
Aug 22 15:02:16 thinkpad-x1 kernel: RDX: 0000000000000380 RSI: ffff945a80051000 RDI: 0000000000000140
Aug 22 15:02:16 thinkpad-x1 kernel: RBP: ffff945ac3c101a8 R08: 0000000000000000 R09: 0000000000000000
Aug 22 15:02:16 thinkpad-x1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff945a81edb810
Aug 22 15:02:16 thinkpad-x1 kernel: R13: ffff945ac3c101e8 R14: 0000000000000000 R15: ffff945c2246b6c0
Aug 22 15:02:16 thinkpad-x1 kernel: FS: 0000000000000000(0000) GS:ffff945dcf600000(0000) knlGS:0000000000000000
Aug 22 15:02:16 thinkpad-x1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 22 15:02:16 thinkpad-x1 kernel: CR2: 0000000000000380 CR3: 000000015e14c001 CR4: 0000000000f70ef0
Aug 22 15:02:16 thinkpad-x1 kernel: PKRU: 55555554
```
Comment by Toolybird (Toolybird) - Wednesday, 24 August 2022, 06:47 GMT
If this is a kernel regression (which it appears to be), you can try to help things along by following [1].

It's also worth:

- trying latest -rc kernel to see if problems are resolved
- performing a git bisection to find the offending commit [2]

then report back here with any findings.

[1] https://wiki.archlinux.org/title/Kernel#Debugging_regressions
[2] https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
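
The mechanics of a bisection as described in [2] can be rehearsed on a toy repository before spending hours on kernel builds. A minimal sketch, assuming bash and git are installed; the repository, its ten commits and the BUG marker file are all hypothetical, and on a real kernel tree every step means building, booting and judging the kernel by hand instead of running a script:

```shell
#!/usr/bin/env bash
# Toy "git bisect run" on a throwaway repository, NOT a kernel tree.
# Hypothetical setup: 10 commits, and commit 7 introduces the "regression"
# (a file named BUG appears and stays in every later commit).
bisect_demo() (
  set -euo pipefail
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git config user.name "bisect-demo"
  git config user.email "bisect-demo@example.invalid"

  for i in $(seq 1 10); do
    echo "$i" > version
    if [ "$i" -eq 7 ]; then touch BUG; fi
    git add -A
    git commit -q -m "commit $i"
  done

  # Same idea as "git bisect bad v5.19" + "git bisect good v5.18":
  # the first argument is the bad commit, the second a known-good one.
  first=$(git rev-list --max-parents=0 HEAD)
  git bisect start HEAD "$first" > /dev/null

  # Test script: exit 0 = good, exit 1 = bad (exit 125 would mean "skip").
  git bisect run sh -c '! test -f BUG' > /dev/null

  # After a successful run, refs/bisect/bad points at the first bad commit.
  git log -1 --format=%s refs/bisect/bad
  git bisect reset > /dev/null 2>&1
  rm -rf "$repo"
)

bisect_demo   # prints: commit 7
```

Because the script's exit status decides good (0) or bad (1 to 127, except 125 for skip), `git bisect run` only fits bugs a script can detect reliably; an intermittent crash like this one usually needs manual `git bisect good` / `git bisect bad` verdicts.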
Comment by none given (hoban) - Thursday, 25 August 2022, 19:15 GMT
I'm seeing this issue on a similar model (Thinkpad X1 Extreme Gen 1):
```
❯ sudo journalctl --since "2022-08-16" | grep "Oops" | tail -3
Aug 22 09:55:24 x1e kernel: Oops: 0010 [#38] PREEMPT SMP PTI
Aug 22 09:55:39 x1e kernel: Oops: 0010 [#39] PREEMPT SMP PTI
Aug 22 09:55:39 x1e kernel: Oops: 0010 [#40] PREEMPT SMP PTI

❯ sudo journalctl --since "2022-08-16" | grep "Oops" | wc -l
1013
```

Both the Oops messages and the system lockups stopped after I downgraded the following packages and ignored their upgrades:
```
warning: linux: ignoring package upgrade (5.18.16.arch1-1 => 5.19.3.arch1-1)
warning: linux-headers: ignoring package upgrade (5.18.16.arch1-1 => 5.19.3.arch1-1)
warning: nvidia: ignoring package upgrade (515.65.01-2 => 515.65.01-8)
```
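
A rollback like this can be sketched as follows. Assumptions: the older packages are still in pacman's default cache /var/cache/pacman/pkg (not yet removed by paccache or `pacman -Sc`) and use the x86_64 `.pkg.tar.zst` file naming. The hypothetical helper below only prints the commands instead of running them. Adding `IgnorePkg = linux linux-headers nvidia` to the `[options]` section of /etc/pacman.conf makes the pin persistent instead of per-invocation.

```shell
#!/usr/bin/env bash
# Print (not run) the pacman commands for rolling back to cached packages
# and skipping their upgrades on the next -Syu.
# Arguments: name=version pairs, versions written as in pacman's warnings.
# Assumes the default cache dir and x86_64 .pkg.tar.zst file names.
rollback_cmds() {
  local cache=/var/cache/pacman/pkg spec
  local files=() names=()
  for spec in "$@"; do
    names+=("${spec%%=*}")
    files+=("$cache/${spec%%=*}-${spec#*=}-x86_64.pkg.tar.zst")
  done
  echo "sudo pacman -U ${files[*]}"
  local IFS=,   # join the ignore list with commas, which --ignore accepts
  echo "sudo pacman -Syu --ignore ${names[*]}"
}

rollback_cmds linux=5.18.16.arch1-1 linux-headers=5.18.16.arch1-1 nvidia=515.65.01-2
```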

I think https://bbs.archlinux.org/viewtopic.php?id=279027 may be related. I'll mention this bug report there as well.
Comment by Benedict Schlüter (Kakashiy) - Friday, 26 August 2022, 11:07 GMT
I'll try to bisect it over the weekend (compiling can take some time, though). 5.18 was fine, so I assume the bug was introduced somewhere in 5.19.
Comment by loqs (loqs) - Friday, 26 August 2022, 20:02 GMT
You can find some prebuilt bisection kernels in https://bbs.archlinux.org/viewtopic.php?id=279027&p=2 that may help.
Edit:
One issue was bisected to commit 87d0e2f41b8cc2018499be4e8003fa8c09b6f2fb, see https://bugzilla.kernel.org/show_bug.cgi?id=216422
Comment by Adam Beavan (ajbeavan) - Monday, 05 September 2022, 19:17 GMT
I have the same problem with constant freezing, but there is nothing in the log files.
Comment by loqs (loqs) - Monday, 05 September 2022, 20:04 GMT
Comment by loqs (loqs) - Thursday, 08 September 2022, 22:27 GMT
Is the issue resolved with linux 5.19.8.arch1-1 now in testing?
Comment by Benedict Schlüter (Kakashiy) - Sunday, 11 September 2022, 07:44 GMT
Nope, tested with 5.19.8.arch1-1 and still the same issue as in the logs above. I tried to bisect it, and apparently it was introduced in the 5.19 merge window. Vanilla mainline still has the same issue. The problem is that my X1 throttles down to 0.8 GHz under heavy load (good job, Lenovo), so every recompile takes a long time.
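
Roughly how many kernels a bisection has to build: each good/bad verdict halves the candidate range, so N candidate commits take about ceil(log2 N) builds. The N used below is a made-up example; on a kernel tree the real number could come from `git rev-list --count v5.18..v5.19`.

```shell
#!/usr/bin/env bash
# Number of good/bad verdicts (i.e. kernel builds) needed to bisect N candidate
# commits in a linear history: ceil(log2(N)). Merges can change this slightly.
bisect_steps() {
  local n=$1 steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))   # worst case: the larger half survives each verdict
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

bisect_steps 16000   # hypothetical N; prints 14
```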
Comment by loqs (loqs) - Sunday, 11 September 2022, 08:51 GMT
If it helps I can build the bisection kernels for you. I would need your current `git bisect log`.
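
`git bisect log` records every verdict so far, and `git bisect replay` rebuilds the same bisection state from that file, which is what lets someone else pick up where the reporter left off. A toy demonstration on a throwaway repository (hypothetical commits, not a kernel tree), assuming bash and git:

```shell
#!/usr/bin/env bash
# Save a bisection's state with "git bisect log" and restore it with
# "git bisect replay" (here in the same toy repo, after a reset).
replay_demo() (
  set -euo pipefail
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git config user.name "bisect-demo"
  git config user.email "bisect-demo@example.invalid"
  for i in 1 2 3 4 5; do
    echo "$i" > version
    git add version
    git commit -q -m "commit $i"
  done

  bad=$(git rev-parse HEAD)       # commit 5
  good=$(git rev-parse HEAD~4)    # commit 1
  git bisect start "$bad" "$good" > /dev/null
  git bisect bad > /dev/null      # one verdict on the checked-out midpoint
  before=$(git rev-parse refs/bisect/bad)
  git bisect log > "$repo.log"    # the file a helper would ask for

  git bisect reset > /dev/null 2>&1
  git bisect replay "$repo.log" > /dev/null 2>&1
  after=$(git rev-parse refs/bisect/bad)

  [ "$before" = "$after" ] && echo "state restored"
  git bisect reset > /dev/null 2>&1
  rm -rf "$repo" "$repo.log"
)

replay_demo   # prints: state restored
```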
Comment by Benedict Schlüter (Kakashiy) - Sunday, 11 September 2022, 20:24 GMT
Thanks, I already managed to get some results, but they aren't very useful: since this bug only appears occasionally, it is hard to tell whether a given commit is good or bad. Nevertheless, I suspect that some of the i915 changes introduced in that merge window caused the bug. Attached is the journalctl log of the 5.19.8 crash.
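
One way to put a number on "hard to tell whether a commit is good": assume, as a simplification, that crashes on a bad kernel arrive at random and independently, on average once every m hours. A bad kernel then survives t hours without crashing with probability exp(-t/m), so keeping the risk of mislabeling a bad kernel as good below a requires about t = m * ln(1/a) hours of normal use per candidate. Both m and a are guesses the tester has to supply; a kernel that could not be run long enough can be marked with `git bisect skip` instead.

```shell
#!/usr/bin/env bash
# Hours to run a candidate kernel before calling it "good", assuming crashes
# on a bad kernel follow a Poisson process with mean interval $1 hours.
# $2 = acceptable probability of mislabeling a bad kernel as good.
# Solves exp(-t/m) <= a for t, i.e. t >= m * ln(1/a).
soak_hours() {
  LC_ALL=C awk -v m="$1" -v a="$2" 'BEGIN { printf "%.1f\n", m * log(1 / a) }'
}

soak_hours 2 0.05   # prints 6.0 (a bad kernel crashing every ~2 h, 5% risk)
```

Multiplied by the roughly log2(N) builds a bisection needs, this is why intermittent bugs make bisecting slow.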
Comment by loqs (loqs) - Sunday, 11 September 2022, 20:28 GMT
I would suggest trying 6.0-rc5 once it is released, to see whether the issue has already been fixed upstream.
Comment by Benedict Schlüter (Kakashiy) - Sunday, 11 September 2022, 20:35 GMT
I tried the current master (b96fbd602d35739b5cdb49baa02048f2c41fdab1), but it was extremely unstable. I got errors in the filesystem driver (or something related) and my /var/log directory got corrupted, so I had to run xfs_repair. Before I could even boot into my WM, I had one or two crashes. So at least for now it is not fixed :(

I wonder whether this is solved on the other Thinkpad models, since my log looks different from theirs.
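
To compare logs across machines, it can help to reduce each oops to the function it faulted in (its RIP line). A minimal sketch, assuming the usual `RIP: 0010:function+0xoff/0xlen` format; oopses whose RIP is a bare address (for example an instruction fetch at NULL) carry no symbol and are skipped. Each oops usually prints its RIP twice (before and after the end trace marker, as in the X1 Carbon Gen 9 log above), so the counts are per line, not per crash.

```shell
#!/usr/bin/env bash
# Read kernel log text on stdin and print "count function" for each faulting
# RIP symbol, most frequent first. Example input: journalctl -k --since 2022-08-16
summarize_rips() {
  sed -n 's/.*RIP: [0-9a-f]*:\([A-Za-z0-9_.]*\)+0x.*/\1/p' |
    awk '{ n[$1]++ } END { for (f in n) print n[f], f }' |
    sort -rn
}

printf '%s\n' \
  'kernel: RIP: 0010:queue_work_on+0x19/0x50' \
  'kernel: RIP: 0010:queue_work_on+0x19/0x50' |
  summarize_rips   # prints: 2 queue_work_on
```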
Comment by Iyan (iyanmv) - Saturday, 17 September 2022, 10:19 GMT
I have a Thinkpad X1 Yoga Gen 6 (exactly the same hardware as the Thinkpad X1 Carbon Gen 9 except for the screen) and I haven't noticed any of the issues described here. Are all of you using the latest BIOS/firmware from Lenovo?
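
The installed BIOS version can be read from sysfs without rebooting; it is the same string the kernel prints in the Hardware name line of an oops (for example `N32ET76W (1.52 )` in the X1 Carbon Gen 9 log above). A minimal sketch, assuming the Lenovo `CODE (x.yz )` format; the helper name is hypothetical. Where a model's firmware is published on LVFS, `fwupdmgr get-updates` may also list newer releases.

```shell
#!/usr/bin/env bash
# Extract the human-readable release from a Lenovo DMI BIOS string such as
# "N32ET76W (1.52 )". With no argument, reads the running machine's value
# from /sys/class/dmi/id/bios_version.
bios_release() {
  local raw=${1:-$(cat /sys/class/dmi/id/bios_version)}
  # Keep only the version inside the parentheses, dropping padding spaces.
  printf '%s\n' "$raw" | sed -n 's/.*(\([0-9.]*\) *).*/\1/p'
}

bios_release "N32ET76W (1.52 )"   # prints 1.52
```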