FS#77234 - T14 Gen2i laptop occasional blank screen after sleep, and "BUG: kernel NULL pointer dereference"

Attached to Project: Arch Linux
Opened by Attila Vangel (attila123) - Sunday, 22 January 2023, 19:32 GMT
Last edited by Toolybird (Toolybird) - Monday, 03 April 2023, 20:51 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
On my new work laptop, a ThinkPad T14 Gen2 Intel (Intel 11th gen), the screen occasionally does not come back from sleep (it just stays blank/black).
It last happened today, with the 6.1.7-arch1-1 kernel. I had upgraded my system yesterday.
Also, I see "BUG: kernel NULL pointer dereference <...>" kernel messages with this new laptop (I did not have any of that with my old ThinkPad T470 laptop).
I also tried the LTS kernel, and it happened there too (at least with 5.15.88-2-lts).

I have not been using this laptop for long, so these messages start on Jan 10:

$ journalctl -o short-precise -k -b all | grep NULL
Jan 10 13:48:16.545471 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 10 13:48:16.560922 t470 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a69
Jan 12 09:32:00.189451 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 12 09:32:00.197823 t470 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a69
Jan 16 09:13:21.536894 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 16 09:13:21.545711 t470 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a69
Jan 16 18:37:22.576113 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 16 18:37:22.585825 t470 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a69
Jan 18 19:19:05.939491 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 18 21:04:45.603106 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 18 21:04:45.620137 t470 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a69
Jan 21 15:31:01.678447 t470 kernel: BUG: kernel NULL pointer dereference, address: 000000000000010e
Jan 21 15:31:01.681808 t470 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a69
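
A minimal sketch for summarizing messages like the ones above by faulting address, which makes it easier to see whether the same addresses keep recurring. The helper name `count_null_derefs` is hypothetical and not part of this report; it only assumes the message format shown in the log lines.

```shell
# Hypothetical helper: tally "NULL pointer dereference" reports by address.
# Reads kernel log lines on stdin and prints "<count> <address>" per address.
count_null_derefs() {
    grep 'BUG: kernel NULL pointer dereference' \
        | sed 's/.*address: //' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}

# Possible usage, feeding it the same journal query as above:
#   journalctl -o short-precise -k -b all | count_null_derefs
```

The function is kept separate from `journalctl` so it can also be run on a saved log file.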

I am not sure whether the two are related, but another user reported the same messages together with the same resume-from-sleep problem at: https://bbs.archlinux.org/viewtopic.php?pid=2080801

Attaching the full output of `journalctl -o short-precise -k -b -1`.

Steps to reproduce:
The last time, it happened after I opened the laptop lid to wake the laptop from sleep: the screen stayed blank. The laptop was on a charger, in case that matters (not sure).

Closed by  Toolybird (Toolybird)
Monday, 03 April 2023, 20:51 GMT
Reason for closing:  Upstream
Additional comments about closing:  If still happening, you will have to contact upstream. "I don't have patience" is not something we can fix.
Comment by Toolybird (Toolybird) - Sunday, 22 January 2023, 20:57 GMT
Not much Arch can do here. You could try reporting upstream. But it's probably worth trying a mainline kernel first to see if the problem is reproducible. Please see the pinned comment here [1] for a precompiled mainline kernel you can install. Otherwise, the general kernel troubleshooting steps are documented here [2]. Please let us know what you find out.

[1] https://aur.archlinux.org/packages/linux-mainline
[2] https://wiki.archlinux.org/title/Kernel#Troubleshooting
Comment by Attila Vangel (attila123) - Saturday, 28 January 2023, 00:22 GMT
Hi, thanks for the answer and sorry for getting back late.
I managed to install the kernel from the AUR (although it took some time), but even after a reboot VirtualBox did not work with it out of the box, and I had no "mental energy" left at that point to try fixing it.

Instead, I used Fedora 37 all week to see whether it also had this issue (which would suggest a generic kernel issue) or whether it was Arch-only. I had installed Fedora earlier on another partition as "plan B", in case of issues with Arch. I continued to use my laptop in the same way.
I had zero, 0, zilch such "NULL pointer dereference" kernel issues (or lockups after standby) with the exact same laptop under Fedora 37 (with kernel 6.1.6-200.fc37.x86_64; 6.1.7 came out a bit later for Fedora, and I did not want to reboot my laptop during the week so that I could test stability).
Details again:

$ hostnamectl status | tail -6
CPE OS Name: cpe:/o:fedoraproject:fedora:37
Kernel: Linux 6.1.6-200.fc37.x86_64
Architecture: x86-64
Hardware Vendor: Lenovo
Hardware Model: ThinkPad T14 Gen 2i
Firmware Version: N34ET53W (1.53 )

At that time I also checked again for BIOS updates and did not find any. Anyway, my laptop has been stable (on Fedora 37) for this long:

$ uptime
18:20:09 up 4 days, 18:30, 1 user, load average: 0.42, 0.46, 0.54

So it seems to be an Arch Linux-only issue with this laptop.
I also quickly checked the Arch kernel troubleshooting link, but it did not help me.
Comment by Attila Vangel (attila123) - Saturday, 28 January 2023, 01:04 GMT
I had the idea to capture the lsmod output on both distros, so I did. I then sorted each list and kept only the first "column" (the module name) to make them easier to diff. There are several differences between the module lists of the two distros.
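
A minimal sketch of how such name-only module lists could be produced. The exact commands used for this report are not stated, so the helper name `module_names` and the pipeline are assumptions.

```shell
# Hypothetical helper: reduce `lsmod` output (on stdin) to sorted module names.
module_names() {
    # Skip the "Module  Size  Used by" header line, keep column 1, then sort
    # so that lists taken on two systems line up in a diff.
    awk 'NR > 1 { print $1 }' | sort
}

# Possible usage, run once on each distro:
#   lsmod | module_names > lsmod_arch_sorted_modules_only.txt
#   lsmod | module_names > lsmod_fedora_sorted_modules_only.txt
```

Both lists should be sorted under the same locale, because `sort` orders names containing underscores differently between locales, which would show up as spurious diff hunks.
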
Diff follows.

$ diff lsmod_arch_sorted_modules_only.txt lsmod_fedora_sorted_modules_only.txt
5,10d4
< aesni_intel
< af_alg
< algif_hash
< algif_skcipher
< atkbd
< blake2b_generic
13d6
< bpf_preload
19d11
< btrfs
22c14
< ccm
---
> cdc_ether
25d16
< cmac
28,29d18
< crc16
< crc32c_generic
33,36d21
< cryptd
< crypto_simd
< crypto_user
< dm_mod
41,42d25
< ecdh_generic
< ext4
46d28
< gf128mul
47a30
> hid_multitouch
52d34
< i8042
60d41
< intel_gtt
62,63d42
< intel_lpss
< intel_lpss_pci
71,75c50
< ip6table_filter
< ip6_tables
< iptable_filter
< iptable_nat
< ip_tables
---
> ip_set
81d55
< jbd2
87,88d60
< libcrc32c
< libps2
91,92d62
< mac_hid
< mbcache
98a69
> mii
101,102d71
< mousedev
< mtd
108a78,89
> nf_reject_ipv4
> nf_reject_ipv6
> nf_tables
> nft_chain_nat
> nft_compat
> nft_ct
> nft_fib
> nft_fib_inet
> nft_fib_ipv4
> nft_fib_ipv6
> nft_reject
> nft_reject_inet
113a95
> pinctrl_tigerlake
122,123c104,106
< psmouse
< raid6_pq
---
> qrtr
> r8152
> r8153_ecm
127c110,112
< roles
---
> scsi_dh_alua
> scsi_dh_emc
> scsi_dh_rdac
130d114
< serio
132d115
< sg
143a127
> snd_hrtimer
148a133,135
> snd_seq
> snd_seq_device
> snd_seq_dummy
172,174d158
< spi_intel
< spi_intel_pci
< spi_nor
175a160
> sunrpc
178a164
> tls
181a168
> typec_displayport
182a170
> uas
184c172,173
< usbhid
---
> usbnet
> usb_storage
197d185
< vivaldi_fmap
201,204c189
< xhci_pci
< xhci_pci_renesas
< xor
< x_tables
---
> xfs
209c194,195
< xt_tcpudp
---
> xt_REDIRECT
> zram
Comment by Attila Vangel (attila123) - Saturday, 28 January 2023, 01:07 GMT
Also, I just updated my Arch install and it got linux-6.1.8.arch1-1-x86_64 (I will test the laptop with this kernel for now, until a lockup occurs, if any) and linux-firmware-20230117.7e4f0ed-1-any (in case that matters for this issue).
If the issue persists, I will try the mainline kernel.
Comment by Attila Vangel (attila123) - Tuesday, 31 January 2023, 06:39 GMT
6.1.8-arch1-1 also hit a NULL pointer dereference, in the raydium_i2c_ts module (see below), and the screen did not come back.

<...> RIP: 0010:raydium_i2c_irq+0x4c/0x1b0 [raydium_i2c_ts]

don't have to tinker with this now
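
To check which driver each oops points at without reading the whole trace, the bracketed module name on the RIP lines can be tallied. A minimal sketch: `rip_modules` is a hypothetical helper name, and the pattern assumes the `RIP: <segment>:<symbol> [module]` layout shown above. RIP lines without a bracketed module are skipped.

```shell
# Hypothetical helper: count the modules named in brackets on "RIP:" lines.
# Reads kernel log lines on stdin and prints "<count> <module>" per module.
rip_modules() {
    sed -n 's/.*RIP: [^ ]* \[\([^]]*\)].*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }'
}

# Possible usage on the previous boot's kernel log:
#   journalctl -o short-precise -k -b -1 | rip_modules
```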
Comment by loqs (loqs) - Tuesday, 31 January 2023, 18:21 GMT
Please try the linked kernel [1][2] which has [3] applied. Thanks to forum user fancieux for working with upstream.

[1] https://drive.google.com/file/d/1tTJ-aVhrzoaS-vD1dLIV0A5lhGsrn5-v/view?usp=share_link linux-6.1.8.arch1-1.2-x86_64.pkg.tar.zst
[2] https://drive.google.com/file/d/17Bjdqj0uI_DD4Midax9AiZph59WvO_Dh/view?usp=share_link linux-headers-6.1.8.arch1-1.2-x86_64.pkg.tar.zst
[3] https://lore.kernel.org/linux-usb/20230131141518.78215-1-heikki.krogerus%40linux.intel.com/
Comment by Eric (harpium) - Sunday, 05 February 2023, 02:53 GMT
I'm having a similar issue with a 12th Gen Intel i5-12600K PC. Sometimes it doesn't power down when suspending (the power LED stays on) and then refuses to resume. I also see the same logged message around the time when it sleeps. It's currently on kernel 6.1.9-arch1-1, so the patch didn't help, unless it hasn't been applied to the official version.
Comment by loqs (loqs) - Sunday, 05 February 2023, 11:03 GMT
@harpium the patch I referenced is not in 6.1.9-arch1-1.
Comment by Attila Vangel (attila123) - Sunday, 05 February 2023, 22:08 GMT
@loqs thanks for trying to help. The patch you reference is related to https://bugzilla.kernel.org/show_bug.cgi?id=216697, in which the line with 'RIP' contains 'ucsi_resume'.
In my case the line with 'RIP' names raydium_i2c_ts, which is supposed to be the touchscreen driver. Maybe I will try to disable that module under Arch, since I do not really use the laptop screen as a touchscreen.
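
A minimal sketch of one way to keep the module from being autoloaded, using a modprobe.d `blacklist` entry. This workaround was not tested in this report; the helper name `blacklist_module` and the file name `blacklist-<module>.conf` are hypothetical, and the touchscreen would stop working while the entry is in place.

```shell
# Hypothetical helper: write a modprobe.d snippet that blacklists a module.
#   $1 = module name
#   $2 = target directory (assumed default: /etc/modprobe.d)
blacklist_module() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$module" > "$dir/blacklist-$module.conf"
}

# Possible usage (as root), followed by a reboot:
#   blacklist_module raydium_i2c_ts
```

The optional directory argument makes it possible to try the function against a scratch directory before touching /etc/modprobe.d.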
Comment by Eric (harpium) - Tuesday, 07 February 2023, 15:52 GMT
The patch seems to fix it on my system. I saw "RIP: 0010:amdgpu_amdkfd_gpuvm_restore_process_bos+0x9c/0x660 [amdgpu]" in the logs before the patch
Comment by Attila Vangel (attila123) - Saturday, 04 March 2023, 01:37 GMT
Unfortunately, this problem with raydium_i2c still exists with 6.2.1-arch1-1 on my work laptop (ThinkPad T14 Gen2 Intel).
Also, I don't have patience to create an upstream issue or build some custom kernel or whatever, so I just dual-booted back to Fedora (37 for now); at least that works with this laptop.
Comment by Attila Vangel (attila123) - Saturday, 04 March 2023, 01:47 GMT
journalctl -o short-precise -k -b -1
