FS#77591 - [linux] xrandr crashes kernels newer than 6.0.9-arch1-1 while using ThinkPad Ultra Docking Station

Attached to Project: Arch Linux
Opened by Devon Bautista (synack.d) - Monday, 20 February 2023, 21:46 GMT
Last edited by Toolybird (Toolybird) - Sunday, 26 March 2023, 21:05 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:
Since upgrading from linux-6.0.9-arch1-1 to linux-6.1-arch1-1 or newer, running xrandr to extend the display onto external monitors triggers a kernel oops in the Xorg process whenever the laptop is connected to a ThinkPad Ultra Docking Station. I am using a ThinkPad T495 and cannot test with other hardware.

The crash occurs when running the dock.sh script (attached), which runs xrandr to extend the monitors and sets the i3 workspaces. I've narrowed the crash down to the first xrandr command in the script, for example (from dock.sh):

# Outputs according to xrandr
LEFT_DISPLAY='DisplayPort-4' # Left external monitor
RIGHT_DISPLAY='DisplayPort-2' # Right external monitor
INT_DISPLAY='eDP' # Internal display

/usr/bin/xrandr \
--output "${INT_DISPLAY}" --off \
--output "${RIGHT_DISPLAY}" --mode 1920x1080 --pos 1920x0 --rotate normal \
--output "${LEFT_DISPLAY}" --primary --mode 1920x1080 --pos 0x0 --rotate normal

Running the above xrandr command causes the system to crash. The full journald and kernel logs are attached; the kernel oops output (running linux-6.1.12-arch1-1) is:

Feb 16 19:31:57 monarch kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Feb 16 19:31:57 monarch kernel: #PF: supervisor read access in kernel mode
Feb 16 19:31:57 monarch kernel: #PF: error_code(0x0000) - not-present page
Feb 16 19:31:57 monarch kernel: PGD 0 P4D 0
Feb 16 19:31:57 monarch kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Feb 16 19:31:57 monarch kernel: CPU: 6 PID: 1908 Comm: Xorg Tainted: G OE 6.1.12-arch1-1 #1 0ae38246365c3d8e63089e881d5fe91f13843017
Feb 16 19:31:57 monarch kernel: Hardware name: LENOVO 20NJCT01WW/20NJCT01WW, BIOS R12ET61W(1.31 ) 07/28/2022
Feb 16 19:31:57 monarch kernel: RIP: 0010:drm_dp_atomic_find_time_slots+0x61/0x2a0 [drm_display_helper]
Feb 16 19:31:57 monarch kernel: Code: 00 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d ce 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38 8b 88 90 00 00 00 b8 01 00 00 00 d3 e0 41
Feb 16 19:31:57 monarch kernel: RSP: 0018:ffffa399821ab6d0 EFLAGS: 00010293
Feb 16 19:31:57 monarch kernel: RAX: 0000000000000000 RBX: ffff94408539ae00 RCX: 0000000000000214
Feb 16 19:31:57 monarch kernel: RDX: ffff94408674e800 RSI: ffff9440b52aa540 RDI: ffff94408539ae00
Feb 16 19:31:57 monarch kernel: RBP: ffff9440b7898800 R08: 0000000000000001 R09: ffff94408b2d4050
Feb 16 19:31:57 monarch kernel: R10: ffffa399821ab7a8 R11: 0000000091102cc0 R12: ffff94408539ae00
Feb 16 19:31:57 monarch kernel: R13: ffff944091102cc0 R14: ffff9440b52aa540 R15: 0000000000000214
Feb 16 19:31:57 monarch kernel: FS: 00007f02d82d2400(0000) GS:ffff944330b80000(0000) knlGS:0000000000000000
Feb 16 19:31:57 monarch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 16 19:31:57 monarch kernel: CR2: 0000000000000008 CR3: 000000012084e000 CR4: 00000000003506e0
Feb 16 19:31:57 monarch kernel: Call Trace:
Feb 16 19:31:57 monarch kernel: <TASK>
Feb 16 19:31:57 monarch kernel: compute_mst_dsc_configs_for_link+0x2d4/0x9b0 [amdgpu 926b890c80da99bd774024919ea12ba7bcdeb436]
Feb 16 19:31:57 monarch kernel: compute_mst_dsc_configs_for_state+0x1e1/0x250 [amdgpu 926b890c80da99bd774024919ea12ba7bcdeb436]
Feb 16 19:31:57 monarch kernel: amdgpu_dm_atomic_check+0x1067/0x12e0 [amdgpu 926b890c80da99bd774024919ea12ba7bcdeb436]
Feb 16 19:31:57 monarch kernel: drm_atomic_check_only+0x537/0xba0
Feb 16 19:31:57 monarch kernel: drm_atomic_commit+0x5c/0x100
Feb 16 19:31:57 monarch kernel: ? drm_plane_get_damage_clips.cold+0x1c/0x1c
Feb 16 19:31:57 monarch kernel: drm_atomic_connector_commit_dpms+0xcf/0xf0
Feb 16 19:31:57 monarch kernel: drm_mode_obj_set_property_ioctl+0x197/0x3c0
Feb 16 19:31:57 monarch kernel: ? drm_connector_set_obj_prop+0x90/0x90
Feb 16 19:31:57 monarch kernel: drm_connector_property_set_ioctl+0x3d/0x60
Feb 16 19:31:57 monarch kernel: drm_ioctl_kernel+0xcd/0x170
Feb 16 19:31:57 monarch kernel: drm_ioctl+0x1eb/0x450
Feb 16 19:31:57 monarch kernel: ? drm_connector_set_obj_prop+0x90/0x90
Feb 16 19:31:57 monarch kernel: amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 926b890c80da99bd774024919ea12ba7bcdeb436]
Feb 16 19:31:57 monarch kernel: __x64_sys_ioctl+0x94/0xd0
Feb 16 19:31:57 monarch kernel: do_syscall_64+0x5f/0x90
Feb 16 19:31:57 monarch kernel: ? __pm_runtime_suspend+0x6e/0x100
Feb 16 19:31:57 monarch kernel: ? amdgpu_drm_ioctl+0x71/0x90 [amdgpu 926b890c80da99bd774024919ea12ba7bcdeb436]
Feb 16 19:31:57 monarch kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Feb 16 19:31:57 monarch kernel: ? do_syscall_64+0x6b/0x90
Feb 16 19:31:57 monarch kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Feb 16 19:31:57 monarch kernel: ? do_syscall_64+0x6b/0x90
Feb 16 19:31:57 monarch kernel: ? do_syscall_64+0x6b/0x90
Feb 16 19:31:57 monarch kernel: ? exc_page_fault+0x74/0x170
Feb 16 19:31:57 monarch kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Feb 16 19:31:57 monarch kernel: RIP: 0033:0x7f02d8c6e53f
Feb 16 19:31:57 monarch kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Feb 16 19:31:57 monarch kernel: RSP: 002b:00007ffff49f9190 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 16 19:31:57 monarch kernel: RAX: ffffffffffffffda RBX: 00005606f7701940 RCX: 00007f02d8c6e53f
Feb 16 19:31:57 monarch kernel: RDX: 00007ffff49f9220 RSI: 00000000c01064ab RDI: 000000000000000d
Feb 16 19:31:57 monarch kernel: RBP: 00007ffff49f9220 R08: 0000000000000001 R09: 00005606f7701940
Feb 16 19:31:57 monarch kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00000000c01064ab
Feb 16 19:31:57 monarch kernel: R13: 000000000000000d R14: 00005606f8ef62f0 R15: 0000000000000000
Feb 16 19:31:57 monarch kernel: </TASK>
Feb 16 19:31:57 monarch kernel: Modules linked in: ccm nf_tables libcrc32c nfnetlink btusb btrtl snd_sof_amd_rembrandt btbcm snd_sof_amd_renoir btintel snd_sof_amd_acp snd_usb_audio btmtk uvcvideo snd_sof_pci videobuf2_vmalloc bluetooth snd_usbmidi_lib snd_sof videobuf2_memops snd_rawmidi videobuf2_v4l2 joydev snd_seq_device ecdh_generic snd_ctl_led mousedev iwlmvm snd_sof_utils snd_hda_codec_realtek videobuf2_common snd_hda_codec_generic amdgpu mac80211 snd_hda_codec_hdmi snd_soc_core intel_rapl_msr libarc4 snd_hda_intel intel_rapl_common snd_compress snd_intel_dspcfg ac97_bus edac_mce_amd snd_pcm_dmaengine snd_intel_sdw_acpi snd_pci_ps gpu_sched snd_rpl_pci_acp6x snd_hda_codec drm_buddy snd_acp_pci snd_pci_acp6x drm_ttm_helper kvm_amd snd_pci_acp5x thinkpad_acpi snd_hda_core vfat iwlwifi ttm think_lmi fat r8169 snd_hwdep snd_rn_pci_acp3x snd_pcm firmware_attributes_class ledtrig_audio wmi_bmof drm_display_helper kvm realtek snd_timer platform_profile sp5100_tco cfg80211 snd_acp_config ucsi_acpi
Feb 16 19:31:57 monarch kernel: snd_soc_acpi psmouse k10temp typec_ucsi i2c_piix4 snd_pci_acp3x mdio_devres irqbypass ipmi_devintf snd rfkill libphy rapl soundcore ipmi_msghandler video cec typec roles i2c_scmi wmi acpi_cpufreq mac_hid v4l2loopback(OE) videodev mc dm_multipath i2c_dev crypto_user acpi_call(OE) fuse bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_multitouch usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic serio_raw gf128mul atkbd ghash_clmulni_intel rtsx_pci_sdmmc libps2 sha512_ssse3 mmc_core vivaldi_fmap aesni_intel nvme crypto_simd cryptd nvme_core xhci_pci i8042 ccp xhci_pci_renesas rtsx_pci nvme_common serio
Feb 16 19:31:57 monarch kernel: CR2: 0000000000000008
Feb 16 19:31:57 monarch kernel: ---[ end trace 0000000000000000 ]---
Feb 16 19:31:57 monarch kernel: RIP: 0010:drm_dp_atomic_find_time_slots+0x61/0x2a0 [drm_display_helper]
Feb 16 19:31:57 monarch kernel: Code: 00 00 00 48 8b 85 60 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d ce 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38 8b 88 90 00 00 00 b8 01 00 00 00 d3 e0 41
Feb 16 19:31:57 monarch kernel: RSP: 0018:ffffa399821ab6d0 EFLAGS: 00010293
Feb 16 19:31:57 monarch kernel: RAX: 0000000000000000 RBX: ffff94408539ae00 RCX: 0000000000000214
Feb 16 19:31:57 monarch kernel: RDX: ffff94408674e800 RSI: ffff9440b52aa540 RDI: ffff94408539ae00
Feb 16 19:31:57 monarch kernel: RBP: ffff9440b7898800 R08: 0000000000000001 R09: ffff94408b2d4050
Feb 16 19:31:57 monarch kernel: R10: ffffa399821ab7a8 R11: 0000000091102cc0 R12: ffff94408539ae00
Feb 16 19:31:57 monarch kernel: R13: ffff944091102cc0 R14: ffff9440b52aa540 R15: 0000000000000214
Feb 16 19:31:57 monarch kernel: FS: 00007f02d82d2400(0000) GS:ffff944330b80000(0000) knlGS:0000000000000000
Feb 16 19:31:57 monarch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 16 19:31:57 monarch kernel: CR2: 0000000000000008 CR3: 000000012084e000 CR4: 00000000003506e0
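
For triage, the most useful part of the oops is the kernel `RIP:` line, which names the faulting symbol, the offset into it, and the module. Here RAX is 0 and CR2 is 0x8, which is consistent with a read at offset 8 from a NULL pointer inside drm_dp_atomic_find_time_slots. The following is a minimal sketch for pulling those fields out of a saved log; `parse_rip_line` is a hypothetical helper that is not part of the attached files, and it assumes the line layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): read a log on stdin and print
# "<symbol> <offset> <module>" for the first kernel-mode RIP line.
# The module field is empty for code built into the kernel image.
parse_rip_line() {
    # "RIP: 0010:" is the kernel code segment; the user-space RIP line
    # (segment 0033) is deliberately not matched.
    sed -n -E 's/^.*RIP: 0010:([A-Za-z0-9_.]+)\+(0x[0-9a-f]+)\/0x[0-9a-f]+( \[([A-Za-z0-9_]+)[^]]*\])?.*$/\1 \2 \4/p' \
        | head -n 1
}

# Usage, assuming the attached kernel log is in the current directory:
#   parse_rip_line < xcrash-kernel-6.1.12-arch1-1.log
```

The symbol and offset can then be handed to whatever symbol-resolution tooling is available for the exact kernel build in use.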

I could not find any reports upstream, so I am reporting here first to determine if it needs to be reported upstream.

What Was Tried:
* Downgrading linux to 6.0.9-arch1-1
- This works, and is the current mitigation strategy.
* Downgrading Xorg
- The version of Xorg didn't seem to matter when using a problematic kernel version.
* Upgrading the T495 firmware
- This left the issue unchanged.
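
Besides downgrading, a guard in a script like dock.sh could avoid issuing the xrandr call on a kernel that is expected to crash. The sketch below is only an illustration: `kernel_is_affected` is a hypothetical helper, and treating every release from 6.1 upward as affected is an assumption, since this report only establishes 6.0.9 as working and 6.1 through 6.1.12 as broken. It relies on `sort -V` for version ordering.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given kernel release
# string is 6.1 or newer.
# ASSUMPTION: all releases >= 6.1 are affected; adjust once a fix is known.
kernel_is_affected() {
    release="$1"
    # Keep only the part before the first "-", e.g. "6.1.12-arch1-1" -> "6.1.12".
    version="${release%%-*}"
    first_bad="6.1"
    # sort -V orders version strings; if first_bad sorts first (or both are
    # equal), then version >= first_bad.
    lowest="$(printf '%s\n%s\n' "$first_bad" "$version" | sort -V | head -n 1)"
    [ "$lowest" = "$first_bad" ]
}

# Usage inside a docking script:
#   if kernel_is_affected "$(uname -r)"; then
#       echo "Kernel $(uname -r) may crash on xrandr; skipping." >&2
#       exit 1
#   fi
```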

Additional Info:
* Last working kernel: 6.0.9-arch1-1
* First non-working kernel: 6.1-arch1-1
* Xorg version: 1.21.1.7
* Laptop: ThinkPad T495
* Dock: ThinkPad Ultra Docking Station (Type 40AJ)
- Link: https://pcsupport.lenovo.com/us/en/products/accessory/docks/thinkpad-ultra-docking-station/40aj
* Log files:
- xcrash-6.1.12-arch1-1.log: full journald log
- xcrash-kernel-6.1.12-arch1-1.log: just the kernel logs from journald
- dock.sh: the docking script containing the xrandr command that triggers the issue
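
A kernel-only log like the second attachment can be derived from a full journal dump roughly as follows. This is a sketch under the assumption that the dump uses journald's default short format (`<timestamp> <host> kernel: <message>`, as in the excerpt above); `extract_kernel_lines` is a hypothetical helper and not necessarily how the attached file was produced. On a live system with a persistent journal, `journalctl -k -b -1` prints the kernel messages of the previous boot directly.

```shell
#!/bin/sh
# Hypothetical helper: print only the kernel lines from a journald
# short-format log file given as the first argument.
extract_kernel_lines() {
    # Anchor on "<Mon> <DD> <HH:MM:SS> <host> kernel: " so that user-space
    # lines which merely mention "kernel:" in their message are skipped.
    grep -E '^[A-Z][a-z]{2} [ 0-9][0-9] [0-9:]{8} [^ ]+ kernel: ' "$1"
}

# Usage:
#   extract_kernel_lines xcrash-6.1.12-arch1-1.log > kernel-only.log
```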

Steps to Reproduce:
1. Run Arch with linux-6.1-arch1-1 or later on a ThinkPad able to connect to a ThinkPad Ultra Dock (e.g. a ThinkPad T495).
2. Start an X server.
3. Connect the ThinkPad to the Ultra Dock. Ensure at least one monitor is connected to a DisplayPort jack on the dock.
4. Use xrandr to extend the laptop display (see /usr/bin/xrandr command above for an example).
5. System will freeze, necessitating a force power-off.

I'm unsure whether this issue is reproducible outside of using the specified ThinkPad hardware.

Closed by Toolybird (Toolybird)
Sunday, 26 March 2023, 21:05 GMT
Reason for closing: Upstream
Additional comments about closing: There appears to be progress in the upstream ticket
Comment by Toolybird (Toolybird) - Tuesday, 21 February 2023, 04:35 GMT
There have been a few reports lately involving docking stations, e.g. FS#76620 and FS#76934. Either way, this is a kernel regression, which means this [1] applies. But first, please try the latest kernel in [testing] (6.2.arch1-1). This comment [2] in the upstream bugzilla looks similar to yours. amdgpu bugs can be reported at [3]. Please let us know what you find out.

[1] https://wiki.archlinux.org/title/Kernel#Debugging_regressions
[2] https://bugzilla.kernel.org/show_bug.cgi?id=204181#c69
[3] https://gitlab.freedesktop.org/drm/amd
Comment by Kevin (doesnotcompete) - Thursday, 23 February 2023, 13:45 GMT
Hello,

I'm still encountering this issue with the current 6.2 kernel as well. I was also the person posting that comment on the upstream bugzilla :D
I usually have a ThinkPad T495 (Ryzen 3500U with Vega 8 graphics) connected to a Gen2 ThinkPad USB-C Dock with two monitors. My system reliably freezes when switching into my window manager (Sway on Wayland) after logging in.
Bisecting the issue showed `4d07b0bc403403438d9cf88450506240c5faf92f` [1] to be the first bad commit. There were apparently earlier issues with that commit, which should be fixed by now. [2] For me, however, these fixes only delay the freeze: it now happens when the window manager starts rather than during early KMS in the initramfs.
My workaround so far has been sticking to the 5.15 kernel. With `linux-lts` switching to the 6.1 branch, the issue has become more urgent for me as well though.
I've reported what I found out on the freedesktop.org Gitlab. [3]

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2&id=4d07b0bc403403438d9cf88450506240c5faf92f
[2] https://gitlab.freedesktop.org/drm/amd/-/issues/2171
[3] https://gitlab.freedesktop.org/drm/amd/-/issues/2314#note_1788936
Comment by Toolybird (Toolybird) - Monday, 27 February 2023, 05:54 GMT
It definitely seems like an upstream issue, so I'd say you're on the right path in those upstream tickets.
