FS#73880 : [xorg-server] Crash when starting SteamVR (nvidia)

Attached to Project: Arch Linux
Opened by James Hogan (jhogan) - Saturday, 19 February 2022, 09:50 GMT
Last edited by Andreas Radke (AndyRTR) - Monday, 21 February 2022, 09:16 GMT

Task Type	Bug Report
Category	Packages: Extra
Status	Closed
Assigned To	Andreas Radke (AndyRTR) Sven-Hendrik Haase (Svenstaro) Laurent Carlier (lordheavy)
Architecture	x86_64
Severity	High
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	0
Private	No

Details

Description:

Since an update yesterday, Xorg crashes when starting SteamVR with the nvidia drivers (and an HTC Vive).

package version(s)
* xorg-server 21.1.3-3
* nvidia-dkms 510.54-1
* linux 5.16.10.arch1-1
* linux-hardened 5.15.21.hardened1-3

config and/or log files etc.
* see attached Xorg.0.log.old

recent upgrades (all several days ago, so I'd be surprised if they made a difference, since I think I've restarted multiple times since then):
* [2022-02-09T14:31:24+0000] [ALPM] upgraded xorg-server (21.1.3-1 -> 21.1.3-2) (ages ago)
* [2022-02-15T16:58:22+0000] [ALPM] upgraded linux-hardened (5.15.21.hardened1-1 -> 5.15.21.hardened1-3)
* [2022-02-15T16:58:27+0000] [ALPM] upgraded nvidia-dkms (510.47.03-3 -> 510.54-1)

This one came after the crash started happening and didn't help:
* [2022-02-19T07:45:35+0000] [ALPM] upgraded xorg-server (21.1.3-2 -> 21.1.3-3)
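
For anyone wanting to pull the same kind of list out of their own pacman log, here is a minimal sketch. It assumes log lines in the `[timestamp] [ALPM] upgraded pkg (old -> new)` shape quoted above; the function name `upgrades_of` and the `SAMPLE` list are made up for illustration and are not part of pacman.

```python
import re

# Matches upgrade entries in the shape quoted in this report.
UPGRADE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] upgraded (?P<pkg>\S+) "
    r"\((?P<old>\S+) -> (?P<new>\S+)\)"
)


def upgrades_of(lines, packages):
    """Return (timestamp, package, old, new) for upgrades of the given packages."""
    found = []
    for line in lines:
        m = UPGRADE_RE.match(line.strip())
        if m and m.group("pkg") in packages:
            found.append(
                (m.group("when"), m.group("pkg"), m.group("old"), m.group("new"))
            )
    return found


# Hypothetical input: the four entries listed in this report.
SAMPLE = [
    "[2022-02-09T14:31:24+0000] [ALPM] upgraded xorg-server (21.1.3-1 -> 21.1.3-2)",
    "[2022-02-15T16:58:22+0000] [ALPM] upgraded linux-hardened (5.15.21.hardened1-1 -> 5.15.21.hardened1-3)",
    "[2022-02-15T16:58:27+0000] [ALPM] upgraded nvidia-dkms (510.47.03-3 -> 510.54-1)",
    "[2022-02-19T07:45:35+0000] [ALPM] upgraded xorg-server (21.1.3-2 -> 21.1.3-3)",
]

if __name__ == "__main__":
    for entry in upgrades_of(SAMPLE, {"xorg-server", "nvidia-dkms"}):
        print(*entry)
```

On a real system the lines would presumably come from the pacman log file (by default /var/log/pacman.log on Arch) rather than a hard-coded list.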

Steps to reproduce:
* update system
* start SteamVR with HTC vive plugged in
* xorg crashes back to login screen

Debugging attempts
* I attempted to attach gdb to Xorg from a different TTY, but it seemed to just hang the system (I couldn't ssh in at that point either).
* I tried downgrading linux-hardened to 5.15.21.hardened1-1 (it was upgraded a few days back, so it's a long shot), but it wouldn't boot: nvidia-dkms wouldn't rebuild due to different gcc plugin versions.
* I tried switching from linux-hardened to linux, and it still happened.

Happy to try debugging with guidance

Xorg.0.log.old (36.9 KiB)

Closed by Andreas Radke (AndyRTR)
Monday, 21 February 2022, 09:16 GMT
Reason for closing: Fixed

Comment by James Hogan (jhogan) - Saturday, 19 February 2022, 09:51 GMT

I'll also add that it started happening yesterday, after which it did work for a short time, but today I can't get it to work at all.

Comment by James Hogan (jhogan) - Saturday, 19 February 2022, 10:22 GMT

I found coredumpctl; see the attached log. The gdb backtrace doesn't look particularly helpful, and I'm unsure whether the null pointer is dereferenced from Xorg or nvidia code.

coredump-gdb.txt (11.4 KiB)
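
One way to narrow down a symbol-less backtrace like this is to look for the frame that sits just below a jump to address 0. The sketch below assumes frame lines in the shape seen in this report's backtrace (`#N 0xADDR in ...`); the helper names `parse_frames` and `null_call_site` are invented for illustration, and the sample frames are copied from the trace in this report.

```python
import re

# One backtrace line, e.g. "#8 0x000055ae9d8f09fe in ()".
FRAME_RE = re.compile(r"^#(\d+)\s+0x([0-9a-fA-F]+) in (.*)$")


def parse_frames(text):
    """Return (frame number, address, remainder) for each backtrace line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1)), int(m.group(2), 16), m.group(3)))
    return frames


def null_call_site(frames):
    """Return the frame that jumped to address 0, or None if there is none.

    The pattern looked for is: signal handler frame, then a frame at
    address 0, then the frame holding the return address of that call.
    """
    for i, (_, _, rest) in enumerate(frames):
        if "<signal handler called>" not in rest:
            continue
        below = frames[i + 1:i + 3]
        if len(below) == 2 and below[0][1] == 0:
            return below[1]
    return None


# Frames 5 to 9 from the backtrace in this report.
SAMPLE = """\
#5 0x000055ae9d96cf0a in ()
#6 0x00007fd70cce4560 in <signal handler called> () at /usr/lib/libc.so.6
#7 0x0000000000000000 in ()
#8 0x000055ae9d8f09fe in ()
#9 0x000055ae9d8f3b52 in ()
"""

if __name__ == "__main__":
    frame = null_call_site(parse_frames(SAMPLE))
    if frame:
        print(f"null pointer called from frame #{frame[0]} at {frame[1]:#x}")
```

This only identifies the calling frame's address; deciding which binary that address belongs to still needs the process memory map.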

Comment by James Hogan (jhogan) - Saturday, 19 February 2022, 10:32 GMT

If I cat /proc/$(pidof Xorg)/maps before the crash:

55ae9d812000-55ae9d841000 r--p 00000000 fe:01 22372125 /usr/lib/Xorg
55ae9d841000-55ae9d9f8000 r-xp 0002f000 fe:01 22372125 /usr/lib/Xorg
55ae9d9f8000-55ae9da6b000 r--p 001e6000 fe:01 22372125 /usr/lib/Xorg
55ae9da6b000-55ae9da6f000 r--p 00258000 fe:01 22372125 /usr/lib/Xorg
55ae9da6f000-55ae9da75000 rw-p 0025c000 fe:01 22372125 /usr/lib/Xorg
55ae9da75000-55ae9dab3000 rw-p 00000000 00:00 0
55ae9f8c8000-55aea182f000 rw-p 00000000 00:00 0 [heap]

Thread 1 (Thread 0x7fd70c43e940 (LWP 48005)):
#0 0x00007fd70cd3134c in __pthread_kill_implementation () at /usr/lib/libc.so.6
#1 0x00007fd70cce44b8 in raise () at /usr/lib/libc.so.6
#2 0x00007fd70ccce534 in abort () at /usr/lib/libc.so.6
#3 0x000055ae9d9659f0 in System ()
#4 0x000055ae9d967535 in FatalError ()
#5 0x000055ae9d96cf0a in ()
#6 0x00007fd70cce4560 in <signal handler called> () at /usr/lib/libc.so.6
#7 0x0000000000000000 in ()
#8 0x000055ae9d8f09fe in ()
#9 0x000055ae9d8f3b52 in ()
#10 0x000055ae9d8f2cd6 in ()
#11 0x000055ae9d84f34a in ()
#12 0x00007fd70cccf310 in __libc_start_call_main () at /usr/lib/libc.so.6
#13 0x00007fd70cccf3c1 in __libc_start_main_impl () at /usr/lib/libc.so.6
#14 0x000055ae9d84f795 in _start ()

So it appears that frame 8 (0x000055ae9d8f09fe), which calls the null pointer, is in the 2nd map, i.e. Xorg code:

(gdb) frame 8
#8 0x000055ae9d8f09fe in ?? ()
(gdb) x/16i $rip-0x10
0x55ae9d8f09ee: test %edx,%eax
0x55ae9d8f09f0: add (%rax),%eax
0x55ae9d8f09f2: add %cl,-0x7b(%rax)
0x55ae9d8f09f5: shlb $0x89,0x48(%rax,%rcx,2)
0x55ae9d8f09fa: (bad)
0x55ae9d8f09fb: call *0x78(%rax)
=> 0x55ae9d8f09fe: test %rax,%rax
0x55ae9d8f0a01: je 0x55ae9d8f0a40
0x55ae9d8f0a03: mov 0x1bbd9b(%rip),%esi # 0x55ae9daac7a4
0x55ae9d8f0a09: mov 0x8(%rax),%rdx
0x55ae9d8f0a0d: mov 0x1bbd95(%rip),%ecx # 0x55ae9daac7a8
0x55ae9d8f0a13: test %esi,%esi
0x55ae9d8f0a15: je 0x55ae9d8f0a70
0x55ae9d8f0a17: test %ecx,%ecx
0x55ae9d8f0a19: je 0x55ae9d8f0acd
0x55ae9d8f0a1f: movslq 0x1bbd7a(%rip),%rcx # 0x55ae9daac7a0

I don't know if that helps identify where it's coming from.
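
The map lookup done by hand above can be sketched as a short script. It assumes the usual `/proc/<pid>/maps` line layout (address range, permissions, file offset, device, inode, optional path); `parse_maps`, `locate` and the `MAPS` sample are illustrative names, with the sample lines copied from this report.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps(text):
    """Parse /proc/<pid>/maps style lines into Mapping tuples."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a maps line
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[2], 16), path))
    return mappings


def locate(mappings, addr):
    """Return (mapping, file offset) for addr, or None if it is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m, addr - m.start + m.offset
    return None


# The Xorg-related mappings quoted in this report.
MAPS = """\
55ae9d812000-55ae9d841000 r--p 00000000 fe:01 22372125 /usr/lib/Xorg
55ae9d841000-55ae9d9f8000 r-xp 0002f000 fe:01 22372125 /usr/lib/Xorg
55ae9d9f8000-55ae9da6b000 r--p 001e6000 fe:01 22372125 /usr/lib/Xorg
55ae9da6b000-55ae9da6f000 r--p 00258000 fe:01 22372125 /usr/lib/Xorg
55ae9da6f000-55ae9da75000 rw-p 0025c000 fe:01 22372125 /usr/lib/Xorg
55ae9da75000-55ae9dab3000 rw-p 00000000 00:00 0
55ae9f8c8000-55aea182f000 rw-p 00000000 00:00 0 [heap]
"""

if __name__ == "__main__":
    hit = locate(parse_maps(MAPS), 0x55AE9D8F09FE)  # frame 8
    if hit:
        mapping, file_offset = hit
        print(mapping.path, mapping.perms, hex(file_offset))
```

The resulting offset is relative to the file on disk; turning it into a function name would still need matching debug symbols.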

Comment by Andreas Radke (AndyRTR) - Saturday, 19 February 2022, 11:32 GMT

Please check if this is a duplicate of FS#73875. If it keeps crashing with xorg-server 21.1.3-5, please report it upstream.

Comment by James Hogan (jhogan) - Saturday, 19 February 2022, 12:05 GMT

I built it based on the PKGBUILD (I think it's the one in testing) and got a proper backtrace. I've reported it upstream here:
https://gitlab.freedesktop.org/xorg/xserver/-/issues/1315

I think it's linked to being after a full power cycle (Storm Eunice meant power cuts yesterday and this morning).

Comment by James Hogan (jhogan) - Saturday, 19 February 2022, 12:33 GMT

I can confirm it is still reproducible on 21.1.3-5

Comment by James Hogan (jhogan) - Monday, 21 February 2022, 08:32 GMT

This is fixed in xorg-server 21.1.3-6 (well, SteamVR can't use direct mode, but at least it doesn't crash the X server, and I can power cycle the HTC Vive link box after boot to get it working again).

Thanks
James
