FS#54980 - [nvidia] Driver crashes on Kernel 4.12

Attached to Project: Arch Linux
Opened by Konstantin Gizdov (kgizdov) - Sunday, 30 July 2017, 19:35 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Wednesday, 13 September 2017, 16:29 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro), Felix Yan (felixonmars)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Since kernel 4.12 landed in core, the NVIDIA driver crashes (log attached) and I cannot boot into the GUI.


Additional info:
* package version(s) - 384.59-1, 384.59-2
* config and/or log files etc.
nvidia.log - from dmesg call trace

Steps to reproduce:

1. Upgrade to kernel 4.12
2. Reboot
3. NVIDIA driver crashes

Closed by Sven-Hendrik Haase (Svenstaro)
Wednesday, 13 September 2017, 16:29 GMT
Reason for closing: Upstream
Comment by Felix Yan (felixonmars) - Monday, 31 July 2017, 16:08 GMT
There's no issue here. Did you have nvidia-drm.modeset enabled? If so, please try again with it disabled.
Comment by Andrew Barbarello (drewbarbs) - Monday, 31 July 2017, 23:16 GMT
After the kernel 4.12 update, the GUI also failed to start for me with nvidia 384.59-1. I had been using nvidia-drm.modeset=1 since at least May without issue. Setting it to 0 lets X start fine but makes GNOME on Wayland unavailable, so this is a regression.
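For anyone comparing setups, here is a minimal Python sketch that reports whether nvidia-drm.modeset appears on the kernel command line. It only looks at /proc/cmdline, so it will not see a value set through a modprobe configuration file. The helper name `modeset_from_cmdline` is made up for illustration, and keeping the last occurrence is an assumption about precedence.

```python
from typing import Optional


def modeset_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the nvidia-drm.modeset value found on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        # The kernel treats '-' and '_' as equivalent in parameter names.
        if sep and key.replace("_", "-") == "nvidia-drm.modeset":
            value = val  # assumption: a later occurrence overrides an earlier one
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("nvidia-drm.modeset =", modeset_from_cmdline(f.read()))
    except OSError as exc:
        print(f"could not read /proc/cmdline: {exc}")
```

If the module is loaded, /sys/module/nvidia_drm/parameters/modeset should show the value actually in effect, which is the more reliable thing to quote in a report.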
Comment by Konstantin Gizdov (kgizdov) - Thursday, 03 August 2017, 12:39 GMT
OK, so I do have it enabled, but I have just left for holiday and can't risk a failed boot on a remote machine by testing. I will update as soon as I am back.
Comment by Sven-Hendrik Haase (Svenstaro) - Tuesday, 08 August 2017, 13:19 GMT
I don't see anything that I should be doing from a packaging perspective here as I certainly wouldn't want to change something here for all users. For instance, I run the current driver fine on 5 systems without any manual changes.
Comment by patrick (potomac) - Wednesday, 09 August 2017, 00:06 GMT
The 384.59 nvidia driver seems to crash even on older kernels (3.xx).

The crash is random: according to users on the NVIDIA forum, it occurs during GPU computing with CUDA and also during VDPAU decoding of MPEG-2 files:

https://devtalk.nvidia.com/default/topic/1020399/linux/vdpau-fails-decoding-mpeg2-on-gtx-660-and-384-59/
https://devtalk.nvidia.com/default/topic/1021581/linux/kernel-panics-on-centos7-geforce-gtx-1080-ti-with-nvidia-driver-384-59-/
https://devtalk.nvidia.com/default/topic/1021605/linux/nvidia-driver-384-59-triggers-a-kernel-crash-when-we-use-kaldi-software/

It's probably better to downgrade to the previous version of the nvidia driver; 384.59 does not seem stable.
Comment by Konstantin Gizdov (kgizdov) - Wednesday, 16 August 2017, 08:49 GMT
I'm back. I am able to boot if I disable nvidia-drm.modeset. Like Andrew, I have had this enabled for a long time, and I completely agree that if that option causes the kernel to crash, it's a regression. However, I am not sure how to connect the dots here. What patrick says leads me to believe it's a binary driver issue, but I had no problems prior to kernel 4.12, which suggests a kernel API change or something similar. Can I help by providing some debug info/logs?
Comment by Darek (blablo) - Wednesday, 16 August 2017, 08:54 GMT
Comment by Darek (blablo) - Wednesday, 16 August 2017, 13:33 GMT
Comment by Omar Pakker (Omar007) - Thursday, 24 August 2017, 12:28 GMT
Part of the problem (for me) seems to be that the required devices under /dev aren't created with modeset=1 [1]. Using a udev rule that adds these devices [2] solves that part of the problem for me.
This does not resolve the problem completely, though: instead of crashing instantly when trying to run Mutter or Weston and staying at the (now frozen) tty, it now blanks the screen first (but crashes after that anyway).

[1] https://github.com/negativo17/nvidia-driver/issues/27
[2] https://pkgs.rpmfusion.org/cgit/nonfree/xorg-x11-drv-nvidia.git/tree/60-nvidia.rules
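To check whether the device nodes are present on a given boot, something like the following Python sketch may help. The node names in `EXPECTED_NODES` are an assumption (the exact set depends on the driver version and hardware), and `missing_nodes` is a hypothetical helper, not part of any driver tooling.

```python
import os
from typing import Iterable, List

# Assumed node names; adjust to what your driver and hardware actually use.
EXPECTED_NODES = ("nvidiactl", "nvidia0", "nvidia-modeset")


def missing_nodes(dev_dir: str = "/dev",
                  names: Iterable[str] = EXPECTED_NODES) -> List[str]:
    """Return the names from `names` that do not exist under `dev_dir`."""
    return [n for n in names if not os.path.exists(os.path.join(dev_dir, n))]


if __name__ == "__main__":
    absent = missing_nodes()
    if absent:
        print("missing under /dev:", ", ".join(absent))
    else:
        print("all expected nodes are present")
```

Running it once with modeset=1 and once with modeset=0 would show whether the set of nodes differs between the two configurations.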
Comment by Darek (blablo) - Wednesday, 30 August 2017, 10:05 GMT
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 10 September 2017, 13:01 GMT
I'm not sure there is merit in tracking this issue downstream. The udev stuff seems to be the only thing I can do, but this ultimately looks like an upstream problem, and I don't think the driver is currently being packaged incorrectly. I'm going to leave this open a little longer to give you guys time to weigh in, but generally speaking, I don't have a lot of options for fixing this downstream for everyone.
Comment by Andrew Barbarello (drewbarbs) - Sunday, 10 September 2017, 20:11 GMT
I agree, thanks for the help!
