FS#73031 - [nvidia] several xorg aborts w/ nvidia-495.46-1

Attached to Project: Arch Linux
Opened by Thomas Lübking (luebking) - Wednesday, 15 December 2021, 12:44 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Friday, 25 February 2022, 21:51 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 8
Private No
This task depends upon

Closed by  Sven-Hendrik Haase (Svenstaro)
Friday, 25 February 2022, 21:51 GMT
Reason for closing:  Fixed
Additional comments about closing:  2022-02-25: A task closure has been requested. Reason for request: Fixed in xorg-server 21.1.3-6
Comment by loqs (loqs) - Thursday, 16 December 2021, 04:25 GMT
Have you reported it to Nvidia? The only report I could find was [1].

[1] https://forums.developer.nvidia.com/t/nvidia-driver-495-46/197909
Comment by Thomas Lübking (luebking) - Thursday, 16 December 2021, 08:39 GMT
Nope. At that point there were only two reports, with a third emerging; we don't have usable stack traces, the nvidia driver doesn't even show up in the trace, and we don't know anything about the cause.

We'll need
a) a better backtrace (though this looks like stack corruption, in which case a backtrace is of little use either)
b) to be prepared to revert the distributed version, since this is starting to look like the default behaviour now…
c) some pattern (so far there's one Optimus setup and one dual-NVIDIA setup)
Comment by serge (s3rg3) - Thursday, 16 December 2021, 17:29 GMT
Same Xorg crash on startx as this one:

https://bbs.archlinux.org/viewtopic.php?id=272169

I also use MATE, but I don't think it's related since there was no new MATE package. I have a GTX 1060 in my desktop.
Comment by Sven-Hendrik Haase (Svenstaro) - Friday, 17 December 2021, 18:32 GMT
There's nothing here to suggest a packaging issue so far. I'm inclined to close this. We can't fix or patch anything in the binary blob NVIDIA gives us, and nothing notable changed in the package between the versions.
Comment by Thomas Lübking (luebking) - Friday, 17 December 2021, 20:03 GMT
The rather drastic move would be to skip 495.46 and return to shipping 495.44

Meanwhile there's also https://forums.developer.nvidia.com/t/segfault-with-nvidia-495-46-geforce-rtx-2060-when-attempting-to-run-bitwig-studio/198339, which looks like the same problem.
Comment by Sven-Hendrik Haase (Svenstaro) - Saturday, 18 December 2021, 19:25 GMT
The quick release of .46 by NVIDIA might be because of a security fix and so we can't just blindly ship an older version.
Comment by Barakah AlRashedi (unixv) - Monday, 20 December 2021, 18:51 GMT
I have the same issue with nvidia when I update the kernel.
Currently working: kernel 5.14.5.arch1-1 and nvidia 470.63.01-11.
If I update only the kernel to the new version 5.15.10.arch1-1, the kernel switches to nouveau and Xorg crashes. Note that I never installed the nouveau driver, and it is still not installed.

In another scenario, I kept the kernel at 5.14.5.arch1-1 and updated only nvidia to 495.46-1; again the kernel switched to nouveau and Xorg crashed.

This is really frustrating.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 20 December 2021, 19:12 GMT
nouveau is blacklisted by our package so that should really never happen. Are you sure nouveau gets loaded?
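One way to answer that is to look at /proc/modules, which lists one loaded kernel module per line with the module name first (`lsmod` reads the same file). The sketch below is a minimal illustration, not part of any package; the `module_loaded` helper and the `MODULES_FILE` override are hypothetical names added only so the check can be pointed at a copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel module "$1" is currently loaded.
# /proc/modules has one module per line, starting with "<name> <size> ...".
# MODULES_FILE is an assumed override; it defaults to the real /proc/modules.
module_loaded() {
    grep -q "^$1 " "${MODULES_FILE:-/proc/modules}" 2>/dev/null
}

# Report the state of both candidate GPU drivers.
for mod in nouveau nvidia; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

If this prints "nouveau: loaded" after a failed Xorg start, the blacklist was not applied for that boot.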
Comment by Barakah AlRashedi (unixv) - Monday, 20 December 2021, 19:28 GMT
I'm 100% sure nouveau is loaded when I upgrade the kernel or nvidia, so the issue is related to the kernel modules.

In pacman.conf I have: IgnorePkg = linux nvidia nvidia-utils
Then I fully upgraded all other packages, and everything works fine with the latest xorg-server 21.1.2-1.
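For reference, a minimal sketch of where that setting lives, assuming the default /etc/pacman.conf location. `IgnorePkg` belongs in the `[options]` section and takes a space-separated list of package names; the package list here is the one from the comment above.

```ini
# /etc/pacman.conf (excerpt)
[options]
# Skip upgrades for these packages during a system upgrade;
# remove the line again once fixed versions are available.
IgnorePkg = linux nvidia nvidia-utils
```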

Currently working fine:
local/linux 5.14.5.arch1-1
local/nvidia 470.63.01-11

01:00.0 VGA compatible controller: NVIDIA Corporation GK106M [GeForce GTX 765M] (rev a1)
Subsystem: Dell Device 05ac
Kernel driver in use: nvidia



Comment by Thomas Lübking (luebking) - Monday, 20 December 2021, 20:05 GMT
This is a Kepler chip, and 470xx is the latest driver series that supports it: https://aur.archlinux.org/packages/nvidia-470xx-dkms/ & https://aur.archlinux.org/packages/nvidia-470xx-utils/

Completely unrelated to this bug.
Comment by Rob Pleau (ephos) - Monday, 20 December 2021, 21:01 GMT
I have this issue as well. After my nvidia and linux packages were updated, I could no longer start Xorg without getting a segmentation fault:

"[ 137.962] (EE) Segmentation fault at address 0x0"

I had initially upgraded late last week to the following versions:

nvidia 495.46-2
nvidia-utils 495.46-1
lib32-nvidia-utils 495.46-1
linux 5.15.8.arch1-1

After the upgrade I could no longer start Xorg without the segmentation fault. I was able to fix it by rolling back the packages above to these versions (the ones installed on my system before the upgrade):

nvidia 495.44-15
nvidia-utils 495.44-6
lib32-nvidia-utils 495.44-1
linux 5.15.7.arch1-1

Like others, I have blocked the packages above from updating until more is known about the cause of the segmentation fault.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 20 December 2021, 21:12 GMT
I realize all of this is really annoying for you, but I'm afraid we can do absolutely nothing here, considering that the Nvidia drivers are mostly big binary blobs with a little bit of kernel glue. All of the information you provide here is likely helpful to Nvidia, but I suggest you provide it to Nvidia directly in their forums, as that will give your issues much better visibility. Hopefully the next release will be less broken.
Comment by Thomas Lübking (luebking) - Monday, 20 December 2021, 21:14 GMT
Are you running a (GL) compositor in your setup?
(The backtraces we have won't be all that helpful, but if toggling the compositor flips you between "worksforme" and "metoo", that might be valuable information for upstream.)
Comment by serge (s3rg3) - Tuesday, 21 December 2021, 17:22 GMT
I have filed a bug report on the Nvidia forums:

https://forums.developer.nvidia.com/t/x-server-1-21-1-1-crash-on-startx-with-nvidia-driver-495-46/198417

Come join me: the more users report it, the more seriously it will be treated. If you have additional info, don't hesitate to share it there, because my report is not the best one :)
Comment by Barakah AlRashedi (unixv) - Tuesday, 21 December 2021, 17:51 GMT
Hi @serge,
I have reported in the same thread there. I hope they help us.
Comment by Chris Lea (chrislea) - Tuesday, 21 December 2021, 19:07 GMT
I have also contributed to the bug report filed with Nvidia, thanks @serge and @unixv.

https://forums.developer.nvidia.com/t/x-server-1-21-1-1-crash-on-startx-with-nvidia-driver-495-46/198417/5?u=user99294
Comment by Thomas Lübking (luebking) - Tuesday, 21 December 2021, 20:02 GMT
@unixv, you do not have this problem at all.
Your problem is exclusively that your GPU is not supported by ANY of the 495xx drivers.
You *must* use the 470xx legacy series.
Comment by Barakah AlRashedi (unixv) - Wednesday, 22 December 2021, 03:42 GMT
@luebking, Thanks for your help.
That was not mentioned on the Arch Wiki NVIDIA page until two days ago; today it was updated to say: "For the Kepler (NVE0) series (including GeForce 630-920) from around 2013-2014, install the nvidia-470xx-dkms (AUR) package."
I have installed this package, and it works well now with a fully upgraded system.

My issue is resolved, and I hope the others' will be too.
Comment by serge (s3rg3) - Wednesday, 29 December 2021, 17:38 GMT
Multiple Fedora users are reporting the same bug, and it seems to be related to the MATE desktop.

https://forums.developer.nvidia.com/t/495-46-xorg-sigsegv-in-fedora-35-only-on-msi-mpg-trident3-and-only-for-mate-desktop/199076
Comment by Thomas Lübking (luebking) - Wednesday, 29 December 2021, 21:56 GMT
Comment by Jonathon (jonathon) - Wednesday, 29 December 2021, 22:20 GMT
Probably not related, but as a data point: xpresent was broken in amdgpu for a while (which affected MATE's Marco); it was fixed in xf86-video-amdgpu=21.0.0.

https://bugs.archlinux.org/task/70759
https://gitlab.freedesktop.org/xorg/driver/xf86-video-amdgpu/-/issues/10
Comment by serge (s3rg3) - Thursday, 30 December 2021, 16:51 GMT
I disabled Marco's compositing before the update with:

gsettings set org.mate.Marco.general compositing-manager false

And now the X11 session starts fine :)

This weekend I will try re-enabling Marco's compositing with this patch.


Comment by serge (s3rg3) - Saturday, 01 January 2022, 12:44 GMT
I can confirm this is the same issue as https://gitlab.freedesktop.org/xorg/xserver/-/issues/1275

With a patched xorg-server and Marco's compositing re-enabled, there are no more crashes.
Comment by Nicolas Vila (nicolasv) - Tuesday, 04 January 2022, 09:55 GMT
Please have a look at the solution proposed in my (duplicate) report here:
https://bugs.archlinux.org/task/73239
Sorry for double-posting :-)
Comment by Nicolas Vila (nicolasv) - Tuesday, 04 January 2022, 13:01 GMT
Here are the patched files for building xorg-server-21.1.2 so that the MATE desktop works again (with compositing).
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 07 February 2022, 07:00 GMT
Does this still occur with current drivers?
Comment by Thomas Lübking (luebking) - Monday, 07 February 2022, 07:06 GMT
For all we know this will require an update of xorg-server
Comment by Dale Blount (dale) - Friday, 25 February 2022, 14:49 GMT
I just updated today and it's not happening to me any longer.
