FS#70287 - NVIDIA: Failed to load module "glxserver_nvidia" (module does not exist, 0)

Attached to Project: Arch Linux
Opened by lm808 (lm808) - Sunday, 04 April 2021, 17:18 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Thursday, 29 April 2021, 21:55 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Sven-Hendrik Haase (Svenstaro)
Architecture: All
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

This is already (temporarily) resolved, see below, but I think the root cause of the problem still exists, and I am not sure whether it is an NVIDIA or an Arch bug.

When the X.Org Server is initialised and directed by /etc/X11/xorg.conf to use the nvidia driver, it cannot find the module "glxserver_nvidia". In my case, this causes lightdm-gtk3-greeter to freeze after a login attempt, and the desktop never appears.

Error traced to /var/log/Xorg.0.log (attached, see line 91):
(EE) NVIDIA: Failed to load module "glxserver_nvidia" (module does not exist, 0)
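
To pull lines like the one above out of a log without reading it end to end, a small shell function is enough. This is only a sketch: the function name xorg_errors is made up for illustration, and the optional "[  12.345]" timestamp prefix is an assumption about the log layout, since the line quoted above carries no timestamp.

```shell
# xorg_errors [LOGFILE]  (hypothetical helper name)
# Print the error lines of an Xorg log, prefixed with their line numbers.
# LOGFILE defaults to the path quoted in the report.
xorg_errors() {
    log="${1:-/var/log/Xorg.0.log}"
    # Match lines that start with "(EE)", optionally preceded by a
    # "[  12.345] " style timestamp (assumed layout). Anchoring at the start
    # of the line skips header lines that merely mention "(EE)".
    grep -n -E '^(\[[ 0-9.]+\] )?\(EE\)' "$log"
}
```

Running `xorg_errors /var/log/Xorg.0.log` prints each matching line together with its line number, which makes it easy to point at a specific line in an attached log.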

Temporary solution:
Manually edit /etc/X11/xorg.conf such that it contains the following:
---------------------
Section "Files"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection
---------------------
The above paths are present in /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf, but were not originally in /etc/X11/xorg.conf.
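
Whether a given configuration file already carries the NVIDIA module path can be checked with a short shell function. This is a rough sketch: the name has_nvidia_modulepath is hypothetical, and the check only looks for an uncommented ModulePath line mentioning /usr/lib/nvidia/xorg. It does not verify that the line sits inside a "Files" section.

```shell
# has_nvidia_modulepath [CONF]  (hypothetical helper name)
# Succeed if CONF contains an uncommented ModulePath line that mentions
# /usr/lib/nvidia/xorg. CONF defaults to /etc/X11/xorg.conf.
has_nvidia_modulepath() {
    conf="${1:-/etc/X11/xorg.conf}"
    # The line must start with optional whitespace followed by ModulePath,
    # so lines commented out with "#" do not match.
    # -i is used because xorg.conf keywords are case-insensitive
    # (it also makes the path comparison case-insensitive).
    grep -E -i -q '^[[:space:]]*ModulePath[[:space:]]+"[^"]*/usr/lib/nvidia/xorg' "$conf"
}
```

For example, `has_nvidia_modulepath /etc/X11/xorg.conf && echo present || echo missing` reports which case applies.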

Additional info:
* Likely nvidia-460.67-5
* Could be XOrg or the kernel when loading the graphics drivers.

If /etc/X11/xorg.conf is deleted, the nvidia driver cannot be found during boot by the systemd-spawned X.Org server.
This causes the running instance of /usr/bin/X to terminate, which in turn kills any display manager and results in messages like "[FAILED] Failed to start Light Display Manager."
In this case, the DM can be started manually by first logging into a TTY and running 'systemctl start lightdm.service'.
After this everything works as normal, including glxserver, which seems a bit strange. Maybe in this case the paths in /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf were picked up as well?


Steps to reproduce:

Use a /etc/X11/xorg.conf that does not contain the module paths given above.

I am not sure whether this is an isolated event, but other people seem to be experiencing similar, though slightly different, issues:

https://bbs.archlinux.org/viewtopic.php?id=258360
https://bbs.archlinux.org/viewtopic.php?id=258201

Closed by: Sven-Hendrik Haase (Svenstaro)
Thursday, 29 April 2021, 21:55 GMT
Reason for closing: Upstream
Additional comments about closing: Can't reproduce, and it does not appear to be a packaging problem. If it's some kind of race condition, you're better off reporting the problem in the NVIDIA Linux upstream forums.

I'm afraid there's nothing we can do here. Request to re-open if something changes.
Comment by loqs (loqs) - Sunday, 04 April 2021, 18:10 GMT
How was xorg.conf created?

Is logind-check-graphical=true unset in /etc/lightdm/lightdm.conf, and does the system not use early KMS [1]?
If so there is a race between the loading of the nvidia kernel modules and the starting of the X server.
The xorg.conf without the nvidia module path forces loading of the nvidia X driver which would otherwise fail to be loaded before the nvidia kernel module.
This is also the reason the module path is not adjusted to include "/usr/lib/nvidia/xorg", as detection relies on the nvidia kernel modules being loaded.

[1] https://wiki.archlinux.org/index.php/NVIDIA#DRM_kernel_mode_setting
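
Two of the conditions mentioned above can be checked from a shell. The following is a sketch only: both function names are hypothetical, and a module that is loaded at the time of the check says nothing about whether it was already loaded when the X server started, so this cannot prove or rule out the race by itself.

```shell
# logind_check_enabled [CONF]  (hypothetical helper name)
# Succeed if CONF has an uncommented "logind-check-graphical=true" line.
# CONF defaults to /etc/lightdm/lightdm.conf.
logind_check_enabled() {
    conf="${1:-/etc/lightdm/lightdm.conf}"
    # A line that begins with "#" does not match, so a commented-out
    # setting counts as unset.
    grep -E -q '^[[:space:]]*logind-check-graphical[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$conf"
}

# nvidia_module_loaded [MODULES_FILE]  (hypothetical helper name)
# Succeed if the nvidia kernel module appears in the module list.
# MODULES_FILE defaults to /proc/modules, which lists one loaded module
# per line with the module name first.
nvidia_module_loaded() {
    modules="${1:-/proc/modules}"
    grep -q '^nvidia ' "$modules"
}
```

For example, `logind_check_enabled || echo "logind-check-graphical is not set to true"` shows whether the LightDM setting is active.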
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 05 April 2021, 07:05 GMT
Can't reproduce with the information provided. I use straight nvidia and lightdm as well. I have no special sauce in my xorg.conf and never touched that file manually. It's only ever been generated by nvidia-xconfig and nvidia-settings.
Comment by Axel (axeltherabbit) - Monday, 05 April 2021, 12:28 GMT
Thank you, you saved me. I was having a hard time. I can confirm it has nothing to do with lightdm, because I had the issue with gdm, lightdm and cdm.
I'm sure it's the same bug because I had the issue yesterday after updating and you opened the thread yesterday.
Comment by lm808 (lm808) - Monday, 05 April 2021, 13:34 GMT
@loqs @Svenstaro

How was xorg.conf created?
- Using nvidia-settings. One can also use the minimal set-up with /etc/X11/xorg.conf.d/20-nvidia.conf, as suggested in https://wiki.archlinux.org/index.php/NVIDIA#Minimal_configuration. The same problem will occur and can be remedied the same way.

Is logind-check-graphical=true unset in /etc/lightdm/lightdm.conf, and does the system not use early KMS [1]?
- The line in /etc/lightdm/lightdm.conf is commented out. I never modified this before. I am also not aware of any special KMS settings.

If so there is a race between the loading of the nvidia kernel modules and the starting of the X server.
The xorg.conf without the nvidia module path forces loading of the nvidia X driver which would otherwise fail to be loaded before the nvidia kernel module.
- Do you mean 'The xorg.conf WITH the nvidia module path forces loading...'?

This is also the reason the module path is not adjusted to include "/usr/lib/nvidia/xorg", as detection relies on the nvidia kernel modules being loaded.
Comment by lm808 (lm808) - Monday, 05 April 2021, 13:53 GMT
@Svenstaro

Thanks. But clearly other users, such as @Axel, are experiencing the same issue. I am happy to provide further diagnostic information on request.

I attach 3 more Xorg.0.log files:

Xorg.0.log.1 - This is when xorg.conf is deleted.
Xorg.0.log.2 - This is also when xorg.conf is deleted, but after the above process (recorded in Xorg.0.log.1) has failed to launch LightDM. I then logged into TTY2 and manually started LightDM (systemctl start lightdm.service). Note that this was done without any change, the only difference being launching the DM manually. Everything launched and functioned successfully.

The real difference starts in Line 55 (log.1) and 59 (log.2), where in the first instance, the 'nvidia' driver was not matched. Again note that in both cases, there is no xorg.conf to guide the process.

Xorg.0.log.4.fixed - This is when I use a xorg.conf containing additional paths (as in my original post). Everything launched and functioned successfully.
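
The driver-matching difference between the logs can be summarised without reading each file in full. The sketch below assumes the logs contain the X server's "Matched <driver> as autoconfigured driver <n>" lines. The attachments are not reproduced here, so that wording is an assumption, and the function name matched_drivers is hypothetical.

```shell
# matched_drivers LOGFILE...  (hypothetical helper name)
# For each log, print the file name followed by the drivers that Xorg
# reports as autoconfigured matches, in the order they appear.
matched_drivers() {
    for log in "$@"; do
        printf '%s:' "$log"
        # sed keeps only the driver name from each matching line;
        # tr joins the names onto one line.
        sed -n 's/.*Matched \([^ ]*\) as autoconfigured driver.*/ \1/p' "$log" | tr -d '\n'
        printf '\n'
    done
}
```

Called as `matched_drivers Xorg.0.log.1 Xorg.0.log.2`, it prints one line per log, so a log in which 'nvidia' was never matched stands out at a glance.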

Interestingly, I don't think this problem first appeared with my last upgrade to nvidia-460.67-5. That update likely just widened the issue so that it now affects the launch of XOrg. My reasons:
(1) Since nvidia-460.67-3 (or possibly earlier, I am not sure), programmes that potentially use 2D hardware acceleration were already misbehaving. In my case, this manifested as: (a) MATLAB warning me that software OpenGL is being used instead of hardware acceleration, and throwing errors when I force hardware acceleration. I suspect this is a glxserver issue. (b) TeamViewer refreshing only part of the screen when I open multiple sessions to the same machine, i.e. when the load is high.
(2) Both problems were resolved when I applied the fix from my original bug report.

Happy to provide earlier log files if needed, as I have BTRFS snapshots that date back two pacman -Syu updates.

Xorg.0.log.3, which I originally attached to the bug report, is from when I used an xorg.conf freshly generated by nvidia-settings, without the additional paths in the "Files" section.
Comment by David Roth (V1del) - Monday, 05 April 2021, 14:30 GMT
I don't see how this is a bug that Arch devs can do much about. We can't control the fact that nvidia-settings/nvidia-xconfig generate "wrong" configs, nor that the kernel/systemd have a race on SSDs between the kernel reporting the DRM device as ready and the device actually being ready.

This is a "case by case" issue that has to be fixed via the relevant configuration options. (That said, I'd say a more proper mechanism between the kernel and systemd should be found so that they can reliably communicate what to do. From what I know, this can currently affect all drivers, including FOSS ones, and should rather be handled in the relevant upstreams.)
