FS#70287 - NVIDIA: Failed to load module "glxserver_nvidia" (module does not exist, 0)
Attached to Project:
Arch Linux
Opened by lm808 (lm808) - Sunday, 04 April 2021, 17:18 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Thursday, 29 April 2021, 21:55 GMT
Details
Description:
This is already (temporarily) resolved, see below, but I think the root cause of the problem still exists, and I am not sure whether it is an NVIDIA or an Arch bug.

When the X.Org Server is initialised and directed by /etc/X11/xorg.conf to look for an nvidia driver, it cannot seem to find the module "glxserver_nvidia". In my case, this causes lightdm-gtk3-greeter to freeze after attempting to log in, and the desktop does not show afterwards. I traced the error to /var/log/Xorg.0.log (attached, see line 91):

(EE) NVIDIA: Failed to load module "glxserver_nvidia" (module does not exist, 0)

Temporary solution:
Manually edit /etc/X11/xorg.conf so that it contains the following:
---------------------
Section "Files"
    ModulePath "/usr/lib/nvidia/xorg"
    ModulePath "/usr/lib/xorg/modules"
EndSection
---------------------
The above paths are present in /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf, but not originally in /etc/X11/xorg.conf.

Additional info:
* Likely nvidia-460.67-5
* Could be Xorg or the kernel when loading the graphics drivers.

If /etc/X11/xorg.conf is deleted, the nvidia driver cannot be found during boot-up by the systemctl-spawned Xorg server. This causes the running instance of /usr/bin/X to terminate, subsequently killing any display manager and resulting in messages like "[FAILED] Failed to start Light Display Manager." In this case, the DM can be started manually by first logging into a TTY and running 'systemctl start lightdm.service'. After this everything works as normal, including glxserver, which seems a bit strange. Maybe in this case the paths in /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf were also picked up?

Steps to reproduce:
Use an /etc/X11/xorg.conf that does not contain the module paths given above.

Not sure if it is an isolated event, but people seem to experience similar, but slightly different, issues:
https://bbs.archlinux.org/viewtopic.php?id=258360
https://bbs.archlinux.org/viewtopic.php?id=258201
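For anyone wanting to check their own log for the same failure, here is a minimal Python sketch that scans Xorg log text for "(EE) ... Failed to load module" lines like the one quoted in this report. The helper name find_failed_modules is made up for illustration, and the pattern is only modelled on the single error line shown here, so other log layouts may need adjusting.

```python
import re

# Modelled on the error line quoted in this report; the module name is
# captured so other missing modules are reported too.
FAILED_MODULE = re.compile(r'\(EE\).*Failed to load module "([^"]+)"')


def find_failed_modules(log_text):
    """Return (line_number, module_name) for each module-load failure.

    find_failed_modules is a hypothetical helper, not part of any Xorg tooling.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = FAILED_MODULE.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits
```

It could be called as find_failed_modules(open("/var/log/Xorg.0.log").read()) on an affected machine; an empty list means no such error was logged.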
Closed by Sven-Hendrik Haase (Svenstaro)
Thursday, 29 April 2021, 21:55 GMT
Reason for closing: Upstream
Additional comments about closing: Can't reproduce and it does not appear to be a packaging problem. If it's some kind of race condition, you're better off communicating the problem in the nvidia linux upstream forums.
I'm afraid there's nothing we can do here. Request to re-open if something changes.
Is logind-check-graphical=true unset in /etc/lightdm/lightdm.conf, and does the system not use early KMS [1]?
If so, there is a race between the loading of the nvidia kernel modules and the starting of the X server.
The xorg.conf without the nvidia module path forces loading of the nvidia X driver which would otherwise fail to be loaded before the nvidia kernel module.
This is also the reason the module path is not adjusted to include /usr/lib/nvidia/xorg, as detection relies on the nvidia kernel modules being loaded.
[1] https://wiki.archlinux.org/index.php/NVIDIA#DRM_kernel_mode_setting
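To answer the first question without reading the file by eye, a small Python sketch along these lines could report whether logind-check-graphical is actually set. It assumes lightdm.conf uses plain key=value lines with '#' comments; the function name logind_check_graphical is hypothetical, and a commented-out entry is treated as unset.

```python
def logind_check_graphical(conf_text):
    """Return the active value of logind-check-graphical, or None if unset.

    Hypothetical helper. Lines starting with '#' are treated as comments,
    so a commented-out entry counts as unset. If the key appears more than
    once, the last active occurrence wins (an assumption of this sketch).
    """
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, rest = line.partition("=")
        if sep and key.strip() == "logind-check-graphical":
            value = rest.strip()
    return value
```

Passing it the contents of /etc/lightdm/lightdm.conf returns None when the option is absent or commented out, which is the situation the race described above applies to.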
I'm sure it's the same bug because I had the issue yesterday after updating and you opened the thread yesterday.
How was xorg.conf created?
- Using nvidia-settings. One can also use the minimal set-up with /etc/X11/xorg.conf.d/20-nvidia.conf, as suggested in https://wiki.archlinux.org/index.php/NVIDIA#Minimal_configuration. The same problem will occur and can be remedied the same way.
Is logind-check-graphical=true unset in /etc/lightdm/lightdm.conf, and does the system not use early KMS [1]?
- The line in /etc/lightdm/lightdm.conf is commented out. I never modified this before. I am also not aware of any special KMS settings.
If so, there is a race between the loading of the nvidia kernel modules and the starting of the X server.
The xorg.conf without the nvidia module path forces loading of the nvidia X driver which would otherwise fail to be loaded before the nvidia kernel module.
- Do you mean 'The xorg.conf WITH the nvidia module path forces loading...'?
This is also the reason the module path is not adjusted to include /usr/lib/nvidia/xorg, as detection relies on the nvidia kernel modules being loaded.
Thanks. But clearly other users are experiencing the same issues, such as @Axel. I am happy to provide further diagnosis information as you request.
I attach 3 more Xorg.0.log files:
Xorg.0.log.1 - This is when xorg.conf is deleted.
Xorg.0.log.2 - This is also when xorg.conf is deleted, but after the above process (recorded in Xorg.0.log.1) has failed to launch LightDM. I then logged into TTY2 and manually started LightDM (systemctl start lightdm.service). Note that this was done without any change, the only difference being launching the DM manually. Everything launched and functioned successfully.
The real difference starts at line 55 (log.1) and line 59 (log.2): in the first instance, the 'nvidia' driver was not matched. Again, note that in both cases there is no xorg.conf to guide the process.
Xorg.0.log.4.fixed - This is when I use a xorg.conf containing additional paths (as in my original post). Everything launched and functioned successfully.
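To tell whether a given xorg.conf already carries the additional paths from the original report, a rough Python sketch like the following could list the ModulePath entries found in "Files" sections. The names module_paths and has_nvidia_module_path are invented for illustration, and the parsing is deliberately simplified (it is not a full xorg.conf parser).

```python
import re

SECTION_START = re.compile(r'^\s*Section\s+"([^"]+)"', re.IGNORECASE)
SECTION_END = re.compile(r'^\s*EndSection\b', re.IGNORECASE)
MODULE_PATH = re.compile(r'^\s*ModulePath\s+"([^"]+)"', re.IGNORECASE)


def module_paths(conf_text):
    """Collect ModulePath entries from every "Files" section, in order.

    Hypothetical helper; a simplified reading of the xorg.conf syntax.
    """
    paths = []
    in_files = False
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        start = SECTION_START.match(line)
        if start:
            in_files = start.group(1).lower() == "files"
        elif SECTION_END.match(line):
            in_files = False
        elif in_files:
            entry = MODULE_PATH.match(line)
            if entry:
                # One ModulePath value may hold a comma-separated list.
                paths.extend(
                    p.strip() for p in entry.group(1).split(",") if p.strip()
                )
    return paths


def has_nvidia_module_path(conf_text):
    """True if /usr/lib/nvidia/xorg is among the configured module paths."""
    return "/usr/lib/nvidia/xorg" in module_paths(conf_text)
```

Run against the text of /etc/X11/xorg.conf, has_nvidia_module_path would be expected to return False for a file freshly generated by nvidia-settings as described in this report, and True once the "Files" section from the original post is added.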
It is interesting to note that I don't think this problem first appeared with my last upgrade to nvidia-460.67-5. That update likely just widened the issue so that it now affects the launch of Xorg. My reasoning:
(1) Since nvidia-460.67-3 (or possibly earlier, I am not sure), programmes that potentially use 2D hardware acceleration were already playing up. In my case, this manifested as (a) MATLAB warning me that software OpenGL was being used instead of hardware acceleration, and throwing errors when I forced hardware acceleration. I suspect this is a glxserver issue. (b) TeamViewer only refreshing part of the screen when I open multiple sessions to the same machine, i.e. when the load is high.
(2) Both problems were resolved when I applied the fix from my original bug report.
Happy to provide earlier log files if needed, as I have BTRFS snapshots that date back two pacman -Syu updates.
The Xorg.0.log.3 that I originally provided in the bug report was recorded with an xorg.conf freshly generated by nvidia-settings, without the additional paths in the "Files" section.
This is a "case by case" issue that has to be fixed via the relevant configuration options. That said, I'd say a more proper mechanism should be found so that the kernel and systemd can reliably communicate what to do. From what I know, this can currently affect all drivers, including FOSS ones, and should rather be handled in the relevant upstreams.