FS#39092 - [linux] [nvidia] 3.13 + 331.49-1 Driver Breaks Xorg: Failed to initialize the NVIDIA graphics device
Attached to Project:
Arch Linux
Opened by Diego Flórez (Diego.Florez) - Saturday, 01 March 2014, 23:34 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Sunday, 29 June 2014, 15:11 GMT
Details
Description:
After upgrading to linux 3.13 and the nvidia 331.49-1 driver, Xorg fails on start.
Additional info:
* package version(s)
nvidia 331.49-1
nvidia-libgl 331.49-1
nvidia-utils 331.49-1
* config and/or log files etc.
Xorg.0.log
[ 152.331] Current Operating System: Linux whiteRabbit 3.13.5-1-ARCH #1 SMP PREEMPT Sun Feb 23 00:25:24 CET 2014 x86_64
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 152.333] Initializing built-in extension MIT-SCREEN-SAVER
[ 152.341] (==) Matched nvidia as autoconfigured driver 1
[ 152.341] (==) Matched nvidia as autoconfigured driver 4
[ 152.341] (==) Matched nvidia as autoconfigured driver 7
[ 152.341] (EE) Failed to load module "nouveau" (module does not exist, 0)
[ 152.341] (II) LoadModule: "nvidia"
[ 152.341] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[ 152.342] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 152.342] (EE) Failed to load module "nv" (module does not exist, 0)
[ 152.342] (EE) Failed to load module "fbdev" (module does not exist, 0)
[ 152.342] (EE) Failed to load module "vesa" (module does not exist, 0)
[ 158.708] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0. Please
[ 158.708] (EE) NVIDIA(GPU-0): check your system's kernel log for additional error
[ 158.708] (EE) NVIDIA(GPU-0): messages and refer to Chapter 8: Common Problems in the
[ 158.708] (EE) NVIDIA(GPU-0): README for additional information.
[ 158.708] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[ 158.708] (EE) NVIDIA(0): Failing initialization of X screen 0
[ 158.708] (II) UnloadModule: "nvidia"
[ 158.708] (EE) Screen(s) found, but none have a usable configuration.
[ 158.708] (EE)
[ 158.708] (EE) no screens found(EE)
[ 158.708] (EE)
[ 158.708] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 158.708] (EE)
[ 158.709] (EE) Server terminated with error (1). Closing log file.
dmesg | grep nvidia -i
[ 11.221180] nvidia: module license 'NVIDIA' taints kernel.
[ 11.501607] nvidia 0000:07:00.0: enabling device (0006 -> 0007)
[ 11.527650] [drm] Initialized nvidia-drm 0.0.0 20130102 for 0000:01:00.0 on minor 0
[ 11.527722] [drm] Initialized nvidia-drm 0.0.0 20130102 for 0000:07:00.0 on minor 1
[ 11.527726] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 331.49 Wed Feb 12 20:42:50 PST 2014
lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 750M] (rev a1)
07:00.0 3D controller: NVIDIA Corporation GK107M [GeForce GT 750M] (rev a1)
lsmod
nvidia 10636206 0
drm 239102 1 nvidia
i2c_core 24760 4 drm,i2c_i801,nvidia,videodev
Steps to reproduce:
Upgrade archlinux to linux 3.13 with nvidia 331.49-1
delete /etc/X11/xorg.conf.d/20-nvidia.conf
boot from new kernel and try startx
Closed by Sven-Hendrik Haase (Svenstaro)
Sunday, 29 June 2014, 15:11 GMT
Reason for closing: Upstream
Additional comments about closing: Seems to be fixed in upstream kernel and will be available soon in mainline.
lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1)
lsmod | grep nvidia
nvidia 10635438 49
drm 235934 2 nvidia
i2c_core 24184 3 drm,i2c_piix4,nvidia
Simple bug, though. nvidia-libgl contains symlinks to /usr/lib/libGL.so.XXX.XX, but these shared libraries seem to have been moved to /usr/lib/nvidia/
I got around this issue by making the symlinks myself:
file /usr/lib/libGL.so*
/usr/lib/libGL.so: symbolic link to `nvidia/libGL.so.331.49'
/usr/lib/libGL.so.1: symbolic link to `libGL.so'
Either way, whatever. Stuff works, good enough for me...
There are several threads in the forum relating to this bug:
https://bbs.archlinux.org/viewtopic.php?id=177721
https://bbs.archlinux.org/viewtopic.php?id=177525
This appeared in 3.13 or late 3.12 in conjunction with nvidia 331; I can't track it down precisely. I have the Dell M3800 here.
Thanks for your suggestion, but all the library links are correct:
lrwxrwxrwx 1 root root 15 feb 27 08:44 /usr/lib/libGL.so.1 -> libGL.so.331.49*
lrwxrwxrwx 1 root root 31 feb 27 08:44 /usr/lib/libGL.so.331.49 -> /usr/lib/nvidia/libGL.so.331.49*
lrwxrwxrwx 1 root root 15 feb 27 08:44 /usr/lib/libGL.so -> libGL.so.331.49*
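For anyone who wants to check the libGL links on their own machine, here is a minimal sketch. The helper name check_link is made up for illustration, and it assumes a readlink that supports -f (GNU coreutils does).

```shell
#!/bin/sh
# check_link PATH (hypothetical helper)
# Prints where PATH finally resolves, or reports a missing or dangling link.
# Returns 0 only if PATH exists after following all symlinks.
check_link() {
    path=$1
    if [ ! -L "$path" ] && [ ! -e "$path" ]; then
        echo "$path: missing"
        return 1
    fi
    # -e follows symlinks, so it is false for a link whose target is gone.
    if [ ! -e "$path" ]; then
        echo "$path: dangling symlink -> $(readlink "$path")"
        return 1
    fi
    echo "$path -> $(readlink -f "$path")"
}

# Example (paths taken from the comments above):
#   check_link /usr/lib/libGL.so
#   check_link /usr/lib/libGL.so.1
```

A dangling link is reported separately from a missing file, which should help tell apart the two situations described in the comments above.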
Tolga Cakir (tolga9009):
Thank you very much! Adding "rcutree.rcu_idle_gp_delay=1" to the kernel parameters in /boot/syslinux/syslinux.cfg solves the problem:
LABEL arch
MENU LABEL Arch Linux
LINUX ../vmlinuz-linux
APPEND root=/dev/sda2 rw rcutree.rcu_idle_gp_delay=1
INITRD ../initramfs-linux.img
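To confirm that the parameter really reached the running kernel after editing the bootloader config, something like the following may help. It is only a sketch: kernel_param_value is a hypothetical helper, not part of any package, and it simply splits a command line string on whitespace.

```shell
#!/bin/sh
# kernel_param_value CMDLINE NAME (hypothetical helper)
# Prints the value of NAME=value found in the kernel command line string
# CMDLINE. Returns 1 if the parameter is not present.
kernel_param_value() {
    cmdline=$1
    name=$2
    for word in $cmdline; do
        case $word in
            "$name"=*)
                # Strip everything up to and including the first '='.
                printf '%s\n' "${word#*=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Check the running kernel, if /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    if value=$(kernel_param_value "$(cat /proc/cmdline)" rcutree.rcu_idle_gp_delay); then
        echo "rcutree.rcu_idle_gp_delay=$value is on the kernel command line"
    else
        echo "rcutree.rcu_idle_gp_delay is not on the kernel command line"
    fi
fi
```

If the parameter does not show up after a reboot, the bootloader is probably still using an old or different config file.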
Thomas Bächler (brain0):
Thanks for your interest and work on this bug. Why does this happen? What is "rcutree.rcu_idle_gp_delay=1" for?
This solves the problem for me, too.
Which gpu are you using?
Please post the output of: lspci | grep -i -e vga -e 3d
Thanks for the report.
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GTX 660M] (rev ff)
not a clue...
$ pacman -Qs nvidia
local/bumblebee 3.2.1-3
NVIDIA Optimus support for Linux through VirtualGL
local/lib32-nvidia-utils 334.21-1
NVIDIA drivers utilities (32-bit)
local/libcl 1.1-3
OpenCL library and ICD loader from NVIDIA
local/libvdpau 0.7-1
Nvidia VDPAU library
local/nvidia 334.21-2
NVIDIA drivers for linux
local/nvidia-utils 334.21-2
NVIDIA drivers utilities
$ optirun glxgears
[ 211.537688] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) Failed to load /usr/lib/xorg/modules/libglamoregl.so: libnvidia-glsi.so.334.21: cannot open shared object file: No such file or directory
[ 211.537735] [ERROR]Aborting because fallback start is disabled.
[ 64.200253] [ERROR]Cannot access secondary GPU - error: [XORG] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:1:0:0. Please
[ 64.200293] [ERROR]Aborting because fallback start is disabled.
I have the same problem. I'm using the Optimus card without bumblebee, with xrandr instead. It worked fine with my xorg.conf until the linux 3.13 update.
$ dmesg | tail -n 14
[ 610.876848] type=1006 audit(1394544227.294:4): pid=1574 uid=0 old auid=4294967295 new auid=1000 old ses=4294967295 new ses=3 res=1
[ 612.925601] nvidia 0000:01:00.0: irq 49 for MSI/MSI-X
[ 612.934008] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934065] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934095] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934122] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934148] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934174] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934217] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 612.934243] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 618.240557] ACPI Warning: \_SB_.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20131115/nsarguments-95)
[ 622.516468] NVRM: RmInitAdapter failed! (0x25:0x28:1155)
[ 622.516477] NVRM: rm_init_adapter failed for device bearing minor number 0
[ 622.516500] NVRM: nvidia_frontend_open: minor 0, module->open() failed, error -5
nvidia 334.21-2
nvidia-libgl 334.21-5
nvidia-utils 334.21-5
Still the same problem, still the same solution.
As suggested by Tolga Cakir (tolga9009) on March 02, and as I reported the same day:
Adding "rcutree.rcu_idle_gp_delay=1" to the kernel parameters in /boot/syslinux/syslinux.cfg solves the problem.
It's quite large and I'm not sure I'm comfortable putting that into the stable repo. Does it cause problems for otherwise unaffected users?
Adding "rcutree.rcu_idle_gp_delay=1" to the default kernel parameters for GRUB doesn't fix the issue for me.
dmesg | grep -i nvidia
cat /var/log/Xorg.0.log
lspci
Relevant dmesg output:
[ 4.197585] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card1/input14
[ 4.197854] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card1/input13
[ 4.198100] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card1/input12
[ 4.198309] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:02.0/0000:01:00.1/sound/card1/input11
[ 4.660678] usb 2-2: r8712u: CustomerID = 0x0000
[ 4.660690] usb 2-2: r8712u: MAC Address from efuse = ec:1a:59:ff:5e:93
[ 4.660698] usb 2-2: r8712u: Loading firmware from "rtlwifi/rtl8712u.bin"
[ 4.660991] usbcore: registered new interface driver r8712u
[ 4.674968] nct6775: Found NCT6776D/F or compatible chip at 0x2e:0x290
[ 4.684455] systemd-udevd[170]: renamed network interface wlan0 to wlp2s0u2
[ 5.568838] r8712u 2-2:1.0 wlp2s0u2: 1 RCR=0x153f00e
[ 5.570109] r8712u 2-2:1.0 wlp2s0u2: 2 RCR=0x553f00e
[ 5.678845] IPv6: ADDRCONF(NETDEV_UP): wlp2s0u2: link is not ready
[ 16.192123] IPv6: ADDRCONF(NETDEV_CHANGE): wlp2s0u2: link becomes ready
[ 16.364130] NVRM: API mismatch: the client has the version 337.19, but
NVRM: this kernel module has the version 334.21. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 16.364148] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[ 31.390649] NVRM: API mismatch: the client has the version 337.19, but
NVRM: this kernel module has the version 334.21. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 31.390657] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
[ 213.659239] NVRM: API mismatch: the client has the version 337.19, but
NVRM: this kernel module has the version 334.21. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.
[ 213.659247] NVRM: nvidia_frontend_ioctl: minor 255, module->ioctl failed, error -22
My first thought from this is "oh, kernel modules haven't reloaded", but this is coming right after a reboot, so I think that shouldn't be an issue.
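The NVRM message itself says the userspace client and the loaded kernel module report different driver versions. As a rough sketch, the two versions can be pulled out of the kernel log for comparison. nvrm_mismatch_versions is a hypothetical helper name, and the patterns only assume the message wording shown in the dmesg output above.

```shell
#!/bin/sh
# nvrm_mismatch_versions (hypothetical helper)
# Reads kernel log text on stdin and prints
#   client=<version> module=<version>
# for the first NVRM API mismatch message found.
# Returns 1 if no complete mismatch message is present.
nvrm_mismatch_versions() {
    awk '
        # Return the dotted version number that follows prefix in line.
        function version_after(line, prefix) {
            sub(".*" prefix, "", line)
            if (match(line, /^[0-9]+(\.[0-9]+)*/))
                return substr(line, RSTART, RLENGTH)
            return ""
        }
        /the client has the version/ {
            client = version_after($0, "the client has the version ")
        }
        /this kernel module has the version/ {
            module = version_after($0, "this kernel module has the version ")
        }
        client != "" && module != "" {
            print "client=" client " module=" module
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   dmesg | nvrm_mismatch_versions
```

The result can then be compared with what is installed and loaded, for example with `pacman -Q nvidia nvidia-utils` and `cat /proc/driver/nvidia/version`.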
Relevant portion of /var/log/Xorg.0.log:
[ 213.363] (II) LoadModule: "glx"
[ 213.363] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[ 213.378] (II) Module glx: vendor="NVIDIA Corporation"
[ 213.378] compiled for 4.0.2, module version = 1.0.0
[ 213.378] Module class: X.Org Server Extension
[ 213.378] (II) NVIDIA GLX Module 337.19 Tue Apr 29 19:48:33 PDT 2014
[ 213.378] Loading extension GLX
[ 213.378] (II) LoadModule: "nvidia"
[ 213.378] (II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
[ 213.378] (II) Module nvidia: vendor="NVIDIA Corporation"
[ 213.378] compiled for 4.0.2, module version = 1.0.0
[ 213.378] Module class: X.Org Video Driver
[ 213.378] (II) NVIDIA dlloader X Driver 337.19 Tue Apr 29 19:22:36 PDT 2014
[ 213.378] (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
[ 213.378] (--) using VT number 2
[ 213.385] (II) Loading sub module "fb"
[ 213.385] (II) LoadModule: "fb"
[ 213.385] (II) Loading /usr/lib/xorg/modules/libfb.so
[ 213.385] (II) Module fb: vendor="X.Org Foundation"
[ 213.385] compiled for 1.15.1, module version = 1.0.0
[ 213.385] ABI class: X.Org ANSI C Emulation, version 0.4
[ 213.385] (WW) Unresolved symbol: fbGetGCPrivateKey
[ 213.385] (II) Loading sub module "wfb"
[ 213.386] (II) LoadModule: "wfb"
[ 213.386] (II) Loading /usr/lib/xorg/modules/libwfb.so
[ 213.386] (II) Module wfb: vendor="X.Org Foundation"
[ 213.386] compiled for 1.15.1, module version = 1.0.0
[ 213.386] ABI class: X.Org ANSI C Emulation, version 0.4
[ 213.386] (II) Loading sub module "ramdac"
[ 213.386] (II) LoadModule: "ramdac"
[ 213.386] (II) Module "ramdac" already built-in
[ 213.386] (EE) NVIDIA: Failed to initialize the NVIDIA kernel module. Please see the
[ 213.386] (EE) NVIDIA: system's kernel log for additional error messages and
[ 213.386] (EE) NVIDIA: consult the NVIDIA README for details.
[ 213.386] (EE) No devices detected.
[ 213.386] (EE)
Fatal server error:
[ 213.386] (EE) no screens found(EE)
pacman -Qs nvidia?
/etc/X11/xorg.conf.d/20-nvidia.conf:
Section "Device"
Identifier "Nvidia Card"
Driver "nvidia"
VendorName "NVIDIA Corporation"
Option "NoLogo" "true"
#Option "UseEDID" "false"
#Option "ConnectedMonitor" "DFP"
# ...
EndSection
I'm not sure if this shouldn't happen automatically when upgrading nvidia (or any package with a kernel module).
Edit:
Nvidia 340.17 beta: Still the same.
Edit:
https://devtalk.nvidia.com/default/topic/751903/linux/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/
https://patchwork.kernel.org/patch/4378851/
Note my journal.txt attachment below covers several reboots with the combination where X wasn't working as I was troubleshooting a bit before downgrading the packages.
journal.txt (458.1 KiB)
Have you tried linux 3.15 + patch in combination with nvidia-337.25-3 (not 340.17 beta)?
I've had 1 out of the 2 boots since I built linux 3.15 + patch cause X to fail as before.
Do I need to use the beta nvidia as well as the patch?
Have you tried linux 3.15 + patch in combination with nvidia-337.25-3 (not 340.17 beta)?
Yes, and everything works.
Jason Graham said:
I've had 1 out of the 2 boots since I built linux 3.15 + patch cause X to fail as before.
Please report here https://devtalk.nvidia.com/default/topic/751903
Please also generate and post an "nvidia bug report" by running nvidia-bug-report.sh as root.