FS#74883 - [nvidia] nvidia package updated without the kernel getting updated

Attached to Project: Arch Linux
Opened by Arvid Norlander (VorpalGun) - Saturday, 28 May 2022, 18:34 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 06 June 2022, 00:11 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description:

I ran pacman -Syu and nvidia got updated but not the linux package. This led to the errors in the attached log file. The problem is that nvidia for kernel 5.18 got installed, but kernel 5.18 itself did not:

❯ ls /lib/modules/*
/lib/modules/5.17.9-arch1-1:
build modules.alias modules.builtin modules.builtin.bin modules.dep modules.devname modules.softdep modules.symbols.bin updates
kernel modules.alias.bin modules.builtin.alias.bin modules.builtin.modinfo modules.dep.bin modules.order modules.symbols pkgbase vmlinuz

/lib/modules/5.18.0-arch1-1:
extramodules
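
The listing above can be checked mechanically. The following is a minimal sketch, not an official tool: it assumes that a module directory installed by the kernel package always contains a `vmlinuz` file (as the 5.17.9 directory does here), and the function name `list_kernelless_module_dirs` is made up for illustration.

```shell
# Sketch: print the names of module directories that have no kernel image.
# Assumption: the kernel package installs `vmlinuz` into its own module
# directory, so a directory without it holds only out-of-tree modules.
# `list_kernelless_module_dirs` is a hypothetical helper name.
list_kernelless_module_dirs() {
    local root="${1:-/lib/modules}"
    local dir
    for dir in "$root"/*/; do
        # If the glob matched nothing, the literal pattern is not a directory.
        [ -d "$dir" ] || continue
        dir="${dir%/}"
        if [ ! -e "$dir/vmlinuz" ]; then
            # Print only the version part, e.g. 5.18.0-arch1-1
            printf '%s\n' "${dir##*/}"
        fi
    done
}
```

Running `list_kernelless_module_dirs /lib/modules` on the system described here would be expected to name the 5.18.0-arch1-1 directory, since it contains only `extramodules`.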


I would think these should be published in lockstep. In addition, perhaps nvidia should declare a dependency on the specific kernel version it is built for, to prevent this.

If I had not caught this, it would have led to a system with no working GUI.

In case this is due to a broken mirror I have also attached my mirror list.

Additional info:
* package version(s): nvidia-515.43.04-6, linux-5.17.9.arch1-1
* config and/or log files etc: See attached log file.

Steps to reproduce:
* pacman -Syu during the moment when nvidia has been updated but not the linux package?! Possibly also requires the proper mirror selected.

Closed by  Sven-Hendrik Haase (Svenstaro)
Monday, 06 June 2022, 00:11 GMT
Reason for closing:  Not a bug
Comment by Arvid Norlander (VorpalGun) - Sunday, 29 May 2022, 11:12 GMT
> 2022-05-28: A task closure has been requested. Reason for request: current kernel version on core is 5.18 you may have just updated at the wrong time

That is possible. But in that case shouldn't there be a specific dependency on linux-5.18 so that pacman errors out rather than allowing the broken upgrade to continue? It seems to me that the dependency is underspecified.
Comment by Adam Nielsen (Malvineous) - Sunday, 29 May 2022, 12:11 GMT
I just ran into this problem too. In my case I am using zfs-linux, and it prevents the kernel from being upgraded:

$ pacman -Su
error: failed to prepare transaction (could not satisfy dependencies)
:: installing linux (5.18.arch1-1) breaks dependency 'linux=5.15.12.arch1-1' required by zfs-linux

So while I wait for zfs-linux to be updated, I just go:

$ pacman -Su --ignore linux --ignore zfs-linux

I then get no errors but on reboot I also get no graphics, because `nvidia` installed with an incompatible kernel. If the kernel version was included in the dependency list for the nvidia package, I would've gotten a warning instead, reminding me to also `--ignore nvidia` when doing the upgrade.

Since the nvidia module does get compiled against a specific kernel version, and requires a specific version to work, could the version be included in the nvidia dependency list? It would save a lot of headaches.
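
To illustrate the difference between the two behaviours, here is a toy model of an exact-version dependency check. It is only a sketch: it uses plain string comparison rather than pacman's actual resolver, and `dep_satisfied` is a hypothetical helper, not a pacman function.

```shell
# Toy model (NOT pacman's real logic): does dependency string `dep` still
# hold if package `name` is installed at `version`?
# `dep_satisfied` is a hypothetical name used only for this illustration.
dep_satisfied() {
    local dep="$1" name="$2" version="$3"
    case "$dep" in
        "$name="*)
            # Pinned dependency such as linux=5.15.12.arch1-1:
            # satisfied only by exactly that version.
            [ "${dep#*=}" = "$version" ]
            ;;
        *)
            # Unversioned dependency, or a dependency on another package:
            # any version of `name` is accepted.
            return 0
            ;;
    esac
}

# A pinned dependency rejects the new kernel, so the upgrade is stopped:
dep_satisfied 'linux=5.15.12.arch1-1' linux 5.18.arch1-1 || echo "blocked"
# An unversioned dependency accepts it, so the mismatch goes unnoticed:
dep_satisfied 'linux' linux 5.18.arch1-1 && echo "allowed"
```

The first call corresponds to the zfs-linux error shown above; the second corresponds to the situation described for nvidia, where nothing stops the upgrade.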
Comment by Arvid Norlander (VorpalGun) - Sunday, 29 May 2022, 12:14 GMT
I agree that the zfs module handles this better, by preventing the upgrade that would break the system. That is preferable to what the nvidia module does, where it silently breaks the system.
Comment by Adam Nielsen (Malvineous) - Sunday, 29 May 2022, 15:05 GMT
It would also be really helpful if the kernel version was included as part of the package version.

At the moment if I have to revert back to an older kernel, there is no way to figure out which version of the nvidia module you need to boot the system. It seems you have to try 20 different versions one by one until you find the one that works with that particular kernel version.

I've had to revert kernel versions a few times in the past due to significant bugs appearing in the latest one, and then had to go forwards a few versions to see when the bug appeared, and it was such a pain to find which nvidia package version I needed for the kernel versions I was testing.

If the package versions had a suffix with the kernel version they were compiled against, so they were named like nvidia-515.43.04-5.17.9.arch1-1-x86_64.pkg.tar.zst, that would solve another headache too.
Comment by Stéphan Bernard (phanou) - Monday, 30 May 2022, 13:43 GMT
Hi, I had this problem too, getting a black screen after a reboot (due to the gdm service). I haven't solved it yet and I'm currently running xorg in fbdev mode with a downgraded kernel. It looks like kernel 5.18 has difficulty loading the nvidia module.
journalctl shows: «kernel: kernel BUG at arch/x86/kernel/traps.c:252!» (line 1080 of the attached dump).
BTW, thank you for your work.
Comment by Stéphan Bernard (phanou) - Thursday, 02 June 2022, 13:37 GMT
Seems to be a duplicate of https://bugs.archlinux.org/task/74886
Comment by Arvid Norlander (VorpalGun) - Thursday, 02 June 2022, 17:22 GMT
@Stephan Bernard Your issue seems to be different from this bug, and may or may not be a duplicate of another bug. Either way, it is very different from this bug.
Comment by sluice (sluice) - Saturday, 04 June 2022, 00:43 GMT
It took me 4 days to sort out the mistake of those who fiddle with updates, don't test them, and push them into a live update.

The fix is to download the latest NVIDIA drivers and compile them yourself against your own Linux version.

That is the only way to guarantee it works with Arch Linux, due to incompetence.
