Arch Linux


FS#76223 - [linux] Kernel 6.0.2-arch1-1 Oops during boot (nouveau module) and hang during reboot/shutdown

Attached to Project: Arch Linux
Opened by Mark Clegg (mclegg) - Sunday, 16 October 2022, 20:57 GMT
Last edited by Jelle van der Waa (jelly) - Thursday, 14 September 2023, 17:51 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Jan Alexander Steffens (heftig)
David Runge (dvzrv)
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
Since upgrading to kernel 6.0.x (currently 6.0.2-arch1-1) my system no longer reboots or powers off on shutdown. I'm also seeing a kernel Oops during startup that points to the nouveau module as the cause.

This is on a Lenovo Thinkpad W540 which has both integrated Intel video, and discrete NVidia (Quadro K2100M). I'm only using the builtin LCD panel, which is driven by the Intel driver. I don't currently have any displays attached to the NVidia, whose outputs are only available via a dock.

Blacklisting the nouveau module seems to resolve the issue, so I suspect the Oops is responsible for the failure to poweroff/reboot.

Additional info:
* package version(s)
core/linux 6.0.1.arch1-1
core/linux 6.0.1.arch2-1
core/linux 6.0.2.arch1-1
* config and/or log files etc.
dmesg output attached
* link to upstream bug report, if any

Steps to reproduce:
Shutdown or restart the system.
   dmesg (97.2 KiB)

Closed by  Jelle van der Waa (jelly)
Thursday, 14 September 2023, 17:51 GMT
Reason for closing:  Deferred
Additional comments about closing:  Old kernel, please retry with the latest
Comment by Toolybird (Toolybird) - Sunday, 16 October 2022, 21:24 GMT
Good info / troubleshooting. It would be even better if you could report this upstream. There is an issue tracker here [1]. There is also a mailing list for nouveau [2]. But best of all would be a git bisection to identify the commit that caused the regression (but not sure how motivated you would be for this seeing as you have a workaround).

[1] https://gitlab.freedesktop.org/drm/nouveau
[2] https://lists.freedesktop.org/mailman/listinfo/nouveau
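For readers unfamiliar with bisection, here is a minimal sketch of the search that `git bisect` automates. It is an illustration only: the commit names are placeholders, and `is_bad` is a hypothetical stand-in for building and booting the kernel at a given commit and checking whether the Oops appears. It assumes the first commit is known good, the last is known bad, and every commit after the regression is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history: ten placeholder commits, regression enters at "c6".
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 6)
print(culprit)
```

Each test halves the remaining range, so even several thousand commits between two kernel releases need only around a dozen build-and-boot cycles. With git itself, the same search is driven by `git bisect start`, then marking commits with `git bisect good` and `git bisect bad`.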
Comment by Mark Clegg (mclegg) - Monday, 17 October 2022, 16:12 GMT
I've created upstream issue: https://gitlab.freedesktop.org/drm/nouveau/-/issues/188
I'm afraid I'm not sure how to do a git bisection.
Comment by daniel trentman (dtrentman) - Tuesday, 01 August 2023, 18:15 GMT
Same or similar problem on a 2009 MacBook Pro with kernels 6.4.6 and 6.4.7 and nouveau.
I have not tried the NVIDIA driver; I am trying to get nouveau to work.
The same Mesa driver works with 6.4.5 (kernel or nouveau bug?).

Booting with modprobe.blacklist=nouveau and then running modprobe nouveau produces the attached information.

(application/x-gzip)    info.tgz (28.4 KiB)
Comment by loqs (loqs) - Wednesday, 02 August 2023, 14:46 GMT
@dtrentman the backtrace you provided has no similarity to the one provided by mclegg. Why do you believe it is the same issue?
6.4.6 only addressed a Zen2 CPU vulnerability [1] so I do not see how that could introduce a nouveau issue.

[1] https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.4.6
Comment by daniel trentman (dtrentman) - Wednesday, 02 August 2023, 17:00 GMT
It looks like the same issue because the machine fails to boot when nouveau is allowed to load; it has a 'kernel oops' as mentioned in the original description, which I take to be a kernel panic. After blacklisting nouveau on the boot line, just as in the original problem description, the machine boots. That is why I think it is the same problem, and similar to that of others who just use nomodeset at boot. In my log after doing a 'modprobe nouveau':

>Jul 31 20:29:55 lt-2 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000058

Given your information about 6.4.6, I will need to reconfirm that it panics too, but 6.4.7, including 6.4.7-arch1-2, does panic. Is there any other information I can provide?
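
As a rough aid for picking such reports out of a long journal or dmesg dump, the sketch below scans log lines for the NULL pointer message quoted above and extracts the faulting address. The function name and the pattern are illustrative assumptions based only on that one quoted line; other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the single log line quoted in this comment (an assumption,
# not an exhaustive description of kernel Oops formats).
OOPS_RE = re.compile(
    r"BUG: kernel NULL pointer dereference, address: (?P<addr>[0-9a-f]{16})"
)


def find_null_derefs(lines):
    """Return the faulting addresses (as ints) from all matching log lines."""
    addresses = []
    for line in lines:
        match = OOPS_RE.search(line)
        if match:
            addresses.append(int(match.group("addr"), 16))
    return addresses


sample = [
    "Jul 31 20:29:55 lt-2 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000058",
]
print([hex(a) for a in find_null_derefs(sample)])
```

The address alone does not identify the bug. Two reports are only comparable once the call traces that follow this line in the log are compared as well.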

Comment by loqs (loqs) - Wednesday, 02 August 2023, 17:36 GMT
The backtrace shows that your NULL pointer dereference occurs in a different function, reached via a different call path, which indicates a different issue. Your issue also appears to have been introduced in a 6.4.Y release.
Your issue appears to happen in drm_connector_register_all [1]; possibly drm_device is somehow NULL.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_connector.c#L706
