FS#75609 - System froze after updating to linux-5.19.1
Attached to Project:
Arch Linux
Opened by Adam Beavan (ajbeavan) - Monday, 15 August 2022, 08:34 GMT
Last edited by Toolybird (Toolybird) - Monday, 28 November 2022, 22:25 GMT
Details
Description: After updating to linux-5.19.1 the system was unable to boot.
It froze at the point where GRUB tries to load the kernel image,
and I had to roll back to the previous version.
Additional info:
* package version(s)
* config and/or log files etc.
* link to upstream bug report, if any
Steps to reproduce:
Closed by Toolybird (Toolybird)
Monday, 28 November 2022, 22:25 GMT
Reason for closing: Fixed
Additional comments about closing: See comments
Usual advice for kernel regressions is:
- provide logs if possible
- perform a bisection
- at least provide your hardware details?
- read this [2]
Ball is in your court.
[1] https://wiki.archlinux.org/title/Bug_reporting_guidelines
[2] https://wiki.archlinux.org/title/Kernel#Troubleshooting
I am also running into a hang on boot after running `pacman -Syu` yesterday, i.e. a full system upgrade, which bumped my kernel to 5.19.1 from 5.18.9.
Happy to provide any logs that I can, but I'm not sure how I would provide them, since I can only access the machine's filesystem by booting a USB stick with the Arch installation on it and using arch-chroot, and networking doesn't seem to be working (the machine is on wifi).
I can say that during boot the last message I can see before the hang is something like "Reached Target System Time Set", at the default log level with quiet turned off.
Hardware is an Intel i9-9900K with an Nvidia RTX 2080. It is an Alienware 51m Laptop from 2020.
Happy to provide any other information if I can. I am in the process of doing a kernel downgrade to try and get things working again.
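For anyone in the same position (booted from a USB stick, no network), the journal of the installed system can usually be read straight from disk and saved to a file that is then copied off by hand. The following is only a sketch under a few assumptions: the installed root is mounted at /mnt, the journal is persistent (stored under /var/log/journal), and `save_boot_log` is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# save_boot_log ROOT BOOT_OFFSET OUTFILE
# Hypothetical helper: dump one boot's journal from an installed system
# whose root filesystem is mounted at ROOT into OUTFILE.
# Assumes a persistent journal exists under ROOT/var/log/journal.
save_boot_log() {
    root=$1
    boot=$2
    outfile=$3
    journalctl --directory="$root/var/log/journal" \
        --boot="$boot" --no-pager > "$outfile"
}
```

To find the right offset, `journalctl --directory=/mnt/var/log/journal --list-boots` lists the boots recorded in that journal. The failed boot only shows up if it got far enough to write to disk. Something like `save_boot_log /mnt -1 /tmp/boot.log` then writes a plain text file that can be put on another USB stick and attached here.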
I am using syslinux as bootloader. In my case it doesn't freeze but it will just reset the machine and therefore I am in an endless boot loop.
With one possible outlier (GRUB), the reports have in common that, regardless of hardware, syslinux (whatever syslinux.cfg may contain) is incapable of booting this kernel. Attempts always end in boot loops. At least that is true for everyone who has reported the issue; I have not seen any reports of success.
Downgrading, typically to 5.18.16, always fixes the problem.
https://bbs.archlinux.org/viewtopic.php?id=278856
https://bbs.archlinux.org/viewtopic.php?pid=2052001#p2052001
[1] https://bugzilla.kernel.org/show_bug.cgi?id=216387
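For the downgrade itself, the older kernel package is often still in the local pacman cache. A rough sketch, assuming the default cache directory /var/cache/pacman/pkg and zstd-compressed packages; `list_cached_kernels` is just an illustrative helper name:

```shell
#!/bin/sh
# list_cached_kernels [CACHE_DIR]
# Hypothetical helper: list cached packages of the stock "linux" kernel.
# The [0-9] after "linux-" skips linux-headers, linux-firmware, etc.
# Returns non-zero if nothing matches.
list_cached_kernels() {
    ls -1 "${1:-/var/cache/pacman/pkg}"/linux-[0-9]*.pkg.tar.zst 2>/dev/null
}
```

Pick the wanted file from the output and install it as root with `pacman -U` followed by that path (from the arch-chroot if the system does not boot). If the cache has already been cleaned, the package has to come from somewhere else.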
@ajbeavan, we're still waiting for your hardware details.
[1] https://bbs.archlinux.org/viewtopic.php?pid=2052866#p2052866
CPU : Intel(R) Core i7-10875
RAM : 16GB
Motherboard : Standard / GM7MP7P(TongFang)
Hard Drive: SSD 500GB Samsung 970
Graphics : IntelCometLake-H GT2[UHD] + nVidia GeForce RTX 2070 Mobile
[1] https://bbs.archlinux.org/viewtopic.php?pid=2052866#p2052866
Edit:
Upstream fix https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4b1c742407571eff58b6de9881889f7ca7c4b4dc
That patch will make its way to a future stable release.
Edit:
Queued for 5.19.6 [2]
[1] https://wiki.archlinux.org/title/Unofficial_user_repositories#miffe
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/queue-5.19/x86-boot-don-t-propagate-uninitialized-boot_params-cc_blob_address.patch?id=c7f8020fe45ff597d8ec2d1a3da4bdf3ea1e86c0
FAT-fs (sdd): unable to read boot sector to mark fs as dirty
Sep 03 22:25:45 chippy kernel: sd 3:0:0:0: [sdc] No Caching mode page found
Sep 03 22:25:45 chippy kernel: sd 3:0:0:0: [sdc] Assuming drive cache: write through
Ever since the 5.19 kernel update my machine has crashed at least once a day. The crashes are usually random. However, I can also reproduce a crash by just undocking my laptop, or by leaving it on and stepping away from the computer for over an hour: when I come back it's completely unresponsive, and I can't even switch to another tty. While I am usually docked, I have also had crashes while not connected to the dock.
My System info:
Host: 20L70025US ThinkPad T480s
OS: Arch Linux x86_64
CPU: Intel i7-8650U (8) @ 4.200GHz
GPU: Intel UHD Graphics 620
Kernel: 5.19.7-arch1-1
Mesa version: 22.1.7-1
Xserver version (if applicable): xorg-server 21.1.4-1
Desktop manager and compositor: i3-gaps 4.20.1-2
Dock: ThinkPad USB-C Dock Gen2 (LDC-G2)
I've attached the full journal log with `journalctl -b -1 -x` from boot to crash, where the system crashed after I undocked it.
With it my hp laptop 15-dc1004la boots again
Not a kernel dev or anything, but I noticed that in drivers/vfio there were some changes around May that brought in runtime D3 power management (vfio-pci can now go into a suspended state like the nvidia driver), and it seems to cause some trouble when switching between states or something.
I don't know if all the others affected are using vfio too. If not, this may be a bug somewhere else that is getting exposed by vfio...
Here is the error I get without that parameter, after that the computer just freezes:
Sep 20 21:40:22 archlinux kernel: ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
Sep 20 21:40:22 archlinux kernel: ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
Sep 20 21:40:23 archlinux kernel: vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
System details:
Kernel Version: 5.19.12-arch1-1 (64-bit)
Processors: 12 × Intel® Core™ i7-9750H CPU @ 2.60GHz
Memory: 31.0 GiB of RAM
Graphics Processor: Mesa Intel® UHD Graphics 630
Graphics Processor: NVIDIA GeForce GTX 1650 Mobile
Manufacturer: HP
Product Name: OMEN by HP Laptop 15-dc1004la
Operating System: Arch Linux
Graphics Platform: X11
EDIT: just realized, this bug report is entirely different from my issue, perhaps it deserves its own bug report...
sorry for any confusion caused by this
I tried adding vfio-pci.disable_idle_d3=1 as a kernel parameter in grub, but this did not solve the issue for me.
I DO happen to be on a laptop that has both Intel and Nvidia GPUs, same as Adam, so was hoping his fix would work for me. Did I enter that kernel parameter correctly, or do I need to configure that somewhere else?
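One way to check whether the parameter actually reached the kernel is to look at /proc/cmdline after rebooting. A minimal sketch; `has_kernel_param` is a made-up helper name, and the second argument exists only so the check can be tried on an arbitrary string:

```shell
#!/bin/sh
# has_kernel_param PARAM [CMDLINE]
# Hypothetical helper: succeeds if PARAM appears as a whole
# space-separated word in CMDLINE. CMDLINE defaults to the running
# kernel's command line from /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `has_kernel_param vfio-pci.disable_idle_d3=1 && echo present`. With GRUB the parameter normally goes into `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub, and it only takes effect after regenerating the config with `grub-mkconfig -o /boot/grub/grub.cfg` and rebooting. If the vfio-pci module is loaded, /sys/module/vfio_pci/parameters/disable_idle_d3 should also show the current value, assuming the module exposes it there on this kernel.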