FS#75609 - System froze after updating to linux-5.19.1

Attached to Project: Arch Linux
Opened by Adam Beavan (ajbeavan) - Monday, 15 August 2022, 08:34 GMT
Last edited by Toolybird (Toolybird) - Monday, 28 November 2022, 22:25 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To No-one
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 7
Private No


Description: After updating to linux-5.19.1 I am unable to boot the system. It froze at the point where GRUB tries to load the kernel image, and I had to roll back to the previous version.

Additional info:
* package version(s)
* config and/or log files etc.
* link to upstream bug report, if any

Steps to reproduce:

Closed by  Toolybird (Toolybird)
Monday, 28 November 2022, 22:25 GMT
Reason for closing:  Fixed
Additional comments about closing:  See comments
Comment by Toolybird (Toolybird) - Tuesday, 16 August 2022, 03:34 GMT
"It doesn't work" style bug reports are completely useless. If you're not prepared to put some effort in and provide a decent bug report, you're better off going through support channels first (forum/IRC/etc). Please re-read the bug guidelines [1]

Usual advice for kernel regressions is:
- provide logs if possible
- perform a bisection
- at least provide your hardware details?
- read this [2]

Ball is in your court.

[1] https://wiki.archlinux.org/title/Bug_reporting_guidelines
[2] https://wiki.archlinux.org/title/Kernel#Troubleshooting
Comment by Adam Beavan (ajbeavan) - Tuesday, 16 August 2022, 06:35 GMT
I will repeat: 'No kernel image was loaded', which means nothing would have been written to the journal. I have provided all the information I have. If you know something different, then please enlighten me.
Comment by michael buckley (mokchira) - Tuesday, 16 August 2022, 17:43 GMT
Hi, first time poster on here so please go easy :)
I am also running into a hang on boot after running
pacman -Syu
yesterday. So a full system upgrade, which bumped my kernel to 5.19.1 from 5.18.9.

Happy to provide any logs that I can, but I'm not sure how I would provide them, since I can only access the filesystem of the machine via a boot + arch-chroot from a USB stick with the arch installation on it, and networking doesn't seem to be working (on a wifi).

I can say that during boot the last message before the hang that I can see is something like "Reached Target System Time Set" at the default log level and quiet turned off.

Hardware is an Intel i9-9900K with an Nvidia RTX 2080. It is an Alienware 51m Laptop from 2020.

Happy to provide any other information if I can. I am in the process of doing a kernel downgrade to try and get things working again.
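
For anyone in the same position (installed system reachable only from a USB stick, no networking), here is a minimal sketch of copying a boot log off the machine. It assumes the installed root filesystem is mounted at /mnt and that the journal is persistent; the function name and all paths are illustrative only. If the hang happens before journald writes to disk, there may be nothing to export.

```shell
# Sketch: export one boot's journal from an installed system mounted at "$root".
# save_boot_log is a hypothetical helper name, not part of systemd.
save_boot_log() {
    root=$1   # mount point of the installed system, e.g. /mnt (assumption)
    boot=$2   # boot offset or boot ID, as printed by --list-boots
    out=$3    # file to write, e.g. somewhere on the USB stick
    journalctl --directory="$root/var/log/journal" --boot="$boot" --no-pager > "$out"
}
```

To pick a boot, `journalctl --directory=/mnt/var/log/journal --list-boots` prints the boots recorded in that journal. Then something like `save_boot_log /mnt -1 /path/on/usb/boot.log` writes the chosen one to a file that can be attached here (the output path is a placeholder).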
Comment by Christian (Watnuss) - Wednesday, 17 August 2022, 10:14 GMT
I also cannot boot into this kernel (tested on a Lenovo Thinkpad T14s and X395). Going back to any 5.18 kernel resolved the issue (whatever was the last in my local pacman pkg cache). I am happy to supply more information and testing stuff but I don't know how.

I am using syslinux as bootloader. In my case it doesn't freeze but it will just reset the machine and therefore I am in an endless boot loop.
Comment by Geoff (perseus) - Wednesday, 17 August 2022, 12:22 GMT
Two reasonably detailed Forum threads deal with this. I started one of them.

With one possible outlier (Grub) they have in common the issue that, regardless of hardware, syslinux (whatever syslinux.cfg may contain), is incapable of booting this kernel. Attempts always end in bootloops. At least that is true for all who have reported the issue. I have not seen any reports of success.

Downgrading, typically to 5.18.16, always fixes the problem.
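
A rough sketch of the downgrade from the local pacman cache, for those who have not done it before. It assumes the default cache directory and the usual .pkg.tar.zst package suffix; the helper name is made up for illustration.

```shell
# Sketch: list cached kernel packages, oldest version first.
# list_cached_kernels is a hypothetical helper; the cache path is pacman's default.
list_cached_kernels() {
    cache=${1:-/var/cache/pacman/pkg}
    for f in "$cache"/linux-[0-9]*.pkg.tar.zst; do
        # skip the unexpanded pattern when nothing matches
        [ -e "$f" ] && printf '%s\n' "$f"
    done | sort -V
}
```

Pick a 5.18 entry from the output and install it as root with `pacman -U <that file>`. If the system does not boot at all, the same can be done from an arch-chroot.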

Comment by Adam Beavan (ajbeavan) - Saturday, 20 August 2022, 16:14 GMT
I have just updated to the latest linux-5.19.2 after rolling back to 5.18.9 and still have the same problem: the system fails to load the kernel image. Could it possibly be a GRUB issue? I have gone back to 5.18.9.
Comment by loqs (loqs) - Saturday, 20 August 2022, 23:02 GMT
Related upstream bug report [1]. Both syslinux and GRUB are reported affected.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216387
Comment by Toolybird (Toolybird) - Sunday, 21 August 2022, 05:03 GMT
@loqs, outstanding support in the forum! The bad commit is AMD related. It's still unclear whether the bootloader is a factor. Do you think the folks on Intel CPUs are experiencing a different bug?

@ajbeavan, we're still waiting for your hardware details.
Comment by loqs (loqs) - Sunday, 21 August 2022, 05:28 GMT
@toolybird I do not know if Intel CPUs can experience this issue. Could an affected Intel user please test the patch, or the built kernel with it applied, from [1] and report back?

[1] https://bbs.archlinux.org/viewtopic.php?pid=2052866#p2052866
Comment by Adam Beavan (ajbeavan) - Sunday, 21 August 2022, 07:51 GMT
As requested the hardware details of the machine, which is a PCSpecialist Vyper Series Laptop
CPU : Intel(R) Core i7-10875
RAM : 16GB
Motherboard : Standard / GM7MP7P(TongFang)
Hard Drive: SSD 500GB Samsung 970
Graphics : IntelCometLake-H GT2[UHD] + nVidia GeForce RTX 2070 Mobile
Comment by loqs (loqs) - Sunday, 21 August 2022, 18:19 GMT
Comment by Adam Beavan (ajbeavan) - Sunday, 28 August 2022, 20:47 GMT
I have not applied the kernel patch; I was sort of hoping the issue would be resolved by later updates. But I just installed the latest 5.19.4 and I am still getting the same system hang.
Comment by loqs (loqs) - Sunday, 28 August 2022, 21:03 GMT
The patch will be in linux-mainline 6.0-rc3, available prebuilt from [1], probably tomorrow, if you do not want to apply the patch yourself or use a package I built.
That patch will make its way to a future stable release.
Queued for 5.19.6 [2]

[1] https://wiki.archlinux.org/title/Unofficial_user_repositories#miffe
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/queue-5.19/x86-boot-don-t-propagate-uninitialized-boot_params-cc_blob_address.patch?id=c7f8020fe45ff597d8ec2d1a3da4bdf3ea1e86c0
Comment by loqs (loqs) - Friday, 02 September 2022, 00:10 GMT
Is the issue resolved with linux 5.19.6.arch1-1?
Comment by Christian (Watnuss) - Friday, 02 September 2022, 07:46 GMT
Thanks for pinging the issue. Yes, it is resolved with 5.19.6-arch1-1 =)
Comment by Adam Beavan (ajbeavan) - Saturday, 03 September 2022, 19:59 GMT
Hi, I have just upgraded to 5.19.6 and I am still getting the system freeze, though it is now intermittent and will sometimes progress to the login stage before hanging. Will look for clues in the journal file...
Comment by Adam Beavan (ajbeavan) - Sunday, 04 September 2022, 06:30 GMT
Looking through the journal file, I am getting the following errors:
FAT-fs (sdd): unable to read boot sector to mark fs as dirty
Sep 03 22:25:45 chippy kernel: sd 3:0:0:0: [sdc] No Caching mode page found
Sep 03 22:25:45 chippy kernel: sd 3:0:0:0: [sdc] Assuming drive cache: write through
Comment by Adam Beavan (ajbeavan) - Monday, 05 September 2022, 19:19 GMT
This has not been fixed. The system still hangs, but nothing is being written to the journal file, though I feel it is Intel related.
Comment by loqs (loqs) - Monday, 05 September 2022, 20:01 GMT
@ajbeavan then bisect between 5.18 and 5.19 to find the cause, you may need to cherry-pick 4b1c742407571eff58b6de9881889f7ca7c4b4dc if your system was also affected by that issue.
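
A minimal sketch of that bisection, assuming a clone of the mainline kernel tree with the v5.18 and v5.19 tags present. The three helper names are made up for illustration; only the git commands themselves are real. The cherry-pick may not apply cleanly on every commit under test, in which case it has to be resolved by hand.

```shell
# Sketch: bisect a kernel checkout between the two releases named above.
# All three function names are hypothetical helpers.

# Start the bisection: v5.19 is known bad, v5.18 is known good.
start_kernel_bisect() {
    git -C "$1" bisect start v5.19 v5.18
}

# Temporarily apply the already-known boot fix on top of the commit under test,
# without committing it, so that the known issue does not mask the one being hunted.
apply_known_fix() {
    git -C "$1" cherry-pick --no-commit 4b1c742407571eff58b6de9881889f7ca7c4b4dc
}

# Drop any temporary fix, then record the result ("good" or "bad") for this step.
mark_result() {
    git -C "$1" reset --hard && git -C "$1" bisect "$2"
}
```

Each round is then: build and boot the checked-out commit (with `apply_known_fix` first if needed), call `mark_result <tree> good` or `mark_result <tree> bad`, and repeat until git reports the first bad commit. `git bisect reset` returns the tree to where it started.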
Comment by Adam Beavan (ajbeavan) - Tuesday, 06 September 2022, 17:44 GMT
Latest kernel 5.19.7 seems to be working OK, no freezing at the moment.
Comment by Adam Beavan (ajbeavan) - Tuesday, 06 September 2022, 20:40 GMT
Sorry, spoke too soon: the system has frozen again. Very frustrating.
Comment by Jay Wolff (jaywolff5) - Tuesday, 13 September 2022, 14:08 GMT
I've been lurking, following this thread and a few others for a few weeks now hoping there would be a resolution by now but I'm having similar issues still as well.

Ever since the 5.19 kernel update my machine has crashed at least once a day. It's usually random crashes. However, I can also reproduce it crashing by just undocking my laptop, or if I just leave it on and step away from the computer for over an hour, I'll come back and it's completely unresponsive; I can't even jump into another tty. While I am usually docked, I have had crashes while not connected to the dock.

My System info:
Host: 20L70025US ThinkPad T480s
OS: Arch Linux x86_64
CPU: Intel i7-8650U (8) @ 4.200GHz
GPU: Intel UHD Graphics 620
Kernel: 5.19.7-arch1-1
Mesa version: 22.1.7-1
Xserver version (if applicable): xorg-server 21.1.4-1
Desktop manager and compositor: i3-gaps 4.20.1-2
Dock: ThinkPad USB-C Dock Gen2 (LDC-G2)

I've attached the full journal log with `journalctl -b -1 -x` from boot to crash, where the system crashed after I undocked it.
   crash.log (275.8 KiB)
Comment by loqs (loqs) - Tuesday, 13 September 2022, 17:21 GMT
@jaywolff5 please try linux 5.19.8.arch1-1
Comment by Jay Wolff (jaywolff5) - Tuesday, 13 September 2022, 19:37 GMT
Can confirm after the kernel update to 5.19.8-arch1-1 I can no longer reproduce a system crash by un-docking/re-docking the laptop. Sweet. Will also follow up on stability after a day or two of normal usage. Thanks for the heads up on the latest update @loqs
Comment by Adam Beavan (ajbeavan) - Wednesday, 14 September 2022, 16:25 GMT
Updated to the latest kernel and still getting random freezes. If I run libvirt with PCI passthrough to the NVIDIA card, it will certainly freeze and require a number of power cycles to get a login prompt. Looks like I am going back to 5.18 again....
Comment by Jay Wolff (jaywolff5) - Friday, 16 September 2022, 13:37 GMT
Very happy to report that my system has been stable since the 5.19.8 update, haven't had a single crash since.
Comment by Adam Beavan (ajbeavan) - Thursday, 29 September 2022, 06:20 GMT
OK, just an update on this one: I have tried updating to kernel 5.19.11 but am still getting a frozen system. My feeling is that the early loading of the vfio-pci driver for PCI passthrough might be the root cause of the problem. The laptop has both Intel and NVIDIA graphics cards, so I am modsetting the NVIDIA driver.
Comment by Luis Bocanegra (luisbocanegra) - Sunday, 02 October 2022, 10:39 GMT
@ajbeavan could you try setting the disable_idle_d3=1 parameter for the vfio-pci module to see if it can boot?
With it my HP laptop 15-dc1004la boots again
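
One way to make the option persistent, as a sketch: write it to a modprobe.d drop-in. The file name below is an arbitrary choice (kept separate so an existing vfio.conf with device ids is not overwritten), and the helper name is made up.

```shell
# Sketch: persist disable_idle_d3=1 for vfio-pci in its own modprobe.d drop-in.
# set_vfio_idle_d3_off is a hypothetical helper; run it as root on a real system.
set_vfio_idle_d3_off() {
    conf_dir=${1:-/etc/modprobe.d}
    printf 'options vfio-pci disable_idle_d3=1\n' > "$conf_dir/vfio-disable-idle-d3.conf"
}
```

If vfio-pci is loaded early from the initramfs, the image probably needs regenerating afterwards (on a stock Arch setup, `mkinitcpio -P`) so that the new file is picked up.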

Not a kernel dev or anything, but I noticed that drivers/vfio had some changes around May that brought in runtime D3 power management (vfio-pci can now enter a suspended state, like the nvidia driver), and it seems to cause trouble when switching between states or something.
I don't know if all the others affected are using vfio too; if not, this may be a bug somewhere else that is being exposed by vfio...

Here is the error I get without that parameter, after that the computer just freezes:
Sep 20 21:40:22 archlinux kernel: ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
Sep 20 21:40:22 archlinux kernel: ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
Sep 20 21:40:23 archlinux kernel: vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible

System details:
Kernel Version: 5.19.12-arch1-1 (64-bit)
Processors: 12 × Intel® Core™ i7-9750H CPU @ 2.60GHz
Memory: 31.0 GiB of RAM
Graphics Processor: Mesa Intel® UHD Graphics 630
Graphics Processor: NVIDIA GeForce GTX 1650 Mobile
Manufacturer: HP
Product Name: OMEN by HP Laptop 15-dc1004la

Operating System: Arch Linux
Graphics Platform: X11

EDIT: just realized, this bug report is entirely different from my issue, perhaps it deserves its own bug report...
sorry for any confusion caused by this
Comment by Adam Beavan (ajbeavan) - Tuesday, 04 October 2022, 05:57 GMT
Have been running for a few days on the latest kernel 5.19.12 with the changed parameters for the vfio-pci driver as mentioned by Luis Bocanegra and I have had no system freezing so far, everything seems to be running fine.
Comment by michael buckley (mokchira) - Friday, 28 October 2022, 17:06 GMT
Just gave the system update another shot last night and still hit this issue. Went from 5.18.9.arch1-1 to 6.0.2.arch1-1 and hit the hang on boot again.
I tried adding vfio-pci.disable_idle_d3=1 as a kernel parameter in grub, but this did not solve the issue for me.
I DO happen to be on a laptop that has both Intel and Nvidia GPUs, same as Adam, so was hoping his fix would work for me. Did I enter that kernel parameter correctly, or do I need to configure that somewhere else?
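
A quick way to check whether the parameter actually reached the module, as a sketch. It assumes vfio-pci is built as a module and is loaded; the helper name and its two optional arguments are made up so the paths can be overridden. Note that a parameter typed at the GRUB menu applies to that boot only. To keep it, it would go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `grub-mkconfig -o /boot/grub/grub.cfg`.

```shell
# Sketch: show what the kernel command line and the loaded module say
# about disable_idle_d3. check_vfio_idle_d3 is a hypothetical helper.
check_vfio_idle_d3() {
    sysfs=${1:-/sys}
    cmdline=${2:-/proc/cmdline}
    # the module name may be spelled with '-' or '_' on the command line
    echo "cmdline: $(grep -o 'vfio[-_]pci\.disable_idle_d3=[^ ]*' "$cmdline" || echo 'not set')"
    param="$sysfs/module/vfio_pci/parameters/disable_idle_d3"
    if [ -r "$param" ]; then
        echo "module:  $(cat "$param")"
    else
        echo "module:  vfio_pci not loaded"
    fi
}
```

If the module line does not report the option as enabled, the parameter did not take effect. If it does and the hang remains, this is likely a different problem from the vfio one.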
Comment by Toolybird (Toolybird) - Monday, 28 November 2022, 22:24 GMT
OK, pretty sure all the main problems here are resolved with the latest kernels. If any issues remain, please open a new ticket with *exact* details of hardware config, VFIO setup, etc.