FS#59945 - [linux] 4.18.* rcu_preempt detected stalls on CPUs / tasks (kernel panic?)
Attached to Project:
Arch Linux
Opened by yo (yo_arch) - Wednesday, 05 September 2018, 15:00 GMT
Last edited by Doug Newgard (Scimmia) - Thursday, 20 September 2018, 14:31 GMT
Details
Hello everyone,

Description:
After updating the kernel from 4.17.9 to 4.18.[1-5], I can no longer boot (possibly a kernel panic). At boot I get, in order: "loading kernel", "loading initramfs", then a new screen with "starting version 239", and then nothing for a few minutes (it does not ask for my LUKS partition password as it normally does). Then I get these messages (approximate; I cannot copy/paste them and there are no logs on the system):

info rcu_preempt detected stalls on CPUs / tasks
nonlazy_posted
rcu_preempt kthread starved for jiffies
RCU grace-period kthread stack dump

A few minutes later, more or less the same messages appear, and again, and again. However, about once in 15 attempts it boots normally and asks for my password right after "starting version 239".
I tried 4.18.1 and 4.18.5 with the same issue and have once again downgraded to kernel 4.17.9.

Additional info:
cat /proc/version
Linux version 4.18.5-arch1-1-ARCH (builduser@heftig-12250) (gcc version 8.2.0 (GCC)) #1 SMP PREEMPT Fri Aug 24 12:48:58 UTC 2018

My computer is an Acer laptop; its specifications are in the attachment.

Steps to reproduce:
Turn on the laptop and let it boot.
Closed by Doug Newgard (Scimmia)
Thursday, 20 September 2018, 14:31 GMT
Reason for closing: Fixed
Additional comments about closing: linux 4.18.9.arch1-1
https://lore.kernel.org/lkml/20180905084158.GR24124%40hirez.programming.kicks-ass.net/
Edit:
Fixed up the link; Flyspray parsed it as both a URL and an email address.
I bet it's a CPU-specific kernel bug, as I also have an Intel Core 2 Duo (a slightly older model, though).
UPDATE: I worked around it by turning "Off" the "Intel SpeedStep" mode in the BIOS options.
On my Dell XPS M1530 it's under "Performance -> SpeedStep Enable".
I looked for the Intel SpeedStep setting, but my BIOS doesn't have this option.
What am I supposed to do with this patch? I mean, I am going to apply it to the 4.18.6 Linux kernel, compile it and then install it.
But will this patch be applied to the next official kernels, or will I have to download the source code of every new kernel, compile it and install it manually?
It should first be pulled into mainline; then, as it is marked for stable (4.18+), it will be queued for a future 4.18 stable release.
The package maintainer could apply it sooner. This does of course assume that it fixes the issue for you.
Either the "clocksource=hpet" or the "tsc=unstable" kernel parameter should equally do the trick to avoid the early-boot stall.
Some of the Intel Core 2 {Duo,Quad} are affected, but apparently not all of them.
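To confirm that one of these workarounds is actually in effect after adding it to the boot loader's kernel command line, you can read the running kernel's parameters from /proc/cmdline. The Python sketch below is only an illustration: the helper name active_workarounds is hypothetical, and it simply looks for the two parameter strings mentioned above as whitespace-separated tokens.

```python
from pathlib import Path
from typing import Set

# The two boot parameters suggested in this thread to avoid the early-boot stall.
WORKAROUNDS = {"clocksource=hpet", "tsc=unstable"}


def active_workarounds(cmdline: str) -> Set[str]:
    """Return which suggested workaround parameters appear in a kernel
    command line string (parameters are whitespace-separated tokens)."""
    return WORKAROUNDS.intersection(cmdline.split())


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # exposed by the running Linux kernel
    if path.exists():
        found = active_workarounds(path.read_text())
        print("workarounds in use:", ", ".join(sorted(found)) or "none")
```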
The upcoming kernel 4.18.7, due out tomorrow evening, will not contain the fix.
Mainline kernel 4.19-rc3 is going to have it included.
However, I ran into a problem when generating the initramfs with mkinitcpio: https://bbs.archlinux.org/viewtopic.php?id=240379
Do you have any idea what is wrong?
The 4.18.9 stable release is going to include it.
Here is the URL to it:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.18/clocksource-revert-remove-kthread.patch
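A rough way to tell whether a given kernel release should already contain the revert is to compare its version with the releases named in this thread (4.18.9 for the 4.18 stable series, 4.19-rc3 for mainline). The sketch below assumes `uname -r` style strings such as 4.18.5-arch1-1-ARCH. The function name stall_fix_status is hypothetical, and it goes by version numbers alone, so it cannot see distribution backports or tell whether your particular CPU is affected.

```python
import platform
import re


def stall_fix_status(release: str) -> str:
    """Classify a kernel release string by version number only, using the
    releases named in this thread (hypothetical helper, not an official check)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    rc = m.group(4)
    if (major, minor) < (4, 18):
        return "predates regression"  # e.g. 4.17.9, which booted fine
    if (major, minor) == (4, 18):
        # The revert is included in the 4.18.9 stable release.
        return "has fix" if patch >= 9 else "lacks fix"
    if (major, minor) == (4, 19) and rc is not None and int(rc) < 3:
        return "lacks fix"  # mainline picks the revert up in 4.19-rc3
    return "has fix"


if __name__ == "__main__" and platform.system() == "Linux":
    print(platform.release(), "->", stall_fix_status(platform.release()))
```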
I tried all the 4.18.* Linux kernels, with the same boot issue. I applied the patch
https://lore.kernel.org/lkml/20180905084158.GR24124%40hirez.programming.kicks-ass.net/
to the 4.18.6 Linux kernel, installed it, and my boot issue is SOLVED!!
So this patch seems to be the solution to my kernel panic problem.
Could you tell me:
Who wrote this patch and why? Was it because of the bug I opened?
How did you guess that it will fix my issue?
I dug into https://git.kernel.org but I don't understand the logic. How do you follow a patch and know when it will be committed/added to the mainline?
Once again, thank you for your time.
*) "Who wrote this patch and why?"
The kernel developers/maintainers of the "timers" subsystem and the source file clocksource.c, in this case Peter Zijlstra.
The patch that fixes our boot stall is simply a revert: development for kernel 4.18 added code that stalled the boot process, and the patch removes it again.
Since Intel Core 2 hardware is quite old by now (about a decade), it is important to report issues like this
upstream and raise awareness, as most developers now use more recent hardware.
*) "Was it because of the bug I opened?"
No.
*) "How did you guess that it will fix my issue?"
No guessing at all involved!
I took the initiative and reported it to the upstream kernel mailing list.
Before that, viktorj and I had reported it to the upstream kernel Bugzilla and were told to take it to the relevant mailing list.
Several fellow Arch Linux users, including myself, had already compiled a custom kernel with the patch applied, booted it successfully and, by collaborating, let the others know.
It was the hard work of Arch Linux users who used git bisect to find the offending change; otherwise you're looking for a needle in a haystack.
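For readers unfamiliar with git bisect: it performs a binary search between a known-good and a known-bad revision, so each build-and-boot test halves the remaining range of suspect commits. The Python sketch below shows only that core idea under simplifying assumptions: a linear list of commits, a regression that stays present once introduced, and a hypothetical is_bad callback standing in for building and booting a kernel. Real git bisect also handles merge history, skipped commits and more.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad (a simplification of git bisect).
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # hypothetical: build, boot, check for the stall
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

With 10,000 candidate commits this needs at most 14 calls to is_bad, which is why bisecting is practical even when every test means compiling and booting a kernel.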
*) "How do you follow a patch and know when it will be committed/added to the mainline?"
If you reported it upstream, you will hear about it directly via e-mail; otherwise, follow LKML or similar kernel mailing lists online (or subscribe to them).
git.kernel.org is also fine for commit logs, if you know where to look.
If you want more information about this issue and how we got to the point where stable kernel 4.18.9 includes the fix, read the corresponding Arch forum thread:
https://bbs.archlinux.org/viewtopic.php?id=239672
Please, if you have more questions, I kindly suggest that you ask them in the Arch forum instead.
[Edited to add]:
The issue is fixed. I did another round of reboot testing and kernel 4.18.9 is rock solid.