FS#52246 - [linux] 4.9 freezes on initramfs loading.

Attached to Project: Arch Linux
Opened by Frederic Bezies (fredbezies) - Thursday, 22 December 2016, 23:21 GMT
Last edited by Tobias Powalowski (tpowa) - Friday, 27 January 2017, 06:57 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description: I cannot get past loading the initramfs with linux kernel 4.9-1 on my Toshiba L300-2CZ laptop. Linux 4.8.xx and Linux 4.4.xx LTS boot flawlessly. At first I thought it was related to missing firmware in the linux-firmware package.

I created another initramfs with the fixed linux-firmware package, and it is still stuck on loading the initramfs. Systemd is never launched. The only workaround? Installing and booting linux-lts.


Additional info:
Linux 4.9-1
linux-firmware 20161222.4b9559f-2


Steps to reproduce:

See description. I will add lspci output to give more information on my computer, which is Intel-based for both processor and GPU.

Closed by  Tobias Powalowski (tpowa)
Friday, 27 January 2017, 06:57 GMT
Reason for closing:  Fixed
Additional comments about closing:  4.9.6
Comment by Frederic Bezies (fredbezies) - Thursday, 22 December 2016, 23:22 GMT
Adding my lspci. If you need something else, just ask me.

Note: I tried linux-zen 4.9. Same problem. Looks like my computer is not compatible with linux 4.9 :(
Comment by Frederic Bezies (fredbezies) - Friday, 23 December 2016, 16:24 GMT
Nothing is logged. Just stuck on initramfs loading. Tried nearly every trick from this page, and nothing is displayed :(

https://wiki.archlinux.org/index.php/General_troubleshooting#Boot_problems
Comment by Gene (GeneC) - Friday, 23 December 2016, 17:55 GMT
I am unable to boot on one Ivy Bridge laptop. I use refind, and refind prints the kernel line and then there is no output at all from the kernel after that. Total boot failure. All kernels 4.8.15 and earlier work fine.

So far I have the following tested:

Lenovo Laptop W540 Ivy Bridge i7-4700 MQ - Boot Failed
Lenovo Laptop W520 Sandy Bridge i7-2720QM - OK
Desktop Ivy Bridge I7-4770 - OK
Desktop Ivy Bridge i7-4790 - OK
Desktop Ivy Bridge i7-4771 - OK
Desktop Core i7 Lynnfield (860) - OK
Desktop Ivy Bridge i7-4770K - OK
Desktop Skylake i5-6260U - OK (except I do have continued DisplayPort problems - have to reboot 1 - 10 times before it works)




Comment by Gene (GeneC) - Friday, 23 December 2016, 18:05 GMT
Frederic - can you print what you see (just one line) from this command please:

egrep 'model name' /proc/cpuinfo

Thanks.

gene
Comment by Frederic Bezies (fredbezies) - Friday, 23 December 2016, 18:14 GMT
[fred@fredo-arch-laptop ~]$ egrep 'model name' /proc/cpuinfo
model name : Pentium(R) Dual-Core CPU T4200 @ 2.00GHz
model name : Pentium(R) Dual-Core CPU T4200 @ 2.00GHz

Penryn generation I think?
Comment by Frederic Bezies (fredbezies) - Saturday, 24 December 2016, 17:18 GMT
Here is what I tried, without luck:

1) adding intel-ucode package
2) disabling CONFIG_MODVERSIONS as stated on this invalid bug : https://bugs.archlinux.org/task/51856

Any other ideas? Thanks.
Comment by Gerrit Großkopf (kingcreole) - Sunday, 25 December 2016, 09:31 GMT
My Acer Aspire E5-571G has the same or a similar problem (i5-5200U CPU). Thanks for the LTS hint :/

Normally my biggest problem would be the GeForce 840M with Optimus configuration, but this one doesn't even get to decrypting the hard disk..
Comment by Gerrit Großkopf (kingcreole) - Sunday, 25 December 2016, 22:28 GMT
I haven't got it to run again so far. I deactivated testing and am now back on kernel 4.8.13-1 with everything, and it is still crashing :/ It is really weird how it tries to start /bin/systemd while / is still perfectly encrypted... Maybe the timing is wrong? Should another pause be added somewhere in the init? Also maybe linux-git from the AUR, I'll try that tomorrow :)
Comment by Gerrit Großkopf (kingcreole) - Monday, 26 December 2016, 09:39 GMT
How did you install and boot linux-lts? That doesn't work on my PC :/
Comment by Frederic Bezies (fredbezies) - Monday, 26 December 2016, 11:47 GMT
To Gerrit :

1) Make an Archlinux installation USB key.
2) Boot on it with a working connection
3) mount both /boot and / partitions
4) arch-chroot /mnt
5) pacman -Syy
6) pacman -S linux-lts
7) grub-mkconfig -o /boot/grub/grub.cfg
8) reboot and choose advanced options to see if you have both lts and last release listed in grub boot.
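The steps above can be sketched as one script, run from the Arch installation medium. This is only an illustration: the partition names are placeholders (the comment does not give them), and the DRY_RUN switch and the run/install_lts helper names are made up here so the commands can be reviewed before anything is executed.

```shell
#!/bin/sh
# Sketch of the rescue steps above. ROOT_DEV and BOOT_DEV are placeholders:
# the original comment does not name the partitions.
ROOT_DEV="${ROOT_DEV:-/dev/sdX2}"   # hypothetical root partition
BOOT_DEV="${BOOT_DEV:-/dev/sdX1}"   # hypothetical /boot partition
DRY_RUN="${DRY_RUN:-1}"             # 1 = only print the commands (default)

# Print a command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Mount the installed system, then install linux-lts and regenerate the
# GRUB menu from inside the chroot.
install_lts() {
    run mount "$ROOT_DEV" /mnt &&
    run mount "$BOOT_DEV" /mnt/boot &&
    run arch-chroot /mnt pacman -Syy &&
    run arch-chroot /mnt pacman -S linux-lts &&
    run arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
}
```

Calling install_lts with the default settings only prints the five commands. Setting DRY_RUN=0 together with real values for ROOT_DEV and BOOT_DEV would actually run them, after which step 8 (reboot and pick the LTS entry) applies unchanged.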
Comment by Gerrit Großkopf (kingcreole) - Monday, 26 December 2016, 12:17 GMT
Thanks for the help. I tried that and it definitely tries to start Linux 4.4something, but it still ends in a kernel panic.
Comment by Gerrit Großkopf (kingcreole) - Tuesday, 27 December 2016, 11:55 GMT
I just found out, Manon 0.5.2 ( the Hardware detection tool in my bootmanager) detects my CPU as an eight core, with 2 cores enabled and 4 threads, this CPU should be a 2 core 4 threads one, maybe that has something to do with it?
Comment by Gerrit Großkopf (kingcreole) - Tuesday, 27 December 2016, 15:09 GMT
The kernel panic message mentioned "CPU: 3", and with a freshly installed bootloader this message appears once: Warning: CPU: 3 PID: 1 at arch/x86/kernel/smp.c:127 native_smp_send_reschedule+0x3a/0x40

I might be onto something here
Comment by Gerrit Großkopf (kingcreole) - Tuesday, 27 December 2016, 23:07 GMT
Damn, I must have had some things seriously messed up for a long time. I still had a /usr/lib64 folder, so the filesystem package didn't update completely. After also adding intel-ucode to the startup, it starts the good old 4.8 kernel. Updating stuff now; later I'll test 4.9 again and update this post. Hope I'm not spamming too much.

IT WORKS :) Whew, that was a lot for a little update. Kernel 4.9 is running. Wishing you all a happy holiday or kernel update or whatever you want to celebrate :)
Comment by Frederic Bezies (fredbezies) - Sunday, 01 January 2017, 20:16 GMT
I tried to start a manjaro 17.0 alpha ISO on USB key which is using a linux 4.9 kernel. I got a kernel panic with a lot of ACPI related lines.

Could it lead to something interesting or is it useless ?

Edit : I successfully started Manjaro 17.0 alpha on my desktop (with an AMD Athlon X2 CPU), and I grabbed all ACPI config options in a file with zcat /proc/config.gz | grep ACPI > config-manjaro.log

Done the same on my archlinux desktop computer : zcat /proc/config.gz | grep ACPI > config-manjaro.log

Looks like these are the same options used.
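A comparison like the one described above can be done mechanically rather than by eye. The following is a minimal sketch, assuming each kernel's configuration has first been saved to its own plain-text file; the function name acpi_config_diff and the file names in the usage note are hypothetical.

```shell
# Print the ACPI-related options that differ between two kernel configs.
# Each argument is a plain-text config file, for example one produced with
#   zcat /proc/config.gz > some-file
acpi_config_diff() {
    acd_a="$(mktemp)"
    acd_b="$(mktemp)"
    grep ACPI "$1" | sort > "$acd_a"
    grep ACPI "$2" | sort > "$acd_b"
    # diff exits with 1 when the two lists differ; that is the interesting
    # case here, not an error.
    diff "$acd_a" "$acd_b"
    acd_status=$?
    rm -f "$acd_a" "$acd_b"
    return "$acd_status"
}
```

For example, acpi_config_diff config-arch.txt config-manjaro.txt prints nothing and returns 0 when both kernels use the same ACPI options. Lines starting with ">" are options present only in the second file.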

After looking at this commit https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=4e41212d7ab49bc785eba7302dd1a9c0285fa41e I noticed the only new ACPI option was CONFIG_ACPI_WATCHDOG

Could it be related to this freeze ? Also, looks like package 4.9.0-2 will add some more WATCHDOG related options : https://git.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=7e19b29f9b72ca0b93898900acd776adf1fbe614

"+CONFIG_WATCHDOG_PRETIMEOUT_GOV=y
+# CONFIG_WATCHDOG_PRETIMEOUT_DEFAULT_GOV_NOOP is not set
+CONFIG_WATCHDOG_PRETIMEOUT_DEFAULT_GOV_PANIC=y
+CONFIG_WATCHDOG_PRETIMEOUT_GOV_NOOP=m
+CONFIG_WATCHDOG_PRETIMEOUT_GOV_PANIC=y"

Maybe the answer ?

Comment by Steven The (lembang) - Wednesday, 04 January 2017, 04:22 GMT
On a ThinkPad T440p (i7-4910MQ) with LVM on LUKS, init also failed; it does not even boot.
Comment by Mike Cloaked (mcloaked) - Friday, 06 January 2017, 19:55 GMT
Looking at a few similar reports is it worth asking whether the users who have been hit by this have (LUKS) encrypted drives? It seems possible that upstream bugs at https://bugzilla.kernel.org/show_bug.cgi?id=191121 and https://bugzilla.kernel.org/show_bug.cgi?id=191801 are connected to this issue?
Comment by Frederic Bezies (fredbezies) - Friday, 06 January 2017, 20:06 GMT
No encrypted partitions on my laptop.
Comment by loqs (loqs) - Saturday, 07 January 2017, 00:51 GMT
Performing a bisection should find the first bad commit https://wiki.archlinux.org/index.php/Bisecting_bugs
If it is https://bugzilla.kernel.org/show_bug.cgi?id=191121 then 8e80632fb23f021ce5a6957f2edcdae4645a7030 has been identified as the cause from one bisection already but can not be cleanly reverted.
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8e80632fb23f021ce5a6957f2edcdae4645a7030
shows its parent (the previous commit, as it only has a single parent) is 31ce8cc68180803aa481c0c1daac29d8eaceca9d
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=31ce8cc68180803aa481c0c1daac29d8eaceca9d
Checking out 31ce8cc68180803aa481c0c1daac29d8eaceca9d should produce a kernel that boots, while applying 8e80632fb23f021ce5a6957f2edcdae4645a7030 should produce the bug, assuming it is 191121.
A full bisection would still probably be preferred by upstream, though, and would identify the first bad commit if the cause is different.
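Both approaches above can be sketched as git commands, to be run inside a clone of torvalds/linux.git. This is a sketch under assumptions: the v4.8 and v4.9 tags are chosen because the report says 4.8.xx boots and 4.9 does not, and the DRY_RUN switch and helper names are invented for illustration. Building and boot-testing each checked-out tree is left out.

```shell
# Sketch of the two-point check and the full bisection described above.
# DRY_RUN=1 (default) only prints the git commands.
DRY_RUN="${DRY_RUN:-1}"
SUSPECT=8e80632fb23f021ce5a6957f2edcdae4645a7030   # named in bugzilla 191121
PARENT=31ce8cc68180803aa481c0c1daac29d8eaceca9d    # its single parent

# Print a command, and execute it only when DRY_RUN is 0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Quick check: check out one commit so it can be built and boot-tested.
# Expectation from the comment: PARENT boots, SUSPECT shows the bug.
checkout_for_test() {
    run git checkout "$1"
}

# Full bisection between the last good and the first bad release tag
# (tags are an assumption, see the lead-in).
start_bisect() {
    run git bisect start &&
    run git bisect bad v4.9 &&
    run git bisect good v4.8
}
```

After start_bisect, each round consists of building and booting the tree git checked out, then reporting the result with git bisect good or git bisect bad until git names the first bad commit; git bisect reset returns to the original branch.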
Comment by Frederic Bezies (fredbezies) - Sunday, 08 January 2017, 17:06 GMT
Added a bug upstream : https://bugzilla.kernel.org/show_bug.cgi?id=192111

Hope things will move a little.
Comment by Frederic Bezies (fredbezies) - Monday, 09 January 2017, 17:52 GMT
If I insert acpi=off in grub line, it boots, but there is no working display.
Comment by Gaelic (gaelic) - Tuesday, 10 January 2017, 07:34 GMT
With version 4.9.2 (zen) my problems (Thinkpad T440) are gone.
Comment by Frederic Bezies (fredbezies) - Tuesday, 10 January 2017, 09:45 GMT
You're lucky. Whatever linux 4.9 version I try to boot on my laptop, it is always blocked on initramfs loading. Classic or Zen one ? Not a single change. Looks like 4.9, 4.9.1 and 4.9.2 are "kinda rotten" somewhere :(
Comment by Frederic Bezies (fredbezies) - Tuesday, 10 January 2017, 12:57 GMT
Can we revert this patch and see if it boots correctly?

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=a035dc674dd477e61e5b917c60c30622b6d083f8

Found on this forum thread :

https://bbs.archlinux.org/viewtopic.php?id=221538

Another upstream bug closed as unreproducible :

https://bugzilla.kernel.org/show_bug.cgi?id=188221

Or apply / revert the patch found in this bug report? I applied it, built a kernel, and still no luck :(
Comment by Frederic Bezies (fredbezies) - Wednesday, 11 January 2017, 09:23 GMT
It looks like a bisection found this commit to be "guilty", on the bug I opened upstream : https://bugzilla.kernel.org/show_bug.cgi?id=192111

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=8b355e3bc1408be238ae4695fb6318ae502cae8e

Will try to revert it and see what is happening. Keeping fingers crossed!

Comment by Frederic Bezies (fredbezies) - Wednesday, 11 January 2017, 12:43 GMT
Good news, for once. After reverting the commit against the linux 4.9.2 code, my laptop can boot again.

Any hope, while waiting for a fix (which could be included in 4.10-rc4, see https://bugzilla.kernel.org/show_bug.cgi?id=192111#c13), of trying a revert of this commit?

I'll check whether my main computer (an AMD Athlon X2 based one) still boots with the kernel I built to test the commit removal, and I'll report back asap.

Edit : I'm writing this on the "modified" 4.9.2 linux kernel I built this morning. So far, so good.
Comment by Frederic Bezies (fredbezies) - Friday, 13 January 2017, 13:12 GMT
Can you add this patch to linux 4.9.3?

https://bugzilla.kernel.org/attachment.cgi?id=251331

Both my laptop (used for this bug report) and my main computer (AMD AthlonX2 215 based) are running with a patched linux 4.9.3 kernel.
Comment by Mike Cloaked (mcloaked) - Monday, 16 January 2017, 19:03 GMT
It seems that this issue is resolved as per https://bugzilla.kernel.org/show_bug.cgi?id=191121 but that the required patches won't be in 4.9.4-1 for arch unless they are backported into the current kernel in testing, but will be in 4.9.5 when that is released.
Comment by Frederic Bezies (fredbezies) - Friday, 20 January 2017, 13:11 GMT
Not in 4.9.5...

Here is the linux 4.9.5 lkml release message : https://lkml.org/lkml/2017/1/20/161

And you can see this commit was not added for 4.9.5... :(

http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=52d7e48b86fc108e45a656d8e53e4237993c481d

Let's keep fingers crossed for 4.9.6.
Comment by Javier Viñal (fjvinal) - Friday, 20 January 2017, 13:22 GMT
I have the same issue with LUKS. I have found a workaround changing the mkinitcpio.conf hooks to: systemd, sd-encrypt and sd-lvm2.
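For reference, the workaround above could look like the following mkinitcpio.conf fragment. Only the systemd, sd-encrypt and sd-lvm2 hooks come from the comment; the other hooks and their order are an assumption based on a typical Arch setup and may differ on a given system.

```shell
# /etc/mkinitcpio.conf (fragment)
# systemd, sd-encrypt and sd-lvm2 are the hooks named in the comment above.
# Everything else on this line is an assumed, typical hook list.
HOOKS="base systemd autodetect modconf block keyboard sd-encrypt sd-lvm2 filesystems fsck"
```

The initramfs has to be regenerated afterwards (for example with mkinitcpio -p linux). Note also that sd-encrypt reads its own kernel parameters (such as luks.uuid=) rather than the cryptdevice= parameter used by the encrypt hook, so the boot loader entry would likely need adjusting as well.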
Comment by Frederic Bezies (fredbezies) - Monday, 23 January 2017, 20:21 GMT
It "smells" good for 4.9.6... Look at this commit : https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/commit/queue-4.9?id=2d75ae503743de5399f82528f9c660d28745e03e

"added patches:
rcu-narrow-early-boot-window-of-illegal-synchronous-grace-periods.patch
rcu-remove-cond_resched-from-tiny-synchronize_sched.patch
sunrpc-don-t-call-sleeping-functions-from-the-notifier-block-callbacks.patch
x86-pci-ignore-_crs-on-supermicro-x8dth-i-6-if-6f.patch"

First commit listed is the one fixing the bug :)
Comment by txtsd (txtsd) - Thursday, 26 January 2017, 08:38 GMT
I get stuck at initramfs as well.

$ egrep 'model name' /proc/cpuinfo
model name : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz
model name : Intel(R) Core(TM)2 Duo CPU T5750 @ 2.00GHz
Comment by Frederic Bezies (fredbezies) - Thursday, 26 January 2017, 08:42 GMT
Fixed with linux 4.9.6 which was released on 2017-01-26 07:25:42 (GMT)

Cf : https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/commit/?id=d72ebd1eb855e56138e171475376d4c4e9defe79

Well, looks like archlinux build server will have some work soon :)
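To check whether a given kernel release falls in the range this thread reports as problematic (4.9 up to, but not including, 4.9.6), a small helper along these lines could be used. It is a sketch only: the function name kernel_affected is made up, it relies on the -V (version sort) option of GNU sort, and being in the range does not mean a machine is actually hit, since many systems in this thread booted 4.9 fine.

```shell
# Return 0 if the given kernel release lies in [4.9, 4.9.6), the range
# reported as affected in this thread; return non-zero otherwise.
# Assumes GNU sort, whose -V option compares version strings.
kernel_affected() {
    ka_ver="${1%%-*}"   # strip package suffix, e.g. 4.9.5-1-ARCH -> 4.9.5
    # Is ka_ver >= 4.9 ? (4.9 must sort first)
    [ "$(printf '%s\n' 4.9 "$ka_ver" | sort -V | head -n 1)" = "4.9" ] || return 1
    # Is ka_ver < 4.9.6 ? (ka_ver must sort first and not be equal)
    [ "$ka_ver" != "4.9.6" ] &&
    [ "$(printf '%s\n' 4.9.6 "$ka_ver" | sort -V | head -n 1)" = "$ka_ver" ]
}
```

A typical call would be kernel_affected "$(uname -r)" && echo "kernel predates the 4.9.6 fix".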
Comment by Marisa Kirisame (Sayachan) - Thursday, 26 January 2017, 15:31 GMT
I've noticed this problem when I migrated my install from legacy bios boot to uefi using refind. I'm on 4.8.13 though.

I just get a freeze after refind loads the kernel.
Comment by Frederic Bezies (fredbezies) - Thursday, 26 January 2017, 21:49 GMT
@Marisa : this is surely another issue.

By the way, my laptop can boot the official linux 4.9.6 kernel package. At last! Bug is fixed for me.
