FS#46894 - [linux] System freeze on boot after linux-4.2.4-1-x86_64.pkg.tar.xz

Attached to Project: Arch Linux
Opened by Viorel-Cătălin Răpițeanu (Ravior) - Wednesday, 28 October 2015, 06:55 GMT
Last edited by Doug Newgard (Scimmia) - Sunday, 06 March 2016, 16:55 GMT
Task Type: Bug Report
Category: Kernel
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0)
Architecture: All
Severity: Critical
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

Description:
The system freezes on boot (at the point where the partition's encryption key should be requested) with the following setup:
- GPT partition for UEFI
- cryptsetup partition
- LVM on the encrypted partition
- systemd-boot as boot loader

Additional info:
* linux-4.2.4-1-x86_64.pkg.tar.xz

Steps to reproduce:
* Simply upgrade from linux-4.2.3-1-x86_64.pkg.tar.xz to linux-4.2.4-1-x86_64.pkg.tar.xz using pacman.

Observation:
* When upgrading, mkinitcpio finishes for the new kernel without an error. The encrypt and lvm2 hooks were in place, so the hang should not be caused by a missing hook.

Closed by  Doug Newgard (Scimmia)
Sunday, 06 March 2016, 16:55 GMT
Reason for closing:  Fixed
Additional comments about closing:  linux 4.4.3-1
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Wednesday, 28 October 2015, 09:28 GMT
The hardware used is a Dell Latitude E6420.
Comment by Ross Pokorny (rosspoko) - Friday, 30 October 2015, 14:32 GMT
I am also having this issue on a Dell Latitude E6420. I use UEFI and the Syslinux bootloader. By adding "debug earlyprintk=efi,keep" to my kernel arguments, I was able to get some relevant output. The output appeared to be normal until it finished reading the ACPI tables, at which point it stopped. About 30 seconds later, it printed a message about initializing the random pool. Then nothing for several more minutes until it printed out a kernel panic and rebooted. Here are photos of the output as the kernel panic scrolled past:

https://goo.gl/photos/C256LsvhmLEuVYow6

I am not using any drive encryption.
Comment by Goldie Lin (goldie_lin) - Saturday, 31 October 2015, 17:15 GMT
I have also had this issue since linux-4.2.4, but I solved it by manually mounting the ESP at /boot, reinstalling linux, and regenerating the initramfs.
I did this because I noticed a warning, while the initramfs was being generated during the linux installation, that my /boot was not mounted.

FYI.

Description:
- The system freezes on boot at the GDM login screen of the GNOME 3 desktop environment.
(The display freezes, and the keyboard and mouse also stop working.)

Environment:
- Laptop: Lenovo ThinkPad X1 Carbon Gen3 (2015).
- arch: x86_64.
- GNOME 3 desktop environment.
- GPT partition for UEFI.
- systemd-boot as boot loader.
- no LVM.
- non-encrypted partition

Test steps and test results:
1. linux-4.2.3-1-x86_64.pkg.tar.xz --> Ok.
2. linux-4.2.4-1-x86_64.pkg.tar.xz --> Hang.
3. linux-4.2.5-1-x86_64.pkg.tar.xz --> Hang.
4. linux-4.2.3-1-x86_64.pkg.tar.xz --> Back to 4.2.3, Ok.
5. Manually mount /boot (efiboot ESP) partition (FAT32), reinstall linux-4.2.5, and reboot --> Ok, works again!

But I don't know why or when the /boot partition stopped being mounted;
I remember that it used to be automounted at boot a while ago.

Edit:
I found that my /etc/fstab contained a "noauto" option in the /boot mount options, which is why the partition was not mounted.
So I removed it and rebooted the system, and the /boot partition was then mounted at boot successfully.
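For anyone who wants to check for the same unmounted-/boot situation before reinstalling the kernel, here is a minimal sketch. The function name has_noauto is hypothetical and not part of any package; it only inspects the fourth (options) field of an fstab-style file and assumes whitespace-separated fields.

```shell
#!/bin/sh
# has_noauto FSTAB MOUNTPOINT (hypothetical helper)
# Succeeds (exit 0) if the entry for MOUNTPOINT in the fstab-style file
# FSTAB lists "noauto" among its mount options (4th field).
has_noauto() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Example use (not executed here):
#   has_noauto /etc/fstab /boot && echo "/boot is marked noauto"
#   mountpoint -q /boot || echo "/boot is not mounted, mount it before upgrading linux"
```

The mountpoint command from util-linux answers the more direct question of whether /boot is mounted right now; the fstab check only explains why it might not be mounted after a reboot.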
Comment by Guillaume Deshors (gdeshors) - Monday, 09 November 2015, 21:06 GMT
I am also affected, and am not alone. You can find more discussion about my searches here : https://bbs.archlinux.org/viewtopic.php?pid=1576429#p1576429

Description:
1/ The 201511 official Arch CD fails to boot on my machine, giving a kernel panic (I can provide a photo on demand)
2/ With my normal system, starting with linux 4.2.4-1, the computer freezes on boot, right after "Loading initial ramdisk". No further message is visible, and I can't find anything in any log.

I'm using :
- a dell optiplex 740 (AMD Athlon(tm) 64 X2 Dual Core Processor 5000+)
- a BIOS system
- grub2 as bootloader
- LVM partitions (except for /boot), without encryption

Additional info:

**** tested KO
* linux-4.2.4-1-x86_64.pkg.tar.xz
* linux-4.2.5-1-x86_64.pkg.tar.xz
* 201511 Arch CD

**** tested OK
* linux-4.2.3-1-x86_64.pkg.tar.xz
* linux-lts-4.1.12-1-x86_64.pkg.tar.xz
* 201509 Arch CD

Steps to reproduce:
* upgrade from linux-4.2.3-1-x86_64.pkg.tar.xz to linux-4.2.4-1-x86_64.pkg.tar.xz using pacman.
* or boot the november official CD

Observation:
* When upgrading, the mkinitcpio for the new kernel is finished without an error.
Comment by Ross Pokorny (rosspoko) - Monday, 09 November 2015, 21:17 GMT
@gdeshors Interesting that the lts (linux-lts-4.1.12-1-x86_64.pkg.tar.xz) kernel worked for you. It did NOT work for me.

On 4.2.5, I recommend that you try adding the kernel arguments that I did, to see if you get the same output.
Since you're on a BIOS system the arguments should be "debug earlyprintk=vga,keep"
Comment by Guillaume Deshors (gdeshors) - Monday, 09 November 2015, 21:33 GMT
I just tried, but didn't get any more output ; did I do it wrong ? I replaced
"linux /vmlinuz-linux root=UUID=c28e9c29-af1a-47bd-9952-189fe48c8755 rw quiet"
with
"linux /vmlinuz-linux root=UUID=c28e9c29-af1a-47bd-9952-189fe48c8755 rw debug earlyprintk=vga,keep"
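The edit described above (dropping "quiet" and appending the debug parameters) can be sketched as a small shell helper. The name debug_cmdline is hypothetical; the second argument is the earlyprintk backend, "vga" or "efi" as discussed in this thread. It assumes the command line contains no shell glob characters.

```shell
#!/bin/sh
# debug_cmdline LINE BACKEND (hypothetical helper)
# Prints LINE with the "quiet" parameter removed and
# "debug earlyprintk=BACKEND,keep" appended.
debug_cmdline() {
    out=""
    # Unquoted $1 is split on whitespace into individual parameters.
    for word in $1; do
        [ "$word" = "quiet" ] && continue
        out="$out${out:+ }$word"
    done
    printf '%s debug earlyprintk=%s,keep\n' "$out" "$2"
}

# Example use (not executed here):
#   debug_cmdline "linux /vmlinuz-linux root=UUID=... rw quiet" vga
```

This only builds the string; the result still has to be entered in the boot loader (for GRUB, by editing the entry at the boot menu).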
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Tuesday, 17 November 2015, 23:06 GMT
I can confirm that the 2015/11 official Arch CD fails to boot on my machine, hanging after the uefi shell has been selected.

Also, adding debug earlyprintk=vga,keep still isn't displaying any message before the machine hangs.
Comment by Ross Pokorny (rosspoko) - Monday, 30 November 2015, 15:53 GMT
I just tried the linux-mainline package from AUR (version 4.4rc2-2) and it has the same problem.

For those of you who aren't getting any output with the "debug earlyprintk=vga,keep" parameters, try "debug earlyprintk=efi,keep"
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Monday, 28 December 2015, 22:57 GMT
Tried the latest mainline kernel from AUR (Ver. 4.4 RC7) and the same behaviour can be observed.

> For those of you who aren't getting any output with the "debug earlyprintk=vga,keep" parameters, try "debug earlyprintk=efi,keep"
Thanks for the debugging tips.

Using the latest kernel, I can confirm exactly the behaviour that Ross Pokorny (rosspoko) observed. Note that, just like Ross, I'm using a Dell Latitude E6420.
Comment by kozaki (kozaki) - Friday, 01 January 2016, 22:52 GMT
Can you guys attach the boot log for this frozen boot (journalctl -b -1 for previous boot) with the kernel cheatcodes as per Viorel-Catalin?
I'd like to compare with #47509 - a Dell Inspiron Mini which freezes since kernel 4.1
Comment by Guillaume Deshors (gdeshors) - Saturday, 02 January 2016, 10:45 GMT
@kozaki, I don't think the system boots far enough to write any log to disk. I just did a failed boot and when I type journalctl -b -1 I get the output for the previous boot that didn't fail.

With "debug earlyprintk=efi,keep" I was able to get some output on screen, but the behavior was one I had never seen: it seemed very slow, with lines appearing one by one about every 3 seconds. Is that expected? After a few minutes the display stopped changing; I waited just one minute and rebooted. I didn't see a kernel panic or anything, but maybe it was yet to come.

I would also like to request some help with bisecting the kernel, because I tried but I'm stuck. I described the problem here: https://bbs.archlinux.org/viewtopic.php?pid=1580061#p1580061
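As a rough sketch of how such a bisection is usually started with git, assuming a local clone of the stable kernel tree: the helper name bisect_range and the clone path in the example are hypothetical, and the tags v4.2.3 (good) and v4.2.4 (bad) are taken from the test results reported in this thread.

```shell
#!/bin/sh
# bisect_range REPO GOOD BAD (hypothetical helper)
# Starts a git bisection in REPO between a known-good and a known-bad
# revision. "git bisect start" takes the bad revision first.
bisect_range() {
    git -C "$1" bisect start "$3" "$2"
}

# Example use (not executed here; ~/linux-stable is an assumed clone path):
#   bisect_range ~/linux-stable v4.2.3 v4.2.4
# Then build and boot the checked-out revision, and report the result:
#   git -C ~/linux-stable bisect good    # the kernel booted
#   git -C ~/linux-stable bisect bad     # the kernel froze
# Repeat until git names the first bad commit, then clean up:
#   git -C ~/linux-stable bisect reset
```

Each step requires building and booting a kernel, so the narrower the good/bad range, the fewer builds are needed.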
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Wednesday, 20 January 2016, 11:03 GMT
The discussion for this ticket was continued here:
> https://bugzilla.kernel.org/show_bug.cgi?id=110131

Since this was traced to an upstream regression, let's continue the discussion on the mentioned link. Thanks.
Comment by Guillaume Deshors (gdeshors) - Friday, 22 January 2016, 08:13 GMT
Hi Viorel-Catalin. Thanks for reporting upstream and doing the bisection ; I tried to do it but failed...

I know you asked that the discussion should be continued on the kernel bugtracker, but I would like to ask you a question without polluting yet the other bugtracker. I was surprised to read you tell that all the people encountering the bug were using UEFI, because I thought that's not my case. Now I'm in doubt and I don't know how to tell precisely.

Sure enough, I didn't get any output using earlyprintk=vga,keep and I did using earlyprintk=efi,keep. But :
* from my web searches, an optiplex 740 seems to be using a BIOS
* I have neither a /sys/firmware/efi directory nor a /boot/efi
* I don't have any reference to EFI in dmesg, but I have plenty to BIOS :

[joebar@bureau64:~] 1 $ dmesg | grep "BIOS"
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cfedffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfee0000-0x00000000cfee2fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfee3000-0x00000000cfeeffff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000cfef0000-0x00000000cfefffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012fffffff] usable
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] DMI: Dell Inc OptiPlex 740/0UT225, BIOS 1.0.4 10/18/2006
[ 0.000000] AGP: Your BIOS doesn't leave a aperture memory hole
[ 0.000000] AGP: Please enable the IOMMU option in the BIOS setup
[ 0.060000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 0.400594] mtrr: probably your BIOS does not setup all CPUs.
[ 0.444287] HPET not enabled in BIOS. You might try hpet=force boot option
[ 12.072692] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[ 13.919732] ATOM BIOS: Maglev

So what do you think, should I contradict you in the kernel bugtracker, or is it some kind of emulation or whatever ? Thanks a lot for your advice.
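The /sys/firmware/efi check mentioned above can be written as a short shell function. This is a minimal sketch: the name firmware_mode and its optional root-prefix argument are hypothetical (the prefix exists only so the check can be tried against a test directory). A missing directory means the kernel was not booted through EFI, which covers both a plain BIOS and UEFI firmware running in legacy mode.

```shell
#!/bin/sh
# firmware_mode [ROOT] (hypothetical helper)
# Prints "UEFI" if ROOT/sys/firmware/efi exists, "BIOS" otherwise.
# ROOT defaults to the empty string, i.e. the real /sys.
firmware_mode() {
    if [ -d "${1:-}/sys/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

# Example use (not executed here):
#   firmware_mode
```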
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Friday, 22 January 2016, 11:50 GMT
Hi Guillaume,

> I was surprised to read you tell that all the people encountering the bug were using UEFI.
The only precise logs for this error are the ones left by Ross Pokorny (rosspoko). I can confirm that I'm seeing the same error trace, using the same hardware, also using UEFI, after the same kernel point release.
Because of this, I can only talk and test this particular scenario.

Regarding the problem you are seeing, you could check whether that exact patch also introduced the regression on your architecture. If it did, a note should be left saying that the patch affects more than one architecture; if it didn't, a new ticket should be opened. In the latter case, if you are interested, we could talk on IRC and perform a bisection on your hardware to identify the problem and report it.

Kind regards,
Catalin
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Tuesday, 02 February 2016, 09:51 GMT
The fix for this problem was merged into the kernel's master branch. I'll update the status as soon as the fix is available in Arch's kernel as well.
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Saturday, 27 February 2016, 13:59 GMT
The fix for this problem was integrated in the 4.4.3 kernel release.
I've just tested this using the linux kernel from the testing repository and I can confirm that it fixes the problem.

> Linux otp-crapitea-l1 4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux

The moment the kernel version from testing (linux 4.4.3-1) enters the main base repository, this defect can be closed.
Comment by Viorel-Cătălin Răpițeanu (Ravior) - Sunday, 06 March 2016, 16:44 GMT
This defect can now be closed as fixed upstream.
