FS#46894 - [linux] System freeze on boot after linux-4.2.4-1-x86_64.pkg.tar.xz
Attached to Project:
Arch Linux
Opened by Viorel-Cătălin Răpițeanu (Ravior) - Wednesday, 28 October 2015, 06:55 GMT
Last edited by Doug Newgard (Scimmia) - Sunday, 06 March 2016, 16:55 GMT
Details
Description:
The system freezes on boot (at the point where the encryption key for the partition should be requested) with the following setup:
- GPT partition for UEFI
- cryptsetup partition
- LVM on the encrypted partition
- systemd-boot as boot loader
Additional info:
* linux-4.2.4-1-x86_64.pkg.tar.xz
Steps to reproduce:
* Simply upgrade from linux-4.2.3-1-x86_64.pkg.tar.xz to linux-4.2.4-1-x86_64.pkg.tar.xz using pacman.
Observation:
* When upgrading, mkinitcpio for the new kernel finishes without an error. The encrypt and lvm2 hooks were in place, so they should not be the cause of the hang.
Closed by Doug Newgard (Scimmia)
Sunday, 06 March 2016, 16:55 GMT
Reason for closing: Fixed
Additional comments about closing: linux 4.4.3-1
https://goo.gl/photos/C256LsvhmLEuVYow6
I am not using any drive encryption.
I noticed a warning saying that my /boot was not mounted when the initramfs was generated during installation of the linux package.
FYI.
Description:
- The system freezes on boot at the GDM login screen of the GNOME 3 desktop environment.
(The display freezes, and the keyboard and mouse also don't work.)
Environment:
- Laptop: Lenovo ThinkPad X1 Carbon Gen3 (2015).
- arch: x86_64.
- GNOME 3 desktop environment.
- GPT partition for UEFI.
- systemd-boot as boot loader.
- no LVM.
- non-encrypted partition
Test steps and test results:
1. linux-4.2.3-1-x86_64.pkg.tar.xz --> Ok.
2. linux-4.2.4-1-x86_64.pkg.tar.xz --> Hang.
3. linux-4.2.5-1-x86_64.pkg.tar.xz --> Hang.
4. linux-4.2.3-1-x86_64.pkg.tar.xz --> Back to 4.2.3, Ok.
5. Manually mount /boot (efiboot ESP) partition (FAT32), reinstall linux-4.2.5, and reboot --> Ok, works again!
But I don't know why or when the /boot partition stopped being mounted;
I remember it being automounted at boot a while ago.
Edit:
I found that my /etc/fstab contained a "noauto" switch in the /boot mount options, which is why it was not mounted.
So I removed it and rebooted, and the /boot partition was then mounted at boot successfully.
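To illustrate the "noauto" finding above, here is a minimal Python sketch that checks whether an fstab entry for /boot would be mounted at boot. The function names and the sample fstab line (including its placeholder UUID) are hypothetical and only serve the illustration; the real file on the reporter's machine is not shown in this thread.

```python
def mount_options(fstab_text, mount_point):
    """Return the option list for mount_point, or None if it has no entry."""
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: device, mount point, type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def is_automounted(fstab_text, mount_point="/boot"):
    """True if the entry exists and does not carry the noauto option."""
    opts = mount_options(fstab_text, mount_point)
    return opts is not None and "noauto" not in opts


# Hypothetical sample entry, not taken from the report.
SAMPLE = """\
# <device> <dir> <type> <options> <dump> <pass>
UUID=XXXX-XXXX /boot vfat rw,noauto,relatime 0 2
"""

if __name__ == "__main__":
    print(is_automounted(SAMPLE))
```

On a real system the text could be read from /etc/fstab and passed to is_automounted(). If /boot is not mounted during a kernel upgrade, the new kernel and initramfs are written to the directory on the root filesystem instead of the ESP, which would fit the warning mentioned earlier.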
Description:
1/ The 201511 official Arch CD fails to boot on my machine, giving a kernel panic (I can provide a photo on demand)
2/ With my normal system, starting with linux 4.2.4-1, the computer freezes on boot, right after "Loading initial ramdisk". No further message is visible, and I can't find anything in any log.
I'm using :
- a dell optiplex 740 (AMD Athlon(tm) 64 X2 Dual Core Processor 5000+)
- a BIOS system
- grub2 as bootloader
- LVM partitions (except for /boot), without encryption
Additional info:
**** tested KO
* linux-4.2.4-1-x86_64.pkg.tar.xz
* linux-4.2.5-1-x86_64.pkg.tar.xz
* 201511 Arch CD
**** tested OK
* linux-4.2.3-1-x86_64.pkg.tar.xz
* linux-lts-4.1.12-1-x86_64.pkg.tar.xz
* 201509 Arch CD
Steps to reproduce:
* upgrade from linux-4.2.3-1-x86_64.pkg.tar.xz to linux-4.2.4-1-x86_64.pkg.tar.xz using pacman.
* or boot the November official CD
Observation:
* When upgrading, the mkinitcpio for the new kernel is finished without an error.
On 4.2.5, I recommend that you try adding the kernel arguments that I did, to see if you get the same output.
Since you're on a BIOS system, the arguments should be "debug earlyprintk=vga,keep". That is, replace
"linux /vmlinuz-linux root=UUID=c28e9c29-af1a-47bd-9952-189fe48c8755 rw quiet"
with
"linux /vmlinuz-linux root=UUID=c28e9c29-af1a-47bd-9952-189fe48c8755 rw debug earlyprintk=vga,keep"
Also, adding debug earlyprintk=vga,keep still isn't displaying any message before the machine hangs.
For those of you who aren't getting any output with the "debug earlyprintk=vga,keep" parameters, try "debug earlyprintk=efi,keep"
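The kernel argument change suggested in these comments can be sketched as a small Python helper. It drops "quiet" and appends the debug arguments, with "vga" or "efi" as the earlyprintk target. The function name enable_early_debug is hypothetical, and the helper only rewrites a string; it does not edit any boot loader configuration.

```python
def enable_early_debug(cmdline, console="vga"):
    """Return cmdline without 'quiet' and with early debug output enabled.

    console is "vga" (suggested above for BIOS systems) or "efi".
    """
    if console not in ("vga", "efi"):
        raise ValueError("console must be 'vga' or 'efi'")
    kept = [
        arg
        for arg in cmdline.split()
        # Drop 'quiet' and any previous debug/earlyprintk settings.
        if arg not in ("quiet", "debug") and not arg.startswith("earlyprintk=")
    ]
    return " ".join(kept + ["debug", f"earlyprintk={console},keep"])


if __name__ == "__main__":
    original = "root=UUID=c28e9c29-af1a-47bd-9952-189fe48c8755 rw quiet"
    print(enable_early_debug(original))
    print(enable_early_debug(original, console="efi"))
```

The resulting string would go on the "linux" line of the GRUB entry, or on the "options" line of a systemd-boot entry.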
> For those of you who aren't getting any output with the "debug earlyprintk=vga,keep" parameters, try "debug earlyprintk=efi,keep"
Thanks for the debugging tips.
Using the latest kernel I can confirm the exact behaviour Ross Pokorny (rosspoko) has observed. Note that just like Ross, I'm using a Dell Latitude E6420.
I'd like to compare with #47509 - a Dell Inspiron Mini which freezes since kernel 4.1
With "debug earlyprintk=efi,keep" I was able to get some output on screen, but the behavior was one I had never seen: it was very slow, with lines appearing one by one roughly every 3 seconds. Is that expected? After a few minutes the display stopped changing. I waited just one more minute and rebooted; I didn't see a kernel panic or anything, but maybe it was yet to come.
I would also like to request some help with bisecting the kernel, because I tried but I'm stuck. I described the problem here: https://bbs.archlinux.org/viewtopic.php?pid=1580061#p1580061
> https://bugzilla.kernel.org/show_bug.cgi?id=110131
Since this was traced to an upstream regression, let's continue the discussion on the mentioned link. Thanks.
I know you asked that the discussion be continued on the kernel bug tracker, but I would like to ask you a question without polluting the other bug tracker yet. I was surprised to read that you said all the people encountering the bug were using UEFI, because I thought that was not my case. Now I'm in doubt and I don't know how to tell precisely.
Sure enough, I didn't get any output using earlyprintk=vga,keep and I did using earlyprintk=efi,keep. But:
* from my web searches, an OptiPlex 740 seems to be using a BIOS
* I have neither a /sys/firmware/efi directory nor a /boot/efi
* I don't have any reference to EFI in dmesg, but I have plenty to BIOS:
[joebar@bureau64:~] 1 $ dmesg | grep "BIOS"
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000cfedffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000cfee0000-0x00000000cfee2fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000cfee3000-0x00000000cfeeffff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000cfef0000-0x00000000cfefffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000f0000000-0x00000000f3ffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000012fffffff] usable
[ 0.000000] SMBIOS 2.4 present.
[ 0.000000] DMI: Dell Inc OptiPlex 740/0UT225, BIOS 1.0.4 10/18/2006
[ 0.000000] AGP: Your BIOS doesn't leave a aperture memory hole
[ 0.000000] AGP: Please enable the IOMMU option in the BIOS setup
[ 0.060000] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 0.400594] mtrr: probably your BIOS does not setup all CPUs.
[ 0.444287] HPET not enabled in BIOS. You might try hpet=force boot option
[ 12.072692] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
[ 13.919732] ATOM BIOS: Maglev
So what do you think: should I contradict you on the kernel bug tracker, or is it some kind of emulation or whatever? Thanks a lot for your advice.
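The /sys/firmware/efi check mentioned in the list above can be written as a short Python sketch. The function name firmware_mode and its sysfs_root parameter are hypothetical, added so the check can be tried against a test directory. Note that it reports how the running kernel was booted: a UEFI machine started in legacy (CSM) mode also lacks the directory.

```python
import os


def firmware_mode(sysfs_root="/sys"):
    """Return "UEFI" if the kernel exposes EFI data in sysfs, else "BIOS".

    The directory <sysfs_root>/firmware/efi is only present when the
    running kernel was booted through EFI and has EFI support enabled.
    """
    efi_dir = os.path.join(sysfs_root, "firmware", "efi")
    return "UEFI" if os.path.isdir(efi_dir) else "BIOS"


if __name__ == "__main__":
    print(firmware_mode())
```

Together with the absence of any EFI line in dmesg, a "BIOS" result here would support the reading that the machine was booted in BIOS mode.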
> I was surprised to read you tell that all the people encountering the bug were using UEFI.
The only precise logs for this error are the ones left by Ross Pokorny (rosspoko). I can confirm that I'm seeing the same error trace, using the same hardware, also using UEFI, after the same kernel point release.
Because of this, I can only talk and test this particular scenario.
Regarding the problem you are seeing, you could check whether that exact patch also introduced the regression on your architecture. If it did, a note should be left saying that the patch affects more than one architecture; if it didn't, a new ticket should be opened. In the latter case, if you are interested, we could talk on IRC and perform a bisection on your hardware to identify the problem and report it.
Kind regards,
Catalin
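For readers unfamiliar with the bisection mentioned in this comment, the idea is a binary search between a known-good and a known-bad revision, which is what "git bisect" automates. The Python sketch below is only an illustration: first_bad, the revision names, and the is_bad test are all hypothetical, and in practice each test means building and booting a kernel.

```python
def first_bad(revisions, is_bad):
    """Return the first bad revision.

    revisions is ordered oldest to newest; revisions[0] is assumed to be
    known good and revisions[-1] known bad, as with 4.2.3 and 4.2.4 here.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return revisions[bad]


if __name__ == "__main__":
    # Hypothetical history of ten commits where the sixth one breaks boot.
    commits = [f"c{i}" for i in range(10)]
    print(first_bad(commits, lambda rev: int(rev[1:]) >= 6))
```

Each round halves the remaining range, so even a few hundred commits between two point releases need fewer than ten test boots.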
I've just tested this using the linux kernel from the testing repository and I can confirm that it fixes the problem.
> Linux otp-crapitea-l1 4.4.3-1-ARCH #1 SMP PREEMPT Fri Feb 26 15:09:29 CET 2016 x86_64 GNU/Linux
The moment the kernel version from testing (linux 4.4.3-1) enters the main base repository, this defect can be closed.