Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#57511 - [linux] 4.15 hangs during boot with intel-ucode
Attached to Project: Arch Linux
Opened by Andrea Amorosi (AndreaA) - Wednesday, 14 February 2018, 20:33 GMT
Last edited by Jan Alexander Steffens (heftig) - Thursday, 19 April 2018, 18:04 GMT
Description:
All 4.15 kernels released so far (currently 4.15.2-2) hang during boot on my Asus 2752vx (Intel i7-6700HQ with NVIDIA 950M and Optimus PRIME) if the intel-ucode .img is passed to initrd. These are the lines in grub.cfg:

linux /vmlinuz-linux root=UUID=74296c4e-84df-4eda-87a1-09be9d8e114b rw pci=noaer nvidia-drm.modeset=1 initcall_debug no_console_suspend ignore_loglevel dyndbg="file suspend.c +p"
echo 'Caricamento ramdisk iniziale...'
initrd /intel-ucode.img /initramfs-linux.img

At boot the system prints 'Caricamento ramdisk iniziale...' (Italian for "Loading initial ramdisk...") and that line stays on screen, but the system is completely unresponsive. By adding earlyprintk=efi,keep to the boot command line I discovered that 4.15.x kernels hang at the following line:

...
x86: Booting SMP configuration
...

Adding acpi=off lets the boot process proceed a bit further and many messages are displayed, but then it hangs again when X and sddm should start.

With the original kernel parameters (without acpi=off) and the initrd line changed to:

initrd /initramfs-linux.img

the system works perfectly. Reverting to the previous intel-ucode version does not solve the issue. Linux has been installed on this PC since March 2016 and has worked correctly until now.
Closed by Jan Alexander Steffens (heftig) - Thursday, 19 April 2018, 18:04 GMT
Reason for closing: Fixed
Please let me know if you want me to try reverting intel-ucode again with 4.15.3, or to downgrade to an older intel-ucode package.
Edit: 4.14.18+, not 4.14.8+
As there were three or four versions of 4.15.1, each with config changes, it would narrow things down a lot if you could find the exact package update that introduced the bug.
If you do not have these versions in your package cache, https://archive.archlinux.org/packages/l/linux/ has all of them except 4.15.1-4; I'm not sure that one was ever actually released.
linux-4.15.1-2-x86_64.pkg.tar.xz
linux-4.15.2-2-x86_64.pkg.tar.xz
linux-4.15.3-2-x86_64.pkg.tar.xz
do not boot if intel-ucode is used.
The last linux package (excluding -lts ones) that works correctly is linux-4.14.15-1-x86_64.pkg.tar.xz
If I reboot from a working kernel (LTS with intel-ucode, or non-LTS without intel-ucode) into one of these buggy kernels with intel-ucode, it boots correctly only the first time; after the next reboot (or power-off) it hangs at boot.
If I then try to boot a working kernel, it fails the first time, but after a forced power-off it starts working again.
It seems to me (though I don't know whether this is possible on such complex PCs) as if booting 4.15.x with intel-ucode leaves some stale state in the BIOS/EFI, and two boots of a working kernel are needed to clear it.
Maybe the CPU tries to update the ucode and something goes wrong?
If you open a general upstream bug report saying only that the issue started between 4.14 and 4.15, I would expect either no response or a request to perform a bisection anyway.
I can help with the bisection, but it would probably clutter this bug report, which is why I suggested opening a forum thread for it.
Dzen_Python also needs to do a separate bisection, unless it turns out that just a config change between the different 4.15.1 releases triggered the issue on that system.
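The bisection suggested above is essentially a binary search over builds in release order. A minimal Python sketch, assuming the first build in the list is known good, the last is known bad, and there is a single good-to-bad transition; `is_good` is a hypothetical callback standing in for "install this package (e.g. from the archive), reboot with intel-ucode, and report whether it booted":

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first version for which is_good() is False.

    Assumes versions are in release order, versions[0] is good,
    versions[-1] is bad, and there is exactly one good->bad transition.
    """
    if len(versions) < 2:
        raise ValueError("need at least one good and one bad version")
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Each call stands for one manual test boot of that build.
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]
```

With n builds this needs about log2(n) test boots; the same idea is what `git bisect` does over kernel commits once the range has been narrowed to two releases.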
During boot, 4.15.3-2 shows the attached warning, which does not appear with 4.15.3-1.
https://bugs.archlinux.org/task/57578
sorry for the noise
It keeps hanging very early if booted with intel-ucode.
Intel Processor Microcode Package for Linux
20180312 Release
All non-LTS 4.15.x versions show the issue.
What I meant to say is that the issue doesn't occur for me on the LTS kernel, which is what I now boot, until 4.16 comes out and I give that a go. Currently at 4.14.29-1-lts, no issues so far. I've booted a whole bunch of the 4.14 LTS series, as 4.15 has been wonky since day one, and haven't had a single issue with the LTS builds.
Until someone affected locates the cause and reports it to the relevant upstream for resolution the issue will remain unresolved.
Changing these kernel config options:
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
back to what was previously used:
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
solves the issue.
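To confirm which tick rate a given kernel was built with, one can read CONFIG_HZ from its configuration. A minimal Python sketch, assuming the running kernel exposes its build config at /proc/config.gz (only the case when it was built with CONFIG_IKCONFIG_PROC); for a self-built kernel, the text of the .config file in the source tree can be passed to `config_hz` instead:

```python
import gzip
import re
from typing import Optional


def config_hz(config_text: str) -> Optional[int]:
    """Return the CONFIG_HZ value from kernel .config text, or None if absent.

    Only the plain 'CONFIG_HZ=<n>' line matches; 'CONFIG_HZ_1000=y' and
    commented '# CONFIG_HZ_300 is not set' lines are ignored.
    """
    m = re.search(r"^CONFIG_HZ=(\d+)$", config_text, re.MULTILINE)
    return int(m.group(1)) if m else None


def running_kernel_hz(path: str = "/proc/config.gz") -> Optional[int]:
    """Read the gzip-compressed config of the running kernel (if exposed)."""
    with gzip.open(path, "rt") as f:
        return config_hz(f.read())


if __name__ == "__main__":
    print(running_kernel_hz())
```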
I don't know why the issue appears only when intel-ucode is loaded; maybe it is caused by using the 1000 Hz setting on an i7-6700HQ with updated microcode.
In a "test" 4.14 kernel build, I changed these options:
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
to these:
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
If intel-ucode is not used, this "test" 4.14 kernel also works correctly.
So the problem is not related to a specific kernel version but to CONFIG_HZ_1000=y and its interaction with intel-ucode and maybe my hardware.
Can someone else confirm this?
From what I've been able to find about CONFIG_HZ, which isn't much, it seems that desktop systems typically use HZ 1000 or thereabouts, while for mobile devices most sources I've found recommend or use HZ 300.
> The timer sets the frequency that an interrupt wakes the kernel up so it can see if it has to do anything. 100Hz (every 10 ms) is traditional. Recently higher rates have been introduced. The more often the kernel wakes up, the lower the latency when it needs to do something. That's the plus side. The downside is that there is more wasted time when there is nothing to do.
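The quoted rule of thumb (100 Hz means a tick every 10 ms) generalizes to a period of 1000/HZ milliseconds. A small Python sketch comparing the settings discussed in this thread; these are nominal tick rates only, since a tickless-idle kernel can stop the periodic tick while the CPU is idle:

```python
def tick_period_ms(hz: int) -> float:
    """Interval between timer interrupts, in milliseconds, for a given CONFIG_HZ."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return 1000.0 / hz


# Compare the traditional rate with the two settings discussed in this thread.
for hz in (100, 300, 1000):
    print(f"HZ={hz:>4}: one tick every {tick_period_ms(hz):.2f} ms")
```

Going from 300 to 1000 Hz shortens the tick interval from about 3.3 ms to 1 ms, i.e. roughly three times as many timer interrupts per second.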
What I'm failing to understand is how this could possibly interact with Intel microcode updates. It also doesn't seem to affect many people; otherwise I would expect this thread to be a lot busier.
What CPUs do people have who are observing this issue? I'm on an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz (according to /proc/cpuinfo).
Apr 16 11:14:02 archlinux kernel: DMAR: [INTR-REMAP] Request device [f0:1f.0] fault index 0 [fault reason 37] Blocked a compatibility format interrupt request
After some searching, I found people suggesting this was related to the Intel IOMMU being turned on by default even though it is not quite safe to enable by default. Following those suggestions, I appended intel_iommu=off to my kernel boot line and now everything works. I've rebooted more than 5 times now; the error doesn't show up in the logs, and the system boots and is fully responsive.
I'm not sure if this is the same bug everyone else is seeing, or if it just happens to manifest in a similar way. Either way, have a look at your journalctl log for a 4.15 boot, check whether this error shows up, and/or try booting with intel_iommu=off and share the results.
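To check a log for the DMAR fault quoted above, here is a minimal Python sketch that filters kernel log text for that message. The match strings are copied from the log line in this thread. Feeding it through standard input is an assumption about how you would use it, e.g. piping in `journalctl -k -b -1` for the previous boot, which only works if that boot's journal was saved. The script name `dmar_check.py` is hypothetical:

```python
import sys
from typing import List

# Message text copied from the DMAR log line quoted in this thread.
FAULT_TEXT = "Blocked a compatibility format interrupt request"


def dmar_faults(log_text: str) -> List[str]:
    """Return the log lines that report the DMAR interrupt-remapping fault."""
    return [
        line
        for line in log_text.splitlines()
        if "DMAR:" in line and "INTR-REMAP" in line and FAULT_TEXT in line
    ]


if __name__ == "__main__":
    # Hypothetical usage: journalctl -k -b -1 | python3 dmar_check.py
    faults = dmar_faults(sys.stdin.read())
    for line in faults:
        print(line)
    print(f"{len(faults)} DMAR fault line(s) found")
```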
No hangs when booting with intel-ucode; this is the output of dmesg | grep microcode:
[ 0.464853] calling save_microcode_in_initrd+0x0/0xa4 @ 1
[ 0.464853] initcall save_microcode_in_initrd+0x0/0xa4 returned 0 after 0 usecs
[ 0.810947] calling microcode_init+0x0/0x1fb @ 1
[ 0.811554] microcode: sig=0x506e3, pf=0x20, revision=0xc2
[ 0.812624] microcode: Microcode Update Driver: v2.2.
[ 0.812626] initcall microcode_init+0x0/0x1fb returned 0 after 1064 usecs
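To compare the reported microcode revision between boots (for example with and without /intel-ucode.img on the initrd line), here is a minimal Python sketch that extracts the three hex fields from a line in the format shown above. The format is taken from this dmesg output; other kernel versions may word the message differently:

```python
import re
from typing import Dict, Optional

# Matches lines like: "microcode: sig=0x506e3, pf=0x20, revision=0xc2"
MICROCODE_RE = re.compile(
    r"microcode: sig=(0x[0-9a-fA-F]+), pf=(0x[0-9a-fA-F]+), revision=(0x[0-9a-fA-F]+)"
)


def parse_microcode(dmesg_text: str) -> Optional[Dict[str, int]]:
    """Return sig, pf and revision from the first matching line, or None."""
    m = MICROCODE_RE.search(dmesg_text)
    if m is None:
        return None
    sig, pf, rev = (int(g, 16) for g in m.groups())
    return {"sig": sig, "pf": pf, "revision": rev}
```

Running it on the `dmesg` output of both boot configurations shows whether the early update actually changed the revision the CPU reports.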