FS#75673 - [grub] long(er) start delays after upgrading from 2.06.r261 to 2.06.r297

Attached to Project: Arch Linux
Opened by Marcel Langner (LanMarc77) - Monday, 22 August 2022, 18:14 GMT
Last edited by Christian Hesse (eworm) - Saturday, 03 September 2022, 21:39 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Christian Hesse (eworm)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 6
Private No

Details

Description:
After upgrading grub from 2.06.r261 to 2.06.r297, regenerating the config, and updating the binaries using grub-install, I noticed a much longer delay before the system starts.
As I have an encrypted root and boot FS (without LVM, plain partitions), the delay happens after the message Slot 0 opens and after selecting the menu entry for the kernel to boot. I assume it happens just before or while the initial ramdisk is loaded, but I am not certain.
The delay is now around 15s, up from around 1 to 2s before the upgrade.

Someone else has the same issue and we already verified that downgrading seems to fix the issue. Also installing on a completely different computer seems to result in the additional delay when upgrading grub ( https://bbs.archlinux.org/viewtopic.php?id=279006 ).

After reading the bug report guidelines I am unsure whether this is an upstream issue or not, and I apologize in advance.

Closed by Christian Hesse (eworm)
Saturday, 03 September 2022, 21:39 GMT
Reason for closing: Fixed
Additional comments about closing: grub 2:2.06.r322.gd9b4638c5-4
Comment by Toolybird (Toolybird) - Wednesday, 24 August 2022, 06:26 GMT
Not many upstream commits between 2.06.r261 and 2.06.r297 so a git bisection [1] should be relatively painless. Feel like tackling it?

There were also a number of upstream commits after Arch upgraded, so a build of git trunk might also be worth testing.

[1] https://wiki.archlinux.org/title/Bisecting_bugs_with_Git
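
For readers unfamiliar with bisection, the idea behind it can be sketched as a binary search over an ordered list of revisions. This is only an illustration, not how git implements it: the revision labels, the `first_bad` helper, and the `is_bad` check are hypothetical, and the "bad from r280 on" threshold is made up purely for the demo. In practice `is_bad` stands for "build that revision, install it, reboot, and see whether the boot is slow".

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first one is
    known good, the last one is known bad, and the problem never
    disappears again once it has been introduced.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # problem already present: look at older revisions
        else:
            lo = mid  # still fine: look at newer revisions
    return revisions[hi]


# Hypothetical labels for the range mentioned above (r261 .. r297).
revisions = [f"r{n}" for n in range(261, 298)]

# Made-up stand-in for a real boot test: pretend everything from r280 on is slow.
culprit = first_bad(revisions, lambda rev: int(rev[1:]) >= 280)
print(culprit)
```

With 37 candidate revisions this needs only about five or six test boots, which is why a small commit range makes bisection cheap. Revisions that cannot be tested at all (for example because they do not boot) break the assumptions of this sketch and have to be skipped by hand.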
Comment by Christian Hesse (eworm) - Wednesday, 24 August 2022, 06:37 GMT
I do not see this... Tested with ext4 on LVM on full encryption.
What filesystem do you use?
Comment by Christian Hesse (eworm) - Wednesday, 24 August 2022, 10:04 GMT
Pushed grub 2:2.06.r322.gd9b4638c5-1 to testing... Want to give that a try?
Comment by Marcel Langner (LanMarc77) - Wednesday, 24 August 2022, 19:04 GMT
I will try to coordinate a test run with the other person who has the same issue.
I am using btrfs for the encrypted partition that contains root and /boot. /efi is a normal fat32 partition.
Linux runs on a separate disk and the others are all bitlocker windows disks or unencrypted ntfs.
I run Linux from an NVME inside a USB case that is connected to a USB3.2 GEN2 port.
I also took a look at the commits but did not really understand much. There was a btrfs commit, but earlier I think, and then a ptimer something, but it looked like it was only for older i386 systems.

If the other person is not able to test, I will start next week. By then I could organize a test system that is allowed to break.
And thank you!
Comment by Tom Yan (tom.ty89) - Thursday, 25 August 2022, 03:14 GMT
I have two laptops with UEFI firmware from different vendors, namely AMI and Phoenix, and it seems that the issue is only observable on the AMI one. (The two laptops differ drastically though. The AMI one is a cheap one with an 11th gen Celeron and SATA, while the Phoenix one has an octa-core Zen 3 and NVMe. Both load the kernel from the FAT32 ESP that grub resides on.)
Comment by Tobias Brunner (tobru) - Thursday, 25 August 2022, 14:55 GMT
I tested 2.06.r322.gd9b4638c5-1 on my machine and the slow loading of the kernel image and initramfs is still there. With 2.06.r261.g2f4430cc0-1 loading of these two files is nearly instant.

I'm using ext4 on a LUKS encrypted /boot partition on a Thinkpad X1 Carbon Gen 10 (August 2022).
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 14:24 GMT
Can you guys manually edit your boot files and remove the calls to `fwsetup` which have been included in the configuration?
Comment by Tom Yan (tom.ty89) - Friday, 26 August 2022, 15:43 GMT
I write my own grub.cfg and it has no call to fwsetup. (Besides, doesn't that reboot to the firmware setup anyway? What does it have to do with the issue?)
Comment by Marcel Langner (LanMarc77) - Friday, 26 August 2022, 16:34 GMT
r322 does not change the loading speed for me.
Removing fwsetup does not change anything either.
So I think we can go on with the next bisecting step.
Btw I also have an AMI BIOS.
The other person I was talking about is using OpenZFS with the lts kernel, and r322 does not change his loading speed either.
Comment by roqz (roqz) - Wednesday, 31 August 2022, 02:43 GMT
I can confirm this issue as well with r322, I have a system with AMI BIOS, and I see a delay of around 15 seconds booting and a lot of hard disk activity through that time (as in motherboard's HDD light blinking very fast, sorry I can't be more precise! Hehe!). Initially I thought that it was related to the issue that required a manual grub-install and grub-mkconfig, but then noticed this opened bug.

I just updated another AMI system, but it takes less than two seconds at the same stage (as expected on both systems).
Comment by Adrian Czerniak (Abaddon) - Wednesday, 31 August 2022, 07:21 GMT
I also experience the delay after upgrading grub to 2.06.r322.gd9b4638c5-3 on a Dell Optiplex 9020. I use ext4 on a LUKS encrypted root partition. I also have an additional encrypted software RAID, but the root partition is not on RAID/LVM.
Comment by Christian Hesse (eworm) - Wednesday, 31 August 2022, 07:30 GMT
Looks like we have several systems to reproduce this on... Does anyone want to bisect this, finally?
Comment by k1owas (k1owas) - Wednesday, 31 August 2022, 10:10 GMT
I can also reproduce this. System is in EFI Mode. Boot partition is ext2, root partition is ext4, no LVM or LUKS used for either. Loading the kernel as well as the initramfs images is taking much longer than before. When enabling debug logging in grub it seems like it is loading the same sectors over and over. ("efidisk.c:602:efidisk: reading 0x40 sectors at the sector 0x101400 from hd1", same msg for two other sectors, then repeating).
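
If such debug output can be captured as text (for example over a serial console, which is an assumption here), a small script can confirm whether the same sectors really are read repeatedly. The sketch below is only an illustration: the `count_reads` helper and its regular expression are hypothetical and match nothing more than the message format quoted above. Only the first sample line comes from the report; the other sector number is invented.

```python
import re
from collections import Counter

# Matches only the message format quoted in the comment above.
READ_RE = re.compile(
    r"efidisk: reading (0x[0-9a-fA-F]+) sectors "
    r"at the sector (0x[0-9a-fA-F]+) from (\w+)"
)


def count_reads(log_lines):
    """Count how often each (disk, start sector, length) is read."""
    counts = Counter()
    for line in log_lines:
        match = READ_RE.search(line)
        if match:
            length, sector, disk = match.groups()
            counts[(disk, int(sector, 16), int(length, 16))] += 1
    return counts


sample = [
    # Taken from the report:
    "efidisk.c:602:efidisk: reading 0x40 sectors at the sector 0x101400 from hd1",
    # Invented sector number, for illustration only:
    "efidisk.c:602:efidisk: reading 0x40 sectors at the sector 0x101440 from hd1",
    "efidisk.c:602:efidisk: reading 0x40 sectors at the sector 0x101400 from hd1",
]

counts = count_reads(sample)
for (disk, sector, length), n in counts.most_common():
    if n > 1:
        print(f"{disk}: sector {sector:#x} (+{length:#x}) read {n} times")
```

A handful of sectors with very high counts would support the observation that the same data is fetched again and again instead of being cached.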
Comment by Marcel Langner (LanMarc77) - Wednesday, 31 August 2022, 18:51 GMT
I got a test system yesterday and will install arch and all the needed tools tomorrow.
I have never done bisecting before, but (I think) I understand the process. As I have built other AUR packages in the past, I am confident I can manage it.
I will keep you posted.
Comment by Marcel Langner (LanMarc77) - Thursday, 01 September 2022, 00:13 GMT
Got it.
The last fast-loading version was r267.g887f98f0d.
The next commit seems to be part of a bigger code change by Patrick Steinhardt. All individual commits of that series that I tried caused other errors and did not even start the system.
The last commit of this series was r271.g1df293482; with it the system started again, but loading was slow. I could see some changes in how memory seems to be allocated and some changes in loop structures.
Picture of the problematic commit area attached.
I think this means it is upstream? If so, how do we proceed from here?
Comment by Christian Hesse (eworm) - Thursday, 01 September 2022, 07:12 GMT
Ah, that's good news. Or at least reasonable progress. :)

Well... As I can not reproduce this myself - Are you willing to contact upstream? I think your best bet is via mailing list: https://lists.gnu.org/mailman/listinfo/grub-devel
Comment by Marcel Langner (LanMarc77) - Thursday, 01 September 2022, 09:39 GMT
I will sign up for the grub-devel list and report the situation there.
If they fix it, I will report back here.
Comment by Christian Hesse (eworm) - Thursday, 01 September 2022, 09:43 GMT
I am subscribed to the list and will follow. ;)
Comment by Christian Hesse (eworm) - Friday, 02 September 2022, 11:43 GMT
Comment by Christian Hesse (eworm) - Friday, 02 September 2022, 12:24 GMT
Please try grub 2:2.06.r322.gd9b4638c5-4 from testing, which should fix / work around the issue.
Comment by Marcel Langner (LanMarc77) - Friday, 02 September 2022, 20:02 GMT
This works on my test system like before the change. I see you have increased the default heap size by 16. I will ask in the forum whether someone else can also check this version.
Comment by Marcel Langner (LanMarc77) - Friday, 02 September 2022, 23:24 GMT
I just reread what I wrote and it is (as language naturally is) ambiguous.
It is booting fast with the version you put in testing. The change I was referring to is the original one that made it slow.
