FS#75701 - grub 2:2.06.r322.gd9b4638c5-1 issue

Attached to Project: Arch Linux
Opened by maderios (maderios) - Friday, 26 August 2022, 11:46 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 27 September 2022, 07:56 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Christian Hesse (eworm)
Architecture x86_64
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 23
Private No

Details

Description:
The latest grub package 2:2.06.r322.gd9b4638c5-1 is buggy. I am unable to boot after upgrading from 2:2.06.r297.g0c6c1aff2-1 to 2:2.06.r322.gd9b4638c5-1. The issue occurs on both of my laptops.
On laptop 1, only the BIOS is available.
On laptop 2, only an "enter password" dialog (which I had never seen before) is available.

Additional info:
* package version(s) 2:2.06.r322.gd9b4638c5-1

Steps to reproduce: try to boot the laptop. The GRUB menu doesn't appear; only the BIOS or the "enter password" dialog box does.
To solve the issue on both laptops: chroot from a live CD, then downgrade to grub 2:2.06.r297.g0c6c1aff2-1.

Closed by  Toolybird (Toolybird)
Tuesday, 27 September 2022, 07:56 GMT
Reason for closing:  Not a bug
Additional comments about closing:  2:2.06.r322.gd9b4638c5-3
Comment by dalto (dalto) - Friday, 26 August 2022, 12:35 GMT
This issue seems to be related to the changes to 30_uefi-firmware, where a call to fwsetup was added. Removing this line prevents the issue.

Interestingly, the issue also doesn't occur if you call grub-install after updating the grub package.

This issue also doesn't impact all installs.

I am not sure if this is due to a past regression in grub or if the new changes require a new call to grub-install. However, I am not aware of any other time that upgrading grub required a call to grub-install.
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 14:18 GMT
I realized these are two distinct issues. Sorry.

I suspect this and https://bugs.archlinux.org/task/75673 are related.

The issue may be that the `fwsetup` command isn't included in cases where the grub.efi image was created on a BIOS system.


https://git.savannah.gnu.org/cgit/grub.git/commit/?id=1e79bbfbda24a08cb856ff30f8b1bec460779b91

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=26031d3b101648352e4e427f04bf69d320088e77

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=0eb684e8bfb0a9d2d42017a354740be25947babe

And the delay is introduced because of the `fwsetup` call. That is my current impression.
Comment by dalto (dalto) - Friday, 26 August 2022, 14:45 GMT
If it is helpful, I can reproduce this 100% of the time in a VM following these steps.

Either install Arch on a UEFI machine using version 2.06.r297 of grub, or downgrade grub.

1. Run `grub-install`
2. Run `grub-mkconfig`
3. Reboot to ensure everything is working
4. Update grub
5. Run `grub-mkconfig`
6. Reboot and it should be broken

As a side note, if you run `grub-install` between steps 4 & 5, it will work.
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 14:48 GMT
You need to post details on how you boot that VM as well.
Comment by dalto (dalto) - Friday, 26 August 2022, 14:51 GMT
In my case it is a VMware VM, booting via UEFI. That being said, I am sure I could reproduce it with any VM technology if replicating it with a different solution is needed.

What additional info is needed?
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 14:52 GMT
Ah, I just realized the issue.

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=26031d3b101648352e4e427f04bf69d320088e77

That commit introduces the new `--is-supported` flag in the binary installed by `grub-install`. Since people run only `grub-mkconfig`, their config will include the new flag, which just crashes the grub program because the inlined binary does not support it.

Reverting this patch should fix the issue. Please confirm :)
Comment by dalto (dalto) - Friday, 26 August 2022, 15:01 GMT
Removing --is-supported doesn't resolve the issue for me. Removing the entire call to fwsetup does.

Either way, that commit is what introduces that line, so reverting it should resolve the issue. However, I am not sure what the path forward from there would be.
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 15:05 GMT
Interesting. I'm assuming there is a call to `grub_error` which doesn't exit the command cleanly somewhere?

https://git.savannah.gnu.org/cgit/grub.git/commit/?id=1e79bbfbda24a08cb856ff30f8b1bec460779b91
Comment by dalto (dalto) - Friday, 26 August 2022, 15:40 GMT
So according to that commit you posted, the prior versions tried to detect if fwsetup was supported and only registered the command if it was supported.

The newest version makes two changes related to this: it always registers the fwsetup command, and it always calls it on UEFI systems.

So if you had a system where fwsetup was not detected as supported (or detection failed), the command would not be registered in prior versions. But since the new version invokes it 100% of the time, it produces the error unless you run grub-install.

That is entirely consistent with the behaviour we are seeing where not everyone is impacted by the issue.

I guess the question becomes, if all that is true, what is the correct path forward?
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 15:47 GMT
Reverting the commit that introduces the call first of all. Christian has been informed already.

Then we need to figure out whether upstream regards this as a regression. If they don't, Arch needs to instruct users to always run `grub-install` after updates, or we need to wrap `grub-install` and produce monolithic binaries like other distros do for Secure Boot. I'll probably chat with them.

The issue is that blanket-calling `grub-install`, as some derivative distros have done, isn't helpful, as it would make systems that use Secure Boot and GRUB unbootable.
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 17:45 GMT
dalto, can you check if you can reproduce the issue with the -2 package release of grub?

https://archlinux.org/packages/testing/x86_64/grub/
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 17:48 GMT
I just realized that this change is going to break grub for people who have already upgraded and run `grub-mkconfig`, as it would remove the `--is-supported` switch...

Need to think a bit more how we work around this.
Comment by dalto (dalto) - Friday, 26 August 2022, 18:03 GMT
foxboron, is there a direct link to that package? I am not seeing it on the mirrors I have checked.

Or has it been pulled back already?
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 18:05 GMT
You can find it here :)

https://archive.archlinux.org/packages/g/grub/grub-2%3A2.06.r322.gd9b4638c5-2-x86_64.pkg.tar.zst

Feel free to test it, but I suspect systems that have worked around the issue will not get the issue reintroduced, as we are removing options from the binary and they have a configuration utilizing it.
Comment by dalto (dalto) - Friday, 26 August 2022, 18:21 GMT
I tested the new package.

The only way I can find to make that fail is to install v322-1. Run both grub-install and grub-mkconfig. Upgrade to v322-2, run grub-install but *not* grub-mkconfig.

Compared to the issues with the current package, that seems like a pretty obscure breakage condition. You have to have v322-1 installed, have run grub-install with v322-1, upgrade to v322-2, run grub-install and not run grub-mkconfig.
Comment by Morten Linderud (Foxboron) - Friday, 26 August 2022, 18:25 GMT
Should I take it that you think it's fine?

I can accompany this package release with a note on arch-dev-public and prepare a news announcement.

Let's see what Christian thinks as well.
Comment by dalto (dalto) - Friday, 26 August 2022, 18:28 GMT
From my perspective, it seems significantly better than what is out there currently. I tested 3-4 different scenarios and they worked for me except for the one I mentioned above which is pretty obscure.

Of course, those are just my test results.

I would be glad to do more testing if it would help.
Comment by dalto (dalto) - Friday, 26 August 2022, 19:21 GMT
I suppose a different patch that would also work would be to revert the changes to 30_uefi-firmware.

Without those changes grub would work either way because fwsetup wouldn't be called.
Comment by agapito fernandez (agapito) - Friday, 26 August 2022, 19:51 GMT
I found something interesting: If I boot the installation media, wait for it to load and reboot the system without doing anything else, installed grub works fine and I can boot my system fast.
Comment by Christian Hesse (eworm) - Friday, 26 August 2022, 21:00 GMT
Perhaps we should just add an install/upgrade message saying that the installed version and the configuration should match, so running `grub-install` and `grub-mkconfig` one after the other is strongly advised.
Are there known issues with latest master (and no commit reverted) that break if both commands are run?
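Christian's suggestion (and dalto's testing) boils down to one rule: run `grub-install` before `grub-mkconfig`, so the installed EFI image and the generated grub.cfg come from the same package version. Below is a minimal Python sketch of that order, assuming a standard x86_64 UEFI install with the ESP mounted at `/boot` and the bootloader ID `GRUB` (both assumptions; adjust them to your system). It defaults to a dry run. As Foxboron points out, blindly re-running `grub-install` can break setups that use Secure Boot with a signed GRUB image, so treat this as an illustration rather than a hook to deploy.

```python
from __future__ import annotations

import subprocess

# Assumed values for a typical Arch UEFI install; adjust to your system.
ESP = "/boot"            # hypothetical EFI system partition mount point
BOOTLOADER_ID = "GRUB"   # hypothetical boot entry name
GRUB_CFG = "/boot/grub/grub.cfg"


def sync_commands(esp: str = ESP, bootloader_id: str = BOOTLOADER_ID,
                  grub_cfg: str = GRUB_CFG) -> list[list[str]]:
    """Return the commands in the order that keeps the installed GRUB
    image and the generated grub.cfg from the same package version."""
    return [
        # 1. Reinstall the EFI image so it contains the new fwsetup code.
        ["grub-install", "--target=x86_64-efi",
         f"--efi-directory={esp}", f"--bootloader-id={bootloader_id}"],
        # 2. Only then regenerate the config, which may call new commands.
        ["grub-mkconfig", "-o", grub_cfg],
    ]


def run(dry_run: bool = True) -> list[str]:
    """Print (dry run) or execute the commands, stopping on the first failure."""
    executed = []
    for cmd in sync_commands():
        line = " ".join(cmd)
        executed.append(line)
        if dry_run:
            print(line)
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return executed


if __name__ == "__main__":
    run(dry_run=True)
```

The point of the ordering is that step 2 must never produce a config that references features the image from step 1 lacks.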
Comment by dalto (dalto) - Friday, 26 August 2022, 21:08 GMT
In my testing running grub-install prior to rebooting resolves the issue.

However, I haven't tested on any machines running secure boot or atypical configurations.

That being said, I think that solution will leave a lot of users with broken systems if they don't carefully read all the output from pacman. If that is the approach, would it make sense to also add an announcement?

Comment by Christian Hesse (eworm) - Friday, 26 August 2022, 21:18 GMT
Probably an announcement makes sense here, yes.
But we should be sure about the path to go before taking more steps.
Comment by dalto (dalto) - Friday, 26 August 2022, 21:22 GMT
To me, it seems like the path depends partially on what we think grub is likely to do.

If we think that they will treat it as a regression and revert it in subsequent releases, we should probably patch for now so new people aren't impacted.

If we think the opposite then maybe it makes more sense to stay the course. That will put grub users in an awkward spot though as it will be unclear when grub-install will need to be run in the future.
Comment by maderios (maderios) - Saturday, 27 August 2022, 11:53 GMT
I confirm grub 2:2.06.r322.gd9b4638c5-1 works in VM after doing first grub-install then grub-mkconfig
Comment by 014 (014) - Sunday, 28 August 2022, 17:50 GMT
>I confirm grub 2:2.06.r322.gd9b4638c5-1 works in VM after doing first grub-install then grub-mkconfig

The virtual machine distinction there could be important. When I ran the upgrade in a VirtualBox VM with no manual steps, booting did not break. However, when I did the same thing on a Dell laptop with EFI, the boot problem did occur and I had to go through the chroot rescue process.
Comment by dalto (dalto) - Sunday, 28 August 2022, 18:01 GMT
Yes, virtualbox behaves differently than other virtualization scenarios.

At this point the procedures have been done on countless installs on bare metal so we can confirm that the behaviour in vmware and bare metal is the same but VB is different.

Either way, there is plenty of confirmation that calling grub-install resolves this issue.

Comment by Robert (robson) - Sunday, 28 August 2022, 19:06 GMT
I also had a problem running the system in EFI mode; installing version 2:2.06.r322.gd9b4638c5-2 solved it.
Comment by Philip Müller (philm) - Tuesday, 30 August 2022, 05:34 GMT
So the issue is rather that 'fwsetup' in the past didn't have the ability to check whether the system supports it. It was also mentioned that if the grub version installed in the MBR doesn't support the flag, it may crash grub. So here is what I think:

Before Menu changes

### BEGIN /etc/grub.d/30_uefi-firmware ###
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
    fwsetup
}
### END /etc/grub.d/30_uefi-firmware ###

After Menu changes

### BEGIN /etc/grub.d/30_uefi-firmware ###
fwsetup --is-supported
if [ "$grub_platform" = "efi" -a "$?" = 0 ]; then
    menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
        fwsetup
    }
fi
### END /etc/grub.d/30_uefi-firmware ###

So if your system is UEFI-based and supports that function, but you have an older grub in the MBR, the new grub.cfg will call 'fwsetup --is-supported' regardless. As also mentioned, this may cause a boot delay, crash the system, or boot into your UEFI firmware. If you change it back to the old snippet, you will still have the entry in your grub menu, but 'fwsetup' is only executed when you select it to enter your UEFI firmware. Reverting 1e79bbfbda24a08cb856ff30f8b1bec460779b91 might avoid the crashing on some systems. Reverting the 30_uefi-firmware.in changes of 26031d3b101648352e4e427f04bf69d320088e77 will give you the old menu generation style.

So introducing a flag is fine; however, adding a command to 'grub.cfg' that is executed unconditionally is not so great. Checking whether the system supports it could also be done in 'grub-mkconfig', adding the entry or not as needed.
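Philip's point is that support could be detected when `grub-mkconfig` runs instead of from grub.cfg at boot. One way to do that on Linux is to read the `OsIndicationsSupported` UEFI variable through efivarfs and test bit 0 (`EFI_OS_INDICATIONS_BOOT_TO_FW_UI` in the UEFI specification). The Python sketch below assumes efivarfs is mounted at `/sys/firmware/efi/efivars`; the function names are hypothetical, and this is not presented as upstream's `30_uefi-firmware` implementation.

```python
from pathlib import Path

# EFI_GLOBAL_VARIABLE vendor GUID from the UEFI specification.
EFI_GLOBAL_VARIABLE = "8be4df61-93ca-11d2-aa0d-00e098032b8c"
EFIVARS_DIR = Path("/sys/firmware/efi/efivars")  # assumed efivarfs mount point
# Bit 0 of OsIndicationsSupported: firmware can boot into its setup UI.
EFI_OS_INDICATIONS_BOOT_TO_FW_UI = 0x1


def boot_to_fw_ui_supported(raw: bytes) -> bool:
    """Interpret the raw contents of the efivarfs file for OsIndicationsSupported.

    efivarfs prefixes the variable data with 4 bytes of attributes; the data
    itself is a little-endian UINT64 bitmask.
    """
    if len(raw) < 4 + 8:
        return False
    value = int.from_bytes(raw[4:12], "little")
    return bool(value & EFI_OS_INDICATIONS_BOOT_TO_FW_UI)


def fwsetup_entry_wanted(efivars_dir: Path = EFIVARS_DIR) -> bool:
    """Decide at grub-mkconfig time whether to emit the firmware menu entry."""
    var = efivars_dir / f"OsIndicationsSupported-{EFI_GLOBAL_VARIABLE}"
    try:
        return boot_to_fw_ui_supported(var.read_bytes())
    except OSError:  # not booted via UEFI, or efivarfs not mounted/readable
        return False


# The old-style entry: fwsetup only runs when the user selects it.
MENU_ENTRY = """menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
    fwsetup
}
"""

if __name__ == "__main__":
    print(MENU_ENTRY if fwsetup_entry_wanted() else "", end="")
```

The trade-off is the one upstream was trying to avoid: the check reflects the system that runs `grub-mkconfig`, so a config generated while booted in legacy BIOS mode would omit the entry even if the machine later boots via UEFI.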
Comment by Philip Müller (philm) - Tuesday, 30 August 2022, 05:41 GMT
More or less, upstream decided to check for UEFI firmware support at runtime via this: https://git.savannah.gnu.org/cgit/grub.git/commit/util/grub.d/30_uefi-firmware.in?id=0eb684e8bfb0a9d2d42017a354740be25947babe, and https://git.savannah.gnu.org/cgit/grub.git/commit/util/grub.d/30_uefi-firmware.in?id=26031d3b101648352e4e427f04bf69d320088e77 tried to optimize that change. So yes, I get the reasoning behind wanting that entry even if you created the menu on legacy BIOS and then switched to UEFI, provided your hardware supports it. However, the whole thing was not thought through for the case where you have an older grub installed in the MBR that doesn't support it at all.
Comment by Lorenzo Bettini (lorebett) - Tuesday, 30 August 2022, 11:22 GMT
I've been hit by this bug (LG GRAM 16). The solutions of running grub-install before rebooting after the upgrade (if you read this bug on time) or from arch-chroot (if you didn't know about running grub-install before rebooting) both work. However, after that, if I re-order the EFI entries to make another Linux installation first, I cannot boot Arch anymore using "configfile" in the other Linux grub configuration (I guess that's expected because that grub is not compatible anymore with this one).

As I understand it, adding that flag is only meant as an optimization, which, however, is a small disaster for many users. Or is it useful for anything else? Has GRUB released this new version for good, or are they still deciding?

For the moment, in my computers hit by this bug, I either ignore the grub upgrade or, instead of running grub-install, I revert the contents of /etc/grub.d/30_uefi-firmware to

menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
    fwsetup
}

(see https://bugs.archlinux.org/task/75701#comment210684) and regenerate the grub.cfg.

On a side note, when upgrading grub, you get a note that something has changed:

===> INFO: /etc/grub.d/30_os-prober changed. See file /var/log/grub-fix-initrd-generation.log.
===> INFO: /etc/grub.d/10_linux changed. See file /var/log/grub-fix-initrd-generation.log.

but /etc/grub.d/30_uefi-firmware is not mentioned at all...
Comment by dalto (dalto) - Tuesday, 30 August 2022, 11:25 GMT
As described above, the --is-supported flag isn't the issue (or isn't the only issue). Even if you remove it, the failure still occurs, at least in my testing.

lorebett, those messages aren't coming from the grub package. They are coming from a 3rd party hook.
Comment by Lorenzo Bettini (lorebett) - Tuesday, 30 August 2022, 11:29 GMT
Does the problem occur only if you select the "UEFI Firmware Settings" entry (I haven't tested that entry), or can't you boot at all (which, as I understand it, is the main issue)?

Do you revert to

menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
    fwsetup
}

or anything else?
Comment by dalto (dalto) - Tuesday, 30 August 2022, 11:30 GMT
lorebett, that just adds a menu entry. The issue is that there is a call to fwsetup outside of that.
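To check whether a generated grub.cfg contains an `fwsetup` call that runs when the file is loaded (rather than only when the "UEFI Firmware Settings" entry is selected), a rough scan for `fwsetup` lines outside any `{ ... }` block is enough for the snippets quoted in this thread. This Python sketch counts braces naively and is not a GRUB script parser; the `/boot/grub/grub.cfg` path is the usual Arch default, assumed here.

```python
from __future__ import annotations

import re

FWSETUP = re.compile(r"^\s*fwsetup\b")


def top_level_fwsetup_calls(cfg_text: str) -> list[int]:
    """Return 1-based line numbers of fwsetup calls outside any { ... } block.

    Lines inside a menuentry body only run when that entry is selected;
    anything outside (including inside if ... fi) runs when grub.cfg loads.
    Braces are counted naively, which is not a full GRUB script parser.
    """
    depth = 0
    hits = []
    for lineno, line in enumerate(cfg_text.splitlines(), start=1):
        if depth == 0 and FWSETUP.match(line):
            hits.append(lineno)
        depth += line.count("{") - line.count("}")
    return hits


def report(path: str = "/boot/grub/grub.cfg") -> None:
    """Print a warning for each load-time fwsetup call in the given config."""
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    for n in top_level_fwsetup_calls(text):
        print(f"{path}:{n}: fwsetup runs when grub.cfg is loaded, not from a menu entry")


if __name__ == "__main__":
    report()
```

Run against the "before" snippet it reports nothing; run against the "after" snippet it reports the `fwsetup --is-supported` line.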
Comment by Lorenzo Bettini (lorebett) - Tuesday, 30 August 2022, 11:37 GMT
In fact, I replaced the new entry (actually, the code that generates the entry)

fwsetup --is-supported
if [ "$grub_platform" = "efi" -a "$?" = 0 ]; then
    menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
        fwsetup
    }
fi

with what was there before

menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
    fwsetup
}

The call is now ONLY in the menu entry, as it used to be.

I don't know whether it was a problem before when you selected that entry... but with this new version the problem is always there, to the point that you cannot even get to the boot menu.
Comment by dalto (dalto) - Tuesday, 30 August 2022, 12:16 GMT
I suppose it is possible that I have been thinking about this backwards.

The reason that removing `--is-supported` doesn't fix the problem is that the flag isn't implemented and is ignored, which sends the user directly to the firmware.

The reason it works on virtualbox and certain real hardware for some people is the opposite? Because the call to fwsetup fails so things just keep rolling?

Either way, both the core problem and the fix is the same. The EFI stub is missing functionality that the updated grub is calling.
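dalto's revised explanation can be written down as a small decision table: which outcome to expect when the new grub.cfg, with its top-level `fwsetup --is-supported`, is loaded by a given EFI image. The Python sketch below only encodes the hypothesis from this thread (it has not been checked against GRUB's source), and the parameter and enum names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    MENU = "GRUB menu appears normally"
    FIRMWARE_SETUP = "machine enters the UEFI firmware setup"
    ERROR_THEN_MENU = "fwsetup is unknown; the error is skipped and boot continues"


def predicted_outcome(image_has_is_supported: bool,
                      image_registered_fwsetup: bool) -> Outcome:
    """Predict what happens when a new grub.cfg (top-level
    `fwsetup --is-supported`) is loaded by a given GRUB EFI image.

    Encodes the hypothesis from this thread, not verified GRUB behaviour.
    """
    if image_has_is_supported:
        # New image: the flag only queries support and returns a status.
        return Outcome.MENU
    if image_registered_fwsetup:
        # Old image on firmware that supports it: the flag is ignored,
        # so the call behaves like a plain `fwsetup`.
        return Outcome.FIRMWARE_SETUP
    # Old image that never registered fwsetup: the call simply fails.
    return Outcome.ERROR_THEN_MENU


if __name__ == "__main__":
    for has_flag in (True, False):
        for registered in (True, False):
            outcome = predicted_outcome(has_flag, registered)
            print(f"flag={has_flag!s:5} registered={registered!s:5} -> {outcome.value}")
```

Under this reading, the "only BIOS" symptom in the original report corresponds to the FIRMWARE_SETUP case, and the VirtualBox installs that kept booting correspond to the ERROR_THEN_MENU case; running `grub-install` moves every system into the first row.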
Comment by Philip Müller (philm) - Tuesday, 30 August 2022, 23:33 GMT
Upstream will most likely drop the direct call of fwsetup in menu.cfg via this: https://lists.gnu.org/archive/html/grub-devel/2022-08/msg00374.html
Comment by Luna Jernberg (bittin1) - Wednesday, 31 August 2022, 12:00 GMT
I was also affected and had to chroot my way out.
Comment by Morten Linderud (Foxboron) - Thursday, 01 September 2022, 13:57 GMT
First of all; please don't post "me too" comments on bugs. Use the upvote button.

Now...

This issue is only ever relevant if you run `grub-mkconfig` without doing a `grub-install`. The reason this issue is never really encountered on Arch Linux is that nothing runs these commands after kernel updates or grub updates. So who does run them?

- https://gitlab.com/garuda-linux/packages/stable-pkgbuilds/garuda-hooks/-/blob/4f4da043088f45372877e9154fec36aa3f605d6b/grub-update.hook
- https://gitlab.manjaro.org/packages/core/grub/-/blob/master/update-grub
- https://github.com/endeavouros-team/PKGBUILDS/blob/a50b847e982c62f53f9858eea219927bb6498656/grub-tools/eos-grub-update-after-kernel.hook

So while it's great that people blame Arch for moving forward with a git release, it's quite shitty when dependent distros do not look at their own hooks. Arch users should rarely, if ever, hit this issue. These hooks, however, would probably break most of the derivative distros that updated linux along with grub on older machines without `reboot into firmware` setups.
Comment by Norbert Preining (npreining) - Tuesday, 27 September 2022, 07:56 GMT
I see the very same behaviour with -4 package of grub.

The argument that this only matters when running `grub-mkconfig` without doing a `grub-install` is bogus. There are TONS of reasons to do exactly that, and they all boil down to adding/removing kernel options.

This *IS* a valid usage, some options need to be set on the kernel command line, which in turn requires an update to grub.cfg, which is done by grub-mkconfig.

If grub-install call is required, grub-mkconfig or some plugin needs to warn the user - otherwise one can end up in an unbootable system.
Comment by Toolybird (Toolybird) - Tuesday, 27 September 2022, 07:56 GMT
Generally agree with your sentiments, and of course the situation is not ideal. However we already have

- front page notice
- pacman message upon upgrade
- warning in the wiki

Arch has a standard method for recovering an unbootable system [1]. There is little more we can do at this late stage.

[1] https://wiki.archlinux.org/title/General_troubleshooting#Fixing_a_broken_system
