FS#75701 - grub 2:2.06.r322.gd9b4638c5-1 issue
            Attached to Project: Arch Linux
            
Opened by maderios (maderios) - Friday, 26 August 2022, 11:46 GMT
Last edited by Toolybird (Toolybird) - Tuesday, 27 September 2022, 07:56 GMT
Description: The latest grub package, 2:2.06.r322.gd9b4638c5-1, is buggy: the system cannot boot after upgrading from 2:2.06.r297.g0c6c1aff2-1 to 2:2.06.r322.gd9b4638c5-1. The issue is present on both of my laptops. On laptop 1, only the BIOS is reachable. On laptop 2, only an "enter password" dialog (which I had never seen before) appears.
Additional info:
* package version(s): 2:2.06.r322.gd9b4638c5-1
Steps to reproduce: try to boot the laptop; the grub menu doesn't appear, only the BIOS or the "enter password" dialog box.
To solve the issue on both laptops: chroot from a live CD -> downgrade to grub 2:2.06.r297.g0c6c1aff2-1.
            Closed by Toolybird (Toolybird) - Tuesday, 27 September 2022, 07:56 GMT
Reason for closing: Not a bug
Additional comments about closing: 2:2.06.r322.gd9b4638c5-3
          
Interestingly, the issue also doesn't occur if you call grub-install after updating the grub package.
This issue also doesn't impact all installs.
I am not sure if this is due to a past regression in grub or if the new changes require a new call to grub-install. However, I am not aware of any other time that upgrading grub required a call to grub-install.
I suspect this and https://bugs.archlinux.org/task/75673 are related.
The issue is (maybe) that the `fwsetup` command isn't included in the cases where the grub.efi image is created on a BIOS system.
https://git.savannah.gnu.org/cgit/grub.git/commit/?id=1e79bbfbda24a08cb856ff30f8b1bec460779b91
https://git.savannah.gnu.org/cgit/grub.git/commit/?id=26031d3b101648352e4e427f04bf69d320088e77
https://git.savannah.gnu.org/cgit/grub.git/commit/?id=0eb684e8bfb0a9d2d42017a354740be25947babe
And the delay is introduced because of the `fwsetup` call. That is my current impression.
Either install Arch on a UEFI machine using grub 2.06.r297, or downgrade grub to that version. Then:
1. Run `grub-install`
2. Run `grub-mkconfig`
3. Reboot to ensure everything is working
4. Update grub
5. Run `grub-mkconfig`
6. Reboot and it should be broken
As a side note, if you run `grub-install` between steps 4 & 5, it will work.
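The ordering from the side note above (run `grub-install` after updating the package, then regenerate the config) can be sketched as a small shell function. This is only a minimal sketch, not an official Arch procedure: it assumes an x86_64 UEFI install with the ESP mounted at `/efi` and the bootloader ID `GRUB` (both hypothetical defaults; adjust them to your system), and it only prints the commands unless `DRY_RUN=0` is set.

```shell
# Sketch (assumptions: x86_64 UEFI, ESP mounted at /efi, bootloader ID "GRUB";
# all are placeholders to adjust). Not for Secure Boot setups that boot a
# separately signed GRUB image.
ESP_DIR="${ESP_DIR:-/efi}"
BOOTLOADER_ID="${BOOTLOADER_ID:-GRUB}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Reinstall the EFI image from the currently installed grub package first,
# then regenerate grub.cfg, so both come from the same package version.
grub_refresh() {
    run grub-install --target=x86_64-efi --efi-directory="$ESP_DIR" \
        --bootloader-id="$BOOTLOADER_ID" || return 1
    run grub-mkconfig -o /boot/grub/grub.cfg
}
```

With the defaults, `grub_refresh` only prints the `grub-install` line followed by the `grub-mkconfig -o /boot/grub/grub.cfg` line. The order is the point: `grub-mkconfig` runs only after the image from the same package version is in place.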
What additional info is needed?
https://git.savannah.gnu.org/cgit/grub.git/commit/?id=26031d3b101648352e4e427f04bf69d320088e77
Introduces the new `--is-supported` flag for `fwsetup`, which only exists in the binary that `grub-install` puts in place. Since people only run `grub-mkconfig`, the generated config includes the new flag, which just crashes grub because the previously installed binary does not support it.
Reverting this patch should fix the issue. Please confirm :)
Either way, that commit is what introduces that line so reverting that commit should resolve the issue. Although, I am not sure what the path forward from there would be.
https://git.savannah.gnu.org/cgit/grub.git/commit/?id=1e79bbfbda24a08cb856ff30f8b1bec460779b91
The newest version makes two changes related to this: it always registers the `fwsetup` command, and it always calls it on UEFI systems.
So if you had a system where it was not detecting (or failing to detect) that fwsetup was supported, the command would not be registered in prior versions. But since the new version invokes it 100% of the time, it produces the error unless you run grub-install.
That is entirely consistent with the behaviour we are seeing where not everyone is impacted by the issue.
I guess the question becomes, if all that is true, what is the correct path forward?
Then we need to figure out whether upstream regards this as a regression. If they don't, Arch needs to instruct users to always run `grub-install` after updates, or we need to wrap `grub-install` and produce monolithic binaries like other distros do for Secure Boot. I'll probably chat with them.
The issue is that blanket-calling `grub-install`, as some derivative distros have done, isn't helpful, as it would make systems that use Secure Boot with grub unbootable.
https://archlinux.org/packages/testing/x86_64/grub/
Need to think a bit more how we work around this.
Or has it been pulled back already?
https://archive.archlinux.org/packages/g/grub/grub-2%3A2.06.r322.gd9b4638c5-2-x86_64.pkg.tar.zst
Feel free to test it, but I suspect systems that have worked around the issue will not get the issue reintroduced, as we are removing options from the binary and they have a configuration using them.
The only way I can find to make that fail is to install v322-1, run both grub-install and grub-mkconfig, upgrade to v322-2, then run grub-install but *not* grub-mkconfig.
Compared to the issues with the current package, that seems like a pretty obscure breakage condition. You have to have v322-1 installed, have run grub-install with v322-1, upgrade to v322-2, run grub-install and not run grub-mkconfig.
I can accompany this package release with a note on arch-dev-public and prepare a news announcement.
We can see what Christian thinks as well.
Of course, those are just my test results.
I would be glad to do more testing if it would help.
Without those changes grub would work either way because fwsetup wouldn't be called.
Are there known issues with latest master (and no commit reverted) that break if both commands are run?
However, I haven't tested on any machines running secure boot or atypical configurations.
That being said, I think that solution will leave a lot of users with broken systems if they don't carefully read all the output from pacman. If that is the approach, would it make sense to also add an announcement?
But we should be sure about the path to go before taking more steps.
If we think that they will treat it as a regression and revert it in subsequent releases, we should probably patch for now so new people aren't impacted.
If we think the opposite then maybe it makes more sense to stay the course. That will put grub users in an awkward spot though as it will be unclear when grub-install will need to be run in the future.
The virtual machine distinction there could be important. When I ran the upgrade in a VirtualBox VM with no manual steps, booting did not break. However, when I did the same thing on a Dell laptop with EFI, the boot problem did occur and I had to go through the chroot rescue process.
At this point the procedure has been done on countless bare-metal installs, so we can confirm that the behaviour in VMware and on bare metal is the same, but VirtualBox is different.
Either way, there is plenty of confirmation that calling grub-install resolves this issue.
Before Menu changes
### BEGIN /etc/grub.d/30_uefi-firmware ###
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
### END /etc/grub.d/30_uefi-firmware ###
After Menu changes
### BEGIN /etc/grub.d/30_uefi-firmware ###
fwsetup --is-supported
if [ "$grub_platform" = "efi" -a "$?" = 0 ]; then
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
fi
### END /etc/grub.d/30_uefi-firmware ###
So if your system is UEFI-based and supports that function, but you have an older grub in the MBR, the grub.cfg will now call 'fwsetup --is-supported' regardless. This may cause some boot delay (as also mentioned), crash the system, or boot into your UEFI firmware. If you change it back to the old snippet, you will still have the entry in your grub menu, but 'fwsetup' is only executed when you select it to enter your UEFI firmware. Reverting 1e79bbfbda24a08cb856ff30f8b1bec460779b91 might avoid the crashing on some systems. Reverting the 30_uefi-firmware.in changes of 26031d3b101648352e4e427f04bf69d320088e77 will give you the old menu generation style.
So introducing a flag is fine; however, adding a command to 'grub.cfg' that is executed at every boot is not so great. Checking whether the system supports it could also be done at 'grub-mkconfig' time, and the entry added or not as needed.
I understand that adding that flag is only meant as an optimization, which, however, is a small disaster for many users. Or is it useful for anything else? Has grub released this new version for good, or are they still deciding?
For the moment, on my computers hit by this bug, I either skip the grub upgrade or, instead of running grub-install, revert the contents of /etc/grub.d/30_uefi-firmware to
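A quick way to tell whether a generated config contains the new boot-time probe is to search it for the `fwsetup --is-supported` line. The helper below is a minimal sketch; its name is made up for illustration, and `/boot/grub/grub.cfg` is assumed as the default path.

```shell
# Sketch: does a generated grub.cfg contain the new boot-time probe?
# Exit status: 0 = probe present, 1 = absent, 2 = file not readable.
# The default path is an assumption (the usual Arch location).
cfg_has_fwsetup_probe() {
    cfg="${1:-/boot/grub/grub.cfg}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    # -q: no output, only the exit status; -F: match a fixed string.
    grep -qF -- 'fwsetup --is-supported' "$cfg"
}
```

If it returns 0, the config expects a GRUB image that understands `--is-supported`, so per this thread `grub-install` should have been run with the same package version before rebooting.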
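The alternative suggested here, deciding at `grub-mkconfig` time instead of at boot, could look roughly like the hypothetical helper below. It is not what upstream implemented. It assumes that the boot-time check is equivalent to testing the `EFI_OS_INDICATIONS_BOOT_TO_FW_UI` bit (0x1) of the UEFI `OsIndicationsSupported` variable, which Linux exposes through efivarfs; there, the file starts with a 4-byte attributes field followed by the little-endian 64-bit value, so bit 0 of the fifth byte is the one of interest. Function names are invented.

```shell
# Hypothetical mkconfig-time check (not upstream GRUB code): decide whether
# to emit the "UEFI Firmware Settings" entry by reading the firmware's
# OsIndicationsSupported variable from efivarfs instead of probing at boot.
EFIVARS_DIR="${EFIVARS_DIR:-/sys/firmware/efi/efivars}"
OS_IND_SUPPORTED="OsIndicationsSupported-8be4df61-93ca-11d2-aa0d-e0986d8f0b5d"

# Returns 0 if the firmware advertises "boot to firmware UI" (bit 0x1).
fw_ui_supported() {
    var="$EFIVARS_DIR/$OS_IND_SUPPORTED"
    [ -r "$var" ] || return 1   # not booted via UEFI, or variable absent
    # Skip the 4-byte attribute header and read the lowest byte of the
    # little-endian value as an unsigned decimal number.
    byte="$(od -An -t u1 -j 4 -N 1 "$var" | tr -d ' ')"
    [ -n "$byte" ] || return 1
    [ $((byte % 2)) -eq 1 ]
}

# Emit the pre-change style menu entry only when supported.
emit_uefi_firmware_entry() {
    if fw_ui_supported; then
        cat <<'EOF'
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
EOF
    fi
}
```

The trade-off is that the decision is baked into grub.cfg when it is generated, so the config must be regenerated on the machine that will boot it; in exchange, nothing extra runs before the menu appears.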
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
(see https://bugs.archlinux.org/task/75701#comment210684) and regenerate the grub.cfg.
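The same reversion can also be illustrated as a filter over generated output rather than an edit to `/etc/grub.d/30_uefi-firmware`. This is a hypothetical sketch, not what the commenter did: it relies on `grub-mkconfig` writing to standard output when `-o` is omitted and on the `### BEGIN/END /etc/grub.d/30_uefi-firmware ###` markers shown in this report; the function name is invented.

```shell
# Hypothetical post-processing filter: read a grub.cfg on stdin and replace
# everything between the 30_uefi-firmware BEGIN/END markers with the
# pre-change entry, which only runs fwsetup when the entry is selected.
# Assumes the END marker is present once a BEGIN marker has been seen.
restore_old_fwsetup_entry() {
    awk '
        # At the BEGIN marker: keep the marker, print the old entry, start skipping.
        /^### BEGIN \/etc\/grub\.d\/30_uefi-firmware ###$/ {
            print
            print "menuentry '\''UEFI Firmware Settings'\'' $menuentry_id_option '\''uefi-firmware'\'' {"
            print "fwsetup"
            print "}"
            skip = 1
            next
        }
        # At the END marker: stop skipping (the marker itself is printed below).
        /^### END \/etc\/grub\.d\/30_uefi-firmware ###$/ { skip = 0 }
        # Print every line outside the replaced section.
        !skip { print }
    '
}
```

For example, `grub-mkconfig | restore_old_fwsetup_entry > /tmp/grub.cfg.new` lets you inspect the result before copying it over `/boot/grub/grub.cfg`. The next plain `grub-mkconfig -o` run (or any hook that does one) will regenerate the new-style block.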
On a side note, when upgrading grub, you get a note that something has changed:
===> INFO: /etc/grub.d/30_os-prober changed. See file /var/log/grub-fix-initrd-generation.log.
===> INFO: /etc/grub.d/10_linux changed. See file /var/log/grub-fix-initrd-generation.log.
but /etc/grub.d/30_uefi-firmware is not mentioned at all...
lorebett, those messages aren't coming from the grub package. They are coming from a 3rd party hook.
Do you revert to
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
or anything else?
I replaced
fwsetup --is-supported
if [ "$grub_platform" = "efi" -a "$?" = 0 ]; then
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
fi
with what was before
menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
fwsetup
}
The call is now ONLY in the menu entry, as it used to be.
I don't know whether it was already a problem before when you selected that entry... but with this new version the problem is always there, to the point that you cannot even get to the boot menu.
The reason that removing `--is-supported` doesn't fix the problem is that the flag isn't implemented and is ignored, which sends the user directly to the firmware.
And the reason it works on VirtualBox and certain real hardware for some people is the opposite? Because the call to fwsetup fails, so things just keep rolling?
Either way, both the core problem and the fix are the same. The EFI stub is missing functionality that the updated grub is calling.
Now...
This issue is only ever relevant if you run `grub-mkconfig` without doing a `grub-install`. The reason it is rarely encountered on Arch Linux is that nothing runs these commands after kernel or grub updates. So who does these things?
- https://gitlab.com/garuda-linux/packages/stable-pkgbuilds/garuda-hooks/-/blob/4f4da043088f45372877e9154fec36aa3f605d6b/grub-update.hook
- https://gitlab.manjaro.org/packages/core/grub/-/blob/master/update-grub
- https://github.com/endeavouros-team/PKGBUILDS/blob/a50b847e982c62f53f9858eea219927bb6498656/grub-tools/eos-grub-update-after-kernel.hook
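To check whether a given system has such a hook installed, one could scan pacman's hook directories for hooks that mention `grub-mkconfig` or `update-grub`. This is a rough sketch: the two default directories are pacman's standard hook locations, but the pattern list is an assumption and will miss hooks that call a differently named wrapper script; the function name is invented.

```shell
# Sketch: list pacman hooks that regenerate grub.cfg automatically.
# Usage: find_grub_mkconfig_hooks [DIR...]
# Defaults to pacman's standard hook directories.
find_grub_mkconfig_hooks() {
    if [ "$#" -eq 0 ]; then
        set -- /etc/pacman.d/hooks /usr/share/libalpm/hooks
    fi
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for hook in "$dir"/*.hook; do
            [ -f "$hook" ] || continue   # the glob matched nothing
            # Assumed patterns: direct grub-mkconfig calls or an update-grub wrapper.
            if grep -qE 'grub-mkconfig|update-grub' "$hook"; then
                echo "$hook"
            fi
        done
    done
}
```

Any hit is worth reading to see whether it also runs `grub-install`; if it does not, it matches the combination that breaks boot in this report.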
So while it's great that people blame Arch for moving forward with a git release, it's quite shitty when dependent distros do not look at their own hooks. Arch users should rarely, if ever, hit this issue. This would, however, probably break most of the derivative distros that updated linux along with grub on older machines without `reboot into firmware` setups.
The explanation that this only matters when running `grub-mkconfig` without doing a `grub-install` is bogus. There are TONS of reasons to do exactly that, and they all boil down to adding/removing kernel options.
This *IS* a valid usage: some options need to be set on the kernel command line, which in turn requires an update to grub.cfg, which is done by grub-mkconfig.
If a grub-install call is required, grub-mkconfig or some plugin needs to warn the user; otherwise one can end up with an unbootable system.
- front page notice
- pacman message upon upgrade
- warning in the wiki
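One possible shape for such a warning is a check run before `grub-mkconfig` that compares the installed grub package version with the version used at the last `grub-install`. Nothing like this exists in Arch's grub package; the stamp file path and function names are hypothetical, and something (for example a wrapper around `grub-install`) would have to write the stamp after each successful install.

```shell
STAMP_FILE="${STAMP_FILE:-/var/lib/grub-install.stamp}"  # hypothetical path

# Installed package version, e.g. "2:2.06.r322.gd9b4638c5-1".
installed_grub_version() {
    pacman -Q grub | awk '{ print $2 }'
}

# Hypothetical: to be called after a successful grub-install.
record_grub_install_stamp() {
    installed_grub_version > "$STAMP_FILE"
}

# Usage: check_grub_install_stamp [VERSION]
# Returns 0 if VERSION (default: installed version) matches the stamp,
# otherwise prints a warning to stderr and returns 1.
check_grub_install_stamp() {
    current="${1:-$(installed_grub_version)}"
    recorded=""
    if [ -r "$STAMP_FILE" ]; then
        recorded="$(cat "$STAMP_FILE")"
    fi
    if [ "$current" != "$recorded" ]; then
        echo "WARNING: grub $current is installed, but grub-install was last" \
             "run with grub ${recorded:-<unknown>}." \
             "Run grub-install before grub-mkconfig." >&2
        return 1
    fi
}
```

A pacman hook or a wrapper script could call `check_grub_install_stamp` and refuse, or at least warn, before regenerating the config.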
Arch has a standard method for recovering an unbootable system [1]. There is little more we can do at this late stage.
[1] https://wiki.archlinux.org/title/General_troubleshooting#Fixing_a_broken_system