FS#36551 - [gummiboot] Hard freeze with gummiboot-35-1

Attached to Project: Arch Linux
Opened by Kirill Churin (reflexing) - Thursday, 15 August 2013, 19:09 GMT
Last edited by Tom Gundersen (tomegun) - Monday, 23 September 2013, 02:52 GMT
Task Type: Bug Report
Category: Packages: Testing
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Tom Gundersen (tomegun)
Architecture: x86_64
Severity: Critical
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:
Upgrading to gummiboot-35-1 causes a hard freeze after it shows its menu.
Booting from the UEFI shell and downgrading to gummiboot-33-1 resolves the issue (using "gummiboot install", because the update complains that a newer version is installed).

It happens on my Asus Sabertooth Z77.

Steps to reproduce:
Install gummiboot 35-1

Closed by: Tom Gundersen (tomegun)
Monday, 23 September 2013, 02:52 GMT
Reason for closing: Fixed
Additional comments about closing: in testing
Comment by Tom Gundersen (tomegun) - Sunday, 18 August 2013, 02:41 GMT
Strange. It works fine for me. Did you try v34?

The only changes since v33 are to the keyboard/menu handling. Does it work if you don't press any key and wait for the default entry to boot? How about disabling the menu (set the timeout to 0), does that still hang?

If everything else fails, any chance you could try to bisect it? git sources: http://cgit.freedesktop.org/gummiboot/
Comment by Tom Gundersen (tomegun) - Sunday, 18 August 2013, 03:30 GMT
Kay (upstream) suggests that, if you are able, you add a few Print() statements to the code to narrow down where it hangs. Also, it would be helpful if you could post your configuration (both /boot/loader/loader.conf and /boot/loader/entries/<the entry that hangs>.conf). Do you have several entries? Several OSes, kernels, or even the EFI Shell? Do they all hang?
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 03:39 GMT
It hangs right after showing its menu. I don't press any keys, it just hangs.

Yes, I have different entries, I will provide configs and try your suggestions a bit later.
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 07:07 GMT
My configs are nothing unusual:

find . -type f -printf "\n%p\n" -exec cat {} \;

./entries/linux.conf
title Arch Linux
linux /vmlinuz-linux
initrd /initramfs-linux.img
options root=/dev/mapper/arch-root init=/usr/lib/systemd/systemd resume=/dev/mapper/arch-swap nomodeset quiet rw

./entries/linux-lts.conf
title Arch Linux LTS
linux /vmlinuz-linux-lts
initrd /initramfs-linux-lts.img
options root=/dev/mapper/arch-root init=/usr/lib/systemd/systemd nomodeset rw

./entries/linux-ck.conf
title Arch Linux
linux /vmlinuz-linux-ck
initrd /initramfs-linux-ck.img
options root=/dev/mapper/arch-root init=/usr/lib/systemd/systemd nomodeset quiet elevator=bfq rw

./loader.conf
timeout 1
default linux
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 07:08 GMT
It boots default entry with "timeout 0"!
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 07:20 GMT
v34 works fine without hangs.
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 07:57 GMT
Bisected; the bad commit is right after v34:
1bc043b288f0531acf85643dd37e03b7bf344842 is the first bad commit
commit 1bc043b288f0531acf85643dd37e03b7bf344842
Author: Kay Sievers <kay@vrfy.org>
Date: Fri Aug 2 15:08:39 2013 +0200

handle Alt-key in line editor; use 'Q' to quit; use 'P' for print dump

:040000 040000 f170ba76cdb159be931ea44d43d207eea0f2a0a3 f3fb67a4abae8bbe0052886e8f70fb86c4fb01e6 M src

I don't know how to debug it further :)
Comment by Tom Gundersen (tomegun) - Sunday, 18 August 2013, 08:33 GMT
Thanks Kirill, that's very helpful. I'll forward it upstream.
Comment by Kay Sievers (kay) - Sunday, 18 August 2013, 12:08 GMT
Could you check whether disabling the call to the extended key handling in the
firmware makes the issue go away? With that, only the old key handling is used,
which will not recognize Alt and Ctrl, but all normal keys should work.

Thanks!

--- a/src/efi/gummiboot.c
+++ b/src/efi/gummiboot.c
@@ -353,7 +353,7 @@ static EFI_STATUS key_read(UINT64 *key) {
         UINT32 shift = 0;
         EFI_STATUS err;

-        if (!checked) {
+        if (0 && !checked) {
                 err = LibLocateProtocol(&EfiSimpleTextInputExProtocolGuid, (VOID **)&TextInputEx);
                 if (EFI_ERROR(err))
                         TextInputEx = NULL;
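
The `0 &&` in this patch short-circuits the one-time protocol lookup, so TextInputEx is never located and only the old key handling is used. The control flow can be modelled outside the firmware. This is a minimal sketch, not gummiboot code: `locate_text_input_ex`, `struct key_state`, `choose_key_path`, and the `probe_enabled` parameter are hypothetical stand-ins, and the lookup is assumed to always succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the firmware's extended text input protocol. */
struct text_input_ex { int unused; };

static struct text_input_ex fake_protocol;
static int locate_calls;        /* counts how often the lookup actually ran */

/* Stand-in for the LibLocateProtocol() lookup; here it always succeeds. */
static struct text_input_ex *locate_text_input_ex(void) {
        locate_calls++;
        return &fake_protocol;
}

enum key_path { KEY_PATH_LEGACY, KEY_PATH_EXTENDED };

/* Per-reader state, mirroring the `checked` flag and the protocol pointer. */
struct key_state {
        bool checked;
        struct text_input_ex *ex;
};

/*
 * probe_enabled = false mirrors the patched condition `0 && !checked`:
 * the lookup is skipped, the pointer stays NULL, and the legacy path is used.
 */
static enum key_path choose_key_path(struct key_state *s, bool probe_enabled) {
        assert(s != NULL);
        if (probe_enabled && !s->checked) {
                s->ex = locate_text_input_ex();
                s->checked = true;
        }
        return s->ex ? KEY_PATH_EXTENDED : KEY_PATH_LEGACY;
}
```

Starting from a zeroed `struct key_state`, calling `choose_key_path(&state, false)` never touches the lookup and always reports the legacy path, which is the behaviour the patch is meant to force.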
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 12:19 GMT
It works, Kay!
Comment by Kay Sievers (kay) - Sunday, 18 August 2013, 12:48 GMT
Oh, weird. Maybe something goes wrong with the ReadKeyStrokeEx stuff.

If you press 'P' (shift 'p') in the menu, what are the values of:
UEFI version:
firmware vendor:
firmware version:
?
Comment by Kirill Churin (reflexing) - Sunday, 18 August 2013, 13:01 GMT
UEFI version: 2.31
firmware vendor: American Megatrends
firmware version: 4.653
Comment by Kay Sievers (kay) - Monday, 19 August 2013, 00:35 GMT
It works fine here on the 4 different UEFI machines I have, and in QEMU.

I really have no better idea than to add Print() statements. If you
like, it would be great if you could watch what it prints on your box.

There are 3-second sleeps after each print and the console gets messed up,
but maybe it will show whether it really gets stuck in the firmware.

A patch and a screenshot of how it looks here are attached.

Thanks!
Comment by Kirill Churin (reflexing) - Monday, 19 August 2013, 17:25 GMT
Kay, it just prints TextInputExt=?
   4kay.jpg (1.16 MiB)
Comment by Kay Sievers (kay) - Monday, 19 August 2013, 17:34 GMT
Oh, it prints that multiple times, so it's not dead. Weird! :)

Do you see the seconds printed, counting down?

After it reaches the timeout, does it boot the default entry?

If you press a key, does the debug output show something like:
ReadKeyStroke: Success
?
Comment by Kirill Churin (reflexing) - Monday, 19 August 2013, 17:37 GMT
> You see the seconds printed, counting down?
no

> After it reached the timeout, does it boot the default entry?
no, all I see is on screenshot, repeating

> If you press a key, does the debug output something like: ReadKeyStroke: Success?
no
Comment by Kirill Churin (reflexing) - Monday, 19 August 2013, 17:42 GMT
Well, it shows success for the first time by itself, I didn't press any keys.
Comment by Kirill Churin (reflexing) - Monday, 19 August 2013, 17:52 GMT
Well, some more:

After the first "ReadKeyStrokeEx: success" it waits for a few seconds,
then prints "return",
then prints the first CallLocateProtocol…CallReadKeystrokeEx,
then stops until I press any key,

then it cycles as I showed before. Hope it helps.
Comment by Kay Sievers (kay) - Monday, 19 August 2013, 18:08 GMT
That all sounds pretty strange. Sorry for not having a better answer than
to add different Print()s. This patch makes it print the key codes.

The drawn menu is garbled and makes no sense any more, but I
can see all keys I press, and Enter will also boot the entry.

Do you see key codes with multiple key presses, like:
return key=0-0-73
or do you see only:
return: Not Ready
?
Comment by Kirill Churin (reflexing) - Monday, 19 August 2013, 18:31 GMT
>The drawn menu is garbled and makes no sense any more, but I can see all keys I press, and Enter will also boot the entry.
Enter doesn't boot the entry. Actually, no matter what I press, the result is the same: nothing.

>Do you see key codes with multiple key presses, like:
>return key=0-0-73
no

>or do you see only:
>return: Not Ready
>?

Yep.

The screenshot shows all I see after boot without pressing any keys. After I press any key it loops with "return: Not Ready", no matter what I press.
Comment by Kay Sievers (kay) - Monday, 19 August 2013, 20:46 GMT
Maybe your firmware wants us to wait for key presses with the same API as we read
the key presses.

This consolidates all key press handling into one function and uses only one or
the other interface, never both at the same time. Thanks!
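
The consolidation described here, waiting and reading through one and the same interface, can be sketched without firmware. This is a minimal model under stated assumptions: `struct input_iface`, `pick_iface`, `model_key_read`, and the four stub functions are hypothetical names for illustration, and each interface is reduced to a pair of function pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one input interface: its own wait and its own read. */
struct input_iface {
        bool (*wait_for_key)(void);       /* returns true once a key is pending */
        bool (*read_key)(unsigned *key);  /* returns true if a key was read */
};

/* Call counters so the example can show which interface was touched. */
static int simple_waits, simple_reads, ex_waits, ex_reads;

static bool simple_wait(void) { simple_waits++; return true; }
static bool simple_read(unsigned *key) { simple_reads++; *key = 's'; return true; }
static bool ex_wait(void) { ex_waits++; return true; }
static bool ex_read(unsigned *key) { ex_reads++; *key = 'x'; return true; }

static const struct input_iface simple_input = { simple_wait, simple_read };
static const struct input_iface ex_input = { ex_wait, ex_read };

/* Choose exactly one interface up front. */
static const struct input_iface *pick_iface(bool ex_available) {
        return ex_available ? &ex_input : &simple_input;
}

/* Single entry point: optionally wait, then read, both on the same interface. */
static bool model_key_read(const struct input_iface *in, unsigned *key, bool wait) {
        assert(in != NULL && key != NULL);
        if (wait && !in->wait_for_key())
                return false;
        return in->read_key(key);
}
```

Because the caller only ever holds one `struct input_iface`, there is no code path in which the wait comes from one interface and the read from the other.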
Comment by Kay Sievers (kay) - Monday, 19 August 2013, 21:12 GMT
I've committed it to upstream -git in the meantime:
http://cgit.freedesktop.org/gummiboot

(It looks cleaner, regardless of whether this is the reason for the issues on your box.)
Comment by Kirill Churin (reflexing) - Tuesday, 20 August 2013, 04:30 GMT
Installed from git master. Unfortunately it doesn't respond to any keys, as in the original bug report; it is just stuck in the menu :(
Do you have access to an Asus Sabertooth Z77? Maybe that would simplify your work…
Comment by Kay Sievers (kay) - Tuesday, 20 August 2013, 10:02 GMT
Hmm, sad.
No, I don't have any access to ASUS hardware, only 4 different laptops, which all seem to work fine.

I'm running out of ideas, maybe a reset of the input device makes a difference:

+++ b/src/efi/gummiboot.c
@@ -359,6 +359,7 @@ static EFI_STATUS key_read(UINT64 *key, BOOLEAN wait) {
if (EFI_ERROR(err))
TextInputEx = NULL;

+ uefi_call_wrapper(TextInputEx->Reset, 2, TextInputEx, TRUE);
checked = TRUE;
}
Comment by Kirill Churin (reflexing) - Tuesday, 20 August 2013, 16:57 GMT
@Kay what commit should I apply patch on?
Comment by Kay Sievers (kay) - Tuesday, 20 August 2013, 17:41 GMT
It doesn't really matter, on top of -git would be fine.

Btw, is there maybe a firmware update for your box? Is the version reasonably up-to-date?
Comment by Kirill Churin (reflexing) - Tuesday, 20 August 2013, 19:09 GMT
@Kay I can't apply it :( It says the patch is corrupted.

Firmware is up-to-date, on latest version 2003.
Comment by Kay Sievers (kay) - Wednesday, 21 August 2013, 13:19 GMT
Oh, sorry, I just pasted it in. :)

Ok, another try. Pushed to upstream git. It would be great
if you could give it a try again. Thanks a lot!
Comment by Øyvind Heggstad (Mr.Elendig) - Wednesday, 21 August 2013, 14:11 GMT
Asus P8Z77-V here, keyboard is still not responding with commit 5d85fe49d5f7c5bf4f0bb0485c25f945b3bd6e57

Just a loop of "return: Not Ready"
Comment by Kay Sievers (kay) - Wednesday, 21 August 2013, 14:20 GMT
Hmm, there is no such message in upstream git repo, but only
in the patch here.

Are you sure you really run the clean git repo's version?
Comment by Øyvind Heggstad (Mr.Elendig) - Wednesday, 21 August 2013, 14:38 GMT
Sorry, forgot to mention that I added a (slightly modified) print patch on top of the latest commit. The keyboard doesn't respond without it either, though.
Comment by Kay Sievers (kay) - Wednesday, 21 August 2013, 16:12 GMT
Ah, cool, so it's not that. :)

Here is another try with prints and a fallback. Maybe it reveals what's going wrong. Thanks!
Comment by Kirill Churin (reflexing) - Wednesday, 21 August 2013, 16:26 GMT
@Kay: good news for you, with latest git (without your 21:12 patch) IT WORKS!
Comment by Kirill Churin (reflexing) - Wednesday, 21 August 2013, 16:35 GMT
@Kay: latest git with the 21:12 patch starts to cycle with InvalidParameter right after boot, without any keypresses and no matter what I press.
Comment by Kay Sievers (kay) - Wednesday, 21 August 2013, 16:46 GMT
What? Git works for Kirill but not for Øyvind?

Now I'm confused. :)
Comment by Kirill Churin (reflexing) - Wednesday, 21 August 2013, 16:48 GMT
@Kay we'll wait… maybe he did something wrong
Comment by Kay Sievers (kay) - Wednesday, 21 August 2013, 18:55 GMT
Weird, that InvalidParameter is returned, it really sounds like
someone messed up the call in the ASUS firmware.

Now that we have at least some ideas what's going on: If you are still willing
to test, this patch should reliably fall back to the old API, but also print
where things went wrong.

Thanks!
Comment by Kay Sievers (kay) - Wednesday, 21 August 2013, 23:34 GMT
Oops, I pushed other stuff and, unintentionally along with that, some
changes that no longer work with the above patch.

Git has a version now that is supposed to reliably fall back to
the old and working API as soon as the new API returns unexpected
errors.
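
The fallback described here can be modelled in plain C. This is a minimal sketch, assuming that "success" and "not ready" are the only answers treated as expected from the new API; `enum status`, `struct key_reader`, `reader_poll`, and the two stub functions are hypothetical names, not the code in git.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified status codes; the real firmware returns EFI_STATUS values. */
enum status { ST_SUCCESS, ST_NOT_READY, ST_INVALID_PARAMETER };

typedef enum status (*read_fn)(unsigned *key);

struct key_reader {
        read_fn read_ex;      /* new API; set to NULL once it is disabled */
        read_fn read_legacy;  /* old API */
};

/*
 * Try the new API first. Any answer other than success or not-ready
 * disables the new API permanently and the read is retried via the old one.
 */
static enum status reader_poll(struct key_reader *r, unsigned *key) {
        assert(r != NULL && r->read_legacy != NULL && key != NULL);
        if (r->read_ex) {
                enum status st = r->read_ex(key);
                if (st == ST_SUCCESS || st == ST_NOT_READY)
                        return st;
                r->read_ex = NULL;   /* unexpected error: never call it again */
        }
        return r->read_legacy(key);
}

/* Stand-ins for firmware behaviour, for illustration only. */
static enum status broken_ex(unsigned *key) { (void)key; return ST_INVALID_PARAMETER; }
static enum status working_legacy(unsigned *key) { *key = 0x73; return ST_SUCCESS; }
```

With `struct key_reader r = { broken_ex, working_legacy }`, the first `reader_poll(&r, &key)` sees the invalid-parameter answer reported earlier in this thread, clears `r.read_ex`, and returns the key from the old API; later calls go straight to the old API.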
Comment by Kirill Churin (reflexing) - Thursday, 22 August 2013, 15:57 GMT
@Kay Latest GIT works fine. Can you maybe make a branch with Prints to properly debug the issue?
Comment by Kirill Churin (reflexing) - Thursday, 22 August 2013, 17:52 GMT
@Kay I don't know exactly what I did, but gummiboot (commit 6feb7d971f79e) somehow doesn't count down the timeout and doesn't boot the default entry :( Maybe I pressed Alt+t…
Comment by Kay Sievers (kay) - Thursday, 22 August 2013, 21:47 GMT
Hmm, at least in theory, it should fall back to the old API as soon as an unexpected error is received.

For now, I just tagged the current git as 36. If the problem still persists, we need to tune the workaround, I guess.
Comment by Øyvind Heggstad (Mr.Elendig) - Friday, 23 August 2013, 11:46 GMT
Still no luck for me, keyboard remains unresponsive with TAG:36
Comment by Tobias Powalowski (tpowa) - Saturday, 24 August 2013, 11:12 GMT
Kay, would you tag .36 in git so that our PKGBUILD keeps working?
Comment by Kay Sievers (kay) - Saturday, 24 August 2013, 11:15 GMT
Done!
Comment by Tobias Powalowski (tpowa) - Saturday, 24 August 2013, 11:26 GMT
ld: cannot open linker script file /usr/lib/gnuefi/elf_x86_64_efi.lds: No such file or directory
nm: 'src/efi/gummiboot.so': No such file
GEN gummibootx64.efi
objcopy: 'src/efi/gummiboot.so': No such file
Comment by Tobias Powalowski (tpowa) - Saturday, 24 August 2013, 11:32 GMT
Fixed: added --with-efi-ldsdir=
Comment by Tobias Powalowski (tpowa) - Saturday, 24 August 2013, 11:41 GMT
Does v36 now fix the issue for you guys?
Comment by Kirill Churin (reflexing) - Saturday, 24 August 2013, 14:33 GMT
@tpowa at least it doesn't freeze.
@Kay:
Something with timeouts:
After the menu comes up, it doesn't count down to the timeout, it just responds to keys.
When I select "EFI Default Loader" in the menu (which is the same gummiboot, but another file, I think), it brings up its menu and starts to count down to the timeout, BUT doesn't respond to keypresses. After the timeout it boots the default option.
Comment by Kay Sievers (kay) - Sunday, 25 August 2013, 12:36 GMT
Comment by Kirill Churin (reflexing) - Monday, 26 August 2013, 15:59 GMT
@Kay:
Latest git: keypresses work, timeout works. :) Thanks!
BUT this problem still exists: when I select "EFI Default Loader" in the menu (which is the same gummiboot, but another file, I think), it brings up its menu and starts to count down to the timeout, BUT doesn't respond to keypresses.

but maybe doesn't matter… :)
Comment by Kay Sievers (kay) - Monday, 26 August 2013, 16:12 GMT
Yeah, the second error seems not detectable. Unless we miss something
in the way we call the key API, which could be, the firmware seems
really broken regarding the extended key API.

According to your debug, this is what happens:

The only indication we get is that we get a "key" returned by the firmware
which has all bits set to 0, which should never happen. And that indication
only works one time after a reboot.

If we run the default loader a second time in the already running firmware,
the next calls into the firmware will always tell us that there is no key pressed.

Ideally someone would report that to ASUS, the EFI_SIMPLE_TEXT_INPUT_EX_PROTOCOL
seems not to do the right thing. All other vendors I've tested work just fine.
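
The all-zero "key" check described above can be sketched as follows. This is a simplified model under stated assumptions: `struct key_data` is a hypothetical, flattened view of what an extended key read returns (scan code, character, shift state), and `key_is_bogus` and `keep_using_ex` are illustrative names rather than gummiboot functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the data from an extended key read. */
struct key_data {
        uint16_t scan_code;
        uint16_t unicode_char;
        uint32_t shift_state;
};

/* A "successful" read whose fields are all zero carries no key at all. */
static bool key_is_bogus(const struct key_data *k) {
        assert(k != NULL);
        return k->scan_code == 0 && k->unicode_char == 0 && k->shift_state == 0;
}

/* Decide whether to keep trusting the extended API after a successful read. */
static bool keep_using_ex(bool currently_enabled, const struct key_data *k) {
        return currently_enabled && !key_is_bogus(k);
}
```

This check only catches the first case. When the loader is started a second time in the already running firmware, the firmware simply reports that no key is pressed, and no such all-zero read is available to test.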
Comment by Tim L. (ledti) - Monday, 02 September 2013, 08:50 GMT
I actually remember testing gummiboot-35-1 a few weeks ago and running across this issue as well. I reverted to v33 as I figured it was a problem with my configuration (I didn't change anything) and didn't feel like fixing it at the time.

I'm using an ASRock 890GX Pro3, which uses American Megatrends.
Comment by Kirill Churin (reflexing) - Friday, 06 September 2013, 16:40 GMT
@ledti @Mr.Elendig how does gummiboot 36 from [testing] work for you?
Comment by Andreas (misc) - Tuesday, 10 September 2013, 13:46 GMT
Bit late for me to chime in: Since 35 (bisection identified the same "handle Alt-key" commit as cause) gummiboot accepts no keyboard input at all on my computer as well. I've tried 37, no change. Timeout works fine regardless of version.

Mainboard is a Gigabyte G1.Sniper M5 (rev. 1.0), bios F7 (American Megatrends, IIRC).
Comment by Kirill Churin (reflexing) - Tuesday, 10 September 2013, 13:57 GMT
@misc I had some problems regarding installing gummiboot, such as it doesn't actually update it in the EFI boot partition. Solved it with 'sudo gummiboot install' instead of 'sudo gummiboot update', then checking with 'sudo efibootmgr -v'. Maybe it's the issue in your case? Because gummiboot-36 from [testing] works fine for me.
Comment by Andreas (misc) - Tuesday, 10 September 2013, 15:15 GMT
Just tried that (changed both update lines in gummiboot.install to install), no change. Thanks still.

(The only related issue I had was that gummiboot wouldn't let me downgrade for testing. Got around that by "faking" the version number to the highest previously installed.)
Comment by Phil Puryear (philpuryear) - Tuesday, 17 September 2013, 04:18 GMT
After updating to v37, gummiboot is no longer responding to keypresses for me, either. I added a couple of debug Print()s to the code (attached), and it seems that my firmware exposes TextInputEx, but ReadKeyStrokeEx is returning "NotReady" every single time key_read is called, even when I am pressing keys.

I would report this upstream, but I am not sure where gummiboot's bugtracker is.

Motherboard: ASRock P67 Extreme4
Comment by Tim L. (ledti) - Tuesday, 17 September 2013, 04:54 GMT
I can +1 that. I rebooted several times with v37 and basically mashed my keyboard but have been unable to get into gummiboot's menu. I haven't tried the debug.patch, but I'm guessing it's the same for me as well since I'm using an ASRock board.
Comment by Kay Sievers (kay) - Sunday, 22 September 2013, 20:34 GMT
If you have a chance to try the version from the gummiboot git repo, please report the results.

We *try* harder now to work around the broken firmwares, but it's just a guess that this could work ...
Comment by Phil Puryear (philpuryear) - Sunday, 22 September 2013, 23:05 GMT
@kay I just tried out the latest git version and gummiboot is responding properly to keystrokes again, so it appears you guessed right!
Comment by Tom Gundersen (tomegun) - Sunday, 22 September 2013, 23:09 GMT
Great stuff guys, I'll push out a new release as soon as Kay tags it.

-t
