FS#33745 - [linux] 3.7.x - 3.12.x Unable to boot using EFI

Attached to Project: Arch Linux
Opened by Shawn Tan (shawntan) - Thursday, 07 February 2013, 05:22 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 13 August 2014, 07:17 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 56
Private No

Details

Description:

Unable to boot the linux-3.7.6 kernel via EFI with rEFInd.
After the kernel is selected in the rEFInd menu, it begins to load but stops after displaying the parameters it is being loaded with.

Boots normally after switching back to linux-3.7.5 kernel and initramfs image.


Additional info:
* package version(s)

Name : linux
Version : 3.7.6-1

Name : refind-efi
Version : 0.6.7-1

* config and/or log files etc.


Steps to reproduce:

Use refind bootloader with linux 3.7.6 and initramfs.

Closed by  Tobias Powalowski (tpowa)
Wednesday, 13 August 2014, 07:17 GMT
Reason for closing:  Fixed
Additional comments about closing:  3.16
Comment by Evangelos Foutras (foutrelis) - Thursday, 07 February 2013, 12:37 GMT
Can you please check if it's the same issue as  FS#33721 ?

Does the following combination work?

linux 3.7.6-1
refind-efi 0.6.6-1
Comment by Shawn Tan (shawntan) - Thursday, 07 February 2013, 20:38 GMT
I reinstalled the older version of refind-efi (0.6.6) using

efibootmgr -c -g -d /dev/sdX -p Y -w -L "rEFInd" -l '\EFI\refind\refind_<arch>.efi'

This didn't work either.

I tried installing the linux-3.7.6 package, but booting with the 3.7.5 kernel and image.
This booted, but the boot process stopped when attempting to mount /boot/efi, which was a vfat partition.

I suspect some kind of module problem?
Comment by Evangelos Foutras (foutrelis) - Thursday, 07 February 2013, 23:46 GMT
I was unable to reproduce the issue on a VirtualBox VM (with EFI enabled); linux 3.7.6 booted successfully using refind-efi 0.6.7.

I guess you could try rebuilding linux 3.7.6 with the following config change reverted:

https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=5cd6ba89c8dd1672bc68b5336aaff8e7b6d6d61e

Tobias Powalowski (kernel and refind-efi maintainer) might be able to provide further troubleshooting advice when he gets back. (He's currently away for a few days.)
Comment by Shawn Tan (shawntan) - Friday, 08 February 2013, 02:45 GMT
I just tried kernel 3.7.6 recompiled with EFI_VARS=y, and it is still unable to boot.

I am using the Lenovo X1 Carbon by the way, I'm not sure if that helps.
Comment by jwbirdsong (jwbirdsong) - Friday, 08 February 2013, 03:21 GMT
If I'm not very much mistaken, that machine has Intel integrated graphics and uses the i915 kernel module. IF that's the case, check out https://bugs.archlinux.org/task/33062 and make sure to follow the link to FDO. Simply unplugging the external monitor enabled mine to boot. (A couple of workarounds are mentioned.)

If I AM mistaken about the graphics card, please ignore and excuse the noise.

EDIT-- just as a point of fact... the bug didn't hit me until 3.7.4 (ish) (i believe)
Comment by Evangelos Foutras (foutrelis) - Saturday, 09 February 2013, 00:35 GMT
@Shawn: I meant to recompile the kernel with CONFIG_EFI_VARS=m (that was changed to CONFIG_EFI_VARS=y in the 3.7.6-1 package).

(Although, most likely, that's not causing this problem.)
Comment by Evangelos Foutras (foutrelis) - Saturday, 09 February 2013, 01:19 GMT
By the way, I've come across older reports of non-booting systems with previous 3.7.x kernels:

http://archlinux.2023198.n4.nabble.com/rEFInd-0-6-4-linux-3-7-2-1-fail-to-boot-td4683026.html

Never seen a valid fix being discussed, only that the issue magically disappears after upgrading to some newer kernel version. :\
Comment by Shawn Tan (shawntan) - Saturday, 09 February 2013, 08:24 GMT
@foutrelis Oops. But I'm using systemd, so it would make sense to compile it into the kernel right? How do I check what differences have been made to the .config for different arch releases of the kernel?

Crossing my fingers and hoping the stable release will boot for me. Heh.
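
On the question of how to see what changed in the .config between package releases, here is a minimal sketch: pull the option lines out of two config files and diff them. It assumes both files are already on disk (for example the config file shipped next to the PKGBUILD of each release); the function name config_diff is made up for illustration.

```shell
# config_diff OLD NEW
# Show kernel config options that differ between two .config files.
config_diff() {
    pattern='^(CONFIG_[A-Za-z0-9_]+=|# CONFIG_[A-Za-z0-9_]+ is not set)'
    old_sorted=$(mktemp) || return 1
    new_sorted=$(mktemp) || { rm -f "$old_sorted"; return 1; }

    # Keep only option lines and sort them so the diff ignores ordering
    grep -E "$pattern" "$1" | sort > "$old_sorted"
    grep -E "$pattern" "$2" | sort > "$new_sorted"

    # diff exits 1 when the files differ; treat that as success here
    status=0
    diff "$old_sorted" "$new_sorted" || status=$?
    rm -f "$old_sorted" "$new_sorted"
    [ "$status" -le 1 ]
}
```

Called as config_diff OLD NEW (both file names are placeholders), a change such as CONFIG_EFI_VARS=m to CONFIG_EFI_VARS=y shows up as a "<" line followed by a ">" line.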
Comment by George Brooke (GeorgeB) - Saturday, 09 February 2013, 14:38 GMT
I've just hit this on a Thinkpad X230 with the same package versions. I saw this as well with the first 3.7.x version to come into [core] but it disappeared with the next minor release before I had the time to report the bug.
I tried manually booting the kernel from the EFI shell and encountered the same freeze so to me it looks more like an efistub problem than a rEFInd specific one.
I believe that the boot parameters are printed by rEFInd and not by the kernel, as was suggested in the description, so perhaps the kernel does not even begin to boot.
Comment by Zach Wick (zwick) - Sunday, 10 February 2013, 13:01 GMT
I also have hit this bug doing a fresh install on a Thinkpad X230. At first I assumed that it was the rEFInd bug mentioned, but the same issue occurs when using Gummiboot (booting hangs after getting the boot params). Being a fresh install, I tried using the Arch Rollback Machine to find older packages of the kernel and of rEFInd, but was unsuccessful in applying them cleanly.
Comment by btby (btby) - Monday, 11 February 2013, 07:08 GMT
I've encountered the same issue with my ThinkPad T430, Intel graphics card, gummiboot and btrfs root. After upgrading to 3.7.6, the gummiboot menu is followed by a black screen, and the only solution that worked was downgrading to 3.7.5 (3.6.11 also works).
Comment by Evangelos Foutras (foutrelis) - Monday, 11 February 2013, 19:23 GMT
Linux 3.7.7 will appear in [testing] shortly. Doubt that fixes the issue but it's worth a try.
Comment by Shawn Tan (shawntan) - Tuesday, 12 February 2013, 04:46 GMT
Upgrading to 3.7.7 fixes the issue for me. Not sure if it will stay the same for future upgrades though.
Comment by Kent Smith (coat) - Tuesday, 12 February 2013, 05:30 GMT
3.7.7 also fixes the issue for me.
Comment by btby (btby) - Tuesday, 12 February 2013, 05:41 GMT
Confirmed, after upgrading to 3.7.7 from [testing] the issue is resolved for my setup.
Comment by George Brooke (GeorgeB) - Saturday, 16 February 2013, 16:32 GMT
And this problem reoccurs exactly the same for me with 3.7.8.
Comment by Steve Nims (sjnims) - Sunday, 17 February 2013, 18:22 GMT
I can confirm the same happened with my Toshiba Satellite: it worked with linux kernel 3.7.7, then I upgraded to 3.7.8 and boot hung. I chrooted from the live CD, downgraded back to 3.7.7 and restarted, and it worked fine after that. Using rEFInd.
Comment by Steve Nims (sjnims) - Monday, 18 February 2013, 17:31 GMT
UPDATE: Tested 3.7.9 and same thing happened, rEFInd gets to initial boot screen, then hangs...downgraded back to 3.7.7 and worked again
Comment by Kent Smith (coat) - Wednesday, 20 February 2013, 02:57 GMT
Upgraded to 3.7.9, same issue as 3.7.6. Downgraded to 3.7.7 and I can now boot normally again. I am booting using efi stub with an entry in efibootmgr
Comment by Shawn Tan (shawntan) - Monday, 25 February 2013, 07:53 GMT
Confirmed 3.7.9 does not boot for me as well.
Comment by Tobias Powalowski (tpowa) - Wednesday, 27 February 2013, 11:35 GMT
Status on 3.8?
Comment by George Brooke (GeorgeB) - Thursday, 28 February 2013, 12:44 GMT
3.8.0-2-ARCH from [testing] boots fine for me here.
Comment by Shawn Tan (shawntan) - Friday, 01 March 2013, 02:08 GMT
3.8.1-1-ARCH boots for me.
Comment by Kent Smith (coat) - Friday, 01 March 2013, 03:24 GMT
3.8.1-1-ARCH boots fine for me now.
Comment by btby (btby) - Friday, 01 March 2013, 07:17 GMT
No boot issues with 3.8.1-1-ARCH here.
Comment by Shawn Tan (shawntan) - Tuesday, 19 March 2013, 19:50 GMT
Same problem arises in 3.8.3-2-ARCH
Comment by Evangelos Foutras (foutrelis) - Friday, 22 March 2013, 03:31 GMT
The developer of rEFInd (srs5694) might be onto something here:

https://bbs.archlinux.org/viewtopic.php?pid=1246697#p1246697
Comment by Michael Chou (MichaelChou) - Tuesday, 26 March 2013, 02:29 GMT
Same problem in 3.8.4-1.
I have to use GRUB 2 instead.
Comment by cfr (cfr42) - Tuesday, 26 March 2013, 03:38 GMT
Appeared for first time for me with 3.8.3 (all 3.7.* booted fine); persists in 3.8.4-1.
Comment by Steve Nims (sjnims) - Tuesday, 26 March 2013, 03:51 GMT
I haven't had an issue at all since 3.8.x went stable into core, currently running 3.8.4-1-ARCH. Are you folks remembering to copy over the new images created to /boot/efi/EFI/arch once the kernel updates?
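
For setups where the boot manager does not read the kernel directly from /boot, the copy step mentioned here can be scripted. This is a minimal sketch, assuming the stock Arch file names (vmlinuz-linux, initramfs-linux.img, initramfs-linux-fallback.img) and taking the source and destination directories as arguments; sync_kernel_to_esp is a hypothetical helper, not part of any package.

```shell
# sync_kernel_to_esp SRC_DIR DEST_DIR
# Copy the kernel and initramfs images to the ESP when they have changed.
sync_kernel_to_esp() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in vmlinuz-linux initramfs-linux.img initramfs-linux-fallback.img; do
        # Skip images that are not installed
        [ -f "$src/$f" ] || continue
        # cmp -s is silent and fails if the files differ or the copy is missing
        if ! cmp -s "$src/$f" "$dest/$f" 2>/dev/null; then
            cp -f "$src/$f" "$dest/$f" || return 1
            echo "updated $f"
        fi
    done
}
```

With the layout described in this comment it would be run as root after each kernel upgrade, along the lines of: sync_kernel_to_esp /boot /boot/efi/EFI/arch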
Comment by Michael Chou (MichaelChou) - Tuesday, 26 March 2013, 04:12 GMT
to Steve Nims (sjnims):
I use the auto-detect feature of rEFInd, so there is no need to copy the images. I use 3.8.4-1 from [testing] though; I haven't tried 3.8.4-1 from [core].
Comment by mjb (mjb) - Tuesday, 26 March 2013, 08:05 GMT
Just here to confirm the problem for 3.8.3 and 3.8.4-1 (all 3.7.* booted fine). Shouldn't this have a higher priority, as the installation guide will fail for these new kernels? And yes: I did copy the kernel images as usual. https://bbs.archlinux.org/viewtopic.php?pid=1249569#p1249569
Comment by cfr (cfr42) - Tuesday, 26 March 2013, 23:36 GMT
@sjnims,

I use the ext4 driver with the kernel and image on /boot so no need to copy them over to the ESP. (I don't use auto-detect but it is the same idea.)

I could ask, are you sure that you were copying the new images over before since all 3.7.* booted just fine for me...
Comment by Guillermo Vaya (driadan) - Thursday, 28 March 2013, 14:39 GMT
I have the same problem.
If I insert the arch cd/usb and choose a uefi shell v1, I can boot. Trying uefi shell v2 results in freeze, though
Comment by cfr (cfr42) - Friday, 29 March 2013, 00:07 GMT
@mjb,
Did you vote? This bug has only 2 votes despite the number of comments. I don't know if votes mean much but if they do, it is a shame people are not using them!

It is the severity rating that really puzzles me. How can something which prevents booting be "low" severity? I understand it might not be highest (there are other ways to boot) but "low" seems odd.
Comment by mjb (mjb) - Friday, 29 March 2013, 09:23 GMT
@cfr, yes I did. Seems odd to me, too. It seems like it prevents new users from installing and old users from updating the kernel.
Comment by Guillermo Vaya (driadan) - Friday, 29 March 2013, 09:55 GMT
@mjb @cfr,
probably because there are also
https://bugs.archlinux.org/index.php?do=details&action=details.addvote&task_id=34358 (close, but not the same)
https://bugs.archlinux.org/task/34401 (gummiboot related, but seems like the same problem)

and those are marked as critical, so they probably draw all the attention from people experiencing the problems
Comment by Shawn Tan (shawntan) - Thursday, 04 April 2013, 21:23 GMT
Same problem arises in 3.8.5-1-ARCH

This is with refind 0.6.8-1, though it's probably not related to refind
Comment by Matthias Kleemann (mkleemann) - Friday, 05 April 2013, 14:53 GMT
I can confirm the last comment: with linux-3.7.10-1 my Macbook boots with refind 0.6.8-1. Every attempt with linux-3.8.x fails.

Did anything change with the EFI stub? I use auto-recognition by refind.
Comment by Steve Nims (sjnims) - Friday, 05 April 2013, 15:40 GMT
I thought I was using the auto-recognition too, but have to manually copy over the files every kernel update. More than likely, something I set up is wrong with the auto-recognition, however I haven't had a problem since 3.8.x.
Comment by Matthias Kleemann (mkleemann) - Sunday, 07 April 2013, 16:22 GMT
I do not copy any files except rEFInd itself. I leave the kernel as is in /boot. The only extra file is refind-linux.conf.
Comment by cfr (cfr42) - Monday, 08 April 2013, 02:41 GMT
No 3.8.* works for me.

The developer of rEFInd suspects a bug in the way some Arch developers compile the kernel which manifested given various outside conditions or, possibly, a bug in the compiler being used to compile the kernel.

The trouble with this bug is that it almost certainly isn't a rEFInd bug, even though there is a rEFInd workaround for some situations.
Comment by cfr (cfr42) - Tuesday, 09 April 2013, 02:11 GMT
Incidentally, the alternative rEFInd binary doesn't work for me either.

Neither does updating my BIOS to the latest available...

Still fails for 3.8.8-2...
Comment by Tobias Powalowski (tpowa) - Thursday, 23 May 2013, 19:50 GMT
status on 3.9?
Comment by Shawn Tan (shawntan) - Thursday, 23 May 2013, 19:58 GMT
Same issue with 3.9.3-1
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 25 May 2013, 07:57 GMT
did 3.9.2-1 work for you?
Comment by Shawn Tan (shawntan) - Saturday, 25 May 2013, 07:59 GMT
Yes. It's weird, it seems like just odd numbered versions.
Comment by Shawn Tan (shawntan) - Monday, 27 May 2013, 20:24 GMT
Okay, not really. 3.9.4-1 isn't working for me. Same error.
Comment by cfr (cfr42) - Thursday, 30 May 2013, 22:40 GMT
Fails for all versions - odd or not - in my case.
Comment by Jan Alexander Steffens (heftig) - Friday, 07 June 2013, 22:52 GMT
This is really odd. Some kernel builds work, some don't, and I can't find a pattern.

Thinkpad x220, gummiboot or direct EFISTUB.
Comment by Shawn Tan (shawntan) - Friday, 14 June 2013, 09:23 GMT
3.9.5-1 doesn't boot as well.
Comment by Jan Alexander Steffens (heftig) - Saturday, 15 June 2013, 13:06 GMT
I enabled the diagnostic boot, and now I get an actual error message. It seems with the "quiet" boot (Thinkpad bootsplash) enabled, efi_printk() does nothing.

"Failed to alloc lowmem for boot params"
Comment by Ulf Winkelvos (uwinkelvos) - Sunday, 16 June 2013, 02:29 GMT
I guess that https://bugs.archlinux.org/task/34401 is the same issue as this one. I did some research on the bug and I am pretty sure it's some strange offset/alignment problem. It's definitely related to the kernel version string. I'll play around with CONFIG_LOCALVERSION="-ARCH" and CONFIG_LOCALVERSION_AUTO=y to see if that helps.
Comment by Ulf Winkelvos (uwinkelvos) - Sunday, 16 June 2013, 06:04 GMT
Well this sucks. Playing around with those kernel version string parameters worked for me, when i bisected another kernel bug. 3.10-rc1 booted fine. 3.9-rc5 did not. So i managed to set the version string to 3.9-rc2, which i knew was working, via makefile and config, and the kernel booted just fine. But no luck for the stock kernel so far.

Is there an upstream bug report? This does not seem to be an arch only problem: http://forums.gentoo.org/viewtopic-p-7287332.html and according to this: http://www.rodsbooks.com/efi-bootloaders/efistub.html (7th bullet point) the efi-stub main developer has been informed. But i think we might need a place to gather some infos. (gcc version, system, bootloader, etc)
Comment by Steve Nims (sjnims) - Monday, 17 June 2013, 11:03 GMT
On a fresh install, I worked backwards from 3.9.6 until I was able to boot successfully...finally worked with 3.8.11-1.
Comment by Matthias Kleemann (mkleemann) - Monday, 17 June 2013, 16:52 GMT
Same here, still (64bit). On my 2007 Macbook (EFIv1) it ceased to work at kernels later than 3.7.10-1. Both show the same behaviour, hanging while booting.
Comment by Keshav Amburay (the.ridikulus.rat) - Tuesday, 18 June 2013, 12:29 GMT
Can you guys try booting with "efi_no_storage_paranoia" or https://bugs.archlinux.org/task/34641#comment111365 ?
Comment by Jan Alexander Steffens (heftig) - Tuesday, 18 June 2013, 13:20 GMT
I doubt that will help, since AFAIK the kernel doesn't try to store EFI variables while booting.
Comment by Ulf Winkelvos (uwinkelvos) - Tuesday, 18 June 2013, 23:37 GMT
Well no, that does not help. I just resized my boot partition from 256 MiB to 512 MiB, as this is supposed to be the minimum FAT32 partition size, and that does not work either. I am pretty clueless.
Comment by Jan Alexander Steffens (heftig) - Monday, 24 June 2013, 05:13 GMT
I recently ran into a similar problem with Grub (hangs early when trying to boot the kernel), vanished after rebuilding Grub with an old gcc. I wonder, does the same apply to this problem?

To clarify, try building the kernel with these packages installed (preferably in a chroot, so you don't mess up your install):

http://arm.konnichi.com/core/os/x86_64/cloog-0.17.0-2-x86_64.pkg.tar.xz
http://arm.konnichi.com/core/os/x86_64/gcc-4.7.1-1-x86_64.pkg.tar.xz
http://arm.konnichi.com/core/os/x86_64/gcc-libs-4.7.1-1-x86_64.pkg.tar.xz
http://arm.konnichi.com/core/os/x86_64/libmpc-0.9-2-x86_64.pkg.tar.xz
http://arm.konnichi.com/core/os/x86_64/libtool-2.4.2-6-x86_64.pkg.tar.xz
http://arm.konnichi.com/core/os/x86_64/ppl-0.12.1-1-x86_64.pkg.tar.xz

https://bugs.archlinux.org/task/35909
Comment by mjb (mjb) - Monday, 24 June 2013, 05:37 GMT
- I now have this problem with all possible bootloaders (grub, refind, gummiboot). As my BIOS doesn't have a legacy mode I can't update nor can I reinstall.

- This is my forum post about GRUB: https://bbs.archlinux.org/viewtopic.php?id=164101

- When I try to boot from the new install medium (13/06), which uses gummiboot, I get the error "Can't load initrd".

- Has anyone tried to downgrade GRUB? (instead of recompiling it?)
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 26 June 2013, 02:35 GMT
a clean chrooted 3.9.7-1-ARCH build with the gcc toolchain that jan supplied works for me with gummiboot, while stock 3.9.7-1 does not.
Comment by Shawn Tan (shawntan) - Friday, 05 July 2013, 18:53 GMT
linux-3.10-1 cannot boot.
Comment by Rich (snugglej) - Thursday, 11 July 2013, 01:36 GMT
Just wanted to see if anyone else is still having this problem? I'm unable to boot 3.9.9-1 (core repo). I have tried doing some testing but don't really know how to do testing for a kernel issue. My machine will not load this kernel at all. I have tried gummiboot, grub, booting into efi-shell, and also using one of the iso's and none of them have worked.

Any testing or things I can try? Currently I'm running 3.9.7 kernel

Running a Lenovo X230, with 3rd gen Intel i5 and UEFI is enabled and set as only boot (No Legacy).
Comment by mjb (mjb) - Thursday, 11 July 2013, 05:47 GMT
Yep, still having the problem with all bootloaders. In the meantime this has become a major problem for me: Neither can I update (at all!) nor can I reinstall (as the install media won't boot). I'm totally clueless.
Comment by Barry Hoffman (skyhog99) - Friday, 12 July 2013, 18:51 GMT
It's hardware related. I can reproduce the problem with kernel version 3.9.4 on an ASUS K55N laptop when booting natively with EFISTUB or gummiboot. ELILO does NOT have the problem; it boots fine. Also, the exact same distribution boots fine with all these methods (EFISTUB, gummiboot, ELILO, direct from the EFI Shell) on a Lenovo N580. There must be some hardware unique to the ASUS laptop that it's hanging on but it doesn't make sense that ELILO would be able to overcome it. Must be something very basic: e.g. keyboard, monitor?
Comment by Jan Alexander Steffens (heftig) - Friday, 12 July 2013, 19:00 GMT
The problem is in the kernel's EFI stub, which gets used when the kernel is booted as an EFI executable. Direct EFI booting, rEFInd, and gummiboot make use of this. ELILO and Grub do not, and boot the kernel in real mode.

It's likely that this is buggy firmware, which the EFI stub will have to be patched to work around.
Comment by Keshav Amburay (the.ridikulus.rat) - Saturday, 13 July 2013, 05:42 GMT
Can you guys try syslinux 6.xx from [testing] or https://aur.archlinux.org/packages/syslinux-firmware-git/ (builds from latest git)? The [testing] pkg follows the traditional Linux boot protocol, which is the way ELILO or GRUB boot the kernel, while syslinux-firmware-git uses the EFI Handover Protocol to boot the kernel. The EFI Handover Protocol is a subset of EFISTUB. It would be helpful to know whether the EFISTUB loader itself is causing the issue or whether even the EFI Handover Protocol leads to boot failure.
Comment by Barry Hoffman (skyhog99) - Saturday, 13 July 2013, 23:15 GMT
I just tried syslinux 6.01 on my ASUS K55N laptop and it also hangs. It gives an error "bzImage version 0x20c unsupported / Booting kernel failed: bad file number". This is the same error it gives on the previous versions of syslinux.efi
Comment by Ulf Winkelvos (uwinkelvos) - Tuesday, 16 July 2013, 01:54 GMT
3.9.9 was broken
3.10.1 is good again
Comment by Barry Hoffman (skyhog99) - Tuesday, 16 July 2013, 21:03 GMT
Just tried 3.10.1 and it's still broken for me on my K55N laptop but works fine on my Lenovo N580. Same behavior as on 3.9.4
Comment by Shawn Tan (shawntan) - Thursday, 18 July 2013, 14:40 GMT
3.10.1-1 booted for me.
Comment by Andrey Yankin (andrey013) - Friday, 26 July 2013, 19:50 GMT
Hi. I've never had this issue before.
Previous versions of kernel worked OK for me.

But after a recent update (linux 3.10.2-1) my laptop (Lenovo G780) started to show the same symptom.
rEFInd begins to load the kernel, prints the parameters the kernel is being loaded with and then stops.
I've tried different versions of rEFInd (0.6.8 - 0.7.1); the problem is the same.

The most _interesting_ part is:

The problem disappears if an Archlinux bootable USB flash drive is connected (and rEFInd sees four new options from it).
System boots fine when there are other options for boot!

Now I boot the laptop with the USB drive in.
And I'm looking forward to the new kernel release :)

1 What's going on?
2 Is this the same problem? Should I add a new task?
Comment by Andrey Yankin (andrey013) - Saturday, 27 July 2013, 12:42 GMT
linux 3.10.3-1 works fine again. Maybe it was a different bug.
Comment by Ulf Winkelvos (uwinkelvos) - Tuesday, 30 July 2013, 22:35 GMT
3.10.3-1-ARCH is good here, too.
Comment by Matthias Kleemann (mkleemann) - Friday, 02 August 2013, 19:23 GMT
Works for my x64 now too.
Comment by Robin Kreis (rkreis) - Wednesday, 14 August 2013, 07:44 GMT
More info:  FS#36519 . Briefly: 3.10.6-2 is broken on some machines where 3.10.3-1 works.
Comment by Jan Hodapp (sunomi) - Saturday, 17 August 2013, 00:25 GMT
Linux 3.10.3 is broken with gummiboot v33.1, which is the only bootloader I can get installed. Should I just wait?
Comment by Rich (snugglej) - Saturday, 17 August 2013, 00:37 GMT
I can verify that I have 3.10.5 working without a problem on my X230 but the latest release 3.10.6-2 is broken and has the same symptoms as the previous failures.

Symptoms -
After reboot and using the same boot line from either gummiboot or any other bootloader the screen will just be blank with no errors or anything, it doesn't look like the vmlinuz-linux (or whatever you named it) is working.
Anyone with a problem with the latest arch should just downgrade to the latest version that they have had working.

Comment by cfr (cfr42) - Sunday, 18 August 2013, 00:40 GMT
None work for me. Last working kernel was 3.7.*, if I remember correctly (or was it 3.6.*?).
Comment by mjb (mjb) - Sunday, 18 August 2013, 10:08 GMT
For me: last working version is 3.8.11-1. Maybe we are talking about a couple of different bugs here.
Comment by ValdikSS (ValdikSS) - Tuesday, 20 August 2013, 13:30 GMT
Still can't boot kernel 3.10.7 on Lenovo X220 with EFISTUB
Comment by Jan Hodapp (sunomi) - Tuesday, 20 August 2013, 21:03 GMT
I have not been able to boot my system up until now. I tried to make a completely new installation on UEFI. I can chroot into it from usb-stick and I get no error messages or anything after I select arch in gummiboot, just a black screen. Is there a way to tell, if I have got the same error you encounter, or if I just haven't installed gummiboot correctly? Shouldn't it give me some kind of error?
Comment by Rich (snugglej) - Tuesday, 20 August 2013, 21:48 GMT
Sunomi:

My system doesn't give any kind of error; it just boots through gummiboot to a blank screen with nothing else. You can try adding another boot option and a wait of a couple of seconds to gummiboot to see if you are actually getting past the boot loader and loading Linux.
Comment by Jan Alexander Steffens (heftig) - Tuesday, 20 August 2013, 21:50 GMT
If you can, try enabling diagnostic boot, so the firmware gives text output instead of displaying a logo.
Comment by Ulf Winkelvos (uwinkelvos) - Sunday, 25 August 2013, 15:02 GMT
all tested with refind and gummiboot:
linux-3.10.3-1-x86_64.pkg.tar.xz (good)
linux-3.10.7-1-x86_64.pkg.tar.xz (bad)
linux-3.10.9-1-x86_64.pkg.tar.xz (good)

@all suffering from this bug: I have mounted the ESP on /boot and installed syslinux as a backup boot loader, which has never failed me. This way I can easily switch from gummiboot to syslinux by toggling the boot mode (legacy/UEFI) in the "BIOS".
Comment by Keshav Amburay (the.ridikulus.rat) - Friday, 06 September 2013, 06:30 GMT
There seems to be some progress in this - http://permalink.gmane.org/gmane.linux.kernel.efi/1560 . I suggest trying to compile kernels with setup_efi_pci() commented out in eboot.c . I haven't experienced any boot issues with any of these kernels in my system (Thinkpad E430).
Comment by Guillermo Vaya (driadan) - Monday, 09 September 2013, 07:43 GMT
3.10.10 is again not working for me after several good ones
Comment by ValdikSS (ValdikSS) - Monday, 09 September 2013, 09:30 GMT
3.10.10 works for me on Lenovo X220
Comment by Guillaume BROGI (guiniol) - Monday, 09 September 2013, 17:49 GMT
3.10.10 (normal and ck) works for me but not 3.11.11 (linux-ck)
This is with a Lenovo X230
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 11 September 2013, 00:43 GMT
linux-3.10.10-1-x86_64.pkg.tar.xz (good)
linux-3.11-1-x86_64.pkg.tar.xz (bad)
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 11 September 2013, 04:16 GMT
I just tried what keshav suggested and it does work for me with the 3.11 kernel. The offending (partial)commit is: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/boot/compressed/eboot.c?id=dd5fc854de5fd37adfcef8a366cd21a55aa01d3d so applying a patch like this: http://pastebin.com/24kvw8kt works.
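
The pastebin patch itself is not reproduced here. As a rough sketch of the workaround described (commenting out the setup_efi_pci() call before building), the helper below wraps the call statement in a C comment. It assumes the call sits on its own line as "setup_efi_pci(boot_params);", which should be checked against the actual source tree; disable_setup_efi_pci is a made-up name.

```shell
# disable_setup_efi_pci FILE
# Comment out the setup_efi_pci(boot_params); call statement in FILE.
disable_setup_efi_pci() {
    file=$1
    tmp=$(mktemp) || return 1
    status=0
    # Only a line that starts (after indentation) with the call is changed;
    # the function definition begins with its return type and is left alone.
    sed 's|^\([[:space:]]*\)\(setup_efi_pci(boot_params);\)|\1/* \2 */|' \
        "$file" > "$tmp" || status=$?
    if [ "$status" -eq 0 ]; then
        # Write back through the original file to keep its permissions
        cat "$tmp" > "$file" || status=$?
    fi
    rm -f "$tmp"
    return "$status"
}
```

From the top of a kernel source tree this would be run as: disable_setup_efi_pci arch/x86/boot/compressed/eboot.c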
Comment by btby (btby) - Sunday, 15 September 2013, 23:21 GMT
Both 3.10 and 3.11 work without this problem on ThinkPad T430.

Ulf, hasn't pretty much the same patch been applied with http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/arch/x86/boot/compressed/eboot.c?id=f8b8404337de4e2466e2e1139ea68b1f8295974f introduced with the 3.10 release? Why did you have to patch again?

Because of the above patch and no boot problems, I'm beginning to think that the bug is resolved for my hardware. Will wait a couple of releases and report if this changes.
Comment by Jan Alexander Steffens (heftig) - Sunday, 15 September 2013, 23:25 GMT
No, that's a different patch.

It hit me again (Lenovo X220) with the recent 3.11.0 kernel I built. I did not use the PCI patch.
Comment by ValdikSS (ValdikSS) - Wednesday, 18 September 2013, 03:02 GMT
Lenovo X220: 3.11.1 works fine for me.
Comment by Kari Hreinsson (karihre) - Monday, 23 September 2013, 00:58 GMT
Macbook Air 2013, only boots occasionally with kernel 3.11.1 (but does in fact boot on occasion). I see the "Booting OS" dialog showing the kernel options, after that it just hangs completely unresponsive (adding debug to the kernel options does nothing). Booting from encrypted root. Got same results with the -lts kernel.

* refind-efi 0.7.3-2 (not latest version, but due to the most recent efibootmgr crashing on me I ran refined-install from the latest live-usb)
* efibootmgr 0.6.0-3
* linux 3.11.1-1
* linux-lts 3.10.12-1
Comment by Keshav Amburay (the.ridikulus.rat) - Monday, 14 October 2013, 17:13 GMT
@All affected by this: Can you guys try syslinux-6.02 ? It uses EFI Handover Protocol which is a subset of EFISTUB, to boot the kernel. If syslinux 6.02 boots the kernel fine but EFISTUB (direct or via gummiboot/rEFInd) fails then it might be something very specific to EFISTUB. EFISTUB eventually goes through EFI Handover Protocol. Both GRUB and ELILO boot via traditional linux boot protocol which has no relation to EFISTUB or EFI Handover Protocol, so that should work fine but does not help in solving the bug. Note that EFISTUB, EFI Handover Protocol and Syslinux maintainer is same person - Matt Fleming aka mfleming . He can be contacted in #syslinux, both for Syslinux and EFISTUB issues.
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 02 November 2013, 03:29 GMT
syslinux 6.02 does not boot at all in UEFI mode... it actually looks pretty similar to booting a broken kernel from gummiboot, i.e. an immediate black screen. The only thing that works reliably is commenting out setup_efi_pci() in eboot.c.
Comment by Keshav Amburay (the.ridikulus.rat) - Thursday, 21 November 2013, 09:51 GMT
@ALL: Can you guys try https://wiki.archlinux.org/index.php/Unified_Extensible_Firmware_Interface#Using_GRUB ? This does not fix the actual EFISTUB issue but rather uses GRUB to boot the kernel (which uses the traditional linux boot protocol, preceding the EFISTUB and EFI Handover protocol used by Syslinux 6.xx).
Comment by cfr (cfr42) - Friday, 22 November 2013, 03:24 GMT
For whatever it is worth, all recent kernels fail for me using EFISTUB (as I noted above) but GRUB continues to boot them reliably. Every kernel through 3.7.* worked fine. Every kernel from 3.8.* fails. GRUB really does work, though - if it weren't for GRUB, I wouldn't have been able to boot at all for months and months.

[Note: I have the issue unusually consistently. It doesn't come and go or affect some kernels and not others.]
Comment by Guillermo Vaya (driadan) - Wednesday, 18 December 2013, 10:21 GMT
on 3.12 I can only use .1, any later kernel gets me back to the same issue.
Comment by Andrey Yankin (andrey013) - Friday, 20 December 2013, 14:46 GMT
After a recent update from linux-3.11.6-1-x86_64 to linux-3.12.5-1-x86_64, my Lenovo G780 stopped booting normally again.
My solution and workaround (as mentioned above) is to insert a flash drive before boot process.
Could anybody confirm this, please?
Comment by Ulf Winkelvos (uwinkelvos) - Tuesday, 31 December 2013, 02:20 GMT
3.12.6-1-ARCH works fine for me.
Comment by mid-kid (mid-kid) - Sunday, 12 January 2014, 00:16 GMT
3.12.7-1-ARCH fails to boot. Same symptoms as OP. Using refind-0.7.7 (binaries from official website)
Acer Aspire V5-121, AMD APU C-70

3.12.6-1-ARCH works fine.
Comment by ValdikSS (ValdikSS) - Sunday, 12 January 2014, 04:06 GMT
@Esteve Varela Colominas (mid-kid)
Can confirm. 3.12.7-1 won't boot using EFISTUB on Lenovo X220.
Comment by Rich (snugglej) - Sunday, 12 January 2014, 05:06 GMT
I can also confirm that the newest linux kernel 3.12.7-1 won't boot on my Lenovo X230 with EFISTUB.

I'm using gummiboot and I can verify that gummiboot works without a problem as I am also using it to boot to Windows loader. Once I enter linux it will not start loading arch linux. Is there anything I can test to figure out what is causing this?
Comment by mid-kid (mid-kid) - Sunday, 12 January 2014, 11:54 GMT
Updated my build machine over ssh. It never got back online.
It boots with BIOS, but the disk is GPT. It's using syslinux.
According to the case, It's a DELL optiplex 740.
Comment by sven (commonuser) - Sunday, 12 January 2014, 19:34 GMT
Same on a Lenovo Thinkpad T420: since 3.12.7 it won't boot any more via EFISTUB. The screen stays completely blank. Booting via the EFI shell (install ISO) works fine though. Up to 3.12.6 it booted without problems.
Comment by Ulf Winkelvos (uwinkelvos) - Monday, 13 January 2014, 02:19 GMT
Yep... 3.12.7-1 won't boot with gummiboot on my Dell XPS 13.
Could you guys verify that applying a patch like this: https://bugs.archlinux.org/task/33745#comment114120 works?!
Comment by Wim Herremans (herremaw) - Monday, 13 January 2014, 12:09 GMT
Same problem on my Acer V3-571 laptop with InsydeH20 V2.07 UEFI.

I have been using the rEFInd boot manager with the ESP mounted at /boot/efi, the kernels installed in /boot (on an ext4 partition), and the "ext4_x64.efi" driver of rEFInd.

This configuration has worked for me for all 3.10, 3.11 and 3.12 kernels until I upgraded to kernel 3.12.7-1. It still works with kernel 3.12.6-1.

I can boot the 3.12.7-1 kernel using GRUB2. That is what I am doing temporarily. But I still hope that I can return to the rEFInd - EFISTUB method.
Comment by Brian (bwright1558) - Monday, 13 January 2014, 18:25 GMT
I, too, confirm that 3.12.7 doesn't boot with UEFI. Neither the 3.12.7-1 nor 3.12.7-2 kernels will boot. No problem booting with 3.12.6-1 kernel.

I am using gummiboot with ESP mounted at /boot. This is on a Lenovo Thinkpad X1 Carbon.
Comment by Igor Stamatovski (igorstama) - Monday, 13 January 2014, 19:22 GMT
Same here 3.12.7 doesn't boot with UEFI. No problem booting with 3.12.6-1 kernel.

I am using gummiboot with ESP mounted at /boot. Lenovo Thinkpad X1 Carbon as well.

What next? Broken kernels and/or gummiboot just won't boot, good ones boot just fine. How can we debug what the problem is?
Comment by Brian (bwright1558) - Monday, 13 January 2014, 20:38 GMT
I have verified that the following patch works for me: https://bugs.archlinux.org/task/33745#comment114120
Comment by Brian (bwright1558) - Tuesday, 14 January 2014, 00:01 GMT
I did some testing. Kernel 3.12.6-1 works installing from the core repository. Kernel 3.12.7-1 and 3.12.7-2 failed to boot when installed from core repository.
After reading https://bugs.archlinux.org/task/33745#comment108126, I decided to manually download and run the PKGBUILD files to build the linux 3.12.7 kernel instead of installing a pre-compiled kernel from the Arch core repository.

Results: After manually building and installing the linux 3.12.7 kernel using the PKGBUILD instead of the pre-compiled version from core, my machine successfully boots!

Hypothesis: The Arch developers are occasionally doing something strange when they create the pre-compiled linux kernel, as suggested in the following comment from this same thread: https://bugs.archlinux.org/task/33745#comment108126.
Comment by Tobias Powalowski (tpowa) - Tuesday, 14 January 2014, 07:15 GMT
Interesting, what makeflags do you use to compile the kernel? That is the only change I made to default makepkg.conf, I pass -j8.
Comment by Brian (bwright1558) - Tuesday, 14 January 2014, 08:34 GMT
I've tested this once with the default makepkg.conf, and once with MAKEFLAGS set to -j8. In both cases, the kernel compiled and installed successfully, and my machine boots without any problems. I then tried the 3.12.7-2 kernel from the repositories one last time just to be sure, and my machine failed to boot.
Comment by Thomas Bächler (brain0) - Tuesday, 14 January 2014, 09:46 GMT
Okay, to get a completely reproducible test case, use devtools to build everything:

* Choose an up to date mirror
* Install devtools
* Mount a btrfs file system on /var/lib/archbuild (another file system will also work, but creating the chroots will be less efficient due to lack of snapshots - as a test case, we could also try with and without btrfs)
* Add the following lines to the sudoers file using visudo:

brian ALL=NOPASSWD: /usr/bin/extra-i686-build
brian ALL=NOPASSWD: /usr/bin/extra-x86_64-build
brian ALL=NOPASSWD: /usr/bin/multilib-build
brian ALL=NOPASSWD: /usr/bin/multilib-staging-build
brian ALL=NOPASSWD: /usr/bin/multilib-testing-build
brian ALL=NOPASSWD: /usr/bin/staging-i686-build
brian ALL=NOPASSWD: /usr/bin/staging-x86_64-build
brian ALL=NOPASSWD: /usr/bin/testing-i686-build
brian ALL=NOPASSWD: /usr/bin/testing-x86_64-build

* Get the PKGBUILD directly from svn: svn co svn://svn.archlinux.org/packages/linux/trunk linux
* Build the package using 'sudo extra-x86_64-build -c' or 'sudo extra-i686-build -c'

tpowa, could you also do the same on your build machine and upload all test kernels for Brian to test? In particular, build packages with and without btrfs and recreate all chroots using '-c'.

This is a very weird problem: Both the working and non-working kernels have been built on your build machine.
Comment by Rod Smith (srs5694) - Tuesday, 14 January 2014, 12:06 GMT
I've run into problems with parallel compilation (via the -j switch to make) when building rEFInd; sometimes it just fails to compile at all. Note that this behavior is NOT consistent -- it may fail to compile on one run and then compile OK on the second run. I've never bothered to investigate this further; I just set my tools to do single-threaded compilation and moved on. It's conceivable, though, that the EFI stub code has some similar issue, and that part of the kernel, at least, needs to be compiled in a single thread in order to work correctly. If so, then the boot-time problem might come and go depending on some random factor -- build 1 might work, build 2 might work, and build 3 might fail, all from the same source code and on the same computer.

This is just an hypothesis, though; I haven't tried to reproduce this problem or track it down further. (I don't have a computer that's ever shown this symptom.)
Comment by Tobias Powalowski (tpowa) - Tuesday, 14 January 2014, 14:15 GMT
But shouldn't the kernel compile error out in such a case and break compilation?
Comment by Thomas Bächler (brain0) - Tuesday, 14 January 2014, 14:23 GMT
Of course. The fact that refind's makefiles are broken does not have any relation to this problem and I don't know how this comment is supposed to add anything useful to this report.
Comment by Tobias Powalowski (tpowa) - Tuesday, 14 January 2014, 14:42 GMT
https://dev.archlinux.org/~tpowa/linux-3.12.7-2-x86_64.pkg.tar.xz
compiled kernel with -j1 in a new setup chroot.
Please try this.
Comment by Brian (bwright1558) - Tuesday, 14 January 2014, 15:10 GMT
Tobias and Thomas, thanks for working with me on this issue. Unfortunately my machine still doesn't boot using the compiled kernel that Tobias posted above.
I've got my chroot environment set up on both a btrfs and an ext4 filesystem, along with the sudoers file configured as explained by Thomas. Just to rule out possible causes, I'll test -j1, -j2, and -j8 MAKEFLAGS under each filesystem. I'll post my results as soon as I find some time to compile the kernel under the two filesystems.
Comment by mid-kid (mid-kid) - Tuesday, 14 January 2014, 17:01 GMT
I don't think this is a compilation problem as I also tried to compile it (-j4, extra-x86_64-build), without success.
I think this is a bug in the kernel itself. Maybe the patch brian suggested (https://bugs.archlinux.org/task/33745#comment114120) works. I'm gonna try and report back.
Comment by Kristoffer Jan-Olov Tångfelt (revellion) - Tuesday, 14 January 2014, 17:50 GMT
Voted this bug up since I also suffer from not being able to boot the latest kernels using EFISTUB/Gummiboot (rather than rEFInd). Currently using 3.12.7-2-ARCH

Works fine with GRUB2 but using Gummiboot it just shows a black screen. Worked fine with earlier kernels.

Hardware

Lenovo Thinkpad T420i 4178-BAG
Comment by Brian (bwright1558) - Tuesday, 14 January 2014, 20:30 GMT
I have interesting results!

The 3.12.7-2 kernel from the core repository still does not boot on my machine with UEFI.
The 3.12.7-2 kernel compiled using the PKGBUILD on my local machine (not in a chroot environment) works, i.e. my machine successfully boots. Note that I tried this using -j2, -j4, and -j8 MAKEFLAGS, all of which resulted in successful boots. Also note that my filesystem is ext4.

Here are the interesting results. I set up two different chroot environments as instructed by Thomas. The first chroot was on a btrfs filesystem. The second chroot was on an ext4 filesystem.

The 3.12.7-2 kernel on the clean chroot btrfs filesystem was compiled with MAKEFLAGS="-j4" and using the 'sudo extra-x86_64-build -c' command as instructed by Thomas. I installed the generated tar.xz package using 'sudo pacman -U linux-3.12.7-2-x86_64.pkg.tar.xz' and the installation finished without any errors. I restarted my machine and it failed to boot.

The 3.12.7-2 kernel on the clean chroot ext4 filesystem was compiled with MAKEFLAGS="-j4" and using the 'sudo extra-x86_64-build -c' command as instructed by Thomas. I installed the generated tar.xz package using 'sudo pacman -U linux-3.12.7-2-x86_64.pkg.tar.xz' and the installation finished without any errors. I restarted my machine and... Yes!!! My machine successfully booted!

New hypothesis: compiling the linux kernel on a btrfs filesystem may randomly result in some UEFI machines not booting, particularly ones in which the kernel is being installed to an ext4 or other non-btrfs filesystem when it was compiled on btrfs.

Second hypothesis: compiling the linux kernel on an ext4 filesystem results in successful boots, at least on machines in which the ext4 compiled kernel is installed onto an ext4 or more generally, a non-btrfs filesystem.

Why is this happening? It may be a bug in how the kernel operates on btrfs systems.
What does everyone else think?
Comment by Max Liebkies (gegenschall) - Tuesday, 14 January 2014, 20:37 GMT
Same bug here. I didn't reboot after the upgrade from 3.12.6-1 to 3.12.7-1, so I only realized right now I'm affected by this bug. System:

Thinkpad X220 (4290-W1B), Bios: 1.39, FW: 1.24
UEFI Boot with gummiboot, /boot is vfat (EFI Partition), Kernel installed to /boot, LUKS in use

Last working kernel: 3.12.6-1 (have downgraded since)
Bad kernels so far: 3.12.7-1 and -2

Didn't try Tobias' kernel yet.
Comment by Max Liebkies (gegenschall) - Tuesday, 14 January 2014, 20:42 GMT
@Brian: My kernel is installed to a vfat fs and it exhibits the same behaviour. That doesn't falsify your hypothesis but somehow makes it unlikely. (?)
Comment by Brian (bwright1558) - Tuesday, 14 January 2014, 20:45 GMT
Thanks Max. I updated the hypothesis in my comment.
Comment by Jan Alexander Steffens (heftig) - Tuesday, 14 January 2014, 20:50 GMT
@Brian: could you upload the two builds you made?
Comment by Brian (bwright1558) - Tuesday, 14 January 2014, 21:02 GMT
Comment by mid-kid (mid-kid) - Tuesday, 14 January 2014, 21:02 GMT
I built the kernel with the patch, but, nothing new here: http://mid-kid.imly.org/midrepo/linux-efi-3.12.7-2-x86_64.pkg.tar.xz (WARNING: slow connection, please be kind) (for headers, add -headers to the url, same for docs)

@Brian: Yesterday, I built 3.12.7-1 (-j4, ext4, extra-x86_64-build), but it didn't work either. Maybe the patches added in -2 did something, or it's the fact that I changed the pkgname to linux-selfbuild (for the sake of not having to replace my current kernel), because otherwise I don't know how it worked for you.

I'm going to try compiling it without changing anything in the PKGBUILD, and getting it directly from SVN (instead of ABS). Let's see if it works.

EDIT: NVM, you can use the linux-headers from core with @Brian's kernel.
Currently running @Brian's ext4 build. It works!
Comment by Thomas Bächler (brain0) - Tuesday, 14 January 2014, 22:17 GMT
Brian, this all seems random - I don't think tpowa even builds on btrfs, but rather on ext4. I would love to know a way to compare those kernel images ...
Comment by Bastian Beranek (totsilence) - Tuesday, 14 January 2014, 22:30 GMT
I also suffer from this problem: 3.12.6 worked fine and both 3.12.7-1 and 3.12.7-2 fail to boot my Lenovo Thinkpad W520. I'm using gummiboot.

Will try to compile my own kernel next.

Edit: Compiling it myself didn't help. Still can't boot 3.12.7... I'll keep 3.12.6 for now.
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 15 January 2014, 00:09 GMT
I am still pretty sure it's an alignment problem (see: https://bugs.archlinux.org/task/33745#comment111240) and it's definitely related to setup_efi_pci, as patching that out never failed me. I will try Tobias' and Brian's kernels now. Let's see if they work, but why are all of those different in size anyhow?
Comment by Brian (bwright1558) - Wednesday, 15 January 2014, 01:20 GMT
Not sure how to compare the kernel images. One thing to compare is the file size of the tar.xz packages. Tobias' version, my version built on btrfs, and my version built on ext4 all have a different file size. Maybe this is a sign of what might be wrong?

Let's not completely dismiss the ext4/btrfs hypothesis just yet. Perhaps the makefile for the kernel does some configuration behind the scenes that we have not considered, for example the makefile is setting some flags in the kernel during compilation depending on the type of filesystem it is being built on. If the filesystem type is not the issue, then maybe it is some other flag(s) in the kernel's makefile.
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 15 January 2014, 01:48 GMT
I am pretty sure that I have built kernels on ext4 that did not work, but I have to verify that tomorrow. Strangely enough, this time your ext4 kernel is the only 3.12.7-2 kernel that does work; all others don't:

======
kernel good/bad
======
GOOD arch3-12-6/boot/vmlinuz-linux (arch stock 3.12.6-1)
BAD arch3-12-7-2/boot/vmlinuz-linux (arch stock 3.12.7-2)
BAD btrfs/boot/vmlinuz-linux (brian's btrfs 3.12.7-2)
GOOD ext4/boot/vmlinuz-linux (brian's ext4 3.12.7-2)
BAD tpowa/boot/vmlinuz-linux (tobias' 3.12.7-2)

Lacking any better idea or knowledge in the field of UEFI binaries, I gathered some file size stats. I played around with objdump too and there are differences, but I could not tell what they mean. Interestingly, the non-working 3.12.7-2 images all have the same size but, as I could see with objdump and md5sum, different content. Anybody got any idea?

======
ls -l
======
3875280 20. Dez 19:40 arch3-12-6/boot/vmlinuz-linux
3867728 12. Jan 13:10 arch3-12-7-2/boot/vmlinuz-linux
3867728 14. Jan 20:13 btrfs/boot/vmlinuz-linux
3867664 14. Jan 21:05 ext4/boot/vmlinuz-linux
3867728 14. Jan 15:30 tpowa/boot/vmlinuz-linux
======
md5sum
======
c2784212a2d4fb6333ef42267887023b arch3-12-6/boot/vmlinuz-linux
8b42b8f424aac23dfcaffe25ed0723cf arch3-12-7-2/boot/vmlinuz-linux
bae23189d4a7a59bace614339a4239b6 btrfs/boot/vmlinuz-linux
69539063e24920c14bb48bdb4dd99631 ext4/boot/vmlinuz-linux
85502b4ea01e83274621d2837be1c9bd tpowa/boot/vmlinuz-linux
======
size -Ax
======
arch3-12-6/boot/vmlinuz-linux :
section size addr
.setup 0x41e0 0x200
.reloc 0x20 0x43e0
.text 0x3addd0 0x4400
Total 0x3b1fd0
------
arch3-12-7-2/boot/vmlinuz-linux :
section size addr
.setup 0x41e0 0x200
.reloc 0x20 0x43e0
.text 0x3ac050 0x4400
Total 0x3b0250
------
btrfs/boot/vmlinuz-linux :
section size addr
.setup 0x41e0 0x200
.reloc 0x20 0x43e0
.text 0x3ac050 0x4400
Total 0x3b0250
------
ext4/boot/vmlinuz-linux :
section size addr
.setup 0x41e0 0x200
.reloc 0x20 0x43e0
.text 0x3ac010 0x4400
Total 0x3b0210
------
tpowa/boot/vmlinuz-linux :
section size addr
.setup 0x41e0 0x200
.reloc 0x20 0x43e0
.text 0x3ac050 0x4400
Total 0x3b0250

Comment by Max Liebkies (gegenschall) - Wednesday, 15 January 2014, 05:02 GMT
I just compiled my own kernel and it seems to work. Steps: abs, makepkg, install. Nothing special.
Comment by Tobias Powalowski (tpowa) - Wednesday, 15 January 2014, 08:41 GMT
For completeness, I compile all kernels on an ext4 filesystem.
Comment by mid-kid (mid-kid) - Wednesday, 15 January 2014, 09:06 GMT
@gegenschall: Can you post your kernel so we can compare?
Comment by Olli Laasonen (laasonen) - Wednesday, 15 January 2014, 09:58 GMT
I've got the same bug here on my Thinkpad T430s. It didn't reboot after upgrading to 3.12.7-2 from 3.12.6-1.

I'm using gummiboot, my boot partition is 1GB fat32 (EFI Partition), root partition is ext4, partition table is gpt and I'm running the x86_64 version.
Comment by Bastian Beranek (totsilence) - Wednesday, 15 January 2014, 10:04 GMT
Just tried Brian's ext4 compiled kernel and it doesn't work for me (contrary to what everybody else seems to say).
Comment by Max Liebkies (gegenschall) - Wednesday, 15 January 2014, 10:58 GMT
Yes, sure:

https://dl.dropboxusercontent.com/u/79189298/linux-3.12.7-2-x86_64.pkg.tar.xz
https://dl.dropboxusercontent.com/u/79189298/linux-docs-3.12.7-2-x86_64.pkg.tar.xz
https://dl.dropboxusercontent.com/u/79189298/linux-headers-3.12.7-2-x86_64.pkg.tar.xz

[Btw, one workaround seems to be:
Install elilo from the AUR, copy elilox64.efi and elilo.conf over to /boot(/EFI) and configure elilo.conf accordingly. Set up your normal EFI boot manager to chainload elilo, which then loads the kernel. This could be set up as a permanent backup solution in case this bug occurs again, so you're not a sitting duck with an unbootable machine]
Comment by Brian (bwright1558) - Wednesday, 15 January 2014, 16:41 GMT
Okay, I'm convinced that I was wrong about the ext4/btrfs thing.
Seeing how the kernel in the repositories works for some people and not others, and also my kernel works for some people and not others, perhaps this is specifically hardware related.
I'll post back after I've had a chance to try Max's kernel.
Comment by Brian (bwright1558) - Wednesday, 15 January 2014, 17:29 GMT
Just tried Max's kernel. My machine fails to boot.
I'm out of ideas now. What could possibly be going on? UEFI is a must-have for me. I don't mind compiling my own kernel just to get it working; however, I would still prefer to use the one from the repositories (when it works) because it is quicker to install.
Comment by Andrey Yankin (andrey013) - Wednesday, 15 January 2014, 17:31 GMT
There is so much testing going on here!
Could somebody please give any comment on my workaround to this problem?

3.10.2-1 and 3.12.5-1 were bad for me.
I used to insert a flash drive into USB before starting my laptop to boot the kernels, which failed to boot otherwise.
A mere presence of any (maybe not even bootable) flash drive let the miracle happen.
Could you please confirm/disprove/elaborate on this?
Comment by mid-kid (mid-kid) - Wednesday, 15 January 2014, 17:38 GMT
@andrey013
I disprove. I've tried it several times with different kernels. It makes no difference for me.

Both max's and brian's kernels work on my PC.
Comment by Brian (bwright1558) - Wednesday, 15 January 2014, 18:43 GMT
@andrey013
I also disprove. Having a flash drive in any of the USB ports on my machine made no difference.
Comment by Bastian Beranek (totsilence) - Wednesday, 15 January 2014, 21:19 GMT
I think we should stop guessing blindly and take this to the LKML or the kernel bug tracker. People there should know best how to fix this. Compiling on a different filesystem? USB stick present during boot? Please guys, let's admit we're lost :)

I strongly believe it's related to something that was changed in the kernel itself in version 3.12.7 (at least there seem to be a _lot_ of people who were able to boot 3.12.6 but not 3.12.7). And the list of changes between 3.12.6 and 3.12.7 is not so long.
Comment by Max Liebkies (gegenschall) - Wednesday, 15 January 2014, 21:34 GMT
This might be a superfluous comment, but: Absolutely!
Comment by Brian (bwright1558) - Wednesday, 15 January 2014, 22:22 GMT
I just submitted a bug report to the kernel bug tracker: https://bugzilla.kernel.org/show_bug.cgi?id=68761
Comment by Bastian Beranek (totsilence) - Wednesday, 15 January 2014, 22:24 GMT
Thanks Brian! I'm trying to bisect 3.12.6 to 3.12.7 now, but it's my first time doing a kernel bisect so let's see if I'll succeed. If I manage to identify the faulty commit I'll comment on your bug report in the kernel bugzilla.
Comment by Steve Nims (sjnims) - Wednesday, 15 January 2014, 22:44 GMT
Bastian, please repost here if/when you find something out. Thanks!
Comment by Ulf Winkelvos (uwinkelvos) - Thursday, 16 January 2014, 01:14 GMT
While I agree that this should be discussed in the kernel bug, I am pretty sure this is not an isolated problem with this kernel version, as the exact same behaviour could be seen on at least a dozen kernels starting from 3.7.x. While not all machines fail on the same kernels, there are clusters of kernels that fail for a certain set of machines. My Dell XPS 13 FHD is way more picky than my Lenovo W520, but this too has failed on at least 2-3 kernel versions that I know of. Another strange thing is that this seems to be an Arch/Gentoo problem, while I have hardly read anything about Fedora, Mint, Debian, or Ubuntu. That is probably related to the fact that it does make a difference on which system affected kernel versions are compiled. I tried Max's kernel and it worked for me, while I built one on my server in an ext4 chroot and that one does not work.
Comment by Ulf Winkelvos (uwinkelvos) - Thursday, 16 January 2014, 01:57 GMT
When I apply the no_setup_efi_pci patch to kernel 3.12.7-2 it boots fine.
Comment by Bastian Beranek (totsilence) - Thursday, 16 January 2014, 08:48 GMT
Completed the bisect. Here's the log:

git bisect start
# bad: [4301b7a8fe14a787fbf0bb9cad16b623f45956f6] Linux 3.12.7
git bisect bad 4301b7a8fe14a787fbf0bb9cad16b623f45956f6
# good: [d0266db287d492abe63e19859ad99dd232bc0e89] Linux 3.12.6
git bisect good d0266db287d492abe63e19859ad99dd232bc0e89
# good: [f3c1f0d0aaf20f9dee35ae99ec8b8705af4dc60e] drm/radeon: fix render backend setup for SI and CIK
git bisect good f3c1f0d0aaf20f9dee35ae99ec8b8705af4dc60e
# good: [f3b578d9d009a9f670e893cec8579aa069aaaccb] mm: numa: avoid unnecessary work on the failure path
git bisect good f3b578d9d009a9f670e893cec8579aa069aaaccb
# bad: [e93b100931a45490cd07960a1ec51d9d8e5100cb] GFS2: Fix slab memory leak in gfs2_bufdata
git bisect bad e93b100931a45490cd07960a1ec51d9d8e5100cb
# bad: [eede0e9020693adaeed01fb464261a00ce9d05ad] mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully
git bisect bad eede0e9020693adaeed01fb464261a00ce9d05ad
# good: [ef36ec29945653ced2c30158213841d248299a8a] mm: fix TLB flush race between migration, and change_protection_range
git bisect good ef36ec29945653ced2c30158213841d248299a8a
# bad: [9c612a77032a98b264d12fd6e3df2ca530d968d2] mm: numa: defer TLB flush for THP migration as long as possible
git bisect bad 9c612a77032a98b264d12fd6e3df2ca530d968d2
# good: [186fa6eb6131954d17457f37283e654cb079c25b] mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates
git bisect good 186fa6eb6131954d17457f37283e654cb079c25b
# first bad commit: [9c612a77032a98b264d12fd6e3df2ca530d968d2] mm: numa: defer TLB flush for THP migration as long as possible

I'll try to apply the inverse of 9c612a77032a98b264d12fd6e3df2ca530d968d2 on top of 3.12.7 next.
Comment by Jan Alexander Steffens (heftig) - Thursday, 16 January 2014, 09:09 GMT
Your bisect produced nonsense, since the problem seemingly appears and disappears at random if you rebuild the same code.
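
To illustrate why a bisect cannot be trusted when the same code sometimes boots and sometimes does not, here is a small simulation in Python. It is a sketch under stated assumptions: the commit count of 100 and the 50% failure rate are invented, and bisect_first_bad only mimics the binary search that git bisect performs.

```python
import random


def bisect_first_bad(n_commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commit 0 is good and commit n_commits - 1 is bad, like the two
    endpoints handed to git bisect.
    """
    good, bad = 0, n_commits - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


def flaky_results(n_commits, runs, p_fail=0.5):
    """Bisect repeatedly when every build fails with probability p_fail,
    regardless of which commit it was built from (hypothetical model)."""
    results = []
    for seed in range(runs):
        rng = random.Random(seed)
        results.append(bisect_first_bad(n_commits, lambda _i: rng.random() < p_fail))
    return results


if __name__ == "__main__":
    # A real regression introduced at commit 37 is found reliably.
    print(bisect_first_bad(100, lambda i: i >= 37))
    # A random failure "blames" a different commit on almost every run.
    print(flaky_results(100, runs=10))
```

With a deterministic regression the search always lands on the same commit. With a failure that is independent of the commit, each run still ends on exactly one "first bad commit", which is why the result looks convincing even though it carries no information.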

Also, it happens in the very early boot code (arch/x86/boot/compressed). The mm subsystem (and the rest of the kernel) isn't even loaded at this time.

It might have something to do with the alignment or the content of the compressed kernel. Using another compression type (such as XZ or LZO) might hide the problem again.
Comment by Bastian Beranek (totsilence) - Thursday, 16 January 2014, 09:09 GMT
Also, with 9c612a77032a98b264d12fd6e3df2ca530d968d2 I actually got an error message:

"Failed to alloc lowmem for boot params"

This can be found in arch/x86/boot/compressed/eboot.c
Comment by Bastian Beranek (totsilence) - Thursday, 16 January 2014, 12:34 GMT
Thanks Jan, looking at the bisect result, I now tend to agree with you :/
Comment by Wim Herremans (herremaw) - Thursday, 16 January 2014, 14:56 GMT
I have an Acer V3-571 laptop on which I use rEFInd/EFISTUB as boot manager/boot loader and on which I have both an Arch Linux and a Fedora 20 installation. Both Arch and Fedora are at kernel version 3.12.7. The Arch kernel fails to boot, the Fedora kernel boots.

Yesterday, I built the 3.12.7-2 Arch kernel myself and it did not boot. Today, I built it again, and now it boots.
Comment by Brian (bwright1558) - Saturday, 18 January 2014, 20:21 GMT
@herremaw, are you sure that when you built the kernel the second time, you were using the 3.12.7 kernel? I assume you were using the Arch Linux PKGBUILD to compile the kernel. Just an FYI, even though it's not yet in the repositories (core or testing), the PKGBUILD is already updated to use the 3.12.8 kernel, which when I built it resulted in a successful boot compared to the 3.12.7 kernel. This was when I was using gummiboot.

I now believe this bug is a combination of the hardware, kernel, and bootloader. Originally I was only using gummiboot which resulted in the kernel not booting. Then I decided to try refind. Same result. It did not boot. Then I tried GRUB, following the UEFI installation and configuration instructions from the Arch Linux Beginner's Guide. This resulted in a successful boot.

What's strange is that some people have said that all 3 bootloaders failed to work, others have said that all 3 bootloaders work, and the remaining people, myself included, could only get one of the bootloaders to work (specifically GRUB).
Comment by Rod Smith (srs5694) - Saturday, 18 January 2014, 20:50 GMT
This bug is specific to the EFI stub loader. Both gummiboot and rEFInd are boot managers: They can load a Linux kernel only via the kernel's EFI stub loader. GRUB, ELILO, and SYSLINUX are all boot loaders that load the kernel and start it running themselves, without relying on the EFI stub loader. (GRUB can also load the Linux kernel as an EFI application, relying on the EFI stub loader.) If GRUB is failing in a configuration that does NOT launch the kernel using its EFI stub loader, then that's (probably) a different bug than the one under discussion here.
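
As a rough way to check whether a given kernel image can be started as an EFI application at all, one can look for the PE/COFF header that the EFI stub adds to the bzImage. The Python sketch below is a heuristic only; the function name is made up, and the offsets used are the standard DOS/PE header fields and the "HdrS" magic of the x86 boot protocol.

```python
import struct


def has_efi_stub_header(image: bytes) -> bool:
    """Heuristic: True if image looks like a bzImage that is also a PE/COFF file."""
    if len(image) < 0x206:
        return False
    if image[0:2] != b"MZ":            # DOS/PE magic at the start of the file
        return False
    if image[0x202:0x206] != b"HdrS":  # Linux x86 boot protocol magic
        return False
    # e_lfanew: little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    return image[pe_offset:pe_offset + 4] == b"PE\0\0"
```

Usage would be something like has_efi_stub_header(open("/boot/vmlinuz-linux", "rb").read()). A True result only says the headers are present; it says nothing about whether the stub will actually run on a particular firmware, which is the problem in this report.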

For those doing kernel re-builds, might I suggest that you clean your temporary files and re-build the kernel multiple times, without making any other changes? I suggest four or five re-builds. The most notable aspect of this bug is its amazing inconsistency. If multiple builds of the same kernel produce different boot results, then that's important information -- it could indicate some sort of problem during the build process that can come and go or something that's sensitive to build-specific information. (Two back-to-back builds of the same kernel will not produce 100% identical binaries because information like the kernel build time will be embedded in the kernels. Although I can't suggest how, such differences might be feeding into this bug.) If several builds from the same source file produce consistent results, on the other hand, then that suggests something more static (but still changeable from one version to another) is at the root of the problem.
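
To see where two back-to-back builds actually differ, a byte-level comparison is one option. The helper below is a minimal Python sketch (the name differing_ranges is invented); it reports the half-open byte ranges in which two images differ over their common length.

```python
def differing_ranges(a: bytes, b: bytes):
    """Return (start, end) half-open ranges where a and b differ.

    Only the common prefix length is compared; a length difference has to be
    checked separately with len().
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differing bytes begins
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    return ranges
```

For two images read into memory this would be called as differing_ranges(good, bad). Because most of vmlinuz is a compressed payload, even a small change such as an embedded build time may alter large parts of the file, so the uncompressed setup code at the start of the image is probably the more informative region to compare.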
Comment by Wim Herremans (herremaw) - Sunday, 19 January 2014, 07:52 GMT
@Brian, yes I am sure that I was using kernel version 3.12.7-2. The only difference between the 2 builds was the time and the Arch installation. The latter needs some explanation. I have 2 Arch installations on the same computer: one is the Arch system for daily use, the other is an Arch installation that I use to do some experiments.

The first build (bad) was made on the 2nd Arch system, the second one (good) on the 1st Arch system. The next days I did some rebuilds again on both systems. Both systems produced some good and some bad builds. So, there is no point in trying to find out if a difference between the 2 systems could be the cause of the different results.

I also have GRUB2 installed and I have configured rEFInd to display the option of calling GRUB2. GRUB2 has always worked for me, even when the EFISTUB loader fails, but I still prefer the beauty and simplicity of rEFInd over GRUB2.

@Rod Smith, when doing rebuilds, I have always started from a fresh copy of the /var/abs/core/linux directory. So, there cannot be any interference with temporary files left over from previous builds. I am convinced that several builds of the same source produce different results for the EFISTUB loader problem: it is either random, or depending on the date and time.
Comment by Grayson MacKenzie (graboy) - Monday, 20 January 2014, 21:08 GMT
@andrey013
I can confirm this. I am on a Lenovo Thinkpad X220. It first failed to boot after upgrading to 3.12.7-2; when I plugged in a bootable flash drive, it booted successfully. I have now reverted to 3.12.6-1.
Comment by norealname4u (talu2pwal) - Wednesday, 22 January 2014, 13:09 GMT
I wasn't bitten by this until yesterday. Attempting to reinstall Arch on my Thinkpad X230 ended up with rEFInd working but the machine unable to boot, stuck at the kernel parameters line. After restoring backups and running updates where the linux kernel got updated to 3.12.8-1, it doesn't boot anymore. My arch-ck 3.12.8-1-ck kernel does boot though.
Comment by norealname4u (talu2pwal) - Wednesday, 22 January 2014, 14:22 GMT
Using Arch rollback machine I tried downgrading to a few previous 3.12 kernels that I know worked, but for some reason they don't now.
As others reported success with 3.10.10-1, I tried it and it does boot on my thinkpad x230.
Comment by Wim Herremans (herremaw) - Wednesday, 22 January 2014, 16:56 GMT
@norealname4u, that is strange. Kernel versions 3.12.7-1 and 3.12.7-2, as coming from the repositories, failed to boot on my Acer V3-571, but 3.12.8-1 boots without problems.
Comment by Thomas Bächler (brain0) - Wednesday, 22 January 2014, 17:06 GMT
talu2pwal's comment suggests that the kernel gets corrupted while reading it from the ESP, depending on where on the ESP the kernel is. This sounds like file system corruption, or the inability to parse the file system properly from the firmware in some cases. If that is the case, there is probably not much we can do. (Note: GRUB probably uses its own fat32 code to read the kernel, while gummiboot uses the firmware's API.)

As a test, can anyone make multiple copies of the same kernel on the ESP and try to boot each one of them? What about re-creating the file system on the ESP from scratch, does that change anything (be careful, you need to make sure that any other OS, like Windows, still boots afterwards, so only do this if you know what you are doing)? Or simply defragmenting the ESP?
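
Before rebooting into each copy, it may be worth confirming from Linux that the copies read back identically. The following Python sketch groups copies by SHA-256; the function names are illustrative and the paths are up to whoever runs it. Note the limitation: this only shows what the Linux vfat driver reads, not what the firmware's own FAT code returns, so identical checksums do not rule out the corruption theory.

```python
import hashlib
from pathlib import Path


def digest_groups(blobs):
    """Group names by the SHA-256 of their contents.

    blobs maps a label (for example a path) to the bytes read from it.
    A single group means every copy reads back identically.
    """
    groups = {}
    for name, data in blobs.items():
        groups.setdefault(hashlib.sha256(data).hexdigest(), []).append(name)
    return groups


def read_copies(paths):
    """Read each path fully; which paths to check is the caller's choice."""
    return {str(p): Path(p).read_bytes() for p in paths}
```

A call such as digest_groups(read_copies(list_of_kernel_copies)) should return exactly one group if all copies on the ESP are intact as seen from Linux.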

(More factors I can think about: this may depend on file size, fragmentation, ...)
Comment by norealname4u (talu2pwal) - Wednesday, 22 January 2014, 17:15 GMT
I actually repartitioned / reformatted my drive (ssd) a few times both while doing the fresh install that failed to boot and trying to fix the issue, I used gdisk from a bootable arch install media.
I also used efibootmgr to try making new entries for refind. As my fresh install still locked up, I restored from backups to find out that the linux kernel was also locking up. At first I suspected that the change of UUID could be the issue so I ran a chroot from the installation media and reinstalled the linux package with pacman which didn't fix the problem.
Then I found out my arch-ck kernel wasn't locking up, so I updated the associated refind_linux.conf and I managed to boot in arch-ck.

My ESP is a gdisk 512MB EF00 partition formatted in fat32 mounted on /boot/efi (followed instructions from the beginner's guide).
Let me know if I can provide more information or run some tests to help.

---
thinkpad x230 with 16GB RAM, crucial M4 SSD
Comment by Matt Runion (mrunion) - Wednesday, 22 January 2014, 18:02 GMT
The same as norealname4u(talu2pwal) was experiencing, I could not boot with 3.12.8-1. It stopped with the kernel params displayed. Looking at the log, there were no entries from those boot attempts, so whatever was occurring didn't get far enough in the process to start logging in the journal. I have not had a problem with a Kernel since sometime in February 2013 (almost 1 year ago).

1st boot attempts after upgrade resulted in just the kernel params showing. After 6 reboot attempts, the machine just booted. I shrugged it off. Next day (this morning), about 15 reboots didn't work. I plugged a USB thumb drive I use to transfer files via sneaker-net into a USB port and it booted. I don't know if it was coincidence or not. I downgraded the Kernel to 3.12.7-2 because I have a meeting today and need it.

Also like norealname4u, I have the ESP mounted at /boot/efi (Arch lives in /boot/efi/EFI/arch/*). I also use rEFInd and it boots from /boot/efi/EFI/boot. What may be different (and may or may not be an issue) is that when I followed the Beginners' Guide last year (I got the new laptop at the end of January), I created an ESP, but I already had one for Windows. So my machine has TWO ESP partitions -- FS0: and FS1:. I boot my machine from FS1. I must be honest in telling you that I am not a UEFI expert, so any questions you have might need specific details for me to answer.

My machine: Acer V3-771, i7, 16GB RAM, Intel/nVidia Optimus, 1TB HD, updated daily, [testing] repo NOT enabled.

If I get a chance tonight I will upgrade the Kernel again and make multiple copies of it.

--------

Wim Herremans(herremaw) has an Acer V3-571 and CAN boot the latest, but not the previous 2 Kernels. I have a V3-771 and it's the opposite for me right now. Not sure if that matters.

Comment by norealname4u (talu2pwal) - Wednesday, 22 January 2014, 18:12 GMT
Following Wim Herremans comment, I started doubting myself so I tried 3.12 kernels from the ARM with the following results:

boots >> linux-3.10.10-1-x86_64
hangs << linux-3.12-1-x86_64
hangs << linux-3.12.1-2-x86_64
boots >> linux-3.12.1-3-x86_64
boots >> linux-3.12.2-1-x86_64
boots >> linux-3.12.3-1-x86_64
boots >> linux-3.12.4-1-x86_64
boots >> linux-3.12.5-1-x86_64
boots >> linux-3.12.6-1-x86_64
hangs << linux-3.12.7-1-x86_64
hangs << linux-3.12.7-2-x86_64
hangs << linux-3.12.8-1-x86_64

Looking at my backups the kernel installed was linux-3.12.7-2-x86_64 and I'm pretty sure it worked fine at the time, actually I've been using arch on this laptop for a year and didn't encounter this bug before.

I tried making several copies of the 3.12.8-1 kernel in /boot/efi/EFI/arch{1..6} and none of those worked.

*Note* Not sure it matters but I should mention that I upgraded my crucial M4 SSD firmware from 040H to 070H before attempting the fresh install.
Comment by Wim Herremans (herremaw) - Thursday, 23 January 2014, 07:33 GMT
Just for additional information about the configuration on my Acer V3-571.

Contrary to what most other people are doing, I keep the kernels in /boot on the root ext4 partition. The ESP I mount on /boot/efi. I have installed rEFInd in /boot/efi/EFI/refind. I have also installed rEFInd's ext4 driver in /boot/efi/EFI/refind/drivers such that both rEFInd and the UEFI firmware can read the kernels from the ext4 root partition.

This setup has served me well for at least 6 months, till kernel version 3.12.7 came along.
Comment by Fnord Popos (noddy) - Thursday, 23 January 2014, 13:42 GMT
I have a thinkpad x220 that's also affected by this. It was affected around the 3.7 era early last year, too.

I have a trick that I've been using since then to boot. With affected kernels, if i launch the efi shell, then use it to re-launch boot manager, and then use that to launch the kernel, it boots most of the time.

This worked both with refind and gummiboot.

It might help somebody stuck with a non-booting kernel. It's also a nice demonstration of efi stub's handoff code being at fault.
Comment by Matt Runion (mrunion) - Thursday, 23 January 2014, 13:50 GMT
My setup is identical to herremaw above except that I have TWO EF00 partitions (the one with Windows and the one I created for Arch) AND I don't have the ext4 driver installed in rEFInd.

--------

Most recent testing information:

Last night I re-updated the Kernel to 3.12.8. I then told the machine to reboot (not POWEROFF, just REBOOT). It rebooted just fine. I did this 5 times -- reboot, sign into Enlightenment, reboot. Afterwards I shut the laptop down for the evening.

This morning at work I powered on -- it would NOT boot. It stopped after displaying the Kernel params. I pressed CTRL+ALT+DEL and tried to boot again and it failed again. I went into the EFI Shell and back out (just to do something different), and it failed to boot. I waited for 10-15 seconds on the rEFInd boot menu and then tried -- still failed.

I plugged in my USB thumb drive and then rebooted -- started up just fine! This thumb drive only has a few documents and image files on it that I use to transfer between machines; it is NOT A BOOT DRIVE FOR ARCH OR ANY OS (not shouting, just emphasizing). Why does having a USB thumb drive in a USB port make the boot process work? I have a USB mouse receiver plugged in all the time, but that doesn't matter.

I will see if gummiboot does the same thing...

--------

UPDATE: Gummiboot acts the same way, no USB thumb drive, no boot. I removed the wireless mouse receiver and tried rebooting -- still no booting. I went into the BIOS (or whatever it's called now) and moved all USB devices further down below Arch in the boot order, and the machine would still not boot with 3.12.8 unless I had a USB drive plugged in.

I think this may make my issue different than the crux of this thread, right?

Comment by Matt Runion (mrunion) - Thursday, 23 January 2014, 15:33 GMT
OK, I [SOLVED] my problem! Obviously it is not the solution for everybody (anybody) else, but here are the results:

I checked my BIOS version against what Acer said mine should be for the V3-771G. I had 2.16 and the latest update was 2.23. I cringed, but updated my BIOS to 2.23. I had to reset the boot options to boot rEFInd again, but after that the machine booted just fine -- no USB thumb drive required.

Obviously, everyone else's mileage will vary, but does this not give some indication that whatever is being done at the earliest levels of the Kernel booting is affected by something in the BIOS?

Also, I only tested rebooting a couple of times. This may cause other issues down the road, or in fact not have completely solved my problem, and I just "lucked up" for a couple of boots. I will also update the forum post I have been participating in with this same info. (https://bbs.archlinux.org/viewtopic.php?id=175662)
Comment by Rich (snugglej) - Friday, 24 January 2014, 03:49 GMT
This seems to affect a lot of the Lenovo machines out here. I'm wondering if it's something half baked about the kernel and the interface for EFI that Lenovo did. Arch just released a new kernel and that one also doesn't work for me. I would like to get it working though.

GUMMIBOOT doesn't work
EFI Shell v1 & V2 do not work

I know this probably isn't the right place to post this but can someone explain to me how to patch the kernel with the efi patch? http://pastebin.com/24kvw8kt I really would like to try it out and see if that will fix my issues.

Thanks
Comment by Ulf Winkelvos (uwinkelvos) - Friday, 24 January 2014, 17:47 GMT
@Rich: the answer in this case is: don't, but try the following kernel package. (see https://wiki.archlinux.org/index.php/Kernels/Compilation/Arch_Build_System for building your own patched kernel package)

@all: looks like Matt Fleming found a solution to the problem, i.e. the reloc2.patch. I built a stock 3.12.7-2 arch kernel with this patch and uploaded it here [1]. Please try the kernel package or build it yourself [2] and report back to the upstream bug report [3].

[1] https://mega.co.nz/#!0UsxBKLS!FkFIK_av-cNrxmdg2fNc9_GM3UvasGJb3IAgIpAnBjk
[2] https://mega.co.nz/#!AF9hTCaL!lIHgKSbdSo08y1uacAMQvE1Ho9LjrjL27BPGtVWSGl4
[3] https://bugzilla.kernel.org/show_bug.cgi?id=68761
Comment by norealname4u (talu2pwal) - Friday, 24 January 2014, 21:35 GMT
I gave the pre-built patched kernel a try and it seems to have fixed the issue with my thinkpad x230.


Comment by sven (commonuser) - Sunday, 26 January 2014, 17:18 GMT
Current 3.13-1 from testing boots fine on my Thinkpad T420.
Comment by norealname4u (talu2pwal) - Sunday, 26 January 2014, 17:42 GMT
Local compilation of the patched kernel with a modified pkgname and it hangs, the reloc2.patch doesn't fix the issue for me.

As 3.12.9-1 is out, I gave it a try and it hangs too, but with an additional twist: this time I have graphical artifacts appearing at the top of the screen, a few lines of differently colored pixels suggesting some kind of memory corruption.
Comment by Rich (snugglej) - Monday, 27 January 2014, 02:24 GMT
@norealname4u interesting, what is your computer model? I'm going to give the patched version of the kernel a try. It looks like some people are saying that the kernel working is by luck because of the memory pointers being incorrectly referenced. How many people with an x230 can report that this reloc2.patch works?
Comment by Rich (snugglej) - Monday, 27 January 2014, 04:07 GMT
Okay, reporting back after trying the 3.12.7.2-arch-kernel package; when building, it showed as 3.12.7.3, which was weird. The kernel did the same thing as for @talu2pwal and basically had a bunch of colored pixels across the top of the screen. The same happened with 3.12.9.1, which from my understanding also has the patch applied.

Update: So I followed some other suggestions and the 3.13 mainline pkg from Aur compiled and booted without a problem on my computer.

I'm running a Lenovo x230 with the latest BIOS. Any more suggestions?
Comment by mjb (mjb) - Monday, 27 January 2014, 06:45 GMT
@norealname4u I too have that few lines of different colored pixels at the top. Interestingly though: if I
1) drop to UEFI Shell x86_64 v1 first
2) then exit the Shell
3) then boot
everything works! This is very strange... I have a Toshiba Satellite L855-14Z.
Comment by Wim Herremans (herremaw) - Monday, 27 January 2014, 07:20 GMT
The kernels 3.12.8-1 and 3.12.9-1 from the official repository boot without problems on my Acer V3-571 with rEFInd as bootmanager and the kernel's EFISTUB as bootloader.

Kernel versions 3.12.7-1 and 3.12.7-2 were the only ones that hung at boot without any output, even when adding the boot parameter "ignore_loglevel". Rebuilding kernel version 3.12.7-2 myself, always with the same procedure and always from a fresh copy of /var/abs/core/linux, sometimes resulted in a good kernel and sometimes in a bad one.
Comment by Matthias Beyer (musicmatze) - Monday, 27 January 2014, 13:54 GMT
@herremaw 3.12.8-1 and 3.12.9-1 don't boot for me, running a ThinkPad X220. I'm still on 3.12.6-1, all from the official repositories.

I have an idea regarding searching for the bug with git: you could try to find the subset of changes between each kernel release and its successor. All you need is a list of working/nonworking kernels and a bit of git know-how. I already asked how to do this with git here[0], but I haven't got an answer yet! I don't know if it helps, but it could minimize the set of code you have to search in! I don't know if this is possible with git and I don't know if one of you has already done this, but it may help!

[0]: http://stackoverflow.com/questions/21117901/git-get-subset-of-changes-from-several-diffs
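
To make the idea above concrete: in a clone of the stable kernel tree, `git log` over a tag range lists the commits that went into one point release. This is a minimal sketch, assuming the usual `vX.Y.Z` release tags; the helper name `changes_between` and the example path are illustrative only.

```shell
# Hypothetical helper: list the commits between two release tags
# that touch a given path inside the kernel tree.
changes_between() {
    # $1 = older tag, $2 = newer tag, $3 = path to restrict the log to
    git log --oneline "$1..$2" -- "$3"
}

# Example (run inside a clone of the stable kernel tree; the path is the
# EFI stub directory in 3.x kernels and is only an example):
#   changes_between v3.12.6 v3.12.7 arch/x86/boot/compressed/
```

Such a list only narrows the search: a kernel that happens to boot may still carry the bug.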
Comment by Faheem Pervez (qwerty12) - Monday, 27 January 2014, 15:46 GMT
With Ulf Winkelvos's linux-efibootfix-3.12.7-3, I've had no problems starting up Arch Linux through EFISTUB on my Thinkpad X230 (BIOS 2.57). From a cold boot, my startup sequence looks like this: Windows 8 boot menu->Choose next device (Linux Boot Manager) to boot from and restart->PreLoader starts->Gummiboot is loaded->kernel's EFISTUB part is run. I've also had no problems when setting the PreLoader to BootNext through efibootmgr and having the Windows 8 part skipped.

I've applied Matt Fleming's reloc2.patch to graysky's 3.12.9-1-ck and that's also started up fine, too.
Comment by Wim Herremans (herremaw) - Monday, 27 January 2014, 17:31 GMT
@Matthias Beyer, I don't have any experience with git or with the kernel sources. So, I don't think that I am the right person to do so.

Also let me remind you that I have built both booting and non-booting kernels from the same source (3.12.7-2), using the same procedure. So, what is the point in looking for differences in the source? There is no difference between 3.12.7-2 and 3.12.7-2. It just seems to be randomness at work.
Comment by norealname4u (talu2pwal) - Monday, 27 January 2014, 22:58 GMT
@Matthias Beyer: it is harder than that, a working kernel might also have the bug but be unaffected. Ulf supposed it could be an alignment issue which means some writes are going to some part of memory they shouldn't. Compiling the same kernel sources with a different name is enough to trigger the bug or avoid it. We would have to find the very first kernel to be affected by this bug and compare it with the previous working version to have a chance to locate the bug.

@Wim Herremans: This kind of randomness is consistent with writing to some part of memory you're not supposed to, sometimes it has no visible effect if at all and other times it causes unexpected behavior or crashes the whole system. As it happens early in the boot process, we have no feedback, error message or logs to look at, which makes things harder to debug.
Comment by Jakub Schmidtke (sjakub) - Tuesday, 28 January 2014, 08:56 GMT
I have the same problem with 3.13-1 and 3.13-2 on my ThinkPad T440p. 3.12.9-1 works fine.
I am using rEFInd + EFISTUB.
Comment by Kyle Nusbaum (knusbaum) - Tuesday, 28 January 2014, 19:45 GMT
@Matthias Beyer I'm having trouble with 3.12.8-1 on ThinkPad E531 as well. Both gummiboot and UEFI Shell V2 cause kernel to hang.

3.12.6-1 works fine for me.
Comment by Dragomir Ivanov (drago.ivanov) - Sunday, 02 February 2014, 10:47 GMT
@Matthias Beyer, Thinkpad T430s here, with kernel 3.12.8-1-ARCH, I need 2(two) USB mass storage flash drives hooked in, in order to boot into the kernel. But I need to enter the BIOS first, and exit from it without saving the changes. If I try with one mass storage flash drive, it doesn't work. Is there any plan how to track this issue?
Comment by Dragomir Ivanov (drago.ivanov) - Sunday, 02 February 2014, 10:48 GMT
Forgot to tell: gummiboot-41
Comment by Matthias Beyer (musicmatze) - Sunday, 02 February 2014, 11:00 GMT
@Dragomir Ivanov, I'm just a normal user, so I cannot help you!
Comment by Dragomir Ivanov (drago.ivanov) - Sunday, 02 February 2014, 11:08 GMT
@Matthias Beyer, Ah I am sorry. Anyway, is there any development on the issue that you know of? Should I try repo-ck kernel builds?
Comment by Matthias Beyer (musicmatze) - Sunday, 02 February 2014, 11:27 GMT
@Dragomir Ivanov: Sorry, I don't know. Every time a new kernel arrives in the repo, I try to upgrade and have my boot-stick next to my device to downgrade again if it does not work. I don't know anything else except that some kernels are broken for me!
Comment by norealname4u (talu2pwal) - Sunday, 02 February 2014, 13:47 GMT
@Dragomir Ivanov: There is an upstream bug report with more information (but no solution yet) at https://bugzilla.kernel.org/show_bug.cgi?id=68761
You should probably get a kernel that's working for you and build it with a separate naming scheme so you always have a working kernel in your boot menu.

I followed the wiki instructions for compiling a kernel with a modified pkgbase here: https://wiki.archlinux.org/index.php/Kernels/Compilation/Arch_Build_System
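
A minimal sketch of that renaming step, assuming the PKGBUILD copied from ABS sets the package name on a single `pkgbase=` line. The helper `set_pkgbase` and the name `linux-custom` are illustrative; the wiki page remains the reference for the full procedure.

```shell
# Hypothetical helper: rewrite the pkgbase line of a PKGBUILD so the
# resulting kernel package installs next to the stock one.
set_pkgbase() {
    # $1 = path to the PKGBUILD, $2 = new base name, e.g. linux-custom
    tmpfile=$(mktemp) || return 1
    sed "s/^pkgbase=.*/pkgbase=$2/" "$1" > "$tmpfile" && mv "$tmpfile" "$1"
}

# Example (in a private copy of the linux PKGBUILD directory, before makepkg):
#   set_pkgbase ./PKGBUILD linux-custom
```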
Comment by Matthias Beyer (musicmatze) - Thursday, 06 February 2014, 20:06 GMT
norealname4u, Wim Herremans: We could try to get the objdump-diff of the kernels you built! I do not have the required knowledge on this, but we could build kernels, test whether they fail or not, and check the diff of the objdump (if they are built from the same source). Also, we could check if this error occurs only on _every second build_ (or every third or whatever). Maybe this could be a way of figuring out what's wrong! (Sure, this is only possible if there is a difference between two kernels from the same source, I know that! But _if_ there is a difference, we have more problems than just the kernel, I guess...)
Comment by Rich (snugglej) - Thursday, 06 February 2014, 21:15 GMT
@drago.ivanov I am running the ck and mainline kernels without a problem. The mainline is currently in a release candidate for its next major release, so I would highly recommend installing and running the ck kernel. I'm using gummiboot. If you do not want to use the EFI stub and want to always have a working kernel, I would recommend using GRUB, as I have not had one person say the machine stopped booting after a kernel update with the GRUB loader installed. This is because it doesn't use the EFISTUB to load the kernel, from my understanding.
Comment by Wim Herremans (herremaw) - Friday, 07 February 2014, 14:25 GMT
@Matthias Beyer, I agree with you, but I don't know how to do that. If anybody knows how to compare the binary booting and non-booting kernels built from the same source I will gladly provide my self-built kernels.

The kernel 3.12.9-2 installed from the Arch repo boots on my machine. Taking into account norealname4u's suggestion, I have tried to build the same version of the kernel with a different name that I keep in my boot menu in order to have a fallback if an upgrade of the official kernel fails to boot. I have needed several rebuilds to get a booting 3.12.9-2 kernel with name 'vmlinuz-linux-custom'. So, also for version 3.12.9-2, I see the same weirdness that builds from the same source sometimes result in a booting kernel and sometimes don't. The only consistency is that a binary that boots once, always boots and a binary that fails to boot, always fails.
Comment by Matthias Beyer (musicmatze) - Friday, 07 February 2014, 14:37 GMT
Wim Herremans: Can you send me the images? I'll try to get a objdump-diff! You can upload them somewhere, too, and send me the download link! Please provide the information which one works for you and which one doesn't! Please, don't rename the binaries! (Note: I'm not a maintainer, just a normal user, so I don't know how much I can do... I just want to experiment!)
Comment by Wim Herremans (herremaw) - Friday, 07 February 2014, 16:10 GMT
@Matthias Beyer, I have uploaded 3 3.12.7-2 kernels to this location: http://home.scarlet.be/herremaw/
Comment by Matthias Beyer (musicmatze) - Friday, 07 February 2014, 17:08 GMT
Wim Herremans: Are the "bad" and the "good" ones compiled from the same source? The diff is _really_ huge (>3100k lines)! So if this is really the same source, I think this will not lead to a solution...
Comment by Wim Herremans (herremaw) - Friday, 07 February 2014, 17:30 GMT
@Matthias Beyer, They are indeed built from the same source. Only, the time of build is different. I also don't understand why the result is so much different on every build.

I have also double checked right now that I have uploaded the correct images and that they boot or don't boot as indicated.
Comment by Wim Herremans (herremaw) - Saturday, 08 February 2014, 07:41 GMT
@Matthias Beyer, I have added 2 more 3.12.7-2 images that were built on 2014-01-17 and 2014-01-18, one bad and one good kernel. See http://home.scarlet.be/herremaw/

While I am quite sure that all these kernels were built from the same source (3.12.7-2), I am not sure that I did not install updates on my system. I am fairly sure that I did not install updates between the last 2 builds. So, if the differences between the builds stem from the system environment and not from the source, it is possible that the difference between the last 2 builds is smaller.
Comment by Matthias Beyer (musicmatze) - Saturday, 08 February 2014, 09:10 GMT
Wim Herremans: Unfortunately, the diff between the latest two kernels is also 151 MB ^= 3,000k lines. Damn. This does not lead to a solution, I guess...
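
For a rougher measure than an objdump diff, two images can also be compared byte by byte. A minimal sketch using `cmp`; the helper name `count_differing_bytes` and the file names in the example are hypothetical.

```shell
# Hypothetical helper: count the byte positions at which two files differ.
# cmp -l prints one line per differing byte; a length mismatch is reported
# on stderr only, so it is discarded here.
count_differing_bytes() {
    # $1, $2 = the two image files to compare
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example with placeholder file names:
#   count_differing_bytes vmlinuz-good vmlinuz-bad
```

Because the kernel image is compressed, even a small change in the input (an embedded build time, for instance) can alter a large share of the bytes, so a high count does not by itself show that the sources differed.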
Comment by norealname4u (talu2pwal) - Wednesday, 12 February 2014, 01:09 GMT
@Matthias Beyer: Unfortunately I got rid of the non-working kernels I had, I could try to rebuild those if you want.
Comment by Khang Minh (k.minh) - Wednesday, 12 February 2014, 13:42 GMT
With linux 3.12.9 booting via Gummiboot also fails with a blank screen (the live CD also fails). Like some users posted above, try entering the shell v1 a few times and then you will be able to boot into Arch (randomly).

Not sure if 3.13.2 would fix this issue: https://www.archlinux.org/packages/testing/i686/linux/

Grub works though.
Comment by Dariusz Zając (Kazuldur) - Monday, 17 February 2014, 14:35 GMT
It worked for me with linux 3.13.1, it doesn't work again with linux 3.13.2 and 3.13.3.
Comment by Matthias Beyer (musicmatze) - Friday, 21 February 2014, 17:08 GMT
I'm now running kernel 3.13.4 (updated after 28 days uptime from 3.12.6, finally)!

Edit: I'm on a Lenovo Thinkpad X220 , booting with rEFInd!
Comment by xdmx (xdmx) - Saturday, 22 February 2014, 02:36 GMT
I've just upgraded to 3.13.4 and it still doesn't boot. I had to downgrade to 3.12.5 again
As many others I'm using Gummiboot on a Lenovo Carbon X1
Comment by Sean Lynch (seanl) - Saturday, 22 February 2014, 06:13 GMT
Working for me with 3.13.4-1-ARCH and Gummiboot on Lenovo X220.
Comment by norealname4u (talu2pwal) - Monday, 24 February 2014, 12:02 GMT
I just upgraded to the current arch kernel 3.13.5-1 and it fails to boot on my thinkpad x230 with refind albeit in a different manner: this time it shows the starting vmlinuz line followed by kernel parameters before hanging.
Comment by Wim Herremans (herremaw) - Monday, 24 February 2014, 15:04 GMT
Kernels 3.13.4-1 and 3.13.5-1 from repo both boot with rEFInd on my Acer V3-571.
Comment by Eric Siegel (nticompass) - Monday, 03 March 2014, 02:02 GMT
I have a Lenovo Thinkpad T430.

With rEFInd 0.7.7 installed, I can only get 3.12.6 to boot. No later. Any newer (including the 3.13.5) hangs when booting.
Comment by Kyle Nusbaum (knusbaum) - Monday, 03 March 2014, 19:23 GMT
@nticompass -
Same here, but my machine is a Lenovo ThinkPad E531. I'm using gummiboot rather than rEFInd, though.
Even manually loading the kernel via UEFI Shell V2 causes a hang.
Comment by Thorben Krüger (benthor) - Wednesday, 05 March 2014, 12:11 GMT
Lenovo T440s here, using gummiboot.

- linux-3.12.9-2-x86_64 boots without issue
- linux-3.13.3-1-x86_64 boots without issue
- linux-3.13.4-1-x86_64 hangs with a blank screen
- linux-3.13.5-1-x86_64 hangs with a blank screen

"hangs with a blank screen" means that I do not see ANY indication that the kernel even loaded. I hit enter in gummiboot and am greeted with a blank (black) screen. I have to do a hard power cycle of the machine, nothing else works.

Comment by Jakub Schmidtke (sjakub) - Saturday, 08 March 2014, 18:05 GMT
Lenovo T440p; rEFInd.
3.12.9-1 works
3.13.2-3 works
3.13.5-1 doesn't work
3.13.6-1 works
Comment by Alex (nylocx) - Sunday, 09 March 2014, 10:00 GMT
Hi, I just got hit by this bug for the first time on my Lenovo T540p with kernel 3.13.6-1 and gummiboot. All Kernels up to this one worked fine.
I just installed the lts kernel in addition to the latest kernel to be able to boot.
Update: I built the kernel myself with the ABS PKGBUILD file but this didn't help.
Comment by Roelof Rietbroek (Strawpants) - Monday, 10 March 2014, 20:42 GMT
Same here:
3.13.5 worked fine but 3.13.6 hangs on a dell XPS13 Haswell using gummiboot
Comment by Niels-Oliver Walkowski (cutuchiqueno) - Tuesday, 11 March 2014, 10:43 GMT
I had gummiboot working on DELL XPS 13 Sputnik 3 (Haswell) with kernel 3.13.5, after upgrade to kernel 3.13.6 I ran into the same issue
Comment by Eric Siegel (nticompass) - Thursday, 13 March 2014, 12:21 GMT
Lenovo T430, rEFInd.

3.13.5-1 does not work!
3.13.6-1 works!

I can boot my kernel with rEFInd again!
Comment by Simon Pinfold (synap5e) - Saturday, 15 March 2014, 09:54 GMT
HP Envy dv6, both rEFInd and gummiboot

3.13.5-1 does work
3.13.6-1 does not work (blank screen immediately on trying to boot the kernel)
Comment by Matthias Beyer (musicmatze) - Monday, 24 March 2014, 08:18 GMT
3.13.6-1 running here.

Lenovo X220, rEFInd.
Comment by Uuno Turhapuro (durazell) - Monday, 24 March 2014, 16:48 GMT
3.12.6-1 and the latest kernel fail using arch live usb, efi shell v1/v2 and gummiboot. Grub2 (chainloaded from gummiboot) goes past the kernel line and hangs without output after the text "Loading initramfs..". No boot parameters give output. (gigabyte Z77X-UD3H)
Comment by Mike Cloaked (mcloaked) - Tuesday, 25 March 2014, 18:16 GMT
On a recently installed Lenovo Thinkpad S540 with hybrid Intel/ATI graphics, it looks like I have been bitten by this same bug. The boot manager is rEFInd, and prior to this morning kernel 3.13.6 was booting just fine. After a pacman update to 3.13.7 the boot failed just after rEFInd begins to boot the efistub kernel. The blue band appears as usual at the top of the screen, followed by a line informing of booting vmlinuz, but at the next line, "Loading with parameters..." with the quote of the kernel line, it hangs. Downgrading to the previous 3.13.6 kernel boots just fine as normal again. I have not found anything in the logs to narrow down where to look, but I will take a more detailed look at the systemd journal and report back if I find any information of significance.
Comment by Alex (nylocx) - Tuesday, 25 March 2014, 20:49 GMT
For me (Lenovo T540p) 3.13.7 is still not booting with gummiboot; chainloading grub from gummiboot and booting the kernel from there works fine.
Comment by Thomas Bächler (brain0) - Wednesday, 26 March 2014, 00:45 GMT
I just got a brand-new ThinkPad T440s and updated it to the latest firmware. With 3.13.7-1, gummiboot fails, but booting directly via efistub works.

It's a good thing that I can now reproduce this bug myself, but so far it hasn't helped me understand it.
Comment by scott (acp0112) - Wednesday, 26 March 2014, 04:26 GMT
Lenovo w540, gummiboot.

3.12.9-2 works
3.13.4-1 works
3.13.5-1 works
3.13.6-1 hangs
3.13.7-1 hangs
Comment by blash (blash) - Wednesday, 26 March 2014, 12:50 GMT
Lenovo S440, gummiboot

works until 3.13.6-1
fails since 3.13.6-2
Comment by Thomas Bächler (brain0) - Wednesday, 26 March 2014, 13:36 GMT
Can everyone please try booting directly via efistub without using gummiboot or refind? Just create an entry using efibootmgr, see the wiki for details.

I could only reproduce this problem with gummiboot, but my Thinkpad boots fine without gummiboot.
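
A sketch of such a direct entry, for reference. The disk, partition number, label and root device are placeholders, and the helper `efistub_entry_cmd` is hypothetical: it only prints the `efibootmgr` invocation so that it can be reviewed before being run as root. It assumes the kernel and initramfs have been copied to the top level of the ESP.

```shell
# Hypothetical helper: print (not run) an efibootmgr command that creates
# a firmware boot entry pointing straight at the kernel's EFI stub.
efistub_entry_cmd() {
    # $1 = disk holding the ESP, $2 = ESP partition number,
    # $3 = label shown in the firmware boot menu, $4 = root filesystem device
    printf 'efibootmgr -c -d %s -p %s -L "%s" -l /vmlinuz-linux -u "root=%s rw initrd=/initramfs-linux.img"\n' \
        "$1" "$2" "$3" "$4"
}

# Example with placeholder values:
#   efistub_entry_cmd /dev/sdX Y "Arch Linux (efistub)" /dev/sdXZ
```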
Comment by blash (blash) - Wednesday, 26 March 2014, 14:33 GMT
I can confirm this. Booting via efistub without gummiboot works.
Comment by Wim Herremans (herremaw) - Wednesday, 26 March 2014, 18:44 GMT
Kernels 3.13.4-1, 3.13.5-1, 3.13.6-1 and 3.13.7-1 from repo all boot with rEFInd on my Acer V3-571.

I have downgraded to version 3.12.7-2, which does not boot with rEFInd (it boots with GRUB2 though), to see if I could boot this kernel via efistub directly.

I have copied vmlinuz-linux and initramfs-linux.img to EFI System Partition and I have installed a boot entry for it by calling efibootmgr as follows:

efibootmgr -d /dev/sda -p 2 -L efistub -l /vmlinuz-linux -u "root=/dev/sda11 rw initrd=/initramfs-linux.img"

It does not boot and I don't get any error messages. It is just dead. But it does respond to Ctrl-Alt-Delete by rebooting, just as with rEFInd.

I have also tried to run it from the EFI shell, with the same result.

Afterwards, I have upgraded to kernel 3.13.7-1 again and I have copied vmlinuz-linux and initramfs-linux.img to the EFI System Partition again, to see if I could boot this kernel via EFISTUB directly. The result is that it boots normally, thus confirming that the boot entry was configured correctly with efibootmgr.
Comment by scott (acp0112) - Wednesday, 26 March 2014, 20:06 GMT
I was able to boot 3.13.7-1 directly via efistub, whereas it previously hung when using gummiboot.
Comment by Mike Cloaked (mcloaked) - Wednesday, 26 March 2014, 20:34 GMT
Does anyone know what the difference is between an efistub kernel booted directly as opposed to booted via rEFInd or gummiboot? What is it about either boot manager that changes the way the kernel is booted and makes the difference? I am not technically competent to understand those differences, but perhaps developers working on this issue may have a better chance of understanding what needs to be changed in either boot manager to get to a solution?
Comment by Keshav Amburay (the.ridikulus.rat) - Wednesday, 26 March 2014, 22:54 GMT
@brain0: I think the main point of discussion of this bug is the boot hang when efistub is used, regardless of whether it is booted directly (efibootmgr/shell) or via gummiboot/rEFInd etc. The issue at hand seems to be setup_efi_pci() in eboot.c of the kernel source. Commenting out this function seems to solve the issue for most people. If the kernel is booting directly but fails via gummiboot for you, then the issue is in gummiboot and not in the kernel efistub code.
Comment by Thomas Bächler (brain0) - Wednesday, 26 March 2014, 23:08 GMT
If you would actually READ this report and the comments, you'd know that commenting a random function in the kernel code does NOT solve the issue. In fact, nobody knows what "the issue" actually is and all attempts to solve it have turned out to be entire failures.

The main point of this discussion by the way is not the discussion of hanging boots, but a pointless stream of "me too" comments. Your comment however has exceeded the uselessness of those comments. You show complete ignorance at the difficulty and variety of the problem and at the same time act like a smartass trying to lecture others. A few more of such comments and I might just close the bug since keeping it open only serves to fill my inbox, but not to add any useful new information.
Comment by Dragomir (drago) - Thursday, 27 March 2014, 08:06 GMT
brain0, don't close this issue. This is a serious bug, and every bit of information, however irrelevant it may seem, may help.
What I am wondering is whether only Arch Linux users have experienced this problem. If so, we could look into why that is. Maybe it is because we use vanilla versions (kernel, gummiboot, etc.), while others use patched versions.
Probably if Fedora guys had it, the best kernel developers could step in?
Can we summarize information so far?
I myself have this issue on my Thinkpad T430s, which is a pity.
Comment by Thomas Bächler (brain0) - Thursday, 27 March 2014, 09:29 GMT
As for other distributions: They all install GRUB by default, which does not use efistub. The EFI boot managers that are not bootloaders (refind, gummiboot) are only really popular here. For some weird reason, Kay Sievers said somewhere that the problem existed in Fedora 20, but not in Rawhide.

Summary of information so far:

* Booting with EFISTUB may hang the system with no output and no indication of what the error may be.
* This is not bound to any changes in the code, but two builds of the exact same source may result in one booting and one failing kernel.
* This is limited to certain firmware, but present across several vendors.
* It depends on side-effects that are hard to reproduce: Depending on who tries to reproduce it, one of these may be the case:
+ It works without gummiboot but fails with gummiboot.
+ It works when first entering EFI shell before booting.
+ It only works when a USB device is plugged in.
+ It works when chainloading from refind to gummiboot to Linux, but not otherwise.
+ ... (more obscure situations)

None of this bug report's comments has added any new information to that in the last 4 weeks. I actually believed in a common but subtle configuration error, but now that I can reproduce the bug myself (thanks to my new Thinkpad), I am entirely clueless.
Comment by Dragomir Ivanov (drago.ivanov) - Thursday, 27 March 2014, 22:33 GMT
Since this bug is hard to catch and it may take time before a fix appears, can we set up an ArchWiki page on the topic:
"If you use any Thinkpad and want UEFI, go to this ArchWiki page on how to install GRUB chainloader."

Comment by Thomas Bächler (brain0) - Thursday, 27 March 2014, 22:53 GMT
Yes please, the wiki should make users aware of these problems.
Comment by Simon Pinfold (synap5e) - Thursday, 27 March 2014, 23:03 GMT
There already is a warning at the top of the EFISTUB wiki page[1] linking this thread.

[1] https://wiki.archlinux.org/index.php/EFISTUB
Comment by Daniel McLellan (raininja) - Friday, 28 March 2014, 17:33 GMT
I have been affected by this boot issue- I surmise the problem is indeed in eboot.c, and specifically commit dd5fc854de5fd37adfcef8a366cd21a55aa01d3d. I have spoken with Matthew Fleming (kernel dev at Intel) in IRC and posted to lkml.

I had been inadvertently regressing this particular commit with a patch I have been using since 3.7.7.

System: Host: anduril Kernel: 3.12.4-2-hplove x86_64 (64 bit) Desktop: Enlightenment 0.18.99.18202
Distro: Arch Linux
Machine: System: Hewlett-Packard product: HP EliteBook 2570p v: A1009D11
Mobo: Hewlett-Packard model: 17DF v: KBC Version 61.23
Bios: Hewlett-Packard v: 68ISB Ver. F.42 date: 07/17/2013
Comment by norealname4u (talu2pwal) - Thursday, 03 April 2014, 20:16 GMT
Not sure if this is related, but as a new development on my ThinkPad X230 the ck kernel now fails to boot too, albeit with slightly different symptoms: the boot process hangs at the start, right after displaying the kernel parameters.

I use efistub and used the ck kernel line as a safe failover up until the last 2 revisions. I'm now frozen to 3.12.6-1-ARCH which is the last kernel version that boots for me, none of the 3.13 worked.

I'm largely ignorant about the Linux kernel and its boot process, but a boot that hangs before showing anything, or after showing only a few lines of colored pixels, feels like memory corruption of some kind, or maybe wandering into the wrong neighborhood of memory. I probably don't know what I'm talking about, but as the bug happens after the screen refresh and before anything is displayed, I'd start by looking at what happens in this time frame.
Comment by ValdikSS (ValdikSS) - Thursday, 10 April 2014, 10:20 GMT
3.14-4 is not booting on X220 again.
Comment by Max Liebkies (gegenschall) - Thursday, 10 April 2014, 10:26 GMT
And we're back to this whole shebang: 3.14-4 not booting.
Comment by sven (commonuser) - Thursday, 10 April 2014, 17:11 GMT
3.14-4 via EFISTUB on ThinkPad T420 is not booting, again hangs with a blank screen. Booting via EFI shell from the Arch Installation Image works. The 3.13.x version worked fine. Looks like the same problem back in the 3.12.x days.
Comment by Konsi (0x6b) - Thursday, 10 April 2014, 22:46 GMT
3.14-4 rEFInd on ThinkPad X230 is not booting. (Also tried from uEFI console)
Comment by Rich (snugglej) - Friday, 11 April 2014, 04:46 GMT
I am able to boot via GRUB-EFI (100%) since it does not use the same mechanism as Gummiboot/rEFInd (EFISTUB(?)). Are we still thinking it is the kernel that is having the problems, or could it be a mechanism in Gummiboot/rEFInd that is failing, which should be looked at by those devs? Sorry, I'm not very big on how the boot process works exactly.

I have given up on ever getting this fixed and gone back to using GRUB. I would prefer to use Gummiboot, but almost all of the kernels, including some of the AUR kernels such as linux-ck, have stopped working in different builds. Linux-ck was my go-to kernel and only stopped working in the last couple of builds.
Comment by Thomas Bächler (brain0) - Friday, 11 April 2014, 06:12 GMT
According to Matt Fleming, the most likely cause is that the EFI stub code in Linux fails to initialize memory properly. Thus, the bug depends on how the firmware initializes memory and seems to occur randomly in some situations. People have been able to reproduce this bug by starting the EFI stub directly from the firmware, bypassing gummiboot or refind entirely.

I was actually looking into readding efilinux to the repositories, so you could call efilinux from gummiboot to boot the kernel.
Comment by Daniel McLellan (raininja) - Friday, 11 April 2014, 13:01 GMT
the faulty code is in eboot.c, commit dd5fc854de5fd37adfcef8a366cd21a55aa01d3d
Comment by Max Liebkies (gegenschall) - Friday, 11 April 2014, 13:48 GMT
This commit was first merged into 3.8, if I'm not mistaken. This seems unlikely...
Comment by Thomas Bächler (brain0) - Friday, 11 April 2014, 18:08 GMT
Okay, just to narrow this down and maybe provide a workaround: do kernels also fail to boot using EFI handover? For this test, I built and uploaded efilinux packages to https://dev.archlinux.org/~thomas/efilinux/

Install it, then create boot entries for them. I'll use gummiboot as an example:

title Arch Linux (EFILINUX)
efi \EFI\efilinux\efilinux.efi
options xx -f \vmlinuz-linux initrd=\initramfs-linux.img root=/dev/xyz rw ...

The 'xx' after options is weird - if I write 'options -f ...', it doesn't work. This seems like a problem in gummiboot. Just put anything between the 'options' and '-f'. This should work similarly with refind.
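
A minimal sketch of how such an entry could be generated, assuming Python is available. The helper name efilinux_entry and its parameters are hypothetical and not part of gummiboot or efilinux; the only behaviour encoded is the one described above, namely that some filler token has to sit between 'options' and '-f'.

```python
def efilinux_entry(title, kernel, initrd, root, filler="xx"):
    """Render a gummiboot loader entry that chainloads efilinux.efi.

    `filler` is the dummy token placed between 'options' and '-f';
    per the comment above, the entry does not work without one.
    """
    if not filler.strip():
        raise ValueError("a non-empty filler token is required before -f")
    options = f"{filler} -f {kernel} initrd={initrd} root={root} rw"
    lines = [
        f"title {title}",
        r"efi \EFI\efilinux\efilinux.efi",
        f"options {options}",
    ]
    return "\n".join(lines) + "\n"


# Same placeholder values as in the entry above; /dev/xyz is not a real device.
print(efilinux_entry("Arch Linux (EFILINUX)",
                     r"\vmlinuz-linux", r"\initramfs-linux.img", "/dev/xyz"))
```

The output would be saved as an entry file in the loader's entries directory; any additional kernel parameters still have to be appended by hand.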
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 12 April 2014, 00:58 GMT
all tested on lenovo w520
---
bad: linux-3.14-4-x86_64.pkg.tar.xz - stock - gummiboot -> kernel
bad: linux-3.14-4-x86_64.pkg.tar.xz - stock (recompiled -j8) - gummiboot -> kernel
bad: linux-3.14-4-x86_64.pkg.tar.xz - stock (recompiled -j1) - gummiboot -> kernel
good: linux-3.14-4-x86_64.pkg.tar.xz - stock - gummiboot -> efilinux -> kernel
good: linux-3.14-4-x86_64.pkg.tar.xz - stock (recompiled -j8 - no setup_pci) - gummiboot -> kernel
Comment by Jakub Schmidtke (sjakub) - Saturday, 12 April 2014, 02:46 GMT
I did the efilinux.efi test with rEFInd and it works!
None of the 3.14-* kernels worked for me so I was stuck with 3.13.6-1 (the last one that did work).
With the efilinux.efi handover I am able to boot 3.14-5 :)
This is on T440p.
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 12 April 2014, 04:43 GMT
all tested on lenovo w520 (will try the dell xps 13 I usually use for testing this bug on sunday)
---
bad: linux-3.14-4-x86_64.pkg.tar.xz - stock (recompiled -j8 - double check) - gummiboot -> kernel
good: linux-3.14-4-x86_64.pkg.tar.xz - stock (recompiled -j8 - setup_pci_hack2) - gummiboot -> kernel
Comment by Thomas Bächler (brain0) - Saturday, 12 April 2014, 07:01 GMT
@Ulf,Jakub: I was hoping that efilinux helps, since it is less invasive than switching to grub. However, it is still weird as efilinux is basically the same code as EFI stub (written by the same guy, and efilinux was an experiment before implementing the stub).

@Jakub: This is really weird. I am using the T440s; doesn't that even use the same firmware as the T440p? The T440s boots fine all the time.

@Ulf: Can you recompile a kernel with the setup_pci hack several times? Since this bug occurs randomly, you might just have had luck. If this still works after several recompilations, we might be on to something here. If this is the case, you should post this to Matt on the upstream bug report.
Comment by sven (commonuser) - Saturday, 12 April 2014, 14:33 GMT
In the Arch git, a patch [1] targeting EFISTUB was added to 3.14-4. Can someone check whether 3.14-3 still boots? Maybe that patch is causing the problem, even though it is from Fleming himself ;-).

[1] https://projects.archlinux.org/svntogit/packages.git/commit/trunk?h=packages/linux&id=8a0cef5c69929f717a19f9624e4e1f798b53d1f6
Comment by Thomas Bächler (brain0) - Saturday, 12 April 2014, 16:02 GMT
I already received reports that 3.13-3 also failed while 3.13-2 worked. Same mystery as always, unrelated to this patch.
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 12 April 2014, 17:01 GMT
@Thomas: the attached patch does not work reproducibly (this bug is really annoying) ... but this might have been a mistake on my side, as I previously commented out the whole for loop in setup_efi_pci and that worked. My guess is that I failed to install or boot the kernel with only the memcpy part commented out. Now I tried the patch several times and it has not worked one single time. In contrast, commenting out the setup_efi_pci() call or setting nr_pci to 0, so effectively not going through the for loop, works reproducibly every time on my W520. On my Dell XPS 13, commenting out setup_efi_pci() worked every time before. So I am pretty sure this is related to that method.
@all: please try this very old patch: http://pastebin.com/24kvw8kt. To investigate this further I really need some feedback on whether this "fixes" the bug or not.
Comment by Ross Williams (gunzy83) - Tuesday, 15 April 2014, 12:31 GMT
@Ulf I tested the old patch from that pastebin link with my kernel PKGBUILD based on the ABS linux PKGBUILD for 3.13.8-1 and it boots successfully on my XPS13 (L322x, Ivy Bridge i5). The only other patch I have added to that build is for the trackpad driver (cypress_ps2). linux 3.13.7-1 and 3.13.8-1 from the core repo do not boot at all; all 3.13 series kernels up to and including 3.13.6-1 boot fine on my hardware. I have yet to test 3.14, but I will report back when I have. Cheers.
Comment by Thomas Bächler (brain0) - Tuesday, 15 April 2014, 12:39 GMT
@gunzy83: Stop that nonsense and read what this bug is about! That your system boots with this patch is likely to be mere coincidence. I will believe that the patch helps if you build 20 or 30 kernels with that patch and every one of them boots. Comments like that are not helping at all.
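
The request for 20 or 30 builds can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each unpatched build independently fails to boot with some fixed probability p; neither the independence nor any particular value of p is established in this thread. Under that assumption, the chance that n builds in a row boot by coincidence is (1 - p)^n.

```python
import math


def chance_all_boot(p_fail: float, n_builds: int) -> float:
    """Probability that n_builds independent builds all boot by coincidence,
    if each one fails with probability p_fail (an assumed, illustrative value)."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    return (1.0 - p_fail) ** n_builds


def builds_needed(p_fail: float, max_chance: float) -> int:
    """Smallest number of consecutive booting builds for which coincidence
    has a probability of at most max_chance."""
    if not 0.0 < p_fail < 1.0 or not 0.0 < max_chance < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(max_chance) / math.log(1.0 - p_fail))


# Hypothetical failure rates; the real rate differs per machine and is unknown.
for p in (0.1, 0.5):
    print(f"p_fail={p}: 1 build -> {chance_all_boot(p, 1):.3f}, "
          f"20 builds -> {chance_all_boot(p, 20):.2e}, "
          f"builds for <=1% coincidence: {builds_needed(p, 0.01)}")
```

With a hypothetical p of 0.5, a single booting build is a coin toss, while 20 booting builds in a row would be coincidence roughly once in a million tries. The lower the real failure rate, the more builds are needed before a successful run means anything.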
Comment by Ross Williams (gunzy83) - Tuesday, 15 April 2014, 13:00 GMT
@Thomas Fair enough. I was responding to Ulf with some feedback but if that is unhelpful I am sorry. I will continue testing my kernel builds with and without the patch Ulf posted to build a larger sample set to see if there is a pattern or if it is truly random. I have never had a kernel version from core or compiled by me that didn't match the behaviour of the other (both boot or don't boot). This is the first time this has been different for me so please forgive my excitement/enthusiasm.
Comment by sven (commonuser) - Tuesday, 15 April 2014, 19:08 GMT
3.14.1-1 boots fine via EFISTUB on my ThinkPad T420. Because of the following commit in upstream?

commit 579d8f085b5745ea443a7e79b8283178f28981e0
Author: Borislav Petkov <bp@suse.de>
Date: Sat Jan 18 12:48:17 2014 +0100

x86/efi: Make efi virtual runtime map passing more robust
Comment by Phil Schaf (flying-sheep) - Tuesday, 15 April 2014, 20:30 GMT
for me, however, 3.14 is the first version that doesn’t boot. neither .0 nor .1
Comment by Ulf Winkelvos (uwinkelvos) - Friday, 18 April 2014, 20:39 GMT
@Ross: Thanks for verifying this.
@Gerd: this did not solve the issue on my Dell XPS 13
@Thomas: There is absolutely no need to disregard what Keshav and Ross say. I can only repeat what they and I have been saying: commenting out setup_efi_pci() "fixes" the bug every single time on at least some systems. That is exactly why I wanted more sample data, as Matt kind of reacts the same way as you do. But if this works for 10 people instead of just 3, he might be more willing to listen. I put "fixes" in quotes, as I really think this only masks the bug. This bug is caused by some kind of memory corruption, and storing all PCI option ROMs in memory certainly does not help then. When we see that commenting out that function works for more people, Matt or others might find out what exactly gets overwritten by those ROMs and why this happens. (double free?)
Comment by Steven V (steabert) - Wednesday, 23 April 2014, 20:39 GMT
I'm not entirely sure if my problem is similar to this bug, as it has nothing to do with rEFInd mentioned in the original report, but I'm unable to boot since 3.14 using gummiboot. In short: 3.13.8-1 boots, 3.14-1 up to 3.14.1-1 did not boot. With 3.14.1, I continued testing by trying to boot from EFI shell with ignore_loglevel and earlyprintk=efi, which gives the attached output (it's rather large, I don't know if I'm supposed to attach images).

If this does not fit here, or I can help with testing something, please tell me. I tried to read through all the comments but it got a bit messy. I also read through https://bugzilla.kernel.org/show_bug.cgi?id=68761 and it seemed to have stopped at a point where the bug is still not solved. The issue there seems to match my problem well: when trying to boot it hangs without any output.
Comment by Daniel McLellan (raininja) - Tuesday, 29 April 2014, 20:27 GMT
personally, I think this is the same issue that has plagued me since 3.8 ish - furthermore- isn't that contained in the title of this bug?????
Comment by Daniel Micay (thestinger) - Thursday, 08 May 2014, 00:37 GMT
Please try 3.14.3 from [testing] if you're still afflicted by this since that's the first revision with the current set of patches upstream.

I've built at least a dozen vanilla kernels and grsecurity patched kernels with no issues since the code32_start patch was backported for the Arch package (T530 with up-to-date firmware, works with both gummiboot and direct efistub). The LTS kernel (3.10.39) also works fine for the past couple of point releases. It's possible that there are similar bugs on other hardware, but I don't think there are any left on mine. The majority of kernel builds didn't work before.
Comment by Ulf Winkelvos (uwinkelvos) - Friday, 09 May 2014, 01:05 GMT
3.14.2-1 and 3.14.3-1 work on both my systems.
Comment by Jakub Schmidtke (sjakub) - Friday, 09 May 2014, 03:30 GMT
My experiences with 3.14* on Lenovo T440p:
3.14-3 and 3.14-5 - don't work
3.14.1-1, 3.14.2-1 and 3.14.3-1 - all work fine
Comment by Eric Siegel (nticompass) - Friday, 09 May 2014, 22:55 GMT
Kernel 3.14.2-1 works fine with rEFInd on my Lenovo T430 :-)
Comment by JH (centos) - Sunday, 11 May 2014, 14:21 GMT
3.14.2-1 and 3.14.3-1 work fine on my Thinkpad E430. Unlike 3.14.1-1.
Comment by Daniel Micay (thestinger) - Monday, 12 May 2014, 03:16 GMT
While this has definitely improved, I've recently managed to compile a few kernels that work with grub (no efistub) but not gummiboot...
Comment by Mike Cloaked (mcloaked) - Monday, 12 May 2014, 18:18 GMT
For my machine I noticed that Lenovo released a new BIOS today for my Thinkpad S540 and the list of changes is:
Version 1.55
UEFI: 1.55 / ECP: 1.55

(New) Added support for the UEFI DriverOrder feature.
(New) Updated the Diagnostics module to version 2.03.00.
(Fix) Fixed an issue where the power button did not work while the lid was closed.
(Fix) Fixed an issue where UEFI KeyShiftState was not correctly returned for some keys.
(Fix) Fixed an issue where SMBIOS type 15 structure (System Event Log) was incorrect.
(Fix) Fixed an issue where the LCD brightness control might not work on Linux.

I just wonder if the SMBIOS type 15 structure may have any bearing on this efistub issue? I guess it will take a while to see if I get any further boot failures. Recent kernels have booted OK for me with rEFInd, but I will report if there is any problem with updated kernels.

I also noticed that they released a new BIOS for the W540 in the past week also, and again owners of that machine who update might check if that makes any difference also.
Comment by Mike Cloaked (mcloaked) - Thursday, 22 May 2014, 14:39 GMT
Does anyone know the reason for the delayed release of refind-efi version 0.8.1 in the Arch repos? It would be quite important to test further to see if either the newer version of refind or updated BIOS firmware has an impact on this bug.
Comment by Mike Cloaked (mcloaked) - Thursday, 22 May 2014, 16:03 GMT
I tried to build refind-efi using the PKGBUILD from the page at https://www.archlinux.org/packages/extra/x86_64/refind-efi/ - and got a reiserfs fail in the build. I then changed to the UDK2014 instead of 2010 for the tianocore library and that build failed differently with

Make.tiano:46: recipe for target 'refind_x64.dll' failed
make[1]: *** [refind_x64.dll] Error 1
make[1]: Leaving directory '/home/mike/Documents/install_stuff/arch/refind-debugging/builds/src/refind-0.8.1/refind'
Makefile:34: recipe for target 'tiano' failed
make: *** [tiano] Error 2
==> ERROR: A failure occurred in build().
Aborting...

Has anyone been able to build refind-efi version 0.8.1 so that it can be tested?
Comment by Mike Cloaked (mcloaked) - Sunday, 01 June 2014, 18:03 GMT
The gnu-efi build for rEFInd started building correctly after an update to the gnu-efi-libs package a few days ago, so the Arch refind-efi package is now up to date at version 0.8.1, the latest upstream version. The tianocore build is still failing as far as I know. So it would be useful to know if there are any machines that still have a boot problem with the newest build?
Comment by Jakub Schmidtke (sjakub) - Monday, 02 June 2014, 02:42 GMT
I am still having this issue with kernel 3.14.5-1 and refind-efi 0.8.1-1.
Comment by Daniel McLellan (raininja) - Wednesday, 04 June 2014, 23:44 GMT
@uwinkelvos: that is the code that my kernel bisection revealed as faulty.
Comment by xdmx (xdmx) - Saturday, 07 June 2014, 12:14 GMT
I've upgraded to 3.14.5-1 and it's working. I use Gummiboot on a Lenovo Carbon X1
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 11 June 2014, 20:16 GMT
Have not had any problems from 3.14.2-1 on to testing/3.15.0-1. But this happened before...

@Daniel: if you still run into trouble, you might try the patch [1] I posted at [2]. It does not fix the problem but improves the logging in combination with "ignore_loglevel earlyprintk=efi" as kernel parameters.

[1] https://bugzilla.kernel.org/attachment.cgi?id=134131
[2] https://bugzilla.kernel.org/show_bug.cgi?id=68761
Comment by Eric Siegel (nticompass) - Friday, 13 June 2014, 02:26 GMT
3.14.6-1 loads just fine with rEFInd 0.8.1-1 on my Lenovo T430.
Comment by Sam Stuewe (HalosGhost) - Wednesday, 18 June 2014, 17:43 GMT
I can confirm that UEFI boot with gummiboot/syslinux silently fails with 3.15.1-1 (3.15.0-1 still boots fine).
Comment by Matt (mattdcm) - Thursday, 19 June 2014, 13:37 GMT
On my Lenovo Ideapad Y500, 3.15.1-1 fails silently when booted with gummiboot or directly via EFISTUB, but works with refind or grub. I have never experienced this problem before now.
Comment by Ross Williams (gunzy83) - Saturday, 21 June 2014, 09:00 GMT
No boot with stock 3.14.6-1, but stock 3.15.1-1 boots fine on my XPS13 L322x. My custom kernel worked with 3.14.6-1 but not 3.15.1-1. This is still with Gummiboot or EFISTUB. BIOS is version A09.

@Ulf I added your patch to your custom kernel and it now boots but takes a while to do so as it logs everything. I have no idea what I am looking at in the logs before the normal kernel log lines, anything I can do to help?
Comment by Alex (nylocx) - Saturday, 21 June 2014, 09:08 GMT
Problem is here again with kernel 3.15.1-1-ARCH on my Lenovo ThinkPad T540P booting with gummiboot and efistub. It was gone between 3.13.7 + 1 and last 3.14 kernel.
Comment by Ulf Winkelvos (uwinkelvos) - Sunday, 22 June 2014, 05:43 GMT
3.15.1-1 is good on my XPS 13. I missed out on 3.14.6-1, so I can't say if that would have worked.

@Ross: The logging patch is only really useful, when it is compiled in a kernel that does not boot.
Comment by Ross Williams (gunzy83) - Sunday, 22 June 2014, 07:30 GMT
@Ulf Yep, that is what makes this weird. My custom 3.15.1-1 kernel (ARCH + single patch to cypress_ps2.c) booted with your patch but will not boot without it on my system. 3.15.1-1 stock boots fine however. Just tried Refind and my custom 3.15.1-1 boots fine as well.
Comment by Tomas Coufal (Tumi) - Sunday, 22 June 2014, 11:23 GMT
Same here as @Alex. Problem appears now with 3.15.1-1 kernel and gummiboot, but never before. I'm using Lenovo laptop (T540p) too.
Comment by Mike Cloaked (mcloaked) - Sunday, 22 June 2014, 16:17 GMT
Does anyone have any information about how to get some real diagnostic information "before" the kernel even loads - i.e. how to diagnose what is happening during the very earliest stages of the boot process for any of the available bootloaders or boot managers i.e. gummiboot, refind, or for direct boot of the efistub. This problem seems to cause a failure before the kernel has had a chance to start executing, and it is seemingly random as to which kernel, which bootloader, and which hardware is affected at any time. There seems to be no consistent factor pointing to a cause and the continued reports of one kernel booting and another failing to boot does not give the developers any real information to go on. It has been suggested that a memory misalignment of some kind during the first stages of loading the efistub kernel is a possible reason for the failed boot when it occurs but it would appear to me that a way of getting a memory dump of some kind right at the point the kernel loads, for both successful and failed boots for a specific machine and a specific kernel "might" just provide some real data to work on. I did wonder if a cloud survey for people to enter their hardware, kernel version, bootloader and fail/success as a large table might give some slight clues but I now doubt that it would.

So has anyone got any experience with diagnosing really early boot problems as this bug relates to? There are a lot of experts who have been looking into this but so far all attempts at triage have not led to a solution.
Comment by Celti Burroughs (Celti) - Sunday, 22 June 2014, 18:33 GMT
3.15.1-1 fails on my Lenovo Z710 with Gummiboot. First time it's happened to me on this laptop.
Comment by Sam Stuewe (HalosGhost) - Friday, 27 June 2014, 20:27 GMT
With the update to 3.15.2-1 in [testing], this bug has disappeared again.
Comment by Mike Cloaked (mcloaked) - Friday, 04 July 2014, 20:53 GMT
I was trying to find some information about UEFI debugging, and maybe the link at http://wiki.osdev.org/UEFI offers a way forward for any user who finds a kernel that won't boot via the EFI stub loader. By compiling the boot manager with debugging turned on and setting up a connection for debug output to be recorded, there is a chance that some information might be captured that will give a vital clue as to what is happening when the EFI stub fails to load the kernel.

I also found there are links to edk2 debug information at https://uefidk.com/sites/default/files/UDK_Debugger_Tool_User_Manual-SR1_v1_10.pdf which is a link from https://uefidk.com/develop/intel-uefi-tools-and-utilities/intel-uefi-development-kit-debugger-tool

For quite some time I have not had an efi stub kernel fail to load on my uefi machines, but if someone has a recent kernel that won't load in rEFInd or gummiboot then although this possible route to debug will take time and effort to set up maybe it will give the first positive information about what is going on with this bug?
Comment by Ulf Winkelvos (uwinkelvos) - Sunday, 06 July 2014, 17:29 GMT
linux-3.15.2-1 : good
linux-3.15.3-1 : bad
linux-3.15.3-1 (rebuild) : good
linux-3.15.3-1 (rebuild, err logging patched) : good

With my error logging patch applied and earlyprintk=efi, I can see that setup_pci always fails on my Dell XPS 13 FHD IVB, but the kernel still boots. I uploaded the kernel (stock Arch 3.15.3-1 with [1]) here: [2]. If someone wants to try the kernel, watch out for the very first logging messages.

@Tobias, Thomas: Could we apply [1] to the stock arch kernel? I will ask Matt to include it in mainline, but i guess this will take some time.

[1] https://bugzilla.kernel.org/attachment.cgi?id=134131
[2] https://mega.co.nz/#!FQB3BIbL!zu4BnrziYVi30ueC-Coop8s3w-q8RWZL4vEeHmW4B0c
Comment by Murari Soundararajan (halfwit) - Thursday, 10 July 2014, 08:49 GMT
I have slightly more than just a "me too" to add. I don't know if this might help anyone but for me (HP Envy, Intel firmware), this problem isn't restricted to kernel versions or builds - on the same installed kernel (3.15.1-1), with nothing changed between reboots, sometimes I can boot and sometimes boot fails (exactly the same symptoms as everyone above). Has anyone else checked if _this_ is why some versions of the kernel seem to work and some don't?
Comment by Thomas Bächler (brain0) - Thursday, 10 July 2014, 16:12 GMT
Comment by Ulf Winkelvos (uwinkelvos) - Thursday, 10 July 2014, 19:11 GMT
Tried it on top of linux-git together with my logging patch [2], and although setup_efi_pci() still fails, my XPS 13 booted up just fine. I will try both patches on top of 3.15.4-1 now.

[2] https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/patch/?id=b50a7cee9532e9b17e8a90b022121188c9fc718c
Comment by Ulf Winkelvos (uwinkelvos) - Friday, 11 July 2014, 00:11 GMT
Built two kernels with both patches applied on top of 3.15.4-1 and both work fine. (setup_efi_pci() is still failing though.) You are very welcome to try out the src [1] and the clean room build kernel package [2].

[1] https://mega.co.nz/#!VlcEjDoK!51qTJvZHs5w6s4WFblLSS8cBXyPwuJqS9Ih2ado7ufc
[2] https://mega.co.nz/#!gosH0QKa!pnBrFsvVtyKHRkKnA6G2ZW7dM_Ea4WBLlhGGHEj5Cbo
Comment by Jakub Schmidtke (sjakub) - Friday, 11 July 2014, 02:58 GMT
I tried the kernel linked by Ulf, and it boots fine on T440p.
Comment by Tobias Powalowski (tpowa) - Friday, 11 July 2014, 06:48 GMT
Added Thomas's patch to 3.15.5-2 in [testing], please give feedback if this is the final fix for this issue.
Comment by Steven V (steabert) - Friday, 11 July 2014, 13:59 GMT
For me, 3.15.5-2 still doesn't work (Dell Latitude E4310).
Comment by Sean Lynch (seanl) - Friday, 11 July 2014, 18:31 GMT
After having no problems for many kernel versions, with the last working version being 3.15.3, I am getting a blank screen again with 3.15.5 and gummiboot on my Lenovo Thinkpad X220. Downgrading back to 3.15.3 has made the problem go away for now, but I'm going to have to put the kernel version on hold again, which makes me sad.
Comment by Daniel Micay (thestinger) - Friday, 11 July 2014, 18:33 GMT
3.15.5-1 or 3.15.5-2? Please try testing/linux if you haven't already.
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 12 July 2014, 11:59 GMT
@Tobias: 3.15.5-2 boots up fine, but although I really think his patch will solve the problem for some systems, the history of this bug has proved us wrong so many times. Would you therefore please add my patch too? Additional logging does not hurt at this point.

@Steven: the kernel I posted above does not boot on your system, does it?

Cheers, Ulf
Comment by Steven V (steabert) - Saturday, 12 July 2014, 12:59 GMT
@Ulf: the 3.15.4-2 kernel you posted above does not boot either. It does print "setup_efi_pci() failed" instead of giving just a blank screen, but sits there and does nothing after that.
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 12 July 2014, 13:17 GMT
thats interesting, though...
Comment by Ulf Winkelvos (uwinkelvos) - Saturday, 12 July 2014, 15:00 GMT
Is anyone else, apart from Steven and me seeing this message with my kernel, if it's booting or doesn't?
Comment by Max Liebkies (gegenschall) - Tuesday, 15 July 2014, 10:53 GMT
Kernel 3.15.5-1 brings me back to this bug after a long time of working kernels. Have to resort to booting with EFILINUX. Hardware: Lenovo X220. [I won't bother tracing it down again, this gets too annoying and tedious.]
Comment by Daniel Micay (thestinger) - Tuesday, 15 July 2014, 15:15 GMT
@gegenschall: Try the 3.15.5-2 package in [testing], it includes a fix for this. There may be other issues, but it does fix some of them.
Comment by Dmytro Kostiuchenko (edio) - Thursday, 17 July 2014, 17:26 GMT
Can confirm booting issue with 3.15.5 on Lenovo X220. linux-3.15.3 works fine
Comment by Mike Cloaked (mcloaked) - Thursday, 17 July 2014, 20:20 GMT
In the last comment does your problem booting for the X220 refer to using kernel 3.15.5-1 from [core] which does not have the patches referred to by Tobias, or is it with the kernel 3.15.5-2 which is in [testing]?
Comment by Dmytro Kostiuchenko (edio) - Friday, 18 July 2014, 06:21 GMT
2Mike
That was 3.15.5-1 from [core].

Will check 3.15.5-2 today and let you know if it works.
Comment by Max Liebkies (gegenschall) - Friday, 18 July 2014, 09:30 GMT
3.15.5-2 from testing works just fine, whereas 3.15.5-1 from core doesn't. Seems this is finally fixed! Big yay!
Comment by Mike Cloaked (mcloaked) - Friday, 18 July 2014, 09:54 GMT
It will indeed be great if the patch that Thomas Bächler reported on Thursday, 10 July that is now in 3.15.5-2-ARCH does fix this issue but, given the random nature of the appearance of problematic boots, caution is needed before any celebration, and it is important to wait and see if any users report that for their own systems the boot problem remains with this new kernel. If, over a period of time, any further reports concerning booting the efi stub kernel continue to arrive, then further work will be necessary on this issue. In my own case my 4 machines that boot with the efi stub loader in the kernel have not had a problematic efi stub boot for some time, so if no further reports that anyone has the problem with efi stub boot over a couple of weeks, and also including the next kernel build after 3.15.5-2-ARCH as well, then confidence may grow that the bug has indeed been fixed. So it needs as many users as possible to boot the new kernel and report if the problem occurs before this bug can be closed.
Comment by Dmytro Kostiuchenko (edio) - Friday, 18 July 2014, 12:32 GMT
3.15.5-2 from testing booted just fine for me on Lenovo x220. Using it for an hour, haven't noticed any issues so far.
Comment by Jakub Schmidtke (sjakub) - Friday, 18 July 2014, 15:04 GMT
Lenovo T440p: Versions 3.15.5-1, 3.15.5-2 and 3.15.6-1 all work fine...
Comment by Mike Cloaked (mcloaked) - Friday, 18 July 2014, 20:44 GMT
Just for completeness, and to make it easier to find, I am putting the link to the current latest comment which is from eight days ago in the upstream kernel bug report which is at https://bugzilla.kernel.org/show_bug.cgi?id=68761#c105 even though way back in this thread there are links to the same upstream report.
Comment by Mike Cloaked (mcloaked) - Tuesday, 29 July 2014, 08:49 GMT
With the next kernel now released (3.15.7-1-ARCH) if there are no further reports of this issue with the new kernel, and also when 3.16 is released then is that an appropriate time to close this bug?
Comment by Ulf Winkelvos (uwinkelvos) - Wednesday, 30 July 2014, 20:49 GMT
I agree. If the first few releases of 3.16 are good, we should close this bug and in case it is not fixed open new upstream bugs like Matt suggested.
Comment by David Rheinsberg (dvdhrm) - Thursday, 31 July 2014, 23:25 GMT
This should be fixed for good now (see patch below which is now upstream). Can we close this?


commit c7fb93ec51d462ec3540a729ba446663c26a0505
Author: Michael Brown <mbrown@fensystems.co.uk>
Date: Thu Jul 10 12:26:20 2014 +0100

x86/efi: Include a .bss section within the PE/COFF headers

The PE/COFF headers currently describe only the initialised-data
portions of the image, and result in no space being allocated for the
uninitialised-data portions. Consequently, the EFI boot stub will end
up overwriting unexpected areas of memory, with unpredictable results.

Fix by including a .bss section in the PE/COFF headers (functionally
equivalent to the init_size field in the bzImage header).

Signed-off-by: Michael Brown <mbrown@fensystems.co.uk>
Cc: Thomas Bächler <thomas@archlinux.org>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
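
To check whether a given kernel image already declares such a section, one can list the sections in its PE/COFF headers. The following is a minimal diagnostic sketch, not upstream tooling and not the fix itself: it assumes the image was built with the EFI stub (so it starts with an MZ header), and the helper names pe_sections and has_bss as well as the example path are illustrative assumptions.

```python
import struct


def pe_sections(image: bytes):
    """Return (name, virtual_size, raw_size) for every section declared
    in the PE/COFF headers of `image`."""
    if image[:2] != b"MZ":
        raise ValueError("no MZ header; not a PE/COFF image")
    # Offset 0x3c of the DOS header holds the file offset of the PE signature.
    (pe_off,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # 20-byte COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    coff = pe_off + 4
    _machine, nsections, _ts, _symtab, _nsyms, opt_size, _chars = \
        struct.unpack_from("<HHIIIHH", image, coff)
    # The section table follows the optional header; each entry is 40 bytes.
    table = coff + 20 + opt_size
    sections = []
    for i in range(nsections):
        name, vsize, _vaddr, raw_size = struct.unpack_from(
            "<8sIII", image, table + 40 * i)
        sections.append(
            (name.rstrip(b"\0").decode("ascii", "replace"), vsize, raw_size))
    return sections


def has_bss(image: bytes) -> bool:
    """True if the headers declare a section named .bss."""
    return any(name == ".bss" for name, _vsize, _raw in pe_sections(image))


# Example use (the path is an assumption; adjust it to the kernel in question):
#   with open("/boot/vmlinuz-linux", "rb") as f:
#       print(pe_sections(f.read()))
```

A missing .bss entry only indicates that the image predates this commit; it does not by itself predict whether that particular build will boot.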
Comment by Federico (fedev) - Thursday, 31 July 2014, 23:43 GMT
I downloaded the Arch installer today and I'm facing what I believe is the same issue expressed here. When booting the installer using EFI, I either receive a blank screen (you can see some dots at the top, most likely the terminal's output) or I see the screen starting in the middle of the screen and wrap around the screen to the other side (https://app.younited.com/?shareObject=75e984b3-1c7c-383b-f6bd-3b6be81a5f55).

This is the installer I used:

archlinux-2014.07.03-dual.iso

Could the installer be fixed please?

Thanks!

Edited: to add installer name
