FS#29957 - Boot failure after Kernel Upgrade

Attached to Project: Arch Linux
Opened by Curtis (foxcm2000) - Sunday, 20 May 2012, 19:26 GMT
Last edited by Thomas Bächler (brain0) - Monday, 21 May 2012, 15:36 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Thomas Bächler (brain0), Dave Reisner (falconindy)
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

I have a small Atom server (SUPERMICRO X7SPA-H with integrated Atom) that has been running fine since 2010 and had about 130 days of uptime since its last reboot. I did a standard full-system upgrade today and the system failed to come back up after the reboot.

What happens:
The system starts up fine, GRUB comes up and gives the standard kernel choices. The boot process shows a brief message about loading the kernel and then proceeds *very* rapidly to a login prompt that does not actually let you log in. The system cannot be rebooted using Ctrl-Alt-Delete, since it complains that it can't find files on the drive.

I think the problem is that the kernel cannot mount the partitions on the boot drive correctly, and therefore can't load anything from the boot disk. I think that an error is displayed on the boot screen, but it disappears too quickly for me to read and it never gets logged to disk. The GRUB menu defaults to disk-by-uuid, but manually putting in /dev/sda1 does not help. The system is able to find a boot partition and load the kernel, but it fails right afterwards (I'm not even sure it can run init).

I don't think that the issue is with the disk. When I boot from an Arch installer USB stick, I can mount the boot drive just fine and all the files are there. I've run fsck and it comes back clean. I can even get the machine mostly running via chroot, but it cannot boot on its own anymore.

Kernel version: 3.3.6-1. I also tried a fallback to 3.2.9 but that didn't help.

Have there been any major changes to the boot process in the last 4 months that would cause this failure? This is not my only Arch machine, but I haven't had any issues like this when upgrading other machines. This is a headless server that does not have much running on it (no X server, no D-Bus, no wireless networking, etc.).

Closed by Thomas Bächler (brain0)
Monday, 21 May 2012, 15:36 GMT
Reason for closing: Not a bug
Additional comments about closing: Unsupported setup.
Comment by Tobias Powalowski (tpowa) - Monday, 21 May 2012, 13:49 GMT
Just a guess: this sounds like an issue with your initramfs. Could it be that /boot is a separate partition and it was not mounted while updating?
Comment by Thomas Bächler (brain0) - Monday, 21 May 2012, 14:02 GMT
Your information is very contradictory. You claim the system doesn't boot, but then say you get a login screen - which one is it? Please clarify what is actually happening before we proceed.
Comment by Curtis (foxcm2000) - Monday, 21 May 2012, 15:26 GMT
Initramfs: Could be an issue, but I'm not sure what the issue is. I've already reinstalled the kernel packages just in case something got messed up. I have a single partition that includes /boot as well as /bin, /sbin, and /lib, so the basic utilities should all be available. The contents of /var and /usr are stored on different partitions though.
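A quick way to confirm a layout like the one described above is to list the mount points that have their own entry in /etc/fstab. The following is only a sketch: the helper names `separate_mounts` and `usr_is_separate` are made up for illustration, and it assumes the usual whitespace-separated fstab fields with the mount point in the second column.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any Arch tool.

# Print every mount point (other than /) that has its own line in the
# fstab-style file given as $1. Comment lines and swap entries are skipped
# because their second field does not start with "/".
separate_mounts() {
    awk '$1 !~ /^#/ && NF >= 2 && $2 ~ /^\// && $2 != "/" { print $2 }' "$1"
}

# Succeed (exit status 0) if /usr is listed as a separate mount point.
usr_is_separate() {
    separate_mounts "$1" | grep -qx '/usr'
}
```

For example, `usr_is_separate /etc/fstab && echo "/usr is on its own partition"` run from the chroot would report the situation described in this comment.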

Your information is very contradictory. --> I agree, I've never seen anything like this before and I've been using Linux for over 10 years. Here's what is happening:
1. GRUB works inasmuch as it comes up and I can choose the kernel to boot.
2. For about a second I get the standard display that comes up when loading a kernel. From my understanding, that means the system can at least get access to the disk partition holding the kernel and read the vmlinuz file.
3. Immediately after the kernel message pops up, the screen jumps to a login prompt. None of the usual services begin loading like they normally would. I think there is some sort of error being displayed, but the initial screen jumps to the login prompt so quickly that you can't read the error, and it never gets logged to disk anywhere. The actual login prompt fails immediately after you type in any username. That is, instead of prompting for a password the login simply clears and prompts for another username. I can sometimes see a flash of some sort of error message, but it disappears so quickly that I can't read it.
Comment by Thomas Bächler (brain0) - Monday, 21 May 2012, 15:35 GMT
/usr on a separate partition is not supported (and hasn't been for a long time, although things (mostly) worked until recently). The topic has been discussed to death on mailing lists and in the forums, and there are workarounds.

The error messages you (don't) see are most likely bash (the interpreter for the init scripts) failing to find its shared libraries.
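One way to check this guess from the live medium or chroot is to look at where the dynamic linker resolves bash's libraries. The sketch below filters `ldd`-style output for paths under /usr; the function name `libs_under_usr` is hypothetical, and which libraries actually live under /usr depends on the installed package versions.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print every
# resolved library path that lives under /usr.
libs_under_usr() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\/usr\//) print $i }'
}
```

Typical use would be `ldd /bin/bash | libs_under_usr`. Any path it prints is a library that is unavailable until /usr is mounted.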

This is what you need to do:
- Boot from a live medium.
- Mount everything (including /boot, /var, /usr, /dev, /proc, /sys) and chroot into the installed system.
- Add the 'usr' hook to /etc/mkinitcpio.conf (the 'shutdown' hook is also useful, so that /usr is unmounted cleanly on shutdown).
- Rebuild the initramfs (mkinitcpio -p linux).
- Profit.
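
The steps above can be sketched roughly as follows. This is not an official procedure: the helper `add_hook` is made up for illustration, the partitions other than /dev/sda1 are placeholders, and it assumes GNU sed and a single active `HOOKS=` line in either the quoted or the parenthesized form. mkinitcpio ships a `usr` hook for a separate /usr, and `shutdown` can be added the same way. The helper simply appends at the end of the list, so check the resulting order by hand.

```shell
#!/bin/sh
# Hypothetical helper: append hook $2 to the HOOKS line of the
# mkinitcpio.conf-style file $1, unless it is already listed there.
add_hook() {
    conf=$1
    hook=$2
    # Nothing to do if the hook is already on the active HOOKS line.
    if grep -Eq "^HOOKS=.*[ \"(]$hook[ \")]" "$conf"; then
        return 0
    fi
    # Insert the hook just before the closing quote or parenthesis.
    sed -i -E "/^HOOKS=/ s/[[:space:]]*([\")])[[:space:]]*\$/ $hook\\1/" "$conf"
}

# Outline of the full recovery from a live medium (left commented out
# because the device names below are placeholders):
#   mount /dev/sda1 /mnt                  # root, which also holds /boot here
#   mount /dev/sdXN /mnt/usr              # placeholder for the /usr partition
#   mount /dev/sdXM /mnt/var              # placeholder for the /var partition
#   for d in dev proc sys; do mount --bind /$d /mnt/$d; done
#   add_hook /mnt/etc/mkinitcpio.conf usr
#   add_hook /mnt/etc/mkinitcpio.conf shutdown
#   chroot /mnt mkinitcpio -p linux
```

Running `add_hook` twice with the same hook leaves the file unchanged the second time, so the sketch is safe to repeat.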

Any further information can be obtained from the forums or mailing lists, or by flooding Dave with emails.
