FS#9014 - mkinitcpio: boot fails after kernel upgrade, filesystem module missing

Attached to Project: Arch Linux
Opened by robin wood (dninja) - Monday, 24 December 2007, 15:57 GMT
Last edited by Roman Kyrylych (Romashka) - Saturday, 06 September 2008, 07:33 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Aaron Griffin (phrakture)
Thomas Bächler (brain0)
Architecture i686
Severity Medium
Priority Normal
Reported Version 2007.08-2
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description:

I've been having this problem for a while: if I upgrade the kernel or hwdetect, filesystem and bash, my server refuses to boot. The only way I can get it to boot is to boot a live CD, chroot into the existing system and roll back any of the 4 upgraded packages.

I've just tried upgrading the kernel to kernel26-2.6.23.12-3-i686.pkg.tar.gz. If I boot in normal mode it starts booting and runs the udev init scripts, then I get what looks like a memory dump which finishes with this line:

modprobe exited with preempt_count 1

After a second or so with this on the screen I get another dump which ends with this line:

[<0104482>] sysenter_past_espx 0x6b/0xa1

This whole message then just repeats.

If I boot the fallback kernel I get this:

Attempting to create root device '/dev/sda5'
ERROR: Failed to parse block device name for '/dev/sda5'
unknown
ERROR: root fs cannot be detected. Try using the rootfstype= kernel parameter.
Waiting for devices to settle...done

Root device '/dev/sda5' doesn't exist, attempting to create it
ERROR: failed to parse block device name for '/dev/sda5'
ERROR: unable to create/detect root device '/dev/sda5'
Dropping to a recovery shell... type 'exit' to reboot

Even after waiting a while there are no /dev/sda devices created.

The working packages are:

hwdetect-0.8-11-i686.pkg.tar.gz
bash-3.2.025-2-i686.pkg.tar.gz
kernel26-2.6.23.8-1-i686.pkg.tar.gz
filesystem-2007.11-2-i686.pkg.tar.gz

and so far these versions have caused problems:

hwdetect-0.9-1-i686.pkg.tar.gz
filesystem-2007.11-3-i686.pkg.tar.gz
bash-3.2.025-4-i686.pkg.tar.gz
kernel26-2.6.23.12-3-i686.pkg.tar.gz
kernel26-2.6.23.9-1-i686.pkg.tar.gz

Different combinations of the above packages cause different errors; the one described above is for the working hwdetect, bash and filesystem with the 2.6.23.12-3 kernel.

The server is a shuttle pc with no special hardware or anything custom or unusual in it. I'm not running any custom scripts on boot.

My grub entries are:

# (0) Arch Linux
title Arch Linux
root (hd0,1)
kernel /vmlinuz26 root=/dev/sda5 ro
initrd /kernel26.img

# (1) Arch Linux
title Arch Linux Fallback
root (hd0,1)
kernel /vmlinuz26 root=/dev/sda5 ro
initrd /kernel26-fallback.img

/boot is a different partition in case that matters.

As I say, I've had this a few times with different combinations of packages, but I haven't had time to work out which packages are actually causing the problems. All I know is that when I roll back these 4 packages to the older ones it boots and it all works.

If you want any logs or any extra info please ask. I'm reluctant to upgrade the packages again and cause the problem but will if I need to as I can restore it (it is just a pain!)

Closed by  Roman Kyrylych (Romashka)
Saturday, 06 September 2008, 07:33 GMT
Reason for closing:  Fixed
Comment by Dr. Markus Waldeck (waldeck) - Friday, 04 January 2008, 11:36 GMT
I have the same problem. Installed packages:
hwdetect 0.9-1
bash 3.2.025-4
kernel26 2.6.23.12-3
filesystem 2007.11-3
Comment by Roman Kyrylych (Romashka) - Thursday, 10 January 2008, 23:12 GMT
Weird.
AFAIR hwdetect is not used by anything except the installer. I doubt bash or filesystem could be the cause of the bug.
Could it be broken image generation by mkinitcpio? :-/
Comment by robin wood (dninja) - Thursday, 10 January 2008, 23:17 GMT
If you want me to try any experiments I can, but each time I do it takes me 10 mins to recover, as I have to boot the live CD and roll back packages, so please try to get as much out of each test as possible.
Comment by Roman Kyrylych (Romashka) - Thursday, 10 January 2008, 23:25 GMT
I cannot give you any advice on this weird issue.
Can you tell (by memory) what other error messages were (with other combinations of packages) and if they differed much?
Are you sure your filesystem is not corrupted?
Comment by Roman Kyrylych (Romashka) - Thursday, 10 January 2008, 23:27 GMT
Also, could you attach your rc.conf and mkinitcpio.conf?
Comment by robin wood (dninja) - Thursday, 10 January 2008, 23:38 GMT
Sorry, it's been a while since I had the error. The only other error message I can remember was similar to the one at the top of this post: http://bbs.archlinux.org/viewtopic.php?id=40478 . That was probably with kernel26-2.6.23.9-1-i686.pkg.tar.gz, as I had that error, rolled back, left it for a while, then tried again with the latest kernel.

I had this on two machines which are both running fine so the filesystem seems to be ok.

mkinitcpio.conf
-----
MODULES="pata_via ata_generic sata_via"
BINARIES=""
FILES=""
HOOKS="base udev autodetect pata scsi sata usb net keymap encrypt filesystems"

rc.conf
-------
LOCALE="en_GB"
HARDWARECLOCK="localtime"
TIMEZONE="GB"
KEYMAP=uk
CONSOLEFONT=
CONSOLEMAP=
USECOLOR="yes"

MOD_AUTOLOAD="yes"
MOD_BLACKLIST=()
MODULES=(8139too mii capability dm-crypt aes-i586)
USELVM="no"

HOSTNAME="thorbardin"
lo="lo 127.0.0.1"
eth0="eth0 192.168.0.8 netmask 255.255.255.0 broadcast 192.168.0.255"
INTERFACES=(lo eth0)
gateway="default gw 192.168.0.254"
ROUTES=(gateway)

DAEMONS=(syslog-ng network sshd ntpd dhcpd named portmap netfs crond fam postfix authdaemond courier-imap spamd xinetd fetchmail nfslock nfsd httpd mysqld squid samba vmware)
Comment by Dr. Markus Waldeck (waldeck) - Friday, 11 January 2008, 07:43 GMT
I solved the problem!
mkinitcpio.conf:
MODULES="ata_generic ata_piix ext3"
mkinitcpio -g /boot/kernel26.img
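As a rough sketch only, the edit described above could be scripted. The helper name add_initcpio_module and its arguments are hypothetical (nothing like it ships with mkinitcpio), and it assumes the single-line, double-quoted MODULES="..." form seen in the configs in this report:

```shell
# add_initcpio_module CONF MODULE   (hypothetical helper, not part of mkinitcpio)
# Append MODULE to the MODULES="..." line of CONF unless it is already listed.
# Assumes one double-quoted MODULES line; a file without one is left unchanged.
add_initcpio_module() {
    conf=$1
    mod=$2
    # Current value between the quotes of the MODULES line.
    current=$(sed -n 's/^MODULES="\(.*\)"$/\1/p' "$conf")
    case " $current " in
        *" $mod "*) return 0 ;;   # already present, nothing to do
    esac
    if [ -n "$current" ]; then
        new="$current $mod"
    else
        new=$mod
    fi
    # Write through a temporary file so a failed sed cannot truncate CONF.
    tmp=$(mktemp) || return 1
    sed "s/^MODULES=\".*\"\$/MODULES=\"$new\"/" "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

It would be called as, for example, add_initcpio_module /etc/mkinitcpio.conf ext3, followed by the mkinitcpio -g /boot/kernel26.img step quoted above to rebuild the image.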
Comment by Tobias Powalowski (tpowa) - Friday, 11 January 2008, 07:46 GMT
it seems you needed the filesystem module didn't you?
Comment by robin wood (dninja) - Friday, 11 January 2008, 09:52 GMT
before I try this, should I try ext3 which is my boot partition or reiser which is my root or both? Any guesses?
Comment by Tobias Powalowski (tpowa) - Friday, 11 January 2008, 09:54 GMT
your root should be the right one, because boot is read by grub already to get the kernel loaded.
Comment by robin wood (dninja) - Friday, 11 January 2008, 10:12 GMT
That worked, both machines are now running the latest versions of everything. I just added reiserfs to the MODULES line on both machines.

Would anyone mind starting a forum thread just to explain why this happened to some machines and not others?
Comment by Tobias Powalowski (tpowa) - Friday, 11 January 2008, 10:13 GMT
there must be a bug in the autodetection of filesystems
Comment by Aaron Griffin (phrakture) - Friday, 11 January 2008, 20:19 GMT
Before this goes any farther, I need to quash the rumor that hwdetect is related. hwdetect should not be used anywhere in any of these tools.
Comment by Aaron Griffin (phrakture) - Friday, 11 January 2008, 20:27 GMT
OK, so to summarize this issue in one sentence: your root device is not being detected.

We can deal with that. Let's try this. When your system is booted, run "lsmod | grep permanent" to get the modules loaded in early userspace for your disk. Mine, for instance, are "generic" and "piix".

Once we have those, you can do the upgrade, and run the following "mkinitcpio -v | grep modules". This will NOT generate an image, so don't worry. It will simply list all the files added to the image, and you can verify that those modules marked as permanent are included.

If they *are*, in fact, included, then boot until it gets to the "break" and try running "modprobe" for those two modules. If that STILL doesn't work, we can begin debugging from there.
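The comparison step could be sketched as below. This is only an illustration: missing_modules is a hypothetical helper, and it assumes the output of "mkinitcpio -v" has been saved to a file and that module files appear in it as paths ending in "<name>.ko", which is not confirmed anywhere in this report.

```shell
# missing_modules LISTING MODULE...   (hypothetical helper)
# Print each MODULE that has no matching "/<name>.ko" path in LISTING,
# a file holding saved "mkinitcpio -v" output (format assumed, see above).
# lsmod reports underscores where file names may use hyphens, so both
# sides are normalised to underscores before comparing.
missing_modules() {
    listing=$1
    shift
    for mod in "$@"; do
        want=$(printf '%s' "$mod" | sed 's/-/_/g')
        if ! sed 's/-/_/g' "$listing" | grep -q "/$want\.ko"; then
            printf '%s\n' "$mod"
        fi
    done
}
```

For example, after "mkinitcpio -v > /tmp/listing.txt", running missing_modules /tmp/listing.txt generic piix would print whichever of those two names is absent from the listing, and nothing if both are present.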



Comment by robin wood (dninja) - Friday, 11 January 2008, 20:35 GMT
On my now working machine I have just run lsmod | grep permanent and got nothing back.

the mkinitcpio.conf file is the same as mine above but now has a MODULES line like this
MODULES="pata_via ata_generic sata_via reiserfs"

Does this mean anything or do you want me to have a go at breaking it again and trying this then?
Comment by Aaron Griffin (phrakture) - Friday, 11 January 2008, 20:35 GMT
Ah, I am foolish. I'll leave the above for future reference, but apparently the entire issue was that your filesystem was not being detected by mkinitcpio's autodetection.

In this case, and *EVERY CASE* that boot fails, the fallback image WILL work.

Now, WHY the filesystem was not detected is unknown. Anyone with this issue, could you please run:

$ /usr/lib/klibc/bin/fstype < /dev/sdaX

where sdaX is your root partition?
Comment by robin wood (dninja) - Friday, 11 January 2008, 20:38 GMT
output is:

/usr/lib/klibc/bin/fstype < /dev/sda5
FSTYPE=reiserfs
FSSIZE=50001412096

And the fallback image didn't work. I can't remember the error it gave, but the only way I could get the machine to boot was from a live CD.
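For anyone collecting this from several machines, the KEY=value output quoted above is easy to pick apart in a script. A minimal sketch, where fstype_field is a hypothetical helper name and the only assumption is the FSTYPE=/FSSIZE= line format shown in the output above:

```shell
# fstype_field KEY   (hypothetical helper)
# Read KEY=value lines on stdin, as printed by klibc's fstype above,
# and print only the value belonging to KEY.
fstype_field() {
    sed -n "s/^$1=//p"
}
```

Used as "/usr/lib/klibc/bin/fstype < /dev/sda5 | fstype_field FSTYPE", this should print just the filesystem name, which is also the module name that was added to MODULES earlier in this report.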
Comment by Tobias Powalowski (tpowa) - Saturday, 12 January 2008, 09:06 GMT
Just a note here about hwdetect: it is only invoked on the ISO, to ensure the same module order every time you use the ISO.
On a normal Arch system it is not used in any way.
Comment by robin wood (dninja) - Saturday, 23 February 2008, 22:07 GMT
I've just upgraded a machine which was working fine on all the 2.6.23 kernel releases to 2.6.24 and had the same problem. Adding reiserfs to the MODULES line in /etc/mkinitcpio.conf and rebuilding from a live CD worked again.

I thought I'd report this one because this machine came through most of the 2.6.23 kernels without any problems (I say most because I may have skipped some upgrades, not because any failed) but has just failed on upgrading to 2.6.24.

Just in case it helps, the output from the commands you asked me to run before is

/usr/lib/klibc/bin/fstype < /dev/sda6
FSTYPE=reiserfs
FSSIZE=20003815424

lsmod | grep permanent
returns nothing
Comment by Maciej Libuda (Mefju) - Sunday, 30 March 2008, 22:00 GMT
I have this problem too. I'm using the xfs filesystem on my root partition. The image is generated properly only when I do it from a rescue CD. Maybe it's related to mkinitcpio.
Comment by Thomas Bächler (brain0) - Wednesday, 02 April 2008, 22:40 GMT
Could any of you upload a broken image file? It's impossible to diagnose this otherwise.
Comment by robin wood (dninja) - Wednesday, 02 April 2008, 23:14 GMT
I could try to create one, I assume it's the initrd file you want.
Comment by Thomas Bächler (brain0) - Thursday, 03 April 2008, 07:42 GMT
Of course, the kernel26.img file (btw, you could always keep a working copy of kernel26.img as a separate file so you don't break anything).
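Keeping such a copy could look roughly like this. It is a sketch only: keep_working_copy and the ".working" suffix are made up for illustration, and the image path is whatever your boot loader entry points at.

```shell
# keep_working_copy IMAGE   (hypothetical helper; the ".working" suffix is arbitrary)
# Save a copy of a known-good initramfs image next to the original.
keep_working_copy() {
    img=$1
    if [ ! -f "$img" ]; then
        echo "no such image: $img" >&2
        return 1
    fi
    # -p preserves the mode and timestamps of the original file.
    cp -p "$img" "$img.working"
}
```

Running keep_working_copy /boot/kernel26.img before rebuilding leaves /boot/kernel26.img.working behind, which can be copied back if the new image turns out to be broken.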
Comment by Greg (dolby) - Monday, 21 July 2008, 16:19 GMT
Still a problem?
Comment by Maciej Libuda (Mefju) - Monday, 01 September 2008, 17:18 GMT
From my side, this bug can be closed. I've done several image rebuilds recently using ext3 and xfs on root partitions. Everything works fine.
Comment by robin wood (dninja) - Monday, 01 September 2008, 17:41 GMT
Everything is working ok here too so I'm ok with closing it.

Shame we never really got to the bottom of it though.
