FS#34568 - [linux] 3.8.4-1 fails with fsck boot error on raid0

Attached to Project: Arch Linux
Opened by Dutch de Ruyter (straykat59) - Tuesday, 02 April 2013, 03:54 GMT
Last edited by Tobias Powalowski (tpowa) - Thursday, 23 May 2013, 19:59 GMT
Task Type Bug Report
Category Kernel
Status Closed
Assigned To Tobias Powalowski (tpowa)
Thomas Bächler (brain0)
Architecture x86_64
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description: linux 3.8.4-1 (x86_64) fails with fsck boot error on raid0

I am, however, able to boot the linux-lts kernel and my own custom kernel. The custom kernel is built from the ABS linux package.

The / partition is on a raid0 with an ext4 filesystem.
fsck -f /dev/md127 comes back clean, and both the RAID array and the two SATA hard drives are healthy.

Additional info: Boot error message:

[code]Loading ../initramfs-linux.img......ready.
Probing EDD (edd=off to disable)... ok
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
:: running early hook [udev]
:: running hook [udev]
:: Triggering uevents...
Waiting 10 seconds for device /dev/md127 ...
:: performing fsck on '/dev/md127' ...
fsck.ext2: Invalid argument while trying to open /dev/md127
/dev/md127:
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>

ERROR: fsck failed on '/dev/md127'
:: mounting '/dev/md127' on real root
mount: you must specify the filesystem type
You are now being dropped into an emergency shell.
sh: can't access tty; job control turned off
[rootfs /]#[/code]

The error message is transcribed from a photo taken of the screen as the error is not being logged (or I can't find it).
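
The transcript alone does not show whether the array had actually been started when fsck ran. As a diagnostic sketch only (not a confirmed cause), the state of each md array can be read from /proc/mdstat at the [rootfs /]# prompt. The helper name inactive_arrays is hypothetical, and the loop assumes the usual "mdN : state ..." line layout of /proc/mdstat:

```shell
# Sketch: print the name of every md array that is not in the "active" state.
# inactive_arrays is a hypothetical helper, not part of mdadm or mkinitcpio.
# It reads an mdstat-style file (default: /proc/mdstat) using only sh builtins.
inactive_arrays() {
    while read -r name sep state rest; do
        case "$name" in
            md*)
                # Array lines look like: "md127 : active raid0 sdb1[1] sda1[0]"
                if [ "$sep" = ":" ] && [ "$state" != "active" ]; then
                    echo "$name"
                fi
                ;;
        esac
    done < "${1:-/proc/mdstat}"
    return 0
}

# Usage (at the emergency shell):
#   inactive_arrays
```

If this prints md127, the array is known to the kernel but had not been started at that point; if it prints nothing, the cause probably lies elsewhere.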

* package version(s): linux-3.8.3-1, linux-3.8.3-2 & linux-3.8.4-1
* config and/or log files etc:

I have attached my config file from my current custom kernel (which does boot).

[code]HOOKS="base udev autodetect modconf block mdadm_udev filesystems keyboard fsck"[/code]

Steps to reproduce: Reboot into linux-3.8.4-1.

I first put this up on the Kernel & Hardware forums:

[url]https://bbs.archlinux.org/viewtopic.php?id=160230[/url]
   config (66.3 KiB)

Closed by  Tobias Powalowski (tpowa)
Thursday, 23 May 2013, 19:59 GMT
Reason for closing:  Not a bug
Comment by Dutch de Ruyter (straykat59) - Friday, 05 April 2013, 01:40 GMT
Linux has been upgraded to 3.8.5-1. I have also upgraded my custom kernel accordingly. Linux-lts was upgraded a few days ago.

The upgraded stock kernel is still failing with the same boot error.

Both the linux-lts & linux-custom kernels boot successfully.
Comment by Dutch de Ruyter (straykat59) - Sunday, 07 April 2013, 04:30 GMT
Update:
For the last few weeks I have been recompiling the stock (ABS) kernel with one menuconfig item disabled each time, trying to find the problem. Because my custom kernel booted, I suspected that the problem was something compiled into the stock kernel. Last night I disabled the very unlikely:

Processor type and features ---> [ ] Symmetric multi-processing support

And the resulting kernel booted.

My CPU is the single core AMD Athlon 64 4000+.

I am not sure why having multi-core support in the 3.8.* kernel would cause RAID to fail on a single-core system.
Comment by Dutch de Ruyter (straykat59) - Sunday, 07 April 2013, 09:31 GMT
I have just rebuilt my custom kernel with SMP enabled in order to confirm my previous entry. I expected it not to boot; however, it did.

To date, I have established that I can only get a stock Arch 3.8.* kernel to boot by disabling SMP, yet my custom kernel boots with SMP enabled!

I am now at a loss isolating this bug. Any help would be appreciated.
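
Since the custom kernel boots and the stock one does not, a possible next step is to compare the two configs directly instead of toggling one option per rebuild. The sketch below assumes both configs have been saved as plain files; config_diff and normalise_config are hypothetical helper names, and the file names in the usage comment are placeholders:

```shell
# Sketch: list kernel options whose values differ between two config files.
# normalise_config and config_diff are hypothetical helpers.

# Rewrite "# CONFIG_FOO is not set" as "CONFIG_FOO=n" so disabled options
# take part in the comparison, keep only CONFIG_ lines, and sort them.
normalise_config() {
    sed -n -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/' \
           -e '/^CONFIG_/p' "$1" | sort
}

config_diff() {
    first=$(mktemp) && second=$(mktemp) || return 1
    normalise_config "$1" > "$first"
    normalise_config "$2" > "$second"
    # "<" lines exist only in the first config, ">" lines only in the second.
    # Nothing is printed when the two configs are equivalent.
    diff "$first" "$second" | grep '^[<>]' || true
    rm -f "$first" "$second"
}

# Usage (placeholder file names):
#   config_diff stock.config custom.config
```

Options reported this way are candidates only; as the SMP test above shows, a config difference may not be the whole story.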
Comment by Dutch de Ruyter (straykat59) - Tuesday, 09 April 2013, 07:09 GMT
Another set of upgrades to linux & linux-lts. Linux is now at 3.8.6-1 as is my custom kernel.

Bug status is the same with the stock linux failing with the same boot error.
Comment by Dutch de Ruyter (straykat59) - Monday, 15 April 2013, 23:29 GMT
Another set of upgrades to linux & linux-lts. Linux is now at 3.8.7-1 as is my custom kernel.

Bug status is the same with the stock linux failing with the same boot error.
Comment by Dutch de Ruyter (straykat59) - Sunday, 28 April 2013, 00:32 GMT
I apologise for sounding monotonous but:

Another set of upgrades to linux & linux-lts. Linux is now at 3.8.8-2 as is my custom kernel.

Bug status is the same with the stock linux failing with the same boot error.
Comment by Dutch de Ruyter (straykat59) - Monday, 29 April 2013, 04:47 GMT
Another set of upgrades to linux & linux-lts. Linux is now at 3.8.10-1 as is my custom kernel.

Bug status is the same with the stock linux failing with the same boot error.
Comment by Dutch de Ruyter (straykat59) - Wednesday, 01 May 2013, 05:30 GMT
I have just compiled a vanilla linux 3.9-2 from ABS testing (no change made to the PKGBUILD, so no change to the kernel config) & it booted :-)

As an added bonus my printing system now also works again after not working with linux 3.8.*.

May I suggest holding off on closing this bug report until linux 3.9 reaches stable and is confirmed to still boot?
Comment by Dutch de Ruyter (straykat59) - Thursday, 09 May 2013, 07:29 GMT
I have just compiled a vanilla linux 3.9.1-1 from ABS testing (no change made to the PKGBUILD, so no change to the kernel config) & it failed to boot with the usual boot error message :-(

So I am back to trying to debug this issue.

May I please ask again for help in isolating this bug?

Thanks.
Comment by Dutch de Ruyter (straykat59) - Sunday, 12 May 2013, 08:30 GMT
Just upgraded to linux 3.9.2-1 from abs testing & have the usual boot error message.
Comment by Dutch de Ruyter (straykat59) - Sunday, 12 May 2013, 23:27 GMT
linux 3.9.2-1 is now in core & I have installed it (after a pacman -Syu) in the usual way (pacman -S linux).

Fails to boot with the usual boot error message.

I was able to get linux 3.9-2 to boot, but 3.9.1 & 3.9.2 fail!

What changed between 3.9 & 3.9.1?
Comment by Jesse (Nalthos) - Monday, 13 May 2013, 02:55 GMT
I have this same issue with a RAID 5 array and Arch's stock kernel. I can mount my RAID onto new_root in the emergency shell and then exit it to get Arch to boot up fine, but my server is headless so this is not a great solution.
Comment by Dutch de Ruyter (straykat59) - Saturday, 18 May 2013, 11:56 GMT
I have built and am now using a new multi-core rig without RAID, so I can no longer reproduce this bug.
Comment by Tobias Powalowski (tpowa) - Thursday, 23 May 2013, 19:59 GMT
Jesse, please open a new report; you have a different setup.
Closing this one because it is no longer reproducible.
