Arch Linux


FS#16373 - [mdadm] Software raid assembly failing during boot

Attached to Project: Arch Linux
Opened by Brent Pitman (bpitman0001) - Sunday, 27 September 2009, 16:55 GMT
Last edited by Dan Griffiths (Ghost1227) - Thursday, 25 February 2010, 21:02 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Aaron Griffin (phrakture), Thomas Bächler (brain0)
Architecture: x86_64
Severity: High
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 2
Private: No

Details

The problem started with my recent upgrade to:

Targets (18): alsa-lib-1.0.21.a-1 apache-2.2.13-1 coreutils-7.6-1
dbus-core-1.2.16-1 dhcpcd-5.1.0-1 hdparm-9.27-1
hwdetect-2009.09-5 junit-4.7-1 kernel26-2.6.30.6-1
rrdtool-1.3.8-2 kmbrnt-serverstats-0.7-3
libmysqlclient-5.1.38-1 mysql-clients-5.1.38-1 mysql-5.1.38-1
udev-146-2 vi-050325-1 xulrunner-1.9.1.3-1
xz-utils-4.999.9beta-1

I think the problem was introduced in the first half of September?

During boot, I get a failure when trying to create /dev/md0 from /dev/sda5,/dev/sdb5. A few lines later, it waits 10 seconds, prints a warning suggesting I increase rootdelay (which doesn't help) and drops into a shell prompt (reboot or exit to continue). Note that I've been running with this config for years. I don't have the exact error message - it only prints to the terminal and I didn't write it down. :(

On my dual-cpu, single-core machines, I've seen this happen once in 5-10 reboots. On my single-cpu, quad-core machine it happens every time. However, if I put a USB stick in, it works fine. I'm guessing the extra device slows things down enough that the md0 assembly has what it needs when called? I rebooted 5-10 times with and without the USB stick - 100% failed without it and 100% succeeded with it. The USB stick has the Arch Linux installer on it. Yes, I can tell the difference between booting from my disks and booting from the install "cd".

This is troublesome because I now can't trust these machines to come back up; driving into the datacenter is pretty inconvenient.


This task depends upon

Closed by  Dan Griffiths (Ghost1227)
Thursday, 25 February 2010, 21:02 GMT
Reason for closing:  Not a bug
Additional comments about closing:  OP requested close
Comment by Gerardo Exequiel Pozzi (djgera) - Sunday, 27 September 2009, 18:13 GMT
Maybe your issue is related to this: FS#15756 - [dmraid] Can't boot after latest update - kernel panic
Comment by what now? (whoops) - Sunday, 27 September 2009, 18:48 GMT
I don't know if this is related (sorry if it's not): http://bbs.archlinux.org/viewtopic.php?pid=626791. If it is, I can confirm the problem. It was a 100% failure on core2duo x86_64 for me; I gave up on it as I didn't really know what was going on or what I was doing ;).
Comment by Brent Pitman (bpitman0001) - Sunday, 27 September 2009, 19:19 GMT
I'm not sure whether the associated bugs are related. I don't see a kernel panic, and there are no real error messages other than assembly of /dev/md0 failed. And it doesn't happen all the time. My slower machines have seen it once. My faster machine sees it every time, unless a USB device is found.

Another theory (just throwing it out there)...this problem doesn't surface until the 2nd reboot after installing:

Targets (25): apr-1.3.8-2 cloog-ppl-0.15.7-1 coreutils-7.5-1
damageproto-1.2.0-1 fakeroot-1.13-1 fontsproto-2.1.0-1
imagemagick-6.5.5.6-1 libarchive-2.7.1-1 libdrm-2.4.13-1
libfontenc-1.0.5-1 xf86vidmodeproto-2.2.99.1-1
libxxf86vm-1.0.99.1-1 libgl-7.5.1-1 libice-1.0.6-1
libxau-1.0.5-1 libxcursor-1.1.10-1 xineramaproto-1.1.99.1-1
libxinerama-1.0.99.1-1 man-db-2.5.6-1 mesa-7.5.1-1
module-init-tools-3.10-1 pciutils-3.1.4-1 pixman-0.16.0-1
syslog-ng-3.0.4-1 xorg-font-utils-7.4-3

Note that I run every partition (including /boot) on software raid. I have for years without incident.

[root@arizona ~]# egrep -v '^#' /etc/mkinitcpio.conf
MODULES="pata_acpi ata_generic scsi_mod ata_piix ipv6"
BINARIES=""
FILES=""
HOOKS="base udev raid autodetect pata scsi sata keymap filesystems"


## from /boot/grub/menu.lst
# (0) Arch Linux
title Arch Linux [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/md0 ro rootfstype=reiserfs md=0,/dev/sda5,/dev/sdb5
initrd /kernel26.img

All my machines have identical configuration.
Comment by Brent Pitman (bpitman0001) - Sunday, 27 September 2009, 19:41 GMT
Here's a section of dmesg (of course, I only get this when it works):

In the text below, I'm curious what these are:
> Driver 'sd' needs updating - please use bus_type methods
> md0: unknown partition table

ata_piix 0000:00:1f.2: setting latency timer to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UDMA/133 cmd 0xdc30 ctl 0xdc28 bmdma 0xdc40 irq 23
ata2: SATA max UDMA/133 cmd 0xdc38 ctl 0xdc2c bmdma 0xdc48 irq 23
ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.01: SATA link down (SStatus 0 SControl 0)
ata1.00: ATA-7: WDC WD2500YS-18SHB2, 20.06C07, max UDMA/133
ata1.00: 488281250 sectors, multi 8: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA WDC WD2500YS-18S 20.0 PQ: 0 ANSI: 5
ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.01: SATA link down (SStatus 0 SControl 0)
ata2.00: ATA-6: HDS722525VLSA80, V36OA6MA, max UDMA/100
ata2.00: 488397168 sectors, multi 8: LBA48
ata2.00: configured for UDMA/100
scsi 1:0:0:0: Direct-Access ATA HDS722525VLSA80 V36O PQ: 0 ANSI: 5
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
Driver 'sd' needs updating - please use bus_type methods
sd 0:0:0:0: [sda] 488281250 512-byte hardware sectors: (250 GB/232 GiB)
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb:<5>sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sdb1 sdb2 sdb3 sdb4 < sda5 sdb5 sda6 >
sdb6 >
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] Attached SCSI disk
floppy0: no floppy controllers found
md: linear personality registered for level -1
md: multipath personality registered for level -4
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
xor: automatically using best checksumming function: generic_sse
generic_sse: 9835.200 MB/sec
xor: using function: generic_sse (9835.200 MB/sec)
async_tx: api initialized (async)
raid6: int64x1 2109 MB/s
raid6: int64x2 2743 MB/s
raid6: int64x4 2061 MB/s
raid6: int64x8 1914 MB/s
raid6: sse2x1 4293 MB/s
raid6: sse2x2 5089 MB/s
raid6: sse2x4 7937 MB/s
raid6: using algorithm sse2x4 (7937 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: raid10 personality registered for level 10
md: bind<sda5>
md: bind<sdb5>
raid1: raid set md0 active with 2 out of 2 mirrors
md0: unknown partition table
REISERFS (device md0): found reiserfs format "3.6" with standard journal
REISERFS (device md0): using ordered data mode
REISERFS (device md0): journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
REISERFS (device md0): checking transaction log (md0)
REISERFS (device md0): Using r5 hash to sort names
rtc_cmos 00:04: RTC can wake from S4
rtc_cmos 00:04: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, y3k, 242 bytes nvram, hpet irqs
udev: starting version 146
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
Comment by Aaron Griffin (phrakture) - Monday, 28 September 2009, 15:50 GMT
You can ignore the "driver needs updating one", but the "unknown partition table" error is more scary.

Hmmmm, is this actually a raid6 array? Is it possible it's detecting things wrongly?
Comment by Brent Pitman (bpitman0001) - Monday, 28 September 2009, 17:33 GMT
Not raid6 - it's all raid1.

[root@arizona ~]# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[1]
224010240 blocks [2/2] [UU]

md3 : active raid1 sda3[0] sdb3[1]
2000000 blocks [2/2] [UU]

md2 : active raid1 sda2[0] sdb2[1]
2000000 blocks [2/2] [UU]

md1 : active raid1 sda1[0] sdb1[1]
128384 blocks [2/2] [UU]

md0 : active raid1 sdb5[1] sda5[0]
16000640 blocks [2/2] [UU]

unused devices: <none>
Comment by Tobias Powalowski (tpowa) - Wednesday, 30 September 2009, 05:36 GMT
You can ignore the unknown partition table error; it tries to assemble partitions but falls back to normal raid.
Comment by Tobias Powalowski (tpowa) - Wednesday, 30 September 2009, 05:40 GMT
Also, it is recommended to use UUIDs for assembling raid arrays.
You need a custom mdadm.conf for this, and you need to recreate your initramfs afterwards.

# To capture the UUIDs for all your RAID arrays to this file, run these:
# to get a list of running arrays:
# # mdadm -D --scan >>/etc/mdadm.conf
# to get a list from superblocks:
# # mdadm -E --scan >>/etc/mdadm.conf
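Roughly, the whole sequence would look something like this (untested sketch; clean up any duplicate ARRAY lines in mdadm.conf by hand afterwards):

# append the running arrays to mdadm.conf
mdadm -D --scan >> /etc/mdadm.conf
# or take the list from the superblocks instead
mdadm -E --scan >> /etc/mdadm.conf
# rebuild the initramfs so the updated mdadm.conf is included
mkinitcpio -p kernel26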
Comment by Brent Pitman (bpitman0001) - Wednesday, 30 September 2009, 16:49 GMT
The -D --scan output is already appended to mdadm.conf.
I can add -E. However, when the system fails, it fails during root partition assembly. How is this going to help?
Comment by Tobias Powalowski (tpowa) - Wednesday, 30 September 2009, 16:51 GMT
could you add mdadm.conf please?
Comment by Brent Pitman (bpitman0001) - Wednesday, 30 September 2009, 17:57 GMT
[root@arizona ~]# egrep -v '^#' /etc/mdadm.conf


DEVICE partitions




ARRAY /dev/md0 level=raid1 num-devices=2 UUID=0ef30cf4:4d105619:231d80eb:21b56d4d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=daacfb24:f194901a:41e3ec93:421e060b
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ec82bf03:1708d5ab:25ea56d9:f5b41768
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=0b64d55a:60bc570c:cc9087dc:e332ad34
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=035393fd:d335732f:ce8c9b5d:85495e6a
Comment by Kenni Lund (Kenni) - Friday, 16 October 2009, 15:16 GMT
Crap, I just did a full upgrade on my headless server. As it didn't come up, I attached a serial console and rebooted:
--
Waiting 10 seconds for device /dev/md0 ...

Root device '/dev/md0' doesn't exist, attempting to create it
ERROR: Failed to parse block device ids for '/dev/md0'
ERROR: Unable to detect or create root device '/dev/md0'
You are being dropped to a recovery shell
Type 'reboot' to reboot
Type 'exit' to try and continue booting
NOTE: klibc contains no 'ls' binary, use 'echo *' instead

If the device '/dev/md0' gets created while you are here,
try adding 'rootdelay=10' or higher to the kernel command-line
ramfs$ exit
Trying to continue (this will most likely fail)...
:: Initramfs Completed - control passing to kinit
IP-Config: no devices to configure
Waiting 0 s before mounting root device...
md: Will configure md0 (super-block) from /dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3, below.
kinit: Unable to mount root fs on device dev(9,0)
kinit: init not found!
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: kinit Not tainted 2.6.31-ARCH #1
Call Trace:
[<ffffffff81384198>] ? panic+0x9a/0x154
[<ffffffff8106765c>] ? exit_ptrace+0xbc/0x160
[<ffffffff8105de88>] ? do_exit+0x6c8/0x7a0
[<ffffffff8105e092>] ? sys_exit+0x22/0x30
[<ffffffff8100c382>] ? system_call_fastpath+0x16/0x1b
--

I checked the output from pacman before I rebooted and there were no errors.

Any ideas?
Comment by Aaron Griffin (phrakture) - Friday, 16 October 2009, 16:12 GMT
Can we see your kernel params and mdadm.conf ?
Comment by Brent Pitman (bpitman0001) - Friday, 16 October 2009, 16:44 GMT
Does it happen every time you boot? I had one machine failing consistently, but the others were not consistent. I found that connecting a drive via USB "fixed" mdadm (a workaround until there is a real fix).
Comment by Aaron Griffin (phrakture) - Friday, 16 October 2009, 16:47 GMT
That makes sense. It could be related to device order... hmmm
Comment by Thomas Bächler (brain0) - Friday, 16 October 2009, 16:56 GMT
The only way we can get to the bottom of this is for someone affected to boot with break=y and play around with whatever raid tools are in the initramfs. I never had an md setup on Arch, so I can't help you much from there.

Maybe tpowa knows more about it, he did some work on the raid in initramfs.
Comment by Kenni Lund (Kenni) - Friday, 16 October 2009, 17:17 GMT
/boot/grub/menu.lst:
--
# (0) Arch Linux
title Arch Linux [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/md0 ro md=0,/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3 console=ttyS0,115200
initrd /kernel26.img
--

/etc/mdadm.conf:
--
# grep -v "^#" mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=8cb3320c:06c4fe14:44b2a4ff:0d372c36
ARRAY /dev/md1 level=raid1 num-devices=4 UUID=90bcd9eb:a98afabf:6bfd6349:5ee01994
ARRAY /dev/md2 level=raid5 num-devices=4 UUID=699606c1:85ded7ff:c9668349:033f6432
ARRAY /dev/md3 level=raid5 num-devices=4 UUID=b59566c5:fbbaad63:9eba66c2:3b57b5af
--

Hooks in mkinitcpio.conf:
--
HOOKS="base udev autodetect pata scsi sata mdadm usbinput keymap filesystems"
--

RAID arrays:
md0 = / (RAID5)
md1 = /boot (RAID1)
md2 = swap (RAID5)
md3 = /data (RAID5)

This happened as the kernel was upgraded from 2.6.30.6-1 -> 2.6.31.4-1. No changes were made to the config.
Comment by Kenni Lund (Kenni) - Friday, 16 October 2009, 17:19 GMT
I tried that hint; plugging in a WD external HDD and/or an Arch Linux USB stick didn't fix the issue.

It is consistent, I haven't been able to boot even a single time after the upgrade.
Comment by Thomas Bächler (brain0) - Friday, 16 October 2009, 17:29 GMT
Are the hard drives even detected? Could you run 'echo /dev/*' after booting with the "break=y" option?
Comment by Kenni Lund (Kenni) - Friday, 16 October 2009, 17:41 GMT
Doesn't seem like it... As far as I can see, there are two mdadm-related binaries in the ramfs: mdassemble and mdassemble.static. The static binary returns:
md: md0 stopped.
mdadm: no devicemd: md1 stopped.
s found for /devmd: md2 stopped.
/md0
mdadm: no md: md3 stopped.
devices found for /dev/md1
mdadm: no devices found for /dev/md2
mdadm: no devices found for /dev/md3

which I can dissect into:
md: md0 stopped.
md: md1 stopped.
md: md2 stopped.
md: md3 stopped.
mdadm: no devices found for /dev/md0
mdadm: no devices found for /dev/md1
mdadm: no devices found for /dev/md2
mdadm: no devices found for /dev/md3

And for the device list:
ramfs$ echo /dev/*
/dev/0:0:0:0 /dev/2:0:0:0 /dev/3:0:0:0 /dev/4:0:0:0 /dev/5:0:0:0 /dev/console /dev/cpu_dma_latency /dev/full /dev/kmem /dev/kmsg /dev/mcelog /dev/mem /dev/mice /dev/mouse0 /dev/network_latency /dev/network_throughput /dev/null /dev/port /dev/psaux /dev/ptmx /dev/random /dev/snapshot /dev/tty /dev/tty0 /dev/tty1 /dev/tty10 /dev/tty11 /dev/tty12 /dev/tty13 /dev/tty14 /dev/tty15 /dev/tty16 /dev/tty17 /dev/tty18 /dev/tty19 /dev/tty2 /dev/tty20 /dev/tty21 /dev/tty22 /dev/tty23 /dev/tty24 /dev/tty25 /dev/tty26 /dev/tty27 /dev/tty28 /dev/tty29 /dev/tty3 /dev/tty30 /dev/tty31 /dev/tty32 /dev/tty33 /dev/tty34 /dev/tty35 /dev/tty36 /dev/tty37 /dev/tty38 /dev/tty39 /dev/tty4 /dev/tty40 /dev/tty41 /dev/tty42 /dev/tty43 /dev/tty44 /dev/tty45 /dev/tty46 /dev/tty47 /dev/tty48 /dev/tty49 /dev/tty5 /dev/tty50 /dev/tty51 /dev/tty52 /dev/tty53 /dev/tty54 /dev/tty55 /dev/tty56 /dev/tty57 /dev/tty58 /dev/tty59 /dev/tty6 /dev/tty60 /dev/tty61 /dev/tty62 /dev/tty63 /dev/tty7 /dev/tty8 /dev/tty9 /dev/ttyS0 /dev/ttyS1 /dev/ttyS2 /dev/ttyS3 /dev/urandom /dev/vcs /dev/vcs1 /dev/vcsa /dev/vcsa1 /dev/zero

I don't know what the x:0:0:0 devices are, but if those aren't the hard drives, it doesn't seem like the hard drives are being detected.

Any further ideas?
Comment by Aaron Griffin (phrakture) - Friday, 16 October 2009, 18:46 GMT
So most likely this isn't raid-related at all. Do you happen to know which modules your disk controllers use?
Comment by Kenni Lund (Kenni) - Friday, 16 October 2009, 19:35 GMT
It's the ahci module; the controller the RAID is connected to is an Intel ICH9R:
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
Comment by Kenni Lund (Kenni) - Friday, 16 October 2009, 21:15 GMT
Problem solved...apparently something went wrong when the new kernel was installed and mkinitcpio was run (even though the pacman log doesn't contain any errors).

Booting a live CD, assembling the arrays, chrooting into the environment and running "mkinitcpio -p kernel26" fixed the problem.

I wonder if this is an obscure mkinitcpio bug or something else. According to the original description in this bug report, bpitman0001 was able to boot his system when extra USB devices were connected. That didn't work in my case, so the two issues are probably not related after all.
Comment by Tobias Powalowski (tpowa) - Saturday, 17 October 2009, 13:14 GMT
Just one note here: dmraid and mdraid are two totally different things.
DMRAID is used for BIOS raid devices; they are created by the BIOS of the raid controller.
MDRAID is the standard Linux software raid.
I don't know if setting up a raid on top of a DMRAID makes that much sense.

For mdraid assembling, it is highly recommended to use the custom mdadm.conf file with UUID assembling.

greetings
tpowa
Comment by Thomas Bächler (brain0) - Saturday, 17 October 2009, 13:46 GMT
Attach the complete pacman.log snippet of the whole -Syu session (starting with "Starting full system upgrade"). There were obscure issues in mkinitcpio that I thought had been eliminated.
Comment by Kenni Lund (Kenni) - Saturday, 17 October 2009, 13:48 GMT
@tpowa

Whoops, I didn't even notice that... I'm on MDRAID; I don't use the BIOS functionality.

However, when I reread the description and comments in this bug report, it seems like everyone is talking about MDRAID (/dev/mdX)...(?) - In that case, only the title of the bug report needs to be changed.
Comment by Kenni Lund (Kenni) - Saturday, 17 October 2009, 13:58 GMT
@brain0

pacman -Syu session attached
pacman -Q package list attached
Comment by Thomas Bächler (brain0) - Saturday, 17 October 2009, 14:05 GMT
I don't get why this would break your initramfs; at least the fallback image should have been fine (or even both). The log looks completely clean, as if everything went fine.
Comment by Kenni Lund (Kenni) - Saturday, 17 October 2009, 14:22 GMT
I just thought of something... After running mkinitcpio manually and rebooting, it wouldn't let me boot until I had run fsck. I didn't think about it when it happened and didn't take note of what it wanted to fix, but if the boot filesystem (including the initramfs) was partly damaged, I suppose that could also be the reason for the problem...?
Comment by Tobias Powalowski (tpowa) - Monday, 02 November 2009, 20:05 GMT
Can we close this? I think your system works fine again now.
Comment by Brent Pitman (bpitman0001) - Monday, 02 November 2009, 20:24 GMT
My systems are still broken, though I haven't tried upgrading since I filed this bug. Since they are remote, I'd rather not test them until we have a potential solution. I haven't seen anything in this bug that would suggest we understand why raid assembly is failing.
Comment by Tobias Powalowski (tpowa) - Saturday, 21 November 2009, 08:39 GMT
When using the mdadm hook, you should probably change your grub line to something like this:
root=/dev/disk/by-uuid/<youruuid>

Then UUID assembly will be used.
Also, please check whether blkid is present on your systems and whether your filesystem module is in your initramfs.
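A quick way to check both might be something like this (just a sketch; it assumes the default gzip-compressed cpio initramfs image):

# is blkid installed? (it ships with util-linux-ng)
type blkid
# does the initramfs contain the reiserfs module?
zcat /boot/kernel26.img | cpio -it 2>/dev/null | grep reiserfs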
Comment by Brent Pitman (bpitman0001) - Monday, 23 November 2009, 02:59 GMT
I just upgraded all systems. The problem still exists (very annoying as all systems are remote - anyone have recommendations for a terminal server?). The slower systems showed a ~20% failure rate, and the faster system worked once without the USB stick in, but otherwise the failure rate still seems to be near 100% without the workaround. Note that all my systems are x86_64 - not sure if this matters.

I need some help testing the suggestions in the previous comment (53552). Replacing root=/dev/md0 with root=/dev/disk/by-uuid/<youruuid> (and deleting md=...) resulted in the same error messages. I'm not confident that the config is right:

# (3) Arch Linux (by-uuid)
title Arch Linux (by-uuid) [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/disk/by-uuid/0ef30cf4:4d105619:231d80eb:21b56d4d ro rootfstype=reiserfs
initrd /kernel26.img

I don't know how to test whether blkid is on the system. I also don't know how to validate the initramfs, though this is in my mkinitcpio.conf:
MODULES="pata_acpi ata_generic scsi_mod ata_piix ipv6"
BINARIES=""
FILES=""
HOOKS="base udev raid autodetect pata scsi sata keymap filesystems"
Comment by Tobias Powalowski (tpowa) - Monday, 23 November 2009, 06:31 GMT
Check whether blkid is available on your system; if not, autodetection of your filesystem will not work!
type blkid
If that gives an error, reinstall util-linux-ng with pacman -S util-linux-ng.

You can add reiserfs to your mkinitcpio.conf MODULES array.

- You need to change the raid hook to the mdadm hook
- Make sure your mdadm.conf file includes all your arrays
You can use mdadm -Es >>/etc/mdadm.conf for this.
- run mkinitcpio -p kernel26 (a rough end-to-end sketch follows below)
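Rough end-to-end sketch, not a definitive recipe; the MODULES/HOOKS lines are just your existing ones from above with reiserfs added and raid swapped for mdadm:

# make sure blkid exists; reinstall util-linux-ng if not
type blkid || pacman -S util-linux-ng

# in /etc/mkinitcpio.conf (relevant lines only):
#   MODULES="pata_acpi ata_generic scsi_mod ata_piix ipv6 reiserfs"
#   HOOKS="base udev mdadm autodetect pata scsi sata keymap filesystems"

# record every array in mdadm.conf, then rebuild the initramfs
mdadm -Es >> /etc/mdadm.conf
mkinitcpio -p kernel26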
Comment by Brent Pitman (bpitman0001) - Monday, 23 November 2009, 15:36 GMT
I can try these things, but they seem like things the system MUST already be doing for software raid to work. In my case, it has worked for years, but recently started failing sporadically.

[root@arizona ~]# blkid
/dev/sdb1: UUID="daacfb24-f194-901a-41e3-ec93421e060b" TYPE="linux_raid_member"
/dev/sdb2: UUID="ec82bf03-1708-d5ab-25ea-56d9f5b41768" TYPE="linux_raid_member"
/dev/sdb3: UUID="0b64d55a-60bc-570c-cc90-87dce332ad34" TYPE="linux_raid_member"
/dev/sdb5: UUID="0ef30cf4-4d10-5619-231d-80eb21b56d4d" TYPE="linux_raid_member"
/dev/sdb6: UUID="035393fd-d335-732f-ce8c-9b5d85495e6a" TYPE="linux_raid_member"
/dev/sda1: UUID="daacfb24-f194-901a-41e3-ec93421e060b" TYPE="linux_raid_member"
/dev/sda2: UUID="ec82bf03-1708-d5ab-25ea-56d9f5b41768" TYPE="linux_raid_member"
/dev/sda3: UUID="0b64d55a-60bc-570c-cc90-87dce332ad34" TYPE="linux_raid_member"
/dev/sda5: UUID="0ef30cf4-4d10-5619-231d-80eb21b56d4d" TYPE="linux_raid_member"
/dev/sda6: UUID="035393fd-d335-732f-ce8c-9b5d85495e6a" TYPE="linux_raid_member"
/dev/md0: UUID="720597d0-62ed-48d7-843c-b73ef66c6809" TYPE="reiserfs"
/dev/md1: UUID="d50f8c66-4705-41d5-8931-e99fe59dde64" TYPE="reiserfs"
/dev/md2: UUID="d332c1e9-3966-4791-a739-5df7fbe35852" TYPE="swap"
/dev/md3: UUID="7fa23a34-3f6e-40fe-bf26-38da15d53230" TYPE="swap"
/dev/md4: UUID="51f329ea-e8eb-45b8-b2e0-7ee9a89f56db" TYPE="reiserfs"
/dev/sdc1: LABEL="ARCHISO_COHYAE4A" UUID="5d4be3f0-aed4-4b82-b702-a130b56fbd9f" TYPE="ext2"

I likely won't go to the datacenter again until after the holidays. I'll try adding reiserfs to MODULES and replacing the raid hook with mdadm at that time.

Note that my mdadm.conf file has always had the arrays defined. However, the UUID reported by mdadm is different from the one reported by blkid (blkid reports those UUIDs for the sda/sdb partitions, not for the md devices).

[root@arizona ~]# mdadm -Es
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=daacfb24:f194901a:41e3ec93:421e060b
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ec82bf03:1708d5ab:25ea56d9:f5b41768
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=0b64d55a:60bc570c:cc9087dc:e332ad34
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=0ef30cf4:4d105619:231d80eb:21b56d4d
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=035393fd:d335732f:ce8c9b5d:85495e6a
Comment by Tobias Powalowski (tpowa) - Monday, 23 November 2009, 16:16 GMT
Also missing from your mkinitcpio.conf are the raid modules:
raid1 md-mod
We had issues with detecting the raid module lately, which is IMHO now fixed in the git tree.
Adding them manually should be safe and should work.

The Grub entry should look like this:
root=/dev/disk/by-uuid/blablabla-0fb1-46a3-acfd-blablabla
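Note that /dev/disk/by-uuid/ expects the dashed filesystem UUID that blkid reports for /dev/md0, not the colon-separated mdadm array UUID. For illustration only (a sketch built from the blkid and mkinitcpio.conf output posted above; double-check the UUID on your own system):

# /etc/mkinitcpio.conf
MODULES="pata_acpi ata_generic scsi_mod ata_piix ipv6 reiserfs md_mod raid1"
HOOKS="base udev mdadm autodetect pata scsi sata keymap filesystems"

# /boot/grub/menu.lst
title Arch Linux (by-uuid) [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/disk/by-uuid/720597d0-62ed-48d7-843c-b73ef66c6809 ro rootfstype=reiserfs
initrd /kernel26.img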
Comment by Shem Valentine (xvalentinex) - Tuesday, 01 December 2009, 03:33 GMT
I have been experiencing a very similar issue.
mdraid assembly fails on boot and kicks me to the emergency console.
I am using the mdadm hook instead of raid.

This box has been running fine for over a year, and it was within the last month or two that this started occurring. I don't reboot often, so I can't give a definitive time.

The interesting thing is that if I revert to the fallback kernel, it will boot just fine.

I'm assuming this boils down to a driver issue (my problem at least). I'm not the greatest with mkinitcpio, but is there a way to see what modules are being autodetected?

I will try the UUID in my kernel params vs /dev/mdx, and report back.
Comment by Shem Valentine (xvalentinex) - Tuesday, 01 December 2009, 04:00 GMT
Okay, so I added md_mod, raid0 and raid1 to my MODULES array in mkinitcpio.conf and I can boot from the default kernel now.

On a side note, "Assembling RAID Devices" in /etc/rc.sysinit gives a FAIL status.

Running the command (/sbin/mdadm --assemble --scan) from the console produces no errors, but ends with exit status 2.

I would assume this is because the mdadm hook in mkinitcpio causes the kernel to assemble the raid arrays, and when init then tries to assemble the already-assembled arrays, it fails.

btw, I didn't boot via UUID, my kernel param is still root=/dev/md2, and this is on i686
Comment by Paul Mattal (paul) - Thursday, 25 February 2010, 13:16 GMT
Wait -- isn't the hook you're supposed to put in mkinitcpio called 'mdadm' these days, for RAID assembly based on the installed mdadm.conf?

Shouldn't that be in the mkinitcpio.conf? I don't see it there in the posted configs. I think in my RAID configs, it comes before "filesystems".
Comment by Paul Mattal (paul) - Thursday, 25 February 2010, 13:17 GMT
Ah, never mind, I see tpowa covered that earlier in the thread.

You still having the same troubles?
Comment by Brent Pitman (bpitman0001) - Thursday, 25 February 2010, 15:35 GMT
I've made the changes posted in previous comments. I've rebooted ~10 times without issue. I think this ticket can be closed. Thanks!
Comment by Thomas Bächler (brain0) - Thursday, 25 February 2010, 16:01 GMT
Okay, a few more remarks: md_mod, raid0 and raid1 should be added and loaded by the mdadm hook automatically. If that is still not working with the latest versions of mkinitcpio and mdadm, it is a bug. So you can create a separate mkinitcpio image with those modules removed (copy mkinitcpio.conf to mkinitcpio.2.conf, generate a new image with mkinitcpio -g /boot/kernel26-test.img -c /etc/mkinitcpio.2.conf) and see if that image still fails.
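Spelled out, that test would be roughly this (just a sketch; the flags are the ones named above):

# copy the config, then remove md_mod/raid0/raid1 from its MODULES= line by hand
cp /etc/mkinitcpio.conf /etc/mkinitcpio.2.conf
# generate a separate test image without touching the regular one
mkinitcpio -g /boot/kernel26-test.img -c /etc/mkinitcpio.2.conf
# then point a spare grub entry's initrd at /boot/kernel26-test.img and try booting it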

The FAILED on rc.sysinit is known and we probably can't fix it currently. Things will still work though.
