Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#16373 - [mdadm] Software raid assembly failing during boot
Attached to Project:
Arch Linux
Opened by Brent Pitman (bpitman0001) - Sunday, 27 September 2009, 16:55 GMT
Last edited by Dan Griffiths (Ghost1227) - Thursday, 25 February 2010, 21:02 GMT
Details
The problem started with my recent upgrade to:
Targets (18): alsa-lib-1.0.21.a-1 apache-2.2.13-1 coreutils-7.6-1
dbus-core-1.2.16-1 dhcpcd-5.1.0-1 hdparm-9.27-1 hwdetect-2009.09-5
junit-4.7-1 kernel26-2.6.30.6-1 rrdtool-1.3.8-2 kmbrnt-serverstats-0.7-3
libmysqlclient-5.1.38-1 mysql-clients-5.1.38-1 mysql-5.1.38-1 udev-146-2
vi-050325-1 xulrunner-1.9.1.3-1 xz-utils-4.999.9beta-1
I think the problem was introduced in the first half of September(?). During boot, I get a failure when trying to create /dev/md0 from /dev/sda5,/dev/sdb5. A few lines later, it waits 10 seconds, then prints a warning suggesting I increase rootdelay (which doesn't help) and drops into a shell prompt (reboot or exit to continue). Note that I've been running with this config for years. I don't have the exact error message - it only prints to the terminal and I didn't write it down. :(
On my dual-CPU, single-core machines, I've seen this happen once in 5-10 reboots. On my single-CPU, quad-core machine it happens every time. However, if I put a USB stick in it, it works fine. I'm guessing the extra device is slowing things down enough that the md0 assembly has what it needs when called? I rebooted 5-10 times with and without the USB stick: 100% failed without it and 100% succeeded with it. The USB stick holds an Arch Linux installer, and yes, I can tell the difference between booting from my disks and booting from the install "cd".
This is troublesome because I now can't trust machines to come back up; driving into the datacenter is pretty inconvenient.
This task depends upon
Closed by Dan Griffiths (Ghost1227)
Thursday, 25 February 2010, 21:02 GMT
Reason for closing: Not a bug
Additional comments about closing: OP requested close
FS#15756 - [dmraid] Can't boot after latest update - kernel panic
Another theory (just throwing it out there)... this problem doesn't surface until the 2nd reboot after installing:
Targets (25): apr-1.3.8-2 cloog-ppl-0.15.7-1 coreutils-7.5-1
damageproto-1.2.0-1 fakeroot-1.13-1 fontsproto-2.1.0-1
imagemagick-6.5.5.6-1 libarchive-2.7.1-1 libdrm-2.4.13-1
libfontenc-1.0.5-1 xf86vidmodeproto-2.2.99.1-1
libxxf86vm-1.0.99.1-1 libgl-7.5.1-1 libice-1.0.6-1
libxau-1.0.5-1 libxcursor-1.1.10-1 xineramaproto-1.1.99.1-1
libxinerama-1.0.99.1-1 man-db-2.5.6-1 mesa-7.5.1-1
module-init-tools-3.10-1 pciutils-3.1.4-1 pixman-0.16.0-1
syslog-ng-3.0.4-1 xorg-font-utils-7.4-3
Note that I run every partition (including /boot) on software raid, and have for years without incident.
[root@arizona ~]# egrep -v '^#' /etc/mkinitcpio.conf
MODULES="pata_acpi ata_generic scsi_mod ata_piix ipv6"
BINARIES=""
FILES=""
HOOKS="base udev raid autodetect pata scsi sata keymap filesystems"
## from /boot/grub/menu.lst
# (0) Arch Linux
title Arch Linux [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/md0 ro rootfstype=reiserfs md=0,/dev/sda5,/dev/sdb5
initrd /kernel26.img
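For reference, an annotated copy of that kernel line (the annotations are my own gloss on the standard kernel parameters, not part of this report):
kernel /vmlinuz26 root=/dev/md0 ro rootfstype=reiserfs md=0,/dev/sda5,/dev/sdb5
# root=/dev/md0        mount md0 as the root filesystem
# rootfstype=reiserfs  skip filesystem-type probing at mount time
# md=0,...             old-style assembly handled by the kernel/kinit:
#                      build md0 from sda5 and sdb5 at boot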
All my machines have identical configuration.
In the text below, I'm curious what these are:
> Driver 'sd' needs updating - please use bus_type methods
> md0: unknown partition table
ata_piix 0000:00:1f.2: setting latency timer to 64
scsi0 : ata_piix
scsi1 : ata_piix
ata1: SATA max UDMA/133 cmd 0xdc30 ctl 0xdc28 bmdma 0xdc40 irq 23
ata2: SATA max UDMA/133 cmd 0xdc38 ctl 0xdc2c bmdma 0xdc48 irq 23
ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.01: SATA link down (SStatus 0 SControl 0)
ata1.00: ATA-7: WDC WD2500YS-18SHB2, 20.06C07, max UDMA/133
ata1.00: 488281250 sectors, multi 8: LBA48 NCQ (depth 0/32)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access ATA WDC WD2500YS-18S 20.0 PQ: 0 ANSI: 5
ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata2.01: SATA link down (SStatus 0 SControl 0)
ata2.00: ATA-6: HDS722525VLSA80, V36OA6MA, max UDMA/100
ata2.00: 488397168 sectors, multi 8: LBA48
ata2.00: configured for UDMA/100
scsi 1:0:0:0: Direct-Access ATA HDS722525VLSA80 V36O PQ: 0 ANSI: 5
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
Driver 'sd' needs updating - please use bus_type methods
sd 0:0:0:0: [sda] 488281250 512-byte hardware sectors: (250 GB/232 GiB)
sd 1:0:0:0: [sdb] 488397168 512-byte hardware sectors: (250 GB/232 GiB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb:<5>sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: sda1 sda2 sda3 sda4 < sdb1 sdb2 sdb3 sdb4 < sda5 sdb5 sda6 > sdb6 >
sd 0:0:0:0: [sda] Attached SCSI disk
sd 1:0:0:0: [sdb] Attached SCSI disk
floppy0: no floppy controllers found
md: linear personality registered for level -1
md: multipath personality registered for level -4
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
xor: automatically using best checksumming function: generic_sse
generic_sse: 9835.200 MB/sec
xor: using function: generic_sse (9835.200 MB/sec)
async_tx: api initialized (async)
raid6: int64x1 2109 MB/s
raid6: int64x2 2743 MB/s
raid6: int64x4 2061 MB/s
raid6: int64x8 1914 MB/s
raid6: sse2x1 4293 MB/s
raid6: sse2x2 5089 MB/s
raid6: sse2x4 7937 MB/s
raid6: using algorithm sse2x4 (7937 MB/s)
md: raid6 personality registered for level 6
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: raid10 personality registered for level 10
md: bind<sda5>
md: bind<sdb5>
raid1: raid set md0 active with 2 out of 2 mirrors
md0: unknown partition table
REISERFS (device md0): found reiserfs format "3.6" with standard journal
REISERFS (device md0): using ordered data mode
REISERFS (device md0): journal params: device md0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
REISERFS (device md0): checking transaction log (md0)
REISERFS (device md0): Using r5 hash to sort names
rtc_cmos 00:04: RTC can wake from S4
rtc_cmos 00:04: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, y3k, 242 bytes nvram, hpet irqs
udev: starting version 146
sd 0:0:0:0: Attached scsi generic sg0 type 0
sd 1:0:0:0: Attached scsi generic sg1 type 0
Hmmmm, is this actually a raid6 array? Is it possible it's detecting things wrongly?
[root@arizona ~]# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[1]
224010240 blocks [2/2] [UU]
md3 : active raid1 sda3[0] sdb3[1]
2000000 blocks [2/2] [UU]
md2 : active raid1 sda2[0] sdb2[1]
2000000 blocks [2/2] [UU]
md1 : active raid1 sda1[0] sdb1[1]
128384 blocks [2/2] [UU]
md0 : active raid1 sdb5[1] sda5[0]
16000640 blocks [2/2] [UU]
unused devices: <none>
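For what it's worth, the raid5/raid6 lines in the dmesg above are just md personality modules being registered, and the Personalities line here only lists loaded modules, not levels in use; every array shows raid1. A one-liner to confirm, assuming the /proc/mdstat layout shown above:
awk '/^md/ {print $1, $4}' /proc/mdstat
# md4 raid1
# md3 raid1
# md2 raid1
# md1 raid1
# md0 raid1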
You need a custom mdadm.conf for this, and you must recreate your initramfs afterwards.
# To capture the UUIDs for all your RAID arrays to this file, run these:
# to get a list of running arrays:
# # mdadm -D --scan >>/etc/mdadm.conf
# to get a list from superblocks:
# # mdadm -E --scan >>/etc/mdadm.conf
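Put together, the suggested procedure is roughly this (a sketch based on the comments above; note that >> appends, so any stale ARRAY lines already in the file may need pruning by hand):
mdadm -E --scan >> /etc/mdadm.conf    # capture the arrays from the on-disk superblocks
mkinitcpio -p kernel26                # rebuild the initramfs so it picks up the new config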
I can add -E. However, when the system fails, it fails during root partition assembly. How is this going to help?
DEVICE partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=0ef30cf4:4d105619:231d80eb:21b56d4d
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=daacfb24:f194901a:41e3ec93:421e060b
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ec82bf03:1708d5ab:25ea56d9:f5b41768
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=0b64d55a:60bc570c:cc9087dc:e332ad34
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=035393fd:d335732f:ce8c9b5d:85495e6a
--
Waiting 10 seconds for device /dev/md0 ...
Root device '/dev/md0' doesn't exist, attempting to create it
ERROR: Failed to parse block device ids for '/dev/md0'
ERROR: Unable to detect or create root device '/dev/md0'
You are being dropped to a recovery shell
Type 'reboot' to reboot
Type 'exit' to try and continue booting
NOTE: klibc contains no 'ls' binary, use 'echo *' instead
If the device '/dev/md0' gets created while you are here,
try adding 'rootdelay=10' or higher to the kernel command-line
ramfs$ exit
Trying to continue (this will most likely fail)...
:: Initramfs Completed - control passing to kinit
IP-Config: no devices to configure
Waiting 0 s before mounting root device...
md: Will configure md0 (super-block) from /dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3, below.
kinit: Unable to mount root fs on device dev(9,0)
kinit: init not found!
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: kinit Not tainted 2.6.31-ARCH #1
Call Trace:
[<ffffffff81384198>] ? panic+0x9a/0x154
[<ffffffff8106765c>] ? exit_ptrace+0xbc/0x160
[<ffffffff8105de88>] ? do_exit+0x6c8/0x7a0
[<ffffffff8105e092>] ? sys_exit+0x22/0x30
[<ffffffff8100c382>] ? system_call_fastpath+0x16/0x1b
--
I checked the output from pacman before I rebooted and there were no errors.
Any ideas?
Maybe tpowa knows more about it; he did some work on the RAID support in the initramfs.
--
# (0) Arch Linux
title Arch Linux [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/md0 ro md=0,/dev/sda3,/dev/sdb3,/dev/sdc3,/dev/sdd3 console=ttyS0,115200
initrd /kernel26.img
--
/etc/mdadm.conf:
--
# grep -v "^#" mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=8cb3320c:06c4fe14:44b2a4ff:0d372c36
ARRAY /dev/md1 level=raid1 num-devices=4 UUID=90bcd9eb:a98afabf:6bfd6349:5ee01994
ARRAY /dev/md2 level=raid5 num-devices=4 UUID=699606c1:85ded7ff:c9668349:033f6432
ARRAY /dev/md3 level=raid5 num-devices=4 UUID=b59566c5:fbbaad63:9eba66c2:3b57b5af
--
Hooks in mkinitcpio.conf:
--
HOOKS="base udev autodetect pata scsi sata mdadm usbinput keymap filesystems"
--
RAID arrays:
md0 = / (RAID5)
md1 = /boot (RAID1)
md2 = swap (RAID5)
md3 = /data (RAID5)
This happened when the kernel was upgraded from 2.6.30.6-1 -> 2.6.31.4-1. No changes were made to the config.
It is consistent; I haven't been able to boot even a single time since the upgrade.
md: md0 stopped.
mdadm: no devicemd: md1 stopped.
s found for /devmd: md2 stopped.
/md0
mdadm: no md: md3 stopped.
devices found for /dev/md1
mdadm: no devices found for /dev/md2
mdadm: no devices found for /dev/md3
which I can dissect into:
md: md0 stopped.
md: md1 stopped.
md: md2 stopped.
md: md3 stopped.
mdadm: no devices found for /dev/md0
mdadm: no devices found for /dev/md1
mdadm: no devices found for /dev/md2
mdadm: no devices found for /dev/md3
And for the device list:
ramfs$ echo /dev/*
/dev/0:0:0:0 /dev/2:0:0:0 /dev/3:0:0:0 /dev/4:0:0:0 /dev/5:0:0:0 /dev/console /dev/cpu_dma_latency /dev/full /dev/kmem /dev/kmsg /dev/mcelog /dev/mem /dev/mice /dev/mouse0 /dev/network_latency /dev/network_throughput /dev/null /dev/port /dev/psaux /dev/ptmx /dev/random /dev/snapshot /dev/tty /dev/tty0 /dev/tty1 /dev/tty10 /dev/tty11 /dev/tty12 /dev/tty13 /dev/tty14 /dev/tty15 /dev/tty16 /dev/tty17 /dev/tty18 /dev/tty19 /dev/tty2 /dev/tty20 /dev/tty21 /dev/tty22 /dev/tty23 /dev/tty24 /dev/tty25 /dev/tty26 /dev/tty27 /dev/tty28 /dev/tty29 /dev/tty3 /dev/tty30 /dev/tty31 /dev/tty32 /dev/tty33 /dev/tty34 /dev/tty35 /dev/tty36 /dev/tty37 /dev/tty38 /dev/tty39 /dev/tty4 /dev/tty40 /dev/tty41 /dev/tty42 /dev/tty43 /dev/tty44 /dev/tty45 /dev/tty46 /dev/tty47 /dev/tty48 /dev/tty49 /dev/tty5 /dev/tty50 /dev/tty51 /dev/tty52 /dev/tty53 /dev/tty54 /dev/tty55 /dev/tty56 /dev/tty57 /dev/tty58 /dev/tty59 /dev/tty6 /dev/tty60 /dev/tty61 /dev/tty62 /dev/tty63 /dev/tty7 /dev/tty8 /dev/tty9 /dev/ttyS0 /dev/ttyS1 /dev/ttyS2 /dev/ttyS3 /dev/urandom /dev/vcs /dev/vcs1 /dev/vcsa /dev/vcsa1 /dev/zero
I don't know what the x:0:0:0 devices are, but if they aren't the hard drives, then it doesn't look like the hard drives are being detected at all.
Any further ideas?
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
Booting a live CD, assembling the arrays, chroot'ing into the environment and running "mkinitcpio -p kernel26" fixed the problem.
I wonder if this is an obscure mkinitcpio bug, or what else it could be. According to the original description in this bug report, bpitman0001 was able to boot his system when extra USB devices were connected. That didn't work in my case, so the two issues are probably not related after all.
DMRAID is used for BIOS RAID devices; they are created by the BIOS of the dmraid controller.
MDRAID is the standard Linux software RAID assembly.
I don't know if setting up a RAID on top of a DMRAID makes that much sense.
For mdraid assembly, it is highly recommended to use a custom mdadm.conf file with UUID assembly.
greetings
tpowa
Whoops, I didn't even notice that... I'm on MDRAID; I don't use the BIOS functionality.
However, when I reread the description and comments in this bug report, it seems like everyone is talking about MDRAID (/dev/mdX)...(?) In that case, only the title of the bug report needs to be changed.
pacman -Syu session attached
pacman -Q package list attached
root=/dev/disk/by-uuid/<youruuid>
Then UUID assembly will be used.
Also, please check whether blkid is present on your systems and whether your filesystem module is in your initramfs.
I need some help testing the suggestions in the previous comment (53552). Replacing root=/dev/md0 with root=/dev/disk/by-uuid/<youruuid> (and deleting md=...) resulted in the same error messages. I'm not confident that the config is right:
# (3) Arch Linux (by-uuid)
title Arch Linux (by-uuid) [/boot/vmlinuz26]
root (hd0,0)
kernel /vmlinuz26 root=/dev/disk/by-uuid/0ef30cf4:4d105619:231d80eb:21b56d4d ro rootfstype=reiserfs
initrd /kernel26.img
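One thing worth noting (my observation; it isn't confirmed anywhere in this thread): the colon-separated value above is the mdadm array UUID, while the /dev/disk/by-uuid/ symlinks udev creates are named after the dash-separated filesystem UUID that blkid reports. Going by the blkid output further down, the kernel line would presumably need the reiserfs UUID of /dev/md0 instead:
kernel /vmlinuz26 root=/dev/disk/by-uuid/720597d0-62ed-48d7-843c-b73ef66c6809 ro rootfstype=reiserfs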
I don't know how to test if blkid is on the system. I also don't know how to validate the initramfs, though this is in my mkinitcpio.conf:
MODULES="pata_acpi ata_generic scsi_mod ata_piix ipv6"
BINARIES=""
FILES=""
HOOKS="base udev raid autodetect pata scsi sata keymap filesystems"
type blkid
If an error happens, install util-linux-ng again with pacman -S util-linux-ng.
You can add reiserfs to your mkinitcpio.conf MODULES array.
- You need to change the raid hook to the mdadm hook.
- Make sure your mdadm.conf file includes all your arrays; you can use mdadm -Es >>/etc/mdadm.conf for this.
- Run mkinitcpio -p kernel26 (the steps are spelled out in the sketch below).
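Spelled out, the steps might look like this (a sketch; the HOOKS lines mirror the config posted above with raid swapped for mdadm, and the listing in step 4 assumes the image is a gzip-compressed cpio archive, which was the mkinitcpio default at the time):
# 1. In /etc/mkinitcpio.conf, change the HOOKS line from
#      HOOKS="base udev raid autodetect pata scsi sata keymap filesystems"
#    to
#      HOOKS="base udev mdadm autodetect pata scsi sata keymap filesystems"
# 2. Record every array in mdadm.conf from the on-disk superblocks
mdadm -Es >> /etc/mdadm.conf
# 3. Rebuild the initramfs
mkinitcpio -p kernel26
# 4. (Optional) check that mdadm and the filesystem module made it in
zcat /boot/kernel26.img | cpio -it | grep -E 'mdadm|reiserfs'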
[root@arizona ~]# blkid
/dev/sdb1: UUID="daacfb24-f194-901a-41e3-ec93421e060b" TYPE="linux_raid_member"
/dev/sdb2: UUID="ec82bf03-1708-d5ab-25ea-56d9f5b41768" TYPE="linux_raid_member"
/dev/sdb3: UUID="0b64d55a-60bc-570c-cc90-87dce332ad34" TYPE="linux_raid_member"
/dev/sdb5: UUID="0ef30cf4-4d10-5619-231d-80eb21b56d4d" TYPE="linux_raid_member"
/dev/sdb6: UUID="035393fd-d335-732f-ce8c-9b5d85495e6a" TYPE="linux_raid_member"
/dev/sda1: UUID="daacfb24-f194-901a-41e3-ec93421e060b" TYPE="linux_raid_member"
/dev/sda2: UUID="ec82bf03-1708-d5ab-25ea-56d9f5b41768" TYPE="linux_raid_member"
/dev/sda3: UUID="0b64d55a-60bc-570c-cc90-87dce332ad34" TYPE="linux_raid_member"
/dev/sda5: UUID="0ef30cf4-4d10-5619-231d-80eb21b56d4d" TYPE="linux_raid_member"
/dev/sda6: UUID="035393fd-d335-732f-ce8c-9b5d85495e6a" TYPE="linux_raid_member"
/dev/md0: UUID="720597d0-62ed-48d7-843c-b73ef66c6809" TYPE="reiserfs"
/dev/md1: UUID="d50f8c66-4705-41d5-8931-e99fe59dde64" TYPE="reiserfs"
/dev/md2: UUID="d332c1e9-3966-4791-a739-5df7fbe35852" TYPE="swap"
/dev/md3: UUID="7fa23a34-3f6e-40fe-bf26-38da15d53230" TYPE="swap"
/dev/md4: UUID="51f329ea-e8eb-45b8-b2e0-7ee9a89f56db" TYPE="reiserfs"
/dev/sdc1: LABEL="ARCHISO_COHYAE4A" UUID="5d4be3f0-aed4-4b82-b702-a130b56fbd9f" TYPE="ext2"
I likely won't go to the datacenter again until after the holidays. I'll try adding reiserfs to MODULES and replacing the raid hook with mdadm at that time.
Note that my mdadm.conf file has always had the arrays defined. However, the UUID reported by mdadm is different from the one reported by blkid (blkid shows these UUIDs on the sdb partitions, not on the md devices).
[root@arizona ~]# mdadm -Es
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=daacfb24:f194901a:41e3ec93:421e060b
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=ec82bf03:1708d5ab:25ea56d9:f5b41768
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=0b64d55a:60bc570c:cc9087dc:e332ad34
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=0ef30cf4:4d105619:231d80eb:21b56d4d
ARRAY /dev/md4 level=raid1 num-devices=2 UUID=035393fd:d335732f:ce8c9b5d:85495e6a
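Lining the two outputs up suggests (my reading, not stated in the thread) that these are the same values in different formats, plus a separate filesystem UUID on the assembled device:
# array UUID as mdadm prints it (md0):
#   UUID=0ef30cf4:4d105619:231d80eb:21b56d4d
# the same 128-bit value as blkid prints it on the member partitions:
#   /dev/sda5: UUID="0ef30cf4-4d10-5619-231d-80eb21b56d4d" TYPE="linux_raid_member"
# the filesystem UUID on the assembled device is unrelated to either:
#   /dev/md0: UUID="720597d0-62ed-48d7-843c-b73ef66c6809" TYPE="reiserfs"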
raid1 md-mod
We had issues with detecting the raid module lately, which is imho now fixed in the git tree.
Adding them manually should be safe and should work.
The GRUB entry should look like this:
root=/dev/disk/by-uuid/blablabla-0fb1-46a3-acfd-blablabla
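To fill in the placeholder, the dash-form filesystem UUID of the root array can be read straight from blkid (values as in the output posted earlier in this thread):
blkid /dev/md0
# /dev/md0: UUID="720597d0-62ed-48d7-843c-b73ef66c6809" TYPE="reiserfs"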
mdraid assembly fails on boot and kicks to emergency console.
I am using the mdadm hook instead of raid.
This box has been running fine for over a year, and it was within the last month or two that this started occurring. I don't reboot often, so I can't give a definitive time.
The interesting thing is that if I revert to the fallback kernel, it will boot just fine.
I'm assuming this boils down to a driver issue (my problem at least). I'm not the greatest with mkinitcpio, but is there a way to see what modules are being autodetected?
I will try the UUID in my kernel params instead of /dev/mdX, and report back.
On a side note, the "Assembling RAID Devices" step in /etc/rc.sysinit gives a FAIL status.
Running the command (/sbin/mdadm --assemble --scan) from console produces no errors, but ends with an exit status 2.
I would assume this is because the mdadm hook in mkinitcpio causes the kernel to assemble the RAID arrays, and when init tries to assemble the already-assembled arrays, it fails.
btw, I didn't boot via UUID, my kernel param is still root=/dev/md2, and this is on i686
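For reference, the check described above boils down to this (a sketch; the meaning of the exit status is the theory given here, not behavior I can vouch for from mdadm's documentation):
/sbin/mdadm --assemble --scan   # arrays are already running, so there is nothing to do
echo $?                         # reportedly prints 2 in this situation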
Shouldn't that be in the mkinitcpio.conf? I don't see it there in the posted configs. I think in my RAID configs, it comes before "filesystems".
Are you still having the same troubles?
The FAIL on rc.sysinit is known, and we probably can't fix it currently. Things will still work, though.