FS#37961 - Race between mdadm(_udev) and lvm2 hook
Attached to Project:
Arch Linux
Opened by Arno (ihad) - Saturday, 30 November 2013, 18:30 GMT
Last edited by Gerardo Exequiel Pozzi (djgera) - Sunday, 01 December 2013, 06:39 GMT
Opened by Arno (ihad) - Saturday, 30 November 2013, 18:30 GMT
Last edited by Gerardo Exequiel Pozzi (djgera) - Sunday, 01 December 2013, 06:39 GMT
|
Details
Description:
Today I rebooted my system after some time and got dropped to the rescue shell, because the mdadm, or mdadm_udev hook didn't assmble the RAID arrays in time for the lmv2 hook. 1. The setup: ~~ $ cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md1 : active raid5 sde[0] sdh[4] sdg[2] sdf[1] 8790405120 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4][UUUU] md127 : active raid5 sdb2[0] sdd2[3] sda2[2] sdc2[1] 1464862656 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU] unused devices: <none> ~~ md127 contains the rootfs: ~~ $ mount ... /dev/mapper/raid-root on / type xfs (rw,relatime,attr2,inode64,noquota) ... ~~ 2. The Problem: Both hooks, mdadm and mdadm_udev, don't wait until the kernel has assembled the raids, so lvm vgscan + lvm vgchange -ay fail. Starting lvmetad doesn't help, either. In my case assembling the raids takes about 20 seconds, give or take: ~~ [snip] [ 5.296915] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300) [ 5.297120] ata6.15: Port Multiplier 1.2, 0x197b:0x0325 r0, 15 ports, feat 0x5/0xf [ 5.299579] ata6.00: hard resetting link [ 5.783729] ata6.00: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 5.783755] ata6.01: hard resetting link [ 6.270389] ata6.01: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 6.270414] ata6.02: hard resetting link [ 6.757050] ata6.02: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 6.757075] ata6.03: hard resetting link [ 7.243709] ata6.03: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 7.243734] ata6.04: hard resetting link [ 8.266888] ata6.04: failed to resume link (SControl 0) [ 8.267090] ata6.04: SATA link down (SStatus 0 SControl 0) [ 8.267134] ata6.05: hard resetting link [ 9.290205] ata6.05: failed to resume link (SControl 0) [ 9.290406] ata6.05: SATA link down (SStatus 0 SControl 0) [ 9.290450] ata6.06: hard resetting link [ 10.313527] ata6.06: failed to resume link (SControl 0) [ 10.313727] ata6.06: SATA link down (SStatus 0 SControl 0) [ 10.313771] ata6.07: hard resetting link [ 11.336849] ata6.07: failed to resume link (SControl 0) [ 11.337049] ata6.07: SATA link down (SStatus 0 SControl 0) [ 11.337094] ata6.08: hard resetting link [ 12.360171] ata6.08: failed to resume link (SControl 0) [ 12.360373] ata6.08: SATA link down (SStatus 0 SControl 0) [ 12.360418] ata6.09: hard resetting link [ 13.383480] ata6.09: failed to resume link (SControl 0) [ 13.383679] ata6.09: SATA link down (SStatus 0 SControl 0) [ 13.383724] ata6.10: hard resetting link [ 14.406818] ata6.10: failed to resume link (SControl 0) [ 14.407017] ata6.10: SATA link down (SStatus 0 SControl 0) [ 14.407062] ata6.11: hard resetting link [ 15.430135] ata6.11: failed to resume link (SControl 0) [ 15.430335] ata6.11: SATA link down (SStatus 0 SControl 0) [ 15.430380] ata6.12: hard resetting link [ 16.453456] ata6.12: failed to resume link (SControl 0) [ 16.453655] ata6.12: SATA link down (SStatus 0 SControl 0) [ 16.453700] ata6.13: hard resetting link [ 17.476777] ata6.13: failed to resume link (SControl 0) [ 17.476977] ata6.13: SATA link down (SStatus 0 SControl 0) [ 17.477022] ata6.14: hard resetting link [ 18.500092] ata6.14: failed to resume link (SControl 0) [ 18.500291] ata6.14: SATA link down (SStatus 0 SControl 0) [ 18.500888] ata6.00: ATA-8: ST3000DM001-9YN166, CC4H, max UDMA/133 [ 18.500890] ata6.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA [ 18.501413] ata6.00: configured for UDMA/133 [ 18.501972] ata6.01: ATA-8: ST3000DM001-9YN166, CC4H, max UDMA/133 [ 18.501973] ata6.01: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA [ 18.502496] ata6.01: configured for UDMA/133 [ 18.503050] ata6.02: ATA-8: ST3000DM001-9YN166, CC4H, max UDMA/133 [ 18.503051] ata6.02: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA [ 18.503587] ata6.02: configured for UDMA/133 [ 18.504151] ata6.03: ATA-8: ST3000DM001-9YN166, CC4H, max UDMA/133 [ 18.504153] ata6.03: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA [ 18.504680] ata6.03: configured for UDMA/133 [ 18.504725] ata6: EH complete [ 18.504855] scsi 5:0:0:0: Direct-Access ATA ST3000DM001-9YN1 CC4H PQ: 0 ANSI: 5 [ 18.504977] sd 5:0:0:0: [sde] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) [ 18.504980] sd 5:0:0:0: [sde] 4096-byte physical blocks [ 18.505064] sd 5:0:0:0: [sde] Write Protect is off [ 18.505067] sd 5:0:0:0: [sde] Mode Sense: 00 3a 00 00 [ 18.505077] scsi 5:1:0:0: Direct-Access ATA ST3000DM001-9YN1 CC4H PQ: 0 ANSI: 5 [ 18.505095] sd 5:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 18.505265] sd 5:1:0:0: [sdf] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) [ 18.505267] sd 5:1:0:0: [sdf] 4096-byte physical blocks [ 18.505334] sd 5:1:0:0: [sdf] Write Protect is off [ 18.505337] sd 5:1:0:0: [sdf] Mode Sense: 00 3a 00 00 [ 18.505357] sd 5:1:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 18.505445] scsi 5:2:0:0: Direct-Access ATA ST3000DM001-9YN1 CC4H PQ: 0 ANSI: 5 [ 18.505789] sd 5:2:0:0: [sdg] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) [ 18.505791] sd 5:2:0:0: [sdg] 4096-byte physical blocks [ 18.738022] sde: unknown partition table [ 18.738055] sd 5:2:0:0: [sdg] Write Protect is off [ 18.738059] sd 5:2:0:0: [sdg] Mode Sense: 00 3a 00 00 [ 18.738066] scsi 5:3:0:0: Direct-Access ATA ST3000DM001-9YN1 CC4H PQ: 0 ANSI: 5 [ 18.738115] sd 5:2:0:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 18.738170] sd 5:3:0:0: [sdh] 5860533168 512-byte logical blocks: (3.00 TB/2.72 TiB) [ 18.738172] sd 5:3:0:0: [sdh] 4096-byte physical blocks [ 18.738222] sd 5:0:0:0: [sde] Attached SCSI disk [ 18.738225] sd 5:3:0:0: [sdh] Write Protect is off [ 18.738229] sd 5:3:0:0: [sdh] Mode Sense: 00 3a 00 00 [ 18.738252] sd 5:3:0:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 18.753173] sdf: unknown partition table [ 18.769158] sdg: unknown partition table [ 18.784408] sdh: unknown partition table [ 18.793227] sd 5:1:0:0: [sdf] Attached SCSI disk [ 18.793259] sd 5:2:0:0: [sdg] Attached SCSI disk [ 18.892777] sd 5:3:0:0: [sdh] Attached SCSI disk [ 32.050091] md: md127 stopped. [ 32.051031] md: bind<sdc2> [ 32.051204] md: bind<sda2> [ 32.051428] md: bind<sdd2> [ 32.051567] md: bind<sdb2> [ 32.106573] raid6: sse2x1 4654 MB/s [ 32.163234] raid6: sse2x2 7013 MB/s [ 32.219897] raid6: sse2x4 8279 MB/s [ 32.219900] raid6: using algorithm sse2x4 (8279 MB/s) [ 32.219901] raid6: using intx1 recovery algorithm [ 32.220039] async_tx: api initialized (async) [ 32.220125] xor: measuring software checksum speed [ 32.253210] prefetch64-sse: 14538.000 MB/sec [ 32.286554] generic_sse: 13742.400 MB/sec [ 32.286556] xor: using function: prefetch64-sse (14538.000 MB/sec) [ 32.287385] md: raid6 personality registered for level 6 [ 32.287387] md: raid5 personality registered for level 5 [ 32.287389] md: raid4 personality registered for level 4 [ 32.287563] md/raid:md127: device sdb2 operational as raid disk 0 [ 32.287566] md/raid:md127: device sdd2 operational as raid disk 3 [ 32.287567] md/raid:md127: device sda2 operational as raid disk 2 [ 32.287568] md/raid:md127: device sdc2 operational as raid disk 1 [ 32.287817] md/raid:md127: allocated 4332kB [ 32.287839] md/raid:md127: raid level 5 active with 4 out of 4 devices, algorithm 2 [ 32.287840] RAID conf printout: [ 32.287841] --- level:5 rd:4 wd:4 [ 32.287842] disk 0, o:1, dev:sdb2 [ 32.287843] disk 1, o:1, dev:sdc2 [ 32.287844] disk 2, o:1, dev:sda2 [ 32.287845] disk 3, o:1, dev:sdd2 [ 32.287861] md127: detected capacity change from 0 to 1500019359744 [ 32.288412] md127: unknown partition table [ 36.903361] md: md1 stopped. [ 36.904109] md: bind<sdf> [ 36.904234] md: bind<sdg> [ 36.904397] md: bind<sdh> [ 36.904566] md: bind<sde> [ 36.905817] md/raid:md1: device sde operational as raid disk 0 [ 36.905820] md/raid:md1: device sdh operational as raid disk 3 [ 36.905821] md/raid:md1: device sdg operational as raid disk 2 [ 36.905822] md/raid:md1: device sdf operational as raid disk 1 [ 36.906038] md/raid:md1: allocated 4332kB [ 36.906058] md/raid:md1: raid level 5 active with 4 out of 4 devices, algorithm 2 [ 36.906059] RAID conf printout: [ 36.906060] --- level:5 rd:4 wd:4 [ 36.906061] disk 0, o:1, dev:sde [ 36.906062] disk 1, o:1, dev:sdf [ 36.906063] disk 2, o:1, dev:sdg [ 36.906063] disk 3, o:1, dev:sdh [ 36.906082] md1: detected capacity change from 0 to 9001374842880 [ 36.931384] md1: unknown partition table [snip] ~~ After much trial and error I came up with the following solution: ~~ $ /usr/lib/initcpio/hooks # diff -u mdadm mdadm_custom --- mdadm 2013-11-30 18:49:12.662707310 +0100 +++ mdadm_custom 2013-11-30 18:49:48.755836776 +0100 @@ -5,6 +5,8 @@ # for partitionable raid, we need to load md_mod first! modprobe md_mod 2>/dev/null + echo "Waiting ${rootdelay} seconds for RAID to assemble!" + sleep ${rootdelay} if [ -n "$md" ]; then echo 'DEVICE partitions' >"$mdconfig" ~~ I'm pretty sure that there is a better way to wait for RAID assembly, but I came up empty, unfortunately. Additional info: * package version(s) linux 3.12.1-3 mkinitcpio 0.15.0-1 * config and/or log files etc. ~~ [root@hadante] /etc # grep -v '^#' mkinitcpio.conf | grep -v '^$' MODULES="" BINARIES="" FILES="" HOOKS="base udev autodetect modconf block mdadm_custom lvm2_manual filesystems keyboard fsck" ~~ [root@hadante] /lib/initcpio/hooks # cat lvm2_manual #!/usr/bin/ash run_hook() { udevadm settle lvm vgscan lvm vgchange -ay } # vim: set ft=sh ts=4 sw=4 et: ~~ Steps to reproduce: [root@hadante] # systemctl reboot (with 8 SATA-HD + 1 DVD-RW-Writer) Full dmesg attached. |
This task depends upon
Closed by Gerardo Exequiel Pozzi (djgera)
Sunday, 01 December 2013, 06:39 GMT
Reason for closing: Not a bug
Sunday, 01 December 2013, 06:39 GMT
Reason for closing: Not a bug
[ 32.050091] md: md127 stopped.
This is after udev had already assembled the device and mdassemble runs. Presumably, since lvm assembly is based on lvmetad and udev events, lvm assembly tries to take place in parallel with mdassemble breaking your already assembled array.
If you can reproduce this with the *actual* mdadm_udev and lvm2 hooks from the Arch Linux packages, then I'm interested.
~~
HOOKS="base udev autodetect modconf block mdadm_udev lvm2 filesystems keyboard fsck"
~~
with an explicit entry for md127 in /etc/mdadm.conf. No entry for md1, but md1 is not needed for booting, just hosting Virtual machines.
I got a rescue shell after ${rootdelay} seconds.
After
~~
$ lvm vgscan
$ lvm vgchange -ay
$ mount /dev/mapper/raid-root /new_root
$ exit
(some error about job control and that booting would most likely fail)
$ exit
~~
the system booted.
mdadm is a run_hook, lvm2 is a run_earlyhook, which starts lvmetad. If lvmetad is supposed to "vgscan" everything, does it wait until the RAIDS are assembled? Please note that it takes 20+ seconds...
If not, it's the wrong way around (in my case): First assemble the RAIDS, then lvm vgscan.
Long story short:
On the first reboot I got a rescue shell with the actual, original archlinux hooks (mdadm_udev before lvm2) with an explicit /etc/mdadm.conf generated by mdadm --detail --scan for _only_ md127.
But this doesn't line up with what I point out to be problematic... Please attach your whole /etc/mkinitcpio.conf and the dmesg output from a failed boot with it.
I reinstalled mkinitcpio, regenerated the initrd and now it just works with this:
~~
HOOKS="base udev autodetect modconf block mdadm_udev lvm2 filesystems keyboard fsck"
~~
No idea what went wrong in the first place. Anyhow, thanks for your help, and sorry for bothering you!