FS#32558 - [mdadm] udev-based incremental assembly failure

Attached to Project: Arch Linux
Opened by Paul Gideon Dann (giddie) - Friday, 09 November 2012, 10:22 GMT
Last edited by Toolybird (Toolybird) - Sunday, 11 June 2023, 04:11 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

mkinitcpio 0.11.0-1

Following the recent thread with a similar title, I decided to try switching from the mdadm hook to the mdadm_udev hook on several boxes. This worked fine on a couple of them, but I have one old server I use at home that has an unusual configuration:

There are 4 disks. They all have a small partition at the start, followed by a large partition spanning the rest of the disk. The small partitions of all the disks are joined in RAID1 to form my /boot.

3 of the disks are joined in RAID0 to form /dev/md2, which is then joined to the 4th disk in RAID1 to form /dev/md1, my main storage area, which is an LVM PV.

Personalities : [raid1] [raid0]
md1 : active raid1 sda2[3] md2[2]
      488283136 blocks super 1.2 [2/2] [UU]

md2 : active raid0 sdc2[0] sdb2[2] sdd2[1]
      519650304 blocks super 1.2 256k chunks

md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
      101312 blocks [4/4] [UUUU]

My /etc/mdadm.conf is fine, and the system boots OK with the mdadm hook, but if I boot with mdadm_udev instead, I get a "no volume groups" error.

I suspect this is caused by the RAID-on-RAID configuration. Something interesting happens when I boot from an Arch ISO: the /dev/md1 array (given a different device node) is partially detected, presumably because /dev/md2 isn't up yet, and as a result is not activated.

I spent some time scratching my head, trying "mdadm --auto-detect" and various similar incantations (even --re-add) in the hope of bringing the array up, but in the end the trick was to *stop* /dev/md1 before auto-detecting again. (It took far too long for me to think of that.) The issue seems to be that mdadm gets stuck with the partial array and for some reason won't complete it, but once the partial array is removed, it can detect the whole thing together without any problem.
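
From memory, the sequence that finally worked from the ISO shell was roughly this (a sketch rather than an exact transcript):

mdadm --stop /dev/md1
mdadm --assemble --scan

i.e. the half-assembled array had to be torn down before mdadm would assemble it completely.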
This task depends upon

Closed by  Toolybird (Toolybird)
Sunday, 11 June 2023, 04:11 GMT
Reason for closing:  Fixed
Additional comments about closing:  Old and stale (ignoring the recent comment hijack). Assuming the original problem is no longer happening.
Comment by Dave Reisner (falconindy) - Saturday, 10 November 2012, 00:11 GMT
This has little to do with mkinitcpio. Reassigning.
Comment by Thomas Bächler (brain0) - Monday, 12 November 2012, 12:16 GMT
I have an idea:

When a new md* device is added, the udev rule first checks whether it is part of another RAID and only then runs blkid to add the actual metadata to the udev database. This is fine unless you have an md* device that itself contains RAID metadata - in that case the blkid information is not yet available when the check runs, so the incremental assembly rules are never executed.

This udev rule should be reordered to reflect this situation, or better yet, split into two rule files that are executed in the proper order (there are two distinct tasks going on, so I don't understand why there is only one rule file). Furthermore, the udev rule needs to be modernized; it still uses deprecated features like 'blkid -o udev'.
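
Roughly, the split could look like this (a heavily simplified sketch, not the actual shipped rule files; the file names are just illustrative):

# 63-md-blkid.rules (illustrative): make blkid data for md* devices available first
SUBSYSTEM=="block", KERNEL=="md*", ACTION=="add|change", IMPORT{builtin}="blkid"

# 64-md-incremental.rules (illustrative): incrementally assemble anything identified as a RAID member
SUBSYSTEM=="block", ACTION=="add", ENV{ID_FS_TYPE}=="linux_raid_member", RUN+="/sbin/mdadm --incremental $devnode --offroot"

That way an md* device that itself carries RAID metadata gets its ID_FS_TYPE set before the incremental assembly rule is evaluated.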
Comment by Thomas Bächler (brain0) - Saturday, 09 February 2013, 17:43 GMT
Sorry for the very long delay, but I finally got around to fixing this: I'll push mdadm 3.2.6-2 to [testing] in a few minutes. Please test whether it fixes your problem.
Comment by Paul Gideon Dann (giddie) - Monday, 11 February 2013, 09:47 GMT
Great; thanks for working on this. I'll try to test it again some time soon.
Comment by Paul Gideon Dann (giddie) - Wednesday, 13 February 2013, 11:05 GMT
Hmm; some progress I believe, but I'm afraid this didn't work quite as planned. I updated to mkinitcpio 0.13.0-1 and tested with the following hooks:

HOOKS="base udev autodetect modconf block mdadm_udev lvm2 filesystems keyboard fsck"

I get dropped to a recovery shell, in which I discover the following:

# cat /proc/mdstat
Personalities : [raid1] [raid0]
md1 : inactive sda2[3](S)
      488283136 blocks super 1.2

md2 : active raid0 sdc2[0] sdb2[2] sdd2[1]
      519650304 blocks super 1.2 256k chunks

md0 : active raid1 sdd1[0] sda1[3] sdc1[2] sdb1[1]
      101312 blocks [4/4] [UUUU]

unused devices: <none>

-----

You can see the correct /dev/md1 configuration in the original post. It looks like it's getting stuck having half-built /dev/md1, and is unable to complete building it when /dev/md2 becomes available.

Even more concerning is that the "mdadm" hook is now also broken. When I boot with "mdadm" instead of "mdadm_udev", I find that the md devices are built OK, but LVM fails:

mdadm: /dev/md0 has been started with 4 drives.
mdadm: /dev/md/2 has been started with 3 drives.
mdadm: /dev/md/1 has been started with 2 drives.
[ 8.709859] device-mapper: table: 253:0: linear: dm-linear: Device lookup failed
[ 8.712366] device-mapper: table: 253:1: linear: dm-linear: Device lookup failed
ERROR: device '/dev/mapper/vg-root' not found. Skipping fsck.
ERROR: Unable to find root device '/dev/mapper/vg-root'.

And then at the shell, I see my two LVs, with status -wi-d----. If I do "lvchange -ay" for both of my LVs, and then "exit", the system is able to boot.
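
For reference, the manual workaround at that shell is roughly the following (the second LV name is just a placeholder; only vg/root appears in the messages above):

lvchange -ay vg/root
lvchange -ay vg/<other-lv>
exit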

I hope this might help you a little to figure out what's going on.
Comment by Thomas Bächler (brain0) - Wednesday, 13 February 2013, 12:03 GMT
Concerning mdadm_udev, can you enter mdadm --incremental /dev/md2 --offroot and see if there is any error? Is there a DEVICE line in /etc/mdadm.conf?

Concerning the lvm failure, I am lost - it actually detects your volume groups fine but fails to activate them. The error message is not helpful either.
Comment by Paul Gideon Dann (giddie) - Wednesday, 13 February 2013, 12:20 GMT
My /etc/mdadm.conf consists of the following:

------
ARRAY /dev/md0 UUID=dfbf33b3:429b728e:27fa015f:20b17ea3
ARRAY /dev/md/2 metadata=1.2 UUID=466243a3:baee9ece:aabfba6d:ad575c84 name=Evey:2
ARRAY /dev/md/1 metadata=1.2 UUID=f79b7c3a:fa76f0b8:ff87e3f1:08ddafbf name=Evey:1

MAILADDR root@localhost
------

The ARRAY lines are all I get from "mdadm --detail --scan". Would DEVICE lines help? Isn't the theory that udev will react appropriately and build the array as the new device appears?

I'll get back to you with results for the mdadm_udev test.
Comment by Thomas Bächler (brain0) - Wednesday, 13 February 2013, 12:35 GMT
Try 'DEVICE partitions /dev/md*' as its own line (alternatively, 'DEVICE *' may work, too). As it stands, mdadm will only scan partitions and containers (though the manpage doesn't say exactly what the latter means), but nothing else. Even if a device is added by udev, mdadm will skip it if it is not allowed by a DEVICE statement.

For details, see the information on incremental assembly in the mdadm manpage and the help on the DEVICE option in the mdadm.conf manpage.
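
For example, the top of your mdadm.conf would then look like this:

DEVICE partitions /dev/md*
ARRAY /dev/md0 UUID=dfbf33b3:429b728e:27fa015f:20b17ea3
ARRAY /dev/md/2 metadata=1.2 UUID=466243a3:baee9ece:aabfba6d:ad575c84 name=Evey:2
ARRAY /dev/md/1 metadata=1.2 UUID=f79b7c3a:fa76f0b8:ff87e3f1:08ddafbf name=Evey:1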
Comment by Paul Gideon Dann (giddie) - Monday, 18 February 2013, 14:58 GMT
This seems very relevant: https://bugs.archlinux.org/task/33851
Comment by Dave Reisner (falconindy) - Monday, 18 February 2013, 18:16 GMT
I can't see how it is. The linked FS is strictly about LVM and device-mapper changes.
Comment by Paul Gideon Dann (giddie) - Tuesday, 19 February 2013, 09:53 GMT
I think it explains the issue activating my LVM volumes (the issue when using the mdadm hook), but you're right that it doesn't apply to the mdadm_udev issue. It means there are definitely two separate issues in my initcpio.

The task also suggests changing /dev/md/1 and /dev/md/2 to /dev/md1 and /dev/md2 in the mdadm.conf file, which is certainly worth a shot. Not sure when I'll have a chance to test this out. I don't have the opportunity to reboot this box very often.
Comment by Paul Gideon Dann (giddie) - Friday, 22 February 2013, 10:31 GMT
I tried "mdadm --incremental /dev/md2 --offroot" as you suggested, and that does the trick. The array was started and the LVM volumes were recognised and activated automatically. The array started rebuilding, but that might be because of the crash that required me to reboot the machine.
Comment by Paul Gideon Dann (giddie) - Tuesday, 23 April 2013, 09:32 GMT
I just tested this setup in a VM, and the same problem occurs when attempting to boot from the following configuration:

/dev/sda: 2GB
/dev/sdb: 1GB
/dev/sdc: 1GB

Each has a 200 MB boot partition, with the rest of the disk for storage.

/dev/md0: RAID1 /dev/sd[abc]1
/dev/md1: RAID0 /dev/sd[bc]2
/dev/md2: RAID1 /dev/sda2 /dev/md1

/dev/md2 is an LVM PV, but that seems irrelevant. The mdadm_udev hook is being used.

In this setup, running "mdadm --incremental /dev/md1 --offroot" at startup brings the /dev/md2 array up and lets the boot proceed. It sounds like the hook needs some iteration of "mdadm --incremental" until nothing further is detected?
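
Something along these lines is what I have in mind (just a sketch of the idea, not real hook code; it keeps re-running incremental assembly until /proc/mdstat stops changing):

prev=
while next="$(cat /proc/mdstat)"; [ "$next" != "$prev" ]; do
    prev="$next"
    for md in /dev/md[0-9]*; do
        # mdadm exits non-zero for devices it can't use; errors are ignored here
        [ -b "$md" ] && mdadm --incremental "$md" --offroot 2>/dev/null
    done
done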
Comment by Doug Newgard (Scimmia) - Tuesday, 12 May 2015, 16:35 GMT
Status?
Comment by Paul Gideon Dann (giddie) - Wednesday, 13 May 2015, 11:29 GMT
I don't have this setup any more, but it shouldn't be too hard to create a test in a VM according to the above description.
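
For reference, something like this should recreate the nested layout from my VM test (untested sketch; device names as in my comment above, VG name arbitrary):

mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdb2 /dev/sdc2
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda2 /dev/md1
pvcreate /dev/md2
vgcreate vg /dev/md2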
Comment by Wilken Gottwalt (Akiko) - Monday, 05 June 2023, 05:51 GMT
I can jump in here. I have a similar setup and have been running into this issue for about 2-3 weeks. I have a RAID10 (more specifically, the RAID1+0 near-copies layout) on md0, which always assembles only halfway; booting then stops and I have to stop and reassemble the array ("mdadm --assemble md0") to get it working.

my hooks:
HOOKS=(base udev autodetect modconf kms keyboard keymap block mdadm_udev encrypt lvm2 filesystems fsck)

my raids:
ARRAY /dev/md0 metadata=0.90 UUID=d162ba5d:4e198eb0:6b6ec825:eadd56bd
ARRAY /dev/md1 metadata=1.2 name=none:storage UUID=b3fbc621:b5534e8b:e11e742c:4162963f

md0 : active raid10 nvme0n1p1[0] nvme2n1p1[3] nvme3n1p1[2] nvme1n1p1[1]
      4000795136 blocks 128K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 5/30 pages [20KB], 65536KB chunk

EDIT: maybe this could be an issue: the udev assembly rules look for the filesystem IDs "linux_raid_member" and "md_inc", but this is not the case for a setup using encryption:
/dev/nvme0n1p1: UUID="5a486cc9-3ea3-4f6c-b067-c414999a024a" TYPE="crypto_LUKS" PARTUUID="6e115e2b-01"
/dev/nvme3n1p1: UUID="d162ba5d-4e19-8eb0-6b6e-c825eadd56bd" TYPE="linux_raid_member" PARTUUID="69814d9c-01"
/dev/nvme2n1p1: UUID="d162ba5d-4e19-8eb0-6b6e-c825eadd56bd" TYPE="linux_raid_member" PARTUUID="6932cffb-01"
/dev/nvme1n1p1: UUID="5a486cc9-3ea3-4f6c-b067-c414999a024a" TYPE="crypto_LUKS" PARTUUID="76a375e3-01"

Still, this worked for years. I think the two different UUIDs can be ignored; blkid shows the filesystem IDs, not the RAID UUIDs. All of the RAID members have the correct UUID:
# mdadm --examine /dev/nvme*n1p1 | grep UUID
UUID : d162ba5d:4e198eb0:6b6ec825:eadd56bd
UUID : d162ba5d:4e198eb0:6b6ec825:eadd56bd
UUID : d162ba5d:4e198eb0:6b6ec825:eadd56bd
UUID : d162ba5d:4e198eb0:6b6ec825:eadd56bd
Comment by Toolybird (Toolybird) - Monday, 05 June 2023, 06:07 GMT
@Akiko, please don't jump in on 10-year-old issues :) Your issue is bound to be different.

Please have a look at FS#78661.
