FS#7699 - udev / mdadm conflicts, esp. on partitionable raids

Attached to Project: Arch Linux
Opened by Michal Soltys (msoltyspl) - Saturday, 28 July 2007, 22:35 GMT
Last edited by Roman Kyrylych (Romashka) - Saturday, 09 February 2008, 17:18 GMT
Task Type Bug Report
Category System
Status Closed
Assigned To No-one
Architecture All
Severity High
Priority Normal
Reported Version 2007.05 Duke
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Scenario:

A partitionable raid-5 array created with mdadm, e.g. mdadm -C /dev/md/d0 -l5 -n4 -x1 -e1 /dev/sd{a,b,c,d,e}4 -z8388608 --auto=p6

On the array we create 2 partitions - never mind the size/id/etc., just be sure they're created. Then we update mdadm.conf, automatically or by hand, to:

ARRAY /dev/md/d0 UUID=<valid uid>
CREATE auto=p6 mode=0660 group=disk owner=root

Next, to see what is happening on the udev side, let's start udevd manually, with debug options (--debug-trace --verbose).

With Arch's standard udev rules on a fully updated arch64 (incl. mdadm 2.6.2-1), the following happens, starting with a clean, unassembled array:

1) mdadm -A /dev/md/d0

Arch has no udev rules for the md_d* case at all, so udev creates only the standard /dev/md_d0, whereas mdadm tries to create md/d0 and symlink it as /dev/md_d0 (unless, of course, we explicitly prohibit mdadm from creating nodes and/or symlinks).

No big deal yet, but...

2) mdadm -S /dev/md/d0

An "add" uevent is generated for every existing partition; looking at the udevd trace, /dev/md_d0p1 and /dev/md_d0p2 are created (as per udev rules), replacing the symlinks mdadm had already created in step 1). It looks like partition-related uevents are delayed, and assembling the array in step 1) causes only the md_d0 uevent.

3) mdadm -A /dev/md/d0

Here a "remove" uevent is generated for every created partition, in most cases causing removal of the symlinks created by mdadm. The same remark as above applies.


It seems that if any partition / mdadm operation happens between 1) and 2), or between 2) and 3), the respective "add" and "remove" uevents will happen then.
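
To see which side "won" after each of the steps above, it helps to check whether each name is currently a real block node or a symlink. A minimal sketch, assuming a POSIX shell; node_kind and report_md_nodes are hypothetical helper names, and the device list simply mirrors the d0 example with its two partitions:

```shell
# Hypothetical helper: classify a path as symlink, block node, other, or missing.
# The symlink test comes first, because -b and -e follow symlinks.
node_kind() {
    if [ -L "$1" ]; then
        echo symlink
    elif [ -b "$1" ]; then
        echo block
    elif [ -e "$1" ]; then
        echo other
    else
        echo missing
    fi
}

# Hypothetical helper: show both naming schemes for array d0 side by side.
report_md_nodes() {
    for dev in /dev/md_d0 /dev/md_d0p1 /dev/md_d0p2 \
               /dev/md/d0 /dev/md/d0p1 /dev/md/d0p2; do
        printf '%-14s %s\n' "$dev" "$(node_kind "$dev")"
    done
}
```

Calling report_md_nodes after each mdadm -A / mdadm -S makes it visible when a symlink made by mdadm has been replaced by a udev-made node, or removed altogether.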


On a related subject: as I've modified my setup a bit, I've added a simple raid-all hook combining the raid and raid-partitions functionality (and relying only on mdadm).


Workarounds / solutions:


The general approach is to stop mdadm and udev from getting in each other's way.

For example, as I decided to use the subdirectory approach (md/N, md/dNpM), I've added to mdadm.conf:

CREATE owner=root group=disk symlinks=no auto=p2

The important part is symlinks=no, which stops mdadm from creating symlinks (my initcpio includes mdadm.conf by default now).

On the udev front I've replaced:

KERNEL=="md[0-9]*", NAME="md%n", SYMLINK+="md/%n"

with

KERNEL=="md[0-9]*", NAME="%k"
KERNEL=="md_d[0-9]*", NAME="%k"

(both rules can actually be removed, as this is the default behaviour)

This way, mdadm and udev can coexist peacefully. The only thing to remember is that the partition uevents are delayed: udev handles them only at the next mdadm invocation or partition-related operation.
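
With this setup every device ends up under two independent names: the kernel name that udev uses by default, and the subdirectory name that mdadm creates. A small sketch of that mapping, inferred only from the names quoted in this report (md/N, md/dNpM); md_subdir_name is a hypothetical helper, not part of mdadm or udev:

```shell
# Hypothetical helper: map a kernel md name to the subdirectory-style name
# used in this report (md0 -> md/0, md_d0 -> md/d0, md_d0p1 -> md/d0p1).
md_subdir_name() {
    case "$1" in
        md_d[0-9]*) printf 'md/d%s\n' "${1#md_d}" ;;  # partitionable array or its partition
        md[0-9]*)   printf 'md/%s\n'  "${1#md}" ;;    # plain array
        *)          return 1 ;;                       # not an md device name
    esac
}
```

For example, md_subdir_name md_d0p1 prints md/d0p1, so /dev/md_d0p1 (udev) and /dev/md/d0p1 (mdadm) refer to the same partition.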


Another possible approach: stop mdadm from touching devices at all (e.g. use auto=no in mdadm.conf) and write proper udev rules, remembering that the disk device node would then have to be created by some other means (echo "add" to uevent, udevtrigger, mknod, ...).
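
The 'echo "add" to uevent' variant could look roughly like the sketch below. It assumes the md device already has an entry under /sys/block; trigger_add_uevent is a hypothetical helper name, and the SYSFS_ROOT variable is an assumption added only so the function can be tried against a scratch directory:

```shell
# Hypothetical helper: ask the kernel to re-emit an "add" uevent for an md disk,
# so that udev (re)creates its node. SYSFS_ROOT defaults to /sys.
trigger_add_uevent() {
    uevent_file="${SYSFS_ROOT:-/sys}/block/$1/uevent"
    if [ ! -w "$uevent_file" ]; then
        echo "cannot write $uevent_file" >&2
        return 1
    fi
    echo add > "$uevent_file"
}
```

On a real system this would be run as root, e.g. trigger_add_uevent md_d0.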


Or the other way around: stop udev from making any nodes (e.g. NAME="", or even ignore_device), and rely on mdadm only.


[Originally I posted it on forums here: http://bbs.archlinux.org/viewtopic.php?id=35657 but this is the updated version. There're also udev traces there.]

Closed by  Roman Kyrylych (Romashka)
Saturday, 09 February 2008, 17:18 GMT
Reason for closing:  Duplicate
Additional comments about closing:  superseded by FS#9122
Comment by Tobias Powalowski (tpowa) - Sunday, 29 July 2007, 07:36 GMT
udev is the node creator and symlinker; imho mdadm shouldn't be involved with that.
mdadm is probably also tuned for systems that don't run udev, so it has the ability to create nodes itself, but on arch it shouldn't do so.
You use a very non-standard setup here, with your own hooks and such, so it is very difficult to solve anything here.
Comment by Christ Schlacta (aarcane) - Tuesday, 31 July 2007, 00:46 GMT
if you need partitions on raid arrays, you should use lvm. it simplifies matters greatly.
Comment by Michal Soltys (msoltyspl) - Tuesday, 31 July 2007, 15:49 GMT
@Tobias:

Heh, yes - my setup is not the most default thing under the sun, but not that terribly different either :)

I agree that it would be great if udev did all the work, but it does seem to have problems with raid partitions (add / remove uevents for partitions happen later). At the same time, mdadm needs an actual device node, and will create all the ones it requires unless specifically told not to. Arch Linux's scripts (rc.sysinit, initcpio's raid hooks) also create some of the nodes manually (using mknod), assuming mdadm will need them (whereas it doesn't). Finally, udev will try to do its own thing as well: Arch's udev rules have nothing for partitionable raid, so default nodes are created with no symlinks; plus udev creates symlinks in the "other" direction - e.g. mdadm, if asked to assemble /dev/md/d0, would create /dev/md0 as a symlink and /dev/md/0 as a node, whereas udev would make a symlink from /dev/md/0 to /dev/md0.

And true - in the majority of cases everything will still work well. But it's a bit messy and complicated.

But it could easily be simplified. Consider the following:

To the udev rules, add:

KERNEL=="md[0-9]*", NAME=""
KERNEL=="md_d[0-9]*", NAME=""

It would make udev not create the nodes, while letting the related uevents run.

In the default mdadm.conf, add e.g. ...

CREATE mode=0660 group=disk owner=root

...with an appropriate comment about why it's there.

In rc.sysinit, replace the whole part responsible for raid initialization with a simple: mdadm -A --scan. If mdadm.conf has user-defined arrays, it will assemble them (and everything else it can). If not, it will assemble what it can. It will also create all the nodes it needs (or fix the ones that exist but are inappropriate).
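
As a rough sketch of how small the raid section of rc.sysinit would become under this proposal (this is not Arch's actual script; assemble_all_arrays is a hypothetical name, and the MDADM variable is an assumption added so the command can be swapped out for a dry run):

```shell
# Hypothetical replacement for the raid initialization block in rc.sysinit.
# MDADM defaults to the real mdadm binary found in PATH.
assemble_all_arrays() {
    # As described above: assemble the arrays listed in mdadm.conf plus
    # whatever else can be found, and let mdadm create the nodes it needs.
    "${MDADM:-mdadm}" -A --scan
}
```

Setting MDADM=echo before calling the function prints the arguments instead of touching any array, which is handy for checking the script without root.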

As for initcpio - why rely on md= ? (It predates the early-userspace approach and was used to assemble arrays without ramdisks, afaik.) Especially since the raid-partitions hook already includes the full mdadm. The only thing needed is e.g. root=/dev/md* on the kernel command line and, in the hook, mdadm -A --scan - similarly to rc.sysinit. mdadm.conf should be included by default as well.

With this approach, everything that has to be done gets done, with no conflicts and in a much simpler way (one hook in initcpio, ~ one line in rc.sysinit, no old md= kernel line).


@Chris:

There's lvm, on the first raid partition :) I had my reasons for this setup - boots outside any raid, d0p1 with lvm for ease of management (root, usr, var ...), d0p2 for carefully (including future grow) aligned xfs (xfs'es su & sw options), "inner" partition table on gpt due to xfs size >2tb. So far it works beautifully.
Comment by Michal Soltys (msoltyspl) - Friday, 03 August 2007, 04:54 GMT
Btw, do you accept diffs? I could craft something more general after doing a few more tests. Then you could see whether it's worthwhile to consider or not.
Comment by Roman Kyrylych (Romashka) - Friday, 11 January 2008, 13:24 GMT
Status?
Comment by Roman Kyrylych (Romashka) - Friday, 11 January 2008, 13:25 GMT
ah, forgot about  FS#9122  :)
adding it as a related task
Comment by Michal Soltys (msoltyspl) - Thursday, 17 January 2008, 19:11 GMT
Ah yes. This report is pretty old. #9122 covers it all.

I guess this task can be just closed.
