FS#18365 : [mkinitcpio] mdadm hook broken

Attached to Project: Arch Linux
Opened by Arno (ihad) - Wednesday, 17 February 2010, 00:24 GMT
Last edited by Dan Griffiths (Ghost1227) - Thursday, 18 February 2010, 02:38 GMT

Task Type	Bug Report
Category	Packages: Extra
Status	Closed
Assigned To	Tobias Powalowski (tpowa) Thomas Bächler (brain0)
Architecture	All
Severity	Low
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	0
Private	No

Details

Description: mdadm hook is broken in mkinitcpio

Just wasted some hours to get my box booting again:
setup:
4 sata hard disks, raid 5, lvm on top, kernel panic.
I don't know why, but the mdadm hook is a link to
the raid hook, forcing me to use mdassemble. I was
able to crash my raid with it before and don't want
to do it again.
Solution:
1. boot your box somehow
2. Create a valid /etc/mdadm.conf:
$ mdadm --brief --scan <raid-device> >> /etc/mdadm.conf
3. vi(m) /lib/initcpio/hooks/mdadm
4. for brute force add the following lines after
mdconfig= (line 5 for me)
if [[ -f /etc/mdadm.conf ]]
then
/bin/mknod /dev/md0 b 9 0
mdadm --assemble /dev/md0 -c ${mdconfig}
fi
This will at least get you md0, add more mknods for
any other raid arrays. md1 will be 9 1, and so on, see
<kernel-source>/Documentation/devices.txt
5. vi(m) /etc/mkinitcpio.conf
-Add your (s)ata driver to MODULES
-BINARIES="/bin/mknod /sbin/mdadm"
6. rebuild the initrd. You can force it with:
pacman -S kernel26
7. reboot
of course /boot and /var need to be mounted for this to
work

Additional info:
* package version(s)
mkinitcpio 0.6.2-1
kernel26 2.6.32.8-1
* config and/or log files etc.
mdadm.conf:
ARRAY /dev/md0 metadata=0.90 UUID=some-uuid

I'm sure there has to be a more elegant solution than
the one presented above, but it works if you just have
one raid array. BTW, it worked for me before the
klibc update. Whoever gets this report, feel free to
contact me.

Steps to reproduce:
see above

This task depends upon

Closed by Dan Griffiths (Ghost1227)
Thursday, 18 February 2010, 02:38 GMT
Reason for closing: Not a bug
Additional comments about closing: Original poster requested close - not a bug

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 07:58 GMT

Why does the original not work for you?
It uses mdassemble from the mdadm package and not the klibc one.

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 08:08 GMT

Half of what you did in your "solution" part is plainly wrong. Other than that, I've had lots of confirmations that the mdadm hook works exactly as before the update without problems, so I am of course suspecting PEBKAC. In fact, nothing changed in that hook except that the static mdassemble binary has been replaced with a dynamic one.

Also, the mdadm hook is not a link to the raid hook; it is the other way around. As I said, the mdadm hook has barely changed.

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 08:20 GMT

By the way, I don't see what your original problem was. As I understand it, there was no problem at all, you just didn't want to use mdassemble - which the mdadm hook has been using since it existed.

You will not get ANY solution to your problem by posting some uneducated random tampering with hooks that makes no sense to a reader. The only way this problem will get solved is if you find out what exactly goes wrong with the original hooks, which seem to work for everyone but you. You will also get no help if you don't even post your mkinitcpio configuration that failed.

Comment by Arno (ihad) - Wednesday, 17 February 2010, 15:33 GMT

I suspected mkinitcpio since it was the only thing that changed/got updated since my last reboot. I'm sorry if that's wrong. Unfortunately I don't have an older version to compare the packages.

Maybe it's PEBKAC indeed. My setup is as follows:

$ cat /proc/partitions
major minor #blocks name

8 0 156290904 sda
8 1 156288321 sda1
8 16 488386584 sdb
8 17 96358 sdb1
8 18 488287642 sdb2
8 32 488386584 sdc
8 33 96358 sdc1
8 34 488287642 sdc2
8 64 488386584 sde
8 65 96358 sde1
8 66 488287642 sde2
8 48 488386584 sdd
8 49 96358 sdd1
8 50 488287642 sdd2

$ cat /proc/cmdline
root=/dev/mapper/raid-root ro

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb2[0] sde2[3] sdd2[2] sdc2[1]
1464862656 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

When I rebooted my box yesterday, it dropped me to a shell, because the root filesystem could not be mounted. This was not surprising, since the raid was not assembled (the raid456 module was loaded). So I took a look at the mdadm-hook. When I modified the mdadm-hook to have mdadm instead of mdassemble assemble the raid it started working again. It is quite possible that I'm doing something wrong.

Last, but not least: I'm sorry for my harsh tone and posting my wrong solution. I shouldn't write bug reports so late in the evening. It won't happen again.

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 15:41 GMT

You still haven't posted your config file, so how should we help?
According to your setup you use lvm2 on top of mdadm; you need to configure this.

Comment by Arno (ihad) - Wednesday, 17 February 2010, 15:49 GMT

sorry, I forgot: here's my /etc/mkinitcpio.conf (comments stripped):

MODULES="pata_acpi pata_amd ata_generic scsi_mod sata_nv ahci raid456 dm_mod xfs"
BINARIES=""
FILES=""
HOOKS="base udev autodetect pata scsi sata usbinput keymap mdadm lvm2 filesystems"

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 16:11 GMT

No need to apologize, I have done the same recently on various projects, writing very unfriendly bug reports, I guess everyone has.

So, your configuration looks reasonable. When you are dropped to the emergency shell, are /dev/sd* detected properly? Does running "mdassemble" manually do something there? Does /etc/mdadm.conf look reasonable in the emergency shell?

I don't know anything about RAID setups, I guess tpowa has done more in this context than me. Anyway, trying to run the mdassemble / lvm vgchange -ay commands manually might give you more hints about what is going on.
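When running those commands by hand, it can help to check /proc/mdstat before and after each one. The following is only a minimal sketch: `md_is_active` is a hypothetical helper name, not part of mdadm or mkinitcpio, and it assumes the usual mdstat line format ("md0 : active raid5 ...") shown earlier in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): succeed if the named md
# array is listed as active in an mdstat-style file.
# $1 = array name, e.g. md0
# $2 = file to read, defaults to /proc/mdstat
md_is_active() {
    array="$1"
    mdstat="${2:-/proc/mdstat}"
    # An assembled array appears as a line like "md0 : active raid5 sdb2[0] ..."
    grep -q "^${array} : active" "$mdstat"
}
```

In the emergency shell one could run `md_is_active md0 || echo "md0 not assembled"`, then `mdassemble`, then the same check again, and only afterwards `lvm vgchange -ay`.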

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 16:28 GMT

Please give your boot command line for further debugging.

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 16:30 GMT

Ah sorry, I see your cmdline now. What do you have in /etc/mdadm.conf?

Comment by Arno (ihad) - Wednesday, 17 February 2010, 16:55 GMT

This is my current /etc/mdadm.conf:
ARRAY /dev/md0 metadata=0.90 UUID=fba93001:fd6f2edc:6b4cd62c:fd31e028
though I changed it yesterday evening. I don't have the original any more, but IIRC it was
ARRAY /dev/md/0_0 metadata=0.90 UUID=fba93001:fd6f2edc:6b4cd62c:fd31e028

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 16:57 GMT

the mdadm hook doesn't support custom naming of raid arrays, only /dev/md[0-9] or raid_arrays /dev/md[0-9]-d[0-9]

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 16:59 GMT

also you can remove the metadata from mdadm.conf file

Comment by Tobias Powalowski (tpowa) - Wednesday, 17 February 2010, 17:04 GMT

Little correction: it doesn't support custom naming when assembling by command line; when assembling by config file it should work.

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 17:12 GMT

Basically, the mdassemble binary from the mdadm package should support everything that mdadm.conf supports! That is what confuses me here, I don't see what's going wrong. That's why we should try doing everything manually via command line in the emergency shell and debug from there.

Comment by Arno (ihad) - Wednesday, 17 February 2010, 18:09 GMT

Ok, I did some more debugging, and now I know what the problem is. My mdassemble is broken. For some reason I have two mdassembles:

$ ls -l /bin/mdassemble
-rwxr-xr-x 1 root root 7024 Feb 11 20:11 /bin/mdassemble
$ ls -l /sbin/mdassemble
-rwxr-xr-x 1 root root 161216 Feb 7 11:46 /sbin/mdassemble

/bin/mdassemble actually gets included into the initrd, but it doesn't work, and it's also not owned by any package (any more?).
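A duplicate like this is easy to miss because a plain lookup only reports the first match. As a rough sketch, the function below lists every executable of a given name along a PATH-style string, in lookup order. `list_in_path` is a hypothetical name introduced here for illustration; it is not an mkinitcpio function.

```shell
#!/bin/sh
# Hypothetical helper: print every executable called "$1" found in the
# directories of a colon-separated list ("$2", defaulting to $PATH),
# in the order a PATH lookup would visit them.
list_in_path() {
    name="$1"
    search="${2:-$PATH}"
    old_ifs="$IFS"
    IFS=:
    for dir in $search; do
        # An empty PATH component means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS="$old_ifs"
}
```

Running `list_in_path mdassemble` and then `pacman -Qo` on each printed path shows which copies are owned by a package and which are strays.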

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 18:26 GMT

Ha! The BINARIES="..." statement checks the path automatically (with which afaik), maybe we should include an "add_binary /sbin/mdassemble" statement instead of relying on the path.

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 18:40 GMT

Okay, this should be safer: http://repos.archlinux.org/wsvn/packages/?op=comp&compare[]=%2Fmdadm%2Ftrunk@67473&compare[]=%2Fmdadm%2Ftrunk@69106

Comment by Arno (ihad) - Wednesday, 17 February 2010, 19:17 GMT

Ok, thanks. Problem solved. Just for the record. What's with /bin/mdassemble? Does anyone have an idea where it came from? I'm quite sure I didn't put it there myself.

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 20:55 GMT

What does "file" say about it?

Comment by Arno (ihad) - Wednesday, 17 February 2010, 21:04 GMT

$ file mdassemble
mdassemble: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked (uses shared libs), stripped
$ md5sum mdassemble
70fe3ad318ba342759f5633f2e3f21d4 mdassemble
$ sha1sum mdassemble
b022ac47b946e830986eda67d4d6c2dab97ff59a mdassemble
$ ldd mdassemble
not a dynamic executable
# strings mdassemble
/lib/klibc-UqSadMgryalzKq_XarP9XnQvbXQ.so
[stripped garbage]
/dev/md%s%d
/proc/devices
Block devices:
md: Unknown device name: %s
md: Loading md%s%d: %s
md: starting md%d failed
raid=
noautodetect
partitionable
part
linear
raid0
super-block
/dev/md0
Error: mdp devices detected but no mdp device found!
md: open failed - cannot start array %s
md: Ignoring md=%d, already autodetected. (Use raid=noautodetect)
md: Too few arguments supplied to md=.
md: md=%d, Minor device number too high.
md: md=%s%d, Specified more than once. Replacing previous definition.
md: md=%s%d - too many md initialisations
md: Will configure md%d (%s) from %s, below.
md: Skipping autodetection of RAID arrays. (raid=noautodetect)
(%d,%d)
/sys/block/%s/dev
/sys/block/%s/range
/dev/%s
/dev/

I'm on x86_64, btw.

Comment by Thomas Bächler (brain0) - Wednesday, 17 February 2010, 21:41 GMT

It's an old klibc-based mdassemble binary (not the mdadm one, but the broken klibc tool) which found its way to /bin/ - no idea how.
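The giveaway in the strings output above is the embedded /lib/klibc-....so path. A minimal sketch of that check as a reusable test follows. `looks_like_klibc` is a hypothetical helper name, and the `-a` option is assumed to be available as in GNU grep.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file embeds a klibc shared-library
# path of the form /lib/klibc-<hash>.so, as seen in the strings output.
looks_like_klibc() {
    # -a makes grep treat binary data as text (GNU grep); -q suppresses output.
    grep -a -q '/lib/klibc-[^/]*\.so' "$1"
}
```

For example, `looks_like_klibc /bin/mdassemble && echo "old klibc binary"` would flag the stray copy, while a binary built against glibc would not match.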

Comment by Arno (ihad) - Wednesday, 17 February 2010, 21:54 GMT

Ok then. Thanks for the help. BTW, is there an easy way to edit out my "solution" before sending a close request?
It worked because explicitly including mdadm into the initrd via BINARIES provided me with an executable that actually ran and assembled the array.
Also, creating the /dev/md0 device node obviously isn't necessary. As you said, the solution is wrong.
