Please read this before reporting a bug:
https://wiki.archlinux.org/title/Bug_reporting_guidelines
Do NOT report bugs when a package is just outdated, or it is in the AUR. Use the 'flag out of date' link on the package page, or the Mailing List.
REPEAT: Do NOT report bugs for outdated packages!
FS#57860 - [mdadm] mdadm_udev does not assemble degraded array during boot
Attached to Project:
Arch Linux
Opened by Daan van Rossum (drrossum) - Friday, 16 March 2018, 15:03 GMT
Last edited by Toolybird (Toolybird) - Sunday, 11 June 2023, 04:23 GMT
Details

Description:
mkinitcpio's mdadm_udev hook uses mdadm upstream's udev rules to incrementally assemble raid arrays. 64-md-raid-assembly.rules uses systemd.timer to delay starting of partially assembled arrays to fix issues with slow array members (e.g. usb devices). But systemd.timer is not functional in initcpio.

In case the array holds the root filesystem this has severe consequences:
- partially assembled arrays remain "inactive"
- the root FS is not available and does not get mounted
- the boot process is interrupted with an emergency shell

From here, the following steps resume the boot process with a degraded array:
# mdadm --incremental --run --scan
# mount /dev/md0 /new_root
# exit

This can be fixed using custom install and hook files for the mdadm_udev hook, see attached.

I think it should be mdadm_udev's default behavior to auto-assemble arrays (that are configured for auto-assembly in mdadm.conf) in degraded mode if necessary, instead of interrupting the boot process.

Additional info:
mkinitcpio 24-2
mdadm 4.0-1

Steps to reproduce:
- Create an md raid-1 array
- move a system's root FS onto the raid-1 array
- poweroff, remove one of the raid members
- poweron
- boot fails due to root FS not mounted
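To confirm from the emergency shell that an array is stuck in the "inactive" state described above, something like the following could be used. This is only a sketch: the function name list_inactive_arrays is made up for illustration, and it assumes the usual /proc/mdstat layout in which each array line reads "mdX : inactive ..." or "mdX : active ...".

```shell
#!/bin/sh
# Print the names of md arrays that mdstat reports as inactive.
# Optional first argument: the mdstat file to read (default /proc/mdstat).
# Assumption: array lines look like "md0 : inactive sda1[0](S)".
list_inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}
```

Calling list_inactive_arrays without arguments reads /proc/mdstat. If it prints an array name, the recovery steps from the description apply.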
This task depends upon
FS#49071, as the mdadm_udev hook will then only return once the arrays are assembled, in time for the lvm2 hook to start using the lv metadata.

The way I understand it, the package build file "mdadm_udev_install" is missing the following build rules:
add_systemd_unit "mdadm-last-resort@.service"
add_systemd_unit "mdadm-last-resort@.timer"
This makes sure that the required systemd units are copied to the initramfs. They are triggered by 64-md-raid-assembly.rules as described above.
No additional fixes should be required.
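One way to check whether a generated image actually contains the two units is to scan its file listing. The helper below is a sketch, and has_last_resort_units is a hypothetical name. It only reads a listing on stdin, so it makes no assumption about where the image lives.

```shell
#!/bin/sh
# Read an initramfs file listing on stdin and succeed only if both
# mdadm-last-resort units named above appear in it.
has_last_resort_units() {
    listing=$(cat)
    for unit in 'mdadm-last-resort@.service' 'mdadm-last-resort@.timer'; do
        # -F: match the unit name literally, not as a regular expression
        printf '%s\n' "$listing" | grep -qF "$unit" || return 1
    done
    return 0
}
```

Assuming the default image path, usage would be along the lines of: lsinitcpio /boot/initramfs-linux.img | has_last_resort_units && echo "units present"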
The fix I suggested is independent of the init system. It is less complicated than it may seem from the attached files. In fact, there is a one-line install patch and a few-line hook. The hook sleeps for 10 seconds if degraded arrays are detected and then retries to assemble the arrays and starts them even if some are still degraded.
--- /usr/lib/initcpio/install/mdadm_udev
+++ /etc/initcpio/install/mdadm_udev
@@ -12,6 +12,8 @@
add_binary "/usr/bin/mdadm"
add_file "/usr/lib/udev/rules.d/63-md-raid-arrays.rules"
add_file "/usr/lib/udev/rules.d/64-md-raid-assembly.rules"
+
+ add_runscript
}
help() {
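For readers without access to the attachment, the hook logic described above (wait if an array is not fully assembled, then retry and start it degraded) might look roughly like this. It is a sketch, not the attached mdadm_udev.hook. MDSTAT, MDADM and DELAY are hypothetical overrides added only so the function can be exercised outside an initramfs, and detecting the condition by grepping for "inactive" is an assumption.

```shell
#!/bin/sh
# Sketch of an initcpio runtime hook; NOT the file attached to this task.
# MDSTAT, MDADM and DELAY are hypothetical overrides for testing.
run_hook() {
    mdstat="${MDSTAT:-/proc/mdstat}"
    # Assumption: a partially assembled array shows up as "mdX : inactive".
    if grep -q ' : inactive' "$mdstat"; then
        # Give slow members (e.g. usb devices) a chance to appear.
        sleep "${DELAY:-10}"
        # Retry assembly and start arrays even if they are still degraded.
        ${MDADM:-mdadm} --incremental --run --scan
    fi
}
```

With no inactive arrays the hook returns immediately, so a healthy system should not see the delay.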
Hopefully all the currently open initrd bugs with LVM, MD, etc. will be history after the anticipated Dracut switch.
Here is my test case:
==0==) Build raid1 array
==1==) Add 1 line to /usr/lib/initcpio/install/mdadm_udev
--- /usr/lib/initcpio/install/mdadm_udev.orig 2020-03-15 16:37:07.967859694 +0300
+++ /usr/lib/initcpio/install/mdadm_udev 2020-03-15 16:38:18.824530798 +0300
@@ -12,6 +12,8 @@
add_binary "/usr/bin/mdadm"
add_file "/usr/lib/udev/rules.d/63-md-raid-arrays.rules"
add_file "/usr/lib/udev/rules.d/64-md-raid-assembly.rules"
+
+ add_runscript
}
help() {
==2==) Add new file /usr/lib/initcpio/hooks/mdadm_udev (mdadm_udev.hook from bug report)
==3==) mkinitcpio -p linux
==4==) shutdown -h now
==5==) Remove one hard drive and turn on the computer
It started!
==6==) Shut down the computer, reattach the disk and start the computer again
==7==) Add the disk back to the array
mdadm --manage --add /dev/md126 /dev/sdb2
==8==) Wait for sync to be complete
watch 'cat /proc/mdstat'
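If step 8 needs to be scripted rather than watched, a small poll on /proc/mdstat could do. This is a sketch: resync_in_progress is a made-up helper name, and it assumes the kernel reports progress with a "recovery =" or "resync =" line, so other activities such as reshape are not covered.

```shell
#!/bin/sh
# Succeed while mdstat shows a recovery or resync line.
# Optional first argument: the mdstat file to read (default /proc/mdstat).
# Assumption: progress lines contain "recovery = ..." or "resync = ...".
resync_in_progress() {
    grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"
}
```

A loop such as: while resync_in_progress; do sleep 5; done would then block until the sync is complete. mdadm --wait /dev/md126 should achieve the same without polling.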
Tobias, did you check if booting on a degraded array is now possible with mdadm_udev? If it is, this one could be closed; if not, it's a real issue.
See my comment from Sunday, 01 September 2019.
After install, removing -hda test1.qcow2 causes the system to fail booting.
The bug is still present; an easy fix is described in the comment from Sunday, 01 September 2019.
In short: the proposed workaround seems to depend on optional hooks in the initramfs, so it may not work for all users.
See pacman -Ql mdadm | grep mdadm-last-resort
The only dependency for this fix is systemd - I would guess a solution for 99% of all systems.
"Note: mdadm is deprecated. If using it you will see ==> WARNING: Hook 'mdadm' is deprecated. Replace it with 'mdadm_udev' in your config when doing an upgrade."
In the case of systemd init, it is a simple two-line oversight that needs to be fixed.
The mdadm udev rules themselves are already targeted at systemd, which is obvious since they rely on the timer. The fix proposed by the op using an additional hook may work in the generic case, but the one by eaut is actually correct for a system using systemd. The more generic fix replaces a broken part of the systemd way with a workaround, but leaves the broken part in place, making for an awkward solution.
The correct solution for a system using busybox would be to change the udev rules themselves such that they do not rely on systemd. That would of course be a little more complicated than anything already proposed here.
I suggest doing two things. First, fix the obvious bug. Then deal with the more generic problem separately.
* apply eaut's fix in this context to solve the problem for systemd users
* open a new ticket to fix the rules for systems that run busybox init rather than systemd in the initcpio. This can be done in any of the following ways:
1. enhancing the udev rules so that they are agnostic to systemd/busybox
2. creating a separate set of udev rules for this case
3. creating a separate mkinitcpio hook (mdadm_busybox) which fixes the udev rules by running the hook script provided by the op, but does not make the regular mdadm+systemd case awkward.
Opening a new ticket also should increase the chance to get anything done quickly for the busybox side, because all relevant information can be presented in a well sorted and concise manner.
The original description of this ticket does not mention busybox and therefore the fix applied for this ticket should be the obvious one, adding the missing timer to the initcpio.