Arch Linux

FS#36850 - Shutdown marks array as dirty, causing resync on reboot

Attached to Project: Arch Linux
Opened by Alex Leach (spleach) - Tuesday, 10 September 2013, 15:30 GMT
Last edited by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 17:22 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: No-one
Architecture: All
Severity: Critical
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No

Details

Description:

Each time I shut down my system, my 3-disk RAID 5 device is marked as "dirty", causing a full reconstruction of the device on the next boot, and much stress!

Relevant boot-time messages, from `dmesg -T`:


[Tue Sep 10 14:14:04 2013] md/raid:md126: not clean -- starting background reconstruction
[Tue Sep 10 14:14:04 2013] md/raid:md126: device sde operational as raid disk 0
[Tue Sep 10 14:14:04 2013] md/raid:md126: device sdd operational as raid disk 1
[Tue Sep 10 14:14:04 2013] md/raid:md126: device sdg operational as raid disk 2
[Tue Sep 10 14:14:04 2013] md/raid:md126: allocated 3272kB
[Tue Sep 10 14:14:04 2013] md/raid:md126: raid level 5 active with 3 out of 3 devices, algorithm 0
[Tue Sep 10 14:14:04 2013] RAID conf printout:
[Tue Sep 10 14:14:04 2013] --- level:5 rd:3 wd:3
[Tue Sep 10 14:14:04 2013] disk 0, o:1, dev:sde
[Tue Sep 10 14:14:04 2013] disk 1, o:1, dev:sdd
[Tue Sep 10 14:14:04 2013] disk 2, o:1, dev:sdg
[Tue Sep 10 14:14:04 2013] md126: detected capacity change from 0 to 600131502080
[Tue Sep 10 14:14:04 2013] RAID conf printout:
[Tue Sep 10 14:14:04 2013] --- level:5 rd:3 wd:3
[Tue Sep 10 14:14:04 2013] disk 0, o:1, dev:sde
[Tue Sep 10 14:14:04 2013] disk 1, o:1, dev:sdd
[Tue Sep 10 14:14:04 2013] disk 2, o:1, dev:sdg
[Tue Sep 10 14:14:04 2013] md126: unknown partition table
[Tue Sep 10 14:14:04 2013] md: md126 switched to read-write mode.
[Tue Sep 10 14:14:04 2013] md: resync of RAID array md126
[Tue Sep 10 14:14:04 2013] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[Tue Sep 10 14:14:04 2013] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[Tue Sep 10 14:14:04 2013] md: using 128k window, over a total of 293032960k.


This is explained by Neil Brown (at http://permalink.gmane.org/gmane.linux.raid/35016 ) as follows:

But for you, the system shuts down with the array marked 'dirty'. This
suggests that on your machine 'mdmon' is being killed while the array is
still active.

The solution he shared was:-

If you arrange that the shutdown script runs
mdadm --wait-clean --scan


Similar bugs have been reported for Gentoo (https://bugs.gentoo.org/show_bug.cgi?id=395203), and there are also reports on the linux-raid list (e.g. http://www.spinics.net/lists/raid/msg35494.html).

I haven't hacked systemd startup and shutdown scripts before, so I'm not confident doing so without some advice or assistance. But, having had a look through /usr/lib/systemd/system/, I imagine that adding a script in /usr/lib/systemd/system/shutdown.target.wants/ could be the solution.
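As a rough illustration of the quoted suggestion, a shutdown script could wrap the `mdadm --wait-clean --scan` call in a small function. This is only a sketch: the function name `wait_for_clean_arrays` and the `MDADM` override variable are made up for illustration, and where such a script should be hooked into shutdown is left open here.

```sh
#!/bin/sh
# Sketch only. MDADM is a hypothetical override so the command can be
# swapped out when experimenting; it defaults to the real mdadm binary.
MDADM="${MDADM:-mdadm}"

# Ask mdadm to mark every array found by --scan as clean as soon as
# possible, and wait for that to happen (per the advice quoted above).
wait_for_clean_arrays() {
    $MDADM --wait-clean --scan
}

# Possible use from a shutdown script (placement is an assumption):
#   wait_for_clean_arrays || echo "mdadm --wait-clean failed" >&2
```

The key point is ordering: the call only helps if it runs while mdmon is still alive, which is exactly the condition that was not met in this report.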



Additional info:
* package version(s)

mdadm 3.3-1

$ uname -r
3.11.0-1-ck

* config and/or log files etc.

/etc/mdadm.conf
-------------------
DEVICE partitions

ARRAY /dev/md/imsm0 metadata=imsm UUID=33ed5b80:85fff00c:444b3615:26b20276

ARRAY /dev/md/RAID5 metadata=imsm container=33ed5b80:85fff00c:444b3615:26b20276 member=0 UUID=4a8cf69c:2eaab219:0276f4b3:6f901377

PROGRAM /usr/bin/logger



/etc/mkinitcpio.conf
-------------------
MODULES="ext4 mvsas raid456"
BINARIES="/usr/bin/mdmon"
HOOKS="base udev autodetect block keyboard fsck modconf mdadm_udev vboxhost filesystems"


-------------------

Steps to reproduce:

- Create an array container (with imsm metadata), followed by an array, with mdadm 3.2.6-4.
- Configure the above files.
- Enable mdadm.service.
- When the array is "clean", reboot.
- Watch the device resync :(

I've tried to stop the resync operation, using the /sys filesystem in a couple of different ways, but some event always retriggers the resync operation, e.g.:

$ sudo sh -c 'echo "idle" > /sys/block/md126/md/sync_action'
$ cat /sys/block/md126/md/sync_action
resync

At the same time, in `journalctl -xb`, I get the messages:-

Sep 10 14:25:34 beasty sudo[5011]: me : TTY=tty1 ; PWD=/home/me ; USER=root ; COMMAND=/usr/bin/sh -c echo "idle" > /sys/block/md126/md/sync_action
Sep 10 14:25:34 beasty sudo[5011]: pam_unix(sudo:session): session opened for user root by me(uid=0)
Sep 10 14:25:34 beasty kernel: md: md126: resync done.
Sep 10 14:25:34 beasty sudo[5011]: pam_unix(sudo:session): session closed for user root
Sep 10 14:25:34 beasty kernel: md: checkpointing resync of md126.
Sep 10 14:25:34 beasty kernel: md: resync of RAID array md126
Sep 10 14:25:34 beasty kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Sep 10 14:25:34 beasty kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
Sep 10 14:25:34 beasty kernel: md: using 128k window, over a total of 293032960k.
Sep 10 14:25:34 beasty kernel: md: resuming resync of md126 from checkpoint.
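To watch what the array is doing without digging through the journal, the md sysfs attributes can be read directly. A minimal sketch, assuming the `array_state` and `sync_action` files under /sys/block/<device>/md/; the function name `md_status` and the `SYSFS_ROOT` override are hypothetical and only there to keep the example self-contained.

```sh
#!/bin/sh
# Sketch only. SYSFS_ROOT is a hypothetical override (default: /sys).
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the array state and current sync action for one md device,
# e.g. "md126: array_state=clean sync_action=idle".
md_status() {
    dev="$1"
    dir="$SYSFS_ROOT/block/$dev/md"
    if [ ! -d "$dir" ]; then
        echo "no md sysfs directory for $dev" >&2
        return 1
    fi
    printf '%s: array_state=%s sync_action=%s\n' \
        "$dev" "$(cat "$dir/array_state")" "$(cat "$dir/sync_action")"
}

# Example: md_status md126
```

Running this before shutdown and again after the next boot would show whether the array went down clean and whether a resync was started.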

Closed by: Dave Reisner (falconindy)
Tuesday, 10 September 2013, 17:22 GMT
Reason for closing: Not a bug
Additional comments about closing: mkinitcpio, at least, is WAI (working as intended)
Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 15:39 GMT
The shutdown hook in the initramfs does this (which you're lacking). I'm fairly sure the wiki documents as much, too.
Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 15:49 GMT
Thanks for the quick response. I've just rechecked the Arch RAID wiki, at https://wiki.archlinux.org/index.php/RAID, but there's no mention of using the shutdown hook.

The example given shows:-

HOOKS="base udev autodetect block mdadm_udev filesystems usbinput fsck"

I checked the "Software RAID and LVM" and "Installing with Fake RAID" wiki pages too; a search for "shutdown" comes up with nothing on any of the Arch wiki pages...
Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 15:58 GMT
Feel free to add it as you see fit.

Unless addition of the shutdown hook doesn't work, I'll close this as WAI.
Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 16:09 GMT
Okay, I just found your same recommendation on the Arch wiki, at https://wiki.archlinux.org/index.php/Mkinitcpio#Common_hooks

I hadn't been to that page before, so it's probably worth adding a specific mention on the RAID wiki pages. I was previously using the help documentation from the command `mkinitcpio -H <HOOK>`, which only mentions the shutdown hook as being useful when /usr is on a separate partition.

I'll test it out and will then try and add some clarification to the RAID wiki page.

Thanks!
Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 16:27 GMT
It didn't work; my array's resync'ing right now :(

Steps:-
- Array was clean, previous resync operation had completed.

1. Added shutdown as the last HOOK in /etc/mkinitcpio.conf, so it's now:-

HOOKS="base udev autodetect block keyboard fsck modconf mdadm_udev vboxhost filesystems shutdown"

2. Ran mkinitcpio -p linux-ck

3. Reboot, array resync'ing...
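Before rebuilding the image, it can help to confirm that the HOOKS line in the mkinitcpio configuration (the file that holds HOOKS in the listing earlier in this report) really contains the shutdown hook. A minimal sketch; the function name `has_shutdown_hook` is made up, and the parsing assumes a single-line `HOOKS="..."` or `HOOKS=(...)` assignment.

```sh
#!/bin/sh
# Sketch only. Succeeds (exit 0) if the last HOOKS= line of the given
# mkinitcpio config lists "shutdown" as a whole word, fails otherwise.
has_shutdown_hook() {
    conf="${1:-/etc/mkinitcpio.conf}"
    # Strip the HOOKS= prefix and the surrounding quotes or parentheses.
    hooks=$(sed -n 's/^HOOKS=["(]\(.*\)[")][[:space:]]*$/\1/p' "$conf" | tail -n 1)
    case " $hooks " in
        *" shutdown "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   has_shutdown_hook /etc/mkinitcpio.conf && echo "shutdown hook present"
```

Note that this only checks the configuration file; it says nothing about whether the image has been rebuilt or whether the running system was booted from the rebuilt image.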

Probably worth mentioning that my RAID array is not my root partition - I very recently configured it as an incremental backup partition - mounted at /media/RAID5/.
Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 16:57 GMT
> It didn't work; my array's resync'ing right now :(
I'm going to guess that you didn't extract the newly built image to /run/initramfs, or else it would have. Without rebooting on the new image, the necessary gears to make the shutdown hook work don't exist yet.
Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 17:05 GMT
Isn't that what the shutdown hook does, when running mkinitcpio?

Just had a look in /run/initramfs/ and there are a load of files and folders with a modification time of just a few minutes before my last system start, probably when I last ran mkinitcpio.

How else should I extract the initramfs image there?

Thanks again for the assistance!
Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 17:12 GMT
> probably when I last ran mkinitcpio.
Wrong.

> How else should I extract the initramfs image there?
If you've already rebooted, you don't need to. It's a one time thing.
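One way to tell whether the running system already has what it needs is to look for an executable shutdown program under /run/initramfs, which is what gets run at shutdown when the shutdown ramfs is in place. A minimal sketch, assuming the program is at /run/initramfs/shutdown; the function name `shutdown_ramfs_ready` and the `RUN_ROOT` override are hypothetical.

```sh
#!/bin/sh
# Sketch only. RUN_ROOT is a hypothetical override (default: /run).
RUN_ROOT="${RUN_ROOT:-/run}"

# Succeeds if an executable "shutdown" exists in the extracted initramfs,
# i.e. the system was booted from an image built with the shutdown hook.
shutdown_ramfs_ready() {
    [ -x "$RUN_ROOT/initramfs/shutdown" ]
}

# Example:
#   shutdown_ramfs_ready && echo "shutdown ramfs is in place"
```

If the check fails right after adding the hook and rebuilding, that matches the situation described above: the image is new, but the system has not yet been booted from it.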
Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 17:13 GMT
Okay, sorry, I missed the second sentence of your reply (it wasn't in the email notification I got).

I've just rebooted after the last resync completed and indeed, the resync hasn't been triggered again, and the device's partition was mounted properly. Thanks again for the help!

Re: the RAID wiki page; do you think it should mention both the shutdown hook and extraction of the image after mkinitcpio creation?
Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 17:22 GMT
You can mention the caveat about needing to extract the image the first time, but it'll eventually be moot (it'll be "fixed" at some point).
