FS#36850 : Shutdown marks array as dirty, causing resync on reboot

Attached to Project: Arch Linux
Opened by Alex Leach (spleach) - Tuesday, 10 September 2013, 15:30 GMT
Last edited by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 17:22 GMT

Task Type	Bug Report
Category	Packages: Extra
Status	Closed
Assigned To	No-one
Architecture	All
Severity	Critical
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	0
Private	No

Details

Description:

Each time I shut down my system, my 3-disk RAID 5 device is marked as "dirty", causing a full reconstruction of the device on the next boot and much stress!

Relevant boot-time messages, from `dmesg -T`:

[Tue Sep 10 14:14:04 2013] md/raid:md126: not clean -- starting background reconstruction
[Tue Sep 10 14:14:04 2013] md/raid:md126: device sde operational as raid disk 0
[Tue Sep 10 14:14:04 2013] md/raid:md126: device sdd operational as raid disk 1
[Tue Sep 10 14:14:04 2013] md/raid:md126: device sdg operational as raid disk 2
[Tue Sep 10 14:14:04 2013] md/raid:md126: allocated 3272kB
[Tue Sep 10 14:14:04 2013] md/raid:md126: raid level 5 active with 3 out of 3 devices, algorithm 0
[Tue Sep 10 14:14:04 2013] RAID conf printout:
[Tue Sep 10 14:14:04 2013] --- level:5 rd:3 wd:3
[Tue Sep 10 14:14:04 2013] disk 0, o:1, dev:sde
[Tue Sep 10 14:14:04 2013] disk 1, o:1, dev:sdd
[Tue Sep 10 14:14:04 2013] disk 2, o:1, dev:sdg
[Tue Sep 10 14:14:04 2013] md126: detected capacity change from 0 to 600131502080
[Tue Sep 10 14:14:04 2013] RAID conf printout:
[Tue Sep 10 14:14:04 2013] --- level:5 rd:3 wd:3
[Tue Sep 10 14:14:04 2013] disk 0, o:1, dev:sde
[Tue Sep 10 14:14:04 2013] disk 1, o:1, dev:sdd
[Tue Sep 10 14:14:04 2013] disk 2, o:1, dev:sdg
[Tue Sep 10 14:14:04 2013] md126: unknown partition table
[Tue Sep 10 14:14:04 2013] md: md126 switched to read-write mode.
[Tue Sep 10 14:14:04 2013] md: resync of RAID array md126
[Tue Sep 10 14:14:04 2013] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[Tue Sep 10 14:14:04 2013] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[Tue Sep 10 14:14:04 2013] md: using 128k window, over a total of 293032960k.
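
To check after each boot whether an array came up dirty, the "not clean" message above can be searched for in the kernel log. This is only a minimal sketch: the function name dirty_arrays is made up for illustration, and the pattern assumes the exact message wording shown in the log above.

```shell
#!/bin/sh
# Sketch: print the name of every md array that the kernel reported as
# "not clean" when it was assembled. Reads kernel log text on stdin.
# dirty_arrays is a hypothetical helper name, not part of mdadm.
dirty_arrays() {
    # Keep only the array name (e.g. md126) from matching lines,
    # then drop duplicates.
    sed -n 's|.*md/raid:\(md[0-9][0-9]*\): not clean -- starting background reconstruction.*|\1|p' | sort -u
}
```

Usage would be along the lines of `dmesg | dirty_arrays`; no output means no array was flagged as dirty during this boot.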

Neil Brown explains this (at http://permalink.gmane.org/gmane.linux.raid/35016) as follows:-

But for you, the system shuts down with the array marked 'dirty'. This
suggests that on your machine 'mdmon' is being killed while the array is
still active.

The solution he shared was:-

If you arrange that the shutdown script runs
mdadm --wait-clean --scan

Similar bugs have been reported for Gentoo (https://bugs.gentoo.org/show_bug.cgi?id=395203), and there are also reports on the linux-raid list (e.g. http://www.spinics.net/lists/raid/msg35494.html).

I haven't hacked systemd startup and shutdown scripts before, so I'm not too confident in doing so without some advice or assistance. But, having had a look through /usr/lib/systemd/system/, I imagine that adding a script in /usr/lib/systemd/system/shutdown.target.wants/ could be the solution.

Additional info:
* package version(s)

mdadm 3.3-1

$ uname -r
3.11.0-1-ck

* config and/or log files etc.

/etc/mdadm.conf
-------------------
DEVICE partitions

ARRAY /dev/md/imsm0 metadata=imsm UUID=33ed5b80:85fff00c:444b3615:26b20276

ARRAY /dev/md/RAID5 metadata=imsm container=33ed5b80:85fff00c:444b3615:26b20276 member=0 UUID=4a8cf69c:2eaab219:0276f4b3:6f901377

PROGRAM /usr/bin/logger

/etc/mkinitcpio.conf
-------------------
MODULES="ext4 mvsas raid456"
BINARIES="/usr/bin/mdmon"
HOOKS="base udev autodetect block keyboard fsck modconf mdadm_udev vboxhost filesystems"

-------------------

Steps to reproduce:

- Create the array container (with imsm metadata), followed by the array, using mdadm 3.2.6-4.
- Configure the files above.
- Enable mdadm.service.
- When the array is "clean", reboot.
- Watch the device resync :(

I've tried to stop the resync operation through the md sysfs interface in a couple of different ways, but some event always retriggers it, e.g.

$ sudo sh -c 'echo "idle" > /sys/block/md126/md/sync_action'
$ cat /sys/block/md126/md/sync_action
resync

At the same time, in `journalctl -xb`, I get the messages:-

Sep 10 14:25:34 beasty sudo[5011]: me : TTY=tty1 ; PWD=/home/me ; USER=root ; COMMAND=/usr/bin/sh -c echo "idle" > /sys/block/md126/md/sync_action
Sep 10 14:25:34 beasty sudo[5011]: pam_unix(sudo:session): session opened for user root by me(uid=0)
Sep 10 14:25:34 beasty kernel: md: md126: resync done.
Sep 10 14:25:34 beasty sudo[5011]: pam_unix(sudo:session): session closed for user root
Sep 10 14:25:34 beasty kernel: md: checkpointing resync of md126.
Sep 10 14:25:34 beasty kernel: md: resync of RAID array md126
Sep 10 14:25:34 beasty kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Sep 10 14:25:34 beasty kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
Sep 10 14:25:34 beasty kernel: md: using 128k window, over a total of 293032960k.
Sep 10 14:25:34 beasty kernel: md: resuming resync of md126 from checkpoint.
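
Since writing "idle" only makes the resync restart from its checkpoint, a script can do little more than observe the state. The following is a small sketch for reading it; the function names and the SYSFS_ROOT override are hypothetical, added only so the helper can be tried against a directory other than /sys.

```shell
#!/bin/sh
# Sketch: report the current sync action of an md array.
# SYSFS_ROOT is a hypothetical override; it defaults to the real /sys.
md_sync_action() {
    # $1 is the array name as it appears under /sys/block, e.g. md126
    cat "${SYSFS_ROOT:-/sys}/block/$1/md/sync_action"
}

# Succeeds only when the array reports "idle" (no resync in progress).
md_is_idle() {
    [ "$(md_sync_action "$1")" = "idle" ]
}
```

For example, `md_is_idle md126 && echo "safe to reboot"` would print the message only once the resync has finished.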

Closed by Dave Reisner (falconindy)
Tuesday, 10 September 2013, 17:22 GMT
Reason for closing: Not a bug
Additional comments about closing: mkinitcpio, at least, is WAI

Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 15:39 GMT

The shutdown hook in the initramfs does this (which you're lacking). I'm fairly sure the wiki documents as much, too.

Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 15:49 GMT

Thanks for the quick response. I've just rechecked the Arch RAID wiki, at https://wiki.archlinux.org/index.php/RAID, but there's no mention of using the shutdown hook.

The example given shows:-

HOOKS="base udev autodetect block mdadm_udev filesystems usbinput fsck"

I checked the "Software RAID and LVM" and "Installing with Fake RAID" wiki pages too; a search for "shutdown" comes up with nothing on any of the Arch wiki pages...

Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 15:58 GMT

Feel free to add it as you see fit.

Unless adding the shutdown hook doesn't work, I'll close this as WAI.

Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 16:09 GMT

Okay, I just found your same recommendation on the Arch wiki, at https://wiki.archlinux.org/index.php/Mkinitcpio#Common_hooks

I hadn't been to that page before, so it's probably worth adding a specific mention on the RAID wiki pages. I was previously using the help documentation from the command `mkinitcpio -H <HOOK>`, which only mentions the shutdown hook as being useful when /usr is on a separate partition.

I'll test it out and will then try and add some clarification to the RAID wiki page.

Thanks!

Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 16:27 GMT

It didn't work; my array's resync'ing right now :(

Steps:-
- Array was clean, previous resync operation had completed.

1. Added shutdown as the last HOOK in /etc/mkinitcpio.conf, so it's now:-

HOOKS="base udev autodetect block keyboard fsck modconf mdadm_udev vboxhost filesystems shutdown"

2. Ran mkinitcpio -p linux-ck

3. Reboot, array resync'ing...

Probably worth mentioning that my RAID array is not my root partition - I very recently configured it as an incremental backup partition - mounted at /media/RAID5/.
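
Before rebuilding the image, the HOOKS change from step 1 can be double-checked with a short script. This is a sketch only: has_shutdown_hook is a made-up helper name, and it does a plain text match on an uncommented HOOKS= line instead of evaluating the config file.

```shell
#!/bin/sh
# Sketch: succeed if the given mkinitcpio config lists the "shutdown" hook
# on an uncommented HOOKS= line. has_shutdown_hook is a hypothetical name.
has_shutdown_hook() {
    # Default to the system config when no file is given.
    # The hook must be delimited by whitespace, a quote or a parenthesis,
    # so that a longer hook name merely containing "shutdown" does not match.
    grep -Eq '^[[:space:]]*HOOKS=.*[[:space:]"(]shutdown[[:space:]")]' "${1:-/etc/mkinitcpio.conf}"
}
```

Running `has_shutdown_hook && echo present` against the HOOKS line above should print "present"; against the original line from the report it should print nothing.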

Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 16:57 GMT

> It didn't work; my array's resync'ing right now :(
I'm going to guess that you didn't extract the newly built image to /run/initramfs, or else it would have. Without rebooting on the new image, the necessary gears to make the shutdown hook work don't exist yet.

Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 17:05 GMT

Isn't that what the shutdown hook does, when running mkinitcpio?

Just had a look in /run/initramfs/ and there are a load of files and folders with a modification time of just a few minutes before my last system start, probably when I last ran mkinitcpio.

How else should I extract the initramfs image there?

Thanks again for the assistance!

Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 17:12 GMT

> probably when I last ran mkinitcpio.
Wrong.

> How else should I extract the initramfs image there?
If you've already rebooted, you don't need to. It's a one time thing.

Comment by Alex Leach (spleach) - Tuesday, 10 September 2013, 17:13 GMT

Okay, sorry, I missed the second sentence of your reply (it wasn't in the email notification I got).

I've just rebooted after the last resync completed and indeed, the resync hasn't been triggered again, and the device's partition was mounted properly. Thanks again for the help!

Re: the RAID wiki page; do you think it should mention both the shutdown hook and extraction of the image after mkinitcpio creation?

Comment by Dave Reisner (falconindy) - Tuesday, 10 September 2013, 17:22 GMT

You can mention the caveat about needing to extract the image the first time, but it'll eventually be moot (it'll be "fixed" at some point).
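
For the one-time manual extraction discussed above, something like the following might work. It is a rough sketch, not a tested procedure: the function name and the EXTRACTOR override are hypothetical, the image path in the usage note is assumed from the linux-ck preset, and `lsinitcpio -x` (shipped with mkinitcpio) unpacks an image into the current directory.

```shell
#!/bin/sh
# Sketch: unpack an initramfs image into the directory used at shutdown.
# extract_shutdown_ramfs and EXTRACTOR are hypothetical names; EXTRACTOR
# exists only so the function can be dry-run with another command.
extract_shutdown_ramfs() {
    image=$1                      # absolute path to the initramfs image
    target=${2:-/run/initramfs}   # where the shutdown environment lives
    mkdir -p "$target" || return 1
    # lsinitcpio -x extracts into the current directory, so change into
    # the target inside a subshell to leave the caller's directory alone.
    ( cd "$target" && ${EXTRACTOR:-lsinitcpio -x} "$image" )
}
```

As root, `extract_shutdown_ramfs /boot/initramfs-linux-ck.img` would be the assumed invocation for the setup in this report. After a reboot onto the rebuilt image this step should not be needed.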
