FS#37537 - [mdadm] RAID setup unusable due to missing systemd support (upstream solution exists)

Attached to Project: Arch Linux
Opened by Bart De Vries (mogwai) - Tuesday, 29 October 2013, 09:30 GMT
Last edited by Tobias Powalowski (tpowa) - Wednesday, 30 October 2013, 07:23 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Tobias Powalowski (tpowa)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:
Trying to mount an IMSM RAID1 partition on a system using systemd results in the md_mod kernel driver hanging (waiting for mdmon to respond).

This is a known problem that has been fixed upstream in mdadm 3.3, but the fix is not incorporated in the Arch package. Incorporating it only requires adding one line to the PKGBUILD (see solution below).

The background of the problem is that, in versions older than 3.3, the mdmon process was forked directly from the main code. When using systemd, this mdmon process gets killed immediately by systemd/udev. When something subsequently tries to write to the RAID array, the md_mod kernel driver tries to communicate with the killed mdmon process, causing the driver to freeze (see the references below for more detailed information).


Solution:
An upstream solution exists, but it is not incorporated in the current mdadm 3.3-1 Arch package.
The following command should be added to the package() section of the PKGBUILD file:

make install-systemd

This will install the file mdmon@.service into /lib/systemd/system. This file will then be used by mdadm to spawn the mdmon process through systemd instead of forking it directly.
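As a rough illustration of the proposed change, a package() function could look something like the sketch below. This is not the actual Arch PKGBUILD: the source directory layout and the use of DESTDIR to redirect both make targets into $pkgdir are assumptions, so check them against the mdadm Makefile before relying on this.

```shell
# Hypothetical PKGBUILD fragment; the directory layout and the DESTDIR
# handling are assumptions, not copied from the real Arch package.
package() {
  cd "$srcdir/$pkgname-$pkgver"

  # Regular install of the binaries, man pages and udev rules.
  make DESTDIR="$pkgdir" install

  # The line this report asks for: also install mdmon@.service so that
  # mdadm can start mdmon through systemd instead of forking it directly.
  make DESTDIR="$pkgdir" install-systemd
}
```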

Copying the mdmon@.service file from the mdadm source code into the /lib/systemd/system directory solved all my issues.
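For anyone who wants to apply the same workaround by hand until the package is fixed, the copy step might be sketched as follows. The helper name install_mdmon_unit is made up for this example, and it assumes the unit file sits under systemd/ in the extracted mdadm source tree; adjust the path if your tree differs.

```shell
# install_mdmon_unit SRC_TREE [UNIT_DIR]
# Hypothetical helper: copies mdmon@.service from an extracted mdadm source
# tree into the systemd unit directory (default: /lib/systemd/system).
# Assumes the unit file is located at SRC_TREE/systemd/mdmon@.service.
install_mdmon_unit() {
    src_tree=$1
    unit_dir=${2:-/lib/systemd/system}
    unit="$src_tree/systemd/mdmon@.service"

    if [ ! -f "$unit" ]; then
        echo "unit file not found: $unit" >&2
        return 1
    fi

    # -D creates missing parent directories; 644 is the usual unit file mode.
    install -D -m 644 "$unit" "$unit_dir/mdmon@.service"
}
```

Run as root, for example `install_mdmon_unit ~/src/mdadm-3.3`, and then `systemctl daemon-reload` so that systemd picks up the new unit.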


References:
- Other people suffering from the same bug: https://bbs.archlinux.org/viewtopic.php?pid=1341729
- The discussion of this bug and the solution on RedHat Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=873576
- The git commit of the solution upstream: https://github.com/neilbrown/mdadm/commit/0f7bdf8946316548500858303549e396655450c5


Additional info:
* package version: 3.3-1
* Log file output from my attempt to mount an NTFS partition on an IMSM RAID1 array:
okt 21 18:56:27 ldmos kernel: INFO: task mount.ntfs-3g:4602 blocked for more than 120 seconds.
okt 21 18:56:27 ldmos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
okt 21 18:56:27 ldmos kernel: mount.ntfs-3g D ffff880128af5540 0 4602 1 0x00000004
okt 21 18:56:27 ldmos kernel: ffff880110703cb8 0000000000000082 0000000000014500 ffff880110703fd8
okt 21 18:56:27 ldmos kernel: ffff880110703fd8 0000000000014500 ffff8800ab4a4380 ffff8801274dc9d8
okt 21 18:56:27 ldmos kernel: ffff8801274dc9c0 0000000000000000 0000000000000003 ffff880110703c18
okt 21 18:56:27 ldmos kernel: Call Trace:
okt 21 18:56:27 ldmos kernel: [<ffffffff81093b52>] ? default_wake_function+0x12/0x20
okt 21 18:56:27 ldmos kernel: [<ffffffff810847b2>] ? autoremove_wake_function+0x12/0x40
okt 21 18:56:27 ldmos kernel: [<ffffffff8108c5e8>] ? __wake_up_common+0x58/0x90
okt 21 18:56:27 ldmos kernel: [<ffffffff8108ee44>] ? __wake_up+0x44/0x50
okt 21 18:56:27 ldmos kernel: [<ffffffff814e0f29>] schedule+0x29/0x70
okt 21 18:56:27 ldmos kernel: [<ffffffffa07b9d75>] md_write_start+0xb5/0x1a0 [md_mod]
okt 21 18:56:27 ldmos kernel: [<ffffffff810847a0>] ? wake_up_atomic_t+0x30/0x30
okt 21 18:56:27 ldmos kernel: [<ffffffffa007ece6>] make_request+0x46/0xbf0 [raid1]
okt 21 18:56:27 ldmos kernel: [<ffffffff8113cf1a>] ? write_cache_pages+0x16a/0x510
okt 21 18:56:27 ldmos kernel: [<ffffffff81132c0a>] ? find_get_pages_tag+0xea/0x180
okt 21 18:56:27 ldmos kernel: [<ffffffffa07b599c>] md_make_request+0xec/0x290 [md_mod]
okt 21 18:56:27 ldmos kernel: [<ffffffff811351b5>] ? mempool_alloc_slab+0x15/0x20
okt 21 18:56:27 ldmos kernel: [<ffffffff81263a82>] generic_make_request+0xc2/0x110
okt 21 18:56:27 ldmos kernel: [<ffffffff81263b43>] submit_bio+0x73/0x160
okt 21 18:56:27 ldmos kernel: [<ffffffff811d5866>] ? bio_alloc_bioset+0x196/0x2a0
okt 21 18:56:27 ldmos kernel: [<ffffffff81266c57>] blkdev_issue_flush+0x97/0xe0
okt 21 18:56:27 ldmos kernel: [<ffffffff811d6e55>] blkdev_fsync+0x35/0x50
okt 21 18:56:27 ldmos kernel: [<ffffffff811cde16>] do_fsync+0x56/0x80
okt 21 18:56:27 ldmos kernel: [<ffffffff811a0519>] ? SyS_write+0x49/0xa0
okt 21 18:56:27 ldmos kernel: [<ffffffff811ce0a0>] SyS_fsync+0x10/0x20
okt 21 18:56:27 ldmos kernel: [<ffffffff814ea4dd>] system_call_fastpath+0x1a/0x1f
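
To check whether another system is hitting the same hang, one option is to scan the kernel log for the two markers visible in the trace above: the hung-task warning and an md_write_start frame. The function name has_md_write_hang is made up for this sketch, and matching on these two strings is only a heuristic, not a definitive diagnosis.

```shell
# has_md_write_hang: reads kernel log text on stdin and succeeds only if it
# contains both a hung-task warning and an md_write_start stack frame.
# Heuristic only; the name and the matching rules are illustrative.
has_md_write_hang() {
    _log=$(cat)
    printf '%s\n' "$_log" | grep -q 'blocked for more than [0-9]* seconds' &&
        printf '%s\n' "$_log" | grep -q 'md_write_start'
}
```

For example: `journalctl -k | has_md_write_hang && echo "possible mdmon hang"`.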


Steps to reproduce:
Create an IMSM RAID1 array with the Intel Rapid Storage Technology Option ROM. If Arch is using systemd, it is then not possible to do any sort of write operation on the RAID array: writing a partition table, mounting a partition, etc.


PS: This is my first bug report. Please let me know if I omitted important information.

Closed by  Tobias Powalowski (tpowa)
Wednesday, 30 October 2013, 07:23 GMT
Reason for closing:  Fixed
Additional comments about closing:  3.3-2
