FS#30995 - [cryptsetup][systemd] system hangs on shutdown

Attached to Project: Arch Linux
Opened by Christian Hesse (eworm) - Friday, 03 August 2012, 14:24 GMT
Last edited by Dave Reisner (falconindy) - Monday, 15 October 2012, 18:43 GMT
Task Type: Bug Report
Category: Packages: Core
Status: Closed
Assigned To: Dave Reisner (falconindy), Tom Gundersen (tomegun)
Architecture: All
Severity: Medium
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

Description:
The system sometimes hangs on shutdown when the root partition is on a LUKS-encrypted partition and systemd is used. Dave Reisner is aware of the problem; see the notes on FS#30271 and the thread on the systemd development mailing list [0]. The purpose of this report is just to track the status.

Additional information:
cryptsetup 1.5.0-1
mkinitcpio 0.10-1
systemd 187-4

Steps to reproduce:
Shut down.

[0] http://lists.freedesktop.org/archives/systemd-devel/2012-June/005440.html

Closed by Dave Reisner (falconindy)
Monday, 15 October 2012, 18:43 GMT
Reason for closing: Fixed
Additional comments about closing: fixed somewhere around linux-3.5
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 09:54 GMT
Looks like libdevmapper reads the environment variable DM_UDEV_DISABLE_CHECKING. I think the attached patch could fix this problem.
Sadly my girlfriend's system (she is affected most; I can reproduce it on my own system only very rarely) is not around, so I cannot test at the moment.
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 10:13 GMT
Your patch is broken, but the fixed version seems to work (var needs to be exported). I can easily reproduce this on VMs, so I have plenty of test cases.
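
The point about exporting can be illustrated with a small sketch. A child process only sees variables that are part of the environment its parent hands over; a plain shell assignment without `export` stays local to the shell. The Python below mimics this by starting a child with and without the variable in its environment. The helper name `child_sees` is made up for illustration, and the sketch says nothing about whether libdevmapper actually reads the variable.

```python
import os
import subprocess
import sys

VAR = "DM_UDEV_DISABLE_CHECKING"  # variable name taken from the comment above


def child_sees(env):
    """Start a child Python process with `env` and report what it sees for VAR."""
    code = f"import os; print(os.environ.get({VAR!r}, 'unset'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Start from the current environment, minus the variable.
base_env = {k: v for k, v in os.environ.items() if k != VAR}

# Not exported: the variable never reaches the child's environment.
not_exported = child_sees(base_env)

# Exported: the variable is part of the environment handed to the child.
exported = child_sees({**base_env, VAR: "1"})
```

In a shell script the equivalent difference is between `VAR=1` on its own line and `export VAR=1` before the command is run.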
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 11:04 GMT
Great!
Thanks a lot!
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 13:18 GMT
So, I oddly can't reproduce this anymore, but more to the point, I can't see anywhere in libdevmapper's source where this environment variable is honored. The NEWS file in the source tarball mentions it, but that's it. Where did you find documentation on this?
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 13:41 GMT
Damn. I was on an old git revision. I did not notice it was removed some time ago in favor of complete autodetection.
I will take a look at that later.
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 16:39 GMT
Ok, trying to debug this... It's interesting to run cryptsetup with --debug. Udev cookie is created, then increased to 1, then 2. Then it is decreased to 1 and hangs with something like:

Udev cookie 0xd4de038(semid 229376) waiting for zero

(Sorry, no complete log as it is a real machine and limited in functionality on shutdown. Probably it's easier for you to capture this on a virtual machine.)

This brings up two questions:

1. If I understand correctly udevd is no longer running. Why is udev stuff used at all? Though it looks like some of the udev calls succeed.

2. Why does it fail to decrease to zero? And who is to be blamed? libdevmapper? udev? systemd?
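
The sequence seen in the debug output (counter goes to 1, then 2, back to 1, then waits for zero) can be modeled with a toy counter. This is only a sketch of the counting behavior, not libdevmapper's implementation: the class `CookieModel` and its methods are hypothetical, and a timeout is added so the example terminates, whereas the wait observed at shutdown apparently has none.

```python
import threading


class CookieModel:
    """Toy stand-in for a counting semaphore with a wait-for-zero operation."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def inc(self):
        with self._cond:
            self._value += 1
            return self._value

    def dec(self):
        with self._cond:
            self._value -= 1
            self._cond.notify_all()
            return self._value

    def wait_for_zero(self, timeout):
        """Return True if the counter reached zero within `timeout` seconds."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == 0, timeout=timeout)


cookie = CookieModel()
trace = [cookie.inc(), cookie.inc(), cookie.dec()]  # 1, 2, then back to 1

# Nobody performs the final decrement, so the wait can only time out.
reached_zero = cookie.wait_for_zero(timeout=0.1)
```

If whatever is expected to perform the last decrement never runs, a wait without a timeout blocks forever, which matches the observed hang.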
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 17:39 GMT
Hrmm, so I think I have a better understanding of what's happening here... famous last words, I guess.

> 1. If I understand correctly udevd is no longer running. Why is udev stuff used at all? Though it looks like some of the udev calls succeed.

Because of the way that libdevmapper looks to see if udev is running [1]. In the sysvinit case, we shut down udevd and everything comes down with it. This includes a bunch of stuff in /run/udev/. In the systemd case, a bunch of this hangs around because it's controlled by socket activation. _check_udev_is_running() boils down to checking the existence of /run/udev/queue.bin, which I'm guessing succeeds for some odd reason (triggering the udev sync logic). If I'm right, nuking the /run/udev dir from within the shutdown script would fix this, as a silly hack. Sadly, I still can't seem to reproduce this hang anymore, but I'm going to keep trying.
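
A minimal sketch of the check as described above, assuming it really does reduce to a file-existence test (the comment itself calls this a guess). The function name `udev_looks_active` is hypothetical, and the demonstration runs on a throwaway directory rather than the real /run/udev.

```python
import tempfile
from pathlib import Path


def udev_looks_active(run_udev):
    """Guess whether udev is running from the presence of queue.bin."""
    return (Path(run_udev) / "queue.bin").exists()


# Demonstrate on a temporary directory standing in for /run/udev.
with tempfile.TemporaryDirectory() as tmp:
    run_udev = Path(tmp) / "udev"
    run_udev.mkdir()
    before = udev_looks_active(run_udev)   # no queue.bin yet
    (run_udev / "queue.bin").touch()       # a file left behind after udevd exits
    after = udev_looks_active(run_udev)    # the check now succeeds
```

A test like this cannot tell a running daemon from a leftover file, which is why stale state in the directory could trigger the udev sync logic.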

> 2. Why does it fail to decrease to zero? And who is to be blamed? libdevmapper? udev? systemd?

No idea, but that's why it hangs (I've seen the hanging semop(2) operation via strace)... libdevmapper is at fault here, because it's a pile of crap.

[1] http://sourceware.org/git/?p=lvm2.git;a=blob;f=libdm/libdm-common.c;h=fd775ca3aa8da7f0f56cddf2c9a28d66635f380c;hb=HEAD#l1795
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 17:47 GMT
Ok, it looks like I can reproduce this only on 3.4. The kernel in testing doesn't show this issue.
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 19:18 GMT
> In the systemd case, a bunch of this hangs around because its controlled by socket activation.

ps tells me it does not hang around, but I suppose you are right that systemd starts it on request.

I will check whether removing /run/udev helps on my girlfriend's system.

I think somebody should report this upstream to dm-devel. Would you like to, or should I?
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 19:23 GMT
Sure, go for it -- it's worth seeing if you can reproduce on 3.5 as well, and maybe then just confirming that something was fixed between 3.4 and 3.5. I glanced through the drivers/md subdir in linux.git and didn't see anything specific to this, but I won't pretend to know what's going on in that part of the world.
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 19:27 GMT
Purging /run/udev and its contents seems to do the trick for 3.4.7. Ok, let's try 3.5.0 now...
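
The purge amounts to a recursive delete of the directory. The sketch below shows the idea on a temporary directory; the function name `purge_udev_state` is hypothetical, and the actual workaround would live in the shutdown script as shell rather than Python.

```python
import shutil
import tempfile
from pathlib import Path


def purge_udev_state(run_udev):
    """Remove the udev runtime directory and everything in it, if present."""
    shutil.rmtree(run_udev, ignore_errors=True)


# Demonstrate on a temporary directory standing in for /run/udev.
with tempfile.TemporaryDirectory() as tmp:
    run_udev = Path(tmp) / "udev"
    run_udev.mkdir()
    (run_udev / "queue.bin").touch()
    purge_udev_state(run_udev)
    still_there = run_udev.exists()
    purge_udev_state(run_udev)  # a second call is a harmless no-op
```

Ignoring errors keeps the step safe to run when the directory is already gone.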
Comment by Christian Hesse (eworm) - Monday, 06 August 2012, 19:58 GMT
For some reason cryptsetup --debug tells me it does not use udev when running 3.5.0 (compiled from git (AUR package linux-git), I do not want to mess with testing kernels on stable systems)... I double checked: the line removing /run/udev was commented out.

So far we have two possibilities:
* Adding the workaround to mkinitcpio or
* getting 3.5.0 into [core] asap.

I'm fine as is, my systems do work now. (And my girlfriend no longer complains. ;)
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 20:07 GMT
3.5 will end up in core soon enough. I'm not going to encourage rushing the build into [core] because of a bug that affects <1% of users.
Comment by Tom Gundersen (tomegun) - Monday, 06 August 2012, 20:27 GMT
This is worth noting for -lts users.
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 20:33 GMT
Why? This bug shouldn't affect 3.0.
Comment by Tom Gundersen (tomegun) - Monday, 06 August 2012, 20:44 GMT
@Dave: I missed that. Do we know what version it was introduced in? (I don't know what our next -lts will be, but it might be worth remembering).
Comment by Dave Reisner (falconindy) - Monday, 06 August 2012, 21:14 GMT
Oh joy. Apparently this bug was backported to the lts. I distinctly remember 3.3 not having this problem.
Comment by Christian Hesse (eworm) - Monday, 03 September 2012, 19:08 GMT
Linux 3.5.3 still has this problem, but it happens very rarely.
Comment by Greg (dolby) - Monday, 15 October 2012, 12:16 GMT
What about 3.6.2?
Comment by Christian Hesse (eworm) - Monday, 15 October 2012, 18:41 GMT
I think my girlfriend will kill me if I disable the mkinitcpio workaround...
Ok, will try to find some time to test this. ;)
Comment by Dave Reisner (falconindy) - Monday, 15 October 2012, 18:43 GMT
I can't reproduce this at all. Reopen if it's still a problem.