FS#40200 - [autofs] system shutdown hangs because systemd attempts unmount before autofs

Attached to Project: Community Packages
Opened by Brian BIdulock (bidulock) - Saturday, 03 May 2014, 10:18 GMT
Last edited by Ivy Foster (escondida) - Thursday, 10 October 2019, 02:59 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Lukas Fleischer (lfleischer)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Reboot or shutdown of the system hangs when an autofs indirect NFS mount is mounted. The attached log file shows systemd starting to stop autofs and then attempting to unmount the indirect NFS mounts before autofs has had a chance to unmount them itself. Because autofs is compiled with --enable-ignore-busy, it ignores the mounts that systemd has attempted to unmount and exits. systemd then has to time out on those unmounts before shutdown can proceed.

The solution is to add remote-fs.target to After= in the autofs.service file. This allows automount to unmount the indirect NFS mounts before systemd attempts to unmount them, as demonstrated by the second attached log file.
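
For illustration, a minimal sketch of that change as a local drop-in, rather than an edit to the packaged unit. The drop-in path and file name are assumptions chosen for this example and do not come from the report; only the After=remote-fs.target line is what the report proposes.

```ini
# /etc/systemd/system/autofs.service.d/order.conf
# (hypothetical drop-in path and file name, chosen for this example)
[Unit]
# Start autofs after remote-fs.target. systemd reverses ordering
# dependencies at shutdown, so autofs is stopped before systemd goes on
# to unmount remote filesystems.
After=remote-fs.target
```

After= entries in a drop-in are added to those already in the unit. Run `systemctl daemon-reload` for the change to take effect.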

Additional info:

- autofs 5.0.9-1
- see attached logs

Steps to reproduce:

- mount an autofs indirect nfs mount
- reboot the system

Closed by  Ivy Foster (escondida)
Thursday, 10 October 2019, 02:59 GMT
Reason for closing:  Implemented
Additional comments about closing:  Current version of autofs (5.1.4-3) has After=remote-fs.target, as OP recommended, in its service file.
Comment by Lukas Fleischer (lfleischer) - Thursday, 03 July 2014, 06:46 GMT
Maybe this should be reported upstream?
Comment by Brian BIdulock (bidulock) - Friday, 04 July 2014, 00:35 GMT
What would you tell upstream? The sample autofs.service file in samples/ works fine if you do not compile with --enable-ignore-busy. If I were upstream I would tell you that to use the default sample service file, compile with the default options...

automount(8) says this:

"If any autofs mount point directories are busy when the daemon is sent an exit signal the daemon will not exit. The exception to this is if autofs has been built with configure options to either ignore busy mounts at exit or force umount at exit. If the ignore busy mounts at exit option is used the filesystems will be left in a catatonic (non-functional) state and can be manually umounted when they become unused. If the force umount at exit option is used the filesystems will be umounted but the mount will not be released by the kernel until they are no longer in use by the processes that help them busy."

So it seems that by adding the --enable-ignore-busy flag we are asking autofs to leave filesystems "in a catatonic (non-functional) state".

Comment by Lukas Fleischer (lfleischer) - Friday, 04 July 2014, 07:17 GMT
Ok, I must confess that I don't fully understand what is happening here yet... Why does systemd only time out when "--enable-ignore-busy" is used? What exactly causes the hang and why doesn't it happen without the "--enable-ignore-busy" option? Could you please elaborate?
Comment by Lukas Fleischer (lfleischer) - Friday, 04 July 2014, 07:21 GMT
Oh, and note that Fedora also uses "--enable-ignore-busy" with the upstream unit file (from the redhat/ subdirectory but this is essentially the same as the service file included in samples/). So they should be affected by the same problem...
Comment by Brian BIdulock (bidulock) - Friday, 04 July 2014, 12:30 GMT
Without After=... remote-fs.target in the autofs.service file, as can be seen from the tmp2.log file attached to the report, systemd thinks that it can reach the remote-fs.target _before_ autofs has completed shutting down. So, in the logs, it starts unmounting /u2 (direct mount) and /home/brian (indirect mount) before autofs has had a chance to even attempt to unmount them.

When --enable-ignore-busy was not specified, autofs waits until all of its mounts are not busy _and_unmounted_ before exiting, so systemd cannot reach the next target until autofs exits, by which time the unmounts that systemd started have completed and things proceed anyway.

When --enable-ignore-busy was specified, autofs exits immediately, ignoring the unmount failures caused by systemd's earlier attempts to unmount the same filesystems. systemd thinks it can reach the next target because autofs has exited, and waits for the unmounts that it started earlier to complete. They never complete, because autofs is no longer around to finish the task, so systemd goes into a long timeout waiting for them before taking more drastic unmount actions: thus the hang.

Put another way, when --enable-ignore-busy was not specified, systemd thought that the unmounts that it started too early just completed on their own when in fact it was autofs waiting for them to become unbusy and umounting them that completed the unmount. When --enable-ignore-busy was specified, autofs immediately exits and does not complete systemd's unmount operations for it, so they go into long timeout.

Neither situation is totally correct. systemd should not attempt to unmount remote filesystem mounts before autofs has had a chance to at least try to unmount them. Adding remote-fs.target to After= in the autofs.service file fixes this by forcing systemd to defer its remote filesystem unmount attempts until after autofs has exited. As you can see from the tmp3.log file I attached, the filesystems were in fact never busy: autofs just ignored a spew of errors caused by systemd's invalid attempt to unmount them earlier.

So, before --enable-ignore-busy, autofs did the right thing shutting down even though systemd did the wrong thing (attempting to umount them too soon). After --enable-ignore-busy, autofs did the wrong thing (exiting on error) after systemd did the wrong thing (attempting to umount too soon). Placing After=... remote-fs.target in autofs.service causes systemd to do the right thing (wait for autofs to exit before attempting to unmount remote filesystems) after autofs has been allowed to do the right thing (attempt to and succeed in unmounting them) as they were never busy in the first place.

I should point out that the reason the indirect mount is not busy when autofs goes to unmount it is that my autofs.service file has Before=systemd-user-sessions.service in it as well. That came from another bug report, related to ypbind; I can dig out the number for you if you want.
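
As a rough sketch only, the two ordering settings discussed in this comment would sit together in the [Unit] section like this. The full unit file is not shown in this report, so everything other than these two directives is omitted here rather than guessed.

```ini
[Unit]
# Stop autofs before systemd unmounts remote filesystems at shutdown.
After=remote-fs.target
# Start autofs before user logins are permitted; at shutdown the order is
# reversed, so autofs is stopped only after systemd-user-sessions.service
# has been stopped.
Before=systemd-user-sessions.service
```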
Comment by Lukas Fleischer (lfleischer) - Saturday, 05 July 2014, 14:20 GMT
Thank you for your long explanation. You're basically saying that "After=remote-fs.target" is always correct, right? Could you please report this upstream?
Comment by Brian BIdulock (bidulock) - Saturday, 05 July 2014, 20:05 GMT
I tried, but vger doesn't like my mail address (due to a mail server issue in 2006 if you can believe it)! Do you know of a way I can submit an upstream bug without using vger?
Comment by Lukas Fleischer (lfleischer) - Sunday, 06 July 2014, 19:12 GMT
No, sorry. Can you try to contact a mailing list administrator (try postmaster@vger.kernel.org if you don't find a better contact address) and ask them to fix that issue? Or just use another email address?
Comment by Brian BIdulock (bidulock) - Saturday, 12 July 2014, 21:03 GMT
Unfortunately I can't. vger doesn't like me no matter which address I send to.
Comment by Lukas Fleischer (lfleischer) - Tuesday, 26 August 2014, 12:52 GMT
Maybe related:

> When you send there email, do make sure that all of the email headers, both
> visible and transport level, have same addresses in them. People experience
> problems when for example ``From:'', ``Sender:'' and possible ``Reply-To:''
> headers present different addresses. The most common manifestation is
> complete silence from VGER!

If it still does not work, could you please try another email address?
Comment by Lukas Fleischer (lfleischer) - Sunday, 17 April 2016, 06:20 GMT
What's the status of this?
Comment by Ivy Foster (escondida) - Thursday, 10 October 2019, 02:58 GMT
I'm going to take the liberty of closing this task, because the current autofs.service implements After=remote-fs.target as the original poster suggested, and there's been no response to lfleischer's request for more information for more than 3 years.
