FS#34757 : [ypbind-mt] systemd starts autofs before ypbind has bound to a server.

Attached to Project: Arch Linux
Opened by Brian BIdulock (bidulock) - Friday, 12 April 2013, 23:32 GMT
Last edited by Dave Reisner (falconindy) - Friday, 08 August 2014, 13:32 GMT

Task Type	Bug Report
Category	Upstream Bugs
Status	Closed
Assigned To	Tom Gundersen (tomegun)
Architecture	All
Severity	Low
Priority	Normal
Reported Version
Due in Version	Undecided
Due Date	Undecided
Percent Complete
Votes	0
Private	No

Details

Description:

Although version 1.37 calls sd_notify(), it does so in the main thread just before that thread exits. When the parent exits, systemd concludes that ypbind has finished starting and goes on to start autofs (which depends on ypbind). The problem is that systemd is so freaking fast that autofs performs nsswitch lookups before ypbind has bound to a server, so autofs fails to load the NIS maps and proceeds as if there were none. When autofs is mounting home directories from NIS maps, it fails to set up those mounts and must be restarted manually.

This is probably an upstream problem, but it causes problems. I wrote a startup script called ypwait and run it as an ExecStartPre= in autofs.service. The script simply runs ypwhich until it gets a result (sleeping for a whole darn second otherwise), which stalls autofs startup until ypbind has finished binding to an NIS server. Ultimately, ypbind-mt should not call sd_notify() until it has successfully bound to an NIS server, and ypbind.service should use Type=notify. I would work up a patch, but ypbind-mt is the worst rat's nest I have seen in a while.
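For illustration, a minimal POSIX sh sketch of the kind of ypwait helper described above. The function name `wait_for_bind` and the `YPWAIT_INTERVAL` variable are hypothetical, not part of ypbind-mt or autofs. It assumes that ypwhich exits non-zero until ypbind has bound to a server, and, like the script described, it waits indefinitely.

```sh
# Hypothetical ypwait-style helper (not shipped by ypbind-mt or autofs).
# Assumption: the probe command (default: ypwhich) exits non-zero until
# ypbind has bound to an NIS server.
wait_for_bind() {
    probe=${1:-ypwhich}
    # Hypothetical knob for the delay between attempts; defaults to 1 second.
    interval=${YPWAIT_INTERVAL:-1}
    # Retry until the probe succeeds; output is discarded, only the
    # exit status matters.
    until $probe >/dev/null 2>&1; do
        sleep "$interval"
    done
}
```

Because the loop has no upper bound, a permanently unreachable NIS server would stall autofs startup until systemd's start timeout for the unit expires.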

Any advice you could give?

Additional info:
* package version(s) 1.37 (all really)

Steps to reproduce:

Closed by Dave Reisner (falconindy)
Friday, 08 August 2014, 13:32 GMT
Reason for closing: Fixed
Additional comments about closing: Current ypbind-mt 1.37.1-5 works fine.

Comment by Tom Gundersen (tomegun) - Saturday, 13 April 2013, 00:22 GMT

Thanks for looking into this. This definitely sounds like an upstream problem, so please contact them about it.

Comment by Brian BIdulock (bidulock) - Saturday, 13 April 2013, 00:36 GMT

I sent upstream a pointer to this report and asked Thorsten whether he could move the sd_notify() call to the point where ypbind first binds to a server.

Comment by Brian BIdulock (bidulock) - Tuesday, 16 April 2013, 12:55 GMT

I got the scoop from upstream. He says that Red Hat and openSUSE use an ExecStartPost= in their .service files to delay dependent units such as autofs from starting until a bind has been performed. He also said that the NetworkManager D-Bus integration was a bad idea in the first place. With that, I think the attached .service file will resolve this problem.

Well, let me test it thoroughly first. Back in a moment...

Comment by Brian BIdulock (bidulock) - Tuesday, 16 April 2013, 14:01 GMT

The attached service file works nicely.

ypbind.service (0.4 KiB)

Comment by Dave Reisner (falconindy) - Tuesday, 16 April 2013, 14:09 GMT

I think you have a strange definition of "nicely". Your single line of /bin/sh isn't even /bin/sh compatible.

Please just fix sd_notify support in ypbind and offer it upstream. It's trivial to do.

Comment by Brian BIdulock (bidulock) - Tuesday, 16 April 2013, 15:25 GMT

Works on Arch Linux's /bin/sh.

If you prefer:

ExecStartPost=/bin/sh -c "while ! /usr/bin/ypwhich; do sleep 1; done"

Comment by Dave Reisner (falconindy) - Tuesday, 16 April 2013, 15:32 GMT

Because Arch's /bin/sh is a symlink to bash. It should not work.

"If bash is invoked with the name sh, it tries to mimic the startup behavior of historical versions of sh as closely as possible, while conforming to the POSIX standard as well."

That it does work is, frankly, a bug in bash.

$ dash -c 'for ((i=0; i<5; i++)); do echo $i; done'
dash: 1: Syntax error: Bad for loop variable

You would be better off invoking bash if you're going to use bash syntax. Better yet, just fix the problem at the source rather than hacking around it.
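To make the portability point concrete: the C-style `for ((...))` loop is a bash extension, whereas a `while` loop with `[ ... ]` and `$((...))` arithmetic is POSIX and runs under both dash and bash. A minimal sketch (the function name `count_to` is only for illustration):

```sh
# POSIX equivalent of bash's: for ((i=0; i<5; i++)); do echo $i; done
count_to() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "$i"
        i=$((i + 1))   # POSIX arithmetic expansion, no bash needed
    done
}

count_to 5   # prints 0 through 4, one per line, under dash or bash
```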

Comment by Brian BIdulock (bidulock) - Wednesday, 17 April 2013, 09:27 GMT

The problem isn't at the source:

sd_notify(3): READY=1 tells the init system that daemon startup is finished. This is only used by systemd if the service definition has Type=notify set.

systemd.service(5): Behavior of notify is similar to simple, however, it is expected that the daemon sends a notification message via sd_notify(3) or an equivalent call when it finished starting up. (Note this is non-forking.)

ypbind(8): -debug: starts ypbind in debug mode. ypbind will not put itself into the background, and error messages and debug output are written to standard error.

So, unless you want your logs/journal filled with ypbind debugging messages, it is necessary to set Type=forking.

systemd.service(5): If set to forking it is expected that the process configured with ExecStart= will call fork() as part of its start-up. The parent process is expected to exit when start-up is complete and all communication channels set up.

That is what ypbind does: it calls fork(), and the parent exits once the communication channels have been set up. It also calls sd_notify(3) so that you can run it with Type=notify in debug mode under systemd.

So it seems upstream ypbind is behaving quite properly.

So unless we add at least ExecStartPost=/usr/bin/sleep 5, logins and autofs mounts will routinely fail on boot.

Comment by Brian BIdulock (bidulock) - Wednesday, 17 April 2013, 09:40 GMT

The old rc.d files used to query with ypwhich and loop sleeping a second (displaying a '.' or a count). Debian still does this (dash compatible):

log_action_begin_msg "binding to YP server"
for i in 1 2 3 4 5 6 7 8 9 10
do
    sleep 1
    log_action_cont_msg "."
    if [ "`ypwhich 2>/dev/null`" != "" ]
    then
        echo -n " done] "
        bound="yes"
        break
    fi
done

ExecStartPost= is precisely the replacement for this kind of thing. (Debian's script evidently doesn't rely on ypwhich's return code; it tests whether ypwhich printed anything.)

ExecStartPost=/bin/sh -c "for i in 1 2 3 4 5; do ! ypwhich >/dev/null || break; sleep 1; done"

Is that better?

Comment by Dave Reisner (falconindy) - Wednesday, 17 April 2013, 12:58 GMT

> That is what ypbind does, forks() and main exits once the communications channels have started up. It also calls sd_notify(3) so that you can run it with Type=notify in debug mode under systemd.

I'm reading this as:

1) In forking mode: ypbind-mt isn't actually ready to do work when main exits in forking mode. If it was actually ready, the sleep loop wouldn't be required.
2) In notify mode: upstream forces you to run in debug mode, but this works "properly" and doesn't need the sleep loop hack. Forcing debug logging on the user for a more preferable mode of operation under systemd seems like broken behavior.

How is this not an upstream problem?

> Is that better?
Syntactically, sure. 'ypwhich >/dev/null && break' would be a little more readable, but it's still a filthy hack.
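One detail worth noting about the one-liner: the exit status of a `for` loop is that of the last command it ran, so when ypwhich never succeeds the loop ends with the status of `sleep 1`, which is 0, and ExecStartPost= reports success anyway. A minimal POSIX sh sketch of a bounded wait that returns non-zero on timeout, using the `&& ...` form suggested above; the function name `wait_for_bind_bounded` and the `YPWAIT_INTERVAL` variable are hypothetical, and it assumes ypwhich exits non-zero while unbound.

```sh
# Hypothetical bounded wait: try the probe (default: ypwhich) up to
# $2 times (default: 5) and return 1 if it never succeeds.
wait_for_bind_bounded() {
    probe=${1:-ypwhich}
    tries=${2:-5}
    interval=${YPWAIT_INTERVAL:-1}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Success: ypbind is bound, stop waiting.
        $probe >/dev/null 2>&1 && return 0
        i=$((i + 1))
        # Only sleep if another attempt remains.
        [ "$i" -lt "$tries" ] && sleep "$interval"
    done
    # Never bound: a non-zero status lets systemd see the failure.
    return 1
}
```

With a non-zero status, systemd treats the failing ExecStartPost= command as a unit failure; prefixing the command with `-` in the unit file tells systemd to ignore the failure if dependent units should start regardless.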

Comment by Brian BIdulock (bidulock) - Wednesday, 17 April 2013, 15:44 GMT

No, systemd only wants to know that all sockets (communication channels) are open by the time sd_notify(3) is sent. With Type=forking it expects the daemon to set up its sockets, fork, and have the parent exit, which is what ypbind does.

The sleep loop has been done in sysvinit ypbind/nis scripts since time immemorial (well, 15 years). I have sysvinit scripts going back to Red Hat 7.3 that do this: Red Hat, Mageia, Mandriva, Mandrake, SUSE, openSUSE, Fedora, Debian, Ubuntu, ... everybody but Arch.

I am attaching the ypbind initscript from Red Hat 7.3 (dated Aug 13, 2001), the one from current Debian squeeze, and Fedora 18's .service file with its supporting script. See?

ypbind (2.2 KiB)

nis (4.7 KiB)

ypbind.service (0.6 KiB)

ypbind-post-waitbind (1.4 KiB)

Comment by Dave Reisner (falconindy) - Wednesday, 17 April 2013, 16:07 GMT

> The sleep loop has been done in sysvinit ypbind/nis scripts since time immemorial (well, 15 years). I have sysvinit scripts going back to redhat 7.3 that do this. redhat, mageia, mandriva, mandrake, suse, opensuse, fedora, debian, ubuntu, ... everybody but Arch.
You're completely missing/ignoring my point. *WHY* is this needed? You claim the daemon behaves properly: it signals to systemd (either via sd_notify or by main exiting) that it has set up all its sockets and is ready to handle requests. Why then do you need to wait even longer before it's capable of handling requests?

Comment by Brian BIdulock (bidulock) - Wednesday, 17 April 2013, 16:20 GMT

Because follow-on units cannot start until a bind has been achieved. It is not the daemon's responsibility to determine which follow-on units can or cannot start; there are quite a few NIS maps that can be used for various purposes. Whether something depends on a bind depends on how nsswitch, libc, nscd and others are set up. Name service caching can be configured to serve the last cached maps, but usually it isn't.

It has always been the init system's responsibility to determine the dependencies between services. If there is a bug upstream, it's in systemd, not ypbind. systemd has facilities for the major subsystems but seems to have forgotten about name services.

Comment by Brian BIdulock (bidulock) - Friday, 04 July 2014, 22:59 GMT

The .service file in the ypbind package has been working nicely for quite a while now. Can we close this report?
