FS#17138 - [openssh] startup scripts should check pid files more closely

Attached to Project: Arch Linux
Opened by Clemens Fruhwirth (therp) - Saturday, 14 November 2009, 22:43 GMT
Last edited by Guillaume ALAUX (galaux) - Monday, 17 January 2011, 20:47 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Aaron Griffin (phrakture)
Thomas Bächler (brain0)
Guillaume ALAUX (galaux)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No


/etc/rc.d/sshd starts with

PID=`cat /var/run/sshd.pid 2>/dev/null`

and has
[ -z "$PID" ] && /usr/sbin/sshd $SSHD_ARGS

in its "start)" part. So when there is a left-over file at /var/run/sshd.pid, sshd will fail to start.

This issue seems to come from a broad problem class, namely "do we believe PID files?". The answer should be "NO!". At least we should do some sanity checks:
1) Does this PID exist? (check /proc/$PID/)
2) Is /proc/$PID/exe the daemon, we are about to start/kill?
I think Debian solved those problem quite elegantly with their start-stop-daemon facility. In any way, sshd must be fixed. Being locked out of my box with two country borders in between is just annoying.

Additional info:
* openssh 5.3p1-2

Steps to reproduce:
sudo echo "1123" > /var/run/sshd.pid

starting and restarting will fail.
This task depends upon

Closed by  Guillaume ALAUX (galaux)
Monday, 17 January 2011, 20:47 GMT
Reason for closing:  Fixed
Additional comments about closing:  Fixed in 5.6p1-2
Comment by Stefan Hermansen (scorpyn) - Saturday, 21 November 2009, 00:05 GMT
Try this.
Comment by Stefan Hermansen (scorpyn) - Saturday, 21 November 2009, 00:52 GMT
Also, you could mount /var/run and /var/lock as tmpfs to prevent stale lock files after reboot.


Something like this in /etc/fstab (not tested but should work, not sure if the mode is correct though) :

none /var/run tmpfs defaults,mode=0755,size=10M 0 0
none /var/lock tmpfs defaults,mode=1777,size=10M 0 0
Comment by Stefan Hermansen (scorpyn) - Saturday, 21 November 2009, 13:44 GMT
This patch is probably better than the last one I made.
Comment by Clemens Fruhwirth (therp) - Monday, 23 November 2009, 09:43 GMT

Thank you for your patch, Stefan. There is only a slight problem with it. cmdline usually contains all args zero-separated. So if $SSH_ARGS is defined /usr/sbin/sshd will be called with these args. Try to add -4 to the SSH_ARGS and then your check `cat /proc/$PID/cmdline` check will always fail. At least this is under bash semantics, because bash removes the zeros for some reason (the string to compare with would become "/usr/sbin/sshd-4"). zsh keeps the zeros btw.

Please reconsider the start-stop-daemon solution.
There is nothing special about it. It is a C file that has been tuned to do this job right. Building it does not require any obscure libs.

That's how our friends at Debian launch openssh-server
start-stop-daemon --start --quiet --oknodo --pidfile /var/run/sshd.pid --exec /usr/sbin/sshd -- $SSHD_OPTS
Comment by Stefan Hermansen (scorpyn) - Monday, 23 November 2009, 14:01 GMT
The start-stop-daemon solution is for the arch devs to decide (ie not me). I don't have any experience with it so I don't really have any opinion on it.


Couldn't the SSH_ARGS issue be solved by something like this?

awk -F \000 '{print $1}' < /proc/$PID/cmdline

I suppose not if the 0 is removed... perhaps this would work?

awk -F \000 '{print $1}' /proc/$PID/cmdline

I'll do some testing.

Btw, what do you mean by "cmdline usually contains all args zero-separated"? Could it be without args? Or space-separated? Or not separated at all?
Comment by Clemens Fruhwirth (therp) - Monday, 23 November 2009, 14:12 GMT
Is there a way to escalate this issue up to the devs?

I'm not sure who's responsible, but it could be the kernel or /usr/lib/crt1.o. Anyway, the usual layout is a string of zero-terminated strings. There are some mechanisms for the userspace program to change the content of /proc/../cmdline. For instance look at kdeinit, I have

00000000 6b 64 65 69 6e 69 74 34 3a 20 6b 69 6f 5f 66 69 |kdeinit4: kio_fi|
00000010 6c 65 20 5b 6b 64 65 69 6e 69 74 5d 20 66 69 6c |le [kdeinit] fil|
00000020 65 20 6c 6f 63 61 6c 3a 2f 74 6d 70 2f 6b 73 6f |e local:/tmp/kso|
00000030 63 6b 65 74 2d 63 6c 65 6d 65 6e 73 |cket-clemens|

No zero terminated strings, also this is no legal executable name neither is this a legal set of arguments. By looking at the source of start-stop-daemon.c I found that it also reads /proc/.../stat.

Regarding awk: awk -F \\000 '{print $1}' < /proc/../cmdline works for me
Comment by Stefan Hermansen (scorpyn) - Monday, 23 November 2009, 14:25 GMT
It's already assigned to 2 arch devs, I suppose they just haven't had time to look at it yet.

If the format of cmdline can change, then I perhaps a better solution would be to do something with the output from ps?
Comment by Clemens Fruhwirth (therp) - Monday, 23 November 2009, 14:49 GMT
"ps" uses the same information, so it basically prints /proc/$PID/cmdline with spaces instead of zero.

I wouldn't bother that much. sshd does not tamper with its cmdline content, so here it safe to use it, I'd say.
Comment by Aaron Griffin (phrakture) - Wednesday, 25 November 2009, 17:46 GMT
a) The start-stop-daemon thing is totally unrelated. Please open a separate feature request for that. My personal opinion, however, is that it is silly and overkill
b) Why even rely on the pid file if it can be stale? Most of our other rc.d scripts use something similar to:
PID=`pidof -o %PPID /usr/bin/sshd`
Comment by Stefan Hermansen (scorpyn) - Friday, 27 November 2009, 08:19 GMT
Didn't know pidof existed :-)

Need to test it though, so sshd doesn't fail on its own because of it. Iirc sshd created the pid file, so perhaps it won't start if it's incorrect.
Comment by Stefan Hermansen (scorpyn) - Saturday, 28 November 2009, 22:14 GMT
It appears that using pidof works correctly :-)

-PID=`cat /var/run/sshd.pid 2>/dev/null`
+PID=`pidof -o %PPID /usr/sbin/sshd`

A much smaller fix. I'd use that instead.

(Note : /usr/sbin/sshd, not /usr/bin/sshd)

Please test :-)
Comment by Aaron Griffin (phrakture) - Tuesday, 01 December 2009, 21:40 GMT
Committed to trunk. This will be in the next release
Comment by Clemens Fruhwirth (therp) - Tuesday, 05 January 2010, 21:16 GMT
Iirc, I have seen an update to openssh in the meantime, but there is no pidof in my /etc/rc.d/sshd. Is this fix already released? Thnx
Comment by Clemens Fruhwirth (therp) - Tuesday, 26 January 2010, 20:25 GMT
I can't see a fix for that in trunk (assuming the following is trunk)
Comment by Aaron Griffin (phrakture) - Friday, 29 January 2010, 21:01 GMT Comment by Pierre Schmitz (Pierre) - Wednesday, 10 March 2010, 12:45 GMT
I have reverted this chagne as it leads in a more critical bug here: http://bugs.archlinux.org/task/18611

I also disagree that we shouldn't rely on a programs pid file and instead use pidof. Pidof will just return one or more (!!) processes that are started using the specified binary. There is no guaranty that this process was started by a previous daemon script.

For example we had used pidof in mysql's daemon script. The problem is that there e.g. akonadi or other projects start their own instance of mysql as user. As a result we get more than on pid and the scripts will just handle the first (or simply fails).
Comment by Thomas Bächler (brain0) - Wednesday, 10 March 2010, 13:00 GMT
Okay, now I have to come in on this bug, too. Clemens' first post actually had a valid solution in it, this might work:

if [ "$(readlink /proc/$PID/exe 2>/dev/null)" != "/usr/sbin/sshd" ]; then

This should solve the problem of invalid/stale pidfiles. Using pidof is deadly here, as stated by Pierre above.
Comment by Pierre Schmitz (Pierre) - Wednesday, 10 March 2010, 13:06 GMT
I agree. This solution should be safe. I just put a new package into testing, but it's no big deal to add another one with that solution applied.
Comment by Linas (Linas) - Wednesday, 10 March 2010, 16:31 GMT
Seems safe.
Note that if we are really untrusting anything on the file (as opposed to considering just stale files), should the readlink call be quoted, the attacker could still force us to kill a different process:

mkdir /tmp/\ 123/
ln -s /usr/sbin/sshd /tmp/\ 123/exe
echo ../tmp/\ 123 > /var/run/sshd.pid
Comment by Jörg Kriegel (sokoban65) - Wednesday, 17 March 2010, 22:35 GMT
Consider using:

PID=$(pgrep -f /usr/sbin/sshd)

This finds only the running sshd daemon if any and not the ssh sessions. If you want find/kill only the sessions you could use:

pgrep -f 'sshd:'
pkill -f 'sshd:'

This is IMHO useful when doing a reboot or shutdown to terminate ssh sessions before network is down (stop section of sshd script):

if [ "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "6" ]; then
pkill -f 'sshd:'
sleep 2
Comment by Aaron Griffin (phrakture) - Wednesday, 17 March 2010, 23:12 GMT
That does add a dep to the package, though
Comment by Florian Pritz (bluewind) - Saturday, 03 July 2010, 14:58 GMT
any decision yet?
Comment by Clemens Fruhwirth (therp) - Friday, 22 October 2010, 07:48 GMT
Ping. And use start-stop-daemon. The Debian guys did it right, as they always do.
Comment by Thomas Bächler (brain0) - Friday, 22 October 2010, 09:57 GMT
Erm, this has been sitting here for a while. Clearly, we need a new maintainer for openssh, as Aaron is too busy for any package maintenance and I am already sloppy in maintaining my current packages, so I won't adopt even more.

Adopting Debian's start-stop-daemon is worth considering IMO.
Comment by Guillaume ALAUX (galaux) - Tuesday, 11 January 2011, 22:31 GMT
I tested Thomas' solution:

PID=$(cat $PIDFILE 2>/dev/null)
if [ "$(readlink /proc/$PID/exe 2>/dev/null)" != "/usr/sbin/sshd" ]; then
rm $PIDFILE 2>/dev/null

It does enable more careful check on pid file and works even if pid file survives after sshd goes down. Of course it won't solve the case where someone modifies the content of the pid file but that would require that someone to be root. In such case the case is "lost" anyway.

This looks very simple and clean to me and would prevent us from either adding more dependencies or new packages like start-stop-service. Am I missing a case?
Comment by Clemens Fruhwirth (therp) - Tuesday, 11 January 2011, 23:10 GMT
Oh cool, only a quarter and year, and we got a hexagon as reinvention of the wheel.

Sorry for the sarcasm, but get over it, guys, and adopt a solution that has been tested and works well for others. start-stop-daemon.c is 1400 LOC. How could you ever think that you can have all the smartness in just 6 lines of shell script? Forget about it. Also other daemons want the fix as well, as pid files are widely used, and what do you do when you find out that you forget something vital and common after you pushed this readlink check from above to 100 different startup scripts? Patch them all? Good luck. I'll go for a coffee in the meanwhile, or maybe a whole coffee plantation.

Just to give you an example: http://clemens.endorphin.org/start-stop-daemon.c function pid_is_exec. It is a function that pretty much tries to do what you are doing with readlink, except that start-stop-daemon goes down to inode comparison level and not file names. There is a strange check for " (deleted)" in there. I have clue why it is there, but apparently Debian devs found it necessary. Oh yeah, and start-stop-daemon checks whether /proc is actually mounted.

My argument is: Don't reinvent the wheel. Make us of free software knowledge out there and adopt it, improve it and reshare it.
(And please remove me from that bug if you don't. It's just too sad otherwise.)
Comment by Guillaume ALAUX (galaux) - Wednesday, 12 January 2011, 09:39 GMT
Hi Clemens,

I think there might be a misunderstanding here: I don't think that we "can have all the smartness [of start-stop-daemons] in just 6 lines of shell script". I never said such thing. I just think these 6 lines solve your initial issue: "startup scripts should check pid files more closely" especially when left-over pid file exists. As far as I'm concerned this would solve the bug report you made in a pretty simple and straightforward way.

Now I think the "What about including start-stop-daemon into Arch?" matter is something different and this bug report is probably not the right place for that. What about writing to the ML linking this issue to prove that it would be relevant? You would involve the whole Arch community.

> Oh cool, only a quarter and year, and we got a hexagon as reinvention of the wheel.
It's a bit tough to be made fun of when trying to solve 8 issues about a major package. Please keep this away from bug reports and ML.

Despite your sarcasm, I am still in favor of this clean solution. Up to you to bring the start-stop-daemon thing to the Arch community. :)