FS#17138 - [openssh] startup scripts should check pid files more closely
Attached to Project:
Arch Linux
Opened by Clemens Fruhwirth (therp) - Saturday, 14 November 2009, 22:43 GMT
Last edited by Guillaume ALAUX (galaux) - Monday, 17 January 2011, 20:47 GMT
Opened by Clemens Fruhwirth (therp) - Saturday, 14 November 2009, 22:43 GMT
Last edited by Guillaume ALAUX (galaux) - Monday, 17 January 2011, 20:47 GMT
|
Details
Description:
/etc/rc.d/sshd starts with PID=`cat /var/run/sshd.pid 2>/dev/null` and has [ -z "$PID" ] && /usr/sbin/sshd $SSHD_ARGS in its "start)" part. So when there is a left-over file at /var/run/sshd.pid, sshd will fail to start. This issue seems to come from a broad problem class, namely "do we believe PID files?". The answer should be "NO!". At least we should do some sanity checks: 1) Does this PID exist? (check /proc/$PID/) 2) Is /proc/$PID/exe the daemon, we are about to start/kill? I think Debian solved those problem quite elegantly with their start-stop-daemon facility. In any way, sshd must be fixed. Being locked out of my box with two country borders in between is just annoying. Additional info: * openssh 5.3p1-2 Steps to reproduce: sudo echo "1123" > /var/run/sshd.pid starting and restarting will fail. |
This task depends upon
Closed by Guillaume ALAUX (galaux)
Monday, 17 January 2011, 20:47 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed in 5.6p1-2
Monday, 17 January 2011, 20:47 GMT
Reason for closing: Fixed
Additional comments about closing: Fixed in 5.6p1-2
http://en.wikipedia.org/wiki/Tmpfs
Something like this in /etc/fstab (not tested but should work, not sure if the mode is correct though) :
none /var/run tmpfs defaults,mode=0755,size=10M 0 0
none /var/lock tmpfs defaults,mode=1777,size=10M 0 0
Thank you for your patch, Stefan. There is only a slight problem with it. cmdline usually contains all args zero-separated. So if $SSH_ARGS is defined /usr/sbin/sshd will be called with these args. Try to add -4 to the SSH_ARGS and then your check `cat /proc/$PID/cmdline` check will always fail. At least this is under bash semantics, because bash removes the zeros for some reason (the string to compare with would become "/usr/sbin/sshd-4"). zsh keeps the zeros btw.
Please reconsider the start-stop-daemon solution.
http://www.google.com/codesearch/p?hl=en&sa=N&cd=3&ct=rc#BU4XUsZjGHc/baselayout-1.12.10/src/start-stop-daemon.c&q=start-stop-daemon.c
There is nothing special about it. It is a C file that has been tuned to do this job right. Building it does not require any obscure libs.
That's how our friends at Debian launch openssh-server
start-stop-daemon --start --quiet --oknodo --pidfile /var/run/sshd.pid --exec /usr/sbin/sshd -- $SSHD_OPTS
Anyway.
Couldn't the SSH_ARGS issue be solved by something like this?
awk -F \000 '{print $1}' < /proc/$PID/cmdline
I suppose not if the 0 is removed... perhaps this would work?
awk -F \000 '{print $1}' /proc/$PID/cmdline
I'll do some testing.
Btw, what do you mean by "cmdline usually contains all args zero-separated"? Could it be without args? Or space-separated? Or not separated at all?
I'm not sure who's responsible, but it could be the kernel or /usr/lib/crt1.o. Anyway, the usual layout is a string of zero-terminated strings. There are some mechanisms for the userspace program to change the content of /proc/../cmdline. For instance look at kdeinit, I have
00000000 6b 64 65 69 6e 69 74 34 3a 20 6b 69 6f 5f 66 69 |kdeinit4: kio_fi|
00000010 6c 65 20 5b 6b 64 65 69 6e 69 74 5d 20 66 69 6c |le [kdeinit] fil|
00000020 65 20 6c 6f 63 61 6c 3a 2f 74 6d 70 2f 6b 73 6f |e local:/tmp/kso|
00000030 63 6b 65 74 2d 63 6c 65 6d 65 6e 73 |cket-clemens|
No zero terminated strings, also this is no legal executable name neither is this a legal set of arguments. By looking at the source of start-stop-daemon.c I found that it also reads /proc/.../stat.
Regarding awk: awk -F \\000 '{print $1}' < /proc/../cmdline works for me
If the format of cmdline can change, then I perhaps a better solution would be to do something with the output from ps?
I wouldn't bother that much. sshd does not tamper with its cmdline content, so here it safe to use it, I'd say.
b) Why even rely on the pid file if it can be stale? Most of our other rc.d scripts use something similar to:
PID=`pidof -o %PPID /usr/bin/sshd`
Need to test it though, so sshd doesn't fail on its own because of it. Iirc sshd created the pid file, so perhaps it won't start if it's incorrect.
-PID=`cat /var/run/sshd.pid 2>/dev/null`
+PID=`pidof -o %PPID /usr/sbin/sshd`
A much smaller fix. I'd use that instead.
(Note : /usr/sbin/sshd, not /usr/bin/sshd)
Please test :-)
http://repos.archlinux.org/wsvn/packages/openssh/repos/core-i686/sshd
http://repos.archlinux.org/wsvn/packages/openssh/trunk/sshd
I also disagree that we shouldn't rely on a programs pid file and instead use pidof. Pidof will just return one or more (!!) processes that are started using the specified binary. There is no guaranty that this process was started by a previous daemon script.
For example we had used pidof in mysql's daemon script. The problem is that there e.g. akonadi or other projects start their own instance of mysql as user. As a result we get more than on pid and the scripts will just handle the first (or simply fails).
PID=$(cat $PIDFILE)
if [ "$(readlink /proc/$PID/exe 2>/dev/null)" != "/usr/sbin/sshd" ]; then
PID=
rm $PIDFILE
fi
This should solve the problem of invalid/stale pidfiles. Using pidof is deadly here, as stated by Pierre above.
Note that if we are really untrusting anything on the file (as opposed to considering just stale files), should the readlink call be quoted, the attacker could still force us to kill a different process:
mkdir /tmp/\ 123/
ln -s /usr/sbin/sshd /tmp/\ 123/exe
echo ../tmp/\ 123 > /var/run/sshd.pid
PID=$(pgrep -f /usr/sbin/sshd)
This finds only the running sshd daemon if any and not the ssh sessions. If you want find/kill only the sessions you could use:
pgrep -f 'sshd:'
pkill -f 'sshd:'
This is IMHO useful when doing a reboot or shutdown to terminate ssh sessions before network is down (stop section of sshd script):
...
if [ "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "6" ]; then
pkill -f 'sshd:'
sleep 2
fi
...
Adopting Debian's start-stop-daemon is worth considering IMO.
PIDFILE=/var/run/sshd.pid
PID=$(cat $PIDFILE 2>/dev/null)
if [ "$(readlink /proc/$PID/exe 2>/dev/null)" != "/usr/sbin/sshd" ]; then
PID=
rm $PIDFILE 2>/dev/null
fi
It does enable more careful check on pid file and works even if pid file survives after sshd goes down. Of course it won't solve the case where someone modifies the content of the pid file but that would require that someone to be root. In such case the case is "lost" anyway.
This looks very simple and clean to me and would prevent us from either adding more dependencies or new packages like start-stop-service. Am I missing a case?
Sorry for the sarcasm, but get over it, guys, and adopt a solution that has been tested and works well for others. start-stop-daemon.c is 1400 LOC. How could you ever think that you can have all the smartness in just 6 lines of shell script? Forget about it. Also other daemons want the fix as well, as pid files are widely used, and what do you do when you find out that you forget something vital and common after you pushed this readlink check from above to 100 different startup scripts? Patch them all? Good luck. I'll go for a coffee in the meanwhile, or maybe a whole coffee plantation.
Just to give you an example: http://clemens.endorphin.org/start-stop-daemon.c function pid_is_exec. It is a function that pretty much tries to do what you are doing with readlink, except that start-stop-daemon goes down to inode comparison level and not file names. There is a strange check for " (deleted)" in there. I have clue why it is there, but apparently Debian devs found it necessary. Oh yeah, and start-stop-daemon checks whether /proc is actually mounted.
My argument is: Don't reinvent the wheel. Make us of free software knowledge out there and adopt it, improve it and reshare it.
(And please remove me from that bug if you don't. It's just too sad otherwise.)
I think there might be a misunderstanding here: I don't think that we "can have all the smartness [of start-stop-daemons] in just 6 lines of shell script". I never said such thing. I just think these 6 lines solve your initial issue: "startup scripts should check pid files more closely" especially when left-over pid file exists. As far as I'm concerned this would solve the bug report you made in a pretty simple and straightforward way.
Now I think the "What about including start-stop-daemon into Arch?" matter is something different and this bug report is probably not the right place for that. What about writing to the ML linking this issue to prove that it would be relevant? You would involve the whole Arch community.
> Oh cool, only a quarter and year, and we got a hexagon as reinvention of the wheel.
It's a bit tough to be made fun of when trying to solve 8 issues about a major package. Please keep this away from bug reports and ML.
Despite your sarcasm, I am still in favor of this clean solution. Up to you to bring the start-stop-daemon thing to the Arch community. :)