FS#32380 - [slim][systemd] Shutdown/Reboot processes sometimes hang on slim
Attached to Project:
Arch Linux
Opened by Stefano (Ste089) - Friday, 02 November 2012, 00:31 GMT
Last edited by Evangelos Foutras (foutrelis) - Wednesday, 23 April 2014, 17:04 GMT
Details
Description: Shutdown/reboot sometimes hangs on the slim process (if I have understood the attached file correctly).
Additional info: The attached file was produced with the procedure described on the freedesktop wiki, which the Arch wiki links to (shutdown debug.sh).
Package versions: slim 1.3.4-4; systemd 195-2
Steps to reproduce: Occurs randomly on reboot or shutdown.
Closed by Evangelos Foutras (foutrelis)
Wednesday, 23 April 2014, 17:04 GMT
Reason for closing: Works for me
Additional comments about closing: Please reopen if it's still an issue.
[ 14.116322] systemd[1]: Forked /usr/bin/slim as 398
[ 283.254759] systemd[1]: slim.service changed running -> stop-sigterm
[ 283.709605] systemd[1]: slim.service: cgroup is empty
[ 373.180420] systemd[1]: slim.service stopping timed out. Killing.
[ 373.180905] systemd[1]: Got SIGCHLD for process 398 (slim)
[ 373.180943] systemd[1]: Child 398 died (code=killed, status=9/KILL)
[ 373.180945] systemd[1]: Child 398 belongs to slim.service
[ 373.180970] systemd[1]: slim.service changed stop-sigkill -> failed
Any chance you could attach the output of
# systemctl status slim.service
?
slim.serivce.service
Loaded: error (Reason: No such file or directory)
Active: inactive (dead)
If I run "# systemctl status slim" the output is:
slim.service - SLiM Simple Login Manager
Loaded: loaded (/usr/lib/systemd/system/slim.service; enabled)
Active: active (running) since Sat, 2012-11-03 03:38:58 CET
Main PID: 385 (slim)
CGroup: name=systemd:/system/slim.service
└ 389 /usr/bin/X -nolisten tcp vt07 -auth /var/run/slim.auth
‣ 385 /usr/bin/slim -nodaemon
Nov 03 02:43:45 hank slim[385]: OS Function symlink declared
Nov 03 02:43:45 hank slim[385]: OS Attempting to declare FFI truncate
Nov 03 02:43:45 hank slim[385]: OS Function truncate declared
Nov 03 02:43:45 hank slim[385]: OS Attempting to declare FFI unlink
Nov 03 02:43:45 hank slim[385]: OS Function unlink declared
Nov 03 02:43:45 hank slim[385]: OS Attempting to declare FFI write
Nov 03 02:43:45 hank slim[385]: OS Function write declared
Nov 03 02:43:45 hank slim[385]: OS Attempting to declare FFI pipe
Nov 03 02:43:45 hank slim[385]: OS Function pipe declared
Nov 03 02:44:00 hank slim[385]: NOTE: child process received `Goodbye', closing down
Note: I followed the instructions on slim's wiki page at https://wiki.archlinux.org/index.php/SLiM, where the "Enabling SliM" section says:
If you use systemd, just enable slim.service. With systemd, it is no longer possible to enable slim using inittab.
Simply sending slim the TERM signal twice (or three times) hasn't failed for me yet, but it might for you, seeing as how you have an even easier time reproducing the hang.
# cp /usr/lib/systemd/system/slim.service /etc/systemd/system/
... and edit /etc/systemd/system/slim.service to add an ExecStop line:
=== slim.service
[Unit]
Description=SLiM Simple Login Manager
After=systemd-user-sessions.service
[Service]
ExecStart=/usr/bin/slim -nodaemon
ExecStop=/bin/kill $MAINPID ; /bin/kill $MAINPID
[Install]
Alias=display-manager.service
===
(watch the spaces around the " ; ", they're not optional). Then disable/enable the slim service to update the "display-manager" symlink:
# systemctl disable slim
# systemctl enable slim
After "systemctl daemon-reload" things should be consistent, I believe -- but if the first reboot still hangs, please try a second time.
Sending a SIGTERM is also systemd's default behaviour, and this just sends it twice, or three times if systemd finds that ExecStop hasn't in fact worked to stop slim yet. I've managed 10+ reboots without hang using this.
If this in fact works, it's still clearly a workaround since that would mean that slim were routinely ignoring SIGTERM, but I'm personally not all that interested in digging through the slim sources: I'm just waiting for LightDM to, finally, hit the official arch repos to do away with it anyway.
And if it doesn't in fact work... well, crap.
===
[Unit]
Description=SLiM Simple Login Manager
After=systemd-user-sessions.service
[Service]
ExecStart=-/usr/bin/slim -nodaemon
ExecStop=/bin/kill $MAINPID ; /bin/sh -c "[ -n \"$MAINPID\" ] && /bin/kill $MAINPID"
[Install]
Alias=display-manager.service
===
Also note the "-" before /usr/bin/slim in the ExecStart line. That seems to convince systemd that all is fine and well with slim (that is, it keeps the unit from entering "failed state", although on reboot/shutdown that's no more than an aesthetic difference), even if you will probably also see it complaining about closing already closed log files and such...
That "-" is probably a good idea to add to the arch standard slim.service, but the additional kills are of course quite horrible. It might tell someone how to fix slim itself though.
And -- yeah, works for me...
The watchdog is complaining that it is being shut down without userland having told it nicely beforehand -- but it shouldn't do that if nobody ever told it ANYTHING from userland before, as will no doubt be the case for you, as it was for me since I don't use a watchdog. Watchdogs might come in handy for remotely administered machines, but I don't want my local single-user desktop machine to go reboot itself when userspace hasn't told it that it was still alive for a certain time...
As such, I've just completely blacklisted those module(s). If you also want to do that:
[rene@e600 ~]$ cat /etc/modprobe.d/iTCO_wdt.conf
blacklist iTCO_wdt
in my case, I had a second watchdog (combined with a sensor chip) that I've also completely disabled:
[rene@e600 ~]$ cat /etc/modprobe.d/fschmd.conf
blacklist fschmd
If I ever grow a pressing desire for sensors, I'll go figure out how to disable just the watchdog part of the fschmd chip, but for now it'll do. In your case, blacklisting the iTCO_wdt module is probably enough. As indicated, the other option is setting up your userspace to also actually use the bloody thing.
I wouldn't hold my breath on these changes to slim.service making it to the arch repos -- maybe unless the arch maintainer is ALSO just waiting for LightDM anyway. Although still a *relatively* clean solution, it's a fairly dumb one in the sense that this should not be fixed in the systemd service file but in slim itself.
I have been trying a bit to get to the core of this one, but this is proving a significant pain to debug. This shutdown/reboot-hang occurs quite far into the shutdown process, at a point where the system is for all intents and purposes down with file systems unmounted and all that. Stepping in at that point AND being able to do anything means stepping in relatively intrusively therefore -- but given that this problem is very timing-dependent, that means you disturb the actual situation you want to observe. Up to now, enough so as to not have it happen. A bona fide Heisenbug, so to speak.
However, while the work-around attempts mentioned in the comments above worked "most of the time" for me, during testing I seem to have arrived at a version which, again for me, always works. It is moreover non-intrusive, so I would like to suggest the following slim.service for the arch package:
=== slim.service
[Unit]
Description=SLiM Simple Login Manager
After=systemd-user-sessions.service
[Service]
ExecStart=/usr/bin/slim -nodaemon
#
# slim, in a timing-dependent manner, sometimes fails to exit on the first
# SIGTERM to its process group. Send it manually so as to have systemd send
# another one automatically if it didn't work.
#
ExecStop=/bin/kill -TERM -${MAINPID}
[Install]
Alias=display-manager.service
===
Whereas the earlier attempts above send the SIGTERM (a few times) to the slim main process, this sends it to the slim process GROUP which is also the default systemd behavior -- and the only difference with having no ExecStop is that systemd will do it again when our manual one fails to bring it down (note that the -TERM is needed so as to not have /bin/kill interpret the -${MAINPID} as the signal).
This hasn't failed for me yet in dozens of reboots, and while it would be better to not need this, I have as said been unable to get slim to behave directly. I did move a few things concerning signal handling around in the sources (app.cpp) but it didn't help, and a more structured approach to debugging was frustrated in the above mentioned manner.
So...
To anyone new to this report: you use the above slim.service file by placing it in /etc/systemd/system/ and disabling/enabling slim to update the symlink: "systemctl disable slim && systemctl enable slim"
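The installation step can be sketched as a small shell helper. This is an illustration under assumptions, not a packaged tool: install_slim_unit is a hypothetical name, and the unit text is the one proposed in this comment.

```sh
#!/bin/sh
# Sketch only: install_slim_unit is a hypothetical helper name.
# Writes the slim.service proposed above into the given directory
# (default: /etc/systemd/system, which needs root).
install_slim_unit() {
    dir=${1:-/etc/systemd/system}
    # Quoted heredoc delimiter: the shell must leave ${MAINPID} alone
    # so that systemd can expand it when the service is stopped.
    cat > "$dir/slim.service" <<'EOF'
[Unit]
Description=SLiM Simple Login Manager
After=systemd-user-sessions.service

[Service]
ExecStart=/usr/bin/slim -nodaemon
ExecStop=/bin/kill -TERM -${MAINPID}

[Install]
Alias=display-manager.service
EOF
}

# Typical use as root (not run here):
#   install_slim_unit
#   systemctl disable slim && systemctl enable slim
```

Passing a scratch directory as the first argument makes it possible to inspect the generated file before touching /etc/systemd/system.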
As said, the issue is quite timing-dependent and I can as such not promise that this working for me means that it works for anyone -- but if a few others would test this and find it working (over a few days) I would suggest this for the arch package. After looking at the sources I am even more convinced than before that slim's future should be quite limited anyway. Feel free to disagree -- but then you also promise to fix this right ;-)
And now I'm not going to reboot for at least a week, even if only to get back at the fucking thing...
https://bugs.archlinux.org/task/26579
(including after pinging Roman through email) which seems to say that other than "broken" slim is also "unmaintained" in Arch Linux.
As to alternatives... LightDM is not in the official repositories, meaning it's not part of automatic upgrades, which I'm not fond of for system-security related software -- and doesn't in fact at the moment work right either. No shutdown/reboot from the (GTK) greeter (which probably has something to do with ConsoleKit dependencies) and no possibility to set a background -- which probably just has something to do with the fact that any piece of Linux software always has to be broken in at least SOME trivial or non-trivial way as a matter of Linux Standards compliance.
GDM drags in three quarters of the [extra] repository, from window-managers to sound-servers, so LXDM it is I guess. I switched to it before back when slim's communication with ConsoleKit gave me trouble, but at the time it wasn't much fun either. Right now it seems to be giving me no trouble. Which is basically all I want from a display manager anyway.
So... <shrug>
[Service]
ExecStart=/usr/bin/slim -nodaemon
ExecStop=/usr/bin/killall /usr/bin/slim
KillMode=none
I'm not sure I'm getting this right, but without the above lines systemd seems to kill X first. Xfce then apparently still needs X while it shuts down, and X is already gone at that stage (I saw messages like "xfsettingsd can't connect to display at :0.0" when I tried to stop slim manually through systemd). So asking systemd not to kill slim itself, and letting slim run its own shutdown routine instead, seems to work.
uname -a: Linux XXX 3.7.5-1-ARCH #1 SMP PREEMPT Mon Jan 28 10:03:32 CET 2013 x86_64 GNU/Linux
You can read in the log very clearly that the SLIM service is at fault here.
Feb 01 00:45:13 arch64jon systemd-logind[380]: System is powering down.
Feb 01 00:45:13 arch64jon systemd[1]: Deactivating swap /dev/sda3...
Feb 01 00:45:13 arch64jon slim[400]: gnome-session[471]: CRITICAL: gsm_manager_set_phase: assertion `GSM_IS_MANAGER (manager)' failed
Feb 01 00:45:13 arch64jon slim[400]: gnome-session[471]: Gtk-CRITICAL: gtk_main_quit: assertion `main_loops != NULL' failed
Feb 01 00:45:13 arch64jon gnome-session[471]: CRITICAL: gsm_manager_set_phase: assertion `GSM_IS_MANAGER (manager)' failed
Feb 01 00:45:13 arch64jon gnome-session[471]: Gtk-CRITICAL: gtk_main_quit: assertion `main_loops != NULL' failed
Feb 01 00:45:13 arch64jon slim[400]: pam_unix(slim:session): session closed for user jonathan
Feb 01 00:45:13 arch64jon systemd[1]: Deactivating swap /dev/sda3...
Feb 01 00:45:13 arch64jon systemd[1]: Deactivating swap /dev/sda3...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Sound Card.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped target Sound Card.
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Accounts Service...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Disk Manager...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Daemon for power management...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Authorization Manager...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping RealtimeKit Scheduling Policy Service...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Manage, Install and Generate Color Profiles...
Feb 01 00:45:13 arch64jon systemd[1]: Stopping CUPS Printing Service...
Feb 01 00:45:13 arch64jon slim[400]: g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Periodic Command Scheduler.
Feb 01 00:45:13 arch64jon slim[400]: applet.py: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.0.
Feb 01 00:45:13 arch64jon slim[400]: Window manager warning: Log level 16: gnome-shell: Fatal IO error 0 (Success) on X server :0.0.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped D-Bus System Message Bus.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Login Service.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Getty on tty1.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped CUPS Printing Service.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Manage, Install and Generate Color Profiles.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Samba SMB/CIFS server.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped RealtimeKit Scheduling Policy Service.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Authorization Manager.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Daemon for power management.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Disk Manager.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Accounts Service.
Feb 01 00:45:13 arch64jon systemd[1]: Deactivated swap /dev/sda3.
Feb 01 00:45:13 arch64jon systemd[1]: Started Store Sound Card State.
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Samba NetBIOS name server...
Feb 01 00:45:13 arch64jon systemd[1]: Deactivated swap /dev/disk/by-id/wwn-0x5002538043584d30-part3.
Feb 01 00:45:13 arch64jon systemd[1]: Deactivated swap /dev/disk/by-id/ata-SAMSUNG_SSD_830_Series_S0VYNYAC118786-part3.
Feb 01 00:45:13 arch64jon systemd[1]: Deactivated swap /dev/disk/by-uuid/ce4ba251-c72d-4d80-8adb-b68c3f8c09da.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Samba NetBIOS name server.
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Network.
Feb 01 00:45:13 arch64jon systemd[1]: Stopped target Network.
Feb 01 00:45:13 arch64jon systemd[1]: Stopping Netcfg networking service for profile ethernet-static...
Feb 01 00:45:13 arch64jon slim[400]: Server terminated successfully (0). Closing log file.
Feb 01 00:45:13 arch64jon netcfg[1705]: :: ethernet-static down [done]
Feb 01 00:45:13 arch64jon systemd[1]: Stopped Netcfg networking service for profile ethernet-static.
Feb 01 00:46:43 arch64jon systemd[1]: slim.service stopping timed out. Killing.
Feb 01 00:46:43 arch64jon systemd[1]: slim.service: main process exited, code=killed, status=9/KILL
Feb 01 00:46:43 arch64jon systemd[1]: Stopped SLiM Simple Login Manager.
Feb 01 00:46:43 arch64jon systemd[1]: Unit slim.service entered failed state
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Permit User Sessions...
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Permit User Sessions.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Basic System.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped target Basic System.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Dispatch Password Requests to Console Directory Watch.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Dispatch Password Requests to Console Directory Watch.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping CUPS Printer Service Spool.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped CUPS Printer Service Spool.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Forward Password Requests to Wall Directory Watch.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Forward Password Requests to Wall Directory Watch.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Daily Cleanup of Temporary Directories.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Sockets.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped target Sockets.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping CUPS Printing Service Sockets.
Feb 01 00:46:43 arch64jon systemd[1]: Closed CUPS Printing Service Sockets.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping D-Bus System Message Bus Socket.
Feb 01 00:46:43 arch64jon systemd[1]: Closed D-Bus System Message Bus Socket.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped target Encrypted Volumes.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Setup Virtual Console.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Apply Kernel Variables.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Load Kernel Modules...
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Load Kernel Modules.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Set Up Additional Binary Formats.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped target Swap.
Feb 01 00:46:43 arch64jon systemd[1]: Unmounting /home...
Feb 01 00:46:43 arch64jon systemd[1]: Unmounting Temporary Directory...
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Remote File Systems.
Feb 01 00:46:43 arch64jon systemd[1]: Stopped target Remote File Systems.
Feb 01 00:46:43 arch64jon systemd[1]: Unmounted /boot.
Feb 01 00:46:43 arch64jon systemd[1]: Unmounted Temporary Directory.
Feb 01 00:46:43 arch64jon systemd[1]: Unmounted /home.
Feb 01 00:46:43 arch64jon systemd[1]: Starting Unmount All Filesystems.
Feb 01 00:46:43 arch64jon systemd[1]: Reached target Unmount All Filesystems.
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Local File Systems (Pre).
Feb 01 00:46:43 arch64jon systemd[1]: Stopped target Local File Systems (Pre).
Feb 01 00:46:43 arch64jon systemd[1]: Stopping Remount Root and Kernel File Systems...
Feb 01 00:46:43 arch64jon systemd[1]: Stopped Remount Root and Kernel File Systems.
Feb 01 00:46:43 arch64jon systemd[1]: Starting Shutdown.
Feb 01 00:46:43 arch64jon systemd[1]: Reached target Shutdown.
Feb 01 00:46:43 arch64jon systemd[1]: Starting Save Random Seed...
Feb 01 00:46:43 arch64jon systemd[1]: Starting Update UTMP about System Shutdown...
Feb 01 00:46:43 arch64jon systemd[1]: Started Save Random Seed.
Feb 01 00:46:43 arch64jon systemd[1]: Started Update UTMP about System Shutdown.
Feb 01 00:46:43 arch64jon systemd[1]: Starting Final Step.
Feb 01 00:46:43 arch64jon systemd[1]: Reached target Final Step.
Feb 01 00:46:43 arch64jon systemd[1]: Starting Power-Off...
Feb 01 00:46:43 arch64jon systemd[1]: Shutting down.
Feb 01 00:46:43 arch64jon systemd-journal[204]: Journal stopped
-- Reboot --
https://bugs.archlinux.org/task/32380#comment103886
or
https://bugs.archlinux.org/task/32380#comment104594
?
If yes, just give up on slim. It's an obsolete piece of software that's broken in various ways half of the time, and unmaintained three-quarters of it. LXDM has become mostly usable.
(and to whomever it may concern: it is ridiculously unhelpful that this bugzilla doesn't number comments for the sake of back-references)
Can you please try adding IgnoreSIGPIPE=no to the [Service] section of /usr/lib/systemd/system/slim.service?
Similar to Fedora's slim.service file: http://pkgs.fedoraproject.org/cgit/slim.git/plain/slim.service
Let me know if that improves the situation.
(Even though I have had slim hang on me 2-3 times in the past, I can't reproduce the issue often enough to test.)
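One way to try the IgnoreSIGPIPE=no suggestion without editing the packaged file in place is to write a modified copy, for example into /etc/systemd/system/ as earlier comments in this report did. A minimal sketch, assuming awk is available; add_service_setting is a hypothetical helper name, not a systemd tool.

```sh
#!/bin/sh
# Sketch only: add_service_setting is a hypothetical helper, not a systemd tool.
# Copies unit file SRC to DST, adding the line SETTING directly below [Service].
add_service_setting() {
    src=$1 dst=$2 setting=$3
    awk -v line="$setting" '
        { print }                          # copy every input line unchanged
        $0 == "[Service]" { print line }   # then add the setting after the header
    ' "$src" > "$dst"
}

# Typical use as root (not run here):
#   add_service_setting /usr/lib/systemd/system/slim.service \
#       /etc/systemd/system/slim.service 'IgnoreSIGPIPE=no'
#   systemctl daemon-reload
```

Keeping the change in a copy means a package upgrade does not silently overwrite it, and removing the copy restores the packaged behaviour.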
I tried what you did and then restarted the system --> hung again.
After the reboot and thus with the modified slim service file I rebooted again --> hung again.
Here the system hangs at shutdown/reboot more often than not, so I might just switch to another DM if SLIM is no longer actively developed.
TimeoutStopSec=5s
to the Service section of
/usr/lib/systemd/system/slim.service
I personally like SLIM. It is not bloated like other DMs and enables me
to do just what I want (a nice picture with a simple greeting and no frigging menus etc).
Also, there are commits every month, so it is developed.