FS#42147 - [systemd] gdm and logind showing intermittent timeouts

Attached to Project: Arch Linux
Opened by François-Xavier Thomas (frnx) - Saturday, 27 September 2014, 21:33 GMT
Last edited by Dave Reisner (falconindy) - Sunday, 22 March 2015, 15:07 GMT
Task Type Bug Report
Category System
Status Closed
Assigned To Thomas Bächler (brain0)
Dave Reisner (falconindy)
Tom Gundersen (tomegun)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

I have been experiencing occasional graphical boot failures with GDM refusing to start for a couple of weeks.

After investigating, here is a suspicious message that shows in the journal early in the boot sequence :

Sep 27 22:17:57 cirrus systemd[1]: Failed to register name: Connection timed out
Sep 27 22:17:57 cirrus systemd[1]: Failed to set up API bus: Connection timed out

After this message, some services refuse to operate as usual, among others GDM, systemd-logind, Avahi and Bluetooth. All show those "Connection timed out" messages at some point and fail to start properly.

Trying to login on a VT console results in a strange ~30s timeout that doesn't appear on normal boot, here is the corresponding journal output :

Sep 27 22:23:04 cirrus login[567]: pam_unix(login:session): session opened for user fx by LOGIN(uid=0)
Sep 27 22:23:04 cirrus systemd[1]: Starting Session c3 of user fx.
Sep 27 22:23:04 cirrus systemd[1]: Started Session c3 of user fx.
Sep 27 22:23:04 cirrus systemd-logind[394]: New session c3 of user fx.
Sep 27 22:23:29 cirrus login[567]: pam_systemd(login:session): Failed to create session: Connection timed out
Sep 27 22:23:29 cirrus login[567]: LOGIN ON tty3 BY fx

Similar error messages in the journal make it seem related to #39207 (a few months old, closed) and #40056 (tagged gdm, which I believe isn't the culprit, hence this new report).

Attached is the complete log from my last failed boot.
This task depends upon

Closed by  Dave Reisner (falconindy)
Sunday, 22 March 2015, 15:07 GMT
Reason for closing:  No response
Additional comments about closing:  Likely not to be a packaging bug. Please raise a bug with upstream if you're able to reproduce with recent systemd in a cleaner environment.
Comment by Dave Reisner (falconindy) - Saturday, 27 September 2014, 21:41 GMT
Seems like dbus is the commonality here. Regardless, your system looks like a bit of a mess. I see the following in your logs:

Sep 27 22:17:26 cirrus systemd[1]: [1;39mCannot add dependency job for unit arch-modules-load.service, ignoring: Invalid argument

This service went away almost 2 years ago.
Comment by François-Xavier Thomas (frnx) - Saturday, 27 September 2014, 21:53 GMT
That's what I thought, but systemctl status dbus didn't show anything weird. Is that "API bus" mentioned in the logs another software bus I'm not aware of?

And, true, my Arch had some moldy, dusty corners after 5 years of use. Some symlinks from previous upgrades were left in place and failed silently without affecting the system, which I didn't find out before browsing the logs. They're gone now ;)
Comment by Dave Reisner (falconindy) - Saturday, 27 September 2014, 22:03 GMT
> Is that "API bus" mentioned in the logs another software bus I'm not aware of?
This error comes from PID 1 subscribing to some dbus signals and requesting a name on the bus (org.freedesktop.systemd1). The "Failed to register name" error is the specific failure that leads to the bus API error. However, this again points at the system dbus-daemon as misbehaving.

I also see this:

Sep 27 22:18:46 cirrus dbus[371]: [1;39m[system] Successfully activated service 'org.freedesktop.ConsoleKit'[0m

If you really do still have consolekit installed, I'd strongly suggest removing it.
Comment by François-Xavier Thomas (frnx) - Saturday, 27 September 2014, 22:14 GMT
> This error comes from PID 1 subscribing to some dbus signals and requesting a name on the bus (org.freedesktop.systemd1).

Ah, that's what it was. Thanks, I'm not very familiar with how systemd boots, dbus used to start much later when it was managed by sysvinit. I'll keep looking and keep you posted if this happens again.

> If you really do still have consolekit installed, I'd strongly suggest removing it.

Done, hadn't noticed this wasn't useful anymore with systemd!
Comment by François-Xavier Thomas (frnx) - Tuesday, 30 September 2014, 23:15 GMT
Here is a second log from a boot having this issue. In both cases systemd tries and fails to communicate with dbus right after the "BNEP socket layer" is initialized.
Comment by Dave Reisner (falconindy) - Tuesday, 30 September 2014, 23:26 GMT
> In both cases systemd tries and fails to communicate with dbus right after the "BNEP socket layer" is initialized.
No, the timeout occurs at that point. The call is fired asynchronously 30 seconds earlier. Same story as your first log. debug logs would be rather helpful...

Also, more garbage:

Sep 30 20:01:33 cirrus systemd[1]: [1;39mCannot add dependency job for unit arch-daemons.target, ignoring: Unit arch-daemons.target failed to load: No such file or directory.[0m

Could you please do a proper cleaning of your /etc? In fact, could you disable everything and see if you still experience the connection timeouts?
Comment by Dave Reisner (falconindy) - Saturday, 13 December 2014, 15:13 GMT
Is this still a problem with systemd 218?

Loading...