FS#46387 - [gdm] gdm doesn't start after update to xorg-xinit 1.3.4-3, dbus 1.10.0-3 and libdbus 1.10.0-3

Attached to Project: Arch Linux
Opened by Ricardo Funke Ormieres (ricardofunke) - Tuesday, 22 September 2015, 02:31 GMT
Last edited by Jan Alexander Steffens (heftig) - Wednesday, 20 January 2016, 02:27 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Jan de Groot (JGC), Jan Alexander Steffens (heftig)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 2
Private No

Details

Description:

After updating to xorg-xinit 1.3.4-3, dbus 1.10.0-3 and libdbus 1.10.0-3, gdm shows the "something went wrong" screen.

Downgrading to the previous versions of dbus and libdbus fixes it.

There is a workaround that keeps the latest versions of dbus and libdbus: switch to a terminal (Ctrl+Alt+F2), log in as root, stop gdm (systemctl stop gdm), wait a few seconds and start it again (systemctl start gdm).
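
The workaround above can be scripted. This is only a minimal sketch, meant to be run as root from the text console. The function name `restart_gdm` and the 5-second default delay are assumptions for illustration; the report only says to wait "some seconds".

```shell
# Sketch of the manual workaround: stop gdm, pause, start it again.
# restart_gdm and the 5-second default are illustrative assumptions,
# not something taken from the report.
restart_gdm() {
    delay="${1:-5}"
    systemctl stop gdm || return 1
    sleep "$delay"
    systemctl start gdm
}

# Usage (as root):
#   restart_gdm       # waits 5 seconds between stop and start
#   restart_gdm 10    # waits 10 seconds instead
```

If the greeter still fails after one attempt, a later comment suggests that a second restart may be needed.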


Additional info:
* package version(s)
gdm 3.16.3-1
libgdm 3.16.3-1

dbus 1.10.0-3
libdbus 1.10.0-3

xorg-xinit 1.3.4-3

* config and/or log files etc.


Steps to reproduce:

1) Enable gdm on boot (systemctl enable gdm)
2) Install xorg-xinit 1.3.4-3, dbus 1.10.0-3 and libdbus 1.10.0-3
3) Reboot Arch Linux

Closed by  Jan Alexander Steffens (heftig)
Wednesday, 20 January 2016, 02:27 GMT
Reason for closing:  Fixed
Comment by Alex Seiler (aexl) - Tuesday, 22 September 2015, 04:38 GMT
I think this problem only occurs if the graphics driver cannot run a Wayland session. It seems that gdm tries to run a Wayland session by default. Uncommenting the line

WaylandEnable=false

in /etc/gdm/custom.conf seems to solve the problem.
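
For anyone who prefers not to edit the file by hand, here is a rough sketch that prints the config with that line uncommented. The helper name `uncomment_wayland_disable` is hypothetical, and it assumes the line is present in commented form as `#WaylandEnable=false`. It deliberately writes to stdout only, so the result can be reviewed before replacing the real file.

```shell
# Print a copy of the GDM config with the commented-out
# WaylandEnable=false line activated. The file itself is not modified.
# uncomment_wayland_disable is a hypothetical helper name.
uncomment_wayland_disable() {
    conf="${1:-/etc/gdm/custom.conf}"
    sed 's/^[[:space:]]*#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$conf"
}

# Usage:
#   uncomment_wayland_disable                      # review the output
#   uncomment_wayland_disable > /tmp/custom.conf   # then copy it into place as root
```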
Comment by Jan Alexander Steffens (heftig) - Tuesday, 22 September 2015, 18:41 GMT
No, GDM with either Wayland or Xorg greeter works for me.
Comment by Jan Alexander Steffens (heftig) - Tuesday, 22 September 2015, 18:56 GMT
Hm, it's possible that the failing Wayland greeter causes systemd --user to start and stop, and the following Xorg greeter immediately tries to start it again, but hits this bug:

https://bugs.freedesktop.org/show_bug.cgi?id=89145
Comment by Jan Alexander Steffens (heftig) - Tuesday, 22 September 2015, 19:05 GMT
Can you find the "Failed at step CGROUP spawning /usr/lib/systemd/systemd: No such file or directory" message in your journal?
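
One way to look for that message is to filter the journal text for the fixed string. This is a small sketch; `find_cgroup_failure` is a made-up name, and it reads journal text on stdin so it also works on a saved log file.

```shell
# Print journal lines containing the CGROUP spawn failure.
# Reads journal text on stdin; find_cgroup_failure is a hypothetical name.
find_cgroup_failure() {
    grep -F 'Failed at step CGROUP spawning /usr/lib/systemd/systemd'
}

# Usage:
#   journalctl -b | find_cgroup_failure
#   find_cgroup_failure < boot.log
```

No output means the message was not found in the text that was searched.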
Comment by Ricardo Funke Ormieres (ricardofunke) - Tuesday, 22 September 2015, 19:20 GMT
No, I couldn't find this message with "sudo journalctl" command.
Comment by David Rheinsberg (dvdhrm) - Tuesday, 22 September 2015, 23:12 GMT
Can somebody post a full boot-log of this failure? (`journalctl -b >boot.log`)
Comment by Ricardo Funke Ormieres (ricardofunke) - Wednesday, 23 September 2015, 02:16 GMT
Hi David,

journalctl-boot.log: I just rebooted and waited for the "something went wrong" message
journalctl-boot-and-gdm-restart-twice: then I restarted gdm and got the error message again, so I tried once more and it worked

Sorry for being late.

Thanks
Comment by David Rheinsberg (dvdhrm) - Wednesday, 23 September 2015, 09:12 GMT
So gnome-shell crashes during startup in your setup. I cannot tell why, but it does:

Set 22 23:08:29 funkenote kernel: gnome-shell[659]: segfault at 18 ip 00007fc5c125c5de sp 00007ffc884c9f20 error 4 in libcogl.so.20.3.0[7fc5c11e9000+a0000]

Can you check whether `coredumpctl` lists a coredump of gnome-shell or mutter at the bottom? If so, you can attach to it via `coredumpctl gdb -1` (-1 meaning the last known coredump) and print a backtrace via "backtrace" in gdb (exit it via "quit").
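
To spot crashes like the one quoted above in a saved boot log, the kernel segfault lines can be reduced to process name, PID and faulting library. This is only a sketch: `segfault_summary` is a hypothetical helper, and it assumes the kernel message has the same shape as the line quoted in this comment.

```shell
# Summarise kernel segfault lines as "<process> <pid> <library>".
# Reads log text on stdin. segfault_summary is a hypothetical helper and
# assumes the message format shown in the quoted journal line.
segfault_summary() {
    sed -n 's/.*kernel: \([^[]*\)\[\([0-9]*\)\]: segfault at .* in \([^[]*\)\[.*/\1 \2 \3/p'
}

# Usage:
#   journalctl -b | segfault_summary
#   segfault_summary < boot.log
```

The PID printed here is the kind of value that can then be passed to `coredumpctl gdb` as a match.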
Comment by Jan Alexander Steffens (heftig) - Wednesday, 23 September 2015, 09:21 GMT
You may have to recompile several packages (with OPTIONS=(... debug !strip ...) in makepkg.conf) to get useful traces.

Before the start that worked (which is the one with the gnome-shell crash), the failed starts of GDM seem to have a race in systemd, though, as I suspected:

Set 22 23:08:29 funkenote systemd[1]: Stopped User Manager for UID 120.
Set 22 23:08:29 funkenote systemd[1]: Started Session c2 of user gdm.

It seems that while pam_systemd is setting up the second session, the user manager is getting stopped or about to be stopped, and pam_systemd doesn't restart it.
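
A rough way to look for this pattern in a saved journal is to flag a "Started Session" line that carries the same timestamp as the most recent "Stopped User Manager" line. This is only a heuristic sketch: `find_logind_race` is a made-up name, it compares the first three fields of the default journal output as the timestamp, and it does not check that the UID and the user name belong to the same account.

```shell
# Print pairs of lines where a session was started in the same second
# in which a user manager was stopped. Reads journal text on stdin.
# find_logind_race is a hypothetical helper; the same-second comparison
# is a heuristic, not proof of the race.
find_logind_race() {
    awk '
        /Stopped User Manager for UID/ {
            stopped = $1 " " $2 " " $3   # timestamp of the stop event
            stop_line = $0
            next
        }
        /Started Session .* of user/ {
            if (($1 " " $2 " " $3) == stopped) {
                print stop_line
                print $0
            }
        }
    '
}

# Usage:
#   journalctl -b | find_logind_race
```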
Comment by Ricardo Funke Ormieres (ricardofunke) - Wednesday, 23 September 2015, 16:33 GMT
Hi David,

I'm attaching the result from the "coredumpctl gdb 659" as coredumpctl.out

Hope it helps.

Best regards
Comment by Ricardo Funke Ormieres (ricardofunke) - Wednesday, 23 September 2015, 16:39 GMT
Also, maybe it helps too:

[ricardofunke@funkenote ~]$ lspci
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C210 Series Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C210 Series Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.2 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 3 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C210 Series Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Thames [Radeon HD 7500M/7600M Series] (rev ff)
07:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
08:00.0 Network controller: Qualcomm Atheros AR9485 Wireless Network Adapter (rev 01)
[ricardofunke@funkenote ~]$
Comment by David Rheinsberg (dvdhrm) - Wednesday, 23 September 2015, 16:57 GMT
The bug in gnome-shell I cannot help you with. However, it does indeed look like the fallback does not work. Looking at systemd-logind, it looks like starting a session on a "closing" user is currently buggy.
Comment by Ricardo Funke Ormieres (ricardofunke) - Wednesday, 23 September 2015, 17:27 GMT
Hi David,

Do you think I should open a bug in the systemd-logind package (or upstream) then?
Comment by Ricardo Funke Ormieres (ricardofunke) - Friday, 25 September 2015, 02:58 GMT
Adding new information: the problem seems to happen only after a reboot command. When I do a full shutdown and start the system again (pressing the physical power button on the laptop), it apparently always works on the first try.
Comment by David Rheinsberg (dvdhrm) - Tuesday, 29 September 2015, 10:11 GMT
I'm working on fixing it in systemd-logind. I cleaned up some internal stuff, but the issue is still not fully resolved [1]. I am on it!

Note that there are multiple underlying issues here. First of all, systemd-logind has issues if you log out and log in too fast (before the previous logout was fully handled). I'm fixing this issue. Another issue is that Arch enabled the user-bus by default, which means you will *notice* if systemd-logind fails. The last issue is that Arch enabled Wayland sessions by default.

The first issue is a real bug and needs fixing. The other two issues are just coincidences that make this bug appear. Mutter seems to fail on your machine with Wayland enabled (probably related to your graphics driver, which cannot handle state recovery after a non-cold boot), and hence gdm falls back to the Xorg login. This, however, causes a fast logout+login by the 'gdm' user, which in turn makes systemd-logind fall over. Now the second 'gdm' login is not properly tracked, thus `systemd --user` is not started, and thus the dbus-daemon is not started as user-bus (or is incorrectly tracked, to be more precise). Eventually, gdm trips over missing dbus connections due to forced runtime-dir cleanup.

There're several workarounds here:
- disable Wayland session: select 'xorg/custom' in gdm session-list
- disable the user-bus for now: `systemctl --global disable dbus.socket`

Neither of these is particularly nice. I recommend the first one for now. I will let you know once logind is fully fixed. Thanks for your patience!

[1] https://github.com/systemd/systemd/pull/1402
Comment by Ricardo Funke Ormieres (ricardofunke) - Tuesday, 29 September 2015, 13:15 GMT
That's great David! Thanks for your efforts on it!
Comment by Christian Hesse (eworm) - Tuesday, 22 December 2015, 20:21 GMT
Looks like fixes have been committed [0] and should be present in systemd v228. Is this issue fixed now?

[0] https://github.com/systemd/systemd/pull/1917
Comment by Ricardo Funke Ormieres (ricardofunke) - Wednesday, 20 January 2016, 02:04 GMT
Sorry, fixed!
