FS#58001 - [systemd] Xorg crashes after suspend because it cannot find systemd-logind

Attached to Project: Arch Linux
Opened by Nico Schottelius (telmich) - Wednesday, 28 March 2018, 11:19 GMT
Last edited by Christian Hesse (eworm) - Monday, 09 April 2018, 11:57 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Dave Reisner (falconindy)
Christian Hesse (eworm)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

xorg crashes (i.e. prints error log and exits) after suspending the notebook.
The last error lines before it dies are:

[ 7299.695] (EE)
[ 7299.695] (EE) systemd-logind disappeared (stopped/restarted?)
[ 7299.695] (EE)
[ 7299.695] (EE)
[ 7299.695] (EE) Please also check the log file at "/home/nico/.local/share/xorg/Xorg.0.log" for additional information.
[ 7299.695] (EE)
[ 7300.022] (EE) Server terminated with error (1). Closing log file.


Additional info:
* package version(s)

[13:13] line:~% pacman -Q | grep -e ^xorg -e ^systemd
systemd 238.51-1
systemd-sysvcompat 238.51-1
xorg-appres 1.0.5-1
xorg-bdftopcf 1.1-1
xorg-font-util 1.3.1-1
xorg-font-utils 7.6-4
xorg-fonts-alias 1.0.3-1
xorg-fonts-encodings 1.0.4-4
xorg-fonts-misc 1.0.3-5
xorg-luit 1.1.1-2
xorg-mkfontdir 1.0.7-8
xorg-mkfontscale 1.1.3-1
xorg-server 1.19.6+13+gd0d1a694f-1
xorg-server-common 1.19.6+13+gd0d1a694f-1
xorg-server-xvfb 1.19.6+13+gd0d1a694f-1
xorg-setxkbmap 1.3.1-1
xorg-xauth 1.0.10-1
xorg-xbacklight 1.2.2-1
xorg-xdpyinfo 1.3.2-1
xorg-xev 1.2.2-1
xorg-xinit 1.4.0-3
xorg-xinput 1.6.2-1
xorg-xkbcomp 1.4.1-1
xorg-xkill 1.0.5-1
xorg-xmodmap 1.0.9-1
xorg-xrandr 1.5.0-1
xorg-xrdb 1.1.1-1
xorg-xset 1.2.4-1
xorg-xwd 1.0.7-1
xorg-xwininfo 1.1.4-1
xorgproto 2018.4-1
[13:14] line:~%

* config and/or log files etc.

[13:16] line:~% uname -a
Linux line 4.15.13-1-ARCH #1 SMP PREEMPT Sun Mar 25 11:27:57 UTC 2018 x86_64 GNU/Linux

[13:17] line:~% lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 08)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Device 9d4e (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (4) I219-LM (rev 21)
02:00.0 Network controller: Intel Corporation Wireless 8265 / 8275 (rev 78)
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a808
[13:17] line:~%


Steps to reproduce:

- startx
- suspend
- resume

=> This does **not** fail on the Dell Latitude 5285 with an almost 100% identical system.
The problem occurs however on the Lenovo X1 Carbon (2018).
This task depends upon

Closed by  Christian Hesse (eworm)
Monday, 09 April 2018, 11:57 GMT
Reason for closing:  Works for me
Additional comments about closing:  Reported mode on...
Comment by Jan de Groot (JGC) - Wednesday, 28 March 2018, 19:32 GMT
This is a bug in logind. Xorg can't do anything else than crash when logind disappears. If it didn't throw an exception it would crash anyways because logind is required for device access.
Comment by Nico Schottelius (telmich) - Wednesday, 28 March 2018, 20:09 GMT
You seem to be very right, I see this in journalctl:

Mar 28 20:41:56 line systemd[1]: systemd-logind.service: Killing process 26088 (systemd-logind) with signal SIGABRT.
Mar 28 20:41:56 line dhcpcd[833]: wlp2s0: carrier lost
Mar 28 20:41:56 line systemd[1]: systemd-udevd.service: Watchdog timeout (limit 3min)!
Mar 28 20:41:56 line dhcpcd[833]: wlp2s0: deleting address fe80::6f43:174b:3c61:8314
Mar 28 20:41:56 line systemd[1]: systemd-udevd.service: Killing process 5697 (systemd-udevd) with signal SIGABRT.
Mar 28 20:41:56 line dhcpcd[833]: wlp2s0: deleting route to 192.168.43.0/24
Mar 28 20:41:56 line systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Mar 28 20:41:56 line dhcpcd[833]: wlp2s0: deleting default route via 192.168.43.1
Mar 28 20:41:56 line systemd[1]: systemd-journald.service: Killing process 5691 (systemd-journal) with signal SIGABRT.
Mar 28 20:41:56 line systemd-sleep[9700]: System resumed.
Mar 28 20:41:56 line systemd[1]: Starting Flush Journal to Persistent Storage...
Mar 28 20:41:56 line kernel: Process accounting resumed
Mar 28 20:41:56 line systemd[1]: Started Flush Journal to Persistent Storage.
Mar 28 20:41:56 line systemd[1]: systemd-udevd.service: Main process exited, code=dumped, status=6/ABRT
Mar 28 20:41:56 line systemd[1]: systemd-udevd.service: Failed with result 'watchdog'.
Mar 28 20:41:56 line systemd[1]: systemd-udevd.service: Service has no hold-off time, scheduling restart.
Mar 28 20:41:56 line systemd[1]: systemd-udevd.service: Scheduled restart job, restart counter is at 2.
Mar 28 20:41:56 line systemd[1]: Stopped udev Kernel Device Manager.
Mar 28 20:41:56 line systemd[1]: Starting udev Kernel Device Manager...
Mar 28 20:41:56 line systemd-coredump[9814]: Process 5697 (systemd-udevd) of user 0 dumped core.

Stack trace of thread 5697:
#0 0x00007f294e3a0f90 epoll_pwait (libc.so.6)
#1 0x00007f294df1e72c sd_event_wait (libsystemd-shared-238.so)
#2 0x00007f294df1f53c sd_event_run (libsystemd-shared-238.so)
#3 0x00007f294df1f6ec sd_event_loop (libsystemd-shared-238.so)
#4 0x000055590ee8b746 n/a (systemd-udevd)
#5 0x00007f294e2cbf4a __libc_start_main (libc.so.6)
#6 0x000055590ee8b87a n/a (systemd-udevd)
Mar 28 20:41:56 line systemd[1]: Started udev Kernel Device Manager.
Mar 28 20:41:56 line systemd[1]: systemd-logind.service: Main process exited, code=dumped, status=6/ABRT
Mar 28 20:41:56 line systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Mar 28 20:41:56 line systemd[1]: systemd-logind.service: Service has no hold-off time, scheduling restart.
Mar 28 20:41:56 line systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 4.
Mar 28 20:41:56 line systemd[1]: Stopped Login Service.
Mar 28 20:41:56 line systemd[1]: Starting Login Service...
Mar 28 20:41:56 line systemd-coredump[9810]: Process 26088 (systemd-logind) of user 0 dumped core.

Stack trace of thread 26088:
#0 0x00007f264bed9f90 epoll_pwait (libc.so.6)
#1 0x00007f264ba5772c sd_event_wait (libsystemd-shared-238.so)
#2 0x00007f264ba5853c sd_event_run (libsystemd-shared-238.so)
#3 0x000055e88ebd3baf n/a (systemd-logind)
#4 0x00007f264be04f4a __libc_start_main (libc.so.6)
#5 0x000055e88ebd5f7a n/a (systemd-logind)
Comment by Nico Schottelius (telmich) - Thursday, 29 March 2018, 15:36 GMT
I just reinstalled all packages, as this notebook is a 1:1 clone (rsync) of my previous notebook to ensure there was no package corruption.

Note: sometimes I am able to suspend 2-3 times prior to a crash.

Latest log after last crash:


Mar 29 14:00:21 line systemd-logind[623]: Lid closed.
Mar 29 14:00:21 line systemd-logind[623]: Suspending...
Mar 29 14:00:21 line systemd[1]: Reached target Sleep.
Mar 29 14:00:21 line systemd[1]: Starting Suspend...
Mar 29 14:00:21 line systemd-sleep[3866]: Suspending system...
Mar 29 14:00:21 line kernel: PM: suspend entry (s2idle)
Mar 29 14:00:21 line kernel: PM: Syncing filesystems ... done.
Mar 29 15:06:17 line kernel: Freezing user space processes ... (elapsed 0.002 seconds) done.
Mar 29 15:06:17 line kernel: OOM killer disabled.
Mar 29 15:06:17 line kernel: Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Mar 29 15:06:17 line kernel: Suspending console(s) (use no_console_suspend to debug)
Mar 29 15:06:17 line kernel: wlp2s0: deauthenticating from 34:2d:0d:ce:9a:d4 by local choice (Reason: 3=DEAUTH_LEAVING)
Mar 29 15:06:17 line kernel: wlp2s0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-22)
Mar 29 15:06:17 line kernel: e1000e: EEE TX LPI TIMER: 00000011
Mar 29 15:06:17 line kernel: acpi INT3400:00: Unsupported event [0x86]
Mar 29 15:06:17 line kernel: thinkpad_acpi: unknown possible thermal alarm or keyboard event received
Mar 29 15:06:17 line kernel: thinkpad_acpi: unhandled HKEY event 0x6032
Mar 29 15:06:17 line kernel: thinkpad_acpi: please report the conditions when this event happened to ibm-acpi-devel@lists.sourceforge.net
Mar 29 15:06:17 line kernel: acpi INT3400:00: Unsupported event [0x86]
Mar 29 15:06:17 line kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Mar 29 15:06:17 line kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Mar 29 15:06:17 line kernel: psmouse serio1: synaptics: queried max coordinates: x [..5676], y [..4760]
Mar 29 15:06:17 line kernel: psmouse serio1: synaptics: queried min coordinates: x [1266..], y [1094..]
Mar 29 15:06:17 line kernel: OOM killer enabled.
Mar 29 15:06:17 line kernel: Restarting tasks ... done.
Mar 29 15:06:17 line kernel: PM: suspend exit
Mar 29 15:06:17 line systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Mar 29 15:06:17 line systemd[1]: Started Process Core Dump (PID 3980/UID 0).
Mar 29 15:06:17 line systemd[1]: Started Atop advanced performance monitor.
Mar 29 15:06:17 line systemd[1]: Started Suspend.
Mar 29 15:06:17 line systemd[1]: sleep.target: Unit not needed anymore. Stopping.
Mar 29 15:06:17 line systemd[1]: Stopped target Sleep.
Mar 29 15:06:17 line systemd[1]: Reached target Suspend.
Mar 29 15:06:17 line systemd[1]: suspend.target: Unit not needed anymore. Stopping.
Mar 29 15:06:17 line systemd[1]: Stopped target Suspend.
Mar 29 15:06:17 line kernel: Process accounting resumed
Mar 29 15:06:17 line systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
Mar 29 15:06:17 line systemd-coredump[3985]: MESSAGE=Process 322 (systemd-journal) of user 0 dumped core.
Mar 29 15:06:17 line systemd-coredump[3985]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.326836e1de124c28929ec3ccc1a056c3.322.1522328776000000.lz4
Mar 29 15:06:17 line systemd-coredump[3985]: Stack trace of thread 322:
Mar 29 15:06:17 line systemd-coredump[3985]: #0 0x00007f5c1b5b7919 pthread_sigmask (libpthread.so.0)
Mar 29 15:06:17 line systemd-coredump[3985]: #1 0x00007f5c1ae586bb journal_file_set_offline (libsystemd-shared-238.so)
Mar 29 15:06:17 line systemd-coredump[3985]: #2 0x000055ec1a090d13 n/a (systemd-journald)
Mar 29 15:06:17 line systemd-coredump[3985]: #3 0x000055ec1a090ff1 n/a (systemd-journald)
Mar 29 15:06:17 line systemd-coredump[3985]: #4 0x00007f5c1ae64c3e n/a (libsystemd-shared-238.so)
Mar 29 15:06:17 line systemd-coredump[3985]: #5 0x00007f5c1ae64ebb sd_event_dispatch (libsystemd-shared-238.so)
Mar 29 15:06:17 line systemd-coredump[3985]: #6 0x00007f5c1ae664a0 sd_event_run (libsystemd-shared-238.so)
Mar 29 15:06:17 line kernel: systemd-coredum: 6 output lines suppressed due to ratelimiting
Mar 29 15:06:17 line systemd-journald[4054]: Journal started
Mar 29 15:06:17 line systemd-journald[4054]: System journal (/var/log/journal/7ec2c41eb4dc4dc4acea4d218fe831bf) is 152.0M, max 4.0G, 3.8G free.
Mar 29 15:06:17 line wpa_supplicant[618]: wlp2s0: CTRL-EVENT-DISCONNECTED bssid=34:2d:0d:ce:9a:d4 reason=3 locally_generated=1
Mar 29 15:06:16 line systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Mar 29 15:06:16 line dhcpcd[846]: wlp2s0: failed to renew DHCP, rebinding
Mar 29 15:06:16 line systemd[1]: systemd-logind.service: Killing process 623 (systemd-logind) with signal SIGABRT.
Mar 29 15:06:16 line dhcpcd[846]: wlp2s0: DHCP lease expired
Mar 29 15:06:16 line systemd[1]: systemd-journald.service: Watchdog timeout (limit 3min)!
Mar 29 15:06:16 line miredo[857]: Too much time drift. Resynchronizing.
Mar 29 15:06:16 line systemd[1]: systemd-journald.service: Killing process 322 (systemd-journal) with signal SIGABRT.
Mar 29 15:06:16 line dhcpcd[846]: wlp2s0: deleting route to 192.168.43.0/24
Mar 29 15:06:16 line systemd-sleep[3866]: System resumed.
Mar 29 15:06:16 line dhcpcd[846]: wlp2s0: deleting default route via 192.168.43.1
Mar 29 15:06:17 line dhcpcd[846]: wlp2s0: soliciting a DHCP lease
Mar 29 15:06:17 line dhcpcd[846]: wlp2s0: carrier lost
Mar 29 15:06:17 line dhcpcd[846]: wlp2s0: deleting address fe80::5a1d:ca51:4dda:920f
Mar 29 15:06:17 line systemd[1]: Starting Flush Journal to Persistent Storage...
Mar 29 15:06:17 line systemd[1]: Started Flush Journal to Persistent Storage.
Mar 29 15:06:17 line kernel: [drm] RC6 on
Mar 29 15:06:17 line systemd[1]: systemd-logind.service: Main process exited, code=dumped, status=6/ABRT
Mar 29 15:06:17 line systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Mar 29 15:06:17 line systemd[1]: systemd-logind.service: Service has no hold-off time, scheduling restart.
Mar 29 15:06:17 line systemd[1]: systemd-logind.service: Scheduled restart job, restart counter is at 1.
Mar 29 15:06:17 line systemd[1]: Stopped Login Service.
Mar 29 15:06:17 line systemd[1]: Starting Login Service...
Mar 29 15:06:17 line systemd-coredump[4002]: Process 623 (systemd-logind) of user 0 dumped core.

Stack trace of thread 623:
#0 0x00007febb8d54f90 epoll_pwait (libc.so.6)
#1 0x00007febb88d272c sd_event_wait (libsystemd-shared-238.so)
#2 0x00007febb88d353c sd_event_run (libsystemd-shared-238.so)
#3 0x000055cb2cfa9baf n/a (systemd-logind)
#4 0x00007febb8c7ff4a __libc_start_main (libc.so.6)
#5 0x000055cb2cfabf7a n/a (systemd-logind)
Mar 29 15:06:17 line systemd-logind[4084]: New seat seat0.
Mar 29 15:06:17 line systemd[1]: Started Login Service.
Mar 29 15:06:17 line systemd-logind[4084]: Watching system buttons on /dev/input/event2 (Power Button)
Mar 29 15:06:17 line systemd-logind[4084]: Watching system buttons on /dev/input/event1 (Lid Switch)
Mar 29 15:06:17 line systemd-logind[4084]: Watching system buttons on /dev/input/event0 (Sleep Button)
Mar 29 15:06:17 line systemd-logind[4084]: Watching system buttons on /dev/input/event3 (AT Translated Set 2 keyboard)
Mar 29 15:06:17 line systemd-logind[4084]: Watching system buttons on /dev/input/event4 (ThinkPad Extra Buttons)
Mar 29 15:06:17 line systemd-logind[4084]: New session c1 of user nico.
Mar 29 15:06:17 line kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
Mar 29 15:06:18 line systemd[1]: Started Process Core Dump (PID 4102/UID 0).
Mar 29 15:06:18 line kernel: slack[4105]: segfault at 968 ip 00007fa600515393 sp 00007ffea970a500 error 4 in libX11.so.6.3.0[7fa6004e8000+139000]
Mar 29 15:06:18 line systemd[1]: Started Process Core Dump (PID 4110/UID 0).


Subsequently a variety of programs like slack & emacs crash, because Xorg is gone (not pasting that output here, jfyi)
Comment by Jan de Groot (JGC) - Friday, 30 March 2018, 08:59 GMT
Your logs indicate a time jump backwards. Because time jumps the watchdog kicks in and logind is killed.
https://github.com/systemd/systemd/issues/5014

Can you attach a boot log? Could be a kernel driver issue if you're using a hardware watchdog.
Comment by Nico Schottelius (telmich) - Friday, 30 March 2018, 14:09 GMT
Hey Jan,

I am sorry, but I can't. I was under time pressure to use the machine and migrated to Devuan with acpid + linux 4.15.0-2-amd64 meanwhile.

Sorry for the noise. As I cannot be much helpful in this issue anymore, from my point of view the ticket can be closed.

Loading...