FS#68799 - [systemd] Assertion failure on reexec

Attached to Project: Arch Linux
Opened by Jan Alexander Steffens (heftig) - Monday, 30 November 2020, 17:01 GMT
Last edited by Christian Hesse (eworm) - Tuesday, 01 December 2020, 14:30 GMT
Task Type Bug Report
Category Packages: Testing
Status Closed
Assigned To Christian Hesse (eworm)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

I encountered this first on the upgrade from 247-1 to 247-2:

pacman[165183]: upgraded systemd (247-1 -> 247-2)
systemd[1]: Reexecuting.
systemd[1]: Assertion 'p->n_ref > 0' failed at src/shared/varlink.c:386, function varlink_unref(). Aborting.
audit[165307]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=165307 comm="systemd" exe=2F7573722F6C69622F73797374656D642F73797374656D64202864656C6574656429 sig=6 res=1
systemd-coredump[165308]: Due to PID 1 having crashed coredump collection will now be turned off.
systemd-coredump[165308]: Process 165307 (systemd) of user 0 dumped core.
Stack trace of thread 165307:
#0 0x00007f7b8bb0b91b kill (libc.so.6 + 0x3d91b)
#1 0x00005617721168ae n/a (/usr/lib/systemd/systemd (deleted) + 0x388ae)
#2 0x00007f7b8bb0b6a0 __restore_rt (libc.so.6 + 0x3d6a0)
#3 0x00007f7b8bb0b615 raise (libc.so.6 + 0x3d615)
#4 0x00007f7b8baf4862 abort (libc.so.6 + 0x26862)
#5 0x00007f7b8be0c07f n/a (/usr/lib/systemd/libsystemd-shared-247.so (deleted) + 0x5607f)
systemd[1]: Caught <ABRT>, dumped core as pid 165307.
systemd[1]: Freezing execution.
systemd-oomd[20442]: Failed to connect to /run/systemd/io.system.ManagedOOM: Connection refused
systemd-oomd[20442]: Failed to acquire varlink connection
systemd-oomd[20442]: Event loop failed: Connection refused


After a reboot, running `systemctl daemon-reexec` twice triggered it again:

sudo[76111]: jan : TTY=pts/2 ; PWD=/home/jan ; USER=root ; COMMAND=/usr/bin/systemctl daemon-reexec
systemd[1]: Reexecuting.
systemd[1]: Assertion 'p->n_ref > 0' failed at src/shared/varlink.c:386, function varlink_unref(). Aborting.
audit[76113]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 pid=76113 comm="systemd" exe="/usr/lib/systemd/systemd" sig=6 res=1
systemd-coredump[76114]: Due to PID 1 having crashed coredump collection will now be turned off.
systemd-coredump[76114]: Process 76113 (systemd) of user 0 dumped core.

Stack trace of thread 76113:
#0 0x00007f051531b91b kill (libc.so.6 + 0x3d91b)
#1 0x000055f62e2318ae n/a (systemd + 0x388ae)
#2 0x00007f051531b6a0 __restore_rt (libc.so.6 + 0x3d6a0)
#3 0x00007f051531b615 raise (libc.so.6 + 0x3d615)
#4 0x00007f0515304862 abort (libc.so.6 + 0x26862)
#5 0x00007f051561c07f n/a (libsystemd-shared-247.so + 0x5607f)
#6 0x00007f05156ac752 varlink_unref (libsystemd-shared-247.so + 0xe6752)
#7 0x000055f62e278207 n/a (systemd + 0x7f207)
#8 0x000055f62e230a19 n/a (systemd + 0x37a19)
#9 0x00007f0515306152 __libc_start_main (libc.so.6 + 0x28152)
#10 0x000055f62e23140e n/a (systemd + 0x3840e)
systemd[1]: Caught <ABRT>, dumped core as pid 76113.
systemd[1]: Freezing execution.
systemd-oomd[623]: Failed to connect to /run/systemd/io.system.ManagedOOM: Connection refused
systemd-oomd[623]: Failed to acquire varlink connection
systemd-oomd[623]: Event loop failed: Connection refused
This task depends upon

Closed by  Christian Hesse (eworm)
Tuesday, 01 December 2020, 14:30 GMT
Reason for closing:  Fixed
Additional comments about closing:  systemd 247-3
Comment by Jan Alexander Steffens (heftig) - Monday, 30 November 2020, 17:38 GMT
Here's a better backtrace:

#0 0x00007fd6beec391b in kill () from /usr/lib/libc.so.6
No symbol table info available.
#1 0x000056251c331a43 in crash (sig=6) at ../systemd-stable/src/core/main.c:224
sa = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
pid = <optimized out>
__func__ = "crash"
__PRETTY_FUNCTION__ = "crash"
#2 <signal handler called>
No symbol table info available.
#3 0x00007fd6beec3615 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#4 0x00007fd6beeac862 in abort () from /usr/lib/libc.so.6
No symbol table info available.
#5 0x00007fd6bf1c4173 in log_assert_failed_realm (realm=<optimized out>, text=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
at ../systemd-stable/src/basic/log.c:864
No locals.
#6 0x00007fd6bf25b702 in varlink_unref (p=<optimized out>) at ../systemd-stable/src/shared/varlink.c:386
__PRETTY_FUNCTION__ = {<optimized out> <repeats 14 times>}
#7 varlink_unref (p=0x56251ccb7ac0) at ../systemd-stable/src/shared/varlink.c:386
__PRETTY_FUNCTION__ = "varlink_unref"
#8 0x00007fd6bf261c09 in varlink_close_unref (v=<optimized out>) at ../systemd-stable/src/shared/varlink.c:1212
No locals.
#9 0x000056251c37cc47 in manager_varlink_done (m=0x56251cb02490) at ../systemd-stable/src/core/core-varlink.c:479
__PRETTY_FUNCTION__ = {<optimized out> <repeats 21 times>}
#10 manager_free (m=0x56251cb02490) at ../systemd-stable/src/core/manager.c:1367
__PRETTY_FUNCTION__ = {<optimized out> <repeats 13 times>}
#11 0x000056251c330d6b in manager_free (m=0x56251cb02490) at ../systemd-stable/src/core/manager.c:1352
__PRETTY_FUNCTION__ = {<optimized out> <repeats 13 times>}
c = <optimized out>
_found = <optimized out>
__assert_in_set = {<optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>}
dt = <optimized out>
#12 main (argc=<optimized out>, argv=<optimized out>) at ../systemd-stable/src/core/main.c:2878
initrd_timestamp = {realtime = <optimized out>, monotonic = 0}
userspace_timestamp = {realtime = 1606756639250481, monotonic = 10518726}
kernel_timestamp = {realtime = 1606756628731755, monotonic = 0}
security_start_timestamp = {realtime = 1606756639250623, monotonic = 10518868}
security_finish_timestamp = {realtime = 1606756639250627, monotonic = 10518872}
saved_rlimit_nofile = {rlim_cur = 1024, rlim_max = 4096}
saved_rlimit_memlock = {rlim_cur = 65536, rlim_max = 65536}
skip_setup = <optimized out>
loaded_policy = false
queue_default_job = <optimized out>
first_boot = <optimized out>
reexecute = true
switch_root_dir = <optimized out>
switch_root_init = 0x0
before_startup = <optimized out>
after_startup = <optimized out>
timespan = "\230i@\034%V\000\000\340o@\034%V\000\000\000\000\000\000\000\000\000\000\340W\004\277\326\177\000\000\000\000\000\000\000\000\000\000\340_\004\277\326\177\000\000@`\004\277\326\177\000\000\300`\004\277\326\177\000"
shutdown_verb = 0x0
error_message = 0x0
r = <optimized out>
retval = 0
m = 0x56251cb02490
fds = 0x56251cc16c98
finish = <optimized out>
systemd = "systemd"
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
Comment by Christian Hesse (eworm) - Monday, 30 November 2020, 22:07 GMT
Mhh, I can't reproduce myself...
Have you reported upstream?
Comment by Christian Hesse (eworm) - Tuesday, 01 December 2020, 13:11 GMT
Is this reproducible with systemd 247.1-1?
Comment by Jan Alexander Steffens (heftig) - Tuesday, 01 December 2020, 14:13 GMT
I might be seeing this because I tried to use systemd-oomd, which communicates with systemd via varlink. I boot with systemd.unified_cgroup_hierarchy and added some overrides as recommended by man 8 systemd-oomd.service:

/etc/systemd/system/-.slice.d/local.conf:
[Slice]
ManagedOOMSwap=kill

/etc/systemd/system/user@.service.d/local.conf:
[Service]
Delegate=yes
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimitPercent=40%
Comment by Jan Alexander Steffens (heftig) - Tuesday, 01 December 2020, 14:16 GMT
I can no longer reproduce it with 247.1-1, but that might be because oomd is now missing.
Comment by Christian Hesse (eworm) - Tuesday, 01 December 2020, 14:30 GMT
Yes, oomd is not built in release mode. We will have it back once upstream decides it is ready for the public audience. :-p

Closing for now. I will keep this in mind (and try to reproduce if I have some spare time).

Loading...