FS#61498 - [systemd] systemd-journald severe memory leak

Attached to Project: Arch Linux
Opened by James Harvey (jamespharvey20) - Tuesday, 22 January 2019, 05:00 GMT
Last edited by Antonio Rojas (arojas) - Saturday, 21 September 2019, 09:06 GMT
Task Type Bug Report
Category Upstream Bugs
Status Closed
Assigned To Dave Reisner (falconindy)
Christian Hesse (eworm)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 1
Private No

Details

Description: systemd-journald has a severe memory leak. It is severe enough that it crashed a VM with 1GB of memory in about 5 days: I couldn't use my open SSH sessions, start new SSH sessions, or log in on the console, so I gave up after an hour and hard reset it. The upstream reporter saw it reach about 30GB in 5 days.

Additional info:
Upstream bugreport: https://github.com/systemd/systemd/issues/11502
Someone claims that reverting 2d5d2e0 fixes the problem for them. If so, that commit was made on Dec 5, 2018.
systemd 239.370-1 was put in the Arch repo on Dec 18, so if this is the offending commit, the bug could go back that far on Arch. But I upgraded to 239.370-1 on 12/21, and the system stayed up fine until 1/16, when I updated it for the first time in a few weeks. After upgrading to 240.34-3 on 1/16, I rebooted, and the system crashed in about 5 days. So I doubt it goes back to 239.370-1, but I haven't checked whether the git history is linear and whether 239.370-1 would include that commit.

Upstream comments suggest that every message logged stays in memory.

journalctl log is attached, from when the problem started. That's the end of the log. I gave it an hour before rebooting, and there was nothing after this.

Steps to reproduce:

# ps axuk-vsz | grep journald | grep -v grep

Run this repeatedly and watch the VSZ column grow without bound until journald is restarted.
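
To put numbers on the growth instead of eyeballing the ps output, something like the following could be used. It is only a sketch: `sample_mem` is a made-up helper name, and it assumes a Linux /proc where /proc/PID/status exposes the VmSize and VmRSS fields (both in kB).

```shell
# sample_mem PID: print "<epoch seconds> <VmSize kB> <VmRSS kB>" for one process.
# Hypothetical helper for illustration; not part of systemd or procps.
sample_mem() {
    # Fail early if the process does not exist or /proc is not readable.
    [ -r "/proc/$1/status" ] || return 1
    awk -v now="$(date +%s)" '
        $1 == "VmSize:" { vsz = $2 }   # virtual size
        $1 == "VmRSS:"  { rss = $2 }   # resident set size
        END {
            # Kernel threads have no Vm* fields; treat that as an error.
            if (vsz == "" || rss == "") exit 1
            print now, vsz, rss
        }
    ' "/proc/$1/status"
}

# Possible usage (not run here): log one sample per minute.
#   while sample_mem "$(pidof systemd-journald)"; do sleep 60; done >> journald-mem.log
```

Logging both columns matters here because virtual size and resident size can behave very differently.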

Closed by  Antonio Rojas (arojas)
Saturday, 21 September 2019, 09:06 GMT
Reason for closing:  Fixed
Comment by James Harvey (jamespharvey20) - Tuesday, 22 January 2019, 05:24 GMT
In #archlinux, rcf pointed out that doing something ridiculous like "base64 < /dev/urandom | logger" doesn't make it grow faster, so it's not keeping every message in memory.
Comment by Ryan Farley (rcf) - Tuesday, 22 January 2019, 09:29 GMT
More specifically, I was unable to actually reproduce this at all -- attempting to grow the journal with ridiculous entries resulted in it hitting a ceiling, rather than growing until OOM.
Comment by James Harvey (jamespharvey20) - Tuesday, 22 January 2019, 09:59 GMT
@rcf, what about this? Make sure to run it as a normal user through sudo, since each sudo call adds log entries to the journal:

$ while(true); do sudo pmap --extended $(pidof systemd-journald) | sort -rnk 2 | head -n 1; sleep 1; done

For me, the second column (Kbytes, size of map) grows by about 2633K/sec (2.6MB/sec), and the third column (RSS, resident set size) by about 9.7K/sec.

Running it as root without sudo, the growth rate drops by more than 90%, to 219K/sec and 0.9K/sec.

Running it as root without sudo while also running "base64 < /dev/urandom | logger" as non-root, it keeps growing at the smaller rate.

EDIT: JustArchi in the upstream bugreport ran with "sleep 0.1", vs my "sleep 1". If I do that, each iteration grows by a similar amount, making the leak 10x worse.
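
For anyone who wants to compare per-second rates like the ones above, here is a rough sketch of the arithmetic. `growth_rate` is a made-up helper, and the three-column input format ("epoch seconds, size in K, RSS in K", one sample per line) is an assumption, not something pmap prints directly.

```shell
# growth_rate: read "<epoch seconds> <size K> <RSS K>" lines on stdin and print
# the average growth in K/sec between the first and the last sample.
# Hypothetical helper for illustration only.
growth_rate() {
    # LC_ALL=C keeps "." as the decimal separator regardless of locale.
    LC_ALL=C awk '
        NR == 1 { t0 = $1; v0 = $2; r0 = $3 }   # remember the first sample
        { t1 = $1; v1 = $2; r1 = $3 }           # keep overwriting with the latest
        END {
            # Need at least two samples taken at different times.
            if (NR < 2 || t1 <= t0) exit 1
            dt = t1 - t0
            printf "%.1f %.1f\n", (v1 - v0) / dt, (r1 - r0) / dt
        }
    '
}

# Example with invented samples taken 10 seconds apart:
#   printf '100 1000 500\n110 27330 597\n' | growth_rate
```

Averaging over the whole run rather than between consecutive lines smooths out the jitter from sleep and from sudo's own startup time.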
Comment by Ryan Farley (rcf) - Wednesday, 23 January 2019, 00:57 GMT
In that case, using a VM with a paltry 512M I was able to balloon the virtual size to 32.2G, which is not a particularly comfortable thing to do, but resident size consistently drops back down to 150M without ever reaching 300M.
Comment by Jake Kreiger (Magali75) - Saturday, 26 January 2019, 14:05 GMT
