Arch Linux


FS#41728 - [systemd] coredumps, 100% cpu usage, X hanging

Attached to Project: Arch Linux
Opened by Steven Honeyman (stevenhoneyman) - Thursday, 28 August 2014, 00:43 GMT
Last edited by Thomas Bächler (brain0) - Friday, 29 August 2014, 16:44 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To Thomas Bächler (brain0)
Dave Reisner (falconindy)
Tom Gundersen (tomegun)
Architecture All
Severity Medium
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 8
Private No

Details

Description:

Since systemd 216, various programs (especially firefox, it seems) leave behind loads of junk coredump files. Presumably the 100% cpu usage and X hanging happen while systemd is xz'ing them.
A description better than mine and a video: https://bbs.archlinux.org/viewtopic.php?id=186271

Steps to reproduce:

pacman -Syu
<reboot>

Closed by  Thomas Bächler (brain0)
Friday, 29 August 2014, 16:44 GMT
Reason for closing:  Not a bug
Additional comments about closing:  Segfaults in chromium and firefox are not systemd bugs.

The manpage for coredump.conf documents well how to disable coredump storage. The wiki documents very well how to disable coredump handling in systemd entirely. There's also a multitude of other possibilities to suppress coredumps only for specific processes.

None of this is a systemd bug, and this "bug report" is becoming a shitfest of insults. We are done here.
Comment by Dave Reisner (falconindy) - Thursday, 28 August 2014, 01:01 GMT
What do you mean by "junk coredump files"? If something is legitimately crashing, then a coredump will be created. If you dislike this behavior, disable it in coredump.conf. Otherwise, have you tried using software that doesn't crash repeatedly?
Comment by Steven Honeyman (stevenhoneyman) - Thursday, 28 August 2014, 01:13 GMT
Great idea! Please can you point me in the direction of "systemd 216 approved" software?

On a more serious note; everything was as stable as can be expected before this update. That poor guy on the forum is getting an SSD full of these just from opening and closing Firefox.
Comment by Michael Pusterhofer (feanor12) - Thursday, 28 August 2014, 10:26 GMT
After the upgrade to 216, systemd creates a lot of core.gvfsd-metadata dumps.
Comment by Maximilian Böhm (holalu) - Thursday, 28 August 2014, 23:48 GMT
We just have to wait until everybody's root fs is full of junk for this report to be widely considered meaningful.
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 00:21 GMT
Am I supposed to feel sorry for you? You missed the post_upgrade message and my first reply here explaining how to disable coredump storage, and my post on arch-dev-public about bringing lz4 into core to reduce CPU usage.

I guess you can go fuck off.
Comment by Maximilian Böhm (holalu) - Friday, 29 August 2014, 00:43 GMT
1) Firefox runs perfectly stable and systemd creates gigabytes of dumps nonetheless. This can't be the desired behavior.
2) Deactivating or limiting coredumping just sounds like a circumvention of the problem.
3) There was no post-upgrade message, look into your pacman logs.
4) I have a life, thank you.
Comment by Steven Honeyman (stevenhoneyman) - Friday, 29 August 2014, 00:52 GMT
Wrong yet again Dave... how about you check your scripts before telling people to "go fuck off"

local v upgrades=(204-1
205-1
206-1
208-1
208-8
209-1
210-1
213-4
214-2
215-2)

Notice the absence of a particular number there?
Comment by Maximilian Böhm (holalu) - Friday, 29 August 2014, 01:08 GMT
Found an interesting hint: "As of systemd 215, the situation is a lot better as coredumps are stored in /var/lib/system/coredumps/ by default instead of polluting the journal."
https://bugs.archlinux.org/task/41286
– I have had similar dump problems with the journal under /var/log/journal and had to manually set "SystemMaxUse=250M" in /etc/systemd/journald.conf. Perhaps the change was postponed to 216 and now my big dumps are landing in /var/lib/systemd/coredump/
How can I turn this madness off? "Storage=none" only deletes the dumps after getting created.
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 01:17 GMT
_216_1_changes() {
echo ':: Coredumps are handled by systemd by default. Collection behavior can be'
echo ' tuned in /etc/systemd/coredump.conf.'
}

So, clearly, I missed adding this to the post_upgrade list -- my mistake, sorry about that. Instead of critiquing my frustration, maybe you can actually provide useful data points for this bug. My lack of response on this bug is directly correlated to the lack of any useful datapoints here.

systemd-coredump is a *reactionary* process which reads coredumps piped from kernelspace into userspace (via a sysctl knob). If you're suggesting that systemd-coredump is simply wandering off on its own and creating random coredump files for arbitrary processes, you'll need to provide a little more information than just "creating loads of junk coredump files". Really, I don't believe anecdata. I'm going to side with the kernel until someone here shows me otherwise.

If you think that deactivating coredumps is "circumvention of the problem" and not something you can live with, perhaps you could actually explain what it is you think the correct behavior should be, keeping in mind the above reactionary behavior.

Lastly, if you want to be arrogant and post snarky comments, you could at least use [testing] and report these things earlier so I'm not flooded with bug reports after a week of systemd sitting in [testing]. The testing repository is only as good as the people who use it.
Comment by Steven Honeyman (stevenhoneyman) - Friday, 29 August 2014, 01:46 GMT
"someone here shows me otherwise." - did you watch the video from the forum post of this problem happening? I thought that was very clear.

I'm sure people (including myself) would be able to provide you with what you need to fix this... if some help was offered in suggesting how to get whatever information is needed.
(you know, something like "can you set this this and this to <something> and attach the log")

---quote---
"Am I supposed to feel sorry for you? You missed the post_upgrade message.."
"I guess you can go fuck off."
-----------

^^^ That however, is just rude. you can't expect to make a "dick move"[your words to me last time] like that, treating people like you're perfect, and not expect a snarky comment in return when (again) you're proven wrong.
and not even bothering to apologise to him?


Aaaanyway...
Let us know what info you require for you to be able to fix this, and we'll go from there.
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 02:08 GMT
> did you watch the video from the forum post of this problem happening? I thought that was very clear.
The video shows firefox exiting and files being created. Given the nature of the files, the simplest explanation is that firefox is crashing on exit. When a program is terminated either by SIGABRT or SIGSEGV, the kernel sends the coredump to userspace. The value in /proc/sys/kernel/core_pattern determines what userspace does with it. In this particular case, systemd-coredump is spawned and the kernel pipes the coredump to systemd-coredump's stdin. Based on the settings in /etc/systemd/coredump.conf, systemd is writing it to /var/lib/systemd/coredump.
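
The routing described above can be checked by looking at the core_pattern value itself. What follows is a minimal sketch, not part of systemd: the helper name describe_core_pattern is hypothetical. It only classifies a pattern string: a leading '|' means the kernel pipes the core to the named program's stdin, anything else is treated as a file name template.

```shell
#!/bin/sh
# Hypothetical helper: classify a kernel.core_pattern value.
# A leading '|' means the kernel spawns the named program and pipes the
# core image to its stdin; any other value is a core file name template.
describe_core_pattern() {
    case "$1" in
        '|'*)
            # Drop the leading '|', then keep only the first word (the handler path).
            handler=${1#"|"}
            handler=${handler%% *}
            echo "pipe:$handler"
            ;;
        *)
            echo "file:$1"
            ;;
    esac
}
```

On a live system it could be called as: describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)". If the result starts with "pipe:" and names systemd-coredump, every core the kernel produces goes through the path described above.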

Had firefox been run from a shell, that crash would not have been silently swallowed. As it is, I have no other logical explanation for this. Perhaps dmesg would have shown the segfault.

What did I miss in this video?
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 02:26 GMT
You could also figure out why firefox terminated... by examining the coredump. "coredumpctl info" will tell you what the terminating signal was for each core.
Comment by Steven Honeyman (stevenhoneyman) - Friday, 29 August 2014, 05:55 GMT
Excellent (no sarcasm this time).

It's 6:45am here in England, so I'm just off to work, but once I get home I'll use the above info to see if I can find some more data for this bug.

Oh, the video - just that this is a bug that's just started since 216. With 215, firefox doesn't crash on exit.
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 12:50 GMT
> [...] this is a bug that's just started since 216. With 215, firefox doesn't crash on exit.
If I understand correctly, your claim is that a process invoked by the kernel, as a reaction to a process exiting on a SIGABRT or SIGSEGV, is actually *responsible* for the crash on exit, post-mortem. I'm still having a hard time figuring out how this is possible, because it really doesn't make any logical sense (correlation is not causation). Run firefox from the command line with systemd 215 and 216 installed:

$ firefox; echo status=$?

Compare the exit statuses between systemd versions. Repeat the experiment without any extensions installed. I don't have an Arch machine that can run Firefox, but on other distros it seems that firefox 31 will exit "6" on SIGABRT, and "139" (128 + signum) on SIGSEGV, and the expected "0" on a clean shutdown. I'm pretty confident you'll get consistent results regardless of the version of systemd present.
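
To make the comparison easier to read, the status can be decoded with a small helper. This is a sketch under the shell convention mentioned above (128 + signum for a process killed by a signal); decode_status is a hypothetical name, and a program that itself exits with a code above 128 would be misreported as a signal death.

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a short description.
# Assumes the usual shell convention: status > 128 means "killed by
# signal (status - 128)".
decode_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean exit"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with code $status"
    fi
}
```

Usage would be "firefox; decode_status $?". A status of 139 decodes to signal 11 (SIGSEGV), while a plain 6 is reported as an ordinary exit code rather than as SIGABRT.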
Comment by Steven Honeyman (stevenhoneyman) - Friday, 29 August 2014, 13:02 GMT
systemd is PID1, udev, and pretty much everything else though; right? a bug in any part of that could cause major system issues surely?

Not singling out firefox, but it's a common factor between affected users. I had camera.bin (whatever that is) and spacefm twice IIRC (forum post has them listed) in the short time before noticing and disabling coredump. Affecting me personally the issue is
1. Eating SSD space/write cycles
2. Completely hangs machine (while compressing?) as in cursor moves but nothing is clickable, and can't even switch to a VT

I'll do tests tonight and try to narrow things down
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 13:09 GMT
> 2. Completely hangs machine (while compressing?) as in cursor moves but nothing is clickable, and can't even switch to a VT
Great, then can you recompile systemd with lz4 support and see if the cpu usage is reduced to a reasonable level during compression?
Comment by Steven Honeyman (stevenhoneyman) - Friday, 29 August 2014, 13:22 GMT
Sure.

Can I disable any other features I don't need while I'm at it, or will that invalidate the results for you?
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 13:25 GMT
Invalidate? Not necessarily, but it adds unnecessary complications and won't really change the compile time. Let's keep things simple and make the single change to lz4.
Comment by Raphaël (metameta) - Friday, 29 August 2014, 13:27 GMT
I got the same type of issue with chromium (systemd-coredumpctl using 100%, machine hangs, coredump created).

To be more specific, if I run "$ chromium; echo status=$?" from the command line, I get these 3 lines:
-------------------
[4982:5008:0829/151424:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
ATTENTION: default value of option force_s3tc_enable overridden by environment.
ATTENTION: option value of option force_s3tc_enable ignored.
-------------------
No hang, but note that chromium is launched in another workspace (my console is in the second workspace, chromium starts on the first)

When I move to workspace 1, the issue starts. If I move again (during the hang) to second workspace, i can see systemd-coredumpctl at 100%, and the following lines right after the 3 previous:

-------------------
[31290:31290:0829/150214:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[31290:31290:0829/150214:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[31290:31290:0829/150214:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
ATTENTION: default value of option force_s3tc_enable overridden by environment.
ATTENTION: option value of option force_s3tc_enable ignored.
[9:17:0829/150217:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[9:17:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[9:17:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
[31290:31290:0829/150217:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[31290:31290:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[31290:31290:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
ATTENTION: default value of option force_s3tc_enable overridden by environment.
ATTENTION: option value of option force_s3tc_enable ignored.
[9:17:0829/150223:ERROR:webgraphicscontext3d_command_buffer_impl.cc(297)] Failed to initialize GLES2Implementation.
[31290:31290:0829/150223:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[31290:31290:0829/150223:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[31290:31290:0829/150223:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
[31290:31312:0829/150224:ERROR:value_store_frontend.cc(62)] Error while writing pafkbggdmjlpgkdkcbjmhmfcdpncadgh.alarms to /home/raf/.config/chromium/Default/Extension State
[31290:31312:0829/150228:ERROR:value_store_frontend.cc(62)] Error while writing gighmmpiobklfepjocnamgkkbiglidom.browser_action to /home/raf/.config/chromium/Default/Extension state
[31290:31312:0829/150229:ERROR:value_store_frontend.cc(62)] Error while writing gighmmpiobklfepjocnamgkkbiglidom.browser_action to /home/raf/.config/chromium/Default/Extension State
-------------------


When the freeze stops and I exit chromium, I get the expected "status=0", with no other freeze or coredump.

Hope it will help.

Comment by Steven Honeyman (stevenhoneyman) - Friday, 29 August 2014, 13:35 GMT
@Raphael - Thank you for confirming; that on its own helps

@Dave - OK will do.
By "Invalidate" I just meant I see a lot of bug reports considered invalid if the system is not 100% as 'upstream' intended; "Unsupported configuration: WONTFIX" etc.
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 13:39 GMT
Doesn't really help without any correlation to system logs and what cores were created. I have no idea what else is happening in chromium, but it looks like render processes are crashing based on the errors. So, this would generate coredumps.
Comment by Raphaël (metameta) - Friday, 29 August 2014, 14:42 GMT
The problem is not the errors in chromium, as chromium is *not* crashing (I guess these errors have been there for a few months, if not since the first install, but then again, it doesn't crash).

The issue is the 100% cpu usage during the creation of a coredump in /var/lib/systemd/coredump, via systemd-coredumpctl. This freezing is *NEW*: it started with my last pacman -Syu a few days ago, and seems to happen to other people, with other software. In my case, the system is creating only 4,8M of log, and for that, the system hangs for maybe 20 seconds. IMHO, this is not correct behaviour, considering that the software (chromium, firefox, whatever...) is NOT crashing.

I have no idea of the changes between systemd-216 and 215 or 214, new compression level or something else.

How can I help ?
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 15:06 GMT
> Problem is not the error in chromium, as chromium is *not* crashing
<sarcasm>Yeah, your browser couldn't *possibly* be crashing. It's certainly *not* multi-process either.</sarcasm>

Why do you assert that it isn't crashing? Simply because it isn't exiting on you? What does the core dump tell you?
Comment by Raphaël (metameta) - Friday, 29 August 2014, 15:21 GMT
You are smart? You know your job? You know how "users" speak? So I'm pretty sure you understand very well what I meant when I said "not crashing".

So your sarcasm doesn't help, thanks...


coredumpctl info (last)

PID: 23175 (chromium)
UID: 1000 (raf)
GID: 100 (users)
Signal: 11 (SEGV)
Timestamp: ven. 2014-08-29 16:07:38 CEST (2min 30s ago)
Command Line: /usr/lib/chromium/chromium --type=gpu-process --channel=23066.6.601620651 --supports-dual-gpus=false --gpu-driver-bug-workarounds=1,2,11,15 --disable-accelerated-video-decode --gpu-vendor-id=0x8086 --gpu-device-id=0x0106 --gpu-driver-vendor=Mesa --gpu-driver-version=10.2.
Executable: /usr/lib/chromium/chromium
Control Group: /user.slice/user-1000.slice/session-c2.scope
Unit: session-c2.scope
Slice: user-1000.slice
Session: c2
Owner UID: 1000 (raf)
Boot ID: b9a0b0cd161e4c26a9effdc2a317b52d
Machine ID: c6139e9ab9ce4c0c9c6f1163d72467b3
Hostname: gragon
Coredump: /var/lib/systemd/coredump/core.chromium.1000.b9a0b0cd161e4c26a9effdc2a317b52d.23175.1409321258000000.xz
Message: Process 23175 (chromium) of user 1000 dumped core.

See attachment.


Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 15:25 GMT
> So i'm pretty sure you understand very well the meaning of my sentence when i said "not crashing".
Your own coredump vehemently disagrees with you.

Signal: 11 (SEGV)
Message: Process 23175 (chromium) of user 1000 dumped core.
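
When there are many cores, the same fields can be pulled out of the "coredumpctl info" text in one pass. A minimal sketch, assuming the field labels shown in the output above (PID, Signal, Executable) and that Executable follows Signal within each entry; summarize_cores is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `coredumpctl info` text on stdin and print one
# line per entry in the form "PID SIGNAL EXECUTABLE".
# Assumes each entry lists PID, then Signal, then Executable, as in the
# output quoted in this thread.
summarize_cores() {
    awk '
        $1 == "PID:"        { pid = $2 }
        $1 == "Signal:"     { sig = $2 " " $3 }
        $1 == "Executable:" { print pid, sig, $2 }
    '
}
```

It would be used as "coredumpctl info | summarize_cores". Any entry whose signal column shows SEGV or ABRT corresponds to a process that was terminated by that signal, whether or not the application window stayed open.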
Comment by Raphaël (metameta) - Friday, 29 August 2014, 15:34 GMT
As you wish, chromium crashed for you... but for me, on my laptop, in the real world, after the 20s of lag when I just see the 100% cpu... I could use it, open URLs and so on... so maybe it half-crashed? A quantum crash?

So... what next ?
Comment by Dave Reisner (falconindy) - Friday, 29 August 2014, 15:42 GMT
> As you wish, chromium crashed for you...
Well, for *you*. It's *your* coredump after all. I don't know how else to (re)explain that a process receiving SIGSEGV or SIGABRT will generate a coredump. In the default configuration of how we ship systemd, a coredump file will be written to /var/lib/systemd/coredump. If you want to disable this, the kernel.core_pattern sysctl can be modified, and you can disable storage in /etc/systemd/coredump.conf
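
The two knobs mentioned above could be set roughly as follows. This is a sketch under assumptions, not the packaged configuration: the helper name stage_coredump_overrides is hypothetical, the sysctl file name 50-coredump.conf is assumed to shadow the one systemd ships, and "core" is simply the kernel's default pattern (a core file in the working directory, subject to the process's core size limit). The files are written to a staging directory for review, so nothing under /etc is touched.

```shell
#!/bin/sh
# Hypothetical helper: write override files into a staging directory.
# Review them, then copy into place as root if they match what you want.
stage_coredump_overrides() {
    stage=$1
    mkdir -p "$stage/etc/systemd" "$stage/etc/sysctl.d"

    # Keep systemd-coredump as the handler, but store nothing.
    cat > "$stage/etc/systemd/coredump.conf" <<'EOF'
[Coredump]
Storage=none
EOF

    # Assumption: replace the piped handler with the kernel default pattern,
    # so systemd-coredump is not spawned at all.
    cat > "$stage/etc/sysctl.d/50-coredump.conf" <<'EOF'
kernel.core_pattern=core
EOF
}
```

The first file only stops storage; the core is still piped to systemd-coredump. The second changes what the kernel does with the core in the first place, and would take effect after the sysctl settings are reloaded or the machine is rebooted.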


> i could use it, open url and son on... so may be he half-crashed ? A quantic crash ?
Right, it was nice enough to recover for you. It still segfaulted somewhere in a render process.


> So... what next ?
You seem to be focused on the CPU usage, so you can do the same as what I asked Steve to do -- recompile systemd with lz4 to see how that affects CPU usage.


> but for me, on my laptop, in the real world,
So, your tone doesn't help, thanks...
Comment by Raphaël (metameta) - Friday, 29 August 2014, 15:50 GMT
> In the default configuration of how we ship systemd, a coredump file will be written to /var/lib/systemd/coredump
No problem with that. So before 216 or 215, this wasn't the case?

> recompile systemd with lz4 to see how that affects CPU usage.
As Steve will do it, we will wait for his answer. But maybe you can tell us if compression has changed in systemd recently, which could explain why we now have this odd cpu usage?

> So, your tone doesn't help, thanks...
Funny how everyone trying to talk to you seems to be annoyed by the way you answer their questions, no?
Comment by Maximilian Böhm (holalu) - Friday, 29 August 2014, 16:38 GMT
> The video shows firefox exiting and files being created. Given the nature of the files, the simplest explanation is that firefox is crashing on exit.
> … You could also figure out why firefox terminated
I can tell you: Alt + F4/Ctrl + Q :P Just as I have written in the forum: "I just start and quit Firefox". There is no crash.

> systemd-coredump is a *reactionary* process which reads coredumps piped from kernelspace into userspace (via a sysctl knob).
Okay, it might really not have much to do with systemd, but rather with Firefox, or my Firefox profile in particular.
Attached is my output of "coredumpctl info":
"segmentation fault
status=139"

A fresh profile gives exit status 0.

This seems to be an official Firefox bug: "firefox failed to exit and an error was produced on the xterm" https://bugzilla.mozilla.org/show_bug.cgi?id=1007761 - Seems to be fixed in the upcoming Firefox 32.

My computer doesn't hang but it's a fast Phenom II X6...
The fun part here is that everybody is writing under his real name into the public internet, remember.^^
Comment by Maximilian Böhm (holalu) - Friday, 29 August 2014, 16:43 GMT
Another attachment with my output of "firefox; echo status=$?" with my profile and a fresh profile.
