FS#41728 - [systemd] coredumps, 100% cpu usage, X hanging
Attached to Project:
Arch Linux
Opened by Steven Honeyman (stevenhoneyman) - Thursday, 28 August 2014, 00:43 GMT
Last edited by Thomas Bächler (brain0) - Friday, 29 August 2014, 16:44 GMT
Details
Description:
systemd 216 and (especially) Firefox, it seems, create loads of junk coredump files. Presumably it is while they are being xz-compressed that I get 100% CPU usage and X hanging.
A better description than mine, and a video: https://bbs.archlinux.org/viewtopic.php?id=186271
Steps to reproduce:
pacman -Syu
<reboot>
Closed by Thomas Bächler (brain0)
Friday, 29 August 2014, 16:44 GMT
Reason for closing: Not a bug
Additional comments about closing: Segfaults in chromium and firefox are not systemd bugs.
The manpage for coredump.conf documents well how to disable coredump storage. The wiki documents very well how to disable coredump handling in systemd entirely. There's also a multitude of other possibilities to suppress coredumps only for specific processes.
None of this is a systemd bug, and this "bug report" is becoming a shitfest of insults. We are done here.
On a more serious note; everything was as stable as can be expected before this update. That poor guy on the forum is getting an SSD full of these just from opening and closing Firefox.
I guess you can go fuck off.
2) Deactivating or limiting coredumping just sounds like a circumvention of the problem.
3) There was no post-upgrade message, look into your pacman logs.
4) I have a life, thank you.
local v upgrades=(204-1
205-1
206-1
208-1
208-8
209-1
210-1
213-4
214-2
215-2)
Notice the absence of a particular number there?
https://bugs.archlinux.org/task/41286
– I have had similar dump problems with the journal under /var/log/journal and had to manually set "SystemMaxUse=250M" in /etc/systemd/journald.conf. Perhaps the change was postponed to 216 and now my big dumps are landing in /var/lib/systemd/coredump/
How can I turn this madness off? "Storage=none" only deletes the dumps after getting created.
echo ':: Coredumps are handled by systemd by default. Collection behavior can be'
echo ' tuned in /etc/systemd/coredump.conf.'
}
So, clearly, I missed adding this to the post_upgrade list -- my mistake, sorry about that. Instead of critiquing my frustration, maybe you can actually provide useful data points for this bug. My lack of response on this bug is directly correlated to the lack of any useful datapoints here.
systemd-coredump is a *reactionary* process which reads coredumps piped from kernelspace into userspace (via a sysctl knob). If you're suggesting that systemd-coredump is simply wandering off on its own and creating random coredump files for arbitrary processes, you'll need to provide a little more information than just "creating loads of junk coredump files". Really, I don't believe anecdata. I'm going to side with the kernel until someone here shows me otherwise.
If you think that deactivating coredumps is "circumvention of the problem" and not something you can live with, perhaps you could actually explain what it is you think the correct behavior should be, keeping in mind the above reactionary behavior.
Lastly, if you want to be arrogant and post snarky comments, you could at least use [testing] and report these things earlier so I'm not flooded with bug reports after a week of systemd sitting in [testing]. The testing repository is only as good as the people who use it.
I'm sure people (including myself) would be able to provide you with what you need to fix this... if some help was offered in suggesting how to get whatever information is needed.
(you know, something like "can you set this this and this to <something> and attach the log")
---quote---
"Am I supposed to feel sorry for you? You missed the post_upgrade message.."
"I guess you can go fuck off."
-----------
^^^ That, however, is just rude. You can't expect to make a "dick move" [your words to me last time] like that, treating people like you're perfect, and not expect a snarky comment in return when (again) you're proven wrong.
and not even bothering to apologise to him?
Aaaanyway...
Let us know what info you require for you to be able to fix this, and we'll go from there.
The video shows firefox exiting and files being created. Given the nature of the files, the simplest explanation is that firefox is crashing on exit. When a program is terminated either by SIGABRT or SIGSEGV, the kernel sends the coredump to userspace. The value in /proc/sys/kernel/core_pattern determines what userspace does with it. In this particular case, systemd-coredump is spawned and the kernel pipes the coredump to systemd-coredump's stdin. Based on the settings in /etc/systemd/coredump.conf, systemd is writing it to /var/lib/systemd/coredump.
Had firefox been run from a shell, that crash would not have been silently swallowed. As it is, I have no other logical explanation for this. Perhaps dmesg would have shown the segfault.
What did I miss in this video?
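To check which handler the kernel is actually using, one can read the core_pattern value described above. This is only a minimal sketch: the helper name classify_core_pattern is made up for illustration, and the systemd pattern used in the usage note below is an assumed example value, not one taken from this report.

```shell
#!/bin/sh
# Classify a core_pattern value: a leading "|" means the kernel pipes the
# dump to the stdin of a program; anything else is a file name template.
# (classify_core_pattern is a hypothetical helper, not a systemd tool.)
classify_core_pattern() {
    case "$1" in
        '|'*) printf 'piped to: %s\n' "${1#|}" ;;
        *)    printf 'written to file: %s\n' "$1" ;;
    esac
}

# Show what the running kernel is configured to do, if the file is readable.
pattern_file=/proc/sys/kernel/core_pattern
if [ -r "$pattern_file" ]; then
    classify_core_pattern "$(cat "$pattern_file")"
fi
```

On a system where systemd handles coredumps, the output should name systemd-coredump as the pipe target; the exact arguments after it may differ between versions.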
It's 6:45am here in England, so I'm just off to work, but once I get home I'll use the above info to see if I can find some more data for this bug.
Oh, the video: just that this is a bug that has only started since 216. With 215, Firefox doesn't crash on exit.
If I understand correctly, your claim is that a process invoked by the kernel, as a reaction to a process exiting on a SIGABRT or SIGSEGV, is actually *responsible* for the crash on exit, post-mortem. I'm still having a hard time figuring out how this is possible, because it really doesn't make any logical sense (correlation is not causation). Run firefox from the command line with systemd 215 and 216 installed:
$ firefox; echo status=$?
Compare the exit statuses between systemd versions. Repeat the experiment without any extensions installed. I don't have an Arch machine that can run Firefox, but on other distros it seems that firefox 31 will exit "6" on SIGABRT, and "139" (128 + signum) on SIGSEGV, and the expected "0" on a clean shutdown. I'm pretty confident you'll get consistent results regardless of the version of systemd present.
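For anyone running this comparison, a small helper can turn the number into something readable. It is a sketch that assumes the usual shell convention mentioned above (128 + signum for a process killed by a signal, so 139 is signal 11, and by the same arithmetic signal 6 would show up as 134); describe_status is a made-up name.

```shell
#!/bin/sh
# Interpret a shell exit status using the 128 + signum convention.
# (describe_status is a hypothetical helper for this experiment.)
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "status=0: clean exit"
    elif [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # kill -l NUMBER prints the signal name without the SIG prefix;
        # fall back to "?" if the number is not a valid signal.
        name=$(kill -l "$signum" 2>/dev/null || echo '?')
        echo "status=$status: terminated by signal $signum (SIG$name)"
    else
        echo "status=$status: exited with an error code"
    fi
}

describe_status 0
describe_status 139
```

Usage would be along the lines of `firefox; describe_status $?`, once with systemd 215 and once with 216 installed.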
Not singling out Firefox, but it's a common factor between affected users. I had camera.bin (whatever that is) and spacefm twice IIRC (the forum post has them listed) in the short time before noticing and disabling coredump. For me personally, the issue is:
1. Eating SSD space/write cycles
2. Completely hangs the machine (while compressing?), as in the cursor moves but nothing is clickable, and I can't even switch to a VT
I'll do tests tonight and try to narrow things down.
Great, then can you recompile systemd with lz4 support and see if the cpu usage is reduced to a reasonable level during compression?
Can I disable any other features I don't need while I'm at it, or will that invalidate the results for you?
To be more specific, if I run "$ chromium; echo status=$?" from the command line, I get these 3 lines:
-------------------
[4982:5008:0829/151424:ERROR:nss_util.cc(856)] After loading Root Certs, loaded==false: NSS error code: -8018
ATTENTION: default value of option force_s3tc_enable overridden by environment.
ATTENTION: option value of option force_s3tc_enable ignored.
-------------------
No hang, but note that chromium is launched in another workspace (my console is in the second workspace, chromium starts on the first).
When I move to workspace 1, the issue starts. If I move back to the second workspace during the hang, I can see systemd-coredumpctl at 100%, and the following lines right after the 3 previous ones:
-------------------
[31290:31290:0829/150214:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[31290:31290:0829/150214:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[31290:31290:0829/150214:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
ATTENTION: default value of option force_s3tc_enable overridden by environment.
ATTENTION: option value of option force_s3tc_enable ignored.
[9:17:0829/150217:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[9:17:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[9:17:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
[31290:31290:0829/150217:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[31290:31290:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[31290:31290:0829/150217:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
ATTENTION: default value of option force_s3tc_enable overridden by environment.
ATTENTION: option value of option force_s3tc_enable ignored.
[9:17:0829/150223:ERROR:webgraphicscontext3d_command_buffer_impl.cc(297)] Failed to initialize GLES2Implementation.
[31290:31290:0829/150223:ERROR:command_buffer_proxy_impl.cc(153)] Could not send GpuCommandBufferMsg_Initialize.
[31290:31290:0829/150223:ERROR:webgraphicscontext3d_command_buffer_impl.cc(236)] CommandBufferProxy::Initialize failed.
[31290:31290:0829/150223:ERROR:webgraphicscontext3d_command_buffer_impl.cc(256)] Failed to initialize command buffer.
[31290:31312:0829/150224:ERROR:value_store_frontend.cc(62)] Error while writing pafkbggdmjlpgkdkcbjmhmfcdpncadgh.alarms to /home/raf/.config/chromium/Default/Extension State
[31290:31312:0829/150228:ERROR:value_store_frontend.cc(62)] Error while writing gighmmpiobklfepjocnamgkkbiglidom.browser_action to /home/raf/.config/chromium/Default/Extension state
[31290:31312:0829/150229:ERROR:value_store_frontend.cc(62)] Error while writing gighmmpiobklfepjocnamgkkbiglidom.browser_action to /home/raf/.config/chromium/Default/Extension State
-------------------
When the freeze stops, if I exit chromium, I get the expected "status=0", with no other freeze or coredump.
Hope it will help.
@Dave - OK will do.
By "Invalidate" I just meant I see a lot of bug reports considered invalid if the system is not 100% as 'upstream' intended; "Unsupported configuration: WONTFIX" etc.
The issue is the 100% CPU usage during the creation of a coredump in /var/lib/systemd/coredump, via systemd-coredumpctl. This freezing is *NEW*: it started with my last pacman -Syu a few days ago, and it seems to happen to other people, with other software. In my case, the system is creating only 4.8M of log, and for that the system hangs for maybe 20 seconds. IMHO, this is not correct behaviour, considering that the software (chromium, firefox, whatever...) is NOT crashing.
I have no idea what changed between systemd 216 and 215 or 214: a new compression level or something else.
How can I help?
<sarcasm>Yeah, your browser couldn't *possibly* be crashing. It's certainly *not* multi-process either.</sarcasm>
Why do you assert that it isn't crashing? Simply because it isn't exiting on you? What does the core dump tell you?
So your sarcasm doesn't help, thanks...
coredumpctl info (last)
PID: 23175 (chromium)
UID: 1000 (raf)
GID: 100 (users)
Signal: 11 (SEGV)
Timestamp: ven. 2014-08-29 16:07:38 CEST (2min 30s ago)
Command Line: /usr/lib/chromium/chromium --type=gpu-process --channel=23066.6.601620651 --supports-dual-gpus=false --gpu-driver-bug-workarounds=1,2,11,15 --disable-accelerated-video-decode --gpu-vendor-id=0x8086 --gpu-device-id=0x0106 --gpu-driver-vendor=Mesa --gpu-driver-version=10.2.
Executable: /usr/lib/chromium/chromium
Control Group: /user.slice/user-1000.slice/session-c2.scope
Unit: session-c2.scope
Slice: user-1000.slice
Session: c2
Owner UID: 1000 (raf)
Boot ID: b9a0b0cd161e4c26a9effdc2a317b52d
Machine ID: c6139e9ab9ce4c0c9c6f1163d72467b3
Hostname: gragon
Coredump: /var/lib/systemd/coredump/core.chromium.1000.b9a0b0cd161e4c26a9effdc2a317b52d.23175.1409321258000000.xz
Message: Process 23175 (chromium) of user 1000 dumped core.
See attachment.
Your own coredump vehemently disagrees with you.
Signal: 11 (SEGV)
Message: Process 23175 (chromium) of user 1000 dumped core.
So... what next ?
Well, for *you*. It's *your* coredump after all. I don't know how else to (re)explain that a process receiving SIGSEGV or SIGABRT will generate a coredump. In the default configuration of how we ship systemd, a coredump file will be written to /var/lib/systemd/coredump. If you want to disable this, the kernel.core_pattern sysctl can be modified, and you can disable storage in /etc/systemd/coredump.conf
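The two knobs mentioned here could be set roughly as follows. This is a sketch under assumptions, not the packaged procedure: the helper name, the sysctl.d file name 50-coredump.conf and the /dev/null value for kernel.core_pattern are illustrative choices, and the function writes under a scratch root so it can be tried without touching the live system. Storage=none is the coredump.conf setting already discussed in this report.

```shell
#!/bin/sh
# Write both overrides below ROOT. (disable_coredump_storage is a
# hypothetical helper; pass / as ROOT only if you really mean it.)
disable_coredump_storage() {
    root=${1:?usage: disable_coredump_storage ROOT}
    mkdir -p "$root/etc/systemd" "$root/etc/sysctl.d"
    # Tell systemd-coredump not to keep the dumps it receives.
    printf '[Coredump]\nStorage=none\n' > "$root/etc/systemd/coredump.conf"
    # Assumed override: stop the kernel from piping dumps to a handler at all.
    printf 'kernel.core_pattern=/dev/null\n' > "$root/etc/sysctl.d/50-coredump.conf"
}

# Try it in a throwaway directory and show what would be written.
scratch=$(mktemp -d)
disable_coredump_storage "$scratch"
cat "$scratch/etc/systemd/coredump.conf" "$scratch/etc/sysctl.d/50-coredump.conf"
```

Note that this overwrites coredump.conf wholesale, which is fine for a scratch directory but would drop any other settings on a real system; the sysctl change also only takes effect once the sysctl files are reloaded or the machine is rebooted.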
> I could use it, open URLs and so on... so maybe it half-crashed? A quantum crash?
Right, it was nice enough to recover for you. It still segfaulted somewhere in a render process.
> So... what next ?
You seem to be focused on the CPU usage, so you can do the same as what I asked Steve to do -- recompile systemd with lz4 to see how that affects CPU usage.
> but for me, on my laptop, in the real world,
So, your tone doesn't help, thanks...
No problem with that. So before 216 or 215, this wasn't the case?
> recompile systemd with lz4 to see how that affects CPU usage.
As Steve will do it, we will wait for his answer. But maybe you can tell us whether compression has changed in systemd recently, which could explain why we now have this strange CPU usage?
> So, your tone doesn't help, thanks...
Funny how everyone trying to talk to you seems to be annoyed by the way you answer their questions, no?
> … You could also figure out why firefox terminated
I can tell you: Alt + F4/Ctrl + Q :P Just as I have written in the forum: "I just start and quit Firefox". There is no crash.
> systemd-coredump is a *reactionary* process which reads coredumps piped from kernelspace into userspace (via a sysctl knob).
Okay, it might really not have much to do with systemd, but rather with Firefox, or my Firefox profile especially.
Attached is my output of "coredumpctl info":
"segmentation fault
status=139"
A fresh profile gives exit status 0.
This seems to be an official Firefox bug: "firefox failed to exit and an error was produced on the xterm" https://bugzilla.mozilla.org/show_bug.cgi?id=1007761 - Seems to be fixed in the upcoming Firefox 32.
My computer doesn't hang but it's a fast Phenom II X6...
The fun part here is that everybody is writing under their real name on the public internet, remember.^^