FS#67070 - [glibc] 2.31-5 causing several applications to crash
Attached to Project:
Arch Linux
Opened by Sean Lingham (Cxpher) - Monday, 22 June 2020, 00:35 GMT
Last edited by freswa (frederik) - Monday, 05 October 2020, 12:52 GMT
Details
Description:
Bug in glibc causing several applications to crash, even on a fresh install.

Whenever attempting to compile or install software with pacman or yay, the following error message is seen:
sh: gconv_conf.c:506: __gconv_get_path: Assertion `cwd != NULL' failed.

Whenever using Google Chrome from AUR or Steam, a coredump happens (see attached dump log). This causes individual tabs to crash randomly and frequently. Several SSL errors can also be seen:
[38200:38218:0622/082004.851124:ERROR:ssl_client_socket_impl.cc(959)] handshake failed; returned -1, SSL error code 1, net_error -107

Eventually, systemd-coredump itself crashes:
automata.localdomain systemd-coredump[34265]: Failed to connect to coredump service: Connection refused

When using encrypted partitions, the filesystem gets corrupted (since the filesystem itself is interdependent on glibc). This ends up in an unrecoverable state.

Additional info:
* package version(s): glibc 2.31-5
* config and/or log files etc.: See attached glibcdump.txt
* link to upstream bug report, if any: Not sure if it's an upstream bug or just in Arch. Need to confirm.

Steps to reproduce:
1. Install latest Arch packages
2. Launch Google Chrome, attempt to use pacman, use Steam or even try to install yay (with Go package)

This bug might go unnoticed by several users, especially if they don't use Chrome/Steam/yay/flatpak etc.
Do you have appropriate microcode updates installed?
This isn't a graphical interface issue. I have the same problem with KDE.
journalctl -xf
Install Google Chrome and Steam (normal multilib) and then hammer away like a normal user.
On Chrome, I open a few tabs including watching YouTube videos and on Steam, I just open friend's list or click on any other page.
I will see a segfault in the logs.
I've installed amd-ucode and had my CPU governor set to performance, but I see this even without those. In fact, I even see the gconv errors in an arch-chroot environment when I use pacman.
System is an AMD Ryzen 3950X with 128 GB RAM and NVMe SSD (FireCuda) running the OS/apps.
I didn't see this last month, so something introduced by updates over the past month would be the culprit.
When you set up a new system with encryption, after 2 or 3 reboots you will see FS corruption with ext4 or xfs.
When you try to mkinitcpio or use yay to install apps, you will see the gconv errors (not always though as it's possible to repeat them enough that they don't show).
It's the gconv errors and the lib6 stack traces that led me to guess it's an issue with glibc.
I've also attached the stack trace with libcef.so (see attached) that Steam sees but Steam relies on the OS (for webkit). The whole Steam app is basically wrapped around a browser.
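To make the "I will see a segfault in the logs" step easier to compare between machines, here is a minimal sketch that tallies kernel segfault lines per process and faulting module. It assumes the usual kernel message shape (`name[pid]: segfault at ... error N in module[...]`); the sample line in the comment is illustrative and not taken from the attached dumps, and the script name `segfaults.py` is hypothetical.

```python
import re
from collections import Counter

# Assumed shape of a kernel segfault line, for example:
#   chrome[38200]: segfault at 0 ip 00007f... sp 00007ffd... error 4 in libc-2.31.so[7f...+178000]
# Process names containing spaces are only matched on their last word.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\s\[\]]+)\[(?P<pid>\d+)\]: segfault at \S+ ip \S+ sp \S+ error \d+"
    r"(?: in (?P<module>[^\s\[]+))?"
)


def count_segfaults(lines):
    """Count segfault lines per (process name, faulting module)."""
    counts = Counter()
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match:
            module = match.group("module") or "unknown"
            counts[(match.group("comm"), module)] += 1
    return counts


# Possible use, reading kernel messages piped in from the journal:
#   journalctl -k --no-pager | python segfaults.py
# with a small driver such as:
#   import sys
#   for (comm, module), n in count_segfaults(sys.stdin).most_common():
#       print(n, comm, module)
```

If the same module shows up for every crashing process, that points at a shared library; crashes spread over unrelated modules would be more consistent with a hardware fault.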
If you can not consistently trigger a segfault, I'd suggest checking your RAM.
Could this be systemd-related?
The reason I guessed this is that it does not seem to happen in a non-chroot environment (based on the Arch ISO) that runs systemd 243.162-2, but it does happen in my arch-chroot environment that runs the latest packages (systemd 245.6-7).
If any of you have an earlier systemd package I could use, I'd be happy to test it.
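Since the two environments differ in more than systemd, it may help to list every package whose version differs between them rather than guessing at one. The sketch below assumes two saved `pacman -Q` listings (one "name version" pair per line); the function names are hypothetical, and in the usage comment only the two systemd versions come from this report; the file names are made up.

```python
def parse_pacman_q(text):
    """Parse `pacman -Q` output ("name version" per line) into a dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            packages[name] = version
    return packages


def diff_versions(old_text, new_text):
    """Return {name: (old_version, new_version)} for packages that differ.

    A package present on only one side is reported with None for the
    side where it is missing.
    """
    old = parse_pacman_q(old_text)
    new = parse_pacman_q(new_text)
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed


# Possible use, after saving `pacman -Q` from the ISO environment to iso.txt
# and from the arch-chroot environment to chroot.txt (hypothetical file names):
#   with open("iso.txt") as a, open("chroot.txt") as b:
#       for name, (old, new) in diff_versions(a.read(), b.read()).items():
#           print(name, old, "->", new)
# A systemd entry would then read: systemd 243.162-2 -> 245.6-7
```

The resulting list is also a reasonable starting point for deciding which packages to roll back first.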
Please enlighten me with your conclusion then. Point me to the exact problem with this system.
If you cannot contribute anything, then don't. Do not however, assume that others are not willing to help.
It's because of toxic behavior like this that people are put off.
Whenever attempting to build anything with makepkg, I get a flood of this on stdout and the build fails -->
/usr/share/makepkg/util/pkgbuild.sh: line 31: 53205 Done { declare -f "$1" || declare -f package; } 2> /dev/null
53206 Aborted (core dumped) | grep -E "$2"
Please don't insult the bug wranglers for contributing by telling you to look elsewhere (EDIT: as in, not systemd -- somewhere else on your system!) for your problem rather than chasing wild red herrings based on completely unknowledgeable guesses that distract people from getting to the root of the problem.
If you can figure out what else on your system is causing this problem and it's something we can do about it, then we do want to hear anything you discovered... it's just that I don't believe you will find it by looking at systemd...
grep: gconv_conf.c:506: __gconv_get_path: Assertion `cwd != NULL' failed.
How was your Arch chroot created? Have you stripped any files from it to slim it down?
Thank you for your suggestions.
Although I've tested the RAM and the HDD shows fine, I'm going to take the machine apart piece by piece, put it all back together and test again.
Logic does state that it's a hardware problem before software (although I wish, of course, that it is the latter). It's quite rare for a stock CPU to POST and install an OS (and then fail).
Once I've done all of that, and if the issue persists, I'll try the Arch Rollback Machine first to see if I can isolate the problem.
Will update this ticket once i have the answers.
And I don't know if it's possible, but if you can get the Phoronix Test Suite installed on it (while live-booted), it should probably be able to stress things enough to shake out hardware issues.
i3 config:
exec --no-startup-id redshift-gtk
exec --no-startup-id blueman-applet
exec --no-startup-id udiskie -t
udiskie.txt (7.4 KiB)
blueman-tray.txt (7.1 KiB)
I haven't been able to reproduce the crashes the original reporter has mentioned with pacman or when compiling software in general. I also tried crashing Chromium and Steam, the logic being that they and Discord are all essentially a Chromium based browser under the hood,
but so far I can't make them crash. I haven't yet tested actual Google Chrome, as opposed to Chromium. Only Discord crashes and only when I'm in a voice call while running a particular game via wine + dxvk, and the i3 system tray needs to be enabled for it too.
Unless someone manages to reproduce the original issues I'd consider these separate.
The mention of filesystem corruption has me worried though, which is why I'm following this. I'll test with Chrome later, but I'm curious if the OP could provide the way in which they managed to crash grep, as seen in the second dump. That might be the easiest way for people to try and reproduce the issue.
Just to point out that memtest86 is not enough for testing memory. I had a laptop in 2011 which always passed memtest86 (several passes, many hours) with no errors. But Firefox, Java and mplayer randomly crashed from time to time (although not very frequently). One day gcc also crashed, which pointed me to a memory problem. Although I'm a programmer, most applications are not big enough to cause gcc to crash. So I tried compiling the Linux kernel: it always crashed at some point! Perhaps at 50% of progress, or at 33% or at 90% ... but it crashed, and that allowed me to easily find the culprit memory module. I ran Memtest86 again and it still did not find any memory problem.
I don't trust memtest86... It is just another test, but not the best memory stress test. Now I always compile kernels to test memory in a new machine... A high -j value did not increase the chance of a crash, but I have nevertheless resorted to huge -j values on a recent computer (beyond the hardware thread count) to fill the memory.
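The repeated-build approach above can be sketched as a small loop that runs the same command several times and records which runs failed. This is only an illustration: the helper name `run_repeatedly` is hypothetical, and the `make` invocation and `-j32` value in the usage comment are assumptions to adapt to the machine being tested.

```python
import subprocess


def run_repeatedly(command, runs):
    """Run `command` (an argument list) `runs` times.

    Returns the 1-based indices of the runs that exited with a non-zero
    status. A negative status means the process was killed by a signal,
    which is what a compiler crash typically looks like.
    """
    failures = []
    for attempt in range(1, runs + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures.append(attempt)
    return failures


# Possible use from inside a configured kernel source tree:
#   failed = run_repeatedly(["sh", "-c", "make clean && make -j32"], runs=5)
#   print("failed runs:", failed)
```

A build that fails at a different point on each run, with no source changes in between, is the pattern described above as pointing at bad memory rather than at the software.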