FS#67070 - [glibc] 2.31-5 causing several applications to crash

Attached to Project: Arch Linux
Opened by Sean Lingham (Cxpher) - Monday, 22 June 2020, 00:35 GMT
Last edited by freswa (frederik) - Monday, 05 October 2020, 12:52 GMT
Task Type Bug Report
Category Packages: Core
Status Closed
Assigned To freswa (frederik)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 5
Private No

Details

Description:

Bug in glibc causing several applications to crash even on a fresh install.

Whenever I attempt to compile or install software with pacman or yay, the following error message appears:

"sh: gconv_conf.c:506: __gconv_get_path: Assertion `cwd != NULL' failed."

Whenever using Google Chrome from AUR or Steam, the following coredump happens (see attached dump log). This causes individual tabs to crash randomly and frequently. Several SSL errors can also be seen --> [38200:38218:0622/082004.851124:ERROR:ssl_client_socket_impl.cc(959)] handshake failed; returned -1, SSL error code 1, net_error -107.

Eventually, systemd-coredump itself crashes --> automata.localdomain systemd-coredump[34265]: Failed to connect to coredump service: Connection refused

When using encrypted partitions, the filesystem gets corrupted (since the filesystem itself is interdependent on glibc). This ends up in an unrecoverable state.

Additional info:
* package version(s)
glibc 2.31-5
* config and/or log files etc.
See attached glibcdump.txt
* link to upstream bug report, if any
Not sure if it's an upstream bug or just in Arch. Need to confirm.

Steps to reproduce:
1. Install latest Arch packages
2. Launch Google Chrome, attempt to use pacman, use Steam or even try to install yay (with Go package)

This bug might go unnoticed by several users, especially if they don't use Chrome/Steam/yay/flatpak etc.

Closed by  freswa (frederik)
Monday, 05 October 2020, 12:52 GMT
Reason for closing:  No response
Comment by Allan McRae (Allan) - Monday, 22 June 2020, 01:52 GMT
Nothing in the attached stacktrace suggests this is a glibc issue.

Do you have appropriate microcode updates installed?
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 02:28 GMT
Yes. I have amd-ucode installed.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 04:46 GMT
Attached another stack trace.

This isn't a graphical interface issue. I have the same problem with KDE.
Comment by Allan McRae (Allan) - Monday, 22 June 2020, 04:54 GMT
Can you consistently replicate the grep segfault? What is the command you used?
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 05:38 GMT
All I do is just

journalctl -xf

Install Google Chrome and Steam (normal multilib) and then hammer away like a normal user.

On Chrome, I open a few tabs including watching YouTube videos and on Steam, I just open friend's list or click on any other page.

I will see a segfault in the logs.

I've installed amd-ucode and had my governor set to performance for the CPU, but I see this even without those. In fact, I even see the gconv errors in an arch-chroot environment when I use pacman.

System is an AMD Ryzen 3950x with 128 GB RAM and NVMe SSD (FireCuda) running the OS/apps.

I didn't see this last month, so something introduced by way of updates over the past month would be the culprit.

When you setup a new system with encryption, upon 2 or 3 reboots, you will see FS corruption with ext4 or xfs.

When you try to mkinitcpio or use yay to install apps, you will see the gconv errors (not always though as it's possible to repeat them enough that they don't show).

It's the gconv errors and the lib6 stack traces that led me to guess it's an issue with glibc.

I've also attached the stack trace with libcef.so (see attached) that Steam sees but Steam relies on the OS (for webkit). The whole Steam app is basically wrapped around a browser.
Comment by Allan McRae (Allan) - Monday, 22 June 2020, 06:09 GMT
All stack traces end in glibc. Very few are caused by it.

If you can not consistently trigger a segfault, I'd suggest checking your RAM.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 17:18 GMT
Have tested the RAM with a full pass of memtest86+ and found no issues with it.

Could be systemd related yes?
Comment by Doug Newgard (Scimmia) - Monday, 22 June 2020, 17:39 GMT
How did you come to that conclusion?
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 17:55 GMT
It's just a guess. The issue seems complex.

The reason i guessed this is because it does not seem to happen in a non chroot env (based on arch iso) that runs systemd 243.162-2 but it happens on my arch-chroot env that runs the latest packages (systemd 245.6-7).

If any of you have an earlier systemd package i could use, i'd be happy to test it.
Comment by Doug Newgard (Scimmia) - Monday, 22 June 2020, 17:59 GMT
Don't make wild guesses that make no sense at all.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 18:30 GMT
Really?

Please enlighten me with your conclusion then. Point me to the exact problem with this system.

If you cannot contribute anything, then don't. Do not however, assume that others are not willing to help.

It's because of toxic behavior like this that people are put off.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 18:54 GMT
100% reproducible with makepkg on this system. See attached log.

Whenever attempting to build anything with makepkg, i get spam of this on stdout and it fails -->

/usr/share/makepkg/util/pkgbuild.sh: line 31: 53205 Done { declare -f "$1" || declare -f package; } 2> /dev/null
53206 Aborted (core dumped) | grep -E "$2"
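
The "Aborted (core dumped)" entry for grep in the pipeline above means the process was killed by a signal rather than exiting normally. A minimal Python sketch of running grep on its own, outside makepkg, and reporting how it ended; the helper names `describe_exit` and `run_grep` are made up for illustration, and the pattern and input text are placeholders, not the ones makepkg uses:

```python
import signal
import subprocess


def describe_exit(returncode):
    """Translate a subprocess return code into a short description."""
    if returncode >= 0:
        return f"exited with status {returncode}"
    # subprocess reports death by signal N as a return code of -N.
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    return f"killed by {name}"


def run_grep(pattern, text):
    """Run `grep -E pattern` on text; return (how it ended, its stderr)."""
    proc = subprocess.run(
        ["grep", "-E", pattern],
        input=text,
        capture_output=True,
        text=True,
    )
    return describe_exit(proc.returncode), proc.stderr.strip()
```

Calling `run_grep("foo", "foo\nbar\n")` from the same directory and environment in which makepkg fails should show whether grep alone hits the assertion, and the returned stderr would carry the gconv message if it does.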

Comment by Eli Schwartz (eschwartz) - Monday, 22 June 2020, 18:58 GMT
arch-chroot doesn't actually invoke systemd though, systemd is an init system and the /usr/bin/chroot program doesn't invoke it in any way.

Please don't insult the bug wranglers for contributing by telling you to look elsewhere (EDIT: as in, not systemd -- somewhere else on your system!) for your problem rather than chasing wild red herrings based on completely unknowledgeable guesses that distract people from getting to the root of the problem.
Comment by Eli Schwartz (eschwartz) - Monday, 22 June 2020, 21:49 GMT
I'm concerned that you might have read my statement "telling you to look elsewhere for your problem" as referring to "go away". I would like to clarify that I meant looking elsewhere on your system, rather than looking at systemd.

If you can figure out what else on your system is causing this problem and it's something we can do about it, then we do want to hear anything you discovered... it's just that I don't believe you will find it by looking at systemd...
Comment by AK (Andreaskem) - Monday, 22 June 2020, 21:55 GMT
The comment, "When you setup a new system with encryption, upon 2 or 3 reboots, you will see FS corruption with ext4 or xfs" indicates either a hardware or a kernel issue, I would think? Is this with a vanilla kernel package? Is your kernel tainted?
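
For reference, the taint state asked about here can be read from /proc/sys/kernel/tainted, which holds a bitmask. A rough Python sketch of decoding it; only a subset of the documented bits is listed, and the function names are illustrative rather than part of any existing tool:

```python
# Subset of the kernel taint bits; the kernel documentation lists more.
TAINT_FLAGS = {
    0: "proprietary module loaded",
    4: "machine check exception occurred",
    5: "bad page referenced",
    7: "kernel oops occurred",
    9: "kernel warning issued",
    12: "out-of-tree module loaded",
    13: "unsigned module loaded",
}


def decode_taint(value):
    """Return descriptions for the set bits that appear in TAINT_FLAGS."""
    return [desc for bit, desc in sorted(TAINT_FLAGS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask as an integer."""
    with open(path) as handle:
        return int(handle.read().strip())
```

A value of 0 means the kernel is not tainted; `decode_taint(read_taint())` gives a readable list otherwise. The machine check and bad page bits would be the interesting ones when hardware is suspected.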
Comment by Allan McRae (Allan) - Tuesday, 23 June 2020, 06:51 GMT
All the grep issues show this:

grep: gconv_conf.c:506: __gconv_get_path: Assertion `cwd != NULL' failed.

How was your Arch chroot created? Have you stripped any files from it to slim it down?
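
The assertion text suggests glibc could not obtain the current working directory while building its gconv search path. As an assumption (not checked against the 2.31 source in this thread), that lookup should only matter when GCONV_PATH contains a relative entry. A small Python sketch that checks both conditions from inside the chroot; the helper names are hypothetical:

```python
import os


def relative_gconv_entries(env):
    """Return the GCONV_PATH entries that are not absolute paths."""
    raw = env.get("GCONV_PATH", "")
    return [entry for entry in raw.split(":") if entry and not entry.startswith("/")]


def cwd_is_resolvable():
    """Return True if the current working directory can still be looked up."""
    try:
        os.getcwd()
    except OSError:
        # Raised, for example, when the directory has been deleted.
        return False
    return True


def report(env=None):
    """Collect both checks for the given environment (default: os.environ)."""
    env = os.environ if env is None else env
    return {
        "relative_gconv_entries": relative_gconv_entries(env),
        "cwd_resolvable": cwd_is_resolvable(),
    }
```

Running `report()` in the shell session where grep aborts would show whether either condition holds; if neither does, the cause is likely elsewhere.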
Comment by Sean Lingham (Cxpher) - Tuesday, 23 June 2020, 09:48 GMT
Hello everyone.

Thank you for your suggestions.

Although i've tested the RAM and the HDD shows fine, i'm going to take it apart piece by piece and put it all back together and test again.

Logic does state that it's a hardware problem before software (although I wish of course that it is the latter). It's quite rare for a stock CPU to POST and install an OS (and fail).

Once i've done all of that and if the issue persists, i'll try Arch Rollback Machine first to see if i can isolate the problem.

Will update this ticket once i have the answers.
Comment by Martin Sandsmark (sandsmark) - Friday, 07 August 2020, 13:42 GMT
You could try a more "complex" live boot as well. Maybe try Manjaro's live CD?

And I don't know if it's possible, but if you got phoronix' test suite installed on it (while live-booted) it should probably be able to stress things enough to shake out hardware issues.
Comment by itsme (itsme) - Friday, 07 August 2020, 17:31 GMT
I have a similar problem with tray apps when i3 starts.

i3 config:
exec --no-startup-id redshift-gtk
exec --no-startup-id blueman-applet
exec --no-startup-id udiskie -t
Comment by Nils Siemons (re-l124c41) - Saturday, 08 August 2020, 17:23 GMT
@itsme I'm not sure if those are connected at all. I also have some curious crashes if I have the system tray enabled for i3 with i3bar, but only Discord crashes so far and only under very particular circumstances.
I haven't been able to reproduce the crashes the original reporter has mentioned with pacman or when compiling software in general. I also tried crashing Chromium and Steam, the logic being that they and Discord are all essentially a Chromium based browser under the hood, but so far I can't make them crash.
I haven't yet tested actual Google Chrome, as opposed to Chromium. Only Discord crashes and only when I'm in a voice call while running a particular game via wine + dxvk, and the i3 system tray needs to be enabled for it too.
Unless someone manages to reproduce the original issues I'd consider these separate.

The mention of filesystem corruption has me worried though, which is why I'm following this. I'll test with Chrome later, but I'm curious if the OP could provide the way in which they managed to crash grep, as seen in the second dump. That might be the easiest way for people to try and reproduce the issue.
Comment by loqs (loqs) - Saturday, 08 August 2020, 18:41 GMT
@re-l124c41 do your backtraces resemble itsme's? The thread that crashed was started by python; the only glibc call in it is __libc_start_main, and a fault in libgtk-3, libgdk-3, glib2 or X11 results in _XError finally calling g_log_writer_default?
Comment by Nils Siemons (re-l124c41) - Saturday, 08 August 2020, 23:21 GMT
@loqs I have attached a log for one of the crashes of Discord.
Comment by itsme (itsme) - Wednesday, 12 August 2020, 06:22 GMT
@re-l124c41 yes, looks like it's not related: https://github.com/i3/i3/issues/4159
Comment by Ciriaco Garcia de Celis (cgarcia) - Sunday, 23 August 2020, 11:17 GMT
Related to hardware problems...

Just to point out that memtest86 is not enough for testing memory. I had a laptop in 2011 which always passed memtest86 (several passes, lots of hours) with no errors. But firefox, java and mplayer randomly crashed from time to time (although not very frequently). One day gcc also crashed, which pointed me to a memory problem. Although I'm a programmer, most applications are not big enough to cause gcc to crash. So I tried compiling the Linux kernel: it always crashed at some point! Perhaps at 50% of progress, or at 33% or at 90% ... but it crashed and allowed me to easily find the culprit memory module. I ran Memtest86 again and it did not find any memory problem.

I don't trust memtest86... It is just another test, but not the best memory stress test. Now I always compile kernels for testing memory in a new machine... A high -j parameter did not increase the crash chances, but anyway I have resorted to huge -j values in a recent computer (beyond the hardware threads count) to fill the memory.
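
The idea behind using a kernel build as a test is that a long deterministic workload should give the same result every time on healthy hardware. A much cruder userspace sketch of the same principle in Python: fill memory with random blocks, remember a digest for each, and re-check them repeatedly. This is only an illustration with made-up function names and default sizes; it is far weaker than a kernel compile or a dedicated tester, since data may be served from CPU caches and only a small part of RAM is touched:

```python
import hashlib
import os


def fill_blocks(block_count, block_size):
    """Allocate blocks of random bytes and remember a SHA-256 digest for each."""
    blocks = []
    for _ in range(block_count):
        data = os.urandom(block_size)
        blocks.append((data, hashlib.sha256(data).digest()))
    return blocks


def find_corrupted(blocks):
    """Re-hash every block and return the indices whose digest no longer matches."""
    return [
        index
        for index, (data, digest) in enumerate(blocks)
        if hashlib.sha256(data).digest() != digest
    ]


def stress(block_count=64, block_size=1 << 20, passes=3):
    """Fill memory once, verify it several times, return corrupted block indices."""
    blocks = fill_blocks(block_count, block_size)
    bad = set()
    for _ in range(passes):
        bad.update(find_corrupted(blocks))
    return sorted(bad)
```

An empty list from `stress()` proves very little, but any non-empty result would be a strong hint of a hardware fault.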
Comment by freswa (frederik) - Sunday, 13 September 2020, 14:48 GMT
Anyone here still having this issue with 2.32-4?
