Arch Linux

FS#67070 - Bug in glibc (2.31-5) causing several applications to crash

Attached to Project: Arch Linux
Opened by Sean Lingham (Cxpher) - Monday, 22 June 2020, 00:35 GMT
Last edited by Eli Schwartz (eschwartz) - Monday, 22 June 2020, 21:47 GMT
Task Type Bug Report
Category Packages: Core
Status Assigned   Reopened
Assigned To Bartłomiej Piotrowski (Barthalion)
Architecture x86_64
Severity Critical
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 0%
Votes 2
Private No

Details

Description:

Bug in glibc causing several applications to crash even on a fresh install.

Whenever attempting to compile or install software with pacman or yay, the following error message is seen

"sh: gconv_conf.c:506: __gconv_get_path: Assertion `cwd != NULL' failed."
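
The assertion names a variable called `cwd`. Assuming that variable holds the result of a current-working-directory lookup (an assumption, not something confirmed in this report), one ordinary way such a lookup can fail on Linux is when the process's working directory no longer exists. The sketch below only reproduces that general condition from Python; it does not touch glibc's gconv code, and the helper names `getcwd_or_none` and `demo_removed_cwd` are hypothetical.

```python
import os
import tempfile


def getcwd_or_none():
    """Return the current working directory, or None if the lookup fails."""
    try:
        return os.getcwd()
    except OSError:
        return None


def demo_removed_cwd():
    """Look up the working directory after it has been deleted."""
    original = os.getcwd()
    scratch = tempfile.mkdtemp()  # hypothetical scratch directory
    try:
        os.chdir(scratch)
        os.rmdir(scratch)  # the process now sits in a directory that is gone
        return getcwd_or_none()
    finally:
        os.chdir(original)  # restore a valid working directory


if __name__ == "__main__":
    print("lookup after removal:", demo_removed_cwd())
```

If the lookup fails this way inside a chroot, it may be worth checking which directory the failing command was started from before suspecting the library itself.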

Whenever using Google Chrome from AUR or Steam, the following coredump happens (see attached dump log). This causes individual tabs to crash randomly and frequently. Several SSL errors can also be seen --> [38200:38218:0622/082004.851124:ERROR:ssl_client_socket_impl.cc(959)] handshake failed; returned -1, SSL error code 1, net_error -107.

Eventually, systemd coredump itself crashes --> automata.localdomain systemd-coredump[34265]: Failed to connect to coredump service: Connection refused

When using encrypted partitions, the filesystem gets corrupted (since the filesystem itself is inter-dependent on glibc). This ends up in an unrecoverable state.

Additional info:
* package version(s)
glibc 2.31-5
* config and/or log files etc.
See attached glibcdump.txt
* link to upstream bug report, if any
Not sure if it's upstream bug or just in Arch. Need to confirm

Steps to reproduce:
1. Install latest Arch packages
2. Launch Google Chrome, attempt to use pacman, use Steam or even try to install yay (with Go package)

This bug might go unnoticed by many users, especially if they don't use Chrome/Steam/yay/flatpak etc.

Comment by Allan McRae (Allan) - Monday, 22 June 2020, 01:52 GMT
Nothing in the attached stacktrace suggests this is a glibc issue.

Do you have appropriate microcode updates installed?
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 02:28 GMT
Yes. I have amd-ucode installed.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 04:46 GMT
Attached another stack trace.

This isn't a graphical interface issue. I have the same problem with KDE.
Comment by Allan McRae (Allan) - Monday, 22 June 2020, 04:54 GMT
Can you consistently replicate the grep segfault? What is the command you used?
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 05:38 GMT
All I do is just

journalctl -xf

Install Google Chrome and Steam (normal multilib) and then hammer away like a normal user.

On Chrome, I open a few tabs including watching YouTube videos and on Steam, I just open friend's list or click on any other page.

I will see a segfault in the logs.

I've installed amd-ucode and had my governor set to performance for the CPU, but I see this even without those. In fact, I even see the gconv errors in an arch-chroot environment when I use pacman.

System is an AMD Ryzen 3950x with 128 GB RAM and NVMe SSD (FireCuda) running the OS/apps.

I didn't see this last month, so something introduced by way of updates over the past month would be the culprit.

When you set up a new system with encryption, upon 2 or 3 reboots, you will see FS corruption with ext4 or xfs.

When you try to mkinitcpio or use yay to install apps, you will see the gconv errors (not always though as it's possible to repeat them enough that they don't show).

It's the gconv errors and the lib6 stack traces that led me to guess it's an issue with glibc.

I've also attached the stack trace with libcef.so (see attached) that Steam sees but Steam relies on the OS (for webkit). The whole Steam app is basically wrapped around a browser.
Comment by Allan McRae (Allan) - Monday, 22 June 2020, 06:09 GMT
All stack traces end in glibc. Very few are caused by it.

If you can not consistently trigger a segfault, I'd suggest checking your RAM.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 09:19 GMT
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 17:18 GMT
Have tested the RAM with a full pass of memtest86+ and found no issues with it.

Could it be systemd-related?
Comment by Doug Newgard (Scimmia) - Monday, 22 June 2020, 17:39 GMT
How did you come to that conclusion?
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 17:55 GMT
It's just a guess. The issue seems complex.

The reason i guessed this is because it does not seem to happen in a non chroot env (based on arch iso) that runs systemd 243.162-2 but it happens on my arch-chroot env that runs the latest packages (systemd 245.6-7).

If any of you have an earlier systemd package i could use, i'd be happy to test it.
Comment by Doug Newgard (Scimmia) - Monday, 22 June 2020, 17:59 GMT
Don't make wild guesses that make no sense at all.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 18:30 GMT
Really?

Please enlighten me with your conclusion then. Point me to the exact problem with this system.

If you cannot contribute anything, then don't. Do not however, assume that others are not willing to help.

It's because of toxic behavior like this that people are put off.
Comment by Sean Lingham (Cxpher) - Monday, 22 June 2020, 18:54 GMT
100% reproducible with makepkg on this system. See attached log.

Whenever attempting to build anything with makepkg, i get spam of this on stdout and it fails -->

/usr/share/makepkg/util/pkgbuild.sh: line 31: 53205 Done { declare -f "$1" || declare -f package; } 2> /dev/null
53206 Aborted (core dumped) | grep -E "$2"
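
For readers unfamiliar with the "Aborted (core dumped)" status: it means the process was terminated by SIGABRT, which is what a failed `assert()` raises through `abort()`. A minimal Python sketch of how a parent process can tell an abort apart from an ordinary non-zero exit is shown here. The child command is a stand-in that aborts on purpose, not the `grep` from the log, and the helper names are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable status."""
    if returncode < 0:
        # On POSIX, a negative value is the number of the fatal signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_aborting_child():
    """Run a stand-in child that calls abort(), and return its code."""
    proc = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


if __name__ == "__main__":
    print(describe_exit(run_aborting_child()))
```

A signal-terminated child in a pipeline, as opposed to a plain exit status of 1, points at a crash in the program rather than at "no match found".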

Comment by Eli Schwartz (eschwartz) - Monday, 22 June 2020, 18:58 GMT
arch-chroot doesn't actually invoke systemd though, systemd is an init system and the /usr/bin/chroot program doesn't invoke it in any way.

Please don't insult the bug wranglers for contributing by telling you to look elsewhere (EDIT: as in, not systemd -- somewhere else on your system!) for your problem rather than chasing wild red herrings based on completely unknowledgeable guesses that distract people from getting to the root of the problem.
Comment by Eli Schwartz (eschwartz) - Monday, 22 June 2020, 21:49 GMT
I'm concerned that you might have read my statement "telling you to look elsewhere for your problem" as referring to "go away". I would like to clarify that I meant looking elsewhere on your system, rather than looking at systemd.

If you can figure out what else on your system is causing this problem and it's something we can do something about, then we do want to hear anything you discovered... it's just that I don't believe you will find it by looking at systemd...
Comment by AK (Andreaskem) - Monday, 22 June 2020, 21:55 GMT
The comment, "When you setup a new system with encryption, upon 2 or 3 reboots, you will see FS corruption with ext4 or xfs" indicates either a hardware or a kernel issue, I would think? Is this with a vanilla kernel package? Is your kernel tainted?
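
As a rough way to answer the taint question, the kernel exposes its taint state as a bitmask in `/proc/sys/kernel/tainted`, where 0 means not tainted. The sketch below reads and decodes that value. Only two bit meanings are spelled out (bit 0, proprietary module loaded; bit 12, out-of-tree module loaded); every other set bit is reported by number only, and the helper names are hypothetical.

```python
from pathlib import Path

# Only two well-known bits are named here; the table is deliberately partial.
KNOWN_BITS = {
    0: "proprietary module loaded",
    12: "out-of-tree module loaded",
}


def set_bits(mask):
    """Return the positions of all set bits in the taint mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def describe_taint(mask):
    """Return one description per set bit; an empty list means not tainted."""
    return [KNOWN_BITS.get(bit, f"bit {bit}") for bit in set_bits(mask)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the kernel's taint bitmask as an integer."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        mask = read_taint()
    except OSError:
        print("taint value not readable on this system")
    else:
        print(mask, describe_taint(mask) or "not tainted")
```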
Comment by Allan McRae (Allan) - Tuesday, 23 June 2020, 06:51 GMT
All the grep issues show this:

grep: gconv_conf.c:506: __gconv_get_path: Assertion `cwd != NULL' failed.

How was your Arch chroot created? Have you stripped any files from it to slim it down?
Comment by Sean Lingham (Cxpher) - Tuesday, 23 June 2020, 09:48 GMT
Hello everyone.

Thank you for your suggestions.

Although I've tested the RAM and the HDD shows fine, I'm going to take it apart piece by piece and put it all back together and test again.

Logic does state that it's a hardware problem before software (although I wish of course that it is the latter). It's quite rare for a stock CPU to POST and install an OS (and fail).

Once i've done all of that and if the issue persists, i'll try Arch Rollback Machine first to see if i can isolate the problem.

Will update this ticket once i have the answers.
