FS#79444 - [linux-hardened] 6.4.9+ breaks AVX enumeration

Attached to Project: Arch Linux
Opened by CodingCellist (CodingCellist) - Tuesday, 22 August 2023, 17:03 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:19 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Levente Polyak (anthraxx)
Architecture All
Severity High
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 3
Private No

Details

Description:

When using linux-hardened 6.4.9-1 or 6.4.10-1 (6.4.10-1 is the latest at time of writing) along with lightdm and lightdm-webkit2-greeter, the system fails to start the greeter due to a coredump caused by a missing/removed part of libatomic: host-config.h (see the logs for details).

This makes logging in impossible, as the continuous crash-and-restart of the greeter (re-)focuses its tty.

Downgrading to linux-hardened 6.4.7-2 fixes the problem; downgrading any of the lightdm or webkit2 packages does not seem to affect things.


Additional info:

* link to upstream bug report:
https://github.com/anthraxx/linux-hardened/issues/85

* package version(s):
- lightdm: 1:1.32.0-4
- lightdm-webkit2-greeter: 2.2.5-7
- webkit2gtk: 2.40.5-1
- systemd: 254.1-1

* config and/or log files etc. attached (redacted for brevity, please let me know if I accidentally removed too much)


Steps to reproduce:

1. Set up an Arch machine running linux-hardened 6.4.7 (rel 1 or 2), along with lightdm and lightdm-webkit2-greeter. Booting and logging in should work at this stage.

2. Upgrade the kernel to linux-hardened 6.4.9-1 or 6.4.10-1 and reboot. The greeter should never appear, leaving only a flickering cursor.

3. Attempt to switch tty using Ctrl+Alt+F3 (for example). This should briefly work when repeatedly pressing the key, although the restarting greeter will force you back to its tty within a second.

At this point, the only recovery method I thought of was to go via the install ISO on a USB, mounting the system manually, chroot-ing, and then downgrading the kernel from there. If there is an easier one, I'd be grateful to know, although my system is working now.

I'm happy to provide more information if need be.

Closed by  Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:19 GMT
Reason for closing:  Moved
Additional comments about closing: https://gitlab.archlinux.org/archlinux/packaging/packages/linux-hardened/issues/1
Comment by loqs (loqs) - Tuesday, 22 August 2023, 17:41 GMT
Does adding the boot parameter spec_rstack_overflow=off have any effect? Can you reproduce the issue using the linux package?

I suspect host-config.h not being found by the debugger is not relevant as it is not used at run time and not included by upstream as it is not referenced in any public headers.
Comment by CodingCellist (CodingCellist) - Wednesday, 23 August 2023, 10:01 GMT
Thanks for the suggestions and insights!

Unfortunately, spec_rstack_overflow=off does not seem to change anything.
The issue does not repro on linux (non-hardened) 6.4.{9,10,11}. There was a slight oddity in that switching to 6.4.9 and 6.4.10 seemed to require 2 reboots for the greeter to work, but after the second reboot, it persistently worked. 6.4.11 worked out-of-the-box.
Comment by loqs (loqs) - Wednesday, 23 August 2023, 11:00 GMT
Is the issue still present with linux-hardened 6.4.11.hardened1-1? If so, does building linux (non-hardened) with the config from linux-hardened reproduce the issue, and does building linux-hardened with the config from linux avoid it?
Comment by CodingCellist (CodingCellist) - Wednesday, 23 August 2023, 13:26 GMT
linux-hardened 6.4.11 still presents the issue, yes (log attached).

I've swapped the configs for the arch kernels and I'm currently rebuilding the linux-hardened with the non-hardened config. It's been going for ~40 minutes; hopefully it'll be done soon.

Rebuilding the non-hardened package with the hardened config is proving difficult: the pgp-verification keeps failing on public key '3B94A80E50A477C7'. This is, according to the keyservers, heftig's key, although neither searching+importing the public key via GPG, nor importing it from the public key file found via "archlinux/people/developers" -> "heftig" -> "PGP Key" resolves the problem. I was missing anthraxx's key as well, but --search-keys resolved that without any issue.
Comment by CodingCellist (CodingCellist) - Wednesday, 23 August 2023, 14:30 GMT
Installing the kernel resulting from linux-hardened's source, but with the regular linux package's config does not reproduce the problem. This is the kernel I've just rebooted with, and it got to the greeter without any problem (log attached).

I'm going to try to figure out the gpg public key issue to hopefully get the linux package to build with the linux-hardened config. Any pointers and/or ideas as to what might be wrong would be much appreciated : )
Comment by Levente Polyak (anthraxx) - Wednesday, 23 August 2023, 14:42 GMT
@CodingCellist: You can find the key in the arch linux keyring, a shorthand for importing would be:
```
gpg --import <(pacman-key --export 3B94A80E50A477C7)
```
Comment by CodingCellist (CodingCellist) - Wednesday, 23 August 2023, 14:56 GMT
@anthraxx cheers! Now building the linux package with the hardened config. Will report back when that's been tested : )
Comment by CodingCellist (CodingCellist) - Wednesday, 23 August 2023, 20:42 GMT
Managed to get it to compile: in order for the headers to build, I had to manually compile src/archlinux-linux/tools/bpf after launching makepkg. It seems this library is built by the linux-hardened package, but not by the default linux package.

The resulting linux kernel+headers, compiled from the linux package source using the linux-hardened config, reproduces the issue (log attached). As mentioned earlier, the reverse, compiling the linux-hardened sources with the linux (non-hardened) config does not reproduce it.
Comment by loqs (loqs) - Thursday, 24 August 2023, 10:45 GMT
Does adding the boot parameter gather_data_sampling=off have any effect?
Comment by CodingCellist (CodingCellist) - Friday, 25 August 2023, 13:20 GMT
@loqs, yes that fixed it! All of the affected packages (6.4.9, 6.4.10, and 6.4.11) launch the greeter without any issue if gather_data_sampling=off is set.
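For anyone checking a running system: the state of the mitigation can usually be read back at runtime rather than inferred from the boot parameters. The following is only a sketch in C, assuming the kernel exposes the GDS status at the sysfs path used below (it may be absent on older or unpatched kernels); the helper names read_first_line and print_gds_status are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Assumed sysfs location of the GDS status line; adjust if your kernel differs. */
#define GDS_STATUS_PATH "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"

/* Hypothetical helper: copy the first line of `path` into `buf`,
 * without the trailing newline. Returns 0 on success, -1 on failure. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the GDS status line, or a note when the file is not available. */
void print_gds_status(void)
{
    char line[256];
    if (read_first_line(GDS_STATUS_PATH, line, sizeof line) == 0)
        printf("gather_data_sampling: %s\n", line);
    else
        printf("gather_data_sampling: status file not available\n");
}
```

Calling print_gds_status() once with the default boot parameters and once with gather_data_sampling=off should make the difference between the two boots visible.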
Comment by loqs (loqs) - Friday, 25 August 2023, 14:08 GMT
GDS_FORCE_MITIGATION=ON is known to break applications that perform incomplete AVX enumeration [1].
[2] Section 14.3, DETECTION OF INTEL® AVX INSTRUCTIONS, provides flow diagrams and pseudocode showing how to enumerate AVX support, along with the following note:
Note: It is unwise for an application to rely exclusively on CPUID.1:ECX.AVX[bit 28] or at all on CPUID.1:ECX.XSAVE[bit 26]: These indicate hardware support but not operating system support. If YMM state management is not enabled by an operating systems, Intel AVX instructions will #UD regardless of CPUID.1:ECX.AVX[bit 28]. “CPUID.1:ECX.XSAVE[bit 26] = 1” does not guarantee the OS actually uses the XSAVE process for state management.
[3] Provides C source for detecting AVX.
Edit:
Does building webkit2gtk with the attached patch have any effect?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=553a5c03e90a6087e88f8ff878335ef0621536fb
[2] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html
[3] https://www.intel.com/content/dam/develop/external/us/en/documents/intro-to-intel-avx-183287.pdf
Comment by CodingCellist (CodingCellist) - Monday, 28 August 2023, 08:18 GMT
The patch doesn't seem to fix things unfortunately. With a patched version of webkit2gtk installed and gather_data_sampling (edit: enabled), the greeter still crashes and restarts; seemingly with the same error in the logs.
Comment by loqs (loqs) - Monday, 28 August 2023, 11:54 GMT
What is the output if you compile and execute the attached c source?
   test.c (0.2 KiB)
Comment by CodingCellist (CodingCellist) - Monday, 28 August 2023, 14:33 GMT
Non-zero (system booted with a kernel with GDS turned off via command-line args). Although it changes between Clang and GCC:

[thomas@skidbladnir testc]$ ./gcc.out
__builtin_cpu_supports ("avx"):512
__builtin_cpu_supports ("avx2"):1024
[thomas@skidbladnir testc]$ ./clang.out
__builtin_cpu_supports ("avx"):1
__builtin_cpu_supports ("avx2"):1
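The differing numbers are not a disagreement between the compilers: GCC documents __builtin_cpu_supports as returning a positive integer when the feature is supported and 0 otherwise, so 512, 1024 and 1 all mean "supported". A minimal sketch that normalises the result to 0 or 1 is below. It is not the attached test.c, and the function names has_avx, has_avx2 and print_avx_support are made up for this example.

```c
#include <stdio.h>

/* Collapse the compiler-specific positive value to a plain 0/1. */
int has_avx(void)
{
    __builtin_cpu_init();   /* harmless if the CPU model is already initialised */
    return __builtin_cpu_supports("avx") != 0;
}

int has_avx2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
}

/* Print both results in a form that is the same under GCC and Clang. */
void print_avx_support(FILE *out)
{
    fprintf(out, "avx usable:  %d\n", has_avx());
    fprintf(out, "avx2 usable: %d\n", has_avx2());
}
```

Comparing against 0, rather than against 1, keeps the check correct with either compiler.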
Comment by loqs (loqs) - Monday, 28 August 2023, 17:05 GMT
Last attempt at runtime detection, this uses code from Intel and FFmpeg.
In any event the issue needs to be reported to upstream webkit to fix. I suspect wc -l will also be broken [1] as __builtin_cpu_supports ("avx2") is not returning 0.

[1] https://git.savannah.gnu.org/cgit/coreutils.git/commit/src/wc.c?id=91a74d361461494dd546467e83bc36c24185d6e7
   test.c (1.1 KiB)
Comment by CodingCellist (CodingCellist) - Tuesday, 29 August 2023, 15:06 GMT
wc -l does indeed break in the same way when GDS is not disabled. The testing I did with your (loqs) program was with the mitigation disabled via the command line, so it seems it was outputting correctly. I reran the tests, both code and assembler, without disabling GDS (i.e. with the default boot parameters); as far as I can tell, the output is correct:

----- normal -----

asm-clang.out
CPU supports AVX:0

asm-gcc.out
CPU supports AVX:0

code-clang.out
__builtin_cpu_supports ("avx"):0
__builtin_cpu_supports ("avx2"):0

code-gcc.out
__builtin_cpu_supports ("avx"):0
__builtin_cpu_supports ("avx2"):0

----- gds=off -----

asm-clang.out
CPU supports AVX:1

asm-gcc.out
CPU supports AVX:1

code-clang.out
__builtin_cpu_supports ("avx"):1
__builtin_cpu_supports ("avx2"):1

code-gcc.out
__builtin_cpu_supports ("avx"):512
__builtin_cpu_supports ("avx2"):1024

Comment by CodingCellist (CodingCellist) - Tuesday, 29 August 2023, 15:09 GMT
I also tried adding some logging to the webkit2gtk code. It again seems to confirm that the __builtin_cpu_supports code is working as expected, but unfortunately doesn't seem to fix the issue...
Comment by CodingCellist (CodingCellist) - Tuesday, 26 September 2023, 07:45 GMT
Bugreport submitted to webkit upstream: https://bugs.webkit.org/show_bug.cgi?id=262100
Comment by Caham Everan (olcammy) - Tuesday, 26 September 2023, 19:11 GMT
Having this same issue with 6.5.4-hardened1-1-hardened, causing a number of applications to crash. Thought at first it was this issue: https://bugs.archlinux.org/task/79603. Vanilla kernel avoids the Illegal Instruction.
Comment by loqs (loqs) - Tuesday, 26 September 2023, 19:34 GMT
@olcammy for gimp, see [1]; it is waiting for someone to report the issue upstream to babl. For firefox, see [2].

[1] https://bbs.archlinux.org/viewtopic.php?id=288816
[2] https://bbs.archlinux.org/viewtopic.php?id=289037
Comment by loqs (loqs) - Thursday, 28 September 2023, 10:57 GMT
@olcammy if you rebuild babl with the attached diff that disables the use of AVX2 at build time are you able to use gimp?
Comment by Toolybird (Toolybird) - Tuesday, 03 October 2023, 05:48 GMT
Merged here  FS#79828 
Comment by Pascal Ernster (hardfalcon) - Friday, 03 November 2023, 07:07 GMT
Comment by loqs (loqs) - Friday, 03 November 2023, 20:51 GMT
> Potentially related: https://github.com/google/security-research/tree/master/pocs/cpus/xgetbv
From the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 Chapter 14.3 DETECTION OF INTEL® AVX INSTRUCTIONS:
1) Detect CPUID.1:ECX.OSXSAVE[bit 27] = 1 (XGETBV enabled for application use[1]).
2) Issue XGETBV and verify that XCR0[2:1] = ‘11b’ (XMM state and YMM state are enabled by OS).
3) detect CPUID.1:ECX.AVX[bit 28] = 1 (AVX instructions supported).
(Step 3 can be done in any order relative to 1 and 2.)

[1]: If CPUID.01H:ECX.OSXSAVE reports 1, it also indirectly implies the processor supports XSAVE, XRSTOR, XGETBV, processor extended state bit vector XCR0. Thus an application may streamline the checking of CPUID feature flags for XSAVE and OSXSAVE. XSETBV is a privileged instruction.

Do you know of an alternative to using XGETBV? The report does not indicate if the potential issue was ever reported to Intel. All the affected packages include code that uses AVX or AVX2 instructions without calling XGETBV to check for OS support.
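The three steps quoted above can be sketched in C roughly as follows. This is a minimal illustration for GCC or Clang on x86 only, not code taken from Intel, FFmpeg, or any of the affected packages; the names avx_usable_from_regs, read_xcr0 and avx_usable are made up for this example.

```c
#include <cpuid.h>    /* __get_cpuid, provided by GCC and Clang on x86 */
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* hardware supports AVX */
#define XCR0_XMM_YMM       0x6u        /* XCR0 bits 1 and 2 */

/* Pure decision logic over CPUID.1:ECX and XCR0, following steps 1 to 3. */
int avx_usable_from_regs(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return 0;                               /* step 1 failed */
    if ((xcr0 & XCR0_XMM_YMM) != XCR0_XMM_YMM)
        return 0;                               /* step 2 failed */
    return (cpuid1_ecx & CPUID1_ECX_AVX) != 0;  /* step 3 */
}

/* Read XCR0 with XGETBV (ECX = 0). Only call this once OSXSAVE is known
 * to be set, otherwise the instruction itself raises #UD. */
uint64_t read_xcr0(void)
{
    uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return ((uint64_t)edx << 32) | eax;
}

/* Returns 1 if AVX instructions can be executed, 0 otherwise. */
int avx_usable(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & CPUID1_ECX_OSXSAVE))
        return 0;   /* XGETBV is not available to applications */
    return avx_usable_from_regs(ecx, read_xcr0());
}
```

A check that only tests CPUID.1:ECX.AVX[bit 28] corresponds to skipping the first two tests in avx_usable_from_regs, which is the incomplete enumeration discussed earlier in this thread.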
Comment by Pascal Ernster (hardfalcon) - Friday, 03 November 2023, 21:05 GMT
loqs: Tbh, I don't have the faintest idea. Basically, I just came across this issue here on the Archlinux bug tracker by accident whilst trying to figure out why electron 26 (custom package, not in the official Arch repos) crashes when started on one of my machines. Though I've hopefully managed to find a solution for that (the compile run should be finished within the next couple of hours):

https://github.com/electron/electron/issues/40441
