FS#67715 - [glibc] libm-2.32.so SIGILL in pow() due to FMA4 instruction on non-FMA4 system
Attached to Project:
Arch Linux
Opened by Ondřej Hošek (RavuAlHemio) - Tuesday, 25 August 2020, 10:54 GMT
Last edited by Bartłomiej Piotrowski (Barthalion) - Friday, 04 September 2020, 06:08 GMT
Details
/usr/lib/libm-2.32.so (glibc 2.32-2), built with gcc 10.2.0
(gcc 10.2.0-1), exits with SIGILL when calling the pow()
function because it executes the FMA4 instruction "vfmaddsd
%xmm4,0x8(%rdx),%xmm6,%xmm0" on a system that does not
support FMA4.
When glibc is built from ABS with debug symbols, the debugger points to sysdeps/ieee754/dbl-64/e_pow.c:77 as the culprit:

r = __builtin_fma (z, invc, -1.0);

I assume that something is not quite right with glibc's multi-arch support (i.e. detecting which instruction set extensions the running system supports and swapping in the optimal code path on first call), but I don't know whether this is a glibc, gcc or binutils issue. It appears that a non-FMA4 implementation of pow() is chosen, but that this implementation was compiled with FMA4 support for some reason. As a result, __builtin_fma is compiled to an FMA4 instruction, which raises the illegal instruction signal when executed.

This issue crashes most nontrivial Python scripts, so I have increased the severity to High. I will now attempt to build glibc with --disable-multi-arch and report back.
Closed by Bartłomiej Piotrowski (Barthalion)
Friday, 04 September 2020, 06:08 GMT
Reason for closing: Fixed
Additional comments about closing: glibc 2.32-4
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) Gold 6150 CPU @ 2.70GHz
stepping : 0
microcode : 0x2000069
cpu MHz : 2693.672
cache size : 25344 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase smep arat md_clear flush_l1d arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5389.81
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
(+ processor 1 with the same specs)
(Fortunately, this is not a Heisenbug: it happens both with release and debug builds, and independent of whether a debugger is attached or not.)
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/fpu/multiarch/ifunc-fma4.h;hb=107e6a3c2212ba7a3a4ec7cae8d82d73f7c95d0b#l35
This should probably check for FMA4, not just for FMA.
- if (CPU_FEATURES_ARCH_P (cpu_features, FMA_Usable)
- && CPU_FEATURES_ARCH_P (cpu_features, AVX2_Usable))
+ if (CPU_FEATURE_USABLE_P (cpu_features, FMA)
+ && CPU_FEATURE_USABLE_P (cpu_features, AVX2))
return OPTIMIZE (fma);
- if (CPU_FEATURES_ARCH_P (cpu_features, FMA4_Usable))
+ if (CPU_FEATURE_USABLE_P (cpu_features, FMA))
return OPTIMIZE (fma4);
Seems like the second change in this diff has a clear and obvious typo: it tests for FMA where it should test for FMA4. :)
regards
Kai
Not only did it prevent X from booting at all on my VM, it *also* made gcc crash when I tried to compile a fixed version.
I had to build from a live CD to get it working.
Would it be possible to backport the upstream fix and push a new package release?
regards
Kai