FS#75747 - [arrow] Crashes with SIGILL (illegal instruction) on CPUs without AVX2 support
Attached to Project:
Community Packages
Opened by Thomas (thomastc) - Tuesday, 30 August 2022, 13:36 GMT
Last edited by Bruno Pagani (ArchangeGabriel) - Monday, 02 January 2023, 22:55 GMT
Opened by Thomas (thomastc) - Tuesday, 30 August 2022, 13:36 GMT
Last edited by Bruno Pagani (ArchangeGabriel) - Monday, 02 January 2023, 22:55 GMT
|
Details
Version: arrow 8.0.1-1
Description: On x86_64, Arrow is compiled with AVX2 forcibly enabled. This causes programs using Arrow to crash with SIGILL (illegal instruction) on CPUs that don't support this instruction set. AVX2 is explicitly enabled in the PKGBUILD file using `-DARROW_SIMD_LEVEL=AVX2`: https://github.com/archlinux/svntogit-community/blob/48857f8373ddf21f3cffb4f039df07804fc827d3/trunk/PKGBUILD#L54 Note that this is an unconditional switch, as described in the Arrow manual: https://arrow.apache.org/docs/cpp/env_vars.html?highlight=arrow_user_simd_level#envvar-ARROW_USER_SIMD_LEVEL > In addition to runtime dispatch, the compile-time SIMD level can be set using the ARROW_SIMD_LEVEL CMake configuration variable. Unlike runtime dispatch, compile-time SIMD optimizations cannot be changed at runtime (for example, if you compile Arrow C++ with AVX512 enabled, the resulting binary will only run on AVX512-enabled CPUs). As far as I know, Arch Linux has not dropped support for any x86_64 compatible CPU, so this setting should probably be left at the default. Steps to reproduce: * Be on a CPU without AVX2 support. Mine is an Intel Core i7 920, and gives no output for `cat /proc/cpuinfo | grep avx`. See https://stackoverflow.com/questions/55762372/disabling-avx2-in-cpu-for-testing-purposes for emulation options if needed. * Run `plasma-store-server`, an executable included in the `arrow` package. * See `Illegal instruction (core dumped)`. gdb points to `libarrow.so.800` for the location of the instruction: $ gdb plasma-store-server Reading symbols from plasma-store-server... (No debugging symbols found in plasma-store-server) (gdb) run Starting program: /usr/bin/plasma-store-server ... Program received signal SIGILL, Illegal instruction. 0x00007ffff6388309 in ?? () from /usr/lib/libarrow.so.800 (gdb) bt #0 0x00007ffff6388309 in ?? () from /usr/lib/libarrow.so.800 #1 0x00007ffff7fcecee in call_init (env=0x7fffffffe228, argv=0x7fffffffe218, argc=1, l=<optimized out>) at dl-init.c:70 #2 call_init (l=<optimized out>, argc=1, argv=0x7fffffffe218, env=0x7fffffffe228) at dl-init.c:26 #3 0x00007ffff7fceddc in _dl_init (main_map=0x7ffff7ffe2c0, argc=1, argv=0x7fffffffe218, env=0x7fffffffe228) at dl-init.c:117 #4 0x00007ffff7fe56a0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 #5 0x0000000000000001 in ?? () #6 0x00007fffffffe5e3 in ?? () #7 0x0000000000000000 in ?? () Related report from Manjaro: https://forum.manjaro.org/t/issues-with-gdal-and-arrow-after-update/117670 |
This task depends upon
Closed by Bruno Pagani (ArchangeGabriel)
Monday, 02 January 2023, 22:55 GMT
Reason for closing: Fixed
Additional comments about closing: Done in arrow 10.0.1-3. Sorry for not being able to handle it earlier.
Monday, 02 January 2023, 22:55 GMT
Reason for closing: Fixed
Additional comments about closing: Done in arrow 10.0.1-3. Sorry for not being able to handle it earlier.
The version update is due to a failing test which is fixed in 9.0.0. Unfortunately 9.0.0 does not support xsimd 9.0.0 [1]. So that is fixed up in prepare.
ARROW_SIMD_LEVEL=NONE as the default is SSE4.2 which I believe is not in baseline X86_64 support.
ARROW_RUNTIME_SIMD_LEVEL=MAX which is the current default is for documentation, currently allows runtime support up to AVX512.
[1] https://github.com/apache/arrow/pull/14005
I think this is still relevant in 9.0.0. Installed arrow-9.0.0-1 from community, and still get "Illegal instruction (core dumped)" when running
```
python -c 'from osgeo import gdal'
```
or
```
Rscript -e 'library(sf)'
```
which both depend on arrow. When simply removing the line `-DARROW_SIMD_LEVEL=AVX2`from the PKGBUILD, everything works fine. So how can we fix this regression?
I tested before/after that this fixes the issue. Conveniently, the makepkg tests even catch the crash at build time.
Is there anything else I can do to help getting this merged ASAP?
gcc -c -Q -march=x86-64 --help=target
[1] https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
EDIT: tried this on a second machine and wasn't reproduced. All tests passed