FS#73012 - [python-pytorch] Please support cpus older than haswell.
Attached to Project:
Community Packages
Opened by HaoCheng (tkit) - Tuesday, 14 December 2021, 08:43 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 27 June 2022, 03:46 GMT
Details
Description:
The current pytorch PKGBUILD builds with `-march=haswell`, so the package does not support CPUs older than Haswell. It would be great if the PKGBUILD could be split into 4 packages instead of 2, just like the tensorflow PKGBUILD:

* python-pytorch: without CUDA and without optimizations beyond the x86-64 baseline
* python-pytorch-opt: without CUDA and with those optimizations
* python-pytorch-cuda: with CUDA and without those optimizations
* python-pytorch-opt-cuda: with CUDA and with those optimizations

Additional info:
* package version(s)

```
➜ ~ pacman -Q python-pytorch
python-pytorch 1.10.0-5
```

* config and/or log files etc.

My old laptop's cpuinfo:

```
processor       : 3
vendor_id       : GenuineIntel
cpu family      : 6
model           : 58
model name      : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
stepping        : 9
microcode       : 0x12
cpu MHz         : 2374.419
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 1
cpu cores       : 2
apicid          : 3
initial apicid  : 3
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
vmx flags       : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips        : 4990.41
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
```

* link to upstream bug report, if any

Steps to reproduce:

```
➜ ~ python -c "import torch"
[1] 5646 illegal hardware instruction (core dumped) python -c "import torch"
```
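For reference, a tensorflow-style 4-way split could look roughly like this in the PKGBUILD. This is only a sketch of the idea; the function bodies and the exact variant names are illustrative assumptions, not the real tensorflow PKGBUILD:

```shell
# Hypothetical sketch of a 4-way split, modeled on how the tensorflow
# PKGBUILD separates generic and optimized builds. Names are illustrative.
pkgname=(python-pytorch python-pytorch-opt
         python-pytorch-cuda python-pytorch-opt-cuda)

build() {
  # Generic builds stay on the x86-64 baseline so pre-Haswell CPUs work;
  # "-opt" builds add -march=haswell; "-cuda" builds set USE_CUDA=1.
  for march in x86-64 haswell; do
    for cuda in 0 1; do
      export USE_CUDA=$cuda
      echo "building variant: march=$march cuda=$cuda"
      # echo "add_definitions(-march=$march)" >> cmake/MiscCheck.cmake
      # python setup.py build
    done
  done
}

# One package_*() function per variant would then install the matching build:
# package_python-pytorch()          -> no CUDA, x86-64 baseline
# package_python-pytorch-opt()      -> no CUDA, -march=haswell
# package_python-pytorch-cuda()     -> CUDA,    x86-64 baseline
# package_python-pytorch-opt-cuda() -> CUDA,    -march=haswell
```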
You're welcome to take a stab at it. If you can produce a working patch, I'll apply it. However, we've already spent far too much time on this and won't attempt it further.
If you install pytorch from the official website using pip, it works perfectly.
I tried to build a pure-CPU version by modifying the PKGBUILD, simply commenting out the -march flag. It works well too.
I will try to build the gpu version on another machine with nvidia gpu later.
Upstream does support it! It's the -march flag that's causing the problem.
Here is my modification for pure cpu version:
https://gist.github.com/tkit1994/f4f8aaf7df68791594befc5f745730b6
Try this: build the package on a modern CPU, transfer it to your old system, and see whether it works. We tried for some hours to do this but weren't successful.
```
python
Python 3.10.2 (main, Jan 15 2022, 19:56:27) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Illegal instruction (core dumped)
```
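For anyone hitting the same crash, a quick sanity check is whether the running CPU actually has the features a `-march=haswell` build assumes. The flag list below is an assumption based on what the Haswell baseline adds over plain x86-64 (the i5-3210M above has avx but not avx2/fma, which matches the illegal instruction):

```shell
# Check /proc/cpuinfo for Haswell-era ISA features; any "MISSING" line
# means binaries compiled with -march=haswell can crash on this CPU.
required="avx2 fma bmi1 bmi2 movbe"
flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2-)
for f in $required; do
  case " $flags " in
    *" $f "*) echo "$f: present" ;;
    *)        echo "$f: MISSING" ;;
  esac
done
```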
Please find a way to build without `-march=haswell` so the package works out of the box on older CPUs, like the one downloaded from the official pytorch website.
Maybe change to `-march=x86-64`?
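One way to see the practical difference between the two flags (assuming GCC is available) is to ask the compiler which instruction-set macros each -march value turns on:

```shell
# -march=haswell enables AVX2 (among other extensions); the generic
# x86-64 baseline does not, which is why a -march=x86-64 build keeps
# running on pre-Haswell CPUs.
gcc -march=haswell -dM -E -x c /dev/null | grep -c '__AVX2__'   # prints 1
gcc -march=x86-64  -dM -E -x c /dev/null | grep -c '__AVX2__'   # prints 0
```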
(Interestingly, the current community PKGBUILD does not build in my environment. I had to modify some CMake files (lines 221-226) to work around some buggy checks.)
By the way, did you try my PKGBUILD without lines 221-226? Those were added only for my environment.
---
Regarding this issue,
> I'd really like to enable this but our attempts to far were fruitless. I'm very happy to apply a patch to the PKGBUILD if anyone manages to produce one.
Just tried, with python-pytorch 1.11.0-10 and the following change:
```diff
diff --git a/trunk/PKGBUILD b/trunk/PKGBUILD
index c2de7479..f0be646c 100644
--- a/trunk/PKGBUILD
+++ b/trunk/PKGBUILD
@@ -233,7 +233,7 @@ build() {
   export USE_CUDA=0
   export USE_CUDNN=0
   cd "${srcdir}/${_pkgname}-${pkgver}"
-  echo "add_definitions(-march=haswell)" >> cmake/MiscCheck.cmake
+  echo "add_definitions(-march=x86-64)" >> cmake/MiscCheck.cmake
   # this horrible hack is necessary because the current release
   # ships inconsistent CMake which tries to build objects before
   # thier dependencies, build twice when dependencies are available
@@ -244,7 +244,7 @@ build() {
   export USE_CUDA=1
   export USE_CUDNN=1
   cd "${srcdir}/${_pkgname}-${pkgver}-cuda"
-  echo "add_definitions(-march=haswell)" >> cmake/MiscCheck.cmake
+  echo "add_definitions(-march=x86-64)" >> cmake/MiscCheck.cmake
   # same horrible hack as above
   python setup.py build || python setup.py build
 }
```
Packages built on either build.archlinux.org (AMD EPYC 7502P) or build.archlinuxcn.org (Intel Xeon Silver 4114) can be run on an Intel Core i5-3470. The problem is probably no longer directly related to AVX512 detection.