FS#73012 - [python-pytorch] Please support CPUs older than Haswell.

Attached to Project: Community Packages
Opened by HaoCheng (tkit) - Tuesday, 14 December 2021, 08:43 GMT
Last edited by Sven-Hendrik Haase (Svenstaro) - Monday, 27 June 2022, 03:46 GMT
Task Type Support Request
Category Packages
Status Closed
Assigned To Sven-Hendrik Haase (Svenstaro)
Konstantin Gizdov (kgizdov)
Architecture x86_64
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 4
Private No

Details

Description:
The current pytorch PKGBUILD builds with the `-march=haswell` flag and therefore does not support CPUs older than Haswell.
It would be great if the PKGBUILD could be split into 4 packages instead of 2, just like the tensorflow PKGBUILD:
python-pytorch: without CUDA and without non-x86-64 optimizations
python-pytorch-opt: without CUDA and with non-x86-64 optimizations
python-pytorch-cuda: with CUDA and without non-x86-64 optimizations
python-pytorch-opt-cuda: with CUDA and with non-x86-64 optimizations

Additional info:
* package version(s)
➜ ~ pacman -Q python-pytorch
python-pytorch 1.10.0-5

* config and/or log files etc.
My old laptop's cpuinfo:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
stepping : 9
microcode : 0x12
cpu MHz : 2374.419
cache size : 3072 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
vmx flags : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds
bogomips : 4990.41
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:


* link to upstream bug report, if any

Steps to reproduce:

➜ ~ python -c "import torch"
[1] 5646 illegal hardware instruction (core dumped) python -c "import torch"
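
The crash is consistent with the cpuinfo above: the i5-3210M advertises `avx` but none of the newer extensions that `-march=haswell` lets the compiler use. A minimal sketch of that check follows. The flag list is an assumption: it is a representative subset of what GCC enables for Haswell, not the full set, and the helper names are hypothetical.

```python
# Representative subset of the extensions that -march=haswell enables beyond
# baseline x86-64 (an assumption; not exhaustive). Names follow /proc/cpuinfo.
HASWELL_FLAGS = {"avx2", "bmi1", "bmi2", "fma", "movbe"}


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the key exactly so that the separate 'vmx flags' line is skipped.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_haswell_flags(cpuinfo_text):
    """Return the checked Haswell-level flags the CPU does not advertise, sorted."""
    return sorted(HASWELL_FLAGS - parse_flags(cpuinfo_text))
```

On a Linux system, passing the contents of `/proc/cpuinfo` to `missing_haswell_flags` lists the checked extensions the CPU lacks; an empty list means all of them are present.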

Closed by  Sven-Hendrik Haase (Svenstaro)
Monday, 27 June 2022, 03:46 GMT
Reason for closing:  Fixed
Comment by Sven-Hendrik Haase (Svenstaro) - Friday, 17 December 2021, 18:39 GMT
We tried that and we originally had it that way but upstream doesn't support it as it turns out. We changed it here: https://github.com/archlinux/svntogit-community/commit/d53792e82e79a167fdd274877e494bcbde907189#diff-3e341d2d9c67be01819b25b25d5e53ea3cdf3a38d28846cda85a195eb9b7203a

You may try to take a stab at it if you like. If you can produce a working patch, I'll apply it. However, we spent way too much time on this already and won't attempt it further.
Comment by HaoCheng (tkit) - Saturday, 05 February 2022, 00:31 GMT
Upstream does support CPUs older than Haswell!!!

If you install pytorch from the official website using pip, it works perfectly.

I tried to build a pure CPU version by modifying the PKGBUILD, simply commenting out the -march flag. It works well too.

I will try to build the GPU version on another machine with an NVIDIA GPU later.

Upstream does support it! It's the -march flag that is causing the problem.

Here is my modification for pure cpu version:
https://gist.github.com/tkit1994/f4f8aaf7df68791594befc5f745730b6
Comment by Sven-Hendrik Haase (Svenstaro) - Saturday, 05 February 2022, 18:37 GMT
This appears to work only if you also compile on an old CPU. When compiling on a modern CPU, forcing a build for an older architecture appears to be currently unsupported. Or have you figured out a way?

Try this: build on a modern CPU, transfer the package to your old system, and see whether it works. We tried for some hours to do this but weren't successful.
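
The check on the old system can be scripted. The following is a minimal sketch, assuming a POSIX system where a child process killed by a signal reports a negative return code; the function name `died_from_sigill` is hypothetical.

```python
import signal
import subprocess
import sys


def died_from_sigill(code, python=sys.executable):
    """Run `code` in a child interpreter; return True if SIGILL killed it."""
    result = subprocess.run(
        [python, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a return code of -N means the child was terminated by signal N.
    return result.returncode == -signal.SIGILL


if __name__ == "__main__":
    if died_from_sigill("import torch"):
        print("import torch: illegal instruction")
    else:
        print("import torch: no illegal instruction (or torch is not installed)")
```

Running the import in a child process keeps the calling script alive when the package was built for a newer CPU. A `False` result only rules out an illegal instruction at import time; it does not show that every code path is safe.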
Comment by bart (edubart) - Wednesday, 09 March 2022, 02:12 GMT
  • Field changed: Percent Complete (100% → 0%)
I've just installed pytorch on my system; my CPU has no support for AVX2. Importing torch from Python gives me the following error:

```
python
Python 3.10.2 (main, Jan 15 2022, 19:56:27) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Illegal instruction (core dumped)
```

Please find a way to build without `-march=haswell` so the package works out of the box on older CPUs, like the one downloaded from the official pytorch website.

Maybe change to `-march=x86-64`?
Comment by Sven-Hendrik Haase (Svenstaro) - Wednesday, 09 March 2022, 02:13 GMT
I'd really like to enable this but our attempts so far were fruitless. I'm very happy to apply a patch to the PKGBUILD if anyone manages to produce one.
Comment by Adrien Wu (adrien1018) - Thursday, 10 March 2022, 14:19 GMT
I've managed to produce a working PKGBUILD that compiles without AVX (using -mno-avx). It is currently on the AUR: https://aur.archlinux.org/packages/python-pytorch-noavx
(Interestingly, the current community PKGBUILD does not work in my environment. I had to modify some CMake files (lines 221-226) to avoid some buggy checks.)
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 13 March 2022, 23:58 GMT
Are you entirely sure that the package builds without AVX no matter the host CPU? Can you try building it on a very modern CPU and then run it on the old system and check whether that works?
Comment by Adrien Wu (adrien1018) - Monday, 14 March 2022, 05:34 GMT
I've tested it by building it on a Xeon Gold 5118 (with AVX512) and installing it on a Xeon E5630 (without AVX), and it seems to work properly.
Comment by Sven-Hendrik Haase (Svenstaro) - Thursday, 05 May 2022, 01:09 GMT
I tried to make this work with a clean package but I couldn't get it to build and behave properly. I'm starting to think that this would currently be better as an upstream pull request which properly enables this mode of compilation. Would that work for you?
Comment by Adrien Wu (adrien1018) - Thursday, 05 May 2022, 13:25 GMT
It's certainly a great idea, but I'm not familiar enough with the project structure to add CMake options that force building with or without AVX or its variants. Should we submit a feature request?
By the way, did you try using my PKGBUILD without lines 221-226? Those were added only for my environment.
Comment by Sven-Hendrik Haase (Svenstaro) - Thursday, 05 May 2022, 14:18 GMT
Yeah, could you put in a feature request?
Comment by Chih-Hsuan Yen (yan12125) - Tuesday, 14 June 2022, 09:57 GMT
Looks like Adrien Wu created a feature request some time ago: https://github.com/pytorch/pytorch/issues/77411

---

Regarding this issue,

> I'd really like to enable this but our attempts so far were fruitless. I'm very happy to apply a patch to the PKGBUILD if anyone manages to produce one.

Just tried, with python-pytorch 1.11.0-10 and the following change:

diff --git a/trunk/PKGBUILD b/trunk/PKGBUILD
index c2de7479..f0be646c 100644
--- a/trunk/PKGBUILD
+++ b/trunk/PKGBUILD
@@ -233,7 +233,7 @@ build() {
   export USE_CUDA=0
   export USE_CUDNN=0
   cd "${srcdir}/${_pkgname}-${pkgver}"
-  echo "add_definitions(-march=haswell)" >> cmake/MiscCheck.cmake
+  echo "add_definitions(-march=x86-64)" >> cmake/MiscCheck.cmake
   # this horrible hack is necessary because the current release
   # ships inconsistent CMake which tries to build objects before
   # thier dependencies, build twice when dependencies are available
@@ -244,7 +244,7 @@ build() {
   export USE_CUDA=1
   export USE_CUDNN=1
   cd "${srcdir}/${_pkgname}-${pkgver}-cuda"
-  echo "add_definitions(-march=haswell)" >> cmake/MiscCheck.cmake
+  echo "add_definitions(-march=x86-64)" >> cmake/MiscCheck.cmake
   # same horrible hack as above
   python setup.py build || python setup.py build
 }

Packages built on either build.archlinux.org (AMD EPYC 7502P) or build.archlinuxcn.org (Intel Xeon Silver 4114) run on an Intel Core i5-3470. The problem is probably not directly related to AVX512 detection now.
Comment by Sven-Hendrik Haase (Svenstaro) - Sunday, 26 June 2022, 20:08 GMT
Ok, I pushed a new set of packages. Please check out rel -11 in testing.
Comment by Chih-Hsuan Yen (yan12125) - Monday, 27 June 2022, 03:09 GMT
Thanks! Training a simple model works on i5-3470.
Comment by Sven-Hendrik Haase (Svenstaro) - Monday, 27 June 2022, 03:46 GMT
Great!
