FS#21313 - [python-numpy] numpy.dot is much slower without atlas blas
Attached to Project:
Arch Linux
Opened by Haoyu Bai (bhy) - Tuesday, 19 October 2010, 02:58 GMT
Last edited by Antonio Rojas (arojas) - Wednesday, 14 October 2015, 06:04 GMT
Details
Description:
python-numpy's BLAS-accelerated numpy.dot (_dotblas.so) is not built. This makes dot and matrix multiplication about 5x slower on my Arch box compared to an Ubuntu box with the same hardware configuration.

Additional info:
* package version(s)
python-numpy 1.5.0-2
python2 2.7-2
* config and/or log files etc.

On Arch Linux:
In [2]: numpy.dot.__module__
Out[2]: 'numpy.core.multiarray'

On Ubuntu:
In [2]: numpy.dot.__module__
Out[2]: 'numpy.core._dotblas'

However, numpy.show_config() shows exactly the same output on both.

Steps to reproduce:
I'm using the following script to benchmark:

import numpy as np
a = np.random.randn(1000,1000)
b = np.random.randn(1000,1000)
np.dot(a,b)

On Arch it gives me:
$ time python2 ndotbench.py
real 0m5.577s
user 0m5.536s
sys 0m0.033s

On Ubuntu it gives me:
$ time python ndotbench.py
real 0m1.658s
user 0m1.616s
sys 0m0.032s

Note that some sources on the Web say that _dotblas.so needs ATLAS to build; however, the Ubuntu box doesn't have ATLAS installed either.
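The `time` figures in the report also include interpreter startup, the numpy import, and random matrix generation. A minimal sketch that times only the multiplication is shown here, assuming the same 1000x1000 matrices. The helper name `time_dot` and the repeat count are illustrative and not part of the original report, and `time.perf_counter` assumes Python 3 rather than the python2 used in the report.

```python
import time

import numpy as np


def time_dot(n=1000, repeats=3):
    """Return the best wall-clock time in seconds of np.dot on two n x n matrices."""
    # Build the inputs once, outside the timed region.
    a = np.random.randn(n, n)
    b = np.random.randn(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("best of 3: %.3f s" % time_dot())
```

Taking the best of several runs reduces noise from other processes, which is the same idea `%timeit` uses.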
Closed by Antonio Rojas (arojas)
Wednesday, 14 October 2015, 06:04 GMT
Reason for closing: Implemented
There is an atlas-lapack in AUR [2], but we still need to see whether we require a separate lapack package (and of course whether it actually is faster). It has enough votes so someone can bring it in if we cannot do this by default.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=461472
[2] http://aur.archlinux.org/packages.php?ID=16575
$ pacman -Ql python-numpy | grep _dotblas.so
python-numpy /usr/lib/python2.6/site-packages/numpy/core/_dotblas.so
$ time ./ndotbench.py
real 0m0.962s
user 0m0.860s
sys 0m0.043s
Anyway, this is not a bug: most of numpy is unaffected, and numpy.dot still works.
Steps:
* Install unsupported/atlas-lapack (this will replace lapack and blas for you)
* Rebuild abs/extra/python-numpy
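After a rebuild, the check from the original report can be repeated to see which module provides numpy.dot. This is a minimal sketch; the helper name `dot_backend` is hypothetical. It assumes a numpy of the era discussed here, where an accelerated build reports 'numpy.core._dotblas'. On newer numpy releases that no longer ship a separate _dotblas module, the reported name says nothing about BLAS linkage, so `np.show_config()` is printed as well.

```python
import numpy as np


def dot_backend():
    """Return the module name that numpy.dot reports, or None if it has none."""
    return getattr(np.dot, "__module__", None)


if __name__ == "__main__":
    print("numpy version:", np.__version__)
    print("numpy.dot comes from:", dot_backend())
    # Prints the BLAS/LAPACK libraries numpy was configured with at build time.
    np.show_config()
```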
A dev or TU is free to adopt the atlas-lapack package into the repositories. Anyway, closing.
[1] https://launchpad.net/ubuntu/+source/atlas
with our sagemath:
sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 2.55 s per loop
with the precompiled sagemath:
sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 316 ms per loop
So even with a generic, non-optimized atlas, one gets roughly an 8x performance improvement (2.55 s vs. 316 ms). I'd like to bring atlas to [extra] and then recompile numpy (and all other sagemath dependencies) with support for it.
If some user then wants to take advantage of the processor-specific optimizations, they would only have to rebuild atlas, and not all packages on top of it.
Any objections?
Anyway, we could build against another BLAS that is friendlier to packaging, such as OpenBLAS.
What I think I'll do is add an atlas-lapack-base package, with provides=('atlas-lapack') and a post-install message telling users to compile atlas-lapack from AUR if they want an optimized version. This will also open the door to providing processor-specific versions in the future, like Fedora does (which I don't have any intention to do myself).
Benchmarks:
with our current numpy:
sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 2.55 s per loop
with the upstream precompiled sagemath:
sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 316 ms per loop
with numpy + cblas:
sage: %timeit load('ndotbench.py')
1 loops, best of 3: 1.03 s per loop
with numpy (compiled against cblas) + atlas-lapack-base:
sage: %timeit load('ndotbench.py')
1 loops, best of 3: 220 ms per loop
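For reference, the speedups implied by the timings above can be worked out as follows. This is only arithmetic on the reported numbers. Each ratio compares runs of the same script, since perf-numpy.py and ndotbench.py may not be the same benchmark. The helper name `speedup` is illustrative.

```python
def speedup(baseline_s, improved_s):
    """Return how many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


# perf-numpy.py: current numpy (2.55 s) vs. upstream precompiled sagemath (316 ms)
perf_numpy = speedup(2.55, 0.316)

# ndotbench.py: numpy + cblas (1.03 s) vs. numpy + cblas + atlas-lapack-base (220 ms)
ndotbench = speedup(1.03, 0.220)

print("perf-numpy.py: %.1fx" % perf_numpy)  # prints "perf-numpy.py: 8.1x"
print("ndotbench.py: %.1fx" % ndotbench)    # prints "ndotbench.py: 4.7x"
```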