FS#21313 - [python-numpy] numpy.dot is much slower without atlas blas

Attached to Project: Arch Linux
Opened by Haoyu Bai (bhy) - Tuesday, 19 October 2010, 02:58 GMT
Last edited by Antonio Rojas (arojas) - Wednesday, 14 October 2015, 06:04 GMT
Task Type Feature Request
Category Packages: Extra
Status Closed
Assigned To Jan de Groot (JGC)
Felix Yan (felixonmars)
Architecture All
Severity Very Low
Priority Low
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

python-numpy's BLAS-accelerated numpy.dot (_dotblas.so) is not built. This makes dot and matrix multiplication about 5x slower on my Arch box compared to an Ubuntu box with the same hardware configuration.

Additional info:
* package version(s)
python-numpy 1.5.0-2
python2 2.7-2

* config and/or log files etc.
On Arch Linux:
In [2]: numpy.dot.__module__
Out[2]: 'numpy.core.multiarray'

On Ubuntu:
In [2]: numpy.dot.__module__
Out[2]: 'numpy.core._dotblas'

However, numpy.show_config() shows exactly the same output on both systems.
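
The check above can be wrapped in a small script, as a rough sketch. It assumes a NumPy release of that era, where the BLAS-backed dot is provided by a separate numpy.core._dotblas module. The helper name dot_backend is made up for illustration. On releases that have no separate _dotblas module, the result says nothing about whether BLAS is used.

```python
def dot_backend(module_name):
    """Classify the value of numpy.dot.__module__ (hypothetical helper).

    Assumes the old layout in which the BLAS-accelerated dot comes from
    numpy.core._dotblas and the fallback from numpy.core.multiarray.
    """
    if module_name.endswith("._dotblas"):
        return "BLAS-accelerated (_dotblas)"
    return "not _dotblas (%s)" % module_name


def main():
    # Import lazily so the script reports a missing numpy instead of crashing.
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
        return
    print(dot_backend(numpy.dot.__module__))


if __name__ == "__main__":
    main()
```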


Steps to reproduce:

I'm using the following script to benchmark:

import numpy as np
a = np.random.randn(1000,1000)
b = np.random.randn(1000,1000)
np.dot(a,b)

On Arch it gives me:

$ time python2 ndotbench.py

real 0m5.577s
user 0m5.536s
sys 0m0.033s

On Ubuntu it gives me:


$ time python ndotbench.py

real 0m1.658s
user 0m1.616s
sys 0m0.032s


Note that some sources on the Web say that _dotblas.so needs ATLAS to build; however, the Ubuntu box doesn't have ATLAS installed either.
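
The `time` figures above include interpreter startup, the numpy import and the random matrix generation. A minimal sketch for timing only the np.dot call is shown here; it assumes Python 3 (for time.perf_counter), and the helper name best_of is hypothetical, not part of numpy.

```python
import time


def best_of(func, repeats=3):
    """Return the shortest wall-clock time (in seconds) over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def main():
    # Import lazily so the script reports a missing numpy instead of crashing.
    try:
        import numpy as np
    except ImportError:
        print("numpy is not installed")
        return
    # Same matrix sizes as in the benchmark script from the report.
    a = np.random.randn(1000, 1000)
    b = np.random.randn(1000, 1000)
    print("np.dot: %.3f s" % best_of(lambda: np.dot(a, b)))


if __name__ == "__main__":
    main()
```

Taking the minimum of a few runs reduces noise from other processes, which makes the numbers from two machines easier to compare.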

Closed by  Antonio Rojas (arojas)
Wednesday, 14 October 2015, 06:04 GMT
Reason for closing:  Implemented
Comment by Ray Rashif (schivmeister) - Wednesday, 20 October 2010, 00:16 GMT
Well, it does look to be the case, at least for Red Hat [1]. It can be a build-time dependency, so that might explain why you don't have atlas installed on Ubuntu.

There is an atlas-lapack in AUR [2], but we still need to see whether we require a separate lapack package (and of course whether it actually is faster). It has enough votes so someone can bring it in if we cannot do this by default.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=461472
[2] http://aur.archlinux.org/packages.php?ID=16575
Comment by Ray Rashif (schivmeister) - Wednesday, 20 October 2010, 02:09 GMT
  • Field changed: Task Type (Bug Report → Feature Request)
  • Field changed: Summary ([python-numpy] numpy.dot is much slower → [python-numpy] numpy.dot is much slower without atlas blas)
It (python-numpy/atlas-lapack) works (on a not-very-up-to-date system):

$ pacman -Ql python-numpy | grep _dotblas.so
python-numpy /usr/lib/python2.6/site-packages/numpy/core/_dotblas.so

$ time ./ndotbench.py

real 0m0.962s
user 0m0.860s
sys 0m0.043s

Anyway, this is not a bug, since most of numpy is not affected and numpy.dot still works.
Comment by Ray Rashif (schivmeister) - Monday, 15 November 2010, 10:37 GMT
  • Field changed: Severity (Medium → Very Low)
  • Field changed: Priority (Normal → Low)
  • Task reassigned to Jan de Groot (JGC)
ATLAS is heavily dependent on hardware. We could offer a standard build, but its performance improvement, if any, may not be worth the effort. Instead of providing multiple packages like some distributions [1], the AUR is a simpler avenue for us in this case.

Steps:

* Install unsupported/atlas-lapack (this will replace lapack and blas for you)
* Rebuild abs/extra/python-numpy

A dev or TU is free to adopt the atlas-lapack package into the repositories. Anyway, closing.

[1] https://launchpad.net/ubuntu/+source/atlas
Comment by Antonio Rojas (arojas) - Saturday, 10 October 2015, 13:02 GMT
I would like to implement this. I tested with the script from the report on both our current sagemath build and an upstream precompiled sagemath (with atlas-enabled numpy but no processor-specific optimizations).

with our sagemath:

sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 2.55 s per loop

with the precompiled sagemath:

sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 316 ms per loop

So even with a generic, non-optimized atlas, one gets roughly an 8x performance improvement (2.55 s vs 316 ms). I'd like to bring atlas to [extra] and then recompile numpy (and all other sagemath dependencies) with support for it.
If some user then wants to take advantage of the processor-specific optimizations, they would only have to rebuild atlas, and not all packages on top of it.

Any objections?

Comment by Antonio Rojas (arojas) - Saturday, 10 October 2015, 13:07 GMT
CC'ing Felix as he's comaintainer of numpy, and Andrzej as the current atlas maintainer in AUR
Comment by Ray Rashif (schivmeister) - Saturday, 10 October 2015, 18:49 GMT
I say go for it. I never investigated the performance improvement for a generic build because I only had one crappy system at the time, and it yielded no substantial performance improvement.
Comment by Andrzej Giniewicz (Giniu) - Sunday, 11 October 2015, 06:39 GMT
I'd say don't go for it. Atlas is a perfect fit for AUR, because it is an automatically tuned library. This beast detects the CPU and chooses algorithms depending on both the CPU model (not only type or generation) and benchmarks run during the build process. It would be very hard to create a generic atlas package, and the benefits would be limited compared with an AUR build.

Anyway, we could build with another blas that is more friendly to packaging, like OpenBlas.
Comment by Antonio Rojas (arojas) - Sunday, 11 October 2015, 07:08 GMT
@giniu: the point is enabling atlas support in numpy and sagemath, and for that it needs to be in the official repos. numpy requires atlas specifically, not any other C blas implementation (we already have the standard one in [community]). Same for sagemath (I'm currently patching it to make it work with cblas). But I agree that users should be encouraged to build it from AUR.
What I think I'll do is add an atlas-lapack-base package, with provides=('atlas-lapack') and a post-install message telling users to compile atlas-lapack from AUR if they want an optimized version. This will also open the door to providing processor-specific versions in the future, like Fedora does (which I don't have any intention to do myself).
Comment by Andrzej Giniewicz (Giniu) - Sunday, 11 October 2015, 07:13 GMT
iirc Numpy worked with OpenBlas (like here: https://leemendelowitz.github.io/blog/installing-numpy-with-openblas.html) and I remember reading somewhere that all tests pass with openblas too. For Sage, I agree that Atlas is currently the only viable and tested option.
Comment by Antonio Rojas (arojas) - Sunday, 11 October 2015, 09:52 GMT
It seems that compiling numpy against atlas makes it a hard dependency, which is not desirable (we shouldn't require atlas for users that don't care about performance). So I will compile it against the standard cblas implementation (which already gives a 2x improvement over the current situation) and add atlas-lapack-base as an optional drop-in replacement.

Benchmarks:

with our current numpy:

sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 2.55 s per loop

with the upstream precompiled sagemath:

sage: %timeit load ("perf-numpy.py")
1 loops, best of 3: 316 ms per loop

with numpy + cblas:

sage: %timeit load('ndotbench.py')
1 loops, best of 3: 1.03 s per loop

with numpy (compiled against cblas) + atlas-lapack-base:

sage: %timeit load('ndotbench.py')
1 loops, best of 3: 220 ms per loop
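
For reference, the speedup factors implied by these timings can be worked out with a short sketch. It assumes that perf-numpy.py and ndotbench.py are the same benchmark script, which the thread does not state explicitly, and it uses the current numpy build as the baseline.

```python
# Timings in seconds, copied from the benchmarks above.
timings = {
    "current numpy": 2.55,
    "upstream precompiled sagemath": 0.316,
    "numpy + cblas": 1.03,
    "numpy (cblas) + atlas-lapack-base": 0.220,
}

# Speedup of each configuration relative to the current numpy build.
baseline = timings["current numpy"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print("%-36s %.1fx" % (name, factor))
```

This works out to roughly 8.1x for the precompiled sagemath, 2.5x for plain cblas and 11.6x for cblas plus atlas-lapack-base.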

Comment by Antonio Rojas (arojas) - Monday, 12 October 2015, 17:05 GMT
Implemented in 1.10.0-2; please test that I didn't break anything. The test suite passes (except for an out-of-memory error unrelated to this).
