FS#63054 - [openblas] built from community gives wrong matrix multiplication results (DGEMM)

Attached to Project: Community Packages
Opened by Xu Xiansong (phyxxs) - Monday, 01 July 2019, 06:56 GMT
Last edited by Buggy McBugFace (bugbot) - Saturday, 25 November 2023, 20:00 GMT
Task Type: Bug Report
Category: Packages
Status: Closed
Assigned To: Felix Yan (felixonmars)
Architecture: x86_64
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 4
Private: No

Details

Description:

OpenBLAS 0.3.6, whether installed via pacman -S or built manually from the PKGBUILD and installed with pacman -U, gives wrong matrix multiplication results when called from Python (NumPy), Julia, or Fortran.

OpenBLAS 0.3.6 and 0.3.7 compiled from the source code do not show this problem.

Other machines running Arch Linux/Manjaro with the same version of openblas do not have this problem.

I suspect an incompatibility between the PKGBUILD and this hardware.

Additional info:
* package version(s): Openblas 0.3.6
* config and/or log files etc. Default configuration
* link to upstream bug report, if any
* kernel version: x86_64 Linux 4.14.124-1-MANJARO
* hardware: CPU: i7-7820X
Motherboard: Asus TUF X299 II

Steps to reproduce:
I can reproduce this locally, but not on other machines with different CPUs. I do not have another machine with an i7-7820X, hence my suspicion of a hardware-specific issue.


pacman -S openblas
pacman -S julia

The code is attached; the outputs are attached as figures, except for the Fortran output.




Closed by Buggy McBugFace (bugbot)
Saturday, 25 November 2023, 20:00 GMT
Reason for closing: Moved
Additional comments about closing: https://gitlab.archlinux.org/archlinux/packaging/packages/openblas/issues/1
Comment by Mamy Ratsimbazafy (ChoK) - Wednesday, 28 August 2019, 21:08 GMT
I also hit this bug. Here is my investigation:

------------------------------------------------

Description:

When using OpenBLAS together with cblas, float64 matrix multiplication does not work properly and can give incorrect results. This affects raw calls to cblas as well as NumPy.

Using blas or MKL instead returns correct results.

Additional info:
* package versions:
* blas-3.8.0-2
* openblas-0.3.6-1
* cblas-3.8.0-2
* numpy-1.17.0-1
* Link to my investigation in my own library that does raw BLAS calls:
https://github.com/mratsim/Arraymancer/issues/375#issuecomment-525907194

Steps to reproduce:

Run the following Python script first with blas installed and then with openblas installed.
The output can be compared with that of a Python instance on repl.it:
https://repl.it/languages/python3

```python
import numpy as np

n1 = np.array(
[[2, 4, 3, 1, 3, 1, 3, 1],
[1, 2, 1, 1, 2, 0, 4, 3],
[2, 0, 0, 3, 0, 4, 4, 1],
[1, 1, 4, 0, 3, 1, 3, 0],
[3, 4, 1, 1, 4, 2, 3, 4],
[2, 4, 0, 2, 3, 3, 3, 4],
[3, 0, 0, 3, 1, 4, 3, 1],
[4, 3, 2, 4, 1, 0, 0, 0]],
dtype=np.float64)

n2 = np.array(
[[2, 2, 0, 4, 0, 0, 4, 2],
[2, 0, 0, 1, 1, 1, 3, 1],
[0, 2, 2, 0, 2, 2, 3, 3],
[0, 0, 1, 0, 4, 2, 4, 1],
[0, 0, 1, 3, 4, 2, 4, 2],
[4, 3, 4, 1, 4, 4, 0, 3],
[3, 3, 0, 2, 1, 2, 3, 3],
[2, 1, 2, 1, 2, 4, 4, 1]],
dtype=np.float64)

n1n2 = np.array(
[[27,23,16,29,35,32,58,37],
[24,19,11,23,26,30,49,27],
[34,29,21,21,34,34,36,32],
[17,22,15,21,28,25,40,33],
[39,27,23,40,45,46,72,41],
[41,26,25,34,47,48,65,38],
[33,28,22,26,37,34,41,33],
[14,12, 9,22,27,17,51,23]],
dtype=np.float64)

print(n1)
print(n2)

print(n1 @ n2)

np.testing.assert_array_equal(n1 @ n2, n1n2)
```
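
For a check that does not go through any BLAS library at all, the expected product can be recomputed in plain Python. This is only a minimal sketch; `reference_matmul` and `mismatches` are hypothetical helper names for illustration, not part of NumPy or of the attached code:

```python
def reference_matmul(a, b):
    """Multiply two matrices given as lists of rows, using plain Python arithmetic (no BLAS)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def mismatches(expected, actual):
    """Return (row, col, expected, actual) for every entry that differs."""
    return [(i, j, e, x)
            for i, (erow, xrow) in enumerate(zip(expected, actual))
            for j, (e, x) in enumerate(zip(erow, xrow))
            if e != x]


if __name__ == "__main__":
    a = [[2, 4, 3], [1, 2, 1]]
    b = [[2, 2], [2, 0], [0, 2]]
    expected = reference_matmul(a, b)
    print(expected)
    # A deliberately corrupted copy stands in for a faulty BLAS result.
    corrupted = [row[:] for row in expected]
    corrupted[1][0] += 1
    print(mismatches(expected, corrupted))
```

With the script above, `mismatches(reference_matmul(n1.tolist(), n2.tolist()), (n1 @ n2).tolist())` should list exactly the entries where the BLAS-backed product is wrong, and should be empty on an unaffected machine.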

Fix suggestion

OpenBLAS also ships its own cblas.h for the C interface to BLAS,
which the package build currently deletes: https://git.archlinux.org/svntogit/community.git/tree/trunk/PKGBUILD?h=packages/openblas#n31

As in Debian/Ubuntu, it shouldn't need cblas from Netlib.

Instead, openblas can provide and conflict with both blas and cblas, rather than only blas as it does now.

This also would have the added benefit of being the same setup as Ubuntu/Debian, developers wouldn't need to
autodetect "if Archlinux, look into libcblas.so else if Ubuntu look into libblas.so" to get the cblas symbols.
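
To illustrate the kind of autodetection meant here, a rough sketch using only Python's standard `ctypes` module is shown below. The function name `find_cblas` and the candidate list are assumptions made for this example; only `cblas_dgemm` is the standard CBLAS symbol name.

```python
import ctypes
import ctypes.util


def find_cblas(candidates=("cblas", "openblas", "blas")):
    """Return the first library from `candidates` that exports cblas_dgemm, or None.

    The candidate short names are an assumption for illustration; which
    library carries the cblas symbols differs between distributions.
    """
    for short_name in candidates:
        path = ctypes.util.find_library(short_name)
        if path is None:
            continue  # no library with that name is installed
        try:
            lib = ctypes.CDLL(path)
        except OSError:
            continue  # found by name but could not be loaded
        # Attribute lookup on a CDLL fails with AttributeError when the symbol is missing.
        if hasattr(lib, "cblas_dgemm"):
            return path
    return None


if __name__ == "__main__":
    print("cblas symbols found in:", find_cblas())
```

If every distribution exposed the cblas symbols under one library name, the loop over candidates would not be needed.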

Comment by Felix Yan (felixonmars) - Thursday, 29 August 2019, 02:22 GMT
I need to confirm the problem before applying a fix. How did you build openblas to avoid the error? Did you try to edit PKGBUILD and only remove the NO_CBLAS part?
Comment by Eli Schwartz (eschwartz) - Thursday, 29 August 2019, 02:55 GMT
> This also would have the added benefit of being the same setup as Ubuntu/Debian, developers wouldn't need to autodetect "if Archlinux, look into libcblas.so else if Ubuntu look into libblas.so" to get the cblas symbols.

Wouldn't it make more sense for distributions that provide a single library with both blas and cblas symbols, to provide a symlink libcblas.so -> libblas.so, and for everyone to look for cblas symbols in the canonical upstream name?
Comment by Mamy Ratsimbazafy (ChoK) - Thursday, 29 August 2019, 09:32 GMT
Here is a fixed PKGBUILD, tested with extra/numpy

https://github.com/mratsim/Arch-Data-Science/commit/737d5de7f43220d3ce381b8ef480ef9372cf90fe

```
# Maintainer: Felix Yan <REDACTED to prevent spam>
# Contributor: Giuseppe Borzi <REDACTED to prevent spam>

pkgname=openblas
_pkgname=OpenBLAS
pkgver=0.3.7
pkgrel=2
pkgdesc="An optimized BLAS library based on GotoBLAS2 1.13 BSD"
arch=('x86_64')
url="https://www.openblas.net/"
license=('BSD')
depends=('gcc-libs')
makedepends=('perl' 'gcc-fortran')
provides=('blas=3.8.0' 'cblas=3.8.0')
conflicts=('blas' 'cblas')
source=(${_pkgname}-v${pkgver}.tar.gz::https://github.com/xianyi/OpenBLAS/archive/v${pkgver}.tar.gz)
sha512sums=('9c4898301c675471bbce2bb99b6bbe7c90724784fac06504416d4bd5da3cd4488f727b0a118c9a38ea342daac2af9e32597a847004241cc57de693b58b856262')

build() {
  cd "$srcdir/$_pkgname-$pkgver"

  make NO_STATIC=1 NO_LAPACK=1 NO_LAPACKE=1 NO_AFFINITY=1 USE_OPENMP=1 \
    CFLAGS="$CPPFLAGS $CFLAGS" DYNAMIC_ARCH=1 \
    NUM_THREADS=64 MAJOR_VERSION=3 libs shared
}

package() {
  cd "$srcdir/$_pkgname-$pkgver"

  make PREFIX="$pkgdir"/usr NUM_THREADS=64 MAJOR_VERSION=3 install
  rm -f "$pkgdir"/usr/include/lapacke*
  install -Dm644 LICENSE "$pkgdir"/usr/share/licenses/$pkgname/LICENSE

  cd "$pkgdir"/usr/lib/
  sed -i -e "s%$pkgdir%%" "$pkgdir"/usr/lib/cmake/openblas/OpenBLASConfig.cmake
  sed -i -e "s%$pkgdir%%" "$pkgdir"/usr/lib/pkgconfig/openblas.pc
  # Provide blas library
  ln -s libopenblasp-r$pkgver.so libblas.so
  ln -s libopenblasp-r$pkgver.so libblas.so.3
  ln -s openblas.pc "$pkgdir"/usr/lib/pkgconfig/blas.pc
  # Provide cblas library
  ln -s libopenblasp-r$pkgver.so libcblas.so
  ln -s libopenblasp-r$pkgver.so libcblas.so.3
  ln -s openblas.pc "$pkgdir"/usr/lib/pkgconfig/cblas.pc

  rmdir "$pkgdir"/usr/bin
}

# vim:set ts=2 sw=2 et:
```

```
$ pacman -Ql blas
blas /usr/
blas /usr/lib/
blas /usr/lib/libblas.so
blas /usr/lib/libblas.so.3
blas /usr/lib/libblas.so.3.8.0
blas /usr/lib/pkgconfig/
blas /usr/lib/pkgconfig/blas.pc
blas /usr/share/
blas /usr/share/licenses/
blas /usr/share/licenses/blas/
blas /usr/share/licenses/blas/LICENSE.blas
```
```
$ pacman -Ql cblas
cblas /usr/
cblas /usr/include/
cblas /usr/include/cblas.h
cblas /usr/include/cblas_f77.h
cblas /usr/include/cblas_mangling.h
cblas /usr/include/cblas_test.h
cblas /usr/lib/
cblas /usr/lib/cmake/
cblas /usr/lib/cmake/cblas-3.8.0/
cblas /usr/lib/cmake/cblas-3.8.0/cblas-config-version.cmake
cblas /usr/lib/cmake/cblas-3.8.0/cblas-config.cmake
cblas /usr/lib/cmake/cblas-3.8.0/cblas-targets-release.cmake
cblas /usr/lib/cmake/cblas-3.8.0/cblas-targets.cmake
cblas /usr/lib/libcblas.so
cblas /usr/lib/libcblas.so.3
cblas /usr/lib/libcblas.so.3.8.0
cblas /usr/lib/pkgconfig/
cblas /usr/lib/pkgconfig/cblas.pc
```
```
$ pacman -Ql openblas
openblas /usr/
openblas /usr/include/
openblas /usr/include/cblas.h
openblas /usr/include/f77blas.h
openblas /usr/include/openblas_config.h
openblas /usr/lib/
openblas /usr/lib/cmake/
openblas /usr/lib/cmake/openblas/
openblas /usr/lib/cmake/openblas/OpenBLASConfig.cmake
openblas /usr/lib/cmake/openblas/OpenBLASConfigVersion.cmake
openblas /usr/lib/libblas.so
openblas /usr/lib/libblas.so.3
openblas /usr/lib/libcblas.so
openblas /usr/lib/libcblas.so.3
openblas /usr/lib/libopenblas.so
openblas /usr/lib/libopenblas.so.3
openblas /usr/lib/libopenblasp-r0.3.7.so
openblas /usr/lib/pkgconfig/
openblas /usr/lib/pkgconfig/blas.pc
openblas /usr/lib/pkgconfig/cblas.pc
openblas /usr/lib/pkgconfig/openblas.pc
openblas /usr/share/
openblas /usr/share/licenses/
openblas /usr/share/licenses/openblas/
openblas /usr/share/licenses/openblas/LICENSE
```
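
With the PKGBUILD above, libblas.so and libcblas.so are both symlinks to the same libopenblasp file, whereas the Netlib blas and cblas packages install two separate libraries. A small sketch of how that could be checked on an installed system follows; the helper names `shared_targets` and `all_same_file` are made up for this example, and /usr/lib is assumed as the library directory:

```python
import os


def shared_targets(libdir, names=("libblas.so", "libcblas.so", "libopenblas.so")):
    """Map each library name that exists in libdir to the real file it resolves to."""
    resolved = {}
    for name in names:
        path = os.path.join(libdir, name)
        if os.path.exists(path):
            # realpath follows the whole symlink chain to the final file
            resolved[name] = os.path.realpath(path)
    return resolved


def all_same_file(resolved):
    """True when at least one name was found and all names resolve to one file."""
    return len(resolved) > 0 and len(set(resolved.values())) == 1


if __name__ == "__main__":
    targets = shared_targets("/usr/lib")
    for name, target in sorted(targets.items()):
        print(name, "->", target)
    print("single library:", all_same_file(targets))
```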
Comment by Mamy Ratsimbazafy (ChoK) - Saturday, 21 September 2019, 09:34 GMT
I would like to stress that float64 matrix multiplication is a hard requirement for many scientific computing workloads, especially in physics.

This bug will trigger heisenbugs in downstream codebases: part of my float64 test suite passed while another part gave wrong results.
Comment by Michaël Defferrard (mdeff) - Friday, 03 April 2020, 03:15 GMT
I agree with ChoK. I tried to put together a compelling argument in https://bugs.archlinux.org/task/66092.
Comment by Felix Yan (felixonmars) - Sunday, 04 June 2023, 22:40 GMT
The test case in https://bugs.archlinux.org/task/63054#comment181166 no longer reproduces the problem; in any case, cblas is now included in 0.3.23-2. Please let me know if it works as expected.
Comment by Buggy McBugFace (bugbot) - Tuesday, 08 August 2023, 19:11 GMT
This is an automated comment, as this bug has been open for more than 2 years. Please reply if you still experience this bug; otherwise this issue will be closed after 1 month.
