FS#78960 - [python-mpi4py] Build and production timeouts

Attached to Project: Arch Linux
Opened by Anton (sci-pirate) - Monday, 03 July 2023, 14:12 GMT
Last edited by Christian Heusel (gromit) - Saturday, 02 September 2023, 18:06 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: Bruno Pagani (ArchangeGabriel)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

Description:
The package fails the check() stage when built, and when installed from the repo it causes applications that depend on it to hang:
```
A request has timed out and will therefore fail:

Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345

Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
E.................................................................ss......................................................................................................................................................ssssssssssssssssssssssssssssssssssssss
======================================================================
ERROR: testCommSpawnMultipleDefaults2 (test_spawn.TestSpawnSelfMany.testCommSpawnMultipleDefaults2)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/python-mpi4py/src/mpi4py-3.1.4/test/test_spawn.py", line 188, in testCommSpawnMultipleDefaults2
child = self.COMM.Spawn_multiple(COMMAND, ARGS, 1, MPI.INFO_NULL)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "mpi4py/MPI/Comm.pyx", line 1978, in mpi4py.MPI.Intracomm.Spawn_multiple
mpi4py.MPI.Exception: MPI_ERR_UNKNOWN: unknown error

----------------------------------------------------------------------
Ran 1402 tests in 316.983s

FAILED (errors=1, skipped=189)
error: test
==> ERROR: A failure occurred in check().
```
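
The help message above suggests adjusting the MCA parameter pmix_server_max_wait. Open MPI reads MCA parameters from environment variables named `OMPI_MCA_<name>`, so one way to try that is sketched below. The helper name `with_mca_params` and the value 120 are illustrative assumptions, and nothing in this report shows that a longer timeout resolves the failure.

```python
import os


def with_mca_params(params, base_env=None):
    """Return a copy of an environment with Open MPI MCA parameters set.

    Hypothetical helper: each parameter is exported as OMPI_MCA_<name>,
    the environment variable form that Open MPI reads.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env[f"OMPI_MCA_{name}"] = str(value)
    return env


# 120 is an arbitrary illustrative value, not a recommendation.
env = with_mca_params({"pmix_server_max_wait": 120}, base_env={})
print(env)
```

In practice one would leave `base_env` unset so the current environment is inherited, then pass the result as `env=` to `subprocess.run` when launching the test run.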

Additional info:
* python-mpi4py 3.1.4-3
* default PKGBUILD

Steps to reproduce:
`makechrootpkg -cr $CHROOT`
or `makepkg -Cfs`

Closed by Christian Heusel (gromit)
Saturday, 02 September 2023, 18:06 GMT
Reason for closing: Fixed
Additional comments about closing: openmpi 4.1.5-4
Comment by Anton (sci-pirate) - Monday, 03 July 2023, 17:38 GMT
A workaround is to use MPICH from the AUR instead of Open MPI, but adopting it in the main repos would require rebuilding all other MPI-linked packages. That said, the switch would benefit the Arch ecosystem, as MPICH has numerous advantages over Open MPI.
Comment by Christian Heusel (gromit) - Saturday, 02 September 2023, 15:03 GMT
I just tried to repro this and got another error:

==> Starting check()...
running test
[0@arch-nspawn-213570] Python 3.11 (/usr/bin/python)
[0@arch-nspawn-213570] MPI 3.1 (Open MPI 4.1.5)
[0@arch-nspawn-213570] mpi4py 3.1.4 (/build/python-mpi4py/src/mpi4py-3.1.4/build/lib.linux-x86_64-cpython-311/mpi4py)
....s........................................ssssssss........................................................................................................................................................................................................................................................................ssssssssssssssssssssssss..............................................................ss...................................................................ssssssssssss............................................................sss....................s............................................sssss.................................................................................................................................s.....................................s.sss..s....s............sssssssss......ssssssssssssssssssssssssss.................ssssssssssssssssss......sss...s...s..................................................................................................................................................................s...s............ssssssssssssssssssssssssssssssssssssssssssssssssss[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
--------------------------------------------------------------------------
A request has timed out and will therefore fail:

Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345

Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
E[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] 1 more process has sent help message help-orted.txt / timedout
[arch-nspawn-213570:01531] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[arch-nspawn-213570:01531] 1 more process has sent help message help-orted.txt / timedout
E[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] 1 more process has sent help message help-orted.txt / timedout
E[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] 1 more process has sent help message help-orted.txt / timedout
[arch-nspawn-213570:01531] 1 more process has sent help message help-orted.txt / timedout
E[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY
[arch-nspawn-213570:01531] UNPACK-OPAL-VALUE: UNSUPPORTED TYPE 33 FOR KEY


This should apparently be fixed in the next openmpi version (v4.1.6): https://github.com/open-mpi/ompi/issues/11749
Comment by Christian Heusel (gromit) - Saturday, 02 September 2023, 18:06 GMT
The change was backported to openmpi 4.1.5-4 (currently in testing), which fixes this bug as well!
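
To tell whether a given test log comes from an Open MPI release at or after the upstream fix, the banner line printed at the start of check() can be parsed. This is only a rough sketch: the function names are hypothetical, and the banner carries just the upstream version, so a distribution backport such as 4.1.5-4 is not detectable this way.

```python
import re

# Upstream release expected to contain the fix, per the comment above.
FIXED_UPSTREAM = (4, 1, 6)


def open_mpi_version(banner):
    """Extract (major, minor, patch) from a line mentioning 'Open MPI x.y.z'."""
    match = re.search(r"Open MPI (\d+)\.(\d+)\.(\d+)", banner)
    return tuple(int(part) for part in match.groups()) if match else None


def has_upstream_fix(banner):
    """True if the banner reports an upstream version at or past the fix."""
    version = open_mpi_version(banner)
    return version is not None and version >= FIXED_UPSTREAM


banner = "[0@arch-nspawn-213570] MPI 3.1 (Open MPI 4.1.5)"
print(open_mpi_version(banner), has_upstream_fix(banner))
```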
