FS#79543 - [openmpi] Reading from file (MPI_File_open) gets stuck occasionally

Attached to Project: Arch Linux
Opened by Elkin (helq) - Saturday, 02 September 2023, 16:36 GMT
Last edited by Christian Heusel (gromit) - Wednesday, 06 September 2023, 23:00 GMT
Task Type: Bug Report
Category: Packages: Extra
Status: Closed
Assigned To: David Runge (dvzrv), Levente Polyak (anthraxx), Christian Heusel (gromit)
Architecture: x86_64
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 0
Private: No


Binary gets stuck at MPI_File_open call when executed twice in a row.

Additional info:
* openmpi 4.1.5-3
* Downgrading system to Aug 02 2023 (where openmpi is 4.1.5-2) "solves" the issue

Steps to reproduce:

1. Create a dummy file called `test-file.txt`.

2. Copy this minimal code into a file named `test.c`:
// Based on minimal code for bug report: https://bugs.archlinux.org/task/78786?project=1&string=openmpi
#include <stdio.h>
#include <mpi.h>

int main() {
    // MPI must be initialized before any other MPI call
    MPI_Init(NULL, NULL);
    MPI_Comm comm = MPI_COMM_WORLD;

    int rank;
    MPI_Comm_rank(comm, &rank);

    // Collective open; this is the call that hangs on the second run
    MPI_File fh;
    int err = MPI_File_open(comm, "test-file.txt", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) {
        printf("Got error trying to open file\n");
    }

    printf("Hello, I am rank %d in the merged comm\n", rank);

    MPI_Finalize();
    return 0;
}

3. Compile the code: `gcc test.c -lmpi`.

4. Run the code twice: `mpirun -np 2 a.out && mpirun -np 2 a.out`

The expected output (order does not matter) is
Hello, I am rank 0 in the merged comm
Hello, I am rank 1 in the merged comm
Hello, I am rank 0 in the merged comm
Hello, I am rank 1 in the merged comm

Sadly, my output is only the first two lines: the second run gets stuck without printing the other two. Debugging with gdb confirms that it is stuck somewhere in the MPI_File_open call.
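
To check for the hang without watching the terminal, each run can be wrapped in a time limit. The following is only a sketch: `run_with_limit` is a hypothetical helper name, the 10 second limit in the usage comment is arbitrary, and it assumes `timeout` from GNU coreutils is available (it exits with status 124 when the limit is reached).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report): run a command under
# a time limit and print one word describing what happened.
# Assumes timeout(1) from GNU coreutils, which exits with 124 on timeout.
run_with_limit() {
    limit="$1"
    shift
    # Send the command's own stdout to stderr so that only the verdict
    # ends up on stdout.
    timeout "$limit" "$@" >&2
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "error"
    fi
}

# Possible usage with the reproducer above:
#   run_with_limit 10 mpirun -np 2 a.out   # first run
#   run_with_limit 10 mpirun -np 2 a.out   # second run
```

On an affected installation the second call would be expected to print `hang`, matching the behaviour described above.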

Closed by Christian Heusel (gromit)
Wednesday, 06 September 2023, 23:00 GMT
Reason for closing: Fixed
Additional comments about closing: Should be fixed by openmpi 4.1.5-5
Comment by Toolybird (Toolybird) - Saturday, 02 September 2023, 22:44 GMT
Could you please try openmpi-4.1.5-4 in [extra-testing]?
Comment by Elkin (helq) - Sunday, 03 September 2023, 06:06 GMT
I tried it. The bug is still present :S
Comment by Christian Heusel (gromit) - Sunday, 03 September 2023, 11:25 GMT
Hm, so I have invested some time and tried different versions and builds, and this seems to be a regression present from openmpi-4.1.5 onwards.
Also it seems like the processes get stuck at 100% CPU.

I tried:
- 5.0.0rc10 (didn't work)
- 4.1.6rc2 (didn't work)
- 4.1.5 (didn't work)
- 4.1.4 (worked)
- 4.1.4 with same flags as the current build (worked)

So I guess this is an upstream bug...

If you need this then you can just build the old package yourself:
$ pkgctl repo clone --switch="4.1.4-4" openmpi
$ pkgctl build openmpi
Comment by David Runge (dvzrv) - Sunday, 03 September 2023, 12:44 GMT
It would be awesome if someone would bisect this between 4.1.4 and 4.1.5 and then report it upstream :)
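
A bisect like this could probably be automated with `git bisect run`, which treats exit status 0 as good, 125 as "skip this commit", and any other status from 1 to 127 as bad. The sketch below only covers the verdict logic; `bisect_status` is a hypothetical helper name, and the build step and the timed `mpirun` call that would produce its two inputs are assumed rather than shown.

```shell
#!/bin/sh
# Hypothetical helper: turn the results of a build step and a timed test run
# into the exit status that `git bisect run` expects.
#   $1 = exit status of the build step (assumed, not shown here)
#   $2 = exit status of the timed test run, e.g. 124 from timeout(1) on a hang
bisect_status() {
    build_status="$1"
    run_status="$2"
    if [ "$build_status" -ne 0 ]; then
        echo 125    # commit cannot be tested, ask git bisect to skip it
    elif [ "$run_status" -eq 0 ]; then
        echo 0      # test finished normally: good commit
    else
        echo 1      # hang or other failure: bad commit
    fi
}
```

A bisect script would end with something like `exit "$(bisect_status "$build" "$run")"`, where `build` and `run` are the statuses collected earlier in that script.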
Comment by Christian Heusel (gromit) - Sunday, 03 September 2023, 14:07 GMT
Will do!
Comment by Christian Heusel (gromit) - Wednesday, 06 September 2023, 19:18 GMT
So I raised the issue upstream as all debugging on my side didn't help: https://github.com/open-mpi/ompi/issues/11913
Comment by loqs (loqs) - Wednesday, 06 September 2023, 21:11 GMT
@gromit can you reproduce the issue using the current PKGBUILD with both patches disabled? I could not, while adding pkgname-4.1.5-openpmix_4.2.3.patch back reintroduced the issue for me.
Comment by Christian Heusel (gromit) - Wednesday, 06 September 2023, 22:25 GMT
Oh my, I was so sure that I already did disable the patches, but indeed this fixes the issue for me! 🙈
