FS#79543 - [openmpi] Reading from file (MPI_File_open) gets stuck occasionally
Attached to Project:
Arch Linux
Opened by Elkin (helq) - Saturday, 02 September 2023, 16:36 GMT
Last edited by Christian Heusel (gromit) - Wednesday, 06 September 2023, 23:00 GMT
Details
Description:
Binary gets stuck at MPI_File_open call when executed twice in a row.

Additional info:
* openmpi 4.1.5-3
* Downgrading system to Aug 02 2023 (where openmpi is 4.1.5-2) "solves" the issue

Steps to reproduce:
1. Create a dummy file called `test-file.txt`.
2. Copy minimal code in `test.c` file:
```
// Based on minimal code for bug report: https://bugs.archlinux.org/task/78786?project=1&string=openmpi
#include <stdio.h>
#include <mpi.h>

int main() {
    int rank;
    MPI_Init(NULL, NULL);
    MPI_Comm comm = MPI_COMM_WORLD;
    MPI_Comm_rank(comm, &rank);

    MPI_File fh;
    int err = MPI_File_open(comm, "test-file.txt", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    if (err != MPI_SUCCESS) {
        printf("Got error trying to open file\n");
    }

    printf("Hello, I am rank %d in the merged comm\n", rank);
    MPI_Barrier(comm);
    MPI_Finalize();
    return 0;
}
```
3. Compile code `gcc test.c -lmpi`.
4. Run code twice: `mpirun -np 2 a.out && mpirun -np 2 a.out`

The expected output (order does not matter) is
```
Hello, I am rank 0 in the merged comm
Hello, I am rank 1 in the merged comm
Hello, I am rank 0 in the merged comm
Hello, I am rank 1 in the merged comm
```
Sadly, my output is only the first two lines. It gets stuck without printing the other two lines. Debugging using gdb confirms that it is stuck somewhere in the MPI_File_open call.
Closed by Christian Heusel (gromit)
Wednesday, 06 September 2023, 23:00 GMT
Reason for closing: Fixed
Additional comments about closing: Should be fixed by openmpi 4.1.5-5
Also it seems like the processes get stuck at 100% CPU.
I tried:
- 5.0.0rc10 (didn't work)
- 4.1.6rc2 (didn't work)
- 4.1.5 (didn't work)
- 4.1.4 (worked)
- 4.1.4 with same flags as the current build (worked)
So I guess this is an upstream bug...
If you need this then you can just build the old package yourself:
$ pkgctl repo clone --switch="4.1.4-4" openmpi
$ pkgctl build openmpi