Dear developers of Intel-MPI,
First of all: congratulations that Intel MPI now also supports MPI-3!
However, I have found a bug in Intel MPI 5.0 when using the MPI-3 shared-memory feature (MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY) from a Fortran 95 CFD code on a Linux cluster (NEC Nehalem).
I isolated the problem into a small Fortran 95 example program, which allocates a shared integer*4 array of dimension N, uses it from the MPI processes (all on the same node), and then repeats this for the next shared allocation. The number of shared windows therefore accumulates during the run, because I never free the previously allocated windows. This allocation of shared windows works, but only until the total amount of allocated shared memory exceeds a limit of about 30 million integer*4 values (~120 MB).
Once that limit is reached, the next call to MPI_WIN_ALLOCATE_SHARED / MPI_WIN_SHARED_QUERY for one more shared window does not return an error, but the first attempt to use that shared array results in a bus error (because the shared array has not actually been allocated correctly).
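In case it helps, here is a minimal sketch of what the example program does (this is not my exact source; the parameter names N and nwinmax, the zero-size allocations on the non-zero ranks, and the use of MPI_COMM_WORLD as the node communicator are illustrative assumptions):

program sharedmemtest
  use mpi
  use, intrinsic :: iso_c_binding
  implicit none
  integer, parameter :: N = 1000000          ! array dimension of one shared window
  integer, parameter :: nwinmax = 1000       ! number of shared windows to allocate
  integer :: ierr, myrank, nranks, iwin, disp_unit, comm_node
  integer :: win(nwinmax)
  integer(kind=MPI_ADDRESS_KIND) :: winsize
  type(c_ptr) :: baseptr
  integer*4, pointer :: iarr(:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nranks, ierr)

  ! all processes run on one node here; in general one would first create a
  ! node-local communicator with MPI_COMM_SPLIT_TYPE(..., MPI_COMM_TYPE_SHARED, ...)
  comm_node = MPI_COMM_WORLD

  do iwin = 1, nwinmax
     ! rank 0 provides the memory, all other ranks allocate a zero-size segment
     if (myrank == 0) then
        winsize = int(N, MPI_ADDRESS_KIND) * 4_MPI_ADDRESS_KIND
     else
        winsize = 0_MPI_ADDRESS_KIND
     end if
     disp_unit = 4
     call MPI_WIN_ALLOCATE_SHARED(winsize, disp_unit, MPI_INFO_NULL, &
                                  comm_node, baseptr, win(iwin), ierr)

     ! every rank queries the start address of rank 0's segment
     call MPI_WIN_SHARED_QUERY(win(iwin), 0, winsize, disp_unit, baseptr, ierr)
     call C_F_POINTER(baseptr, iarr, (/ N /))

     ! first use of the shared array; this is where the bus error occurs
     ! once the total allocated shared memory exceeds ~120 MB
     if (myrank == 0) iarr(1:N) = iwin
     call MPI_BARRIER(comm_node, ierr)
     if (myrank == 0) write(*,*) 'iwin =', iwin, '  iarr(N) =', iarr(N)

     ! the windows are intentionally not freed (no MPI_WIN_FREE),
     ! so the allocated shared memory accumulates over the iterations
  end do

  call MPI_FINALIZE(ierr)
end program sharedmemtest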
The problem is independent of the number of MPI processes started by mpirun on the node (I used only one node).
Examples:
N = 100 000     -> bus error at iwin = 288   (i.e. the allocation of the 288th shared window failed)
N = 1 000 000   -> bus error at iwin = 30
N = 5 000 000   -> bus error at iwin = 6
N = 28 000 000  -> bus error at iwin = 2
N = 30 000 000  -> bus error at iwin = 1     (i.e. already the first allocation failed)
The node on the cluster has 8 Nehalem cores and had 10 GB of free memory, and I was the only user on it. I compiled the example program with both the Intel 13 and the Intel 14 compiler:
mpiifort -O0 -debug -traceback -check -fpe0 sharedmemtest.f90
mpirun -binding -prepend-rank -ordered-output -np 4 ./a.out
If it is helpful for you, I can send you the source code of the program.
It seems to me that there is an internal storage limitation in the implementation of the MPI-3 shared-memory feature in Intel MPI 5.0. With such a limitation I cannot use Intel MPI in my real CFD code, because for very large grids the total memory allocated simultaneously by the shared windows can exceed 10 GB.
Greetings to you all,
Michael R.