Hi all,
I am new here, but following advice from Intel, I am asking my question here.
I am using the Intel MPI beta version (2017.0.042). I have some codes that run locally and everything works well, but in at least one case I get odd behavior. Inside my code, I do a first send/recv to exchange the data size, and then I send/recv the data itself. When the size is small, everything works fine, but when I send more than 10,000 doubles, I get an infinite loop. Running GDB on the two MPI processes in the following way, hitting Ctrl+C and looking at the backtrace, I see something very strange.
mpirun -n 2 xterm -hold -e gdb --args ./foo -m <datafilename>
The idea is to send from process 1 to process 0, like a small reduction. In that configuration, the destination is 0 and the source is 1. But in the backtrace this information is corrupted, i.e. source = -1, which explains the infinite loop. Moreover, the tag variable, set to 0, has moved to another value. A minimal sketch of the pattern is shown below.
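To make the pattern concrete, here is a minimal sketch of what the code does (my real code is larger; the buffer names and the count of 20,000 doubles are only illustrative), assuming a plain blocking size-then-data exchange with source = 1, destination = 0, and tag = 0:

/* Minimal sketch of the send/recv pattern described above. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int tag = 0;
    if (rank == 1) {
        int n = 20000;                      /* more than 10,000 doubles triggers the hang */
        double *buf = malloc(n * sizeof(double));
        for (int i = 0; i < n; ++i) buf[i] = (double)i;
        MPI_Send(&n, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);          /* 1) send the size  */
        MPI_Send(buf, n, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);      /* 2) send the data  */
        free(buf);
    } else if (rank == 0) {
        int n;
        MPI_Status st;
        MPI_Recv(&n, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &st);     /* 1) receive the size */
        double *buf = malloc(n * sizeof(double));
        MPI_Recv(buf, n, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD, &st); /* 2) receive the data */
        free(buf);
    }

    MPI_Finalize();
    return 0;
}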
So my idea is that there might be a buffer overflow somewhere. To check, I switched to MPICH 3.2, and with it everything works fine.
Finally, following Gergana's advice, I looked at the troubleshooting guide (https://software.intel.com/fr-fr/node/535596) and tried a few ideas. Once more, I got odd behavior: using the following option fixes the bug.
mpirun -n 2 -genv I_MPI_FABRICS tcp ./foo -m <datafilename>
Well, my question is: I would like some help, information and/or explanation about this. Is this a bug coming from my usage of Intel MPI?
Thank you in advance for taking the time to read this.
Sebastien
PS: additional information: Asus UX31 laptop with Ubuntu 14.04 LTS and an Intel® Core™ i5-2557M CPU @ 1.70GHz × 4