Hi,
I have an MPI program that runs fine on a Windows cluster over Ethernet. When I run it over IPoIB with one process per node, there is also no problem. However, when I start multiple processes on each node, it hangs. Can anyone tell me what is wrong?
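I am using the Intel MPI Library runtime (RT) for Windows, version 4.0.3.009, and the operating system is Windows Server 2008 R2. This is the script I use to run the program. There are 4 hosts in total, listed in the machine file hosts, so -n 8 starts two processes per host: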
set I_MPI_AUTH_METHOD=delegate
set I_MPI_NETMASK=ib
set I_MPI_DEBUG=5
mpiexec.exe -machinefile hosts -n 8 myprogram.exe
Below is the output I get before the program hangs:
[-1] MPI startup(): Rank Pid Node name Pin cpu
[-1] MPI startup(): 0 9652 {0,1,2,3,4,5,6,7,8,9,10,11}
[-1] MPI startup(): I_MPI_DEBUG=5
[-1] MPI startup(): I_MPI_PIN_MAPPING=1:0 0
[4] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H001
[4] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H001
[0] MPI startup(): shm and tcp data transfer modes
[7] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H007
[2] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H004
[2] MPI startup(): shm and tcp data transfer modes
[7] MPI startup(): shm and tcp data transfer modes
[3] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H007
[3] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H003
[6] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H004
[6] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[5] MPI startup(): The real interface being used for tcp is 'Mellanox IPoIB Adapter' and interface hostname is H003
[5] MPI startup(): shm and tcp data transfer modes
[6] MPI startup(): Internal info: pinning initialization was done
[0] MPI startup(): Internal info: pinning initialization was done
[5] MPI startup(): Internal info: pinning initialization was done
[7] MPI startup(): Internal info: pinning initialization was done
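One way to see how far each rank got before the hang is to group the startup messages by rank. The sketch below is a hypothetical helper, not part of Intel MPI. It assumes the log is saved to a plain-text file and that every line has the "[rank] MPI startup(): message" form shown above. It records the host each rank reported and whether that rank printed the "pinning initialization was done" message.

```python
import re
import sys

# Matches lines like "[4] MPI startup(): <message>"; rank -1 is the launcher-side summary.
LINE_RE = re.compile(r"^\[(-?\d+)\] MPI startup\(\): (.*)$")
# Extracts the host name from the "interface hostname is H001" message.
HOST_RE = re.compile(r"interface hostname is (\S+)")
PINNING_DONE = "Internal info: pinning initialization was done"


def summarize(log_text):
    """Return {rank: (hostname or None, pinning_done)} for every rank >= 0 in the log."""
    hosts = {}
    pinned = set()
    ranks = set()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        rank, msg = int(m.group(1)), m.group(2)
        if rank < 0:
            continue  # skip the [-1] launcher lines
        ranks.add(rank)
        h = HOST_RE.search(msg)
        if h:
            hosts[rank] = h.group(1)
        if msg.strip() == PINNING_DONE:
            pinned.add(rank)
    return {r: (hosts.get(r), r in pinned) for r in sorted(ranks)}


if __name__ == "__main__":
    # Usage: python summarize_log.py startup.log  (the file name is hypothetical)
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            report = summarize(f.read())
        for rank, (host, done) in report.items():
            print(f"rank {rank}: host={host}, pinning done={done}")
```

In the output captured above, ranks 0, 5, 6 and 7 report that pinning finished, while ranks 1, 2, 3 and 4 never do. Each of the four hosts has exactly one rank in each group.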
Thanks,
Ling Zhuo