Dear Intel colleagues,
I have just set up a new diskless cluster. Running IMB PingPong with -genv I_MPI_FABRICS shm:dapl gives promising performance, but with -genv I_MPI_FABRICS shm:ofa the job never starts. I have included the full system environment and execution traces below. Any help would be greatly appreciated.
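For reference, the DAPL run that works is essentially the same command line, only with shm:dapl in place of shm:ofa (same two hosts, one rank per node):
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4 -genv I_MPI_FABRICS shm:dapl /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong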
# I_MPI_DEBUG 4
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
# I_MPI_DEBUG 2
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 2 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
# I_MPI_DEBUG 100
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 100 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 1 Build 20130522
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation. All rights reserved.
[0] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[1] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[0] MPI startup(): Found 1 IB devices
[1] MPI startup(): Found 1 IB devices
[1] MPI startup(): Open 0 IB device: mlx4_0
[0] MPI startup(): Open 0 IB device: mlx4_0
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
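Note that even at I_MPI_DEBUG 100 the library does detect the HCA ("Found 1 IB devices", "Open 0 IB device: mlx4_0") and yet still reports that the ofa fabric is not available, so the failure appears to happen after device discovery.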
mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.
icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.2.183 Build 20130514
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
env | grep I_MPI
I_MPI_ROOT=/opt/intel/impi/4.1.1.036
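So apart from I_MPI_ROOT, no I_MPI_* variables are set in the environment; all fabric and debug settings are passed via -genv on the mpirun command line.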
pdsh -w dn[01-06] ls /usr/lib64/libibverbs.so
dn01: /usr/lib64/libibverbs.so
dn02: /usr/lib64/libibverbs.so
dn05: /usr/lib64/libibverbs.so
dn06: /usr/lib64/libibverbs.so
dn03: /usr/lib64/libibverbs.so
dn04: /usr/lib64/libibverbs.so
ibstat -V
ibstat BUILD VERSION: 1.6.1.MLNX20130822.dfac5dd Build date: Aug 25 2013 11:19:43
uname -a
Linux dn01 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
ssh dn01
Last login: Tue Oct 28 10:56:44 2014 from head.cluster
head -n 20 /etc/dat.conf
# DAT v2.0, v1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
# <provider_version> <ia_params> <platform_params>
#
# For uDAPL cma provder, <ia_params> is one of the following:
# network address, network hostname, or netdev name and 0 for port
#
# For uDAPL scm provider, <ia_params> is device name and port
# For uDAPL ucm provider, <ia_params> is device name and port
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL RoCE provider, <ia_params> is device name and 0
#
#ON THIS CLUSTER, ONLY PORT 2 OF EACH HCA IS ACTIVATED
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
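Since only port 2 of each HCA is active, I wonder whether the ofa fabric defaults to port 1 and fails there. Is there a supported way to tell the ofa fabric which adapter and port to use? The only related control I could find in the 4.1 reference manual is I_MPI_OFA_ADAPTER_NAME; as an untested guess I would add something like:
export I_MPI_OFA_ADAPTER_NAME=mlx4_0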
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1032855
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1032855
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
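As far as I understand, the max locked memory limit is what usually matters for ibverbs memory registration, and it is already unlimited here, so registered-memory limits do not look like the culprit.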
# If I_MPI_FALLBACK is enabled, a job launched with I_MPI_FABRICS shm:ofa does run, but it apparently falls back to the 1 Gbit Ethernet network
export I_MPI_FALLBACK=1
[root@head run-033]# /opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4 -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): fabric ofa failed: will try use tcp fabric
[1] MPI startup(): fabric ofa failed: will try use tcp fabric
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 30486 dn01 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
[0] MPI startup(): 1 29284 dn02 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
... (smaller message sizes omitted; columns are #bytes, #repetitions, t[usec], Mbytes/sec)
2097152 20 17770.48 112.55
4194304 10 35445.40 112.85
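The ~113 Mbytes/sec at large message sizes is close to the ~125 MB/s theoretical line rate of 1 Gbit Ethernet (minus TCP/IP overhead), which is why I believe the fallback runs over the GigE admin network rather than over InfiniBand.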