Channel: Clusters and HPC Technology

DAPL works but OFA not


Dear Intel colleagues,

I have just set up a new diskless cluster. Running IMB PingPong with -genv I_MPI_FABRICS shm:dapl shows promising performance, but with -genv I_MPI_FABRICS shm:ofa it never works. The full system environment and execution traces are provided below. Your help would be greatly appreciated.
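
For reference, the working DAPL baseline was launched the same way, only with shm:dapl instead of shm:ofa (a sketch; all other flags mirror the failing runs below):

# Working baseline over DAPL (this is the run that shows promising performance)
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4 -genv I_MPI_FABRICS shm:dapl /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong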

# I_MPI_DEBUG 4 
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4  -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled

#I_MPI_DEBUG 2
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 2  -genv I_MPI_FABRICS shm:ofa   /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled

#I_MPI_DEBUG 100
/opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 100  -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 1  Build 20130522
[0] MPI startup(): Copyright (C) 2003-2013 Intel Corporation.  All rights reserved.
[0] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[1] MPI startup(): MPIDI_CH3I_RDMA_Process.boot_cq_hndl=(nil)
[0] MPI startup(): Found 1 IB devices
[1] MPI startup(): Found 1 IB devices
[1] MPI startup(): Open 0 IB device: mlx4_0
[0] MPI startup(): Open 0 IB device: mlx4_0
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled

mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.

icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.2.183 Build 20130514
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

env | grep I_MPI
I_MPI_ROOT=/opt/intel/impi/4.1.1.036

pdsh -w dn[01-06] ls /usr/lib64/libibverbs.so
dn01: /usr/lib64/libibverbs.so
dn02: /usr/lib64/libibverbs.so
dn05: /usr/lib64/libibverbs.so
dn06: /usr/lib64/libibverbs.so
dn03: /usr/lib64/libibverbs.so
dn04: /usr/lib64/libibverbs.so

ibstat -V
ibstat BUILD VERSION: 1.6.1.MLNX20130822.dfac5dd Build date: Aug 25 2013 11:19:43

uname -a
Linux dn01 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
ssh dn01
Last login: Tue Oct 28 10:56:44 2014 from head.cluster

head -n 20 /etc/dat.conf
# DAT v2.0, v1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
#           <provider_version> <ia_params> <platform_params>
#
# For uDAPL cma provder, <ia_params> is one of the following:
#       network address, network hostname, or netdev name and 0 for port
#
# For uDAPL scm provider, <ia_params> is device name and port
# For uDAPL ucm provider, <ia_params> is device name and port
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL RoCE provider, <ia_params> is device name and 0
#
#ON THIS CLUSTER, ONLY PORT 2 OF EACH HCA IS ACTIVATED
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2"""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0"""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0"""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1"""
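
Note the comment above: only port 2 of each HCA is active on this cluster, and the DAPL provider entry names that port explicitly. I wonder whether the OFA (verbs) path is probing port 1. A quick check I can run with standard OFED tools (a sketch):

# Confirm which ports on mlx4_0 are ACTIVE on the two test nodes
pdsh -w dn[01-02] 'ibv_devinfo -d mlx4_0 | grep -E "port:|state:"'

# My assumption from the Intel MPI reference manual: I_MPI_OFA_ADAPTER_NAME can point the
# OFA fabric at a specific adapter, e.g. -genv I_MPI_OFA_ADAPTER_NAME mlx4_0, but I am not
# sure whether there is an OFA equivalent of the port-2 selection that dat.conf gives DAPL.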

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032855
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1032855
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

# If I_MPI_FALLBACK is enabled, then I_MPI_FABRICS shm:ofa runs, but it apparently falls back to 1 Gbit Ethernet
export I_MPI_FALLBACK=1
[root@head run-033]# /opt/intel/impi/4.1.1.036/intel64/bin/mpirun -n 2 -host dn01,dn02 -ppn 1 -genv I_MPI_DEBUG 4  -genv I_MPI_FABRICS shm:ofa /opt/intel/impi/4.1.1.036/intel64/bin/IMB-MPI1 PingPong
[0] MPI startup(): fabric ofa failed: will try use tcp fabric
[1] MPI startup(): fabric ofa failed: will try use tcp fabric
[0] MPI startup(): shm and tcp data transfer modes
[1] MPI startup(): shm and tcp data transfer modes
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       30486    dn01       {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
[0] MPI startup(): 1       29284    dn02       {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
 ...(intermediate output omitted)
       #bytes #repetitions      t[usec]   Mbytes/sec
      2097152           20     17770.48       112.55
      4194304           10     35445.40       112.85
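
For what it is worth, 112.85 MBytes/sec * 8 is roughly 0.9 Gbit/s, i.e. essentially 1 GbE line rate, so the fallback run is clearly going over the Ethernet network rather than over the InfiniBand HCA.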

 

