Channel: Clusters and HPC Technology

Intel Cluster Checker collection issue


Dear All,

I'm using Intel(R) Cluster Checker 2017 Update 2 (build 20170117), installed locally on the master node in /opt/intel as part of Intel Parallel Studio XE.

However, when running clck-collect I get the following error for all connected compute nodes.

[root@master ~]# clck-collect -a -f nodelist
computenode02: bash: /opt/intel/clck/2017.2.019/libexec/clck_run_provider.sh: No such file or directory
pdsh@master: computenode02: ssh exited with exit code 127

computenode01: bash: /opt/intel/clck/2017.2.019/libexec/clck_run_provider.sh: No such file or directory
pdsh@master: computenode01: ssh exited with exit code 127

Please advise whether this is an issue with the installation of Parallel Studio, or whether I'm missing something.
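(In case it helps the diagnosis: exit code 127 means the shell on the compute nodes could not find that script, so a first check is whether the clck installation tree is visible there at all. One possible check, reusing pdsh; adjust the node names as needed:

pdsh -w computenode01,computenode02 ls -l /opt/intel/clck/2017.2.019/libexec/clck_run_provider.sh

If the file is missing, the compute nodes are probably not seeing /opt/intel from the master, so Cluster Checker would need to be installed on them or on a shared filesystem.)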


How to disable intra-node communication


I would like to test the network latency/bandwidth of each node that I am running on in parallel. I think the simplest way to do this would be to have each node test itself. 

 

My question is: how can I force all Intel MPI TCP communication to go through the network adapter, and not use the optimized node-local communication?
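(One knob that may do this, offered as a suggestion to verify against the Intel MPI reference manual: setting I_MPI_FABRICS to a single fabric, e.g.

mpirun -genv I_MPI_FABRICS tcp -n 2 ./your_benchmark

disables the shm path, so even ranks on the same node go through the TCP netmod. Note, though, that the kernel may still route node-local TCP traffic over the loopback interface rather than the physical adapter, so a two-node run per adapter may be a more faithful test. ./your_benchmark is just a placeholder.)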
 

Any advice would be greatly appreciated.

 

Best Regards,

John

MPI ISend/IRecv deadlock on AWS EC2


Hi, I'm encountering an unexpected deadlock in this Fortran test program, compiled using Parallel Studio XE 2017 Update 4 on an Amazon EC2 cluster (Linux system).

$ mpiifort -traceback nbtest.f90 -o test.x

 

On one node the program runs just fine, but on any more it deadlocks, leading me to suspect an inter-node communication failure, though my knowledge in this area is lacking. FYI, the test code is hardcoded to be run on 16 cores.

Any help or insight is appreciated!

Danny

Code

program nbtest

  use mpi
  implicit none

  !***____________________ Definitions _______________
  integer, parameter :: r4 = SELECTED_REAL_KIND(6,37)
  integer :: irank

  integer, allocatable :: gstart1(:)
  integer, allocatable :: gend1(:)
  integer, allocatable :: gstartz(:)
  integer, allocatable :: gendz(:)
  integer, allocatable :: ind_fl(:)
  integer, allocatable :: blen(:),disp(:)

  integer, allocatable :: ddt_recv(:),ddt_send(:)

  real(kind=r4), allocatable :: tmp_array(:,:,:)
  real(kind=r4), allocatable :: tmp_in(:,:,:)

  integer :: cnt, i, j
  integer :: count_send, count_recv

  integer :: ssend
  integer :: srecv
  integer :: esend
  integer :: erecv
  integer :: erecv2, srecv2

  integer :: mpierr, ierr, old, typesize, typesize2,typesize3
  integer :: mpi_requests(2*16)
  integer :: mpi_status_arr(MPI_STATUS_SIZE,2*16)

  character(MPI_MAX_ERROR_STRING) :: string
  integer :: resultlen
  integer :: errorcode
!***________Code___________________________
  !*_________initialize MPI__________________
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD,irank,ierr)
  call MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN,ierr)

  allocate(gstart1(0:15), &
       gend1(0:15), &
       gstartz(0:15), &
       gendz(0:15))


  gstart1(0) = 1
  gend1(0) = 40
  gstartz(0) = 1
  gendz(0) = 27

  do i = 2, 16
     gstart1(i-1) = gend1(i-2) + 1
     gend1(i-1)   = gend1(i-2) + 40
     gstartz(i-1) = gendz(i-2) + 1
     gendz(i-1)   = gendz(i-2) + 27
  end do

  allocate(ind_fl(15))
  cnt = 1
  do i = 1, 16
     if ( (i-1) == irank ) cycle
     ind_fl(cnt) = (i - 1)
     cnt = cnt + 1
  end do
  cnt = 1
  do i = 1, 16
     if ( (i-1) == irank ) cycle
     ind_fl(cnt) = (i - 1)
     cnt = cnt + 1
  end do

  !*_________new datatype__________________
  allocate(ddt_recv(16),ddt_send(16))
  allocate(blen(60), disp(60))
  call mpi_type_size(MPI_REAL,typesize,ierr)

  do i = 1, 15
     call mpi_type_contiguous(3240,MPI_REAL, &
          ddt_send(i),ierr)
     call mpi_type_commit(ddt_send(i),ierr)

     srecv2 = (gstartz(ind_fl(i))-1)*2+1
     erecv2 = gendz(ind_fl(i))*2
     blen(:) = erecv2 - srecv2 + 1
     do j = 1, 60
        disp(j) = (j-1)*(852) + srecv2 - 1
     end do

     call mpi_type_indexed(60,blen,disp,MPI_REAL, &
          ddt_recv(i),ierr)
     call mpi_type_commit(ddt_recv(i),ierr)
     old = ddt_recv(i)
     call mpi_type_create_resized(old,int(0,kind=MPI_ADDRESS_KIND),&
          int(51120*typesize,kind=MPI_ADDRESS_KIND),&
          ddt_recv(i),ierr)
     call mpi_type_free(old,ierr)
     call mpi_type_commit(ddt_recv(i),ierr)

  end do


  allocate(tmp_array(852,60,40))
  allocate(tmp_in(54,60,640))
  tmp_array = 0.0_r4
  tmp_in = 0.0_r4

  ssend = gstart1(irank)
  esend = gend1(irank)
  cnt = 0

  do i = 1, 15
     srecv = gstart1(ind_fl(i))
     erecv = gend1(ind_fl(i))

     ! Number of datatype elements for the send/receive calls
     count_send = erecv - srecv + 1
     count_recv = esend - ssend + 1
     cnt = cnt + 1

     call mpi_irecv(tmp_array,count_recv,ddt_recv(i), &
          ind_fl(i),ind_fl(i),MPI_COMM_WORLD,mpi_requests(cnt),ierr)

     cnt = cnt + 1
     call mpi_isend(tmp_in(:,:,srecv:erecv), &
          count_send,ddt_send(i),ind_fl(i), &
          irank,MPI_COMM_WORLD,mpi_requests(cnt),ierr)

  end do

  call mpi_waitall(cnt,mpi_requests(1:cnt),mpi_status_arr(:,1:cnt),ierr)

  if (ierr /=  MPI_SUCCESS) then
     do i = 1,cnt
        errorcode = mpi_status_arr(MPI_ERROR,i)
        if (errorcode /= 0 .AND. errorcode /= MPI_ERR_PENDING) then
           call MPI_Error_string(errorcode,string,resultlen,mpierr)
           print *, "rank: ",irank, string
           !call MPI_Abort(MPI_COMM_WORLD,errorcode,ierr)
        end if
     end do
  end if

  deallocate(tmp_array)
  deallocate(tmp_in)

  print *, "great success"

  call MPI_FINALIZE(ierr)

end program nbtest

 

Running gdb on one of the processes during the deadlock:

 

(gdb) bt

#0  0x00002acb4c6bf733 in __select_nocancel () from /lib64/libc.so.6

#1  0x00002acb4b496a2e in MPID_nem_tcp_connpoll () from /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/libmpi.so.12

#2  0x00002acb4b496048 in MPID_nem_tcp_poll () from /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/libmpi.so.12

#3  0x00002acb4b350020 in MPID_nem_network_poll () from /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/libmpi.so.12

#4  0x00002acb4b0cc5f2 in PMPIDI_CH3I_Progress () from /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/libmpi.so.12

#5  0x00002acb4b50328f in PMPI_Waitall () from /opt/intel/psxe_runtime_2017.4.196/linux/mpi/intel64/lib/libmpi.so.12

#6  0x00002acb4ad1d53f in pmpi_waitall_ (v1=0x1e, v2=0xb0c320, v3=0x0, ierr=0x2acb4c6bf733 <__select_nocancel+10>) at ../../src/binding/fortran/mpif_h/waitallf.c:275

#7  0x00000000004064b0 in MAIN__ ()

#8  0x000000000040331e in main ()

 

Output log after I kill the job:

$ mpirun -n 16 ./test.x

forrtl: error (78): process killed (SIGTERM)

Image              PC                Routine            Line        Source

test.x             000000000040C12A  Unknown               Unknown  Unknown

libpthread-2.17.s  00002BA8B42F95A0  Unknown               Unknown  Unknown

libmpi.so.12       00002BA8B3303EBF  PMPIDI_CH3I_Progr     Unknown  Unknown

libmpi.so.12       00002BA8B373B28F  PMPI_Waitall          Unknown  Unknown

libmpifort.so.12.  00002BA8B2F5553F  pmpi_waitall          Unknown  Unknown

test.x             00000000004064B0  MAIN__                    129  nbtest.f90

test.x             000000000040331E  Unknown               Unknown  Unknown

libc-2.17.so       00002BA8B4829C05  __libc_start_main     Unknown  Unknown

test.x             0000000000403229  Unknown               Unknown  Unknown

(repeated 15 times, once for each processor)

Output with I_MPI_DEBUG = 6

[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 3  Build 20170405 (id: 17193)

[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.

[0] MPI startup(): Multi-threaded optimized library

[12] MPI startup(): cannot open dynamic library libdat2.so.2

[7] MPI startup(): cannot open dynamic library libdat2.so.2

[10] MPI startup(): cannot open dynamic library libdat2.so.2

[13] MPI startup(): cannot open dynamic library libdat2.so.2

[4] MPI startup(): cannot open dynamic library libdat2.so.2

[9] MPI startup(): cannot open dynamic library libdat2.so.2

[14] MPI startup(): cannot open dynamic library libdat2.so.2

[5] MPI startup(): cannot open dynamic library libdat2.so.2

[11] MPI startup(): cannot open dynamic library libdat2.so.2

[15] MPI startup(): cannot open dynamic library libdat2.so.2

[6] MPI startup(): cannot open dynamic library libdat2.so.2

[8] MPI startup(): cannot open dynamic library libdat2.so.2

[0] MPI startup(): cannot open dynamic library libdat2.so.2

[3] MPI startup(): cannot open dynamic library libdat2.so.2

[2] MPI startup(): cannot open dynamic library libdat2.so.2

[4] MPI startup(): cannot open dynamic library libdat2.so

[7] MPI startup(): cannot open dynamic library libdat2.so

[8] MPI startup(): cannot open dynamic library libdat2.so

[9] MPI startup(): cannot open dynamic library libdat2.so

[6] MPI startup(): cannot open dynamic library libdat2.so

[10] MPI startup(): cannot open dynamic library libdat2.so

[13] MPI startup(): cannot open dynamic library libdat2.so

[0] MPI startup(): cannot open dynamic library libdat2.so

[15] MPI startup(): cannot open dynamic library libdat2.so

[3] MPI startup(): cannot open dynamic library libdat2.so

[12] MPI startup(): cannot open dynamic library libdat2.so

[4] MPI startup(): cannot open dynamic library libdat.so.1

[14] MPI startup(): cannot open dynamic library libdat2.so

[7] MPI startup(): cannot open dynamic library libdat.so.1

[5] MPI startup(): cannot open dynamic library libdat2.so

[8] MPI startup(): cannot open dynamic library libdat.so.1

[1] MPI startup(): cannot open dynamic library libdat2.so.2

[6] MPI startup(): cannot open dynamic library libdat.so.1

[9] MPI startup(): cannot open dynamic library libdat.so.1

[10] MPI startup(): cannot open dynamic library libdat.so.1

[0] MPI startup(): cannot open dynamic library libdat.so.1

[12] MPI startup(): cannot open dynamic library libdat.so.1

[4] MPI startup(): cannot open dynamic library libdat.so

[11] MPI startup(): cannot open dynamic library libdat2.so

[3] MPI startup(): cannot open dynamic library libdat.so.1

[13] MPI startup(): cannot open dynamic library libdat.so.1

[5] MPI startup(): cannot open dynamic library libdat.so.1

[15] MPI startup(): cannot open dynamic library libdat.so.1

[5] MPI startup(): cannot open dynamic library libdat.so

[7] MPI startup(): cannot open dynamic library libdat.so

[1] MPI startup(): cannot open dynamic library libdat2.so

[9] MPI startup(): cannot open dynamic library libdat.so

[8] MPI startup(): cannot open dynamic library libdat.so

[11] MPI startup(): cannot open dynamic library libdat.so.1

[6] MPI startup(): cannot open dynamic library libdat.so

[10] MPI startup(): cannot open dynamic library libdat.so

[14] MPI startup(): cannot open dynamic library libdat.so.1

[11] MPI startup(): cannot open dynamic library libdat.so

[13] MPI startup(): cannot open dynamic library libdat.so

[15] MPI startup(): cannot open dynamic library libdat.so

[12] MPI startup(): cannot open dynamic library libdat.so

[0] MPI startup(): cannot open dynamic library libdat.so

[14] MPI startup(): cannot open dynamic library libdat.so

[1] MPI startup(): cannot open dynamic library libdat.so.1

[3] MPI startup(): cannot open dynamic library libdat.so

[1] MPI startup(): cannot open dynamic library libdat.so

[2] MPI startup(): cannot open dynamic library libdat2.so

[2] MPI startup(): cannot open dynamic library libdat.so.1

[2] MPI startup(): cannot open dynamic library libdat.so

[4] MPI startup(): cannot load default tmi provider

[7] MPI startup(): cannot load default tmi provider

[5] MPI startup(): cannot load default tmi provider

[9] MPI startup(): cannot load default tmi provider

[0] MPI startup(): cannot load default tmi provider

[6] MPI startup(): cannot load default tmi provider

[10] MPI startup(): cannot load default tmi provider

[3] MPI startup(): cannot load default tmi provider

[15] MPI startup(): cannot load default tmi provider

[8] MPI startup(): cannot load default tmi provider

[1] MPI startup(): cannot load default tmi provider

[14] MPI startup(): cannot load default tmi provider

[11] MPI startup(): cannot load default tmi provider

[2] MPI startup(): cannot load default tmi provider

[12] MPI startup(): cannot load default tmi provider

[13] MPI startup(): cannot load default tmi provider

[12] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[4] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

[9] ERROR - load_iblibrary(): [15] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[5] ERROR - load_iblibrary(): [0] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[10] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

[1] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

 

[3] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[13] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[7] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[2] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[6] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[8] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[11] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[14] ERROR - load_iblibrary(): Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

Can't open IB verbs library: libibverbs.so.1: cannot open shared object file: No such file or directory

 

[0] MPI startup(): shm and tcp data transfer modes

[1] MPI startup(): shm and tcp data transfer modes

[2] MPI startup(): shm and tcp data transfer modes

[3] MPI startup(): shm and tcp data transfer modes

[4] MPI startup(): shm and tcp data transfer modes

[5] MPI startup(): shm and tcp data transfer modes

[7] MPI startup(): shm and tcp data transfer modes

[9] MPI startup(): shm and tcp data transfer modes

[8] MPI startup(): shm and tcp data transfer modes

[6] MPI startup(): shm and tcp data transfer modes

[10] MPI startup(): shm and tcp data transfer modes

[11] MPI startup(): shm and tcp data transfer modes

[12] MPI startup(): shm and tcp data transfer modes

[13] MPI startup(): shm and tcp data transfer modes

[14] MPI startup(): shm and tcp data transfer modes

[15] MPI startup(): shm and tcp data transfer modes

[0] MPI startup(): Device_reset_idx=1

[0] MPI startup(): Allgather: 4: 1-4 & 0-4

[0] MPI startup(): Allgather: 1: 5-11 & 0-4

[0] MPI startup(): Allgather: 4: 12-28 & 0-4

[0] MPI startup(): Allgather: 1: 29-1694 & 0-4


[0] MPI startup(): Allgather: 4: 1695-3413 & 0-4

[0] MPI startup(): Allgather: 1: 3414-513494 & 0-4

[0] MPI startup(): Allgather: 3: 513495-1244544 & 0-4

[0] MPI startup(): Allgather: 4: 0-2147483647 & 0-4

[0] MPI startup(): Allgather: 4: 1-16 & 5-16

[0] MPI startup(): Allgather: 1: 17-38 & 5-16

[0] MPI startup(): Allgather: 3: 0-2147483647 & 5-16

[0] MPI startup(): Allgather: 4: 1-8 & 17-2147483647

[0] MPI startup(): Allgather: 1: 9-23 & 17-2147483647

[0] MPI startup(): Allgather: 4: 24-35 & 17-2147483647

[0] MPI startup(): Allgather: 3: 0-2147483647 & 17-2147483647

[0] MPI startup(): Allgatherv: 1: 0-3669 & 0-4

[0] MPI startup(): Allgatherv: 4: 3669-4949 & 0-4

[0] MPI startup(): Allgatherv: 1: 4949-17255 & 0-4

[0] MPI startup(): Allgatherv: 4: 17255-46775 & 0-4

[0] MPI startup(): Allgatherv: 3: 46775-836844 & 0-4

[0] MPI startup(): Allgatherv: 4: 0-2147483647 & 0-4

[0] MPI startup(): Allgatherv: 4: 0-10 & 5-16

[0] MPI startup(): Allgatherv: 1: 10-38 & 5-16

[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 5-16

[0] MPI startup(): Allgatherv: 4: 0-8 & 17-2147483647

[0] MPI startup(): Allgatherv: 1: 8-21 & 17-2147483647

[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 17-2147483647

[0] MPI startup(): Allreduce: 5: 0-6 & 0-8

[0] MPI startup(): Allreduce: 7: 6-11 & 0-8

[0] MPI startup(): Allreduce: 5: 11-26 & 0-8

[0] MPI startup(): Allreduce: 4: 26-43 & 0-8

[0] MPI startup(): Allreduce: 5: 43-99 & 0-8

[0] MPI startup(): Allreduce: 1: 99-176 & 0-8

[0] MPI startup(): Allreduce: 6: 176-380 & 0-8

[0] MPI startup(): Allreduce: 2: 380-2967 & 0-8

[0] MPI startup(): Allreduce: 1: 2967-9460 & 0-8

[0] MPI startup(): Allreduce: 2: 0-2147483647 & 0-8

[0] MPI startup(): Allreduce: 5: 0-95 & 9-16

[0] MPI startup(): Allreduce: 1: 95-301 & 9-16

[0] MPI startup(): Allreduce: 2: 301-2577 & 9-16

[0] MPI startup(): Allreduce: 6: 2577-5427 & 9-16

[0] MPI startup(): Allreduce: 1: 5427-10288 & 9-16

[0] MPI startup(): Allreduce: 2: 0-2147483647 & 9-16

[0] MPI startup(): Allreduce: 6: 0-6 & 17-2147483647

[0] MPI startup(): Allreduce: 5: 6-11 & 17-2147483647

[0] MPI startup(): Allreduce: 6: 11-452 & 17-2147483647

[0] MPI startup(): Allreduce: 2: 452-2639 & 17-2147483647

[0] MPI startup(): Allreduce: 6: 2639-5627 & 17-2147483647

[0] MPI startup(): Allreduce: 1: 5627-9956 & 17-2147483647

[0] MPI startup(): Allreduce: 2: 9956-2587177 & 17-2147483647

[0] MPI startup(): Allreduce: 3: 0-2147483647 & 17-2147483647


[0] MPI startup(): Alltoall: 4: 1-16 & 0-8

[0] MPI startup(): Alltoall: 1: 17-69 & 0-8

[0] MPI startup(): Alltoall: 2: 70-1024 & 0-8

[0] MPI startup(): Alltoall: 2: 1024-52228 & 0-8

[0] MPI startup(): Alltoall: 4: 52229-74973 & 0-8

[0] MPI startup(): Alltoall: 2: 74974-131148 & 0-8

[0] MPI startup(): Alltoall: 3: 131149-335487 & 0-8

[0] MPI startup(): Alltoall: 4: 0-2147483647 & 0-8

[0] MPI startup(): Alltoall: 4: 1-16 & 9-16

[0] MPI startup(): Alltoall: 1: 17-40 & 9-16

[0] MPI startup(): Alltoall: 2: 41-497 & 9-16

[0] MPI startup(): Alltoall: 1: 498-547 & 9-16

[0] MPI startup(): Alltoall: 2: 548-1024 & 9-16

[0] MPI startup(): Alltoall: 2: 1024-69348 & 9-16

[0] MPI startup(): Alltoall: 4: 0-2147483647 & 9-16

[0] MPI startup(): Alltoall: 4: 0-1 & 17-2147483647

[0] MPI startup(): Alltoall: 1: 2-4 & 17-2147483647

[0] MPI startup(): Alltoall: 4: 5-24 & 17-2147483647

[0] MPI startup(): Alltoall: 2: 25-1024 & 17-2147483647

[0] MPI startup(): Alltoall: 2: 1024-20700 & 17-2147483647

[0] MPI startup(): Alltoall: 4: 20701-57414 & 17-2147483647

[0] MPI startup(): Alltoall: 3: 57415-66078 & 17-2147483647

[0] MPI startup(): Alltoall: 4: 0-2147483647 & 17-2147483647

[0] MPI startup(): Alltoallv: 2: 0-2147483647 & 0-2147483647

[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647

[0] MPI startup(): Barrier: 0: 0-2147483647 & 0-2147483647

[0] MPI startup(): Bcast: 4: 1-29 & 0-8

[0] MPI startup(): Bcast: 7: 30-37 & 0-8

[0] MPI startup(): Bcast: 4: 38-543 & 0-8

[0] MPI startup(): Bcast: 6: 544-1682 & 0-8

[0] MPI startup(): Bcast: 4: 1683-2521 & 0-8

[0] MPI startup(): Bcast: 6: 2522-30075 & 0-8

[0] MPI startup(): Bcast: 7: 30076-34889 & 0-8

[0] MPI startup(): Bcast: 4: 34890-131072 & 0-8

[0] MPI startup(): Bcast: 6: 131072-409051 & 0-8

[0] MPI startup(): Bcast: 7: 0-2147483647 & 0-8

[0] MPI startup(): Bcast: 4: 1-13 & 9-2147483647

[0] MPI startup(): Bcast: 1: 14-25 & 9-2147483647

[0] MPI startup(): Bcast: 4: 26-691 & 9-2147483647

[0] MPI startup(): Bcast: 6: 692-2367 & 9-2147483647

[0] MPI startup(): Bcast: 4: 2368-7952 & 9-2147483647

[0] MPI startup(): Bcast: 6: 7953-10407 & 9-2147483647

[0] MPI startup(): Bcast: 4: 10408-17900 & 9-2147483647

[0] MPI startup(): Bcast: 6: 17901-36385 & 9-2147483647

[0] MPI startup(): Bcast: 7: 36386-131072 & 9-2147483647

[0] MPI startup(): Bcast: 7: 0-2147483647 & 9-2147483647

[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647


[0] MPI startup(): Gather: 2: 1-3 & 0-8

[0] MPI startup(): Gather: 3: 4-4 & 0-8

[0] MPI startup(): Gather: 2: 5-66 & 0-8

[0] MPI startup(): Gather: 3: 67-174 & 0-8

[0] MPI startup(): Gather: 2: 175-478 & 0-8

[0] MPI startup(): Gather: 3: 479-531 & 0-8

[0] MPI startup(): Gather: 2: 532-2299 & 0-8

[0] MPI startup(): Gather: 3: 0-2147483647 & 0-8

[0] MPI startup(): Gather: 2: 1-141 & 9-16

[0] MPI startup(): Gather: 3: 142-456 & 9-16

[0] MPI startup(): Gather: 2: 457-785 & 9-16

[0] MPI startup(): Gather: 3: 786-70794 & 9-16

[0] MPI startup(): Gather: 2: 70795-254351 & 9-16

[0] MPI startup(): Gather: 3: 0-2147483647 & 9-16

[0] MPI startup(): Gather: 2: 1-89 & 17-2147483647

[0] MPI startup(): Gather: 3: 90-472 & 17-2147483647

[0] MPI startup(): Gather: 2: 473-718 & 17-2147483647

[0] MPI startup(): Gather: 3: 719-16460 & 17-2147483647

[0] MPI startup(): Gather: 2: 0-2147483647 & 17-2147483647

[0] MPI startup(): Gatherv: 2: 0-2147483647 & 0-16

[0] MPI startup(): Gatherv: 2: 0-2147483647 & 17-2147483647


[0] MPI startup(): Reduce_scatter: 5: 0-5 & 0-4

[0] MPI startup(): Reduce_scatter: 1: 5-192 & 0-4

[0] MPI startup(): Reduce_scatter: 3: 192-349 & 0-4

[0] MPI startup(): Reduce_scatter: 1: 349-3268 & 0-4

[0] MPI startup(): Reduce_scatter: 3: 3268-71356 & 0-4

[0] MPI startup(): Reduce_scatter: 2: 71356-513868 & 0-4

[0] MPI startup(): Reduce_scatter: 5: 513868-731452 & 0-4

[0] MPI startup(): Reduce_scatter: 2: 731452-1746615 & 0-4

[0] MPI startup(): Reduce_scatter: 5: 1746615-2485015 & 0-4

[0] MPI startup(): Reduce_scatter: 2: 0-2147483647 & 0-4

[0] MPI startup(): Reduce_scatter: 5: 0-5 & 5-16

[0] MPI startup(): Reduce_scatter: 1: 5-59 & 5-16

[0] MPI startup(): Reduce_scatter: 5: 59-99 & 5-16

[0] MPI startup(): Reduce_scatter: 3: 99-198 & 5-16

[0] MPI startup(): Reduce_scatter: 1: 198-360 & 5-16

[0] MPI startup(): Reduce_scatter: 3: 360-3606 & 5-16

[0] MPI startup(): Reduce_scatter: 2: 3606-4631 & 5-16

[0] MPI startup(): Reduce_scatter: 3: 0-2147483647 & 5-16

[0] MPI startup(): Reduce_scatter: 5: 0-22 & 17-2147483647

[0] MPI startup(): Reduce_scatter: 1: 22-44 & 17-2147483647

[0] MPI startup(): Reduce_scatter: 5: 44-278 & 17-2147483647

[0] MPI startup(): Reduce_scatter: 3: 278-3517 & 17-2147483647

[0] MPI startup(): Reduce_scatter: 5: 3517-4408 & 17-2147483647

[0] MPI startup(): Reduce_scatter: 3: 0-2147483647 & 17-2147483647

[0] MPI startup(): Reduce: 4: 4-5 & 0-4

[0] MPI startup(): Reduce: 1: 6-59 & 0-4

[0] MPI startup(): Reduce: 2: 60-188 & 0-4

[0] MPI startup(): Reduce: 6: 189-362 & 0-4

[0] MPI startup(): Reduce: 2: 363-7776 & 0-4

[0] MPI startup(): Reduce: 5: 7777-151371 & 0-4

[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-4

[0] MPI startup(): Reduce: 4: 4-60 & 5-16

[0] MPI startup(): Reduce: 3: 61-88 & 5-16

[0] MPI startup(): Reduce: 4: 89-245 & 5-16

[0] MPI startup(): Reduce: 3: 246-256 & 5-16

[0] MPI startup(): Reduce: 4: 257-8192 & 5-16

[0] MPI startup(): Reduce: 3: 8192-1048576 & 5-16

[0] MPI startup(): Reduce: 3: 0-2147483647 & 5-16

[0] MPI startup(): Reduce: 4: 4-8192 & 17-2147483647

[0] MPI startup(): Reduce: 3: 8192-1048576 & 17-2147483647

[0] MPI startup(): Reduce: 3: 0-2147483647 & 17-2147483647

[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647

[0] MPI startup(): Scatter: 2: 1-7 & 0-16

[0] MPI startup(): Scatter: 3: 8-9 & 0-16

[0] MPI startup(): Scatter: 2: 10-64 & 0-16

[0] MPI startup(): Scatter: 3: 65-372 & 0-16

[0] MPI startup(): Scatter: 2: 373-811 & 0-16

[0] MPI startup(): Scatter: 3: 812-115993 & 0-16

[0] MPI startup(): Scatter: 2: 115994-173348 & 0-16

[0] MPI startup(): Scatter: 3: 0-2147483647 & 0-16

[0] MPI startup(): Scatter: 1: 1-1 & 17-2147483647

[0] MPI startup(): Scatter: 2: 2-76 & 17-2147483647

[0] MPI startup(): Scatter: 3: 77-435 & 17-2147483647

[0] MPI startup(): Scatter: 2: 436-608 & 17-2147483647

[0] MPI startup(): Scatter: 3: 0-2147483647 & 17-2147483647

[0] MPI startup(): Scatterv: 1: 0-2147483647 & 0-2147483647

[5] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[1] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[7] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[2] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[6] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[3] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[13] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[4] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[9] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[14] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[11] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[15] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[8] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[12] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[10] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[0] MPI startup(): Rank    Pid      Node name      Pin cpu

[0] MPI startup(): 0       10691    ip-10-0-0-189  0

[0] MPI startup(): 1       10692    ip-10-0-0-189  1

[0] MPI startup(): 2       10693    ip-10-0-0-189  2

[0] MPI startup(): 3       10694    ip-10-0-0-189  3

[0] MPI startup(): 4       10320    ip-10-0-0-174  0

[0] MPI startup(): 5       10321    ip-10-0-0-174  1

[0] MPI startup(): 6       10322    ip-10-0-0-174  2

[0] MPI startup(): 7       10323    ip-10-0-0-174  3

[0] MPI startup(): 8       10273    ip-10-0-0-104  0

[0] MPI startup(): 9       10274    ip-10-0-0-104  1

[0] MPI startup(): 10      10275    ip-10-0-0-104  2

[0] MPI startup(): 11      10276    ip-10-0-0-104  3

[0] MPI startup(): 12      10312    ip-10-0-0-158  0

[0] MPI startup(): 13      10313    ip-10-0-0-158  1

[0] MPI startup(): 14      10314    ip-10-0-0-158  2

[0] MPI startup(): 15      10315    ip-10-0-0-158  3

[0] MPI startup(): Recognition=2 Platform(code=32 ippn=2 dev=5) Fabric(intra=1 inter=6 flags=0x0)

[0] MPI startup(): I_MPI_DEBUG=6

[0] MPI startup(): I_MPI_HYDRA_UUID=bb290000-2b37-e5b2-065d-050000bd0a00

[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=1

[0] MPI startup(): I_MPI_PIN_MAPPING=4:0 0,1 1,2 2,3 3

shared memory initialization failure


Hi all,

Running our MPI application on a newly set up RHEL 7.3 system using SGE, we obtain the following error:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1817)......: fail failed
MPIR_Comm_commit(711): fail failed
(unknown)(): Other MPI error

With I_MPI_DEBUG=1000 the following error is reported:

[0] I_MPI_Init_shm_colls_space(): Cannot create shm object: /shm-col-space-69142-2-55D0EBDD4B46E errno=Permission denied
[0] I_MPI_Init_shm_colls_space(): Something goes wrong in shared memory initialization (Permission denied)

Usually Intel MPI creates shm objects in /dev/shm. Does anybody know why the library tries to create them in /?
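(A guess that may explain the confusing path: the leading "/" is most likely just the POSIX shared-memory object name passed to shm_open(), and such objects normally end up under /dev/shm, so this still points at a /dev/shm problem rather than the filesystem root. A quick permission check on the affected node:

ls -ld /dev/shm

This should normally show drwxrwxrwt; anything more restrictive would explain the Permission denied error.)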

Cheers,
Pieter

mpirun command does not distribute jobs to compute nodes


Dear Folks,

I have Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.0.1.117 Build 20121010 on my system. I am trying to submit a job using mpirun to my machine with the following hosts:

weather
compute-0-0
compute-0-1
compute-0-2
compute-0-3
compute-0-4
compute-0-5
compute-0-6
compute-0-7

After running mpdboot (as mpdboot -v -n 9 -f ~/hostfile -r ssh), I am using the command: mpirun -np 72 -f ~/hostfile ./wrf.exe &

After submitting the job, it fails with some error after 10-15 minutes. I checked top on the compute nodes and did not see any wrf.exe process running in the meantime. Please suggest whether I am making a mistake or whether something else is preventing jobs from reaching the compute nodes.
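(A sanity check that might narrow this down: verify that the MPD ring actually spans all nine hosts before launching, e.g.

mpdtrace -l

This should list every host from the hostfile; if some compute nodes are missing from the ring, mpirun would never start ranks there.)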

Thank you in anticipation.

Dhirendra

ITAC -- Naming generated .stf file to differentiate runs


Hello, 

I am using ITAC from the 2017.05 Intel Parallel Cluster Studio. I issue a number of mpirun command lines with ITAC tracing enabled. I am trying though to assign specific names to the generated .stf files so that I can associate the .stf files of a particular run with the corresponding mpirun command.  

How can I do this? 

Is there an option for this, like I_MPI_STATS_FILE for the statistics files?

Can I do something like  

mpiexec.hydra ... -stf-file-name MPIapp_$(date +%F_%T) ... ./MPIapp 
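Or perhaps through the trace collector's environment variables? If I understand the ITC configuration correctly (please correct me if not), VT_LOGFILE_NAME sets the trace file name and VT_LOGFILE_PREFIX the output directory, so something along these lines might work:

mpiexec.hydra ... -trace -genv VT_LOGFILE_NAME MPIapp_$(date +%F_%T).stf ... ./MPIapp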

Thank you!

Michael


Intel MPI cross-OS launch error


Env:

node1 : Windows 10 (192.168.137.1)

node2 : Debian 8 virtual machine (192.168.137.3)

 

Test app: the test.cpp included with the Intel MPI package

 

1. Launch from the Windows side (node1), 1 process (node1 only):

mpiexec -demux select -bootstrap=service -genv I_MPI_FABRICS=shm:tcp -n 1 -host localhost test

get output:

node1:

 

Hello world: rank 0 of 1 running on DESKTOP-J4KRVVD

2. Launch from the Windows side (node1), 1 process (node2 only):

mpiexec -demux select -bootstrap=service -genv I_MPI_FABRICS=shm:tcp -host 192.168.137.3 -hostos linux -n 1 -path /opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/test test

get output:

node1:

Hello world: rank 0 of 1 running on vm-build-debian8

3. Launch from the Windows side (node1), 2 processes (1 on node1, 1 on node2):

mpiexec -demux select -bootstrap=service -genv I_MPI_FABRICS=shm:tcp -host 192.168.137.3 -hostos linux -n 1 -path /opt/intel/compilers_and_libraries_2017.2.174/linux/mpi/test test : -n 1 -host localhost test

get error:

node1:

rank = 1, revents = 29, state = 1
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 2988: (it_plfd->revents & POLLERR) == 0
internal ABORT - process 0

 

node2:

[hydserv@vm-build-debian8] stdio_cb (../../tools/bootstrap/persist/persist_server.c:170): assert (!closed) failed
[hydserv@vm-build-debian8] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[hydserv@vm-build-debian8] main (../../tools/bootstrap/persist/persist_server.c:339): demux engine error waiting for event

 

If I try to turn on verbose output with -v or -genv I_MPI_HYDRA_DEBUG=on, even test 2 fails with the errors below, so I don't know what's wrong, or how to find out what's wrong.

node1:

[mpiexec@DESKTOP-J4KRVVD] STDIN will be redirected to 1 fd(s): 4

[mpiexec@DESKTOP-J4KRVVD] ..\hydra\utils\sock\sock.c (420): write error (Unknown error)
[mpiexec@DESKTOP-J4KRVVD] ..\hydra\tools\bootstrap\persist\persist_launch.c (52): assert (sent == hdr.buflen) failed
[mpiexec@DESKTOP-J4KRVVD] ..\hydra\tools\demux\demux_select.c (103): callback returned error status
[mpiexec@DESKTOP-J4KRVVD] ..\hydra\pm\pmiserv\pmiserv_pmci.c (501): error waiting for event
[mpiexec@DESKTOP-J4KRVVD] ..\hydra\ui\mpich\mpiexec.c (1147): process manager error waiting for completion

node2:

[hydserv@vm-build-debian8] stdio_cb (../../tools/bootstrap/persist/persist_server.c:170): assert (!closed) failed
[hydserv@vm-build-debian8] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[hydserv@vm-build-debian8] main (../../tools/bootstrap/persist/persist_server.c:339): demux engine error waiting for event

 


PBS system says: 'MPI startup(): ofa fabric is not available and fallback fabric is not enabled'


I've been using a PBS system for testing my code, and I have a PBS script to run my binary. But when I run it I get:

> [0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled

And I read this site: https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog...

However, those methods did not solve the problem. My code runs on the head node and on the other nodes when launched directly, but it does not run through the PBS system.

What can I do about this problem?

Thanks.

P.S.

This my PBS script:

#!/bin/sh

#PBS -N job_1
#PBS -l nodes=1:ppn=12
#PBS -o example.out
#PBS -e example.err
#PBS -l walltime=3600:00:00
#PBS -q default_queue

echo -e --------- `date` ----------

echo HomeDirectory is $PWD
echo
echo Current Dir is $PBS_O_WORKDIR
echo


cd $PBS_O_WORKDIR

echo "------------This is the node file -------------"
cat $PBS_NODEFILE
echo "-----------------------------------------------"

np=$(cat $PBS_NODEFILE | wc -l)
echo The number of core is $np
echo
echo

cat $PBS_NODEFILE > $PBS_O_WORKDIR/mpd.host

mpdtrace  >/dev/null 2>&1
if [ "$?" != "0" ]
then
        echo -e
        mpdboot -n 1 -f mpd.host -r ssh
fi

mpirun -np 12 ./run_test
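(Not a definitive fix, but given the exact wording of the error, two settings that may be worth adding to the script before the mpirun line; please verify the variable names against the Intel MPI reference for your version:

export I_MPI_FABRICS=shm:tcp     # request TCP explicitly instead of the unavailable OFA fabric
export I_MPI_FALLBACK=enable     # or allow falling back when the requested fabric is unavailable

The symptom suggests that on the PBS execution hosts the OFA/InfiniBand stack is not available and fallback is disabled, which matches the message.)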

 

Fatal Error using MPI in Linux


Hi,

I'm using a virtual Linux Ubuntu machine (Linux-VirtualBox 4.4.0-101-generic #124-Ubuntu SMP Fri Nov 10 18:29:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux), with 8GB RAM.

For a process in Matlab, the software requires the Intel MPI runtime package v4.1.3.045 or later. I've installed version 2018.1.163 instead, not being sure about the 2018 version numbering.

Using 8 cores for the processing, the software failed with the following error:
Fatal error in MPI_Recv: Other MPI error, error stack:
MPI_Recv(224)...................: MPI_Recv(buf=0x7f566d59c040, count=9942500, MPI_FLOAT, src=3, tag=5, MPI_COMM_WORLD, status=0x7ffc43a72b60) failed
PMPIDI_CH3I_Progress(658).......: fail failed
MPID_nem_handle_pkt(1450).......: fail failed
pkt_RTS_handler(317)............: fail failed
do_cts(662).....................: fail failed
MPID_nem_lmt_dcp_start_recv(302): fail failed
dcp_recv(165)...................: Internal MPI error!  Cannot read from remote process
 Two workarounds have been identified for this issue:
 1) Enable ptrace for non-root users with:
    echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
 2) Or, use:
    I_MPI_SHM_LMT=shm
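(For anyone hitting the same message: the second workaround does not require root and can be applied per run, for example by exporting the variable in the shell that starts Matlab; a sketch, assuming a bash-like shell:

export I_MPI_SHM_LMT=shm

The variable name is taken verbatim from the error text above.)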

 

Reducing the number of cores to 4, the process has been hanging for more than 3 hours and I'm not sure it is still working.

What could be the problem?

thank you

Pietro

 

drastic reduction in performance when compute node running at half load


We have compute nodes with 24 cores (48 threads) and 64 GB RAM (2x32 GB). When I run a sample code (matrix multiplication) on one of the compute nodes in one thread, it takes only 4 seconds. But when I start more runs (copies of the same program) on the same compute node, the time taken increases drastically. When the number of programs running reaches 24 (I gave a maximum of 24 since only 24 physical cores are present), the time taken becomes around 40 seconds (10 times slower). When I checked the temperature, it was below 40 deg Celsius.

When I searched the Internet about this issue, I found some people saying that it may be due to the transfer of data from RAM to the processor slowing down when many programs run at once. I was not satisfied with this explanation, because compute nodes are designed to run at maximum load without much decrease in performance. Also, we are using only 1 GB of memory even with 24 programs running. Since we are seeing a performance reduction of about 10x, I guess the problem is something else.

Slowdown of message exchange by multiple orders of magnitude due to dynamic connections


Hello,

We develop MPI algorithms on the SuperMUC supercomputer [1]. We compile our algorithms with Intel MPI 2018. Unfortunately, it seems that message transfer between two processes which have not exchanged a message before is up to 1000 times slower than between two processes which have already communicated.

I want to give several examples:

1.: Let benchmark A perform the following operations: "First, execute a barrier on MPI_COMM_WORLD. Second, start a timer. Third, process 0 sends 256 messages of size 32 kB each. Message i is sent to process i + 1. Finally, stop the timer." The first execution of benchmark A takes about 686 microseconds on an instance of 2048 processes (2048 cores on 128 nodes). Subsequent executions of A just take 0.85 microseconds each.

Insight: If we perform a communication pattern (here 'partial' broadcast) the first time, the execution is slower than subsequent executions by a factor of about 800. Unfortunately, if we execute benchmark A again with a different communication partner, e.g., process 1 sends messages to process 1..256, benchmark A is slow again. Thus, an initial warm up phase which executes benchmark A once does not speed up communication in general.

2.: Let benchmark B perform the following operations: "First, execute a barrier on MPI_COMM_WORLD. Second, start a timer. Third, invoke MPI_Alltoall with messages of size 32 kB each. Finally, stop the timer." The first execution of benchmark B takes about 42.41 seconds(!) on an instance of 2048 processes (2048 cores on 128 nodes). The second execution of B just takes 0.12 seconds.

Insight: If we perform a communication pattern (here MPI_Alltoall) the first time, the execution is slower than subsequent executions by a factor of about 353. Unfortunately, the first MPI_Alltoall is unbelievably slow and gets even slower on larger machine instances.

3.: Let benchmark C perform the following operations: "First, execute a barrier on MPI_COMM_WORLD. Second, start a timer. Third, execute an all-to-all collective operation with messages of size 32 kB each. Finally, stop the timer." The all-to-all collective operation we use in benchmark C is our own implementation of the MPI_Alltoall interface. We now execute benchmark C first and then benchmark A afterwards. Benchmark C takes about 40 seconds and benchmark A takes about 0.85 seconds.

Insight: The first execution of our all-to-all implementation performs similarly to the first execution of MPI_Alltoall. Surprisingly, the subsequent execution of benchmark A is very fast (0.85 seconds) compared to the case where we do not have a preceding all-to-all. It seems that the all-to-all collective operation sets up the connections between all pairs of processes, which results in a fast execution of benchmark A. However, as the all-to-all collective operation (MPI_Alltoall as well) is unbelievably slow, we don't want to execute it as a warm-up at large scale.

We already figured out that setting the environment variable I_MPI_USE_DYNAMIC_CONNECTIONS=no avoids these slow running times at small scale (up to 2048 cores). However, setting I_MPI_USE_DYNAMIC_CONNECTIONS to 'no' does not have any effect on larger machine instances (more than 2048 cores).
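(For reference, the setting can be applied either globally in the environment or per run on the mpiexec command line, e.g. as follows, with ./benchmark being a placeholder:

export I_MPI_USE_DYNAMIC_CONNECTIONS=no
mpiexec.hydra -genv I_MPI_USE_DYNAMIC_CONNECTIONS no -n 2048 ./benchmark
)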

We think that these benchmarks give interesting insights into the running time of Intel MPI. Our supercomputer might be configured incorrectly. We tried to adjust several environment variables but did not find a satisfying configuration. We also want to mention that these fluctuations in running time do not occur with IBM MPI on that machine.

If you have further suggestions to handle this problem, please let us know. If required we run additional benchmarks, apply different configurations, and provide debug output, e.g.  I_MPI_DEBUG=xxx.

Best,

Michael A.

[1] https://www.lrz.de/services/compute/supermuc/systemdescription/

HPCC benchmark HPL results degrade as more cores are used


I have a 6-node cluster consisting of 12 cores per node with a total of 72 cores.

When running the HPCC benchmark on 6 cores (1 core per node, 6 nodes), the HPL result is 1198.87 GFLOPS. However, running HPCC on all available cores of the 6-node cluster, for a total of 72 cores, the HPL result is 847.421 GFLOPS.

MPI Library Used: Intel(R) MPI Library for Linux* OS, Version 2018 Update 1 Build 20171011 (id: 17941)

Options to mpiexec.hydra:
-print-rank-map
-pmi-noaggregate
-nolocal
-genvall
-genv I_MPI_DEBUG 5
-genv I_MPI_HYDRA_IFACE ens2f0
-genv I_MPI_FABRICS shm:tcp
-n 72
-ppn 12
-ilp64
--hostname filename

Any ideas?

Thanks in advance.

 

Error Loading libmpifort.so.12


Hello,

I am trying to run my executable (PDES Simulator - ROSS) on a Slurm cluster, and I load the module mpich/ge/gcc/64/3.2rc2 for MPI support.

But I get the error "while loading shared libraries: libmpifort.so.12: cannot open shared object file: No such file or directory".

Which module should I load for libmpifort.so.12? Or should mpich/ge/gcc/64/3.2rc2 already provide it, and maybe I am making a mistake while loading the module? "module list" shows me I have it loaded, though. Also, "which libmpifort.so.12" gives me a "no such" message.
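(Side note that may help the diagnosis: which only searches PATH for executables, so it will never find a shared library. A more telling check is to ask the loader what the binary wants and where it finds it, e.g., with ./your_ross_binary as a placeholder:

ldd ./your_ross_binary | grep -i mpi

libmpifort.so.12 is the Fortran binding library shipped with Intel MPI and MPICH-ABI-compatible MPIs, so the binary may have been built against a different MPI than the one the loaded module puts on the library path.)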

Thank you in advance.

Sincerely,

Ali.

Cannot correctly write to a file with MPI-IO and indexed type


Hello,

I am trying to write to a file using MPI-IO, with an indexed type used to set a view of the file (see code sample attached).
The program runs, but I do not get the result I was expecting.

When I output the content of the file via "od -f TEST", I get:
0000000 0 0 1 3

But I was expecting:
0000000 0 2 1 3

I am using Intel MPI 2018 Update 1 with gfortran.

Regards

Attachment: idx2.f (2.02 KB)

peak floating point operations of Intel Xeon E5345


I want to find the peak floating-point rate of the Intel Xeon E5345 processor. I searched for it and found 9.332 GFlop/s. I want to make sure. There is a formula (please correct me if I am wrong):

Flops/s = #instructions per second * clock cycle

The clock rate is 2.33 GHz (I am not sure) and I did not find the number of instructions per second the machine can perform.
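(For what it's worth, the usual formula is peak FLOP/s = cores x clock frequency x floating-point operations per cycle per core, rather than instructions per second x clock cycle. Assuming the E5345 runs at 2.33 GHz and each core can retire 4 double-precision FLOPs per cycle (one 128-bit SSE add plus one 128-bit SSE multiply, 2 FLOPs each) -- figures worth double-checking against Intel's documentation -- that gives 2.33 x 4 ≈ 9.33 GFlop/s per core, which matches the 9.332 figure, and about 4 x 9.33 ≈ 37.3 GFlop/s for the whole quad-core processor.)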

Any idea

Timers of PCH


Hi,

As far as I know, Intel PCH provides some high resolution timers.

I want to know how many timers are provided and how to use them under Linux.
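(If the timers in question are the HPET block in the PCH -- a guess about what is being asked -- then under Linux it shows up as a clock source and, when the driver is enabled, as a character device. Two quick checks:

cat /sys/devices/system/clocksource/clocksource0/available_clocksource   # e.g. tsc hpet acpi_pm
ls -l /dev/hpet                                                          # present when HPET support is built in

The number of HPET comparators/timers is reported by the kernel at boot; look for "hpet" in dmesg.)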

Thank you.

 

problems using PSM2 "Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023 "


hello,

I installed the Omni-Path driver (IntelOPA-Basic.RHEL74-x86_64.10.6.1.0.2.tgz) on two identical KNL/F servers with CentOS (CentOS Linux release 7.4.1708 (Core)).

I executed the MPI benchmark provided by Intel using PSM2:

mpirun -PSM2 -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv

And the execution returns the following error:

[silvio@phi05 ~]$ mpirun -PSM2 -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
init_provider_list: using configuration file: /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi/intel64/etc/tmi.conf
init_provider_list: valid configuration line: psm2 1.3 libtmip_psm2.so ""
init_provider_list: using configuration file: /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi/intel64/etc/tmi.conf
init_provider_list: valid configuration line: psm 1.2 libtmip_psm.so ""
init_provider_list: valid configuration line: mx 1.0 libtmip_mx.so ""
init_provider_list: valid configuration line: psm2 1.3 libtmip_psm2.so ""
init_provider_list: valid configuration line: psm 1.2 libtmip_psm.so ""
init_provider_list: valid configuration line: mx 1.0 libtmip_mx.so ""
tmi_psm2_init: tmi_psm2_connect_timeout=180
init_provider_lib: using provider: psm2, version 1.3
tmi_psm2_init: tmi_psm2_connect_timeout=180
init_provider_lib: using provider: psm2, version 1.3
phi05.11971 Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023 

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 11971 RUNNING AT 10.0.0.5
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 11971 RUNNING AT 10.0.0.5
=   EXIT CODE: 6
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
   Intel(R) MPI Library troubleshooting guide:
      https://software.intel.com/node/561764
===================================================================================

I searched for this message on Google ("Trying to connect to a HFI (subnet id - 0) on a different subnet - 1023") and the only reference is the following source code:

https://github.com/01org/opa-psm2/blob/master/ptl_ips/ips_proto_connect.c

How do I put the two HFIs on the same subnet?
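(Offered only as guesses, since I have not verified this: the subnet ID is normally assigned and configured by the fabric manager, so it may be worth checking that the opafm service is running on exactly one of the two nodes, and comparing what each port reports locally, e.g. by running

opainfo

on both hosts, to confirm both links are Active and managed by the same subnet manager.)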

When I change to InfiniBand, it works:

mpirun -IB -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv

#-----------------------------------------------------------------------------

# Benchmarking Sendrecv 
# #processes = 2 
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000        17.79        17.79        17.79         0.00
            1         1000        18.11        18.11        18.11         0.11
            2         1000        18.05        18.05        18.05         0.22
            4         1000        18.08        18.08        18.08         0.44
            8         1000        18.05        18.05        18.05         0.89
           16         1000        18.06        18.06        18.06         1.77
           32         1000        18.99        18.99        18.99         3.37
           64         1000        19.05        19.07        19.06         6.71
          128         1000        19.20        19.20        19.20        13.33
          256         1000        19.96        19.97        19.97        25.64
          512         1000        20.22        20.22        20.22        50.63
         1024         1000        20.38        20.39        20.39       100.44
         2048         1000        24.70        24.71        24.70       165.78
         4096         1000        25.98        25.98        25.98       315.31
         8192         1000        55.57        55.59        55.58       294.75
        16384         1000        61.89        61.90        61.90       529.33
        32768         1000       112.95       113.01       112.98       579.89
        65536          640       158.22       158.23       158.22       828.37
       131072          320       297.40       297.50       297.45       881.16
       262144          160       599.27       600.30       599.78       873.38
       524288           80     31394.80     31489.45     31442.13        33.30
      1048576           40     28356.10     28414.67     28385.39        73.81
      2097152           20     31387.65     31661.40     31524.53       132.47
      4194304           10     38455.80     40408.99     39432.39       207.59

 

MPI program calling an external MPI program fails


I tried to use MPI to run an external command-line program in parallel.

 

So I wrote `run.f90`:

 

    program run
          use mpi
          implicit none
          integer::num_process,rank,ierr;

          call MPI_Init(ierr);

          call MPI_Comm_rank(MPI_COMM_WORLD, rank,ierr);
          call MPI_Comm_size(MPI_COMM_WORLD, num_process,ierr);

          call execute_command_line('./a.out')

          call MPI_Finalize(ierr);

    end program 

 

 

I compile it using the Intel compiler:

mpiifort run.f90 -o run.out

 

Now if `./a.out` is a normal non-MPI program like

 

    

program test
    implicit none
    print*,'hello'
    end

then 

 

    mpiexec.hydra -n 4 ./run.out

 

works fine.

 

However, if `./a.out` is also an MPI program like

 

    program mpi_test
              use mpi
              implicit none
              integer::num_process,rank,ierr;

              call MPI_Init(ierr);

              call MPI_Comm_rank(MPI_COMM_WORLD, rank,ierr);
              call MPI_Comm_size(MPI_COMM_WORLD, num_process,ierr);

              print*,rank

              call MPI_Finalize(ierr);

        end program 

Then I get the following error after running "mpiexec.hydra -n 4 ./run.out":

===================================================================================

=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

=   PID 19519 RUNNING AT i02n18

=   EXIT CODE: 13

=   CLEANING UP REMAINING PROCESSES

=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===================================================================================

 

===================================================================================

=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES

=   PID 19522 RUNNING AT i02n18

=   EXIT CODE: 13

=   CLEANING UP REMAINING PROCESSES

=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

===================================================================================

   Intel(R) MPI Library troubleshooting guide:

      https://software.intel.com/node/561764

What is wrong?

Performance issues with Omni Path


Hi all,

I installed two Omni-Path Fabric cards on two Xeon servers.

I followed the instructions on this web site: https://software.intel.com/en-us/articles/using-intel-omni-path-architec...

The performance tests at that link show that the network achieved 100 Gb/s (4194304  10  360.39  360.39  360.39  23276.25).

In the network I deployed, I achieved half of this performance (4194304  10  661.40  661.40  661.40  12683.17):

Is there some configuration needed to achieve 100 Gb/s using Omni Path?
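(One low-level thing that can roughly halve large-message Omni-Path throughput, mentioned here only as a guess: the HFI sitting in a PCIe slot that negotiated x8 instead of x16. The negotiated link can be read with lspci, where <bus_address> is a placeholder for the HFI's address found by the first command:

lspci | grep -i omni
sudo lspci -vv -s <bus_address> | grep -i lnksta

Also note that IMB Sendrecv reports the sum of both directions, so the reference 23276 Mbytes/sec corresponds to roughly line rate in each direction.)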

 

Here is the complete output of the benchmark execution:

mpirun -PSM2 -host 10.0.0.3 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.1 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv

[silvio@phi03 ~]$ mpirun -PSM2 -host 10.0.0.3 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.1 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
#------------------------------------------------------------
#    Intel (R) MPI Benchmarks 2018 Update 1, MPI-1 part    
#------------------------------------------------------------
# Date                  : Fri Feb  2 11:14:01 2018
# Machine               : x86_64
# System                : Linux
# Release               : 3.10.0-693.17.1.el7.x86_64
# Version               : #1 SMP Thu Jan 25 20:13:58 UTC 2018
# MPI Version           : 3.1
# MPI Thread Environment: 

# Calling sequence was: 

# /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE 
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM  
#
#

# List of Benchmarks to run:

# Sendrecv

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv 
# #processes = 2 
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         1.92         1.92         1.92         0.00
            1         1000         1.85         1.85         1.85         1.08
            2         1000         1.84         1.84         1.84         2.17
            4         1000         1.84         1.84         1.84         4.35
            8         1000         1.76         1.76         1.76         9.10
           16         1000         2.07         2.07         2.07        15.44
           32         1000         2.06         2.07         2.07        30.98
           64         1000         2.02         2.02         2.02        63.46
          128         1000         2.08         2.08         2.08       123.26
          256         1000         2.11         2.11         2.11       242.41
          512         1000         2.25         2.25         2.25       454.30
         1024         1000         3.56         3.56         3.56       575.46
         2048         1000         4.19         4.19         4.19       976.91
         4096         1000         5.16         5.16         5.16      1586.69
         8192         1000         7.15         7.15         7.15      2290.80
        16384         1000        14.32        14.32        14.32      2288.44
        32768         1000        20.77        20.77        20.77      3154.69
        65536          640        26.08        26.09        26.09      5024.04
       131072          320        34.77        34.77        34.77      7538.32
       262144          160        53.03        53.03        53.03      9886.58
       524288           80        93.55        93.55        93.55     11208.78
      1048576           40       172.25       172.28       172.26     12173.26
      2097152           20       355.15       355.21       355.18     11808.02
      4194304           10       661.40       661.40       661.40     12683.17

# All processes entering MPI_Finalize

 

 

Thanks in advance!

Silvio
