Channel: Clusters and HPC Technology

Where are the other 4 cores?


 

Hi, Intel support guys,

I am running tests on our Skylake computers. I am surprised to see that 4 cores per package are gone. Where are they?

Our computer system information is below:

Processor: Intel Xeon Gold 6148 CPU @ 2.40GHz (2 processors)

Installed memory: 384 GB

System type: 64-bit operating system, x64-based processor

OS: Windows Server 2016 Standard

Please see the following output, which shows that 4 cores per package are gone. Where are these 8 cores in total?

I am looking forward to hearing from you.

Thanks in advance

Best regards,

Dingjun

Computer Modelling Group Ltd.

Calgary, AB, Canada

 

 

VECTOR_SIMD_OPENMP_TEST
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #209: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
OMP: Info #156: KMP_AFFINITY: 32 available OS procs
OMP: Info #158: KMP_AFFINITY: Nonuniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 20 cores/pkg x 1 threads/core (32 total cores)
OMP: Info #213: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 16
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 17
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 18
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 19
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 core 20
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 24
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 0 core 25
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 0 core 26
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 0 core 27
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 0 core 28
OMP: Info #171: KMP_AFFINITY: OS proc 20 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 21 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 22 maps to package 1 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 23 maps to package 1 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 24 maps to package 1 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 25 maps to package 1 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 26 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 27 maps to package 1 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 28 maps to package 1 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 29 maps to package 1 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 30 maps to package 1 core 16
OMP: Info #171: KMP_AFFINITY: OS proc 31 maps to package 1 core 17
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 4004 thread 0 bound to OS proc set {0}
  The number of processors available =       32
  The number of threads available    =       20
  HELLO from process        0
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8956 thread 1 bound to OS proc set {1}
  HELLO from process        1
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8820 thread 2 bound to OS proc set {2}
  HELLO from process        2
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9292 thread 3 bound to OS proc set {3}
  HELLO from process        3
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9752 thread 4 bound to OS proc set {4}
  HELLO from process        4
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 3776 thread 5 bound to OS proc set {5}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8464 thread 6 bound to OS proc set {6}
  HELLO from process        5
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 1416 thread 7 bound to OS proc set {7}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 3868 thread 8 bound to OS proc set {8}
  HELLO from process        6
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 7396 thread 9 bound to OS proc set {9}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9772 thread 10 bound to OS proc set {10}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9280 thread 11 bound to OS proc set {11}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9948 thread 12 bound to OS proc set {12}
  HELLO from process        7
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8712 thread 13 bound to OS proc set {13}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 6092 thread 14 bound to OS proc set {14}
  HELLO from process       11
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8532 thread 15 bound to OS proc set {15}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9892 thread 16 bound to OS proc set {16}
  HELLO from process       12
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 10640 thread 17 bound to OS proc set {17}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9060 thread 18 bound to OS proc set {18}
  HELLO from process       14
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8908 thread 19 bound to OS proc set {19}
  HELLO from process       18
  HELLO from process       16
  HELLO from process       19
  HELLO from process       13
  HELLO from process        8
  HELLO from process       17
  HELLO from process       15
  HELLO from process       10
  HELLO from process        9
matrix multiplication completed
  Elapsed wall clock time 2 =    133.379
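For reference, the two summary lines above ("processors available" and "threads available") presumably come from standard OpenMP runtime queries such as omp_get_num_procs and omp_get_max_threads. The sketch below is not the original VECTOR_SIMD_OPENMP_TEST, just an assumed minimal probe that prints the same quantities:

! Minimal OpenMP probe (assumed example, not the original test); build with ifort -qopenmp.
program probe_topology
   use omp_lib
   implicit none
   integer :: tid

   ! OS processors visible to this process
   ! (what appears above as "The number of processors available").
   print *, 'The number of processors available = ', omp_get_num_procs()
   ! Default OpenMP team size (what appears above as "threads available").
   print *, 'The number of threads available    = ', omp_get_max_threads()

!$omp parallel private(tid)
   tid = omp_get_thread_num()
   print *, 'HELLO from thread ', tid
!$omp end parallel
end program probe_topology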


Announcing the Intel® Parallel Studio XE 2019 Beta Program


Join the Intel® Parallel Studio XE 2019 Beta Program today and, for a limited time, get early access to new features and an open invitation to tell us what you really think.

We want YOU to tell us what to improve so we can create high-quality software tools that meet your development needs.

Sign Up Now >

Top New Features in Intel® Parallel Studio XE 2019 Beta

  • Scale and perform on the path to exascale. Enable greater scalability and improve latency with the latest Intel® MPI Library.
  • Get better answers with less overhead. Focus more fully on useful data, CPU utilization of physical cores, and more using new data-selection support from Intel® VTune Amplifier’s Application Performance Snapshot.
  • Visualize parallelism. Interactively build, validate, and visualize algorithms using Intel® Advisor’s Flow Graph Analyzer.
  • Stay up-to-date with the latest standards:
    • Expanded C++17 and Fortran 2018 support
    • Full OpenMP* 4.5 and expanded OpenMP 5.0 support
    • Python* 3.6 and 2.7

New Features in Intel® MPI Library

  • Updated architecture to streamline fabric utilization through libfabric.
  • Implemented support for Intel® Omni-Path Architecture Multiple Endpoints (Multi-EP).
  • Cleaned up directory structure.
  • New format for MPI tuner.
  • Added impi_info utility as a technical preview feature.
  • Updated Hydra process manager.

New Features in Intel® Cluster Checker

  • Simplified execution of Intel® Cluster Checker with a single command.
  • New ‘-X’ option to get details of data collected and analysis test.
  • New feature to compare two snapshots of a cluster state to identify changes.
  • New option to refresh any missing or old data before analysis.
  • Added auto-node discovery when using SLURM.

To learn more, visit the Intel® Parallel Studio XE 2019 Beta page.

Then sign up to get started.

IMPI run error

Dear All,
    I compiled the VASP package with IMPI successfully, but when I run the program it stops with some MPI errors, listed below. Could anybody tell me how to fix this? Thanks!

Xiang YE

[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 4  Build 20170817 (id: 17752)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation.  All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
[3] MPI startup(): Found 1 IB devices
[5] MPI startup(): Found 1 IB devices
[24] MPI startup(): Found 1 IB devices
[15] MPI startup(): Found 1 IB devices
[25] MPI startup(): Found 1 IB devices
[14] MPI startup(): Found 1 IB devices
[26] MPI startup(): Found 1 IB devices
[9] MPI startup(): Found 1 IB devices
[16] MPI startup(): Found 1 IB devices
[27] MPI startup(): Found 1 IB devices
[7] MPI startup(): Found 1 IB devices
[1] MPI startup(): Found 1 IB devices
[17] MPI startup(): Found 1 IB devices
[18] MPI startup(): Found 1 IB devices
[8] MPI startup(): Found 1 IB devices
[10] MPI startup(): Found 1 IB devices
[19] MPI startup(): Found 1 IB devices
[20] MPI startup(): Found 1 IB devices
[0] MPI startup(): Found 1 IB devices
[11] MPI startup(): Found 1 IB devices
[21] MPI startup(): Found 1 IB devices
[2] MPI startup(): Found 1 IB devices
[22] MPI startup(): Found 1 IB devices
[13] MPI startup(): Found 1 IB devices
[12] MPI startup(): Found 1 IB devices
[23] MPI startup(): Found 1 IB devices
[4] MPI startup(): Found 1 IB devices
[6] MPI startup(): Found 1 IB devices
[0] MPI startup(): Open 0 IB device: mlx5_0
[11] MPI startup(): Open 0 IB device: mlx5_0
[20] MPI startup(): Open 0 IB device: mlx5_0
[2] MPI startup(): Open 0 IB device: mlx5_0
[21] MPI startup(): Open 0 IB device: mlx5_0
[9] MPI startup(): Open 0 IB device: mlx5_0
[3] MPI startup(): Open 0 IB device: mlx5_0
[22] MPI startup(): Open 0 IB device: mlx5_0
[25] MPI startup(): Open 0 IB device: mlx5_0
[1] MPI startup(): Open 0 IB device: mlx5_0
[23] MPI startup(): Open 0 IB device: mlx5_0
[13] MPI startup(): Open 0 IB device: mlx5_0
[4] MPI startup(): Open 0 IB device: mlx5_0
[10] MPI startup(): Open 0 IB device: mlx5_0
[15] MPI startup(): Open 0 IB device: mlx5_0
[26] MPI startup(): Open 0 IB device: mlx5_0
[14] MPI startup(): Open 0 IB device: mlx5_0
[16] MPI startup(): Open 0 IB device: mlx5_0
[27] MPI startup(): Open 0 IB device: mlx5_0
[19] MPI startup(): Open 0 IB device: mlx5_0
[8] MPI startup(): Open 0 IB device: mlx5_0
[18] MPI startup(): Open 0 IB device: mlx5_0
[6] MPI startup(): Open 0 IB device: mlx5_0
[24] MPI startup(): Open 0 IB device: mlx5_0
[7] MPI startup(): Open 0 IB device: mlx5_0
[12] MPI startup(): Open 0 IB device: mlx5_0
[17] MPI startup(): Open 0 IB device: mlx5_0
[5] MPI startup(): Open 0 IB device: mlx5_0
[0] MPI startup(): Start 1 ports per adapter
[20] MPI startup(): Start 1 ports per adapter
[11] MPI startup(): Start 1 ports per adapter
[9] MPI startup(): Start 1 ports per adapter
[3] MPI startup(): Start 1 ports per adapter
[21] MPI startup(): Start 1 ports per adapter
[2] MPI startup(): Start 1 ports per adapter
[1] MPI startup(): Start 1 ports per adapter
[25] MPI startup(): Start 1 ports per adapter
[22] MPI startup(): Start 1 ports per adapter
[23] MPI startup(): Start 1 ports per adapter
[4] MPI startup(): Start 1 ports per adapter
[10] MPI startup(): Start 1 ports per adapter
[15] MPI startup(): Start 1 ports per adapter
[13] MPI startup(): Start 1 ports per adapter
[26] MPI startup(): Start 1 ports per adapter
[14] MPI startup(): Start 1 ports per adapter
[27] MPI startup(): Start 1 ports per adapter
[16] MPI startup(): Start 1 ports per adapter
[12] MPI startup(): Start 1 ports per adapter
[18] MPI startup(): Start 1 ports per adapter
[24] MPI startup(): Start 1 ports per adapter
[6] MPI startup(): Start 1 ports per adapter
[19] MPI startup(): Start 1 ports per adapter
[8] MPI startup(): Start 1 ports per adapter
[5] MPI startup(): Start 1 ports per adapter
[17] MPI startup(): Start 1 ports per adapter
[7] MPI startup(): Start 1 ports per adapter
[11] MPID_nem_ofacm_init(): Init
[0] MPID_nem_ofacm_init(): Init
[20] MPID_nem_ofacm_init(): Init
[9] MPID_nem_ofacm_init(): Init
[3] MPID_nem_ofacm_init(): Init
[21] MPID_nem_ofacm_init(): Init
[2] MPID_nem_ofacm_init(): Init
[1] MPID_nem_ofacm_init(): Init
[22] MPID_nem_ofacm_init(): Init
[25] MPID_nem_ofacm_init(): Init
[23] MPID_nem_ofacm_init(): Init
[11] MPI startup(): ofa data transfer mode
[0] MPI startup(): ofa data transfer mode
[20] MPI startup(): ofa data transfer mode
[4] MPID_nem_ofacm_init(): Init
[10] MPID_nem_ofacm_init(): Init
[26] MPID_nem_ofacm_init(): Init
[14] MPID_nem_ofacm_init(): Init
[15] MPID_nem_ofacm_init(): Init
[13] MPID_nem_ofacm_init(): Init
[27] MPID_nem_ofacm_init(): Init
[16] MPID_nem_ofacm_init(): Init
[12] MPID_nem_ofacm_init(): Init
[9] MPI startup(): ofa data transfer mode
[18] MPID_nem_ofacm_init(): Init
[3] MPI startup(): ofa data transfer mode
[8] MPID_nem_ofacm_init(): Init
[24] MPID_nem_ofacm_init(): Init
[19] MPID_nem_ofacm_init(): Init
[5] MPID_nem_ofacm_init(): Init
[21] MPI startup(): ofa data transfer mode
[17] MPID_nem_ofacm_init(): Init
[1] MPI startup(): ofa data transfer mode
[7] MPID_nem_ofacm_init(): Init
[2] MPI startup(): ofa data transfer mode
[6] MPID_nem_ofacm_init(): Init
[22] MPI startup(): ofa data transfer mode
[25] MPI startup(): ofa data transfer mode
[23] MPI startup(): ofa data transfer mode
[10] MPI startup(): ofa data transfer mode
[14] MPI startup(): ofa data transfer mode
[15] MPI startup(): ofa data transfer mode
[26] MPI startup(): ofa data transfer mode
[4] MPI startup(): ofa data transfer mode
[13] MPI startup(): ofa data transfer mode
[27] MPI startup(): ofa data transfer mode
[16] MPI startup(): ofa data transfer mode
[12] MPI startup(): ofa data transfer mode
[18] MPI startup(): ofa data transfer mode
[24] MPI startup(): ofa data transfer mode
[19] MPI startup(): ofa data transfer mode
[8] MPI startup(): ofa data transfer mode
[5] MPI startup(): ofa data transfer mode
[6] MPI startup(): ofa data transfer mode
[7] MPI startup(): ofa data transfer mode
[17] MPI startup(): ofa data transfer mode
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1866)......: fail failed
MPIR_Comm_commit(711): fail failed
(unknown)(): Other MPI error

Problem compiling with Intel MPI 2018.2 and ifort 15.0.3


Hi Everyone,

I just installed the newest version of the Intel MPI Library (2018.2.199); previously I was using Open MPI. I am using ifort 15.0.3.
I am trying to compile the following test program:

program main
    use mpi_f08
    implicit none
    integer :: rank, size, len
    character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version

    call MPI_INIT()
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, size)
    call MPI_GET_LIBRARY_VERSION(version, len)

    print *, "rank:", rank
    print *, "size:",size
    print *, "version: "//version
    print *, ' No Errors'

    call MPI_FINALIZE()
end

When I use Open MPI it works fine. However, I am getting the following errors with Intel MPI:

% mpiifort test_F08.f90 
test_F08.f90(2): error #7012: The module file cannot be read.  Its format requires a more recent F90 compiler.   [MPI_F08]
    use mpi_f08
--------^
test_F08.f90(8): error #6404: This name does not have a type, and must have an explicit type.   [MPI_COMM_WORLD]
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank)
-----------------------^
test_F08.f90(5): error #6279: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association.   [MPI_MAX_LIBRARY_VERSION_STRING]
    character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
------------------^
test_F08.f90(5): error #6591: An automatic object is invalid in a main program.   [VERSION]
    character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
-----------------------------------------------------^
compilation aborted for test_F08.f90 (code 1)

So, do I have to use the same version of ifort that was used to build the Intel MPI modules? That is not listed in the requirements for using Intel MPI.
Why does Intel MPI not create new module files using the Fortran compiler available on the system?
Is there anything I can do to use Intel MPI with my compiler?
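For comparison, a variant of the same test that only uses the older mpi module (which is generally less tied to the compiler version the library's modules were built with) is sketched below; whether it sidesteps the module-version error here is an assumption, not a verified fix.

! Sketch using the older "mpi" module: integer handles and explicit ierror arguments.
program main
    use mpi
    implicit none
    integer :: rank, nprocs, reslen, ierr
    character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
    call MPI_GET_LIBRARY_VERSION(version, reslen, ierr)

    print *, "rank:", rank
    print *, "size:", nprocs
    print *, "version: "//trim(version)

    call MPI_FINALIZE(ierr)
end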

Thanks for your help,

Hector

Support for NVIDIA GPUDirect RDMA?


Does Intel MPI support GPUDirect RDMA with NVIDIA drivers and CUDA Toolkit 9.x installed?

Is there any documentation on which drivers to install and which fabric-selection environment variables to set?

Thanks

Ron

Interested in buying a "used" cluster edition compiler for Linux


My understanding is that Intel allows one to sell and transfer a license to someone else. I am a small-scale open-source developer and can't afford the price of the latest Cluster Edition Linux compilers. Send me a note if you have an older version you wouldn't mind transferring to me. For my needs, anything 2015 or newer would suffice.

adding further compute nodes


Hi,

Is there a need to re-install Intel Parallel Studio in the case where I have added further compute nodes to my cluster? There are two InfiniBand islands; ibstat shows:

CA 'mlx4_0', CA type: MT4099 and CA 'mlx4_1', CA type: MT26428. The latest compute nodes are attached to the MT4099 adapter.

These provider errors are only present in the 'newer node' context:

[2] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[10] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[12] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
node009:UCM:2d97:570fa700: 1249 us(1249 us):  open_hca: device mlx4_0 not found
node009:UCM:2d9f:1626f700: 1262 us(1262 us):  open_hca: device mlx4_0 not found
node009:UCM:2da1:7f214700: 1102 us(1102 us):  open_hca: device mlx4_0 not found

 

Regards

Gert

Attachment: ib_provider.txt (5.87 KB)

Visual Studio project settings to instrument for Trace Analyzer


Hello,

I'm just getting started with Intel MPI and am trying to understand how to use Trace Analyzer. My understanding is that linking with vt.lib and running an MPI application is sufficient to cause a *.stf file to be emitted. I have a simple Hello World MPI application. After linking with vt.lib and running through mpiexec, I see no .stf output.

There's not much more information to add. The setup could not be simpler. What am I missing?

Jeff


Issue with MPI_Sendrecv


Hello,

I am experiencing issues while using MPI_Sendrecv on multiple machines. In the code I am sending a vector around a ring in parallel: each process sends data to the subsequent process and receives data from the preceding process. Surprisingly, the output of the first execution of the SEND_DATA routine is correct, while the output of the second execution is incorrect. The code and the output are below.

PROGRAM SENDRECV_REPROD
USE MPI
USE ISO_FORTRAN_ENV,ONLY: INT32
IMPLICIT NONE
INTEGER(KIND=INT32) :: STATUS(MPI_STATUS_SIZE) 
INTEGER(KIND=INT32) :: RANK,NUM_PROCS,IERR

CALL MPI_INIT(IERR)
CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)

CALL SEND_DATA(RANK,NUM_PROCS)
CALL SEND_DATA(RANK,NUM_PROCS)

CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)  
CALL MPI_FINALIZE(IERR)

END PROGRAM

SUBROUTINE SEND_DATA(RANK,NUM_PROCS)
USE ISO_FORTRAN_ENV,ONLY: INT32,REAL64
USE MPI
IMPLICIT NONE
INTEGER(KIND=INT32),INTENT(IN) :: RANK
INTEGER(KIND=INT32),INTENT(IN) :: NUM_PROCS
INTEGER(KIND=INT32) :: IERR,ALLOC_ERROR
INTEGER(KIND=INT32) :: VEC_SIZE,I_RANK,RANK_DESTIN,RANK_SOURCE,TAG_SEND,TAG_RECV
REAL(KIND=REAL64), ALLOCATABLE :: COMM_BUFFER(:),VEC1(:)
INTEGER(KIND=INT32) :: MPI_COMM_STATUS(MPI_STATUS_SIZE) 



! Allocate communication arrays.

VEC_SIZE = 374454
ALLOCATE(COMM_BUFFER(VEC_SIZE),STAT=ALLOC_ERROR)
ALLOCATE(VEC1(VEC_SIZE),STAT=ALLOC_ERROR)



! Define destination and source ranks for sending and receiving messages.

RANK_DESTIN = MOD((RANK+1),NUM_PROCS)
RANK_SOURCE = MOD((RANK+NUM_PROCS-1),NUM_PROCS)

TAG_SEND = RANK+1
TAG_RECV = RANK
IF (RANK==0) TAG_RECV=NUM_PROCS

VEC1=RANK
COMM_BUFFER=0.0_REAL64
        
    
CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
        
DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R',RANK, VEC1(1),'B', COMM_BUFFER(1)
ENDDO

CALL MPI_SENDRECV(VEC1(1),VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_DESTIN,TAG_SEND,COMM_BUFFER(1),&
                    VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_SOURCE,TAG_RECV,MPI_COMM_WORLD,MPI_COMM_STATUS,IERR)
        
DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R' ,  RANK , VEC1(1),'A', COMM_BUFFER(1)
ENDDO



END SUBROUTINE SEND_DATA 

Output of four processes run on four machines:

 R           0  0.000000000000000E+000 B  0.000000000000000E+000
 R           1   1.00000000000000      B  0.000000000000000E+000
 R           2   2.00000000000000      B  0.000000000000000E+000
 R           3   3.00000000000000      B  0.000000000000000E+000
 R           0  0.000000000000000E+000 A   3.00000000000000
 R           1   1.00000000000000      A  0.000000000000000E+000
 R           2   2.00000000000000      A   1.00000000000000
 R           3   3.00000000000000      A   2.00000000000000
 R           0  0.000000000000000E+000 B  0.000000000000000E+000
 R           1   1.00000000000000      B  0.000000000000000E+000
 R           2   2.00000000000000      B  0.000000000000000E+000
 R           3   3.00000000000000      B  0.000000000000000E+000
 R           0  0.000000000000000E+000 A   2.00000000000000
 R           1   1.00000000000000      A   3.00000000000000
 R           2   2.00000000000000      A  0.000000000000000E+000
 R           3   3.00000000000000      A   1.00000000000000

 

As you can see, the output of the first SEND_DATA execution is different from the second. The results are correct if I run the reproducer on a single machine with multiple processes. I am compiling the code with mpiifort for the Intel(R) MPI Library 2017 Update 3 for Linux* (ifort version 17.0.4)

and running with mpirun version Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405.

Do you have any idea what could be the source of this issue?

Thank you,
Piotr

mpirun: unexpected disconnect completion event


 

Hi,

I've been running on 5 (distributed memory) nodes (each has 20 processors) by using mpirun -n 5 -ppn 1 -hosts nd1,nd2,nd3,nd4,nd5.

Sometimes it works, sometimes it gives inaccurate results, and sometimes it crashes with the error:

"[0:nd1] unexpected disconnect completion event from [35:nd2] Fatal error in PMPI_Comm_dup: Internal MPI error!, error stack ...". 

Any suggestions to fix this communication error while running on multiple nodes with MPI (2017 Update 2)?

I already set the stack size to unlimited in my .rc file. I tested this with two different applications (one is the well-known distributed-memory solver MUMPS) and I have the same issue with both. This is not a very memory-demanding job. mpirun works perfectly on 1 node; this only happens on multiple nodes (even 2).

Thanks

 

Support for the mpi_f08 Fortran module in MPI 2019.0.045 beta


Hi!

I am testing Intel Parallel Studio 2019.0.045 beta for Windows. The Intel MPI library that comes with it does not provide the Fortran module mpi_f08, whereas the Linux version does. Why is this module not supported on Windows?

Are you planning to support the mpi_f08 module for Windows in the future?
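For context, the kind of minimal probe that fails to compile when no mpi_f08 module is shipped might look like the sketch below (an assumed example, not the exact test used):

program f08_probe
    use mpi_f08          ! compilation fails right here if the module is not provided
    implicit none
    integer :: rank
    call MPI_Init()
    call MPI_Comm_rank(MPI_COMM_WORLD, rank)
    print *, 'mpi_f08 is available, rank = ', rank
    call MPI_Finalize()
end program f08_probe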

Thanks for your help,

Hector

Trace Collector + Fortran 2008


Hi,

I have observed that when trying to trace the following program with mpiexec -trace, everything works fine as long as I stick with "use mpi". If I change that to "use mpi_f08", I do not get a trace file.
The reason I'm interested in using mpi_f08 is that I have an application to trace that uses the MPI shared-memory model, and it seems that the call to

MPI_Comm_split_type

that is used below is only possible with the mpi_f08 module, right?

Any hints on why I cannot trace that program when using "use mpi_f08"?

Some extra Info:

$ mpiifort -o shm shm.f90
$ mpiifort --version
ifort (IFORT) 18.0.2 20180210

$ mpiexec -trace -np 4 shm

 

 

program nicks_program

   ! use mpi_f08
   use mpi 

   implicit none

   integer :: wrank, wsize, sm_rank, sm_size, ierr, send
   type(MPI_COMM) :: MPI_COMM_SHARED 

   call MPI_Init(ierr)
   call MPI_comm_rank(MPI_COMM_WORLD, wrank, ierr)
   call MPI_comm_size(MPI_COMM_WORLD, wsize, ierr)

   ! call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, MPI_COMM_SHARED, ierr)
   send = wrank


   call MPI_Bcast( send, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr )
   ! call MPI_Bcast( send, 1, MPI_INTEGER, 0, MPI_COMM_SHARED, ierr )

   write(*,*) 'send = ', send
   write(*,*) 'ierr = ', ierr

   call MPI_Finalize(ierr)
end

 

 

 

IMPI w/ Slurm


I'm working at a site configured with IMPI (2016.4.072) / Slurm (17.11.4).  The MpiDefault is none.

When I run my MPICH2 code (defaulting to --mpi=none)

     srun -N 2 -n 4 -l -vv ...

I get (trimming out duplicate error messages from other ranks)

0: PMII_singinit: execv failed: No such file or directory

0: [unset]:   This singleton init program attempted to access some feature

0: [unset]:   for which process manager support was required, e.g. spawn or universe_size.

0: [unset]:   But the necessary mpiexec is not in your path.

0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P2-hostname

0: :

0: system msg for write_line failure : Bad file descriptor

0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P3-hostname

0: :

0: system msg for write_line failure : Bad file descriptor

0: 2018-05-25 09:00:14  2: MPI startup(): Multi-threaded optimized library

0: 2018-05-25 09:00:14  2: DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u

0: 2018-05-25 09:00:14  2: MPI startup(): DAPL provider ofa-v2-mlx4_0-1u

0: 2018-05-25 09:00:14  2: MPI startup(): shm and dapl data transfer modes

0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0

0: :

0: system msg for write_line failure : Bad file descriptor

0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=foobar key=foobar

0: :

0: system msg for write_line failure : Bad file descriptor

0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0

0: :

0: system msg for write_line failure : Bad file descriptor

0: Fatal error in PMPI_Init_thread: Other MPI error, error stack:

0: MPIR_Init_thread(784).................:

0: MPID_Init(1332).......................: channel initialization failed

0: MPIDI_CH3_Init(141)...................:

0: dapl_rc_setup_all_connections_20(1388): generic failure with errno = 872614415

0: getConnInfoKVS(849)...................: PMI_KVS_Get failed

 

If I run the same code with

 

   srun --mpi=pmi2 ...

 

it works fine.

 

A couple of questions/comments:

1. In neither case do I set I_MPI_PMI_LIBRARY, which I thought I needed to -- how else does IMPI find the Slurm PMI?  This might be why --mpi=none is failing, but for the moment, I can't set the variable because I can't find libpmi[1,2,x].so.

2. I would think that since none is the default, it should work.  Under what conditions would none fail, but pmi2 work?  Is it because IMPI supports pmi2?

3. If I do need to set I_MPI_PMI_LIBRARY, why does pmi2 still work without setting I_MPI_PMI_LIBRARY?  Or do I not need to set it when using IMPI?

4. I'm still trying to understand a bit more of the correlation between libpmi.so and mpi_*.so.  libpmi.so is the Slurm PMI library, correct?  And mpi_* are the Slurm plug-in libraries (e.g. mpi_none, mpi_pmi2, etc.).  How do these libraries fit together?

 

Thanks,

Raymond

Cannot use MPI 2019.0.045 beta


Hi there!

I am unable to use the Intel MPI library from Parallel Studio XE 2019 beta on Windows.

I am trying to compile the following code with mpicc:
http://people.sc.fsu.edu/~jburkardt/c_src/hello_mpi/hello_mpi.c

It seems to compile ok.

However, when I run it with mpiexec there is no output.
mpiexec -n 1 hello_mpi.exe

I don't have this problem with Intel MPI 2018.

Thanks for your help,

Hector

Error when opening command prompt with Intel compiler


After installing IPS XE 2019 Beta, I am experiencing a problem with IPS XE 2017 Update 2 when running the "Intel 64 Visual Studio 2015 environment". I get the error "The application was unable to start correctly (0xc0000005). Click OK to close the application." in a window titled "fi_info.exe - Application Error". The corresponding command prompt window I am opening is titled "Intel(R) MPI Library 2019 Pre-Release (Beta) for Windows* Target Build Environment for Intel(R) 64 applications", and the same text also appears in the command prompt window. After I click OK, the command prompt window contains the following:

Intel(R) MPI Library 2019 Pre-Release (Beta) for Windows* Target Build Environment for Intel(R) 64 applications
Copyright 2007-2018 Intel Corporation.

Intel(R) MPI Library 2017 Update 2 for Windows* Target Build Environment for Intel(R) 64 applications
Copyright (C) 2007-2017 Intel Corporation. All rights reserved.

Copyright (C) 1985-2017 Intel Corporation. All rights reserved.
Intel(R) Compiler 17.0 Update 2 (package 187)

Why would MPI 2019 be used in IPS XE 2017?

The same happens when I try running "Intel 64 Visual Studio 2015 environment" in IPS XE 2019 Beta instead of in IPS XE 2017 Update 2.

Can I ignore the problem? I am not using MPI in my applications.


MPI_File_get_size, max file limit on Windows 10


It seems the maximum file size returned is limited to a 4-byte unsigned integer, even though MPI_Offset is 8 bytes. The attached program fails in MPI_File_get_size for files larger than 4 GB. Is there a way around this?

This is for Windows 10, mpicc.bat for the Intel(R) MPI Library 2018 Update 2 for Windows*
Copyright 2007-2018 Intel Corporation.

Microsoft (R) C/C++ Optimizing Compiler Version 19.14.26430 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

Attached program shows the issue.

Attachment: write.c (792 bytes)
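The attached write.c is the actual reproducer and is not shown here. As a rough illustration of the check it performs, a Fortran analogue might look like the sketch below (the file name is a placeholder):

program check_file_size
    use mpi
    implicit none
    integer :: fh, ierr
    ! MPI_Offset corresponds to an 8-byte integer kind in Fortran as well.
    integer(kind=MPI_OFFSET_KIND) :: fsize

    call MPI_INIT(ierr)
    ! 'bigfile.dat' is a placeholder name; any file larger than 4 GB exposes the issue.
    call MPI_FILE_OPEN(MPI_COMM_WORLD, 'bigfile.dat', MPI_MODE_RDONLY, &
                       MPI_INFO_NULL, fh, ierr)
    call MPI_FILE_GET_SIZE(fh, fsize, ierr)
    print *, 'reported size (bytes) = ', fsize
    call MPI_FILE_CLOSE(fh, ierr)
    call MPI_FINALIZE(ierr)
end program check_file_size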

MPI_Comm_spawn with a large number of children hangs at MPI_Init


Hello,

I have a Fortran 90 MPI program running on a Linux cluster, with the intel/2018.0.2 and intelmpi/2018.0.2 compilers, which uses MPI_COMM_SPAWN to spawn one child process of a C++ MPI program per parent process. The idea is that the parent processes are mapped evenly across the nodes; each of them spawns a child, waits for a blocking send/recv from it to signal completion, and then goes on to work with the output of the child.

Here is the call I use to spawn the children:

call MPI_COMM_SPAWN('MUSIC', argv, 1, info, 0, &
        MPI_COMM_SELF, MPI_COMM_CHILD, MPI_ERRCODES_IGNORE, ierr)

So maxprocs=1 process is spawned by each parent, using its own communicator, concurrently by all the parent processes (or whenever they reach this call).
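For context, the completion handshake described above amounts to the child signalling its parent over the spawn intercommunicator. The sketch below is purely illustrative of that handshake (the real child is the C++ MUSIC code, and the tag and message contents here are assumptions):

! Illustrative child-side handshake only; not the actual MUSIC child program.
program child_signal
    use mpi
    implicit none
    integer :: parent_comm, done, ierr

    call MPI_INIT(ierr)
    ! A spawned child retrieves the intercommunicator to the parent that spawned it.
    call MPI_COMM_GET_PARENT(parent_comm, ierr)
    done = 1
    ! Signal completion to parent rank 0; the parent posts a matching blocking MPI_RECV
    ! on the intercommunicator returned by MPI_COMM_SPAWN (MPI_COMM_CHILD above).
    call MPI_SEND(done, 1, MPI_INTEGER, 0, 0, parent_comm, ierr)
    call MPI_FINALIZE(ierr)
end program child_signal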

I have tested the code and it works for 8 processes (8 parents + 8 children = 16 total) spread over 2 nodes. I'm now trying to scale up to 128 processes spread over 32 nodes, but all of the child processes are hanging (I think) at MPI_Init(). I can run top on the nodes and see that they (the correct number of them) are running, so they have been spawned, but they aren't progressing through the program.

Here is the tail of stdout with I_MPI_DEBUG=10:

[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx5_0-1u
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx5_0-1u
[0] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[0] I_MPI_dlopen_dat(): trying to load default dat library: libdat2.so.2
[0] MPI startup(): DAPL provider ofa-v2-mlx5_0-1u
[0] MPI startup(): DAPL provider ofa-v2-mlx5_0-1u
[0] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): shm and dapl data transfer modes
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPI startup(): DAPL provider ofa-v2-mlx5_0-1u
[0] MPI startup(): DAPL provider ofa-v2-mlx5_0-1u
[0] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): shm and dapl data transfer modes
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000

 

This is what suggests to me that the children are hanging at either startup or MPI_Init(), since these are some, but not all, of the "MPI startup():" messages they should produce on a successful startup. By inspection of the successful startups, after the above there should be some messages about the cores on each node and then:

[0] MPI startup(): I_MPI_INFO_CACHE3=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,\
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_CACHES=3
[0] MPI startup(): I_MPI_INFO_CACHE_SHARE=2,2,64
[0] MPI startup(): I_MPI_INFO_CACHE_SIZE=32768,1048576,28835840
[0] MPI startup(): I_MPI_INFO_CORE=0,1,2,3,4,8,9,10,11,12,16,17,18,19,20,24,25,26,27,28,0,1,2,3,4,8,9,10,11,12,16\
,17,18,19,20,24,25,26,27,28,0,1,2,3,4,8,9,10,11,12,16,17,18,19,20,24,25,26,27,28,0,1,2,3,4,8,9,10,11,12,16,17,18,\
19,20,24,25,26,27,28
[0] MPI startup(): I_MPI_INFO_C_NAME=Unknown
[0] MPI startup(): I_MPI_INFO_DESC=1342177280
[0] MPI startup(): I_MPI_INFO_FLGB=-744488965
[0] MPI startup(): I_MPI_INFO_FLGC=2147417079
[0] MPI startup(): I_MPI_INFO_FLGCEXT=8
[0] MPI startup(): I_MPI_INFO_FLGD=-1075053569
[0] MPI startup(): I_MPI_INFO_FLGDEXT=201326592
[0] MPI startup(): I_MPI_INFO_LCPU=80
[0] MPI startup(): I_MPI_INFO_MODE=775
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_MAP=mlx5_0:0
[0] MPI startup(): I_MPI_INFO_NUMA_NODE_NUM=2
[0] MPI startup(): I_MPI_INFO_PACK=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,\
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_SIGN=329300
[0] MPI startup(): I_MPI_INFO_STATE=0
[0] MPI startup(): I_MPI_INFO_THREAD=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,\
0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
[0] MPI startup(): I_MPI_INFO_VEND=1
[0] MPI startup(): I_MPI_PIN_INFO=x0,1,2,3,4,5,6,7,8,9,40,41,42,43,44,45,46,47,48,49
[0] MPI startup(): I_MPI_PIN_MAPPING=4:0 0,1 10,2 20,3 30

 

which are the last messages produced by the successful startup of the parent processes (and similarly by the children in the 8-process case).

There is another thread https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog... where, on Windows, there was trouble with a large number of child processes, and they had some success switching impi.dll to the debug version, although they were observing an outright crash rather than a hang.

Any help or suggestions on how to debug this are greatly appreciated.

 

 

Fatal error in PMPI_Type_size: Invalid datatype, error stack:


I am trying to use the -trace flag to get an .stf output file for Trace Analyzer. I run my job using this script:

#!/bin/bash -l
#PBS -l nodes=2:ppn=40,walltime=00:10:00
#PBS -N GranularGas
#PBS -o granularjob.out -e granularjob.err

export MPIRUN=/apps/intel/ComposerXE2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/bin/mpirun
export CODEPATH=${HOME}/GranularGas/1.1_parallel_GranularGas/build
source /apps/intel/ComposerXE2018/itac/2018.2.020/intel64/bin/itacvars.sh

cd ${CODEPATH}
${MPIRUN} -trace ${CODEPATH}/GranularGas

After submitting my job, I get the following error:

Fatal error in PMPI_Type_size: Invalid datatype, error stack:
PMPI_Type_size(131): MPI_Type_size(INVALID DATATYPE) failed
PMPI_Type_size(76).: Invalid datatype

and I get a ".prot" file. Where does this error come from? How can I fix it?

For reference, I am using Intel compiler 18.0.2 and Intel MPI 20180125.
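As background, (P)MPI_Type_size only accepts a valid datatype handle; passing an invalid handle (for example MPI_DATATYPE_NULL or an already freed type) is the usual cause of this error class. The sketch below is illustrative only and is not taken from the GranularGas code:

program type_size_demo
    use mpi
    implicit none
    integer :: tsize, ierr

    call MPI_INIT(ierr)
    ! A valid predefined datatype handle; an invalid handle here would trigger
    ! the "Invalid datatype" error reported by (P)MPI_Type_size.
    call MPI_TYPE_SIZE(MPI_DOUBLE_PRECISION, tsize, ierr)
    print *, 'size of MPI_DOUBLE_PRECISION (bytes) = ', tsize
    call MPI_FINALIZE(ierr)
end program type_size_demo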

Compile Error with 2018.3.222 version mpiicc


Not sure if this is the right place to ask. Forgive me if I should ask someplace else.

I tried to compile the EPCC OpenMP/MPI benchmark with the Intel tools version 2018.3.222, and it failed with this error:

(The source code can be downloaded from here: https://www.epcc.ed.ac.uk/research/computing/performance-characterisatio...)

mpiicc -qopenmp -O3  -o mixedModeBenchmark parallelEnvironment.o benchmarkSetup.o output.o pt_to_pt_pingpong.o pt_to_pt_pingping.o pt_to_pt_multiPingpong.o pt_to_pt_multiPingping.o pt_to_pt_haloexchange.o collective_barrier.o collective_broadcast.o collective_scatterGather.o collective_reduction.o collective_alltoall.o mixedModeBenchmarkDriver.o 

benchmarkSetup.o:(.bss+0x0): multiple definition of `myThreadID'

parallelEnvironment.o:(.bss+0xc): first defined here

output.o:(.bss+0x0): multiple definition of `myThreadID'

parallelEnvironment.o:(.bss+0xc): first defined here

pt_to_pt_pingpong.o:(.bss+0xd0): multiple definition of `myThreadID'

parallelEnvironment.o:(.bss+0xc): first defined here

pt_to_pt_pingping.o:(.bss+0xb4): multiple definition of `myThreadID'

parallelEnvironment.o:(.bss+0xc): first defined here

 

....

collective_alltoall.o:(.bss+0x60): multiple definition of `myThreadID'

parallelEnvironment.o:(.bss+0xc): first defined here

mixedModeBenchmarkDriver.o:(.bss+0x0): multiple definition of `myThreadID'

parallelEnvironment.o:(.bss+0xc): first defined here

make: *** [mixedModeBenchmark] Error 1

 

 

However, I can compile it with version 15.0.3.187 without error.

What is the reason for the error with the 2018.3.222 version? Thanks a lot.

Zero-sized .stf file generated from ITAC


I am actually trying to use Intel Trace Analyzer and Collector (ITAC) to profile my MPI code written in Fortran.

The code does execute MPI_Init at the start and MPI_Finalize at the end.

 

Following the thread of [ https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog... ] and

http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.2rc1-u... ],

I have learned that either setting LD_PRELOAD prior to the run or adding the '-itac' option when compiling and linking works in my case.

 

The problem is that, although I was able to obtain .stf and .prot files, the size of the .stf file is zero, so I am unable to open it with ITAC.

In addition, I have also noticed that when I try to run the executable compiled with the '-itac' option, it works only in the single-node case and hangs in the multi-node case.

 

For your information, I'm using "Intel® Parallel Studio XE 2018 Update 3" for the compilers, mvapich2.2.2-qlc-intel18 for the MPI library (which is compatible with ITAC according to the check_compatibility.c test), and CentOS 6.4 for the OS.

 

Any helpful comment will be deeply appreciated.
