Channel: Clusters and HPC Technology

hfi1 driver version confusion


I previously installed IntelOPA-IFS.RHEL72-x86_64.10.1.1.0.9 to set up an Omni-Path cluster. I noticed that this version of the Omni-Path software includes hfi1-0.11.3.10.0_327.el7.x86_64-248.src.rpm and after installing, hfi1 version 0.11-162 was present on the system.

I recently upgraded to IntelOPA-IFS.RHEL72-x86_64.10.2.0.0.158 and noticed that this version does not include an hfi1 driver. After upgrading, the system loads hfi1 version 0.9-294, which is the version included with the kernel.

What is the latest version of the hfi1 driver? From the version numbers I'd guess 0.11-162 is newer than 0.9-294. If that's the case, then why does upgrading to the latest Omni-Path software result in reverting to an older driver version? Additionally, a version of the hfi1 driver source code appears to be available here: https://github.com/01org/opa-hfi1; however, the commit messages indicate that the latest version is 0.11-156, while the actual version string in common.h is 0.11-162, which adds to the confusion.

Is there a reason not to use driver version 0.11-162 with the latest Omni-Path software? Presumably it would be better to use the latest version.

Thanks


HPCG - Assertion nxf%2==0 failed.


Dear All,
I have compiled the HPCG benchmark with Intel Parallel Studio XE 2016 for a dual Xeon(R) CPU E5-2680 system (RHEL 6.6).
I have 64 GB of memory installed on my machine. My hpcg.dat file is:

HPCG Linpack benchmark input file
Sandia National Laboratories, University of Tennessee
430 430 430
1200

With this configuration, the xhpcg binary occupies 83% of memory, but after ~13 minutes the run crashes with the error:
xhpcg: src/GenerateCoarseProblem.cpp:50: void GenerateCoarseProblem(const SparseMatrix_STRUCT &): Assertion `nxf%2==0' failed.

Could you please let me know where I am going wrong with the configuration?
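For what it's worth, the assertion comes from the multigrid setup: GenerateCoarseProblem halves nx, ny, nz at each coarsening step, so with a four-level hierarchy (three coarsenings, an assumption about the build's default settings) each local dimension must stay even at every step, i.e. be divisible by 8; 430 = 2 x 215 already fails at the first halving. A minimal sketch of that check:

```python
def survives_coarsening(n, levels=3):
    """Return True if a grid dimension n can be halved `levels` times,
    i.e. stays even at every multigrid coarsening step (nxf % 2 == 0)."""
    for _ in range(levels):
        if n % 2 != 0:
            return False
        n //= 2
    return True

# 430 fails at the first coarsening (215 is odd); 432 survives all three.
print(survives_coarsening(430))  # False
print(survives_coarsening(432))  # True
```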


Efficient method for hybrid OpenMP and MPI


Hi, I have a question about an efficient method for extending well-implemented scientific code, written in C++ and OpenMP, with an MPI layer.

The code is an architecture-aware implementation (ccNUMA, affinity, caches, etc.) that can exploit different aspects of the hardware, and in particular it uses all available threads.

The main goal is to implement the MPI layer without performance losses in the existing shared-memory code, and to do it efficiently.

So, I have to overlap MPI communication with OpenMP computation. My application allows for this because I use a loop-blocking technique.

In short: while the results of one block are being sent to another MPI rank, the OpenMP threads can continue computing. This scheme is repeated several times, after which a synchronization point is necessary. The whole structure is then run thousands of times.

The main constraint on the MPI communication is a large number of small messages (many data bars of 1.5 KB or 3 KB taken from 3D arrays).

This code will run on fairly new hardware and software:

  1. Intel CPU cluster
  2. Intel MIC cluster: MPI communication between KNCs (KNL similar to 1.)
  3. Hybrid: MPI communication between CPUs and MICs

The general question is how to do this efficiently. I am not asking about implementation details, but about which MPI scenarios can guarantee the best performance.

In detail:

  1. Does the MPI communication cause any overhead on the cores? I mean, when I run MPI communication and OpenMP computation at the same time but on different memory regions.
  2. Should I dedicate a separate core to MPI communication while the other cores perform OpenMP computation? Which scenario will be more efficient:
    1. the OMP master (or a single thread) bound to one physical core runs communication only, while the other OMP threads use the remaining cores for computation
      - which kind of communication would be better here, synchronous or asynchronous?
    2. a selected group of OMP threads handles both MPI communication and computation, while the other OMP threads compute only
    3. or some other solution?

In fact, 2.b is the most suitable for my application, but then the programmer is responsible for guaranteeing the right MPI communication paths between MPI ranks and OMP threads.
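To make the overlap structure concrete, here is a small MPI-free sketch of scenario 2.a: Python threads stand in for OMP threads, and a queue hands finished blocks to a dedicated "communication" thread, which stands in for nonblocking MPI sends. All names here are illustrative, not Intel MPI API.

```python
import queue
import threading

def overlap_compute_and_send(blocks, compute, send):
    """Scenario 2.a analogue: one dedicated communication thread plays the
    role of an OMP master pinned to its own core; the calling thread plays
    the computing OMP threads. Finished blocks are handed to the
    communication thread via a FIFO queue, standing in for MPI_Isend."""
    outbox = queue.Queue()
    sent = []

    def comm_thread():
        # Drain the outbox until the sentinel arrives (the "sync point").
        while True:
            item = outbox.get()
            if item is None:
                break
            sent.append(send(item))

    t = threading.Thread(target=comm_thread)
    t.start()
    results = []
    for b in blocks:
        r = compute(b)      # computation proceeds...
        outbox.put(r)       # ...while earlier results are being "sent"
        results.append(r)
    outbox.put(None)        # synchronization point after all blocks
    t.join()
    return results, sent

res, sent = overlap_compute_and_send(
    blocks=[1, 2, 3, 4],
    compute=lambda b: b * b,      # stand-in for the stencil update
    send=lambda r: ("sent", r))   # stand-in for a 1.5 KB MPI message
print(res)   # [1, 4, 9, 16]
print(sent)  # four ("sent", ...) tuples, in FIFO order
```

The same shape applies in C++/OpenMP: the master posts MPI_Isend for each completed block and calls MPI_Waitall at the synchronization point, while the worker threads never touch MPI.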

If anyone can help me or share their experience, I will be very happy.

Lukasz


Thread Topic: How-To

Intel MPI real-time performance problem


Hello:
In my case, I found that MPI_Gather() sometimes takes 5000-40000 CPU cycles, but normally it takes only about 2000 CPU cycles.

I can confirm that there is no timer interrupt or other interrupt disturbing MPI_Gather(). I also tried mlockall() and replaced i_malloc and i_free with my own malloc(), but it did not help.

When my programme calls MPI_Gather(), it needs the call to return in a deterministic time. I don't know whether there is anything I can do to improve MPI real-time performance, or whether there is a tool that can help me find out why this function sometimes takes so long.

OS: Linux 3.10

cpuinfo: I use isolcpus, so the cpuinfo command may report wrong info; the part is 8 cores / 16 threads.

Intel(R) processor family information utility, Version 4.1 Update 3 Build 20140124
Copyright (C) 2005-2014 Intel Corporation.  All rights reserved.
=====  Processor composition  =====
Processor name    : Intel(R) Xeon(R)  E5-2660 0 
Packages(sockets) : 1
Cores             : 1
Processors(CPUs)  : 1
Cores per package : 1
Threads per core  : 1
=====  Processor identification  =====
Processor       Thread Id.      Core Id.        Package Id.
0               0               0               0   
=====  Placement on packages  =====
Package Id.     Core Id.        Processors
0               0               0
=====  Cache sharing  =====
Cache   Size            Processors
L1      32  KB          no sharing
L2      256 KB          no sharing
L3      20  MB          no sharing

The test program looks like this:
...
        t1 = __rdtsc();                         /* read TSC before the call */
        ierr = MPI_Bcast(&nn, 1, MPI_INT, 0, MPI_COMM_WORLD);
        t2 = __rdtsc();                         /* read TSC after the call  */
        if (t2 > t1 && (t2 - t1) / 1000 > 5)    /* record outliers > 5000 cycles */
            record_it(t2 - t1);
...

 

mpirun -n 4 -env I_MPI_DEBUG=4 ./my_test

[0] MPI startup(): Intel(R) MPI Library, Version 4.1 Update 3  Build 20140124
[0] MPI startup(): Copyright (C) 2003-2014 Intel Corporation.  All rights reserved.
[1] MPI startup(): shm data transfer mode
[3] MPI startup(): shm data transfer mode
[0] MPI startup(): shm data transfer mode
[2] MPI startup(): shm data transfer mode
[0] MPI startup(): Device_reset_idx=8
[0] MPI startup(): Allgather: 1: 1-1 & 0-2147483647
[0] MPI startup(): Allgather: 4: 2-4 & 0-2147483647
[0] MPI startup(): Allgather: 1: 5-10 & 0-2147483647
[0] MPI startup(): Allgather: 4: 11-22 & 0-2147483647
[0] MPI startup(): Allgather: 1: 23-469 & 0-2147483647
[0] MPI startup(): Allgather: 4: 470-544 & 0-2147483647
[0] MPI startup(): Allgather: 1: 545-3723 & 0-2147483647
[0] MPI startup(): Allgather: 3: 3724-59648 & 0-2147483647
[0] MPI startup(): Allgather: 1: 59649-3835119 & 0-2147483647
[0] MPI startup(): Allgather: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allgatherv: 1: 0-1942 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 1942-128426 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 128426-193594 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 193594-454523 & 0-2147483647
[0] MPI startup(): Allgatherv: 4: 454523-561981 & 0-2147483647
[0] MPI startup(): Allgatherv: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-6 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 6-13 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 13-37 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 37-104 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 104-409 & 0-2147483647
[0] MPI startup(): Allreduce: 1: 409-5708 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 5708-12660 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 12660-61166 & 0-2147483647
[0] MPI startup(): Allreduce: 6: 61166-74718 & 0-2147483647
[0] MPI startup(): Allreduce: 8: 74718-163640 & 0-2147483647
[0] MPI startup(): Allreduce: 2: 163640-355186 & 0-2147483647
[0] MPI startup(): Allreduce: 6: 355186-665233 & 0-2147483647
[0] MPI startup(): Allreduce: 7: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-1 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 2-2 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 3-25 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 26-48 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 49-1826 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 1827-947308 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 947309-1143512 & 0-2147483647
[0] MPI startup(): Alltoall: 2: 1143513-3715953 & 0-2147483647
[0] MPI startup(): Alltoall: 3: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Alltoallw: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Barrier: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Bcast: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Exscan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gather: 3: 1-2045 & 0-2147483647
[0] MPI startup(): Gather: 2: 2046-3072 & 0-2147483647
[0] MPI startup(): Gather: 3: 3073-313882 & 0-2147483647
[0] MPI startup(): Gather: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Gatherv: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-5 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 1: 5-162 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 3: 162-81985 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 2: 81985-690794 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 5: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce: 1: 4-11458 & 0-2147483647
[0] MPI startup(): Reduce: 5: 11459-22008 & 0-2147483647
[0] MPI startup(): Reduce: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scan: 0: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatter: 3: 1-24575 & 0-2147483647
[0] MPI startup(): Scatter: 2: 24576-37809 & 0-2147483647
[0] MPI startup(): Scatter: 3: 37810-107941 & 0-2147483647
[0] MPI startup(): Scatter: 2: 107942-399769 & 0-2147483647
[0] MPI startup(): Scatter: 3: 399770-2150807 & 0-2147483647
[0] MPI startup(): Scatter: 2: 0-2147483647 & 0-2147483647
[0] MPI startup(): Scatterv: 0: 0-2147483647 & 0-2147483647
[1] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[3] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
mrloop is using core 3
mrloop is using core 5
[2] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
mrloop is using core 4
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       3477     zte        0
[0] MPI startup(): 1       3478     zte        0
[0] MPI startup(): 2       3479     zte        0
[0] MPI startup(): 3       3480     zte        0
[0] MPI startup(): Recognition=2 Platform(code=8 ippn=2 dev=1) Fabric(intra=1 inter=1 flags=0x0)
[0] MPI startup(): I_MPI_DEBUG=8
[0] MPI startup(): I_MPI_PIN_MAPPING=4:0 0,1 0,2 0,3 0

 core 2: Now the following data is the statistics in 1000000 step:
         3:997331  4:2168   5:1  46:1 
core 3: Now the following data is the statistics in 1000000 step:
         1:4497  2:995003  3:1 
core 4: Now the following data is the statistics in 1000000 step:
         1:2767  2:996733  16:1 
core 5: Now the following data is the statistics in 1000000 step:
         1:1  2:999070  3:430 

 

3:997331 means: 997331 times it took 3000 CPU cycles.

4:2168 means: 2168 times it took 4000 CPU cycles.

You can see that one time it took 46000 CPU cycles, but the other 999501 times it took only 1000-3000 CPU cycles.
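As an aside, the bucketed statistics above (count of samples per thousand cycles) are easy to reproduce offline; record_it is the poster's own routine, so the helper below is only a guess at the same idea:

```python
from collections import Counter

def cycle_histogram(samples):
    """Bucket raw TSC deltas into thousands of cycles, producing counts
    like the '3:997331  4:2168 ...' statistics above (3:997331 means
    997331 samples landed in the 3000-cycle bucket)."""
    return Counter(delta // 1000 for delta in samples)

hist = cycle_histogram([2950, 3100, 3400, 4200, 46100])
print(sorted(hist.items()))  # [(2, 1), (3, 2), (4, 1), (46, 1)]
```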


Pinning of processes spawned with MPI_Comm_spawn


(somehow, my previous post 700216 was initially in a draft state, then got published, but didn't appear on the mailing list)

Hi,

the I_MPI_PIN_* variables can be used to set pretty much any CPU mask for the MPI ranks. Unfortunately, the Intel MPI library doesn't set the mask correctly for processes that are dynamically spawned.

Here's an example to show the problem:

program mpispawn
  use mpi
  implicit none

  integer ierr,errcodes(1),intercomm,pcomm,mpisize,dumm,rank
  character(1000) cmd
  logical master

  call MPI_Init(ierr)
  call get_command_argument(0,cmd)
  print*,'cmd=',trim(cmd)
  call MPI_Comm_get_parent(pcomm,ierr)
  if (pcomm.eq.MPI_COMM_NULL) then
    print*,'I am the master. Clone myself!'
    master=.true.
    call MPI_Comm_spawn(cmd,MPI_ARGV_NULL,4,MPI_INFO_NULL,0,MPI_COMM_WORLD,pcomm,errcodes,ierr)
    call MPI_Comm_size(pcomm,mpisize,ierr)
    print*,'Processes in intercommunicator:',mpisize
    dumm=88
    call MPI_Bcast(dumm,1,MPI_INTEGER,MPI_ROOT,pcomm,ierr)
  else
    print*,'I am a clone. Use me'
    master=.false.
    call MPI_Bcast(dumm,1,MPI_INTEGER,0,pcomm,ierr)
  endif
  call MPI_Comm_rank(pcomm,rank,ierr)
  print*,'rank,master,dumm=',rank,master,dumm
  call sleep(300)
  call MPI_Barrier(pcomm,ierr)
  call MPI_Finalize(ierr)
end

I run this example on 2 nodes, each with 2 8-core CPUs. I request core binding and domains with scattered ordering, and 3 processes per node (so ranks are bound round-robin to the sockets). mpirun starts 2 MPI processes, and these spawn 4 further MPI processes:

[donners@int1 mpispawn]$ I_MPI_DEBUG=4 mpirun -n 2 -hosts "int1,int2" -ppn 3 -binding "pin=yes;cell=core;domain=1;order=scatter" ./mpi.impi
[0] MPI startup(): Multi-threaded optimized library
[0] MPI startup(): shm data transfer mode
[1] MPI startup(): shm data transfer mode
[0] MPI startup(): Rank    Pid      Node name                   Pin cpu
[0] MPI startup(): 0       31036    int1.cartesius.surfsara.nl  {0}
[0] MPI startup(): 1       31037    int1.cartesius.surfsara.nl  {8}
 cmd=./mpi.impi
 I am the master. Clone myself!
 cmd=./mpi.impi
 I am the master. Clone myself!
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[0] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): reinitialization: shm and dapl data transfer modes
[1] MPI startup(): reinitialization: shm and dapl data transfer modes
[0] MPI startup(): Multi-threaded optimized library
[0] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[1] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[3] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[2] DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
[1] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[1] MPI startup(): shm and dapl data transfer modes
[3] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[3] MPI startup(): shm and dapl data transfer modes
[0] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[0] MPI startup(): shm and dapl data transfer modes
[2] MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
[2] MPI startup(): shm and dapl data transfer modes
[0] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[0] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[1] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[2] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[2] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
[3] MPID_nem_init_dapl_coll_fns(): User set DAPL collective mask = 0000
[3] MPID_nem_init_dapl_coll_fns(): Effective DAPL collective mask = 0000
 Processes in intercommunicator:           2
 rank,master,dumm=           1 T          88
 Processes in intercommunicator:           2
 rank,master,dumm=           0 T          88
[0] MPI startup(): Rank    Pid      Node name                   Pin cpu
[0] MPI startup(): 0       31045    int1.cartesius.surfsara.nl  {0}
[0] MPI startup(): 1       24519    int2.cartesius.surfsara.nl  {0}
[0] MPI startup(): 2       24520    int2.cartesius.surfsara.nl  {8}
[0] MPI startup(): 3       24521    int2.cartesius.surfsara.nl  {1}
 cmd=./mpi.impi
 I am a clone. Use me
 rank,master,dumm=           0 F          88
 cmd=./mpi.impi
 I am a clone. Use me
 cmd=./mpi.impi
 I am a clone. Use me
 cmd=./mpi.impi
 I am a clone. Use me
 rank,master,dumm=           1 F          88
 rank,master,dumm=           2 F          88
 rank,master,dumm=           3 F          88

The 2 initial processes are bound correctly, round-robin to the first core of each socket on the first node. However, the first dynamically spawned rank is also bound to the first core, where it seems it should have been the second core; it now competes with an initial process for the same core. Note that the dynamically spawned processes do get distributed correctly across nodes, and the binding on the second node is correct.

What can be done to bind all dynamically spawned processes correctly?

Thread Topic: Bug Report

Environment variables defined by Intel mpirun?


Hello Forum Gurus,

Where can I find a list of environment variables defined by Intel mpirun that contain the rank and process information, analogous to OMPI_COMM_WORLD_SIZE, OMPI_COMM_WORLD_RANK, etc. shown here: https://www.open-mpi.org/faq/?category=running#mpi-environmental-variables? Does Intel MPI even provide this functionality? I can't seem to find anything similar in the Hydra documentation (https://software.intel.com/en-us/mpi-developer-reference-linux).

Thanks in advance for your help!

Thread Topic: Question

modify environment variable from MPICH


Hi

I have some scripts for running DART (meteorology software) with the Intel compiler and the Intel MPI library. These scripts were previously used on a Cray with MPICH. The MPICH environment variable settings are:

setenv MPICH_VERSION_DISPLAY 1
setenv MPICH_ENV_DISPLAY 1
setenv MPICH_MPIIO_HINTS_DISPLAY 1

setenv MPICH_GNI_RDMA_THRESHOLD 2048
setenv MPICH_GNI_DYNAMIC_CONN disabled

setenv MPICH_CPUMASK_DISPLAY 1
setenv OMP_NUM_THREADS 1

On the original machine, 4000 CPU cores were used.

On the new Linux machine with Intel MPI, I commented out those variables. When I use 200 CPU cores, the run takes about 6000 seconds, but when I use 720 CPU cores it takes about 15000 seconds. If I set equivalent environment variables for Intel MPI, will the performance improve? How should I translate those environment variables? Is there an introduction to setting up both Intel MPI and MPICH?

Thanks

 

Thread Topic: Help Me

Problem in transfer from SMPD process manager to mpiexec MPI process manager


Since Intel® MPI Library 2017, the SMPD process manager has been removed. I had no problems using the SMPD process manager available in the previous Intel® MPI 5.1.3 version of the library, as it supports calling an executable in a way similar to an ordinary call of a Fortran executable:

mpiexec.smpd <g-options> <l-options> <executable>

where executable = <name>.exe is the executable file. In particular, the following call was supported:

mpiexec.smpd -n 4 executable < input file > output file 2>&1            (A)

The syntax of the mpiexec utility, the scalable MPI process manager, is formally similar:

mpiexec <g-options> <l-options> <executable>

On a local node, the following call of the mpiexec utility works:

mpiexec -n 4 -localonly executable > output file 2>&1

Please explain what options should be used to run the executable with the mpiexec utility in a way similar to (A), with the explicit redirection of the input file.

Thread Topic: Question

Reference manual for Intel MPI library routines?


Dear Sirs,

I have now searched unsuccessfully for about 30 minutes for a description of the Intel MPI library routines, including, in particular, how to call them from a program targeted at the Intel Fortran compiler. I am somewhat embarrassed by my inability to find it and by having to ask you this here:

Could you please point me to the appropriate intel document that describes the said routines, and how to call them?

(Unfortunately, this is a prerequisite totally independent of compiling and running programs, which is of course information easily found on the Intel web site.)

Best regards,

Klaus Hallatschek

Intel Parallel Studio XE 2017 - Cluster Edition installation problem


 

I got the Intel Parallel Studio trial version to run on my Linux nodes (I currently have two nodes). During the installation I am facing two issues.

1. It says "Login Failed - due to ssh issue"

I have ssh up and running; I have registered and copied the keys, and using ssh "IP" I can easily log in to the remote server.

2. It says "Invalid cluster description file"

Though I have created "machines.LINUX" in the same directory where install.sh is located, it still shows the same error.
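For reference, my understanding (an assumption, not the official specification) is that the cluster description file is a plain text list of node hostnames, one per line; the hostnames below are placeholders:

```
node1
node2
```

If the file has a different layout (extra columns, blank lines, Windows line endings), the installer may reject it even when ssh works.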

 

Is the invalid cluster file error caused by the ssh issue, or is it something else?

 

Any help in this regard will be appreciated.

 

Thanks

Thread Topic: Question

maximum file size issues (bug??).


Hi, all.
Just after updating to Intel Parallel Studio XE 2017, I've run into something weird.
When I copy a file from A to B and the size of A is approximately 6 GB, B comes out as 2 GB (a 4-byte integer limit?). Another test gives the same result: MPI-IO cannot write a file larger than 2 GB. This code works very well with Intel Parallel Studio XE 2016, and if I switch mpif90/mpirun to OpenMPI's, there is no problem either. I guess there is a bug in the Intel MPI library of Intel Parallel Studio XE 2017.

   call su00.init (ns=sufile.ns)
   call para_range  (jsta, jskp, jend, 1, 1, sufile.ntrcr, nprocs, myrank)

   call mpi_file_open (mpi_comm_world, trim(sufile.file)//".su", mpi_mode_rdonly, mpi_info_null, file00, ierr00)
   call mpi_file_open (mpi_comm_world, trim(sufile.file)//"_fill.su", mpi_mode_create+mpi_mode_wronly, mpi_info_null, file01, ierr01)
   call mpi_barrier   (mpi_comm_world, ierr)

   do itrcr = jsta, jend, jskp
      su00.dum4 = 0.0E+0
      disp00 = 4*(60+sufile.ns)*(itrcr-1)

      call mpi_file_read_at  (file00, disp00, su00.dum4, 60+sufile.ns, mpi_integer4, stat00, ierr00)
      call mpi_file_write_at (file01, disp00, su00.dum4, 60+sufile.ns, mpi_integer4, stat01, ierr01)
   end do
   call mpi_barrier    (mpi_comm_world, ierr)
   call mpi_file_close (file00, ierr00)
   call mpi_file_close (file01, ierr01)
   call su00.free ( )
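One thing worth ruling out (an assumption about the cause, not a confirmed diagnosis): if disp00 is a default 4-byte INTEGER rather than INTEGER(KIND=MPI_OFFSET_KIND), the displacement arithmetic 4*(60+sufile.ns)*(itrcr-1) wraps at 2 GiB, which would match the 2 GB symptom. A quick illustration of the wraparound, with made-up values for ns and the trace index:

```python
def as_int32(x):
    """Wrap x to a signed 32-bit value, as default-kind Fortran
    INTEGER arithmetic would on overflow."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

record = 4 * (60 + 1000)        # bytes per trace if ns = 1000
disp = record * (600000 - 1)    # byte offset of trace 600000, ~2.5 GB
print(disp > 2**31 - 1)         # True: past the int32 range
print(as_int32(disp) < 0)       # True: a 4-byte integer goes negative
```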

 

Best,

U Geun

 

Thread Topic: Bug Report

Intel Premier Support Legacy Status Update

MPS_STAT_ENABLE_IDLE=I_MPI_PVAR_IDLE


When I source psxevars.csh intel64, I see this environment variable set:

MPS_STAT_ENABLE_IDLE=I_MPI_PVAR_IDLE

I cannot find documentation for MPS_STAT_ENABLE_IDLE or I_MPI_PVAR_IDLE anywhere on the web. What is this environment variable, and what does it do? This is on a cluster with an Intel Omni-Path fabric managed by MOAB/Slurm. Does it have something to do with this fabric/resource manager?

mpirun vs mpiexec


mpiexec is a direct link to mpiexec.hydra.

mpirun is a wrapper script that detects your batch system and simplifies the launch for the user, but it eventually also calls mpiexec.hydra.

Our site is considering removing the mpiexec link and pointing it to mpirun instead. Can you see any possible downside to doing this?

Ron

Conflict between IMSL and MPI


I am trying to divide my Fortran code into several parts, and I want to parallelize each part using MPI. For each part, I use the IMSL library to solve an optimization problem (using BCONF). However, I find that the IMSL library has its own MPI subroutines and does not allow me to call the standard MPI start subroutine "call MPI_INIT(ierror)". It just gives me a fatal error and ends the program.

I give two examples to illustrate the issue.

 

Example 1: print "Hello World" from each node:

program main
   use mpi

   implicit none

  integer ( kind = 4 ) error
  integer ( kind = 4 ) id
  integer ( kind = 4 ) p

  call MPI_Init ( error )

  call MPI_Comm_size ( MPI_COMM_WORLD, p, error )

  call MPI_Comm_rank ( MPI_COMM_WORLD, id, error )

  write ( *, * ) ' Process ', id, ' says "Hello, world!"'

  call MPI_Finalize ( error )

end program

When I compile and run without the IMSL library, it gives me the correct answer:

mpif90 -o a.out hello_mpi.f90

mpiexec -n 4 ./a.out

   Process            3  says "Hello, world!"
   Process            0  says "Hello, world!"
   Process            2  says "Hello, world!"
   Process            1  says "Hello, world!"

 

 

Now, if I do nothing to the code but just add the IMSL library, it causes the error:

mpif90 -o a.out hello_mpi.f90 $LINK_FNL_STATIC_IMSL $F90FLAGS

mpiexec  -n 4 ./a.out

 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.
 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.

 

In the first example, changing "$LINK_FNL_STATIC_IMSL" to "$LINK_MPI" cures the problem, but that does not work in the more realistic example below.

Example 2: use MPI, and have each node use the IMSL library to calculate quadrature nodes:

program main
    USE GQRUL_INT
    use mpi

    implicit none

  integer ( kind = 4 ) error
  integer ( kind = 4 ) id
  integer ( kind = 4 ) p
  real ( kind = 8 ) QW(10), QX(10)

  call MPI_Init ( error )

  call MPI_Comm_size ( MPI_COMM_WORLD, p, error )

  call MPI_Comm_rank ( MPI_COMM_WORLD, id, error )

  write ( *, * ) ' Process ', id, ' says "Hello, world!"'
    CALL GQRUL (10, QX, QW )

  call MPI_Finalize ( error )

end program

When I compile and run, the program stops at "MPI_INIT":

mpif90 -o a.out hello_mpi.f90 $LINK_FNL_STATIC_IMSL $F90FLAGS

 

mpiexec -n 4 ./a.out 

 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.
 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.
 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 *** FATAL    ERROR 1 from MPI_INIT.   A CALL was executed using the IMSL
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.
 ***          dummy routine.  Parallel performance needs a functioning MPI
 ***          library.

 

 

If I change the linking option to $LINK_MPI, the program crashes in the IMSL library subroutine:

mpif90 -o a.out hello_mpi.f90 $LINK_MPI $F90FLAGS

 

mpiexec -n 4 ./a.out

   Process            1  says "Hello, world!"
   Process            0  says "Hello, world!"
   Process            3  says "Hello, world!"
   Process            2  says "Hello, world!"
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source      
a.out              00000000018D5C75  Unknown               Unknown  Unknown
a.out              00000000018D3A37  Unknown               Unknown  Unknown
a.out              000000000188ADC4  Unknown               Unknown  Unknown
a.out              000000000188ABD6  Unknown               Unknown  Unknown
a.out              000000000184BCB9  Unknown               Unknown  Unknown
a.out              000000000184F410  Unknown               Unknown  Unknown
libpthread.so.0    00007EFC178C67E0  Unknown               Unknown  Unknown
a.out              000000000178E634  Unknown               Unknown  Unknown
a.out              000000000178A423  Unknown               Unknown  Unknown
a.out              0000000000430491  Unknown               Unknown  Unknown
a.out              000000000042AACD  Unknown               Unknown  Unknown
a.out              00000000004233D2  Unknown               Unknown  Unknown
a.out              0000000000422FEA  Unknown               Unknown  Unknown
a.out              0000000000422DD0  Unknown               Unknown  Unknown
a.out              0000000000422C9E  Unknown               Unknown  Unknown
libc.so.6          00007EFC16F7BD1D  Unknown               Unknown  Unknown
a.out              0000000000422B29  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source      
a.out              00000000018D5C75  Unknown               Unknown  Unknown
a.out              00000000018D3A37  Unknown               Unknown  Unknown
a.out              000000000188ADC4  Unknown               Unknown  Unknown
a.out              000000000188ABD6  Unknown               Unknown  Unknown
a.out              000000000184BCB9  Unknown               Unknown  Unknown
a.out              000000000184F410  Unknown               Unknown  Unknown
libpthread.so.0    00007EFDE2A037E0  Unknown               Unknown  Unknown
a.out              000000000178E634  Unknown               Unknown  Unknown
a.out              000000000178A423  Unknown               Unknown  Unknown
a.out              0000000000430491  Unknown               Unknown  Unknown
a.out              000000000042AACD  Unknown               Unknown  Unknown
a.out              00000000004233D2  Unknown               Unknown  Unknown
a.out              0000000000422FEA  Unknown               Unknown  Unknown
a.out              0000000000422DD0  Unknown               Unknown  Unknown
a.out              0000000000422C9E  Unknown               Unknown  Unknown
libc.so.6          00007EFDE20B8D1D  Unknown               Unknown  Unknown
a.out              0000000000422B29  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source      
a.out              00000000018D5C75  Unknown               Unknown  Unknown
a.out              00000000018D3A37  Unknown               Unknown  Unknown
a.out              000000000188ADC4  Unknown               Unknown  Unknown
a.out              000000000188ABD6  Unknown               Unknown  Unknown
a.out              000000000184BCB9  Unknown               Unknown  Unknown
a.out              000000000184F410  Unknown               Unknown  Unknown
libpthread.so.0    00007FBF21C277E0  Unknown               Unknown  Unknown
a.out              000000000178E634  Unknown               Unknown  Unknown
a.out              000000000178A423  Unknown               Unknown  Unknown
a.out              0000000000430491  Unknown               Unknown  Unknown
a.out              000000000042AACD  Unknown               Unknown  Unknown
a.out              00000000004233D2  Unknown               Unknown  Unknown
a.out              0000000000422FEA  Unknown               Unknown  Unknown
a.out              0000000000422DD0  Unknown               Unknown  Unknown
a.out              0000000000422C9E  Unknown               Unknown  Unknown
libc.so.6          00007FBF212DCD1D  Unknown               Unknown  Unknown
a.out              0000000000422B29  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source      
a.out              00000000018D5C75  Unknown               Unknown  Unknown
a.out              00000000018D3A37  Unknown               Unknown  Unknown
a.out              000000000188ADC4  Unknown               Unknown  Unknown
a.out              000000000188ABD6  Unknown               Unknown  Unknown
a.out              000000000184BCB9  Unknown               Unknown  Unknown
a.out              000000000184F410  Unknown               Unknown  Unknown
libpthread.so.0    00007F8084FD67E0  Unknown               Unknown  Unknown
a.out              000000000178E634  Unknown               Unknown  Unknown
a.out              000000000178A423  Unknown               Unknown  Unknown
a.out              0000000000430491  Unknown               Unknown  Unknown
a.out              000000000042AACD  Unknown               Unknown  Unknown
a.out              00000000004233D2  Unknown               Unknown  Unknown
a.out              0000000000422FEA  Unknown               Unknown  Unknown
a.out              0000000000422DD0  Unknown               Unknown  Unknown
a.out              0000000000422C9E  Unknown               Unknown  Unknown
libc.so.6          00007F808468BD1D  Unknown               Unknown  Unknown
a.out              0000000000422B29  Unknown               Unknown  Unknown

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 174
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

 

I am running this code on a UNIX system on my school's supercomputer, using the Intel compiler and MPICH version 3.0.1. My actual code is very similar to the second example; it uses some IMSL subroutines on each node. Can you please help me make it work? Thank you!
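Not a diagnosis, only a common first check: forrtl severe (174), a SIGSEGV, in hybrid OpenMP/MPI Fortran runs is frequently a thread-stack overflow caused by large automatic arrays rather than a logic error. A minimal sketch of the usual environment experiment (the 512M value is an arbitrary starting point, not tuned for this code):

```shell
# Print the current soft stack limit, then enlarge the per-thread stack
# used inside OpenMP parallel regions before relaunching the job.
ulimit -s                  # soft stack limit in KiB, or "unlimited"
export OMP_STACKSIZE=512M  # per-thread OpenMP stack (KMP_STACKSIZE on Intel)
echo "OMP_STACKSIZE set to $OMP_STACKSIZE"
```

Independently of that, recompiling with -g -traceback would replace the "Unknown" entries in the traceback above with routine and line information, and ifort's -heap-arrays option moves large automatic arrays off the stack entirely.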


mpiexec does not run


Hi, I have encountered a weird problem with mpiexec. Currently I am using the Intel MPI that comes with Intel Parallel Studio Cluster Edition on my Windows 10 machines. After installing the program, I configured it and registered mpiexec as suggested here:

https://software.intel.com/en-us/get-started-with-mpi-for-windows

Then I set up my visual studio project as suggested here:

https://software.intel.com/en-us/node/610381

I followed exactly the same steps to install and configure Intel MPI on my desktop and laptop. However, while my desktop runs the program with MPI flawlessly, my laptop cannot run mpiexec. If I run my program using mpiexec:

mpiexec -n 4 myprogram.exe

It just freezes, does nothing, and gives me a blank command window. When I check Task Manager, my program is not running. I am very curious why this problem only happens on my laptop, since I am doing exactly the same thing on both machines.

Can you please help me with this issue? Thank you!


NAS Benchmark Issue - spawning processes across multiple nodes (EXIT CODE: 9)


Hi all,
I am using Intel Parallel Studio 2015 on an Intel(R) Xeon(R) CPU E5-2680 v3 (RHEL 6.5) and am currently facing issues with an MPI-based application (NAS Parallel Benchmark BT). Though the issue seems application-specific, I would like your opinions on a methodology for debugging/fixing issues like these.

I successfully tested the MPI setup:

[puneets@host01 bin]$ cat hosts.txt
host02
host03

[puneets@host01 bin]$ mpirun -np 4 -ppn 2 -hostfile hosts.txt ./hello
host02
host02
host03
host03

But when I try to run the application, I end up with:

[puneets@host01 bin]$ mpirun -np 4 -ppn 2 -hostfile hosts.txt ./bt.E.4.mpi_io_full

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 25799 RUNNING AT host03
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)

Whereas on a single node, I am able to run the application:

[puneets@host01 bin]$ mpirun -np 4  ./bt.E.4.mpi_io_full


 NAS Parallel Benchmarks 3.3 -- BT Benchmark

 No input file inputbt.data. Using compiled defaults
 Size: 1020x1020x1020
 Iterations:  250    dt:   0.0000040
 Number of active processes:     4

 BTIO -- FULL MPI-IO write interval:   5

I am attaching the make.def and compilation log for your reference.
Any help/hint will be very useful. Eagerly awaiting your replies.
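An exit with signal 9 on a remote node, while the same binary starts fine elsewhere, usually means something external killed the process, and for class E problems the prime suspect is the kernel OOM killer (dmesg or /var/log/messages on host03 would confirm). A back-of-envelope estimate of the class E footprint, assuming roughly 15 grid-sized arrays of 5 double-precision variables each (an assumed count; BT's real total is of this order but not exactly this):

```shell
# Rough lower-bound memory estimate for NPB BT class E (1020^3 grid).
# "arrays=15" is an assumed number of grid-sized, 5-variable arrays
# (u, rhs, forcing, lhs blocks, ...); adjust it to taste.
n=1020; vars=5; arrays=15; bytes=8
total=$((n * n * n * vars * arrays * bytes))
gib=$((1024 * 1024 * 1024))
echo "estimated total:  $((total / gib)) GiB"
echo "per rank (np=4):  $((total / 4 / gib)) GiB"
```

Under these assumptions that is roughly 592 GiB in total, about 148 GiB per rank at -np 4, far more than a typical two-socket E5-2680 v3 node carries. Note that the single-node run above only prints its header before the big arrays are touched, so it is worth checking whether it actually completes; if memory is the culprit, dmesg on the killed host will show the OOM record.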

Using MSMPI.dll - fortran crash forrtl 157


I am trying to get a simple example working across two Amazon nodes. When I run locally, it's all fine. When I use -machinefile to launch across both nodes, the sample application on the slave node throws an exception, forrtl error 157. I can run with -machinefile and both processes local, and I can run with -machinefile and everything on the remote node, but as soon as both addresses are in hosts.txt, the application on the slave node crashes. The node names are the same because they are identical Amazon images. I have tried disabling firewalls. It always crashes when the slave node (rank 1) calls MPI_SEND. However, when I run without -machinefile, everything on one machine works fine with -n 4, etc.

Any pointers as to where I should start looking? Thanks.

Here is my sample application:

      program hello
      include 'mpif.h'
      parameter (MASTER = 0)

      integer numtasks, taskid, len, ierr, count, tag, i, j
      character(MPI_MAX_PROCESSOR_NAME) hostname
      double precision data(100)
      integer status(MPI_STATUS_SIZE)

      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, taskid, ierr)

      call MPI_GET_PROCESSOR_NAME(hostname, len, ierr)
      write(*,20) taskid, hostname

      if (taskid .eq. MASTER) then
        write(*,30) numtasks
      end if

      do i = 1, 10
        data(i) = (i * (taskid + 1))
      end do

      count = 10
      tag = 666

      ! All slaves send back data
      if (taskid .ne. MASTER) then
        call MPI_SEND(data, count, MPI_DOUBLE_PRECISION, MASTER, tag, MPI_COMM_WORLD, ierr)
        write(*,50) taskid
      end if

      if (taskid .eq. MASTER) then
        ! receive from each slave
        do i = 1, numtasks-1
          call MPI_RECV(data, count, MPI_DOUBLE_PRECISION, i, tag, MPI_COMM_WORLD, status, ierr)
          write(*,40) i
          do j = 1, count
            write(*,*) data(j)
          end do
        end do
      end if

      call MPI_FINALIZE(ierr)

20    format('Hello from task ',I2,' on ',A48)
30    format('MASTER: Number of MPI tasks is: ',I2)
40    format('MASTER: Received from: ',I2)
50    format('SLAVE:',I2,' : Sending to Master')

      end

Here is what I see on the console:

Hello from task  0 on WIN-MSDP9MK1V14
MASTER: Number of MPI tasks is:  2
Hello from task  1 on WIN-MSDP9MK1V14
forrtl: severe (157): Program Exception - access violation
Image              PC                Routine            Line        Source
msmpi.dll          00007FFCE7018FA6  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE7018C68  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE7018492  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE7012768  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE70130C6  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE707E87F  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE707DE8C  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE70930A5  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE7092785  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE703B2DB  Unknown               Unknown  Unknown
msmpi.dll          00007FFCE70BD99B  Unknown               Unknown  Unknown
F-testapplication  00007FF785EB1240  Unknown               Unknown  Unknown
F-testapplication  00007FF785EB274E  Unknown               Unknown  Unknown
F-testapplication  00007FF785EB2B24  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFD038713D2  Unknown               Unknown  Unknown
ntdll.dll          00007FFD05CF5444  Unknown               Unknown  Unknown

 

job aborted:
[ranks] message

[0] terminated

[1] process exited without calling finalize

---- error analysis -----

[1] on 10.249.60.161
F-testapplication-nodebug.exe ended prematurely and may have crashed. exit code 157

---- error analysis -----
And this is what I see on the master SMPD process:

[-1:1220] Launching SMPD service.
[-1:1220] smpd listening on port 8677
[-1:1220] Authentication completed. Successfully obtained Context for Client.
[-1:1220] version check complete, using PMP version 3.
[-1:1220] create manager process (using smpd daemon credentials)
[-1:1220] smpd reading the port string from the manager
[-1:2700] Launching smpd manager instance.
[-1:2700] created set for manager listener, 236
[-1:2700] smpd manager listening on port 49472
[-1:1220] closing the pipe to the manager
[-1:2700] Authentication completed. Successfully obtained Context for Client.
[-1:2700] Authorization completed.
[-1:2700] version check complete, using PMP version 3.
[-1:2700] Received session header from parent id=2, parent=1, level=1
[02:2700] Connecting back to parent using host WIN-MSDP9MK1V14 and endpoint 4953
6
[02:2700] Previous attempt failed, trying again with a resolved parent host 10.2
51.11.178:49536
[02:2700] Authentication completed. Successfully obtained Context for Client.
[02:2700] Authorization completed.
[02:2700] handling command SMPD_COLLECT src=0
[02:2700] handling command SMPD_LAUNCH src=0
[02:2700] Successfully handled bcast nodeids command.
[02:2700] setting environment variable: <MPIEXEC_HOSTNAME> = <WIN-MSDP9MK1V14>
[02:2700] env: PMI_SIZE=2
[02:2700] env: PMI_KVS=267b6f49-eac4-46e3-ad68-6ec4dd9d4e4a
[02:2700] env: PMI_DOMAIN=876a283b-5270-45bf-8a2b-812063bbae3e
[02:2700] env: PMI_HOST=localhost
[02:2700] env: PMI_PORT=6a4c59b6-9ba4-473f-bd95-54efa41e77c8
[02:2700] env: PMI_SMPD_ID=2
[02:2700] env: PMI_APPNUM=0
[02:2700] env: PMI_NODE_IDS=s
[02:2700] env: PMI_RANK_AFFINITIES=a
[02:2700] searching for 'F-testapplication-nodebug.exe' in workdir 'C:\Users\Adm
inistrator\Downloads\test'
[02:2700] C>CreateProcess(C:\Users\Administrator\Downloads\test\F-testapplicatio
n-nodebug.exe F-testapplication-nodebug.exe)
[02:2700] env: PMI_RANK=1
[02:2700] env: PMI_SMPD_KEY=0
[02:2700] Authentication completed. Successfully obtained Context for Client.
[02:2700] Authorization completed.
[02:2700] version check complete, using PMP version 3.
[02:2700] 2 -> 0 : returning parent_context: 0 < 2
[02:2700] forwarding command SMPD_INIT to 0
[02:2700] posting command SMPD_INIT to parent, src=2, ctx_key=0, dest=0.
[02:2700] Handling cmd=SMPD_INIT result
[02:2700] forward SMPD_INIT result to dest=2 ctx_key=0
[02:2700] 2 -> 1 : returning parent_context: 1 < 2
[02:2700] Caching business card for rank 1
[02:2700] forwarding command SMPD_BCPUT to 1
[02:2700] posting command SMPD_BCPUT to parent, src=2, ctx_key=0, dest=1.
[02:2700] Handling cmd=SMPD_BCPUT result
[02:2700] forward SMPD_BCPUT result to dest=2 ctx_key=0
[02:2700] handling command SMPD_BARRIER src=2 ctx_key=0
[02:2700] Handling SMPD_BARRIER src=2 ctx_key=0
[02:2700] initializing barrier(267b6f49-eac4-46e3-ad68-6ec4dd9d4e4a): in=1 size=
1
[02:2700] incrementing barrier(267b6f49-eac4-46e3-ad68-6ec4dd9d4e4a) incount fro
m 0 to 1 out of 1
[02:2700] all in barrier, sending barrier to parent.
[02:2700] posting command SMPD_BARRIER to parent, src=2, ctx_key=65535, dest=1.
[02:2700] Handling cmd=SMPD_BARRIER result
[02:2700] cmd=SMPD_BARRIER result will be handled locally
[02:2700] sending reply to barrier command '267b6f49-eac4-46e3-ad68-6ec4dd9d4e4a
'.
[02:2700] read 72 bytes from stdout
[02:2700] posting command SMPD_STDOUT to parent, src=2, dest=0.
[02:2700] 2 -> 1 : returning parent_context: 1 < 2
[02:2700] forwarding command SMPD_BCGET to 1
[02:2700] posting command SMPD_BCGET to parent, src=2, ctx_key=0, dest=1.
[02:2700] Handling cmd=SMPD_STDOUT result
[02:2700] cmd=SMPD_STDOUT result will be handled locally
[02:2700] Handling cmd=SMPD_BCGET result
[02:2700] forward SMPD_BCGET result to dest=2 ctx_key=0
[02:2700] Caching business card for rank 0
[02:2700] read 1024 bytes from stderr
[02:2700] posting command SMPD_STDERR to parent, src=2, dest=0.
[02:2700] read 358 bytes from stderr
[02:2700] posting command SMPD_STDERR to parent, src=2, dest=0.
[02:2700] Handling cmd=SMPD_STDERR result
[02:2700] cmd=SMPD_STDERR result will be handled locally
[02:2700] reading failed, assuming stdout is closed. error 0xc000014b
[02:2700] process_id=0 process refcount == 2, stdout closed.
[02:2700] reading failed, assuming stderr is closed. error 0xc000014b
[02:2700] process_id=0 process refcount == 1, stderr closed.
[02:2700] Handling cmd=SMPD_STDERR result
[02:2700] cmd=SMPD_STDERR result will be handled locally
[02:2700] process_id=0 process refcount == 0, pmi client closed.
[02:2700] process_id=0 rank=1 refcount=0, waiting for the process to finish exit
ing.
[02:2700] creating an exit command for process id=0  rank=1, pid=836, exit code=
157.
[02:2700] posting command SMPD_EXIT to parent, src=2, dest=0.
[02:2700] Handling cmd=SMPD_EXIT result
[02:2700] cmd=SMPD_EXIT result will be handled locally
[02:2700] handling command SMPD_CLOSE from parent
[02:2700] sending 'closed' command to parent context
[02:2700] posting command SMPD_CLOSED to parent, src=2, dest=1.
[02:2700] Handling cmd=SMPD_CLOSED result
[02:2700] cmd=SMPD_CLOSED result will be handled locally
[02:2700] smpd manager successfully stopped listening.
[02:2700] SMPD exiting with error code 0.
 

This is what I am seeing on the slave SMPD:

[-1:736] Launching SMPD service.
[-1:736] smpd listening on port 8677
[-1:736] Authentication completed. Successfully obtained Context for Client.
[-1:736] version check complete, using PMP version 3.
[-1:736] create manager process (using smpd daemon credentials)
[-1:736] smpd reading the port string from the manager
[-1:2200] Launching smpd manager instance.
[-1:2200] created set for manager listener, 236
[-1:2200] smpd manager listening on port 49546
[-1:736] closing the pipe to the manager
[-1:2200] Authentication completed. Successfully obtained Context for Client.
[-1:2200] Authorization completed.
[-1:2200] version check complete, using PMP version 3.
[-1:2200] Received session header from parent id=1, parent=0, level=0
[01:2200] Connecting back to parent using host WIN-MSDP9MK1V14 and endpoint 4947
9
[01:2200] Previous attempt failed, trying again with a resolved parent host 10.2
49.60.161:49479
[01:2200] Authentication completed. Successfully obtained Context for Client.
[01:2200] Authorization completed.
[01:2200] handling command SMPD_CONNECT src=0
[01:2200] now connecting to 10.249.60.161
[01:2200] 1 -> 2 : returning SMPD_CONTEXT_LEFT_CHILD
[01:2200] using spn RestrictedKrbHost/10.249.60.161 to contact server
[01:2200] WIN-MSDP9MK1V14 posting a re-connect to 10.249.60.161:49483 in left ch
ild context.
[01:2200] Authentication completed. Successfully obtained Context for Client.
[01:2200] Authorization completed.
[01:2200] version check complete, using PMP version 3.
[01:2200] 1 -> 2 : returning SMPD_CONTEXT_LEFT_CHILD
[01:2200] handling command SMPD_COLLECT src=0
[01:2200] 1 -> 2 : returning left_context
[01:2200] forwarding command SMPD_COLLECT to 2
[01:2200] posting command SMPD_COLLECT to left child, src=0, dest=2.
[01:2200] Handling cmd=SMPD_COLLECT result
[01:2200] forward result SMPD_COLLECT to dest=0
[01:2200] handling command SMPD_STARTDBS src=0
[01:2200] sending start_dbs result command kvs = 67ef6493-0e3a-4e22-a346-857cf78
1526a.
[01:2200] handling command SMPD_LAUNCH src=0
[01:2200] Successfully handled bcast nodeids command.
[01:2200] setting environment variable: <MPIEXEC_HOSTNAME> = <WIN-MSDP9MK1V14>
[01:2200] env: PMI_SIZE=2
[01:2200] env: PMI_KVS=67ef6493-0e3a-4e22-a346-857cf781526a
[01:2200] env: PMI_DOMAIN=cd05bc63-8a53-4f72-b8d0-46fc66e1ed60
[01:2200] env: PMI_HOST=localhost
[01:2200] env: PMI_PORT=a21dfd59-26f5-487e-bc78-59ff341d4fac
[01:2200] env: PMI_SMPD_ID=1
[01:2200] env: PMI_APPNUM=0
[01:2200] env: PMI_NODE_IDS=s
[01:2200] env: PMI_RANK_AFFINITIES=a
[01:2200] searching for 'F-testapplication-nodebug.exe' in workdir 'C:\Users\Adm
inistrator\Downloads\test'
[01:2200] C>CreateProcess(C:\Users\Administrator\Downloads\test\F-testapplicatio
n-nodebug.exe F-testapplication-nodebug.exe)
[01:2200] env: PMI_RANK=0
[01:2200] env: PMI_SMPD_KEY=0
[01:2200] 1 -> 2 : returning left_context
[01:2200] forwarding command SMPD_LAUNCH to 2
[01:2200] posting command SMPD_LAUNCH to left child, src=0, dest=2.
[01:2200] Handling cmd=SMPD_LAUNCH result
[01:2200] forward result SMPD_LAUNCH to dest=0
[01:2200] Authentication completed. Successfully obtained Context for Client.
[01:2200] Authorization completed.
[01:2200] version check complete, using PMP version 3.
[01:2200] 1 -> 0 : returning parent_context: 0 < 1
[01:2200] forwarding command SMPD_INIT to 0
[01:2200] posting command SMPD_INIT to parent, src=1, ctx_key=0, dest=0.
[01:2200] Authentication completed. Successfully obtained Context for Client.
[01:2200] Authorization completed.
[01:2200] Handling cmd=SMPD_INIT result
[01:2200] forward SMPD_INIT result to dest=1 ctx_key=0
[01:2200] 1 -> 0 : returning parent_context: 0 < 1
[01:2200] forwarding command SMPD_INIT to 0
[01:2200] posting command SMPD_INIT to parent, src=2, ctx_key=0, dest=0.
[01:2200] handling command SMPD_BCPUT src=1 ctx_key=0
[01:2200] Handling SMPD_BCPUT command from smpd 1
        ctx_key=0
        rank=0
        value=port=49553 description="10.251.11.178 WIN-MSDP9MK1V14 " shm_host=W
IN-MSDP9MK1V14 shm_queue=1564:196
        result=success
[01:2200] handling command SMPD_BARRIER src=1 ctx_key=0
[01:2200] Handling SMPD_BARRIER src=1 ctx_key=0
[01:2200] initializing barrier(67ef6493-0e3a-4e22-a346-857cf781526a): in=1 size=
1
[01:2200] incrementing barrier(67ef6493-0e3a-4e22-a346-857cf781526a) incount fro
m 0 to 1 out of 2
[01:2200] Handling cmd=SMPD_INIT result
[01:2200] forward SMPD_INIT result to dest=2 ctx_key=0
[01:2200] handling command SMPD_BCPUT src=2 ctx_key=0
[01:2200] Handling SMPD_BCPUT command from smpd 2
        ctx_key=0
        rank=1
        value=port=49487 description="10.249.60.161 WIN-MSDP9MK1V14 " shm_host=W
IN-MSDP9MK1V14 shm_queue=2392:196
        result=success
[01:2200] handling command SMPD_BARRIER src=2 ctx_key=65535
[01:2200] Handling SMPD_BARRIER src=2 ctx_key=65535
[01:2200] incrementing barrier(67ef6493-0e3a-4e22-a346-857cf781526a) incount fro
m 1 to 2 out of 2
[01:2200] all in barrier, release the barrier.
[01:2200] sending reply to barrier command '67ef6493-0e3a-4e22-a346-857cf781526a
'.
[01:2200] sending reply to barrier command '67ef6493-0e3a-4e22-a346-857cf781526a
'.
[01:2200] read 72 bytes from stdout
[01:2200] posting command SMPD_STDOUT to parent, src=1, dest=0.
[01:2200] read 36 bytes from stdout
[01:2200] posting command SMPD_STDOUT to parent, src=1, dest=0.
[01:2200] Handling cmd=SMPD_STDOUT result
[01:2200] cmd=SMPD_STDOUT result will be handled locally
[01:2200] Handling cmd=SMPD_STDOUT result
[01:2200] cmd=SMPD_STDOUT result will be handled locally
[01:2200] 1 -> 0 : returning parent_context: 0 < 1
[01:2200] forwarding command SMPD_STDOUT to 0
[01:2200] posting command SMPD_STDOUT to parent, src=2, dest=0.
[01:2200] Handling cmd=SMPD_STDOUT result
[01:2200] forward result SMPD_STDOUT to dest=2
[01:2200] Authentication completed. Successfully obtained Context for Client.
[01:2200] Authorization completed.
[01:2200] handling command SMPD_BCGET src=2 ctx_key=0
[01:2200] Handling SMPD_BCGET command from smpd 2
        ctx_key=0
        rank=0
        value=port=49553 description="10.251.11.178 WIN-MSDP9MK1V14 " shm_host=W
IN-MSDP9MK1V14 shm_queue=1564:196
        result=success
[01:2200] 1 -> 0 : returning parent_context: 0 < 1
[01:2200] forwarding command SMPD_STDERR to 0
[01:2200] posting command SMPD_STDERR to parent, src=2, dest=0.
[01:2200] 1 -> 0 : returning parent_context: 0 < 1
[01:2200] forwarding command SMPD_STDERR to 0
[01:2200] posting command SMPD_STDERR to parent, src=2, dest=0.
[01:2200] Handling cmd=SMPD_STDERR result
[01:2200] forward result SMPD_STDERR to dest=2
[01:2200] Handling cmd=SMPD_STDERR result
[01:2200] forward result SMPD_STDERR to dest=2
[01:2200] 1 -> 0 : returning parent_context: 0 < 1
[01:2200] forwarding command SMPD_EXIT to 0
[01:2200] posting command SMPD_EXIT to parent, src=2, dest=0.
[01:2200] handling command SMPD_SUSPEND src=0
[01:2200] suspending proc_id=0 succeeded, sending result to parent context
[01:2200] Handling cmd=SMPD_EXIT result
[01:2200] forward result SMPD_EXIT to dest=2
[01:2200] handling command SMPD_KILL src=0
[01:2200] process_id=0 rank=0 refcount=3, waiting for the process to finish exit
ing.
[01:2200] creating an exit command for process id=0  rank=0, pid=1564, exit code
=-1.
[01:2200] posting command SMPD_EXIT to parent, src=1, dest=0.
[01:2200] reading failed, assuming stdout is closed. error 0xc000014b
[01:2200] reading failed, assuming stderr is closed. error 0xc000014b
[01:2200] handling command SMPD_CLOSE from parent
[01:2200] sending close command to left child
[01:2200] Handling cmd=SMPD_EXIT result
[01:2200] cmd=SMPD_EXIT result will be handled locally
[01:2200] handling command SMPD_CLOSED src=2
[01:2200] 1 -> 2 : returning SMPD_CONTEXT_LEFT_CHILD
[01:2200] closed command received from left child.
[01:2200] closed context with error 1726.
[01:2200] sending 'closed' command to parent context
[01:2200] posting command SMPD_CLOSED to parent, src=1, dest=0.
[01:2200] Handling cmd=SMPD_CLOSED result
[01:2200] cmd=SMPD_CLOSED result will be handled locally
[01:2200] smpd manager successfully stopped listening.
[01:2200] SMPD exiting with error code 0.
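One detail stands out in the traces above: both instances report the same hostname, WIN-MSDP9MK1V14, and the SMPD business cards advertise shm_host/shm_queue entries for that name. With duplicate node names, an MPI library can mistake a remote rank for a local one and try the shared-memory channel across machines, which would fail exactly at the first real transfer (the slave's MPI_SEND). This is a hypothesis, not confirmed MS-MPI behaviour, but it is cheap to test by renaming one instance or listing the nodes by IP. A quick duplicate check on a machinefile (file name and contents here are illustrative):

```shell
# Print any machinefile entries that occur more than once; any output
# means two "different" nodes share a name and should be renamed or
# listed by IP address instead.
printf '%s\n' WIN-MSDP9MK1V14 WIN-MSDP9MK1V14 > machinefile.txt
sort machinefile.txt | uniq -d
```

If the duplicate names are the problem, the same run with the two private IPs in hosts.txt should behave differently, which matches the symptom that everything works whenever both ranks really are on one machine.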

OPA upgrade from 10.2 to 10.3


Are there best practices for upgrading between OPA versions? E.g., I want to upgrade from 10.2 to 10.3. The installation guide does not seem to have an "upgrade" section - http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o...

Can upgrades be done while in production or should we take a full outage?

It seems like an upgrade just means bringing everything down, wiping everything, then following the 10.3 instructions. Is that correct?

Thanks


How do I download older versions of MPI?


I have a user who has requested that a specific version of mpirun be installed. How do I get older versions?

 

mpirun -V

Intel(R) MPI Library for Linux* OS, Version 5.1.3 Build 20160120 (build id: 14053)

Copyright (C) 2003-2016, Intel Corporation. All rights reserved.

 

Thank you.
