Channel: Clusters and HPC Technology

Bug in MPI_File_read_all / MPI_File_write_all subroutines


Dear Intel support team,

Perhaps I was not clear enough in my previous topic https://software.intel.com/en-us/comment/1907393#comment-1907393, where two different problems were discussed. Here I would like to concentrate on the issue that was not answered in that topic.

According to the MPI 3.0 standard, MPI_File_read_all takes the following two parameters:

INTEGER, INTENT(IN) :: count ;       TYPE(MPI_Datatype),INTENT(IN) :: datatype

count is a 4-byte integer that can hold values up to 2147483647. Therefore, it should be possible to read or write that many array elements of whatever type is supplied through the second argument, datatype. However, it does not work. For example, reading count=2147483647 elements with datatype=MPI_CHARACTER (1 byte) works fine, but reading count=2147483647 elements with datatype=MPI_INTEGER (4 bytes) makes the subroutine crash. The standard says the subroutine should be able to read count=2147483647 elements of any datatype, but the Intel implementation reduces the number of elements that can be read by a factor of the datatype size. For example, if I build a derived datatype of size 2147483647 bytes, I can only read count=1 element. Again, the standard says I can read 2147483647 elements of any size, not only 1-byte elements.
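
To make this concrete, here is a minimal Fortran reproducer along the lines of what fails for me (an illustrative sketch only: the file name, the element count, and the assumption that the file already holds enough data are mine for this example):

program read_all_large_count
  use mpi
  implicit none
  integer :: ierr, fh, rank, count
  integer(kind=MPI_OFFSET_KIND) :: disp
  integer, allocatable :: buf(:)
  integer :: status(MPI_STATUS_SIZE)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  count = 600000000        ! well below 2147483647 elements, but ~2.4 GB as MPI_INTEGER
  allocate(buf(count))
  disp = 0

  call MPI_File_open(MPI_COMM_WORLD, 'data.bin', MPI_MODE_RDONLY, &
                     MPI_INFO_NULL, fh, ierr)
  call MPI_File_set_view(fh, disp, MPI_INTEGER, MPI_INTEGER, 'native', &
                         MPI_INFO_NULL, ierr)
  ! The standard allows count up to 2147483647 for any datatype, yet with
  ! MPI_INTEGER this call fails once count*4 exceeds 2147483647 bytes.
  call MPI_File_read_all(fh, buf, count, MPI_INTEGER, status, ierr)
  if (ierr /= MPI_SUCCESS .and. rank == 0) print *, 'MPI_File_read_all failed'

  call MPI_File_close(fh, ierr)
  call MPI_Finalize(ierr)
end program read_all_large_count

The same call with datatype=MPI_CHARACTER and the same count completes without error.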

I hope I was clear enough this time. Could you confirm this bug?

Regards, Serhiy

Thread Topic: Bug Report

Intel MPI with QoS option


Does Intel MPI have a runtime option for using an Infiniband Quality of Service?

For instance OpenMPI has:

mpirun --mca btl_openib_service_level N

I would be grateful if someone could steer me towards a similar option in Intel MPI.

Memory registration cache feature in DAPL -> random failure in simple code


  MPI w/ DAPL user ** beware **
  
  In our open source finite element code, we have encountered a simple
  manager-worker code section that fails randomly while moving arrays (blocks)
  of double precision data from worker ranks to manager rank 0.
  
  The failures occur (consistently) with DAPL but never with tcp over IB (IPoIB).
  
  After much effort, the culprit was found to be the memory registration
  cache feature in DAPL.
  
  This feature/bug is ON by default ** even though ** the manual states:
   
   "The cache substantially increases performance, but may lead 
    to correctness issues in certain situations."
    
    From: Intel® MPI Library for Linux OS Developer Reference (2017). pg 95
  
  Once we set this option OFF, the code runs successfully for all test cases over
  large and small numbers of cluster nodes. The DAPL performance is still
  at least 2x better than IPoIB.
  
    export I_MPI_DAPL_TRANSLATION_CACHE=0
  
  Recommendation to Intel MPI group:
  
     Set I_MPI_DAPL_TRANSLATION_CACHE=0 as the DEFAULT. Encourage developers
     to explore setting this option ON ** if ** their code works properly
     with OFF.
     
  Specifics:
  
     - Intel ifort 17.0.2
     - Intel MPI 17.0.2
     - Ohio Supercomputer Center, Owens Cluster.
         RedHat 7.3
         Mellanox EDR (100Gbps) Infiniband
         Broadwell/Haswell cluster nodes.

 Code section that randomly fails:

    -> Blocks are ALLOCATEd with variable size in a Fortran
       derived type (itself also allocated to the number of blocks).
       All blocks on rank 0 are created before the code below is entered.

  sync worker ranks to this point
  
  if rank = 0 then
  
     loop sequentially over all blocks to be moved
        if rank 0 owns block -> next block
        send worker who owns block the block number (MPI_SEND)
        receive block from worker (MPI_RECV)
     end loop
     
     loop over all workers
       send block = 0 to signal we are done moving blocks
     end loop
     
  else ! worker code
  
     loop
        post MPI_RECV to get a block number 
        if block number = 0 -> done
        if worker does not own this block, the manager made an error !
        send root the entire block -> MPI_SEND
     end loop
     
  end if
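
 For reference, a compilable Fortran sketch of this pattern (the names, message tags,
 and the block_t type here are illustrative placeholders, not our actual code):

module block_exchange
  use mpi
  implicit none
  type :: block_t
     double precision, allocatable :: data(:)
  end type block_t
contains
  subroutine move_blocks(myrank, nranks, nblocks, owner, blocks)
    integer, intent(in) :: myrank, nranks, nblocks
    integer, intent(in) :: owner(nblocks)        ! rank that owns each block
    type(block_t), intent(inout) :: blocks(nblocks)
    integer :: w, blk, done, ierr, status(MPI_STATUS_SIZE)

    done = 0
    call MPI_Barrier(MPI_COMM_WORLD, ierr)       ! sync all ranks to this point

    if (myrank == 0) then
       do blk = 1, nblocks
          if (owner(blk) == 0) cycle             ! rank 0 already owns it
          ! tell the owner which block to ship, then receive it
          call MPI_Send(blk, 1, MPI_INTEGER, owner(blk), 1, MPI_COMM_WORLD, ierr)
          call MPI_Recv(blocks(blk)%data, size(blocks(blk)%data), &
                        MPI_DOUBLE_PRECISION, owner(blk), 2, MPI_COMM_WORLD, &
                        status, ierr)
       end do
       do w = 1, nranks - 1                      ! block number 0 signals "done"
          call MPI_Send(done, 1, MPI_INTEGER, w, 1, MPI_COMM_WORLD, ierr)
       end do
    else
       do
          call MPI_Recv(blk, 1, MPI_INTEGER, 0, 1, MPI_COMM_WORLD, status, ierr)
          if (blk == 0) exit                     ! manager signalled "done"
          call MPI_Send(blocks(blk)%data, size(blocks(blk)%data), &
                        MPI_DOUBLE_PRECISION, 0, 2, MPI_COMM_WORLD, ierr)
       end do
    end if
  end subroutine move_blocks
end module block_exchange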
              

Cache management instructions for Intel Xeon Processors


Hi all,

I looked around but couldn't find any cache management instructions for Xeon processors (I am working on a Xeon E7-8860 v4). I found that we can use _mm_clevict for the MIC architecture.

Is there a similar way to do this on the Xeon E7-8860 v4? What I am looking to do is reduce the priority of a cache line so that it will be one of the first ones to get evicted. For instance:

int* arr = new int[ length ];

for ( int i = 0; i < length; ++i )
{
   // use arr[i]
   if ( ( i - 1 ) % CACHE_LINE_SIZE == 0 )
      reduce_priority( arr[ i - 1 ] ); // reduce the priority of the cache line in which arr[ i - 1 ] resides
}

 

If not, can I achieve this by different means?

 

Any suggestion will be greatly appreciated.

Thanks a lot.

Matara Ma Sukoy

Thread Topic: Question

INTERNAL ERROR with SLURM and PMI2


I was pleasantly surprised to read that PMI2 with SLURM is supported by Intel MPI in the 2017 release. I tested it, but it fails immediately on my setup. I'm using Intel Parallel Studio 2017 Update 4 and SLURM 15.08.13. A simple MPI program doesn't work:

[donners@int1 pmi2]$ cat mpi.f90
program test
  use mpi
  implicit none

  integer ierr,nprocs,rank

  call mpi_init(ierr)
  call mpi_comm_size(MPI_COMM_WORLD,nprocs,ierr)
  call mpi_comm_rank(mpi_comm_world,rank,ierr)
  if (rank .eq. 0) then
    print *,'Number of processes: ',nprocs
  endif
  print*,'I am rank ',rank
  call mpi_finalize(ierr)

end
[donners@int1 pmi2]$ mpiifort mpi.f90
[donners@int1 pmi2]$ ldd ./a.out
    linux-vdso.so.1 =>  (0x00007ffcc0364000)
    libmpifort.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002ad7432a9000)
    libmpi.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/release_mt/libmpi.so.12 (0x00002ad743652000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002ad744397000)
    librt.so.1 => /lib64/librt.so.1 (0x00002ad74459c000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ad7447a4000)
    libm.so.6 => /lib64/libm.so.6 (0x00002ad7449c1000)
    libc.so.6 => /lib64/libc.so.6 (0x00002ad744c46000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002ad744fda000)
    /lib64/ld-linux-x86-64.so.2 (0x00002ad743086000)
[donners@int1 pmi2]$ I_MPI_PMI2=yes srun -n 1 --mpi=pmi2 ./a.out

INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPID_Init:2104
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1716)......: channel initialization failed
MPID_Init(2104)......: fail failed
srun: error: tcn1467: task 0: Exited with exit code 15
srun: Terminating job step 3270641.0

[donners@int1 pmi2]$ srun --version
slurm 15.08.13-Bull.1.0

The same problem occurs on a system with SLURM 17.02.3 (at TACC). What might be the problem here?

With regards,

John

 

Failed to generate trace file with mpirun


Hi, 

My application is a Python program and MPI is used through mpi4py (built with Intel MPI). The application needs to be killed during the run (it runs for a long time and we only want to profile a small part). I use LD_PRELOAD=libVTfs.so mpirun -trace -n 2 python <my application>, but it did not generate an .stf file as the documentation says. It only generated a folder containing stat-0.bin and stat-1.bin (filesize=0). Is anything wrong with my configuration? I have already sourced mpivars.sh and itacvars.sh. Thanks very much!

Thread Topic: Question

Performance of Iallreduce on Xeon Phi


Hi, 

We are trying to use the non-blocking API (MPI_Iallreduce) in a computation-intensive program. We ran it on two nodes (Xeon Phi) and found, with the Intel Trace Analyzer tool, that the two nodes are not balanced: one node spends more time in Iallreduce (the sum?). We would like to know whether we can create a thread so that the Iallreduce/sum runs on one specific core, in parallel with the user code (OpenMP). Or is there an API or configuration option in Intel MPI that can do this job? Thanks
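
To make the question concrete, here is a small sketch of what we are trying to achieve (my own illustration, not the real application; whether real overlap happens presumably depends on asynchronous progress support):

program overlap_iallreduce
  use mpi
  implicit none
  integer, parameter :: n = 1000000
  double precision :: partial(n), total(n), work(n)
  integer :: i, ierr, request

  call MPI_Init(ierr)
  partial = 1.0d0

  ! post the non-blocking reduction ...
  call MPI_Iallreduce(partial, total, n, MPI_DOUBLE_PRECISION, MPI_SUM, &
                      MPI_COMM_WORLD, request, ierr)

  ! ... and do OpenMP work that does not touch partial/total in the meantime
  !$omp parallel do
  do i = 1, n
     work(i) = sqrt(dble(i))
  end do
  !$omp end parallel do

  call MPI_Wait(request, MPI_STATUS_IGNORE, ierr)
  call MPI_Finalize(ierr)
end program overlap_iallreduce

Ideally the reduction would progress on one dedicated core while the OpenMP threads run on the remaining cores.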

 

Thread Topic: Question

Parallel jobs running on same processors.


Hello,

I just got a KNL system which has 68 cores with 4 threads each, so it should be able to run up to 272 processes. I submitted my first job using mpiexec, using 64 of them. Then I submitted another one with 64. But when I checked the CPU usage, I found that the two jobs were running on the same 64 threads and left the rest idle. What kind of environment variable should I set? Or what kind of job scheduler is recommended? By the way, we did not purchase PBS because it is commercial and outside our budget.

Thanks!

 

Thread Topic: Question

OpenMP application performance dropped with I_MPI_ASYNC_PROGRESS=enable


Hi,

I tried MPI/OpenMP process pinning with the non-blocking API (Iallreduce), using the command below. When I set I_MPI_ASYNC_PROGRESS=enable, the application spends much more time in libiomp.so (kmp_hyper_barrier_release), and vmlinux also gets a little hotter, compared with I_MPI_ASYNC_PROGRESS=disable. Is there any issue with my configuration? VTune shows that all threads are pinned to the right cores; the only difference is that core 67 is used by the MPI communication thread.

========command=================

mpirun    -n 2 -ppn 1    -genv OMP_PROC_BIND=true -genv  I_MPI_ASYNC_PROGRESS= -genv I_MPI_ASYNC_PROGRESS_PIN=67 -genv I_MPI_PIN_PROCS=0-66 -genv OMP_NUM_THREADS=67  -genv I_MPI_PIN_DOMAIN=sock -genv I_MPI_FABRICS=ofi -f ./hostfile   python train_imagenet_cpu.py  --arch alex --batchsize 256 --loaderjob 68  --epoch 100 --train_root /home/jiangzho/imagenet/ILSVRC2012_img_train --val_root /home/jiangzho/imagenet/ILSVRC2012_img_val --communicator naive /home/jiangzho/train.txt /home/jiangzho/val.txt

Thread Topic: Question

Separate processes on separate cores


I'm using MPI to run processes that are nearly independent. They only talk at the very end, for an MPI_GATHER operation. My machine has a 4-core, 8-thread CPU. I run it with:

mpirun -n 101 ./a.out

When I do so, I see (from htop) that it utilises 100% of all the threads. How do I bind it to just the cores? (I tried '-map-by core'.)

Also, I see that all the processes seem to be running concurrently (with ~3-8% CPU per process). Wouldn't it be more efficient if each process got 100% until it reaches the GATHER point?

Install Intel Studio Cluster Edition after installing Composer Edition


Hi all,

I am a student and have been using the Intel Studio XE Composer Edition for the past year. Recently I realized that Intel® Trace Analyzer and Collector is also available for students with the Cluster Edition. I wish to install only this tool without having to uninstall my previous installation of the Composer Edition. When I attempt to customize my installation I get the following error message:

Product files will be installed here:

/opt/intel/

"The install directory path cannot be changed because at least one software product component was detected as having already been installed on the system".

Any help on how I can solve this issue and install only the Trace Analyzer is highly appreciated.

Note that I have many jobs currently running, which I assume will be killed if I uninstall the current Studio version, so I would highly prefer not to kill my jobs.

 

 

Issue with MPI_Iallreduce and MPI_IN_PLACE


Hi, 

I'm having some issues using MPI_Iallreduce with MPI_IN_PLACE from Fortran (I haven't tested with C at this point), and I'm unclear whether I'm doing something wrong with respect to the standard. I've created a simple code that reproduces the issue:

Program Test
Use mpi
Implicit None
Integer, Dimension(0:19) :: test1, test2
Integer :: i, request, ierr, rank
Logical :: complete
Integer :: status(MPI_STATUS_SIZE)

Call MPI_Init(ierr)

do i =0,19
  test1(i) = i
end do

Call MPI_Iallreduce( test1, MPI_IN_PLACE, 20, MPI_INT, MPI_SUM, MPI_COMM_WORLD, request, ierr )
if(ierr /= MPI_Success) print *, "failed"

Call  MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
if(ierr /= MPI_Success) print *, "failed"

Call MPI_Wait(request, status, ierr)
if(ierr /= MPI_Success) print *, "failed"
do i = 0, 1
if(rank == i) print *, rank , test1
Call MPI_Barrier(MPI_COMM_WORLD, ierr)
end do

End Program Test

I've executed it with 2 ranks using this MPI version:

bash-4.1$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2017 Update 2 Build 20170125 (id: 16752)
Copyright (C) 2003-2017, Intel Corporation. All rights reserved.

and the output is not as expected (i.e. rank, 0, 2, 4, ...) but instead:

           0           0           1           2           3           4
           5           6           7           8           9          10
          11          12          13          14          15          16
          17          18          19
           1           0           1           2           3           4
           5           6           7           8           9          10
          11          12          13          14          15          16
          17          18          19

i.e. the reduction sum never occurs. If instead of MPI_IN_PLACE I reduce into test2, the code works correctly.
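
For comparison, my reading of the standard is that the in-place form passes MPI_IN_PLACE as the send buffer and the data array as the receive buffer, i.e. something like:

Call MPI_Iallreduce( MPI_IN_PLACE, test1, 20, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, request, ierr )

which is not what the code above does.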

Am I violating the standard in some way or is there a workaround?

Thanks

Aidan Chalk

Performance degradation with larger messages on KNL (>128 MB)


Hi, 

    When I run the IMB benchmark, I always see an obvious performance drop when the buffer size is greater than 128 MB with OFI. Is this reasonable, or is there some configuration I am missing? Thanks

mpirun -genv I_MPI_STATS=ipm    -genv I_MPI_FABRICS=tmi -n 2  -ppn 1 -f hostfile IMB-MPI1 -msglog 20:29 -iter 20000,1000 uniband -time 1000000  -mem 2

#---------------------------------------------------
# Benchmarking Uniband 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions   Mbytes/sec      Msg/sec
            0        20000         0.00       607266
      1048576         1000      8356.72         7970
      2097152          500      8847.71         4219
      4194304          250      9295.13         2216
      8388608          125      9205.23         1097
     16777216           62      9498.35          566
     33554432           31      9577.55          285
     67108864           15      9564.31          143
    134217728            7      9523.83           71
   268435456            3      2700.73           10   <-----------performance dropped much
    536870912            1      3514.32            7

 

Fine-grain time synchronization among HPC nodes


Hi all,

I need to profile an HPC application on multiple nodes with very low overhead. In the application code, I need to monitor MPI synchronization points (barrier, alltoall, etc.). I'm using the invariant TSC (RDTSC/RDTSCP instructions) because I cannot rely on clock_gettime() due to the high overhead of syscalls. I know that TSCs should be synchronized among cores and sockets on the same node, hence I should have no problems with intra-node timing.
But I have the following concerns:

1) How can I synchronize TSCs among different nodes with very fine-grained accuracy (sub-microsecond)? I think the developers of Intel Trace Analyzer and Collector must have faced similar problems.

2) I suppose that TSCs on different nodes always increment at a fixed nominal frequency. Do you think that invariant clock oscillators can drift slightly? I suppose so, but in that case, for long application runs, profilers on different nodes can produce inconsistent inter-node timing information. Moreover, if TSCs are affected by clock drift, I cannot convert time stamps into seconds.

My target system is an HPC machine composed of dual-socket Broadwell nodes interconnected with an Omni-Path network.

Thanks to all in advance,
Daniele

Intel MPI installation problem


Hi,

I am trying to install Intel MPI on Windows Server 2012 R2 SERVERSTANDARDCORE, but during installation error 1603 occurs, connected with installation error 0x80040154: wixCreateInternetShortcuts: failed to create an instance of IUniformResourceLocatorW, and failed to create Internet shortcut. Do you have any idea how I can troubleshoot this?

Thanks for help,

Patrycja

 


No mpiicc or mpiifort with composer_xe/2016.0.109 ?


I started a new job and our company has composer_xe/2016.0.109. When I load the module I do not get the mpiicc or mpiifort compiler wrappers. Does one need the Cluster Edition for those?

Measuring data movement from DRAM to KNL memory


Dear All,

I am implementing and testing the LOBPCG algorithm on a KNL machine for some big sparse matrices. For the performance report, I need to measure how much data is transferred from DRAM to KNL memory. I am wondering if there is a simple way of doing this. Any help or idea is appreciated.

Regards,

Fazlay

BCAST error for message size greater than 2 GB


Hello,

I'm using Intel Fortran 16.0.1 and Intel MPI 5.1.3, and I'm getting an error with MPI_Bcast as follows:

Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(2231)........: MPI_Bcast(buf=0x2b460bcc0040, count=547061260, MPI_INTEGER, root=0, MPI_COMM_WORLD) failed
MPIR_Bcast_impl(1798)...:
MPIR_Bcast(1826)........:
I_MPIR_Bcast_intra(2007): Failure during collective
MPIR_Bcast_intra(1592)..:
MPIR_Bcast_binomial(253): message sizes do not match across processes in the collective routine: Received -32766 but expected -2106722256

I'm broadcasting a 4-byte integer array of size 547061260 (547061260 × 4 bytes ≈ 2.19 GB, i.e. more than 2147483647 bytes). Is there an upper limit on the message size? The bcast works fine for smaller counts.
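
In case it is relevant, here is a workaround sketch I am considering (it assumes the failure comes from an internal 32-bit byte count; the chunk size is an arbitrary choice of mine): broadcast the array in pieces so that each call stays below 2147483647 bytes.

subroutine bcast_large_int(buf, n, root, comm)
  use mpi
  implicit none
  integer, intent(in) :: n, root, comm
  integer, intent(inout) :: buf(n)
  integer, parameter :: chunk = 250000000     ! 250M integers = 1 GB per call
  integer :: start, cnt, ierr

  start = 1
  do while (start <= n)
     cnt = min(chunk, n - start + 1)
     call MPI_Bcast(buf(start), cnt, MPI_INTEGER, root, comm, ierr)
     start = start + cnt
  end do
end subroutine bcast_large_int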

Thanks!

MPI_Alltoall error when running more than 2 cores per node


We have 6 Intel(R) Xeon(R) CPU D-1557 @ 1.50GHz nodes, each containing 12 cores. hpcc version 1.5.0 has been compiled with Intel MPI and MKL. We are able to run hpcc successfully when configuring mpirun for 6 nodes and 2 cores per node. However, attempting to specify more than 2 cores per node (we have 12) causes the error "invalid error code ffffffff (Ring Index out of range) in MPIR_Alltoall_intra:204".

Any ideas as to what could be causing this issue?

The following environment variables have been set:
I_MPI_FABRICS=tcp
I_MPI_DEBUG=5
I_MPI_PIN_PROCESSOR_LIST=0,1,2,3,4,5,6,7,8,9,10,11

The MPI library version is:
Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405 (id: 17193)

hosts.txt contains a list of 6 hostnames

The line below shows how mpirun is specified to execute hpcc on all 6 nodes, 3 cores per node:
mpirun -print-rank-map -n 18 -ppn 3  --hostfile hosts.txt  hpcc

INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIR_Alltoall_intra:204
Fatal error in PMPI_Alltoall: Other MPI error, error stack:
PMPI_Alltoall(974)......: MPI_Alltoall(sbuf=0x7fcdb107f010, scount=2097152, dtype=USER<contig>, rbuf=0x7fcdd1080010, rcount=2097152, dtype=USER<contig>, comm=0x84000004) failed
MPIR_Alltoall_impl(772).: fail failed
MPIR_Alltoall(731)......: fail failed
MPIR_Alltoall_intra(204): fail failed

Thanks!

 

Cluster error: /mpi/intel64/bin/pmi_proxy: No such file or directory


Hi,

I've installed Intel Parallel Studio Cluster Edition in the single-node installation configuration on the master node of a cluster of 8 nodes with 8 processors each. I performed the prerequisite steps before installation and verified shell connectivity by running .sshconnectivity and creating the machines.LINUX file, which reported that all 8 nodes were found, as follows:

*******************************************************************************
Node count = 8
Secure shell connectivity was established on all nodes.
See the log output listing "/tmp/sshconnectivity.aditya.log" for details.
Version number: $Revision: 259 $
Version date: $Date: 2012-06-11 23:26:12 +0400 (Mon, 11 Jun 2012) $
*******************************************************************************

The machines.LINUX file has the following hostnames:

octopus100.ubi.pt
compute-0-0.local 
compute-0-1.local 
compute-0-2.local 
compute-0-3.local 
compute-0-4.local 
compute-0-5.local 
compute-0-6.local 

I started the installation and installed all the modules in the /export/apps/intel directory, which can be accessed by all nodes, as suggested by the cluster administrator. After completing the installation I added the compiler environment scripts psxevars.sh and mpivars.sh to my bash startup script, as advised in the Getting Started manual. I then prepared the hostfile with all the nodes of the cluster for running in the MPI environment and verified the shell connectivity by running .sshconnectivity from the installation directory; it worked as before and detected all nodes successfully.

I wanted to check the cluster configuration, so I compiled and executed the test.c program from the mpi/test directory of the installation. It compiled well, but when I executed myprog it returned the error /mpi/intel64/bin/pmi_proxy: No such file or directory, as follows:

[aditya@octopus100 Desktop]$ mpiicc -o myprog test.c
[aditya@octopus100 Desktop]$ mpirun -n 2 -ppn 1 -f ./hostfile ./myprog
Intel(R) Parallel Studio XE 2017 Update 4 for Linux*
Copyright (C) 2009-2017 Intel Corporation. All rights reserved.
bash: /export/apps/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/pmi_proxy: No such file or directory
^C[mpiexec@octopus100.ubi.pt] Sending Ctrl-C to processes as requested
[mpiexec@octopus100.ubi.pt] Press Ctrl-C again to force abort
[mpiexec@octopus100.ubi.pt] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@octopus100.ubi.pt] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:252): unable to write data to proxy
[mpiexec@octopus100.ubi.pt] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:174): unable to send signal downstream
[mpiexec@octopus100.ubi.pt] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@octopus100.ubi.pt] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[mpiexec@octopus100.ubi.pt] main (../../ui/mpich/mpiexec.c:1147): process manager error waiting for completion

Later I consulted the troubleshooting manual, which suggested running a non-MPI command (hostname), and it returned the same error, as follows:

[aditya@octopus100 Desktop]$ mpirun -ppn 1 -n 2 -hosts compute-0-0.local, compute-0-1.local hostname
Intel(R) Parallel Studio XE 2017 Update 4 for Linux*
Copyright (C) 2009-2017 Intel Corporation. All rights reserved.
bash: /export/apps/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/bin/pmi_proxy: No such file or directory
^C[mpiexec@octopus100.ubi.pt] Sending Ctrl-C to processes as requested
[mpiexec@octopus100.ubi.pt] Press Ctrl-C again to force abort
[mpiexec@octopus100.ubi.pt] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@octopus100.ubi.pt] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:252): unable to write data to proxy
[mpiexec@octopus100.ubi.pt] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:174): unable to send signal downstream
[mpiexec@octopus100.ubi.pt] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@octopus100.ubi.pt] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:501): error waiting for event
[mpiexec@octopus100.ubi.pt] main (../../ui/mpich/mpiexec.c:1147): process manager error waiting for completion

When I included the master node octopus100.ubi.pt it worked, but only for that node; the rest of the nodes do not seem able to run the MPI commands. I think it may be an environment problem, as the cluster nodes are not able to perform MPI communication with the master node.

Please help me resolve this issue so that I can perform some simulations on the cluster.

Thanks,

Aditya

 
