
pmi_proxy not found


Hi,

I have installed, under CentOS 6.5 (latest release), the latest Intel MPI and compiler releases as of this writing: l_mpi_p_4.1.3.049 and parallel_studio_xe_2013_sp1_update3. This is on a Dell T620 system with 24 cores (Ivy Bridge, 12 cores x 2 CPUs). I have four of these nodes, and only this one shows the problem. I have reproduced the trouble below: when I attempt to start a process using the Hydra process manager, it always fails with the error "pmi_proxy: No such file or directory" and then hangs. On the other hand, if I use the MPD system, the program (in this case the ab-initio calculation code VASP) starts up and runs without trouble. I have reinstalled both the MPI library and the compilers and have no idea what is causing the problem. Another symptom is that the simple diagnostics "mpirun -n 24 hostname" and "mpiexec -n 24 hostname" produce different results: mpirun hangs with the same pmi_proxy error, while mpiexec runs fine (also reproduced below). On the other nodes, "mpirun -n 24 hostname" prints the hostnames as expected.
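
As far as I understand, mpirun in this Intel MPI release is a wrapper around the Hydra launcher (mpiexec.hydra), while plain mpiexec goes through the MPD ring started by mpdboot, which would explain why only the Hydra path is affected; please correct me if that mapping is wrong. A condensed version of the comparison I have been running on the problem node is sketched below (paths and hostname taken from the full transcripts that follow):

source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh

# Confirm which launchers are on the PATH and that the Hydra helper exists.
which mpirun mpiexec mpiexec.hydra
ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

# Hydra-based launch: fails with "pmi_proxy: No such file or directory".
mpirun -n 24 hostname

# MPD-based launch: works once the ring is started.
mpdboot
mpiexec -n 24 hostname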

 

Any suggestions as to how to fix this would be greatly appreciated.

 

Paul Fons

 

Output relating to the failure of Hydra to run:

 

 

matstud@draco.a04.aist.go.jp:>source /opt/intel/bin/compilervars.sh intel64

matstud@draco.a04.aist.go.jp:>source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh

matstud@draco.a04.aist.go.jp:>mpdallexit

matstud@draco.a04.aist.go.jp:>mpiexec.hydra -n 24 -env I_MPI_FABRICS shm:ofa vasp

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

^CCtrl-C caught... cleaning up processes

[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed

[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream

[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status

[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event

[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

-rwxr-xr-x 1 root root 1001113 Mar  3 17:51 /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

matstud@draco.a04.aist.go.jp:>mpdboot

matstud@draco.a04.aist.go.jp:>mpiexec -n 24 vasp

 running on   24 total cores

 distrk:  each k-point on   24 cores,    1 groups

 distr:  one band on    1 cores,   24 groups

 using from now: INCAR   

 vasp.5.3.5 31Mar14 (build Apr 04 2014 15:18:05) complex                      

 

 POSCAR found :  2 types and     128 ions

 scaLAPACK will be used

Output showing the different behavior of mpirun and mpiexec:

 

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpirun -n 24 hostname

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory

^CCtrl-C caught... cleaning up processes

[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed

[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream

[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status

[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event

[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpiexec -n 24 hostname

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

draco.a04.aist.go.jp

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpirun

/opt/intel/impi/4.1.3.049/intel64/bin/mpirun

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpiexec

/opt/intel/impi/4.1.3.049/intel64/bin/mpiexec

matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>
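
One further check I intend to try, in case Hydra's bootstrap shell sees a different environment than my interactive shell (as far as I know Hydra starts pmi_proxy through ssh by default), or in case the binary on this node is somehow damaged even though ls shows it present, is along these lines (hostname and path as above):

# Does a non-interactive ssh session resolve the same path Hydra complains about?
ssh draco.a04.aist.go.jp ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy

# Is the pmi_proxy binary itself intact, and are its shared libraries resolvable?
file /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy
ldd /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy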

 

