Hi,
I have installed, under CentOS 6.5 (latest release), the latest (as of this writing) Intel MPI and compiler packages, l_mpi_p_4.1.3.049 and parallel_studio_xe_2013_sp1_update3. This is on a Dell T620 system with 24 cores (2 x 12-core Ivy Bridge CPUs). I have four of these nodes, and only this one shows the problem. I have reproduced the trouble below: when I attempt to start a process using the Hydra process manager, it always hangs with the error "pmi_proxy: No such file or directory". On the other hand, if I use the MPD system, the program (in this case the ab-initio calculation software VASP) starts up and runs without trouble. I have reinstalled both the MPI and compiler packages and have no idea what is causing this problem. Another symptom is that the simple diagnostics "mpirun -n 24 hostname" and "mpiexec -n 24 hostname" produce different results: while mpirun results in the same pmi_proxy hang, mpiexec runs fine (reproduced below). On the other nodes, "mpirun -n 24 hostname" prints out the hostnames as expected.
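In case it helps with the diagnosis, here is a rough sketch of some extra checks I can run on the problem node. It assumes Hydra is starting pmi_proxy through ssh and that a stray host list (e.g. an mpd.hosts in the working directory) might be getting picked up; neither of those is confirmed, and the paths are simply the ones from my installation.

#!/bin/bash
# Sanity checks for the pmi_proxy "No such file or directory" hang.
IMPI_BIN=/opt/intel/impi/4.1.3.049/intel64/bin

# Is the binary really there and executable, and what kind of file is it?
ls -l "$IMPI_BIN/pmi_proxy"
file "$IMPI_BIN/pmi_proxy"

# Can a non-interactive ssh shell on this same node see it?
ssh draco.a04.aist.go.jp "ls -l $IMPI_BIN/pmi_proxy"

# Is a leftover host list being picked up from the working directory?
ls -l mpd.hosts hosts 2>/dev/null

# Re-run the failing case with the launcher's debug output turned on.
I_MPI_HYDRA_DEBUG=1 mpirun -n 2 hostname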
Any suggestions as to how to fix this would be greatly appreciated.
Paul Fons
Output relating to the failure of Hydra to run:
matstud@draco.a04.aist.go.jp:>source /opt/intel/bin/compilervars.sh intel64
matstud@draco.a04.aist.go.jp:>source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh
matstud@draco.a04.aist.go.jp:>mpdallexit
matstud@draco.a04.aist.go.jp:>mpiexec.hydra -n 24 -env I_MPI_FABRICS shm:ofa vasp
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
^CCtrl-C caught... cleaning up processes
[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy
-rwxr-xr-x 1 root root 1001113 Mar 3 17:51 /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy
matstud@draco.a04.aist.go.jp:>mpdboot
matstud@draco.a04.aist.go.jp:>mpiexec -n 24 vasp
running on 24 total cores
distrk: each k-point on 24 cores, 1 groups
distr: one band on 1 cores, 24 groups
using from now: INCAR
vasp.5.3.5 31Mar14 (build Apr 04 2014 15:18:05) complex
POSCAR found : 2 types and 128 ions
scaLAPACK will be used
Output showing the different behavior of mpirun and mpiexec:
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpirun -n 24 hostname
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
^CCtrl-C caught... cleaning up processes
[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpiexec -n 24 hostname
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpirun
/opt/intel/impi/4.1.3.049/intel64/bin/mpirun
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpiexec
/opt/intel/impi/4.1.3.049/intel64/bin/mpiexec
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>
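For completeness, the two lines I source interactively could also be added to ~/.bashrc so that non-interactive ssh shells pick up the Intel environment as well. This is only a guess at a possible factor; I have not verified whether it matters here.

# Candidate additions to ~/.bashrc on each node (untested guess)
source /opt/intel/bin/compilervars.sh intel64
source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh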