Channel: Clusters and HPC Technology

Intel MPI with LSF: stdoe_cb assert (!closed) failed


Dear all,

I am trying to run an application with Intel MPI under LSF on our cluster, but I keep running into trouble. I have installed Intel Cluster Studio XE 2013 for Linux and Platform LSF 7.

The application is an extension of RAMS (High Resolution Forecast Europe, Greece, Athens), compiled with HDF5, Intel Fortran, and Intel MPI. A normal run takes about 6 hours, but sometimes it fails with errors like the following:

[mpiexec@cn104] stdoe_cb (./ui/utils/uiu.c:385): assert (!closed) failed
[mpiexec@cn104] control_cb (./pm/pmiserv/pmiserv_cb.c:831): error in the UI defined callback
[mpiexec@cn104] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@cn104] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:430): error waiting for event
[mpiexec@cn104] main (./ui/mpich/mpiexec.c:847): process manager error waiting for completion

The error occurs frequently but is not reproducible: rerunning the failed job with the same settings succeeds.

The bsub command:

$ bsub -x -n 144 -oo ini.log -eo error.log -K 'mpirun -np 144 ./iclams_opt -f ICLAMSIN'
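Because rerunning the same job succeeds, one stopgap while the root cause is unknown is to wrap mpirun in a bounded retry loop. The following is a minimal sketch, not a fix. It assumes the failures are transient, and it assumes a failed run can safely be restarted from scratch (no partial HDF5 output that must be cleaned up first). The function name run_with_retries is hypothetical.

```shell
#!/bin/sh
# Hypothetical stopgap, assuming the Hydra/stdoe_cb failure is transient and
# that a failed run can simply be started again from the beginning.

# run_with_retries MAX CMD [ARGS...]
# Runs CMD up to MAX times, stopping at the first zero exit status.
# Returns 0 on success, otherwise the exit status of the last attempt.
run_with_retries() {
  max=$1
  shift
  attempt=1
  while :; do
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    echo "attempt $attempt/$max failed with exit status $status" >&2
    if [ "$attempt" -ge "$max" ]; then
      return "$status"
    fi
    attempt=$((attempt + 1))
  done
}
```

To use it, put the function in a job script that ends with run_with_retries 3 mpirun -np 144 ./iclams_opt -f ICLAMSIN, and pass that script to bsub in place of the inline mpirun command. Each retry restarts the full run of up to about 6 hours, so keep the retry count small.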

Does anyone have an idea what causes this?

Thanks in advance,

Tingyang Xu

