Channel: Clusters and HPC Technology

Intel MPI/OpenMP hybrid programming on a cluster


Hello, Admin!
I'm using the Intel Cluster Studio Tool Kit and trying to run a hybrid (MPI + OpenMP) program on 25 compute nodes. I compile the program with -mt_mpi -openmp and set the environment variables I_MPI_DOMAIN=omp and OMP_NUM_THREADS=2, so that every MPI process has 2 OpenMP threads. The program runs without errors on up to 14 compute nodes. Beyond 14 compute nodes, it fails with the following error output:
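To make the intended layout concrete, here is a small sketch of the arithmetic behind the settings described above. The function name `hybrid_layout` is hypothetical, and the default of one MPI rank per node is an assumption (the post does not state how many ranks run on each node). The variable names are copied as written in the post.

```python
def hybrid_layout(nodes, ranks_per_node=1, threads_per_rank=2):
    """Summarize a hybrid MPI + OpenMP layout (illustrative helper only)."""
    ranks = nodes * ranks_per_node
    return {
        "mpi_ranks": ranks,
        "omp_threads_total": ranks * threads_per_rank,
        # Environment variables exactly as named in the post.
        "env": {
            "I_MPI_DOMAIN": "omp",
            "OMP_NUM_THREADS": str(threads_per_rank),
        },
    }


# The largest node count that works, and the target node count.
for n in (14, 25):
    layout = hybrid_layout(n)
    print(n, layout["mpi_ranks"], layout["omp_threads_total"])
```

Under that one-rank-per-node assumption, 14 nodes give 14 ranks and 28 threads, and 25 nodes give 25 ranks and 50 threads.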

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)......................: 
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................: 
MPID_nem_tcp_post_init(344)................: 
MPID_nem_newtcp_module_connpoll(3099)......: 
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(344)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(344)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(344)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(344)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(344)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(337)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(337)..........: 
MPID_nem_newtcp_module_connpoll(3099): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(337)..........: 
MPID_nem_newtcp_module_connpoll(3113): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(337)..........: 
MPID_nem_newtcp_module_connpoll(3113): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(337)..........: 
MPID_nem_newtcp_module_connpoll(3113): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.

Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................: 
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................: 
MPID_nem_tcp_post_init(337)..........: 
MPID_nem_newtcp_module_connpoll(3113): 
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
job aborted:
rank: node: exit code[: error message]
0: HPC-01: 1: process 0 exited without calling finalize
1: HPC-02: 123
2: HPC-03: 1: process 2 exited without calling finalize
3: HPC-04: 1: process 3 exited without calling finalize
4: HPC-05: 1: process 4 exited without calling finalize
5: HPC-06: 1: process 5 exited without calling finalize
6: HPC-07: 1: process 6 exited without calling finalize
7: HPC-08: 1: process 7 exited without calling finalize
8: HPC-09: 1: process 8 exited without calling finalize
9: HPC-10: 1: process 9 exited without calling finalize
10: HPC-11: 1: process 10 exited without calling finalize
11: HPC-12: 1: process 11 exited without calling finalize
12: HPC-13: 1: process 12 exited without calling finalize
13: HPC-14: 1: process 13 exited without calling finalize
14: HPC-16: 1: process 14 exited without calling finalize
15: HPC-17: 1: process 15 exited without calling finalize
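One way to triage the rank table above is to parse it and look for rows that stand out. The sketch below is only an illustration: the helper names (`parse_rank_table`, `odd_ones_out`, `missing_node_numbers`) are hypothetical, and the table text is copied from the output in this post. It flags candidates to inspect; it does not establish the cause of the failure.

```python
import re

# Rank table copied from the "job aborted" output in this post.
JOB_ABORTED = """\
0: HPC-01: 1: process 0 exited without calling finalize
1: HPC-02: 123
2: HPC-03: 1: process 2 exited without calling finalize
3: HPC-04: 1: process 3 exited without calling finalize
4: HPC-05: 1: process 4 exited without calling finalize
5: HPC-06: 1: process 5 exited without calling finalize
6: HPC-07: 1: process 6 exited without calling finalize
7: HPC-08: 1: process 7 exited without calling finalize
8: HPC-09: 1: process 8 exited without calling finalize
9: HPC-10: 1: process 9 exited without calling finalize
10: HPC-11: 1: process 10 exited without calling finalize
11: HPC-12: 1: process 11 exited without calling finalize
12: HPC-13: 1: process 12 exited without calling finalize
13: HPC-14: 1: process 13 exited without calling finalize
14: HPC-16: 1: process 14 exited without calling finalize
15: HPC-17: 1: process 15 exited without calling finalize
"""

# Format stated in the log: "rank: node: exit code[: error message]".
ROW = re.compile(r"^(\d+): (\S+): (\d+)(?:: (.*))?$")


def parse_rank_table(text):
    """Return a list of (rank, node, exit_code, message) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rank, node, code, msg = m.groups()
            rows.append((int(rank), node, int(code), msg or ""))
    return rows


def odd_ones_out(rows):
    """Return rows whose exit code differs from the most common exit code."""
    codes = [code for _, _, code, _ in rows]
    common = max(set(codes), key=codes.count)
    return [row for row in rows if row[2] != common]


def missing_node_numbers(rows):
    """Return node numbers absent from the HPC-NN sequence in the table."""
    nums = sorted(int(node.rsplit("-", 1)[1]) for _, node, _, _ in rows)
    return [n for n in range(nums[0], nums[-1] + 1) if n not in nums]


rows = parse_rank_table(JOB_ABORTED)
print(len(rows))                   # 16
print(odd_ones_out(rows))          # [(1, 'HPC-02', 123, '')]
print(missing_node_numbers(rows))  # [15]
```

For this log the table lists 16 ranks, rank 1 on HPC-02 is the only one with a different exit code (123), and HPC-15 does not appear in the node sequence. Both are starting points for a closer look at those hosts rather than a diagnosis.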

 

