Channel: Clusters and HPC Technology

Getting errors due to expired PGP key for MPI library


  Following the directions at

   https://software.intel.com/en-us/articles/installing-intel-free-libs-and...

  I downloaded the public key from
   https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS...

  previously and kept a copy.  Recently my use of that key started to fail, so I downloaded the key again and noticed that it now contains two keys, which is odd.

 

  I tried using the public key file that contains both keys and it mostly works -- in some GCP zones it works fine, but in others it doesn't.  On one of the instances where it doesn't, I ran "apt-key list" and noticed that one of the keys had expired:

 

pub   2048R/7E6C5DBE 2019-09-30 [expires: 2023-09-30]
uid                  Intel(R) Software Development Products

pub   2048R/1911E097 2016-09-28 [expired: 2019-09-27]
uid                  "CN = Intel(R) Software Development Products", O=Intel Corporation

 

I wasn't able to remove it with "apt-key del", so I started over with just the new key.  It shows up in "apt-key list" as okay:

 

pub   2048R/7E6C5DBE 2019-09-30 [expires: 2023-09-30]
uid                  Intel(R) Software Development Products

 

but I can't do an update:

$ sudo apt-get update

W: GPG error: https://apt.repos.intel.com/intelpython binary/ InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 1A8497B11911E097
W: The repository 'https://apt.repos.intel.com/intelpython binary/ InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: GPG error: https://apt.repos.intel.com/mkl all InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 1A8497B11911E097
W: The repository 'https://apt.repos.intel.com/mkl all InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: GPG error: https://apt.repos.intel.com/ipp all InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 1A8497B11911E097
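
For reference, the sequence I used when starting over with only the new key was roughly the following (the key file name below is just a placeholder for whatever the download link above currently serves):

$ sudo apt-key list                      # note the short IDs of the Intel keys
$ sudo apt-key del 1911E097              # the expired 2016 key (this is the del that failed for me at first)
$ sudo apt-key del 7E6C5DBE              # the 2019 key, so it can be re-imported cleanly
$ wget -O intel-sw-products.pub <key URL from the article above>
$ sudo apt-key add ./intel-sw-products.pub
$ sudo apt-get update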

By the way I tried to email otc-digital-experiences@intel.com as mentioned on https://software.intel.com/en-us/faq and it came back as undeliverable.


CFD: Intel MPI + Mellanox InfiniBand + Win 10...?


Hi!

Does anyone know if it is possible to run Intel MPI with Mellanox InfiniBand (ConnectX-5 or 6) cards running Mellanox's latest WinOF-2 v2.2 in a Windows 10 environment? I've been googling and reading for hours but I can't find any concrete information.

This is for running Ansys CFX/Fluent on a relatively small CFD cluster of 4 compute nodes. The current release of CFX/Fluent (2019 R3) runs on Intel MPI 2018 Release 3 by default.

Older versions of Intel MPI (2017, for example) specifically listed "Windows* OpenFabrics* (WinOF*) 2.0 or higher" and "Mellanox* WinOF* Rev 4.40 or higher" as supported InfiniBand software. "Windows OpenFabrics (WinOF)" appears to be dead and does not support Windows 10. The older Mellanox WinOF Rev 4.40 does not support the newest Mellanox IB cards.

The release notes for Intel MPI 2018 and newer do not mention this older InfiniBand software, and instead mention Intel Omni-Path.

Mellanox's own release notes for WinOF-2 v2.2 mention only Microsoft MS MPI for the MPI protocol. ANSYS does run on MS MPI, but then I think I would have to move the cluster over to a Windows Server OS environment. I currently run the cluster successfully on Windows 10 using Intel MPI, but over 10GigE rather than InfiniBand.

Thanks for any pointers!

Cheers.

 

 

MPI_FILE_WRITE_SHARED odd behavior?


I'm using MPI_FILE_WRITE_SHARED to write some error output to a single file.  NFS is used so that all nodes write/read the same files.  When I run the program on any single node with multiple processes, the error output is written correctly.  However, when I run the code across multiple nodes, nothing gets written to the file.  Here's a simple test program:

   Program MPIwriteTest   
      use mpi
      implicit none
      integer mpiFHerr, mpiErr, myRank
      character (len=80) string
      character(len=2), parameter:: CRLF = char(13)//char(10) 
      
      ! Initialize MPI and get rank
      call MPI_INIT( mpierr )
      call MPI_COMM_RANK(MPI_COMM_WORLD, myRank, mpierr)
      
      ! open and close file MPIerror.dat to delete any existing file
      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'MPIerror.dat', MPI_MODE_WRONLY+MPI_MODE_CREATE+MPI_MODE_SEQUENTIAL+MPI_MODE_DELETE_ON_CLOSE, &
               MPI_INFO_NULL, mpiFHerr, mpiErr)
      call MPI_FILE_CLOSE(mpiFHerr, mpiErr) ! This will delete the file.          
      ! open but don't delete on close         
      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'MPIerror.dat', MPI_MODE_WRONLY+MPI_MODE_CREATE+MPI_MODE_SEQUENTIAL, &
               MPI_INFO_NULL, mpiFHerr, mpiErr)
   
      ! test code just does a simple write
      write(string,'(a,i0)') 'Error from process: ', myRank
      call MPI_FILE_WRITE_SHARED(mpiFHerr, trim(string)//CRLF, len_trim(string)+2, MPI_CHARACTER, MPI_STATUS_IGNORE, mpiErr)
      
      ! close and end
      call MPI_FILE_CLOSE(mpiFHerr, mpiErr)
      call MPI_FINALIZE(mpierr)    
      
   end program MPIwriteTest

I've also noticed that if the file already exists (and I skip the open with delete_on_close), the file does contain text, but sometimes the file is corrupt.  Is there something wrong in this code?  Is MPI not playing well with NFS?
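
One thing I still need to verify is how the NFS share is mounted on the compute nodes. My understanding (please correct me if this is wrong) is that ROMIO expects NFS attribute caching to be disabled for multi-node MPI-IO to behave, i.e. an /etc/fstab entry along these lines on every node (server name and paths here are made up for illustration):

fileserver:/export/scratch   /scratch   nfs   rw,noac   0  0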

BTW, I'm using Parallel Studio XE 2019 Update 4 Cluster Edition.

thanks, -joe

MPI code runs just on one core, problem with hydra service


I am trying to run a simple hello world code in Fortran using the Intel MPI library, but every process reports the same rank, as if the program is not running on more than one core. I was following the troubleshooting procedures provided by Intel (Point 2 - https://software.intel.com/en-us/mpi-developer-guide-windows-troubleshoo...), and I got this:

C:\Program Files (x86)\IntelSWTools>mpiexec -ppn 1 -n 2 -hosts node01,node02 hostname
[mpiexec@Sebastian-PC] HYD_sock_connect (..\windows\src\hydra_sock.c:216): getaddrinfo returned error 11001
[mpiexec@Sebastian-PC] HYD_connect_to_service (bstrap\service\service_launch.c:76): unable to connect to service at node01:8680
[mpiexec@Sebastian-PC] HYDI_bstrap_service_launch (bstrap\service\service_launch.c:416): unable to connect to hydra service
[mpiexec@Sebastian-PC] launch_bstrap_proxies (bstrap\src\intel\i_hydra_bstrap.c:525): error launching bstrap proxy
[mpiexec@Sebastian-PC] HYD_bstrap_setup (bstrap\src\intel\i_hydra_bstrap.c:714): unable to launch bstrap proxy
[mpiexec@Sebastian-PC] wmain (mpiexec.c:1919): error setting up the boostrap proxies

Any ideas how to fix it? Any help would be appreciated.
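
From what I can tell, getaddrinfo error 11001 means the host name could not be resolved at all, so the things I plan to check next are roughly the following (command names are my reading of the Intel MPI for Windows docs, so please correct me if they are off):

rem does node01 resolve from this machine at all?
ping node01
rem make sure the Hydra service is installed and running on every node
hydra_service.exe -start
rem register the credentials mpiexec should use for remote launches
mpiexec -register
rem then retry the original check
mpiexec -ppn 1 -n 2 -hosts node01,node02 hostname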

 

Update 5 psxevars.sh echoes: "Intel(R) Parallel Studio XE 2019 Update 4 for Linux*"


Argh, please fix in the next update.

Intel 2019.5, MPI compiles but does not run


I am able to compile a hello_world.c program with mpiicc but am unable to get it to run.  It works for me with Intel 2014 - 2018, but not with 2019.5.

Debugging output:

===================================================================================
hjohnson@tuxfast:/tmp/intelmpi-2019$ env I_MPI_DEBUG=6 I_MPI_HYDRA_DEBUG=on mpirun -np 1 ./a.out
[mpiexec@tuxfast] Launch arguments: /project/software/intel_psxe/2019_update1/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_bstrap_proxy --upstream-host tuxfast --upstream-port 33709 --pgid 0 --launcher ssh --launcher-number 0 --base-path /project/software/intel_psxe/2019_update1/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin/ --tree-width 16 --tree-level 1 --time-left -1 --collective-launch 1 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /project/software/intel_psxe/2019_update1/compilers_and_libraries_2019.5.281/linux/mpi/intel64/bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9 

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 30410 RUNNING AT tuxfast
=   KILLED BY SIGNAL: 4 (Illegal instruction)
===================================================================================
hjohnson@tuxfast:/tmp/intelmpi-2019$ env I_MPI_DEBUG=6 I_MPI_HYDRA_DEBUG=on gdb ./a.out
...
(gdb) run
Starting program: /tmp/intelmpi-2019/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
MPL_dbg_pre_init (argc_p=0x0, argv_p=0x0, wtimeNotReady=61440) at ../../../../src/mpl/src/dbg/mpl_dbg.c:722
722     ../../../../src/mpl/src/dbg/mpl_dbg.c: No such file or directory.
(gdb) backtrace
#0  MPL_dbg_pre_init (argc_p=0x0, argv_p=0x0, wtimeNotReady=61440) at ../../../../src/mpl/src/dbg/mpl_dbg.c:722
#1  0x00001555545850fe in PMPI_Init (argc=0x0, argv=0x0) at ../../src/mpi/init/init.c:225
#2  0x0000000000400ec3 in main ()
(gdb) quit
A debugging session is active.
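
Since SIGILL usually means the binary hit an instruction this CPU does not support, here is how I can pull the relevant instruction-set flags on this box if that helps narrow it down (nothing Intel MPI specific, just /proc/cpuinfo):

$ grep -o -E 'sse4_2|avx2?|avx512[a-z]*' /proc/cpuinfo | sort -u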

 

MPI rank reordering


Dear all,

I am currently using the MPI distributed graph topologies and I allow rank reordering (https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node195.htm#Node195).

 

However, after some small tests, I noticed that Intel MPI (2019) does not reorder my ranks.
Since supporting reordering increases the complexity of the code, I would like to be sure that it will actually be useful in some cases.

Does Intel MPI reorder the ranks for MPI topologies? If yes, what are the requirements (machine files etc...)?

Thank you very much for your help!

MPI 2019's mpitune


Hi all

Has anybody had much luck/experience with mpitune under 2019?  It feels like a **lot** more work than the equivalent sort of activity under previous versions, so I'm wondering if I'm making it more difficult than it needs to be?

Specific 'challenges':

1. 'msg_size' is not considered in the example/supplied tuning configurations.  I feel it should be, as that in particular affects the algorithm choice for optimal performance (e.g. ALLTOALLV switches optimal algorithm at message size of 1KB).  I guess I can manually fiddle with the mpitune generated JSON configuration, but that doesn't feel quite right, unless I'm just being lazy.

2.  I'm not quite sure I understand the supplied and downloadable (https://software.intel.com/en-us/articles/replacing-tuning-configuration...) tuning files, perhaps due to the lack of documentation surrounding them. Are they used by default in any form, or only when explicitly specified?

3.  'Autotuning' - linked to the above - is this intended to be used 'live' and repeatedly, or should I be using it once and then capturing something from it?  There are lots of config variables surrounding this; what I have tried so far is sketched just after this list.
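
For point 3, what I have been experimenting with so far looks roughly like the following (variable names are from my notes, so please correct me if they are wrong, and ./my_app is just a stand-in): run the application once with autotuning switched on and dump the result, then feed the dumped file back in on later runs.

$ export I_MPI_TUNING_MODE=auto
$ export I_MPI_TUNING_BIN_DUMP=./my_app_tuning.dat
$ mpirun -np 256 ./my_app                 # 'live' tuning run
$ unset I_MPI_TUNING_MODE
$ export I_MPI_TUNING_BIN=./my_app_tuning.dat
$ mpirun -np 256 ./my_app                 # later runs reuse the captured tuning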

Perhaps I'm missing some key document or reading or understanding, but any comments or thoughts would be appreciated.

~~
A


MPI bad termination due to STOP


We observed a peculiar behavior of Intel MPI with the exit status emitted by a STOP statement.

PROGRAM hello
    USE mpi
    IMPLICIT NONE

    INTEGER :: rank, size, ierror

    CALL MPI_INIT(ierror)
    CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
    CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)

    PRINT *, 'rank', rank, ': Hello, World!'

    CALL MPI_FINALIZE(ierror)

    STOP 2
END

The exit status is used for quick debugging purposes in our codes.

With Intel MPI 2019 Update 4, we received bad termination errors, for instance using 2 MPI ranks:

 rank           1 : Hello, World!
 rank           0 : Hello, World!
2
2

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 167623 RUNNING AT login03
=   EXIT STATUS: 2
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 167624 RUNNING AT login03
=   EXIT STATUS: 2
===================================================================================

This behavior was not observed with previous versions of the library. We are not sure whether this is a bug in IMPI or an intended feature.

Ideally, we are looking for a way to suppress this bad termination error.

Thanks.

Can't get IntelMPI to run with Altair PBS job manager


I run a 16 node 256 core Dell cluster running on Redhat Enterprise Linux.

Our primary use is to run the engineering software LSTC LS-Dyna. With a recent change in LSTC licensing, the newest versions of the software we want to run will now only work with IntelMPI (previously we used PlatformMPI).

However, I cannot seem to get the PBS job submission script that used to work with PlatformMPI to work with IntelMPI.

The submission script reads as follows (with the last line being the submission line for the LS-Dyna testjob.k):

#!/bin/bash
#PBS -l select=8:ncpus=16:mpiprocs=16
#PBS -j oe
cd $PBS_JOBDIR
echo "starting dyna .. "
machines=$(sort -u $PBS_NODEFILE)
ml=""
for m in $machines
do
   nproc=$(grep $m $PBS_NODEFILE | wc -l)
   sm=$(echo $m | cut -d'.' -f1)
   if [ "$ml" == "" ]
   then
      ml=$sm:$nproc
   else
      ml=$ml:$sm:$nproc
   fi
done
echo Machine line: $ml
echo PBS_O_WORKDIR=$PBS_O_WORKDIR
echo "Current directory is:"
pwd
echo "machines"
/opt/intel/impi/2018.4.274/intel64/bin/mpirun -machines $ml /usr/local/ansys/v170/ansys/bin/linx64/ls-dyna_mpp_s_R11_1_0_x64_centos65_ifort160_sse2_intelmpi-2018 i=testjob.k pr=dysmp

When I attempt to run this job via the PBS job manager and look into the standard error file, I see:

[mpiexec@gpunode03.hpc.internal] HYDU_sock_write (../../utils/sock/sock.c:418): write error (Bad file descriptor)
[mpiexec@gpunode03.hpc.internal] HYD_pmcd_pmiserv_send_signal (../../pm/pmiserv/pmiserv_cb.c:253): unable to write data to proxy
[mpiexec@gpunode03.hpc.internal] ui_cmd_cb (../../pm/pmiserv/pmiserv_pmci.c:176): unable to send signal downstream
[mpiexec@gpunode03.hpc.internal] HYDT_dmxu_poll_wait_for_event (../../tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@gpunode03.hpc.internal] HYD_pmci_wait_for_completion (../../pm/pmiserv/pmiserv_pmci.c:520): error waiting for event
[mpiexec@gpunode03.hpc.internal] main (../../ui/mpich/mpiexec.c:1157): process manager error waiting for completion

I know I can run a job manually (no PBS involved) and it will run fine on a node of the cluster using IntelMPI.

So I have boiled the issue down to the part of the submission line that says "-machines $ml", i.e. the node allocation.

For some reason IntelMPI does not accept this syntax, whereas PlatformMPI did?

I am quite stumped here and any advice would be greatly appreciated.
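
In case it helps with suggestions, the alternative launch lines I was planning to try next are below -- I have not confirmed the exact option names against the Intel MPI 2018 documentation, so treat this as a sketch rather than something I know to work:

NP=$(wc -l < $PBS_NODEFILE)
/opt/intel/impi/2018.4.274/intel64/bin/mpirun -np $NP -machinefile $PBS_NODEFILE /usr/local/ansys/v170/ansys/bin/linx64/ls-dyna_mpp_s_R11_1_0_x64_centos65_ifort160_sse2_intelmpi-2018 i=testjob.k pr=dysmp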

Thanks.

Richard.

 

 

Intel MPI, GCC, and mpi_f08


All,

This is both a question and a feature request.

First the question. What is the "correct" way for Intel MPI to support use of GCC 9? I know (as of Intel MPI 19.0.2, the latest I have access to) that it has support for GCC 4-8. I also know there is a binding kit for Intel MPI that could make bindings for Intel MPI with GCC 9. But, as far as I can tell, the mpif90/mpifc scripts have no idea GCC 9 exists. Does one need to actually edit these scripts to add a 9) case so that the scripts can find the appropriate include/ directories?
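
For what it's worth, the workaround I was going to try (I am not sure it is the sanctioned way) is pointing the wrapper at GCC 9 through the environment instead of editing the script, and then checking which include/link paths it actually emits:

$ export I_MPI_F90=gfortran-9     # or whatever GCC 9's Fortran driver is called on the system
$ mpif90 -show                    # print the underlying compile line to see which include dirs it picks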

Second, the "feature request" but I'm not sure where one makes those. Namely, I was wondering if it's possible for Intel to add support for use mpi_f08 with GCC compilers. At the moment, the included bindings (and the kit) only have the F90 modules and not the F08 modules. I can see not supporting them with really old GCC, but I'm fairly certain I've built Open MPI with GCC 8 and it makes the mpi_f08 mod files just fine. I tried looking at the binding kit, but it only compiles the f90 modules as well it seems.

Thanks,

Matt

TCE Open Date: 

Thursday, November 14, 2019 - 05:48

Running MPI jobs from inside Singularity container with Intel MPI 2019.5


Hi All,

As per the recent webinar introducing the new Intel MPI 2019 Update 5 features, it is now in theory possible to include the Intel MPI libraries, and call mpirun for a multi-node MPI job, entirely inside a Singularity container, with no need to have Intel MPI installed outside the container. So instead of launching an MPI job in a container using an external MPI stack, like so:

     mpirun -n <nprocs> -perhost <procs_per_node> -hosts <hostlist> singularity exec <container_name> <path_to_executable_inside_container>

one should now be able to do:

    singularity exec <container_name> mpirun -n <nprocs> -perhost <procs_per_node> -hosts <hostlist> <path_to_executable_inside_container>

I have the Intel MPI 2019.5 libraries (as well as Intel run-time libraries for C++), plus libfabric, inside my container, along with sourcing the following in the container:

cat /.singularity.d/env/90-environment.sh 
#!/bin/sh
# Custom environment shell code should follow
    source /opt/intel/bin/compilervars.sh intel64
    source /opt/intel/impi/2019.5.281/intel64/bin/mpivars.sh -ofi_internal=1 release

This is not working for me so far. Below I illustrate with a simple test, run from inside the container (shell mode); the command hangs with no output for about 20-30 seconds and then I get the following error messages:

Singularity image.sif:~/singularity/fv3-upp-apps> export I_MPI_DEBUG=500
Singularity image.sif:~/singularity/fv3-upp-apps> export FI_PROVIDER=verbs
Singularity image.sif:~/singularity/fv3-upp-apps> export FI_VERBS_IFACE="ib0"
Singularity image.sif:~/singularity/fv3-upp-apps> export I_MPI_FABRICS=shm:ofi
Singularity image.sif:~/singularity/fv3-upp-apps> mpirun -n 78 -perhost 20 -hosts appro07,appro08,appro09,appro10 hostname 
[mpiexec@appro07.internal.redlineperf.com] check_exit_codes (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:114): unable to run proxy on appro07 (pid 109898)
[mpiexec@appro07.internal.redlineperf.com] poll_for_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:152): check exit codes error
[mpiexec@appro07.internal.redlineperf.com] HYD_dmx_poll_wait_for_proxy_event (../../../../../src/pm/i_hydra/libhydra/demux/hydra_demux_poll.c:205): poll for event error
[mpiexec@appro07.internal.redlineperf.com] HYD_bstrap_setup (../../../../../src/pm/i_hydra/libhydra/bstrap/src/intel/i_hydra_bstrap.c:731): error waiting for event
[mpiexec@appro07.internal.redlineperf.com] main (../../../../../src/pm/i_hydra/mpiexec/mpiexec.c:1919): error setting up the boostrap proxies

I also tried calling mpirun with just one host (and only as many processes as fit on one host), with the same result.

Is there a specific list of dependencies (e.g. do I need openssh-clients installed?) for using this all-inside-the-container approach? I do not see anything in the Intel MPI 2019 Update 5 Developer Reference about running with Singularity containers.
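
One thing I plan to check next, on the assumption that mpirun inside the container still uses ssh to start the remote proxies, is whether an ssh client is present in the container at all and whether the other hosts can be reached non-interactively from inside it:

Singularity image.sif:~/singularity/fv3-upp-apps> which ssh
Singularity image.sif:~/singularity/fv3-upp-apps> ssh appro08 hostname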

 

Thanks, Keith

How to tell what I_MPI_ADJUST are set to with Intel MPI 19


Is there a way with Intel MPI 19 to see what the I_MPI_ADJUST_* values are set to? 

With Intel 18.0.5, I see a lot like:

[0] MPI startup(): Gather: 3: 3073-16397 & 129-2147483647
[0] MPI startup(): Gather: 2: 16398-65435 & 129-2147483647
[0] MPI startup(): Gather: 3: 0-2147483647 & 129-2147483647
[0] MPI startup(): Gatherv: 1: 0-2147483647 & 0-2147483647
[0] MPI startup(): Reduce_scatter: 4: 0-0 & 0-8
[0] MPI startup(): Reduce_scatter: 1: 1-16 & 0-8

On my cluster, which admittedly only has Intel 19.0.2 installed at the moment, I tried running various codes with Intel MPI 19.0.2 and I_MPI_DEBUG set anywhere from 1 to 1000, and...not much. For example, when running a hello world:

(1189)(master) $ mpiifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.0.2.187 Build 20190117
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

(1190)(master) $ mpirun -V
Intel(R) MPI Library for Linux* OS, Version 2019 Update 2 Build 20190123 (id: e2d820d49)
Copyright 2003-2019, Intel Corporation.
(1191)(master) $ mpirun -genv I_MPI_DEBUG=1000 -np 4 ./helloWorld.mpi3.hybrid.IMPI19.exe
[0] MPI startup(): libfabric version: 1.7.0a1-impi
[0] MPI startup(): libfabric provider: psm2
[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       126170   borga065   {0,1,2,3,4,5,6,7,8,9}
[0] MPI startup(): 1       126171   borga065   {10,11,12,13,14,15,16,17,18,19}
[0] MPI startup(): 2       126172   borga065   {20,21,22,23,24,25,26,27,28,29}
[0] MPI startup(): 3       126173   borga065   {30,31,32,33,34,35,36,37,38,39}
Hello from thread    0 out of    1 on process    1 of    4 on processor borga065
Hello from thread    0 out of    1 on process    2 of    4 on processor borga065
Hello from thread    0 out of    1 on process    3 of    4 on processor borga065
Hello from thread    0 out of    1 on process    0 of    4 on processor borga065

Honestly I'm used to I_MPI_DEBUG being *very* verbose, but I guess not anymore? Is there another value I need to set?
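
One thing I stumbled across in the 2019 install tree is the impi_info utility. I have not dug into it much yet, but something along these lines might at least list the collective-related variables and describe a particular one (exact flags from memory, so apologies if they are off):

$ impi_info | grep ADJUST
$ impi_info -v I_MPI_ADJUST_GATHER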

Thanks,

Matt

TCE Open Date: 

Wednesday, November 20, 2019 - 08:59

The Fortran program is not working with Windows 10 as it used to work with Windows 7


Hi everyone,

It is a 64-bit Intel Fortran program that was compiled with Compiler 11.1.51 under Windows 7 Pro 64-bit and MS MPI. When I moved to Windows 10 and ran the same program, I noticed that arrays are no longer sent between tasks. I used to specify the array with a starting location, say A(1), and its length, say 100. I uninstalled the old MS MPI and replaced it with a new one, but the problem still exists. Is there any idea why that is? I need help.

Best regards.

Said.

TCE Open Date: 

Saturday, November 23, 2019 - 11:03

The Fortran program is not working with Windows 10 as it used to work with Windows 7


Hi everyone,

I have a 64-bit Fortran program that was compiled with the Intel Fortran Compiler under Windows 7 Pro 64-bit and MS MPI 2008 R2. When I moved to Windows 10 and ran the same program, I noticed that arrays are no longer sent between tasks. I used to send the array variable specifying its start location, say A(1), and its length, say 100. Only non-array variables are sent and received. I uninstalled the old MS MPI and replaced it with a new one (V10), but the problem still exists. Is there any idea why that is? I need help.

Best regards.

Said

TCE Open Date: 

Sunday, November 24, 2019 - 07:07

What is HPC cluster used for?


I would like to know what an HPC cluster is used for. If you have resources or knowledge, please do share them here.

Thanks.

Declan Lawton,

Trainee at Moweb Technologies

TCE Open Date: 

Monday, November 25, 2019 - 03:28

MLNX_OFED_LINUX-4.6-1.0.1.1 (OFED-4.6-1.0.1) has hang issue with Intel MPI 5.0 - While using OFA fabric


While using MPI with I_MPI_FABRICS set to shm:ofa, with EDR and OFED 4.6-1.0.1.1, the RDMA pull hangs.

Another incident was noticed with a different version of the software and the same combination: memory corruption occurs.

Is there any known problem with the latest OFED (4.6) and Intel MPI's OFA fabric?

Note: the same software runs fine with the DAPL fabric selection.

Thanks in advance.

 

 

TCE Open Date: 

Sunday, December 1, 2019 - 23:15

mpiexec.hydra 2019u4 crashes on AMD Zen2


Hello,

The mpiexec.hydra binary from Intel 2019U4 crashes on Zen2 and Zen1 platforms.

 

 

user@Zen1[pts/0]stream $ mpirun -np 2   /vend/intel/parallel_studio_xe_2019_update4/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1

/vend/intel/parallel_studio_xe_2019_update4/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpirun: line 103:  7399 Floating point exception(core dumped) mpiexec.hydra "$@" 0<&0

 

user@Zen2[pts/1]demo $ mpirun -np 2   /vend/intel/parallel_studio_xe_2019_update4/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1

/vend/intel/parallel_studio_xe_2019_update4/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpirun: line 103: 121108 Floating point exception(core dumped) mpiexec.hydra "$@" 0<&0

An strace reveals that mpiexec.hydra crashes while trying to parse the processor configuration; I believe the cpuinfo binary suffers from the same symptom.

...

openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, [{d_ino=37, d_off=1, d_reclen=24, d_name=".", d_type=DT_DIR}, {d_ino=9, d_off=2690600, d_reclen=24, d_name="..", d_type=DT_DIR}, {d_ino=170171, d_off=25909499, d_reclen=24, d_name="smt", d_type=DT_DIR}, {d_ino=90582, d_off=25909675, d_reclen=24, d_name="cpu0", d_type=DT_DIR}, {d_ino=90600, d_off=25909851, d_reclen=24, d_name="cpu1", d_type=DT_DIR}, {d_ino=90619, d_off=25910027, d_reclen=24, d_name="cpu2", d_type=DT_DIR}, {d_ino=90638, d_off=25910203, d_reclen=24, d_name="cpu3", d_type=DT_DIR}, {d_ino=90657, d_off=25910379, d_reclen=24, d_name="cpu4", d_type=DT_DIR}, {d_ino=90676, d_off=25910555, d_reclen=24, d_name="cpu5", d_type=DT_DIR}, {d_ino=90695, d_off=25910731, d_reclen=24, d_name="cpu6", d_type=DT_DIR}, {d_ino=90714, d_off=25910907, d_reclen=24, d_name="cpu7", d_type=DT_DIR}, {d_ino=90733, d_off=25911083, d_reclen=24, d_name="cpu8", d_type=DT_DIR}, {d_ino=90752, d_off=141151836, d_reclen=24, d_name="cpu9", d_type=DT_DIR}, {d_ino=222492, d_off=141566558, d_reclen=32, d_name="cpufreq", d_type=DT_DIR}, {d_ino=82070, d_off=285014906, d_reclen=32, d_name="cpuidle", d_type=DT_DIR}, {d_ino=90771, d_off=285015082, d_reclen=32, d_name="cpu10", d_type=DT_DIR}, {d_ino=90790, d_off=285015258, d_reclen=32, d_name="cpu11", d_type=DT_DIR}, {d_ino=90809, d_off=285015434, d_reclen=32, d_name="cpu12", d_type=DT_DIR}, {d_ino=90828, d_off=285015610, d_reclen=32, d_name="cpu13", d_type=DT_DIR}, {d_ino=90847, d_off=285015786, d_reclen=32, d_name="cpu14", d_type=DT_DIR}, {d_ino=90866, d_off=285015962, d_reclen=32, d_name="cpu15", d_type=DT_DIR}, {d_ino=90885, d_off=285016138, d_reclen=32, d_name="cpu16", d_type=DT_DIR}, {d_ino=90904, d_off=285016314, d_reclen=32, d_name="cpu17", d_type=DT_DIR}, {d_ino=90923, d_off=285016490, d_reclen=32, d_name="cpu18", d_type=DT_DIR}, {d_ino=90942, d_off=285016842, d_reclen=32, d_name="cpu19", d_type=DT_DIR}, {d_ino=90961, d_off=285017018, d_reclen=32, d_name="cpu20", d_type=DT_DIR}, {d_ino=90980, d_off=285017194, d_reclen=32, d_name="cpu21", d_type=DT_DIR}, {d_ino=90999, d_off=285017370, d_reclen=32, d_name="cpu22", d_type=DT_DIR}, {d_ino=91018, d_off=285017546, d_reclen=32, d_name="cpu23", d_type=DT_DIR}, {d_ino=91037, d_off=285017722, d_reclen=32, d_name="cpu24", d_type=DT_DIR}, {d_ino=91056, d_off=285017898, d_reclen=32, d_name="cpu25", d_type=DT_DIR}, {d_ino=91075, d_off=285018074, d_reclen=32, d_name="cpu26", d_type=DT_DIR}, {d_ino=91094, d_off=285018250, d_reclen=32, d_name="cpu27", d_type=DT_DIR}, {d_ino=91113, d_off=285018426, d_reclen=32, d_name="cpu28", d_type=DT_DIR}, {d_ino=91132, d_off=285018778, d_reclen=32, d_name="cpu29", d_type=DT_DIR}, {d_ino=91151, d_off=285018954, d_reclen=32, d_name="cpu30", d_type=DT_DIR}, {d_ino=91170, d_off=285019130, d_reclen=32, d_name="cpu31", d_type=DT_DIR}, {d_ino=91189, d_off=285019306, d_reclen=32, d_name="cpu32", d_type=DT_DIR}, {d_ino=91208, d_off=285019482, d_reclen=32, d_name="cpu33", d_type=DT_DIR}, {d_ino=91227, d_off=285019658, d_reclen=32, d_name="cpu34", d_type=DT_DIR}, {d_ino=91246, d_off=285019834, d_reclen=32, d_name="cpu35", d_type=DT_DIR}, {d_ino=91265, d_off=285020010, d_reclen=32, d_name="cpu36", d_type=DT_DIR}, {d_ino=91284, d_off=285020186, d_reclen=32, d_name="cpu37", d_type=DT_DIR}, {d_ino=91303, d_off=285020362, d_reclen=32, d_name="cpu38", d_type=DT_DIR}, {d_ino=91322, d_off=285020714, d_reclen=32, d_name="cpu39", d_type=DT_DIR}, {d_ino=91341, d_off=285020890, d_reclen=32, d_name="cpu40", d_type=DT_DIR}, {d_ino=91360, d_off=285021066, d_reclen=32, d_name="cpu41", d_type=DT_DIR}, 
{d_ino=91379, d_off=285021242, d_reclen=32, d_name="cpu42", d_type=DT_DIR}, {d_ino=91398, d_off=285021418, d_reclen=32, d_name="cpu43", d_type=DT_DIR}, {d_ino=91417, d_off=285021594, d_reclen=32, d_name="cpu44", d_type=DT_DIR}, {d_ino=91436, d_off=285021770, d_reclen=32, d_name="cpu45", d_type=DT_DIR}, {d_ino=91455, d_off=285021946, d_reclen=32, d_name="cpu46", d_type=DT_DIR}, {d_ino=91474, d_off=285022122, d_reclen=32, d_name="cpu47", d_type=DT_DIR}, {d_ino=91493, d_off=285022298, d_reclen=32, d_name="cpu48", d_type=DT_DIR}, {d_ino=91512, d_off=285022650, d_reclen=32, d_name="cpu49", d_type=DT_DIR}, {d_ino=91531, d_off=285022826, d_reclen=32, d_name="cpu50", d_type=DT_DIR}, {d_ino=91550, d_off=285023002, d_reclen=32, d_name="cpu51", d_type=DT_DIR}, {d_ino=91569, d_off=285023178, d_reclen=32, d_name="cpu52", d_type=DT_DIR}, {d_ino=91588, d_off=285023354, d_reclen=32, d_name="cpu53", d_type=DT_DIR}, {d_ino=91607, d_off=285023530, d_reclen=32, d_name="cpu54", d_type=DT_DIR}, {d_ino=91626, d_off=285023706, d_reclen=32, d_name="cpu55", d_type=DT_DIR}, {d_ino=91645, d_off=285023882, d_reclen=32, d_name="cpu56", d_type=DT_DIR}, {d_ino=91664, d_off=285024058, d_reclen=32, d_name="cpu57", d_type=DT_DIR}, {d_ino=91683, d_off=285024234, d_reclen=32, d_name="cpu58", d_type=DT_DIR}, {d_ino=91702, d_off=285024586, d_reclen=32, d_name="cpu59", d_type=DT_DIR}, {d_ino=91721, d_off=285024762, d_reclen=32, d_name="cpu60", d_type=DT_DIR}, {d_ino=91740, d_off=285024938, d_reclen=32, d_name="cpu61", d_type=DT_DIR}, {d_ino=91759, d_off=285025114, d_reclen=32, d_name="cpu62", d_type=DT_DIR}, {d_ino=91778, d_off=318580955, d_reclen=32, d_name="cpu63", d_type=DT_DIR}, {d_ino=47, d_off=385790491, d_reclen=32, d_name="power", d_type=DT_DIR}, {d_ino=57, d_off=661204875, d_reclen=40, d_name="vulnerabilities", d_type=DT_DIR}, {d_ino=46, d_off=718872595, d_reclen=32, d_name="modalias", d_type=DT_REG}, {d_ino=42, d_off=900028725, d_reclen=32, d_name="kernel_max", d_type=DT_REG}, {d_ino=40, d_off=1321717208, d_reclen=32, d_name="possible", d_type=DT_REG}, {d_ino=39, d_off=1412398250, d_reclen=32, d_name="online", d_type=DT_REG}, {d_ino=43, d_off=1431608070, d_reclen=32, d_name="offline", d_type=DT_REG}, {d_ino=44, d_off=1472641949, d_reclen=32, d_name="isolated", d_type=DT_REG}, {d_ino=38, d_off=1826905203, d_reclen=32, d_name="uevent", d_type=DT_REG}, {d_ino=45, d_off=1905639739, d_reclen=32, d_name="nohz_full", d_type=DT_REG}, {d_ino=197551, d_off=2084586514, d_reclen=32, d_name="microcode", d_type=DT_DIR}, {d_ino=41, d_off=2147483647, d_reclen=32, d_name="present", d_type=DT_REG}], 32768) = 2496
getdents(3, [], 32768)                  = 0
close(3)                                = 0
uname({sysname="Linux", nodename="SERVER", release="3.10.0-1062.1.2.el7.x86_64", version="#1 SMP Mon Sep 30 14:19:46 UTC 2019", machine="x86_64", domainname="houston"}) = 0
sched_getaffinity(0, 128, [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63]) = 128
--- SIGFPE {si_signo=SIGFPE, si_code=FPE_INTDIV, si_addr=0x44d325} ---
+++ killed by SIGFPE (core dumped) +++
Floating point exception (core dumped)

 

 

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    1
Core(s) per socket:    32
Socket(s):             2
NUMA node(s):          8
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 49
Model name:            AMD EPYC 7502 32-Core Processor
Stepping:              0
CPU MHz:               1500.000
CPU max MHz:           2500.0000
CPU min MHz:           1500.0000
BogoMIPS:              5000.07
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             32K
L2 cache:              512K
L3 cache:              16384K
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
NUMA node2 CPU(s):     16-23
NUMA node3 CPU(s):     24-31
NUMA node4 CPU(s):     32-39
NUMA node5 CPU(s):     40-47
NUMA node6 CPU(s):     48-55
NUMA node7 CPU(s):     56-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl xtopology nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca
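
In case it is relevant, one workaround I intend to test (I am not fully certain of the variable name, so please correct me) is forcing Hydra to use a different topology-detection library instead of its built-in parser:

$ export I_MPI_HYDRA_TOPOLIB=hwloc
$ mpirun -np 2 /vend/intel/parallel_studio_xe_2019_update4/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1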
 

 

 

TCE Open Date: 

Monday, December 9, 2019 - 09:10

Release date for 2019u6


I was wondering when Intel cluster studio XE version 2019 update 6 is scheduled for release?

Thanks

Michael

 

TCE Open Date: 

Monday, December 9, 2019 - 09:27

How can I download MPI Fortran Full Library (including statically linked)


I keep going in circles on the website trying to download the full MPI library, including the statically linked libraries.  When I go to https://software.intel.com/en-us/mpi-library/choose-download/linux, I choose "register and download".  When I go to the next page (https://software.seek.intel.com/performance-libraries), it shows a "welcome back <email address>" message and allows me to click submit.

After clicking submit, it thinks for a second and then takes me to a page that says the following but does not have a download link, and I can't find a download link anywhere else on the site:

Thank you for activating your Intel® Performance Libraries product.

Access your support resources. Click here for technical support.

Intel takes your privacy seriously. Refer to Intel's Privacy Notice and Serial Number Validation Notice regarding the collection and handling of your personal information, the Intel product’s serial number and other information.

This was originally discussed in https://software.intel.com/en-us/comment/1949259, but it was accurately indicated that it belongs here.

TCE Open Date: 

Tuesday, December 10, 2019 - 07:00