Good Afternoon Friend,
I am trying to figure out how to map threads to specific OS Processor IDs per the following article: https://software.intel.com/en-us/articles/using-kmp-affinity-to-create-o...
I have a cluster with dual Intel® Xeon® E5-2650 CPUs (8 physical cores each, 16 HT CPUs each). Let's say I have 4 MPI ranks that run 4 OpenMP threads each, and that I would like them to map to logical CPUs 0-15 (some of our users want to allow one thread to dominate each physical core). Here is how I am attempting to do this:
[frenchrd@rhea16 /lustre/atlas/scratch/frenchrd/stf007]$ GOMP_CPU_AFFINITY="0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" OMP_NUM_THREADS=4 mpirun --cpus-per-proc 4 --bind-to-core -report-bindings -n 4 ./hello_world.exe
This produces the following output:
[rhea16:107285] MCW rank 0 bound to socket 0[core 0-3]: [B B B B . . . .][. . . . . . . .]
[rhea16:107285] MCW rank 1 bound to socket 0[core 4-7]: [. . . . B B B B][. . . . . . . .]
[rhea16:107285] MCW rank 2 bound to socket 1[core 0-3]: [. . . . . . . .][B B B B . . . .]
[rhea16:107285] MCW rank 3 bound to socket 1[core 4-7]: [. . . . . . . .][. . . . B B B B]
OMP: Warning #123: Ignoring invalid OS proc ID 0.
OMP: Warning #123: Ignoring invalid OS proc ID 1.
OMP: Warning #123: Ignoring invalid OS proc ID 2.
OMP: Warning #123: Ignoring invalid OS proc ID 3.
OMP: Warning #123: Ignoring invalid OS proc ID 4.
OMP: Warning #123: Ignoring invalid OS proc ID 5.
OMP: Warning #123: Ignoring invalid OS proc ID 6.
OMP: Warning #123: Ignoring invalid OS proc ID 7.
OMP: Warning #123: Ignoring invalid OS proc ID 0.
OMP: Warning #123: Ignoring invalid OS proc ID 1.
OMP: Warning #123: Ignoring invalid OS proc ID 2.
OMP: Warning #123: Ignoring invalid OS proc ID 3.
OMP: Warning #123: Ignoring invalid OS proc ID 4.
OMP: Warning #123: Ignoring invalid OS proc ID 5.
OMP: Warning #123: Ignoring invalid OS proc ID 6.
OMP: Warning #123: Ignoring invalid OS proc ID 7.
OMP: Warning #123: Ignoring invalid OS proc ID 8.
OMP: Warning #123: Ignoring invalid OS proc ID 9.
OMP: Warning #123: Ignoring invalid OS proc ID 10.
OMP: Warning #123: Ignoring invalid OS proc ID 12.
OMP: Warning #123: Ignoring invalid OS proc ID 13.
OMP: Warning #123: Ignoring invalid OS proc ID 14.
OMP: Warning #123: Ignoring invalid OS proc ID 15.
OMP: Warning #123: Ignoring invalid OS proc ID 11.
Rank 2 | Thread 0 | CPU 8
Rank 2 | Thread 2 | CPU 10
Rank 2 | Thread 1 | CPU 9
Rank 2 | Thread 3 | CPU 11
Global sum: 0.000000
Rank 3 | Thread 0 | CPU 12
Rank 3 | Thread 1 | CPU 13
Rank 3 | Thread 3 | CPU 15
Rank 3 | Thread 2 | CPU 14
Global sum: 0.000000
OMP: Warning #123: Ignoring invalid OS proc ID 4.
OMP: Warning #123: Ignoring invalid OS proc ID 5.
OMP: Warning #123: Ignoring invalid OS proc ID 6.
OMP: Warning #123: Ignoring invalid OS proc ID 7.
OMP: Warning #123: Ignoring invalid OS proc ID 8.
OMP: Warning #123: Ignoring invalid OS proc ID 9.
OMP: Warning #123: Ignoring invalid OS proc ID 10.
OMP: Warning #123: Ignoring invalid OS proc ID 11.
OMP: Warning #123: Ignoring invalid OS proc ID 12.
OMP: Warning #123: Ignoring invalid OS proc ID 13.
OMP: Warning #123: Ignoring invalid OS proc ID 14.
OMP: Warning #123: Ignoring invalid OS proc ID 15.
OMP: Warning #123: Ignoring invalid OS proc ID 0.
OMP: Warning #123: Ignoring invalid OS proc ID 1.
OMP: Warning #123: Ignoring invalid OS proc ID 2.
OMP: Warning #123: Ignoring invalid OS proc ID 3.
OMP: Warning #123: Ignoring invalid OS proc ID 8.
OMP: Warning #123: Ignoring invalid OS proc ID 9.
OMP: Warning #123: Ignoring invalid OS proc ID 10.
OMP: Warning #123: Ignoring invalid OS proc ID 11.
OMP: Warning #123: Ignoring invalid OS proc ID 12.
OMP: Warning #123: Ignoring invalid OS proc ID 13.
OMP: Warning #123: Ignoring invalid OS proc ID 14.
OMP: Warning #123: Ignoring invalid OS proc ID 15.
Rank 1 | Thread 0 | CPU 4
Rank 1 | Thread 1 | CPU 5
Rank 1 | Thread 2 | CPU 6
Rank 1 | Thread 3 | CPU 7
Global sum: 0.000000
Rank 0 | Thread 1 | CPU 1
Rank 0 | Thread 0 | CPU 0
Rank 0 | Thread 2 | CPU 2
Rank 0 | Thread 3 | CPU 3
Global sum: 0.000000
It does appear that the threads are bound to the logical CPUs that I was aiming for, so that is exciting. However, I would like to get a better understanding of what is causing the warnings. There are 48 separate warnings, so I am *guessing* that each of the 4 ranks is complaining about being issued a GOMP_CPU_AFFINITY variable containing the 4 CPUs it is responsible for plus 12 that it isn't. So, for example, perhaps rank 1 issues 12 warnings: 4 for CPUs 0,1,2,3 and 8 for CPUs 8,9,10,11,12,13,14,15.
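To sanity-check that guess, here is a minimal Python sketch of the arithmetic. It assumes that rank r is bound to logical CPUs 4r through 4r+3 (which is what the -report-bindings lines and the "Rank | Thread | CPU" output suggest) and that the runtime emits one warning per listed ID outside a rank's binding. The names TOTAL_CPUS, RANKS, CPUS_PER_RANK and rejected_ids are made up for illustration; this only models the hypothesis, not what the OpenMP runtime actually does internally.

```python
from collections import Counter

TOTAL_CPUS = 16      # IDs listed in GOMP_CPU_AFFINITY (0-15)
RANKS = 4            # mpirun -n 4
CPUS_PER_RANK = 4    # --cpus-per-proc 4


def rejected_ids(rank):
    """IDs from the affinity list that fall outside this rank's binding.

    Assumption: rank r is bound to logical CPUs 4r .. 4r+3.
    """
    first = rank * CPUS_PER_RANK
    allowed = set(range(first, first + CPUS_PER_RANK))
    return [cpu for cpu in range(TOTAL_CPUS) if cpu not in allowed]


# Warnings each rank would emit under the hypothesis.
per_rank = {rank: rejected_ids(rank) for rank in range(RANKS)}
total_warnings = sum(len(ids) for ids in per_rank.values())

# How often each ID would show up in a warning across all ranks.
times_rejected = Counter(cpu for ids in per_rank.values() for cpu in ids)

for rank, ids in per_rank.items():
    print(f"rank {rank}: {len(ids)} warnings for IDs {ids}")
print("total warnings:", total_warnings)
```

Under these assumptions each rank rejects 12 IDs, for 48 warnings in total, and every ID from 0 to 15 is rejected by exactly 3 ranks. That matches the output above, where each ID appears in three warnings.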
Does that sound reasonable? Do I need to figure out how to provide separate versions of the GOMP_CPU_AFFINITY variable to each rank? Or is there a way I can just... get OMP to turn the warnings off since it does seem to be doing what I intended?
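For the per-rank option, a rough sketch of how the list could be derived is below. It assumes the launcher is Open MPI, which exports OMPI_COMM_WORLD_LOCAL_RANK to each process, and that local rank r should get logical CPUs 4r through 4r+3. The helpers per_rank_affinity and build_env are hypothetical names, and a real wrapper would still have to start ./hello_world.exe with the resulting environment (for example via os.execvpe); I have not verified that this silences the warnings.

```python
def per_rank_affinity(local_rank, threads_per_rank=4):
    """Comma-separated CPU list for one rank, e.g. rank 1 -> "4,5,6,7".

    Assumption: local rank r owns logical CPUs 4r .. 4r+3.
    """
    first = local_rank * threads_per_rank
    return ",".join(str(cpu) for cpu in range(first, first + threads_per_rank))


def build_env(base_env, threads_per_rank=4):
    """Copy base_env and add a rank-specific GOMP_CPU_AFFINITY.

    OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI for each launched
    process; it defaults to 0 here so the sketch also runs without mpirun.
    """
    env = dict(base_env)
    local_rank = int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
    env["GOMP_CPU_AFFINITY"] = per_rank_affinity(local_rank, threads_per_rank)
    env["OMP_NUM_THREADS"] = str(threads_per_rank)
    return env


# Example with a fixed environment, as if this were local rank 1.
env = build_env({"OMPI_COMM_WORLD_LOCAL_RANK": "1"})
print(env["GOMP_CPU_AFFINITY"])  # prints 4,5,6,7
```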
Thanks very much for your help, I certainly appreciate your time.