
Incorrect program or MPI implementation bug?

Hi,

Below is a simple reproduction case for the issue we're facing:

#include <stdio.h>
#include <mpi.h>

/* Intended to be run with 2 processes, e.g. mpirun -n 2 ./repro */
int main(int argc, char* argv[]) {
    int rank;
    MPI_Group group;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_group(MPI_COMM_WORLD, &group);  /* not used below; freed at the end */

    if (rank == 0) {
        /* Synchronous send: completes only once rank 1 has matched it. */
        printf("rank 0: about to send\n");
        MPI_Ssend(NULL, 0, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0: send completed\n");
    } else {
        MPI_Request req[2];
        int which;

        /* req[0]: send to rank 0 (rank 0 never posts a matching receive).
           req[1]: receive that matches rank 0's MPI_Ssend. */
        MPI_Isend(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(NULL, 0, MPI_INT, 0, 0, MPI_COMM_WORLD, &req[1]);

        /* Wait for whichever request completes first. */
        MPI_Waitany(2, req, &which, MPI_STATUS_IGNORE);

        /* Cancel the other request, then wait for it to complete. */
        if (which == 0) {
            printf("rank 1: send succeeded; cancelling receive request\n");
            MPI_Cancel(&req[1]);
            MPI_Wait(&req[1], MPI_STATUS_IGNORE);
        } else {
            printf("rank 1: receive succeeded; cancelling send request\n");
            MPI_Cancel(&req[0]);
            MPI_Wait(&req[0], MPI_STATUS_IGNORE);
        }
    }

    MPI_Group_free(&group);
    MPI_Finalize();
    return 0;
}

This program outputs the following, after which it hangs indefinitely:

rank 0: about to send
rank 1: send succeeded; cancelling receive request

I understand that this is caused by the "eager completion" of MPI_Isend() on rank 1. I also understand that the behaviour of a program that initiates an operation which is never matched is undefined. However, I don't believe that applies here, because I do eventually call MPI_Cancel() on the request. If that were not enough, wouldn't it imply that a program that simply does MPI_Isend(...); MPI_Cancel(...); MPI_Wait(...); is also incorrect?

I also noticed that changing the MPI_Isend() into MPI_Issend() makes the program work as expected:

rank 0: about to send
rank 0: send completed
rank 1: receive succeeded; cancelling send request

So, to keep it short, my questions are:

  1. Is the initial (MPI_Isend()) version of my program an incorrect MPI program, whose behaviour is undefined?
  2. If so, then could you please explain why and point me to the relevant section of the MPI standard or any other resources that would clarify these matters for me?
  3. Is the MPI_Issend() version of my program also incorrect?
  4. If MPI_Issend() still doesn't make the program correct, can I at least be sure that it will always work as expected with the Intel MPI implementation? Or does it only happen to work?

Many thanks to anyone willing to help me with this!

- Adrian

