The example in this section shows one way to change a legacy program so that it takes advantage of the MPI_THREAD_SPLIT threading model.
In the original code (thread_split.cpp), the functions work_portion_1(), work_portion_2(), and work_portion_3() represent a CPU load that modifies the content of the memory pointed to by the in and out pointers. In this particular example, these functions perform correctness checking of the MPI_Allreduce() function.
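For orientation, a hypothetical single-threaded baseline along these lines could look like the sketch below. The buffer size and the body of the work function are placeholders; the actual work_portion_1/2/3 implementations in thread_split.cpp differ.

```cpp
// Hypothetical single-threaded baseline; COUNT and work_portion() are
// placeholders standing in for the original example's definitions.
#include <mpi.h>
#include <vector>

const int COUNT = 1024;   // assumed buffer size

// Placeholder CPU load that fills the buffers (stands in for work_portion_1/2/3)
void work_portion(double *in, double *out, int count, int rank)
{
    for (int i = 0; i < count; ++i) { in[i] = rank + 1.0; out[i] = 0.0; }
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> in(COUNT), out(COUNT);
    work_portion(in.data(), out.data(), COUNT, rank);

    // The reduction whose result the original work_portion functions verify
    MPI_Allreduce(in.data(), out.data(), COUNT, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```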
Changes Required to Use the OpenMP* Threading Model
- To run MPI functions in a multithreaded environment, call MPI_Init_thread() with the required thread support level set to MPI_THREAD_MULTIPLE instead of calling MPI_Init().
- According to the MPI_THREAD_SPLIT model, in each thread you must execute MPI operations over the communicator specific to this thread only. So, in this example, the MPI_COMM_WORLD communicator must be duplicated several times so that each thread has its own copy of MPI_COMM_WORLD.
NOTE: The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level and the OpenMP level must be combined.
- Check that the runtime sets up a reasonable affinity for OpenMP threads. Typically, the OpenMP runtime does this out of the box, but in some cases setting the OMP_PLACES=cores environment variable may be necessary for optimal multithreaded MPI performance. A minimal sketch of these changes follows this list.
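A minimal sketch of the OpenMP variant, assuming MPI_Allreduce as the collective; names such as count_per_thread and the buffer contents are illustrative and not taken from the original thread_split.cpp.

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>

int main(int argc, char **argv)
{
    int provided;
    // Request MPI_THREAD_MULTIPLE instead of calling MPI_Init()
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    const int nthreads = omp_get_max_threads();
    const int count_per_thread = 1024;   // assumed per-thread chunk size
    std::vector<double> in(nthreads * count_per_thread, 1.0);
    std::vector<double> out(nthreads * count_per_thread, 0.0);

    // One duplicate of MPI_COMM_WORLD per thread
    std::vector<MPI_Comm> comms(nthreads);
    for (int i = 0; i < nthreads; ++i)
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);

    #pragma omp parallel num_threads(nthreads)
    {
        const int tid = omp_get_thread_num();
        // Each thread handles its own slice of the data ...
        double *my_in  = in.data()  + tid * count_per_thread;
        double *my_out = out.data() + tid * count_per_thread;

        // ... and communicates only over its own communicator
        MPI_Allreduce(my_in, my_out, count_per_thread, MPI_DOUBLE, MPI_SUM,
                      comms[tid]);

        // Two-stage barrier: MPI level, then OpenMP level
        MPI_Barrier(comms[tid]);
        #pragma omp barrier
    }

    for (int i = 0; i < nthreads; ++i)
        MPI_Comm_free(&comms[i]);
    MPI_Finalize();
    return 0;
}
```

Note that no info key is set here: in the OpenMP variant the thread identity is taken from the OpenMP runtime, whereas the POSIX variant below has to label each communicator explicitly.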
Changes Required to Use the POSIX Threading Model
- To run MPI functions in a multithreaded environment, call MPI_Init_thread() with the required thread support level set to MPI_THREAD_MULTIPLE instead of calling MPI_Init().
- In each thread, you must execute the MPI collective operation over a communicator specific to that thread. Therefore, MPI_COMM_WORLD must be duplicated to create a separate communicator for each thread.
- The info key thread_id must be properly set for each of the duplicated communicators.
NOTE: The limitation is that communicators must be used in such a way that the thread with thread_id n on one node communicates only with the thread with thread_id n on the other. Communications between different threads (thread_id n on one node, thread_id m on the other) are not supported.
- The data to transfer must be split so that each thread handles its own portion of the input and output data.
- The barrier becomes a two-stage one: the barriers on the MPI level and the POSIX level must be combined.
- The affinity of POSIX threads can be set explicitly (for example, with pthread_setaffinity_np()) to achieve optimal multithreaded MPI performance. A minimal sketch of these changes follows this list.
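A minimal sketch of the POSIX variant, again assuming MPI_Allreduce as the collective; the thread_arg structure, worker(), the thread count, and the chunk size are illustrative names, not taken from the original example, and explicit affinity handling is omitted for brevity.

```cpp
#include <mpi.h>
#include <pthread.h>
#include <cstdio>
#include <vector>

// Per-thread arguments (illustrative structure, not from the original source)
struct thread_arg {
    int tid;
    MPI_Comm comm;
    double *in;
    double *out;
    int count;
    pthread_barrier_t *barrier;
};

static void *worker(void *p)
{
    thread_arg *a = static_cast<thread_arg *>(p);

    // Each thread communicates only over its own communicator
    MPI_Allreduce(a->in, a->out, a->count, MPI_DOUBLE, MPI_SUM, a->comm);

    // Two-stage barrier: MPI level, then POSIX level
    MPI_Barrier(a->comm);
    pthread_barrier_wait(a->barrier);
    return nullptr;
}

int main(int argc, char **argv)
{
    const int nthreads = 4;      // assumed thread count
    const int count = 1024;      // assumed per-thread chunk size
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    std::vector<double> in(nthreads * count, 1.0), out(nthreads * count, 0.0);
    std::vector<MPI_Comm> comms(nthreads);
    pthread_barrier_t barrier;
    pthread_barrier_init(&barrier, nullptr, nthreads);

    for (int i = 0; i < nthreads; ++i) {
        // A separate communicator per thread, labeled with the thread_id info key
        MPI_Comm_dup(MPI_COMM_WORLD, &comms[i]);
        MPI_Info info;
        MPI_Info_create(&info);
        char value[16];
        std::snprintf(value, sizeof(value), "%d", i);
        MPI_Info_set(info, "thread_id", value);
        MPI_Comm_set_info(comms[i], info);
        MPI_Info_free(&info);
    }

    std::vector<pthread_t> threads(nthreads);
    std::vector<thread_arg> args(nthreads);
    for (int i = 0; i < nthreads; ++i) {
        args[i] = {i, comms[i], in.data() + i * count,
                   out.data() + i * count, count, &barrier};
        pthread_create(&threads[i], nullptr, worker, &args[i]);
    }
    for (int i = 0; i < nthreads; ++i)
        pthread_join(threads[i], nullptr);

    for (int i = 0; i < nthreads; ++i)
        MPI_Comm_free(&comms[i]);
    pthread_barrier_destroy(&barrier);
    MPI_Finalize();
    return 0;
}
```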