To run a hybrid MPI/OpenMP* program, follow these steps:
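1. Set up the Intel® MPI Library environment by sourcing vars.sh with the desired configuration argument, for example release:
   $ source vars.sh release
2. Set the I_MPI_PIN_DOMAIN environment variable to select the process pinning scheme:
   $ export I_MPI_PIN_DOMAIN=omp
   This sets the size of each process pinning domain equal to OMP_NUM_THREADS. For example, if OMP_NUM_THREADS is 4, each MPI process can create up to four threads within its domain (a set of logical processors). If OMP_NUM_THREADS is not set, each node is treated as a separate domain, which allows as many threads per MPI process as there are cores.
3. Run the hybrid program with mpirun. OMP_NUM_THREADS and I_MPI_PIN_DOMAIN can also be passed on the command line with -genv:
   $ mpirun -n 4 -genv OMP_NUM_THREADS=4 -genv I_MPI_PIN_DOMAIN=omp ./myprog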
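
The domain sizing described above comes down to simple arithmetic, which the following sketch illustrates. It is not part of the Intel® MPI Library: the helper name domains_per_node and the figure of 16 logical processors per node are assumptions made only for this example. The sketch estimates how many domains of size OMP_NUM_THREADS fit on one node and compares that with the number of MPI ranks placed on the node.

```shell
#!/bin/sh
# Hypothetical helper (not an Intel MPI command): number of pinning
# domains of a given size that fit on one node.
#   $1 = logical processors on the node
#   $2 = OMP_NUM_THREADS (the domain size when I_MPI_PIN_DOMAIN=omp)
domains_per_node() {
    cpus=$1
    threads=$2
    if [ "$cpus" -le 0 ] || [ "$threads" -le 0 ]; then
        echo "error: both arguments must be positive" >&2
        return 1
    fi
    # Integer division: leftover processors do not form a full domain.
    echo $(( cpus / threads ))
}

# Assumed example values: 16 logical processors, OMP_NUM_THREADS=4,
# and 4 MPI ranks on the node (as in "mpirun -n 4" on a single node).
ranks_per_node=4
fit=$(domains_per_node 16 4)
if [ "$ranks_per_node" -le "$fit" ]; then
    echo "ok: $ranks_per_node ranks fit in $fit domains"
else
    echo "warning: $ranks_per_node ranks exceed $fit domains"
fi
```

On a real node, the first argument could be taken from a command such as nproc or getconf _NPROCESSORS_ONLN instead of the assumed value 16.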
See also: Intel® MPI Library Developer Reference, section Tuning Reference > Process Pinning > Interoperability with OpenMP*.