Please note that this documentation is not an MPI tutorial. If you are not a developer, you most likely
only need to compile code.
If you need more in-depth knowledge about what is going on, we need to refer you to your favorite search engine.
Introduction
MPI (Message Passing Interface) is a protocol that allows for a lower-level way of parallelizing a workload than simply splitting
it into multiple jobs that are independent of each other.
To be more precise, it allows subprocesses to share and exchange data, even when these subprocesses are distributed over multiple nodes. The SCC uses a dedicated Infiniband interconnect between all nodes to ensure that the data transfer is fast.
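As a brief illustration of what MPI code looks like (this is generic MPI, not SCC-specific; the file name hello.c used later in this page is just a convention), here is a minimal program in which every process reports its rank:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* Initialize the MPI runtime; must precede all other MPI calls */
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

    char host[MPI_MAX_PROCESSOR_NAME];
    int len;
    MPI_Get_processor_name(host, &len);   /* node this process runs on */

    printf("Hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

Each process prints its own line, so running with 8 tasks spread over two nodes yields 8 lines, in arbitrary order.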
Historically, MPI implementations have also taken care of pinning the processes to a fixed set of CPU cores to increase efficiency. However, the slurm daemon
on the SCC already assigns the respective cores to the job. If the MPI implementation additionally tries to pin the processes itself, your job is going to fail.
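If you want to verify which cores slurm has assigned, you can ask srun to report its binding. The --cpu-bind=verbose flag is standard slurm; the task count below is just an example:

```shell
# Report the CPU mask applied to each task (printed to stderr);
# 'true' serves as a dummy command that exits immediately.
srun --ntasks=4 --cpu-bind=verbose true
```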
Choosing an MPI implementation
Using the SCC provided MPI Implementations
Currently, the SCC provides three of the four major MPI implementations via the modules system:
- MPICH
- MVAPICH
- OpenMPI 5
We are actively working on providing the Intel implementation as well.
In order to use one of these, you load the respective module alongside the compiler:
module purge # We always purge to make sure no other modules are loaded
module load gnu15 mpich # Or mvapich or openmpi5
You can now compile and link your MPI program as usual, typically using the mpicc wrapper.
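For example, assuming your source file is called hello.c, a typical compile line looks like this (mpicc forwards to the underlying compiler and adds the MPI include and library paths for you):

```shell
mpicc -O2 hello.c -o hello.x
```

If you need to debug linking issues, mpicc --showme (OpenMPI) or mpicc -show (MPICH/MVAPICH) prints the underlying compiler invocation without running it.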
Compiling your own MPI Implementation
You can, of course, compile and install your own MPI implementation using EasyBuild, spack, or whatever tool(chain) you prefer.
Just make sure you make the implementation slurm-aware by building it against the slurm instance that is shipped with the SCC. For instance, when using spack, slurm and munge need to be added as externals.
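As a sketch, the corresponding entries in spack's packages.yaml could look like the following; the version numbers and prefixes are placeholders and must match the actual installation on the system:

```yaml
packages:
  slurm:
    externals:
    - spec: slurm@23.11    # placeholder version
      prefix: /usr         # placeholder prefix
    buildable: false       # never build slurm from source
  munge:
    externals:
    - spec: munge@0.5.15   # placeholder version
      prefix: /usr
    buildable: false
```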
Running your MPI application
In order to run your MPI application, you write an sbatch file almost as usual.
The only difference is that you cannot call the executable directly. Instead, you supply it to a so-called wrapper command.
This wrapper command takes care of starting the correct number of processes on the node(s) and setting up the communication between these processes.
When you look at tutorials, most of them are going to use either mpirun or mpiexec. These might work out of the box on the SCC as well, depending on how well slurm is integrated into your specific MPI implementation.
However, you might end up having to specify parameters for setting up core pinning, Infiniband communication, etc.
Luckily, slurm provides a method that does all the heavy lifting for you: simply use srun --mpi=pmix when using OpenMPI 5, and srun --mpi=pmi2 when using
MPICH or MVAPICH.
So, supposing we have an executable called hello.x, the sbatch file would look like this:
#!/bin/bash
#SBATCH --ntasks-per-node=4
#SBATCH --mem=2G
#SBATCH --nodes=2
#SBATCH --time=512
#SBATCH --export=ALL
module purge
module load gnu15 openmpi5 # Or whatever MPI implementation you used for compiling
srun --mpi=pmix hello.x # or --mpi=pmi2 when using MPICH or MVAPICH
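Assuming the script above is saved as job.sh (the name is arbitrary), you submit and monitor it as usual:

```shell
sbatch job.sh        # submit; prints "Submitted batch job <jobid>"
squeue -u "$USER"    # check the state of your queued and running jobs
```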
Combining MPI and CUDA
When working with CUDA-enabled programs, you most likely need the nvcc compiler. Unfortunately, the default
nvhpc module comes with an MPI implementation that is not compatible with slurm.
You therefore need to tell nvcc to use the slurm-aware compiler and MPI wrapper from the modules instead of the ones bundled with nvhpc.
This is something that has worked well in tests:
nvcc -ccbin mpicc -O3 hello.cu -o hello.x -lcudart -lgcc -lstdc++