GPUs

Introduction

The SCC provides users with access to 10 GPU nodes, each equipped with two NVIDIA H100 96 GB GPUs. These GPUs can be requested as so-called "Generic Resources" (GRES) when submitting jobs via Slurm.

Available GPU Types

Because a full NVIDIA H100 offers far more processing power and VRAM than many jobs actually need, we decided to split the GPUs into multiple "virtual GPUs" using NVIDIA's Multi-Instance GPU (MIG) technology. This allows us to serve more users at the same time and increases the overall utilization of the GPUs.

Currently, the following GPU "types" are available on the SCC:

Total Number | GPU Type | VRAM  | Compute Units
-------------|----------|-------|--------------
46           | 1g.12gb  | 12 GB | 1
12           | 2g.24gb  | 24 GB | 2
8            | 3g.47gb  | 47 GB | 3
4            | 7g.94gb  | 94 GB | 7

You can request any of these GPU types when submitting a job like this:

srun --gpus=1g.12gb:1 ...

Based on future actual usage patterns and user feedback, we might adjust the number and types of available GPUs.
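For non-interactive work, the same GRES request goes into a batch script. The following is a minimal sketch; the job name, time limit, and output file name are placeholders you should adapt:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test       # placeholder job name
#SBATCH --gpus=1g.12gb:1          # request one 1g.12gb MIG slice
#SBATCH --time=00:10:00           # placeholder walltime limit
#SBATCH --output=gpu-test.%j.out  # stdout/stderr file (%j = job ID)

# nvidia-smi reports the GPU (or MIG slice) visible to this job
nvidia-smi
```

Submit it with `sbatch <scriptname>` as usual.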

Compiling CUDA-aware Software

CUDA is NVIDIA's programming interface for computation on the GPUs. If you use precompiled software that uses CUDA, for instance Python modules or scripts that use CuPy, this should work out of the box.
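If you are unsure whether your precompiled CUDA stack works on the assigned GPU slice, a quick interactive check is possible. This sketch assumes CuPy is already installed in your Python environment:

```shell
# Request one small MIG slice and print the number of visible CUDA devices.
# The one-liner uses only standard CuPy runtime calls; no script file needed.
srun --gpus=1g.12gb:1 python -c "import cupy; print(cupy.cuda.runtime.getDeviceCount())"
```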

If you need to compile your own software with CUDA, you can use the CUDA SDK that is shipped by default.

To use it, load the respective Lmod module:

module purge  # Just a safeguard in case you have other modules loaded
module load gnu15 nvhpc

Now your ./configure or cmake workflow should find the SDK with the CUDA libraries.
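As a smoke test for the toolchain, you can compile and run a tiny CUDA program directly with nvcc. The file name and kernel below are purely illustrative:

```shell
module purge  # Just a safeguard in case you have other modules loaded
module load gnu15 nvhpc

# Minimal CUDA source: a kernel writes one value; the host copies it back.
cat > hello.cu <<'EOF'
#include <cstdio>

__global__ void set_value(int *out) { *out = 42; }

int main() {
    int *d_out, h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    set_value<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    printf("kernel wrote: %d\n", h_out);
    return 0;
}
EOF

nvcc hello.cu -o hello            # nvcc is provided by the nvhpc module
srun --gpus=1g.12gb:1 ./hello     # run on one MIG slice
```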

Mixing CUDA and MPI

Some applications need both CUDA and MPI. See here for how to combine the two.