GPUs¶
Learning outcomes
- I can find my way in the UPPMAX documentation
- I can see the GPU and memory usage of jobs
- I know the correct flags to utilize GPUs
- I understand the GPU configuration on Bianca
Hardware and flags¶
- 10 nodes, each with 2 NVIDIA A100 40 GB GPUs.
- Nodelist: sens2025xxx-b[201-210]
- All GPU nodes are fat nodes with at least 256 GB RAM, 16 CPU cores, and 2 GPUs per node.
- To avoid GPU misuse, a project cannot request more than 7 GPU nodes in total.
- SBATCH flags:

```bash
#SBATCH -A sens2025xxx
#SBATCH -p node
#SBATCH -N 1
#SBATCH -C gpu
#SBATCH --gpus-per-node=2 # number of GPUs per node
#SBATCH -t 1:00:00

nvidia-smi
```
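As a sketch of the workflow (the script name `gpu_job.sh` is a placeholder), the flags above would normally live in a batch script that you submit and then monitor in the queue:

```shell
# Submit the batch script containing the SBATCH flags above
sbatch gpu_job.sh

# Check the state of your jobs in the queue
squeue -u $USER
```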
You can also request 1 GPU per node and a few cores, so that the node's 2 GPUs can be used by different jobs:
```bash
#SBATCH -A sens2025xxx
#SBATCH -p core
#SBATCH -n 8
#SBATCH -C gpu
#SBATCH --gpus-per-node=1
#SBATCH -t 1:00:00
```
- Similarly, for an interactive session on a GPU node, use the same `-C gpu --gpus-per-node=` flags.
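As an illustrative sketch (the project name and core count are placeholders), an interactive GPU session could be requested with UPPMAX's `interactive` wrapper using the same flags:

```shell
# Request 1 GPU and 2 cores interactively for 1 hour (placeholder project name)
interactive -A sens2025xxx -n 2 -C gpu --gpus-per-node=1 -t 1:00:00
```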
GPU accessibility check¶
- Sanity check that CUDA is loaded properly, e.g. by inspecting the CUDA environment variables (such as `echo $CUDA_HOME`) or running `nvcc --version`, or check by loading torch:
```bash
module load python_ML_packages/3.9.5-gpu
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.get_device_properties(0)); print(torch.randn(1).cuda())"
```
Output:

```
1.9.0+cu111
11.1
_CudaDeviceProperties(name='NVIDIA A100-PCIE-40GB', major=8, minor=0, total_memory=40326MB, multi_processor_count=108)
tensor([0.1014], device='cuda:0')
```
Profiling¶
- Monitor GPU utilization with `nvidia-smi`:
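For example (the refresh intervals below are arbitrary choices), `nvidia-smi` can poll the GPUs in a loop, either with its full status view or with a compact CSV query:

```shell
# Refresh the full nvidia-smi status view every 2 seconds (Ctrl-C to stop)
nvidia-smi -l 2

# Or log only GPU utilization and memory use, once every 5 seconds, as CSV
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 5
```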
Tips¶
- Use the `--nv` flag when running your Apptainer containers to use the GPU on a GPU node.
- The correct CUDA modules get loaded automatically if you use `python_ML_packages/3.9.5-gpu`.
- Various CUDA libraries are available; list them with `module spider cuda`.
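A minimal sketch of the container tip above (the image and script names are hypothetical): `--nv` makes the host's NVIDIA driver and GPU devices visible inside the container.

```shell
# Run a GPU workload inside an Apptainer container
# (my_image.sif and train.py are hypothetical names)
apptainer exec --nv my_image.sif python train.py
```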