# HPC clusters

## The HPC centers UPPMAX, HPC2N, LUNARC, NSC and PDC

**Five HPC centers**
There are many similarities:

- Login nodes vs. calculation/compute nodes
- An environment module system, where software stays hidden until loaded with `module load`
- The Slurm batch job and scheduling system
… many small differences:

- the commands to load R, MATLAB, Julia and their packages/libraries (see the sketch after the next list)
- sometimes different versions of R, MATLAB, Julia, etc.
- slightly different flags to Slurm
… and some bigger differences:

- UPPMAX has three different clusters:
  - Rackham, for general-purpose computing on CPUs only
  - Snowy, available for local projects, suited to long jobs (< 1 month), with GPUs
  - Bianca, for sensitive data, with GPUs
- HPC2N has Kebnekaise, with GPUs
- LUNARC has Cosmos, with GPUs (and Cosmos-SENS)
- NSC has several clusters:
  - BerzeLiUs (AI/ML, NAISS)
  - Tetralith (NAISS)
  - Sigma (LiU local)
  - Freja (R&D, located at SMHI)
  - Nebula (MET Norway R&D)
  - Stratus (weather forecasts, located at NSC)
  - Cirrus (weather forecasts, located at SMHI)
  - We will be using Tetralith, which also has GPUs
- PDC has Dardel, with AMD GPUs
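As a sketch of those "small differences", loading R looks slightly different from center to center. The module names and versions below are illustrative placeholders only; check each center's documentation (or `module avail`) for the real ones.

```bash
# UPPMAX: R and its add-on packages are separate modules
# (versions are placeholders; list real ones with `module avail R`)
module load R/4.1.1
module load R_packages/4.1.1

# HPC2N: software is organized in compiler toolchains, so the
# compiler/MPI modules are loaded together with R
module load GCC/10.2.0 OpenMPI/4.0.5 R/4.0.4
```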
**Terminology: modules**

We call the applications made available via the module system *modules*. Each center documents its own modules: see the module pages for HPC2N, LUNARC, NSC, PDC and UPPMAX.
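A minimal sketch of how to find and load a module. These are standard commands of the Lmod/Environment Modules systems used at these centers (`module spider` is Lmod-specific):

```bash
module avail       # list modules that can be loaded right now
module spider R    # search the whole module tree (Lmod clusters)
module load R      # load the default version of R
module list        # show which modules are currently loaded
module purge       # unload all loaded modules
```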
## Briefly about the cluster hardware and systems at UPPMAX, HPC2N, LUNARC, NSC and PDC
**What is a cluster?**

- A network of computers, each computer working as a node.
- The nodes are divided into login nodes and calculation/compute nodes.
- Each node contains several processor cores, RAM and a local disk called scratch.
- Users log in to the login nodes over the Internet, through SSH or ThinLinc.
  - File management and lighter data analysis can be performed there.
- The calculation nodes have to be used for intensive computing (see the sketch after this list).
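A minimal sketch of that workflow, assuming an UPPMAX account. The hostname is Rackham's login address, but the username and project ID are placeholders:

```bash
# Log in to a login node (replace 'username' with your own account)
ssh username@rackham.uppmax.uu.se

# The login node is for file management and light analysis only.
# For heavier work, request a calculation node, e.g. interactively
# (the -A project ID is a placeholder):
interactive -A naiss202X-YY-ZZZ -t 01:00:00
```

Batch jobs are submitted with `sbatch` instead; see the script sketch under Common features below.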
## Common features

- Linux kernel
- Bash shell
- x86-64 CPUs; some clusters have Intel processors, others AMD
- NVIDIA GPUs on the GPU-equipped clusters (HPC2N also has some AMD GPUs), except Dardel, which has AMD GPUs
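Because all the clusters run Linux, Bash and Slurm, a batch script is largely portable between them; only the project ID, module names and a few Slurm flags change. A minimal sketch, with all values as placeholders:

```bash
#!/bin/bash
#SBATCH -A naiss202X-YY-ZZZ   # project ID; format differs per center
#SBATCH -J r-example          # job name
#SBATCH -t 00:30:00           # requested wall time
#SBATCH -n 4                  # number of tasks/cores

# Module name and version are placeholders; they differ between centers
module load R/4.1.1

Rscript my_script.R
```

Submit it from a login node with `sbatch job.sh` and monitor it with `squeue -u $USER`. GPU requests are one of the "slightly different flags": depending on the center's setup, jobs use e.g. `--gpus=1` or `--gres=gpu:<type>:1`.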
| HPC cluster         | Kebnekaise | Rackham | Snowy | Bianca | Cosmos | Tetralith | Dardel |
|---------------------|------------|---------|-------|--------|--------|-----------|--------|
| Cores/compute node  | 28 (72 for large-memory nodes; 128/256 for AMD Zen3/Zen4) | 20 | 16 | 16 | 48 | 32 | 128 |
| Memory/compute node | 128-3072 GB | 128-1024 GB | 128-4096 GB | 128-512 GB | 256-512 GB | 96-384 GB | 256-2048 GB |
| GPU                 | NVIDIA V100, A100, A6000, L40S, H100, A40; AMD MI100 | None | NVIDIA T4 | NVIDIA A100 | NVIDIA A100 | NVIDIA T4 | 4 AMD Instinct MI250X (2 GCDs each) |
## Overview of the UPPMAX systems
```mermaid
graph TB
  Node1 -- interactive --> SubGraph2Flow
  Node1 -- sbatch --> SubGraph2Flow
  subgraph "Snowy"
    SubGraph2Flow(calculation nodes)
  end
  ThinLinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
  Terminal/ThinLinc -- usr --> Node1
  Terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
  Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow
  subgraph "Bianca"
    SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
    private -- interactive --> calcB(calculation nodes)
    private -- sbatch --> calcB
  end
  subgraph "Rackham"
    Node1[Login] -- interactive --> Node2[calculation nodes]
    Node1 -- sbatch --> Node2
  end
```
## Overview of the HPC2N system
```mermaid
graph TB
  Terminal/ThinLinc -- usr --> Node1
  subgraph "Kebnekaise"
    Node1[Login] -- interactive --> Node2[compute nodes]
    Node1 -- sbatch --> Node2
  end
```