# HPC clusters

## The HPC centers UPPMAX, HPC2N, LUNARC, NSC and PDC

**Five HPC centers**
There are many similarities:

- Login nodes vs. calculation/compute nodes
- An environment module system, where software stays hidden until loaded with `module load`
- The Slurm batch job and scheduling system
… many small differences:

- the commands to load R, MATLAB, Julia and their packages/libraries (see the sketch after the next list)
- sometimes different versions of R, MATLAB, Julia, etc.
- slightly different flags to Slurm
… and some bigger differences:

- UPPMAX has three different clusters:
  - Rackham, for general-purpose computing on CPUs only
  - Snowy, available for local projects, suited to long jobs (< 1 month), with GPUs
  - Bianca, for sensitive data, with GPUs
- HPC2N has Kebnekaise, with GPUs
- LUNARC has Cosmos, with GPUs (and Cosmos-SENS)
- NSC has several clusters:
  - BerzeLiUs (AI/ML, NAISS)
  - Tetralith (NAISS)
  - Sigma (LiU local)
  - Freja (R&D, located at SMHI)
  - Nebula (MET Norway R&D)
  - Stratus (weather forecasts, located at NSC)
  - Cirrus (weather forecasts, located at SMHI)
  - We will be using Tetralith, which also has GPUs
- PDC has Dardel, with AMD GPUs
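As a sketch of those "small differences", loading R looks slightly different from center to center. The module names and versions below are illustrative placeholders only; check each center's documentation (or `module avail`) for the real ones.

```bash
# UPPMAX: R and its add-on packages are separate modules
# (versions are placeholders; list real ones with `module avail R`)
module load R/4.1.1
module load R_packages/4.1.1

# HPC2N: software is organized in compiler toolchains, so the
# compiler/MPI modules are loaded together with R
module load GCC/10.2.0 OpenMPI/4.0.5 R/4.0.4
```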
**Terminology: modules**

We call the applications made available via the module system *modules*. Each center documents its own modules: see the module pages for HPC2N, LUNARC, NSC, PDC and UPPMAX.
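A minimal sketch of how to find and load a module. These are standard commands of the Lmod/Environment Modules systems used at these centers (`module spider` is Lmod-specific):

```bash
module avail       # list modules that can be loaded right now
module spider R    # search the whole module tree (Lmod clusters)
module load R      # load the default version of R
module list        # show which modules are currently loaded
module purge       # unload all loaded modules
```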
## Briefly about the cluster hardware and systems at UPPMAX, HPC2N, LUNARC, NSC and PDC
**What is a cluster?**

- A network of computers, each computer working as a node.
- The nodes are divided into login nodes and calculation/compute nodes.
- Each node contains several processor cores, RAM and a local disk called scratch.
- Users log in to the login nodes over the Internet, through SSH or ThinLinc.
  - File management and lighter data analysis can be performed there.
- The calculation nodes have to be used for intensive computing (see the sketch after this list).
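A minimal sketch of that workflow, assuming an UPPMAX account. The hostname is Rackham's login address, but the username and project ID are placeholders:

```bash
# Log in to a login node (replace 'username' with your own account)
ssh username@rackham.uppmax.uu.se

# The login node is for file management and light analysis only.
# For heavier work, request a calculation node, e.g. interactively
# (the -A project ID is a placeholder):
interactive -A naiss202X-YY-ZZZ -t 01:00:00
```

Batch jobs are submitted with `sbatch` instead; see the script sketch under Common features below.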
## Common features

- Linux kernel
- Bash shell
- x86-64 CPUs; some clusters have Intel processors, others AMD
- NVIDIA GPUs on the GPU-equipped clusters (HPC2N also has some AMD GPUs), except Dardel, which has AMD GPUs
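Because all the clusters run Linux, Bash and Slurm, a batch script is largely portable between them; only the project ID, module names and a few Slurm flags change. A minimal sketch, with all values as placeholders:

```bash
#!/bin/bash
#SBATCH -A naiss202X-YY-ZZZ   # project ID; format differs per center
#SBATCH -J r-example          # job name
#SBATCH -t 00:30:00           # requested wall time
#SBATCH -n 4                  # number of tasks/cores

# Module name and version are placeholders; they differ between centers
module load R/4.1.1

Rscript my_script.R
```

Submit it from a login node with `sbatch job.sh` and monitor it with `squeue -u $USER`. GPU requests are one of the "slightly different flags": depending on the center's setup, jobs use e.g. `--gpus=1` or `--gres=gpu:<type>:1`.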
| HPC cluster         | Kebnekaise | Rackham | Snowy | Bianca | Cosmos | Tetralith | Dardel |
|---------------------|------------|---------|-------|--------|--------|-----------|--------|
| Cores/compute node  | 28 (72 for large-memory nodes; 128/256 for AMD Zen3/Zen4) | 20 | 16 | 16 | 48 | 32 | 128 |
| Memory/compute node | 128-3072 GB | 128-1024 GB | 128-4096 GB | 128-512 GB | 256-512 GB | 96-384 GB | 256-2048 GB |
| GPU                 | NVIDIA V100, A100, A6000, L40S, H100, A40; AMD MI100 | None | NVIDIA T4 | NVIDIA A100 | NVIDIA A100 | NVIDIA T4 | 4 AMD Instinct MI250X (2 GCDs each) |
## Overview of the UPPMAX systems
```mermaid
graph TB
  Node1 -- interactive --> SubGraph2Flow
  Node1 -- sbatch --> SubGraph2Flow
  subgraph "Snowy"
    SubGraph2Flow(calculation nodes)
  end
  ThinLinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
  Terminal/ThinLinc -- usr --> Node1
  Terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
  Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow
  subgraph "Bianca"
    SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
    private -- interactive --> calcB(calculation nodes)
    private -- sbatch --> calcB
  end
  subgraph "Rackham"
    Node1[Login] -- interactive --> Node2[calculation nodes]
    Node1 -- sbatch --> Node2
  end
```
## Overview of the HPC2N system
```mermaid
graph TB
  Terminal/ThinLinc -- usr --> Node1
  subgraph "Kebnekaise"
    Node1[Login] -- interactive --> Node2[compute nodes]
    Node1 -- sbatch --> Node2
  end
```