Pre-requirements

Coding

  • Basic knowledge of Python.

  • We won’t test your Python skills, though.

  • Rather, you will learn to understand the ecosystem and how to navigate it when using Python on an HPC cluster.

See below for links to useful material if you need a refresher before the course. Level 1 should be enough.

Pandas (link)

  • More important is knowing how to work in Linux, and especially in Bash

Linux

Material for improving your programming skills

First level

The Carpentries teaches basic lab skills for research computing.

Second level

Code Refinery develops and maintains training material on software best practices for researchers who already write code.

  • Their material addresses all academic disciplines and tries to be as programming language-independent as possible.

  • Code Refinery lessons

Third level

ENCCS (EuroCC National Competence Centre Sweden) is a national centre that supports industry, public administration, and academia in accessing and using European supercomputers. They give higher-level training in programming and in specific software.

Understanding clusters

The HPC centers UPPMAX, HPC2N, LUNARC, and NSC

Four HPC centers

  • There are many similarities:

    • Login vs. calculation/compute nodes

    • Environment module system, with software hidden until loaded with module load (see the sketch after the warning below)

    • Slurm batch job and scheduling system

    • pip install procedure

  • … and small differences:

    • commands to load Python and Python packages

    • sometimes different versions of Python, etc.

    • slightly different flags to Slurm

  • … and some bigger differences:

    • UPPMAX has three different clusters

      • Rackham, for general-purpose computing on CPUs only

      • Snowy, available for local projects; it suits long jobs (up to one month) and has GPUs

      • Bianca, for sensitive data; it also has GPUs

  • HPC2N has Kebnekaise with GPUs

  • LUNARC has Cosmos with GPUs (and Cosmos-SENS)

  • NSC has several clusters:

    • BerzeLiUs (AI/ML, NAISS)

    • Tetralith (NAISS)

    • Sigma (LiU local)

    • Freja (R&D, located at SMHI)

    • Nebula (MET Norway R&D)

    • Stratus (weather forecasts, located at NSC)

    • Cirrus (weather forecasts, located at SMHI)

    • We will be using Tetralith, which also has GPUs

  • Conda is recommended only for UPPMAX/LUNARC/NSC users
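As a minimal illustration of the Conda bullet above: on the centers where Conda is recommended it is typically provided through the module system, but the exact module name varies, so the first line below is an assumption rather than an exact recipe.

  # The Conda module name differs between centers (e.g. conda, Anaconda3 or Miniforge);
  # check with "module spider conda" or "module avail".
  module load conda                                        # assumed module name
  conda create --name analysis python=3.11 numpy pandas    # create an isolated environment
  conda activate analysis                                  # on some systems: source activate analysis
  python -c "import pandas; print(pandas.__version__)"     # quick sanity check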

Warning

To distinguish these environment modules from the Python modules that work as libraries, we refer to the latter as packages.
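A short sketch of this distinction, assuming a Python environment module is available (the module name and version below are illustrative; they differ between the centers, as noted above):

  module avail python                     # list Python-related environment modules (names vary)
  module load python/3.11.8               # load a Python environment *module* (version is an assumption)
  python -m pip install --user seaborn    # install a Python *package* into ~/.local
  python -c "import seaborn"              # the package can now be imported as a library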

Briefly about the cluster hardware and systems at UPPMAX, HPC2N, LUNARC, and NSC

What is a cluster?

  • Login nodes and calculation/compute nodes

  • A network of computers, each computer working as a node.

  • Each node contains several processor cores and RAM and a local disk called scratch.

[Figure: node.png]
  • The user logs in to the login nodes over the Internet through SSH or ThinLinc (see the login example after this list).

    • Here, file management and lighter data analysis can be performed.

[Figure: nodes.png]
  • The calculation nodes have to be used for intensive computing.
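A minimal login example, using UPPMAX’s Rackham address for illustration (every center has its own login hostname, so replace both the user name and the address with your own):

  # Log in to a login node over SSH (-X enables X11 forwarding for graphical programs).
  ssh -X username@rackham.uppmax.uu.se
  hostname          # confirms which node you are on
  # File management and light analysis happen here; heavy work is sent to the
  # calculation nodes through Slurm.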

Common features

  • Intel (and for HPC2N/LUNARC, also AMD) CPUs

  • Linux kernel

  • Bash shell
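If you want to confirm these features on a login node, a few standard commands are enough (the exact output will of course differ between clusters):

  uname -sr                   # kernel name and release (should report Linux)
  lscpu | grep 'Model name'   # CPU model: Intel or AMD
  echo "$SHELL"               # login shell, typically /bin/bash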

Hardware

Technology          | Kebnekaise                                           | Rackham     | Snowy       | Bianca      | Cosmos      | Tetralith
Cores/compute node  | 28 (72 for largemem, 128/256 for AMD Zen3/Zen4)      | 20          | 16          | 16          | 48          | 32
Memory/compute node | 128-3072 GB                                          | 128-1024 GB | 128-4096 GB | 128-512 GB  | 256-512 GB  | 96-384 GB
GPU                 | NVidia V100, A100, A6000, L40s, H100, A40; AMD MI100 | None        | NVidia T4   | NVidia A100 | NVidia A100 | NVidia T4
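To use the GPUs in the table you request them from Slurm in your batch script. The exact flag, GPU type string, and partition differ between the centers (some use --gpus, others --gres=gpu:<type>:<count>), so the script below is only a hedged sketch with placeholder names:

  #!/bin/bash
  #SBATCH -A your_project_id     # project/account ID given by your center (placeholder)
  #SBATCH -t 00:30:00            # requested wall time
  #SBATCH -n 1                   # one task
  #SBATCH --gpus=1               # one GPU; some centers instead want --gres=gpu:<type>:1

  module load python             # module name and version are center-specific
  python gpu_script.py           # hypothetical Python script that uses the GPU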

Overview of the UPPMAX systems

graph TB

  Node1 -- interactive --> SubGraph2Flow
  Node1 -- sbatch --> SubGraph2Flow
  subgraph "Snowy"
  SubGraph2Flow(calculation nodes) 
        end

        thinlinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
        terminal -- usr --> Node1
        terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
        Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow
        
        subgraph "Bianca"
        SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
        private -- interactive --> calcB(calculation nodes)
        private -- sbatch --> calcB
        end

        subgraph "Rackham"
        Node1[Login] -- interactive --> Node2[calculation nodes]
        Node1 -- sbatch --> Node2
        end
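The two arrows leaving the login node correspond to the two ways of reaching the calculation nodes: an interactive session or a batch job. A hedged example (the project ID and times are placeholders; interactive is the UPPMAX wrapper, while plain Slurm offers salloc):

  # Start an interactive session on a calculation node:
  interactive -A your_project_id -t 01:00:00
  # ...or hand the work to the scheduler as a batch job:
  sbatch my_job.sh               # my_job.sh is a hypothetical batch script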

Overview of the HPC2N system

graph TB


        Terminal/ThinLinc -- usr --> Node1
        

        subgraph "Kebnekaise"
        Node1[Login] -- interactive --> Node2[compute nodes]
        Node1 -- sbatch --> Node2
        end

Overview of the LUNARC system

[Figure: cosmos-resources.png]

Overview of the NSC systems