Running Python in batch mode

Questions

  • What is a batch job?

  • What are some important commands regarding batch jobs?

  • How to make a batch job?

Objectives

  • Short introduction to SLURM scheduler commands

  • Show structure of a batch script

  • Try example

Compute allocations in this workshop

  • Rackham: uppmax2025-2-296

  • Kebnekaise: hpc2n2025-076

  • Cosmos: lu2025-7-34

  • Tetralith: naiss2025-22-403

  • Dardel: naiss2025-22-403

Storage space for this workshop

  • Rackham: /proj/hpc-python-uppmax

  • Kebnekaise: /proj/nobackup/hpc-python-spring

  • Cosmos: /lunarc/nobackup/projects/lu2024-17-44

  • Tetralith: /proj/hpc-python-spring-naiss

  • Dardel: /cfs/klemming/projects/snic/hpc-python-spring-naiss

Reservation

Include with #SBATCH --reservation=<reservation-name>. On UPPMAX the reservation is “magnetic”, so it follows the project ID and you do not have to add the reservation name.

NOTE: Since only one or a few nodes are reserved, you should NOT use the reservations for long jobs, as this blocks the nodes for everyone else. The reservations are meant for short test jobs.

  • UPPMAX
    • the reservation is “magnetic” and so will be used automatically

  • HPC2N
    • hpc-python-fri for cpu on Friday

    • hpc-python-mon for cpu on Monday

    • hpc-python-tue for gpu on Tuesday

  • LUNARC
    • py4hpc_day1 for cpu on Thursday

    • py4hpc_day2 for cpu on Friday

    • py4hpc_day3 for cpu on Monday

    • py4hpc_day4 for cpu on Tuesday

    • py4hpc_gpu for gpu on Tuesday

Running your programs and scripts on UPPMAX, HPC2N, LUNARC, NSC, and PDC

As mentioned under interactive jobs, any longer, resource-intensive, or parallel jobs must be run through a batch script or in an interactive session on allocated compute nodes.

A batch job is not interactive, so you cannot make changes to the job while it is running.

In order to run a batch job, you need to create and submit a SLURM submit file (also called a batch submit file, a batch script, or a job script).

Guides and documentation at:

Workflow

  • Write a batch script

    • Inside the batch script you need to load the modules you need (Python, Python packages, any prerequisites, … )

    • If needed, activate an isolated/virtual environment to access packages you have installed yourself

    • Ask for resources depending on whether it is a parallel or a serial job, whether you need GPUs, etc.

    • Give the command(s) to run your Python script

  • Submit batch script with sbatch <my-python-script.sh>

Common file extensions for batch scripts are .sh or .batch, but they are not necessary. You can choose any name that makes sense to you.
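The parts listed in the workflow above (resource requests, module loads, and the command to run) can be laid out programmatically. The following is only a sketch to show the structure of a batch script; the helper names `format_walltime` and `build_batch_script` are hypothetical, and the account, module, and program name are the ones used for Rackham in this material.

```python
from datetime import timedelta


def format_walltime(limit: timedelta) -> str:
    """Format a time limit as HH:MM:SS, the form given to --time."""
    total_seconds = int(limit.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def build_batch_script(account, walltime, ntasks, modules, command):
    """Return the text of a simple SLURM batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash -l",
        # Part 1: SLURM parameters describing the allocation
        f"#SBATCH -A {account}",
        f"#SBATCH --time={format_walltime(walltime)}",
        f"#SBATCH -n {ntasks}",
        "",
    ]
    # Part 2: load the modules the job needs
    lines += [f"module load {module}" for module in modules]
    # Part 3: the actual work, here the command that runs the Python script
    lines += ["", command]
    return "\n".join(lines) + "\n"


script = build_batch_script(
    account="uppmax2025-2-296",
    walltime=timedelta(minutes=10),
    ntasks=1,
    modules=["python/3.11.8"],
    command="python mmmult.py",
)
print(script)
```

Writing the returned text to a file gives a script that could be submitted with sbatch; in practice most people simply write the batch script by hand.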

Useful commands to the batch system

  • Submit job: sbatch <jobscript.sh>

  • Get list of your jobs: squeue -u <username>

  • Check on a specific job: scontrol show job <job-id>

  • Delete a specific job: scancel <job-id>

  • Useful info about a job: sacct -l -j <job-id> | less -S

  • URL to a page with info about the job (Kebnekaise only): job-usage <job-id>
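If you want to drive these commands from Python, for instance to submit a job and look up its state afterwards, a minimal sketch could look like the one below. It assumes that sbatch and sacct are available on the system and relies on the `--parsable` option of sbatch and the `--parsable2`/`--noheader` options of sacct; the function names are hypothetical. Nothing is submitted when the block is loaded: `submit` and `job_states` only contact SLURM when you call them.

```python
import subprocess


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job ID from the output of 'sbatch --parsable'.

    With --parsable, sbatch prints only the job ID, optionally
    followed by ';' and the cluster name.
    """
    return int(sbatch_output.strip().split(";")[0])


def parse_sacct_states(sacct_output: str) -> dict[str, str]:
    """Map each job step to its state from pipe-separated sacct output."""
    states = {}
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        step, state = line.split("|")[:2]
        states[step] = state
    return states


def submit(jobscript: str) -> int:
    """Submit a batch script and return its job ID (requires SLURM)."""
    result = subprocess.run(
        ["sbatch", "--parsable", jobscript],
        capture_output=True, text=True, check=True,
    )
    return parse_job_id(result.stdout)


def job_states(job_id: int) -> dict[str, str]:
    """Ask sacct for the state of a job and its steps (requires SLURM)."""
    result = subprocess.run(
        ["sacct", "-j", str(job_id), "--format=JobID,State",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_sacct_states(result.stdout)
```

On a login node you could then write, for example, `job_id = submit("run_mmmult.sh")` followed by `job_states(job_id)`.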

Example Python batch scripts

Serial code

Hint

Type along!

This first example shows how to run a short, serial script. Where to find the batch script (named run_mmmult.sh) depends on how you downloaded the exercises:

  • If you did git clone https://github.com/UPPMAX/HPC-python.git
    • HPC-python/Exercises/examples/<center>, where <center> is hpc2n, uppmax, lunarc, nsc, or pdc.

    • The Python script is in HPC-python/Exercises/examples/programs and is named mmmult.py.

  • If you did wget https://github.com/UPPMAX/HPC-python/raw/refs/heads/main/exercises.tar.gz and then tar -xvzf exercises.tar.gz
    • exercises/examples/<center>, where <center> is hpc2n, uppmax, lunarc, nsc, or pdc.

    • The Python script is in exercises/examples/programs and is named mmmult.py.

  1. The batch script is run with sbatch run_mmmult.sh.

  2. Try typing squeue -u <username> to see if it is pending or running.

  3. When it has run, look at the output with nano slurm-<jobid>.out.

Short serial example script for Rackham. It loads Python 3.11.8. NumPy is preinstalled and does not need to be loaded separately.

#!/bin/bash -l
#SBATCH -A uppmax2025-2-296 # Change to your own after the course
#SBATCH --time=00:10:00 # Asking for 10 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here Python 3.11.8.
module load python/3.11.8

# Run your Python script
python mmmult.py

Serial code + self-installed package in virt. env.

Hint

Don’t type along! There will be other examples like this one where you use your own virtual environment.

Short serial example for running on Rackham. Loading python/3.11.8 + using any Python packages you have installed yourself with venv.

#!/bin/bash -l
#SBATCH -A uppmax2025-2-296 # Change to your own after the course
#SBATCH --time=00:10:00 # Asking for 10 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here for python 3.11.8
module load python/3.11.8

# Activate your virtual environment.
source /proj/hpc-python-uppmax/<user-dir>/<path-to-virtenv>/<virtenv>/bin/activate

# Run your Python script (remember to add the path to it
# or change to the directory with it first)
python <my_program.py>

Job arrays

This is a very simple example of how to run a Python script with a job array.

Hint

Do not type along! You can try it later during exercise time if you want!

# import sys library (we need this for the command line args)
import sys

# print task number
print('Hello world! from task number: ', sys.argv[1])
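The script above expects the task number as its first command-line argument. When a job is submitted as an array (for example with a line such as #SBATCH --array=1-10 in the batch script), SLURM also gives each task its number in the environment variable SLURM_ARRAY_TASK_ID, which the batch script can pass on as that argument. The sketch below is one possible variant, not the course's own script: it accepts the number either way and uses it to pick one piece of work. The function names and the input file names are hypothetical.

```python
import os


def get_task_id(argv, environ):
    """Return the task number from argv[1], else from SLURM_ARRAY_TASK_ID."""
    if len(argv) > 1:
        return int(argv[1])
    if "SLURM_ARRAY_TASK_ID" in environ:
        return int(environ["SLURM_ARRAY_TASK_ID"])
    raise SystemExit("No task number given and SLURM_ARRAY_TASK_ID is not set")


def pick_input(task_id, inputs):
    """Map a task number (counted from 1) to one item of work."""
    return inputs[task_id - 1]


# Hypothetical input files, one per array task
inputs = ["data_1.csv", "data_2.csv", "data_3.csv"]

# In a real array job you would pass sys.argv and os.environ here;
# a fixed argument list is used so the sketch also runs outside SLURM.
task_id = get_task_id(["script.py", "2"], os.environ)
print("Hello world! from task number:", task_id, "using", pick_input(task_id, inputs))
```

Each task in the array then runs the same script but works on a different input.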

MPI code

We will talk more about parallel code in the session “Parallel computing with Python” tomorrow. This is a simple example of a batch script to run an MPI code.

#!/bin/bash
# The name of the account you are running in, mandatory.
#SBATCH -A NAISSXXXX-YY-ZZZ
# Request resources - here for eight MPI tasks
#SBATCH -n 8
# Request runtime for the job (HHH:MM:SS) where 168 hours is the maximum. Here asking for 15 min.
#SBATCH --time=00:15:00

# Clear the environment from any previously loaded modules
module purge > /dev/null 2>&1

# Load the module environment suitable for the job, it could be more or
# less, depending on other package needs. This is for a simple job needing
# mpi4py. Remove # from the relevant center line

# Rackham: here mpi4py is not installed and you need a virtual env.
# module load python/3.11.8 python_ML_packages/3.11.8-cpu openmpi/4.1.5
# python -m venv mympi4py
# source mympi4py/bin/activate
# pip install mpi4py

# Kebnekaise
# ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 OpenMPI/4.1.5 mpi4py/3.1.4

# Cosmos
# ml GCC/13.2.0 Python/3.11.5 SciPy-bundle/2023.11 OpenMPI/4.1.6 mpi4py/3.1.5

# Tetralith
# ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/11.3.0 OpenMPI/4.1.4 Python/3.10.4 SciPy-bundle/2022.05

# Dardel
# ml cray-python/3.11.7

# And finally run the job - use srun for MPI jobs, but not for serial jobs
srun ./my_mpi_program

GPU code

We will talk more about Python on GPUs in the section “Using GPUs with Python”. This is just an example.

Hint

If you want, you can try running it now, or wait for tomorrow!

Short GPU example for running compute.py on Snowy.

#!/bin/bash -l
#SBATCH -A uppmax2025-2-296
#SBATCH -t 00:10:00
#SBATCH --exclusive
#SBATCH -n 1
#SBATCH -M snowy
#SBATCH --gres=gpu:1

# Set a path where the example programs are installed.
# Change the below to your own path to where you placed the example programs
MYPATH=/proj/hpc-python-uppmax/<userdir>/HPC-python/Exercises/examples/programs/

# Load any modules you need, here loading python 3.11.8 and the ML packages
module load uppmax
module load python/3.11.8
module load python_ML_packages/3.11.8-gpu

# Run your code
python $MYPATH/compute.py

Exercises

Adapt the first serial example script (the one that was used to run mmmult.py, further up on this page) so that it runs this short Python code (sum-2args.py) instead:

import sys

x = int(sys.argv[1])
y = int(sys.argv[2])

total = x + y

print("The sum of the two numbers is: {0}".format(total))

Remember to give the two arguments to the program in the batch script.
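If the arguments are forgotten in the batch script, the code above stops with an IndexError that you only see afterwards in the slurm-<jobid>.out file. As a possible variant (an assumption for illustration, not the course's solution), the standard-library argparse module can check the arguments and print a usage message instead. The names `parse_args` and `main` are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse two integer arguments; exits with a usage message if any is missing."""
    parser = argparse.ArgumentParser(description="Add two integers.")
    parser.add_argument("x", type=int, help="first number")
    parser.add_argument("y", type=int, help="second number")
    return parser.parse_args(argv)


def main(argv):
    """Add the two numbers given in argv and print the result."""
    args = parse_args(argv)
    total = args.x + args.y
    print("The sum of the two numbers is: {0}".format(total))
    return total


# In a real script this would be main(sys.argv[1:]), so that the numbers
# come from the command line given in the batch script.
main(["2", "3"])
```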

How to run a Pandas and matplotlib example as a batch job.

How you might do it interactively

  1. Load Python and prerequisites (and activate any needed virtual environments)
    • UPPMAX: ml python/3.11.8

    • HPC2N: ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2 Tkinter/3.11.3

    • LUNARC: ml GCC/13.2.0 Python/3.11.5 SciPy-bundle/2023.11 matplotlib/3.8.2 Tkinter/3.11.5

    • NSC: ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/11.3.0 OpenMPI/4.1.4 matplotlib/3.5.2 SciPy-bundle/2022.05 Tkinter/3.10.4

    • PDC:
      • ml cray-python/3.11.7

      • python -m venv --system-site-packages mymatplotlib

      • source mymatplotlib/bin/activate

      • pip install matplotlib

  2. Start Python (python) in the <path-to>/Exercises/examples/programs directory

  3. Run these lines:

    • At UPPMAX and PDC

    import pandas as pd
    import matplotlib.pyplot as plt
    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.show()
    
    • At HPC2N, LUNARC, and NSC

    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    matplotlib.use('TkAgg')
    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.show()
    

CHALLENGE: How would you do it so you could run as a batch script?

  • Hint: The main difference is that here we cannot open the plot directly, but have to save to a file instead, for instance with plt.savefig("myplot.png").

  • Make the change to the Python script and then make a batch script to run it! You can find solutions in the exercises directory, for each centre.

NOTE We will not talk about pandas and matplotlib otherwise. You already learned about them earlier.

Submit with sbatch <batch-script.sh>.

The batch scripts can be found in the directories for hpc2n, uppmax, lunarc, nsc, and pdc under Exercises/examples/, and are named pandas_matplotlib-batch.sh.

Keypoints

  • The SLURM scheduler handles allocations to the compute nodes

  • Batch jobs run without user interaction

  • A batch script consists of a part with SLURM parameters describing the allocation and a second part describing the actual work within the job, for instance one or several Python scripts.

    • Remember to include possible input arguments to the Python script in the batch script.