Running Python in batch mode

Questions

  • Can I run more advanced Python scripts as batch jobs?

  • Can you do matplotlib or pandas as batch jobs?

  • What about using virtual environments in batch jobs?

Objectives

  • Use a virtual environment in a batch script.

  • Show how a Python code with pandas and matplotlib can be transformed to run in a batch script.

  • Some examples to try.

Running your programs and scripts on UPPMAX, HPC2N, LUNARC, C3SE, NSC, and PDC

As mentioned on Friday, in the introduction to Slurm and batch jobs:

  • Any longer, resource-intensive, or parallel jobs must be run through a batch script or in an interactive session on allocated compute nodes.

  • A batch job is not interactive, so you cannot make changes to the job while it is running.

  • To run a batch job, you need to create and submit a Slurm submit file (also called a batch submit file, a batch script, or a job script).

Recap: Useful commands for the batch system

  • Submit job: sbatch <jobscript.sh>

  • Get list of your jobs: squeue -u <username> or squeue --me

  • Check on a specific job: scontrol show job <job-id>

  • Delete a specific job: scancel <job-id>

  • Useful info about a job: sacct -l -j <job-id> | less -S
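
If you would rather drive these commands from Python (for example to submit several jobs in a loop), a minimal sketch could look like the one below. It assumes that sbatch is on your PATH and prints its usual "Submitted batch job <job-id>" line. The helper names parse_job_id and submit_job are hypothetical and not part of Slurm.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("Could not find a job ID in: {!r}".format(sbatch_output))
    return match.group(1)


def submit_job(jobscript):
    """Submit a job script with sbatch and return the job ID as a string."""
    result = subprocess.run(
        ["sbatch", jobscript], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)
```

On a cluster, job_id = submit_job("jobscript.sh") would give you the ID to pass on to scontrol show job or scancel.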

Example Python batch scripts

On Friday we looked at a simple serial Python code run as a batch script. There are many other situations:

  • Python code needing self-installed packages in virtual environment

  • Python code requiring tweaking before running as a batch job

  • Python code that is parallel

  • Python code that needs GPUs

Today we will look at some of these situations. The GPU example is covered tomorrow, when we will also talk more about parallelism; today, parallel jobs are only shown through a small batch script template.

Serial code + self-installed package in a virtual environment

Hint

Don’t type along! This just shows how you would activate and use a virtual environment in a batch script.

Short serial example for running on Pelle. We load Python 3.11.5 together with a compatible SciPy-bundle and Python-bundle-PyPI. This gives us access to packages like scipy, numpy, pandas, and seaborn. PyTorch and matplotlib are separate modules and are only available for Python 3.12.3.

The important thing is to load the SAME modules you used in the virtual environment you have installed the needed packages in.

#!/bin/bash -l
#SBATCH -A uppmax2025-2-393 # Change to your own after the course
#SBATCH --time=00:10:00 # Asking for 10 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here Python 3.11.5 with a compatible SciPy-bundle and Python-bundle-PyPI.
module load Python/3.11.5-GCCcore-13.2.0
module load SciPy-bundle/2023.11-gfbf-2023b
module load Python-bundle-PyPI/2023.10-GCCcore-13.2.0

# Activate your virtual environment, which you previously created with the above modules loaded.
source /proj/hpc-python-uppmax/<user-dir>/<path-to-virtenv>/<virtenv>/bin/activate

# Run your Python script (remember to add the path to it
# or change to the directory with it first)
python <my_program.py>
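
To confirm from inside the job that the virtual environment really is active, you could start your Python script with a small check like the sketch below. It relies on the fact that inside a venv-style environment sys.prefix differs from sys.base_prefix. The function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True if Python is running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


# These lines end up in the job's output file (slurm-<job-id>.out by default)
print("Python executable:", sys.executable)
print("Python version: {}.{}.{}".format(*sys.version_info[:3]))
print("Inside a virtual environment:", in_virtualenv())
```

If the reported executable or version is not the one you expect, check that the batch script loads the same modules that were loaded when the environment was created.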

MPI code

We will talk more about parallel code in the session “Parallel computing with Python” tomorrow. This is a simple example of a batch script to run an MPI code.

Short MPI example for running on Tetralith.

#!/bin/bash
# Change to your own project account after the course
#SBATCH -A naiss2025-22-934
# Asking for 10 min
#SBATCH -t 00:10:00
# ask for 32 cores here, modify for your needs.
# Aim to use multiples of 32 for larger jobs
#SBATCH -n 32
# name output and error file
#SBATCH -o mpi_process_%j.out
#SBATCH -e mpi_process_%j.err

# Load Python and mpi4py
ml purge > /dev/null 2>&1
ml buildtool-easybuild/4.8.0-hpce082752a2  GCC/13.2.0  OpenMPI/4.1.6 mpi4py/3.1.5

# Run your mpi_executable
mpirun -np 32 python integration2d_mpi.py
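
For orientation, a script launched this way typically lets each MPI rank work on its own share of the problem. The sketch below is not integration2d_mpi.py. It is a hypothetical example that splits a range of indices across the ranks and sums the partial results with mpi4py. The helper name split_range is an assumption, and the code falls back to a single process if mpi4py cannot be imported.

```python
try:
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
except ImportError:
    # Fallback so the sketch also runs without MPI (single process)
    comm, rank, size = None, 0, 1


def split_range(n, size, rank):
    """Return (start, stop) of the block of range(n) owned by this rank."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


n = 1000
start, stop = split_range(n, size, rank)
local_sum = sum(range(start, stop))

# Combine the partial sums on rank 0 (or use the local value without MPI)
if comm is not None:
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
else:
    total = local_sum

if rank == 0:
    print("Sum of 0..{} computed by {} rank(s): {}".format(n - 1, size, total))
```

Started with mpirun -np 32, each of the 32 ranks would handle roughly 1/32 of the indices, and only rank 0 prints the result.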

Tweak for batch

Some codes need to be tweaked a little to run as a batch job instead of interactively. Examples could be:

  • They ask for input while running

  • They create plots and open them in a window

In both cases the code needs to be rewritten to some extent, depending on what is needed:

  • Rewrite it to take input from a file/dataset, or from arguments given when starting the run

  • Rewrite it to save the plots to file instead of opening them directly

Simple example of option 1

Run the serial example script from Friday (https://uppmax.github.io/HPC-python/day2/basic_batch_slurm.html#simple-example-batch-script, the one that was used to run mmmult.py), but with this code (sum-2args.py) instead:

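import sys

# Read the two numbers from the command-line arguments
x = int(sys.argv[1])
y = int(sys.argv[2])

# Named "total" to avoid shadowing the built-in sum()
total = x + y

print("The sum of the two numbers is: {0}".format(total))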

Remember to give the two arguments to the program in the batch script.
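
Because nobody is watching a batch job, a forgotten argument only shows up as an IndexError traceback in the job's output. A slightly more defensive variant could use argparse from the standard library, which prints a usage message instead. This is a sketch only; the function names parse_args and main are assumptions and not part of sum-2args.py.

```python
import argparse


def parse_args(argv=None):
    """Parse two integers; argv=None means use the real command line."""
    parser = argparse.ArgumentParser(description="Add two integers.")
    parser.add_argument("x", type=int, help="first number")
    parser.add_argument("y", type=int, help="second number")
    return parser.parse_args(argv)


def main(argv=None):
    """Add the two parsed numbers, print the result, and return it."""
    args = parse_args(argv)
    total = args.x + args.y
    print("The sum of the two numbers is: {0}".format(total))
    return total


# In a real script this would be main() with no arguments, so that the values
# come from the command line, e.g. "python sum-2args.py 2 3" in the batch script.
main(["2", "3"])
```

Passing the argument list explicitly, as done here, also makes the function easy to try out before submitting the job.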

Simple example of option 2

How to run a pandas and matplotlib example as a batch job.

Let us first see how you might do it interactively, from the command line.

You first need to open a terminal window, either in ThinLinc, on a DesktopOnDemand, or over regular SSH with X11 forwarding (ssh -Y <username>@<domain>).

  1. Load Python and prerequisites (and activate any needed virtual environments)
    • UPPMAX: ml Python/3.12.3-GCCcore-13.3.0 SciPy-bundle/2024.05-gfbf-2024a Python-bundle-PyPI/2024.06-GCCcore-13.3.0 matplotlib/3.9.2-gfbf-2024a

    • HPC2N: ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2 Tkinter/3.11.3

    • LUNARC: ml GCC/13.2.0 Python/3.11.5 SciPy-bundle/2023.11 matplotlib/3.8.2 Tkinter/3.11.5

    • NSC: ml buildtool-easybuild/4.9.4-hpc71cbb0050 GCC/13.2.0 matplotlib/3.8.2 SciPy-bundle/2023.11 Tkinter/3.11.5

    • PDC:
      • ml cray-python/3.11.7

      • python -m venv --system-site-packages mymatplotlib

      • source mymatplotlib/bin/activate

      • pip install matplotlib

    • C3SE: module load matplotlib/3.10.5-gfbf-2025b (loads Python/3.13.5, SciPy-bundle, Python-bundle-PyPI, Tkinter, etc.)

  2. Start Python (python) in the <path-to>/Exercises/examples/programs directory

  3. Run these lines:

    • At PDC

    import pandas as pd
    import matplotlib.pyplot as plt
    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.show()
    
    • At UPPMAX, HPC2N, LUNARC, and C3SE

    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    matplotlib.use('TkAgg')
    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.show()
    
    • At NSC

      The Tkinter and matplotlib versions here do not work together, so the GUI backend is unavailable. You cannot display graphics directly; instead, save the plot to a file and open it afterwards (for instance with eog, if you logged in with -X or -Y). You can use this script to save it to a file:

    import pandas as pd
    import matplotlib
    import matplotlib.pyplot as plt
    matplotlib.use('Agg')
    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.savefig("myplot.png")
    

CHALLENGE: How would you change this so that it can run as a batch script?

  • Hint: The main difference is that in a batch job we cannot open the plot directly, but have to save it to a file instead, for instance with plt.savefig("myplot.png").

  • Make the change to the Python script and then write a batch script to run it! You can find solutions in the exercises directory, for each centre.
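
If you want a single script that works both interactively and in a batch job, one possible approach is to choose the matplotlib backend depending on whether a display is available. The sketch below assumes that a missing DISPLAY environment variable means there is no graphical session, which is usually the case in a batch job, and that Tkinter is available when a display is present. The names choose_backend and plot_hills are hypothetical.

```python
import os


def choose_backend(environ=None):
    """Pick 'TkAgg' when a display is available, otherwise the file-only 'Agg'."""
    if environ is None:
        environ = os.environ
    return "TkAgg" if environ.get("DISPLAY") else "Agg"


def plot_hills(csv_file="scottish_hills.csv", outfile="myplot.png"):
    """Scatter-plot Height against Latitude; show it or save it to a file."""
    # Imported here so that the backend is selected before pyplot is loaded
    import matplotlib

    backend = choose_backend()
    matplotlib.use(backend)
    import matplotlib.pyplot as plt
    import pandas as pd

    dataframe = pd.read_csv(csv_file)
    plt.scatter(dataframe.Height, dataframe.Latitude)
    if backend == "Agg":
        plt.savefig(outfile)  # no display: write the plot to a file
    else:
        plt.show()  # display available: open the plot window
```

Calling plot_hills() in the directory that contains scottish_hills.csv would then open a window in an interactive session and write myplot.png in a batch job.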

NOTE: We will not otherwise talk about pandas and matplotlib here. You already learned about them earlier.

Submit with sbatch <batch-script.sh>.

The batch scripts can be found in the exercises directories for day3 for hpc2n, uppmax, lunarc, nsc, pdc, and c3se, and are named pandas_matplotlib-batch.sh.

Keypoints

  • Remember to include any input arguments to the Python script in the batch script.

  • We saw an example of a batch script where we activated a virtual environment and used our own installed packages.

  • We saw a brief example of a parallel batch job.

  • We saw how to tweak code written for interactive use so that it can run as a batch job.