Exercises and demos

Examples

Load and run

You need the data file [scottish_hills.csv](https://raw.githubusercontent.com/UPPMAX/HPC-python/main/Exercises/examples/programs/scottish_hills.csv). Download it from the link, or find it in the Exercises/examples/programs directory among the files you got from cloning the repo.

Since the exercise opens a plot, you need to log in with ThinLinc (or otherwise have an X11 server running on your system and log in with ssh -X ...).

The exercise is modified from an example found on https://ourcodingclub.github.io/tutorials/pandas-python-intro/.

Warning

Not relevant if using UPPMAX. Only if you are using HPC2N!

You need to also load Tkinter. Use this:

ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2 Tkinter/3.11.3

In addition, you need to add the following two lines to the top of your python script/run them first in Python:

import matplotlib
matplotlib.use('TkAgg')

Python example with packages pandas and matplotlib

We are using Python version 3.11.x. To access the packages pandas and matplotlib, you may need to load other modules, depending on the site where you are working.

On UPPMAX you only need to load the python module, as the relevant packages are included (as long as you are not using GPUs, which are covered later in the course). Thus, you just do:

ml python/3.11.8
  1. From inside Python/interactive (if you are on Kebnekaise, mind the warning above):

    Start python and run these lines:

    import pandas as pd
    import matplotlib.pyplot as plt

    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.show()
    

    If you change the last line to plt.savefig("myplot.png") then you will instead get a file myplot.png containing the plot. This is what you would do if you were running a python script in a batch job.

  2. As a Python script (if you are on Kebnekaise, mind the warning above):

    Copy and save this script as a file, or just run the file pandas_matplotlib-<system>.py located in the <path-to>/Exercises/examples/programs directory that you got from the repo or copied, where <system> is either rackham or kebnekaise.

    import pandas as pd
    import matplotlib.pyplot as plt
    
    dataframe = pd.read_csv("scottish_hills.csv")
    x = dataframe.Height
    y = dataframe.Latitude
    plt.scatter(x, y)
    plt.show()
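If you want one script that works both interactively and in a batch job, you can decide between plt.show() and plt.savefig() at run time. The following is a minimal sketch, assuming that a set DISPLAY environment variable means a window can be opened (as with ThinLinc or ssh -X). The helper names pick_backend and plot_action are hypothetical, and "TkAgg" additionally assumes that Tkinter is available, as in the warning above.

```python
import os


def pick_backend(environ=None):
    """Return 'TkAgg' when a display seems to be available, otherwise 'Agg'."""
    env = os.environ if environ is None else environ
    # Assumption: DISPLAY is set when an X11 display can be reached.
    return "TkAgg" if env.get("DISPLAY") else "Agg"


def plot_action(backend):
    """Tell the caller whether to show the figure or save it to a file."""
    return "savefig" if backend == "Agg" else "show"


# Intended use in the plotting script (commented out so that this sketch
# also runs where matplotlib is not installed):
#   import matplotlib
#   matplotlib.use(pick_backend())   # select the backend before importing pyplot
#   import matplotlib.pyplot as plt
#   ...
#   if plot_action(pick_backend()) == "show":
#       plt.show()
#   else:
#       plt.savefig("myplot.png")

if __name__ == "__main__":
    backend = pick_backend()
    print(f"backend: {backend}, action: {plot_action(backend)}")
```

Agg is matplotlib's non-interactive backend that only writes image files, which is why it is the fallback when no display is found.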
    

Install packages

This is for the course environment and is needed for one of the exercises in the ML section.

Create a virtual environment called vpyenv. First load the python version you want to base your virtual environment on, as well as the site-installed ML packages.

$ module load uppmax
$ module load python/3.11.8
$ module load python_ML_packages/3.11.8-cpu
$ python -m venv --system-site-packages /proj/hpc-python/<user-dir>/vpyenv

Activate it.

$ source /proj/hpc-python/<user-dir>/vpyenv/bin/activate

Note that your prompt changes to start with (vpyenv), showing that you are within an environment.

Install your packages with pip (--user is not needed since you are in your virtual environment), optionally giving specific versions, like:

(vpyenv) $ pip install --no-cache-dir --no-build-isolation scikit-build-core cmake lightgbm

The reason for the other packages (scikit-build-core and cmake) being installed is that they are prerequisites for lightgbm.

Check what was installed

(vpyenv) $ pip list

Deactivate it.

(vpyenv) $ deactivate

Every time you need the tools available in the virtual environment, activate it as above, after loading the python module.

$ source /proj/hpc-python/<user-dir>/vpyenv/bin/activate

More on virtual environments: https://docs.python.org/3/tutorial/venv.html
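If you are unsure whether the environment is active, or whether a package ended up in it, you can check from inside Python. This is a small sketch using only the standard library; the function names in_virtualenv and is_installed are hypothetical, and the package names in the loop are just examples.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when Python runs inside a virtual environment created with venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def is_installed(package):
    """True when the top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    for name in ("lightgbm", "pandas", "sklearn"):
        print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```

Run it once before and once after activating vpyenv to see the difference.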

Interactive

Now for the examples:

Batch mode

Serial code

This first example shows how to run a short, serial script. The batch script (named run_mmmult.sh) can be found in the directory /HPC-Python/Exercises/examples/<center>, where <center> is hpc2n or uppmax. The Python script is in /HPC-Python/Exercises/examples/programs and is named mmmult.py.

  1. The batch script is run with sbatch run_mmmult.sh.

  2. Type squeue -u <username> to see whether the job is pending or running.

  3. When it has run, look at the output with nano slurm-<jobid>.out.
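The three steps above can also be driven from Python, which is handy when you submit many jobs. The sketch below relies on sbatch printing a line of the form "Submitted batch job <jobid>" and on the default output file name slurm-<jobid>.out; the function names are hypothetical, and submit() only works where sbatch is available.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from the text printed by sbatch, or return None."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def output_file(job_id):
    """Name of the default Slurm output file for a job."""
    return f"slurm-{job_id}.out"


def submit(script):
    """Submit a batch script with sbatch and return its job ID."""
    result = subprocess.run(
        ["sbatch", script], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)


# Example use on a cluster login node:
#   job_id = submit("run_mmmult.sh")
#   print("output will be written to", output_file(job_id))
```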

Short serial example script for Rackham, loading Python 3.11.8. NumPy is preinstalled and does not need to be loaded separately.

#!/bin/bash -l
#SBATCH -A naiss2024-22-415 # Change to your own after the course
#SBATCH --time=00:10:00 # Asking for 10 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here Python 3.11.8.
module load python/3.11.8

# Run your Python script
python mmmult.py

GPU code

Short GPU example for running compute.py on Snowy.

#!/bin/bash -l
#SBATCH -A naiss2024-22-415
#SBATCH -t 00:10:00
#SBATCH --exclusive
#SBATCH -n 1
#SBATCH -M snowy
#SBATCH --gres=gpu:1

# Load any modules you need, here loading python 3.11.8 and the ML packages
module load uppmax
module load python/3.11.8
module load python_ML_packages/3.11.8-gpu

# Run your code
python compute.py

Run the first serial example script from further up on the page for this short Python code (sum-2args.py)

import sys

x = int(sys.argv[1])
y = int(sys.argv[2])

sum = x + y

print("The sum of the two numbers is: {0}".format(sum))

Remember to give the two arguments to the program in the batch script.
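In the batch script, the arguments go after the script name, for example python sum-2args.py 2 3. If they are forgotten, the script above stops with an IndexError. The following variant is a sketch of how you could check the arguments first; the function name sum_from_args is hypothetical.

```python
import sys


def sum_from_args(args):
    """Return the sum of exactly two integer command-line arguments."""
    if len(args) != 2:
        raise ValueError("expected exactly two integer arguments")
    x, y = (int(a) for a in args)  # int() raises ValueError for non-integers
    return x + y


if __name__ == "__main__":
    try:
        total = sum_from_args(sys.argv[1:])
        print("The sum of the two numbers is: {0}".format(total))
    except ValueError as err:
        # In a real job you might also want to call sys.exit(1) here.
        print(f"usage: python sum-2args.py <x> <y> ({err})", file=sys.stderr)
```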

Machine Learning

Pandas and matplotlib

This is the same example that was shown in the section about loading and running Python, but now changed slightly to run as a batch job. The main difference is that here we cannot open the plot directly, but have to save to a file instead. You can see the change inside the Python script.

Remove the # if running on Kebnekaise

import pandas as pd
#import matplotlib
import matplotlib.pyplot as plt

#matplotlib.use('TkAgg')

dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.savefig("myplot.png")  # save to a file instead of opening a window

Batch script for running on Rackham (a Kebnekaise version is available in the repository).

#!/bin/bash -l
#SBATCH -A naiss2024-22-415
#SBATCH --time=00:05:00 # Asking for 5 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here for Python 3.11.8
ml python/3.11.8

# Run your Python script
python pandas_matplotlib-batch.py

Submit with sbatch <batch-script.sh>.

The batch scripts can be found in the directories for hpc2n and uppmax, under Exercises/examples/, and they are named pandas_matplotlib-batch.sh and pandas_matplotlib-batch-kebnekaise.sh.

PyTorch

In order to run this at HPC2N/UPPMAX you should either submit a batch job or run interactively on compute nodes. Remember that you should not run long or resource-heavy jobs on the login nodes; they also have no GPUs, should you want to use them.

This is an example of a batch script for running the above example, using PyTorch 2.1.x and Python 3.11.x, and running on GPUs.

TensorFlow

The example comes from https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/ but there are also good examples at https://www.tensorflow.org/tutorials

We are using TensorFlow 2.11.0-CUDA-11.7.0 (and Python 3.10.4) at HPC2N, since that is the newest GPU-enabled TensorFlow currently installed there.

On UPPMAX we are using TensorFlow 2.15.0 (included in python_ML_packages/3.11.8-gpu) and Python 3.11.8.

Since we need scikit-learn, we are also loading scikit-learn/1.1.2, which is compatible with the other modules we are using.

Thus, at HPC2N, load these modules in your batch script: GCC/11.3.0 OpenMPI/4.1.4 TensorFlow/2.11.0-CUDA-11.7.0 scikit-learn/1.1.2.

In order to run the above example, we will create a batch script and submit it.

Submit with sbatch <myjobscript.sh>. After submitting you will (as usual) be given the job-id for your job. You can check on the progress of your job with squeue -u <username> or scontrol show job <job-id>.

Note: if you are logged in to Rackham on UPPMAX and have submitted a GPU job to Snowy, then you need to use this to see the job queue:

squeue -M snowy -u <username>

The output and errors will in this case be written to slurm-<job-id>.out.

General

You almost always want to run several iterations of your machine learning code with changed parameters and/or added layers. If you are doing this in a batch job, it is easiest to either make a batch script that submits several variations of your Python script (changed parameters, changed layers), or make a script that loops over and submits jobs with the changes.
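As an illustration of the second option, here is a sketch of a Python script that writes one batch script per parameter combination. The template, the file names job_<n>.sh, the program name my_ml_program.py and the parameter values are all assumptions made for this example; replace the account and module line with the ones for your site.

```python
import itertools
from pathlib import Path

# Hypothetical job template; adjust the #SBATCH lines to your needs.
TEMPLATE = """#!/bin/bash
#SBATCH -A {account}
#SBATCH --time={time}
#SBATCH -n 1

{module_line}

python {script} {args}
"""


def render_job(account, script, params, module_line, time="00:05:00"):
    """Fill in the template for one set of parameters."""
    args = " ".join(str(p) for p in params)
    return TEMPLATE.format(
        account=account, time=time, module_line=module_line,
        script=script, args=args,
    )


def write_jobs(directory, account, script, grid, module_line):
    """Write one batch script per combination of the values in grid."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, params in enumerate(itertools.product(*grid)):
        path = directory / f"job_{index}.sh"
        path.write_text(render_job(account, script, params, module_line))
        paths.append(path)
    return paths


# Example: two learning rates times two layer sizes gives four job scripts.
#   write_jobs("jobs", "<project-id>", "my_ml_program.py",
#              [[0.01, 0.1], [16, 32]], "module load <modules>")
```

Each generated file can then be submitted with sbatch, either by hand or in a loop.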

Running several jobs from within one job

This example shows how you would run several programs or variations of programs sequentially within the same job:

Example batch script for Kebnekaise. Note that it loads TensorFlow 2.6.0; change the module line to the versions you are using, for example the TensorFlow 2.11.0 modules listed earlier.

#!/bin/bash
# Remember to change this to your own project ID after the course!
#SBATCH -A hpc2n2024-052
# We are asking for 5 minutes
#SBATCH --time=00:05:00
# Asking for one V100
#SBATCH --gres=gpu:v100:1
# Remove any loaded modules and load the ones we need
module purge  > /dev/null 2>&1
module load GCC/10.3.0 OpenMPI/4.1.1 SciPy-bundle/2021.05 TensorFlow/2.6.0-CUDA-11.3-1
# Output to file - not needed if your job creates output in a file directly
# In this example I also copy the output somewhere else and then run another executable (or you could just run the same executable for different parameters).
python <my_tf_program.py> <param1> <param2> > myoutput1 2>&1
cp myoutput1 mydatadir
python <my_tf_program.py> <param3> <param4> > myoutput2 2>&1
cp myoutput2 mydatadir
python <my_tf_program.py> <param5> <param6> > myoutput3 2>&1
cp myoutput3 mydatadir

The challenge here is to adapt the above batch scripts to suitable python scripts and directories.
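The sequential runs can also be driven from a single Python script, which the batch script then starts once. This is a minimal sketch under the assumption that each run takes its parameters on the command line; the function name run_variations is hypothetical, and unlike the batch script above it writes the output files directly into mydatadir instead of copying them there afterwards.

```python
import subprocess
import sys
from pathlib import Path


def run_variations(script, parameter_sets, outdir="mydatadir"):
    """Run script once per parameter set, one after the other."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    logs = []
    for index, params in enumerate(parameter_sets, start=1):
        log = outdir / f"myoutput{index}"
        with log.open("w") as handle:
            # stdout and stderr both go to the log file, like "> file 2>&1"
            subprocess.run(
                [sys.executable, script, *map(str, params)],
                stdout=handle, stderr=subprocess.STDOUT, check=False,
            )
        logs.append(log)
    return logs


# Example (the program name and parameter values are placeholders):
#   run_variations("my_tf_program.py", [(0.01, 16), (0.1, 32)])
```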

Exercise

Try to modify the files pandas_matplotlib-linreg-<rackham/kebnekaise>.py and pandas_matplotlib-linreg-pretty-<rackham/kebnekaise>.py so that they can be run from a batch job (change the pop-up plots to save-to-file).

Also change the batch script pandas_matplotlib.sh (or pandas_matplotlib-kebnekaise.sh) to run your modified python codes.

Exercise

In this exercise you will be using the course environment that you prepared in the “Install packages” section (here: https://uppmax.github.io/HPC-python/install_packages.html#prepare-the-course-environment).

You will run the Python code simple_lightgbm.py found in the Exercises/examples/programs directory. The code was taken from https://github.com/microsoft/LightGBM/tree/master and lightly modified.

Try to write a batch script that runs this code. Remember to activate the course environment.

# coding: utf-8
from pathlib import Path

import pandas as pd
from sklearn.metrics import mean_squared_error

import lightgbm as lgb

print("Loading data...")
# load or create your dataset
df_train = pd.read_csv(str("regression.train"), header=None, sep="\t")
df_test = pd.read_csv(str("regression.test"), header=None, sep="\t")

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)

# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)

# specify your configurations as a dict
params = {
    "boosting_type": "gbdt",
    "objective": "regression",
    "metric": {"l2", "l1"},
    "num_leaves": 31,
    "learning_rate": 0.05,
    "feature_fraction": 0.9,
    "bagging_fraction": 0.8,
    "bagging_freq": 5,
    "verbose": 0,
}

print("Starting training...")
# train
gbm = lgb.train(
    params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, callbacks=[lgb.early_stopping(stopping_rounds=5)]
)

print("Saving model...")
# save model to file
gbm.save_model("model.txt")

print("Starting predicting...")
# predict
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
# eval
rmse_test = mean_squared_error(y_test, y_pred) ** 0.5
print(f"The RMSE of prediction is: {rmse_test}")
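The last lines of the script compute the root mean squared error (RMSE) by taking the square root of the mean squared error. To make clear what that number means, here is the same quantity written out in plain Python; the function name rmse is only for illustration and is not part of simple_lightgbm.py.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of the squared differences, then the square root of that mean.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


if __name__ == "__main__":
    print(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]))
```

A value of 0 means that the predictions match the test values exactly; larger values mean larger typical errors, in the same units as the target.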

GPU

Numba is installed as a module at HPC2N, but not in a version compatible with the Python we are using in this course (3.10.4), so we will have to install it ourselves. The process is the same as in the examples given for the isolated/virtual environment, and we will be using the virtual environment created earlier here. We also need numpy, so we are loading SciPy-bundle as we have done before:

As before, we need a batch script to run the code. There are no GPUs on the login node.

As before, submit with sbatch add-list.sh (assuming you called the batch script thus - change to fit your own naming style).

Numba example 2

An initial implementation of the 2D integration problem with the CUDA support for Numba could be as follows:

The time for executing the kernel and doing some postprocessing of the outputs (copying the C array and doing a reduction) was 4.35 s, much shorter than the 152 s of the serial Numba code (a speedup of roughly 35 times).

Notice the larger size of the grid in the present case (100*1024) compared to the serial case’s size we used previously (10000). Large computations are necessary on the GPUs to get the benefits of this architecture.

One can take advantage of the shared memory in a thread block to write faster code. Here, the 2D integration example from the previous section is rewritten so that the threads in a block write to a shared[] array. Then, this array is reduced (values added) and the output is collected in the array C. The entire code is here:

We need a batch script to run this Python code. An example script is here:

#!/bin/bash
#SBATCH -A project_ID
#SBATCH -t 00:05:00
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -o output_%j.out   # output file
#SBATCH -e error_%j.err    # error messages
#SBATCH --gres=gpu:k80:2
#SBATCH --exclusive

ml purge > /dev/null 2>&1
ml GCCcore/11.2.0 Python/3.9.6
ml GCC/11.2.0 OpenMPI/4.1.1
ml CUDA/11.7.0

virtualenv --system-site-packages /proj/nobackup/<your-project-storage>/vpyenv-python-course
source /proj/nobackup/<your-project-storage>/vpyenv-python-course/bin/activate

python integration2d_gpu.py

The simulation time for this problem’s size was 1.87 sec.
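To see the shared-memory pattern described above without needing a GPU, the following plain Python sketch emulates it on the CPU: each "thread" writes its partial sum to a per-block shared list, the list is reduced pairwise, and one value per block is stored in C. This is not the Numba code from integration2d_gpu.py. The integrand sin(x)*sin(y) on [0, π] x [0, π] (exact integral 4), the function name and the block sizes are assumptions chosen for illustration.

```python
import math


def integrate2d_blocked(n, blocks=4, threads_per_block=8):
    """Midpoint-rule 2D integration, organised like a blocked GPU reduction."""
    if threads_per_block <= 0 or threads_per_block & (threads_per_block - 1):
        raise ValueError("threads_per_block must be a power of two")
    h = math.pi / n
    total_threads = blocks * threads_per_block
    C = [0.0] * blocks                        # one partial result per block
    for b in range(blocks):
        shared = [0.0] * threads_per_block    # stands in for shared memory
        for t in range(threads_per_block):
            gid = b * threads_per_block + t   # global thread index
            acc = 0.0
            # Each thread takes every total_threads-th row of the grid.
            for i in range(gid, n, total_threads):
                x = (i + 0.5) * h
                for j in range(n):
                    y = (j + 0.5) * h
                    acc += math.sin(x) * math.sin(y)
            shared[t] = acc
        # Pairwise (tree) reduction of the shared array within the block.
        stride = threads_per_block // 2
        while stride > 0:
            for t in range(stride):
                shared[t] += shared[t + stride]
            stride //= 2
        C[b] = shared[0]
    # Final reduction over the blocks, scaled by the cell area.
    return sum(C) * h * h


if __name__ == "__main__":
    print(integrate2d_blocked(64))
```

On a real GPU the threads of a block run concurrently and must synchronise between the reduction steps; the loops here only mimic the data flow.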
