Exercises and demos
Examples
Load and run
You need the data-file [scottish_hills.csv](https://raw.githubusercontent.com/UPPMAX/HPC-python/main/Exercises/examples/programs/scottish_hills.csv). Download it from that link, or find it in the Exercises/examples/programs directory in the files you got from cloning the repo.
Since the exercise opens a plot, you need to log in with ThinLinc (or otherwise have an X11 server running on your system and log in with ssh -X ...).
The exercise is modified from an example found on https://ourcodingclub.github.io/tutorials/pandas-python-intro/.
Warning
This is only relevant if you are using HPC2N, not UPPMAX!
You need to also load Tkinter. Use this:
ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2 Tkinter/3.11.3
In addition, you need to add the following two lines to the top of your Python script (or run them first in Python):
import matplotlib
matplotlib.use('TkAgg')
Python example with packages pandas and matplotlib
We are using Python version 3.11.x. To access the packages pandas and matplotlib, you may need to load other modules, depending on the site where you are working.

On Rackham you only need to load the python module, as the relevant packages are included (as long as you are not using GPUs, but that is talked about later in the course). Thus, you just do:
ml python/3.11.8
On Kebnekaise you also need to load SciPy-bundle and matplotlib (and their prerequisites). These versions will work well together:
ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2
From inside Python/interactive (if you are on Kebnekaise, mind the warning above):
Start Python and run these lines:
import pandas as pd
import matplotlib.pyplot as plt
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.show()
If you change the last line to plt.savefig("myplot.png") then you will instead get a file myplot.png containing the plot. This is what you would do if you were running a Python script in a batch job.

As a Python script (if you are on Kebnekaise, mind the warning above):
Copy and save this script as a file (or just run the file pandas_matplotlib-<system>.py located in the <path-to>/Exercises/examples/programs directory you got from the repo or copied, where <system> is either rackham or kebnekaise).

import pandas as pd
import matplotlib.pyplot as plt

dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude

plt.scatter(x, y)
plt.show()
On Kebnekaise, with the Tkinter backend:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

matplotlib.use('TkAgg')

dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude

plt.scatter(x, y)
plt.show()
Install packages
This is for the course environment and needed for one of the exercises in the ML section.
Create a virtual environment called vpyenv. First load the Python version you want to base your virtual environment on, as well as the site-installed ML packages.
$ module load uppmax
$ module load python/3.11.8
$ module load python_ML_packages/3.11.8-cpu
$ python -m venv --system-site-packages /proj/hpc-python/<user-dir>/vpyenv
Activate it.
$ source /proj/hpc-python/<user-dir>/vpyenv/bin/activate
Note that your prompt changes to start with (vpyenv), showing that you are within the environment.
Install your packages with pip (--user is not needed since you are in a virtual environment), optionally giving the versions you want, like:
(vpyenv) $ pip install --no-cache-dir --no-build-isolation scikit-build-core cmake lightgbm
The reason for also installing the other packages (scikit-build-core and cmake) is that they are prerequisites for lightgbm.
Check what was installed
(vpyenv) $ pip list
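You can also confirm from inside Python that the freshly installed package is importable. A minimal check you could run with the environment still active (this snippet is not part of the course files; lightgbm exposes its version string):

import lightgbm
print(lightgbm.__version__)  # prints the installed version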
Deactivate it.
(vpyenv) $ deactivate
Every time you need the tools available in the virtual environment, you activate it as above, after loading the python module.
$ source /proj/hpc-python/<user-dir>/vpyenv/bin/activate
More on virtual environment: https://docs.python.org/3/tutorial/venv.html
On Kebnekaise at HPC2N: first go to the directory you want your environment in. Then load the modules for Python, SciPy-bundle, and matplotlib, create the virtual environment, activate it, and install lightgbm and scikit-learn (since the site-installed versions are not compatible with this Python).
$ module load GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2
$ python -m venv --system-site-packages vpyenv
$ source vpyenv/bin/activate
(vpyenv) $ pip install --no-cache-dir --no-build-isolation lightgbm scikit-learn
Deactivating a virtual environment.
(vpyenv) $ deactivate
Every time you need the tools available in the virtual environment you activate it as above (after first loading the modules for Python, Python packages, and prerequisites)
$ source vpyenv/bin/activate
Interactive
Example, Kebnekaise, Requesting 4 cores for 30 minutes, then running Python
b-an01 [~]$ salloc -n 4 --time=00:30:00 -A hpc2n2024-052
salloc: Pending job allocation 20174806
salloc: job 20174806 queued and waiting for resources
salloc: job 20174806 has been allocated resources
salloc: Granted job allocation 20174806
salloc: Waiting for resource configuration
salloc: Nodes b-cn0241 are ready for job
b-an01 [~]$ module load GCC/12.3.0 Python/3.11.3
b-an01 [~]$
Adding two numbers from user input (add2.py)
# This program will add two numbers that are provided by the user

# Get the numbers
a = int(input("Enter the first number: "))
b = int(input("Enter the second number: "))

# Add the two numbers together
sum = a + b

# Output the sum
print("The sum of {0} and {1} is {2}".format(a, b, sum))
Adding two numbers given as arguments (sum-2args.py)
import sys

x = int(sys.argv[1])
y = int(sys.argv[2])

sum = x + y

print("The sum of the two numbers is: {0}".format(sum))
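Since sys.argv[0] is the script name itself, the two user-supplied numbers arrive as sys.argv[1] and sys.argv[2]. As a small optional hardening (not part of the course files), you could guard against missing arguments before converting them:

import sys

# sys.argv always starts with the script name, so two arguments mean length 3
if len(sys.argv) != 3:
    sys.exit("Usage: python sum-2args.py <number1> <number2>")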
Now for the examples:
Example, Kebnekaise: running a Python script in the allocation we made further up. Notice that since we asked for 4 cores and the script is serial, it is run 4 times:
b-an01 [~]$ srun python sum-2args.py 3 4
The sum of the two numbers is: 7
The sum of the two numbers is: 7
The sum of the two numbers is: 7
The sum of the two numbers is: 7
b-an01 [~]$
Example: running a Python script in the above allocation, but this time a script that expects input from you.
b-an01 [~]$ srun python add2.py
2
3
Enter the first number: Enter the second number: The sum of 2 and 3 is 5
Enter the first number: Enter the second number: The sum of 2 and 3 is 5
Enter the first number: Enter the second number: The sum of 2 and 3 is 5
Enter the first number: Enter the second number: The sum of 2 and 3 is 5
Batch mode
Serial code
This first example shows how to run a short, serial script. The batch script (named run_mmmult.sh) can be found in the directory /HPC-Python/Exercises/examples/<center>, where <center> is hpc2n or uppmax. The Python script is in /HPC-Python/Exercises/examples/programs and is named mmmult.py.
Run the batch script with sbatch run_mmmult.sh. Try typing squeue -u <username> to see if it is pending or running. When it has run, look at the output with nano slurm-<jobid>.out.
Short serial example script for Rackham, loading Python 3.11.8. NumPy is preinstalled and does not need to be loaded separately.
#!/bin/bash -l
#SBATCH -A naiss2024-22-415 # Change to your own after the course
#SBATCH --time=00:10:00 # Asking for 10 minutes
#SBATCH -n 1 # Asking for 1 core
# Load any modules you need, here Python 3.11.8.
module load python/3.11.8
# Run your Python script
python mmmult.py
Short serial example for running on Kebnekaise. Loading SciPy-bundle/2023.07 and Python/3.11.3
#!/bin/bash
#SBATCH -A hpc2n2024-052 # Change to your own
#SBATCH --time=00:10:00 # Asking for 10 minutes
#SBATCH -n 1 # Asking for 1 core
# Load any modules you need, here for Python/3.11.3 and compatible SciPy-bundle
module load GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07
# Run your Python script
python mmmult.py
Python example code
import timeit
import numpy as np
starttime = timeit.default_timer()
np.random.seed(1701)
A = np.random.randint(-1000, 1000, size=(8,4))
B = np.random.randint(-1000, 1000, size=(4,4))
print("This is matrix A:\n", A)
print("The shape of matrix A is ", A.shape)
print()
print("This is matrix B:\n", B)
print("The shape of matrix B is ", B.shape)
print()
print("Doing matrix-matrix multiplication...")
print()
C = np.matmul(A, B)
print("The product of matrices A and B is:\n", C)
print("The shape of the resulting matrix is ", C.shape)
print()
print("Time elapsed for generating matrices and multiplying them is ", timeit.default_timer() - starttime)
GPU code
Short GPU example for running compute.py on Snowy.
#!/bin/bash -l
#SBATCH -A naiss2024-22-415
#SBATCH -t 00:10:00
#SBATCH --exclusive
#SBATCH -n 1
#SBATCH -M snowy
#SBATCH --gres=gpu:1
# Load any modules you need, here loading python 3.11.8 and the ML packages
module load uppmax
module load python/3.11.8
module load python_ML_packages/3.11.8-gpu
# Run your code
python compute.py
Example with running compute.py on Kebnekaise.
#!/bin/bash
#SBATCH -A hpc2n2024-052 # Change to your own
#SBATCH --time=00:10:00 # Asking for 10 minutes
# Asking for one V100 card
#SBATCH --gres=gpu:v100:1
# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load GCC/12.3.0 OpenMPI/4.1.5 Python/3.11.3 SciPy-bundle/2023.07 numba/0.58.1
# Run your Python script
python compute.py
This Python script can (just like the batch scripts for UPPMAX and HPC2N) be found in the /HPC-Python/Exercises/examples directory, under the subdirectory programs, if you have cloned the repo or copied the tarball with the exercises.
from numba import jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer

# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1

# function optimized to run on gpu
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1

if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)
Run the first serial example script from further up on the page for this short Python code (sum-2args.py)
import sys
x = int(sys.argv[1])
y = int(sys.argv[2])
sum = x + y
print("The sum of the two numbers is: {0}".format(sum))
Remember to give the two arguments to the program in the batch script.
Solution for HPC2N
This batch script is for Kebnekaise. Adding the numbers 2 and 3.
#!/bin/bash
#SBATCH -A hpc2n2024-052 # Change to your own
#SBATCH --time=00:05:00 # Asking for 5 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here for Python 3.11.3
module load GCC/12.3.0 Python/3.11.3

# Run your Python script
python sum-2args.py 2 3
Solution for UPPMAX
This batch script is for UPPMAX. Adding the numbers 2 and 3.
#!/bin/bash -l
#SBATCH -A naiss2024-22-415 # Change to your own after the course
#SBATCH --time=00:05:00 # Asking for 5 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here for python 3.11.8
module load python/3.11.8

# Run your Python script
python sum-2args.py 2 3
Machine Learning
Pandas and matplotlib
This is the same example that was shown in the section about loading and running Python, but now changed slightly to run as a batch job. The main difference is that here we cannot open the plot directly, but have to save to a file instead. You can see the change inside the Python script.
For comparison, the original interactive version (remove the # if running on Kebnekaise):
import pandas as pd
#import matplotlib
import matplotlib.pyplot as plt
#matplotlib.use('TkAgg')
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.show()
The batch version, changed to save the plot to a file. Remove the # if running on Kebnekaise. The script below can be found as pandas_matplotlib-batch.py or pandas_matplotlib-batch-kebnekaise.py in the Exercises/examples/programs directory.
import pandas as pd
#import matplotlib
import matplotlib.pyplot as plt
#matplotlib.use('TkAgg')
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.savefig("myplot.png")
Batch scripts for running on Rackham and Kebnekaise.
#!/bin/bash -l
#SBATCH -A naiss2024-22-415
#SBATCH --time=00:05:00 # Asking for 5 minutes
#SBATCH -n 1 # Asking for 1 core
# Load any modules you need, here for Python 3.11.8
ml python/3.11.8
# Run your Python script
python pandas_matplotlib-batch.py
#!/bin/bash
#SBATCH -A hpc2n2024-052
#SBATCH --time=00:05:00 # Asking for 5 minutes
#SBATCH -n 1 # Asking for 1 core
# Load any modules you need, here for Python 3.11.3
ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2
# Run your Python script
python pandas_matplotlib-batch-kebnekaise.py
Submit with sbatch <batch-script.sh>. The batch scripts can be found in the directories for hpc2n and uppmax, under Exercises/examples/, and they are named pandas_matplotlib-batch.sh and pandas_matplotlib-batch-kebnekaise.sh.
PyTorch
We use PyTorch Tensors to fit a third order polynomial to a sine function. The forward and backward passes through the network are manually implemented.
# -*- coding: utf-8 -*-
import torch
import math

dtype = torch.float
device = torch.device("cpu")
device = torch.device("cuda:0")  # Comment this out to not run on GPU

# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
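The script above hard-codes cuda:0 and will therefore fail on a CPU-only node. If you want one file that runs in both situations, a minimal sketch of a more defensive device selection (not how the course file is written):

import torch

# Fall back to the CPU when no GPU is visible to the job
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Running on", device)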
In order to run this at HPC2N/UPPMAX you should either submit a batch job or run interactively on compute nodes. Remember, you should not run long or resource-heavy jobs on the login nodes, and the login nodes also do not have GPUs.
This is an example of a batch script for running the above example, using PyTorch 2.1.x and Python 3.11.x, and running on GPUs.
Example batch script, running on Kebnekaise
#!/bin/bash
# Remember to change this to your own project ID after the course!
#SBATCH -A hpc2n2024-052
# We are asking for 5 minutes
#SBATCH --time=00:05:00
# The following two lines split the output into a file for any errors and a file for other output.
#SBATCH --error=job.%J.err
#SBATCH --output=job.%J.out
# Asking for one V100
#SBATCH --gres=gpu:V100:1

# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load GCC/12.3.0 OpenMPI/4.1.5 PyTorch/2.1.2-CUDA-12.1.1

srun python pytorch_fitting_gpu.py
UPPMAX, as run in an interactive Snowy session:
$ interactive -A naiss2024-22-415 -n 1 -M snowy --gres=gpu:1 -t 1:00:01
You receive the high interactive priority.
Please, use no more than 8 GB of RAM.
Waiting for job 6907137 to start...
Starting job now -- you waited for 90 seconds.
$ ml uppmax
$ ml python/3.11.8
$ module load python_ML_packages/3.11.8-gpu
$ cd /proj/naiss2024-22-415/<user-dir>/HPC-python/Exercises/examples/programs
$ srun python pytorch_fitting_gpu.py
99 134.71942138671875
199 97.72868347167969
299 71.6167221069336
399 53.178802490234375
499 40.15779113769531
599 30.9610652923584
699 24.464630126953125
799 19.875120162963867
899 16.632421493530273
999 14.341087341308594
1099 12.721846580505371
1199 11.577451705932617
1299 10.76859188079834
1399 10.196844100952148
1499 9.792669296264648
1599 9.506935119628906
1699 9.304922103881836
1799 9.162087440490723
1899 9.061092376708984
1999 8.989676475524902
Result: y = 0.013841948471963406 + 0.855550229549408 x + -0.002387965563684702 x^2 + -0.09316103905439377 x^3
TensorFlow
The example comes from https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/, but there are also good examples at https://www.tensorflow.org/tutorials
We are using TensorFlow 2.11.0-CUDA-11.7.0 (and Python 3.10.4) at HPC2N, since that is the newest GPU-enabled TensorFlow currently installed there.
On UPPMAX we are using TensorFlow 2.15.0 (included in python_ML_packages/3.11.8-gpu) and Python 3.11.8.
Since we need scikit-learn, we also load scikit-learn/1.1.2, which is compatible with the other modules we are using. Thus, load these modules in your batch script: GCC/11.3.0 OpenMPI/4.1.4 TensorFlow/2.11.0-CUDA-11.7.0 scikit-learn/1.1.2
UPPMAX has scikit-learn in the python_ML_packages, so we do not need to load anything extra there.
- Load modules: module load uppmax python/3.11.8 python_ML_packages/3.11.8-gpu

On Rackham we should use python_ML_packages/3.11.8-cpu, while on a GPU node the GPU version should be loaded (as we do in this example, which works either in a batch script submitted to Snowy or in an interactive job running on Snowy).
We will work with this example
# mlp for binary classification
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# load the dataset
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
df = read_csv(path, header=None)
# split into input and output columns
X, y = df.values[:, :-1], df.values[:, -1]
# ensure all data are floating point values
X = X.astype('float32')
# encode strings to integer
y = LabelEncoder().fit_transform(y)
# split into train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# determine the number of input features
n_features = X_train.shape[1]
# define model
model = Sequential()
model.add(Dense(10, activation='relu', kernel_initializer='he_normal', input_shape=(n_features,)))
model.add(Dense(8, activation='relu', kernel_initializer='he_normal'))
model.add(Dense(1, activation='sigmoid'))
# compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# fit the model
model.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)
# evaluate the model
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test Accuracy: %.3f' % acc)
# make a prediction
row = [1,0,0.99539,-0.05889,0.85243,0.02306,0.83398,-0.37708,1,0.03760,0.85243,-0.17755,0.59755,-0.44945,0.60536,-0.38223,0.84356,-0.38542,0.58212,-0.32192,0.56971,-0.29674,0.36946,-0.47357,0.56811,-0.51171,0.41078,-0.46168,0.21266,-0.34090,0.42267,-0.54487,0.18641,-0.45300]
yhat = model.predict([row])
print('Predicted: %.3f' % yhat)
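Before training, it can be useful to confirm that TensorFlow actually sees the GPU your job requested. A small check you could run first (standard TF 2.x API; this snippet is not part of the course files):

import tensorflow as tf

# One entry per visible GPU; an empty list means TensorFlow only found CPUs
print(tf.config.list_physical_devices('GPU'))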
In order to run the above example, we will create a batch script and submit it.
Example batch script for Kebnekaise, TensorFlow version 2.11.0 and Python version 3.10.4, and scikit-learn 1.1.2
#!/bin/bash
# Remember to change this to your own project ID after the course!
#SBATCH -A hpc2n2024-052
# We are asking for 5 minutes
#SBATCH --time=00:05:00
# Asking for one V100 GPU
#SBATCH --gres=gpu:v100:1

# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load GCC/11.3.0 Python/3.10.4 OpenMPI/4.1.4 TensorFlow/2.11.0-CUDA-11.7.0 scikit-learn/1.1.2

# Run your Python script
python example-tf.py
Example batch script for Snowy, Python version 3.11.8, and the python_ML_packages/3.11.8-gpu containing TensorFlow
#!/bin/bash -l
# Remember to change this to your own project ID after the course!
#SBATCH -A naiss2024-22-415
# We want to run on Snowy
#SBATCH -M snowy
# We are asking for 15 minutes
#SBATCH --time=00:15:00
#SBATCH --gres=gpu:1

# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load uppmax
module load python_ML_packages/3.11.8-gpu

# Run your Python script
python example-tf.py
Submit with sbatch <myjobscript.sh>. After submitting you will (as usual) be given the job-id for your job. You can check on the progress of your job with squeue -u <username> or scontrol show job <job-id>.
Note: if you are logged in to Rackham on UPPMAX and have submitted a GPU job to Snowy, then you need to use this to see the job queue:

squeue -M snowy -u <username>

The output and errors will in this case be written to slurm-<job-id>.out.
General
You almost always want to run several iterations of your machine learning code with changed parameters and/or added layers. If you are doing this in a batch job, it is easiest to either make a batch script that submits several variations of your Python script (changed parameters, changed layers), or make a script that loops over and submits jobs with the changes.
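As a sketch of the second approach, a small driver script could loop over a parameter grid and submit one job per combination. All names here (run_tf.sh, the parameter values) are hypothetical placeholders, not course files:

import itertools
import subprocess

# Hypothetical parameter grid for the training script
learning_rates = [0.01, 0.001]
batch_sizes = [32, 64]

for lr, bs in itertools.product(learning_rates, batch_sizes):
    # Each sbatch call submits an independent job; run_tf.sh is assumed
    # to forward its arguments ($1, $2) to the Python program.
    subprocess.run(["sbatch", "run_tf.sh", str(lr), str(bs)], check=True)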
Running several jobs from within one job
This example shows how you would run several programs or variations of programs sequentially within the same job:
Example batch script for Kebnekaise, TensorFlow version 2.11.0 and Python version 3.10.4
#!/bin/bash
# Remember to change this to your own project ID after the course!
#SBATCH -A hpc2n2024-052
# We are asking for 5 minutes
#SBATCH --time=00:05:00
# Asking for one V100
#SBATCH --gres=gpu:v100:1
# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load GCC/11.3.0 Python/3.10.4 OpenMPI/4.1.4 TensorFlow/2.11.0-CUDA-11.7.0
# Output to file - not needed if your job creates output in a file directly
# In this example I also copy the output somewhere else and then run another executable (or you could just run the same executable for different parameters).
python <my_tf_program.py> <param1> <param2> > myoutput1 2>&1
cp myoutput1 mydatadir
python <my_tf_program.py> <param3> <param4> > myoutput2 2>&1
cp myoutput2 mydatadir
python <my_tf_program.py> <param5> <param6> > myoutput3 2>&1
cp myoutput3 mydatadir
Example batch script for Snowy, TensorFlow version 2.15 and Python version 3.11.8.
#!/bin/bash -l
# Remember to change this to your own project ID after the course!
#SBATCH -A naiss2024-22-415
# We are asking for at least 1 hour
#SBATCH --time=01:00:01
#SBATCH -M snowy
#SBATCH --gres=gpu:1
# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load uppmax
module load python_ML_packages/3.11.8-gpu
# Output to file - not needed if your job creates output in a file directly
# In this example I also copy the output somewhere else and then run another executable (or you could just run the same executable for different parameters).
python tf_program.py 1 2 > myoutput1 2>&1
cp myoutput1 mydatadir
python tf_program.py 3 4 > myoutput2 2>&1
cp myoutput2 mydatadir
python tf_program.py 5 6 > myoutput3 2>&1
cp myoutput3 mydatadir
The challenge here is to adapt the above batch scripts to suitable Python scripts and directories.
Exercise
Try to modify the files pandas_matplotlib-linreg-<rackham/kebnekaise>.py and pandas_matplotlib-linreg-pretty-<rackham/kebnekaise>.py so they can be run from a batch job (change the pop-up plots to save-to-file).

Also change the batch script pandas_matplotlib.sh (or pandas_matplotlib-kebnekaise.sh) to run your modified Python codes.
Exercise
In this exercise you will be using the course environment that you prepared in the “Install packages” section (here: https://uppmax.github.io/HPC-python/install_packages.html#prepare-the-course-environment).
You will run the Python code simple_lightgbm.py found in the Exercises/examples/programs directory. The code was taken from https://github.com/microsoft/LightGBM/tree/master and lightly modified.
Try to write a batch script that runs this code. Remember to activate the course environment.
# coding: utf-8
from pathlib import Path
import pandas as pd
from sklearn.metrics import mean_squared_error
import lightgbm as lgb
print("Loading data...")
# load or create your dataset
df_train = pd.read_csv(str("regression.train"), header=None, sep="\t")
df_test = pd.read_csv(str("regression.test"), header=None, sep="\t")
y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)
# create dataset for lightgbm
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
# specify your configurations as a dict
params = {
"boosting_type": "gbdt",
"objective": "regression",
"metric": {"l2", "l1"},
"num_leaves": 31,
"learning_rate": 0.05,
"feature_fraction": 0.9,
"bagging_fraction": 0.8,
"bagging_freq": 5,
"verbose": 0,
}
print("Starting training...")
# train
gbm = lgb.train(
params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, callbacks=[lgb.early_stopping(stopping_rounds=5)]
)
print("Saving model...")
# save model to file
gbm.save_model("model.txt")
print("Starting predicting...")
# predict
y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)
# eval
rmse_test = mean_squared_error(y_test, y_pred) ** 0.5
print(f"The RMSE of prediction is: {rmse_test}")
Solution for UPPMAX
#!/bin/bash -l
# Change to your own project ID after the course!
#SBATCH -A naiss2024-22-415
# We are asking for 10 minutes
#SBATCH --time=00:10:00
#SBATCH -n 1

# Set a path where the example programs are installed.
# Change the below to your own path to where you placed the example programs
MYPATH=/proj/hpc-python/<mydir-name>/HPC-python/Exercises/examples/programs/

# Activate the course environment (assuming it was called vpyenv)
source /proj/hpc-python/<mydir-name>/<path-to-my-venv>/vpyenv/bin/activate

# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load uppmax
module load python/3.11.8

# Run your Python script
python $MYPATH/simple_lightgbm.py
Solution for HPC2N
#!/bin/bash
# Change to your own project ID after the course!
#SBATCH -A hpc2n2024-052
# We are asking for 10 minutes
#SBATCH --time=00:10:00
#SBATCH -n 1

# Set a path where the example programs are installed.
# Change the below to your own path to where you placed the example programs
MYPATH=/proj/nobackup/python-hpc/<mydir-name>/HPC-python/Exercises/examples/programs/

# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2

# Activate the course environment (assuming it was called vpyenv)
source /proj/nobackup/python-hpc/<mydir-name>/<path-to-my-venv>/vpyenv/bin/activate

# Run your Python script
python $MYPATH/simple_lightgbm.py
GPU
Numba is installed as a module at HPC2N, but not in a version compatible with the Python we are using in this example (3.9.6), so we will have to install it ourselves. The process is the same as in the examples given for the isolated/virtual environment, and we will be using the virtual environment created earlier. We also need NumPy, so we load SciPy-bundle as we have done before:
Load Python 3.9.6 and its prerequisites + SciPy-bundle + CUDA, then activate the virtual environment before installing numba
$ module load GCC/11.2.0 OpenMPI/4.1.1 Python/3.9.6 SciPy-bundle/2021.10 CUDA/11.7.0
$ python -m venv --system-site-packages vpyenv
$ source /proj/nobackup/python-hpc/bbrydsoe/vpyenv/bin/activate
(vpyenv) $ pip install --no-cache-dir --no-build-isolation numba
Collecting numba
  Downloading numba-0.56.0-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 38.7 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /pfs/proj/nobackup/fs/projnb10/python-hpc/bbrydsoe/vpyenv/lib/python3.9/site-packages (from numba) (63.1.0)
Requirement already satisfied: numpy<1.23,>=1.18 in /cvmfs/ebsw.hpc2n.umu.se/amd64_ubuntu2004_bdw/software/SciPy-bundle/2021.05-foss-2021a/lib/python3.9/site-packages (from numba) (1.20.3)
Collecting llvmlite<0.40,>=0.39.0dev0
  Downloading llvmlite-0.39.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.6/34.6 MB 230.0 MB/s eta 0:00:00
Installing collected packages: llvmlite, numba
Successfully installed llvmlite-0.39.0 numba-0.56.0

[notice] A new release of pip available: 22.1.2 -> 22.2.2
[notice] To update, run: pip install --upgrade pip

Let us try using it. We are going to use the following program for testing (it was taken from https://linuxhint.com/gpu-programming-python/ but there are also many great examples at https://numba.readthedocs.io/en/stable/cuda/examples.html):
Python example using Numba
import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# This should be a substantially high value.
NUM_ELEMENTS = 100000000

# This is the CPU version.
def vector_add_cpu(a, b):
    c = np.zeros(NUM_ELEMENTS, dtype=np.float32)
    for i in range(NUM_ELEMENTS):
        c[i] = a[i] + b[i]
    return c

# This is the GPU version. Note the @vectorize decorator. This tells
# numba to turn this into a GPU vectorized function.
@vectorize(["float32(float32, float32)"], target='cuda')
def vector_add_gpu(a, b):
    return a + b

def main():
    a_source = np.ones(NUM_ELEMENTS, dtype=np.float32)
    b_source = np.ones(NUM_ELEMENTS, dtype=np.float32)

    # Time the CPU function
    start = timer()
    vector_add_cpu(a_source, b_source)
    vector_add_cpu_time = timer() - start

    # Time the GPU function
    start = timer()
    vector_add_gpu(a_source, b_source)
    vector_add_gpu_time = timer() - start

    # Report times
    print("CPU function took %f seconds." % vector_add_cpu_time)
    print("GPU function took %f seconds." % vector_add_gpu_time)

    return 0

if __name__ == "__main__":
    main()
As before, we need a batch script to run the code. There are no GPUs on the login node.
Batch script to run the numba code (add-list.py) at Kebnekaise
#!/bin/bash
# Remember to change this to your own project ID after the course!
#SBATCH -A hpc2nXXXX-YYY
# We are asking for 5 minutes
#SBATCH --time=00:05:00
# Asking for one K80
#SBATCH --gres=gpu:k80:1

# Remove any loaded modules and load the ones we need
module purge > /dev/null 2>&1
module load GCC/11.2.0 OpenMPI/4.1.1 Python/3.9.6 SciPy-bundle/2021.10 CUDA/11.7.0

# Activate the virtual environment we installed to
source /proj/nobackup/support-hpc2n/bbrydsoe/vpyenv/bin/activate

# Run your Python script
python add-list.py
As before, submit with sbatch add-list.sh (assuming you called the batch script thus; change to fit your own naming style).
Numba example 2
An initial implementation of the 2D integration problem with the CUDA support for Numba could be as follows:
integration2d_gpu.py
from __future__ import division
from numba import cuda, float32
import numpy
import math
from time import perf_counter

# grid size
n = 100*1024
threadsPerBlock = 16
blocksPerGrid = int((n+threadsPerBlock-1)/threadsPerBlock)

# interval size (same for X and Y)
h = math.pi / float(n)

@cuda.jit
def dotprod(C):
    tid = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x

    if tid >= n:
        return

    # cumulative variable
    mysum = 0.0
    # fine-grain integration in the X axis
    x = h * (tid + 0.5)
    # regular integration in the Y axis
    for j in range(n):
        y = h * (j + 0.5)
        mysum += math.sin(x + y)

    C[tid] = mysum

# array for collecting partial sums on the device
C_global_mem = cuda.device_array((n), dtype=numpy.float32)

starttime = perf_counter()
dotprod[blocksPerGrid,threadsPerBlock](C_global_mem)
res = C_global_mem.copy_to_host()
integral = h**2 * sum(res)
endtime = perf_counter()

print("Integral value is %e, Error is %e" % (integral, abs(integral - 0.0)))
print("Time spent: %.2f sec" % (endtime-starttime))
The time for executing the kernel and doing some postprocessing of the outputs (copying the C array and doing a reduction) was 4.35 sec, which is much smaller than the 152 sec for the serial Numba code.
Notice the larger size of the grid in the present case (100*1024 = 102400) compared to the size we used in the serial case (10000). Large computations are necessary on GPUs to get the benefits of this architecture.
One can take advantage of the shared memory in a thread block to write faster code. Here, we rewrote the 2D integration example from the previous section so that the threads in a block write to a shared[] array. This array is then reduced (values added) and the result is collected in the array C. The entire code is here:
integration2d_gpu_shared.py
from __future__ import division
from numba import cuda, float32
import numpy
import math
from time import perf_counter

# grid size
n = 100*1024
threadsPerBlock = 16
blocksPerGrid = int((n+threadsPerBlock-1)/threadsPerBlock)

# interval size (same for X and Y)
h = math.pi / float(n)

@cuda.jit
def dotprod(C):
    # using the shared memory in the thread block
    shared = cuda.shared.array(shape=(threadsPerBlock), dtype=float32)

    tid = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    shrIndx = cuda.threadIdx.x

    if tid >= n:
        return

    # cumulative variable
    mysum = 0.0
    # fine-grain integration in the X axis
    x = h * (tid + 0.5)
    # regular integration in the Y axis
    for j in range(n):
        y = h * (j + 0.5)
        mysum += math.sin(x + y)

    shared[shrIndx] = mysum

    cuda.syncthreads()

    # reduction for the whole thread block
    s = 1
    while s < cuda.blockDim.x:
        if shrIndx % (2*s) == 0:
            shared[shrIndx] += shared[shrIndx + s]
        s *= 2
        cuda.syncthreads()

    # collecting the reduced value in the C array
    if shrIndx == 0:
        C[cuda.blockIdx.x] = shared[0]

# array for collecting partial sums on the device
C_global_mem = cuda.device_array((blocksPerGrid), dtype=numpy.float32)

starttime = perf_counter()
dotprod[blocksPerGrid,threadsPerBlock](C_global_mem)
res = C_global_mem.copy_to_host()
integral = h**2 * sum(res)
endtime = perf_counter()

print("Integral value is %e, Error is %e" % (integral, abs(integral - 0.0)))
print("Time spent: %.2f sec" % (endtime-starttime))
We need a batch script to run this Python code, an example script is here:
#!/bin/bash
#SBATCH -A project_ID
#SBATCH -t 00:05:00
#SBATCH -N 1
#SBATCH -n 28
#SBATCH -o output_%j.out # output file
#SBATCH -e error_%j.err # error messages
#SBATCH --gres=gpu:k80:2
#SBATCH --exclusive
ml purge > /dev/null 2>&1
ml GCCcore/11.2.0 Python/3.9.6
ml GCC/11.2.0 OpenMPI/4.1.1
ml CUDA/11.7.0
virtualenv --system-site-packages /proj/nobackup/<your-project-storage>/vpyenv-python-course
source /proj/nobackup/<your-project-storage>/vpyenv-python-course/bin/activate
python integration2d_gpu.py
The simulation time for this problem’s size was 1.87 sec.
Exercises
Run the first serial example from further up on the page for this short Python code (sum-2args.py)
import sys
x = int(sys.argv[1])
y = int(sys.argv[2])
sum = x + y
print("The sum of the two numbers is: {0}".format(sum))
Remember to give the two arguments to the program in the batch script.
Solution
This is for Kebnekaise. Adding the numbers 2 and 3.
#!/bin/bash
#SBATCH -A hpc2nXXXX-YYY # Change to your own after the course
#SBATCH --time=00:05:00 # Asking for 5 minutes
#SBATCH -n 1 # Asking for 1 core

# Load any modules you need, here for Python 3.9.6
module load GCC/11.2.0 OpenMPI/4.1.1 Python/3.9.6

# Run your Python script
python sum-2args.py 2 3