Using packages

Learning outcomes

  • Practice using the documentation of your HPC cluster

  • Can find and load a Python package module

  • Can determine if a Python package is installed

Why Python packages are important

Python packages are pieces of tested Python code. Prefer using a Python package over writing your own code.

Some definitions

  • Library: A collection of code used by a program.

  • Package: A library that has been made easily installable and reusable. Often published on public repositories such as the Python Package Index (PyPI).

  • Dependency: A requirement of another program, not included in that program.
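To make the term "dependency" concrete, here is a minimal sketch that asks an installed package which dependencies it declares. It uses only the standard library module importlib.metadata. The helper name declared_dependencies is hypothetical, and 'pandas' is just an example name; what is printed depends on which packages are available in your environment.

```python
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(package_name):
    """Return the dependencies declared by an installed package.

    Returns None if the package is not installed, and an empty list
    if it is installed but declares no dependencies.
    """
    try:
        declared = requires(package_name)
    except PackageNotFoundError:
        return None
    # requires() returns None when no dependencies are declared
    return list(declared or [])


if __name__ == "__main__":
    # 'pandas' is only an example; use any package name you like
    print(declared_dependencies("pandas"))
```

If the package is not installed, the function returns None instead of raising an error.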

What packages are out there

  • Core numerics libraries: e.g. numpy

  • Plotting: e.g. matplotlib and seaborn

  • Data analysis and other important core packages: e.g. pandas, dask, xarray

  • Interactive computing and human interface: e.g. Jupyter, spyder

  • Data format support and data ingestion: e.g. h5py

  • Speeding up code and parallelism: e.g. mpi4py, numba, dask

  • Machine learning: e.g. scikit-learn

  • Deep learning: e.g. pytorch, tensorflow, keras

Plan of the week:

  • Cover the use of the above packages in more or less detail

Why software modules are important on an HPC cluster

Software modules allow users of an HPC cluster to activate the software and packages they need, in the version they need. This helps to ensure reproducible research.

Where are the python packages?

Python packages can be included in a Python software module, provided by a bundle module, or may need to be installed by the user.

Cluster      Recommended Python module   Python packages
Dardel       cray-python                 Many installed in the Python module
Tetralith    Python                      Many installed in the Python module
Alvis        Python                      Packages beyond the core module are in bundle modules
Bianca       python                      Many installed in the Python module
Kebnekaise   Python                      Packages beyond the core module are in bundle modules
Pelle        Python                      Packages beyond the core module are in bundle modules
COSMOS       Python                      Packages beyond the core module are in bundle modules

About Python bundles from EasyBuild.

How to see which Python packages are installed

There are two ways to determine which Python packages are installed (with software modules loaded):

Where                       Command to run                             The package is present when …
On the command line         pip list                                   It shows up in the list
In the Python interpreter   import [package_name], e.g. import scipy   There is no error
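Both checks can also be done from within Python itself. The sketch below is one possible way, using only the standard library: installed_distributions gives a listing similar to pip list, and can_be_imported tests whether an import would find the module without actually importing it. Both function names are hypothetical helpers, not part of any cluster's tooling.

```python
import importlib.util
from importlib.metadata import distributions


def installed_distributions():
    """Return a sorted list of (name, version) pairs, similar to 'pip list'."""
    found = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            found.add((name, dist.version))
    return sorted(found, key=lambda pair: pair[0].lower())


def can_be_imported(module_name):
    """Return True if 'import module_name' would find a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name, version in installed_distributions():
        print(name, version)
    # 'scipy' is only an example; the answer depends on the loaded modules
    print("scipy can be imported:", can_be_imported("scipy"))
```

Note that the name shown in the listing can differ from the name used in the import statement: for example, the package scikit-learn is imported as sklearn.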

Exercises

Exercise 1: using Python packages

  • Log in to your HPC cluster

  • Load the Python module with the version listed below

HPC cluster   Python version
Alvis         3.12.3
Bianca        3.12.7
COSMOS        3.11.5
Dardel        3.11.7
Kebnekaise    3.11.3
LUMI          3.11.7
Pelle         3.12.3
Tetralith     3.11.5 (bare)

  • Confirm that the Python package, indicated in the table below, is absent. You can use any way to do so.

HPC cluster   Python package
Alvis         scipy
Bianca        tensorflow (CPU version)
COSMOS        scipy
Dardel        matplotlib
Kebnekaise    scipy
LUMI          matplotlib
Pelle         torch
Tetralith     scipy

  • Find the software module that provides the package. Use either the documentation of the HPC center or the module system.

  • Load the software module

  • Confirm that the package is now present

In all cases, the package is now installed. Well done!
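If you want to double-check your result, the following sketch verifies the two things the exercise asks for: that the running Python has the expected version, and where (if anywhere) the package would be loaded from. The function names are hypothetical, and the values "3.11.5" and "scipy" are just one row of the tables above; adapt them to your cluster.

```python
import importlib.util
import sys


def python_version_matches(expected):
    """Return True if the running Python has the expected 'X.Y.Z' version."""
    running = ".".join(str(part) for part in sys.version_info[:3])
    return running == expected


def package_location(module_name):
    """Return the origin of a top-level module, or None if it is not found.

    The origin is usually the path of the file that would be imported.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Example values for one row of the exercise tables; adapt to your cluster
    expected_version = "3.11.5"
    module_name = "scipy"
    print("Python version matches:", python_version_matches(expected_version))
    print("Location of", module_name, ":", package_location(module_name))
```

Run it once before and once after loading the software module. The reported path usually shows whether the package comes from a software module or from a user installation.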

Done?

When done, and if you haven’t done so yet, do “Use the tarball with exercises”.

After that, get acquainted with packages in the “See also” section.

Using a cluster with bundles (all but Dardel, Tetralith and Bianca)

Read about Python bundles from EasyBuild.

Discussion

  • Questions?

  • About Dardel?

  • The upcoming Arrhenius: probably a combination of Dardel, Kebnekaise and Alvis.