Install packages
Objectives
- Learn how to install a (general-purpose) Python package with pip
- Understand the limitations of this approach, e.g. use cases and best practices
Introduction
There are two main ways to install missing Python packages on an HPC cluster:
- Local installation, available whenever the Python version that was active during the installation is loaded:
pip install --user [package name]
- Isolated environment (see the next session).
Normally you want reproducibility, and the safe way to get it is to use isolated environments specific to each of your projects.
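If you are unsure whether the Python you are running belongs to an isolated environment, you can check from inside Python: in an environment created with venv or virtualenv, sys.prefix points at the environment while sys.base_prefix still points at the base installation. A minimal sketch (the helper name in_virtualenv is our own; conda environments are not detected by this check):

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a venv/virtualenv."""
    # Inside a venv, sys.prefix is the environment folder,
    # while sys.base_prefix is the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Isolated environment: {sys.prefix}")
    else:
        print(f"No virtual environment, base Python at: {sys.prefix}")
```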
Typical workflow
- Load the Python module with the correct version (how to do this differs among the clusters).
- Check that the right Python is used with
which python3
or
which python
- Double-check the version with
python3 -V
or
python -V
- Install with:
pip install --user [package name]
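- The package name can be pinned to an exact version, like
numpy==1.26.4
(note the double ==), or constrained, like
"numpy>1.22"
(quote this form, so that the shell does not treat > as an output redirection).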
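The checks above are done in the shell. If you prefer to confirm from inside Python that you are running the intended interpreter and that the version you pinned was actually installed, a minimal sketch could look like this (installed_version is a hypothetical helper name, and numpy is only an example package):

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Same information as `which python3` and `python3 -V`
    print("Interpreter:", sys.executable)
    print("Version:    ", platform.python_version())
    # Compare this with the pin you used, e.g. numpy==1.26.4
    print("numpy:      ", installed_version("numpy"))
```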
The package usually ends up in ~/.local/lib/python3.X/site-packages.
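You do not have to guess the exact folder: the standard site module reports it for the running Python (from the shell, python3 -m site --user-site prints the same path). A short sketch; user_site_dir is just a convenience name:

```python
import site


def user_site_dir() -> str:
    """Directory where `pip install --user` puts packages for this Python."""
    return site.getusersitepackages()


if __name__ == "__main__":
    print("User base:         ", site.getuserbase())
    print("User site-packages:", user_site_dir())
    # Whether this Python currently adds the user directory to sys.path
    print("User site enabled: ", site.ENABLE_USER_SITE)
```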
The target directory can be changed by adding --prefix=[root_folder of installation].
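With --prefix, pip installs under the given root, but Python does not search there by default, so you have to add the corresponding site-packages folder to PYTHONPATH yourself. Assuming the standard POSIX layout (the posix_prefix install scheme, typical on Linux clusters), that folder can be computed with the sysconfig module. The prefix path below is a hypothetical example:

```python
import sysconfig


def prefix_site_packages(prefix: str) -> str:
    """Pure-Python site-packages folder under an installation prefix.

    Assumes the 'posix_prefix' install scheme (typical on Linux).
    """
    return sysconfig.get_path("purelib", "posix_prefix", vars={"base": prefix})


if __name__ == "__main__":
    target = prefix_site_packages("/proj/myproject/python-packages")  # hypothetical path
    # Line to put in a job script or ~/.bashrc; the ${VAR:+...} form
    # only appends the old PYTHONPATH if it was already set.
    print(f'export PYTHONPATH="{target}${{PYTHONPATH:+:$PYTHONPATH}}"')
```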
Note
Note that if you install for 3.11.X, the package will not be seen by another minor version, like 3.12.X (and may not even be compatible with it).
Installing with Python 3.11.7, on the other hand, puts the package in the same folder as 3.11.5, so it can be used by both bugfix versions.
Naming convention: python/major.minor.bugfix
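The rule can be illustrated with a small sketch: the user folder only depends on major.minor, so module names that differ only in the bugfix number map to the same folder. The helper is hypothetical, it assumes the typical Linux location ~/.local/lib/python3.X/site-packages, and the module names are only example inputs (whether they exist on your cluster is not implied):

```python
def user_site_for_module(module_name: str) -> str:
    """Map a module name like 'python/3.11.5' to its --user install folder.

    Hypothetical helper: assumes the naming convention python/major.minor.bugfix
    and the typical Linux layout ~/.local/lib/python3.X/site-packages.
    """
    _, version = module_name.split("/", 1)
    major, minor = version.split(".")[:2]  # the bugfix number is ignored
    return f"~/.local/lib/python{major}.{minor}/site-packages"


if __name__ == "__main__":
    for name in ("python/3.11.5", "python/3.11.7", "python/3.12.3"):
        print(name, "->", user_site_for_module(name))
```

Running it prints the same folder for 3.11.5 and 3.11.7, and a different one for 3.12.3.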
Exercise
(optional) Exercise 1: Install a Python package you know of for an older Python version
- Load an older Python module (perhaps one you won’t use anymore).
- Install the Python package (it may already be there, but in an older version).
(You can always remove your local installation later if you regret it.)
We may add a solution in a future instance of the course.
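For the exercise, it can help to see which copy of a package Python would actually import, and whether it is your local one. A minimal sketch using only the standard library; the helper name is hypothetical and numpy is only an example package:

```python
import importlib.util
import site


def import_location(module_name: str):
    """Return (file the module would be imported from, is it in the user site?).

    Returns (None, None) if the top-level module cannot be found.
    """
    # find_spec locates the module without importing it
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None, None
    in_user_site = spec.origin.startswith(site.getusersitepackages())
    return spec.origin, in_user_site


if __name__ == "__main__":
    origin, local = import_location("numpy")
    print("numpy found at:     ", origin)
    print("local (--user) copy:", local)
```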
Already installed Python packages in HPC and ML
It is difficult to give an exhaustive list of useful packages for Python in HPC, but this list contains some of the more popular ones:
| Package | Module to load, UPPMAX | Module to load, HPC2N | Brief description |
|---|---|---|---|
| Dask | | | An open-source Python library for parallel computing. |
| Keras | | | An open-source library that provides a Python interface for artificial neural networks. Keras acts as an interface for both the TensorFlow and the Theano libraries. |
| Matplotlib | | | A plotting library for the Python programming language and its numerical mathematics extension NumPy. |
| mpi4py | Not installed | | MPI for Python package. The library provides Python bindings for the Message Passing Interface (MPI) standard. |
| Numba | | | An open-source, NumPy-aware JIT optimizing compiler for Python. It translates a subset of Python and NumPy into fast machine code using LLVM, and offers a range of options for parallelising Python code for CPUs and GPUs. |
| NumPy | | | A library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. |
| Pandas | | | Built on top of NumPy. Provides data structures (such as the DataFrame) and tools for data analysis; often used to prepare data sets for machine learning. |
| PyTorch/Torch | | | An ML library based on the Torch library. Mainly used for natural language processing and computer vision. |
| SciPy | | | An open-source library for scientific and technical computing. It extends NumPy with, e.g., optimization, linear algebra, signal and image processing, and differential equation solvers. |
| Seaborn | | Not installed | A statistical plotting library based on Matplotlib that works directly with Pandas data structures. Often used in ML to plot learning data. |
| Sklearn/scikit-learn | | | Built on NumPy and SciPy. Supports most of the classic supervised and unsupervised learning algorithms, and can also be used for data mining, modeling, and analysis. |
| StarPU | Not installed | | A task programming library for hybrid architectures, with C/C++/Fortran/Python APIs or OpenMP pragmas. |
| TensorFlow | | | Used in both DL and ML. Specializes in differentiable programming, meaning it can automatically compute a function’s derivatives within a high-level language. |
| Theano | Not installed | | A library for numerical computation designed for DL and ML applications. It allows users to define, optimise, and evaluate mathematical expressions, including those involving multi-dimensional arrays. |
Remember: to find out how to load one of the modules, which prerequisites need to be loaded, and which versions are available, use module spider <module> and module spider <module>/<version>.
Often you also need to load a separate module for the package, except when it is already included in the python or python_ML_packages modules at UPPMAX, or in SciPy-bundle at HPC2N.
NOTE that not all versions of Python will have all the above packages installed!
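Since availability differs between Python versions and modules, a quick way to see which of the packages above the currently loaded Python can find is importlib.util.find_spec, which locates a module without importing it. The import names below are the usual ones (e.g. sklearn for scikit-learn, torch for PyTorch); StarPU is left out because its Python import name is not given here. A minimal sketch:

```python
import importlib.util

# Usual import names for most of the packages in the table above
PACKAGES = ["dask", "keras", "matplotlib", "mpi4py", "numba", "numpy",
            "pandas", "torch", "scipy", "seaborn", "sklearn",
            "tensorflow", "theano"]


def available(names):
    """Map each top-level module name to True if this Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in available(PACKAGES).items():
        print(f"{name:12s} {'yes' if ok else 'no'}")
```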