Install packages
.. objectives::

   Learners can

   - work (create, activate, work, deactivate) with virtual environments
   - install a Python package
   - export and import a virtual environment
Introduction
There are 2-3 ways to install missing Python packages on an HPC cluster.

- Local installation, always available for the version of Python you had active when doing the installation

  - ``pip install --user [package name]``

- Isolated environment. Use some packages just needed for a specific use case.

  - ``venv``/``virtualenv`` in combination with ``pip``

    - recommended/working in all HPC centers in Sweden

  - ``conda``

    - just recommended in some HPC centers in Sweden
Local (general installation)
............................
.. note::

   ``pip install --user [package name]``

   - The package ends up in ``~/.local``
   - The target directory can be changed with ``--prefix=[root_folder of installation]`` (use it instead of ``--user``; the two options cannot be combined)
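To see where a ``--user`` installation ends up for the Python version you have loaded, you can ask the interpreter itself. The following is a minimal sketch using only the standard library ``site`` module; the printed paths depend on your system and on the loaded module.

```python
import site
import sys

# Root of per-user installations (normally ~/.local on Linux)
user_base = site.getuserbase()
# Directory where `pip install --user` places importable packages
user_site = site.getusersitepackages()

print("User base:", user_base)
print("User site-packages:", user_site)
# True if this interpreter actually searches the user site directory
print("On sys.path:", user_site in sys.path)
```

The last line can print ``False``, for instance inside a virtual environment created without ``--system-site-packages`` or when the directory does not exist yet.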
Isolated environments
.....................
As an example, suppose you have been using TensorFlow 1.x.x for your project and now need to install a package that requires TensorFlow 2.x.x, while another package still needs the old version of TensorFlow. This is easily solved with isolated environments.
.. note::

   Isolated/virtual environments solve a couple of problems:

   - You can install specific, also older, versions into them.
   - You can create one for each project, so it is no problem if two projects require different versions.
   - You can remove the environment and create a new one if it is no longer needed or has errors.
- Isolated environments let you create separate workspaces for different versions of Python and/or different versions of packages.
- You can activate and deactivate them one at a time, and work as if the other workspace does not exist.
The tools
- ``venv``: UPPMAX + HPC2N + LUNARC + NSC
- ``virtualenv``: UPPMAX + HPC2N + LUNARC + NSC
- Conda: LUNARC + UPPMAX (recommended only for the Bianca cluster)
.. warning::

   **About Conda on HPC systems**

   - Conda is good in many ways but can interact negatively with the Python modules in the HPC systems.
   - LUNARC seems to have working solutions.
   - At UPPMAX, Conda is installed, but many users run into problems.

     - However, on Bianca this is the most straightforward way to install packages (no ordinary internet access).
.. admonition:: Conda in HPC

   - `Anaconda at LUNARC <https://lunarc-documentation.readthedocs.io/en/latest/guides/applications/Python/#anaconda-distributions>`_
   - `Conda at UPPMAX <https://docs.uppmax.uu.se/software/conda/>`_
   - `Conda on Bianca <https://uppmax.github.io/bianca_workshop/intermediate/install/#install-packages-principles>`_
Virtual environment - venv & virtualenv
.. admonition:: Workflow

   1. You load the Python module you will be using, as well as any site-installed package modules (requires the ``--system-site-packages`` option later)
   2. You create the isolated environment with something like ``venv`` or ``virtualenv`` (use ``--system-site-packages`` to include all "non-base" packages)
   3. You activate the environment
   4. You install (or update) the environment with the packages you need
   5. You work in the isolated environment
   6. You deactivate the environment after use
.. admonition:: venv vs. virtualenv

   - These are almost completely interchangeable.
   - The difference is that **virtualenv** supports older Python versions and has a few more minor unique features, while **venv** is in the standard library.
   - Step 1:

     - Virtualenv: ``virtualenv --system-site-packages Example``
     - venv: ``python -m venv --system-site-packages Example2``

   - The next steps are identical and involve "activating" and ``pip install``.
   - We recommend ``venv`` in the course. Then we only need the Python module itself!
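Because ``venv`` is part of the standard library, the same step can also be driven from Python. The sketch below is only an illustration, not the recommended course workflow: it creates an environment named ``Example2`` in a temporary directory (an assumption made so that nothing is left behind) and prints the resulting ``pyvenv.cfg``, where the effect of ``--system-site-packages`` is recorded.

```python
import os
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "Example2"
    builder = venv.EnvBuilder(
        system_site_packages=True,    # same effect as --system-site-packages
        symlinks=(os.name != "nt"),   # mirrors the default of `python -m venv`
        with_pip=False,               # skip installing pip to keep the sketch fast
    )
    builder.create(env_dir)
    # pyvenv.cfg records how the environment was configured
    config_text = (env_dir / "pyvenv.cfg").read_text()
    print(config_text)
```

On the command line, ``python -m venv`` installs ``pip`` into the environment by default; it is switched off here only to keep the example quick.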
.. keypoints::

   - With a virtual environment you can tailor an environment with specific versions of Python and packages, without interfering with other installed Python versions and packages.
   - Make one for each project you have, for reproducibility.
   - There are different tools to create virtual environments.

     - ``conda``, only recommended for personal use and at some clusters
     - ``virtualenv``, may require loading extra Python bundle modules
     - ``venv``, most straightforward and available at all HPC centers. **Recommended**

   - More details to follow!
Example
.......
.. tip::

   Do not type along!

Create a ``venv``. First load the Python version you want to base your virtual environment on:
.. tabs::

   .. tab:: UPPMAX

      .. code-block:: console

         $ module load python/3.11.8
         $ python -m venv --system-site-packages Example2

      "Example2" is the name of the virtual environment. The directory "Example2" is created in the present working directory. The ``-m`` flag makes sure that you use the libraries from the Python version you are using.

   .. tab:: HPC2N

      .. code-block:: console

         $ module load GCC/12.3.0 Python/3.11.3
         $ python -m venv --system-site-packages Example2

      "Example2" is the name of the virtual environment. You can name it whatever you want. The directory "Example2" is created in the present working directory.

   .. tab:: LUNARC

      .. code-block:: console

         $ module load GCC/12.3.0 Python/3.11.3
         $ python -m venv --system-site-packages Example2

      "Example2" is the name of the virtual environment. You can name it whatever you want. The directory "Example2" is created in the present working directory.

   .. tab:: NSC

      .. code-block:: console

         $ ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5
         $ python -m venv --system-site-packages Example2

      "Example2" is the name of the virtual environment. You can name it whatever you want. The directory "Example2" is created in the present working directory.
.. note::

   To save space, you should load any other system-installed Python modules you will need before installing your own packages! Remember to choose ones that are compatible with the Python version you picked!
   ``--system-site-packages`` includes the packages already installed in the loaded Python module.

   At HPC2N, NSC and LUNARC, you often have to load SciPy-bundle. This is how you could create a venv (Example3) that includes a SciPy-bundle compatible with the loaded Python version (Python/3.11.3 at HPC2N and LUNARC, Python/3.11.5 at NSC):

   .. code-block:: console

      $ module load GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 # for HPC2N and LUNARC
      $ module load buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5 SciPy-bundle/2023.11 # for NSC
      $ python -m venv --system-site-packages Example3
NOTE: since a virtual environment may take up a fair amount of space if you install many Python packages into it, we strongly recommend that you place it in your project storage!

NOTE: if you need to work with, for instance, both Python 2 and 3, you can of course create more than one virtual environment. Just name them so you can easily remember which one has what.
.. admonition:: If you want your virtual environment in a certain place...

   - Example for the course project location, with ``$USER`` being your user name. If your directory in the project has another name, replace ``$USER`` with that one!

     - UPPMAX:

       - Create: ``python -m venv /proj/hpc-python-fall/$USER/Example``
       - Activate: ``source /proj/hpc-python-fall/<user-dir>/Example/bin/activate``

     - HPC2N:

       - Create: ``python -m venv /proj/nobackup/hpc-python-fall-hpc2n/$USER/Example``
       - Activate: ``source /proj/nobackup/hpc-python-fall-hpc2n/<user-dir>/Example/bin/activate``

     - LUNARC:

       - Create: ``python -m venv /lunarc/nobackup/projects/lu2024-17-44/$USER/Example``
       - Activate: ``source /lunarc/nobackup/projects/lu2024-17-44/<user-dir>/Example/bin/activate``

     - NSC:

       - Create: ``python -m venv /proj/hpc-python-fall-nsc/$USER/Example``
       - Activate: ``source /proj/hpc-python-fall-nsc/<user-dir>/Example/bin/activate``
Note that your prompt changes to start with ``(Example)`` to show that you are within an environment.
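Besides looking at the prompt, you can check from within Python whether the interpreter belongs to a virtual environment. This is a small sketch that relies on the standard ``sys.prefix`` and ``sys.base_prefix`` attributes; the helper name ``in_virtual_environment`` is made up for this illustration, and very old ``virtualenv`` releases may behave differently.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs from a virtual environment."""
    # In an environment, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the underlying Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter prefix:", sys.prefix)
print("Base installation: ", sys.base_prefix)
print("Inside a virtual environment:", in_virtual_environment())
```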
.. note::

   - ``source`` can most often be replaced by ``.``, like in ``. Example/bin/activate``. Note the important space after ``.``
   - For clarity we use the ``source`` style here.
Install packages to the virtual environment with pip
....................................................
.. tip::

   Do not type along!

Install your packages with ``pip``. While not always needed, it is often a good idea to specify the versions you want, to ensure compatibility with other packages you use. This example assumes your venv is activated:

.. code-block:: console

   (Example) $ pip install --no-cache-dir --no-build-isolation numpy matplotlib

The ``--no-cache-dir`` option is required to prevent ``pip`` from reusing earlier installations from the same user in a different environment. The ``--no-build-isolation`` option makes sure that it uses the loaded modules from the module system when building any Cython libraries.
Deactivate the venv.

.. code-block:: console

   (Example) $ deactivate
Every time you need the tools available in the virtual environment, you activate it as above (after also loading the modules).

.. prompt:: console

   source /proj/
.. note::

   - You can use ``pip list`` on the command line (after loading the Python module) to see which packages are available and which versions.
   - Some packages may be inherited from the modules you have loaded.
   - You can do ``pip list --local`` to see what is installed by you in the environment.
   - Some IDEs, like Spyder, may only find those "local" packages.
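A similar overview is available from within Python through the standard ``importlib.metadata`` module. The sketch below lists every distribution the running interpreter can see, so it corresponds roughly to ``pip list`` and not to ``pip list --local``; the helper name ``installed_packages`` is made up for this example.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' strings for all visible distributions."""
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            entries.add(f"{name}=={dist.version}")
    return sorted(entries, key=str.lower)


for line in installed_packages():
    print(line)
```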
Working with virtual environments defined from files
Creator/developer
.................
- First create and activate an environment (see above)
- Install packages with ``pip``
- Create a file from the present virtual environment:

.. code-block:: console

   $ pip freeze > requirements.txt

- That also includes the system site packages if you included them with ``--system-site-packages``
- Test that everything works by running use-case scripts within the environment
- You can list the packages specific to the virtualenv with ``pip list --local``
- So, to create a file from just the local environment:

.. code-block:: console

   $ pip freeze --local > requirements.txt
.. note::

   ``requirements.txt`` (used by the virtual environment) is a simple text file which looks similar to this::

      numpy
      matplotlib
      pandas
      scipy

   ``requirements.txt`` with versions could look like this::

      numpy==1.20.2
      matplotlib==3.2.2
      pandas==1.1.2
      scipy==1.6.2
Deactivate
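To check quickly which entries in a requirements file are pinned, the file can be read with a few lines of Python. This is a simplified sketch that only understands plain names and ``name==version`` lines like the ones shown above; it does not handle the full requirements syntax (other version specifiers, extras, URLs, options), and the function name ``parse_requirements`` is made up for this example.

```python
def parse_requirements(text):
    """Map package names to pinned versions (None when no '==' pin is given)."""
    pins = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


example = """\
numpy==1.20.2
matplotlib==3.2.2
pandas
scipy==1.6.2
"""

pins = parse_requirements(example)
print(pins)
unpinned = [name for name, version in pins.items() if version is None]
print("Unpinned packages:", unpinned)
```

Unpinned entries are the ones most likely to give a different environment when the file is installed again later.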
User
....
- Create an environment based on dependencies given in an environment file
- This can be done in a new virtual environment or as a general local installation (without activating any environment)

.. code-block:: console

   pip install -r requirements.txt

- Check

.. code-block:: console

   pip list
.. admonition:: More on dependencies

   - `Dependency management from course Python for Scientific computing <https://aaltoscicomp.github.io/python-for-scicomp/dependencies/>`_
.. admonition:: Python packages in HPC and ML
   :class: dropdown

   It is difficult to give an exhaustive list of useful packages for Python in HPC, but this list contains some of the more popular ones:

   .. list-table:: Popular packages
      :widths: 8 10 10 20
      :header-rows: 1

      * - Package
        - Module to load, UPPMAX
        - Module to load, HPC2N
        - Brief description
      * - Dask
        - ``python``
        - ``dask``
        - An open-source Python library for parallel computing.
      * - Keras
        - ``python_ML_packages``
        - ``Keras``
        - An open-source library that provides a Python interface for artificial neural networks. Keras acts as an interface for both the TensorFlow and the Theano libraries.
      * - Matplotlib
        - ``python`` or ``matplotlib``
        - ``matplotlib``
        - A plotting library for the Python programming language and its numerical mathematics extension NumPy.
      * - Mpi4Py
        - Not installed
        - ``SciPy-bundle``
        - MPI for Python package. The library provides Python bindings for the Message Passing Interface (MPI) standard.
      * - Numba
        - ``python``
        - ``numba``
        - An Open Source NumPy-aware JIT optimizing compiler for Python. It translates a subset of Python and NumPy into fast machine code using LLVM. It offers a range of options for parallelising Python code for CPUs and GPUs.
      * - NumPy
        - ``python``
        - ``SciPy-bundle``
        - A library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
      * - Pandas
        - ``python``
        - ``SciPy-bundle``
        - Built on top of NumPy. Responsible for preparing high-level data sets for machine learning and training.
      * - PyTorch/Torch
        - ``PyTorch`` or ``python_ML_packages``
        - ``PyTorch``
        - PyTorch is an ML library based on the C programming language framework, Torch. Mainly used for natural language processing or computer vision.
      * - SciPy
        - ``python``
        - ``SciPy-bundle``
        - Open-source library for data science. Extensively used for scientific and technical computations, because it extends NumPy (data manipulation, visualization, image processing, differential equations solver).
      * - Seaborn
        - ``python``
        - Not installed
        - Based on Matplotlib, but features Pandas’ data structures. Often used in ML because it can generate plots of learning data.
      * - Sklearn/SciKit-Learn
        - ``scikit-learn``
        - ``scikit-learn``
        - Built on NumPy and SciPy. Supports most of the classic supervised and unsupervised learning algorithms, and it can also be used for data mining, modeling, and analysis.
      * - StarPU
        - Not installed
        - ``StarPU``
        - A task programming library for hybrid architectures. C/C++/Fortran/Python API, or OpenMP pragmas.
      * - TensorFlow
        - ``TensorFlow``
        - ``TensorFlow``
        - Used in both DL and ML. Specializes in differentiable programming, meaning it can automatically compute a function’s derivatives within high-level language.
      * - Theano
        - Not installed
        - ``Theano``
        - For numerical computation designed for DL and ML applications. It allows users to define, optimise, and gauge mathematical expressions, which includes multi-dimensional arrays.

   Remember, in order to find out how to load one of the modules, which prerequisites need to be loaded, as well as which versions are available, use ``module spider <module>`` and ``module spider <module>/<version>``.

   Often, you also need to load a Python module, except in the cases where it is included in ``python`` or ``python_ML_packages`` at UPPMAX or with ``SciPy-bundle`` at HPC2N.

   NOTE that not all versions of Python will have all the above packages installed!
.. admonition:: Summary of workflow

   In addition to loading Python, you will also often need to load site-installed modules for Python packages, or use own-installed Python packages. The workflow would be something like this:

   1. Load Python and prerequisites: ``module load <pre-reqs> Python/<version>``
   2. Load site-installed Python packages (optional): ``module load <pre-reqs> <python-package>/<version>``
   3. Create the virtual environment: ``python -m venv [PATH]/Example``
   4. Activate your virtual environment: ``source <path-to-virt-env>/Example/bin/activate``
   5. Install any extra Python packages: ``pip install --no-cache-dir --no-build-isolation <python-package>``
   6. Start Python or run a Python script: ``python``
   7. Do your work
   8. Deactivate

   - Installed Python modules (modules and own-installed) can be accessed within Python with ``import <package>`` as usual.
   - The command ``pip list``, given on the command line with the modules loaded and the environment active, lists the packages available to import.
   - More about packages and virtual/isolated environments to follow in later sections of the course!
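Before starting a longer job, it can be useful to confirm from within Python that the packages you expect are actually visible in the active environment. A minimal sketch using the standard ``importlib.util.find_spec`` function follows; the helper name ``is_importable`` and the package names in the loop are only examples.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Example names only; replace them with the packages your project needs
for name in ["json", "numpy", "matplotlib"]:
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If a package shows up as missing, check that the right modules are loaded and that the intended environment is activated.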
Exercises
.. challenge:: 1. Make a test environment

   1. Make a virtual environment with the name ``venv1``. Do not include packages from the loaded module(s)
   2. Activate
   3. Install ``matplotlib``
   4. Make a requirements file of the content
   5. Deactivate
   6. Make another virtual environment with the name ``venv2``
   7. Activate that
   8. Install with the aid of the requirements file
   9. Check the content
   10. Open a Python shell from the command line and try to import
   11. Exit Python
   12. Deactivate
.. solution:: Solution
   :class: dropdown

   First load the required Python module(s) if you have not already done so in earlier lessons. Remember that these steps differ between the HPC centers.

   1. Make the first environment

      .. code-block:: console

         $ python -m venv venv1

   2. Activate it.

      .. code-block:: console

         $ source venv1/bin/activate

      - Note that your prompt changes to start with ``(venv1)`` to show that you are within an environment.

   3. Install ``matplotlib``

      .. code-block:: console

         pip install matplotlib

   4. Make a requirements file of the content

      .. code-block:: console

         pip freeze --local > requirements.txt

   5. Deactivate

      .. code-block:: console

         deactivate

   6. Make another virtual environment with the name ``venv2``

      .. code-block:: console

         python -m venv venv2

   7. Activate that

      .. code-block:: console

         source venv2/bin/activate

   8. Install with the aid of the requirements file

      .. code-block:: console

         pip install -r requirements.txt

   9. Check the content

      .. code-block:: console

         pip list

   10. Open a Python shell from the command line and try to import

       .. code-block:: console

          python

       .. code-block:: python

          import matplotlib

   11. Exit Python

       .. code-block:: python

          exit()

   12. Deactivate

       .. code-block:: console

          deactivate
Prepare for the course environments
...................................
.. note::

   - The centers have taken different approaches to what is included in the module system.
   - Therefore, completing the set of packages needed for the course lessons requires a different approach at each center.
   - This is left as an exercise for you.
We will need to install the LightGBM Python package for one of the examples in the ML section.
.. tip::

   Follow the track where you are working right now

Create a virtual environment called ``vpyenv``. First load the Python version you want to base your virtual environment on, as well as the site-installed ML packages.
.. tabs::

   .. tab:: NSC

      **If you do not have matplotlib already outside any virtual environment**

      - Install matplotlib in your ``.local`` folder, not in a virtual environment.
      - Do:

      .. code-block:: console

         ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5
         pip install --user matplotlib

      - Check that matplotlib is there with ``pip list``

      **Check where to find environments needed for the lessons in the afternoon tomorrow**

      - Browse ``/proj/hpc-python-fall-nsc/`` to see the available environments.
      - Their names are

        - ``venvNSC-TF``
        - ``venvNSC-torch``
        - ``venvNSC-numba``
        - ``venv-spyder-only``

   .. tab:: LUNARC

      - Everything will work by just loading modules, see each last section
      - An extra exercise can be to reproduce the examples above.

   .. tab:: UPPMAX

      **Check where to find environments needed for the lessons in the afternoon tomorrow**

      - Browse ``/proj/hpc-python-fall/`` to see the available environments.
      - Their names are, for instance

        - ``venv-spyder``
        - ``venv-TF``
        - ``venv-torch``

      - An extra exercise can be to reproduce the examples above.

   .. tab:: HPC2N

      **Check where to find possible environments needed for the lessons in the afternoon tomorrow**

      - Browse ``/proj/nobackup/hpc-python-fall-hpc2n/`` to see the available environments.
      - It may be empty for now, but environments may show up by tomorrow
      - Their names may be, for instance

        - ``venv-TF``
        - ``venv-torch``

      - An extra exercise can be to reproduce the examples above.
.. note::

   - To use self-installed Python packages in a batch script, you also need to load the above-mentioned modules and activate the environment. An example of this will follow later in the course.
   - To see which Python packages you, yourself, have installed, you can use ``pip list --user`` while the environment you have installed the packages in is active. To see all packages, use ``pip list``.
.. seealso::

   - UPPMAX's documentation pages about installing Python packages and virtual environments: http://docs.uppmax.uu.se/software/python/#installing-python-packages
   - HPC2N's documentation pages about installing Python packages and virtual environments: https://www.hpc2n.umu.se/resources/software/user_installed/python
.. keypoints::

   - With a virtual environment you can tailor an environment with specific versions of Python and packages, without interfering with other installed Python versions and packages.
   - Make one for each project you have, for reproducibility.
   - There are different tools to create virtual environments.

     - UPPMAX has ``conda``, ``venv`` and ``virtualenv``
     - HPC2N has ``venv`` and ``virtualenv``