Use isolated environments

Learning objectives

  • Practice using the documentation of your HPC cluster

  • Find out which isolated environment tool to use on your HPC cluster

  • Work (create, activate, work, deactivate) with isolated environments in the way recommended by your HPC cluster

  • (optional) work (create, activate, work, deactivate) with isolated environments in the other way (if any) possible on your HPC cluster

  • (optional) export and import a virtual environment

Isolated environments

  • As an example, maybe you have been using TensorFlow 1.x.x for your project and
    • now you need to install a package that requires TensorFlow 2.x.x

    • but you will still be needing the old version of TensorFlow for another package, for instance.

  • This is easily solved with isolated environments.

  • Another example is when a reviewer wants you to remake a figure.
    • You have already started to use a newer Python version or newer packages and

    • realise that your earlier script does not work anymore.

  • Having frozen the environment would have saved you from this issue!

Note

Isolated/virtual environments solve a couple of problems:

  • You can install specific, also older, package versions into them.

  • You can create one for each project, so it is no problem if two projects require different versions.

  • You can remove an environment and create a new one if it is no longer needed or has errors.

  • Good for reproducibility!

  • Isolated environments let you create separate workspaces for different versions of Python and/or different versions of packages.

  • You can activate and deactivate them one at a time, and work as if the other workspace does not exist.

The tools

  • Python’s built-in venv module: uses pip

  • virtualenv (can be installed): uses pip

  • conda/forge: uses conda/mamba

What happens at activation?

  • Python version is defined by the environment.
    • Check with which python; it should show a path inside the environment.

    • In conda you can define python version as well

    • Since venv is part of Python you will get the python version used when running the venv command.

  • Packages are defined by the environment.
    • Check with pip list

    • Conda can only see what you installed for it.

    • venv and virtualenv also see other packages if you allowed for that when creating the environment (--system-site-packages).

  • You can work in a Python shell or IDE (coming session)

  • You can run scripts dependent on packages now installed in your environment.
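The points above can also be checked from inside Python. The following is a minimal sketch, not part of the original course material; the function name describe_environment is made up for illustration. It relies only on the standard library: sys.prefix differs from sys.base_prefix inside a venv or virtualenv, and the activation scripts of venv and conda set the VIRTUAL_ENV and CONDA_DEFAULT_ENV variables respectively.

```python
import os
import sys


def describe_environment():
    """Collect the facts that change when an environment is activated."""
    return {
        # Path of the interpreter that is running this code.
        "executable": sys.executable,
        # Root directory of the current environment.
        "prefix": sys.prefix,
        # Inside a venv/virtualenv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by the venv "activate" script; None if no venv is activated.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
        # Set by conda on activation; None if conda is not in use.
        "CONDA_DEFAULT_ENV": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this before and after activation should show the executable and prefix moving into the environment directory.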

Warning

About Conda on HPC systems

  • Conda is good in many ways but can interact negatively when
    • using the python modules (module load) at the same time

    • having base environment always active

  • Not recommended at HPC2N

  • At the other clusters, handle with care!

  • However, on Bianca this is the most straight-forward way to install packages (no ordinary internet)

HPC cluster    Conda vs venv
Alvis          venv, conda in container
Bianca         conda/latest, venv via wharf
COSMOS         Anaconda3/2024.02-1
Dardel         miniconda3/24.7.1-0-cpeGNU-23.12
Kebnekaise     venv only
LUMI           venv, conda in container (conda-containerize)
Rackham        venv, conda/latest
Tetralith      Anaconda3/2024.02-1

Tip

  • Try with venv first

  • If very troublesome, try with conda

  • To use self-installed Python packages in a batch script, you also need to load the above mentioned modules and activate the environment. An example of this will follow later in the course.

  • To see which Python packages you, yourself, have installed, you can use pip list --user while the environment you have installed the packages in is active. To see all packages, use pip list.
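Whether packages installed with --user are visible depends on the active environment. As a small sketch using only the standard-library site module (the variable names are illustrative), you can ask the running interpreter where its user folder is and whether it is in use:

```python
import site
import sys

# Directory used by "pip install --user" (typically somewhere below ~/.local).
user_site = site.getusersitepackages()

# True if the user folder is searched for packages. It is False inside a
# venv created without --system-site-packages, and may be None if Python
# disabled it for security reasons.
user_site_enabled = site.ENABLE_USER_SITE

print("user site directory:", user_site)
print("user site enabled:  ", user_site_enabled)
print("on sys.path:        ", user_site in sys.path)
```

If the user site is not enabled, packages in that folder cannot be imported from the current environment.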

Other tools perhaps covered in the future

  • pixi: package management tool for developers
    • It allows the developer to install libraries and applications in a reproducible way. Use pixi cross-platform, on Windows, Mac and Linux.

    • could replace conda/mamba

  • uv: An extremely fast Python package and project manager, written in Rust.
    • A single tool to replace pip, pip-tools, pipx, poetry, pyenv, twine, virtualenv, and more

Virtual environment - venv & virtualenv

With this tool you can download and install with pip from the PyPI repository

Typical workflow

  1. Start from a Python version you would like to use (load the module):
    • This step differs between clusters since the module names differ.

  2. Load the Python module you will be using, as well as any site-installed package modules (requires the --system-site-packages option later)
    • module load <python module>

The next points will be the same for all clusters

  1. Create the isolated environment with something like python -m venv <name-of-environment>
    • use the --system-site-packages option to include all “non-base” packages

    • include the full path in the name if you want the environment to be stored other than in the “present working directory”.

  2. Activate the environment with source <path to virtual environment>/bin/activate

Note

  • source can most often be replaced by ., like in . Example/bin/activate. Note the important <space> after .

  • For clarity we use the source style here.

  1. Install (or update) the environment with the packages you need with the pip install command
    • note that --user must be omitted; otherwise the package will be installed in the global user folder instead of in the environment.

  2. Work in the isolated environment - When activated you can always continue to add packages!

  3. Deactivate the environment after use with deactivate
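To see what the create step actually produces, here is a minimal sketch that uses the standard-library venv module directly, which is the module that python -m venv runs. It builds a throwaway environment in a temporary directory (the name Example and the helper create_environment are only illustrative), skips the pip bootstrap to stay fast, and prints the resulting configuration file.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir, system_site_packages=False):
    """Create a virtual environment, like 'python -m venv <env_dir>'."""
    # with_pip=False skips installing pip, which keeps this demo quick.
    venv.create(env_dir, system_site_packages=system_site_packages, with_pip=False)
    return Path(env_dir)


with tempfile.TemporaryDirectory() as tmp:
    env = create_environment(Path(tmp) / "Example")
    # pyvenv.cfg records the base Python and the site-packages setting.
    cfg = (env / "pyvenv.cfg").read_text()
    created = sorted(p.name for p in env.iterdir())

print(cfg)
print(created)
```

Passing system_site_packages=True corresponds to the --system-site-packages option and flips the include-system-site-packages line in pyvenv.cfg.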

Note

To save space, you should load any other Python modules you will need that are system installed before installing your own packages! Remember to choose ones that are compatible with the Python version you picked!

--system-site-packages includes the packages already installed in the loaded python module.

At HPC2N, NSC and LUNARC, you often have to load SciPy-bundle. This is how you could create a venv (Example) on Tetralith (NSC) that includes a SciPy-bundle compatible with Python/3.11.5:

$ module load buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5 SciPy-bundle/2023.11 # for NSC
$ python -m venv --system-site-packages Example

Drawbacks

  • Only works for Python environments

  • Only works with Python versions already installed

Example NSC

ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5
which python
python -V
cd /proj/hpc-python-spring-naiss/users/<username>
python -m venv env-matplotlib
source env-matplotlib/bin/activate
pip install matplotlib
python
>>> import matplotlib

Note

  • You can use “pip list” on the command line (after loading the python module) to see which packages are available and which versions.

  • Some packages may be inherited from the modules you have loaded

  • You can do pip list --local to see what is installed by you in the environment.

  • Some IDEs, like Spyder, may only find those “local” packages
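The difference between inherited and local packages can be inspected from Python as well. The sketch below is a rough approximation of what pip list --local reports, not pip's own implementation: it uses importlib.metadata from the standard library and treats a package as local if it sits below sys.prefix. The function name list_packages is made up for this example.

```python
import sys
from importlib import metadata
from pathlib import Path


def list_packages():
    """Return (name, version, is_local) for every visible distribution."""
    env_root = Path(sys.prefix).resolve()
    rows = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip entries with broken metadata
        # Directory that contains the package (usually a site-packages folder).
        location = Path(str(dist.locate_file(""))).resolve()
        # "Local" here means: stored below the current environment's root.
        is_local = env_root in location.parents
        rows.append((name, dist.version, is_local))
    return sorted(rows, key=lambda row: row[0].lower())


rows = list_packages()
for name, version, is_local in rows:
    origin = "local" if is_local else "inherited"
    print(f"{name:30} {version:15} {origin}")
```

Inside a venv created with --system-site-packages, packages coming from the loaded modules should show up as inherited.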

Conda

  • Conda installs packages as well as larger toolkits, and is also useful for R packages and C/C++ installations.

  • Conda creates isolated environments that do not clash with other installations of python or other versions of packages.

  • A Conda environment requires that you install all the packages you need yourself.
    • That is, you cannot load the python module and use the packages therein inside your Conda environment.

Warning

  • Conda is known to create many small files. Your disk space is not only limited in GB, but also in number of files (typically 300000 in $HOME).

  • Check your disk usage and quota limit

  • Do a conda clean -a once in a while to remove unused and unnecessary files
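Each cluster has its own quota command, so as a cluster-independent sketch, the following counts files and bytes below a directory using only the standard library. The helper name count_files is hypothetical, and the demo runs on a temporary directory with three small files rather than on a real conda folder.

```python
import os
import tempfile
from pathlib import Path


def count_files(root):
    """Return (number of files, total size in bytes) below root."""
    n_files = 0
    n_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            try:
                # lstat does not follow symlinks, so linked files are not counted twice.
                n_bytes += os.lstat(os.path.join(dirpath, filename)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            n_files += 1
    return n_files, n_bytes


# Demo on a throwaway directory: three folders with one 10-byte file each.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        sub = Path(tmp) / f"pkg{i}"
        sub.mkdir()
        (sub / "data.txt").write_text("x" * 10)
    demo_files, demo_bytes = count_files(tmp)

print(demo_files, demo_bytes)
```

To check a real location you could call, for example, count_files(Path.home() / ".conda") and compare the file count with your quota.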

Tip

  • Conda environments, which include many small files, are by default stored in the ~/.conda folder in your $HOME directory, where storage is limited.

  • Move your .conda directory to your project folder and make a soft link to it from $HOME

  • Do the following (mkdir -p does not report an error and will not recreate the folder if it already exists):
    • (replace what is inside <> with relevant path)

  • Solution 1

    This works nicely if you have several projects. Then you can change these variables according to what you are currently working with.

export CONDA_ENVS_PATH="path/to/your/project/(subdir)"
export CONDA_PKGS_DIRS="path/to/your/project/(subdir)"
  • Solution 2

    • This is not good if you have several projects.

$ mkdir -p ~/.conda
$ mv ~/.conda /<path-to-project-folder>/<username>/
$ ln -s /<path-to-project-folder>/<username>/.conda ~/.conda

Typical workflow

The first 2 steps are cluster dependent and will therefore be slightly different.

  1. Make conda available from a software module, like ml load conda or similar, or use your own installation of miniconda or miniforge.

  2. First time

Next steps are the same for all clusters

  1. Create the conda environment conda create -n <name-of-env>

    • You can define the packages to be installed here already.

    • If you want another Python version, you have to define it here, like: conda create -n <name-of-env> python=<version>

  2. Activate the conda environment by: source activate <conda-env-name>

  3. Install the packages with conda install ... or pip install ...

  4. Now do your work!

    • When activated you can always continue to add packages!

  5. Deactivate

conda deactivate

Conda base env

  • When conda is loaded you will by default be in the base environment, which works in the same way as other conda environments. It includes a Python installation and some core system libraries and dependencies of Conda. It is a “best practice” to avoid installing additional packages into your base software environment.

Warning

Install from file

  • The centers have taken different approaches to what is included in the module system and what is not.

  • Therefore, different approaches are needed to install the packages required for the course lessons.

  • This is left as an exercise for you, see Exercise 4.

venv

pip install -r requirements.txt

conda

conda env create -f environment.yaml
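A requirements file for pip is essentially a list of name==version lines. As a sketch of where such a list comes from, the following builds one from the packages visible to the running interpreter using importlib.metadata. It is a simplified stand-in for pip freeze, not pip's own code, and the function name freeze_lines is invented for this example.

```python
from importlib import metadata


def freeze_lines():
    """Return 'name==version' lines for all visible distributions."""
    pins = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken metadata
            pins[name] = dist.version
    # Sort by package name, ignoring case, for a stable file.
    return [f"{name}=={pins[name]}" for name in sorted(pins, key=str.lower)]


lines = freeze_lines()
print("\n".join(lines))
```

Writing these lines to requirements.txt gives a file that pip install -r can read; in practice pip freeze handles more cases, such as packages installed from a URL or in editable mode.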

Exercises

Exercise 0: Make a decision between venv or conda.

  • We recommend Conda for LUNARC.

  • We recommend venv for HPC2N

  • Otherwise, there is some kind of documentation at every site.

  • venv “should” work everywhere but has not been fully tested

Breakout room according to grouping

Exercise 2: Prepare the course environment

There will be a mix of conda and venv at all clusters except for HPC2N where all is venv

  1. Let’s make a Spyder installation in a conda environment

module load Miniforge/24.7.1-2-hpc1
export CONDA_PKGS_DIRS=/proj/hpc-python-spring-naiss/$USER
export CONDA_ENVS_PATH=/proj/hpc-python-spring-naiss/$USER
mamba create -n spyder-env spyder
mamba activate spyder-env

If you do not have matplotlib already outside any virtual environment

  • Install matplotlib in your .local folder, not in a virtual environment.

  • Do:

ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5
pip install --user matplotlib
  • Check that matplotlib is there by pip list

We will put requirements files in the course project folder that you can build from in later lessons

  • These will cover

    • TensorFlow

    • PyTorch

    • numba

(Optional) Exercise 3: Install package with venv

  • Choose a track below

  • Confirm package is absent

  • Create environment

  • Activate environment

  • Confirm package is absent

  • Install package in isolated environment

  • Confirm package is now present

  • Deactivate environment

  • Confirm package is now absent again

NOTE: since it may take up a bit of space if you are installing many Python packages to your isolated environment, we strongly recommend you place it in your project storage!

Create a venv. First load the python version you want to base your virtual environment on:

$ module load python/3.11.8
$ python -m venv --system-site-packages Example2

“Example2” is the name of the virtual environment. The directory “Example2” is created in the present working directory. The -m flag runs the venv module that belongs to the python you have loaded, so the environment is based on that Python version.

Note

  • source can most often be replaced by ., like in . Example/bin/activate. Note the important <space> after .

  • For clarity we use the source style here.

Install your packages with pip. While not always needed, it is often a good idea to give the correct versions you want, to ensure compatibility with other packages you use. This example assumes your venv is activated:

(Example2) $ pip install --no-cache-dir --no-build-isolation numpy matplotlib

The --no-cache-dir option is required to prevent pip from reusing earlier installations from the same user in a different environment. The --no-build-isolation option makes sure that pip uses the loaded modules from the module system when building any Cython libraries.

Deactivate the venv.

(Example2) $ deactivate

Every time you need the tools available in the virtual environment you activate it as above (after also loading the modules).

source /proj/<your-project-id>/<your-dir>/Example2/bin/activate

(optional) Exercise 4: Make a test environment and share it (venv)

Read here

  1. make a virtual environment with the name venv1. Do not include packages from the loaded module(s)

  2. activate

  3. install matplotlib

  4. make a requirements file of the content

  5. deactivate

  6. make another virtual environment with the name venv2

  7. activate that

  8. install with the aid of the requirements file

  9. check the content

  10. open python shell from command line and try to import matplotlib

  11. exit python

  12. deactivate
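For the "check the content" step, one option is to compare the requirements file against what is installed in the second environment. The sketch below is an assumption-laden helper, not a course tool: check_requirements is a made-up name, it only understands plain name==version lines (no extras, markers or version ranges), and the package name in the demo is deliberately one that should not exist.

```python
from importlib import metadata


def check_requirements(lines):
    """Compare 'name==version' pins with the installed packages."""
    report = {}
    for raw in lines:
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line or "==" not in line:
            continue  # only simple pins are handled in this sketch
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {found}, wanted {wanted}"
    return report


# Hypothetical content of a requirements file, given as a list of lines.
example = [
    "# demo pins",
    "surely-not-installed-package-xyz==1.0",
]
report = check_requirements(example)
print(report)
```

With a real file you could pass open("requirements.txt").read().splitlines() while venv2 is activated.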

(optional) Exercise 4b. Make a test environment (conda)

(optional) Exercise 5: like 3, but for other tool (venv/conda)

Summary

Keypoints

  • With a virtual environment you can tailor an environment with specific versions for Python and packages, not interfering with other installed python versions and packages.

  • Make one for each project you have, for reproducibility.

  • There are different tools to create virtual environments.
    • venv, most straight-forward and available at all HPC centers. Recommended

    • conda, only recommended for personal use and at some clusters

See also