Install packages

Objectives

Learners can

  • work with virtual environments (create, activate, work in, deactivate)

  • install a Python package

  • export and import a virtual environment

Introduction

There are two or three ways to install missing Python packages on an HPC cluster.

  • Local installation, always available for the version of Python that was active when you did the installation
    • pip install --user [package name]

  • Isolated environment, holding just the packages needed for a specific use case.
    • venv/virtualenv in combination with pip
      • recommended, and working, at all HPC centers in Sweden

    • conda
      • recommended only at some HPC centers in Sweden

Local (general installation)

Note

pip install --user [package name]

  • The package ends up in ~/.local

  • The target directory can be changed with --prefix=[root_folder of installation], given instead of --user (pip does not accept the two options together)
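
To see where a --user installation ends up for the Python version that is currently active, the standard library site module can be asked directly. This is a minimal sketch. The printed paths depend on your system and on whether the PYTHONUSERBASE environment variable is set.

```python
import site
import sys

# Base directory used by "pip install --user" (normally ~/.local on Linux).
user_base = site.getuserbase()

# Version-specific directory below the base where the packages are placed.
user_site = site.getusersitepackages()

print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print("user base:", user_base)
print("user site-packages:", user_site)
```

Because the site-packages path is specific to the Python version, a package installed this way is only found by the version that was active during the installation.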

Isolated environments

As an example, suppose you have been using TensorFlow 1.x.x for your project and now need to install a package that requires TensorFlow 2.x.x, while another package still needs the old version of TensorFlow. This is easily solved with isolated environments.

Note

Isolated/virtual environments solve a couple of problems:

  • You can install specific versions, including older ones, into them.

  • You can create one for each project, so it is no problem if two projects require different versions.

  • You can remove an environment and create a new one if it is no longer needed or has errors.

  • Isolated environments let you create separate workspaces for different versions of Python and/or different versions of packages.

  • You can activate and deactivate them one at a time, and work as if the other workspaces do not exist.

The tools

  • venv: UPPMAX, HPC2N, LUNARC, NSC

  • virtualenv: UPPMAX, HPC2N, LUNARC, NSC

  • Conda: LUNARC, UPPMAX (recommended only for the Bianca cluster)

Warning

About Conda on HPC systems

  • Conda is good in many ways but can interact negatively with the Python modules in the HPC systems.

  • LUNARC seems to have working solutions.

  • At UPPMAX Conda is installed, but many users run into problems.
    • However, on Bianca this is the most straightforward way to install packages (no ordinary internet access).

Virtual environment - venv & virtualenv

Workflow

  1. You load the Python module you will be using, as well as any site-installed package modules (these require the --system-site-packages option in the next step)

  2. You create the isolated environment with a tool such as venv or virtualenv (use --system-site-packages to include all “non-base” packages)

  3. You activate the environment

  4. You install (or update) the environment with the packages you need

  5. You work in the isolated environment

  6. You deactivate the environment after use
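
The creation step can also be sketched from within Python using the standard library function venv.create, which does the same job as python -m venv. The temporary directory and the with_pip=False setting are choices made only to keep this illustration quick and self-cleaning. In real use you would pick a permanent path and let pip be installed.

```python
import tempfile
import venv
from pathlib import Path

# Use a temporary directory so that nothing is left behind afterwards.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "Example"

    # system_site_packages=True corresponds to --system-site-packages.
    # with_pip=False skips installing pip, which keeps the sketch fast.
    venv.create(env_dir, system_site_packages=True, with_pip=False)

    # pyvenv.cfg records how the environment was created.
    config_text = (env_dir / "pyvenv.cfg").read_text()
    print(config_text)
```

The printed pyvenv.cfg shows which Python the environment is based on and whether the system site packages are included.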

venv vs. virtualenv

  • These are almost completely interchangeable.

  • The difference is that virtualenv supports older Python versions and has a few more minor unique features, while venv is in the standard library.

  • Step 1:
    • virtualenv: virtualenv --system-site-packages Example

    • venv: python -m venv --system-site-packages Example2

  • The next steps are identical and involve “activating” and pip installs.

  • We recommend venv in the course. Then we only need the Python module itself!

Keypoints

  • With a virtual environment you can tailor an environment with specific versions of Python and packages, without interfering with other installed Python versions and packages.

  • Make one for each project you have, for reproducibility.

  • There are different tools to create virtual environments.
    • conda, only recommended for personal use and at some clusters

    • virtualenv, may require loading extra Python bundle modules

    • venv, the most straightforward and available at all HPC centers. Recommended

  • More details to follow!

Example

Tip

Do not type along!

Create a venv. First load the Python version you want to base your virtual environment on:

$ module load python/3.11.8
$ python -m venv --system-site-packages Example2

“Example2” is the name of the virtual environment. The directory “Example2” is created in the present working directory. The -m flag runs the venv module with the python you have loaded, so the environment is based on that Python version and its libraries.

Note

To save space, load any other system-installed Python modules you will need before installing your own packages! Remember to choose ones that are compatible with the Python version you picked! The --system-site-packages option includes the packages already installed in the loaded Python module.

At HPC2N, NSC and LUNARC, you often have to load SciPy-bundle. This is how you could create a venv (Example3) with a SciPy-bundle included which is compatible with Python/3.11.3:

$ module load GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 # for HPC2N and LUNARC
$ module load buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5 SciPy-bundle/2023.11 # for NSC
$ python -m venv --system-site-packages Example3

NOTE: since a virtual environment may take up a fair amount of space if you install many Python packages into it, we strongly recommend that you place it in your project storage!

NOTE: if you need to work with, for instance, both Python 2 and 3, you can of course create more than one virtual environment. Just name them so you can easily remember which one has what.

If you want your virtual environment in a certain place…

  • Example for the course project location, with $USER being your user name.
    • If your directory in the project has another name, replace $USER with that name!

  • UPPMAX:
    • Create: python -m venv /proj/hpc-python-fall/$USER/Example

    • Activate: source /proj/hpc-python-fall/<user-dir>/Example/bin/activate

  • HPC2N:
    • Create: python -m venv /proj/nobackup/hpc-python-fall-hpc2n/$USER/Example

    • Activate: source /proj/nobackup/hpc-python-fall-hpc2n/<user-dir>/Example/bin/activate

  • LUNARC:
    • Create: python -m venv /lunarc/nobackup/projects/lu2024-17-44/$USER/Example

    • Activate: source /lunarc/nobackup/projects/lu2024-17-44/<user-dir>/Example/bin/activate

  • NSC:
    • Create: python -m venv /proj/hpc-python-fall-nsc/$USER/Example

    • Activate: source /proj/hpc-python-fall-nsc/<user-dir>/Example/bin/activate

Note that your prompt changes to start with (Example) to show that you are within an environment.
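
Besides looking at the prompt, you can check from inside Python whether the interpreter belongs to a virtual environment. This is a minimal sketch that relies on sys.prefix and sys.base_prefix. The function name in_virtual_environment is only an illustrative choice.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("inside a virtual environment:", in_virtual_environment())
print("environment directory:", sys.prefix)
```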

Note

  • source can most often be replaced by ., like in . Example/bin/activate. Note the important <space> after .

  • For clarity we use the source style here.

Install packages to the virtual environment with pip

Tip

Do not type along!

Install your packages with pip. While not always needed, it is often a good idea to specify the versions you want, to ensure compatibility with other packages you use. This example assumes your venv is activated:

(Example) $ pip install --no-cache-dir --no-build-isolation numpy matplotlib

The --no-cache-dir option is required to prevent pip from reusing earlier installations from the same user in a different environment. The --no-build-isolation option makes sure that pip uses the loaded modules from the module system when building any Cython libraries.
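
If you want to drive the installation from a Python script, the same command can be assembled as a list. The sketch below only builds and prints the command and does not run it. The helper name build_pip_command is hypothetical, and the pinned versions are borrowed from the requirements.txt example in this lesson.

```python
import sys


def build_pip_command(requirements):
    """Return the pip command line as a list; nothing is executed here."""
    return [
        # sys.executable is the python that is running this script, so
        # "-m pip" targets the currently active environment.
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir", "--no-build-isolation",
        *requirements,
    ]


command = build_pip_command(["numpy==1.20.2", "matplotlib==3.2.2"])
print(" ".join(command))
```

Passing the list to subprocess.run(command, check=True) would perform the actual installation.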

Deactivate the venv.

(Example) $ deactivate

Every time you need the tools available in the virtual environment, activate it as above (after also loading the modules).

source /proj/<your-project-id>/<your-dir>/Example/bin/activate

Note

  • You can use “pip list” on the command line (after loading the Python module) to see which packages are available and which versions.

  • Some packages may be inherited from the modules you have loaded.

  • You can do pip list --local to see what you have installed in the environment.

  • Some IDEs, like Spyder, may only find those “local” packages.
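
A similar overview can be obtained from inside Python with the standard library module importlib.metadata. The sketch below lists every distribution the running interpreter can see, which roughly corresponds to pip list rather than pip list --local. The function name installed_packages is an illustrative choice.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for every visible distribution."""
    found = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        if name:  # skip entries with missing or broken metadata
            found.add((name, meta.get("Version")))
    return sorted(found, key=lambda item: item[0].lower())


for name, version in installed_packages():
    print(f"{name:30} {version}")
```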

Working with virtual environments defined from files

Creator/developer

  • First create and activate an environment (see above)

  • Install packages with pip

  • Create a file from the present virtual environment:

$ pip freeze > requirements.txt
  • That also includes the system site packages if you included them with --system-site-packages

  • Test that everything works by running your use-case scripts within the environment

  • You can list the packages specific to the virtual environment with pip list --local

  • So, to create a file from just the local environment:

$ pip freeze --local > requirements.txt

Note

requirements.txt (used by the virtual environment) is a simple text file which looks similar to this:

numpy
matplotlib
pandas
scipy

A requirements.txt with versions could look like this:

numpy==1.20.2
matplotlib==3.2.2
pandas==1.1.2
scipy==1.6.2
  • Deactivate
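
The format of requirements.txt is simple enough to read with a few lines of Python. The following sketch handles only bare package names and name==version pins, as in the examples above. Other specifiers that pip accepts (such as >=, extras, or URLs) are deliberately not covered, and parse_requirements is a hypothetical helper name.

```python
def parse_requirements(text):
    """Map package name -> pinned version (None when no version is given)."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


example = """\
numpy==1.20.2
matplotlib==3.2.2
pandas
scipy==1.6.2
"""
print(parse_requirements(example))
```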

User

  • Create an environment based on the dependencies given in a requirements file

  • This can be done in a new virtual environment or as a general local installation (without activating any environment)

pip install -r requirements.txt
  • Check

pip list
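
As a complement to reading the pip list output by eye, the installed versions can be compared with the wanted ones from inside Python. This is a sketch under the assumption that the pins are given as a plain dictionary. check_pins is a hypothetical helper, and it compares version strings literally, so differently written but equivalent versions would be reported as mismatches.

```python
from importlib import metadata


def check_pins(pins):
    """Return a list of problems; an empty list means every pin is satisfied."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if wanted is not None and installed != wanted:
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems


# Pins in the same form as the requirements.txt example shown earlier.
for problem in check_pins({"numpy": "1.20.2", "matplotlib": "3.2.2"}):
    print(problem)
```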

Summary of workflow

In addition to loading Python, you will also often need to load site-installed modules for Python packages, or use Python packages you have installed yourself. The workflow would be something like this:

  1. Load Python and prerequisites: module load <pre-reqs> Python/<version>

  2. Load site-installed Python packages (optional): module load <pre-reqs> <python-package>/<version>

  3. Create the virtual environment: python -m venv [PATH]/Example

  4. Activate your virtual environment: source <path-to-virt-env>/Example/bin/activate

  5. Install any extra Python packages: pip install --no-cache-dir --no-build-isolation <python-package>

  6. Start Python or run a Python script: python

  7. Do your work

  8. Deactivate

  • Installed Python packages (from modules and self-installed) can be accessed within Python with import <package> as usual.

  • The command pip list, given on the command line (not inside the Python interpreter), lists the packages available to import.

  • More about packages and virtual/isolated environment to follow in later sections of the course!

Exercises

1. Make a test environment

  1. make a virtual environment with the name venv1. Do not include packages from the loaded module(s)

  2. activate

  3. install matplotlib

  4. make a requirements file of the content

  5. deactivate

  6. make another virtual environment with the name venv2

  7. activate that

  8. install with the aid of the requirements file

  9. check the content

  10. open a Python shell from the command line and try to import matplotlib

  11. exit python

  12. deactivate

Prepare for the course environments

Note

  • The centers have taken different approaches to what is included in the module system.

  • Therefore, different approaches are needed at each center to complete the set of packages needed for the course lessons.

  • This is left as an exercise for you.

We will need to install the LightGBM Python package for one of the examples in the ML section.

Tip

Follow the track where you are working right now

Create a virtual environment called vpyenv. First load the Python version you want to base your virtual environment on, as well as the site-installed ML packages.

If you do not already have matplotlib available outside any virtual environment:

  • Install matplotlib in your .local folder, not in a virtual environment.

  • Do:

ml buildtool-easybuild/4.8.0-hpce082752a2 GCC/13.2.0 Python/3.11.5
pip install --user matplotlib
  • Check that matplotlib is there with pip list

Check where to find the environments needed for the lessons tomorrow afternoon

  • Browse /proj/hpc-python-fall-nsc/ to see the available environments.

  • Their names are
    • venvNSC-TF

    • venvNSC-torch

    • venvNSC-numba

    • venv-spyder-only

Note

  • To use self-installed Python packages in a batch script, you also need to load the above-mentioned modules and activate the environment. An example of this will follow later in the course.

  • To see which Python packages you, yourself, have installed in an environment, use pip list --local while that environment is active. Packages installed with pip install --user are listed by pip list --user. To see all packages, use pip list.

Keypoints

  • With a virtual environment you can tailor an environment with specific versions of Python and packages, without interfering with other installed Python versions and packages.

  • Make one for each project you have, for reproducibility.

  • There are different tools to create virtual environments.

    • UPPMAX has conda and venv and virtualenv

    • HPC2N has venv and virtualenv