Packages

Python modules AKA Python packages

  • Python packages greatly extend what you can do with Python.

  • Instead of writing all the code yourself, you can often reuse a package in which someone else has already solved the same problem.

  • Many scientific tools are distributed as Python packages, so you can run a script from the command line and specify there which files to analyse and which arguments control the analysis.

  • A nice introduction to packages can be found here: Python for scientific computing

Questions

  • How do I find which packages and versions are available?

  • What to do if I need other packages?

  • Are there differences between HPC2N and UPPMAX?

Objectives

  • Show how to check for Python packages

  • Show how to install your own packages on the different clusters

There are two package installation systems

  • PyPI (pip) is traditionally for Python-only packages but it is no problem to also distribute packages written in other languages as long as they provide a Python interface.

  • Conda (conda) is more general and while it contains many Python packages and packages with a Python interface, it is often used to also distribute packages which do not contain any Python (e.g. C or C++ packages).

    • Creates its own environment that does not interact with other python installations

    • At HPC2N, Conda is not recommended, and we do not support it there

  • Many libraries and tools are distributed in both ecosystems.

Check current available packages

General for both centers

Some Python packages work as stand-alone tools, for instance in bioinformatics. The tool may already be installed as a module. Check whether it is there with:

$ module spider <tool-name or tool-name part>

Using module spider lets you search regardless of upper- or lowercase characters and regardless of already loaded modules (like GCC on HPC2N and bioinfo-tools on UPPMAX).

Check the pre-installed packages of a specific python module:

$ module help python/<version>

Check the pre-installed packages of a loaded python module, in shell:

$ pip list

To see which Python packages you have installed yourself, use pip list --user while the environment in which you installed the packages is active.
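The same information that pip list prints can also be read from inside Python. The following is a minimal sketch using only the standard library (importlib.metadata, Python 3.8 or newer); the function name installed_packages is a hypothetical helper, not something the course or the clusters provide.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for the installed distributions."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata lacks a name
            found.add((name, dist.version))
    # Sort case-insensitively by name, similar to the order pip list uses
    return sorted(found, key=lambda item: item[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name:30s} {version}")
```

Run it with the same Python module or environment loaded as the one you want to inspect, since the result depends on the active interpreter.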

You can also test from within python to make sure that the package is not already installed:

>>> import <package>

Does it work? Then it is there! Otherwise, you can either use pip or conda.
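If you want to test for a package without actually importing it, and also see its version, a sketch like the following may help. It uses only the standard library; package_status is a hypothetical helper name, and the package names in the example are placeholders. Note that the import name and the name used by pip can differ (for example sklearn versus scikit-learn), in which case the version lookup finds nothing.

```python
from importlib import metadata, util


def package_status(name):
    """Return (importable, version); version is None if no distribution matches."""
    # find_spec locates the module without executing it
    importable = util.find_spec(name) is not None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Standard-library modules and mismatched names end up here
        version = None
    return importable, version


if __name__ == "__main__":
    for name in ("numpy", "json"):
        print(name, package_status(name))
```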

NOTE: at HPC2N, the available Python packages need to be loaded as modules before use! See a list of some of them below, under the HPC2N tab, or find more as mentioned above, using module spider -r ....

A selection of the Python packages and libraries installed on UPPMAX and HPC2N is given in the extra reading: UPPMAX clusters and Kebnekaise cluster

  • The python application at UPPMAX comes with several preinstalled packages.

  • You can check them here: UPPMAX packages.

  • In addition there are packages available from the module system as python tools/packages

  • Note that bioinformatics-related tools can be reached only after loading bioinfo-tools.

  • Two modules contain topic-specific packages. These are:

    • Machine learning: python_ML_packages (cpu and gpu versions, based on python/3.9.5)

    • GIS: python_GIS_packages (cpu version, based on python/3.10.8)

Install with pip

You use pip this way, from a Linux shell (the $ below is the shell prompt, not the Python prompt):

$ pip install --user <package>

Use pip3 if you loaded python3.

Then the package ends up in ~/.local/lib/python<version>/site-packages/ .

Note that python<version> omits the last (bug-fix) number, for example 3.8 for python-3.8.7. We HIGHLY recommend using a virtual environment during installation, since this makes it easier to install for different versions of Python. More information follows later in this course, in the section on isolated environments.
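To see where pip install --user would place packages for the interpreter you currently have loaded, you can ask Python itself. This is a small sketch using the standard site and sys modules; short_version is a hypothetical helper name.

```python
import site
import sys


def short_version():
    """Return the major.minor version string, e.g. '3.8' for Python 3.8.7."""
    return f"{sys.version_info.major}.{sys.version_info.minor}"


if __name__ == "__main__":
    print("Version used in the path:    ", short_version())
    # Directory that pip install --user writes to for this interpreter
    print("User site-packages directory:", site.getusersitepackages())
```

Because the directory name contains the major.minor version, packages installed for one Python version are not seen by another.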

Note

  • We recommend that you always install with pip in an isolated environment, unless you think that the package will be useful for all your projects

  • You will test this in the session about isolated environments in a while.

FAQ

When to use pip install and when to use module load command?

  1. Check whether the package is available in the Python module or in a site-installed module.

  2. If not, use pip

Comment: We recommend that you use pip install in an isolated environment, using virtualenv or venv, see next session.
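Before running pip install it can be useful to confirm that you really are inside an isolated environment. A minimal sketch, assuming an environment created with venv or a recent virtualenv (it does not detect Conda environments); in_virtual_environment is a hypothetical helper name.

```python
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv/virtualenv environment."""
    # Inside such an environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the underlying installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Isolated environment active:", sys.prefix)
    else:
        print("No virtual environment active; pip install --user would be used.")
```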

Keypoints

  • You can check for packages

    • from the Python shell with the import command

    • from the Bash shell with the

      • pip list command at both centers

      • ml help python/3.9.5 at UPPMAX

  • Installation of Python packages can be done either with PyPI (pip) or Conda

  • You install your own packages with the pip install command (this is the recommended way on HPC2N)

  • At UPPMAX Conda is also available (See Conda section)

Conda

Questions

  • What does Conda do?

  • How do I create a Conda environment?

Objectives

  • Learn the pros and cons of Conda

  • Learn how to install packages and work with the Conda (isolated) environment

Hint

  • On Bianca (with no internet), Conda is the first choice when installing packages, because there is a local mirror of most of the Conda repositories. Check the On Bianca Cluster extra reading for more info.

Conda cheat sheet

  • List packages in present environment: conda list

  • List all environments: conda info -e or conda env list

  • Install a package: conda install somepackage

  • Install from certain channel (conda-forge): conda install -c conda-forge somepackage

  • Install a specific version: conda install somepackage=1.2.3

  • Create a new environment: conda create --name myenvironment

  • Create a new environment from requirements.txt: conda create --name myenvironment --file requirements.txt

  • On e.g. HPC systems where you don’t have write access to the central installation directory: conda create --prefix /some/path/to/env

  • Activate a specific environment: conda activate myenvironment

  • Deactivate current environment: conda deactivate
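The listing command from the cheat sheet can also be used from a script. The sketch below assumes that conda is on your PATH (for example after loading the conda module) and that conda env list --json prints a JSON object with an "envs" list of paths; env_names and list_conda_envs are hypothetical helper names.

```python
import json
import subprocess
from pathlib import Path


def env_names(conda_json):
    """Map directory name -> path from the text printed by conda env list --json."""
    paths = json.loads(conda_json).get("envs", [])
    return {Path(p).name: p for p in paths}


def list_conda_envs():
    """Run conda and parse its JSON output (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return env_names(result.stdout)


if __name__ == "__main__":
    for name, path in list_conda_envs().items():
        print(f"{name:25s} {path}")
```

The base environment shows up under the name of its installation directory rather than as "base", because only the last path component is used here.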

Note

We have mirrored all major conda repositories directly on UPPMAX, on both Rackham and Bianca. These are updated every third day. We have the following channels available:

  • bioconda

  • biocore

  • conda-forge

  • dranew

  • free

  • main

  • pro

  • qiime2

  • r

  • r2018.11

  • scilifelab-lts

You reach them all by loading the conda module. On UPPMAX you do not have to state the specific channel. Elsewhere you select one with conda install -c <channel> ...

Tip

There will be an exercise at the end!

  1. First load our conda module (there is no need to install your own Miniconda, for instance)

$ module load conda
  • This grants you access to the latest version of Conda and all major repositories on all UPPMAX systems.

  • Check the text output as conda is loaded, especially the first time, see below

Conda load output

  • The variable CONDA_ENVS_PATH contains the location of your environments. Set it to your project’s environments folder if you have one.

  • Otherwise, the default is ~/.conda/envs.

  • You may run source conda_init.sh to initialise your shell to be able to run conda activate etc.

  • Just remember that this command adds stuff to your shell outside the scope of the module system.

  • REMEMBER TO conda clean -a once in a while to remove unused and unnecessary files
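To check which environment directory applies in your current shell, you can read the variable from Python. This is a sketch under two assumptions: that several directories in CONDA_ENVS_PATH are separated by the platform path separator, and that the fallback is the ~/.conda/envs default mentioned above. conda_envs_dirs is a hypothetical helper name.

```python
import os
from pathlib import Path


def conda_envs_dirs(environ=os.environ):
    """Return the directories where conda is expected to look for environments."""
    raw = environ.get("CONDA_ENVS_PATH", "")
    # Assumption: multiple entries are separated by os.pathsep (":" on Linux)
    dirs = [Path(p).expanduser() for p in raw.split(os.pathsep) if p]
    # Fall back to the default location when the variable is unset or empty
    return dirs or [Path("~/.conda/envs").expanduser()]


if __name__ == "__main__":
    for directory in conda_envs_dirs():
        print(directory)
```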

  2. First time

  • The variable CONDA_ENVS_PATH contains the location of your environments. Set it to your project’s environments folder if you have one.

  • Otherwise, the default is ~/.conda/envs.

  • Example:

    $ export CONDA_ENVS_PATH=/proj/<your-project-id>/nobackup/<username>

  • When conda is loaded you will by default be in the base environment, which works in the same way as other conda environments. It includes a Python installation and some core system libraries and dependencies of Conda. It is a “best practice” to avoid installing additional packages into your base software environment.

  3. Create the conda environment

  • Example:

    $ conda create --name python36-env python=3.6 numpy=1.13.1 matplotlib=2.2.2

  4. Activate the conda environment by:

    $ source activate python36-env

    • You will see that your prompt changes to start with (python36-env) to show that you are within an environment.

    • If you set up your shell with source conda_init.sh you can use conda activate python36-env instead.

  5. Now do your work!

  6. Deactivate

    (python36-env) $ conda deactivate


    Note that source deactivate will not work, but conda deactivate will.

Warning

  • Conda is known to create many small files. Your disk space is limited not only in GB, but also in the number of files (typically 300000 in $HOME).

  • Check your disk usage and quota limit with uquota

  • Do a conda clean -a once in a while to remove unused and unnecessary files
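To get a rough idea of how many files your Conda environments contribute, you can count them yourself. The sketch below is an approximation only (the quota system may also count directories) and assumes the default location ~/.conda/envs; count_files is a hypothetical helper name, and uquota remains the authoritative check.

```python
import os


def count_files(root):
    """Count non-directory entries below root without following directory symlinks."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    envs = os.path.expanduser("~/.conda/envs")
    print(f"{count_files(envs)} files under {envs}")
```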

FAQ

  1. I get “Your shell has not been properly configured to use conda activate.” when I try to conda activate python36-env (which I can see in conda env list).

    • Try source activate ... instead.

    • You can make conda activate work by running source conda_init.sh and choosing your shell.

    • But some of you may not want to fill your .bashrc with too much “junk”, such as always starting in the base conda environment at login, which may not be what you want. In that case source activate may be preferable for you!

  2. What is the difference between conda activate and source activate?

    • They do the same thing! See above for details.

Note

  • source can most often be replaced by ., like in . activate python36-env. Note the important <space> after .

  • For clarity we use the source style here

  • Create an environment based on dependencies given in an environment file:

$ conda env create --file environment.yml
  • Create file from present conda environment:

$ conda env export --from-history > environment.yml
  • Create file from an unactivated conda environment:

$ conda env export --from-history --name <env-name> > environment.yml

environment.yml (for conda) is a YAML file which looks like this:

name: my-environment
channels:
  - defaults
dependencies:
  - numpy
  - matplotlib
  - pandas
  - scipy

environment.yml with versions:

name: my-environment
channels:
  - defaults
dependencies:
  - python=3.7
  - numpy=1.18.1
  - matplotlib=3.1.3
  - pandas=1.1.2
  - scipy=1.6.2
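If you prefer to generate such a file from a script, for instance to keep pinned versions in one place, a sketch along these lines could be used. It writes the same simple layout as the examples above without needing a YAML library; render_environment_yml is a hypothetical helper name, not a conda feature.

```python
def render_environment_yml(name, dependencies, channels=("defaults",)):
    """Build the text of a conda environment file.

    dependencies maps package name -> version string, or None for any version.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    for package, version in dependencies.items():
        # conda pins versions with a single "=", e.g. numpy=1.18.1
        spec = f"{package}={version}" if version else package
        lines.append(f"  - {spec}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    deps = {"python": "3.7", "numpy": "1.18.1", "scipy": None}
    print(render_environment_yml("my-environment", deps), end="")
```

Redirect the output to environment.yml and pass the file to conda env create --file environment.yml.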

More on dependencies

Exercises

UPPMAX: Create a conda environment and install some packages

  • First check the current installed packages while having python/3.9.5 loaded

  • Open a new terminal and have the old one available for later comparison

  • Use the conda module on Rackham and create an environment with name HPC-python23 with python 3.7 and numpy 1.15

    • Use a path of your own choice for CONDA_ENVS_PATH, or /proj/naiss2023-22-1126/<user>

    • (It may take a minute or so)

  • Activate!

  • Check with pip list what is there. Compare with the environment given from the python module in the first terminal window.

    • Which version of Python did you get?

  • Don’t forget to deactivate the Conda environment before doing other exercises!

Keypoints

  • Conda is an installer of packages but also bigger toolkits

  • Conda creates isolated environments (see next section) not clashing with other installations of python and other versions of packages

  • A Conda environment requires that you install all the packages you need yourself.

    • That is, you cannot load the python module and use the packages therein inside your Conda environment.

  • Also, do not rely on the python module at UPPMAX at the same time as you have Conda in the back- or foreground.
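When in doubt about which Python you are actually running, the interpreter can report it. The sketch below assumes that Conda sets the environment variables CONDA_DEFAULT_ENV and CONDA_PREFIX when an environment is activated; interpreter_report is a hypothetical helper name.

```python
import os
import sys


def interpreter_report(environ=os.environ):
    """Collect facts showing which Python and which Conda environment are active."""
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # installation or environment root
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),  # None if no Conda env is active
        "conda_prefix": environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key:13s} {value}")
```

If the executable lies outside the reported Conda prefix while a Conda environment is active, the python module and Conda are probably being mixed.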