Load and run python and use packages
- At both UPPMAX and HPC2N, the applications made available through the module system are called modules.
Objectives
Show how to load Python
Show how to run Python scripts and start the Python command line
Short cheat sheet
See which modules exist:
module spider
or
ml spider
Find module versions for a particular software:
module spider <software>
Modules depending only on what is currently loaded:
module avail
or
ml av
See which modules are currently loaded:
module list
or
ml
Load a module:
module load <module>/<version>
or
ml <module>/<version>
Unload a module:
module unload <module>/<version>
or
ml -<module>/<version>
More information about a module:
module show <module>/<version>
or
ml show <module>/<version>
Unload all modules except the ‘sticky’ modules:
module purge
or
ml purge
Warning
Note that the module systems at UPPMAX and HPC2N are slightly different.
While all modules at UPPMAX not directly related to bioinformatics are shown by ml avail, modules at HPC2N are hidden until one has loaded a prerequisite, such as the compiler GCC.
For reproducibility reasons, you should always load a specific version of a module instead of just the default version.
Many modules have prerequisite modules which need to be loaded first (at HPC2N this is also the case for the Python modules). When doing
module spider <module>/<version>
you will get a list of which other modules need to be loaded first.
Check for Python versions
Tip
Type along!
At UPPMAX, check all available Python versions with:
$ module avail python
At HPC2N, check all available Python versions with:
$ module spider Python
To see how to load a specific version of Python, including the prerequisites, do
$ module spider Python/<version>
Example for Python 3.9.5
$ module spider Python/3.9.5
Output at UPPMAX as of Nov 30 2023
-------------------------------------- /sw/mf/rackham/applications ---------------------------------------
   python_GIS_packages/3.10.8      python_ML_packages/3.9.5-gpu (D)
   python_ML_packages/3.9.5-cpu    wrf-python/1.3.1

--------------------------------------- /sw/mf/rackham/compilers ----------------------------------------
   python/2.7.6     python/3.3      python/3.6.0         python/3.9.5     python3/3.7.2
   python/2.7.6     python/3.3.1    python/3.7.2         python3/3.6.0    python3/3.10.8
   python/2.7.9     python/3.4.3    python/3.8.7         python3/3.6.8    python3/3.11.4 (D)
   python/2.7.11    python/3.5.0    python/3.9.5         python3/3.7.2
   python/2.7.15    python/3.6.0    python/3.10.8        python3/3.8.7
   python/3.3       python/3.6.8    python/3.11.4 (D)    python3/3.9.5

  Where:
   D:  Default Module

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
Output at HPC2N as of 30 Nov 2023
b-an01 [~]$ module spider Python

----------------------------------------------------------------------------
  Python:
----------------------------------------------------------------------------
    Description:
      Python is a programming language that lets you work more quickly and integrate your systems more effectively.

     Versions:
        Python/2.7.15
        Python/2.7.16
        Python/2.7.18-bare
        Python/2.7.18
        Python/3.7.2
        Python/3.7.4
        Python/3.8.2
        Python/3.8.6
        Python/3.9.5-bare
        Python/3.9.5
        Python/3.9.6-bare
        Python/3.9.6
        Python/3.10.4-bare
        Python/3.10.4
        Python/3.11.3
     Other possible modules matches:
        Biopython  Boost.Python  GitPython  IPython  flatbuffers-python  ...

----------------------------------------------------------------------------
  To find other possible module matches execute:

      $ module -r spider '.*Python.*'

----------------------------------------------------------------------------
  For detailed information about a specific "Python" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider Python/3.9.5
----------------------------------------------------------------------------
Load a Python module
For reproducibility, we recommend ALWAYS loading a specific module instead of using the default version!
For this course, we recommend using Python 3.11.x (except for some GPU examples that will use 3.9.5).
Tip
Type along!
Go back and check which Python modules were available. To load version 3.11.8 at UPPMAX, do:
$ module load python/3.11.8
Note: Lowercase p.
For short, you can also use:
$ ml python/3.11.8
To load Python version 3.11.3 at HPC2N, do:
$ module load GCC/12.3.0 Python/3.11.3
Note: Uppercase P.
For short, you can also use:
$ ml GCC/12.3.0 Python/3.11.3
Warning
UPPMAX: Don’t use system-installed python (2.7.5)
UPPMAX: Don’t use system-installed python3 (3.6.8)
HPC2N: Don’t use system-installed python (2.7.18)
HPC2N: Don’t use system-installed python3 (3.8.10)
ALWAYS use a Python module
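One way to confirm that the interpreter you are running comes from a loaded module rather than from the system is to ask Python itself. The sketch below uses only the standard library. The helper name `describe_interpreter` is made up for this illustration, and the path you see depends on the cluster and the module version.

```python
import sys


def describe_interpreter():
    """Return the path and version of the running Python interpreter."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return sys.executable, version


path, version = describe_interpreter()
# With a Python module loaded, the path should point into the module tree
# and the version should match the module you asked for.
print("Interpreter:", path)
print("Version:", version)
```

If the printed version is one of the system versions listed in the warning above, the module has most likely not been loaded in the current shell.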
Why are there both Python/2.X.Y and Python/3.Z.W modules?
Some existing software might use Python2 and some will use Python3. Some of the Python packages have both Python2 and Python3 versions. Check what your software as well as the installed modules need when you pick!
UPPMAX: Why are there both python/3.X.Y and python3/3.X.Y modules?
Sometimes existing software might use python2 and there’s nothing you can do about that. In pipelines and other toolchains the different tools may together require both python2 and python3. Here’s how you handle that situation:
You can run two python modules at the same time if ONE of the modules is python/2.X.Y and the other module is python3/3.X.Y (not python/3.X.Y).
Run
Run Python script
Hint
There are many ways to edit your scripts.
If you are rather new:
Graphical:
$ gedit <script> &
(& lets you keep using the terminal while the editor window is open)
Requires ThinLinc or ssh -Y ... or ssh -X
Terminal:
$ nano <script>
Otherwise you would know what to do!
- ⚠️ The teachers may use their common editor, like vi/vim
If you get stuck, press <esc> and then :q!
Type-Along
Let’s make a script with the name
example.py
$ nano example.py
Insert the following text
# This program prints Hello, world!
print('Hello, world!')
Save and exit. In nano: <ctrl>+O, <ctrl>+X
You can run a python script in the shell like this:
$ python example.py
# or
$ python3 example.py
Warning
ONLY run jobs that are short and/or do not use a lot of resources from the command line.
Otherwise, use the batch system (see the batch session).
Run an interactive Python shell
You can start a simple Python terminal with:
$ python
Example
>>> a=3
>>> b=7
>>> c=a+b
>>> c
10
Exit Python with <Ctrl-D>, quit() or exit() in the python prompt
>>> <Ctrl-D>
>>> quit()
>>> exit()
For more interactivity, you can run IPython.
Tip
Type along!
NOTE: remember to load a python module first. Then start IPython from the terminal
$ ipython
or
$ ipython3
UPPMAX also has jupyter-notebook installed and available from the loaded Python module. Start with
$ jupyter-notebook
If you prefer to use your own favorite browser, add --no-browser and open the URL given in the output.
From python/3.10.8 onward, jupyterlab is also available.
NOTE: at HPC2N, remember to load an IPython module first. You can see possible modules with
$ module spider IPython
And load one of them (here 7.25.0) with
$ ml IPython/7.25.0
Then start IPython with (lowercase):
$ ipython
HPC2N also has JupyterLab installed. It is available as a module, but the process of using it is somewhat involved. We will cover it more under the session on Interactive work on the compute nodes (https://uppmax.github.io/HPC-python/interactive.html). Otherwise, see this tutorial:
Exit IPython with <Ctrl-D>, quit() or exit() in the IPython prompt
IPython
In [2]: <Ctrl-D>
In [12]: quit()
In [17]: exit()
Packages/Python modules
Python modules AKA Python packages
Python packages broaden the use of python to almost infinity!
Instead of writing all the code yourself, you may find that others have already done the work!
Many scientific tools are distributed as Python packages, making it possible to run a script from the prompt and specify there which files to analyse and which arguments define exactly what to do.
A nice introduction to packages can be found here: Python for scientific computing
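As an illustration of a script that takes its input files and options from the prompt, here is a minimal sketch using `argparse` from the standard library. The script name `analyse.py`, the `--threshold` option, and the function names are hypothetical and not part of the course material.

```python
import argparse


def build_parser():
    """Create the command-line interface for a hypothetical analysis tool."""
    parser = argparse.ArgumentParser(description="Toy analysis tool (illustration only)")
    parser.add_argument("files", nargs="+", help="input files to analyse")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="hypothetical cut-off value")
    return parser


def main(argv=None):
    """Parse the arguments and report what would be done with each file."""
    args = build_parser().parse_args(argv)
    for name in args.files:
        print(f"Would analyse {name} with threshold {args.threshold}")
    return args


# Simulates: python analyse.py a.csv b.csv --threshold 0.8
# (in a real script, call main() without arguments so that sys.argv is read)
args = main(["a.csv", "b.csv", "--threshold", "0.8"])
```

Passing the argument list explicitly keeps the sketch easy to try from an interactive Python session as well as from a file.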
Questions
How do I find which packages and versions are available?
What to do if I need other packages?
Are there differences between HPC2N and UPPMAX?
Objectives
Show how to check for Python packages
Show how to install your own packages on the different clusters
Check currently available packages
General for both centers
Some Python packages work as stand-alone tools, for instance in bioinformatics. The tool may already be installed as a module. Check if it is there with:
$ module spider <tool-name or tool-name part>
Using module spider lets you search regardless of upper- or lowercase characters and regardless of already loaded modules (like GCC on HPC2N and bioinfo-tools on UPPMAX).
Check the pre-installed packages of a specific python module:
$ module help python/<version>
At HPC2N, a way to find Python packages whose names you are unsure of is to do
$ module -r spider '.*Python.*'
or
$ module -r spider '.*python.*'
Be aware that the output will not contain only Python packages. Some entries are just programs that are compiled with Python, so you need to check the list carefully.
Check the pre-installed packages of a loaded python module, in shell:
$ pip list
To see which Python packages you yourself have installed, you can use pip list --user while the environment you installed the packages in is active.
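The information that `pip list` prints can also be read from inside Python. This is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is part of the standard library). The function name `installed_packages` is made up for this example.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) for the installed distributions."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            found.add((name, version))
    return sorted(found, key=lambda item: item[0].lower())


for name, version in installed_packages():
    print(f"{name}=={version}")
```

The list reflects whatever the running interpreter can see, so the result changes with the modules and environments that are active.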
You can also test from within python to make sure that the package is not already installed:
>>> import <package>
Does it work? Then it is there!
Otherwise, you can install it yourself using either pip or conda.
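To test several packages in one go without actually importing them, `importlib.util.find_spec` from the standard library can be used. This is only a sketch: the helper name `is_available` and the list of package names are chosen for illustration.

```python
import importlib.util


def is_available(package):
    """Return True if a top-level package can be found by the running interpreter."""
    return importlib.util.find_spec(package) is not None


for package in ["numpy", "pandas", "mpi4py", "time"]:
    status = "found" if is_available(package) else "not found"
    print(f"{package}: {status}")
```

A package reported as found can still fail on import, for example if a compiled dependency is missing, so the check is a first filter rather than a guarantee.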
Check packages (5 min)
See if the following packages are installed. Use python version 3.11.8 on Rackham and 3.11.3 on Kebnekaise (remember: the Python module on Kebnekaise has a prerequisite).
numpy
mpi4py
distributed
multiprocessing
time
dask
Solution
- On Rackham, the ordinary python/3.9.5 module already has these installed:
numpy ✅
pandas ✅
mpi4py ❌
distributed ❌
multiprocessing ✅ (standard library)
time ✅ (standard library)
dask ✅
- On Kebnekaise, the ordinary Python/3.11.3 module already has these installed:
numpy ❌
pandas ❌
mpi4py ❌
distributed ❌
multiprocessing ✅ (standard library)
time ✅ (standard library)
dask ❌
See the next session for how to find more pre-installed packages!
NOTE: at HPC2N, the available Python packages need to be loaded as modules before use! See a list of some of them below, under the HPC2N tab, or find more as mentioned above, using module -r spider ...
A selection of the Python packages and libraries installed on UPPMAX and HPC2N is given in the extra reading: UPPMAX clusters and Kebnekaise cluster
The python application at UPPMAX comes with several preinstalled packages.
You can check them here: UPPMAX packages.
In addition, there are packages available from the module system as python tools/packages.
Note that bioinformatics-related tools can be reached only after loading bioinfo-tools.
Two modules contain topic-specific packages. These are:
Machine learning: python_ML_packages (cpu and gpu versions, based on python/3.9.5)
GIS: python_GIS_packages (cpu version, based on python/3.10.8)
The python application at HPC2N comes with several preinstalled packages. Check first before installing yourself!
HPC2N has both Python 2.7.x and Python 3.x installed.
We will be using Python 3.x in this course. For this course, the recommended version of Python to use on Kebnekaise is 3.11.3
NOTE: HPC2N does NOT recommend (and does not support) using Anaconda/Conda on our systems. You can read more about this here: Anaconda.
This is a selection of the packages and libraries installed at HPC2N. These are all installed as modules and need to be loaded before use.
ASE
Keras
PyTorch
SciPy-bundle (Bottleneck, deap, mpi4py, mpmath, numexpr, numpy, pandas, scipy - some of the versions have more)
TensorFlow
Theano
matplotlib
scikit-learn
scikit-image
pip
IPython
Cython
Flask
Exercises
This is an exercise that combines loading, running, and using site-installed packages. Later, during the batch session, we will look at running the same exercise, but as a batch job. There is also a follow-up exercise with an extended version of the script, if you want to try running that as well (see further down on the page).
Note
You need the data file scottish_hills.csv, which can be found in the directory Exercises/examples/programs. If you have cloned the git repo for the course, or copied the tarball, you should have this directory. The easiest thing to do is to change to that directory and run the exercise there.
Since the exercise opens a plot, you need to log in with ThinLinc (or otherwise have an X11 server running on your system and log in with ssh -X ...).
The exercise is modified from an example found on https://ourcodingclub.github.io/tutorials/pandas-python-intro/.
Warning
Not relevant if using UPPMAX. Only if you are using HPC2N!
You need to also load Tkinter. Use this:
ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2 Tkinter/3.11.3
In addition, you need to add the following two lines to the top of your python script/run them first in Python:
import matplotlib
matplotlib.use('TkAgg')
Python example with packages pandas and matplotlib
We are using Python version 3.11.x. To access the packages pandas and matplotlib, you may need to load other modules, depending on the site where you are working.
On Rackham you only need to load the python module, as the relevant packages are included (as long as you are not using GPUs, but that is covered later in the course). Thus, you just do:
ml python/3.11.8
On Kebnekaise you also need to load SciPy-bundle and matplotlib (and their prerequisites). These versions will work well together:
ml GCC/12.3.0 Python/3.11.3 SciPy-bundle/2023.07 matplotlib/3.7.2
From inside Python/interactive (if you are on Kebnekaise, mind the warning above):
Start python and run these lines:
import pandas as pd
import matplotlib.pyplot as plt
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.show()
If you change the last line to
plt.savefig("myplot.png")
then you will instead get a file myplot.png containing the plot. This is what you would do if you were running a python script in a batch job.
As a Python script (if you are on Kebnekaise, mind the warning above):
Copy and save this script as a file (or just run the file pandas_matplotlib-<system>.py that is located in the <path-to>/Exercises/examples/programs directory you got from the repo or copied), where <system> is either rackham or kebnekaise.
Rackham:
import pandas as pd
import matplotlib.pyplot as plt
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.show()
Kebnekaise:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('TkAgg')
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
plt.scatter(x, y)
plt.show()
If you have time, you can also try to run these extended versions, which also require the scipy package (included with python at UPPMAX, and available with the same modules loaded as for pandas at HPC2N):
Python example that requires the pandas, matplotlib, and scipy packages.
You can either save the scripts or run them line by line inside Python. The scripts are also available in the directory <path-to>/Exercises/examples/programs, as pandas_matplotlib-linreg.py and pandas_matplotlib-linreg-pretty.py.
NOTE that there are separate versions for rackham and kebnekaise, and that for kebnekaise you again need to add the same lines as mentioned under the warning before the previous exercise.
Remember that you also need the data file scottish_hills.csv located in the above directory.
Examples are from https://ourcodingclub.github.io/tutorials/pandas-python-intro/
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
stats = linregress(x, y)
m = stats.slope
b = stats.intercept
plt.scatter(x, y)
plt.plot(x, m * x + b, color="red") # I've added a color argument here
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import linregress
dataframe = pd.read_csv("scottish_hills.csv")
x = dataframe.Height
y = dataframe.Latitude
stats = linregress(x, y)
m = stats.slope
b = stats.intercept
# Change the default figure size
plt.figure(figsize=(10,10))
# Change the default marker for the scatter from circles to x's
plt.scatter(x, y, marker='x')
# Set the linewidth on the regression line to 3px
plt.plot(x, m * x + b, color="red", linewidth=3)
# Add x and y labels, and set their font size
plt.xlabel("Height (m)", fontsize=20)
plt.ylabel("Latitude", fontsize=20)
# Set the font size of the number labels on the axes
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.show()
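For readers curious about what `linregress` computes, the slope and intercept of an ordinary least-squares line can be sketched in plain Python. The function name `least_squares_line` and the small data set are made up for illustration and are not taken from scottish_hills.csv. `linregress` also returns further statistics, such as the correlation coefficient and p-value, which are not computed here.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations in x, and of the products of deviations in x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on the line y = 2x + 1
m, b = least_squares_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(m, b)
```

In the scripts above, `m * x + b` draws this same line over the scatter plot, with `m` and `b` taken from the `linregress` result.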
Keypoints
Before you can run Python scripts or work in a Python shell, first load a python module and any prerequisites
Start a Python shell session either with python or ipython
Run scripts with python3 <script.py>
You can check for packages
from the Python shell with the import command
from the BASH shell with the pip list command at both centers
ml help python/<version> at UPPMAX
Installation of Python packages can be done either with PyPI or Conda
You install your own packages with the pip install command (this is the recommended way on HPC2N)
At UPPMAX Conda is also available (see the Conda section)