Managing environments and packages

Learning outcomes for this session

  • How to work with Julia’s environment and package management.
  • How to check for and use site-installed packages, if any.

Your background?

  • Have you used Python?
  • Have you used Matlab? Were you at yesterday's Matlab sessions?
  • Have you used Git?
  • Are you completely new to code packages and version control?

Key points

  • Environments
    • Project
    • Version control
    • Nested, base, tools not part of the project, use with caution
  • Packages

Introduction

Packages are pieces of software to be used by your scripts or interactive sessions as-is or modified, stand-alone or—most often—in combinations to make a full toolkit customized to your task. Such a toolkit may itself become a new package. Packages allow us to cooperate on code that can be useful in several places.

Environments are sets of packages that are available for use simultaneously and, if all is right, work together.

If you come from Python, you have likely seen that there are several different ways to deal with environments and package management in that language, for instance conda and pip. The situation can be described as several ecosystems of packages. You may have come across the term “dependency hell” or even had a taste (or perhaps far more than you could stomach) of this yourself.

Julia comes with a great system for environments and packages included. As a result there is essentially a single ecosystem of packages and “dependency hell” is as close to eliminated as it can be.

Julia environments are defined by two TOML (Tom’s Obvious Minimal Language) files, Project.toml and Manifest.toml. The former specifies which packages you’ve asked for; the latter records every package that was resolved, including all dependencies, with exact versions. Include both of these files in your version control (Project.toml all the time, Manifest.toml at points for which you want exact reproducibility) and you get traceable, reproducible environments with minimal effort!
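
To make the role of Project.toml concrete, here is a minimal sketch that parses the text of a small project file with Julia's TOML standard library. The file contents are a hypothetical example written for illustration (a single dependency on CSV); a real Project.toml is written by Pkg and usually holds more entries.

```julia
using TOML  # standard library, no installation needed

# Hypothetical contents of a small Project.toml with one direct dependency.
# Each entry under [deps] maps a package name to its UUID.
project_text = """
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
"""

project = TOML.parse(project_text)

# Names of the packages that were explicitly asked for
requested = sort(collect(keys(project["deps"])))
println("Requested packages: ", requested)
```

Manifest.toml uses the same file format but lists every resolved package, so it is normally much longer and is not meant to be edited by hand.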

Getting started

To get started we will go to the Julia documentation and make use of the official tutorial on the Julia package manager: https://docs.julialang.org/en/v1/stdlib/Pkg/#Pkg

Summary of what we’ve seen so far

  • Get to the Pkg REPL by pressing ] in the Julia REPL
  • In the Pkg REPL, use the following commands:
    • ? to get help
    • activate to activate an existing or new environment
    • st (alias for status) to see which packages have been added to the active environment
    • add to add packages to the active environment; this may involve download and precompilation steps that take a bit of time
    • rm (alias for remove) to remove packages from the active environment
    • up (alias for update) to update packages in the active environment
  • Project.toml stores what packages you’ve asked for
  • Manifest.toml stores how Pkg resolved this, with all dependencies and exact versions

Environment stacking

When you have not activated any specific environment, the active environment is your personal base environment for the Julia version you’re currently running (called e.g. @v1.11). After you activate a project environment, the base environment normally remains reachable in addition to the active one. This can be a convenient way to access development tools that you want available but that your project does not depend on.

If you have tools available through your base environment and you need to check that your project can be reproduced properly without your base environment, you’ll want to read this:

Loading project environment only

The full stack of reachable environments is defined by the global Julia variable LOAD_PATH:

julia> LOAD_PATH
3-element Vector{String}:
"@"
"@v#.#"
"@stdlib"

where @ is the active environment, @v#.# is your base environment for the Julia version in use, and @stdlib is the standard library.

To include just the current environment, we can modify the LOAD_PATH variable from the Julia prompt with the following functions:

julia> empty!(LOAD_PATH)        # this will clean out the path
julia> push!(LOAD_PATH, "@")    # this will add the current environment
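
If you only want the restricted stack for a single check, the two calls above can be wrapped so that the original LOAD_PATH is restored afterwards. This is a minimal sketch; the helper name with_project_only is made up for this example and is not part of Julia or Pkg.

```julia
# Hypothetical helper: run f() with LOAD_PATH reduced to the active
# environment only, then restore the previous stack even if f() throws.
function with_project_only(f)
    saved = copy(LOAD_PATH)
    try
        empty!(LOAD_PATH)        # clean out the path
        push!(LOAD_PATH, "@")    # add only the active environment
        return f()
    finally
        empty!(LOAD_PATH)
        append!(LOAD_PATH, saved)
    end
end

before = copy(LOAD_PATH)

# Inside the block only "@" is reachable; here we just record what we see.
seen_inside = with_project_only() do
    copy(LOAD_PATH)
end

println("Inside:  ", seen_inside)
println("Outside: ", LOAD_PATH)
```

Note that with only "@" on the stack, standard library packages can be loaded only if they are listed as dependencies of the project, which is exactly the kind of omission this check is meant to reveal.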

Choosing package versions

] add <package>@<version>

Example:

] add DataFrames@1.1
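
The same version request can be written with the Pkg functions, which is handy in scripts. The sketch below only builds the package specification; the call that would actually install it is commented out because it needs network access.

```julia
using Pkg

# Describe "DataFrames at version 1.1" without installing anything yet.
spec = Pkg.PackageSpec(name = "DataFrames", version = "1.1")
println("Requested package: ", spec.name)

# Passing the spec to Pkg.add installs it into the active environment:
# Pkg.add(spec)
```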

But where are the packages installed?

Each environment consists of only two files; the package code itself is not stored there. Packages are installed to and loaded from the paths defined by the global Julia variable DEPOT_PATH. In a Julia installation on the clusters this will by default be set to:

~/.julia where ~ is the user home as appropriate on the system.

If Julia is installed centrally

On a local system such as a personal Linux computer, or if Julia has been installed at “system level” and not in a module system, DEPOT_PATH by default also includes:

  1. an architecture-specific shared system directory, e.g. /usr/local/share/julia;
  2. an architecture-independent shared system directory, e.g. /usr/share/julia.

Packages can consist of a relatively large number of files, and some clusters limit the number of files you can have in your home directory. If this becomes a problem, you can point DEPOT_PATH to a project folder instead.
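
To see where packages end up in your own session, you can inspect DEPOT_PATH directly. The sketch below only reads the variable. To change the depot location, the usual route is to set the JULIA_DEPOT_PATH environment variable before starting Julia; which folder to point it at depends on your cluster and project, so no path is suggested here.

```julia
# DEPOT_PATH is a vector of directories. The first entry is the user depot,
# where newly added packages are installed.
for (i, depot) in enumerate(DEPOT_PATH)
    println(i, ": ", depot)
end

user_depot = first(DEPOT_PATH)
package_dir = joinpath(user_depot, "packages")
println("Package code is stored under: ", package_dir)
```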

Non-interactive use of Pkg

We’ve seen how to use Pkg interactively, with its special REPL mode. All the functions of Pkg can also be used by loading the Pkg module and calling its functions just like any other module. On the clusters, this is especially useful for activating or even setting up an environment from inside a Julia script, so that you can run it in a batch job. For example: activate the environment of the current folder by calling these two lines in your .jl file:

    using Pkg
    Pkg.activate(".")

Besides the previous two options for activating an environment, you can also activate it on the Linux command line (assuming that you are located in the environment directory):

julia --project=.
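
The Pkg REPL commands summarized earlier map onto functions in the Pkg module. As a minimal sketch, the script below activates a throwaway environment in a temporary folder (an assumption made so that no real project is touched) and prints its status. The calls that need network access are left as comments.

```julia
using Pkg

# Hypothetical throwaway location, so that no real project is modified.
env_dir = mktempdir()

Pkg.activate(env_dir)   # Pkg REPL: activate <path>
Pkg.status()            # Pkg REPL: st (a new environment has no packages)

# These calls need network access, so they are commented out here:
# Pkg.add("CSV")        # Pkg REPL: add CSV
# Pkg.rm("CSV")         # Pkg REPL: rm CSV
# Pkg.update()          # Pkg REPL: up

# Path of the Project.toml that belongs to the active environment
active = Base.active_project()
println("Active project file: ", active)
```

In a batch script you would typically replace the temporary folder with the folder that holds your Project.toml, for example `Pkg.activate(@__DIR__)` to use the directory of the script itself.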

Bianca

  • On Bianca there is a central library with installed packages.

  • You can check the contents of the current central library by typing ml help julia/<version> in the Bash shell.

  • A possibly more up-to-date status can be found from the Julia shell:
julia> using Pkg
julia> Pkg.activate(DEPOT_PATH[2]*"/environments/v1.8");     # adjust the version (1.8) if you use another Julia version
julia> Pkg.status()
julia> Pkg.activate(DEPOT_PATH[1]*"/environments/v1.8");     # return to the user library
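
To avoid hard-coding the version folder, the environment path can be built from the running Julia version. This is a sketch under the assumption that every depot keeps its default environment in environments/vMAJOR.MINOR, as in the paths above; the helper name default_env_dir is made up for this example.

```julia
# Hypothetical helper: path of the default environment inside a depot,
# e.g. <depot>/environments/v1.10 when running Julia 1.10.x.
function default_env_dir(depot::AbstractString)
    version_dir = "v$(VERSION.major).$(VERSION.minor)"
    return joinpath(depot, "environments", version_dir)
end

# List the default environment of every depot and whether it exists here.
for depot in DEPOT_PATH
    env = default_env_dir(depot)
    println(env, isdir(env) ? "  (exists)" : "  (not present)")
end
```

The resulting path can be passed to Pkg.activate in place of the hand-written strings above.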

A selection of the Julia packages and libraries installed on Bianca are:

- BenchmarkTools
- CSV
- CUDA
- MPI
- Distributed
- IJulia
- Plots
- PyPlot
- Gadfly
- DataFrames
- DistributedArrays
- PlotlyJS

Site-installed packages in environments

On Bianca, the central environment is added to the environment stack:

julia> LOAD_PATH
4-element Vector{String}:
  "@"
  "@v#.#"
  "@stdlib"
  "/sw/comp/julia/1.8.5/rackham/lib/glob_pkg/environments/v1.8"

Bianca Intermediate workshop

Exercises

  • We need the packages IJulia and Pluto for running the integrated development environments (IDEs) Jupyter and Pluto.
  • We also need the MPI package on Friday.

  • These exercises install the packages that we will use later.

  • It might be advisable to install IJulia and Pluto in separate environments.

Challenge 1. Install Pluto

  • It may take 5-10 minutes or so.
  • You can do this in an ordinary terminal. Load the Julia module for your cluster (one alternative per cluster below) and start Julia:

$ ml julia/1.10.2-bdist
$ julia

Note: not fully tested successfully, but this step works

$ ml PDC/23.12 julia/1.10.2-cpeGNU-23.12
$ julia

$ module load julia/1.8.5
$ julia

$ module load Julia/1.10.9-LTS-linux-x86_64
$ julia

$ module load GCCcore/13.2.0  Julia/1.9.3-linux-x86_64
$ julia

In Julia for all clusters (output may differ for different clusters and Julia versions):

    shell> mkdir pluto-env
    shell> cd pluto-env
    (@v1.10) pkg> activate .
      Activating new project at `path-to-folder\pluto-env`
    (pluto-env) pkg> add Pluto
    (pluto-env) pkg> status
            Status `path-to-folder\pluto-env\Project.toml`
            [c3e4b0f8] Pluto v0.20.19

Challenge 2. Install IJulia

  • This is done only once for each combination of Julia and Python versions that you would like to use.
  • Python must also be loaded.
  • It may take 5-10 minutes or so.
  • You can do this in an ordinary terminal (book an interactive session, to be safe). Load the modules for your cluster (one alternative per cluster below) and start Julia:

$ ml Python/3.11.5-env-hpc1-gcc-2023b-eb
$ ml julia/1.10.2-bdist
$ julia

$ ml PDC/23.12 julia/1.10.2-cpeGNU-23.12
$ ml cray-python/3.11.5
$ julia

$ module load julia/1.8.5
$ module load python/3.9.5
$ julia

$ module load Julia/1.10.9-LTS-linux-x86_64
$ module load JupyterLab/4.2.5-GCCcore-13.3.0
$ julia

$ module load GCC/13.2.0  JupyterLab/4.2.0
$ module load Julia/1.8.5-linux-x86_64
$ julia

In Julia for all clusters (output may differ for different clusters and Julia versions):

    shell> mkdir jupyter-env
    shell> cd jupyter-env
    (@v1.10) pkg> activate .
      Activating new project at `path-to-folder\jupyter-env`
    (jupyter-env) pkg> add IJulia
    (jupyter-env) pkg> status
            Status `path-to-folder\jupyter-env\Project.toml`
            [7073ff75] IJulia v1.27.0

Challenge 3. Required package for parallel jobs

In order to use MPI with Julia you will need to follow the next steps (only the first time):

Use the block that matches your cluster (one alternative per cluster):

# Load the tool chain which contains an MPI library
$ ml gcc/11.3.0 openmpi/4.1.3
# Load Julia
$ ml Julia/1.8.5
# Start Julia on the command line
$ julia
# Change to package mode and add the MPI package

# Load the tool chain which contains an MPI library
$ ml OpenMPI/5.0.3-GCC-13.3.0
# Load Julia
$ ml Julia/1.10.9-LTS-linux-x86_64
# Start Julia on the command line
$ julia
# Change to package mode

# Load the tool chain which contains an MPI library
$ ml foss/2021b
# Load Julia
$ ml Julia/1.8.5-linux-x86_64
# Start Julia on the command line
$ julia
# Change to package mode

# Load the tool chain for Julia which already contains an MPI library (cray-mpich)
$ ml PDCOLD/23.12 julia/1.10.2-cpeGNU-23.12
# Start Julia on the command line
$ julia
# Change to package mode

# Load the tool chain which contains an MPI library
$ ml buildtool-easybuild/4.8.0-hpce082752a2 foss/2023b
# Load Julia
$ ml julia/1.9.4-bdist
# Start Julia on the command line
$ julia
# Change to package mode

In Julia for all clusters (output may differ for different clusters and Julia versions):

shell> mkdir MPI-env
shell> cd MPI-env
(@v1.10) pkg> activate .
      Activating new project at `path-to-folder\MPI-env`
(MPI-env) pkg> add MPI
# In Julian mode (press backspace to leave package mode), run these commands:
(MPI-env)julia> using MPI
(MPI-env)julia> MPI.install_mpiexecjl()
        [ Info: Installing `mpiexecjl` to `/home/u/username/.julia/bin`...
        [ Info: Done!

  • End Julia with <CTRL>+D.
  • In the terminal shell, for all clusters (output may differ for different clusters and Julia versions):

# Add the installed mpiexecjl wrapper to your path on the Linux command line
$ export PATH=~/.julia/bin:$PATH
# Now the wrapper should be available on the command line

Extra Challenge. Project environment with CSV

Create a project environment called new-env and activate it. Then install the package CSV in this environment. CSV is a package that offers tools for working with .csv files. After this, check that the package was installed. Finally, deactivate the environment, that is, return to the default environment.

Solution for all centres

``` julia

    shell> mkdir new-env
    shell> cd new-env
    (@v1.8) pkg> activate .
          Activating new project at `path-to-folder\new-env`
    (new-env) pkg> add CSV
    (new-env) pkg> status
          Status `path-to-folder\new-env\Project.toml`
          [336ed68f] CSV v0.10.9
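    # The Pkg REPL has no deactivate command; activate without an argument returns to the default environment
    (new-env) pkg> activate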

```

Summary

  • Environments in Julia are created by Julia itself, so no third-party software is required.
  • With a virtual environment you can tailor an environment with specific versions of Julia and packages, without interfering with other installed Julia versions and packages.
  • Environments in Julia are lightweight, so it is recommended to create a new one for each project that you develop; this also gives you reproducibility.
  • On Bianca, a central library of site-installed packages is available in addition to your own environments.

Extra reading