Load and run R

  • find the module to be able to run R

  • load the module to be able to run R

  • run the R interpreter

  • run the R command to get the list of installed R packages

  • run an R script from the command-line

Introduction

flowchart TD

  find_r_module[1.Find an R module]
  load_r_module[2.Load an R module]
  use_r_interpreter[3.Use the R interpreter]
  start_r_interpreter[3.1 Start the R interpreter]
  subgraph R
    say_hello[4.2 Say hello]
    see_installed_packages[3.2 See installed packages]
    load_package[3.3 Load a package]
  end
  stop_r_interpreter[Stop the R interpreter]
  run_r_script[4.Run an R script]

  find_r_module --> |needed for| load_r_module
  load_r_module --> |allows for| use_r_interpreter
  load_r_module --> |allows for| run_r_script  

  use_r_interpreter --> start_r_interpreter
  use_r_interpreter --> say_hello
  use_r_interpreter --> see_installed_packages
  use_r_interpreter --> load_package
  use_r_interpreter --> stop_r_interpreter

  
  run_r_script --> say_hello
  run_r_script --> see_installed_packages
  run_r_script --> load_package

To allow us to work with R on an HPC cluster, we will:

  • find the module to be able to run R, so we know which versions of R we can pick from

  • load the module to be able to run R, so we can actually run R

  • run the R interpreter, so we can test/develop R code

  • run an R script from the command-line, so we can run R code

In this session, we will follow this typical user journey.

1. Find an R module

To be able to work with R on an HPC cluster, we will need to find a module that loads a specific version of R.

HPC2N, UPPMAX, LUNARC, and most other Swedish HPC centres use the same module system.

To find the modules that load different versions of R, run this from a terminal:

module spider R

To see how to load a specific version of R, including its prerequisites, run:

module spider R/<version>

where <version> is an R version, in major.minor.patch format, for example, module spider R/4.1.2.
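The two lookups above can be combined in a small shell sketch. The helper name `is_full_r_version` and the chosen version are assumptions for illustration, not part of the course material; the sketch only calls `module` when that command exists in the current shell.

```shell
# Hypothetical helper: succeed only if the argument looks like major.minor.patch
is_full_r_version() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
}

# Assumption: replace this with a version that 'module spider R' lists on your cluster
r_version="4.1.2"

if ! is_full_r_version "$r_version"; then
  echo "Not in major.minor.patch format: $r_version" >&2
elif command -v module >/dev/null 2>&1; then
  module spider R                  # list the R versions the cluster provides
  module spider "R/${r_version}"   # show how to load this specific version
else
  echo "No 'module' command found: are you logged in to the cluster?" >&2
fi
```

Checking the format first gives a clearer message for a version such as `4.1`, which is missing the patch number.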

2. Load an R module

When you have found a module that loads your favorite version of R, you can load it.

After having run module spider R/<version>, you will get a list of the other modules that need to be loaded first. Load these together with the R module:

module load <prerequisites> R/<version>

where <prerequisites> are the modules listed by module spider and <version> is an R version in major.minor.patch format, for example, module load GCC/11.2.0 OpenMPI/4.1.1 R/4.1.2. The prerequisites depend on the R version and on the cluster.

If you care about reproducibility of your programming environments and R scripts, you should always load a specific version of a module.
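As a minimal sketch of loading explicit versions, the lines below pin both the prerequisites and the R module, then list what ended up loaded. The variable names are hypothetical and the versions are taken from the example above; use the ones that module spider reports on your cluster.

```shell
# Assumption: the versions from the example above; adjust to your cluster
prerequisites="GCC/11.2.0 OpenMPI/4.1.1"
r_module="R/4.1.2"

if command -v module >/dev/null 2>&1; then
  # $prerequisites is deliberately unquoted so each module is a separate argument
  module load $prerequisites "$r_module"
  module list    # show exactly which modules (and versions) are now loaded
else
  echo "Would run: module load $prerequisites $r_module"
fi
```

Keeping these lines at the top of a script documents the environment the script was run in.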

3. Use the R interpreter

flowchart TD

  use_r_interpreter[3.Use the R interpreter]
  start_r_interpreter[3.1 Start the R interpreter]
  subgraph R
    say_hello[Say hello]
    see_installed_packages[3.2 See installed packages]
    load_package[3.3 Load a package]
  end
  stop_r_interpreter[Stop the R interpreter]

  use_r_interpreter --> start_r_interpreter
  use_r_interpreter --> say_hello
  use_r_interpreter --> see_installed_packages
  use_r_interpreter --> load_package
  use_r_interpreter --> stop_r_interpreter

Now that you have loaded a module for a specific version of R, you can use the R interpreter from the terminal.

Here we show:

  • how to start the interpreter

  • how to do a trivial R thing

  • how to see the list of installed R packages

  • how to load an R package

  • how to quit the interpreter

3.1. Start the R interpreter

Now that you have loaded a module for a specific version of R, you can start the R interpreter from the terminal like this:

R

3.2. Do a trivial R thing

Warning

Only do lightweight things!

We are still on the login node, which is shared with many other users. This means that if we do heavy calculations, all these other users are affected.

If you need to do heavy calculations:

  • Submit that calculation as a batch job

  • UPPMAX only: use an interactive session

This will be shown in a later session of the course.

Within the R interpreter we can give R commands:

print("Hello world")

Which will give the output:

[1] "Hello world"

3.3. See the list of installed R packages

From within the R interpreter, we can check which packages are installed using:

installed.packages()
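The full output of installed.packages() is a wide matrix, so it can help to print only the package names. The sketch below does this from the terminal with `Rscript -e`, which runs a single R expression (Rscript itself is covered in section 4). The function name `list_r_packages` is hypothetical.

```shell
# Hypothetical helper: print one installed R package name per line
list_r_packages() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found: load an R module first" >&2
    return 1
  fi
  # The row names of the installed.packages() matrix are the package names
  Rscript -e 'writeLines(rownames(installed.packages()))'
}

# Example: check whether the 'parallel' package is in the list
if list_r_packages | grep -x "parallel" >/dev/null; then
  echo "parallel is installed"
else
  echo "parallel was not found (or R is not loaded)"
fi
```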

3.4. Load an R package

From within the R interpreter, we can load a package like:

library(ggplot2)
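If a package is not installed, library() stops with an error. One way to check beforehand, sketched here under the assumption that an R module is loaded, is to ask R whether the package's namespace can be loaded. The function name `r_has_package` is hypothetical; `requireNamespace()` is a base R function that returns TRUE or FALSE instead of raising an error.

```shell
# Hypothetical helper: exit status 0 if the R package can be loaded,
# 1 if it cannot, 2 if Rscript itself is missing
r_has_package() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found: load an R module first" >&2
    return 2
  fi
  # The package name is passed as a trailing argument and read with commandArgs()
  Rscript -e 'pkg <- commandArgs(trailingOnly = TRUE)[1]; ok <- requireNamespace(pkg, quietly = TRUE); quit(save = "no", status = if (ok) 0L else 1L)' "$1"
}

# Example usage with the package from the text above
if r_has_package ggplot2; then
  echo "ggplot2 can be loaded with library(ggplot2)"
else
  echo "ggplot2 is not available in this R installation (or R is not loaded)"
fi
```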

3.5. Quit the interpreter

To quit the R interpreter, use the quit function:

quit()

You will get the question:

Save workspace image? [y/n/c]:

where you type n until you know what that is :-)

4. Run an R script

flowchart TD

  subgraph R
    say_hello[4.2.Say hello]
    see_installed_packages[See installed packages]
    load_package[Load a package]
  end
  run_r_script[4.Run an R script Rscript]

  run_r_script --> say_hello
  run_r_script --> see_installed_packages
  run_r_script --> load_package

Now that you have loaded a module for a specific version of R, you can run an R script from the terminal like this:

Rscript <r_script_name>

where <r_script_name> is the path to an R script, for example Rscript hello.R.
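To try this without downloading anything, the sketch below writes a one-line R script and runs it. The file name `my_hello.R` is a hypothetical example, chosen so it does not clash with the hello.R used in the exercises.

```shell
# Hypothetical script name; any path to an R script works
script="my_hello.R"

# Write a minimal R script containing the same command used in the interpreter
cat > "$script" <<'EOF'
print("Hello world")
EOF

if command -v Rscript >/dev/null 2>&1; then
  Rscript "$script"    # run the script non-interactively
else
  echo "Rscript not found: load an R module first" >&2
fi
```

With an R module loaded, this should print the same line as the interpreter did in section 3.2.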

Warning

Only do lightweight things!

We are still on the login node, which is shared with many other users. This means that if we do heavy calculations, all these other users are affected.

If you need to do heavy calculations:

  • Submit that calculation as a batch job

  • UPPMAX only: use an interactive session

This will be shown in a later session of the course.

Exercises

Exercise 1: find an R module

Note

Learning objectives

  • find the module to be able to run R

Use the module system to find which versions of R are provided by your cluster’s module system.

Exercise 2: load an R module

Note

Learning objectives

  • load the module to be able to run R

For this course, we recommend these versions of R:

HPC center    R version
HPC2N         4.1.2
LUNARC        4.2.1
UPPMAX        4.1.1

Load the module for the R version recommended for this course.

Exercise 3: use the R interpreter

Note

Learning objectives

  • run the R interpreter

  • run the R command to get the list of installed R packages

flowchart TD

  use_r_interpreter[3.Use the R interpreter]
  start_r_interpreter[3.1 Start the R interpreter]
  subgraph R
    say_hello[Say hello]
    see_installed_packages[3.2 See installed packages]
    load_package[3.3 Load a package]
  end
  stop_r_interpreter[Stop the R interpreter]

  use_r_interpreter --> start_r_interpreter
  use_r_interpreter --> say_hello
  use_r_interpreter --> see_installed_packages
  use_r_interpreter --> load_package
  use_r_interpreter --> stop_r_interpreter

Here we:

  • start the R interpreter

  • find out which packages are already installed

  • load an R package

Exercise 3.1: start the R interpreter

Start the R interpreter.

Exercise 3.2: check which packages are installed

From within the R interpreter, check which packages are installed.

Exercise 3.3: load a package

From within the R interpreter, load the parallel package.

Exercise 4: run an R script

Note

Learning objectives

  • run an R script from the command-line

flowchart TD

  subgraph R
    say_hello[4.2.Say hello]
    see_installed_packages[See installed packages]
    load_package[Load a package]
  end
  run_r_script[4.Run an R script Rscript]

  run_r_script --> say_hello
  run_r_script --> see_installed_packages
  run_r_script --> load_package

In this exercise, we will run an example script.

Exercise 4.1: get an R script

Get the R script hello.R by downloading it from the terminal:

wget https://raw.githubusercontent.com/UPPMAX/R-python-julia-HPC/main/exercises/r/hello.R

Exercise 4.2: run

Run the R script called hello.R, using Rscript.

Exercise 5: download and extract the tarball with exercises

See here how to download and extract the tarball with exercises.

Conclusions

Keypoints

One needs to:

  • first find a module to run R

  • load one or more modules to run R

  • if one cares about reproducibility, use explicit versions of modules

  • start the R interpreter with R

  • run R scripts with Rscript

However:

  • as we work on a login node, we can only do lightweight things

  • we can only use the R packages installed with the R module

  • we do not work in an isolated environment

These will be discussed in other sessions.