Introduction R

../_images/r_logo_50.png
  • see a first overview of the R programming language

  • see the overview of the course

Course learning objectives

  • use the module system to load R

  • use the module system to load site-installed R packages

  • find out which versions of R and packages are installed

  • run R scripts

  • write a batch script for running R

  • install R packages from CRAN

  • see how to install other R packages yourself

  • start batch jobs

  • run RStudio

on HPC2N or UPPMAX

Course non-goals

  • improve R coding skills

  • use R on other HPC clusters

First overview of R

R is a programming language for statistical computing and data visualization (from Wikipedia).

flowchart TD

    subgraph r[R]
      r_interpreter[the R interpreter]
      r_packages[R packages]
      r_language[the R programming language]
      r_dev[R software development]
      rstudio[RStudio]

      interpreted_language[Interpreted]
      cran[CRAN]
    end

    r_language --> |has| r_dev
    r_language --> |is| interpreted_language 
    r_language --> |uses| r_packages
    interpreted_language --> |done by| r_interpreter
    r_packages --> |maintained by| cran
    r_dev --> |commonly done in| rstudio

The main general R resources are:

R is used in many NAISS centres:

R Exercise files

  • On HPC2N, you can copy the R exercise tarball from /proj/nobackup/hpc2n2024-025/exercises-r.tar.gz

  • On UPPMAX, you can copy the R exercise tarball from /proj/naiss2024-22-107/exercises-r.tar.gz

Schedule

flowchart TD

    subgraph login[HPC login]
      ssh[0. SSH]
      remote_desktop_website[5. Remote desktop website]
      remote_desktop_local_thinlinc_client[5. Remote desktop with local ThinLinc client]
    end
    subgraph scheduler[scheduler]
      running_batch_jobs[3. Running batch jobs]
      running_interactive_session[5. Running an interactive session]
    end
  
    login --> |allows for| scheduler flowchart TD

    subgraph r[R]
      r_interpreter[1. the R interpreter]
      r_packages[2. R packages]
      r_virtual_environments[2. R virtual environments]
      r_language[1. the R programming language]
      parallel_and_multithreaded_functions[3. Parallel and multithreaded functions]
      r_dev[5. R software development]
      rstudio[5. RStudio]
      ml[4. Machine learning]
      interpreted_language[1. Interpreted]
      cran[1. CRAN]
    end
    subgraph modules[modules]
      r_module[1. R module]
      r_packages_module[2. R_packages module]
      rstudio_module[5. RStudio module]
    end

  
    r_language --> |has| r_dev
    r_language --> |is| interpreted_language 
    r_language --> |uses| r_packages
    interpreted_language --> |done by| r_interpreter
    r_packages --> |maintained by| cran
    r_packages --> |isolated by|r_virtual_environments 
    r_language --> |allows| parallel_and_multithreaded_functions
    r_language --> |provides for| ml
    r_dev --> |commonly done in| rstudio

    r_interpreter --> |loaded by|r_module
    r_packages --> |loaded by|r_packages_module
    rstudio --> |loaded by|rstudio_module

    rstudio_module --> |automatically loads latest| r_packages_module
    r_packages_module --> |automatically loads corresponding version of| r_module
Schedule

Time

Topic

Teacher(s)

9:00

First login

BB + PO + RB

9:45

Break

.

10:00

First login

BB + PO + RB

10:10

Syllabus

RB

10:20

Load modules and run

RB

10:45

Break

.

11:00

Packages

BB

11:30

Isolated environments

BB

12:00

Lunch

.

13:00

Batch

BB

13:30

Parallel

PO

14:15

Break

.

14:30

Simultaneous session

PO * RB * ?RP

15:15

Break

.

15:30

Machine learning

BB or PO

16:00

Summary and evaluation

RB

16:15

Done

.

Simultaneous session:

  • HPC2N: ThinLinc & RStudio, by PO

  • UPPMAX: ThinLinc, RStudio, interactive, by RB

  • LUNARC, by ?RP