# Using the compute nodes
```{objectives}
- This is a short introduction to how to reach the compute nodes
- Wednesday afternoon is devoted to this topic!
```
```{instructor-note}
- Approx timing: 13.30-14.30 (10 min break)
- Theory
- Hands-on
```
```{attention}
- For now, in **this course**, we use the **material on this page**.
- A SLURM introduction can otherwise be found here:
```
```{note}
- project number: ``naiss2024-22-49``
```
## The compute nodes
When you are logged in, you are on a login node.
There are two types of nodes:
Type |Purpose
------------|--------------------------
Login node |Start jobs for the compute nodes, do easy things.
Compute nodes |Do hard calculations, either from scripts or in an interactive session.
```{mermaid}
graph TB
Node1 -- interactive --> SubGraph2Flow
Node1 -- sbatch --> SubGraph2Flow
subgraph "Snowy"
SubGraph2Flow(calculation nodes)
end
thinlinc -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
terminal/thinlinc -- usr --> Node1
terminal -- usr-sensXXX + 2FA + VPN ----> SubGraph1Flow
Node1 -- usr-sensXXX + 2FA + no VPN ----> SubGraph1Flow
subgraph "Bianca"
SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
private -- interactive --> calcB(calculation nodes)
private -- sbatch --> calcB
end
subgraph "Rackham"
Node1[Login] -- interactive --> Node2[calculation nodes]
Node1 -- sbatch --> Node2
end
```
## Slurm, sbatch, the job queue
- Problem: _1000 users, 300 nodes, 5000 cores_
- We need a **queue**:
- [Slurm](https://slurm.schedmd.com/) is a job scheduler
- You define **jobs** to be run on the compute nodes; these are sent to the queue.
### Jobs
- Job = what happens during booked time
- Described in
- a script file or
- the command line (takes priority over the script)
- The definitions of a job:
- Slurm parameters (**flags**)
- Load software modules
- (Navigate in file system)
- Run program(s)
- (Collect output)
- ... and more
```{admonition} Some keywords
- A program may run _serially_ and then needs only ONE _compute thread_, which will occupy 1 core, which is a physical unit of the CPU on the node.
- You should most often just book 1 core. If you require more than 7 GB you can allocate more cores and you will get multiples of 7 GB.
- A program may run in _parallel_ and then needs either several _threads_ or several _tasks_, both occupying several cores.
- If you need all 128 GB RAM (actually 112) or all 16 cores for your job, book a complete node.
```
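With 7 GB of RAM per core, a job needing, say, 20 GB would book ceil(20/7) = 3 cores. A quick shell sanity check of that arithmetic (the 20 GB figure is just an example):

```bash
# cores needed = ceiling(required_GB / GB_per_core), with 7 GB per core
need_gb=20
per_core=7
cores=$(( (need_gb + per_core - 1) / per_core ))   # integer ceiling division
echo "$cores cores -> $(( cores * per_core )) GB available"   # 3 cores -> 21 GB available
```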
### Slurm parameters
- 1 mandatory setting for jobs:
- Which compute project? (`-A`)
- 3 settings you really should set:
- Type of queue or partition? (`-p`)
- ``core`` for most jobs and the **default**!
- ``node`` for larger jobs
- for short development jobs and tests: ``devcore``, ``devel``
- How many cores? (`-n`)
- up to 16 for a core job
- How long at most? (`-t`)
- If in doubt:
- `-p core`
- `-n 1`
- `-t 10-00:00:00`
### The queue
- How does the queue work?
- Let's look graphically at jobs presently running.

- *x-axis: cores, one thread per core*
- *y-axis: time*
- We see some holes where we may already fit jobs!
- Let's see which types of jobs can fit!

- 4 one-core jobs can run immediately (or a 4-core wide job).
- *The jobs are too long to fit at core number 9-13.*

- A five-core job has to wait.
- *Too long to fit in cores 9-13 and too wide to fit in the last cores.*
- Easiest to schedule *single-threaded*, short jobs
```{tip}
- You don't see the queue graphically, however.
- But, overall:
- short and narrow jobs will start fast
- test and development jobs can make use of dedicated development nodes if they are shorter than 1 hour and use at most two nodes.
- booking a full node is a waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
```
### Core-hours
- Remember that you are charged CPU-hours according to booked cores × hours
- Example 1: 60 hours with 2 cores = 120 CPU-hours
- Example 2: 12 hours with a full node (16 cores) = 192 CPU-hours
- Waste of resources unless you have a parallel program using all cores or need all the memory, e.g. 128 GB per node
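The charge is a simple product of booked cores and wall-clock hours; the two examples above, checked in the shell:

```bash
# CPU-hours charged = booked cores x wall-clock hours
echo "Example 1: $(( 2 * 60 )) CPU-hours"    # 2 cores for 60 h -> 120
echo "Example 2: $(( 16 * 12 )) CPU-hours"   # full 16-core node for 12 h -> 192
```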
### Choices
- Work interactively with your data or develop or test
- Run an **Interactive session**
- ``$ interactive ...``
- If you _don't_ need any live interaction with your workflow/analysis/simulation
- Send your job to the Slurm job batch (sbatch)
- `$ sbatch ...`
```{mermaid}
flowchart TD
UPPMAX(What to run on which node?)
operation_type{What type of operation/calculation?}
interaction_type{What type of interaction?}
login_node(Work on login node)
interactive_node(Work on interactive node)
calculation_node(Schedule for calculation node)
UPPMAX-->operation_type
operation_type-->|light,short|login_node
operation_type-->|heavy,long|interaction_type
interaction_type-->|Direct|interactive_node
interaction_type-->|Indirect|calculation_node
```
### What kind of compute work are you doing?
- **Compute bound**
- you use mainly CPU power
- does the software support threads or MPI?
- **Threads/OpenMP** are rather often supported. **Use several cores!**
- **MPI** (Message Passing Interface) allows for inter-node jobs but is seldom supported by bioinformatics software. **You could use several nodes!**
- **Memory bound**
- if the bottleneck is memory: allocating, copying, or duplicating data
- use more cores up to 1 node, perhaps using a "fat" node.
```{admonition} Slurm Cheat Sheet
- ``-A`` project number
- ``-t`` wall time
- ``-n`` number of cores
- ``-N`` number of nodes (can only be used if your code is parallelized with MPI)
- ``-p`` partition
- ``core`` is the default and works for jobs of up to 16 cores
- ``node`` can be used if you need the whole node and its memory
```
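In a batch script these flags become ``#SBATCH`` directives. A sketch of a script header using the cheat sheet (project number from the note at the top of this page; core count and wall time are illustrative):

```bash
#!/bin/bash -l
#SBATCH -A naiss2024-22-49   # project number
#SBATCH -p core              # partition
#SBATCH -n 4                 # number of cores (illustrative)
#SBATCH -t 02:00:00          # wall time (illustrative)
```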
### Walltime at the different clusters
- Rackham: 10 days
- Snowy: 30 days
- Bianca: 10 days
## Interactive jobs
- Most work runs most efficiently as submitted jobs, but some tasks, e.g. development, need responsiveness
- Interactive jobs are high-priority but limited in `-n` and `-t`
- They quickly give you a job and log you in to the compute node
- They require the same Slurm parameters as other jobs
- Log in to a compute node
- `$ interactive ...`
- Log out with ``<Ctrl>-D`` or `logout`
- To use an interactive node, in a terminal, type:
```bash
interactive -A [project name] -p core -n [number_of_cores] -t [session_duration]
```
For example:
```bash
interactive -A naiss2024-22-49 -p core -n 2 -t 8:0:0
```
This starts an interactive session using project `naiss2024-22-49`
that uses 2 cores and has a maximum duration of 8 hours.
### Try interactive and run RStudio
- We recommend using at least two cores for RStudio, and to get those resources, you must start an interactive job.
```{type-along}
- Use **ThinLinc**
- Start **interactive session** on compute node (2 cores)
- If you already have an interactive session going on, use that.
- If you don't find it, do
``$ squeue``
- find your session, ssh to it, like:
``$ ssh r483``
- If you have no ongoing session:
``$ interactive -A naiss2024-22-49 -p devcore -n 2 -t 60:00``
- Once the interactive job has begun, you need to load the needed modules again, even if you had loaded them before on the login node
- Check which node you are on:
`$ hostname`
- Also try:
`$ srun hostname`
- This will give one output line per core you allocated.
- How many in this case?
``[bjornc@r483 ~]$``
- Load an RStudio module and an R_packages module (if you do not load R_packages you will have to stick with R/3.6.0) and run ``rstudio`` from there.
`$ ml R_packages/4.2.1`
`$ ml RStudio/2022.07.1-554`
- **Start rstudio**, keeping terminal active (`&`)
`$ rstudio &`
- Slow to start?
- Depends on:
- the number of packages
- whether you save a lot of data in your RStudio workspace, which is read during start-up.
- **Quit RStudio**!
- **Log out** from the interactive session with ``<Ctrl>-D``, `logout`, or `exit`
```
## Job scripts (batch)
- Batch scripts can be written in any scripting language. We will use Bash.
- Make the first line `#!/bin/bash`
- It is good practice to end that line with ``-l`` (i.e. ``#!/bin/bash -l``) to start from a fresh environment with no modules loaded.
- This makes sure that you don't enable other software or versions that may interfere with what you want to do in the job.
- Before the job content, add the batch flags starting the lines with the keyword `#SBATCH`, like:
- ``#SBATCH -t 2:00:00``
- ``#SBATCH -p core``
- ``#SBATCH -n 3``
- Lines starting with `#` are ignored by `bash`, so the file can also run as an ordinary bash script
- when the script is submitted with the command `sbatch`, the `#SBATCH` lines are read as Slurm flags
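Putting the pieces together, a minimal complete job script might look like the sketch below (project number from the note at the top of this page; the module line and the final command are illustrative stand-ins for your real program):

```bash
#!/bin/bash -l
#SBATCH -A naiss2024-22-49   # compute project (mandatory)
#SBATCH -p core              # partition
#SBATCH -n 2                 # number of cores
#SBATCH -t 01:00:00          # wall time

# load the software the job needs (the module system exists only on the cluster)
# module load R_packages/4.2.1

# run the program(s) and collect output
echo "Job running on $(hostname)"
```

Submit it with `sbatch` followed by the script name; as noted above, flags given on the command line take priority over the `#SBATCH` lines in the script.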