Submitting jobs
Objectives
This is a short introduction to how to reach the compute nodes
Wednesday afternoon is devoted to this topic!
Slurm, sbatch, the job queue
Problem: 1000 users, 500 nodes, 10k cores
Need a queue:
x-axis: cores, one thread per core
y-axis: time
Slurm is a job scheduler
Plan your job and put it in the Slurm job batch (sbatch)
sbatch <flags> <program>
or sbatch <job script>
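As a sketch, the two invocation styles look like this (the project name, time limit, and script name below are placeholders, not taken from a real submission):

```shell
# Sketch: the two ways to hand a job to sbatch (names are placeholders).
# 1) All flags on the command line:
#      sbatch -A <project> -p core -n 1 -t 01:00:00 ./my_program
# 2) Flags inside a job script as #SBATCH lines, then just:
#      sbatch jobtemplate.sh
# Here we only assemble the command line as a string, to show the flag order:
project="naiss2023-22-793"   # placeholder project
cmd="sbatch -A $project -p core -n 1 -t 01:00:00 jobtemplate.sh"
echo "$cmd"
```

On a real cluster, sbatch answers with the job id of the submitted job.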
Easiest to schedule: single-threaded, short jobs
Left: 4 one-core jobs can run immediately (or one 4-core wide job).
The jobs are too long to fit in cores 9-13.
Right: a 5-core job has to wait:
too long to fit in cores 9-13 and too wide to fit in the last cores.
Jobs
Job = what happens during booked time
Described in a Bash script file
Slurm parameters (flags)
Load software modules
(Move around file system)
Run programs
(Collect output)
… and more
Slurm parameters
1 mandatory setting for jobs:
Which compute project? (-A)
For example, if your project is named NAISS 2017/1-334, you specify -A naiss2017-1-334
3 settings you really should set:
Type of queue? (-p): core or node (for short development jobs and tests: devcore, devel)
How many cores? (-n): up to 16 (20 on Rackham) for a core job
How long at most? (-t): e.g. 7-00:00:00 for seven days
If in doubt: -p core -n 1 -t 7-00:00:00
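In a job script, these fallback settings would look like the following fragment (the project name is a placeholder, use your own):

```shell
#SBATCH -A naiss2023-22-793   # placeholder: your own project
#SBATCH -p core               # partition: part of a node
#SBATCH -n 1                  # one core
#SBATCH -t 7-00:00:00         # at most seven days
```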
Where should it run? (-p node or -p core)
Use a whole node or just part of it?
1 node = 20 cores (16 on Bianca & Snowy)
1 hour walltime = 20 core hours = expensive
Waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
Default value: core
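A sketch of what booking a whole node could look like in a job script (the project name is a placeholder; note that -N counts nodes, an assumption here, whereas -n counts cores):

```shell
#SBATCH -A naiss2023-22-793   # placeholder: your own project
#SBATCH -p node               # book whole nodes instead of cores
#SBATCH -N 1                  # one node = 20 cores on Rackham
#SBATCH -t 01:00:00           # one hour walltime = 20 core hours
```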
Walltime at the different clusters
Rackham: 10 days
Snowy: 30 days
Bianca: 10 days
Interactive jobs
Most work is most efficient as submitted jobs, but e.g. development needs responsiveness
Interactive jobs are high-priority but limited in -n and -t
They quickly give you a job and log you in to the compute node
Require same Slurm parameters as other jobs
Try interactive
$ interactive -A naiss2023-22-793 -p core -n 1 -t 10:00
Which node are you on?
Log out with <Ctrl>-D or logout
A simple job script template
#!/bin/bash -l
# the first line says this is a Bash script; -l starts a login session with a clean environment, e.g. no modules loaded and paths reset
#SBATCH -A naiss2023-22-793 # Project name
#SBATCH -p devcore # Asking for cores (for test jobs and as opposed to multiple nodes)
#SBATCH -n 1 # Number of cores
#SBATCH -t 00:10:00 # Ten minutes
#SBATCH -J Template_script # Name of the job
# go to some directory
cd /proj/introtouppmax/labs
pwd -P
# load software modules
module load bioinfo-tools
module list
# do something
echo Hello world!
Other Slurm tools
squeue — quick info about jobs in the queue
jobinfo — detailed info about jobs
finishedjobinfo — summary of finished jobs
jobstats — efficiency of booked resources
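Sketched usage on the cluster (the job-id arguments below are assumptions about these UPPMAX tools; check each tool's --help for the exact form):

```shell
squeue -u $USER            # your jobs currently in the queue
jobinfo -u $USER           # detailed info about your jobs
finishedjobinfo -u $USER   # summary of your finished jobs (assumed flag)
jobstats -p <jobid>        # efficiency plot for a job (assumed flag)
```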
Exercise at home
Copy the job script template just above!
Put it into a file named “jobtemplate.sh”
Make the file executable (chmod)
Submit the job:
$ sbatch jobtemplate.sh
Note the job id!
Check the queue:
$ squeue -u <username>
$ jobinfo -u <username>
When it’s done (rather fast), look for the output file (slurm-<jobid>.out):
$ ls -lrt slurm-*
Check the output file to see if it ran correctly
$ cat <filename>
What kind of work are you doing?
Compute bound
you mainly use CPU power (more cores can help)
Memory bound
the bottlenecks are allocating memory and copying/duplicating data
More on Wednesday afternoon!
Keypoints
You are always on the login node unless you:
start an interactive session
start a batch job
Slurm is a job scheduler
add flags to describe your job.