This is a short introduction to how to reach the compute nodes
Wednesday afternoon is devoted to this topic!
Slurm, sbatch, the job queue
Problem: 1000 users, 500 nodes, 10k cores
Need a queue (figure: x-axis shows cores, one thread per core)
Slurm is a job scheduler
Plan your job and put it in the Slurm job queue (sbatch)
sbatch <flags> <program>, or
sbatch <job script>
Single-threaded, short jobs are the easiest to schedule
Left: four one-core jobs (or one 4-core job) can run immediately.
The jobs are too long to fit in cores 9-13.
Right: a 5-core job has to wait.
It is too long to fit in cores 9-13 and too wide to fit in the last cores.
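The picture boils down to two questions per gap in the schedule: is it wide enough (free cores) and long enough (time until those cores are needed again)? The sketch below is a toy model of just that check, not Slurm's real scheduling algorithm; the function name `fits_in_gap` and the example numbers are hypothetical.

```bash
#!/bin/bash
# Toy model of the figure (NOT Slurm's real algorithm): a job can start in a
# gap only if the gap is wide enough (free cores) and long enough (hours
# until another job needs those cores).
fits_in_gap() {
  local job_cores=$1 job_hours=$2 free_cores=$3 gap_hours=$4
  (( job_cores <= free_cores && job_hours <= gap_hours ))
}

# Hypothetical gap: 4 free cores for the next 3 hours
fits_in_gap 1 2 4 3 && echo "1-core, 2 h job: starts now"
fits_in_gap 5 2 4 3 || echo "5-core job: too wide, has to wait"
fits_in_gap 1 5 4 3 || echo "5 h job: too long for the gap, has to wait"
```

This is why short, narrow jobs get through the queue fastest: they fit in more gaps.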
Job = what happens during booked time
Described in a Bash script file
Slurm parameters (flags)
Load software modules
(Move around the file system)
… and more
1 mandatory setting for jobs:
Which compute project? (-A)
For example, if your project is named
NAISS 2017/1-334, you specify -A naiss2017-1-334
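The account string for -A appears to follow a simple pattern in this course (compare the project used in the examples, naiss2023-22-793): lower case, no space, slash replaced by a dash. The helper below is a hypothetical convenience function built on that assumed convention; always check the exact string for your own project.

```bash
#!/bin/bash
# Hypothetical helper: turn a project name such as "NAISS 2017/1-334"
# into the account string for -A. ASSUMED convention: lower case,
# spaces removed, "/" replaced by "-".
project_to_account() {
  local name=${1,,}        # lower case (bash 4+)
  name=${name// /}         # drop spaces
  echo "${name//\//-}"     # replace every "/" with "-"
}

project_to_account "NAISS 2017/1-334"
```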
3 settings you really should set:
Type of queue? (-p)
core, node (for short development jobs and tests: devcore, devel)
How many cores? (-n)
up to 16 (20 on Rackham) for a core job
How long at most? (-t)
If in doubt:
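The walltime given to -t can be written in several forms; the sbatch manual lists "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". Note that 10:00 (as in the interactive example below) means 10 minutes, not 10 hours. A small sketch that converts these forms to seconds, with a hypothetical function name:

```bash
#!/bin/bash
# Convert a Slurm walltime string (as given to -t) to seconds.
# Forms handled: minutes, minutes:seconds, hours:minutes:seconds,
# days-hours, days-hours:minutes, days-hours:minutes:seconds
slurm_time_to_seconds() {
  local t=$1 d=0 h=0 m=0 s=0
  local -a p
  if [[ $t == *-* ]]; then
    d=${t%%-*}                       # part before the dash = days
    IFS=: read -ra p <<< "${t#*-}"   # hours[:minutes[:seconds]]
    h=${p[0]:-0}; m=${p[1]:-0}; s=${p[2]:-0}
  else
    IFS=: read -ra p <<< "$t"
    case ${#p[@]} in
      1) m=${p[0]} ;;
      2) m=${p[0]}; s=${p[1]} ;;
      3) h=${p[0]}; m=${p[1]}; s=${p[2]} ;;
      *) echo "unsupported time format: $t" >&2; return 1 ;;
    esac
  fi
  # 10# forces base 10, so values like "08" are not read as octal
  echo $(( ((10#$d * 24 + 10#$h) * 60 + 10#$m) * 60 + 10#$s ))
}

for t in 10:00 00:10:00 10-00:00:00; do
  printf '%s -> %s seconds\n' "$t" "$(slurm_time_to_seconds "$t")"
done
```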
Where should it run? (-p)
Use a whole node or just part of it?
1 node = 20 cores (16 on Bianca & Snowy)
1 hour of walltime on a whole node = 20 core hours = expensive
Waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
Default value: core
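The cost argument above is simple arithmetic: booked cores times walltime hours. A node job books every core on the node whether or not the program uses them. A minimal sketch (the function name is illustrative; cores per node are taken from the slide):

```bash
#!/bin/bash
# Core hours = booked cores x walltime in hours.
# A node job books every core on the node, used or not.
core_hours() {
  local cores=$1 hours=$2
  echo $(( cores * hours ))
}

CORES_PER_NODE=20   # Rackham; 16 on Bianca and Snowy
echo "1 core for 1 h: $(core_hours 1 1) core hours"
echo "1 node for 1 h: $(core_hours "$CORES_PER_NODE" 1) core hours"
```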
Maximum walltime on the different clusters
Rackham: 10 days
Snowy: 30 days
Bianca: 10 days
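The limits above can be kept in a small lookup, for example to sanity-check a request before submitting. A sketch with hypothetical function names; the day values are the ones listed on this slide and may change over time.

```bash
#!/bin/bash
# Maximum walltime per cluster, in days (values from the slide).
max_walltime_days() {
  case ${1,,} in
    rackham|bianca) echo 10 ;;
    snowy)          echo 30 ;;
    *) echo "unknown cluster: $1" >&2; return 1 ;;
  esac
}

# Succeeds if a request of N whole days fits within the cluster's limit.
fits_walltime() {
  local cluster=$1 days=$2 max
  max=$(max_walltime_days "$cluster") || return 1
  (( days <= max ))
}

fits_walltime snowy 20 && echo "20 days is OK on Snowy"
fits_walltime rackham 20 || echo "20 days is too long for Rackham"
```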
Most work is done most efficiently as submitted batch jobs, but development, for example, needs responsiveness
Interactive jobs are high-priority but limited in
Quickly gives you a job and logs you in to the compute node
Requires the same Slurm parameters as other jobs
$ interactive -A naiss2023-22-793 -p core -n 1 -t 10:00
Which node are you on?
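One way to answer the question: check the host name and whether Slurm has set SLURM_JOB_ID, which Slurm puts in the environment of the jobs it starts. The function below is a hypothetical helper; on the command line, `hostname` and `echo $SLURM_JOB_ID` give the same information.

```bash
#!/bin/bash
# Print which host this shell runs on and whether it is inside a Slurm job.
# Slurm sets SLURM_JOB_ID in the environment of jobs it starts.
where_am_i() {
  if [[ -n ${SLURM_JOB_ID:-} ]]; then
    echo "$HOSTNAME: inside Slurm job $SLURM_JOB_ID (compute node)"
  else
    echo "$HOSTNAME: not inside a Slurm job (probably a login node)"
  fi
}

where_am_i
```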
A simple job script template
#!/bin/bash -l
# Bash script; -l makes bash a login shell, so startup files are read and the module system is set up as when you log in
# Project name
#SBATCH -A naiss2023-22-793
# Partition: devcore = cores rather than whole nodes, for short test jobs
#SBATCH -p devcore
# Number of cores
#SBATCH -n 1
# Walltime: ten minutes
#SBATCH -t 00:10:00
# Name of the job
#SBATCH -J Template_script

# go to some directory
cd /proj/introtouppmax/labs
pwd -P

# load software modules
module load bioinfo-tools
module list

# do something
echo 'Hello world!'
Other Slurm tools
squeue: quick info about jobs in the queue
jobinfo: detailed info about jobs
finishedjobinfo: summary of finished jobs
jobstats: efficiency of booked resources
Exercise at home
Copy the job script template above!
Put it into a file named “jobtemplate.sh”
Make the file executable (chmod)
Submit the job:
$ sbatch jobtemplate.sh
Note the job id!
Check the queue:
$ squeue -u <username>
$ jobinfo -u <username>
When it’s done (rather fast), look for the output file (slurm-<job id>.out):
$ ls -lrt slurm-*
Check the output file to see if it ran correctly
$ cat <filename>
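The exercise steps can also be scripted. The sketch below writes a shortened version of the template (module lines left out), makes it executable and, only if sbatch is available (that is, on the cluster), submits it and extracts the job id from sbatch's "Submitted batch job <id>" message. The function names are hypothetical.

```bash
#!/bin/bash
# Write a shortened job script (template without the module lines)
# and make it executable.
write_job_script() {
  local file=$1
  cat > "$file" <<'EOF'
#!/bin/bash -l
#SBATCH -A naiss2023-22-793
#SBATCH -p devcore
#SBATCH -n 1
#SBATCH -t 00:10:00
#SBATCH -J Template_script
echo 'Hello world!'
EOF
  chmod u+x "$file"
}

# sbatch prints "Submitted batch job <id>"; keep only the last word (the id).
job_id_from() {
  echo "${1##* }"
}

write_job_script jobtemplate.sh
if command -v sbatch >/dev/null; then
  id=$(job_id_from "$(sbatch jobtemplate.sh)")
  echo "Job id: $id; output will appear in slurm-$id.out"
fi
```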
What kind of work are you doing?
Compute bound: you mainly use CPU power (more cores can help)
Memory bound: the bottlenecks are allocating, copying or duplicating memory
More on Wednesday afternoon!
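For memory-bound work, one rule of thumb is to book enough cores to get the memory you need. The sketch below assumes that memory is handed out in proportion to booked cores (a node with 128 GB and 20 cores, as on the earlier slide, gives about 6.4 GB per core); that proportional rule and the function names are assumptions for illustration, so check your cluster's documentation.

```bash
#!/bin/bash
# ASSUMPTION: memory is allocated in proportion to booked cores.
NODE_MEM_GB=128
CORES_PER_NODE=20

# Smallest number of cores whose share of node memory covers mem_gb.
cores_for_memory() {
  local mem_gb=$1
  # integer ceiling of mem_gb * CORES_PER_NODE / NODE_MEM_GB
  local n=$(( (mem_gb * CORES_PER_NODE + NODE_MEM_GB - 1) / NODE_MEM_GB ))
  (( n < 1 )) && n=1
  echo "$n"
}

# Suggest Slurm flags for a memory need given in GB.
suggest_booking() {
  local n
  n=$(cores_for_memory "$1")
  if (( n > CORES_PER_NODE )); then
    echo "-p node (more memory than one node has)"
  elif (( n == CORES_PER_NODE )); then
    echo "-p node"
  else
    echo "-p core -n $n"
  fi
}

suggest_booking 40    # 7 cores x 6.4 GB = 44.8 GB
```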
You are always on the login node unless you:
start an interactive session
start a batch job
Slurm is a job scheduler
add flags to describe your job.