# Submitting jobs

```{objectives}
- This is a short introduction to how to reach the calculation nodes
- Wednesday afternoon is dedicated to this topic!
```

## Slurm, sbatch, the job queue

- Problem: 1000 users, 500 nodes, 10k cores
- Need a queue:

  ![Image](./img/queue1.png)

  - x-axis: cores, one thread per core
  - y-axis: time

- [Slurm](https://slurm.schedmd.com/) is a job scheduler
- Plan your job and put it in the Slurm job batch (sbatch): `sbatch <flags> <program>` or `sbatch <jobscript>` (a minimal example follows this list)
- Easiest to schedule *single-threaded*, short jobs

  ![Image](./img/queue2.png) ![Image](./img/queue3.png)

  - Left: 4 one-core jobs can run immediately (or a 4-core wide job).
    - The jobs are too long to fit in cores 9-13.
  - Right: A 5-core job has to wait.
    - Too long to fit in cores 9-13 and too wide to fit in the last cores.
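A minimal sketch of a submission; ``jobtemplate.sh`` is the script from the exercise further down, and the project name is the course project used elsewhere on this page. Slurm replies with the job id:

```{code-block} console
$ sbatch -A naiss2023-22-793 jobtemplate.sh
Submitted batch job <jobid>
```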
## Jobs

- Job = what happens during the booked time
- Described in a Bash script file
  - Slurm parameters (**flags**)
  - Load software modules
  - (Move around the file system)
  - Run programs
  - (Collect output)
  - ... and more

## Slurm parameters

- 1 mandatory setting for jobs:
  - Which compute project? (`-A`)
    - For example, if your project is named ``NAISS 2017/1-334`` you specify ``-A naiss2017-1-334``
- 3 settings you really should set:
  - Type of queue? (`-p`)
    - core, node (for short development jobs and tests: devcore, devel)
  - How many cores? (`-n`)
    - up to 16 (20 on Rackham) for a core job
  - How long at most? (`-t`)
- If in doubt (see the sketch after this section):
  - `-p core`
  - `-n 1`
  - `-t 7-00:00:00`

![Image](./img/queue1.png)

- Where should it run? (`-p node` or `-p core`)
  - Use a whole node or just part of it?
    - 1 node = 20 cores (16 on Bianca & Snowy)
    - 1 hour walltime = 20 core hours = expensive
    - Waste of resources unless you have a parallel program or need all the memory, e.g. 128 GB per node
  - Default value: core
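A minimal sketch of how these flags combine on the command line; ``myjob.sh`` is a placeholder script name, and the project is the course project used on this page. Note that flags given to `sbatch` on the command line override any corresponding `#SBATCH` lines inside the script:

```{code-block} console
$ sbatch -A naiss2023-22-793 -p core -n 1 -t 7-00:00:00 myjob.sh
```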
### Walltime at the different clusters

- Rackham: 10 days
- Snowy: 30 days
- Bianca: 10 days

## Interactive jobs

- Most work is most efficient as submitted jobs, but development, for example, needs responsiveness
- Interactive jobs have high priority but are limited in `-n` and `-t`
- Quickly gives you a job and logs you in to the compute node
- Requires the same Slurm parameters as other jobs

``````{challenge} Try interactive
```{code-block} console
$ interactive -A naiss2023-22-793 -p core -n 1 -t 10:00
```
- Which node are you on?
- Log out with `<Ctrl>-D` or `logout`
``````

### A simple job script template

```bash
#!/bin/bash -l
# bash is the language; -l starts a session with a clean environment,
# e.g. with no modules loaded and paths reset

#SBATCH -A naiss2023-22-793   # Project name
#SBATCH -p devcore            # Asking for cores (for test jobs, as opposed to whole nodes)
#SBATCH -n 1                  # Number of cores
#SBATCH -t 00:10:00           # Ten minutes
#SBATCH -J Template_script    # Name of the job

# go to some directory
cd /proj/introtouppmax/labs
pwd -P

# load software modules
module load bioinfo-tools
module list

# do something
echo Hello world!
```

## Other Slurm tools

- ``squeue`` — quick info about jobs in the queue
- ``jobinfo`` — detailed info about jobs
- ``finishedjobinfo`` — summary of finished jobs
- ``jobstats`` — efficiency of booked resources (see the sketch after the exercise)

``````{challenge} Exercise at home
- Copy the job script template just further up!
- Put it into a file named ``jobtemplate.sh``
- Make the file executable (``chmod``)
- Submit the job:
```{code-block} console
$ sbatch jobtemplate.sh
```
- Note the job id!
- Check the queue:
```{code-block} console
$ squeue -u <username>
$ jobinfo -u <username>
```
- When it's done (rather fast), look for the output file (``slurm-<jobid>.out``):
```{code-block} console
$ ls -lrt slurm-*
```
- Check the output file to see if it ran correctly:
```{code-block} console
$ cat slurm-<jobid>.out
```
``````
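Once a job has finished, you can also check how efficiently it used the booked resources with ``jobstats``. A minimal sketch; the ``-p`` (plot) flag and the placeholder job id are assumptions here, so check ``jobstats -h`` on your cluster for the exact options:

```{code-block} console
$ jobstats -p <jobid>
```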
## What kind of work are you doing?

- Compute bound - you mainly use CPU power (more cores can help)
- Memory bound - the bottleneck is allocating memory or copying/duplicating data

**More on Wednesday afternoon!**

```{keypoints}
- You are always on the login node unless you:
  - start an interactive session
  - start a batch job
- Slurm is a job scheduler
  - add flags to describe your job
```