Overview¶
What do you need to be able to run at UPPMAX?¶
- SUPR
- account
- project
How to access the clusters?¶
- login
- ssh
- ThinLinc
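For example, a minimal SSH login (assuming your account is on the Rackham cluster; replace `username` with your own UPPMAX user name):

```bash
# log in to Rackham over SSH
ssh username@rackham.uppmax.uu.se
```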
Where should you keep your data?¶
uquota
How to transfer files?¶
sftp
scp
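A short sketch of both (the file name and project directory are placeholders; use your own project's path under /proj):

```bash
# copy a file from your machine to a project directory on the cluster with scp
scp my_data.tar.gz username@rackham.uppmax.uu.se:/proj/uppmax2025-3-5/
# or open an interactive sftp session and use put/get
sftp username@rackham.uppmax.uu.se
```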
The module system¶
Built on Lmod, a module system that handles the user's environment variables.
Some useful commands:
module avail <name>
module spider <name>
module load <module>/<version>
module list
module unload <module>/<version>
module purge
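A short example session (bioinfo-tools and samtools/1.14 are taken from the job script examples further down; the versions available on the cluster may differ):

```bash
# find out which versions of a tool exist and how to load them
module spider samtools
# bioinformatics tools require loading the bioinfo-tools umbrella module first
module load bioinfo-tools
module load samtools/1.14
# show what is currently loaded
module list
# remove all loaded modules again
module purge
```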
Slurm¶
A job scheduler used on many supercomputers and HPC clusters.
How to submit a job to Slurm?
What should a jobscript contain?
- `-A` : project number
- `-t` : max time
- `-p` : partition
- `-n` / `-N` : number of cores and/or nodes
- `-J` : job name
- special features: `--gres`, `--gpus-per-node`, etc.
A typical job script:¶
```bash
#!/bin/bash
#SBATCH -A uppmax2025-3-5
#SBATCH -p node
#SBATCH -N 1
#SBATCH -t 24:00:00

module load software/version
./my-script.sh
```
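Submit the script with `sbatch` (assuming it is saved as `jobscript.sh`):

```bash
sbatch jobscript.sh
```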
Useful SBATCH options:
- `--mail-type=BEGIN,END,FAIL,TIME_LIMIT_80`
- `--output=slurm-%j.out`
- `--error=slurm-%j.err`
Useful commands:
- `interactive -A naiss2023-22-247 -M snowy -p core -n 4` : starts an interactive job on Snowy
- `jobinfo -p devel`
- `sinfo -p node -M snowy`
- `jobinfo -u username`
How to cancel jobs?¶
scancel <jobid>
Job dependencies¶
- `sbatch jobscript.sh` : submitted job with jobid1
- `sbatch anotherjobscript.sh` : submitted job with jobid2
- `--dependency=afterok:jobid1:jobid2` : the job will only start running after the successful end of jobs jobid1 and jobid2
- Very handy for clearly defined workflows.
- One may also use `--dependency=afternotok:jobid` in case you'd like to resubmit a failed job (out of memory, for example) to a node with more memory: `-C mem215GB` or `-C mem1TB`.
- More in the Slurm documentation.
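A minimal sketch of such a workflow, using `sbatch --parsable` to capture the job IDs (the script names are placeholders):

```bash
# submit the first two steps and capture their job IDs
jobid1=$(sbatch --parsable jobscript.sh)
jobid2=$(sbatch --parsable anotherjobscript.sh)
# the final step starts only after both jobs have finished successfully
sbatch --dependency=afterok:${jobid1}:${jobid2} final-step.sh
```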
GPU flags¶
Example of a job running on part of a GPU node
Example of an interactive session on Snowy
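A minimal sketch of both (the GPU options shown, `-M snowy` and `--gres=gpu:1`, are assumptions; check the UPPMAX documentation for the exact flags on your cluster):

```bash
#!/bin/bash
#SBATCH -A uppmax2025-3-5
#SBATCH -M snowy
#SBATCH -p core
#SBATCH -n 4
#SBATCH -t 01:00:00
#SBATCH --gres=gpu:1

./my-gpu-program
```

And the corresponding interactive session on Snowy:

```bash
interactive -A uppmax2025-3-5 -M snowy -p core -n 4 -t 01:00:00 --gres=gpu:1
```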
I/O intensive jobs: use the scratch local to the node¶
Example
```bash
#!/bin/bash
#SBATCH -J jobname
#SBATCH -A uppmax2025-3-5
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 10:00:00

module load bioinfo-tools
module load bwa/0.7.17 samtools/1.14

export SRCDIR=$HOME/path-to-input

# copy the input files to the node-local scratch area ($SNIC_TMP) and run there
cp $SRCDIR/foo.pl $SRCDIR/bar.txt $SNIC_TMP/.
cd $SNIC_TMP
./foo.pl bar.txt

# copy the results back before the job ends (the scratch area is cleaned afterwards)
cp *.out $SRCDIR/path-to-output/.
```
Job arrays¶
Example
Submit many jobs at once with the same or similar parameters. Use $SLURM_ARRAY_TASK_ID in the script to find the correct path.
```bash
#!/bin/bash
#SBATCH -A naiss2023-22-21
#SBATCH -p node
#SBATCH -N 2
#SBATCH -t 01:00:00
#SBATCH -J jobarray
#SBATCH --array=0-19
#SBATCH --mail-type=ALL,ARRAY_TASKS

# SLURM_ARRAY_TASK_ID tells the script which iteration to run
echo $SLURM_ARRAY_TASK_ID
cd /pathtomydirectory/dir_$SLURM_ARRAY_TASK_ID/
srun -n 40 my-program
env
```
You may use `scontrol` to modify some of the tasks in a job array.
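For example (a sketch; the job ID is a placeholder), individual array tasks can be addressed as jobid_index:

```bash
# hold array tasks 10-19 of job 12345, then release them again
scontrol hold 12345_[10-19]
scontrol release 12345_[10-19]
```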
GPU accessibility check¶
- Check the CUDA environment variables, or
- Check CUDA and PyTorch accessibility from Python.
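A minimal sketch of both checks (assumes you are on a GPU node and have an environment or module with PyTorch installed):

```bash
# check which GPUs the job can see via the CUDA environment variable
echo $CUDA_VISIBLE_DEVICES
# check CUDA and PyTorch accessibility from Python
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
```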