Overview
What is needed to run at UPPMAX?
- SUPR
- account
- project
How to access the clusters?
- login
- ssh
- ThinLinc
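For example, to log in to the Rackham cluster with ssh (replace username with your own UPPMAX user name):
ssh username@rackham.uppmax.uu.se
ThinLinc provides a graphical desktop session as an alternative to plain ssh.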
Where should you keep your data?
uquota
How to transfer files?
sftp
scp
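For example, copying a file to and from Rackham with scp (file names and username are placeholders):
scp myfile.txt username@rackham.uppmax.uu.se:~/
scp username@rackham.uppmax.uu.se:~/results.txt .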
The module system
- built on Lmod
Some useful commands:
module avail <name>
module spider <name>
module load <module>/<version>
module list
module unload <module>/<version>
module purge
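A typical sequence, using the samtools module from the example further down this page:
module spider samtools
module load bioinfo-tools
module load samtools/1.14
module list
module unload samtools/1.14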
Slurm
How to submit a job to Slurm?
What should a jobscript contain?
- project number
- max time
- partition
- number of cores and/or nodes
- job name
- special features
A typical job script:
#!/bin/bash
#SBATCH -A uppmax2024-2-21
#SBATCH -p node
#SBATCH -N 1
#SBATCH -t 24:00:00
module load software/version
./my-script.sh
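The script is then submitted with sbatch and can be followed with jobinfo (username is a placeholder):
sbatch jobscript.sh
jobinfo -u username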
Useful SBATCH options:
--mail-type=BEGIN,END,FAIL,TIME_LIMIT_80
--output=slurm-%j.out
--error=slurm-%j.err
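These go into the job script header like any other option; a minimal sketch (the address is a placeholder, set with the standard --mail-user option):
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80
#SBATCH --mail-user=first.last@example.com
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err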
Useful commands:
jobinfo -p devel
sinfo -p node -M snowy
jobinfo -u username --state=running
jobinfo -u username --state=pending
salloc -A naiss2023-22-247 --begin=2023-03-24T08:00:00
starts an interactive job no earlier than 2023-03-24 at 08:00
How to cancel jobs?
scancel <jobid>
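To cancel all of your own jobs at once, for example (username is a placeholder):
scancel -u username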
Job dependencies
sbatch jobscript.sh
submitted job with jobid1
sbatch anotherjobscript.sh
submitted job with jobid2
--dependency=afterok:jobid1:jobid2
job will only start running after the successful end of jobs jobid1 and jobid2
- very handy for clearly defined workflows; see the sketch after this list
- One may also use --dependency=afternotok:jobid in case you'd like to resubmit a failed job (out of memory, for example) to a node with more memory: -C mem215GB or -C mem1TB
- More in the Slurm documentation.
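A minimal sketch of chaining the jobs above; --parsable is a standard sbatch flag that prints only the job ID so it can be captured in a variable, and nextjobscript.sh is a placeholder for the dependent job:
jobid1=$(sbatch --parsable jobscript.sh)
jobid2=$(sbatch --parsable anotherjobscript.sh)
sbatch --dependency=afterok:$jobid1:$jobid2 nextjobscript.sh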
GPU flags
Example of a job running on part of a GPU node
Example of an interactive session on Snowy
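A sketch of the flags such a job involves, assuming one GPU on Snowy is requested (project and time as in the examples above):
#SBATCH -M snowy
#SBATCH --gres=gpu:1
and, for an interactive session, something like:
interactive -A uppmax2024-2-21 -n 1 -M snowy --gres=gpu:1 -t 01:00:00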
I/O intensive jobs: use the scratch local to the node
Example
#!/bin/bash
#SBATCH -J jobname
#SBATCH -A uppmax2024-2-21
#SBATCH -p core
#SBATCH -n 1
#SBATCH -t 10:00:00
module load bioinfo-tools
module load bwa/0.7.17 samtools/1.14
export SRCDIR=$HOME/path-to-input
# copy the input files to the node-local scratch area
cp $SRCDIR/foo.pl $SRCDIR/bar.txt $SNIC_TMP/.
cd $SNIC_TMP
# run from the local scratch
./foo.pl bar.txt
# copy the results back before the job ends and $SNIC_TMP is cleaned up
cp *.out $SRCDIR/path-to-output/.
Job arrays
Example
Submit many jobs at once with the same or similar parameters. Use $SLURM_ARRAY_TASK_ID in the script to find the correct path.
#!/bin/bash
#SBATCH -A naiss2023-22-21
#SBATCH -p node
#SBATCH -N 2
#SBATCH -t 01:00:00
#SBATCH -J jobarray
#SBATCH --array=0-19
#SBATCH --mail-type=ALL,ARRAY_TASKS
# SLURM_ARRAY_TASK_ID tells the script which iteration to run
echo $SLURM_ARRAY_TASK_ID
cd /pathtomydirectory/dir_$SLURM_ARRAY_TASK_ID/
srun -n 40 my-program
env
You may use scontrol to modify a job array after it has been submitted.
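For example, to limit how many tasks of an array run at the same time (ArrayTaskThrottle is a standard Slurm field; the job ID is a placeholder):
scontrol update JobId=<jobid> ArrayTaskThrottle=5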
Profiling on the GPUs
- nvidia-smi
  nvidia-smi dmon -o DT
  nvidia-smi --format=noheader,csv --query-compute-apps=timestamp,gpu_name,pid,name,used_memory --loop=1 -f sample_run.log
  nvidia-smi --help
  or man nvidia-smi
- module load nvtop
  nvtop
- Check CUDA and PyTorch accessibility from Python
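A quick check from the command line, assuming a Python environment with PyTorch installed:
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"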