DL Exercises¶
Info
We have put some exercises here for you, in case you want more hands-on practice.
Prepare your project folder¶
Make arrangements for the new project
- Find your way to your project uppmax2024-2-21 by logging in to Rackham via ThinLinc, ssh, or VSCode.
- Go to the private folder and make an empty folder with your name.
Answer
ssh jayan@rackham.uppmax.uu.se
ssh -X jayan@rackham.uppmax.uu.se
mkdir <your_name>
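The folder-creation step can be sketched as below, once you are logged in. The path `/proj/uppmax2024-2-21/private` is an assumption about where the project's private folder lives, and using your login name as the folder name is only one possible choice; adjust both to your situation.

```shell
# Assumed location of the project's private folder; change it if yours differs.
private_dir="/proj/uppmax2024-2-21/private"

# Use the login name as the folder name (any name you like works too).
my_name="${USER:-$(id -un)}"
my_dir="$private_dir/$my_name"

# Create the folder only if the private folder really exists,
# so that a wrong path is reported instead of silently creating stray folders.
if [ -d "$private_dir" ]; then
    mkdir -p "$my_dir"
    ls -ld "$my_dir"
else
    echo "Not found: $private_dir (are you logged in to Rackham?)" >&2
fi
```

The `-p` flag makes `mkdir` succeed even if the folder already exists, so the snippet is safe to run twice.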
Transferring files¶
Copy files to your private folder
- Use scp to copy a file from your local laptop to your folder on uppmax2024-2-21. Download the CIFAR-10 Python pickled dataset here
- Do the same activity but with FileZilla or WinSCP. Delete your earlier uploaded data to make space for the new incoming one.
Answer
Refer to the SCP documentation here
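As a minimal sketch of the scp step, run on your local laptop: the user name `jayan` is taken from the login example, while the archive name `cifar-10-python.tar.gz` and the destination folder are assumptions that you should replace with your own values.

```shell
# Hypothetical values: replace them with your own.
remote_user="jayan"                      # user name from the ssh example
local_file="cifar-10-python.tar.gz"      # assumed name of the downloaded archive
remote_dir="/proj/uppmax2024-2-21/private/$remote_user"   # assumed folder on the project

# scp target in the form user@host:path
remote_target="$remote_user@rackham.uppmax.uu.se:$remote_dir/"

# Copy only if the archive is present in the current directory.
if [ -f "$local_file" ]; then
    scp "$local_file" "$remote_target"
else
    echo "Download $local_file first, then rerun." >&2
fi
```

To copy in the other direction (from Rackham to the laptop), swap the two arguments of `scp`.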
Using the compute nodes¶
Submit a Slurm job
- Clone the cifar10 resnet repository and edit run.sh by adding appropriate Slurm #SBATCH directives.
Answer
- Edit a file using your preferred editor, named my_bio_worksflow.sh, for example, with the content
#!/bin/bash -l
#SBATCH -A uppmax2024-2-21
#SBATCH -p node
#SBATCH -N 1
#SBATCH -t 01:00:00
#SBATCH -J cifar_demo
#SBATCH -M snowy
#SBATCH --gres=gpu:1
module load python_ML_packages/3.9.5-gpu
python -c "import torch; print(torch.__version__); print(torch.version.cuda); print(torch.cuda.get_device_properties(0)); print(torch.randn(1).cuda())"
#for model in resnet20 resnet32 resnet44 resnet56 resnet110 resnet1202
for model in resnet20 resnet110
do
echo "python -u trainer.py --arch=$model --save-dir=save_$model |& tee -a log_$model"
python -u trainer.py --arch=$model --save-dir=save_$model |& tee -a log_$model
done
- Make the job script executable
- Submit the job
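The last two steps can be sketched as follows, assuming the script was saved as `my_bio_worksflow.sh` in the current directory. The `squeue` call is an optional extra for checking the job afterwards; it is not part of the exercise itself.

```shell
job_script="my_bio_worksflow.sh"   # file name from the example above

# Run only where the script exists and Slurm is available (i.e. on the cluster).
if [ -f "$job_script" ] && command -v sbatch >/dev/null 2>&1; then
    chmod +x "$job_script"          # make the job script executable
    sbatch "$job_script"            # submit; the #SBATCH lines supply account, cluster, and GPU
    squeue -M snowy -u "$USER"      # list your jobs on the snowy cluster
else
    echo "Need $job_script and sbatch; run this on Rackham in the script's folder." >&2
fi
```

Because the script contains `#SBATCH -M snowy`, the job is sent to the snowy cluster, which is why `squeue` also needs `-M snowy` to show it.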
Doing installations¶
Conda installation¶
Install with Conda directly on Rackham
- Install python>3.11, transformers, torch, torchvision, notebook (using pip), pytorch-cuda=12.4, ipython, pillow
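One possible way to do this installation is sketched below. The environment name `dl_exercises` is hypothetical, the channel list (`pytorch`, `nvidia`, `conda-forge`) is an assumption, and `torch` is installed under its conda package name `pytorch`. How conda is made available on Rackham is not shown here and may require loading it through the module system first.

```shell
env_name="dl_exercises"   # hypothetical environment name

if command -v conda >/dev/null 2>&1; then
    # Create the environment with the conda-installable packages.
    # Channels are an assumption; "python>3.11" is quoted so the shell
    # does not treat ">" as a redirection.
    conda create -y -n "$env_name" -c pytorch -c nvidia -c conda-forge \
        "python>3.11" pytorch torchvision pytorch-cuda=12.4 \
        transformers ipython pillow

    # notebook is installed with pip inside the new environment.
    conda run -n "$env_name" pip install notebook
else
    echo "conda not found; make it available first, then rerun." >&2
fi
```

Afterwards, `conda activate dl_exercises` switches into the environment. Large environments like this one can take considerable disk space, so it may be worth checking where conda stores its environments and package cache before installing.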