Efficient jobs
Learning outcomes
- Practice using the UPPMAX documentation
- I can see the CPU and memory usage of jobs
- I can read a jobstats plot
- I can create a jobstats plot
- I understand how to set up jobs efficiently
Want to see this session as a video?
Watch the YouTube video 'Intermediate Bianca workshop: efficient jobs'.
For teachers
Teaching goals are:
- Learners have practiced using the UPPMAX documentation
- Learners have seen the CPU and memory usage of jobs
- Learners have read a jobstats plot
- Learners have created a jobstats plot
- Learners have discussed how to set up jobs efficiently
Lesson plan:
gantt
title Efficient jobs
dateFormat X
axisFormat %s
section First hour
Course introduction: done, course_intro, 0, 10s
Prior : intro, after course_intro, 5s
Present: theory_1, after intro, 5s
Challenge: crit, exercise_1, after theory_1, 40s
Break: crit, milestone, after exercise_1
section Second hour
Challenge: crit, exercise_2, 0, 10s
Feedback: feedback_2, after exercise_2, 10s
SLURM: done, slurm, after feedback_2, 25s
Break: done, milestone, after slurm
Prior questions:
- How to schedule jobs efficiently?
- What is the jobstats tool?
Present:
- ?Show documentation
Why?
If everyone would use our computational resources effectively, there would be no queue.
From the UPPMAX documentation, original source unknown
Running efficient jobs lets you run more jobs, and those jobs start sooner.
Exercises
Exercise 1: reading a jobstats plot
- Read the UPPMAX jobstats documentation until you know enough to (practice) read a jobstats plot. The 'effective use' section is especially important.
Exercise 1.1: jobstats plot 1
See jobstats plot 1 below and answer these questions:
- How many cores should this user book?
- Why?
jobstats plot 1
Answer
The user should have booked 1 core: the memory used fits within 1 core, and 1 core matches the CPU usage exactly.
It may be, however, that the program was set up incorrectly and could use multiple cores if set up correctly.
Exercise 1.2: jobstats plot 2
See jobstats plot 2 below and answer these questions:
- Did the job finish successfully?
- How many cores should this user book?
- Why?
jobstats plot 2
Answer
The job did not finish successfully: the OUT_OF_MEMORY error indicates that.
How many cores the user should book is uncertain: we only know that it is more than currently used. One strategy is to double the number of cores and fine-tune after a successful run.
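The doubling strategy can be sketched in a few lines of Python. This is only an illustration: the function name `next_core_booking` and the `max_cores_per_node` parameter are hypothetical, and the node size of 16 cores in the example is an assumption, not a value taken from this page.

```python
def next_core_booking(current_cores: int, max_cores_per_node: int) -> int:
    """Suggest a core count for rerunning a job that ran out of memory.

    Doubles the current booking, but never goes beyond one node
    (the per-node limit is an assumption supplied by the caller).
    """
    if current_cores < 1 or max_cores_per_node < 1:
        raise ValueError("core counts must be at least 1")
    return min(current_cores * 2, max_cores_per_node)


# Hypothetical example: a job booked with 2 cores ran out of memory.
print(next_core_booking(2, max_cores_per_node=16))
```

After a run that does finish, the jobstats plot of that run shows whether the booking can be reduced again.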
Exercise 1.3: jobstats plot 3
See jobstats plot 3 below and answer these questions:
- How many cores should this user book?
- Why?
jobstats plot 3
Answer
We don't know. The user uses all CPU power perfectly and there is enough memory available.
The user may benefit from more CPUs, as the program may be CPU limited.
It may be that the program is designed to use at most 20 CPUs, in which case scheduling 20 cores is perfect!
It may also be that using 20 cores is a deliberate strategy of the user: using multiple cores always brings computational overhead, and hence wasted CPU resources.
Exercise 1.4: jobstats plot 4
See jobstats plot 4 below and answer these questions:
- How many cores should this user book?
- Why?
jobstats plot 4
Answer
This seems to be the cleanest example of applying the algorithm for using computational resources efficiently: the user needs 2 cores for memory and adds 1 for safety. The job is not clearly CPU limited.
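The rule of thumb used in this answer (book the cores needed for memory, then add one for safety) can be written down as a small Python sketch. The function name `cores_to_book` and its parameters are hypothetical, and the figure of 8 GB of memory per core in the example is an assumption: check the documentation of your cluster for the real value.

```python
import math


def cores_to_book(peak_memory_gb: float,
                  memory_per_core_gb: float,
                  safety_cores: int = 1) -> int:
    """Estimate how many cores to book from a job's peak memory use.

    Each booked core is assumed to come with a fixed share of memory,
    so the memory need is rounded up to whole cores. At least one core
    is always needed; safety_cores is added on top.
    """
    if peak_memory_gb < 0 or memory_per_core_gb <= 0 or safety_cores < 0:
        raise ValueError("invalid memory or core values")
    cores_for_memory = max(1, math.ceil(peak_memory_gb / memory_per_core_gb))
    return cores_for_memory + safety_cores


# Hypothetical example: a peak of 12 GB with an assumed 8 GB per core
# needs 2 cores for memory, plus 1 for safety.
print(cores_to_book(12.0, memory_per_core_gb=8.0))
```

This sketch looks at memory only. If the jobstats plot shows that the job is CPU limited, more cores may still help, as discussed in exercise 1.3.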
Exercise 2: creating a jobstats plot
We are going to create a jobstats plot. For that, we need a job to plot. Here we first look for a job, after which we plot it.
- Scan the UPPMAX finishedjobinfo documentation
- Log in to your own Bianca project.
- Find a job that has finished successfully and that took longer than one hour (if there is none, use the job that ran longest)
Answer
Use any of the code snippets in the finishedjobinfo documentation, for example the one under 'How do I find jobs that have finished and took longer than an hour?'.
Press CTRL-C to stop the process: it will take very long to finish.
- Read the UPPMAX jobstats documentation, create a jobstats plot of that job
- View the jobstats plot. Use the UPPMAX documentation on 'eog' if you want to be fast :-)
- Was that a job that was set up well? If not, how should it be set up? Why?
- Does the quote at the start of this session ('If everyone would use our computational resources effectively, there would be no queue') apply to your job?