Clusters¶
Learning objectives
- Understand what a cluster is
- Understand that a cluster has nodes and that nodes have cores
- Understand what a job is
- Understand that a job can use one core, multiple cores and multiple nodes
- Understand there are two types of nodes
- Understand to run only light jobs on the login node
- Understand how to run heavy jobs
- Understand how to run interactive jobs
- Understand how to pick an UPPMAX cluster
- Understand how to find a cluster's hardware specifications
For teachers
Unprepared as I don't believe the learners will reach this session
Why?¶
Clusters are complex machines. Understanding the basics of what a cluster is, helps us understand why the are the way they are.
What is a cluster?¶
A (computer) cluster is a group of computers working together. A cluster is shared by many users, that have widely different demands. Different users need different software and -for the same software!- possibly different versions.
When to use a cluster?¶
- Calculations take too long
- Calculations require too much memory
- Calculations require too much disk space
- Any combination of the above
UPPMAX clusters¶
These are the most important UPPMAX clusters for us:
Name | Purpose |
---|---|
Bianca | Sensitive data |
Rackham | General purpose |
Snowy | Provides GPUs |
Cluster architecture¶
A (computer) cluster is a group of computers working together. Each of those computers is called a node. Each node has, among others, several CPUs (called 'cores') and RAM.
flowchart TD
cluster[Cluster: 'a group of computers']
node[Node: 'a computer']
core[Core: 'a CPU']
cluster --> |has many| node
node --> |has many| core
Each core can run a calculation. A calculation is called a job. A job can use one core, multiple cores or multiple nodes. One special job is an interactive session, in which a user can interactively start hard calculations on a compute node.
There are two types of nodes:
- login node: a nodes that is shared by multiple users, which allows to do lightweight things, such as scheduling a hard calculation
- compute node: a node that is scheduled for hard calculations or individual interactive use.
flowchart TD
subgraph login_nodes[Login nodes]
subgraph rackham1
user_1
user_3
user_6
end
subgraph rackham2
user_2
user_4
end
end
subgraph computation_nodes[Computation nodes]
subgraph r100
job_121
job_122
job_123
end
subgraph r101
job_124
end
subgraph r102
job_124_2[job_124]
end
subgraph r103
job_125
user_5
end
end
A simplified view of Rackham. The login node
rackham1
has three users on it. The login noderackham2
has two users on it. The compute noder100
is running three different jobs. The compute nodesr101
andr102
are running the same job. The compute noder103
is used for an interactive job byuser_5
.
Exercises¶
[...]
Exercise 1: Understand¶
Learning objectives
- [...]
(optional) Exercise 2: Understand system usage¶
Go to https://status.uppmax.uu.se/usage/.
- Why are there three systems?
- Which system has the most cores?
- Which system has the least cores?
- Can we see a difference between a regular weekday and weekends?
- Can we see a difference between a working hours and the other hours?
(optional) Exercise 3: Find hardware specifications¶
- Pick your favorite UPPMAX cluster
- Read the doc
- Find the hardware specifications of this cluster