# High Performance Computing — HPC

```{objectives}
- Let's recap and go a little further into the UPPMAX hardware!
```

## HPC, HTC and MTC

- The buzzword is **HPC or High Performance Computing**, but this is rather narrow, focusing on fast calculation, i.e. processors and parallelism.
- Many of your projects focus more on high throughput, large memory demands and many tasks.
- Here is a list of the three most common **computing paradigms**:
  - **HPC**: High-Performance Computing
    - focus on floating-point operations per second (**FLOPS**, flops or flop/s)
    - characterized as needing large amounts of computing power for short periods of time
  - **HTC**: High-Throughput Computing
    - focus on operations or **jobs per month or per year**
    - more interested in how many jobs can be completed over a long period of time than in how fast each one runs
    - independent, sequential jobs that can be individually scheduled
  - **MTC**: Many-Task Computing
    - emphasis on using many computing resources over short periods of time to accomplish many computational tasks
    - bridges the gap between HTC and HPC
    - reminiscent of HTC, but includes both dependent and independent tasks; the primary metrics are measured in seconds (e.g. **FLOPS**, tasks/s, **MB/s** **I/O rates**), as opposed to operations (e.g. jobs) per month
    - high-performance computations comprising multiple distinct activities, coupled via file system operations

## What is a cluster?

- A network of computers, each computer working as a node.
- From a small-scale Raspberry Pi cluster...

![RaspBerry](./img/IMG_5111.jpeg)

- To supercomputers like Rackham.

![Rackham](./img/uppmax-light2.jpg)

- Each node contains several processor cores, RAM and a local disk called scratch.

![Node](./img/node.png)

- The user logs in to the login nodes over the Internet through ssh or ThinLinc.
  - Here, file management and lighter data analysis can be performed.

![RaspBerry](./img/nodes.png)

![RaspBerry](./img/Bild1.png)

- The calculation nodes have to be used for intensive computing.
  - "Normal" software uses one core.
  - Parallelized software can utilize several cores or even several nodes. Keywords signaling this are e.g.:
    - "multi-threaded", "MPI", "distributed memory", "OpenMP", "shared memory".
- To let your software run on the calculation nodes, either
  - start an "interactive session", or
  - "submit a batch job" (a minimal sketch of both follows below).
- More about this in today's introduction to jobs.
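As a preview, the sketch below shows both routes in their simplest form. The project ID `snic2022-22-123`, the core counts and the time limits are placeholders; `interactive` and `sbatch` are the commands used with the Slurm scheduler on the UPPMAX clusters.

```bash
# Interactive session: ask for 2 cores for 1 hour, then work directly on the
# allocated calculation node (replace the project ID with your own)
interactive -A snic2022-22-123 -p core -n 2 -t 01:00:00

# Batch job: write a small job script...
cat > my_job.sh << 'EOF'
#!/bin/bash -l
#SBATCH -A snic2022-22-123   # project to charge (placeholder)
#SBATCH -p core              # partition: individual cores
#SBATCH -n 2                 # number of cores
#SBATCH -t 01:00:00          # wall-clock time limit
echo "Running on $(hostname)"
EOF

# ...and hand it to the queue; Slurm starts it on a calculation node when resources are free
sbatch my_job.sh
```

The interactive session gives you a shell directly on a calculation node, while `sbatch` queues the script and runs it once resources become available.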
## Storage basics

- All nodes can access:
  - your home directory on Domus or Castor
  - your project directories on Crex or Castor
  - their own node-local scratch disk (2-3 TB)
- If you're reading/writing a file once, use a directory on Crex or Castor.
- If you're reading/writing a file many times:
  - copy the file to "scratch", the node-local disk (a fuller batch-script sketch follows this example):

```
cp myFile $SNIC_TMP
```
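A fuller sketch of the same pattern inside a batch script, reusing the placeholder project ID from above: copy the input to `$SNIC_TMP`, work there, and copy the results back before the job ends, since the scratch area is cleaned up when the job finishes. The file names and the project path are placeholders; `$SNIC_TMP` is set by the system for each job.

```bash
#!/bin/bash -l
#SBATCH -A snic2022-22-123   # project (placeholder)
#SBATCH -p core -n 1 -t 01:00:00

# Copy the input to the fast node-local scratch disk and work there
cp myFile $SNIC_TMP/
cd $SNIC_TMP

# ...run the analysis that reads/writes the file many times...

# Copy the results back to project storage (placeholder path); scratch is
# cleaned after the job, so anything left here is lost
cp results.out /proj/snic2022-22-123/
```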
## The UPPMAX hardware

### Clusters

- We have a number of compute clusters:
  - [Rackham](https://www.uppmax.uu.se/resources/systems/the-rackham-cluster/), reserved for SNIC projects
  - [Snowy](https://www.uppmax.uu.se/resources/systems/the-snowy-cluster/), GPUs and long jobs, reserved for UPPMAX projects and education
  - [Bianca](https://www.uppmax.uu.se/resources/systems/the-bianca-cluster/), a part of SNIC-SENS
  - [Miarka](https://www.uppmax.uu.se/resources/systems/miarka-cluster/), reserved for SciLifeLab production
  - [UPPMAX cloud](https://www.uppmax.uu.se/resources/systems/the-uppmax-cloud/), a part of SNIC Science Cloud
- [User guides](https://www.uppmax.uu.se/support/user-guides/)
- The storage systems we have provide a total volume of about 25 PB, the equivalent of 50,000 years of 128-bit encoded music. Read more on the [storage systems page](https://www.uppmax.uu.se/resources/systems/storage-systems/).

### UPPMAX storage system names (projects & home directories)

- Rackham storage: Crex & Domus
- Bianca storage: Castor & Cygnus
- NGI production system (Miarka): Vulpes
- NGI delivery server: Grus
- Off-load storage: Lutra
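On the clusters these storage systems appear as ordinary directories. A minimal sketch, again with a placeholder project ID; `uquota` is the UPPMAX command that lists disk usage against your quotas.

```bash
ls ~                        # home directory (Domus on Rackham, Castor on Bianca)
ls /proj/snic2022-22-123    # a project directory (placeholder ID; Crex on Rackham)
uquota                      # list disk usage and quotas for your home and project directories
```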
### System usage

- [System usage](https://www.uppmax.uu.se/resources/system-usage/)
- More about the systems can be found at the [System resources page](https://www.uppmax.uu.se/resources/systems/).

### A little bit more about Snowy

- [User guide](https://www.uppmax.uu.se/support/user-guides/snowy-user-guide/)
- There is a [local compute round](https://supr.snic.se/round/uppmaxcompute2021/) for UU users applying for Snowy in SUPR.
- GU (course) applications (including GU GPU usage) are not done in SUPR, but should be routed through the service desk.
  - The details can be found at the [Getting started page](https://www.uppmax.uu.se/support/getting-started/course-projects/).

### About Bianca?

- Wait for it!

## Summary of the three "common" UPPMAX clusters

| | Rackham | Snowy | Bianca |
|---|---|---|---|
| **Purpose** | General-purpose | General-purpose | Sensitive data |
| **# Nodes (Intel)** | 486 + 144 | 228 + 50 NVIDIA T4 GPUs | 288 + 10 nodes with 2 NVIDIA A100 GPUs each |
| **Cores per node** | 20/16 | 16 | 16/64 |
| **Memory per node** | 128 GB | 128 GB | 128 GB |
| **Fat nodes** | 256 GB & 1 TB | 256, 512 GB & 4 TB | 256 & 512 GB |
| **Local disk (scratch)** | 2-3 TB | 4 TB | 4 TB |
| **Login nodes** | Yes | No (reached from Rackham) | Yes (2 cores and 15 GB) |
| **"Home" storage** | Domus | Domus | Castor |
| **"Project" storage** | Crex, Lutra | Crex, Lutra | Castor |

## Overview of the UPPMAX systems

```{mermaid}
graph TB
  Node1 -- interactive --> SubGraph2Flow
  Node1 -- sbatch --> SubGraph2Flow
  subgraph "Snowy"
    SubGraph2Flow(calculation nodes)
  end
  thinlinc -- usr-sensXXX + 2FA----> SubGraph1Flow
  Node1 -- usr-sensXXX + 2FA----> SubGraph1Flow
  subgraph "Bianca"
    SubGraph1Flow(Bianca login) -- usr+passwd --> private(private cluster)
    private -- interactive --> calcB(calculation nodes)
    private -- sbatch --> calcB
  end
  subgraph "Rackham"
    Node1[Login] -- interactive --> Node2[calculation nodes]
    Node1 -- sbatch --> Node2
  end
```

```{keypoints}
- UPPMAX has several clusters
  - each with its own focus and limitations or possibilities
- Access is determined by the type of project
```