Parallel computation

Learning outcomes

  • Schedule and run a job that needs more cores, with a calculation in their favorite language
  • Understand when it is possible/impossible and/or useful/useless to run a job with multiple cores

For teachers

Teaching goals are:

  • Learners have scheduled and run a job that needs more cores, with a calculation in their favorite language
  • Learners understand when it is possible/impossible and/or useful/useless to run a job with multiple cores

Prior:

  • What is parallel computing?

Feedback:

  • When to use parallel computing?
  • When not to use parallel computing?

Figure: Arnold (at the left), a robot that was controlled by MPI
Figure: Cora, the robotic platform for Arnold

Why parallel computing is important

Most HPC clusters set a maximum duration for a job, for example 10 days. Your calculation may take longer than that. One technique that may help is parallel computing, where multiple CPU cores work together on the same calculation.

Types of ‘doing more things at the same time’

Type of parallelism    | Number of cores | Number of nodes | Memory                       | Library
Single-threaded        | 1               | 1               | As given by operating system | None
Threaded/shared memory | Multiple        | 1               | Shared by all cores          | OpenMP
Distributed            | Multiple        | Multiple        | Distributed                  | OpenMPI

  • Threaded parallelism: for calculations that use multiple cores on one node, with a shared memory.

  • Distributed parallelism: for calculations that use multiple nodes, which communicate through a Message Passing Interface (MPI). For example, a weather prediction.

  • Slurm job arrays: for running jobs that are embarrassingly parallel, for example, running a simulation with different random numbers. Not in this session.
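
Although job arrays are not part of this session, the idea of 'a simulation with different random numbers' can be sketched briefly. The Python example below assumes the script is started as a Slurm job array, in which case Slurm sets the environment variable SLURM_ARRAY_TASK_ID for each task. The Monte Carlo estimate of pi and the function name estimate_pi are illustrative choices, not part of the course material.

```python
import os
import random


def estimate_pi(seed, n_points=100_000):
    """Estimate pi by sampling random points in the unit square."""
    rng = random.Random(seed)  # each seed gives an independent, reproducible run
    hits = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            hits += 1
    return 4.0 * hits / n_points


if __name__ == "__main__":
    # Outside a job array the variable is not set, so fall back to task 0
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    print(f"task {task_id}: pi is approximately {estimate_pi(task_id)}")
```

Each array task runs the same script on one core and uses its task ID as random seed, so the tasks never need to communicate with each other.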

When to use parallel computing

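  • Be aware of Amdahl’s law and/or Gustafson’s law: these estimate how much faster a calculation can become when using more cores
  • Single-threaded programs will never run faster with more cores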
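
Both laws can be written as a one-line formula. The Python sketch below uses the common textbook forms, where p (the fraction of the run time that can be parallelized) and n_cores are symbols introduced here for illustration. Real programs also have communication overhead, so measured speedups are usually lower.

```python
def amdahl_speedup(p, n_cores):
    """Amdahl's law: speedup for a fixed problem size.

    p is the fraction of the run time that can be parallelized (0 to 1).
    """
    return 1.0 / ((1.0 - p) + p / n_cores)


def gustafson_speedup(p, n_cores):
    """Gustafson's law: scaled speedup when the problem grows with the cores."""
    return (1.0 - p) + p * n_cores


if __name__ == "__main__":
    for n_cores in (1, 2, 4, 8, 16):
        print(
            n_cores,
            round(amdahl_speedup(0.9, n_cores), 2),
            round(gustafson_speedup(0.9, n_cores), 2),
        )
```

With p = 0.9 and 8 cores, Amdahl's law gives a speedup of about 4.7, not 8. With p = 0 (a single-threaded program) the speedup stays 1, however many cores are booked.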

Output

Using 2 OpenMP threads 

               Core t (s)   Wall t (s)        (%)
       Time:       86.902       43.452      200.0
                 (ns/day)    (hour/ns)
Performance:        1.740       13.794
               Core t (s)   Wall t (s)        (%)
       Time:      100.447       50.224      200.0
                 (ns/day)    (hour/ns)
Performance:        1.591       15.082
               Core t (s)   Wall t (s)        (%)
       Time:      150.753       37.689      400.0
                 (ns/day)    (hour/ns)
Performance:        3.783        6.345
               Core t (s)   Wall t (s)        (%)
       Time:      292.200       36.526      800.0
                 (ns/day)    (hour/ns)
Performance:        6.446        3.723
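
The (%) column in the output above can be reproduced from the two timing columns: it is the core time divided by the wall-clock time, as a percentage. A minimal Python sketch, with the numbers copied from the four 'Time:' lines and with illustrative variable and function names:

```python
# (core time in s, wall-clock time in s), copied from the 'Time:' lines above
timings = [
    (86.902, 43.452),
    (100.447, 50.224),
    (150.753, 37.689),
    (292.200, 36.526),
]


def core_usage_percent(core_seconds, wall_seconds):
    """Core time as a percentage of wall-clock time."""
    return 100.0 * core_seconds / wall_seconds


for core_s, wall_s in timings:
    print(f"{core_usage_percent(core_s, wall_s):.1f}")
```

A value of 200% means that, on average, two cores were busy during the whole run. The wall-clock time is the time you wait for; the core time is what is counted against the booked cores.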

Remember

  • Use --ntasks=N
  • Use srun
  • Use an MPI version of your software: a ‘regular’ non-MPI version will never work!
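
What 'an MPI version' means can be sketched in Python. The example below assumes the third-party package mpi4py is installed and that the script is started within a job that used --ntasks=N, for example with `srun python sum_squares.py` (the file name is hypothetical). The calculation itself, a sum of squares, is only a stand-in for real work.

```python
def chunk_bounds(rank, size, n):
    """Return the half-open range [start, stop) of items for this rank."""
    base, extra = divmod(n, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_sum(rank, size, n):
    """Sum of squares over the part of range(n) that belongs to this rank."""
    start, stop = chunk_bounds(rank, size, n)
    return sum(i * i for i in range(start, stop))


def main(n=1_000_000):
    try:
        from mpi4py import MPI  # assumption: mpi4py is available
    except ImportError:
        # Without MPI there is only one process that does all the work
        print(f"1 process (no MPI), total = {partial_sum(0, 1, n)}")
        return
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()  # the number of this process: 0 .. size - 1
    size = comm.Get_size()  # the total number of MPI processes
    local = partial_sum(rank, size, n)
    # Combine the partial results of all processes on process 0
    total = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print(f"{size} MPI processes, total = {total}")


if __name__ == "__main__":
    main()
```

Every process runs the same script, but its rank decides which part of the work it does. A program without such calls has no way to split the work: started with N tasks, it would do the full calculation N times.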

Julia stuff here

MATLAB stuff here

R stuff here