matplotlib

The matplotlib logo

Learning outcomes

At the end of this session, learners …

  • have practiced using the documentation of their favorite HPC cluster

  • understand what matplotlib is

  • understand why matplotlib is important

  • have run Python code that uses matplotlib

  • have created a plot with matplotlib

  • (optional) have created a plot with matplotlib from a pandas table

What is matplotlib?

matplotlib allows you to create figures:

import matplotlib.pyplot as plt
plt.plot([0, 1, 4, 9, 16])
plt.show()

Which shows:

A minimal plot

Why matplotlib is important

matplotlib is one of the most popular Python plotting libraries. It can be used to create publication-quality figures, and the matplotlib plot types overview shows that it supports most common plot types.
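
To give an impression of that variety, here is a minimal sketch that draws the same numbers as four different plot types. The output file name `plot_types_sketch.png` and the choice of the Agg backend (which assumes no display is available) are assumptions made for this illustration only.

```python
import matplotlib

# Assumption: no display is available, so use the non-interactive Agg backend.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

values = [0, 1, 4, 9, 16]
positions = range(len(values))

# One figure with a 2 x 2 grid of axes, one plot type per axes.
fig, axes = plt.subplots(2, 2)
axes[0, 0].plot(values)                # line plot
axes[0, 1].bar(positions, values)      # bar chart
axes[1, 0].scatter(positions, values)  # scatter plot
axes[1, 1].hist(values, bins=4)        # histogram

fig.tight_layout()
fig.savefig("plot_types_sketch.png")  # hypothetical file name
plt.close(fig)
```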

Exercises

Exercise 1: minimal code

Go to the documentation of the HPC cluster you work on.

In that documentation, find the software module to load the matplotlib Python package.

In a terminal (on your HPC cluster), load the software module to use matplotlib.

On your HPC cluster, create a script called matplotlib_exercise_1.py with the following code:

import matplotlib
print(matplotlib.__version__)

Run matplotlib_exercise_1.py.

What do you see?

Even though the code shows nothing directly useful, why is this a useful exercise anyway?

Exercise 2: a minimal plot

On your HPC cluster, create a script called matplotlib_exercise_2.py with the following code:

import matplotlib.pyplot as plt
plt.plot([0, 1, 4, 9, 16])
plt.savefig("matplotlib_exercise_2.png")

Run matplotlib_exercise_2.py.

Check that the figure is created.
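
One way to do this check from Python itself is sketched below. It assumes the script runs on a node without a display, so it selects the non-interactive Agg backend before importing pyplot; whether that is needed depends on your cluster. The file name is the one from the exercise.

```python
import os

import matplotlib

# Assumption: no display is available (common on cluster nodes),
# so select the non-interactive Agg backend before importing pyplot.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

output_file = "matplotlib_exercise_2.png"

plt.plot([0, 1, 4, 9, 16])
plt.savefig(output_file)
plt.close()

# The file should now exist and should not be empty.
file_exists = os.path.isfile(output_file)
file_size = os.path.getsize(output_file) if file_exists else 0
print(f"{output_file} exists: {file_exists}, size: {file_size} bytes")
```

Listing the folder in a terminal, or opening the file in an image viewer, works just as well.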

(optional) Exercise 3: displaying a pandas table

In this exercise, we will again use the ‘diamonds’ dataset, a dataset about diamonds that is stored as a comma-separated file.

This dataset contains information about more than fifty thousand diamonds. Two of the features it records are the weight (in carats) and the price (in USD). Here we want to use a figure to display the relationship between these two.

Create a script called matplotlib_exercise_3.py. In that script:

  • use pandas to read the dataset (as done in the pandas session)

  • use matplotlib to create a scatter plot from that data: put the diamond weight on the x-axis and the diamond price on the y-axis. Use the matplotlib documentation, a search engine or an AI chatbot for the answer.

  • save the plot as matplotlib_exercise_3.png. Use the matplotlib documentation, a search engine or an AI chatbot for the answer.
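
As a starting point, here is a minimal sketch of the general pattern of plotting two columns of a pandas table against each other. It deliberately does not use the diamonds dataset: the table, the column names `weight` and `price`, and the file name `scatter_sketch.png` are all made up for this illustration, so you still need to read the real file and use its real column names.

```python
import matplotlib

# Assumption: no display is available, so use the non-interactive Agg backend.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data: these are NOT values from the diamonds dataset.
table = pd.DataFrame(
    {
        "weight": [0.2, 0.3, 0.5, 0.7, 1.0],
        "price": [350, 500, 1500, 2500, 5000],
    }
)

# Scatter plot: first argument goes on the x-axis, second on the y-axis.
fig, ax = plt.subplots()
ax.scatter(table["weight"], table["price"])

fig.savefig("scatter_sketch.png")  # hypothetical file name
plt.close(fig)
```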

(optional) Exercise 4: making the plot pretty

Use the matplotlib documentation to improve the plot, for example:

  • Add a title

  • Add titles to the axes

  • Add a linear trendline

  • Whatever you like
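
The suggestions above could be sketched as follows. This is one possible approach, not the only one: the trendline is computed with `numpy.polyfit` (a least-squares fit of a straight line), and the data, the label texts and the file name `pretty_sketch.png` are made up for this illustration.

```python
import matplotlib

# Assumption: no display is available, so use the non-interactive Agg backend.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 9.0])

# Least-squares fit of a straight line: y = slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
ax.plot(x, slope * x + intercept, color="red", label="linear trend")

ax.set_title("A plot with a title")  # title of the plot
ax.set_xlabel("x value")             # title of the x-axis
ax.set_ylabel("y value")             # title of the y-axis
ax.legend()

fig.savefig("pretty_sketch.png")  # hypothetical file name
plt.close(fig)
```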

(optional) Exercise 5: should I use matplotlib or seaborn?

Search the academic literature to answer the question of whether you should use matplotlib or seaborn, for example by searching Google Scholar for ‘matplotlib versus seaborn’.

Which paper do you find?

What does the paper conclude, regarding using matplotlib or seaborn?

Done?

Go to the session about seaborn.

References

  • [Sial et al., 2021] Sial, Ali Hassan, Syed Yahya Shah Rashdi, and Abdul Hafeez Khan. “Comparative analysis of data visualization libraries Matplotlib and Seaborn in Python.” International Journal 10.1 (2021): 277-281.