A Brief Intro to Matplotlib

Matplotlib is one of the most popular and flexible libraries for data visualization in use today. This crash course is meant to summarize and complement the official documentation, but you are encouraged to refer to the original documentation for fuller explanations of function arguments.

Prerequisites

In order to follow this course, you will need to be familiar with:

  • The Python 3.X language, data structures (e.g. dictionaries), and built-in functions (e.g. string manipulation functions)

  • NumPy: array I/O and manipulation

It will also help to have experience with:

  • SciPy

  • Pandas

  • LaTeX math typesetting

You should be familiar with the meanings of the terms args and kwargs, since they will appear frequently:

  • args refer to positional arguments, which are usually mandatory, but not always. These always come before the kwargs.

  • kwargs is short for keyword arguments. These are usually optional, but it’s fairly common for some Python functions to require a variable subset of all available kwargs dependent on previous inputs. These always come after args.
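As a minimal sketch of the distinction, the function below takes two required positional arguments, collects any extra positional arguments in args, and collects any unrecognized keyword arguments in kwargs. The name describe_plot and its parameters are invented for illustration and are not part of Matplotlib.

```python
def describe_plot(x, y, *args, color="blue", **kwargs):
    """Hypothetical helper: report how the inputs were received."""
    return {
        "positional": (x, y) + args,   # required x and y plus any extras
        "color": color,                # keyword argument with a default value
        "other_kwargs": kwargs,        # any remaining keyword arguments
    }

# Positional arguments come first, keyword arguments after them.
summary = describe_plot([0, 1], [2, 3], "r-.", color="red", linewidth=2)
print(summary)
```

Matplotlib's plotting functions follow the same pattern: data and format strings are usually positional, while styling options are passed by keyword.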

Load and Run

In most cases, you will need to load a compatible version of SciPy-bundle to use NumPy, which you will need to create or prepare data for plotting.

from matplotlib import pyplot as plt

In all cases, once you have reached the stage where you are at a Python command prompt or in a script file, the first thing you will have to do to use any plotting commands is import matplotlib.pyplot. Usually the command is written as it is in the title of this info box, but you can also just write import matplotlib.pyplot as plt for short. If you work in a development environment like Spyder, that will often be the only package that you need out of Matplotlib.

If you use Matplotlib at the command line, you will need to load the module Tkinter and then, after importing matplotlib, set matplotlib.use('TkAgg') in your script or at the Python prompt in order to view your plots.

Alternatively, you can use a GUI, either JupyterLab or Spyder. Before loading either of them, you will still have to pre-load Matplotlib and any other modules you want to use; if you forget any, you will have to close the GUI and reopen it after loading the missing modules. The command to start JupyterLab after you load it is jupyter-lab, and the Spyder launch command is spyder3. The only version of Spyder available is pretty old, but the backend should work as-is.

As of 27-11-2024, ml spider matplotlib outputs the following versions:

----------------------------------------------------------------------------
  matplotlib:
----------------------------------------------------------------------------
     Versions:
        matplotlib/2.2.4-Python-2.7.15
        matplotlib/2.2.4-Python-2.7.16
        matplotlib/2.2.4 (E)
        matplotlib/2.2.5-Python-2.7.18
        matplotlib/2.2.5 (E)
        matplotlib/3.1.1-Python-3.7.4
        matplotlib/3.1.1 (E)
        matplotlib/3.2.1-Python-3.8.2
        matplotlib/3.2.1 (E)
        matplotlib/3.3.3
        matplotlib/3.3.3 (E)
        matplotlib/3.4.2
        matplotlib/3.4.2 (E)
        matplotlib/3.4.3
        matplotlib/3.4.3 (E)
        matplotlib/3.5.2-Python-3.8.6
        matplotlib/3.5.2
        matplotlib/3.5.2 (E)
        matplotlib/3.7.0
        matplotlib/3.7.0 (E)
        matplotlib/3.7.2
        matplotlib/3.7.2 (E)
        matplotlib/3.8.2
        matplotlib/3.8.2 (E)

Names marked by a trailing (E) are extensions provided by another module.

Controlling the Display

Command Line. For Python 3.11.x, a Tkinter-based backend is typically required to generate figure popups when you type plt.show() at the command line (on Dardel this is preset). Backends are engines for either displaying figures or writing them to image files (see the Matplotlib docs page on backends for more detail). To set the appropriate backend:

  1. import the top-level matplotlib package

  2. run matplotlib.use('TkAgg') before doing any plotting (if you forget, you can set it at any time).

  3. If for some reason that backend or the default backend doesn’t work, you can try matplotlib.use('Qt5Agg').
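The steps above can be sketched as a small fallback loop. This is only an illustration: the helper pick_backend is hypothetical, and the non-interactive 'Agg' backend is added at the end as an assumption so that the sketch still runs on a machine with no display. It relies on the behaviour in recent Matplotlib versions where matplotlib.use() raises ImportError if pyplot is already imported and the requested backend cannot be loaded.

```python
import matplotlib
import matplotlib.pyplot as plt  # imported first so that use() tries to load the backend right away


def pick_backend(candidates=("TkAgg", "Qt5Agg", "Agg")):
    """Hypothetical helper: return the first backend in candidates that loads."""
    for name in candidates:
        try:
            matplotlib.use(name)
            return name
        except ImportError:
            # Backend or its GUI toolkit is unavailable here; try the next one
            continue
    raise RuntimeError("none of the requested backends could be loaded")


backend = pick_backend()
print("Using backend:", backend)
```

If the result is 'Agg', figures can still be saved to files, but plt.show() will not open a window.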

Jupyter. In Jupyter, after importing matplotlib or any of its sub-modules, you typically need to add %matplotlib inline before you make any plots. You should not need to set matplotlib.use().

Spyder. In Spyder, the default setting is for figures to be displayed either in-line at the IPython console or in a “Graphics” tab in the upper right, depending on the version. In either case, the graphic will be small and not the best use of the resources Spyder makes available. To make figures appear in an interactive popup:

  • go to “Preferences”, then “IPython console”, and click the “Graphics” tab

  • toggle the drop-down menu to the right of “Backend” and select “Automatic”.

These settings should be retained from session to session, so you only have to do it the first time you run Spyder. The interactive popup for Spyder offers extensive editing and saving options.

Matplotlib uses a default resolution of 100 dpi and a default figure size of 6.4” x 4.8” (16.26 x 12.19 cm) in GUIs and with the default backend. The inline backend in Jupyter (what the %matplotlib inline command sets) uses an even lower-res default of 80 dpi.

  • The dpi kwarg in plt.figure() or plt.subplots() (not a valid kwarg in plt.subplot() singular) lets you change the figure resolution at runtime. For on-screen display, 100-150 dpi is fine as long as you don’t set figsize too big, but publications often request 300 DPI.

  • The figsize = (i,j) kwarg in plt.figure() and plt.subplots() also lets you adjust the figure size and aspect ratio. The default unit is inches.
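A short sketch of how the two kwargs interact is given below. The specific numbers (8 x 4 inches, 150 dpi, 16 x 12 cm) are arbitrary choices for illustration. The nominal pixel dimensions of the canvas are the size in inches multiplied by the dpi.

```python
import matplotlib.pyplot as plt

# An 8 x 4 inch canvas at 150 dots per inch
fig, ax = plt.subplots(figsize=(8, 4), dpi=150)

width_in, height_in = fig.get_size_inches()
print(f"size: {width_in} x {height_in} in at {fig.dpi} dpi")
print(f"nominal pixels: {width_in * fig.dpi:.0f} x {height_in * fig.dpi:.0f}")

# figsize is given in inches; to work in centimetres, convert explicitly
cm = 1 / 2.54  # inches per centimetre
fig_cm, ax_cm = plt.subplots(figsize=(16 * cm, 12 * cm))
```

Increasing dpi at fixed figsize makes text and lines appear larger on screen, while increasing figsize at fixed dpi gives the plot more room relative to the text.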

Follow the preceding sections to get to the stage of importing matplotlib.pyplot and numpy in your choice of interface, on your local computing resource.

Basic Terms and Application Programming Interface (API)

The Matplotlib documentation has a nicely standardized vocabulary for the different components of its output graphics. For all but the simplest plots, you will need to know what the different components are called and what they do so that you know how to access and manipulate them.

  • Figure: the first thing you do when you create a plot is make a Figure instance. It’s essentially the canvas, and it contains all other components.

  • Axes: most plots have 1 or more sets of Axes, which are the grids on which the plots are drawn, plus all text that labels the axes and their increments.

  • Axis: each individual axis is its own object. This lets you control the labels, increments, scaling, text format, and more.

  • Artist: In Python, almost everything is an object. In Matplotlib, the figure and everything on it are customizable objects, and every object is an Artist (every axis, data set, annotation, legend, etc.). This word typically only comes up in the context of functions that create more complicated plot elements, like polygons or color bars.

For everything else on a typical plot, there’s this handy graphic:

[Figure: anatomy of a Matplotlib figure]
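The vocabulary above maps directly onto classes that can be inspected from Python. The following sketch creates one figure with one set of axes and checks the type of each component. The data values are placeholders.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import Axis
from matplotlib.figure import Figure

fig, ax = plt.subplots()
(line,) = ax.plot([0, 1, 2], [0, 1, 4], label="demo")  # ax.plot returns a list of lines

print(isinstance(fig, Figure))     # the canvas
print(isinstance(ax, Axes))        # one set of axes on the canvas
print(isinstance(ax.xaxis, Axis))  # a single axis belonging to ax
# The figure, the axes, each axis, and the plotted line are all Artists
print(all(isinstance(obj, Artist) for obj in (fig, ax, ax.xaxis, line)))
```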

fig? ax? What are those?

There are 2 choices of application programming interface (API, basically a standardized coding style) in Matplotlib:

  1. Implicit API: the quick and dirty way to visualize isolated data sets if you don’t need to fiddle with the formatting.

  2. Explicit API (recommended): the method that gives you handles to the figure and axes objects (typically denoted fig and ax/axes, respectively) so you can adjust the formatting and/or accommodate multiple subplots.

Most people’s first attempt to plot something in matplotlib looks like the following example of the implicit API. The user simply imports matplotlib.pyplot (usually as plt) and then plugs their data into their choice of plotting function, plt.<function>(*args,**kwargs).

import numpy as np
import matplotlib.pyplot as plt
# this code block uses Jupyter to execute
%matplotlib inline
x = np.linspace(0,2*np.pi, 50)   # fake some data
# Minimum working example with 2 functions
plt.plot(x,3+3*np.sin(x),'b-',
         x, 2+2*np.cos(x), 'r-.')
plt.xlabel('x [rads]')
plt.ylabel('y')
plt.title('Demo Plot - Implicit API')
plt.show()
[Figure: output of the implicit API demo plot]

The explicit API looks more like the following example.

import numpy as np
import matplotlib.pyplot as plt
# this code block uses Jupyter to execute
%matplotlib inline
x = np.linspace(0,2*np.pi, 50)
# Better way for later formatting
fig, ax = plt.subplots()
ax.plot(x,3+3*np.sin(x),'b-')
ax.plot(x, 2+2*np.cos(x), 'r-.')
ax.set_xlabel('x [rads]')
ax.set_ylabel('y')
ax.set_title('Demo Plot - Explicit API')
plt.show()
[Figure: output of the explicit API demo plot]

A figure and a set of axes objects are created explicitly, usually with fig,axes = plt.subplots(nrows=nrows, ncols=ncols), even if there will be only 1 set of axes (in which case the nrows and ncols kwargs are omitted). Then the vast majority of the plotting and formatting commands are called as methods of the axes object (with the most oft-encountered exception being fig.colorbar(); see this article on colorbar placement for details). Notice that most of the formatting methods now start with set_ when called upon an axes object.
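To illustrate the multi-axes case, here is a minimal sketch with one row and two columns of subplots. The data are made up, and pcolormesh is just one convenient plot type that can feed a colorbar. Note that the colorbar is requested from the figure, with the ax kwarg naming the axes it belongs to.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)
image = np.outer(np.sin(x), np.cos(x))   # fake 2D data

# 1 row x 2 columns; axes is a NumPy array holding two Axes objects
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(9, 4))

axes[0].plot(x, np.sin(x), "b-")
axes[0].set_xlabel("x [rads]")
axes[0].set_title("Line plot")

mesh = axes[1].pcolormesh(image)
axes[1].set_title("Gridded data")
# colorbar is a Figure method; it adds its own small axes next to axes[1]
cbar = fig.colorbar(mesh, ax=axes[1])

fig.tight_layout()
```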

The outputs look the same for both of these examples because the plot type was chosen to work with both APIs, but the explicit API offers a much wider range of plot types and customizations.

Let x be an array of 50 values from -5 to 5. Plot y = 1/(1+exp(-x)).
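One possible solution, sketched with the explicit API, is shown below. The line style and labels are arbitrary choices. In an interactive session, finish with plt.show() to display the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 50)      # 50 values from -5 to 5
y = 1 / (1 + np.exp(-x))        # logistic (sigmoid) function

fig, ax = plt.subplots()
ax.plot(x, y, "k-")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("y = 1/(1+exp(-x))")
```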

Saving your Data

The Matplotlib GUI has a typical save menu option (indicated by the usual floppy disc icon) that lets you set the name, file type, and location. To save from your code or at the command line, there are 2 options:

  • plt.savefig(fname, *, transparent=None, dpi='figure', format=None) is the general-purpose save function. There are other kwargs not shown here, but these are the most important. The file type can be given with the format kwarg or inferred from an extension given in fname. The default dpi is inherited from plt.figure() or plt.subplots(). If transparent=True, the white background of a typical figure is removed so the figure can be displayed on top of other content.

  • plt.imsave(fname, arr, **kwargs) is specifically for saving arrays to images. It accepts a 2D (single-channel) array with a specified colormap and normalization, or an RGB(A) array (a stack of images in 3 color channels, or 3 color channels and an opacity array). Generally you also have to set origin='lower' for the image to be rendered right-side up.
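Both save functions can be tried out with the sketch below. The file names and the temporary output directory are assumptions made for this example, and fig.savefig() is used as the Figure-method counterpart of plt.savefig(). When format is given explicitly, the file name is used as written and no extension is appended.

```python
import tempfile
from pathlib import Path

import numpy as np
import matplotlib.pyplot as plt

outdir = Path(tempfile.mkdtemp())   # throwaway directory for this demo

x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))

# File type inferred from the extension; dpi here overrides the figure's own value
png_path = outdir / "demo.png"
fig.savefig(png_path, dpi=300, transparent=True)

# File type given explicitly with format=; the name is used verbatim
pdf_path = outdir / "demo_as_pdf"
fig.savefig(pdf_path, format="pdf")

# Save a 2D array directly as an image, with a colormap applied
arr = np.outer(np.sin(x), np.cos(x))
img_path = outdir / "demo_array.png"
plt.imsave(img_path, arr, cmap="viridis", origin="lower")

print(sorted(p.name for p in outdir.iterdir()))
```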

A few common formats that Matplotlib supports include PDF, PS, EPS, SVG, PNG, and JPG/JPEG. Some raster formats, such as TIFF and JPEG, are written through the Pillow module rather than by Matplotlib itself. At most facilities, Pillow is loaded with Matplotlib, so you will see these formats as save options in the GUI. Matplotlib has a tutorial on importing images into arrays for use with pyplot.imshow().

Rerun your earlier example and save it as an SVG file if the option is available, PDF otherwise.

Standard Available Plot Types

These are the categories of plots that come standard with any Matplotlib distribution:

  1. Pairwise plots (which accept 1D arrays of x and y data to plot against each other),

  2. Statistical plots (which can be pairwise or other array-like data),

  3. Gridded data plots (for image-like data, vector fields, and contours),

  4. Irregularly gridded data plots (which rely on some kind of triangulation)*, and

  5. Volumetric data plots.

Almost all available plot types are visually indexed and easy to find in the Matplotlib official documentation.

* Quick note on contouring functions on irregular grids: these functions contour by the values Z at triangulation vertices (X,Y), not by spatial point density, and so should not be used if Z values are not spatially correlated. If you want to contour by data point density in parameter-space, you still have to interpolate your data to a regular (X,Y) grid.
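As a sketch of the case where triangulation-based contouring is appropriate, the example below samples a smooth function at randomly scattered points and contours it with tricontourf. The sample count, random seed, and test function are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Irregularly scattered sample locations
x = rng.uniform(-2, 2, 200)
y = rng.uniform(-2, 2, 200)
# Z varies smoothly with position, so contouring by value is meaningful
z = np.exp(-(x**2 + y**2))

fig, ax = plt.subplots()
# A Delaunay triangulation of (x, y) is built internally
contours = ax.tricontourf(x, y, z, levels=10)
ax.plot(x, y, "k.", markersize=2)   # mark the sample locations
fig.colorbar(contours, ax=ax, label="Z")
```

If z were replaced by values with no spatial correlation, the same call would still run but the resulting contours would be meaningless, which is the situation the note above warns about.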

Volumetric, polar, and other data that rely on 3D or non-cartesian grids typically require you to specify a projection before you can choose the right plot type. For example, for a polar plot, you could

  • set fig, ax = plt.subplots(subplot_kw = {"projection": "polar"}) to set all subplots to the same projection,

  • set ax = plt.subplot(nrows, ncols, index, projection='polar') to add one polar subplot to a group of subplots with different coordinate systems or projections, or

  • set ax = plt.figure().add_subplot(projection='polar') if you only need 1 set of axes in total.
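A minimal sketch of the first two options follows. The curve being plotted is arbitrary. On polar axes, the first argument to plot is the angle in radians and the second is the radius.

```python
import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 200)
r = 1 + 0.5 * np.cos(3 * theta)

# Option 1: every subplot created here uses the polar projection
fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(theta, r)
ax.set_title("Polar demo")

# Option 2: mix a Cartesian and a polar subplot in one figure
fig2 = plt.figure(figsize=(8, 4))
ax_cart = plt.subplot(1, 2, 1)
ax_cart.plot(theta, r)
ax_polar = plt.subplot(1, 2, 2, projection="polar")
ax_polar.plot(theta, r)
```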

For volumetric data, the options are similar:

  • fig, ax = plt.subplots(subplot_kw = {"projection": "3d"}) for multiple subplots with the same projection,

  • ax = plt.subplot(nrows, ncols, index, projection='3d') for one 3D subplot among several with varying projections or coordinate systems, or

  • ax = plt.figure().add_subplot(projection='3d') for a singular plot.
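The last option can be sketched as follows, using a helix as placeholder data. This assumes a reasonably recent Matplotlib, where the '3d' projection is registered automatically; very old versions additionally required importing Axes3D from mpl_toolkits.mplot3d.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 4 * np.pi, 200)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")   # a single set of 3D axes
ax.plot(np.cos(t), np.sin(t), t)        # x, y, z coordinates of a helix
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")                      # 3D axes gain z-axis methods
```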

Colors and colormaps. Every plotting method accepts either a single color (the kwarg for which may be c or color) or a colormap (which is usually cmap in kwargs) depending on the shape of the data. Matplotlib has an excellent series of pages on how to specify colors and transparency, how to adjust colormap normalizations, and which colormaps to choose based on the types of data and your audience.
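The difference between a single color and a colormap can be sketched with a scatter plot. The random data, the color name, the colormap, and the normalization limits below are all arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = rng.normal(size=100)
value = np.hypot(x, y)   # one number per point, used for colormapping

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Single color for every point, with some transparency
ax1.scatter(x, y, color="tab:orange", alpha=0.6)
ax1.set_title("single color")

# One color per point: c carries the data, cmap and norm map it to colors
pts = ax2.scatter(x, y, c=value, cmap="viridis", norm=Normalize(vmin=0, vmax=3))
ax2.set_title("colormap")
fig.colorbar(pts, ax=ax2, label="distance from origin")
```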

Keypoints

  • Matplotlib is the essential Python data visualization package, with nearly 40 different plot types to choose from depending on the shape of your data and which qualities you want to highlight.

  • Almost every plot will start by instantiating the figure, fig (the blank canvas), and 1 or more axes objects, ax, with fig, ax = plt.subplots(*args, **kwargs).

  • There are several ways to tile subplots depending on how many there are, how they are shaped, and whether they require non-Cartesian coordinate systems.

  • Most of the plotting and formatting commands you will use are methods of Axes objects. (A few, like colorbar, are methods of the Figure, and some commands are methods of both.)