Sunday, June 16, 2019

[ Py DS ] Ch4 - Visualization with Matplotlib (Part5)

Source From Here 

Text and Annotation 
Creating a good visualization involves guiding the reader so that the figure tells a story. In some cases, this story can be told in an entirely visual manner, without the need for added text, but in others, small textual cues and labels are necessary. Perhaps the most basic types of annotations you will use are axes labels and titles, but the options go beyond this. Let’s take a look at some data and how we might visualize and annotate it to help convey interesting information. We’ll start by setting up the notebook for plotting and importing the functions we will use: 
    %matplotlib inline
    import matplotlib.pyplot as plt
    import matplotlib as mpl
    # In Matplotlib 3.6 and later this style is named 'seaborn-v0_8-whitegrid'
    plt.style.use('seaborn-whitegrid')
    import numpy as np
    import pandas as pd
Example: Effect of Holidays on US Births 
Let’s return to some data we worked with earlier in “Example: Birthrate Data” on page 174, where we generated a plot of average births over the course of the calendar year; as already mentioned, this data can be downloaded at https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv. 

We’ll start with the same cleaning procedure we used there, and plot the results (Figure 4-67): 
    from datetime import datetime

    births = pd.read_csv('births.csv')

    # Sigma-clip outliers using a robust estimate of the spread
    quartiles = np.percentile(births['births'], [25, 50, 75])
    mu, sig = quartiles[1], 0.74 * (quartiles[2] - quartiles[0])
    births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')

    # Build a datetime index from the year, month, and day columns
    births['day'] = births['day'].astype(int)
    births.index = pd.to_datetime(10000 * births.year +
                                  100 * births.month +
                                  births.day, format='%Y%m%d')

    # Average births per (month, day), indexed by dates in a dummy leap year
    births_by_date = births.pivot_table('births',
                                        [births.index.month, births.index.day])
    # datetime.datetime replaces pd.datetime, which was removed in pandas 2.0
    births_by_date.index = [datetime(2012, month, day)
                            for (month, day) in births_by_date.index]

    fig, ax = plt.subplots(figsize=(12, 4))
    births_by_date.plot(ax=ax);
Figure 4-67. Average daily births by date 
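The cleaning step in the listing above uses the median as the center and 0.74 times the interquartile range as a robust stand-in for the standard deviation (for Gaussian data the interquartile range is about 1.35 standard deviations). Here is a minimal sketch of that idea on synthetic numbers; the sample values are made up for illustration and are not the CDC data:

```python
import numpy as np

# Synthetic stand-in for the births column (hypothetical values, not the CDC data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=4800, scale=300, size=100_000)
sample[:10] = 99999  # a few obviously bad records

quartiles = np.percentile(sample, [25, 50, 75])
mu = quartiles[1]                            # median as a robust center
sig = 0.74 * (quartiles[2] - quartiles[0])   # robust estimate of the standard deviation

# Keep only the values within 5 robust sigmas of the median
clipped = sample[(sample > mu - 5 * sig) & (sample < mu + 5 * sig)]
removed = len(sample) - len(clipped)
print(round(mu), round(sig), removed)
```

Because the quartiles barely move when a handful of extreme values are present, the estimate of the spread stays close to the true value of 300, and the bad records fall far outside the 5-sigma window.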

When we’re communicating data like this, it is often useful to annotate certain features of the plot to draw the reader’s attention. This can be done manually with the plt.text/ax.text command, which will place text at a particular x/y value (Figure 4-68): 
    fig, ax = plt.subplots(figsize=(12, 4))
    births_by_date.plot(ax=ax)

    # Add labels to the plot
    style = dict(size=10, color='gray')
    ax.text('2012-1-1', 3950, "New Year's Day", **style)
    ax.text('2012-7-4', 4250, "Independence Day", ha='center', **style)
    ax.text('2012-9-4', 4850, "Labor Day", ha='center', **style)
    ax.text('2012-10-31', 4600, "Halloween", ha='right', **style)
    ax.text('2012-11-25', 4450, "Thanksgiving", ha='center', **style)
    ax.text('2012-12-25', 3850, "Christmas ", ha='right', **style)

    # Label the axes
    ax.set(title='USA births by day of year (1969-1988)',
           ylabel='average daily births')

    # Format the x axis with centered month labels
    ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
    ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
    ax.xaxis.set_major_formatter(plt.NullFormatter())
    ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));
Figure 4-68. Annotated average daily births by date 

The ax.text method takes an x position, a y position, a string, and then optional keywords specifying the color, size, style, alignment, and other properties of the text. Here we used ha='right' and ha='center', where ha is short for horizontal alignment. See the docstring of plt.text() and of mpl.text.Text() for more information on available options. 
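To see what the ha keyword does in isolation, the following sketch places three labels at the same x position with different alignments. The axis limits and label strings are arbitrary choices for illustration, and the Agg backend is only an assumption so the snippet also runs as a plain script (drop that line in a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])
ax.axvline(5, color='lightgray')  # reference line at the anchor x position

# Same anchor x = 5 for every label; only the horizontal alignment differs
labels = {}
for y, ha in [(2, 'left'), (5, 'center'), (8, 'right')]:
    labels[ha] = ax.text(5, y, f"ha='{ha}'", ha=ha, size=12, color='gray')

for ha, label in labels.items():
    print(ha, label.get_position())
```

With ha='left' the text starts at the anchor, with ha='right' it ends there, and with ha='center' it straddles it. The anchor position itself is the same in all three cases.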

Transforms and Text Position 
In the previous example, we anchored our text annotations to data locations. Sometimes it’s preferable to anchor the text to a position on the axes or figure, independent of the data. In Matplotlib, we do this by modifying the transform. Any graphics display framework needs some scheme for translating between coordinate systems. For example, a data point at x, y = 1, 1 needs to somehow be represented at a certain location on the figure, which in turn needs to be represented in pixels on the screen. Mathematically, such coordinate transformations are relatively straightforward, and Matplotlib has a well-developed set of tools that it uses internally to perform them (the tools can be explored in the matplotlib.transforms submodule). 

The average user rarely needs to worry about the details of these transforms, but it is helpful knowledge to have when considering the placement of text on a figure. There are three predefined transforms that can be useful in this situation: 
- ax.transData 
Transform associated with data coordinates

- ax.transAxes 
Transform associated with the axes (in units of axes dimensions)

- fig.transFigure 
Transform associated with the figure (in units of figure dimensions)

Here let’s look at an example of drawing text at various locations using these transforms (Figure 4-69): 
    fig, ax = plt.subplots(facecolor='lightgray')
    ax.axis([0, 10, 0, 10])

    # transform=ax.transData is the default, but we'll specify it anyway
    ax.text(1, 5, ". Data: (1, 5)", transform=ax.transData)
    ax.text(0.5, 0.1, ". Axes: (0.5, 0.1)", transform=ax.transAxes)
    ax.text(0.2, 0.2, ". Figure: (0.2, 0.2)", transform=fig.transFigure);
Figure 4-69. Comparing Matplotlib’s coordinate systems 

Note that by default, the text is aligned above and to the left of the specified coordinates; here the “.” at the beginning of each string will approximately mark the given coordinate location. 

The transData coordinates give the usual data coordinates associated with the x- and y-axis labels. The transAxes coordinates give the location from the bottom-left corner of the axes (here the white box) as a fraction of the axes size. The transFigure coordinates are similar, but specify the position from the bottom left of the figure (here the gray box) as a fraction of the figure size. 
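Each of these transform objects can also be called directly to map a point from its own coordinate system into display (pixel) coordinates, which is a handy way to check your understanding. A small sketch, assuming the same 0 to 10 axis limits as above and a headless backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

# Each transform maps its own coordinate system to display (pixel) coordinates
center_from_data = ax.transData.transform((5, 5))       # middle of the data range
center_from_axes = ax.transAxes.transform((0.5, 0.5))   # middle of the axes box
figure_center = fig.transFigure.transform((0.5, 0.5))   # middle of the figure

print(center_from_data, center_from_axes, figure_center)
```

The first two results coincide because the data point (5, 5) sits exactly in the middle of the axes, while the figure center is simply half the figure size in pixels.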

Notice now that if we change the axes limits, it is only the transData coordinates that will be affected, while the others remain stationary (Figure 4-70): 
    ax.set_xlim(0, 2)
    ax.set_ylim(-6, 6)
    fig
Figure 4-70. Comparing Matplotlib’s coordinate systems 

You can see this behavior more clearly by changing the axes limits interactively; if you are executing this code in a notebook, you can make that happen by changing %matplotlib inline to %matplotlib notebook and using each plot’s menu to interact with the plot. 
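If you are not working interactively, the same effect can be checked numerically. This sketch records where a data-anchored point and an axes-anchored point land on screen before and after the limits change; the points are the ones used in the text example, and the headless backend is an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
ax.axis([0, 10, 0, 10])

# Display positions before changing the limits
data_before = ax.transData.transform((1, 5))
axes_before = ax.transAxes.transform((0.5, 0.1))

ax.set_xlim(0, 2)
ax.set_ylim(-6, 6)

# Display positions after changing the limits
data_after = ax.transData.transform((1, 5))
axes_after = ax.transAxes.transform((0.5, 0.1))

print("data point moved:", not np.allclose(data_before, data_after))
print("axes point moved:", not np.allclose(axes_before, axes_after))
```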

Arrows and Annotation 
Along with tick marks and text, another useful annotation mark is the simple arrow. 

Drawing arrows in Matplotlib is often much harder than you might hope. While there is a plt.arrow() function available, I wouldn’t suggest using it; the arrows it creates are SVG objects that will be subject to the varying aspect ratio of your plots, and the result is rarely what the user intended. Instead, I’d suggest using the plt.annotate() function. This function creates some text and an arrow, and the arrows can be very flexibly specified. 

Here we’ll use annotate with several of its options (Figure 4-71): 
    %matplotlib inline
    fig, ax = plt.subplots(figsize=(12, 7))
    x = np.linspace(0, 20, 1000)
    ax.plot(x, np.cos(x))
    ax.axis('equal')

    ax.annotate('local maximum', xy=(6.28, 1), xytext=(10, 4),
                arrowprops=dict(facecolor='black', shrink=0.05))
    ax.annotate('local minimum', xy=(5 * np.pi, -1), xytext=(2, -5),
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="angle3,angleA=0,angleB=-90"));
Figure 4-71. Annotation examples 
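In the example above both xy and xytext are given in data coordinates. It is often more convenient to place the label at a fixed offset from the annotated point, which is what textcoords='offset points' does. A minimal sketch, with an arbitrary offset chosen for illustration and a headless backend assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 20, 1000)
ax.plot(x, np.cos(x))

ann = ax.annotate('local maximum',
                  xy=(2 * np.pi, 1),            # point being annotated, in data coordinates
                  xytext=(40, 30),              # label offset from xy ...
                  textcoords='offset points',   # ... measured in points, not data units
                  arrowprops=dict(arrowstyle='->'))

# annotate returns an Annotation object that holds the text and the arrow patch
print(type(ann).__name__, ann.get_text(), ann.arrow_patch is not None)
```

Because the offset is expressed in points, the label keeps the same distance from the peak even if the axes limits or figure size change.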

The arrow style is controlled through the arrowprops dictionary, which has numerous options available. These options are fairly well documented in Matplotlib’s online documentation, so rather than repeating them here I’ll quickly show some of the possibilities. Let’s demonstrate several of the possible options using the birthrate plot from before (Figure 4-72): 
    fig, ax = plt.subplots(figsize=(12, 4))
    births_by_date.plot(ax=ax)

    # Add labels to the plot
    ax.annotate("New Year's Day", xy=('2012-1-1', 4100), xycoords='data',
                xytext=(50, -30), textcoords='offset points',
                arrowprops=dict(arrowstyle="->", connectionstyle="arc3,rad=-0.2"))

    ax.annotate("Independence Day", xy=('2012-7-4', 4250), xycoords='data',
                bbox=dict(boxstyle="round", fc="none", ec="gray"),
                xytext=(10, -40), textcoords='offset points', ha='center',
                arrowprops=dict(arrowstyle="->"))

    ax.annotate('Labor Day', xy=('2012-9-4', 4850), xycoords='data', ha='center',
                xytext=(0, -20), textcoords='offset points')
    ax.annotate('', xy=('2012-9-1', 4850), xytext=('2012-9-7', 4850),
                xycoords='data', textcoords='data',
                arrowprops={'arrowstyle': '|-|,widthA=0.2,widthB=0.2', })

    ax.annotate('Halloween', xy=('2012-10-31', 4600), xycoords='data',
                xytext=(-80, -40), textcoords='offset points',
                arrowprops=dict(arrowstyle="fancy", fc="0.6", ec="none",
                                connectionstyle="angle3,angleA=0,angleB=-90"))

    ax.annotate('Thanksgiving', xy=('2012-11-25', 4500), xycoords='data',
                xytext=(-120, -60), textcoords='offset points',
                bbox=dict(boxstyle="round4,pad=.5", fc="0.9"),
                arrowprops=dict(arrowstyle="->",
                                connectionstyle="angle,angleA=0,angleB=80,rad=20"))

    ax.annotate('Christmas', xy=('2012-12-25', 3850), xycoords='data',
                xytext=(-30, 0), textcoords='offset points',
                size=13, ha='right', va="center",
                bbox=dict(boxstyle="round", alpha=0.1),
                arrowprops=dict(arrowstyle="wedge,tail_width=0.5", alpha=0.1));

    # Label the axes
    ax.set(title='USA births by day of year (1969-1988)',
           ylabel='average daily births')

    # Format the x axis with centered month labels
    ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
    ax.xaxis.set_minor_locator(mpl.dates.MonthLocator(bymonthday=15))
    ax.xaxis.set_major_formatter(plt.NullFormatter())
    ax.xaxis.set_minor_formatter(mpl.dates.DateFormatter('%h'));
    ax.set_ylim(3600, 5400);
Figure 4-72. Annotated average birth rates by day 

You’ll notice that the specifications of the arrows and text boxes are very detailed: this gives you the power to create nearly any arrow style you wish. Unfortunately, it also means that these sorts of features often must be manually tweaked, a process that can be very time-consuming when one is producing publication-quality graphics! Finally, I’ll note that the preceding mix of styles is by no means best practice for presenting data, but rather included as a demonstration of some of the available options. 

More discussion and examples of available arrow and annotation styles can be found in the Matplotlib gallery, in particular http://matplotlib.org/examples/pylab_examples/annotation_demo2.html. 

Customizing Ticks 
Matplotlib’s default tick locators and formatters are designed to be generally sufficient in many common situations, but are in no way optimal for every plot. This section will give several examples of adjusting the tick locations and formatting for the particular plot type you’re interested in. 

Before we go into examples, it will be best for us to understand further the object hierarchy of Matplotlib plots. Matplotlib aims to have a Python object representing everything that appears on the plot: for example, recall that the figure is the bounding box within which plot elements appear. Each Matplotlib object can also act as a container of sub-objects; for example, each figure can contain one or more axes objects, each of which in turn contains other objects representing plot contents. 

The tick marks are no exception. Each axes has attributes xaxis and yaxis, which in turn have attributes that contain all the properties of the lines, ticks, and labels that make up the axes. 
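This hierarchy is easy to walk through by hand. The following sketch builds a small throwaway plot (the plotted values are arbitrary) and steps from the figure down to the individual tick objects; a headless backend is assumed so it also runs as a script:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

print(type(fig).__name__)                                  # the outermost container
print(fig.axes[0] is ax)                                   # the figure holds its axes
print(type(ax.xaxis).__name__, type(ax.yaxis).__name__)    # each axes holds two axis objects
print(ax.xaxis.get_major_locator())                        # each axis holds a locator ...
print(ax.xaxis.get_major_formatter())                      # ... and a formatter
print(len(ax.xaxis.get_major_ticks()), "major tick objects on the x axis")
```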

Major and Minor Ticks 
Within each axis, there is the concept of a major tick mark and a minor tick mark. As the names would imply, major ticks are usually bigger or more pronounced, while minor ticks are usually smaller. By default, Matplotlib rarely makes use of minor ticks, but one place you can see them is within logarithmic plots (Figure 4-73): 
    import numpy as np

    %matplotlib inline
    import matplotlib.pyplot as plt
    plt.style.use('classic')

    ax = plt.axes(xscale='log', yscale='log')
    ax.set_xlim(0.01, 1000)
    ax.set_ylim(0.01, 1000)
Figure 4-73. Example of logarithmic scales and labels 

We see here that each major tick shows a large tick mark and a label, while each minor tick shows a smaller tick mark with no label. 

We can customize these tick properties—that is, locations and labels—by setting the formatter and locator objects of each axis. Let’s examine these for the x axis of the plot just shown: 
    print(ax.xaxis.get_major_locator())
    print(ax.xaxis.get_minor_locator())
    print(ax.xaxis.get_major_formatter())
    print(ax.xaxis.get_minor_formatter())
Output: 
<matplotlib.ticker.LogLocator object at 0x000001F5D39BC6D8> 
<matplotlib.ticker.LogLocator object at 0x000001F5BDB10D30> 
<matplotlib.ticker.LogFormatterSciNotation object at 0x000001F5D39BC860> 
<matplotlib.ticker.LogFormatterSciNotation object at 0x000001F5D3631518>

We see that both major and minor ticks have their locations specified by a LogLocator (which makes sense for a logarithmic plot). In the output above, both the major and the minor formatter are LogFormatterSciNotation instances. We’ll now show a few examples of setting these locators and formatters for various plots. 
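A locator is ultimately just an object that turns the current view limits into an array of tick positions. As a quick check, this sketch asks a log-scaled x axis for its major tick locations; the limits match the example above, the exact formatter class may differ between Matplotlib versions, and a headless backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

fig, ax = plt.subplots()
ax.set_xscale('log')
ax.set_xlim(0.01, 1000)

# Positions chosen by the major locator, restricted to the visible range
major = np.asarray(ax.xaxis.get_majorticklocs())
in_view = major[(major >= 0.01) & (major <= 1000)]
print(in_view)

print(isinstance(ax.xaxis.get_major_locator(), ticker.LogLocator))
print(isinstance(ax.xaxis.get_major_formatter(), ticker.LogFormatter))
```

The major ticks land on powers of ten, which is exactly the behavior a LogLocator is meant to provide.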

Hiding Ticks or Labels 
Perhaps the most common tick/label formatting operation is the act of hiding ticks or labels. We can do this using plt.NullLocator and plt.NullFormatter, as shown here (Figure 4-74): 
    plt.style.use('seaborn-whitegrid')
    ax = plt.axes()
    ax.plot(np.random.rand(50))
    ax.yaxis.set_major_locator(plt.NullLocator())
    ax.xaxis.set_major_formatter(plt.NullFormatter())
Figure 4-74. Plot with hidden tick labels (x-axis) and hidden ticks (y-axis) 

Notice that we’ve removed the labels (but kept the ticks/gridlines) from the x axis, and removed the ticks (and thus the labels as well) from the y axis. Having no ticks at all can be useful in many situations—for example, when you want to show a grid of images. For instance, consider Figure 4-75, which includes images of different faces, an example often used in supervised machine learning problems (for more information, see “In-Depth: Support Vector Machines”): 
    fig, ax = plt.subplots(5, 5, figsize=(5, 5))
    fig.subplots_adjust(hspace=0, wspace=0)

    # Get some face data from scikit-learn
    from sklearn.datasets import fetch_olivetti_faces

    faces = fetch_olivetti_faces().images
    for i in range(5):
        for j in range(5):
            ax[i, j].xaxis.set_major_locator(plt.NullLocator())
            ax[i, j].yaxis.set_major_locator(plt.NullLocator())
            ax[i, j].imshow(faces[10 * i + j], cmap="bone")
Figure 4-75. Hiding ticks within image plots 

Notice that each image has its own axes, and we’ve set the locators to null because the tick values (pixel number in this case) do not convey relevant information for this particular visualization. 
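The difference between a null locator and a null formatter can also be confirmed by inspecting the axes afterwards. In this sketch (random data, headless backend assumed) the y axis ends up with no tick positions at all, while the x axis keeps its positions but every label is an empty string:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

fig, ax = plt.subplots()
ax.plot(np.random.rand(50))
ax.yaxis.set_major_locator(ticker.NullLocator())      # no ticks, hence no labels
ax.xaxis.set_major_formatter(ticker.NullFormatter())  # ticks stay, labels are blank
fig.canvas.draw()  # make sure ticks and labels are up to date

y_ticks = ax.get_yticks()
x_ticks = ax.get_xticks()
x_labels = [label.get_text() for label in ax.get_xticklabels()]
print(len(y_ticks), len(x_ticks), x_labels)
```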

Reducing or Increasing the Number of Ticks 
One common problem with the default settings is that smaller subplots can end up with crowded labels. We can see this in the plot grid shown in Figure 4-76: 
    fig, ax = plt.subplots(4, 4, sharex=True, sharey=True)
 Figure 4-76. A default plot with crowded ticks 

Particularly for the x ticks, the numbers nearly overlap, making them quite difficult to decipher. We can fix this with the plt.MaxNLocator, which allows us to specify the maximum number of ticks that will be displayed. Given this maximum number, Matplotlib will use internal logic to choose the particular tick locations (Figure 4-77): 
    # For every axis, set the x and y major locator
    for axi in ax.flat:
        axi.xaxis.set_major_locator(plt.MaxNLocator(3))
        axi.yaxis.set_major_locator(plt.MaxNLocator(3))

    fig
 Figure 4-77. Customizing the number of ticks 

This makes things much cleaner. If you want even more control over the locations of regularly spaced ticks, you might also use plt.MultipleLocator, which we’ll discuss in the following section. 
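One detail worth knowing: the argument to MaxNLocator is documented as the maximum number of intervals (nbins), so the number of ticks can be up to one more than the value you pass. The sketch below counts the ticks that fall inside the visible range of one subplot; the variable names are illustrative and a headless backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, axes = plt.subplots(4, 4, sharex=True, sharey=True)
for axi in axes.flat:
    axi.xaxis.set_major_locator(ticker.MaxNLocator(3))
    axi.yaxis.set_major_locator(ticker.MaxNLocator(3))

# Count the x ticks that fall within the visible limits of the first subplot
lo, hi = axes[0, 0].get_xlim()
visible = [t for t in axes[0, 0].get_xticks() if lo <= t <= hi]
print(len(visible), visible)
```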

Fancy Tick Formats 
Matplotlib’s default tick formatting can leave a lot to be desired; it works well as a broad default, but sometimes you’d like to do something more. Consider the plot shown in Figure 4-78, a sine and a cosine: 
    # Plot a sine and cosine curve
    fig, ax = plt.subplots(figsize=(8, 4))
    x = np.linspace(0, 3 * np.pi, 1000)
    ax.plot(x, np.sin(x), lw=3, label='Sine')
    ax.plot(x, np.cos(x), lw=3, label='Cosine')

    # Set up grid, legend, and limits
    ax.grid(True)
    ax.legend(frameon=False)
    ax.axis('equal')
    ax.set_xlim(0, 3 * np.pi);
 Figure 4-78. A default plot with integer ticks 

There are a couple of changes we might like to make. First, it’s more natural for this data to space the ticks and grid lines in multiples of π. We can do this by setting a MultipleLocator, which locates ticks at a multiple of the number you provide. For good measure, we’ll place major ticks at multiples of π/2 and minor ticks at multiples of π/4 (Figure 4-79): 
    ax.xaxis.set_major_locator(plt.MultipleLocator(np.pi / 2))
    ax.xaxis.set_minor_locator(plt.MultipleLocator(np.pi / 4))
    fig
 Figure 4-79. Ticks at multiples of pi/2 
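To see what a MultipleLocator produces without drawing anything, you can call its tick_values method with a range. This is a small sketch using the same π/2 spacing; note that the locator may return a tick or two just outside the requested range, so we filter before printing:

```python
import numpy as np
from matplotlib import ticker

step = np.pi / 2
locator = ticker.MultipleLocator(step)

# Candidate tick positions for the range [0, 3*pi]
ticks = np.asarray(locator.tick_values(0, 3 * np.pi))
multiples = ticks / step  # each tick expressed as a multiple of pi/2

in_range = ticks[(ticks >= 0) & (ticks <= 3 * np.pi + 1e-9)]
print(np.round(in_range / step).astype(int))
```

Every position is an integer multiple of the step, and consecutive ticks are exactly one step apart.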

But now these tick labels look a little bit silly: we can see that they are multiples of π, but the decimal representation does not immediately convey this. To fix this, we can change the tick formatter. There’s no built-in formatter for what we want to do, so we’ll instead use plt.FuncFormatter, which accepts a user-defined function giving fine-grained control over the tick outputs (Figure 4-80): 
    def format_func(value, tick_number):
        # find number of multiples of pi/2
        N = int(np.round(2 * value / np.pi))
        if N == 0:
            return "0"
        elif N == 1:
            return r"$\pi/2$"
        elif N == 2:
            return r"$\pi$"
        elif N % 2 > 0:
            return r"${0}\pi/2$".format(N)
        else:
            return r"${0}\pi$".format(N // 2)

    ax.xaxis.set_major_formatter(plt.FuncFormatter(format_func))
    fig
Figure 4-80. Ticks with custom labels 

This is much better! Notice that we’ve made use of Matplotlib’s LaTeX support, specified by enclosing the string within dollar signs. This is very convenient for display of mathematical symbols and formulae; in this case, "$\pi$" is rendered as the Greek character π. 

The plt.FuncFormatter offers extremely fine-grained control over the appearance of your plot ticks, and comes in very handy when you’re preparing plots for presentation or publication. 
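The same pattern works for any labeling rule you can write as a function of the tick value. As a second, unrelated illustration, here is a sketch of a formatter that abbreviates thousands; the function name thousands and the plotted values are hypothetical, and a headless backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib import ticker

def thousands(value, tick_number):
    """Label values of 1000 or more as multiples of 'k', e.g. 1500 -> '1.5k'."""
    if abs(value) >= 1000:
        return f"{value / 1000:g}k"
    return f"{value:g}"

formatter = ticker.FuncFormatter(thousands)

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1500, 3000, 4500])
ax.yaxis.set_major_formatter(formatter)

# A FuncFormatter can also be called directly with (value, position)
print(formatter(1500, 0), formatter(250, 1))
```

Calling the formatter directly like this is a convenient way to test a labeling function before attaching it to a plot.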

Summary of Formatters and Locators 
We’ve mentioned a couple of the available formatters and locators. We’ll conclude this section by briefly listing all the built-in locator and formatter options. For more information on any of these, refer to the docstrings or to the Matplotlib online documentation. Each of the following is available in the plt namespace: 
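One way to see which locators and formatters your installed Matplotlib provides is to inspect the matplotlib.ticker module directly. The following is only a sketch: the helper name subclasses_in_ticker is hypothetical, and not every class it finds is necessarily re-exported in the plt namespace.

```python
import inspect
from matplotlib import ticker

def subclasses_in_ticker(base):
    """Names of classes defined in matplotlib.ticker that derive from `base`."""
    return sorted(
        name
        for name, obj in inspect.getmembers(ticker, inspect.isclass)
        if issubclass(obj, base)
        and obj is not base
        and obj.__module__ == ticker.__name__
    )

locators = subclasses_in_ticker(ticker.Locator)
formatters = subclasses_in_ticker(ticker.Formatter)

print("Locators:  ", ", ".join(locators))
print("Formatters:", ", ".join(formatters))
```

From there, the docstring of any class in the output (for example help(ticker.MaxNLocator)) describes its parameters.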


Supplement 
Python Data Science Handbook - Text and Annotation 
Python Data Science Handbook - Customizing Ticks 
Matplotlib Doc - Major and minor ticks
