Plot comms proofing

hadley 2016-08-26 11:35:00 -05:00
parent d800a19a83
commit 891ab1d04d
1 changed files with 35 additions and 33 deletions

@ -2,11 +2,11 @@
## Introduction
In [exploratory data analysis], you learned how to use plots as tools for _exploration_. When making plots for exploration, you knew---even before looking at them---which variables the plot would display. You made each plot for a purpose, could quickly look at it, and then move on to the next plot. In the course of most analyses, you'll produce tens or hundreds of plots, most of which are immediately discarded.
In [exploratory data analysis], you learned how to use plots as tools for _exploration_. When you make exploratory plots, you know---even before looking---which variables the plot will display. You make each plot for a purpose, can quickly look at it, and then move on to the next plot. In the course of most analyses, you'll produce tens or hundreds of plots, most of which are immediately thrown away.
Now you need to _communicate_ the results of your analysis to others. Your audience will likely not share your background knowledge and will not be deeply invested in the data. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. In this chapter, you'll learn some of the tools that ggplot2 provides to do so.
Now that you understand your data, you need to _communicate_ your understanding to others. Your audience will likely not share your background knowledge and will not be deeply invested in the data. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. In this chapter, you'll learn some of the tools that ggplot2 provides to do so.
This chapter focuses on the tools you need to create good graphics. I assume you have an idea of what you want, and just need to know how to do it. For that reason, I highly recommend pairing this advice with a good general visualisation book. I particularly like [_The Truthful Art_](https://amzn.com/0321934075), by Alberto Cairo. It doesn't teach the mechanics of creating visualisations, but instead focuses on what you need to think about in order to create effective graphics.
This chapter focuses on the tools you need to create good graphics. I assume that you know what you want, and just need to know how to do it. For that reason, I highly recommend pairing this chapter with a good general visualisation book. I particularly like [_The Truthful Art_](https://amzn.com/0321934075), by Alberto Cairo. It doesn't teach the mechanics of creating visualisations, but instead focuses on what you need to think about in order to create effective graphics.
### Prerequisites
@ -19,28 +19,30 @@ library(dplyr)
## Label
The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. You can start with a plot title using `labs()`:
The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. You add labels with the `labs()` function. This example adds a plot title:
```{r}
```{r, message = FALSE}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency decreases with engine size")
labs(title = "Fuel efficiency generally decreases with engine size")
```
Generally, titles describe the main finding in the plot, not just what plot displays. If you need to add more text, there are two other useful labels that you can use in ggplot2 2.2.0 and above (which should be available by the time you're reading this book):
The purpose of a plot title is to summarise the main finding. Avoid titles that just describe what the plot is, e.g. "A scatterplot of engine displacement vs. fuel economy".
If you need to add more text, there are two other useful labels that you can use in ggplot2 2.2.0 and above (which should be available by the time you're reading this book):
* `subtitle` adds additional detail in a smaller font beneath the title.
* `caption` adds text at the bottom right of the plot, often used to describe
the source of the data.
```{r}
```{r, message = FALSE}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = "Fuel efficiency decreases with engine size",
title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov"
)
@ -48,7 +50,7 @@ ggplot(mpg, aes(displ, hwy)) +
You can also use `labs()` to replace the axis and legend titles. It's usually a good idea to replace short variable names with more detailed descriptions, and to include the units.
```{r}
```{r, message = FALSE}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
@ -61,7 +63,7 @@ ggplot(mpg, aes(displ, hwy)) +
It's possible to use mathematical equations instead of text strings. Just switch `""` out for `quote()` and read about the available options in `?plotmath`:
```{r}
```{r, fig.asp = 1, out.width = "50%", fig.width = 3}
df <- tibble(
x = runif(10),
y = runif(10)
@ -152,7 +154,6 @@ label <- mpg %>%
hwy = max(hwy),
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
label
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
@ -181,9 +182,9 @@ In these examples, I manually broke the label up into lines using `"\n"`. Anothe
writeLines()
```
Also, note the use of `hjust` and `vjust` to control the alignment of the label. Figure \@ref(fig:just) shows all nine possible combinations.
Note the use of `hjust` and `vjust` to control the alignment of the label. Figure \@ref(fig:just) shows all nine possible combinations.
```{r just, echo = FALSE, fig.cap = "All nine combinations of `hjust` and `vjust`."}
```{r just, echo = FALSE, fig.cap = "All nine combinations of `hjust` and `vjust`.", fig.asp = 0.5, fig.width = 4.5, out.width = "60%"}
vjust <- c(bottom = 0, center = 0.5, top = 1)
hjust <- c(left = 0, center = 0.5, right = 1)
@ -197,7 +198,8 @@ df <- tidyr::crossing(hj = names(hjust), vj = names(vjust)) %>%
ggplot(df, aes(x, y)) +
geom_point(colour = "grey70", size = 5) +
geom_point(size = 0.5, colour = "red") +
geom_text(aes(label = label, hjust = hj, vjust = vj), size = 4)
geom_text(aes(label = label, hjust = hj, vjust = vj), size = 4) +
labs(x = NULL, y = NULL)
```
Remember, in addition to `geom_text()`, you have many other geoms in ggplot2 available to help annotate your plot. A few ideas:
@ -215,12 +217,12 @@ Remember, in addition to `geom_text()`, you have many other geoms in ggplot2 ava
to a point with an arrow. Use aesthetics `x` and `y` to define the
starting location, and `xend` and `yend` to define the end location.
The only limit is your imagination (and your patience at positioning annotations to be aesthetically pleasing)!
The only limit is your imagination (and your patience with positioning annotations to be aesthetically pleasing)!
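As a minimal sketch of combining several annotation geoms, the chunk below adds a dashed reference line, an arrow, and a text label to the `mpg` scatterplot. The `note` data frame and its coordinates are hypothetical values chosen by eye, not positions taken from the chapter.

```{r}
library(ggplot2)

# Hypothetical annotation: start of the arrow (x, y), its tip (xend, yend),
# and the text to draw at the start. Coordinates were picked by eye.
note <- data.frame(
  x = 4.5, y = 36,
  xend = 5.6, yend = 27,
  label = "Two seaters"
)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # Reference line at the mean highway mileage
  geom_hline(yintercept = mean(mpg$hwy), colour = "grey50", linetype = "dashed") +
  # Arrow running from the label towards the cluster of two seaters
  geom_segment(
    aes(x = x, y = y, xend = xend, yend = yend),
    data = note,
    arrow = arrow(length = unit(2, "mm"))
  ) +
  # Label sitting just above the start of the arrow
  geom_text(aes(x = x, y = y, label = label), data = note, vjust = "bottom")
p
```

Keeping the annotation in its own small data frame means the positions can be adjusted in one place while you tune the layout.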
### Exercises
1. Use `geom_text()` with infinite positions to place text at each corner
of the plot.
1. Use `geom_text()` with infinite positions to place text at each of the
four corners of the plot.
1. Read the documentation for `annotate()`. How can you use it to add a text
label to a plot without having to create a tibble?
@ -237,14 +239,14 @@ The only limit is your imagination (and your patience at positioning annotations
## Scales
The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive. Normally, ggplot2 automatically adds scales for you. When you type:
The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive. Normally, ggplot2 automatically adds scales for you. For example, when you type:
```{r default-scales, fig.show = "hide"}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
```
Behind the scenes, ggplot2 automatically adds default scales:
ggplot2 automatically adds default scales behind the scenes:
```{r, fig.show = "hide"}
ggplot(mpg, aes(displ, hwy)) +
@ -268,7 +270,7 @@ The default scales have been carefully chosen to do a good job for a wide range
### Axis ticks and legend keys
There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: `breaks` and `labels`. Breaks controls the position of the ticks, or the values associated with the keys. Labels controls the text label associated with each tick/key. The most common use of `breaks` is to override the defaults choice:
There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: `breaks` and `labels`. Breaks controls the position of the ticks, or the values associated with the keys. Labels controls the text label associated with each tick/key. The most common use of `breaks` is to override the default choice:
```{r}
ggplot(mpg, aes(displ, hwy)) +
@ -285,7 +287,7 @@ ggplot(mpg, aes(displ, hwy)) +
scale_y_continuous(labels = NULL)
```
You can also use `breaks` and `labels` to control the appearance of legends. Collecting axes and legends are called guides. Axes are used for x and y aesthetics; legends are used used for everything else.
You can also use `breaks` and `labels` to control the appearance of legends. Collectively, axes and legends are called __guides__. Axes are used for x and y aesthetics; legends are used for everything else.
Another use of `breaks` is when you have relatively few data points and want to highlight exactly where the observations occur. For example, take this plot that shows when each US president started and ended their term.
@ -309,9 +311,9 @@ Note that the specification of breaks and labels for date and datetime scales is
You will most often use `breaks` and `labels` to tweak the axes. While they both also work for legends, there are a few other techniques you are more likely to use.
To control the overall position of the legend, you need to use a `theme()` setting. We'll come back to themes at the end of the chapter, but in brief, they control the non-data parts of the plot. The themes setting `legend.position` controls where the legend is drawn:
To control the overall position of the legend, you need to use a `theme()` setting. We'll come back to themes at the end of the chapter, but in brief, they control the non-data parts of the plot. The theme setting `legend.position` controls where the legend is drawn:
```{r fig.asp = 1, fig.align = "default", out.width = "50%", fig.width = 3}
```{r fig.asp = 1, fig.align = "default", out.width = "50%", fig.width = 4}
base <- ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class))
@ -335,7 +337,7 @@ ggplot(mpg, aes(displ, hwy)) +
### Replacing a scale
Instead of just tweaking the detail a little, you can also replace the scale altogether. We'll focus on colour scales because there are many options, and they're the scales you're mostly likely to want to change. The same principles apply to the other aesthetics. All colour scales have two variants: `scale_colour_x()` and `scale_fill_x()` for the `colour` and `fill` aesthetics respectively (the colour scales are available in both UK and US spellings).
Instead of just tweaking the details a little, you can replace the scale altogether. We'll focus on colour scales because there are many options, and they're the scales you're most likely to want to change. The same principles apply to the other aesthetics. All colour scales have two variants: `scale_colour_x()` and `scale_fill_x()` for the `colour` and `fill` aesthetics respectively (the colour scales are available in both UK and US spellings).
The default categorical scale picks colours that are evenly spaced around the colour wheel. Useful alternatives are the ColorBrewer scales which have been hand tuned to work better for people with common types of colour blindness. The two plots below look similar, but there is enough difference in the shades of red and green that the dots on the right can be distinguished even by people with red-green colour blindness.
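A minimal sketch of that comparison, assuming the `drv` variable as the colour mapping and the ColorBrewer `"Set1"` palette (both are illustrative choices): the first plot keeps the default categorical scale and the second swaps in `scale_colour_brewer()`.

```{r}
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv))

# Default scale: hues evenly spaced around the colour wheel
default_plot <- base

# ColorBrewer scale: only the colour scale is replaced, the data are untouched
brewer_plot <- base + scale_colour_brewer(palette = "Set1")

default_plot
brewer_plot
```

Because the scale is added to an existing plot object, you can try several palettes without repeating the rest of the plot code.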
@ -416,7 +418,7 @@ ggplot(df, aes(x, y)) +
1. Use `override.aes` to make the legend on the following plot easier to see.
```{r, dev = "png"}
```{r, dev = "png", out.width = "50%"}
ggplot(diamonds, aes(carat, price)) +
geom_point(aes(colour = cut), alpha = 1/20)
```
@ -431,7 +433,7 @@ There are three ways to control the plot limits:
To zoom in on a region of the plot, it's generally best to use `coord_cartesian()`. Compare the following two plots:
```{r out.width = "50%", fig.align = "default"}
```{r out.width = "50%", fig.align = "default", message = FALSE}
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
@ -478,13 +480,13 @@ ggplot(compact, aes(displ, hwy, colour = drv)) +
col_scale
```
In this particular case, you could have simply used faceting, but this technique is useful more generally, if for instance, you want to make your plots comparable even when spread across multiple pages of a report.
In this particular case, you could have simply used faceting, but this technique is useful more generally if, for instance, you want to spread plots over multiple pages of a report.
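The idea of sharing scales between separate plots can be sketched as follows. The names `suv`, `x_scale`, and `y_scale` are assumptions introduced for illustration (`compact` and `col_scale` echo names that appear in the code above), and the limits are computed from the full `mpg` data so that every subset is drawn on the same axes and with the same colours.

```{r}
library(ggplot2)

# Two subsets that would otherwise get different default scales
suv <- mpg[mpg$class == "suv", ]
compact <- mpg[mpg$class == "compact", ]

# Scales built once from the full data, then reused in each plot
x_scale <- scale_x_continuous(limits = range(mpg$displ))
y_scale <- scale_y_continuous(limits = range(mpg$hwy))
col_scale <- scale_colour_discrete(limits = unique(mpg$drv))

p_suv <- ggplot(suv, aes(displ, hwy, colour = drv)) +
  geom_point() +
  x_scale + y_scale + col_scale

p_compact <- ggplot(compact, aes(displ, hwy, colour = drv)) +
  geom_point() +
  x_scale + y_scale + col_scale

p_suv
p_compact
```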
## Themes
Finally, you can customize the non-data elements of your plot with a theme:
```{r}
```{r, message = FALSE}
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
@ -515,7 +517,7 @@ file.remove("my-plot.pdf")
If you don't specify the `width` and `height` they will be taken from the dimensions of the current plotting device. For reproducible code, you'll want to specify them.
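As a small sketch of a reproducible call, the chunk below passes `width` and `height` explicitly. The 6 x 4 inch size and the temporary file are arbitrary choices made for illustration.

```{r}
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Write to a temporary file so the example leaves nothing behind
out <- tempfile(fileext = ".pdf")

# Explicit dimensions: the result no longer depends on the current device
ggsave(out, plot = p, width = 6, height = 4, units = "in")

saved <- file.exists(out)
file.remove(out)
```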
Generally, however, I think you should be assembling your final reports using knitr and rmarkdown, so I want to focus on the important code chunk options that you should know about for graphics. You can learn more about `ggsave()` in the documentation.
Generally, however, I think you should be assembling your final reports using R Markdown, so I want to focus on the important code chunk options that you should know about for graphics. You can learn more about `ggsave()` in the documentation.
### Figure sizing
@ -554,13 +556,13 @@ plot
plot
```
If you want to make sure the font size is the same in all your figures, whenever you set `out.width`, you'll also need to adjust `fig.width` to maintain the same ratio with your default `out.width`. For example, if your default `fig.width` is 6 and `out.width` is 0.7, when you set `out.width = "50%"` you'll need to set `fig.width` to 4.3 (6 * 0.5 / 0.7).
If you want to make sure the font size is consistent across all your figures, whenever you set `out.width`, you'll also need to adjust `fig.width` to maintain the same ratio with your default `out.width`. For example, if your default `fig.width` is 6 and `out.width` is 0.7, when you set `out.width = "50%"` you'll need to set `fig.width` to 4.3 (6 * 0.5 / 0.7).
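The proportional adjustment can be written as a one-line helper. `scaled_fig_width()` is a hypothetical function, not part of knitr, and its defaults simply mirror the example values of 6 and 0.7.

```{r}
# Hypothetical helper: keep fig.width / out.width constant so text renders
# at the same size in every figure.
scaled_fig_width <- function(out_width,
                             default_fig_width = 6,
                             default_out_width = 0.7) {
  default_fig_width * out_width / default_out_width
}

# out.width = "50%" with the defaults above
scaled_fig_width(0.5)  # about 4.29
```

In practice you would round the result to one decimal place before using it as a chunk option.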
### Other important options
When mingling code and text, like I do in this book, I recommend setting `fig.show = "hold"` so that plots are shown after the code. This has the pleasant side effect of forcing you to break up large blocks of code with their explanations.
To add a caption to the plot, use `fig.cap`. In RMarkdown this will change the figure from inline to "floating".
To add a caption to the plot, use `fig.cap`. In R Markdown this will change the figure from inline to "floating".
If you're producing PDF output, the default graphics type is PDF. This is a good default because PDFs are high quality vector graphics. However, they can produce very large and slow plots if you are displaying thousands of points. In that case, set `dev = "png"` to force the use of PNGs. They are slightly lower quality, but will be much more compact.
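Put together, these options go in the chunk header. The sketch below is one possible combination for a plot with many points. The chunk name `carat-price` and the caption text are made up for illustration.

```{r carat-price, dev = "png", fig.show = "hold", fig.cap = "Price against carat for the diamonds data."}
library(ggplot2)

# Tens of thousands of points: a PNG keeps the output file small
p <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 1/20)
p
```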
@ -570,4 +572,4 @@ It's a good idea to name code chunks that produce figures, even if you don't rou
The absolute best place to learn more is the ggplot2 book: [_ggplot2: Elegant graphics for data analysis_](https://amzn.com/331924275X). It goes into much more depth about the underlying theory, and has many more examples of how to combine the individual pieces to solve practical problems. Unfortunately the book is not available online for free, although you can find the source code at <https://github.com/hadley/ggplot2-book>.
Another great resource is the ggplot2 extensions guide <http://www.ggplot2-exts.org/>. This site lists many of the packages that extend ggplot2 with new geoms and scales. It's a great place to start if you're trying to do something that seems really hard with ggplot2.
Another great resource is the ggplot2 extensions guide <http://www.ggplot2-exts.org/>. This site lists many of the packages that extend ggplot2 with new geoms and scales. It's a great place to start if you're trying to do something that seems hard with ggplot2.