Communicate plots (#1114)

* Update before merge to visualize

* Move figure sizing + some chunk opts to Quarto chp
Mine Cetinkaya-Rundel 2022-10-29 22:24:35 -04:00 committed by GitHub
parent b38829f31e
commit 9bba5cb695
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 147 additions and 116 deletions


@ -9,7 +9,7 @@ status("drafting")
## Introduction
In \[exploratory data analysis\], you learned how to use plots as tools for *exploration*.
In @sec-exploratory-data-analysis, you learned how to use plots as tools for *exploration*.
When you make exploratory plots, you know---even before looking---which variables the plot will display.
You made each plot for a purpose, could quickly look at it, and then move on to the next plot.
In the course of most analyses, you'll produce tens or hundreds of plots, most of which are immediately thrown away.
@ -21,14 +21,14 @@ In this chapter, you'll learn some of the tools that ggplot2 provides to do so.
This chapter focuses on the tools you need to create good graphics.
We assume that you know what you want, and just need to know how to do it.
For that reason, we highly recommend pairing this chapter with a good general visualisation book.
For that reason, we highly recommend pairing this chapter with a good general visualization book.
We particularly like [*The Truthful Art*](https://www.amazon.com/gp/product/0321934075/), by Albert Cairo.
It doesn't teach the mechanics of creating visualisations, but instead focuses on what you need to think about in order to create effective graphics.
It doesn't teach the mechanics of creating visualizations, but instead focuses on what you need to think about in order to create effective graphics.
### Prerequisites
In this chapter, we'll focus once again on ggplot2.
We'll also use a little dplyr for data manipulation, and a few ggplot2 extension packages, including **ggrepel** and **viridis**.
We'll also use a little dplyr for data manipulation, and a few ggplot2 extension packages, including **ggrepel** and **patchwork**.
Rather than loading those extensions here, we'll refer to their functions explicitly, using the `::` notation.
This will help make it clear which functions are built into ggplot2, and which come from other packages.
Don't forget you'll need to install those packages with `install.packages()` if you don't already have them.
@ -54,7 +54,7 @@ ggplot(mpg, aes(displ, hwy)) +
labs(title = "Fuel efficiency generally decreases with engine size")
```
The purpose of a plot title is to summarise the main finding.
The purpose of a plot title is to summarize the main finding.
Avoid titles that just describe what the plot is, e.g. "A scatterplot of engine displacement vs. fuel economy".
If you need to add more text, there are two other useful labels that you can use in ggplot2 2.2.0 and above:
@ -114,11 +114,23 @@ ggplot(df, aes(x, y)) +
### Exercises
1. Create one plot on the fuel economy data with customised `title`, `subtitle`, `caption`, `x`, `y`, and `colour` labels.
1. Create one plot on the fuel economy data with customized `title`, `subtitle`, `caption`, `x`, `y`, and `colour` labels.
2. The `geom_smooth()` is somewhat misleading because the `hwy` for large engines is skewed upwards due to the inclusion of lightweight sports cars with big engines.
Use your modelling tools to fit and display a better model.
<!--# TO DO: Reconsider this exercise in light of removing modeling chapters. -->
2. Recreate the following plot using the fuel economy data.
Note that both the colors and shapes of points vary by type of drive train.
```{r}
#| echo: false
ggplot(mpg, aes(cty, hwy, color = drv, shape = drv)) +
  geom_point() +
  labs(
    x = "City MPG",
    y = "Highway MPG",
    shape = "Type of\ndrive train",
    color = "Type of\ndrive train"
  )
```
3. Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.
@ -192,10 +204,10 @@ ggplot(mpg, aes(displ, hwy, colour = class)) +
```
Alternatively, you might just want to add a single label to the plot, but you'll still need to create a data frame.
Often, you want the label in the corner of the plot, so it's convenient to create a new data frame using `summarise()` to compute the maximum values of x and y.
Often, you want the label in the corner of the plot, so it's convenient to create a new data frame using `summarize()` to compute the maximum values of x and y.
```{r}
label <- mpg |>
label_info <- mpg |>
summarise(
displ = max(displ),
hwy = max(hwy),
@ -204,14 +216,14 @@ label <- mpg |>
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
geom_text(data = label_info, aes(label = label), vjust = "top", hjust = "right")
```
If you want to place the text exactly on the borders of the plot, you can use `+Inf` and `-Inf`.
Since we're no longer computing the positions from `mpg`, we can use `tibble()` to create the data frame:
```{r}
label <- tibble(
label_info <- tibble(
displ = Inf,
hwy = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
@ -219,7 +231,7 @@ label <- tibble(
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
geom_text(aes(label = label), data = label, vjust = "top", hjust = "right")
geom_text(data = label_info, aes(label = label), vjust = "top", hjust = "right")
```
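The same trick works for the opposite corner: `-Inf` pins a label to the minimum edge of an axis. The following is a minimal sketch. The data frame name `corner_label`, the plot object `p`, and the label text are illustrative and not part of the chapter. `hjust = "left"` and `vjust = "bottom"` keep the text inside the panel instead of centering it on the corner.

```{r}
library(ggplot2)
library(tibble)

# Hypothetical single-row data frame for one annotation
corner_label <- tibble(
  displ = -Inf, # -Inf pins the label to the left edge of the x-axis
  hwy = -Inf,   # -Inf pins the label to the bottom edge of the y-axis
  label = "Bottom-left corner label"
)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # Justify the text so it grows up and to the right from the corner
  geom_text(data = corner_label, aes(label = label), vjust = "bottom", hjust = "left")
p
```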
In these examples, we manually broke the label up into lines using `"\n"`.
@ -227,7 +239,7 @@ Another approach is to use `stringr::str_wrap()` to automatically add line break
```{r}
"Increasing engine size is related to decreasing fuel economy." |>
stringr::str_wrap(width = 40) |>
str_wrap(width = 40) |>
writeLines()
```
@ -246,7 +258,7 @@ Note the use of `hjust` and `vjust` to control the alignment of the label.
vjust <- c(bottom = 0, center = 0.5, top = 1)
hjust <- c(left = 0, center = 0.5, right = 1)
df <- tidyr::crossing(hj = names(hjust), vj = names(vjust)) |>
df <- crossing(hj = names(hjust), vj = names(vjust)) |>
mutate(
y = vjust[vj],
x = hjust[hj],
@ -257,7 +269,7 @@ ggplot(df, aes(x, y)) +
geom_point(colour = "grey70", size = 5) +
geom_point(size = 0.5, colour = "red") +
geom_text(aes(label = label, hjust = hj, vjust = vj), size = 4) +
labs(x = NULL, y = NULL)
labs(x = NULL, y = NULL)
```
Remember, in addition to `geom_text()`, you have many other geoms in ggplot2 available to help annotate your plot.
@ -285,7 +297,7 @@ The only limit is your imagination (and your patience with positioning annotatio
3. How do labels with `geom_text()` interact with faceting?
How can you add a label to a single facet?
How can you put a different label in each facet?
(Hint: think about the underlying data.)
(Hint: Think about the underlying data.)
4. What arguments to `geom_label()` control the appearance of the background box?
@ -403,7 +415,7 @@ base + theme(legend.position = "right") # the default
You can also use `legend.position = "none"` to suppress the display of the legend altogether.
To control the display of individual legends, use `guides()` along with `guide_legend()` or `guide_colourbar()`.
To control the display of individual legends, use `guides()` along with `guide_legend()` or `guide_colorbar()`.
The following example shows two important settings: controlling the number of rows the legend uses with `nrow`, and overriding one of the aesthetics to make the points bigger.
This is particularly useful if you have used a low `alpha` to display many points on a plot.
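The low-`alpha` case can be sketched as follows, assuming the `mpg` data. The alpha value and the number of rows are arbitrary choices for illustration. Because the legend keys inherit the faded points, `override.aes` restores full opacity and a larger size in the legend only.

```{r}
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  # A low alpha helps with overplotting, but also fades the legend keys
  geom_point(aes(colour = class), alpha = 0.2) +
  theme(legend.position = "bottom") +
  guides(
    colour = guide_legend(
      nrow = 2,                                # spread the legend over two rows
      override.aes = list(size = 4, alpha = 1) # bigger, fully opaque keys
    )
  )
p
```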
@ -448,8 +460,8 @@ ggplot(diamonds, aes(carat, price)) +
scale_y_log10()
```
Another scale that is frequently customised is colour.
The default categorical scale picks colours that are evenly spaced around the colour wheel.
Another scale that is frequently customized is color.
The default categorical scale picks colors that are evenly spaced around the color wheel.
Useful alternatives are the ColorBrewer scales which have been hand tuned to work better for people with common types of colour blindness.
The two plots below look similar, but there is enough difference in the shades of red and green that the dots on the right can be distinguished even by people with red-green colour blindness.
@ -468,7 +480,7 @@ ggplot(mpg, aes(displ, hwy)) +
```
Don't forget simpler techniques.
If there are just a few colours, you can add a redundant shape mapping.
If there are just a few colors, you can add a redundant shape mapping.
This will also help ensure your plot is interpretable in black and white.
```{r}
@ -492,27 +504,26 @@ par(mar = c(0, 3, 0, 0))
RColorBrewer::display.brewer.all()
```
When you have a predefined mapping between values and colours, use `scale_colour_manual()`.
When you have a predefined mapping between values and colors, use `scale_colour_manual()`.
For example, if we map presidential party to colour, we want to use the standard mapping of red for Republicans and blue for Democrats:
```{r}
presidential |>
mutate(id = 33 + row_number()) |>
ggplot(aes(start, id, colour = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
```
For continuous colour, you can use the built-in `scale_colour_gradient()` or `scale_fill_gradient()`.
If you have a diverging scale, you can use `scale_colour_gradient2()`.
That allows you to give, for example, positive and negative values different colours.
That allows you to give, for example, positive and negative values different colors.
That's sometimes also useful if you want to distinguish points above or below the mean.
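A minimal sketch of the "above or below the mean" idea, assuming the `mpg` data: setting `midpoint` to the mean of `hwy` centers the diverging scale there. By default, `scale_colour_gradient2()` shades low values towards a muted red, values near the midpoint towards white, and high values towards a muted blue.

```{r}
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = hwy)) +
  geom_point() +
  # Diverge around the overall mean highway MPG instead of the default midpoint of 0
  scale_colour_gradient2(midpoint = mean(mpg$hwy))
p
```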
Another option is `scale_colour_viridis()` provided by the **viridis** package.
It's a continuous analog of the categorical ColorBrewer scales.
The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored a continuous colour scheme that has good perceptual properties.
Here's an example from the viridis vignette.
Another option is to use the viridis color scales.
The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored continuous color schemes that are perceptible to people with various forms of color blindness as well as perceptually uniform in both color and black and white.
These scales are available as continuous (`c`), discrete (`d`), and binned (`b`) palettes in ggplot2, e.g., `scale_colour_viridis_c()`, `scale_colour_viridis_d()`, and `scale_colour_viridis_b()`.
```{r}
#| fig-align: default
@ -526,12 +537,20 @@ df <- tibble(
)
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed()
coord_fixed() +
labs(title = "Default, continuous")
ggplot(df, aes(x, y)) +
geom_hex() +
viridis::scale_fill_viridis() +
coord_fixed()
coord_fixed() +
scale_fill_viridis_c() +
labs(title = "Viridis, continuous")
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
scale_fill_viridis_b() +
labs(title = "Viridis, binned")
```
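The plots above use the continuous and binned variants. For a categorical variable, the discrete variant applies in the same way. Here is a short sketch using `drv` from `mpg`. The title text is only a label for this example.

```{r}
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  # Discrete viridis palette: one perceptually distinct color per level of drv
  scale_colour_viridis_d() +
  labs(title = "Viridis, discrete")
p
```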
Note that all colour scales come in two varieties: `scale_colour_x()` and `scale_fill_x()` for the `colour` and `fill` aesthetics, respectively (the colour scales are available in both UK and US spellings).
@ -563,7 +582,7 @@ Note that all colour scales come in two variety: `scale_colour_x()` and `scale_f
4. Use `override.aes` to make the legend on the following plot easier to see.
```{r}
#| dev: "png"
#| fig-format: "png"
#| out-width: "50%"
ggplot(diamonds, aes(carat, price)) +
@ -659,19 +678,30 @@ ggplot(mpg, aes(displ, hwy)) +
```
ggplot2 includes eight themes by default, as shown in @fig-themes.
Many more are included in add-on packages like **ggthemes** (<https://github.com/jrnold/ggthemes>), by Jeffrey Arnold.
Many more are included in add-on packages like **ggthemes** (<https://jrnold.github.io/ggthemes>), by Jeffrey Arnold.
```{r}
#| label: fig-themes
#| echo: false
#| fig.cap: The eight themes built-in to ggplot2.
#| fig-cap: The eight themes built into ggplot2.
#| fig-alt: >
#| Eight barplots created with ggplot2, each
#| with one of the eight built-in themes:
#| theme_bw() - White background with grid lines,
#| theme_light() - Light axes and grid lines,
#| theme_classic() - Classic theme, axes but no grid
#| lines, theme_linedraw() - Only black lines,
#| theme_dark() - Dark background for contrast,
#| theme_minimal() - Minimal theme, no background,
#| theme_gray() - Gray background (default theme),
#| theme_void() - Empty theme, only geoms are visible.
knitr::include_graphics("images/visualization-themes.png")
```
Many people wonder why the default theme has a grey background.
Many people wonder why the default theme has a gray background.
This was a deliberate choice because it puts the data forward while still making the grid lines visible.
The white grid lines are visible (which is important because they significantly aid position judgements), but they have little visual impact and we can easily tune them out.
The white grid lines are visible (which is important because they significantly aid position judgments), but they have little visual impact and we can easily tune them out.
The grey background gives the plot a similar typographic colour to the text, ensuring that the graphics fit in with the flow of a document without jumping out with a bright white background.
Finally, the grey background creates a continuous field of colour which ensures that the plot is perceived as a single visual entity.
@ -700,84 +730,9 @@ file.remove("my-plot.pdf")
If you don't specify the `width` and `height` they will be taken from the dimensions of the current plotting device.
For reproducible code, you'll want to specify them.
Generally, however, we recommend that you assemble your final reports using R Markdown, so we focus on the important code chunk options that you should know about for graphics.
Generally, however, we recommend that you assemble your final reports using Quarto, so we focus on the important code chunk options that you should know about for graphics.
You can learn more about `ggsave()` in the documentation.
### Figure sizing
<!--# TO DO: Add something about faceted plots here. -->
The biggest challenge of graphics in R Markdown is getting your figures the right size and shape.
There are five main options that control figure sizing: `fig.width`, `fig.height`, `fig.asp`, `out.width` and `out.height`.
Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e., height, width, and aspect ratio: pick two of three).
<!-- TODO: https://www.tidyverse.org/blog/2020/08/taking-control-of-plot-scaling/ -->
We recommend three of the five options:
- Plots tend to be more aesthetically pleasing if they have consistent width.
To enforce this, set `fig.width = 6` (6") and `fig.asp = 0.618` (the golden ratio) in the defaults.
Then in individual chunks, only adjust `fig.asp`.
- Control the output size with `out.width` and set it to a percentage of the line width.
We suggest to `out.width = "70%"` and `fig.align = "center"`.
That gives plots room to breathe, without taking up too much space.
- To put multiple plots in a single row, set the `out.width` to `50%` for two plots, `33%` for 3 plots, or `25%` to 4 plots, and set `fig.align = "default"`.
Depending on what you're trying to illustrate (e.g. show data or show plot variations), you might also tweak `fig.width`, as discussed below.
If you find that you're having to squint to read the text in your plot, you need to tweak `fig.width`.
If `fig.width` is larger than the size the figure is rendered in the final doc, the text will be too small; if `fig.width` is smaller, the text will be too big.
You'll often need to do a little experimentation to figure out the right ratio between the `fig.width` and the eventual width in your document.
To illustrate the principle, the following three plots have `fig.width` of 4, 6, and 8 respectively:
```{r}
#| include: false
plot <- ggplot(mpg, aes(displ, hwy)) + geom_point()
```
```{r}
#| echo: false
#| fig-width: 4
plot
```
```{r}
#| echo: false
#| fig-width: 6
plot
```
```{r}
#| echo: false
#| fig-width: 8
plot
```
If you want to make sure the font size is consistent across all your figures, whenever you set `out.width`, you'll also need to adjust `fig.width` to maintain the same ratio with your default `out.width`.
For example, if your default `fig.width` is 6 and `out.width` is 0.7, when you set `out.width = "50%"` you'll need to set `fig.width` to 4.3 (6 \* 0.5 / 0.7).
### Other important options
When mingling code and text, like in this book, you can set `fig.show = "hold"` so that plots are shown after the code.
This has the pleasant side effect of forcing you to break up large blocks of code with their explanations.
To add a caption to the plot, use `fig.cap`.
In R Markdown this will change the figure from inline to "floating".
If you're producing PDF output, the default graphics type is PDF.
This is a good default because PDFs are high quality vector graphics.
However, they can produce very large and slow plots if you are displaying thousands of points.
In that case, set `dev = "png"` to force the use of PNGs.
They are slightly lower quality, but will be much more compact.
It's a good idea to name code chunks that produce figures, even if you don't routinely label other chunks.
The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse in other circumstances (i.e. if you want to quickly drop a single plot into an email or a tweet).
## Learning more
The absolute best place to learn more is the ggplot2 book: [*ggplot2: Elegant graphics for data analysis*](https://ggplot2-book.org/).
@ -786,4 +741,3 @@ It goes into much more depth about the underlying theory, and has many more exam
Another great resource is the ggplot2 extensions gallery <https://exts.ggplot2.tidyverse.org/gallery/>.
This site lists many of the packages that extend ggplot2 with new geoms and scales.
It's a great place to start if you're trying to do something that seems hard with ggplot2.


@ -502,6 +502,83 @@ comma(.12358124331)
4. Set up a network of chunks where `d` depends on `c` and `b`, and both `b` and `c` depend on `a`.
Have each chunk print `lubridate::now()`, set `cache: true`, then verify your understanding of caching.
## Figures
### Figure sizing
<!--# TO DO: Add something about faceted plots here. -->
The biggest challenge of graphics in Quarto is getting your figures the right size and shape.
There are five main options that control figure sizing: `fig-width`, `fig-height`, `fig-asp`, `out-width` and `out-height`.
Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e., height, width, and aspect ratio: pick two of three).
<!-- TODO: https://www.tidyverse.org/blog/2020/08/taking-control-of-plot-scaling/ -->
We recommend three of the five options:
- Plots tend to be more aesthetically pleasing if they have consistent width.
To enforce this, set `fig-width: 6` (6") and `fig-asp: 0.618` (the golden ratio) in the defaults.
Then in individual chunks, only adjust `fig-asp`.
- Control the output size with `out-width` and set it to a percentage of the line width.
We suggest `out-width: "70%"` and `fig-align: "center"`.
That gives plots room to breathe, without taking up too much space.
- To put multiple plots in a single row, set the `out-width` to `50%` for two plots, `33%` for three plots, or `25%` for four plots, and set `fig-align: "default"`.
Depending on what you're trying to illustrate (e.g. show data or show plot variations), you might also tweak `fig-width`, as discussed below.
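One way to set the recommended defaults is with a setup chunk at the top of the document. This is a sketch that assumes the knitr engine. `knitr::opts_chunk$set()` takes the dotted option names (`fig.width`, `fig.asp`, `out.width`, `fig.align`) that correspond to the hyphenated `fig-width`, `fig-asp`, and so on used in chunk comments. Individual chunks can still override these values, e.g., with `#| fig-asp:`.

```{r}
#| label: setup
#| include: false
knitr::opts_chunk$set(
  fig.width = 6,      # consistent figure width, in inches
  fig.asp = 0.618,    # golden ratio; height is derived as fig.width * fig.asp
  out.width = "70%",  # output size as a percentage of the line width
  fig.align = "center"
)
```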
If you find that you're having to squint to read the text in your plot, you need to tweak `fig-width`.
If `fig-width` is larger than the size at which the figure is rendered in the final document, the text will be too small; if `fig-width` is smaller, the text will be too big.
You'll often need to do a little experimentation to figure out the right ratio between the `fig-width` and the eventual width in your document.
To illustrate the principle, the following three plots have `fig-width` of 4, 6, and 8 respectively:
```{r}
#| include: false
plot <- ggplot(mpg, aes(displ, hwy)) + geom_point()
```
```{r}
#| echo: false
#| fig-width: 4
plot
```
```{r}
#| echo: false
#| fig-width: 6
plot
```
```{r}
#| echo: false
#| fig-width: 8
plot
```
If you want to make sure the font size is consistent across all your figures, whenever you set `out-width`, you'll also need to adjust `fig-width` to maintain the same ratio with your default `out-width`.
For example, if your default `fig-width` is 6 and `out-width` is `"70%"`, when you set `out-width: "50%"` you'll need to set `fig-width` to 4.3 (6 \* 0.5 / 0.7).
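The arithmetic above can be wrapped in a small function. `matched_fig_width()` is a hypothetical helper written for this illustration, not part of knitr or Quarto. It assumes `out-width` is expressed as a proportion of the line width, with defaults of 6 and 0.7 matching the example.

```{r}
# Hypothetical helper: scale fig-width in proportion to out-width so that
# text renders at the same size as in figures that use the defaults.
matched_fig_width <- function(out_width,
                              default_fig_width = 6,
                              default_out_width = 0.7) {
  default_fig_width * out_width / default_out_width
}

round(matched_fig_width(0.5), 1)  # two plots per row: 4.3
round(matched_fig_width(0.33), 1) # three plots per row: 2.8
round(matched_fig_width(0.25), 1) # four plots per row: 2.1
```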
### Other important options
When mingling code and text, like in this book, you can set `fig-show: "hold"` so that plots are shown after the code.
This has the pleasant side effect of forcing you to break up large blocks of code with their explanations.
To add a caption to the plot, use `fig-cap`.
In Quarto this will change the figure from inline to "floating".
If you're producing PDF output, the default graphics type is PDF.
This is a good default because PDFs are high-quality vector graphics.
However, they can produce very large and slow plots if you are displaying thousands of points.
In that case, set `fig-format: "png"` to force the use of PNGs.
They are slightly lower quality, but will be much more compact.
It's a good idea to name code chunks that produce figures, even if you don't routinely label other chunks.
The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse them in other circumstances (e.g., if you want to quickly drop a single plot into an email or a tweet).
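Here is a sketch of a named figure chunk. The label `fig-hwy-displ` and the caption are illustrative. Starting the label with `fig-` follows the same convention as `fig-themes` elsewhere in the book, which lets the figure be cross-referenced in the text with `@fig-hwy-displ`.

```{r}
#| label: fig-hwy-displ
#| fig-cap: "Highway fuel economy versus engine displacement."
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()
p
```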
## Troubleshooting
Troubleshooting Quarto documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks.