Fix chunk options

Mine Çetinkaya-Rundel 2022-03-04 19:47:48 -05:00
parent a7360229ff
commit 58f52426c6
1 changed files with 91 additions and 84 deletions
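
For context, the hunks below rename dotted knitr chunk options (for example `fig.alt`, `out.width`) to their hyphenated forms (`fig-alt`, `out-width`). A minimal sketch of how such a rename could be scripted in base R follows. `fix_chunk_options()` is a hypothetical helper and not part of this commit, and it assumes the options are already written as `#| key: value` comment lines.

```r
# Hypothetical helper (not part of this commit): rename dotted chunk-option
# keys such as `fig.alt` to hyphenated keys such as `fig-alt`.
# Assumes options are already written as `#| key: value` comment lines.
fix_chunk_options <- function(lines) {
  # Capture the "#| " prefix, the option key, and everything from the colon on.
  pattern <- "^(#\\| )([A-Za-z0-9.]+)(:.*)$"
  is_option <- grepl(pattern, lines)

  prefix <- sub(pattern, "\\1", lines[is_option])
  key    <- sub(pattern, "\\2", lines[is_option])
  rest   <- sub(pattern, "\\3", lines[is_option])

  # Replace dots only inside the key, so values like "1/2.75" are untouched.
  lines[is_option] <- paste0(prefix, gsub(".", "-", key, fixed = TRUE), rest)
  lines
}

chunk <- c(
  "#| fig.asp: 1/2.75",
  "#| out.width: \"50%\"",
  "#| echo: false",
  "ggplot(data = mpg) +"
)
fixed <- fix_chunk_options(chunk)
print(fixed)
```

Only the key is rewritten, so dots inside option values and in ordinary code lines stay as they are.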

@ -16,7 +16,9 @@ If you'd like to learn more about the theoretical underpinnings of ggplot2, I'd
This chapter focuses on ggplot2, one of the core members of the tidyverse.
To access the datasets, help pages, and functions that we will use in this chapter, load the tidyverse by running this code:
```{r setup}
```{r}
#| label: setup
library(tidyverse)
```
@ -27,6 +29,7 @@ If you run this code and get the error message "there is no package called 'tidy
```{r}
#| eval: false
install.packages("tidyverse")
library(tidyverse)
```
@ -70,7 +73,7 @@ To learn more about `mpg`, open its help page by running `?mpg`.
To plot `mpg`, run this code to put `displ` on the x-axis and `hwy` on the y-axis:
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
```
@ -134,7 +137,8 @@ How can you explain these cars?
```{r}
#| echo: false
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. Cars with engine size greater than 5 litres and highway fuel efficiency greater than 20 miles per gallon stand out from the rest of the data and are highlighted in red."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. Cars with engine size greater than 5 litres and highway fuel efficiency greater than 20 miles per gallon stand out from the rest of the data and are highlighted in red."
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), colour = "red", size = 2.2)
@ -154,8 +158,9 @@ Here we change the levels of a point's size, shape, and color to make the point
```{r}
#| echo: false
#| fig.asp: 1/4
#| fig.alt: "Diagram that shows four plotting characters next to each other. The first is a large circle, the second is a small circle, the third is a triangle, and the fourth is a blue circle."
#| fig-asp: 1/4
#| fig-alt: "Diagram that shows four plotting characters next to each other. The first is a large circle, the second is a small circle, the third is a triangle, and the fourth is a blue circle."
ggplot() +
geom_point(aes(1, 1), size = 20) +
geom_point(aes(2, 1), size = 10) +
@ -170,7 +175,8 @@ You can convey information about your data by mapping the aesthetics in your plo
For example, you can map the colors of your points to the `class` variable to reveal the class of each car.
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. The points representing each car are coloured according to the class of the car. The legend on the right of the plot shows the mapping between colours and levels of the class variable: 2seater, compact, midsize, minivan, pickup, or suv."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. The points representing each car are coloured according to the class of the car. The legend on the right of the plot shows the mapping between colours and levels of the class variable: 2seater, compact, midsize, minivan, pickup, or suv."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
```
@ -191,7 +197,8 @@ In this case, the exact size of each point would reveal its class affiliation.
We get a *warning* here, because mapping an unordered variable (`class`) to an ordered aesthetic (`size`) is not a good idea.
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. The points representing each car are sized according to the class of the car. The legend on the right of the plot shows the mapping between sizes and levels of the class variable -- going from small to large: 2seater, compact, midsize, minivan, pickup, or suv."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. The points representing each car are sized according to the class of the car. The legend on the right of the plot shows the mapping between sizes and levels of the class variable -- going from small to large: 2seater, compact, midsize, minivan, pickup, or suv."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, size = class))
```
@ -199,13 +206,13 @@ ggplot(data = mpg) +
Or we could have mapped `class` to the *alpha* aesthetic, which controls the transparency of the points, or to the *shape* aesthetic, which controls the shape of the points.
```{r}
#| fig.width: 4
#| out.width: "50%"
#| fig.align: "default"
#| fig-width: 4
#| out-width: "50%"
#| fig-align: "default"
#| warning: false
#| fig.asp: 1/2
#| fig.cap: ""
#| fig.alt: "Two scatterplots next to each other, both visualizing highway fuel efficiency versus engine size of cars in ggplot2::mpg and showing a negative association. In the plot on the left class is mapped to the alpha aesthetic, resulting in different transparency levels for each level of class. In the plot on the right class is mapped to the shape aesthetic, resulting in different plotting character shapes for each level of class. Each plot comes with a legend that shows the mapping between alpha level or shape and levels of the class variable."
#| fig-asp: 1/2
#| fig-cap: ""
#| fig-alt: "Two scatterplots next to each other, both visualizing highway fuel efficiency versus engine size of cars in ggplot2::mpg and showing a negative association. In the plot on the left class is mapped to the alpha aesthetic, resulting in different transparency levels for each level of class. In the plot on the right class is mapped to the shape aesthetic, resulting in different plotting character shapes for each level of class. Each plot comes with a legend that shows the mapping between alpha level or shape and levels of the class variable."
# Left
ggplot(data = mpg) +
@ -233,7 +240,7 @@ You can also *set* the aesthetic properties of your geom manually.
For example, we can make all of the points in our plot blue:
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. All points are blue."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. All points are blue."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
@ -253,9 +260,9 @@ You'll need to pick a level that makes sense for that aesthetic:
#| label: shapes
#| echo: false
#| warning: false
#| fig.asp: 1/2.75
#| fig.cap: "R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the `colour` and `fill` aesthetics. The hollow shapes (0--14) have a border determined by `colour`; the solid shapes (15--20) are filled with `colour`; the filled shapes (21--24) have a border of `colour` and are filled with `fill`."
#| fig.alt: "Mapping between shapes and the numbers that represent them: 0 - square, 1 - circle, 2 - triangle point up, 3 - plus, 4 - cross, 5 - diamond, 6 - triangle point down, 7 - square cross, 8 - star, 9 - diamond plus, 10 - circle plus, 11 - triangles up and down, 12 - square plus, 13 - circle cross, 14 - square and triangle down, 15 - filled square, 16 - filled circle, 17 - filled triangle point-up, 18 - filled diamond, 19 - solid circle, 20 - bullet (smaller circle), 21 - filled circle blue, 22 - filled square blue, 23 - filled diamond blue, 24 - filled triangle point-up blue, 25 - filled triangle point down blue."
#| fig-asp: 1/2.75
#| fig-cap: "R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the `colour` and `fill` aesthetics. The hollow shapes (0--14) have a border determined by `colour`; the solid shapes (15--20) are filled with `colour`; the filled shapes (21--24) have a border of `colour` and are filled with `fill`."
#| fig-alt: "Mapping between shapes and the numbers that represent them: 0 - square, 1 - circle, 2 - triangle point up, 3 - plus, 4 - cross, 5 - diamond, 6 - triangle point down, 7 - square cross, 8 - star, 9 - diamond plus, 10 - circle plus, 11 - triangles up and down, 12 - square plus, 13 - circle cross, 14 - square and triangle down, 15 - filled square, 16 - filled circle, 17 - filled triangle point-up, 18 - filled diamond, 19 - solid circle, 20 - bullet (smaller circle), 21 - filled circle blue, 22 - filled square blue, 23 - filled diamond blue, 24 - filled triangle point-up blue, 25 - filled triangle point down blue."
shapes <- tibble(
shape = c(0, 1, 2, 5, 3, 4, 6:19, 22, 21, 24, 23, 20),
@ -279,7 +286,7 @@ ggplot(shapes, aes(x, y)) +
Why are the points not blue?
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. All points are red and the legend shows a red point that is mapped to the word 'blue'."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. All points are red and the legend shows a red point that is mapped to the word 'blue'."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))
```
@ -341,7 +348,7 @@ The first argument of `facet_wrap()` is a formula, which you create with `~` fol
The variable that you pass to `facet_wrap()` should be discrete.
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by class, with facets spanning two rows."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by class, with facets spanning two rows."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
@ -353,7 +360,7 @@ The first argument of `facet_grid()` is also a formula.
This time the formula should contain two variable names separated by a `~`.
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by number of cylinders across rows and by type of drive train across columns. This results in a 4x3 grid of 12 facets. Some of these facets have no observations: 5 cylinders and 4 wheel drive, 4 or 5 cylinders and front wheel drive."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by number of cylinders across rows and by type of drive train across columns. This results in a 4x3 grid of 12 facets. Some of these facets have no observations: 5 cylinders and 4 wheel drive, 4 or 5 cylinders and front wheel drive."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
@ -370,7 +377,7 @@ If you prefer to not facet in the rows or columns dimension, use a `.` instead o
How do they relate to this plot?
```{r}
#| fig.alt: "Scatterplot of number of cylinders versus type of drive train of cars in ggplot2::mpg. Shows that there are no cars with 5 cylinders that are 4 wheel drive or with 4 or 5 cylinders that are front wheel drive."
#| fig-alt: "Scatterplot of number of cylinders versus type of drive train of cars in ggplot2::mpg. Shows that there are no cars with 5 cylinders that are 4 wheel drive or with 4 or 5 cylinders that are front wheel drive."
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl))
@ -415,7 +422,7 @@ If you prefer to not facet in the rows or columns dimension, use a `.` instead o
What does this say about when to place a faceting variable across rows or columns?
```{r}
#| fig.alt: "Two faceted plots, both visualizing highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by drive train. In the top plot, facets are organized across rows and in the second, across columns."
#| fig-alt: "Two faceted plots, both visualizing highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by drive train. In the top plot, facets are organized across rows and in the second, across columns."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
@ -430,7 +437,7 @@ If you prefer to not facet in the rows or columns dimension, use a `.` instead o
How do the positions of the facet labels change?
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by type of drive train across rows."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, faceted by type of drive train across rows."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
@ -444,10 +451,10 @@ How are these two plots similar?
```{r}
#| echo: false
#| message: false
#| fig.width: 4
#| out.width: "50%"
#| fig.align: "default"
#| fig.alt: "Two plots: the plot on the left is a scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg and the plot on the right shows a smooth curve that follows the trajectory of the relationship between these variables. A confidence interval around the smooth curve is also displayed."
#| fig-width: 4
#| out-width: "50%"
#| fig-align: "default"
#| fig-alt: "Two plots: the plot on the left is a scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg and the plot on the right shows a smooth curve that follows the trajectory of the relationship between these variables. A confidence interval around the smooth curve is also displayed."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
@ -491,7 +498,7 @@ On the other hand, you *could* set the linetype of a line.
```{r}
#| message: false
#| fig.alt: "A plot of highway fuel efficiency versus engine size of cars in ggplot2::mpg. The data are represented with smooth curves, which use a different line type (solid, dashed, or long dashed) for each type of drive train. Confidence intervals around the smooth curves are also displayed."
#| fig-alt: "A plot of highway fuel efficiency versus engine size of cars in ggplot2::mpg. The data are represented with smooth curves, which use a different line type (solid, dashed, or long dashed) for each type of drive train. Confidence intervals around the smooth curves are also displayed."
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
@ -506,7 +513,7 @@ If this sounds strange, we can make it more clear by overlaying the lines on top
```{r}
#| echo: false
#| message: false
#| fig.alt: "A plot of highway fuel efficiency versus engine size of cars in ggplot2::mpg. The data are represented with points (coloured by drive train) as well as smooth curves (where line type is determined based on drive train as well). Confidence intervals around the smooth curves are also displayed."
#| fig-alt: "A plot of highway fuel efficiency versus engine size of cars in ggplot2::mpg. The data are represented with points (coloured by drive train) as well as smooth curves (where line type is determined based on drive train as well). Confidence intervals around the smooth curves are also displayed."
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
geom_point() +
@ -528,11 +535,11 @@ In practice, ggplot2 will automatically group the data for these geoms whenever
It is convenient to rely on this feature because the group aesthetic by itself does not add a legend or distinguishing features to the geoms.
```{r}
#| fig.width: 3
#| fig.align: "default"
#| out.width: "33%"
#| fig-width: 3
#| fig-align: "default"
#| out-width: "33%"
#| message: false
#| fig.alt: "Three plots, each with highway fuel efficiency on the y-axis and engine size of cars in ggplot2::mpg on the x-axis, where data are represented by a smooth curve. The first plot only has these two variables, the center plot has three separate smooth curves for each level of drive train, and the right plot not only has the same three separate smooth curves for each level of drive train but these curves are plotted in different colours, without a legend explaining which color maps to which level. Confidence intervals around the smooth curves are also displayed."
#| fig-alt: "Three plots, each with highway fuel efficiency on the y-axis and engine size of cars in ggplot2::mpg on the x-axis, where data are represented by a smooth curve. The first plot only has these two variables, the center plot has three separate smooth curves for each level of drive train, and the right plot not only has the same three separate smooth curves for each level of drive train but these curves are plotted in different colours, without a legend explaining which color maps to which level. Confidence intervals around the smooth curves are also displayed."
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
@ -551,7 +558,7 @@ To display multiple geoms in the same plot, add multiple geom functions to `ggpl
```{r}
#| message: false
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg with a smooth curve overlaid. A confidence interval around the smooth curve is also displayed."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg with a smooth curve overlaid. A confidence interval around the smooth curve is also displayed."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
@ -579,7 +586,7 @@ This makes it possible to display different aesthetics in different layers.
```{r}
#| message: false
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, where points are coloured according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of cars is overlaid along with a confidence interval around it."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, where points are coloured according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of cars is overlaid along with a confidence interval around it."
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
@ -592,7 +599,7 @@ The local data argument in `geom_smooth()` overrides the global data argument in
```{r}
#| message: false
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, where points are coloured according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of subcompact cars is overlaid along with a confidence interval around it."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg, where points are coloured according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of subcompact cars is overlaid along with a confidence interval around it."
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
@ -646,10 +653,10 @@ ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
```{r}
#| echo: false
#| message: false
#| fig.width: 3
#| out.width: "50%"
#| fig.align: "default"
#| fig.alt: "There are six scatterplots in this figure, arranged in a 3x2 grid. In all plots highway fuel efficiency of cars in ggplot2::mpg are on the y-axis and engine size is on the x-axis. The first plot shows all points in black with a smooth curve overlaid on them. In the second plot points are also all black, with separate smooth curves overlaid for each level of drive train. On the third plot, points and the smooth curves are represented in different colours for each level of drive train. In the fourth plot the points are represented in different colours for each level of drive train but there is only a single smooth line fitted to the whole data. In the fifth plot, points are represented in different colours for each level of drive train, and a separate smooth curve with different line types are fitted to each level of drive train. And finally in the sixth plot points are represented in different colours for each level of drive train and they have a thick white border."
#| fig-width: 3
#| out-width: "50%"
#| fig-align: "default"
#| fig-alt: "There are six scatterplots in this figure, arranged in a 3x2 grid. In all plots highway fuel efficiency of cars in ggplot2::mpg are on the y-axis and engine size is on the x-axis. The first plot shows all points in black with a smooth curve overlaid on them. In the second plot points are also all black, with separate smooth curves overlaid for each level of drive train. On the third plot, points and the smooth curves are represented in different colours for each level of drive train. In the fourth plot the points are represented in different colours for each level of drive train but there is only a single smooth line fitted to the whole data. In the fifth plot, points are represented in different colours for each level of drive train, and a separate smooth curve with different line types are fitted to each level of drive train. And finally in the sixth plot points are represented in different colours for each level of drive train and they have a thick white border."
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
@ -681,7 +688,7 @@ The `diamonds` dataset is in the ggplot2 package and contains information on \~5
The chart shows that more diamonds are available with high quality cuts than with low quality cuts.
```{r}
#| fig.alt: "Bar chart of number of each cut of diamond in the ggplot2::diamonds dataset. There are roughly 1500 fair diamonds, 5000 good, 12000 very good, 14000 premium, and 22000 ideal cut diamonds."
#| fig-alt: "Bar chart of number of each cut of diamond in the ggplot2::diamonds dataset. There are roughly 1500 fair diamonds, 5000 good, 12000 very good, 14000 premium, and 22000 ideal cut diamonds."
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
@ -704,8 +711,8 @@ The figure below describes how this process works with `geom_bar()`.
```{r}
#| echo: false
#| out.width: "100%"
#| fig.alt: 'A figure demonstrating three steps of creating a bar chart: 1. geom_bar() begins with the diamonds data set. 2. geom_bar() transforms the data with the "count" stat, which returns a data set of cut values and counts. 3. geom_bar() uses the transformed data to build the plot. cut is mapped to the x-axis, count is mapped to the y-axis.'
#| out-width: "100%"
#| fig-alt: 'A figure demonstrating three steps of creating a bar chart: 1. geom_bar() begins with the diamonds data set. 2. geom_bar() transforms the data with the "count" stat, which returns a data set of cut values and counts. 3. geom_bar() uses the transformed data to build the plot. cut is mapped to the x-axis, count is mapped to the y-axis.'
knitr::include_graphics("images/visualization-stat-bar.png")
```
@ -719,7 +726,7 @@ You can generally use geoms and stats interchangeably.
For example, you can recreate the previous plot using `stat_count()` instead of `geom_bar()`:
```{r}
#| fig.alt: "Bar chart of number of each cut of diamond in the ggplot2::diamonds dataset. There are roughly 1500 fair diamonds, 5000 good, 12000 very good, 14000 premium, and 22000 ideal cut diamonds."
#| fig-alt: "Bar chart of number of each cut of diamond in the ggplot2::diamonds dataset. There are roughly 1500 fair diamonds, 5000 good, 12000 very good, 14000 premium, and 22000 ideal cut diamonds."
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
@ -736,7 +743,7 @@ There are three reasons you might need to use a stat explicitly:
```{r}
#| warning: false
#| fig.alt: "Bar chart of number of each cut of diamond in the ggplot2::diamonds dataset. There are roughly 1500 fair diamonds, 5000 good, 22000 ideal, 14000 premium, and 12000 very good cut diamonds."
#| fig-alt: "Bar chart of number of each cut of diamond in the ggplot2::diamonds dataset. There are roughly 1500 fair diamonds, 5000 good, 22000 ideal, 14000 premium, and 12000 very good cut diamonds."
demo <- tribble(
~cut, ~freq,
@ -758,7 +765,7 @@ There are three reasons you might need to use a stat explicitly:
For example, you might want to display a bar chart of proportions, rather than counts:
```{r}
#| fig.alt: "Bar chart of proportion of each cut of diamond in the ggplot2::diamonds dataset. Roughly, fair diamonds make up 0.03, good 0.09, very good 0.22, premium 0.26, and ideal 0.40."
#| fig-alt: "Bar chart of proportion of each cut of diamond in the ggplot2::diamonds dataset. Roughly, fair diamonds make up 0.03, good 0.09, very good 0.22, premium 0.26, and ideal 0.40."
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
@ -770,7 +777,7 @@ There are three reasons you might need to use a stat explicitly:
For example, you might use `stat_summary()`, which summarizes the y values for each unique x value, to draw attention to the summary that you're computing:
```{r}
#| fig.alt: "A plot with depth on the y-axis and cut on the x-axis (with levels fair, good, very good, premium, and ideal) of diamonds in ggplot2::diamonds. For each level of cut, vertical lines extend from minimum to maximum depth for diamonds in that cut category, and the median depth is indicated on the line with a point."
#| fig-alt: "A plot with depth on the y-axis and cut on the x-axis (with levels fair, good, very good, premium, and ideal) of diamonds in ggplot2::diamonds. For each level of cut, vertical lines extend from minimum to maximum depth for diamonds in that cut category, and the median depth is indicated on the line with a point."
ggplot(data = diamonds) +
stat_summary(
@ -819,9 +826,9 @@ There's one more piece of magic associated with bar charts.
You can colour a bar chart using either the `colour` aesthetic, or, more usefully, `fill`:
```{r}
#| out.width: "50%"
#| fig.align: "default"
#| fig.alt: "Two bar charts of cut of diamonds in ggplot2::diamonds. In the first plot, the bars have coloured borders. In the second plot, they're filled with colours. Heights of the bars correspond to the number of diamonds in each cut category."
#| out-width: "50%"
#| fig-align: "default"
#| fig-alt: "Two bar charts of cut of diamonds in ggplot2::diamonds. In the first plot, the bars have coloured borders. In the second plot, they're filled with colours. Heights of the bars correspond to the number of diamonds in each cut category."
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
@ -833,7 +840,7 @@ Note what happens if you map the fill aesthetic to another variable, like `clari
Each colored rectangle represents a combination of `cut` and `clarity`.
```{r}
#| fig.alt: "Segmented bar chart of cut of diamonds in ggplot2::diamonds, where each bar is filled with colours for the levels of clarity. Heights of the bars correspond to the number of diamonds in each cut category, and heights of the coloured segments are proportional to the number of diamonds with a given clarity level within a given cut level."
#| fig-alt: "Segmented bar chart of cut of diamonds in ggplot2::diamonds, where each bar is filled with colours for the levels of clarity. Heights of the bars correspond to the number of diamonds in each cut category, and heights of the coloured segments are proportional to the number of diamonds with a given clarity level within a given cut level."
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
@ -847,9 +854,9 @@ If you don't want a stacked bar chart, you can use one of three other options: `
To see that overlapping we either need to make the bars slightly transparent by setting `alpha` to a small value, or completely transparent by setting `fill = NA`.
```{r}
#| out.width: "50%"
#| fig.align: "default"
#| fig.alt: "Two segmented bar charts of cut of diamonds in ggplot2::diamonds, where each bar is filled with colours for the levels of clarity. Heights of the bars correspond to the number of diamonds in each cut category, and heights of the coloured segments are proportional to the number of diamonds with a given clarity level within a given cut level. However the segments overlap. In the first plot the segments are filled with transparent colours, in the second plot the segments are only outlined with colours."
#| out-width: "50%"
#| fig-align: "default"
#| fig-alt: "Two segmented bar charts of cut of diamonds in ggplot2::diamonds, where each bar is filled with colours for the levels of clarity. Heights of the bars correspond to the number of diamonds in each cut category, and heights of the coloured segments are proportional to the number of diamonds with a given clarity level within a given cut level. However the segments overlap. In the first plot the segments are filled with transparent colours, in the second plot the segments are only outlined with colours."
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
@ -863,7 +870,7 @@ If you don't want a stacked bar chart, you can use one of three other options: `
This makes it easier to compare proportions across groups.
```{r}
#| fig.alt: "Segmented bar chart of cut of diamonds in ggplot2::diamonds, where each bar is filled with colours for the levels of clarity. Height of each bar is 1 and heights of the coloured segments are proportional to the proportion of diamonds with a given clarity level within a given cut level."
#| fig-alt: "Segmented bar chart of cut of diamonds in ggplot2::diamonds, where each bar is filled with colours for the levels of clarity. Height of each bar is 1 and heights of the coloured segments are proportional to the proportion of diamonds with a given clarity level within a given cut level."
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
@ -873,7 +880,7 @@ If you don't want a stacked bar chart, you can use one of three other options: `
This makes it easier to compare individual values.
```{r}
#| fig.alt: "Dodged bar chart of cut of diamonds in ggplot2::diamonds. Dodged bars are grouped by levels of cut (fair, good, very good, premium, and ideal). In each group there are eight bars, one for each level of clarity, and filled with a different color for each level. Heights of these bars represent the number of diamonds with a given level of cut and clarity."
#| fig-alt: "Dodged bar chart of cut of diamonds in ggplot2::diamonds. Dodged bars are grouped by levels of cut (fair, good, very good, premium, and ideal). In each group there are eight bars, one for each level of clarity, and filled with a different color for each level. Heights of these bars represent the number of diamonds with a given level of cut and clarity."
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
@ -885,7 +892,7 @@ Did you notice that the plot displays only 126 points, even though there are 234
```{r}
#| echo: FALSE
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association."
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
@ -901,7 +908,7 @@ You can avoid this gridding by setting the position adjustment to "jitter".
This spreads the points out because no two points are likely to receive the same amount of random noise.
```{r}
#| fig.alt: "Jittered scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association."
#| fig-alt: "Jittered scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association."
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
@ -918,7 +925,7 @@ To learn more about a position adjustment, look up the help page associated with
How could you improve it?
```{r}
#| fig.alt: "Scatterplot of highway fuel efficiency versus city fuel efficiency of cars in ggplot2::mpg that shows a positive association. The number of points visible in this plot is less than the number of points in the dataset."
#| fig-alt: "Scatterplot of highway fuel efficiency versus city fuel efficiency of cars in ggplot2::mpg that shows a positive association. The number of points visible in this plot is less than the number of points in the dataset."
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
@ -942,10 +949,10 @@ There are a number of other coordinate systems that are occasionally helpful.
It's also useful for long labels: it's hard to get them to fit without overlapping on the x-axis.
```{r}
#| fig.width: 3
#| out.width: "50%"
#| fig.align: "default"
#| fig.alt: "Two side-by-side box plots of highway fuel efficiency of cars in ggplot2::mpg. A separate box plot is created for cars in each level of class (2seater, compact, midsize, minivan, pickup, subcompact, and suv). In the first plot class is on the x-axis, in the second plot class is on the y-axis. The second plot makes it easier to read the names of the levels of class since they're listed down the y-axis, avoiding overlap."
#| fig-width: 3
#| out-width: "50%"
#| fig-align: "default"
#| fig-alt: "Two side-by-side box plots of highway fuel efficiency of cars in ggplot2::mpg. A separate box plot is created for cars in each level of class (2seater, compact, midsize, minivan, pickup, subcompact, and suv). In the first plot class is on the x-axis, in the second plot class is on the y-axis. The second plot makes it easier to read the names of the levels of class since they're listed down the y-axis, avoiding overlap."
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot()
@ -957,9 +964,9 @@ There are a number of other coordinate systems that are occasionally helpful.
However, note that you can achieve the same result by flipping the aesthetic mappings of the two variables.
```{r}
#| fig.width: 3
#| fig.align: "default"
#| fig.alt: "Side-by-side box plots of highway fuel efficiency of cars in ggplot2::mpg. A separate box plot is drawn along the y-axis for cars in each level of class (2seater, compact, midsize, minivan, pickup, subcompact, and suv)."
#| fig-width: 3
#| fig-align: "default"
#| fig-alt: "Side-by-side box plots of highway fuel efficiency of cars in ggplot2::mpg. A separate box plot is drawn along the y-axis for cars in each level of class (2seater, compact, midsize, minivan, pickup, subcompact, and suv)."
ggplot(data = mpg, mapping = aes(y = class, x = hwy)) +
geom_boxplot()
@ -969,11 +976,11 @@ There are a number of other coordinate systems that are occasionally helpful.
This is very important if you're plotting spatial data with ggplot2 (which unfortunately we don't have the space to cover in this book).
```{r}
#| fig.width: 3
#| out.width: "50%"
#| fig.align: "default"
#| fig-width: 3
#| out-width: "50%"
#| fig-align: "default"
#| message: FALSE
#| fig.alt: "Two maps of the boundaries of New Zealand. In the first plot the aspect ratio is incorrect, in the second plot it's correct."
#| fig-alt: "Two maps of the boundaries of New Zealand. In the first plot the aspect ratio is incorrect, in the second plot it's correct."
nz <- map_data("nz")
@ -989,11 +996,11 @@ There are a number of other coordinate systems that are occasionally helpful.
Polar coordinates reveal an interesting connection between a bar chart and a Coxcomb chart.
```{r}
#| fig.width: 3
#| out.width: "50%"
#| fig.align: "default"
#| fig.asp: 1
#| fig.alt: "Two plots: on the left is a bar chart of cut of diamonds in ggplot2::diamonds, on the right is a Coxcomb chart of the same data."
#| fig-width: 3
#| out-width: "50%"
#| fig-align: "default"
#| fig-asp: 1
#| fig-alt: "Two plots: on the left is a bar chart of cut of diamonds in ggplot2::diamonds, on the right is a Coxcomb chart of the same data."
bar <- ggplot(data = diamonds) +
geom_bar(
@ -1022,9 +1029,9 @@ There are a number of other coordinate systems that are occasionally helpful.
What does `geom_abline()` do?
```{r}
#| fig.asp: 1
#| out.width: "50%"
#| fig.alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. The plot also has a straight line that follows the trend of the relationship between the variables but doesn't go through the cloud of points, it's beneath it."
#| fig-asp: 1
#| out-width: "50%"
#| fig-alt: "Scatterplot of highway fuel efficiency versus engine size of cars in ggplot2::mpg that shows a negative association. The plot also has a straight line that follows the trend of the relationship between the variables but doesn't go through the cloud of points, it's beneath it."
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
@ -1057,8 +1064,8 @@ To see how this works, consider how you could build a basic plot from scratch: y
```{r}
#| echo: FALSE
#| out.width: "100%"
#| fig.alt: "A figure demonstrating the steps for going from raw data (ggplot2::diamonds) to table of counts where each row represents one level of cut and a count column shows how many diamonds are in that cut level. Steps 1 and 2 are annotated: 1. Begin with the diamonds dataset. 2. Compute counts for each cut value with stat_count()."
#| out-width: "100%"
#| fig-alt: "A figure demonstrating the steps for going from raw data (ggplot2::diamonds) to table of counts where each row represents one level of cut and a count column shows how many diamonds are in that cut level. Steps 1 and 2 are annotated: 1. Begin with the diamonds dataset. 2. Compute counts for each cut value with stat_count()."
knitr::include_graphics("images/visualization-grammar-1.png")
```
@ -1069,8 +1076,8 @@ You would map the values of each variable to the levels of an aesthetic.
```{r}
#| echo: FALSE
#| out.width: "100%"
#| fig.alt: "A figure demonstrating the steps for going from raw data (ggplot2::diamonds) to table of counts where each row represents one level of cut and a count column shows how many diamonds are in that cut level. Each level is also mapped to a color. Steps 3 and 4 are annotated: 3. Represent each observation with a bar. 4. Map the fill of each bar to the ..count.. variable."
#| out-width: "100%"
#| fig-alt: "A figure demonstrating the steps for going from raw data (ggplot2::diamonds) to table of counts where each row represents one level of cut and a count column shows how many diamonds are in that cut level. Each level is also mapped to a color. Steps 3 and 4 are annotated: 3. Represent each observation with a bar. 4. Map the fill of each bar to the ..count.. variable."
knitr::include_graphics("images/visualization-grammar-2.png")
```
@ -1082,8 +1089,8 @@ You could also extend the plot by adding one or more additional layers, where ea
```{r}
#| echo: FALSE
#| out.width: "100%"
#| fig.alt: "A figure demonstrating the steps for going from raw data (ggplot2::diamonds) to a bar chart where each bar represents one level of cut and is filled with a different color. Steps 5 and 6 are annotated: 5. Place geoms in a Cartesian coordinate system. 6. Map the y values to ..count.. and the x values to cut."
#| out-width: "100%"
#| fig-alt: "A figure demonstrating the steps for going from raw data (ggplot2::diamonds) to a bar chart where each bar represents one level of cut and is filled with a different color. Steps 5 and 6 are annotated: 5. Place geoms in a Cartesian coordinate system. 6. Map the y values to ..count.. and the x values to cut."
knitr::include_graphics("images/visualization-grammar-3.png")
```
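
The six annotated steps in the three figures above can also be traced by hand. The following is a minimal sketch, not code from the chapter: it assumes ggplot2 is installed, uses base R's `table()` as a stand-in for the counting that `stat_count()` performs, and introduces the illustrative names `cut_counts` and `p`.

```r
library(ggplot2)

# Steps 1-2: begin with the diamonds dataset and compute one count per cut level.
# table() is used here as a stand-in for stat_count(); `cut_counts` is a
# hypothetical name chosen for this sketch.
cut_counts <- as.data.frame(table(cut = diamonds$cut), responseName = "count")

# Steps 3-4: represent each row of the count table with a bar (geom_col())
# and map the fill of each bar to the count.
# Steps 5-6: place the bars in a Cartesian coordinate system, with cut on
# the x-axis and count on the y-axis.
p <- ggplot(cut_counts, aes(x = cut, y = count, fill = count)) +
  geom_col() +
  coord_cartesian()

p
```

Precomputing the counts and drawing them with `geom_col()` makes the intermediate table visible, which `geom_bar()` would otherwise compute behind the scenes.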