More work on visualization. Organizes the chapter to go from doing -> vocab -> grammar -> customizing appearance.

Garrett 2015-11-24 16:55:27 -05:00
parent 5983e8a40b
commit 619e212a81
1 changed files with 230 additions and 227 deletions

@ -36,9 +36,9 @@ This chapter will teach you how to visualize your data with R and the `ggplot2`
*Section 1* will get you started making graphs right away. You'll learn how to make several common types of plots, and how to use the `ggplot2` syntax.
*Section 2* will teach you the _grammar of graphics_, a versatile system for building plots. You'll learn how to use a combination of _layers_, _geoms_, _stats_, _aesthetic mappings_, _position adjustments_, and _coordinate systems_ to assemble any plot you like.
*Section 2* will guide you through the geoms, stats, position adjustments, coordinate systems, and facetting schemes that you can use to make many different types of plots with `ggplot2`.
*Section 3* will show you how to use `ggplot2` and the grammar of graphics to make many specific types of plot. This section documents each of the options provided by `ggplot2`.
*Section 3* will teach you the _layered grammar of graphics_, a versatile system for building multi-layered plots that underlies `ggplot2`.
*Section 4* will show you how to customize your plots with labels, legends, color schemes, and more.
@ -79,7 +79,7 @@ You will need to reload the package each time you start a new R session.
### Scatterplots
The easiest way to understand the `mpg` data set is to visualize it, which means that its time to make our first graph. To do this, open an R session and run the code below. The code plots the `displ` variable of `mpg` against the `hwy` variable.
The easiest way to understand the `mpg` data set is to visualize it, which means that it is time to make our first graph. To do this, open an R session and run the code below. The code plots the `displ` variable of `mpg` against the `hwy` variable.
```{r}
ggplot(data = mpg) +
@ -107,7 +107,7 @@ With `ggplot2`, you begin a plot with the function `ggplot()`. `ggplot()` doesn'
The first argument of `ggplot()` is the data set to use in the graph. So `ggplot(data = mpg)` initializes a graph that will use the `mpg` data set.
You complete your graph by adding one or more layers to `ggplot()`. Here, the function `geom_point()` adds a layer of points to the plot, which creates a scatterplot. `ggplot2` comes with other `geom_` functions that you can use as well. Each function creates a different type of layer, and each function takes a mapping argument. We'll learn about all of the geom functions in Section 3.
You complete your graph by adding one or more layers to `ggplot()`. Here, the function `geom_point()` adds a layer of points to the plot, which creates a scatterplot. `ggplot2` comes with other `geom_` functions that you can use as well. Each function creates a different type of layer, and each function takes a mapping argument. We'll learn about all of the geom functions in Section 2.
The mapping argument of your geom function explains where your points should go. You must set mapping to a call to `aes()`. The `x` and `y` arguments of `aes()` explain which variables to map to the x and y axes of the graph. `ggplot()` will look for those variables in your data set, `mpg`.
@ -118,7 +118,7 @@ ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
```
The next few subsections will introduce several arguments (and functions) that you can add to the template. Each argument will come with a new set of options---and likely a new set of questions. Hold those questions for now. We will catalogue your options in Section 3. Use this section to become familiar with the `ggplot2` syntax. Once you do, the low level details of `ggplot2` will be easier to understand.
The next few subsections will introduce several arguments (and functions) that you can add to the template. Each argument will come with a new set of options---and likely a new set of questions. Hold those questions for now. We will catalogue your options in Section 2. Use this section to become familiar with the `ggplot2` syntax. Once you do, the low level details of `ggplot2` will be easier to understand.
#### Aesthetic Mappings
@ -170,13 +170,13 @@ ggplot(data = mpg) +
***
**Tip** - What happened to the suv's? `ggplot2` will only use six shapes at a time. See Section 3 for more details.
**Tip** - What happened to the SUVs? `ggplot2` will only use six shapes at a time. See Section 2 for more details.
***
In each case, you set the name of the aesthetic to the variable to display, and you do this within the `aes()` function. The syntax highlights a useful insight because you also set `x` and `y` to variables within `aes()`. The insight is that the x and y locations of a point are themselves aesthetics, visual properties that you can map to variables to display information about the data.
Once you set an aesthetic, `ggplot2` takes care of the rest. It selects a pleasing set of levels to use for the aesthetic, and it constructs a legend that explains the mapping. For x and y aesthetics, `ggplot2` does not create a legend, but it creates an axis line with tick marks and a label. The axis line acts like a legend; it explains the mapping between locations and values.
Once you set an aesthetic, `ggplot2` takes care of the rest. It selects a pleasing set of levels to use for the aesthetic, and it constructs a legend that explains the mapping. For x and y aesthetics, `ggplot2` does not create a legend, but it creates an axis line with tick marks and a label. The axis line acts as a legend; it explains the mapping between locations and values.
#### Exercises
@ -192,7 +192,7 @@ See the help page for `geom_point()` (`?geom_point`) to learn which aesthetics a
#### Position adjustments
Did you notice that there is another riddle hidden in our scatterplot? The plot displays 126 points, but there are 234 observations in the `mpg` data set. Also, the points appear to fall on a grid. Why should this be?
Did you notice that there is another riddle hidden in our scatterplot? The plot displays 126 points, but the `mpg` data set contains 234 observations. Also, the points appear to fall on a grid. Why should this be?
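One way to start on the riddle is to count how many distinct (`displ`, `hwy`) pairs the data contains. The sketch below uses base R's `unique()`; the object names `n_rows` and `n_locations` are illustrative and not part of the chapter's code.

```r
library(ggplot2)  # provides the mpg data set

# Number of observations (rows) in mpg
n_rows <- nrow(mpg)

# Number of distinct (displ, hwy) combinations; rows that share a
# combination are drawn at exactly the same location in the scatterplot
n_locations <- nrow(unique(mpg[, c("displ", "hwy")]))

n_rows
n_locations
```

If `n_locations` is smaller than `n_rows`, some points must be plotted on top of each other.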
```{r}
ggplot(data = mpg) +
@ -222,7 +222,7 @@ ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
```
The chart above displays the total number of diamonds in the `diamonds` data set, grouped by `cut`. The `diamonds` data set comes in `ggplot2` and contains information about 53940 diamonds, including the `price`, `carat`, `color`, `clarity`, and `cut` of each diamond. The chart shows that more diamonds are available with high quality cuts than with low quality cuts.
The chart above displays the total number of diamonds in the `diamonds` data set, grouped by `cut`. The `diamonds` data set comes in `ggplot2` and contains information about 53,940 diamonds, including the `price`, `carat`, `color`, `clarity`, and `cut` of each diamond. The chart shows that more diamonds are available with high quality cuts than with low quality cuts.
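The bar heights can be checked against the data directly. As a rough sketch, base R's `table()` tallies the same counts that the bar chart displays (`cut_counts` is an illustrative name):

```r
library(ggplot2)  # provides the diamonds data set

# Total number of diamonds in the data set
nrow(diamonds)

# Number of diamonds in each cut category, i.e. the bar heights
cut_counts <- table(diamonds$cut)
cut_counts
```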
A bar has different visual properties than a point, which can create some surprises. For example, how would you create this simple chart? If you have an R session open, give it a try.
@ -261,7 +261,7 @@ ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
```
See Section 3 to learn about other position options.
See Section 2 to learn about other position options.
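As one illustration of another option, `position = "fill"` stacks the bars and then scales each stack to the same height, which makes proportions easier to compare across cuts. This is a sketch of a single alternative, not a catalogue of the options; `p_fill` is an illustrative name.

```r
library(ggplot2)

# Same mappings as the dodged chart, but each bar is stacked and
# rescaled so that it spans the full height of the plot
p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

p_fill
```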
#### Stats
@ -272,7 +272,7 @@ ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
```
On the x axis it displays `cut`, a variable in the `diamonds` data set. On the y axis, it displays count. But count is not a variable in the diamonds data set:
On the x axis, the chart displays `cut`, a variable in the `diamonds` data set. On the y axis, it displays count. But count is not a variable in the diamonds data set:
```{r}
head(diamonds)
@ -407,223 +407,9 @@ The template takes seven parameters, the bracketed words that appear in the temp
The seven parameters in the template are connected by a powerful idea known as the _Grammar of Graphics_, a system for describing plots. The grammar shows that you can uniquely describe _any_ plot as a combination of---you guessed it: a data set, a geom, a set of mappings, a stat, a position adjustment, a coordinate system, and a faceting scheme.
In other words, you can use the template above to make any graph that you can imagine---at least in theory. Section 2 will examine how this works in practice. The section explains how the grammar of graphics works, and it shows how `ggplot2` implements the grammar to build real graphs.
Before we look at the grammar of graphics, let's take a look at the different choices that `ggplot2` offers for geoms, stats, position adjustments, coordinate systems, and facetting schemes.
## The Grammar of Graphics
The "gg" of `ggplot2` stands for the grammar of graphics, a system for describing plots. According to the grammar, a plot is a combination of seven elements:
$$\text{plot} = \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big) + \text{coordinate system} + \text{facet scheme}$$
You might not be used to thinking of plots in this way, so let's explore the formula above with a thought exercise. If you had to build a graph from scratch, how would you do it?
Here's one way. To build a plot, you could begin with a data set to visualize and a coordinate system to visualize the data in. For this thought exercise, we will visualize an abbreviated version of the `mpg` data set, and we will use the cartesian coordinate system.
`r bookdown::embed_png("images/visualization-3.png", dpi = 400)`
You could then choose whether to visualize the data in its raw form, or whether to summarize the data with a transformation and then visualize the summary. Let's visualize our data in its raw form. This would be the same as applying an identity transformation to the data, since an identity transformation returns the data as it is.
`r bookdown::embed_png("images/visualization-4.png", dpi = 400)`
Next, you would need to choose some sort of visual object to represent the observations in your data set. This object will be what you actually draw in the coordinate system.
Here we will use a set of points. Each point will represent one row of data. Let's call the points "geoms", short for geometrical object.
`r bookdown::embed_png("images/visualization-5.png", dpi = 400)`
Next, you could map variables in your data to the visual properties of your geoms. These visual properties are what we call aesthetics. Once you do this, the visual information contained in the point will communicate recorded information contained in the data set.
Let's map the `cyl` variable to the shape of our points.
`r bookdown::embed_png("images/visualization-6.png", dpi = 400)`
One pair of mappings would be particularly important. To place your points into your coordinate system, you would need to map a variable to the x location of the points, which is an aesthetic. Here we map `displ` to the x location.
`r bookdown::embed_png("images/visualization-7.png", dpi = 400)`
And you would need to map a variable to the y location of the points, which is also an aesthetic. Here we map `hwy` to the y location.
`r bookdown::embed_png("images/visualization-8.png", dpi = 400)`
The process creates a complete graph:
`r bookdown::embed_png("images/visualization-9.png", dpi = 400)`
However, you could modify the graph further. You could choose to adjust the position of the points (or not) and to facet the graph (or not).
`r bookdown::embed_png("images/visualization-10.png", dpi = 400)`
This process works to make any graph. If you change any of the elements involved, you would end up with a new graph. For example, we could change our geom to a line to make a line graph, or to a bar to make a bar chart. Or we could change the position to "jitter" to make a jittered plot.
`r bookdown::embed_png("images/visualization-11.png", dpi = 400)`
You could also switch the data set, coordinate system, or any other component of the graph.
Let's extend the thought exercise to add a model line to the graph. To do this, we will add a new _layer_ to the graph.
### Layers
A layer is a collection of a data set, a stat, a geom, a set of mappings, and a position adjustment. You can add a layer to a coordinate system and faceting scheme to make a complete graph, or you can add a layer to an existing graph to make a layered graph.
Let's build a layer that uses the same data set as our previous graph. In this layer, we will apply a "smooth" stat to the data. The stat fits a model to the data and then returns a transformed data set with three new columns:
* `y` - the value of the model line at each data point
* `ymin` - the y value of the bottom of the confidence interval associated with the model at each data point
* `ymax` - the y value of the top of the confidence interval associated with the model at each point
`r bookdown::embed_png("images/visualization-12.png", dpi = 400)`
In this layer, we will represent the observations with a line geom. We map the x values of the line to `displ` and we map the y values to our new `y` variable. We won't use a position adjustment.
`r bookdown::embed_png("images/visualization-13.png", dpi = 400)`
We now have a "layer" that we can add to a coordinate system and faceting scheme to make a complete graph.
`r bookdown::embed_png("images/visualization-14.png", dpi = 400)`
Or we can add the layer to our previous graph to make a plot that shows both summary information and raw data.
`r bookdown::embed_png("images/visualization-15.png", dpi = 400)`
For completeness, let's add one more layer. This layer will begin with the same data set as the previous layer. It will also use the same stat. However, we will use the ribbon geom to visualize the data points. A ribbon is similar to a shaded region contained by two lines.
We map the top of the ribbon to `ymax` and the bottom of the ribbon to `ymin`. We map the x position of the ribbon to `displ`. We will not use a position adjustment.
We can now add the layer to our graph to show in one plot:
* raw data
* a visual summary of the data (the smooth line)
* the uncertainty associated with the summary
`r bookdown::embed_png("images/visualization-16.png", dpi = 400)`
If you like, you can continue to add layers to the graph (but the graph will soon become cluttered).
The thought exercise shows that the elements of the grammar of graphics work together to build a graph. You can describe any graph with these elements, and each unique combination of elements makes a single, unique graph. You can also extend a graph by adding layers of new data, stats, geoms, mappings, and positions.
In other words, you can extend the grammar of graphics formula indefinitely to make layered plots:
$$
\begin{aligned}
\text{plot} = & \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big) + \\
& \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big)^{*} + \\
& \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big)^{*} + \\
& \text{coordinate system} + \text{facet scheme}
\end{aligned}
$$
### Working with layers
`ggplot2` syntax matches this formulation almost exactly. The basic low-level function of `ggplot2` is `layer()`, which combines data, stats, geoms, mappings, and positions into a single layer to plot.
If you have time on your hands, you can use `layer()` to create a multi-layer plot like the one above. Initialize your plot with `ggplot()`. Then add as many calls to `layer()` as you like. Give each layer its own `data`, `stat`, `geom`, `mapping`, and `position` arguments.
```{r message = FALSE}
ggplot() +
layer(
data = mpg,
stat = "identity",
geom = "point",
mapping = aes(x = displ, y = hwy),
position = "identity"
) +
layer(
data = mpg,
stat = "smooth",
geom = "ribbon",
mapping = aes(x = displ, y = hwy),
position = "identity"
) +
layer(
data = mpg,
stat = "smooth",
geom = "line",
mapping = aes(x = displ, y = hwy),
position = "identity"
) +
coord_cartesian()
```
Although you can build all of your graphs this way, few people do because `ggplot2` supplies some very efficient shortcuts.
For example, you will find in practice that you almost always pair the same geoms with the same stats and position adjustments. For instance, you will almost always use the point geom with the "identity" stat and the "identity" position. Similarly, you will almost always use the bar geom with the "bin" stat and the "stack" position.
The `geom_` functions in `ggplot2` take advantage of these common combinations. Like `layer()`, each geom function builds a layer, but the geom functions preset the geom, stat, and position values of the layer to useful defaults. The geom that appears in the function name becomes the geom of the layer. The stat and position most commonly associated with the geom become the default stat and position of the layer.
`ggplot2` even provides geom functions for less common, but still useful combinations of geoms, stats, and positions. For example, the function `geom_jitter()` builds a layer that has a point geom, an "identity" stat, and a "jitter" position. The function `geom_smooth()` builds a "layer" that is made of two sub-layers: a line layer that displays a model line and a ribbon layer that displays a standard error band.
As a result, `geom_` functions provide a more direct syntax for making plots, one that you are already familiar with from Section 1.
```{r message = FALSE}
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
```
#### Multiple geoms
As with `layer()`, you can add multiple geom functions to a single plot call.
This system lets you build sophisticated graphs geom by geom, but it also makes it possible to write repetitive code. For example, the code above repeats the arguments `data = mpg` and `mapping = aes(x = displ, y = hwy)`. Repetition makes your code harder to read and write, and it also increases the chance of typos and errors.
You can avoid repetition by passing the repeated mappings to `ggplot()`. `ggplot2` will treat mappings that appear in `ggplot()` as global mappings to be applied to each layer. For example, we can eliminate the duplication of `mapping = aes(x = displ, y = hwy)` in our previous code with a global mapping argument:
```{r, eval = FALSE}
ggplot(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg) +
geom_smooth(data = mpg)
```
You can even combine global mappings with local mappings to differentiate geoms.
* Mappings that appear in `ggplot()` will be applied to each geom.
* Mappings that appear in a geom function will be applied to that geom only.
* If a local aesthetic mapping conflicts with a global aesthetic mapping, `ggplot2` will use the local mapping. This is arbitrated on an aesthetic by aesthetic basis.
```{r, message = FALSE}
ggplot(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg, mapping = aes(color = class)) +
geom_smooth(data = mpg)
```
This system lets us overlay a single smooth line on a set of colored points. Notice that this would not occur if you added the color aesthetic to the global mappings. In that case, smooth would use the color mapping to draw a differently colored line for each class of cars.
You can use the same system to specify a global data set for every layer. In other words,
```{r, eval = FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
```
is analogous to
```{r, eval = FALSE}
ggplot(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg) +
geom_smooth(data = mpg)
```
As with mappings, you can define a local data argument to override the global data argument on a layer by layer basis.
```{r, message = FALSE, warning = FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(data = subset(mpg, cyl == 8))
```
### Recap
Your understanding of the `ggplot2` syntax is now complete. You understand the grammar written into the syntax, and you know how to extend the syntax by adding extra layers to your plot, as well as how to truncate the syntax by relying on `ggplot2`'s default settings.
Only one thing remains. You need to learn the vocabulary of function names and argument options that you can use with your code template.
Section 3 will guide you through these functions and arguments. It catalogues all of the options that `ggplot2` puts at your fingertips for geoms, mappings, stats, position adjustments, and coordinate systems.
## The Vocabulary of Graphics
## The Vocabulary of `ggplot2`
`ggplot2` comes with 37 geom functions, 22 stats, eight coordinate systems, six position adjustments, two facetting schemes, and an uncounted number of aesthetics to map. Each of these components introduces new decisions for you to make and new dilemmas for you to consider.
@ -1313,6 +1099,221 @@ ggplot(data = mpg) +
The results of `facet_wrap()` can be easier to study than the results of `facet_grid()`. However, `facet_wrap()` arranges its panels in a single wrapped sequence; it does not cross two variables into a grid of rows and columns.
In other words, you can use the template above to make any graph that you can imagine---at least in theory. Section 2 will examine how this works in practice. The section explains how the grammar of graphics works, and it shows how `ggplot2` implements the grammar to build real graphs.
## The Grammar of Graphics
The "gg" of `ggplot2` stands for the grammar of graphics, a system for describing plots. According to the grammar, a plot is a combination of seven elements:
$$\text{plot} = \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big) + \text{coordinate system} + \text{facet scheme}$$
You might not be used to thinking of plots in this way, so let's explore the formula above with a thought exercise. If you had to build a graph from scratch, how would you do it?
Here's one way. To build a plot, you could begin with a data set to visualize and a coordinate system to visualize the data in. For this thought exercise, we will visualize an abbreviated version of the `mpg` data set, and we will use the cartesian coordinate system.
`r bookdown::embed_png("images/visualization-3.png", dpi = 400)`
You could then choose whether to visualize the data in its raw form, or whether to summarize the data with a transformation and then visualize the summary. Let's visualize our data in its raw form. This would be the same as applying an identity transformation to the data, since an identity transformation returns the data as it is.
`r bookdown::embed_png("images/visualization-4.png", dpi = 400)`
Next, you would need to choose some sort of visual object to represent the observations in your data set. This object will be what you actually draw in the coordinate system.
Here we will use a set of points. Each point will represent one row of data. Let's call the points "geoms", short for geometrical object.
`r bookdown::embed_png("images/visualization-5.png", dpi = 400)`
Next, you could map variables in your data to the visual properties of your geoms. These visual properties are what we call aesthetics. Once you do this, the visual information contained in the point will communicate recorded information contained in the data set.
Let's map the `cyl` variable to the shape of our points.
`r bookdown::embed_png("images/visualization-6.png", dpi = 400)`
One pair of mappings would be particularly important. To place your points into your coordinate system, you would need to map a variable to the x location of the points, which is an aesthetic. Here we map `displ` to the x location.
`r bookdown::embed_png("images/visualization-7.png", dpi = 400)`
And you would need to map a variable to the y location of the points, which is also an aesthetic. Here we map `hwy` to the y location.
`r bookdown::embed_png("images/visualization-8.png", dpi = 400)`
The process creates a complete graph:
`r bookdown::embed_png("images/visualization-9.png", dpi = 400)`
However, you could modify the graph further. You could choose to adjust the position of the points (or not) and to facet the graph (or not).
`r bookdown::embed_png("images/visualization-10.png", dpi = 400)`
This process works to make any graph. If you change any of the elements involved, you would end up with a new graph. For example, we could change our geom to a line to make a line graph, or to a bar to make a bar chart. Or we could change the position to "jitter" to make a jittered plot.
`r bookdown::embed_png("images/visualization-11.png", dpi = 400)`
You could also switch the data set, coordinate system, or any other component of the graph.
Let's extend the thought exercise to add a model line to the graph. To do this, we will add a new _layer_ to the graph.
### Layers
A layer is a collection of a data set, a stat, a geom, a set of mappings, and a position adjustment. You can add a layer to a coordinate system and faceting scheme to make a complete graph, or you can add a layer to an existing graph to make a layered graph.
Let's build a layer that uses the same data set as our previous graph. In this layer, we will apply a "smooth" stat to the data. The stat fits a model to the data and then returns a transformed data set with three new columns:
* `y` - the value of the model line at each data point
* `ymin` - the y value of the bottom of the confidence interval associated with the model at each data point
* `ymax` - the y value of the top of the confidence interval associated with the model at each point
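
To see a transformed data set of this kind, you can ask `ggplot2` for the data it computes for a layer. The sketch below assumes a version of `ggplot2` that provides `layer_data()` and uses `geom_smooth()` with its default settings; the result also contains bookkeeping columns beyond the three listed above. The names `p_smooth` and `smooth_data` are illustrative.

```r
library(ggplot2)

# A plot with a single smooth layer built from the mpg data
p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_smooth()

# Extract the data computed by the stat for the first layer
smooth_data <- suppressMessages(layer_data(p_smooth, 1))

# The model line and the bounds of its confidence interval
head(smooth_data[, c("x", "y", "ymin", "ymax")])
```
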
`r bookdown::embed_png("images/visualization-12.png", dpi = 400)`
In this layer, we will represent the observations with a line geom. We map the x values of the line to `displ` and we map the y values to our new `y` variable. We won't use a position adjustment.
`r bookdown::embed_png("images/visualization-13.png", dpi = 400)`
We now have a "layer" that we can add to a coordinate system and faceting scheme to make a complete graph.
`r bookdown::embed_png("images/visualization-14.png", dpi = 400)`
Or we can add the layer to our previous graph to make a plot that shows both summary information and raw data.
`r bookdown::embed_png("images/visualization-15.png", dpi = 400)`
For completeness, let's add one more layer. This layer will begin with the same data set as the previous layer. It will also use the same stat. However, we will use the ribbon geom to visualize the data points. A ribbon is similar to a shaded region contained by two lines.
We map the top of the ribbon to `ymax` and the bottom of the ribbon to `ymin`. We map the x position of the ribbon to `displ`. We will not use a position adjustment.
We can now add the layer to our graph to show in one plot:
* raw data
* a visual summary of the data (the smooth line)
* the uncertainty associated with the summary
`r bookdown::embed_png("images/visualization-16.png", dpi = 400)`
If you like, you can continue to add layers to the graph (but the graph will soon become cluttered).
The thought exercise shows that the elements of the grammar of graphics work together to build a graph. You can describe any graph with these elements, and each unique combination of elements makes a single, unique graph. You can also extend a graph by adding layers of new data, stats, geoms, mappings, and positions.
In other words, you can extend the grammar of graphics formula indefinitely to make layered plots:
$$
\begin{aligned}
\text{plot} = & \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big) + \\
& \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big)^{*} + \\
& \Big( \text{data} + \text{stat} + \text{geom} + \text{mappings} + \text{position} \Big)^{*} + \\
& \text{coordinate system} + \text{facet scheme}
\end{aligned}
$$
### Working with layers
`ggplot2` syntax matches this formulation almost exactly. The basic low-level function of `ggplot2` is `layer()`, which combines data, stats, geoms, mappings, and positions into a single layer to plot.
If you have time on your hands, you can use `layer()` to create a multi-layer plot like the one above. Initialize your plot with `ggplot()`. Then add as many calls to `layer()` as you like. Give each layer its own `data`, `stat`, `geom`, `mapping`, and `position` arguments.
```{r message = FALSE}
ggplot() +
layer(
data = mpg,
stat = "identity",
geom = "point",
mapping = aes(x = displ, y = hwy),
position = "identity"
) +
layer(
data = mpg,
stat = "smooth",
geom = "ribbon",
mapping = aes(x = displ, y = hwy),
position = "identity"
) +
layer(
data = mpg,
stat = "smooth",
geom = "line",
mapping = aes(x = displ, y = hwy),
position = "identity"
) +
coord_cartesian()
```
Although you can build all of your graphs this way, few people do because `ggplot2` supplies some very efficient shortcuts.
For example, you will find in practice that you almost always pair the same geoms with the same stats and position adjustments. For instance, you will almost always use the point geom with the "identity" stat and the "identity" position. Similarly, you will almost always use the bar geom with the "bin" stat and the "stack" position.
The `geom_` functions in `ggplot2` take advantage of these common combinations. Like `layer()`, each geom function builds a layer, but the geom functions preset the geom, stat, and position values of the layer to useful defaults. The geom that appears in the function name becomes the geom of the layer. The stat and position most commonly associated with the geom become the default stat and position of the layer.
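You can inspect these defaults yourself. A layer object created by a geom function stores its geom, stat, and position, as the sketch below shows. Treat the exact class names as version dependent: for example, recent releases of `ggplot2` report a count stat, rather than a bin stat, as the default for `geom_bar()`. The names `point_layer` and `bar_layer` are illustrative.

```r
library(ggplot2)

# Create two layers without adding them to a plot
point_layer <- geom_point()
bar_layer <- geom_bar()

# Each layer records the components that the geom function preset
class(point_layer$geom)[1]
class(point_layer$stat)[1]
class(point_layer$position)[1]
class(bar_layer$position)[1]
```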
`ggplot2` even provides geom functions for less common, but still useful combinations of geoms, stats, and positions. For example, the function `geom_jitter()` builds a layer that has a point geom, an "identity" stat, and a "jitter" position. The function `geom_smooth()` builds a "layer" that is made of two sub-layers: a line layer that displays a model line and a ribbon layer that displays a standard error band.
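To make the correspondence concrete, the sketch below builds a jittered layer twice: once with `layer()` and once with `geom_jitter()`. Both describe the same combination of elements, although the random jitter means the points will not land in identical places from one plot to the next. The names `by_hand`, `by_geom`, and `base_plot` are illustrative.

```r
library(ggplot2)

# Spell out the geom, stat, and position of the layer by hand
by_hand <- layer(geom = "point", stat = "identity", position = "jitter")

# Let geom_jitter() supply the same geom, stat, and position
by_geom <- geom_jitter()

# Either layer can be added to the same base plot
base_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))
base_plot + by_hand
base_plot + by_geom
```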
As a result, `geom_` functions provide a more direct syntax for making plots, one that you are already familiar with from Section 1.
```{r message = FALSE}
ggplot() +
geom_point(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth(data = mpg, mapping = aes(x = displ, y = hwy))
```
#### Multiple geoms
As with `layer()`, you can add multiple geom functions to a single plot call.
This system lets you build sophisticated graphs geom by geom, but it also makes it possible to write repetitive code. For example, the code above repeats the arguments `data = mpg` and `mapping = aes(x = displ, y = hwy)`. Repetition makes your code harder to read and write, and it also increases the chance of typos and errors.
You can avoid repetition by passing the repeated mappings to `ggplot()`. `ggplot2` will treat mappings that appear in `ggplot()` as global mappings to be applied to each layer. For example, we can eliminate the duplication of `mapping = aes(x = displ, y = hwy)` in our previous code with a global mapping argument:
```{r, eval = FALSE}
ggplot(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg) +
geom_smooth(data = mpg)
```
You can even combine global mappings with local mappings to differentiate geoms.
* Mappings that appear in `ggplot()` will be applied to each geom.
* Mappings that appear in a geom function will be applied to that geom only.
* If a local aesthetic mapping conflicts with a global aesthetic mapping, `ggplot2` will use the local mapping. This is arbitrated on an aesthetic by aesthetic basis.
```{r, message = FALSE}
ggplot(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg, mapping = aes(color = class)) +
geom_smooth(data = mpg)
```
This system lets us overlay a single smooth line on a set of colored points. Notice that this would not occur if you added the color aesthetic to the global mappings. In that case, smooth would use the color mapping to draw a differently colored line for each class of cars.
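For comparison, here is a sketch of the alternative described above, with `color = class` moved into the global mappings. `se = FALSE` is an addition that hides the confidence bands to keep the plot legible. Some classes contain few cars, so `ggplot2` may emit warnings while fitting their lines. `p_global` is an illustrative name.

```r
library(ggplot2)

# color = class is now global, so it applies to the points and to the smooth
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth(se = FALSE)

p_global
```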
You can use the same system to specify a global data set for every layer. In other words,
```{r, eval = FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
```
is analogous to
```{r, eval = FALSE}
ggplot(mapping = aes(x = displ, y = hwy)) +
geom_point(data = mpg) +
geom_smooth(data = mpg)
```
As with mappings, you can define a local data argument to override the global data argument on a layer by layer basis.
```{r, message = FALSE, warning = FALSE}
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(data = subset(mpg, cyl == 8))
```
### Recap
Your understanding of the `ggplot2` syntax is now complete. You understand the grammar written into the syntax, and you know how to extend the syntax by adding extra layers to your plot, as well as how to truncate the syntax by relying on `ggplot2`'s default settings.
Only one thing remains. You need to learn the vocabulary of function names and argument options that you can use with your code template.
Section 3 will guide you through these functions and arguments. It catalogues all of the options that `ggplot2` puts at your fingertips for geoms, mappings, stats, position adjustments, and coordinate systems.
## Customizing plots
### Titles
@ -1326,6 +1327,8 @@ The results of `facet_wrap()` can be easier to study than the results of `facet_
### Saving plots
## Summary
> "A picture is not merely worth a thousand words, it is much more likely to be scrutinized than words are to be read."---John Tukey