Local bookdown working

hadley 2015-12-11 13:28:10 -06:00
parent bad4c9d975
commit 8e40393cf5
21 changed files with 149 additions and 179 deletions

2
.gitignore vendored

@ -8,3 +8,5 @@ temp.Rmd
*_files
figures
.Rapp.history
_main.Rmd
book_assets


@ -31,7 +31,8 @@ Imports:
Remotes:
gaborcsardi/rcorpora,
garrettgman/DSR,
hadley/bookdown,
hadley/purrr,
hadley/stringr,
hadley/ggplot2
hadley/ggplot2,
rstudio/bookdown,
yihui/knitr


@ -3,7 +3,14 @@
This is code and text behind the [R for data science](http://r4ds.had.co.nz)
book.
The site is built using jekyll, with a custom plugin to render `.rmd` files with
The site is built using [bookdown]
```{r}
devtools::install_github("yihui/knitr")
devtools::install_github("rstudio/bookdown")
```
jekyll, with a custom plugin to render `.rmd` files with
knitr and pandoc. To create the site, you need:
* jekyll gem: `gem install jekyll`


@ -1,5 +1,20 @@
name: R for data science
markdown: redcarpet
highlighter: pygments
rmd_files: [
"index.Rmd",
"intro.Rmd",
"visualize.Rmd",
"transform.Rmd",
"tidy.Rmd",
"model.Rmd",
"import.Rmd",
"eda.Rmd",
"rmarkdown.Rmd",
"shiny.Rmd",
"data-structures.Rmd",
"functions.Rmd",
"strings.Rmd",
"datetimes.Rmd",
"lists.Rmd",
"model-vis.Rmd",
"model-assess.Rmd",
]
exclude: ["CONTRIBUTING.md", "README.md", "book", "vendor"]


@ -1,9 +1,3 @@
---
layout: default
title: Data structures
output: bookdown::html_chapter
---
# Data structures
Might be quite brief.


@ -1,7 +1 @@
---
layout: default
title: Dates and times
output: bookdown::html_chapter
---
# Dates and times

23
eda.Rmd

@ -1,9 +1,3 @@
---
layout: default
title: Exploratory data analysis
output: bookdown::html_chapter
---
# Exploratory data analysis
```{r, include = FALSE}
@ -82,6 +76,7 @@ ggplot(data = diamonds) +
***
*Tip*: You can compute the counts of a discrete variable quickly with R's `table()` function. These are the numbers that `geom_bar()` visualizes.
```{r}
table(diamonds$cut)
```
@ -94,19 +89,27 @@ The strategy of counting the number of observations at each value breaks down fo
To get around this, data scientists divide the range of a continuous variable into equally spaced intervals, a process called _binning_.
`r bookdown::embed_png("images/visualization-17.png", dpi = 300)`
```{r, echo = FALSE}
knitr::include_graphics("images/visualization-17.png")
```
They then count how many observations fall into each bin.
`r bookdown::embed_png("images/visualization-18.png", dpi = 300)`
```{r, echo = FALSE}
knitr::include_graphics("images/visualization-18.png")
```
And display the count as a bar, or some other object.
`r bookdown::embed_png("images/visualization-19.png", dpi = 300)`
```{r, echo = FALSE}
knitr::include_graphics("images/visualization-19.png")
```
This method is temperamental because the appearance of the distribution can change dramatically if the bin size changes. As no bin size is "correct," you should explore several bin sizes when examining data.
`r bookdown::embed_png("images/visualization-20.png", dpi = 300)`
```{r, echo = FALSE}
knitr::include_graphics("images/visualization-20.png")
```
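The binning steps described above can be sketched in base R. This is only a minimal illustration: it assumes a small made-up vector `x` (not data from the book) and uses `cut()` and `table()` rather than ggplot2's "bin" stat.

```{r}
# Hypothetical toy data, not taken from the diamonds data set
x <- c(0.2, 0.7, 1.1, 1.4, 2.3, 2.9)

# Divide the range into equally spaced bins, then count observations per bin
narrow <- table(cut(x, breaks = seq(0, 3, by = 1)))    # three bins of width 1
wide   <- table(cut(x, breaks = seq(0, 3, by = 1.5)))  # two bins of width 1.5

narrow
wide
```

The same six values look evenly spread with the narrow bins and lopsided with the wide ones, which is why it is worth trying several bin sizes.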
Several geoms exist to help you visualize continuous distributions. They almost all use the "bin" stat to implement the above strategy. For each of these geoms, you can set the following arguments for "bin" to use:


@ -1,9 +1,3 @@
---
layout: default
title: Expressing yourself
output: bookdown::html_chapter
---
# Expressing yourself in code
```{r, include = FALSE}


@ -1,9 +1,3 @@
---
layout: default
title: Data import
output: bookdown::html_chapter
---
# Data import
```{r, include = FALSE}


@ -1,7 +1,11 @@
---
layout: default
title: Welcome
output: bookdown::html_chapter
knit: "bookdown::render_book"
output:
bookdown::html_chapters:
lib_dir: "book_assets"
---
# R for Data Science
@ -11,10 +15,3 @@ This is the book site for __"R for data science"__. This book will teach you how
To be published by O'Reilly in July 2016.
<img src="cover.png" width="250" height="328" alt="Cover image" />
## Table of contents {#toc}
<ul class="toc">
{% include package-nav.html %}
</ul>


@ -1,12 +1,6 @@
---
layout: default
title: Welcome
output: bookdown::html_chapter
---
# Introduction
# Welcome
```{r setup, include = FALSE}
```{r setup-intro, include = FALSE}
source("common.R")
install.packages <- function(...) invisible()
```
@ -17,7 +11,9 @@ Data science is an exciting discipline that allows you to turn raw data into und
Data science is a huge field, and there's no way you can master it by reading a single book. The goal of this book is to give you a solid foundation with the most important tools. Our model of the tools needed in a typical data science project looks something like this:
`r bookdown::embed_png("diagrams/data-science.png")`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/data-science.png")
```
First you must __import__ your data into R. This typically means that you take data stored in a file, a database, or a web API, and load it into a data frame in R. If you can't get your data into R, you can't do data science on it!
@ -108,7 +104,9 @@ To run the code in this book, you will need to install both R and the RStudio ID
RStudio is an integrated development environment, or IDE, for R programming. There are three key regions:
`r bookdown::embed_png("screenshots/rstudio-layout.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("screenshots/rstudio-layout.png")
```
You run R code in the __console__ pane. Textual output appears inline, and graphical output appears in the __output__ pane. You write more complex R scripts in the __editor__ pane.
@ -126,7 +124,9 @@ If you want to see a list of all keyboard shortcuts, use the meta keyboard short
We strongly recommend making two changes to the default RStudio options:
`r bookdown::embed_png("screenshots/rstudio-workspace.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("screenshots/rstudio-workspace.png")
```
This ensures that every time you restart RStudio you get a completely clean slate. This is good practice because it encourages you to capture all important interactions in your code. There's nothing worse than discovering three months after the fact that you've only stored the results of an important calculation in your workspace, not the calculation itself in your code. During a project, it's good practice to regularly restart R either using the menu Session | Restart R or the keyboard shortcut Cmd + Shift + F10.


@ -1,15 +1,8 @@
---
layout: default
title: Working with lists
output: bookdown::html_chapter
---
# Lists
```{r setup, include=FALSE}
```{r setup-lists, include=FALSE}
library(purrr)
source("common.R")
source("images/embed_jpg.R")
```
In this chapter, you'll learn how to handle lists, the data structure R uses for complex, hierarchical objects. You're already familiar with vectors, R's data structure for 1d objects. Lists extend these ideas to model objects that are like trees. You can create a hierarchical structure with a list because, unlike vectors, a list can contain other lists.
@ -82,7 +75,9 @@ x3 <- list(1, list(2, list(3)))
I draw them as follows:
`r bookdown::embed_png("diagrams/lists-structure.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-structure.png")
```
* Lists are rounded rectangles that contain their children.
@ -129,20 +124,22 @@ a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
Or visually:
`r bookdown::embed_png("diagrams/lists-subsetting.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-subsetting.png")
```
### Lists of condiments
It's easy to get confused between `[` and `[[`, but it's important to understand the difference. A few months ago I stayed at a hotel with a pretty interesting pepper shaker that I hope will help you remember these differences:
```{r, echo = FALSE}
embed_jpg("images/pepper.jpg", 300)
knitr::include_graphics("images/pepper.jpg")
```
If this pepper shaker is your list `x`, then, `x[1]` is a pepper shaker containing a single pepper packet:
```{r, echo = FALSE}
embed_jpg("images/pepper-1.jpg", 300)
knitr::include_graphics("images/pepper-1.jpg")
```
`x[2]` would look the same, but would contain the second packet. `x[1:2]` would be a pepper shaker containing two pepper packets.
@ -150,13 +147,13 @@ embed_jpg("images/pepper-1.jpg", 300)
`x[[1]]` is:
```{r, echo = FALSE}
embed_jpg("images/pepper-2.jpg", 300)
knitr::include_graphics("images/pepper-2.jpg")
```
If you wanted to get the content of the pepper packet, you'd need `x[[1]][[1]]`:
```{r, echo = FALSE}
embed_jpg("images/pepper-3.jpg", 300)
knitr::include_graphics("images/pepper-3.jpg")
```
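The pepper shaker analogy can also be checked directly in R. The sketch below is an illustration only: the nested list `x` is a hypothetical stand-in for the shaker, with each packet modelled as a list holding its contents.

```{r}
# Hypothetical stand-in for the pepper shaker: a list of packets,
# where each packet is itself a list holding its contents
x <- list(list("pepper"), list("pepper"))

shaker_with_one_packet <- x[1]   # `[` returns a smaller list (a shaker)
packet <- x[[1]]                 # `[[` returns the first packet itself
contents <- x[[1]][[1]]          # the contents of that packet

str(shaker_with_one_packet)
str(packet)
str(contents)
```

`[` always hands back a list, while each `[[` removes one level of the hierarchy.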
### Exercises
@ -508,7 +505,9 @@ flatten_dbl(y)
Graphically, that sequence of operations looks like:
`r bookdown::embed_png("diagrams/lists-flatten.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-flatten.png")
```
Whenever I get confused about a sequence of flattening operations, I'll often draw a diagram like this to help me understand what's going on.
@ -529,7 +528,9 @@ x %>% transpose() %>% str()
Graphically, this looks like:
`r bookdown::embed_png("diagrams/lists-transpose.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-transpose.png")
```
You'll see an example of this in the next section, as `transpose()` is particularly useful in conjunction with adverbs like `safely()` and `quietly()`.
@ -638,7 +639,9 @@ map2(mu, sigma, rnorm, n = 10)
`map2()` generates this series of function calls:
`r bookdown::embed_png("diagrams/lists-map2.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-map2.png")
```
The arguments that vary for each call come before the function name, and arguments that are the same for every function call come afterwards.
@ -664,7 +667,9 @@ args1 %>% pmap(rnorm) %>% str()
That looks like:
`r bookdown::embed_png("diagrams/lists-pmap-unnamed.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-pmap-unnamed.png")
```
However, instead of relying on position matching, it's better to name the arguments. This is more verbose, but it makes the code clearer.
@ -675,7 +680,9 @@ args2 %>% pmap(rnorm) %>% str()
That generates longer, but safer, calls:
`r bookdown::embed_png("diagrams/lists-pmap-named.png", dpi = 220)`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-pmap-named.png")
```
Since the arguments are all the same length, it makes sense to store them in a data frame:
@ -706,7 +713,9 @@ To handle this case, you can use `invoke_map()`:
invoke_map(f, param, n = 5) %>% str()
```
`r bookdown::embed_png("diagrams/lists-invoke.png")`
```{r, echo = FALSE}
knitr::include_graphics("diagrams/lists-invoke.png")
```
The first argument is a list of functions or character vector of function names. The second argument is a list of lists giving the arguments that vary for each function. The subsequent arguments are passed on to every function.


@ -1,12 +1,6 @@
---
layout: default
title: Model assessment
output: bookdown::html_chapter
---
# Model assessment
```{r setup, include=FALSE}
```{r setup-model, include=FALSE}
library(purrr)
set.seed(1014)
options(digits = 3)


@ -1,7 +1,3 @@
---
layout: default
title: Models and visualisation
output: bookdown::html_chapter
---
# Model visualisation
Gap minder


@ -1,9 +1,3 @@
---
layout: default
title: Model
output: bookdown::html_chapter
---
# Model
After reading this chapter, what can you do that you couldn't before?


@ -1,11 +1,4 @@
---
layout: default
title: R Markdown
output: bookdown::html_chapter
---
# RMarkdown
# R Markdown
Recommendations for learning more about communication:


@ -1,7 +1 @@
---
layout: default
title: Shiny
output: bookdown::html_chapter
---
# Shiny


@ -1,13 +1,6 @@
---
layout: default
title: String manipulation
output: bookdown::html_chapter
---
# String manipulation
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```{r setup-strings, include = FALSE}
library(stringr)
common <- rcorpora::corpora("words/common")$commonWords
@ -71,8 +64,8 @@ str_length(NA)
The common `str_` prefix is particularly useful if you use RStudio, because typing `str_` will trigger autocomplete, allowing you to see all stringr functions:
```{r}
bookdown::embed_png("screenshots/stringr-autocomplete.png", dpi = 220)
```{r, echo = FALSE}
knitr::include_graphics("screenshots/stringr-autocomplete.png")
```
### Combining strings
@ -199,20 +192,20 @@ To learn regular expressions, we'll use `str_show()` and `str_show_all()`. These
The simplest patterns match exact strings:
```{r}
```{r, cache = FALSE}
x <- c("apple", "banana", "pear")
str_view(x, "an")
```
The next step up in complexity is `.`, which matches any character (except a new line):
```{r}
```{r, cache = FALSE}
str_view(x, ".a.")
```
But if "`.`" matches any character, how do you match an actual "`.`"? You need to use an "escape" to tell the regular expression you want to match it exactly, not use the special behaviour. The escape character used by regular expressions is `\`. Unfortunately, that's also the escape character used by strings, so to match a literal "`.`" you need to use `\\.`.
```{r}
```{r, cache = FALSE}
# To create the regular expression, we need \\
dot <- "\\."
@ -225,7 +218,7 @@ str_view(c("abc", "a.c", "bef"), "a\\.c")
If `\` is used as an escape character, how do you match a literal `\`? Well, you need to escape it, creating the regular expression `\\`. To create that regular expression, you need to use a string, which also needs to escape `\`. That means to match a literal `\` you need to write `"\\\\"` - you need four backslashes to match one!
```{r}
```{r, cache = FALSE}
x <- "a\\b"
writeLines(x)
@ -250,7 +243,7 @@ By default, regular expressions will match any part of a string. It's often usef
* `^` to match the start of the string.
* `$` to match the end of the string.
```{r}
```{r, cache = FALSE}
x <- c("apple", "banana", "pear")
str_view(x, "^a")
str_view(x, "a$")
@ -260,7 +253,7 @@ To remember which is which, try this mneomic which I learned from [Evan Misshula
To force a regular expression to only match a complete string, anchor it with both `^` and `$`:
```{r}
```{r, cache = FALSE}
x <- c("apple pie", "apple", "apple cake")
str_view(x, "apple")
str_view(x, "^apple$")
@ -301,13 +294,13 @@ Remember, to create a regular expression containing `\d` or `\s`, you'll need to
You can use _alternation_ to pick between one or more alternative patterns. For example, `abc|d..f` will match either `"abc"` or `"deaf"`. Note that the precedence for `|` is low, so that `abc|xyz` matches either `abc` or `xyz`, not `abcyz` or `abxyz`:
```{r}
```{r, cache = FALSE}
str_view(c("abc", "xyz"), "abc|xyz")
```
As with mathematical expressions, if precedence ever gets confusing, use parentheses to make it clear what you want:
```{r}
```{r, cache = FALSE}
str_view(c("grey", "gray"), "gr(e|a)y")
```
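The effect of precedence can also be checked without `str_view()`. The following is a base R sketch using `grepl()`; the vector `words` and the anchored patterns are made up for illustration and are not part of the chapter.

```{r}
# Base R sketch; the chapter itself uses stringr's str_view()
words <- c("grey", "gray", "groy", "gre", "ay")

# Parentheses limit the alternation to a single letter
grouped <- grepl("^gr(e|a)y$", words)

# Without parentheses, `|` splits the whole pattern into "^gre" or "ay$"
ungrouped <- grepl("^gre|ay$", words)

data.frame(words, grouped, ungrouped)
```

Only `"grey"` and `"gray"` match the grouped pattern, whereas the ungrouped pattern also accepts `"gre"` and `"ay"`.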
@ -373,7 +366,7 @@ Note that the precedence of these operators are high, so you can write: `colou?r
You learned about parentheses earlier as a way to disambiguate complex expressions. They do one other special thing: they also define numeric groups that you can refer to with _backreferences_, `\1`, `\2`, etc. For example, the following regular expression finds all fruits that have a repeated pair of letters.
```{r}
```{r, cache = FALSE}
str_view(fruit, "(..)\\1", match = TRUE)
```
@ -461,7 +454,7 @@ mean(str_count(common, "[aeiou]"))
Note that matches never overlap. For example, in `"abababa"`, how many times will the pattern `"aba"` match? Regular expressions say two, not three:
```{r}
```{r, cache = FALSE}
str_count("abababa", "aba")
str_view_all("abababa", "aba")
```
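The same non-overlapping behaviour can be seen with base R's `gregexpr()`. This is offered only as an analogue of `str_count()`, not as what stringr does internally.

```{r}
# Base R analogue: gregexpr() also reports non-overlapping matches
starts <- gregexpr("aba", "abababa")[[1]]

as.integer(starts)  # start positions of the matches
length(starts)      # number of matches
```

The second match starts at position 5 rather than 3, because the search resumes after the end of the first match.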
@ -510,7 +503,7 @@ head(matches)
Note that `str_extract()` only extracts the first match. We can see that most easily by first selecting all the sentences that have more than 1 match:
```{r}
```{r, cache = FALSE}
more <- sentences[str_count(sentences, colour_match) > 1]
str_view_all(more, colour_match)
@ -646,7 +639,7 @@ fields %>% str_split(": ", n = 2, simplify = TRUE)
Instead of splitting up strings by patterns, you can also split up by character, line, sentence and word `boundary()`s:
```{r}
```{r, cache = FALSE}
x <- "This is a sentence. This is another sentence."
str_view_all(x, boundary("word"))
@ -683,7 +676,7 @@ You can use the other arguments of `regex()` to control details of the match:
* `ignore_case = TRUE` allows characters to match either their uppercase or
lowercase forms. This always uses the current locale.
```{r}
```{r, cache = FALSE}
bananas <- c("banana", "Banana", "BANANA")
str_view(bananas, "banana")
str_view(bananas, regex("banana", ignore_case = TRUE))
@ -692,7 +685,7 @@ You can use the other arguments of `regex()` to control details of the match:
* `multiline = TRUE` allows `^` and `$` to match the start and end of each
line rather than the start and end of the complete string.
```{r}
```{r, cache = FALSE}
x <- "Line 1\nLine 2\nLine 3"
str_view_all(x, "^Line")
str_view_all(x, regex("^Line", multiline = TRUE))
@ -773,7 +766,7 @@ There are three other functions you can use instead of `regex()`:
* As you saw with `str_split()` you can use `boundary()` to match boundaries.
You can also use it with the other functions, although
```{r}
```{r, cache = FALSE}
x <- "This is a sentence."
str_view_all(x, boundary("word"))
str_extract_all(x, boundary("word"))


@ -1,9 +1,3 @@
---
layout: default
title: Tidy Data
output: bookdown::html_chapter
---
# Tidy data
> "Tidy datasets are all alike but every messy dataset is messy in its
@ -68,7 +62,10 @@ R follows a set of conventions that makes one layout of tabular data much easier
Data that satisfies these rules is known as *tidy data*. Notice that `table1` is tidy data.
`r bookdown::embed_png("images/tidy-1.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-1.png")
```
*In `table1`, each variable is placed in its own column, each observation in its own row, and each value in its own cell.*
Tidy data builds on a premise of data science that data sets contain *both values and relationships*. Tidy data displays the relationships in a data set as consistently as it displays the values in a data set.
@ -79,7 +76,10 @@ Tidy data works well with R because it takes advantage of R's traits as a vector
Tidy data arranges values so that the relationships between variables in a data set will parallel the relationship between vectors in R's storage objects. R stores tabular data as a data frame, a list of atomic vectors arranged to look like a table. Each column in the table is an atomic vector in the list. In tidy data, each variable in the data set is assigned to its own column, i.e., its own vector in the data frame.
`r bookdown::embed_png("images/tidy-2.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-2.png")
```
*A data frame is a list of vectors that R displays as a table. When your data is tidy, the values of each variable fall in their own column vector.*
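The claim that a data frame is a list of atomic vectors can be verified directly. The sketch below uses a hypothetical toy table `df`, loosely shaped like `table1`, with values invented for illustration.

```{r}
# Hypothetical toy table; the values are made up
df <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(1999, 2000, 1999, 2000),
  cases = c(10, 20, 30, 40)
)

is.list(df)                         # a data frame is a list ...
is.atomic(df$cases)                 # ... whose columns are atomic vectors
identical(df$cases, df[["cases"]])  # two equivalent ways to extract a column
```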
As a result, you can extract all of the values of a variable in a tidy data set by extracting the column vector that contains the variable. You can do this easily with R's list syntax, i.e.
@ -111,7 +111,9 @@ table1$population / table1$cases
To create the output, R applies the function in element-wise fashion: R first applies the function (or operation) to the first elements of each vector involved. Then R applies the function (or operation) to the second elements of each vector involved, and so on until R reaches the end of the vectors. If one vector is shorter than the others, R will recycle its values as needed (according to a set of recycling rules).
`r bookdown::embed_png("images/tidy-3.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-3.png")
```
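Element-wise execution and recycling can be illustrated with a few short vectors. The numbers below are hypothetical and chosen only to make the pairing easy to follow.

```{r}
# Hypothetical values, one element per observation
cases <- c(10, 20, 30, 40)
population <- c(100, 200, 300, 400)

# Values are paired element by element: first with first, second with second, ...
rate <- cases / population * 10000

# A shorter vector is recycled: c(2, 4) is reused as 2, 4, 2, 4
recycled <- c(1, 2, 3, 4) / c(2, 4)

rate
recycled
```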
If your data is tidy, element-wise execution will ensure that observations are preserved across functions and operations. Each value will only be paired with other values that appear in the same row of the data frame. In a tidy data frame, these values will be values of the same observation.
@ -129,7 +131,9 @@ If you use basic R syntax, your calculations will look like the code below. If y
#### Data set one
`r bookdown::embed_png("images/tidy-4.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-4.png")
```
Since `table1` is organized in a tidy fashion, you can calculate the rate like this,
@ -140,7 +144,9 @@ table1$cases / table1$population * 10000
#### Data set two
`r bookdown::embed_png("images/tidy-5.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-5.png")
```
Data set two intermingles the values of *population* and *cases* in the same column, *value*. As a result, you will need to untangle the values whenever you want to work with each variable separately.
@ -155,7 +161,9 @@ table2$value[case_rows] / table2$value[pop_rows] * 10000
#### Data set three
`r bookdown::embed_png("images/tidy-6.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-6.png")
```
Data set three combines the values of cases and population into the same cells. It may seem that this would help you calculate the rate, but that is not so. You will need to separate the population values from the cases values if you wish to do math with them. This can be done, but not with "basic" R syntax.
@ -166,7 +174,9 @@ Data set three combines the values of cases and population into the same cells.
#### Data set four
`r bookdown::embed_png("images/tidy-7.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-7.png")
```
Data set four stores the values of each variable in a different format: as a column, a set of column names, or a field of cells. As a result, you will need to work with each variable differently. This makes code written for data set four hard to generalize. The code that extracts the values of *year*, `names(table4)[-1]`, cannot be generalized to extract the values of population, `c(table5$1999, table5$2000, table5$2001)`. Compare this to data set one. With `table1`, you can use the same code to extract the values of year, `table1$year`, that you use to extract the values of population. To do so, you only need to change the name of the variable that you will access: `table1$population`.
@ -248,7 +258,10 @@ spread(table2, key, value)
`spread()` returns a copy of your data set that has had the key and value columns removed. In their place, `spread()` adds a new column for each unique key in the key column. These unique keys will form the column names of the new columns. `spread()` distributes the cells of the former value column across the cells of the new columns and truncates any non-key, non-value columns in a way that prevents duplication.
`r bookdown::embed_png("images/tidy-8.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-8.png")
```
*`spread()` distributes a pair of key:value columns into a field of cells. The unique keys in the key column become the column names of the field of cells.*
You can see that `spread()` maintains each of the relationships expressed in the original data set. The output contains the four original variables, *country*, *year*, *population*, and *cases*, and the values of these variables are grouped according to the original observations. As a bonus, now the layout of these relationships is tidy.
@ -279,7 +292,9 @@ gather(table4, "year", "cases", 2:3)
We've placed "key" in quotation marks because you will usually use `gather()` to create tidy data. In this case, the "key" column will contain values, not keys. The values will only be keys in the sense that they were formerly in the column names, a place where keys belong.
`r bookdown::embed_png("images/tidy-9.png", 220)`
```{r, echo = FALSE}
knitr::include_graphics("images/tidy-9.png")
```
Just like `spread()`, `gather()` maintains each of the relationships in the original data set. This time `table4` only contained three variables, *country*, *year* and *cases*. Each of these appears in the output of `gather()` in a tidy fashion.


@ -1,12 +1,6 @@
---
layout: default
title: Data transformation
output: bookdown::html_chapter
---
# Data transformation {#transform}
```{r setup, include=FALSE}
```{r setup-transform, include=FALSE}
library(dplyr)
library(nycflights13)
source("common.R")


@ -1,12 +1,6 @@
---
layout: default
title: Data Visualization
output: bookdown::html_chapter
---
# Data visualisation
```{r setup, include = FALSE}
```{r setup-visualise, include = FALSE}
knitr::opts_chunk$set(
cache = TRUE,
fig.path = "figures/"
@ -96,7 +90,9 @@ The graph shows a negative relationship between engine size (`displ`) and fuel e
One group of points seems to fall outside of the linear trend. These cars have a higher mileage than you might expect. Can you tell why? Before we examine these cars, let's review the code that made our graph.
`r bookdown::embed_png("images/visualization-1.png", dpi = 300)`
```{r, echo = FALSE}
knitr::include_graphics("images/visualization-1.png")
```
#### Template
@ -134,7 +130,9 @@ You can add a third value, like `class`, to a two dimensional scatterplot by map
An aesthetic is a visual property of the points in your plot. Aesthetics include things like the size, the shape, or the color of your points. You can display a point (like the one below) in different ways by changing the values of its aesthetic properties. Since we already use the word "value" to describe data, let's use the word "level" to describe aesthetic properties. Here we change the levels of a point's size, shape, and color to make the point small, triangular, or blue.
`r bookdown::embed_png("images/visualization-2.png", dpi = 300)`
```{r, echo = FALSE}
knitr::include_graphics("images/visualization-2.png")
```
You can convey information about your data by mapping the aesthetics in your plot to the variables in your data set. For example, we can map the colors of our points to the `class` variable. Then the color of each point will reveal its class affiliation.
@ -304,8 +302,6 @@ In practice, `ggplot2` will automatically detect when it needs to group the data
***
`r bookdown::embed_png("images/blank.png", dpi = 300)`
***
#### Layers
@ -532,12 +528,8 @@ Some graphs, like scatterplots, plot the raw values of your data set. Other grap
`ggplot2` calls the algorithm that a graph uses to transform raw data a _stat_, which is short for statistical transformation. Each geom in `ggplot2` is associated with a default stat that it uses to plot your data. `geom_bar()` uses the "count" stat, which computes a data set of counts for each x value from your raw data. `geom_bar()` then uses this computed data to make the plot.
`r bookdown::embed_png("images/blank.png", dpi = 300)`
A few geoms, like `geom_point()`, plot your raw data as it is. To keep things simple, let's imagine that these geoms also transform the data. They just use a very lame transformation, the identity transformation, which returns the data in its original state. Now we can say that _every_ geom uses a stat.
`r bookdown::embed_png("images/blank.png", dpi = 300)`
You can learn which stat a geom uses, as well as what variables it computes by visiting the geom's help page. For example, the help page of `geom_bar()` shows that it uses the count stat and that the count stat computes two new variables, `count` and `prop`. If you have an R session open---and you should!---you can verify this by running `?geom_bar` at the command line.
Stats are the most subtle part of plotting because you do not see them in action. `ggplot2` applies the transformation and stores the results behind the scenes. You only see the finished plot. Moreover, `ggplot2` applies stats automatically, with a very intuitive set of defaults. So why bother thinking about them? Because you can use stats to do three very useful things.
@ -589,7 +581,6 @@ Use consideration when you change a geom's stat. Many combinations of geoms and
***
`r bookdown::embed_png("images/blank.png", dpi = 300)`
***
@ -638,8 +629,6 @@ ggplot(data = diamonds) +
***
`r bookdown::embed_png("images/blank.png", dpi = 300)`
***
***
@ -724,8 +713,6 @@ To see how this works, consider how you could build a basic plot from scratch: y
***
`r bookdown::embed_png("images/blank.png", dpi = 300)`
***
Although this method may seem complicated, you could use it to build _any_ plot that you imagine. In other words, you can use the code template that you've learned in this chapter to build hundreds of thousands of unique plots.