Minor edits during read thru of whole game viz (#1144)

* Add bit on visualize and Quarto

* Minor edits during readthru

* Fix typo
Mine Cetinkaya-Rundel 2022-11-19 15:46:51 -05:00 committed by GitHub
parent 5e47710f81
commit 2983376224
2 changed files with 16 additions and 11 deletions


@@ -29,7 +29,7 @@ library(tidyverse)
 That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
 It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
-If you run this code and get the error message "there is no package called 'tidyverse'", you'll need to first install it, then run `library()` once again.
+If you run this code and get the error message `there is no package called 'tidyverse'`, you'll need to first install it, then run `library()` once again.
 ```{r}
 #| eval: false
@@ -54,7 +54,7 @@ Nonlinear?
 You can test your answer with the `mpg` **data frame** found in ggplot2 (a.k.a. `ggplot2::mpg`).
 A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
-`mpg` contains observations collected by the US Environmental Protection Agency on 38 car models.
+`mpg` contains `r nrow(mpg)` observations collected by the US Environmental Protection Agency on `r mpg |> distinct(model) |> nrow()` car models.
 ```{r}
 mpg
@@ -62,16 +62,16 @@ mpg
 Among the variables in `mpg` are:
-1. `displ`, a car's engine size, in liters.
+1. `displ`: a car's engine size, in liters.
-2. `hwy`, a car's fuel efficiency on the highway, in miles per gallon (mpg).
+2. `hwy`: a car's fuel efficiency on the highway, in miles per gallon (mpg).
 A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
 To learn more about `mpg`, open its help page by running `?mpg`.
 ### Creating a ggplot
-To plot `mpg`, run this code to put `displ` on the x-axis and `hwy` on the y-axis:
+To plot `mpg`, run this code to put `displ` on the x-axis, `hwy` on the y-axis, and represent each observation with a point:
 ```{r}
 #| fig-alt: >
@@ -88,6 +88,7 @@ Does this confirm or refute your hypothesis about fuel efficiency and engine siz
 With ggplot2, you begin a plot with the function `ggplot()`.
 `ggplot()` creates a coordinate system that you can add layers to.
+You can think of it like an empty canvas you'll paint the rest of your plot on, layer by layer.
 The first argument of `ggplot()` is the dataset to use in the graph.
 So `ggplot(data = mpg)` creates an empty graph, but it's not very interesting so we won't show it here.
@@ -151,7 +152,8 @@ How can you explain these cars?
 ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
   geom_point() +
-  geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 2.2)
+  geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 1.6) +
+  geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 3, shape = "circle open")
 ```
 Let's hypothesize that the cars are hybrids.
@@ -211,7 +213,7 @@ These cars don't seem like hybrids, and are, in fact, sports cars!
 Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage.
 In hindsight, these cars were unlikely to be hybrids since they have large engines.
-In the above example, we mapped `class` to the color aesthetic, but we could have mapped `class` to the size aesthetic in the same way.
+In the above example, we mapped `class` to the `color` aesthetic, but we could have mapped `class` to the `size` aesthetic in the same way.
 In this case, the exact size of each point would reveal its class affiliation.
 We get a *warning* here: mapping an unordered variable (`class`) to an ordered aesthetic (`size`) is generally not a good idea because it implies a ranking that does not in fact exist.
@@ -227,7 +229,7 @@ ggplot(data = mpg) +
   geom_point(mapping = aes(x = displ, y = hwy, size = class))
 ```
-Similarly, we could have mapped `class` to the *alpha* aesthetic, which controls the transparency of the points, or to the *shape* aesthetic, which controls the shape of the points.
+Similarly, we could have mapped `class` to the `alpha` aesthetic, which controls the transparency of the points, or to the `shape` aesthetic, which controls the shape of the points.
 ```{r}
 #| layout-ncol: 2
@@ -329,8 +331,7 @@ ggplot(shapes, aes(x, y)) +
 ### Exercises
-1. What's gone wrong with this code?
-Why are the points not blue?
+1. Why did the following code not result in a plot with blue points?
 ```{r}
 #| fig-alt: >
@@ -386,7 +387,7 @@ Don't worry if the help doesn't seem that helpful - instead skip down to the exa
 If that doesn't help, carefully read the error message.
 Sometimes the answer will be buried there!
-But when you're new to R, the answer might be in the error message but you don't yet know how to understand it.
+But when you're new to R, even if the answer is in the error message, you might not yet know how to understand it.
 Another great tool is Google: try googling the error message, as it's likely someone else has had the same problem, and has gotten help online.
 ## Facets


@@ -7,6 +7,8 @@ Welcome to the second edition of "R for Data Science".
 - The first part is renamed to "whole game" to reflect the entire data science cycle.
 It gains a new chapter that briefly introduces the basics of reading data from csv files.
+- We've added a new part called visualize.
 - The wrangle part is now transform and gains new chapters on numbers, logical vectors, and missing values.
 These were previously parts of the data transformation chapter, but needed much more room.
@@ -19,6 +21,8 @@ Welcome to the second edition of "R for Data Science".
 - We've switched from the magrittr pipe to the base pipe.
+- The communicate part now features writing computational documents with Quarto.
 ## Acknowledgements {.unnumbered}
 *TO DO: Add acknowledgements.*