From 29833762241c70e5a478241051a870ccdc9dae05 Mon Sep 17 00:00:00 2001
From: Mine Cetinkaya-Rundel
Date: Sat, 19 Nov 2022 15:46:51 -0500
Subject: [PATCH] Minor edits during read thru of whole game viz (#1144)

* Add bit on visualize and Quarto
* Minor edits during readthru
* Fix typo
---
 data-visualize.qmd | 23 ++++++++++++-----------
 preface-2e.qmd     |  4 ++++
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/data-visualize.qmd b/data-visualize.qmd
index 4eb2922..a53b096 100644
--- a/data-visualize.qmd
+++ b/data-visualize.qmd
@@ -29,7 +29,7 @@ library(tidyverse)
 That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
 It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
 
-If you run this code and get the error message "there is no package called 'tidyverse'", you'll need to first install it, then run `library()` once again.
+If you run this code and get the error message `there is no package called 'tidyverse'`, you'll need to first install it, then run `library()` once again.
 
 ```{r}
 #| eval: false
@@ -54,7 +54,7 @@ Nonlinear?
 
 You can test your answer with the `mpg` **data frame** found in ggplot2 (a.k.a. `ggplot2::mpg`).
 A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
-`mpg` contains observations collected by the US Environmental Protection Agency on 38 car models.
+`mpg` contains `r nrow(mpg)` observations collected by the US Environmental Protection Agency on `r mpg |> distinct(model) |> nrow()` car models.
 
 ```{r}
 mpg
@@ -62,16 +62,16 @@ mpg
 
 Among the variables in `mpg` are:
 
-1. `displ`, a car's engine size, in liters.
+1. `displ`: a car's engine size, in liters.
 
-2. `hwy`, a car's fuel efficiency on the highway, in miles per gallon (mpg).
+2. `hwy`: a car's fuel efficiency on the highway, in miles per gallon (mpg).
    A car with a low fuel efficiency consumes more fuel than a car with a high fuel efficiency when they travel the same distance.
 
 To learn more about `mpg`, open its help page by running `?mpg`.
 
 ### Creating a ggplot
 
-To plot `mpg`, run this code to put `displ` on the x-axis and `hwy` on the y-axis:
+To plot `mpg`, run this code to put `displ` on the x-axis, `hwy` on the y-axis, and represent each observation with a point:
 
 ```{r}
 #| fig-alt: >
@@ -88,6 +88,7 @@ Does this confirm or refute your hypothesis about fuel efficiency and engine siz
 
 With ggplot2, you begin a plot with the function `ggplot()`.
 `ggplot()` creates a coordinate system that you can add layers to.
+You can think of it like an empty canvas you'll paint the rest of your plot on, layer by layer.
 The first argument of `ggplot()` is the dataset to use in the graph.
 So `ggplot(data = mpg)` creates an empty graph, but it's not very interesting so we won't show it here.
 
@@ -151,7 +152,8 @@ How can you explain these cars?
 
 ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
   geom_point() +
-  geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 2.2)
+  geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 1.6) +
+  geom_point(data = dplyr::filter(mpg, displ > 5, hwy > 20), color = "red", size = 3, shape = "circle open")
 ```
 
 Let's hypothesize that the cars are hybrids.
@@ -211,7 +213,7 @@ These cars don't seem like hybrids, and are, in fact, sports cars!
 Sports cars have large engines like SUVs and pickup trucks, but small bodies like midsize and compact cars, which improves their gas mileage.
 In hindsight, these cars were unlikely to be hybrids since they have large engines.
 
-In the above example, we mapped `class` to the color aesthetic, but we could have mapped `class` to the size aesthetic in the same way.
+In the above example, we mapped `class` to the `color` aesthetic, but we could have mapped `class` to the `size` aesthetic in the same way.
 In this case, the exact size of each point would reveal its class affiliation.
 We get a *warning* here: mapping an unordered variable (`class`) to an ordered aesthetic (`size`) is generally not a good idea because it implies a ranking that does not in fact exist.
 
@@ -227,7 +229,7 @@ ggplot(data = mpg) +
   geom_point(mapping = aes(x = displ, y = hwy, size = class))
 ```
 
-Similarly, we could have mapped `class` to the *alpha* aesthetic, which controls the transparency of the points, or to the *shape* aesthetic, which controls the shape of the points.
+Similarly, we could have mapped `class` to the `alpha` aesthetic, which controls the transparency of the points, or to the `shape` aesthetic, which controls the shape of the points.
 
 ```{r}
 #| layout-ncol: 2
@@ -329,8 +331,7 @@ ggplot(shapes, aes(x, y)) +
 
 ### Exercises
 
-1. What's gone wrong with this code?
-   Why are the points not blue?
+1. Why did the following code not result in a plot with blue points?
 
    ```{r}
    #| fig-alt: >
@@ -386,7 +387,7 @@ Don't worry if the help doesn't seem that helpful - instead skip down to the exa
 
 If that doesn't help, carefully read the error message.
 Sometimes the answer will be buried there!
-But when you're new to R, the answer might be in the error message but you don't yet know how to understand it.
+But when you're new to R, even if the answer is in the error message, you might not yet know how to understand it.
 Another great tool is Google: try googling the error message, as it's likely someone else has had the same problem, and has gotten help online.
 
 ## Facets
diff --git a/preface-2e.qmd b/preface-2e.qmd
index 468439d..2162b1d 100644
--- a/preface-2e.qmd
+++ b/preface-2e.qmd
@@ -7,6 +7,8 @@ Welcome to the second edition of "R for Data Science".
 - The first part is renamed to "whole game" to reflect the entire data science cycle.
   It gains a new chapter that briefly introduces the basics of reading data from csv files.
 
+- We've added a new part called visualize.
+
 - The wrangle part is now transform and gains new chapters on numbers, logical vectors, and missing values.
   These were previously parts of the data transformation chapter, but needed much more room.
 
@@ -19,6 +21,8 @@ Welcome to the second edition of "R for Data Science".
 
 - We've switched from the magrittr pipe to the base pipe.
 
+- The communicate part now features writing computational documents with Quarto.
+
 ## Acknowledgements {.unnumbered}
 
 *TO DO: Add acknowledgements.*