Typo fixes, closes #1029

Mine Çetinkaya-Rundel 2022-06-01 00:15:55 -04:00
parent dd64a615bd
commit 57206d364a
1 changed file with 13 additions and 13 deletions


@@ -9,8 +9,8 @@ status("polishing")
## Introduction
You've already learned the basics of missing values earlier in the the book.
You first saw them in @sec-summarize where they interfered with computing summary statistics, and you learned about their their infectious nature and how to check for their presence in @sec-na-comparison.
You've already learned the basics of missing values earlier in the book.
You first saw them in @sec-summarize where they interfered with computing summary statistics, and you learned about their infectious nature and how to check for their presence in @sec-na-comparison.
Now we'll come back to them in more depth, so you can learn more of the details.
We'll start by discussing some general tools for working with missing values recorded as `NA`s.
@@ -19,7 +19,7 @@ We'll finish off with a related discussion of empty groups, caused by factor lev
### Prerequisites
The functions for working will missing data mostly come from dplyr and tidyr, which are core members of the tidyverse.
The functions for working with missing data mostly come from dplyr and tidyr, which are core members of the tidyverse.
```{r}
#| label: setup
@@ -56,7 +56,7 @@ treatment |>
```
This treatment is sometimes called "last observation carried forward", or **locf** for short.
You can use the `direction` argument to fill in missing values that have been generated in more exotic ways.
You can use the `.direction` argument to fill in missing values that have been generated in more exotic ways.
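As a rough sketch of how this works (the `df`, `day`, and `measurement` names below are invented for illustration, not taken from the chapter's `treatment` data), `fill()` carries values downward by default and `.direction = "up"` carries them upward instead:

```{r}
library(tidyr)

# Invented example data with gaps to fill
df <- tibble::tibble(
  day = 1:4,
  measurement = c(NA, 12, NA, 15)
)

df |> fill(measurement)                     # last observation carried forward
df |> fill(measurement, .direction = "up")  # next observation carried backward
```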
### Fixed values
@@ -79,7 +79,7 @@ df |>
### Sentinel values
Sometimes you'll hit the opposite problem where some conrete value actually represents as a missing value.
Sometimes you'll hit the opposite problem where some concrete value actually represents a missing value.
This typically arises in data generated by older software that doesn't have a proper way to represent missing values, so it must instead use some special value like 99 or -999.
If possible, handle this when reading in the data, for example, by using the `na` argument to `readr::read_csv()`.
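As a hedged sketch of both approaches (the file name and the `x` column below are hypothetical), you can either declare the sentinel while importing or replace it afterwards with `dplyr::na_if()`:

```{r}
library(dplyr)

# Option 1 (commented out because the file is hypothetical):
# declare the sentinel when reading, via read_csv()'s `na` argument.
# df <- readr::read_csv("measurements.csv", na = c("", "NA", "-999"))

# Option 2: replace the sentinel after the fact with na_if().
df <- tibble(x = c(1, 4, 5, 7, -999, 0, 20, -999))
df |> mutate(x = na_if(x, -999))
```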
@@ -101,7 +101,7 @@ df |>
### NaN
Before we continue, there's one special type of missing value that you'll encounter from time-to-time: a `NaN` (pronounced "nan"), or **n**ot **a** **n**umber.
Before we continue, there's one special type of missing value that you'll encounter from time to time: a `NaN` (pronounced "nan"), or **n**ot **a** **n**umber.
It's not that important to know about because it generally behaves just like `NA`:
```{r}
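# A sketch, not the chapter's own example: NaN typically comes from
# indeterminate arithmetic such as 0 / 0, and is.na() treats it as missing.
x <- c(NA, NaN)
x * 10
x == 1
is.na(x)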
@@ -156,7 +156,7 @@ The following sections discuss some tools for moving between implicit and explic
You've already seen one tool that can make implicit missings explicit and vice versa: pivoting.
Making data wider can make implicit missing values explicit because every combination of the rows and new columns must have some value.
For example, if we pivot `stocks` to put the `quarter` in the columns, both missing become values explicit:
For example, if we pivot `stocks` to put the `quarter` in the columns, both missing values become explicit:
```{r}
stocks |>
@@ -166,7 +166,7 @@ stocks |>
)
```
By default, making data longer preserves explicit missing values, but if they are structural missing values that only exist because the data is not tidy, you can drop them (make them implicit) by setting `drop_na = TRUE`.
By default, making data longer preserves explicit missing values, but if they are structurally missing values that only exist because the data is not tidy, you can drop them (make them implicit) by setting `values_drop_na = TRUE`.
See the examples in @sec-tidy-data for more details.
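A minimal sketch of `values_drop_na` (the small `stocks2` tibble below is invented for illustration; it is not the chapter's `stocks` data):

```{r}
library(tidyr)

# Invented wide data: the explicit NA exists only because of the wide layout
stocks2 <- tibble::tibble(
  year = c(2020, 2021),
  q1   = c(1.88, 0.92),
  q2   = c(0.59, NA)
)

# values_drop_na = TRUE drops that row, making the missing value implicit again
stocks2 |>
  pivot_longer(
    cols = q1:q2,
    names_to = "quarter",
    values_to = "price",
    values_drop_na = TRUE
  )
```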
### Complete
@@ -196,9 +196,9 @@ In that case, you can do manually what `complete()` does for you: create a data
### Joins
This brings us to another important way of revealing implicitly missing observations: joins.
Often you can only know that values are missing when from one dataset when you go to join it to another.
Often you can only know that values are missing from one dataset when you go to join it to another.
`dplyr::anti_join()` is particularly useful at revealing these values.
The following example shows how two `anti_join()`s reveals that we're missing information for four airports and 722 planes.
The following example shows how two `anti_join()`s reveal that we're missing information for four airports and 722 planes.
```{r}
library(nycflights13)
@@ -246,7 +246,7 @@ health |> count(smoker, .drop = FALSE)
```
The same principle applies to ggplot2's discrete axes, which will also drop levels that don't have any values.
You can force them to display with by supplying `drop = FALSE` to the appropriate discrete axis:
You can force them to display by supplying `drop = FALSE` to the appropriate discrete axis:
```{r}
#| layout-ncol: 2
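# A sketch assuming the `health` data frame (with its factor `smoker` column)
# from earlier in the chapter and ggplot2 loaded in the setup chunk;
# drop = FALSE keeps empty factor levels on the discrete axis.
ggplot(health, aes(x = smoker)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE)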
@@ -283,7 +283,7 @@ health |>
```
We get some interesting results here because when summarizing an empty group, the summary functions are applied to zero-length vectors.
There's an important distinction between empty vectors, which have length 0, and missing values, which each have length 1.
There's an important distinction between empty vectors, which have length 0, and missing values, each of which has length 1.
```{r}
# A vector containing two missing values
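x <- c(NA, NA)  # (sketch) each NA has length 1, but this vector has length 2
length(x)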
@@ -301,7 +301,7 @@ Here we see `mean(age)` returning `NaN` because `mean(age)` = `sum(age)/length(a
[^missing-values-1]: In other words, `min(c(x, y))` is always equal to `min(min(x), min(y)).`
A sometimes simpler approach is to perform the summary and then make the implicit missings explicit with `complete()`.
Sometimes a simpler approach is to perform the summary and then make the implicit missings explicit with `complete()`.
```{r}
health |>