Move more from missing values; fix build failure

Hadley Wickham 2022-03-21 10:39:43 -05:00
parent 66675a2600
commit 005969424e
2 changed files with 7 additions and 26 deletions


@@ -12,7 +12,8 @@ You'll find logical vectors directly in data relatively rarely, but despite that
We'll begin with the most common way of creating logical vectors: numeric comparisons.
Then we'll talk about using Boolean algebra to combine different logical vectors, and some useful summaries for logical vectors.
We'll finish off with some other tool for making conditional changes
We'll finish off with some other tool for making conditional changes.
Along the way, you'll also learn a little more about working with missing values, `NA`.
### Prerequisites
@@ -269,6 +270,10 @@ Similar reasoning applies with `NA & FALSE`.
### Exercises
1. Find all flights where `arr_delay` is missing but `dep_delay` is not. Find all flights where neither `arr_time` nor `sched_arr_time` are missing, but `arr_delay` is.
2. How many flights have a missing `dep_time`? What other variables are missing? What might these rows represent?
3. How could you use `arrange()` to sort all missing values to the start? (Hint: use `!is.na()`).
4. Come up with another approach that will give you the same output as `not_cancelled |> count(dest)` and `not_cancelled |> count(tailnum, wt = distance)` (without using `count()`).
5. Look at the number of cancelled flights per day. Is there a pattern? Is the proportion of cancelled flights related to the average delay?
## Summaries
@@ -416,3 +421,4 @@ df |> filter(cumall(!(balance < 0)))
###
##


@@ -18,31 +18,6 @@ Missing topics:
- `coalesce()` and `na_if()`
## Basics
### Missing values {#missing-values-filter}
If you want to determine if a value is missing, use `is.na()`:
```{r}
is.na(x)
```
### Exercises
1. How many flights have a missing `dep_time`?
What other variables are missing?
What might these rows represent?
2. How could you use `arrange()` to sort all missing values to the start?
(Hint: use `!is.na()`).
3. Come up with another approach that will give you the same output as `not_cancelled |> count(dest)` and `not_cancelled |> count(tailnum, wt = distance)` (without using `count()`).
4. Look at the number of cancelled flights per day.
Is there a pattern?
Is the proportion of cancelled flights related to the average delay?
## Explicit vs implicit missing values {#missing-values-tidy}
Changing the representation of a dataset brings up an important subtlety of missing values.