diff --git a/logicals.Rmd b/logicals.Rmd
index 7a2bd7e..cdf1af4 100644
--- a/logicals.Rmd
+++ b/logicals.Rmd
@@ -12,7 +12,8 @@ You'll find logical vectors directly in data relatively rarely, but despite that
 
 We'll begin with the most common way of creating logical vectors: numeric comparisons.
 Then we'll talk about using Boolean algebra to combine different logical vectors, and some useful summaries for logical vectors.
-We'll finish off with some other tool for making conditional changes
+We'll finish off with some other tools for making conditional changes.
+Along the way, you'll also learn a little more about working with missing values, `NA`.
 
 ### Prerequisites
 
@@ -269,6 +270,10 @@ Similar reasoning applies with `NA & FALSE`.
 ### Exercises
 
 1. Find all flights where `arr_delay` is missing but `dep_delay` is not. Find all flights where neither `arr_time` nor `sched_arr_time` are missing, but `arr_delay` is.
+2. How many flights have a missing `dep_time`? What other variables are missing? What might these rows represent?
+3. How could you use `arrange()` to sort all missing values to the start? (Hint: use `!is.na()`).
+4. Come up with another approach that will give you the same output as `not_cancelled |> count(dest)` and `not_cancelled |> count(tailnum, wt = distance)` (without using `count()`).
+5. Look at the number of cancelled flights per day. Is there a pattern? Is the proportion of cancelled flights related to the average delay?
 
 ## Summaries
 
@@ -416,3 +421,4 @@ df |> filter(cumall(!(balance < 0)))
 ###
 
 ##
+
diff --git a/missing-values.Rmd b/missing-values.Rmd
index a74fc95..a70bf74 100644
--- a/missing-values.Rmd
+++ b/missing-values.Rmd
@@ -18,31 +18,6 @@ Missing topics:
 
 - `coalesce()` and `na_if()`
 
-## Basics
-
-### Missing values {#missing-values-filter}
-
-If you want to determine if a value is missing, use `is.na()`:
-
-```{r}
-is.na(x)
-```
-
-### Exercises
-
-1. How many flights have a missing `dep_time`?
-   What other variables are missing?
-   What might these rows represent?
-
-2. How could you use `arrange()` to sort all missing values to the start?
-   (Hint: use `!is.na()`).
-
-3. Come up with another approach that will give you the same output as `not_cancelled |> count(dest)` and `not_cancelled |> count(tailnum, wt = distance)` (without using `count()`).
-
-4. Look at the number of cancelled flights per day.
-   Is there a pattern?
-   Is the proportion of cancelled flights related to the average delay?
-
 ## Explicit vs implicit missing values {#missing-values-tidy}
 
 Changing the representation of a dataset brings up an important subtlety of missing values.