diff --git a/transform.Rmd b/transform.Rmd
index 5028b2a..3f45a07 100644
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -125,7 +125,7 @@ filter(flights, !(arr_delay > 120 | dep_delay > 120))
 filter(flights, arr_delay <= 120, dep_delay <= 120)
 ```
 
-As well as `&` and `|`, R also has `&&` and `||`. Don't use them here! You'll when you should use them in [conditional execution].
+As well as `&` and `|`, R also has `&&` and `||`. Don't use them here! You'll learn when you should use them in [conditional execution].
 
 Sometimes you want to find all rows after the first `TRUE`, or all rows until the first `FALSE`. The window functions `cumany()` and `cumall()` allow you to find these values:
 
@@ -309,7 +309,7 @@ select(flights, time_hour, air_time, everything())
     vars <- c("year", "month", "day", "dep_delay", "arr_delay")
     ```
 
-1. Does the result of running the following code suprise you? How do the
+1. Does the result of running the following code surprise you? How do the
     select helpers deal with case by default? How can you change that default?
 
     ```{r, eval = FALSE}
@@ -784,7 +784,7 @@ daily <- group_by(flights, year, month, day)
 (per_year <- summarise(per_month, flights = sum(flights)))
 ```
 
-Be careful when progressively rolling up summaries: it's OK for sums and counts, but you need to think about weighting means and variances, and it's not possible to do it exactly for rank-based statistics like the median. In otherwords, the sum of groupwise sums is the overall sum, but the median of groupwise medians is not the overall median.
+Be careful when progressively rolling up summaries: it's OK for sums and counts, but you need to think about weighting means and variances, and it's not possible to do it exactly for rank-based statistics like the median. In other words, the sum of groupwise sums is the overall sum, but the median of groupwise medians is not the overall median.
 
 ### Ungrouping
 
@@ -814,7 +814,7 @@ daily %>%
     Which is more important: arrival delay or departure delay?
 
 1. Our definition of cancelled flights (`!is.na(dep_delay) & !is.na(arr_delay)`
-    ) is slightly sup-optimal. Why? Which is the most important column?
+    ) is slightly suboptimal. Why? Which is the most important column?
 
 1. Look at the number of cancelled flights per day. Is there a pattern?
     Is the proportion of cancelled flights related to the average delay?
@@ -874,7 +874,7 @@ Functions that work most naturally in grouped mutates and filters are known as
 1. Delays are typically temporally correlated: even once the problem that
     caused the initial delay has been resolved, later flights are delayed
     to allow earlier flights to leave. Using `lag()` explore how the delay
-    of a flight is related to the delay of the immediately preceeding flight.
+    of a flight is related to the delay of the immediately preceding flight.
 
 1. Look at each destination. Can you find flights that are suspiciously
     fast? (i.e. flights that represent a potential data entry error). Compute