diff --git a/transform.Rmd b/transform.Rmd
index efaa7c9..8c721f0 100644
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -818,7 +818,7 @@ daily %>%
    `not_cancelled %>% count(tailnum, wt = distance)` (without using
    `count()`).
 
-1. Our definition of cancelled flights (`!is.na(dep_delay) & !is.na(arr_delay)`
+1. Our definition of cancelled flights (`is.na(dep_delay) | is.na(arr_delay)`
    ) is slightly suboptimal. Why? Which is the most important column?
 
 1. Look at the number of cancelled flights per day. Is there a pattern?