Feedback from O'Reilly + style fixes

2022-11-23 11:55:08 -06:00 · 2022-11-23 11:55:08 -06:00 · 19c89ebf64
parent f0b19065c7
commit 19c89ebf64
1 changed file with 18 additions and 9 deletions
--- a/datetimes.qmd
+++ b/datetimes.qmd
@@ -334,7 +334,7 @@ We can use `wday()` to see that more flights depart during the week than on the
 flights_dt |> 
  mutate(wday = wday(dep_time, label = TRUE)) |> 
  ggplot(aes(x = wday)) +
-    geom_bar()
+  geom_bar()
 ```
 There's an interesting pattern if we look at the average departure delay by minute within the hour.
@@ -353,9 +353,10 @@ flights_dt |>
  group_by(minute) |> 
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE),
-    n = n()) |> 
+    n = n()
  ) |> 
  ggplot(aes(minute, avg_delay)) +
-    geom_line()
+  geom_line()
 ```
 Interestingly, if we look at the *scheduled* departure time we don't see such a strong pattern:
@@ -371,23 +372,30 @@ sched_dep <- flights_dt |>
  group_by(minute) |> 
  summarize(
    avg_delay = mean(arr_delay, na.rm = TRUE),
-    n = n())
+    n = n()
  )
 ggplot(sched_dep, aes(minute, avg_delay)) +
  geom_line()
 ```
 So why do we see that pattern with the actual departure times?
-Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times.
+Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times, as @fig-human-rounding shows.
 Always be alert for this sort of pattern whenever you work with data that involves human judgement!
 ```{r}
 #| label: fig-human-rounding
 #| fig-cap: >
 #|   A frequency polygon showing the number of flights scheduled to 
 #|   depart each hour. You can see a strong preference for round numbers
 #|   like 0 and 30 and generally for numbers that are a multiple of five.
 #| fig-alt: >
 #|   A line plot with departure minute (0-60) on the x-axis and number of
 #|   flights (0-60000) on the y-axis. Most flights are scheduled to depart
 #|   on either the hour (~60,000) or the half hour (~35,000). Otherwise,
 #|   all most all flights are scheduled to depart on multiples of five, 
 #|   with a few extra at 15, 45, and 55 minutes.
 #| echo: false
 ggplot(sched_dep, aes(minute, n)) +
  geom_line()
 ```
@@ -421,7 +429,7 @@ You can use rounding to show the distribution of flights across the course of a
 flights_dt |> 
  mutate(dep_hour = dep_time - floor_date(dep_time, "day")) |> 
  ggplot(aes(dep_hour)) +
-    geom_freqpoly(binwidth = 60 * 30)
+  geom_freqpoly(binwidth = 60 * 30)
 ```
 Computing the difference between a pair of date-times yields a difftime (more on that in @sec-intervals).
@@ -438,12 +446,13 @@ We can convert that to an `hms` object to get a more useful x-axis:
 flights_dt |> 
  mutate(dep_hour = hms::as_hms(dep_time - floor_date(dep_time, "day"))) |> 
  ggplot(aes(dep_hour)) +
-    geom_freqpoly(binwidth = 60 * 30)
+  geom_freqpoly(binwidth = 60 * 30)
 ```
 ### Modifying components
-You can also use each accessor function to modify the components of a date/time:
+You can also use each accessor function to modify the components of a date/time.
 This doesn't come up much in data analysis, but can be useful when cleaning data that has clearly incorrect dates.
 ```{r}
 (datetime <- ymd_hms("2026-07-08 12:34:56"))
@@ -490,7 +499,7 @@ update(ymd("2023-02-01"), hour = 400)
 6.  What makes the distribution of `diamonds$carat` and `flights$sched_dep_time` similar?
-7.  Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early.
+7.  Confirm our hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early.
    Hint: create a binary variable that tells you whether or not a flight was delayed.
 ## Time spans