Feedback from O'Reilly + style fixes
parent f0b19065c7
commit 19c89ebf64
@@ -353,7 +353,8 @@ flights_dt |>
   group_by(minute) |>
   summarize(
     avg_delay = mean(dep_delay, na.rm = TRUE),
-    n = n()) |>
+    n = n()
+  ) |>
   ggplot(aes(minute, avg_delay)) +
     geom_line()
 ```
@@ -371,23 +372,30 @@ sched_dep <- flights_dt |>
   group_by(minute) |>
   summarize(
     avg_delay = mean(arr_delay, na.rm = TRUE),
-    n = n())
+    n = n()
+  )
 
 ggplot(sched_dep, aes(minute, avg_delay)) +
   geom_line()
 ```
 
 So why do we see that pattern with the actual departure times?
-Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times.
+Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times, as @fig-human-rounding shows.
 Always be alert for this sort of pattern whenever you work with data that involves human judgement!
 
 ```{r}
+#| label: fig-human-rounding
+#| fig-cap: >
+#| A frequency polygon showing the number of flights scheduled to
+#| depart each hour. You can see a strong preference for round numbers
+#| like 0 and 30 and generally for numbers that are a multiple of five.
 #| fig-alt: >
 #| A line plot with departure minute (0-60) on the x-axis and number of
 #| flights (0-60000) on the y-axis. Most flights are scheduled to depart
 #| on either the hour (~60,000) or the half hour (~35,000). Otherwise,
 #| all most all flights are scheduled to depart on multiples of five,
 #| with a few extra at 15, 45, and 55 minutes.
+#| echo: false
 ggplot(sched_dep, aes(minute, n)) +
   geom_line()
 ```
@@ -443,7 +451,8 @@ flights_dt |>
 
 ### Modifying components
 
-You can also use each accessor function to modify the components of a date/time:
+You can also use each accessor function to modify the components of a date/time.
+This doesn't come up much in data analysis, but can be useful when cleaning data that has clearly incorrect dates.
 
 ```{r}
 (datetime <- ymd_hms("2026-07-08 12:34:56"))
@@ -490,7 +499,7 @@ update(ymd("2023-02-01"), hour = 400)
 
 6. What makes the distribution of `diamonds$carat` and `flights$sched_dep_time` similar?
 
-7. Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early.
+7. Confirm our hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early.
    Hint: create a binary variable that tells you whether or not a flight was delayed.
 
 ## Time spans