From c31137b0c6aea0d4acac581762c7f09ffbb59c4c Mon Sep 17 00:00:00 2001
From: Peter Baumgartner
Date: Mon, 17 Apr 2023 14:18:48 +0200
Subject: [PATCH] Fix 4 typos in joins (#1431)

---
 joins.qmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/joins.qmd b/joins.qmd
index cf4ab4d..bf778e5 100644
--- a/joins.qmd
+++ b/joins.qmd
@@ -269,7 +269,7 @@ flights2 |>
 ```
 
 We get a lot of missing matches because our join is trying to use `tailnum` and `year` as a compound key.
-Both `flights` and `planes` have a `year` column but they mean different things: `flights$year` is year the flight occurred and `planes$year` is the year the plane was built.
+Both `flights` and `planes` have a `year` column but they mean different things: `flights$year` is the year the flight occurred and `planes$year` is the year the plane was built.
 We only want to join on `tailnum` so we need to provide an explicit specification with `join_by()`:
 
 ```{r}
@@ -627,7 +627,7 @@ df1 |>
 
 If you are doing this deliberately, you can set `relationship = "many-to-many"`, as the warning suggests.
 
-### Filtering joins {#sec-non-equi-joins}
+### Filtering joins
 
 The number of matches also determines the behavior of the filtering joins.
 The semi-join keeps rows in `x` that have one or more matches in `y`, as in @fig-join-semi.
@@ -664,7 +664,7 @@ knitr::include_graphics("diagrams/join/semi.png", dpi = 270)
 knitr::include_graphics("diagrams/join/anti.png", dpi = 270)
 ```
 
-## Non-equi joins
+## Non-equi joins {#sec-non-equi-joins}
 
 So far you've only seen equi-joins, joins where the rows match if the `x` key equals the `y` key.
 Now we're going to relax that restriction and discuss other ways of determining if a pair of rows match.
@@ -841,7 +841,7 @@ Overlap joins provide three helpers that use inequality joins to make it easier
 
 Let's continue the birthday example to see how you might use them.
 There's one problem with the strategy we used above: there's no party preceding the birthdays Jan 1-9.
-So it might be better to to be explicit about the date ranges that each party spans, and make a special case for those early birthdays:
+So it might be better to be explicit about the date ranges that each party spans, and make a special case for those early birthdays:
 
 ```{r}
 parties <- tibble(
@@ -854,7 +854,7 @@ parties
 ```
 
 Hadley is hopelessly bad at data entry so he also wanted to check that the party periods don't overlap.
-One way to do this is by using a self-join to check to if any start-end interval overlap with another:
+One way to do this is by using a self-join to check if any start-end interval overlap with another:
 
 ```{r}
 parties |>