Fix 4 typos in joins (#1431)

Peter Baumgartner 2023-04-17 14:18:48 +02:00 committed by GitHub
parent cbcf1e0d8b
commit c31137b0c6
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 5 additions and 5 deletions

@@ -269,7 +269,7 @@ flights2 |>
```
We get a lot of missing matches because our join is trying to use `tailnum` and `year` as a compound key.
-Both `flights` and `planes` have a `year` column but they mean different things: `flights$year` is year the flight occurred and `planes$year` is the year the plane was built.
+Both `flights` and `planes` have a `year` column but they mean different things: `flights$year` is the year the flight occurred and `planes$year` is the year the plane was built.
We only want to join on `tailnum` so we need to provide an explicit specification with `join_by()`:
```{r}
@@ -627,7 +627,7 @@ df1 |>
If you are doing this deliberately, you can set `relationship = "many-to-many"`, as the warning suggests.
-### Filtering joins {#sec-non-equi-joins}
+### Filtering joins
The number of matches also determines the behavior of the filtering joins.
The semi-join keeps rows in `x` that have one or more matches in `y`, as in @fig-join-semi.
@@ -664,7 +664,7 @@ knitr::include_graphics("diagrams/join/semi.png", dpi = 270)
knitr::include_graphics("diagrams/join/anti.png", dpi = 270)
```
-## Non-equi joins
+## Non-equi joins {#sec-non-equi-joins}
So far you've only seen equi-joins, joins where the rows match if the `x` key equals the `y` key.
Now we're going to relax that restriction and discuss other ways of determining if a pair of rows match.
@@ -841,7 +841,7 @@ Overlap joins provide three helpers that use inequality joins to make it easier
Let's continue the birthday example to see how you might use them.
There's one problem with the strategy we used above: there's no party preceding the birthdays Jan 1-9.
-So it might be better to to be explicit about the date ranges that each party spans, and make a special case for those early birthdays:
+So it might be better to be explicit about the date ranges that each party spans, and make a special case for those early birthdays:
```{r}
parties <- tibble(
@@ -854,7 +854,7 @@ parties
```
Hadley is hopelessly bad at data entry so he also wanted to check that the party periods don't overlap.
-One way to do this is by using a self-join to check to if any start-end interval overlap with another:
+One way to do this is by using a self-join to check if any start-end interval overlap with another:
```{r}
parties |>