Join diagram tweaking.

Add diagrams for filtering joins.
hadley 2016-01-14 09:12:14 -06:00
parent b41d39feba
commit 61b6f0d934
9 changed files with 19 additions and 1 deletion

Binary files (previews not shown):
diagrams/join-anti.png: new, 26 KiB
Four modified images (names not shown): 29 KiB → 29 KiB, 46 KiB → 46 KiB, 32 KiB → 33 KiB, 116 KiB → 118 KiB
diagrams/join-semi-many.png: new, 36 KiB
diagrams/join-semi.png: new, 26 KiB
One further binary file (name not shown)

@@ -388,7 +388,25 @@ Instead you can use a semi-join, which connects the two tables like a mutating j
flights %>% semi_join(top_dest)
```
Graphically, a semi-join looks like this:
```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("diagrams/join-semi.png")
```
Only the existence of a match is important; it doesn't matter which observation is matched. This means that, unlike mutating joins, filtering joins never duplicate rows:
```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("diagrams/join-semi-many.png")
```
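The same behaviour can be checked in code. This is a minimal sketch that uses two small, hypothetical tibbles `x` and `y`, made up for illustration, and assumes dplyr is installed. Key 1 appears twice in `y`. An inner join therefore duplicates the matching row of `x`, but a semi-join keeps it exactly once:

```{r}
library(dplyr)

# Hypothetical toy tables: key 1 has two matches in y, key 3 has none
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 1, 2), val_y = c("y1", "y2", "y3"))

# Mutating join: one output row per matching pair, so key 1 appears twice
inner <- inner_join(x, y, by = "key")
nrow(inner)

# Filtering join: each row of x is kept at most once, and only x's columns survive
semi <- semi_join(x, y, by = "key")
nrow(semi)
```

Here `inner` has three rows (key 1 twice, key 2 once). `semi` has two rows and only the columns of `x`.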
The inverse of a semi-join is an anti-join. An anti-join keeps the rows that _don't_ have a match:
```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("diagrams/join-anti.png")
```
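As a minimal sketch, again with two small, hypothetical tibbles `x` and `y` (made up for illustration; assumes dplyr is installed), an anti-join returns the rows of `x` that have no match in `y`. Every row of `x` either has a match or it doesn't. So, with the same `by` columns, the semi-join and anti-join together split `x` into two non-overlapping pieces:

```{r}
library(dplyr)

# Hypothetical toy tables: key 3 in x has no match in y
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 1, 2), val_y = c("y1", "y2", "y3"))

# Rows of x that DO have a match in y
matched <- semi_join(x, y, by = "key")

# Rows of x that do NOT have a match in y
unmatched <- anti_join(x, y, by = "key")
unmatched

# Together the two filtering joins account for every row of x exactly once
nrow(matched) + nrow(unmatched) == nrow(x)
```

`unmatched` contains only the row with key 3, which is the kind of orphan row an anti-join helps you spot.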
Anti-joins are useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`:
```{r}
flights %>%