diff --git a/diagrams/join-anti.png b/diagrams/join-anti.png
new file mode 100644
index 0000000..0d30477
Binary files /dev/null and b/diagrams/join-anti.png differ
diff --git a/diagrams/join-inner.png b/diagrams/join-inner.png
index 4a18e8f..18e996d 100644
Binary files a/diagrams/join-inner.png and b/diagrams/join-inner.png differ
diff --git a/diagrams/join-many-to-many.png b/diagrams/join-many-to-many.png
index b00deaf..c8bb150 100644
Binary files a/diagrams/join-many-to-many.png and b/diagrams/join-many-to-many.png differ
diff --git a/diagrams/join-one-to-many.png b/diagrams/join-one-to-many.png
index 8d9bd44..6ddab8b 100644
Binary files a/diagrams/join-one-to-many.png and b/diagrams/join-one-to-many.png differ
diff --git a/diagrams/join-outer.png b/diagrams/join-outer.png
index fc9c361..946a696 100644
Binary files a/diagrams/join-outer.png and b/diagrams/join-outer.png differ
diff --git a/diagrams/join-semi-many.png b/diagrams/join-semi-many.png
new file mode 100644
index 0000000..5ddd109
Binary files /dev/null and b/diagrams/join-semi-many.png differ
diff --git a/diagrams/join-semi.png b/diagrams/join-semi.png
new file mode 100644
index 0000000..ded6df1
Binary files /dev/null and b/diagrams/join-semi.png differ
diff --git a/diagrams/join.graffle b/diagrams/join.graffle
index e2924ce..4159281 100644
Binary files a/diagrams/join.graffle and b/diagrams/join.graffle differ
diff --git a/relational-data.Rmd b/relational-data.Rmd
index 2a7fc6d..31a5cee 100644
--- a/relational-data.Rmd
+++ b/relational-data.Rmd
@@ -388,7 +388,25 @@ Instead you can use a semi-join, which connects the two tables like a mutating j
 flights %>% semi_join(top_dest)
 ```
 
-The inverse of a semi-join is an anti-join. An anti-join keeps the rows that _don't_ have a match, and are useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`:
+Graphically, a semi-join looks like this:
+
+```{r, echo = FALSE, out.width = "50%"}
+knitr::include_graphics("diagrams/join-semi.png")
+```
+
+Only the existence of a match is important; it doesn't matter which observation is matched. This means that filtering joins never duplicate rows like mutating joins do:
+
+```{r, echo = FALSE, out.width = "50%"}
+knitr::include_graphics("diagrams/join-semi-many.png")
+```
+
+The inverse of a semi-join is an anti-join. An anti-join keeps the rows that _don't_ have a match:
+
+```{r, echo = FALSE, out.width = "50%"}
+knitr::include_graphics("diagrams/join-anti.png")
+```
+
+Anti-joins are useful for diagnosing join mismatches. For example, when connecting `flights` and `planes`, you might be interested to know that there are many `flights` that don't have a match in `planes`:
 
 ```{r}
 flights %>%