diff --git a/diagrams/transform-join-types.png b/diagrams/transform-join-types.png
new file mode 100644
index 0000000..ffa6a2c
Binary files /dev/null and b/diagrams/transform-join-types.png differ
diff --git a/diagrams/transform.graffle b/diagrams/transform.graffle
index 0ba1937..b033cf9 100644
Binary files a/diagrams/transform.graffle and b/diagrams/transform.graffle differ
diff --git a/transform.Rmd b/transform.Rmd
index 8ee649f..ff531ef 100644
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -1086,13 +1086,13 @@ The left, right and full joins are collectively known as __outer joins__. When a
 
 --------------------------------------------------------------------------------
 
-`base::merge()` can mimic all four types of mutating join. The advantages of the specific dplyr verbs is that they more clearly convey the intent of your code (the difference between the joins is really important but concealed in the arguments of `merge()`). dplyr's joins are also much faster than `merge()` and don't mess with the order of the rows.
+`base::merge()` can mimic all four types of mutating join. The advantages of the specific dplyr verbs are that they more clearly convey the intent of your code (the difference between the joins is really important but concealed in the arguments of `merge()`) and are considerably faster. dplyr's joins also don't mess with the order of the rows.
 
 --------------------------------------------------------------------------------
 
 #### New observations
 
-Note that mutating joins are primarily used to add new variables, but they can also generate new "observations". If a match is not unique, a join will add all possible combinations (the Cartesian product) of the matching observations:
+The mutating joins are primarily used to add new variables, but they can also generate new "observations". If a match is not unique, a join will add all possible combinations (the Cartesian product) of the matching observations:
 
 ```{r}
 df1 <- data_frame(x = c(1, 1, 2), y = 1:3)
@@ -1146,7 +1146,7 @@ Filtering joins match obserations in the same way as mutating joins, but affect
 * `semi_join(x, y)` __keeps__ all observations in `x` that have a match in `y`.
 * `anti_join(x, y)` __drops__ all observations in `x` that have a match in `y`.
 
-Semi joins are for matching filtered summary tables back to the original rows. For example, imagine you've found the top ten most popular destinations:
+Semi joins are useful for matching filtered summary tables back to the original rows. For example, imagine you've found the top ten most popular destinations:
 
 ```{r}
 top_dest <- flights %>%