Thinking about how to better convey join types

hadley 2016-01-05 10:14:47 -06:00
parent 94d64a3fc3
commit f37fd2033e
3 changed files with 3 additions and 3 deletions

Binary file not shown.

Binary file not shown.

@ -1086,13 +1086,13 @@ The left, right and full joins are collectively known as __outer joins__. When a
--------------------------------------------------------------------------------
`base::merge()` can mimic all four types of mutating join. The advantages of the specific dplyr verbs is that they more clearly convey the intent of your code (the difference between the joins is really important but concealed in the arguments of `merge()`). dplyr's joins are also much faster than `merge()` and don't mess with the order of the rows.
`base::merge()` can mimic all four types of mutating join. The advantages of the specific dplyr verbs is that they more clearly convey the intent of your code (the difference between the joins is really important but concealed in the arguments of `merge()`), and are considerably faster. dplyr's joins also don't mess with the order of the rows.
--------------------------------------------------------------------------------
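A minimal sketch of the row-order point, not taken from the commit itself: the tables `x` and `y` and their columns are made up for illustration, and `tibble()` is used here in place of the `data_frame()` call that appears elsewhere in the chapter. It assumes `merge()` is left at its default `sort = TRUE`.

```{r}
library(dplyr)

# Hypothetical example tables; keys in x are deliberately out of order
x <- tibble(key = c(3, 1, 2), val_x = c("x3", "x1", "x2"))
y <- tibble(key = c(1, 2), val_y = c("y1", "y2"))

# left_join() states the join type in its name and keeps x's row order
joined <- left_join(x, y, by = "key")
joined$key

# merge() expresses the same left join through all.x = TRUE and,
# with the default sort = TRUE, returns rows sorted by the key
merged <- merge(x, y, by = "key", all.x = TRUE)
merged$key
```

Both calls return the same three rows; only the join type's visibility and the row order differ.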
#### New observations
Note that mutating joins are primarily used to add new variables, but they can also generate new "observations". If a match is not unique, a join will add all possible combinations (the Cartesian product) of the matching observations:
The mutating joins are primarily used to add new variables, but they can also generate new "observations". If a match is not unique, a join will add all possible combinations (the Cartesian product) of the matching observations:
```{r}
df1 <- data_frame(x = c(1, 1, 2), y = 1:3)
@ -1146,7 +1146,7 @@ Filtering joins match observations in the same way as mutating joins, but affect
* `semi_join(x, y)` __keeps__ all observations in `x` that have a match in `y`.
* `anti_join(x, y)` __drops__ all observations in `x` that have a match in `y`.
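
As a small illustration of the two filtering joins, here is a sketch on made-up tables (`x`, `y`, and their columns are hypothetical and not part of the commit). It assumes a current dplyr, where `tibble()` replaces the older `data_frame()`.

```{r}
library(dplyr)

# Hypothetical tables: key 2 appears twice in y, key 3 has no match in y
x <- tibble(key = c(1, 2, 3), val_x = c("x1", "x2", "x3"))
y <- tibble(key = c(1, 2, 2), val_y = c("y1", "y2", "y3"))

# Keeps the rows of x with a match in y; no columns from y are added,
# and the duplicated key in y does not duplicate rows of x
kept <- semi_join(x, y, by = "key")

# Drops the rows of x with a match in y, leaving only the unmatched row
dropped <- anti_join(x, y, by = "key")
```

Here `kept` holds the rows with keys 1 and 2, and `dropped` holds the single row with key 3. Unlike a mutating join, neither result gains the `val_y` column.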
Semi joins are for matching filtered summary tables back to the original rows. For example, imagine you've found the top ten most popular destinations:
Semi joins are useful for matching filtered summary tables back to the original rows. For example, imagine you've found the top ten most popular destinations:
```{r}
top_dest <- flights %>%