Mention distinct

Hadley Wickham 2022-11-21 08:42:55 -06:00
parent bdc3555b9a
commit 1477dd6fd3
1 changed files with 24 additions and 3 deletions

@@ -96,6 +96,7 @@ Let's dive in!
The most important verbs that operate on rows are `filter()`, which changes which rows are present without changing their order, and `arrange()`, which changes the order of the rows without changing which are present.
Both functions only affect the rows, and the columns are left unchanged.
We'll also discuss `distinct()`, which finds rows with unique values; unlike `arrange()` and `filter()`, it can also optionally modify the columns.
### `filter()`
@@ -197,6 +198,23 @@ flights |>
  arrange(desc(arr_delay))
```
### `distinct()`
`distinct()` finds all the unique rows in a dataset, so in a technical sense, it primarily operates on the rows.
Most of the time, however, you'll want the distinct combination of some variables, so you can also optionally supply column names:
```{r}
# This would remove any duplicate rows if there were any
flights |>
  distinct()

# This finds all unique origin and destination pairs.
flights |>
  distinct(origin, dest)
```
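When column names are supplied, the result keeps only those columns, which is the sense in which `distinct()` can modify the columns. The following is a minimal sketch of that behavior, not part of the original chapter: `trips` is a small made-up tibble standing in for `flights`, and `.keep_all = TRUE` is the dplyr argument that retains the remaining columns.

```{r}
library(dplyr)

# Hypothetical toy data standing in for `flights`
trips <- tibble(
  origin = c("JFK", "JFK", "LGA", "LGA"),
  dest   = c("LAX", "LAX", "ORD", "BOS"),
  delay  = c(5, 12, -3, 20)
)

# Only the named columns survive
pairs <- trips |>
  distinct(origin, dest)

# .keep_all = TRUE keeps every column, taking the first row of each pair
first_rows <- trips |>
  distinct(origin, dest, .keep_all = TRUE)
```

With `.keep_all = TRUE`, the values in the other columns come from the first occurrence of each combination, so the result depends on row order.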
Note that if you want to find the number of duplicates, or rows that weren't duplicated, you're better off swapping `distinct()` for `count()` and then filtering as needed.
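One way this might look, as a sketch under assumptions: `trips` is again a small made-up tibble rather than `flights`, and the thresholds `n > 1` and `n == 1` are just one reasonable reading of "duplicates" and "rows that weren't duplicated".

```{r}
library(dplyr)

# Hypothetical toy data standing in for `flights`
trips <- tibble(
  origin = c("JFK", "JFK", "LGA", "LGA"),
  dest   = c("LAX", "LAX", "ORD", "BOS")
)

# Count how many times each origin/destination pair occurs
pair_counts <- trips |>
  count(origin, dest)

# Pairs that occur more than once
duplicated_pairs <- pair_counts |>
  filter(n > 1)

# Pairs that occur exactly once
unique_pairs <- pair_counts |>
  filter(n == 1)
```

`count()` adds a column `n` with the number of rows in each group, so the usual `filter()` conditions apply to it directly.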
### Exercises
1. Find all flights that
@@ -213,10 +231,12 @@ flights |>
3. Sort `flights` to find the fastest flights (Hint: try sorting by a calculation).
4. Was there a flight on every day of 2017?
5. Which flights traveled the farthest distance?
Which traveled the least distance?
6. Does it matter what order you used `filter()` and `arrange()` in if you're using both?
Why/why not?
Think about the results and how much work the functions would have to do.
@@ -224,6 +244,7 @@ flights |>
There are four important verbs that affect the columns without changing the rows: `mutate()`, `select()`, `rename()`, and `relocate()`.
`mutate()` creates new columns that are functions of the existing columns; `select()`, `rename()`, and `relocate()` change which columns are present, their names, or their positions.
We'll also discuss `pull()`, since it allows you to get a column out of a data frame.
### `mutate()` {#sec-mutate}