Mention distinct

Hadley Wickham 2022-11-21 08:42:55 -06:00
parent bdc3555b9a
commit 1477dd6fd3
1 changed files with 24 additions and 3 deletions

@ -96,6 +96,7 @@ Let's dive in!
The most important verbs that operate on rows are `filter()`, which changes which rows are present without changing their order, and `arrange()`, which changes the order of the rows without changing which are present.
Both functions only affect the rows, and the columns are left unchanged.
We'll also discuss `distinct()`, which finds rows with unique values; unlike `arrange()` and `filter()`, it can also optionally modify the columns.
### `filter()`
@ -197,6 +198,23 @@ flights |>
arrange(desc(arr_delay))
```
### `distinct()`
`distinct()` finds all the unique rows in a dataset, so in a technical sense, it primarily operates on the rows.
Most of the time, however, you'll want the distinct combination of some variables, so you can also optionally supply column names:
```{r}
# This would remove any duplicate rows if there were any
flights |>
  distinct()

# This finds all unique origin and destination pairs.
flights |>
  distinct(origin, dest)
```
If you want to find the number of duplicates, or the rows that weren't duplicated, you're better off swapping `distinct()` for `count()` and then filtering as needed.
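
As a minimal sketch of that `count()`-then-filter approach, the chunk below counts how often each origin and destination pair occurs and then splits the result by whether the pair is repeated. It assumes `flights` comes from the nycflights13 package and that dplyr is loaded; the names `pair_counts`, `duplicated_pairs`, and `unique_pairs` are made up for illustration.

```{r}
library(dplyr)
library(nycflights13) # assumption: this is where `flights` comes from

# One row per origin/destination pair, with the number of matching rows in `n`
pair_counts <- flights |>
  count(origin, dest, sort = TRUE)

# Pairs that appear more than once, i.e. pairs that have duplicates
duplicated_pairs <- pair_counts |>
  filter(n > 1)

# Pairs that appear exactly once, i.e. rows that weren't duplicated
unique_pairs <- pair_counts |>
  filter(n == 1)
```

`pair_counts` has the same rows as `distinct(origin, dest)` would return, plus the extra `n` column, which is what makes the follow-up filtering possible.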
### Exercises
1. Find all flights that
@ -213,10 +231,12 @@ flights |>
3. Sort `flights` to find the fastest flights (Hint: try sorting by a calculation).
4. Which flights traveled the farthest?
Which traveled the shortest?
4. Was there a flight on every day of 2013?
5. Does it matter what order you used `filter()` and `arrange()` in if you're using both?
5. Which flights traveled the farthest distance?
Which traveled the least distance?
6. Does it matter what order you used `filter()` and `arrange()` in if you're using both?
Why/why not?
Think about the results and how much work the functions would have to do.
@ -224,6 +244,7 @@ flights |>
There are four important verbs that affect the columns without changing the rows: `mutate()`, `select()`, `rename()`, and `relocate()`.
`mutate()` creates new columns that are functions of the existing columns; `select()`, `rename()`, and `relocate()` change which columns are present, their names, or their positions.
We'll also discuss `pull()` since it allows you to get a column out of a data frame.
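
To make the difference concrete, here is a small sketch contrasting `pull()` with `select()`. It assumes `flights` is the nycflights13 dataset and that dplyr is loaded; `dest_vec` and `dest_df` are illustrative names only.

```{r}
library(dplyr)
library(nycflights13) # assumption: this is where `flights` comes from

# pull() returns the column itself as a plain vector
dest_vec <- flights |>
  pull(dest)

# select() keeps the result as a data frame with a single column
dest_df <- flights |>
  select(dest)
```

Both results hold the same values; the difference is only in the container, which matters when the next function expects a vector rather than a data frame.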
### `mutate()` {#sec-mutate}