Polish pipes chapter

This commit is contained in:
Hadley Wickham 2022-02-16 15:51:27 -06:00
parent 6376a68ebf
commit 1b2a1b4b35
1 changed files with 78 additions and 53 deletions


@@ -6,80 +6,105 @@ status("restructuring")
## Introduction
Pipes are a powerful tool for clearly expressing a sequence of multiple operations.
We briefly introduced them in the previous chapter but before going too much farther I wanted to explain a little more about how they work and give a splash of history.
### Prerequisites
The pipe `|>` is built into R itself so you don't need anything else 😄.
But we'll also discuss another historically important pipe, `%>%`, which is provided by the core tidyverse package magrittr.
```{r setup, message = FALSE}
library(tidyverse)
```
The pipe, `|>`, is a powerful tool for clearly expressing a sequence of multiple operations.
We briefly introduced it in the previous chapter, but before going much further I wanted to give a little more motivation, discuss another important pipe (`%>%`), and discuss one challenge of the pipe.
## Why use a pipe?
The point of the pipe is to help you write code in a way that is easier to read and understand.
Imagine you wanted to express the following sequence of actions as R code: find keys, unlock car, start car, drive to work, park.
You could write it as nested function calls:
Each individual dplyr function is quite simple, so to solve complex problems you'll typically need to combine multiple verbs together.
The end of the last chapter finished with a moderately complex pipe:
```{r, eval = FALSE}
park(drive(start_car(find("keys")), to = "work"))
flights |>
filter(!is.na(arr_delay), !is.na(tailnum)) |>
group_by(tailnum) |>
summarise(
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
```
But writing it out using the pipe gives it a more natural and easier to read structure:
Even though this pipe has four steps, it's quite easy to skim to get the main meaning: we start with flights, then filter, then group, then summarize.
What would happen if we didn't have the pipe?
We can still solve this same problem but we'd need to nest each function call inside the previous:
```{r, eval = FALSE}
find("keys") |>
start_car() |>
drive(to = "work") |>
park()
summarise(
group_by(
filter(
flights,
!is.na(arr_delay), !is.na(tailnum)
),
tailnum
),
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
```
Behind the scenes, the pipe actually transforms your code to the first form.
In other words, `x |> f(y)` is equivalent to `f(x, y)`.
Or use a bunch of intermediate variables:
```{r, eval = FALSE}
flights1 <- filter(flights, !is.na(arr_delay), !is.na(tailnum))
flights2 <- group_by(flights1, tailnum)
flights3 <- summarise(flights2,
delay = mean(arr_delay, na.rm = TRUE),
n = n()
)
```
While both of these forms have their uses, the pipe generally produces code that is easier to read and easier to write.
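As a minimal sketch of how the three forms relate, the following uses only base R functions and a made-up vector (`values` is purely illustrative, not from the chapter). It assumes R 4.1.0 or later, where `|>` is available, and shows that the parser turns a piped call into the nested call, so all three styles compute the same result.

```r
# The parser rewrites a piped call into an ordinary nested call,
# so quoting the piped expression shows the nested form.
quote(x |> f(y))

# Hypothetical input, chosen only for illustration
values <- c(1, 4, 9)

# 1. Piped form
piped <- values |> sqrt() |> sum()

# 2. Nested form
nested <- sum(sqrt(values))

# 3. Intermediate variables
roots <- sqrt(values)
stepwise <- sum(roots)

# All three styles give the same answer
identical(piped, nested)
identical(nested, stepwise)
```

Because the rewrite happens at parse time, the choice between these forms is about readability rather than behavior.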
## magrittr and the `%>%` pipe
If you've been using the tidyverse for a while, you might be more familiar with `%>%` than `|>`.
`%>%` comes from the **magrittr** package by Stefan Milton Bache and has been available since 2014.
This pipe was so successful that in 2021 the base pipe, `|>`, was added to R 4.1.0.
If you've been using the tidyverse for a while, you might be more familiar with the `%>%` pipe provided by the **magrittr** package by Stefan Milton Bache.
The magrittr package is included in the core tidyverse, so you can use `%>%` whenever you use the tidyverse:
`|>` is inspired by `%>%`, and the tidyverse team was involved in its design.
`|>` offers fewer features than `%>%`, but we largely believe this to be a feature.
`%>%` was an experiment and included many speculative features that seemed like a good idea at the time, but in hindsight added too much complexity relative to their advantages.
The development of the base pipe gave us an opportunity to reset back to the most useful core.
```{r, message = FALSE}
library(tidyverse)
## Changing the argument
There is one feature that `%>%` has that `|>` currently lacks: a very easy way to change which argument you pass the object to --- you just put a `.` where you want the object on the left of the pipe to go.
Ironically this is particularly important for many base functions which were designed well before the pipe existed.
One particularly challenging example is extracting a single column out of a data frame with `$`.
With `%>%` you can write the fairly straightforward:
```{r}
mtcars %>% .$cyl
mtcars %>%
group_by(cyl) %>%
summarise(n = n())
```
But the base pipe requires the rather cryptic:
For simple cases `|>` and `%>%` behave identically.
So why do we recommend the base pipe?
Firstly, because it's part of base R, it's always available for you to use, even when you're not using the tidyverse.
Secondly, the `|>` is quite a bit simpler than the magrittr pipe.
In the 7 years between the invention of `%>%` in 2014 and the inclusion of `|>` in R 4.1.0 in 2021, we homed in on the core strengths of the pipe, allowing the base implementation to jettison esoteric and relatively unimportant features.
```{r}
mtcars |> (`$`)(cyl)
```
### Key differences
Fortunately, dplyr provides a way out of this common problem with `pull`:
If you haven't used `%>%` you can skip this section; if you have, read on to learn about the most important differences.
```{r}
mtcars |> pull(cyl)
```
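If you would rather not depend on dplyr, here is a hedged sketch of two base-only ways to extract a column at the end of a pipe. It assumes R 4.1.0 or later (for `|>` and the `\(x)` lambda shorthand); the variable names are illustrative only.

```r
# Option 1: wrap `$` in an anonymous function and call it immediately
cyl_lambda <- mtcars |> (\(df) df$cyl)()

# Option 2: use base::getElement(), which takes the object as its first argument
cyl_element <- mtcars |> getElement("cyl")

# Both match ordinary `$` extraction
identical(cyl_lambda, mtcars$cyl)
identical(cyl_element, mtcars$cyl)
```

Neither is as readable as `pull()`, which is one reason to prefer it when dplyr is already loaded.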
- `%>%` allows you to use `.` as a placeholder to control how the object on the left is passed to the function on the right.
R 4.2.0 will bring a `_` as a placeholder with the additional restriction that it must be named.
magrittr offers a number of other variations on the pipe that you might want to learn about.
We don't teach them here because none of them has been sufficiently popular that you could reasonably expect a randomly chosen R user to recognize them.
- The base pipe `|>` doesn't support any of the more complex uses of `.` such as passing `.` to more than one argument, or the special behavior when used with `.`.
In R 4.2, the base pipe will gain its own placeholder, `_`.
The placeholder must be passed to a named argument.
It doesn't solve the problem above, but it helps out in lots of other places.
- The base pipe doesn't yet provide a convenient way to use `$` (and similar functions).
With magrittr, you can write:
Expect it to continue to evolve.
```{r}
mtcars %>% .$cyl
```
With the base pipe you instead need the rather cryptic:
```{r}
mtcars |> (`$`)(cyl)
```
Fortunately, you can instead use `dplyr::pull()`:
```{r}
mtcars |> pull(cyl)
```
- When calling a function with no arguments, `%>%` lets you drop the parentheses and write (e.g.) `x %>% ungroup`.
The parentheses are always required with `|>`.
- With `%>%`, starting a pipe with `.`, like `. %>% group_by(x) %>% summarise(x)`, creates a function rather than immediately performing the pipe.
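The differences listed above can be sketched side by side. This is an illustrative example rather than part of the original chapter: it assumes magrittr is installed and R 4.2.0 or later (for the `_` placeholder), and the object names (`fit_magrittr`, `fit_base`, `root_sum`) are made up for the demonstration.

```r
library(magrittr)

# Placeholder: magrittr uses `.`, the base pipe (R >= 4.2) uses `_`,
# which must be supplied to a named argument.
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
fit_base <- mtcars |> lm(mpg ~ wt, data = _)
all.equal(coef(fit_magrittr), coef(fit_base))

# Parentheses: optional with `%>%`, always required with `|>`.
n_magrittr <- mtcars %>% nrow
n_base <- mtcars |> nrow()

# A magrittr pipe that starts with `.` defines a function
# instead of running immediately.
root_sum <- . %>% sqrt() %>% sum()
root_sum(c(1, 4, 9))
```

The base pipe has no equivalent of the last form; an ordinary function definition such as `\(x) x |> sqrt() |> sum()` does the same job.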