# Workflow: Pipes {#workflow-pipes}
```{r, results = "asis", echo = FALSE}
status("restructuring")
```
## Introduction
Pipes are a powerful tool for clearly expressing a sequence of multiple operations.
We briefly introduced them in the previous chapter, but before going much further I want to explain a little more about how they work and give a splash of history.
### Prerequisites
The pipe `|>` is built into R itself so you don't need anything else 😄.
But we'll also discuss another historically important pipe, `%>%`, which is provided by the core tidyverse package magrittr.
```{r setup, message = FALSE}
library(tidyverse)
```
## Why use a pipe?
The point of the pipe is to help you write code in a way that is easier to read and understand.
Imagine you wanted to express the following sequence of actions as R code: find keys, start car, drive to work, park.
You could write it as nested function calls:
```{r, eval = FALSE}
park(drive(start_car(find("keys")), to = "work"))
```
But writing it out with the pipe gives it a more natural, easier-to-read structure:
```{r, eval = FALSE}
find("keys") |>
  start_car() |>
  drive(to = "work") |>
  park()
```
Behind the scenes, the pipe actually transforms your code to the first form.
In other words, `x |> f(y)` is equivalent to `f(x, y)`.
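One way to see this rewriting for yourself is to capture a piped expression with `quote()` instead of running it.
This is a minimal sketch, assuming R 4.1 or later; `x`, `f`, and `y` are placeholder names that never need to exist because nothing is evaluated:

```{r}
# quote() captures an expression without evaluating it, so x, f, and y
# are only placeholder names here.
piped <- quote(x |> f(y))
nested <- quote(f(x, y))

# The parser has already rewritten the pipe into an ordinary call.
piped
identical(piped, nested)
```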
## magrittr and the `%>%` pipe
If you've been using the tidyverse for a while, you might be more familiar with `%>%` than `|>`.
`%>%` comes from the **magrittr** package by Stefan Milton Bache and has been available since 2014.
This pipe was so successful that in 2021 the base pipe, `|>`, was added to R in version 4.1.0.
`|>` is inspired by `%>%`, and the tidyverse team was involved in its design.
`|>` offers fewer features than `%>%`, but we largely believe this to be a feature.
`%>%` was an experiment and included many speculative features that seemed like a good idea at the time, but in hindsight added too much complexity relative to their advantages.
The development of the base pipe gave us an opportunity to reset back to the most useful core.
## Changing the argument
There is one feature that `%>%` has that `|>` currently lacks: a very easy way to change which argument you pass the object to --- you just put a `.` where you want the object on the left of the pipe to go.
Ironically, this is particularly important for many base functions, which were designed well before the pipe existed.
One particularly challenging example is extracting a single column out of a data frame with `$`.
With `%>%` you can write the fairly straightforward:
```{r}
mtcars %>% .$cyl
```
But the base pipe requires the rather cryptic:
```{r}
mtcars |> (`$`)(cyl)
```
Fortunately, dplyr provides a way out of this common problem with `pull()`:
```{r}
mtcars |> pull(cyl)
```
magrittr offers a number of other variations on the pipe that you might want to learn about.
We don't teach them here because none of them has been sufficiently popular that you could reasonably expect a randomly chosen R user to recognize them.
R 4.2 adds a placeholder of its own, `_`, to the base pipe.
The placeholder must be supplied to a named argument.
That doesn't solve the `$` problem above, but it helps out in lots of other places.
We expect the base pipe to continue to evolve.
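Here is a minimal sketch of the `_` placeholder, assuming R 4.2 or later (older versions fail to parse this chunk).
The model is only an illustration; the point is that `_` is passed to a named argument, here `data`:

```{r}
# `_` marks where the left-hand side goes; it must be passed by name.
fit <- mtcars |> lm(mpg ~ wt, data = _)
coef(fit)
```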
## When not to use the pipe
The pipe is such fun to use that it's easy to go overboard and use pipes when better alternatives exist.
Pipes are most useful for rewriting a fairly short linear sequence of operations.
I think you should reach for another tool when:
- Your pipes are longer than (say) ten steps.
In that case, create intermediate objects with meaningful names.
That will make debugging easier, because you can more easily check the intermediate results, and it makes it easier to understand your code, because the variable names can help communicate intent.
- You have multiple inputs or outputs.
If there isn't one primary object being transformed, but two or more objects being combined together, don't use the pipe.
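As a rough illustration of the first point, here is a pipeline split into named intermediate objects.
The steps and the object names (`heavy_cars`, `mpg_by_cyl`) are made up for this sketch; it assumes dplyr is installed and uses the built-in `mtcars` data:

```{r}
library(dplyr)

# Each intermediate object gets a name that says what it holds.
heavy_cars <- mtcars |>
  filter(wt > 3)

mpg_by_cyl <- heavy_cars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))

# Intermediate results are easy to inspect while debugging.
nrow(heavy_cars)
mpg_by_cyl
```

If a later step gives a surprising answer, you can look at `heavy_cars` directly instead of re-running the pipeline piece by piece.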