117 lines
5.2 KiB
Plaintext
117 lines
5.2 KiB
Plaintext
# Workflow: Pipes {#workflow-pipes}
|
|
|
|
```{r, results = "asis", echo = FALSE}
|
|
status("complete")
|
|
```
|
|
|
|
The pipe, `|>`, is a powerful tool for clearly expressing a sequence of operations that transform an object.
|
|
We briefly introduced pipes in the previous chapter, but before going too much farther, we want to give a few more details and discuss `%>%`, a predecessor to `|>`.
|
|
|
|
To add the pipe to your code, we recommend using the build-in keyboard shortcut Ctrl/Cmd + Shift + M.
|
|
You'll need to make one change to your RStudio options to use `|>` instead of `%>%` as shown in Figure \@ref(fig:pipe-options); more on `%>%` shortly.
|
|
|
|
```{r pipe-options, out.width = NULL, echo = FALSE}
|
|
#| fig.cap: >
|
|
#| To insert `|>`, make sure the "Use native pipe" option is checked.
|
|
#| fig.alt: >
|
|
#| Screenshot showing the "Use native pipe operator" option which can
|
|
#| be found on the "Editing" panel of the "Code" options.
|
|
knitr::include_graphics("screenshots/rstudio-pipe-options.png")
|
|
```
|
|
|
|
## Why use a pipe?
|
|
|
|
Each individual dplyr verb is quite simple, so solving complex problems typically requires combining multiple verbs.
|
|
For example, the last chapter finished with a moderately complex pipe:
|
|
|
|
```{r, eval = FALSE}
|
|
flights |>
|
|
filter(!is.na(arr_delay), !is.na(tailnum)) |>
|
|
group_by(tailnum) |>
|
|
summarise(
|
|
delay = mean(arr_delay, na.rm = TRUE),
|
|
n = n()
|
|
)
|
|
```
|
|
|
|
Even though this pipe has four steps, it's easy to skim because the verbs come at the start of each line: start with the flights data, then filter, then group, then summarize.
|
|
|
|
What would happen if we didn't have the pipe?
|
|
We could nest each function call inside the previous call:
|
|
|
|
```{r, eval = FALSE}
|
|
summarise(
|
|
group_by(
|
|
filter(
|
|
flights,
|
|
!is.na(arr_delay), !is.na(tailnum)
|
|
),
|
|
tailnum
|
|
),
|
|
delay = mean(arr_delay, na.rm = TRUE
|
|
),
|
|
n = n()
|
|
)
|
|
```
|
|
|
|
Or we could use a bunch of intermediate variables:
|
|
|
|
```{r, eval = FALSE}
|
|
flights1 <- filter(flights, !is.na(arr_delay), !is.na(tailnum))
|
|
flights2 <- group_by(flights1, tailnum)
|
|
flights3 <- summarise(flight2,
|
|
delay = mean(arr_delay, na.rm = TRUE),
|
|
n = n()
|
|
)
|
|
```
|
|
|
|
While both of these forms have their time and place, the pipe generally produces data analysis code that's both easier to write and easier to read.
|
|
|
|
## magrittr and the `%>%` pipe
|
|
|
|
If you've been using the tidyverse for a while, you might be familiar with the `%>%` pipe provided by the **magrittr** package.
|
|
The magrittr package is included in the core tidyverse, so you can use `%>%` whenever you load the tidyverse:
|
|
|
|
```{r, message = FALSE}
|
|
library(tidyverse)
|
|
|
|
mtcars %>%
|
|
group_by(cyl) %>%
|
|
summarise(n = n())
|
|
```
|
|
|
|
For simple cases `|>` and `%>%` behave identically.
|
|
So why do we recommend the base pipe?
|
|
Firstly, because it's part of base R, it's always available for you to use, even when you're not using the tidyverse.
|
|
Secondly, `|>` is quite a bit simpler than `%>%`: in the time between the invention of `%>%` in 2014 and the inclusion of `|>` in R 4.1.0 in 2021, we gained a better understanding of the pipe.
|
|
This allowed the base implementation to jettison infrequently used and less important features.
|
|
|
|
## `|>` vs `%>%`
|
|
|
|
While `|>` and `%>%` behave identically for simple cases, there are a few important differences.
|
|
These are most likely to affect you if you're a long-term user of `%>%` who has taken advantage of some of the more advanced features.
|
|
But they're still good to know about even if you've never used `%>%` because you're likely to encounter some of them when reading wild-caught code.
|
|
|
|
- By default, the pipe passes the object on its left hand side to the first argument of the function on the right-hand side.
|
|
`%>%` allows you change the placement with a `.` placeholder.
|
|
For example, `x %>% f(1)` is equivalent to `f(x, 1)` but `x %>% f(1, .)` is equivalent to `f(1, x)`.
|
|
R 4.2.0 added a `_` placeholder to the base pipe, with one additional restriction: the argument has to be named.
|
|
For example, `x |> f(1, y = _)` is equivalent to `f(1, y = x)`.
|
|
|
|
- The `|>` placeholder is deliberately simple and can't replicate many features of the `%>%` placeholder: you can't pass it to multiple arguments, and it doesn't have any special behavior when the placeholder is used inside another function.
|
|
For example, `df %>% split(.$var)` is equivalent to `split(df, df$var)` and `df %>% {split(.$x, .$y)}` is equivalent to `split(df$x, df$y)`.
|
|
|
|
With `%>%` you can use `.` on the left-hand side of operators like `$`, `[[`, `[` (which you'll learn about in Chapter \@ref(vectors)), so you can extract a single column from a data frame with (e.g.) `mtcars %>% .$cyl`.
|
|
A future version of R may add similar support for `|>` and `_`.
|
|
For the special case of extracting a column out of a data frame, you can also use `dplyr::pull():`
|
|
|
|
```{r}
|
|
mtcars |> pull(cyl)
|
|
```
|
|
|
|
- `%>%` allows you to drop the parentheses when calling a function with no other arguments; `|>` always requires the parentheses.
|
|
|
|
- `%>%` allows you to start a pipe with `.` to create a function rather than immediately executing the pipe; this is not supported by the base pipe.
|
|
|
|
Luckily there's no need to commit entirely to one pipe or the other --- you can use the base pipe for the majority of cases where it's sufficient, and use the magrittr pipe when you really need its special features.
|