Polishing pipes now that 4.2 is out

Hadley Wickham 2022-04-27 08:01:00 -05:00
parent 064c056c29
commit 97b074509d
2 changed files with 19 additions and 23 deletions

@ -26,6 +26,7 @@ status <- function(type) {
polishing = "should be readable but is currently undergoing final polishing",
restructuring = "is undergoing heavy restructuring and may be confusing or incomplete",
drafting = "is currently a dumping ground for ideas, and we don't recommend reading it",
complete = "is largely complete",
stop("Invalid `type`", call. = FALSE)
)
@ -33,7 +34,7 @@ status <- function(type) {
"::: {.rmdnote}\n",
"You are reading the work-in-progress second edition of R for Data Science. ",
"This chapter ", status, ". ",
"You can find the polished first edition at <https://r4ds.had.co.nz>.\n",
"You can find the complete first edition at <https://r4ds.had.co.nz>.\n",
":::\n"
))
}

@ -1,14 +1,14 @@
# Workflow: Pipes {#workflow-pipes}
```{r, results = "asis", echo = FALSE}
status("restructuring")
status("complete")
```
The pipe, `|>`, is a powerful tool for clearly expressing a sequence of operations that transform an object.
We briefly introduced pipes in the previous chapter, but before going much further I want to give a few more details and discuss `%>%`, a predecessor to `|>`.
To add the pipe to your code, we recommend using the built-in keyboard shortcut Ctrl/Cmd + Shift + M.
You'll need to make one change to your RStudio options to use `|>` instead of `%>%` as shown in Figure \@ref(fig:pipe-options); more `%>%` that next.
You'll need to make one change to your RStudio options to use `|>` instead of `%>%` as shown in Figure \@ref(fig:pipe-options); more on `%>%` shortly.
```{r pipe-options, out.width = NULL, echo = FALSE}
#| fig.cap: >
@ -21,7 +21,7 @@ knitr::include_graphics("screenshots/rstudio-pipe-options.png")
## Why use a pipe?
Each individual dplyr function is quite simple so solving complex problems typically require multiple verbs together.
Each individual dplyr verb is quite simple so solving complex problems typically requires combining multiple verbs.
For example, the last chapter finished with a moderately complex pipe:
```{r, eval = FALSE}
@ -34,7 +34,7 @@ flights |>
)
```
Even though this pipe has four steps, it's quite easy to skim because the verbs come at the start of each line: we start with the flights data, then filter, then group, then summarize.
Even though this pipe has four steps, it's easy to skim because the verbs come at the start of each line: we start with the flights data, then filter, then group, then summarize.
What would happen if we didn't have the pipe?
We could nest each function call inside the previous call:
@ -65,11 +65,11 @@ flights3 <- summarise(flight2,
)
```
While both of these forms have their place and time, the pipe generally produces code that is easier to read and easier to write.
While both of these forms have their time and place, the pipe generally produces data analysis code that's both easier to write and easier to read.
## magrittr and the `%>%` pipe
If you've been using the tidyverse for a while, you might be more familiar with the `%>%` pipe provided by the **magrittr** package.
If you've been using the tidyverse for a while, you might be familiar with the `%>%` pipe provided by the **magrittr** package.
The magrittr package is included in the core tidyverse, so you can use `%>%` whenever you load the tidyverse:
```{r, message = FALSE}
@ -86,34 +86,29 @@ Firstly, because it's part of base R, it's always available for you to use, even
Secondly, `|>` is quite a bit simpler than `%>%`: in the time between the invention of `%>%` in 2014 and the inclusion of `|>` in R 4.1.0 in 2021, we gained a better understanding of the pipe.
This allowed the base implementation to jettison infrequently used and less important features.
## Base pipe vs magrittr pipe
## `|>` vs `%>%`
While `|>` and `%>%` behave identically for simple cases there are a few important differences.
These are most likely to affect you if you're a long-term `%>%` user who has taken advantage of some of the more advanced features.
But they're good to know about even if you've never used `%>%`, because you're likely to encounter some of them when reading wild-caught code.
These are most likely to affect you if you're a long-term user of `%>%` who has taken advantage of some of the more advanced features.
But they're still good to know about even if you've never used `%>%` because you're likely to encounter some of them when reading wild-caught code.
- By default, the pipe passes the object on its left-hand side to the first argument of the function on the right-hand side.
`%>%` allows you change the placement a `.` placeholder.
`%>%` allows you to change the placement with a `.` placeholder.
For example, `x %>% f(1)` is equivalent to `f(x, 1)` but `x %>% f(1, .)` is equivalent to `f(1, x)`.
R 4.2.0 will bring a `_` as a placeholder to the base pipe, with one additional restriction: the argument has to be named.
R 4.2.0 added a `_` placeholder to the base pipe, with one additional restriction: the argument has to be named.
For example, `x |> f(1, y = _)` is equivalent to `f(1, y = x)`.
- The `|>` placeholder is deliberately simple and can't replicate many features of the `%>%` placeholder: you can't pass it to multiple arguments, and it doesn't have any special behavior when the placeholder is used inside another function (i.e. `df %>% split(.$var)` is equivalent to `split(df, df$var)`.
- You can also use both `.` and `_` on the left-hand side of operators like `$`, `[[`, `[` (which you'll learn about in Chapter \@ref(vectors)):
``` r
mtcars %>% .$cyl
mtcars |> _$cyl
```
- The `|>` placeholder is deliberately simple and can't replicate many features of the `%>%` placeholder: you can't pass it to multiple arguments, and it doesn't have any special behavior when the placeholder is used inside another function.
For example, `df %>% split(.$var)` is equivalent to `split(df, df$var)` and `df %>% {split(.$x, .$y)}` is equivalent to `split(df$x, df$y)`.
You can use `.` on the left-hand side of operators like `$`, `[[`, `[` (which you'll learn about in Chapter \@ref(vectors)), so you can extract a single column from a data frame with (e.g.) `mtcars %>% .$cyl`.
A future version of R may add similar support for `|>` and `_`.
For the special case of extracting a column out of a data frame, you can also use `dplyr::pull()`:
```{r}
mtcars |> pull(cyl)
```
- `%>%` allowed you to drop the parentheses when calling a function with no other arguments; `|>` always requires the parentheses.
- `%>%` allows you to drop the parentheses when calling a function with no other arguments; `|>` always requires the parentheses.
- `%>%` allowed you to starting a pipe with `.` to create a function rather than immediately executing the pipe; this is not supported by the base pipe.
- `%>%` allows you to start a pipe with `.` to create a function rather than immediately executing the pipe; this is not supported by the base pipe.
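
The differences listed in the last hunk can be tried out with a short sketch. This is an illustration only, not part of the commit: it assumes R 4.2.0 or later (for the `_` placeholder) and an installed magrittr package, and the names `x`, `f`, and the result variables are made up for the example.

```r
# Sketch only: assumes R >= 4.2.0 (for the `_` placeholder) and that the
# magrittr package is installed. All variable names are illustrative.
library(magrittr)

x <- c(1, 4, 9)

# Simple case: both pipes pass the left-hand side as the first argument.
base_sqrt <- x |> sqrt()
mag_sqrt  <- x %>% sqrt()

# Placeholders: `.` can be used in any argument; `_` must be a named argument.
mag_rep  <- 2 %>% rep(x, times = .)
base_rep <- 2 |> rep(x, times = _)

# Nested use: `.` appears only inside `.$cyl`, so magrittr still passes the
# left-hand side as the first argument, i.e. split(mtcars, mtcars$cyl).
by_cyl <- mtcars %>% split(.$cyl)

# Parentheses: `%>%` accepts a bare function name; `|>` requires a call.
mag_bare <- x %>% sqrt

# Starting a pipe with `.` builds a function instead of running the pipe.
f <- . %>% sqrt() %>% sum()
total <- f(x)
```

With `|>`, the nested and function-building forms have no direct equivalent, which is the trade-off the text describes: a simpler operator in exchange for fewer special cases.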