Pipe tweaks

hadley 2016-08-09 15:16:01 -05:00
parent edc498f735
commit ae5764e3c7
1 changed files with 18 additions and 20 deletions

@ -2,24 +2,21 @@
## Introduction
Pipes let you transform the way you call deeply nested functions. Using a pipe doesn't affect what the code does; behind the scenes it is run in (almost) the exact same way. What the pipe does is change how _you_ write, and read, code.
You've been using the pipe for a while now, so you already understand the basics. The point of this chapter is to explore the pipe in more detail. You'll learn the alternatives that the pipe replaces, and the pros and cons of the pipe. Importantly, you'll also learn situations in which you should avoid the pipe.
The pipe, `%>%`, comes from the __magrittr__ package by Stefan Milton Bache. This package provides a handful of other helpful tools if you explicitly load it. We'll explore some of those tools to close out the chapter.
You've been using the pipe for a while now, so you already understand the basics. Pipes transform the way you invoke deeply nested functions. Using a pipe doesn't affect what the code does; behind the scenes it is run in (almost) the exact same way. What the pipe does is change how _you_ write, and read, code. The point of this chapter is to explore the pipe in more detail. You'll learn the alternatives that the pipe replaces, and their pros and cons. Importantly, you'll also learn the cases where you should avoid the pipe.
### Prerequisites
This chapter focusses on `%>%` which is normally loaded for you by packages in the tidyverse. Here we'll focus on it alone, so we'll make it available directly from magrittr. We'll also extract the `diamonds` dataset out of ggplot2 to use in some examples.
The pipe, `%>%`, comes from the __magrittr__ package by Stefan Milton Bache. Packages in the tidyverse load `%>%` for you automatically, so you don't usually explicitly load magrittr. Here, however, we're focussing on piping, and we aren't loading any other packages, so we'll need to load it explicitly. At the end of the chapter, we'll explore some other tools also provided by magrittr.
```{r setup}
library(magrittr)
diamonds <- ggplot2::diamonds
```
## Piping alternatives
The point of the pipe is to help you write code in a way that is easier to read and understand. To see why the pipe is so useful, we're going to explore a number of ways of writing the same code. Let's use code to tell a story about a little bunny named Foo Foo:
The point of the pipe is to help you write code in a way that is easier to read and understand. To see why the pipe is so useful, we're going to explore a number of ways of writing the same code.
Let's use code to tell a story about a little bunny named Foo Foo:
> Little bunny Foo Foo
> Went hopping through the forest
@ -39,7 +36,7 @@ And we'll use a function for each key verb: `hop()`, `scoop()`, and `bop()`. Usi
1. Compose functions.
1. Use the pipe.
We'll work through each approach, showing you the code and talking about the advantages and disadvantages. Note that these are made-up functions; please don't expect this code to do something.
We'll work through each approach, showing you the code and talking about the advantages and disadvantages. These are made-up functions; please don't expect this code to actually work!
### Intermediate steps
@ -51,11 +48,12 @@ foo_foo_2 <- scoop(foo_foo_1, up = field_mice)
foo_foo_3 <- bop(foo_foo_2, on = head)
```
The main downside of this form is that it forces you to name each intermediate element. If there are natural names, this form feels natural, and you should use it. But in this example, there aren't natural names, and we're adding numeric suffixes just to make the names unique. That leads to two problems: the code is cluttered with unimportant names, and you have to carefully increment the suffix on each line. Whenever I write code like this, I invariably use the wrong number on one line and then spend 10 minutes scratching my head and trying to figure out what went wrong with my code.
The main downside of this form is that it forces you to name each intermediate element. If there are natural names, naming each step is a good idea, and you should do it. But many times, as in this example, there aren't natural names, and you add numeric suffixes just to make the names unique. That leads to two problems: the code is cluttered with unimportant names, and you have to carefully increment the suffix on each line. Whenever I write code like this, I invariably use the wrong number on one line and then spend 10 minutes scratching my head and trying to figure out what went wrong with my code.
You may worry that this form creates many intermediate copies of your data and takes up a lot of memory, but that's not necessarily the case. First, worrying about memory is not a useful way to spend your time: worry about it when it becomes a problem (i.e. you run out of memory), not before. Second, R isn't stupid: if you're working with data frames, R will share columns where possible. Let's take a look at an actual data manipulation pipeline where we add a new column to the `diamonds` dataset from ggplot2:
You may also worry that this form creates many copies of your data and takes up a lot of memory. Surprisingly, that's not the case. First, note that worrying about memory is not a useful way to spend your time: worry about it when it becomes a problem (i.e. you run out of memory), not before. Second, R isn't stupid: if you're working with data frames, R will share columns where possible. Let's take a look at an actual data manipulation pipeline where we add a new column to `ggplot2::diamonds`:
```{r}
diamonds <- ggplot2::diamonds
diamonds2 <- dplyr::mutate(diamonds, price_per_carat = price / carat)
pryr::object_size(diamonds)
@ -98,7 +96,7 @@ This is less typing (and less thinking), so you're less likely to make mistakes.
complete pipeline from the beginning.
1. The repetition of the object being transformed (we've written `foo_foo` six
times!) obscures what's changing on each line.
times!) obscures what's different on each line.
### Function composition
@ -128,9 +126,9 @@ foo_foo %>%
bop(on = head)
```
This is my favourite form, because it focusses on verbs, not nouns. You can read this series of function compositions like it's a set of imperative actions. Foo Foo hops, then scoops, then bops. The downside, of course, is that you need to be familiar with the pipe. If you've never seen `%>%` before, you'll have no idea what this code does. Fortunately, however, most people pick up the idea very quickly, so when you share your code with others who aren't familiar with the pipe, you can easily teach them.
This is my favourite form, because it focusses on verbs, not nouns. You can read this series of function compositions like it's a set of imperative actions. Foo Foo hops, then scoops, then bops. The downside, of course, is that you need to be familiar with the pipe. If you've never seen `%>%` before, you'll have no idea what this code does. Fortunately, most people pick up the idea very quickly, so when you share your code with others who aren't familiar with the pipe, you can easily teach them.
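Because `hop()`, `scoop()`, and `bop()` are made up, a runnable comparison needs real functions. The sketch below uses base R's `sqrt()`, `sum()`, and `round()` as stand-ins (an illustrative choice, not part of the Foo Foo story) to show that the nested and piped forms compute the same value:

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested form: read from the inside out
nested <- round(sum(sqrt(x)), 1)

# Piped form: read top to bottom, one verb per line
piped <- x %>%
  sqrt() %>%
  sum() %>%
  round(1)

identical(nested, piped)
#> [1] TRUE
```

Only the way the code reads changes; the result is the same either way.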
The pipe works by doing "lexical transformation". Behind the scenes, magrittr reassemble the code in the pipe to a form that works by overwriting an intermediate object. When you run a pipe like the one above, magrittr does something like this:
The pipe works by doing "lexical transformation". Behind the scenes, magrittr reassembles the code in the pipe to a form that works by overwriting an intermediate object. When you run a pipe like the one above, magrittr does something like this:
```{r, eval = FALSE}
my_pipe <- function(.) {
@ -164,7 +162,7 @@ This means that the pipe won't work for two classes of functions:
x
```
Other functions with this problem include `get()` and `load()`
Other functions with this problem include `get()` and `load()`.
1. Functions that use lazy evaluation. In R, function arguments
are only computed when the function uses them, not prior to calling the
@ -181,15 +179,15 @@ This means that the pipe won't work for two classes of functions:
tryCatch(error = function(e) "An error")
```
There is a relatively wide class of functions with this behaviour.
This includes `try()`, `suppressMessages()`, and `suppressWarnings()`
There is a relatively wide class of functions with this behaviour,
including `try()`, `suppressMessages()`, and `suppressWarnings()`
in base R.
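Both classes of functions have workarounds that don't depend on how the pipe evaluates its left-hand side. A minimal sketch, assuming magrittr is loaded: pass the target environment explicitly, and call condition-handling functions directly instead of piping into them. The variable names `env` and `result` are illustrative.

```r
library(magrittr)

# Functions that use the current environment: say explicitly which one
env <- environment()
"x" %>% assign(100, envir = env)
x
#> [1] 100

# Functions that rely on lazy evaluation: call them directly, so the
# expression that may fail is evaluated inside tryCatch()
result <- tryCatch(stop("!"), error = function(e) "An error")
result
#> [1] "An error"
```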
## When not to use the pipe
The pipe is a powerful tool, but it's not the only tool at your disposal, and it doesn't solve every problem! Pipes are most useful for rewriting a fairly short linear sequence of operations. I think you should reach for another tool when:
* Your pipes get longer than five or six lines. In that case, create
* Your pipes are longer than five or six steps. In that case, create
intermediate objects with meaningful names. That will make debugging easier,
because you can more easily check the intermediate results, and it makes
it easier to understand your code, because the variable names can help
@ -205,7 +203,7 @@ The pipe is a powerful tool, but it's not the only tool at your disposal, and it
## Other tools from magrittr
The pipe is provided by the magrittr package, by Stefan Milton Bache. Most of the packages you work with in this book will automatically provide `%>%` for you. You might want to load magrittr yourself if you're using another package, or you want to access some of the other pipe variants that magrittr provides.
All packages in the tidyverse automatically make `%>%` available for you, so you don't normally load magrittr explicitly. However, there are some other useful tools inside magrittr that you might want to try out:
* When working with more complex pipes, it's sometimes useful to call a
function for its side-effects. Maybe you want to print out the current
@ -257,5 +255,5 @@ The pipe is provided by the magrittr package, by Stefan Milton Bache. Most of th
I'm not a fan of this operator because I think assignment is such a
special operation that it should always be clear when it's occurring.
In my opinion, a little bit of duplication (i.e. repeating the
name of the object twice), is fine in return for making assignment
name of the object twice) is fine in return for making assignment
more explicit.
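
To make the trade-off concrete, here is a small sketch contrasting explicit assignment with magrittr's `%<>%` operator. The copies of `mtcars` and the `cyl * 2` transformation are illustrative choices:

```r
library(magrittr)

# Explicit assignment: the object name appears twice, and the
# overwrite is visible at the start of the line
cars_explicit <- mtcars
cars_explicit <- cars_explicit %>% transform(cyl = cyl * 2)

# Assignment pipe: less typing, but the overwrite is easier to miss
cars_compound <- mtcars
cars_compound %<>% transform(cyl = cyl * 2)

identical(cars_explicit, cars_compound)
#> [1] TRUE
```

Both forms produce the same data frame; the difference is only in how obvious the assignment is to a reader.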