Consistently style for loop

Fixes #837
Hadley Wickham 2022-08-30 09:00:14 -05:00
parent 72e0f519dc
commit 843df1d22d
1 changed file with 44 additions and 44 deletions


@ -22,14 +22,14 @@ One tool for reducing duplication is functions, which reduce duplication by iden
Another tool for reducing duplication is **iteration**, which helps you when you need to do the same thing to multiple inputs: repeating the same operation on different columns, or on different datasets.
In this chapter you'll learn about two important iteration paradigms: **imperative** and **functional**.
On the imperative side you have tools like for loops and while loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening.
However, for loops are quite verbose because they require bookkeeping code that is duplicated for every for loop.
Functional programming (FP) offers tools to extract out this duplicated code, so each common for loop pattern gets its own function.
On the imperative side you have tools like `for` loops and `while` loops, which are a great place to start because they make iteration very explicit, so it's obvious what's happening.
However, `for` loops are quite verbose because they require bookkeeping code that is duplicated for every `for` loop.
Functional programming (FP) offers tools to extract out this duplicated code, so each common `for` loop pattern gets its own function.
Once you master the vocabulary of FP, you can solve many common iteration problems with less code, more ease, and fewer errors.
### Prerequisites
Once you've mastered the for loops provided by base R, you'll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.
Once you've mastered the `for` loops provided by base R, you'll learn some of the powerful programming tools provided by purrr, one of the tidyverse core packages.
```{r}
#| label: setup
@ -62,7 +62,7 @@ median(df$d)
```
But that breaks our rule of thumb: never copy and paste more than twice.
Instead, we could use a for loop:
Instead, we could use a `for` loop:
```{r}
output <- vector("double", ncol(df)) # 1. output
@ -72,17 +72,17 @@ for (i in seq_along(df)) { # 2. sequence
output
```
Every for loop has three components:
Every `for` loop has three components:
1. The **output**: `output <- vector("double", length(x))`.
Before you start the loop, you must always allocate sufficient space for the output.
This is very important for efficiency: if you grow the for loop at each iteration using `c()` (for example), your for loop will be very slow.
This is very important for efficiency: if you grow the `for` loop at each iteration using `c()` (for example), your `for` loop will be very slow.
A general way of creating an empty vector of given length is the `vector()` function.
It has two arguments: the type of the vector ("logical", "integer", "double", "character", etc) and the length of the vector.
2. The **sequence**: `i in seq_along(df)`.
This determines what to loop over: each run of the for loop will assign `i` to a different value from `seq_along(df)`.
This determines what to loop over: each run of the `for` loop will assign `i` to a different value from `seq_along(df)`.
It's useful to think of `i` as a pronoun, like "it".
You might not have seen `seq_along()` before.
@ -102,13 +102,13 @@ Every for loop has three components:
It's run repeatedly, each time with a different value for `i`.
The first iteration will run `output[[1]] <- median(df[[1]])`, the second will run `output[[2]] <- median(df[[2]])`, and so on.
That's all there is to the for loop!
Now is a good time to practice creating some basic (and not so basic) for loops using the exercises below.
Then we'll move on to some variations of the for loop that help you solve other problems that will crop up in practice.
That's all there is to the `for` loop!
Now is a good time to practice creating some basic (and not so basic) `for` loops using the exercises below.
Then we'll move on to some variations of the `for` loop that help you solve other problems that will crop up in practice.
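
As a small illustration of the output step described above (a sketch only, not part of this commit; the vector `x` and the names `grown` and `prealloc` are made up for the example), the following compares growing a vector with `c()` against preallocating it with `vector()`. Both loops give the same result; the difference is that the first one builds a new, longer vector on every iteration.

```r
x <- c(10, 20, 30, 40)

# Growing: every iteration creates a new vector that is one element longer.
grown <- c()
for (i in seq_along(x)) {
  grown <- c(grown, x[[i]] * 2)
}

# Preallocating: reserve space for the output first, then fill it element by element.
prealloc <- vector("double", length(x))
for (i in seq_along(x)) {
  prealloc[[i]] <- x[[i]] * 2
}

# Both approaches produce the same doubled values.
identical(grown, prealloc)
```

On a four-element vector the cost is invisible; it only starts to matter when the loop runs many thousands of times.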
### Exercises
1. Write for loops to:
1. Write `for` loops to:
a. Compute the mean of every column in `mtcars`.
b. Determine the type of each column in `nycflights13::flights`.
@ -117,7 +117,7 @@ Then we'll move on to some variations of the for loop that help you solve other
Think about the output, sequence, and body **before** you start writing the loop.
2. Eliminate the for loop in each of the following examples by taking advantage of an existing function that works with vectors:
2. Eliminate the `for` loop in each of the following examples by taking advantage of an existing function that works with vectors:
```{r}
#| eval: false
@ -142,13 +142,13 @@ Then we'll move on to some variations of the for loop that help you solve other
}
```
3. Combine your function writing and for loop skills:
3. Combine your function writing and `for` loop skills:
a. Write a for loop that `print()`s the lyrics to the children's song "Alice the camel".
a. Write a `for` loop that `print()`s the lyrics to the children's song "Alice the camel".
b. Convert the nursery rhyme "ten in the bed" to a function. Generalise it to any number of people in any sleeping structure.
c. Convert the song "99 bottles of beer on the wall" to a function. Generalise to any number of any vessel containing any liquid on any surface.
4. It's common to see for loops that don't preallocate the output and instead increase the length of a vector at each step:
4. It's common to see `for` loops that don't preallocate the output and instead increase the length of a vector at each step:
```{r}
#| eval: false
@ -165,10 +165,10 @@ Then we'll move on to some variations of the for loop that help you solve other
## For loop variations
Once you have the basic for loop under your belt, there are some variations that you should be aware of.
Once you have the basic `for` loop under your belt, there are some variations that you should be aware of.
These variations are important regardless of how you do iteration, so don't forget about them once you've mastered the FP techniques you'll learn about in the next section.
There are four variations on the basic theme of the for loop:
There are four variations on the basic theme of the `for` loop:
1. Modifying an existing object, instead of creating a new object.
2. Looping over names or values, instead of indices.
@ -177,7 +177,7 @@ There are four variations on the basic theme of the for loop:
### Modifying an existing object
Sometimes you want to use a for loop to modify an existing object.
Sometimes you want to use a `for` loop to modify an existing object.
For example, remember our challenge from [Chapter -@sec-functions] on functions.
We wanted to rescale every column in a data frame:
@ -199,7 +199,7 @@ df$c <- rescale01(df$c)
df$d <- rescale01(df$d)
```
To solve this with a for loop we again think about the three components:
To solve this with a `for` loop we again think about the three components:
1. **Output**: we already have the output --- it's the same as the input!
@ -216,7 +216,7 @@ for (i in seq_along(df)) {
```
Typically you'll be modifying a list or data frame with this sort of loop, so remember to use `[[`, not `[`.
You might have spotted that we used `[[` in all our for loops: we think it's better to use `[[` even for atomic vectors because it makes it clear that you want to work with a single element.
You might have spotted that we used `[[` in all our `for` loops: we think it's better to use `[[` even for atomic vectors because it makes it clear that you want to work with a single element.
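
To make the `[[` versus `[` distinction concrete, here is a minimal sketch (the list `x` is invented for illustration and does not appear in the chapter). With a list, `[` returns a smaller list, while `[[` returns the element itself:

```r
x <- list(a = 1:3, b = letters)

x["a"]    # a list of length 1 that still wraps the element
x[["a"]]  # the element itself: an integer vector

is.list(x["a"])    # the single-bracket result is still a list
is.list(x[["a"]])  # the double-bracket result is not
```

Because a data frame is a list of columns, the same distinction applies to `df[i]` and `df[[i]]` inside a loop.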
### Looping patterns
@ -300,9 +300,9 @@ Whenever you see it, switch to a more complex result object, and then combine in
Sometimes you don't even know how long the input sequence should run for.
This is common when doing simulations.
For example, you might want to loop until you get three heads in a row.
You can't do that sort of iteration with the for loop.
Instead, you can use a while loop.
A while loop is simpler than a for loop because it only has two components, a condition and a body:
You can't do that sort of iteration with the `for` loop.
Instead, you can use a `while` loop.
A `while` loop is simpler than a `for` loop because it only has two components, a condition and a body:
```{r}
#| eval: false
@ -312,7 +312,7 @@ while (condition) {
}
```
A while loop is also more general than a for loop, because you can rewrite any for loop as a while loop, but you can't rewrite every while loop as a for loop:
A `while` loop is also more general than a `for` loop, because you can rewrite any `for` loop as a `while` loop, but you can't rewrite every `while` loop as a `for` loop:
```{r}
#| eval: false
@ -329,7 +329,7 @@ while (i <= length(x)) {
}
```
Here's how we could use a while loop to find how many tries it takes to get three heads in a row:
Here's how we could use a `while` loop to find how many tries it takes to get three heads in a row:
```{r}
flip <- function() sample(c("T", "H"), 1)
@ -348,7 +348,7 @@ while (nheads < 3) {
flips
```
We mention while loops only briefly, because we hardly ever use them.
We mention `while` loops only briefly, because we hardly ever use them.
They're most often used for simulation, which is outside the scope of this book.
However, it is good to know they exist so that you're prepared for problems where the number of iterations is not known in advance.
@ -356,7 +356,7 @@ However, it is good to know they exist so that you're prepared for problems wher
1. Imagine you have a directory full of CSV files that you want to read in.
You have their paths in a vector, `files <- dir("data/", pattern = "\\.csv$", full.names = TRUE)`, and now want to read each one with `read_csv()`.
Write the for loop that will load them into a single data frame.
Write the `for` loop that will load them into a single data frame.
2. What happens if you use `for (nm in names(x))` and `x` has no names?
What if only some of the elements are named?
@ -396,8 +396,8 @@ However, it is good to know they exist so that you're prepared for problems wher
## For loops vs. functionals
For loops are not as important in R as they are in other languages because R is a functional programming language.
This means that it's possible to wrap up for loops in a function, and call that function instead of using the for loop directly.
`For` loops are not as important in R as they are in other languages because R is a functional programming language.
This means that it's possible to wrap up `for` loops in a function, and call that function instead of using the `for` loop directly.
To see why this is important, consider (again) this simple data frame:
@ -411,7 +411,7 @@ df <- tibble(
```
Imagine you want to compute the mean of every column.
You could do that with a for loop:
You could do that with a `for` loop:
```{r}
output <- vector("double", length(df))
@ -454,7 +454,7 @@ col_sd <- function(df) {
Uh oh!
You've copied-and-pasted this code twice, so it's time to think about how to generalize it.
Notice that most of this code is for-loop boilerplate and it's hard to see the one thing (`mean()`, `median()`, `sd()`) that is different between the functions.
Notice that most of this code is `for` loop boilerplate and it's hard to see the one thing (`mean()`, `median()`, `sd()`) that is different between the functions.
What would you do if you saw a set of functions like this:
@ -488,10 +488,10 @@ col_summary(df, mean)
The idea of passing a function to another function is an extremely powerful idea, and it's one of the behaviors that makes R a functional programming language.
It might take you a while to wrap your head around the idea, but it's worth the investment.
In the rest of the chapter, you'll learn about and use the **purrr** package, which provides functions that eliminate the need for many common for loops.
In the rest of the chapter, you'll learn about and use the **purrr** package, which provides functions that eliminate the need for many common `for` loops.
The apply family of functions in base R (`apply()`, `lapply()`, `tapply()`, etc) solve a similar problem, but purrr is more consistent and thus is easier to learn.
The goal of using purrr functions instead of for loops is to allow you to break common list manipulation challenges into independent pieces:
The goal of using purrr functions instead of `for` loops is to allow you to break common list manipulation challenges into independent pieces:
1. How can you solve the problem for a single element of the list?
Once you've solved that problem, purrr takes care of generalising your solution to every element in the list.
@ -505,7 +505,7 @@ It also makes it easier to understand your solutions to old problems when you re
### Exercises
1. Read the documentation for `apply()`.
In the 2d case, what two for loops does it generalise?
In the 2d case, what two `for` loops does it generalise?
2. Adapt `col_summary()` so that it only applies to numeric columns. You might want to start with an `is_numeric()` function that returns a logical vector that has a `TRUE` corresponding to each numeric column.
@ -524,15 +524,15 @@ Each function takes a vector as input, applies a function to each piece, and the
The type of the vector is determined by the suffix to the map function.
Once you master these functions, you'll find it takes much less time to solve iteration problems.
But you should never feel bad about using a for loop instead of a map function.
But you should never feel bad about using a `for` loop instead of a map function.
The map functions are a step up a tower of abstraction, and it can take a long time to get your head around how they work.
The important thing is that you solve the problem that you're working on, not write the most concise and elegant code (although that's definitely something you want to strive towards!).
Some people will tell you to avoid for loops because they are slow.
Some people will tell you to avoid `for` loops because they are slow.
They're wrong!
(Well, at least they're rather out of date, as for loops haven't been slow for many years.) The chief benefit of using functions like `map()` is not speed, but clarity: they make your code easier to write and to read.
(Well, at least they're rather out of date, as `for` loops haven't been slow for many years.) The chief benefit of using functions like `map()` is not speed, but clarity: they make your code easier to write and to read.
We can use these functions to perform the same computations as the last for loop.
We can use these functions to perform the same computations as the last `for` loop.
Those summary functions returned doubles, so we need to use `map_dbl()`:
```{r}
@ -541,7 +541,7 @@ map_dbl(df, median)
map_dbl(df, sd)
```
Compared to using a for loop, the focus is on the operation being performed (i.e. `mean()`, `median()`, `sd()`), not the bookkeeping required to loop over every element and store the output.
Compared to using a `for` loop, the focus is on the operation being performed (i.e. `mean()`, `median()`, `sd()`), not the bookkeeping required to loop over every element and store the output.
This is even more apparent if we use the pipe:
```{r}
@ -591,7 +591,7 @@ models <- mtcars |>
map(~lm(mpg ~ wt, data = .x))
```
Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the for loop).
Here we've used `.x` as a pronoun: it refers to the current list element (in the same way that `i` referred to the current index in the `for` loop).
`.x` in a one-sided formula corresponds to an argument in an anonymous function.
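
One way to see that correspondence is to write the same call both ways. This is a sketch, assuming purrr is installed; the names `by_cyl`, `models_formula`, and `models_function` are made up for the example and are not from the chapter.

```r
library(purrr)

# Split mtcars into one data frame per number of cylinders.
by_cyl <- split(mtcars, mtcars$cyl)

# One-sided formula: `.x` stands for the current list element.
models_formula <- map(by_cyl, ~ lm(mpg ~ wt, data = .x))

# The same thing spelled out as an anonymous function with a named argument.
models_function <- map(by_cyl, function(df) lm(mpg ~ wt, data = df))

# Both versions fit the same models, so the coefficients agree.
all.equal(map(models_formula, coef), map(models_function, coef))
```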
When you're looking at many models, you might want to extract a summary statistic like the $R^2$.
@ -781,7 +781,7 @@ knitr::include_graphics("diagrams/lists-map2.png")
Note that the arguments that vary for each call come *before* the function; arguments that are the same for every call come *after*.
Like `map()`, `map2()` is just a wrapper around a for loop:
Like `map()`, `map2()` is just a wrapper around a `for` loop:
```{r}
map2 <- function(x, y, f, ...) {
@ -881,7 +881,7 @@ This makes them suitable for use in the middle of pipelines.
## Other patterns of for loops
Purrr provides a number of other functions that abstract over other types of for loops.
Purrr provides a number of other functions that abstract over other types of `for` loops.
You'll use them less frequently than the map functions, but they're useful to know about.
The goal here is to briefly illustrate each function, so hopefully it will come to mind if you see a similar problem in the future.
Then you can go look up the documentation for more details.
@ -978,7 +978,7 @@ x |> accumulate(`+`)
### Exercises
1. Implement your own version of `every()` using a for loop.
1. Implement your own version of `every()` using a `for` loop.
Compare it with `purrr::every()`.
What does purrr's version do that your version doesn't?