Starting to think about writing functions

This commit is contained in:
hadley 2016-01-18 09:19:35 -06:00
parent 844226da7c
commit cab42dd99a
1 changed file with 33 additions and 17 deletions


@@ -383,23 +383,6 @@ There are two main differences with `lapply()` and `col_summary()`:
As you learn more about R, you'll learn more functions that allow you to abstract over common patterns of for loops.
### Modifying columns
Going back to our original motivation, we want to reduce the duplication in:
```{r, eval = FALSE}
df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)
df$d <- rescale01(df$d)
```
One way to do that is to combine `lapply()` with data frame subsetting:
```{r}
df[] <- lapply(df, rescale01)
```
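To make the pattern concrete, here is a self-contained sketch that uses the `rescale01()` defined earlier in the chapter (reproduced here so the chunk runs on its own):

```{r}
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# lapply() returns a list; assigning into df[] (rather than df) keeps
# df a data frame instead of turning it into a list
df[] <- lapply(df, rescale01)
```

Note the subtle but important difference between `df[] <- ...` and `df <- ...`: the former replaces the contents of each column while preserving the data frame structure.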
### Exercises
1. Adapt `col_summary()` so that it only applies to numeric inputs.
@@ -407,3 +390,36 @@ df[] <- lapply(df, rescale01)
a logical vector that has a TRUE corresponding to each numeric column.
1. How do `sapply()` and `vapply()` differ from `col_summary()`?
## Robust and readable functions
There is one principle that tends to lend itself both to easily readable code and to code that works well even when generalised to handle new situations that you didn't previously think about.
You want to use functions whose behaviour can be understood with as little context as possible. The less code you need to read to predict the likely outcome of a function, the easier it is to understand the code. Such code is also less likely to fail in unexpected ways in new situations.
A few examples:
* What will `df[, x]` return? You can assume that `df` is a data frame
  and `x` is a vector because of their names, but you don't know whether
  this code will return a data frame or a vector, because the behaviour
  of `[` differs depending on the length of `x`. Compare with
  `df[, x, drop = FALSE]` or `df[[x]]`, which make it clear what you
  expect.
* What will `filter(df, x == y)` do? It depends on whether `x`, `y`, or
  both are variables in `df` or variables in the current environment.
  Compare with `df[df$x == y, , drop = FALSE]`, or a hypothetical
  `filter(df, local(x) == global(y))` that spells out where each
  variable should be found.
* What sort of column will `data.frame(x = "a")` create? You can't be
  sure whether it will contain characters or factors, because that
  depends on the value of the global option `stringsAsFactors`. Compare
  with `data.frame(x = "a", stringsAsFactors = FALSE)` or
  `data_frame(x = "a")`, which always create a character column.
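To see the first and third problems concretely, here is a small sketch (the class of the unqualified `data.frame()` column depends on your `stringsAsFactors` option, so it is shown without a fixed expected value):

```{r}
df <- data.frame(a = 1:3, b = 4:6)

# `[` simplifies by default: selecting one column returns a bare vector
str(df[, "a"])
# drop = FALSE and `[[` make the result type predictable
str(df[, "a", drop = FALSE])  # always a one-column data frame
str(df[["a"]])                # always a vector

# The column type here depends on options("stringsAsFactors")...
class(data.frame(x = "a")$x)
# ...unless you say what you want explicitly
class(data.frame(x = "a", stringsAsFactors = FALSE)$x)  # "character"
```

In each pair, the second form takes a few more characters to type, but you can predict its result without knowing the length of the input or the state of any global option.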
Avoiding functions that behave unpredictably helps you to write code that you can understand, because you don't need to be intimately familiar with the specifics of the call. This book teaches you functions that follow this principle as much as possible. When you have a choice between two functions with similar behaviour, pick the one that needs the least context to understand. This often means being more explicit, which means writing more code. Your functions will take longer to write, but they will be easier to read and more robust to varying inputs.
The transition from interactive analysis to programming in R can be very frustrating because it forces you to confront differences that you previously swept under the carpet. You need to learn how functions can behave differently some of the time.
If this behaviour is advantageous for programming, why do any functions behave differently? Because R is not just a programming language; it's also an environment for interactive data analysis. Some things make sense for interactive use (where you quickly check the output, and guessing what you want is ok) but don't make sense for programming (where you want errors to arise as quickly as possible).
It's a continuum, not two discrete endpoints. It's not possible to write code where every single line is understandable in isolation. Even if you could, it wouldn't be desirable. Relying on a little context is useful. You just don't want to go overboard.