typos and similar

jennybc 2016-03-04 13:52:00 -08:00
parent cbc34125ae
commit 2b41ac6a04
1 changed files with 20 additions and 20 deletions

@ -6,7 +6,7 @@ knit: bookdown::preview_chapter
One of the best ways to improve your reach as a data scientist is to write functions. Functions allow you to automate common tasks. Writing a function has three big advantages over using copy-and-paste:
1. You drastistically reduce the chances of making incidental mistakes when
1. You drastically reduce the chances of making incidental mistakes when
you copy and paste.
1. As requirements change, you only need to update code in one place, instead
@ -21,7 +21,7 @@ As well as practical advice for writing functions, this chapter also gives you s
## When should you write a function?
You should consider writing a funtion whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). For example, take a look at this code. What does it do?
You should consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code). For example, take a look at this code. What does it do?
```{r}
df <- data.frame(
@ -50,7 +50,7 @@ To write a function you need to first analyse the code. How many inputs does it
(max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))
```
This code only has one input: `df$a`. (You might wonder if that `TRUE` is also an input: you can explore why it's not in the exercise below). To make the single input more clear, it's a good idea to rewrite the code using a temporary variables with a general name. Here this function only takes one vector of input, so I'll call it `x`:
This code only has one input: `df$a`. (You might wonder if that `TRUE` is also an input: you can explore why it's not in the exercise below). To make the single input more clear, it's a good idea to rewrite the code using temporary variables with a general name. Here this function only takes one vector of input, so I'll call it `x`:
```{r}
x <- 1:10
@ -64,7 +64,7 @@ rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
```
Pulling out intermediate calculations into named variables is good practice because it makes it more clear what the code is doing. Now that I've simplified the code, and checked that it still works, I can turn it into a function:
Pulling out intermediate calculations into named variables is a good practice because it makes it more clear what the code is doing. Now that I've simplified the code, and checked that it still works, I can turn it into a function:
```{r}
rescale01 <- function(x) {
@ -86,7 +86,7 @@ There are three key steps to creating a new function:
1. You place the __body__ of the function inside a `{` block immediately
following `function`.
Note the overall process: I only made the function after I'd figured out how to make it work with a simple input. It's easier to start with working code and turn it into a function; it's harder to creating a function and then try to make it work.
Note the overall process: I only made the function after I'd figured out how to make it work with a simple input. It's easier to start with working code and turn it into a function; it's harder to create a function and then try to make it work.
Now that we have `rescale01()` we can use that to simplify the original example:
@ -97,7 +97,7 @@ df$c <- rescale01(df$c)
df$d <- rescale01(df$d)
```
Compared the original, this code is easier to understand. We've also eliminated one class of copy-and-paste errors. There is, however, still quite a bit of duplication since we're doing the same thing to multiple columns. You'll learn how to eliminate that duplication in the next chapter.
Compared to the original, this code is easier to understand. We've also eliminated one class of copy-and-paste errors. There is, however, still quite a bit of duplication since we're doing the same thing to multiple columns. You'll learn how to eliminate that duplication in the next chapter.
### Practice
@ -155,7 +155,7 @@ The name of a function is important. Ideally the name of your function will be s
Generally, function names should be verbs, and arguments should be nouns. There are some exceptions: nouns are ok if the function computes a very well known noun (i.e. `mean()` is better than `compute_mean()`), or accessing some property of an object (i.e. `coef()` is better than `get_coefficients()`). A good sign that a noun might be a better choice is if you're using a very broad verb like get, or compute, or calculate, or determine. Use your best judgement and don't be afraid to rename a function if you later figure out a better name.
If your function name is composed of multiple words, I recommend using "snake\_case", where each word is lower case and separated by an underscore. camelCase is a popular alterative, but be consistent: pick one or the other and stick with it. R itself is not very consistent, but there's nothing you can do about that. Make sure you don't fall into the same trap by making your code as consistent as possible.
If your function name is composed of multiple words, I recommend using "snake\_case", where each word is lower case and separated by an underscore. camelCase is a popular alternative, but be consistent: pick one or the other and stick with it. R itself is not very consistent, but there's nothing you can do about that. Make sure you don't fall into the same trap by making your code as consistent as possible.
If you have a family of functions that do similar things, make sure they have consistent names and arguments. Use a common prefix to indicate that they are connected. That's better than a common suffix because autocomplete allows you to type the prefix and see all the members of the family.
@ -211,11 +211,11 @@ Another important use of comments is to break up your file into easily readable
1. Take a function that you've written recently and spend 5 minutes
brainstorming a better name for it and its arguments.
1. Compare and constrast `rnorm()` and `mvrnorm()`. How could you make
1. Compare and contrast `rnorm()` and `MASS::mvrnorm()`. How could you make
them more consistent?
1. Make a case for why `normr()`, `normd()` etc would be better than
`rnorm()`, dnorm()`. Make a case for the opposite.
`rnorm()`, `dnorm()`. Make a case for the opposite.
## Conditional execution
@ -229,7 +229,7 @@ if (condition) {
}
```
To get help on if you need to surround it in backticks: `` ?`if` ``.
To get help on `if` you need to surround it in backticks: `` ?`if` ``.
Here's a simple function that uses an if statement. The goal of this function is to return a logical vector describing whether or not each element of a vector is named.
@ -260,7 +260,7 @@ if (NA) {
}
```
You can use `||` (or) and `&&` (and) to combine multiple logical expressions. These operators a "short-circuiting": as soon as `||` sees the first `TRUE` it returns `TRUE` without computing anything else. As soon as `&&` sees the first `FALSE` it returns `FALSE`.
You can use `||` (or) and `&&` (and) to combine multiple logical expressions. These operators are "short-circuiting": as soon as `||` sees the first `TRUE` it returns `TRUE` without computing anything else. As soon as `&&` sees the first `FALSE` it returns `FALSE`.
You should never use `|` or `&` in an `if` statement: these are vectorised operations that apply to multiple values. If you do have a logical vector, you can use `any()` or `all()` to collapse it to a single value.
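As a rough illustration of the two points above (not part of the commit itself), the sketch below uses a hypothetical numeric vector `x`. It shows short-circuiting, where the right-hand side is never evaluated, and collapsing a logical vector with `any()` or `all()` before handing it to `if`:

```{r}
# Hypothetical input, chosen only for illustration
x <- c(3, -1, 7)

# Short-circuiting: the call to stop() on the right is never evaluated
FALSE && stop("never reached")
TRUE || stop("never reached")

# `x < 0` is a logical vector of length 3, so collapse it to one value first
if (any(x < 0)) {
  message("At least one value is negative")
}

# all() returns a single TRUE/FALSE, which is safe to combine with &&
if (all(is.finite(x)) && length(x) > 0) {
  message("Every value is finite")
}
```

Both messages are emitted for this `x`, because it contains one negative value and no infinite or missing values.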
@ -339,15 +339,15 @@ function(x, y, op) {
Another useful function that can often eliminate long chains of `if` statements is `cut()`. It's used to discretise continuous variables.
Note that neither `if` nor `switch()` are vectorised: they work with a single value at a time.
Note that neither `if` nor `switch()` is vectorised: they work with a single value at a time.
### Exercises
1. What's the different between `if` and `ifelse()`? Carefully read the help
1. What's the difference between `if` and `ifelse()`? Carefully read the help
and construct three examples that illustrate the key differences.
1. Write a greeting function that says "good morning", "good afternoon",
or "good evening", depending on the time of day. (Hint: use have a time
or "good evening", depending on the time of day. (Hint: use a time
argument that defaults to `lubridate::now()`. That will make it
easier to test your function.)
@ -395,7 +395,7 @@ The arguments to a function typically fall into two broad sets: one set supplies
`alternative`, `mu`, `paired`, `var.equal`, and `conf.level`.
* In `paste()` you can supply any number of strings to `...`, and the details
of the concatenation is controlled by `sep` and `collapse`.
of the concatenation are controlled by `sep` and `collapse`.
Generally, data arguments should come first. Detail arguments should go on the end, and usually should have default values. You specify a default value in the same way you call a function with a named argument:
@ -411,7 +411,7 @@ mean_se(x)
mean_se(x, 0.99)
```
The default value should almost always be the most common value. There are a few exceptions to do with safety. For example, it makes sense for `na.rm` to default to `FALSE` because missing values are important. Even though `na.rm = TRUE` is what you usually put in your code, it's a bad idea to silently ignoring missing values by default.
The default value should almost always be the most common value. There are a few exceptions to do with safety. For example, it makes sense for `na.rm` to default to `FALSE` because missing values are important. Even though `na.rm = TRUE` is what you usually put in your code, it's a bad idea to silently ignore missing values by default.
When you call a function, typically you can omit the names for the data arguments (because they are used so commonly). If you override the default value of a detail argument, you should use the full name:
@ -438,7 +438,7 @@ average<-mean(feet/12+inches,na.rm=TRUE)
### Dot dot dot
There's one special argument you need to know about: `...`, pronounced dot-dot-dot. This captures any other arguments not otherwise matched. It's useful because you can then send those `...` on to another argument. This is a useful catch-all if your function primarily wraps another function.
There's one special argument you need to know about: `...`, pronounced dot-dot-dot. This captures any other arguments not otherwise matched. It's useful because you can then send those `...` on to another function. This is a useful catch-all if your function primarily wraps another function.
For example, I commonly create these helper functions that wrap around `paste()`:
@ -471,7 +471,7 @@ Arguments in R are lazily evaluated: they're not computed until they're needed.
1. What does `commas(letters, collapse = "-")` do? Why?
1. It'd be nice if you supply multiple characters to the `pad` arugment, e.g.
1. It'd be nice if you supply multiple characters to the `pad` argument, e.g.
`rule("Title", pad = "-+")`. Why doesn't this currently work? How could you
fix it?
@ -545,7 +545,7 @@ You can also force printing by surrounding the call in parentheses:
(f())
```
Invisible values are mostly used when your function is called primarily for its side-effects (e.g. printing, plotting, or saving a file). It's nice to be able pipe such functions together, so it's good practive to invisibly return the first argument. This allows you to do things like:
Invisible values are mostly used when your function is called primarily for its side-effects (e.g. printing, plotting, or saving a file). It's nice to be able pipe such functions together, so it's good practice to invisibly return the first argument. This allows you to do things like:
```{r, eval = FALSE}
library(readr)
@ -607,7 +607,7 @@ try_require <- function(package, fun) {
}
```
Another place where it's useful to throw errors is if the inputs to the function are the wrong type. It's a good idea to throw an error early.
Another place where it's useful to throw errors is when the inputs to the function are the wrong type. It's a good idea to throw an error early.
`stopifnot()`.
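One way this might look in practice is sketched below. `wt_mean()` is a hypothetical helper invented for this illustration, not a function from the chapter or from any package. It checks its inputs with `stopifnot()` before doing any work:

```{r}
# Hypothetical helper: a weighted mean that validates its inputs up front
wt_mean <- function(x, w) {
  # Each condition must be TRUE, otherwise stopifnot() throws an error
  stopifnot(is.numeric(x), is.numeric(w), length(x) == length(w))
  sum(x * w) / sum(w)
}

wt_mean(c(1, 2, 3), c(1, 1, 2))
#> [1] 2.25
```

Calling `wt_mean(1:3, 1:2)` fails immediately at the length check, rather than silently recycling `w` and returning a misleading number.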