Fixes from Roberto

hadley 2016-06-27 13:07:23 -05:00
parent 98c14b843c
commit 539e6d9a8a
7 changed files with 15 additions and 15 deletions

View File

@ -5,7 +5,7 @@ library(purrr)
library(dplyr)
```
So far this book has focussed on data frames and packages that work with them. But as you start to write your own functions, and dig deeper into R, you need to learn about vectors, the objects that underpin data frames. If you've learned R in a more traditional way, you're probably familiar with vectors already, as most R resource start with vectors and work their way up to data frames. I think it's better to start with data frames because they're immediately useful, and then work your way down to the underlying components.
So far this book has focussed on data frames and packages that work with them. But as you start to write your own functions, and dig deeper into R, you need to learn about vectors, the objects that underpin data frames. If you've learned R in a more traditional way, you're probably already familiar with vectors, as most R resources start with vectors and work their way up to data frames. I think it's better to start with data frames because they're immediately useful, and then work your way down to the underlying components.
Vectors are particularly important because it's easier to learn to write functions that work with vectors, rather than data frames. The technology that lets ggplot2, tidyr, dplyr, etc. work with data frames is considerably more complex and not currently standardised. While I'm currently working on a new standard that will make life much easier, it's unlikely to be ready in time for this book.
@ -211,7 +211,7 @@ if (length(x)) {
}
```
In this case, 0 is converted to `FALSE` and everything else is converted to `TRUE`. I think this makes it harder to understand your code, and I recommend it.
In this case, 0 is converted to `FALSE` and everything else is converted to `TRUE`. I think this makes it harder to understand your code, and I don't recommend it.
It's also important to understand what happens when you try and create a vector containing multiple types with `c()`: the most complex type always wins.
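
The coercion rule described in the hunk above can be checked with a short sketch. The variable names `x1`, `x2`, and `x3` are illustrative and are not part of the commit; the example only relies on base R's `c()` and `typeof()`.

```r
# Mixing types in c() coerces every element to the most flexible type present.
x1 <- c(TRUE, 1L)   # logical + integer   -> integer
x2 <- c(1L, 1.5)    # integer + double    -> double
x3 <- c(1.5, "a")   # double + character  -> character

typeof(x1)
typeof(x2)
typeof(x3)
```

The ordering runs from logical through integer and double to character, so a single string in the input is enough to turn the whole vector into a character vector.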

View File

@ -305,7 +305,7 @@ if (NA) {}
You can use `||` (or) and `&&` (and) to combine multiple logical expressions. These operators are "short-circuiting": as soon as `||` sees the first `TRUE` it returns `TRUE` without computing anything else. As soon as `&&` sees the first `FALSE` it returns `FALSE`. You should never use `|` or `&` in an `if` statement: these are vectorised operations that apply to multiple values (that's why you use them in `filter()`). If you do have a logical vector, you can use `any()` or `all()` to collapse it to a single value.
Be careful when testing for equality. `==` is vectorised, which means that it's easy to get more than one output. Either check the the length is already 1, collapsed with `all()` or `any()`, or use the non-vectorised `identical()`. `identical()` is very strict: it always returns either a single `TRUE` or a single `FALSE`, and doesn't coerce types. This means that you need to be careful when comparing integers and doubles:
Be careful when testing for equality. `==` is vectorised, which means that it's easy to get more than one output. Either check the length is already 1, collapse with `all()` or `any()`, or use the non-vectorised `identical()`. `identical()` is very strict: it always returns either a single `TRUE` or a single `FALSE`, and doesn't coerce types. This means that you need to be careful when comparing integers and doubles:
```{r}
identical(0L, 0)
@ -641,7 +641,7 @@ complicated_function <- function(x, y, z) {
```
Another reason is becuase you have a `if` statement with one complex block and one simple block. For example, you might write an if statement like this:
Another reason is because you have an `if` statement with one complex block and one simple block. For example, you might write an if statement like this:
```{r, eval = FALSE}
f <- function() {

View File

@ -2,6 +2,7 @@
```{r setup, include=FALSE}
library(purrr)
library(stringr)
```
In [functions], we talked about how important it is to reduce duplication in your code. Reducing code duplication has three main benefits:
@ -104,8 +105,6 @@ Every for loop has three components:
the work. It's run repeatedly, each time with a different value for `i`.
The first iteration will run `output[[1]] <- median(df[[1]])`,
the second will run `output[[2]] <- median(df[[2]])`, and so on.
If you haven't seen `x[[i]]` before, it extracts the `i`th element from
`x`. You'll learn more about it in [subsetting].
That's all there is to the for loop! Now is a good time to practice creating some basic (and not so basic) for loops using the exercises below. Then we'll move on to some variations of the for loop that help you solve other problems that will crop up in practice.
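
For reference, the loop that the context lines above describe can be sketched as follows. The data frame `df` here is a small made-up example, not the one used in the chapter, and the pre-allocation with `vector()` is an assumption about how `output` is created.

```r
# Hypothetical data frame with two numeric columns.
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40))

# Allocate the output before the loop, one slot per column.
output <- vector("double", ncol(df))

# seq_along(df) yields 1, 2, ...: one index per column.
for (i in seq_along(df)) {
  # The body runs once per value of i, e.g. output[[1]] <- median(df[[1]]).
  output[[i]] <- median(df[[i]])
}

output
```

Here `output` ends up holding the column medians, 2 and 20, in column order.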
@ -127,7 +126,7 @@ That's all there is to the for loop! Now is a good time to practice creating som
```{r}
out <- ""
for (x in letters) {
out <- paste0(out, x)
out <- str_c(out, x)
}
x <- sample(100)
@ -843,7 +842,7 @@ library(ggplot2)
plots <- mtcars %>%
split(.$cyl) %>%
map(~ggplot(., aes(mpg, wt)) + geom_point())
paths <- paste0(names(plots), ".pdf")
paths <- str_c(names(plots), ".pdf")
pwalk(list(paths, plots), ggsave, path = tempdir())
```

View File

@ -433,7 +433,7 @@ The advantage of this structure is that it generalises in a straightforward way
Now if you want to iterate over names and values in parallel, you can use `map2()`:
```{r}
df %>% mutate(smry = map2_chr(name, value, ~ paste0(.x, ": ", .y[1])))
df %>% mutate(smry = map2_chr(name, value, ~ stringr::str_c(.x, ": ", .y[1])))
```

View File

@ -4,6 +4,7 @@
library(dplyr)
library(nycflights13)
library(ggplot2)
library(stringr)
```
It's rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you're interested in. Collectively, multiple tables of data are called __relational data__ because it is the relations, not just the individual datasets, that are particularly important.
@ -261,8 +262,8 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
and a foreign key in `x`.
```{r}
x <- data_frame(key = c(1, 2, 2, 1), val_x = paste0("x", 1:4))
y <- data_frame(key = 1:2, val_y = paste0("y", 1:2))
x <- data_frame(key = c(1, 2, 2, 1), val_x = str_c("x", 1:4))
y <- data_frame(key = 1:2, val_y = str_c("y", 1:2))
left_join(x, y, by = "key")
```
@ -275,8 +276,8 @@ So far all the diagrams have assumed that the keys are unique. But that's not al
```
```{r}
x <- data_frame(key = c(1, 2, 2, 3), val_x = paste0("x", 1:4))
y <- data_frame(key = c(1, 2, 2, 3), val_y = paste0("y", 1:4))
x <- data_frame(key = c(1, 2, 2, 3), val_x = str_c("x", 1:4))
y <- data_frame(key = c(1, 2, 2, 3), val_y = str_c("y", 1:4))
left_join(x, y, by = "key")
```

View File

@ -288,7 +288,7 @@ There are number of other special patterns that match more than one character:
* `\d`: any digit.
* `\s`: any whitespace (space, tab, newline).
* `[abc]`: match a, b, or c.
* `[!abc]`: match anything except a, b, or c.
* `[^abc]`: match anything except a, b, or c.
Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.
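
The escaping rule and the corrected `[^abc]` pattern can be illustrated with a minimal sketch. It uses base R's `grepl()` with `perl = TRUE` rather than the stringr functions the chapter uses, purely so that the example has no package dependencies; the input strings are made up.

```r
# The string "\\d" contains two characters: a backslash and a "d".
digit_pattern <- "\\d"
n_chars <- nchar(digit_pattern)

# \d matches any digit, \s matches any whitespace.
has_digit <- grepl(digit_pattern, c("abc", "a1c"), perl = TRUE)
has_space <- grepl("\\s", c("a b", "ab"), perl = TRUE)

# [^abc] matches any character other than a, b, or c.
has_other <- grepl("[^abc]", c("abc", "abcd"), perl = TRUE)

has_digit
has_space
has_other
```

Only the strings containing a digit, a space, or a character outside `a`, `b`, `c` respectively produce `TRUE`.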

View File

@ -13,7 +13,7 @@ Note that this chapter explains how to change the format, or layout, of tabular
In *Section 4.1*, you will learn how the features of R determine the best way to lay out your data. This section introduces "tidy data," a way to organize your data that works particularly well with R.
*Section 4.2* teaches the basic method for making untidy data tidy. In this section, you will learn how to reorganize the values in your data set with the the `spread()` and `gather()` functions of the `tidyr` package.
*Section 4.2* teaches the basic method for making untidy data tidy. In this section, you will learn how to reorganize the values in your data set with the `spread()` and `gather()` functions of the `tidyr` package.
*Section 4.3* explains how to split apart and combine values in your data set to make them easier to access with R.