Spell check suggestions (#259)

harrismcgehee 2016-08-15 08:33:05 -04:00 committed by Hadley Wickham
parent f9901e3e54
commit 2c0c6a8be5
1 changed file with 7 additions and 7 deletions


@@ -119,7 +119,7 @@ The second step is to resolve one of two common problems:
1. One variable might be spread across multiple columns.
-1. One observation might be scattered across mutliple rows.
+1. One observation might be scattered across multiple rows.
Typically a dataset will only suffer from one of these problems; it'll only suffer from both if you're really unlucky! To fix these problems, you'll need the two most important functions in tidyr: `gather()` and `spread()`.
@@ -185,10 +185,10 @@ To tidy this up, we first analyse the representation in similar way to `gather()
* The column that contains variable names, the `key` column. Here, it's
`type`.
-* The column that contains values froms multiple variables, the `value`
+* The column that contains values forms multiple variables, the `value`
column. Here it's `count`.
-Once we've figured that out, we can use `spread()`, as shown progammatically below, and visually in Figure \@ref(fig:tidy-spread).
+Once we've figured that out, we can use `spread()`, as shown programmatically below, and visually in Figure \@ref(fig:tidy-spread).
```{r}
spread(table2, key = type, value = count)
@@ -317,7 +317,7 @@ table5 %>%
unite(new, century, year)
```
-In this case we also need to use the `sep` arguent. The default will place an underscore (`_`) between the values from different columns. Here we don't want any separator so we use `""`:
+In this case we also need to use the `sep` argument. The default will place an underscore (`_`) between the values from different columns. Here we don't want any separator so we use `""`:
```{r}
table5 %>%
@@ -345,7 +345,7 @@ table5 %>%
## Missing values
-Changing the representation of a dataset brings up an important subtlety of missing values. Suprisingly, a value can be missing in one of two possible ways:
+Changing the representation of a dataset brings up an important subtlety of missing values. Surprisingly, a value can be missing in one of two possible ways:
* __Explicitly__, i.e. flagged with `NA`.
* __Implicitly__, i.e. simply not present in the data.
@@ -442,7 +442,7 @@ The best place to start is almost always to gathering together the columns that
in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`)
these are likely to be values, not variables.
-So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells repesent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
+So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
```{r}
who1 <- who %>%
@@ -550,7 +550,7 @@ who %>%
## Non-tidy data
-Before we continue on to other topics, it's worth talking briefly about non-tidy data. Earlier in the chapter, I used the perjorative term "messy" to refer to non-tidy data. That's an oversimplification: there are lots of useful and well founded data structures that are not tidy data. There are two mains reasons to use other data structures:
+Before we continue on to other topics, it's worth talking briefly about non-tidy data. Earlier in the chapter, I used the pejorative term "messy" to refer to non-tidy data. That's an oversimplification: there are lots of useful and well founded data structures that are not tidy data. There are two mains reasons to use other data structures:
* Alternative representations may have substantial performance or space
advantages.