Better factor motivation

Thanks to @csgillespie
hadley 2016-10-04 09:00:33 -05:00
parent 55caa63edd
commit 5fee66efce
1 changed file with 46 additions and 8 deletions

@@ -19,32 +19,70 @@ library(forcats)
## Creating factors
Typically you'll convert a factor from a character vector, using `factor()`. Apart from the character input, the most important argument is the list of valid __levels__:
Imagine that you have a variable that records month:
```{r}
x <- c("pear", "apple", "banana", "apple", "pear", "apple")
factor(x, levels = c("apple", "banana", "pear"))
x1 <- c("Dec", "Apr", "Jan", "Mar")
```
Any values not in the list of levels will be silently converted to `NA`:
Using a string to record this variable has two problems:
1. There are only twelve possible months, and there's nothing saving you
from typos:
```{r}
x2 <- c("Dec", "Apr", "Jam", "Mar")
```
1. It doesn't sort in a useful way:
```{r}
sort(x1)
```
You can fix both of these problems with a factor. To create a factor you must start by creating a list of the valid __levels__:
```{r}
factor(x, levels = c("apple", "banana"))
month_levels <- c(
"Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
```
Now you can create a factor:
```{r}
y1 <- factor(x1, levels = month_levels)
y1
sort(y1)
```
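Sorting works because a factor stores each value as an integer position in its levels. As a minimal sketch (it repeats the definitions above so that it runs on its own), you can inspect those positions with `as.integer()`:

```{r}
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x1 <- c("Dec", "Apr", "Jan", "Mar")
y1 <- factor(x1, levels = month_levels)

# Each value is stored as its position in month_levels
codes <- as.integer(y1)
codes

# sort() orders by those positions, not alphabetically
sorted <- as.character(sort(y1))
sorted
```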
And any values not in the set will be silently converted to `NA`:
```{r}
y2 <- factor(x2, levels = month_levels)
y2
```
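Because the conversion is silent, it can be useful to check which inputs were dropped. One possible approach, sketched here in base R with the definitions repeated so that it runs on its own, is to look for values that are `NA` in the factor but not in the input:

```{r}
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
x2 <- c("Dec", "Apr", "Jam", "Mar")
y2 <- factor(x2, levels = month_levels)

# Inputs that were not missing but became NA after conversion
dropped <- x2[is.na(y2) & !is.na(x2)]
dropped

# The same check without building the factor first
not_in_levels <- setdiff(x2, month_levels)
not_in_levels
```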
If you want a warning, you can use `readr::parse_factor()`:
```{r}
y2 <- parse_factor(x2, levels = month_levels)
```
If you omit the levels, they'll be taken from the data in alphabetical order:
```{r}
factor(x)
factor(x1)
```
Sometimes you'd prefer that the order of the levels match the order of the first appearance in the data. You can do that when creating the factor by setting levels to `unique(x)`, or after the fact, with `fct_inorder()`:
```{r}
f1 <- factor(x, levels = unique(x))
f1 <- factor(x1, levels = unique(x1))
f1
f2 <- x %>% factor() %>% fct_inorder()
f2 <- x1 %>% factor() %>% fct_inorder()
f2
```
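To compare the level order that each approach produces, you can inspect `levels()`. This sketch uses only base R and repeats `x1` so that it runs on its own; it does not cover `fct_inorder()`, which needs forcats:

```{r}
x1 <- c("Dec", "Apr", "Jan", "Mar")

# Default: levels are the sorted unique values
default_levels <- levels(factor(x1))
default_levels

# levels = unique(x1): levels follow the order of first appearance
inorder_levels <- levels(factor(x1, levels = unique(x1)))
inorder_levels
```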