diff --git a/factors.Rmd b/factors.Rmd
index 28e346b..33a9c5c 100644
--- a/factors.Rmd
+++ b/factors.Rmd
@@ -19,32 +19,70 @@ library(forcats)
 
 ## Creating factors
 
-Typically you'll convert a factor from a character vector, using `factor()`. Apart from the character input, the most important argument is the list of valid __levels__:
+Imagine that you have a variable that records month:
 
 ```{r}
-x <- c("pear", "apple", "banana", "apple", "pear", "apple")
-factor(x, levels = c("apple", "banana", "pear"))
+x1 <- c("Dec", "Apr", "Jan", "Mar")
 ```
 
-Any values not in the list of levels will be silently converted to `NA`:
+Using a string to record this variable has two problems:
+
+1.  There are only twelve possible months, and there's nothing saving you
+    from typos:
+
+    ```{r}
+    x2 <- c("Dec", "Apr", "Jam", "Mar")
+    ```
+
+1.  It doesn't sort in a useful way:
+
+    ```{r}
+    sort(x1)
+    ```
+
+You can fix both of these problems with a factor. To create a factor you must start by creating a list of the valid __levels__:
 
 ```{r}
-factor(x, levels = c("apple", "banana"))
+month_levels <- c(
+  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
+  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
+)
+```
+
+Now you can create a factor:
+
+```{r}
+y1 <- factor(x1, levels = month_levels)
+y1
+sort(y1)
+```
+
+And any values not in the set will be silently converted to NA:
+
+```{r}
+y2 <- factor(x2, levels = month_levels)
+y2
+```
+
+If you want a warning, you can use `readr::parse_factor()`:
+
+```{r}
+y2 <- parse_factor(x2, levels = month_levels)
 ```
 
 If you omit the levels, they'll be taken from the data in alphabetical order:
 
 ```{r}
-factor(x)
+factor(x1)
 ```
 
 Sometimes you'd prefer that the order of the levels match the order of the first appearance in the data. You can do that when creating the factor by setting levels to `unique(x)`, or after the fact, with `fct_inorder()`:
 
 ```{r}
-f1 <- factor(x, levels = unique(x))
+f1 <- factor(x1, levels = unique(x1))
 f1
-f2 <- x %>% factor() %>% fct_inorder()
+f2 <- x1 %>% factor() %>% fct_inorder()
 f2
 ```