factors.Rmd clarification (#577)

Fixes #576
Brent Brewington 2017-05-04 08:07:34 -04:00 committed by Hadley Wickham
parent 2afd79c1f6
commit 0f956d64db
1 changed files with 7 additions and 7 deletions

@@ -144,7 +144,7 @@ When working with factors, the two most common operations are changing the order
 It's often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions:
 ```{r}
-relig <- gss_cat %>%
+relig_summary <- gss_cat %>%
   group_by(relig) %>%
   summarise(
     age = mean(age, na.rm = TRUE),
@@ -152,7 +152,7 @@ relig <- gss_cat %>%
     n = n()
   )
-ggplot(relig, aes(tvhours, relig)) + geom_point()
+ggplot(relig_summary, aes(tvhours, relig)) + geom_point()
 ```
 It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of `relig` using `fct_reorder()`. `fct_reorder()` takes three arguments:
@@ -163,7 +163,7 @@ It is difficult to interpret this plot because there's no overall pattern. We ca
 `x` for each value of `f`. The default value is `median`.
 ```{r}
-ggplot(relig, aes(tvhours, fct_reorder(relig, tvhours))) +
+ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
   geom_point()
 ```
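
To see what `fct_reorder()` does to the levels themselves, independent of any plot, here is a minimal sketch. It assumes the forcats package is installed; the vectors `group` and `value` are hypothetical toy data, not part of `gss_cat`. Arguments are passed positionally.

```r
library(forcats)

# Hypothetical toy data (not from gss_cat)
group <- factor(c("a", "b", "c", "a", "b", "c"))
value <- c(5, 1, 3, 7, 2, 4)

# Levels start out in alphabetical order
levels(group)
#> [1] "a" "b" "c"

# Reorder the levels by the median of `value` within each level
# (median is the default summary function)
reordered <- fct_reorder(group, value)

# Medians are a = 6, b = 1.5, c = 3.5, so the new order is b, c, a
levels(reordered)
#> [1] "b" "c" "a"
```

Only the level order changes; the values stored in the factor stay the same, which is why the reordering can be done inside `aes()` without altering the data.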
@@ -172,7 +172,7 @@ Reordering religion makes it much easier to see that people in the "Don't know"
 As you start making more complicated transformations, I'd recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as:
 ```{r, eval = FALSE}
-relig %>%
+relig_summary %>%
   mutate(relig = fct_reorder(relig, tvhours)) %>%
   ggplot(aes(tvhours, relig)) +
     geom_point()
@@ -180,7 +180,7 @@ relig %>%
 What if we create a similar plot looking at how average age varies across reported income level?
 ```{r}
-rincome <- gss_cat %>%
+rincome_summary <- gss_cat %>%
   group_by(rincome) %>%
   summarise(
     age = mean(age, na.rm = TRUE),
@@ -188,7 +188,7 @@ rincome <- gss_cat %>%
     n = n()
   )
-ggplot(rincome, aes(age, fct_reorder(rincome, age))) + geom_point()
+ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) + geom_point()
 ```
 Here, arbitrarily reordering the levels isn't a good idea! That's because `rincome` already has a principled order that we shouldn't mess with. Reserve `fct_reorder()` for factors whose levels are arbitrarily ordered.
@@ -196,7 +196,7 @@ Here, arbitrarily reordering the levels isn't a good idea! That's because `rinco
 However, it does make sense to pull "Not applicable" to the front with the other special levels. You can use `fct_relevel()`. It takes a factor, `f`, and then any number of levels that you want to move to the front of the line.
 ```{r}
-ggplot(rincome, aes(age, fct_relevel(rincome, "Not applicable"))) +
+ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) +
   geom_point()
 ```
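
As a small illustration of the `fct_relevel()` behaviour described in the last hunk, the sketch below moves one level to the front while leaving the rest in their existing order. It assumes the forcats package is installed; the factor `income` is a hypothetical stand-in with made-up values, not the real `rincome` column.

```r
library(forcats)

# Hypothetical stand-in for an income factor with a principled level order
income <- factor(
  c("Lt $1000", "$1000 to 2999", "Not applicable", "Lt $1000"),
  levels = c("Lt $1000", "$1000 to 2999", "Not applicable")
)

# Move "Not applicable" to the front; the other levels keep their order
moved <- fct_relevel(income, "Not applicable")

levels(moved)
#> [1] "Not applicable" "Lt $1000"       "$1000 to 2999"
```

Unlike `fct_reorder()`, nothing is sorted by another variable here, so the principled ordering of the remaining levels is preserved.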