From 0f956d64db7f2ac2493aab979f9ed76dc58c3524 Mon Sep 17 00:00:00 2001
From: Brent Brewington
Date: Thu, 4 May 2017 08:07:34 -0400
Subject: [PATCH] factors.Rmd clarification (#577)

Fixes #576
---
 factors.Rmd | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/factors.Rmd b/factors.Rmd
index ae2e8e1..8db4c0f 100644
--- a/factors.Rmd
+++ b/factors.Rmd
@@ -144,7 +144,7 @@ When working with factors, the two most common operations are changing the order
 It's often useful to change the order of the factor levels in a visualisation. For example, imagine you want to explore the average number of hours spent watching TV per day across religions:

 ```{r}
-relig <- gss_cat %>%
+relig_summary <- gss_cat %>%
   group_by(relig) %>%
   summarise(
     age = mean(age, na.rm = TRUE),
@@ -152,7 +152,7 @@ relig <- gss_cat %>%
     n = n()
   )

-ggplot(relig, aes(tvhours, relig)) + geom_point()
+ggplot(relig_summary, aes(tvhours, relig)) + geom_point()
 ```

 It is difficult to interpret this plot because there's no overall pattern. We can improve it by reordering the levels of `relig` using `fct_reorder()`. `fct_reorder()` takes three arguments:
@@ -163,7 +163,7 @@ It is difficult to interpret this plot because there's no overall pattern. We ca
   `x` for each value of `f`. The default value is `median`.

 ```{r}
-ggplot(relig, aes(tvhours, fct_reorder(relig, tvhours))) +
+ggplot(relig_summary, aes(tvhours, fct_reorder(relig, tvhours))) +
   geom_point()
 ```

@@ -172,7 +172,7 @@ Reordering religion makes it much easier to see that people in the "Don't know"
 As you start making more complicated transformations, I'd recommend moving them out of `aes()` and into a separate `mutate()` step. For example, you could rewrite the plot above as:

 ```{r, eval = FALSE}
-relig %>%
+relig_summary %>%
   mutate(relig = fct_reorder(relig, tvhours)) %>%
   ggplot(aes(tvhours, relig)) +
     geom_point()
@@ -180,7 +180,7 @@ relig %>%
 What if we create a similar plot looking at how average age varies across reported income level?

 ```{r}
-rincome <- gss_cat %>%
+rincome_summary <- gss_cat %>%
   group_by(rincome) %>%
   summarise(
     age = mean(age, na.rm = TRUE),
@@ -188,7 +188,7 @@ rincome <- gss_cat %>%
     n = n()
   )

-ggplot(rincome, aes(age, fct_reorder(rincome, age))) + geom_point()
+ggplot(rincome_summary, aes(age, fct_reorder(rincome, age))) + geom_point()
 ```

 Here, arbitrarily reordering the levels isn't a good idea! That's because `rincome` already has a principled order that we shouldn't mess with. Reserve `fct_reorder()` for factors whose levels are arbitrarily ordered.
@@ -196,7 +196,7 @@ Here, arbitrarily reordering the levels isn't a good idea! That's because `rinco
 However, it does make sense to pull "Not applicable" to the front with the other special levels. You can use `fct_relevel()`. It takes a factor, `f`, and then any number of levels that you want to move to the front of the line.

 ```{r}
-ggplot(rincome, aes(age, fct_relevel(rincome, "Not applicable"))) +
+ggplot(rincome_summary, aes(age, fct_relevel(rincome, "Not applicable"))) +
   geom_point()
 ```
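
For reference, a minimal sketch of what the two forcats helpers used in these chunks do to factor levels. The vectors `f` and `x` are made-up toy data, not part of `factors.Rmd` or `gss_cat`; the sketch only assumes the forcats package is installed.

```r
library(forcats)

# Hypothetical toy data (an assumption for illustration only).
f <- factor(c("a", "b", "c", "a", "b", "c"))
x <- c(5, 1, 3, 7, 2, 4)

# fct_reorder() sorts the levels of f by a summary of x within each level.
# The default summary is the median: a = 6, b = 1.5, c = 3.5.
reordered <- fct_reorder(f, x)
levels(reordered)
# "b" "c" "a"

# fct_relevel() moves the named levels to the front and keeps the
# remaining levels in their existing order.
releveled <- fct_relevel(f, "c")
levels(releveled)
# "c" "a" "b"
```

Neither call changes the values stored in the factor, only the order of its levels, which is why the renamed summary data frames in the patch plot the same points in a different vertical order.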