Update fct_lump() usage

Fixes #855
Hadley Wickham 2021-04-18 09:21:59 -05:00
parent 4c171731e9
commit 42bbae55d3
1 changed files with 10 additions and 10 deletions

@ -336,22 +336,21 @@ gss_cat %>%
``` ```
Sometimes you just want to lump together all the small groups to make a plot or table simpler. Sometimes you just want to lump together all the small groups to make a plot or table simpler.
That's the job of `fct_lump()`: That's the job of the `fct_lump_*()` family of functions.
`fct_lump_lowfreq()` is a simple starting point that progressively lumps the smallest categories into "Other", always keeping "Other" as the smallest category.
```{r} ```{r}
gss_cat %>% gss_cat %>%
mutate(relig = fct_lump(relig)) %>% mutate(relig = fct_lump_lowfreq(relig)) %>%
count(relig) count(relig)
``` ```
The default behaviour is to progressively lump together the smallest groups, ensuring that the aggregate is still the smallest group. In this case it's not very helpful: it is true that the majority of Americans in this survey are Protestant, but we'd probably like to see some more details!
In this case it's not very helpful: it is true that the majority of Americans in this survey are Protestant, but we've probably over-collapsed. Instead, we can use `fct_lump_n()` to specify that we want exactly 10 groups:
Instead, we can use the `n` parameter to specify how many groups (excluding other) we want to keep:
```{r} ```{r}
gss_cat %>% gss_cat %>%
mutate(relig = fct_lump(relig, n = 10)) %>% mutate(relig = fct_lump_n(relig, n = 10)) %>%
count(relig, sort = TRUE) %>% count(relig, sort = TRUE) %>%
print(n = Inf) print(n = Inf)
``` ```
@ -360,7 +359,8 @@ gss_cat %>%
1. How have the proportions of people identifying as Democrat, Republican, and Independent changed over time? 1. How have the proportions of people identifying as Democrat, Republican, and Independent changed over time?
1. How could you collapse `rincome` into a small set of categories? 2. How could you collapse `rincome` into a small set of categories?
1. Notice there are 9 groups (excluding other) in the `fct_lump` example above. Why not 10? (Hint: type `?fct_lump`, and find the default for the argument `other_level` is "Other".)
3. Notice there are 9 groups (excluding other) in the `fct_lump` example above.
Why not 10?
(Hint: type `?fct_lump`, and find the default for the argument `other_level` is "Other".)
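
To compare the two functions this change switches to, here is a minimal sketch on a toy factor. The vector `x` and its counts are hypothetical and not taken from `gss_cat`; the sketch assumes a forcats version that provides the `fct_lump_*()` family (0.5.0 or later).

```r
library(forcats)

# Hypothetical toy factor (not from gss_cat): five levels with skewed counts.
x <- factor(rep(c("a", "b", "c", "d", "e"), times = c(40, 10, 5, 3, 2)))

# Lump the rarest levels into "Other" while keeping "Other" the smallest level.
by_lowfreq <- fct_lump_lowfreq(x)

# Keep the 3 most frequent levels and lump everything else into "Other".
by_n <- fct_lump_n(x, n = 3)

# Tabulate the levels of each result to see how much was collapsed.
fct_count(by_lowfreq)
fct_count(by_n)
```

Comparing the two tables shows why the text moves from `fct_lump_lowfreq()` to `fct_lump_n()`: the former decides how much to collapse on its own, while the latter lets you choose how many levels survive.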