Brain dump of ggplot2 functions from twitter

This commit is contained in:
Hadley Wickham 2022-09-26 08:37:59 -05:00
parent 27683b9040
commit 2f637609c4
1 changed files with 154 additions and 5 deletions

View File

@ -23,10 +23,11 @@ Writing a function has three big advantages over using copy-and-paste:
Writing good functions is a lifetime journey.
Even after using R for many years we still learn new techniques and better ways of approaching old problems.
The goal of this chapter is to get you started on your journey with functions with two useful types of functions:
The goal of this chapter is to get you started on your journey with functions with three useful types of functions:
- Vector functions take one or more vectors as input and return a vector as output.
- Data frame functions take a data frame as input and return a data frame as output.
- Plot functions that take a data frame as input and return a plot as output.
The chapter concludes with some advice on function style.
@ -343,7 +344,7 @@ These functions work in the same way as dplyr verbs: they takes a data frame as
### Indirection and tidy evaluation
When you start writing functions that use dplyr verbs you rapidly hit the problem of inderation.
When you start writing functions that use dplyr verbs you rapidly hit the problem of indirecation.
Let's illustrate the problem with a very simple function: `pull_unique()`.
The goal of this function is to `pull()` the unique (distinct) values of a variable:
@ -413,8 +414,6 @@ There are are some cases that are harder to guess because you usually use them w
- The `names_from` arguments to `pivot_wider()` is a selecting function because you can take the names from multiple variables with `names_from = c(x, y, z)`.
- It's not a data frame function, but ggplot2's `aes()` uses data-masking because `aes(x * 2, y / 10)` etc.
In the next two sections we'll explore the sorts of handy functions you might write for data-masking and tidy-select arguments
### Data-masking arguments
@ -562,6 +561,147 @@ mtcars |> count_wide(vs, cyl)
mtcars |> count_wide(c(vs, am), cyl)
```
### Learning more
Once you have the basics under your belt, you can learn more about the full range of tidy evaluation possibilities by reading `vignette("programming", package = "dplyr")`.
## Plot functions
You can also use the techniques described above with ggplot2, because `aes()` is a data-masking function.
For example, imagine that you're making a lot of histograms:
```{r}
#| fig-show: hide
diamonds |>
ggplot(aes(carat)) +
geom_histogram(binwidth = 0.1)
diamonds |>
ggplot(aes(carat)) +
geom_histogram(binwidth = 0.05)
```
Wouldn't it be nice if you could wrap this up into a histogram function?
This is easy as once you know that `aes()` is a data-masking function so that you need to embrace:
```{r}
histogram <- function(df, var, binwidth = NULL) {
df |>
ggplot(aes({{ var }})) +
geom_histogram(binwidth = binwidth)
}
diamonds |> histogram(carat, 0.1)
```
Note that `histogram()` returns a ggplot2 plot, so that you can still add on additional components if you want.
Just remember to switch from `|>` to `+`:
```{r}
diamonds |>
histogram(carat, 0.1) +
labs(x = "Size (in carats)", y = "Number of diamonds")
```
### Other examples
```{r}
# https://twitter.com/tyler_js_smith/status/1574377116988104704
lin_check <- function(df, x, y) {
df |>
ggplot(aes({{ x }}, {{ y }})) +
geom_point() +
geom_smooth(method = "loess", color = "red", se = FALSE) +
geom_smooth(method = "lm", color = "black", se = FALSE)
}
```
```{r}
# https://twitter.com/sharoz/status/1574376332821204999
# Facetting is fiddly - have to use special vars syntax.
foo <- function(x) {
ggplot(mtcars) +
aes(x = mpg, y = disp) +
geom_point() +
facet_wrap(vars({{ x }}))
}
```
```{r}
sorted_bars <- function(df, var) {
df |>
mutate({{ var }} := fct_rev(fct_infreq({{ var }}))) |>
ggplot(aes(y = {{ var }})) +
geom_bar()
}
diamonds |> sorted_bars(cut)
```
Of course you might combine both dplyr and ggplot2:
```{r}
bars <- function(df, condition, var) {
df |>
filter({{ condition }}) |>
ggplot(aes({{ var }})) +
geom_bar() +
scale_x_discrete(guide = guide_axis(angle = 45))
}
diamonds |> bars(cut == "Good", clarity)
```
I've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
```{r}
density <- function(fill, ...) {
palmerpenguins::penguins |>
ggplot(aes(bill_length_mm, fill = {{ fill }})) +
geom_density(alpha = 0.5) +
facet_wrap(vars(...))
}
density()
density(species)
density(island, sex)
```
### Labelling
It'd be nice to label this plot automatically.
To do so, we're going to have to go under the covers of tidy evaluation and use a function from a package we have talked about before: rlang.
rlang is the package that implements tidy evaluation, and is used by all the other packages in the tidyverse.
rlang provides a helpful function called `englue()` to solve just this problem.
It uses a syntax inspired by glue but combined with embracing:
```{r}
histogram <- function(df, var, binwidth = NULL) {
label <- rlang::englue("A histogram of {{var}} with binwidth {binwidth}")
df |>
ggplot(aes({{ var }})) +
geom_histogram(binwidth = binwidth) +
labs(title = label)
}
diamonds |> histogram(carat, 0.1)
```
(Note that if you omit the `binwidth` the function fails with a weird error. That appears to be a bug in `englue()`: https://github.com/r-lib/rlang/issues/1492.
Hopefully it'll be fixed soon!)
You can use the same approach any other place that you might supply a string in a ggplot2 plot.
### Advice
It's hard to create general purpose plotting functions because you need to consider many different situations, and we haven't given you the programming skills to handle them all.
Fortunately, in most cases it's relatively simple to extract repeated plotting code into a function.
So, for now, strive to keep your functions simple, focussing on concrete repetition, not solve imaginary future problems.
You can also learn other techniques in <https://ggplot2-book.org/programming.html>.
## Style
It's important to remember that functions are not just for the computer, but are also for humans.
@ -640,4 +780,13 @@ Learn more at <https://style.tidyverse.org/functions.html>
## Summary
Once you have the basics under your belt, you can learn more about the full range of tidy evaluation possibilities by reading `vignette("programming", package = "dplyr")`.
In this chapter you learned how to write functions for three useful scenarios: creating a vector, creating a data frames, or creating a plot.
Writing functions to create data frames and plots using the tidyverse required you to learn a little about tidy evaluation.
Tidy evaluation is really important, because its what allows you to write `diamonds |> filter(x == y)` and `filter()` knows to use `x` and `y` from the diamonds dataset.
The downside of tidy evaluation is that you need to learn a new technique for programming: embracing.
Embracing, e.g. `{{ x }}`, tells the tidy-evaluation using function to look inside the argument `x`, rather than using the literal variable `x`.
You can figure out when you need to use embracing by looking in the documentation for the terms for the two major styles of tidyselect: "data masking" and "tidy select".
In the next chapter, we'll dive into some of the details of R's vector data structures that we've omitted so far.
These are immediately useful by themselves, but are a necessary foundation for the following chapter on iteration that provides some amazingly powerful tools.