Update functions.qmd (#1178)
parent 0743cbd3aa
commit e5e66de3cf
116 functions.qmd
|
@ -75,7 +75,7 @@ Preventing this type of mistake of is one very good reason to learn how to write
|
||||||
### Writing a function
|
### Writing a function
|
||||||
|
|
||||||
To write a function you need to first analyse your repeated code to figure what parts are constant and what parts vary.
|
To write a function you need to first analyse your repeated code to figure what parts are constant and what parts vary.
|
||||||
If we take the code above and pull it outside of `mutate()` it's a little easier to see the pattern because each repetition is now one line:
|
If we take the code above and pull it outside of `mutate()`, it's a little easier to see the pattern because each repetition is now one line:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
|
@ -99,11 +99,11 @@ To turn this into a function you need three things:
|
||||||
Here we'll use `rescale01` because this function rescales a vector to lie between 0 and 1.
|
Here we'll use `rescale01` because this function rescales a vector to lie between 0 and 1.
|
||||||
|
|
||||||
2. The **arguments**.
|
2. The **arguments**.
|
||||||
The arguments are things that vary across calls and our analysis above tells us that have just one.
|
The arguments are things that vary across calls and our analysis above tells us that we have just one.
|
||||||
We'll call it `x` because this is the conventional name for a numeric vector.
|
We'll call it `x` because this is the conventional name for a numeric vector.
|
||||||
|
|
||||||
3. The **body**.
|
3. The **body**.
|
||||||
The body is the code that repeated across all the calls.
|
The body is the code that's repeated across all the calls.
|
||||||
|
|
||||||
Then you create a function by following the template:
|
Then you create a function by following the template:
|
||||||
|
|
||||||
|
@ -143,7 +143,7 @@ df |> mutate(
|
||||||
|
|
||||||
### Improving our function
|
### Improving our function
|
||||||
|
|
||||||
You might notice `rescale01()` function does some unnecessary work --- instead of computing `min()` twice and `max()` once we could instead compute both the minimum and maximum in one step with `range()`:
|
You might notice that the `rescale01()` function does some unnecessary work --- instead of computing `min()` twice and `max()` once we could instead compute both the minimum and maximum in one step with `range()`:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
rescale01 <- function(x) {
|
rescale01 <- function(x) {
|
||||||
|
@ -166,6 +166,7 @@ rescale01 <- function(x) {
|
||||||
rng <- range(x, na.rm = TRUE, finite = TRUE)
|
rng <- range(x, na.rm = TRUE, finite = TRUE)
|
||||||
(x - rng[1]) / (rng[2] - rng[1])
|
(x - rng[1]) / (rng[2] - rng[1])
|
||||||
}
|
}
|
||||||
|
|
||||||
rescale01(x)
|
rescale01(x)
|
||||||
```
|
```
|
||||||
|
|
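As a rough check of the `range()`-based version shown in this hunk, the following self-contained sketch restates the function under a hypothetical name, `rescale01_sketch()`, and runs it on a made-up vector. The name and the input values are assumptions for illustration; they are not taken from the chapter.

```r
# Sketch only: `rescale01_sketch` is a hypothetical name; the body mirrors the
# range()-based lines shown in the diff above.
rescale01_sketch <- function(x) {
  # range() returns c(min, max); finite = TRUE ignores NA, NaN, Inf and -Inf
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Made-up input containing a missing and an infinite value
x <- c(1, 5, 10, NA, Inf)
rescale01_sketch(x)
```

Because the minimum and maximum are computed from the finite values only, the finite inputs are mapped into the interval from 0 to 1, while `NA` and `Inf` pass through as `NA` and `Inf`.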
||||||
|
@ -173,11 +174,11 @@ These changes illustrate an important benefit of functions: because we've moved
|
||||||
|
|
||||||
### Mutate functions
|
### Mutate functions
|
||||||
|
|
||||||
Now you've got the basic idea of functions, lets take a look a whole bunch of examples.
|
Now you've got the basic idea of functions, let's take a look at a whole bunch of examples.
|
||||||
We'll start by looking at "mutate" functions, functions that work well like `mutate()` and `filter()` because they return an output the same length as the input.
|
We'll start by looking at "mutate" functions, i.e. functions that work well inside of `mutate()` and `filter()` because they return an output of the same length as the input.
|
||||||
|
|
||||||
Lets start with a simple variation of `rescale01()`.
|
Let's start with a simple variation of `rescale01()`.
|
||||||
Maybe you want compute the Z-score, rescaling a vector to have to a mean of zero and a standard deviation of one:
|
Maybe you want to compute the Z-score, rescaling a vector to have a mean of zero and a standard deviation of one:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
z_score <- function(x) {
|
z_score <- function(x) {
|
||||||
|
@ -185,7 +186,7 @@ z_score <- function(x) {
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
Or maybe you want to wrap up a straightforward `case_when()` in order to give it a useful name.
|
Or maybe you want to wrap up a straightforward `case_when()` and give it a useful name.
|
||||||
For example, this `clamp()` function ensures all values of a vector lie between a minimum and a maximum:
|
For example, this `clamp()` function ensures all values of a vector lie between a minimum and a maximum:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -196,6 +197,7 @@ clamp <- function(x, min, max) {
|
||||||
.default = x
|
.default = x
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
clamp(1:10, min = 3, max = 7)
|
clamp(1:10, min = 3, max = 7)
|
||||||
```
|
```
|
||||||
|
|
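For comparison, the same clamping behavior can be sketched in base R with `pmin()` and `pmax()` instead of `case_when()`. This is an alternative technique, not the chapter's implementation, and `clamp_base()` is a hypothetical name.

```r
# Sketch only: `clamp_base` is a hypothetical base-R alternative to the
# case_when()-based clamp() in the diff above.
clamp_base <- function(x, min, max) {
  # pmax() raises values below `min`; pmin() then lowers values above `max`
  pmin(pmax(x, min), max)
}

clamp_base(1:10, min = 3, max = 7)
```

Wrapping the logic in a named function serves the same purpose either way: the call site says what is happening without the reader having to decode the comparison logic.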
||||||
|
@ -209,11 +211,12 @@ na_outside <- function(x, min, max) {
|
||||||
.default = x
|
.default = x
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
na_outside(1:10, min = 3, max = 7)
|
na_outside(1:10, min = 3, max = 7)
|
||||||
```
|
```
|
||||||
|
|
||||||
Of course functions don't just need to work with numeric variables.
|
Of course functions don't just need to work with numeric variables.
|
||||||
You might want to extract out some repeated string manipulation.
|
You might want to do some repeated string manipulation.
|
||||||
Maybe you need to make the first character upper case:
|
Maybe you need to make the first character upper case:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -221,6 +224,7 @@ first_upper <- function(x) {
|
||||||
str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1))
|
str_sub(x, 1, 1) <- str_to_upper(str_sub(x, 1, 1))
|
||||||
x
|
x
|
||||||
}
|
}
|
||||||
|
|
||||||
first_upper("hello")
|
first_upper("hello")
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -237,11 +241,12 @@ clean_number <- function(x) {
|
||||||
as.numeric(x)
|
as.numeric(x)
|
||||||
if_else(is_pct, num / 100, num)
|
if_else(is_pct, num / 100, num)
|
||||||
}
|
}
|
||||||
|
|
||||||
clean_number("$12,300")
|
clean_number("$12,300")
|
||||||
clean_number("45%")
|
clean_number("45%")
|
||||||
```
|
```
|
||||||
|
|
||||||
Sometimes your functions will be highly specialized for one data analysis.
|
Sometimes your functions will be highly specialized for one data analysis step.
|
||||||
For example, if you have a bunch of variables that record missing values as 997, 998, or 999, you might want to write a function to replace them with `NA`:
|
For example, if you have a bunch of variables that record missing values as 997, 998, or 999, you might want to write a function to replace them with `NA`:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -282,15 +287,17 @@ Sometimes this can just be a matter of setting a default argument or two:
|
||||||
commas <- function(x) {
|
commas <- function(x) {
|
||||||
str_flatten(x, collapse = ", ", last = " and ")
|
str_flatten(x, collapse = ", ", last = " and ")
|
||||||
}
|
}
|
||||||
|
|
||||||
commas(c("cat", "dog", "pigeon"))
|
commas(c("cat", "dog", "pigeon"))
|
||||||
```
|
```
|
||||||
|
|
||||||
Or you might wrap up a simple computation, like for the coefficient of variation, which divides standard deviation by the mean:
|
Or you might wrap up a simple computation, like for the coefficient of variation, which divides the standard deviation by the mean:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
cv <- function(x, na.rm = FALSE) {
|
cv <- function(x, na.rm = FALSE) {
|
||||||
sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
|
sd(x, na.rm = na.rm) / mean(x, na.rm = na.rm)
|
||||||
}
|
}
|
||||||
|
|
||||||
cv(runif(100, min = 0, max = 50))
|
cv(runif(100, min = 0, max = 50))
|
||||||
cv(runif(100, min = 0, max = 500))
|
cv(runif(100, min = 0, max = 500))
|
||||||
```
|
```
|
||||||
|
@ -402,7 +409,7 @@ If we try and use it, we get an error:
|
||||||
diamonds |> grouped_mean(cut, carat)
|
diamonds |> grouped_mean(cut, carat)
|
||||||
```
|
```
|
||||||
|
|
||||||
To make the problem a bit more clear we can use a made up data frame:
|
To make the problem a bit more clear, we can use a made up data frame:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
df <- tibble(
|
df <- tibble(
|
||||||
|
@ -412,6 +419,7 @@ df <- tibble(
|
||||||
x = 10,
|
x = 10,
|
||||||
y = 100
|
y = 100
|
||||||
)
|
)
|
||||||
|
|
||||||
df |> grouped_mean(group, x)
|
df |> grouped_mean(group, x)
|
||||||
df |> grouped_mean(group, y)
|
df |> grouped_mean(group, y)
|
||||||
```
|
```
|
||||||
|
@ -428,7 +436,7 @@ Embracing a variable means to wrap it in braces so (e.g.) `var` becomes `{{ var
|
||||||
Embracing a variable tells dplyr to use the value stored inside the argument, not the argument as the literal variable name.
|
Embracing a variable tells dplyr to use the value stored inside the argument, not the argument as the literal variable name.
|
||||||
One way to remember what's happening is to think of `{{ }}` as looking down a tunnel --- `{{ var }}` will make a dplyr function look inside of `var` rather than looking for a variable called `var`.
|
One way to remember what's happening is to think of `{{ }}` as looking down a tunnel --- `{{ var }}` will make a dplyr function look inside of `var` rather than looking for a variable called `var`.
|
||||||
|
|
||||||
So to make grouped_mean`()` work we need to replace surround `group_var` and `mean_var()` with `{{ }}`:
|
So to make `grouped_mean()` work, we need to surround `group_var` and `mean_var` with `{{ }}`:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
grouped_mean <- function(df, group_var, mean_var) {
|
grouped_mean <- function(df, group_var, mean_var) {
|
||||||
|
@ -445,16 +453,16 @@ Success!
|
||||||
### When to embrace? {#sec-embracing}
|
### When to embrace? {#sec-embracing}
|
||||||
|
|
||||||
So the key challenge in writing data frame functions is figuring out which arguments need to be embraced.
|
So the key challenge in writing data frame functions is figuring out which arguments need to be embraced.
|
||||||
Fortunately this is easy because you can look it up from the documentation 😄.
|
Fortunately, this is easy because you can look it up from the documentation 😄.
|
||||||
There are two terms to look for in the docs which corresponding to the two most common sub-types of tidy evaluation:
|
There are two terms to look for in the docs which correspond to the two most common sub-types of tidy evaluation:
|
||||||
|
|
||||||
- **Data-masking**: this is used in functions like `arrange()`, `filter()`, and `summarize()` that compute with variables.
|
- **Data-masking**: this is used in functions like `arrange()`, `filter()`, and `summarize()` that compute with variables.
|
||||||
|
|
||||||
- **Tidy-selection**: this is used for for functions like `select()`, `relocate()`, and `rename()` that select variables.
|
- **Tidy-selection**: this is used for functions like `select()`, `relocate()`, and `rename()` that select variables.
|
||||||
|
|
||||||
Your intuition about which arguments use tidy evaluation should be good for many common functions --- just think about whether you can compute (e.g. `x + 1`) or select (e.g. `a:x`).
|
Your intuition about which arguments use tidy evaluation should be good for many common functions --- just think about whether you can compute (e.g. `x + 1`) or select (e.g. `a:x`).
|
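The compute-versus-select distinction can be sketched with two small wrappers. The helper names `mean_of()` and `keep_cols()` and the `toy` data frame are hypothetical and exist only for this illustration; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical helper: `var` is forwarded to summarize(), a data-masking
# function, so callers can pass an expression to compute with.
mean_of <- function(df, var) {
  df |> summarize(mean = mean({{ var }}))
}

# Hypothetical helper: `cols` is forwarded to select(), a tidy-selection
# function, so callers can pass a selection such as a:b.
keep_cols <- function(df, cols) {
  df |> select({{ cols }})
}

toy <- tibble(a = 1:3, b = 4:6, x = c(10, 20, 30)) # made-up data

toy |> mean_of(x + 1) # compute: data-masking
toy |> keep_cols(a:b) # select: tidy-selection
```

In both helpers the embraced argument inherits the evaluation style of the dplyr function it is passed to, which is why the documentation of that function is the place to look.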
||||||
|
|
||||||
In the following sections we'll explore the sorts of handy functions you might write once you understand embracing.
|
In the following sections, we'll explore the sorts of handy functions you might write once you understand embracing.
|
||||||
|
|
||||||
### Common use cases
|
### Common use cases
|
||||||
|
|
||||||
|
@ -472,12 +480,13 @@ summary6 <- function(data, var) {
|
||||||
.groups = "drop"
|
.groups = "drop"
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
diamonds |> summary6(carat)
|
diamonds |> summary6(carat)
|
||||||
```
|
```
|
||||||
|
|
||||||
(Whenever you wrap `summarize()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
|
(Whenever you wrap `summarize()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
|
||||||
|
|
||||||
The nice thing about this function is because it wraps `summarize()` you can used it on grouped data:
|
The nice thing about this function is, because it wraps `summarize()`, you can use it on grouped data:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
diamonds |>
|
diamonds |>
|
||||||
|
@ -485,7 +494,7 @@ diamonds |>
|
||||||
summary6(carat)
|
summary6(carat)
|
||||||
```
|
```
|
||||||
|
|
||||||
Because the arguments to summarize are data-masking that also means that the `var` argument to `summary6()` is data-masking.
|
Furthermore, since the arguments to `summarize()` are data-masking, the `var` argument to `summary6()` is also data-masking.
|
||||||
That means you can also summarize computed variables:
|
That means you can also summarize computed variables:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -494,7 +503,7 @@ diamonds |>
|
||||||
summary6(log10(carat))
|
summary6(log10(carat))
|
||||||
```
|
```
|
||||||
|
|
||||||
To summarize multiple variables you'll need to wait until @sec-across, where you'll learn how to use `across()`.
|
To summarize multiple variables, you'll need to wait until @sec-across, where you'll learn how to use `across()`.
|
||||||
|
|
||||||
Another popular `summarize()` helper function is a version of `count()` that also computes proportions:
|
Another popular `summarize()` helper function is a version of `count()` that also computes proportions:
|
||||||
|
|
||||||
|
@ -505,6 +514,7 @@ count_prop <- function(df, var, sort = FALSE) {
|
||||||
count({{ var }}, sort = sort) |>
|
count({{ var }}, sort = sort) |>
|
||||||
mutate(prop = n / sum(n))
|
mutate(prop = n / sum(n))
|
||||||
}
|
}
|
||||||
|
|
||||||
diamonds |> count_prop(clarity)
|
diamonds |> count_prop(clarity)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -527,9 +537,9 @@ flights |> unique_where(month == 12, dest)
|
||||||
flights |> unique_where(tailnum == "N14228", month)
|
flights |> unique_where(tailnum == "N14228", month)
|
||||||
```
|
```
|
||||||
|
|
||||||
Here we embrace `condition` because it's passed to `filter()` and `var` because its passed to `distinct()` and `arrange()`.
|
Here we embrace `condition` because it's passed to `filter()` and `var` because it's passed to `distinct()` and `arrange()`.
|
||||||
|
|
||||||
We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data, it can make sense to hardcode it.
|
We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data, it can make sense to hardcode it.
|
||||||
For example, the following function always works with the flights dataset and always selects `time_hour`, `carrier`, and `flight` since they form the compound primary key that allows you to identify a row.
|
For example, the following function always works with the flights dataset and always selects `time_hour`, `carrier`, and `flight` since they form the compound primary key that allows you to identify a row.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -555,12 +565,13 @@ count_missing <- function(df, group_vars, x_var) {
|
||||||
group_by({{ group_vars }}) |>
|
group_by({{ group_vars }}) |>
|
||||||
summarize(n_miss = sum(is.na({{ x_var }})))
|
summarize(n_miss = sum(is.na({{ x_var }})))
|
||||||
}
|
}
|
||||||
|
|
||||||
flights |>
|
flights |>
|
||||||
count_missing(c(year, month, day), dep_time)
|
count_missing(c(year, month, day), dep_time)
|
||||||
```
|
```
|
||||||
|
|
||||||
This doesn't work because `group_by()` uses data-masking, not tidy-selection.
|
This doesn't work because `group_by()` uses data-masking, not tidy-selection.
|
||||||
We can work around that problem by using the handy `pick()` which allows you to use use tidy-selection inside data-masking functions:
|
We can work around that problem by using the handy `pick()` function, which allows you to use tidy-selection inside data-masking functions:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
count_missing <- function(df, group_vars, x_var) {
|
count_missing <- function(df, group_vars, x_var) {
|
||||||
|
@ -568,6 +579,7 @@ count_missing <- function(df, group_vars, x_var) {
|
||||||
group_by(pick({{ group_vars }})) |>
|
group_by(pick({{ group_vars }})) |>
|
||||||
summarize(n_miss = sum(is.na({{ x_var }})))
|
summarize(n_miss = sum(is.na({{ x_var }})))
|
||||||
}
|
}
|
||||||
|
|
||||||
flights |>
|
flights |>
|
||||||
count_missing(c(year, month, day), dep_time)
|
count_missing(c(year, month, day), dep_time)
|
||||||
```
|
```
|
||||||
|
@ -587,6 +599,7 @@ count_wide <- function(data, rows, cols) {
|
||||||
values_fill = 0
|
values_fill = 0
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
diamonds |> count_wide(clarity, cut)
|
diamonds |> count_wide(clarity, cut)
|
||||||
diamonds |> count_wide(c(clarity, color), cut)
|
diamonds |> count_wide(c(clarity, color), cut)
|
||||||
```
|
```
|
||||||
|
@ -595,9 +608,9 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
1. Using the datasets from nycflights13, write functions that:
|
1. Using the datasets from nycflights13, write a function that:
|
||||||
|
|
||||||
1. Find all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
|
1. Finds all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
|
@ -632,7 +645,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
|
||||||
weather |> standardise_time(sched_dep_time)
|
weather |> standardise_time(sched_dep_time)
|
||||||
```
|
```
|
||||||
|
|
||||||
2. For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-select: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
|
2. For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-selection: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
|
||||||
|
|
||||||
3. Generalize the following function so that you can supply any number of variables to count.
|
3. Generalize the following function so that you can supply any number of variables to count.
|
||||||
|
|
||||||
|
@ -647,7 +660,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
|
||||||
## Plot functions
|
## Plot functions
|
||||||
|
|
||||||
Instead of returning a data frame, you might want to return a plot.
|
Instead of returning a data frame, you might want to return a plot.
|
||||||
Fortunately you can use the same techniques with ggplot2, because `aes()` is a data-masking function.
|
Fortunately, you can use the same techniques with ggplot2, because `aes()` is a data-masking function.
|
||||||
For example, imagine that you're making a lot of histograms:
|
For example, imagine that you're making a lot of histograms:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -662,7 +675,7 @@ diamonds |>
|
||||||
```
|
```
|
||||||
|
|
||||||
Wouldn't it be nice if you could wrap this up into a histogram function?
|
Wouldn't it be nice if you could wrap this up into a histogram function?
|
||||||
This is easy as once you know that `aes()` is a data-masking function so that you need to embrace:
|
This is easy as pie once you know that `aes()` is a data-masking function and you need to embrace:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
histogram <- function(df, var, binwidth = NULL) {
|
histogram <- function(df, var, binwidth = NULL) {
|
||||||
|
@ -674,7 +687,7 @@ histogram <- function(df, var, binwidth = NULL) {
|
||||||
diamonds |> histogram(carat, 0.1)
|
diamonds |> histogram(carat, 0.1)
|
||||||
```
|
```
|
||||||
|
|
||||||
Note that `histogram()` returns a ggplot2 plot, so that you can still add on additional components if you want.
|
Note that `histogram()` returns a ggplot2 plot, meaning you can still add on additional components if you want.
|
||||||
Just remember to switch from `|>` to `+`:
|
Just remember to switch from `|>` to `+`:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -690,7 +703,6 @@ For example, maybe you want an easy way to eyeball whether or not a data set is
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
# https://twitter.com/tyler_js_smith/status/1574377116988104704
|
# https://twitter.com/tyler_js_smith/status/1574377116988104704
|
||||||
|
|
||||||
linearity_check <- function(df, x, y) {
|
linearity_check <- function(df, x, y) {
|
||||||
df |>
|
df |>
|
||||||
ggplot(aes({{ x }}, {{ y }})) +
|
ggplot(aes({{ x }}, {{ y }})) +
|
||||||
|
@ -717,6 +729,7 @@ hex_plot <- function(df, x, y, z, bins = 20, fun = "mean") {
|
||||||
fun = fun,
|
fun = fun,
|
||||||
)
|
)
|
||||||
}
|
}
|
||||||
|
|
||||||
diamonds |> hex_plot(carat, price, depth)
|
diamonds |> hex_plot(carat, price, depth)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -724,7 +737,7 @@ diamonds |> hex_plot(carat, price, depth)
|
||||||
|
|
||||||
Some of the most useful helpers combine a dash of dplyr with ggplot2.
|
Some of the most useful helpers combine a dash of dplyr with ggplot2.
|
||||||
For example, you might want to do a vertical bar chart where you automatically sort the bars in frequency order using `fct_infreq()`.
|
For example, you might want to do a vertical bar chart where you automatically sort the bars in frequency order using `fct_infreq()`.
|
||||||
Since the bar chart is vertical, we also need to reverse the usual order to get the highest values at the top:
|
Since the bar chart is vertical, we also need to reverse the usual order to get the highest values at the top (also note the `:=` operator, which allows you to inject names with glue syntax on the left-hand side of `:=`; type: ?\`:=\` for more details):
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
sorted_bars <- function(df, var) {
|
sorted_bars <- function(df, var) {
|
||||||
|
@ -733,10 +746,11 @@ sorted_bars <- function(df, var) {
|
||||||
ggplot(aes(y = {{ var }})) +
|
ggplot(aes(y = {{ var }})) +
|
||||||
geom_bar()
|
geom_bar()
|
||||||
}
|
}
|
||||||
|
|
||||||
diamonds |> sorted_bars(cut)
|
diamonds |> sorted_bars(cut)
|
||||||
```
|
```
|
||||||
|
|
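As a minimal sketch of what the `:=` operator allows, the following helper builds a new column name from the name of the column it receives. The helper `add_doubled()` and the `toy` data are hypothetical and unrelated to `sorted_bars()`; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical helper: creates a new column whose name is derived from the
# name of the column passed as `var`, using glue syntax on the left of `:=`.
add_doubled <- function(df, var) {
  df |> mutate("{{ var }}_doubled" := {{ var }} * 2)
}

toy <- tibble(x = 1:3) # made-up data
toy |> add_doubled(x)
```

A plain `=` would not work here because R does not allow a computed name on its left-hand side, which is why the name injection goes through `:=`.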
||||||
Or you could maybe you want to make it easy to draw a bar plot just for a subset of the data:
|
Or maybe you want to make it easy to draw a bar plot just for a subset of the data:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
conditional_bars <- function(df, condition, var) {
|
conditional_bars <- function(df, condition, var) {
|
||||||
|
@ -749,20 +763,19 @@ conditional_bars <- function(df, condition, var) {
|
||||||
diamonds |> conditional_bars(cut == "Good", clarity)
|
diamonds |> conditional_bars(cut == "Good", clarity)
|
||||||
```
|
```
|
||||||
|
|
||||||
You can also get creative and display data summaries in other way.
|
You can also get creative and display data summaries in other ways.
|
||||||
For example, this code uses the axis labels to display the highest value.
|
For example, this code uses the axis labels to display the highest value.
|
||||||
As you learn more about ggplot2, the power of your functions will continue to increase.
|
As you learn more about ggplot2, the power of your functions will continue to increase.
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
# https://gist.github.com/GShotwell/b19ef520b6d56f61a830fabb3454965b
|
# https://gist.github.com/GShotwell/b19ef520b6d56f61a830fabb3454965b
|
||||||
|
|
||||||
fancy_ts <- function(df, val, group) {
|
fancy_ts <- function(df, val, group) {
|
||||||
labs <- df |>
|
labs <- df |>
|
||||||
group_by({{group}}) |>
|
group_by({{ group }}) |>
|
||||||
summarize(breaks = max({{val}}))
|
summarize(breaks = max({{ val }}))
|
||||||
|
|
||||||
df |>
|
df |>
|
||||||
ggplot(aes(date, {{val}}, group = {{group}}, color = {{group}})) +
|
ggplot(aes(date, {{ val }}, group = {{ group }}, color = {{ group }})) +
|
||||||
geom_path() +
|
geom_path() +
|
||||||
scale_y_continuous(
|
scale_y_continuous(
|
||||||
breaks = labs$breaks,
|
breaks = labs$breaks,
|
||||||
|
@ -778,6 +791,7 @@ df <- tibble(
|
||||||
dist4 = sort(rnorm(50, 15, 1)),
|
dist4 = sort(rnorm(50, 15, 1)),
|
||||||
date = seq.Date(as.Date("2022-01-01"), as.Date("2022-04-10"), by = "2 days")
|
date = seq.Date(as.Date("2022-01-01"), as.Date("2022-04-10"), by = "2 days")
|
||||||
)
|
)
|
||||||
|
|
||||||
df <- pivot_longer(df, cols = -date, names_to = "dist_name", values_to = "value")
|
df <- pivot_longer(df, cols = -date, names_to = "dist_name", values_to = "value")
|
||||||
|
|
||||||
fancy_ts(df, value, dist_name)
|
fancy_ts(df, value, dist_name)
|
||||||
|
@ -787,19 +801,19 @@ Next we'll discuss two more complicated cases: faceting and automatic labeling.
|
||||||
|
|
||||||
### Faceting
|
### Faceting
|
||||||
|
|
||||||
Unfortunately programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work.
|
Unfortunately, programming with faceting is a special challenge, because faceting was implemented before we understood what tidy evaluation was and how it should work.
|
||||||
so you have to learn a new syntax.
|
So you have to learn a new syntax.
|
||||||
When programming with facets, instead of writing `~ x`, you need to write `vars(x)` and instead of `~ x + y` you need to write `vars(x, y)`.
|
When programming with facets, instead of writing `~ x`, you need to write `vars(x)` and instead of `~ x + y` you need to write `vars(x, y)`.
|
||||||
The only advantage of this syntax is that `vars()` uses tidy evaluation so you can embrace within it:
|
The only advantage of this syntax is that `vars()` uses tidy evaluation so you can embrace within it:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
# https://twitter.com/sharoz/status/1574376332821204999
|
# https://twitter.com/sharoz/status/1574376332821204999
|
||||||
|
|
||||||
foo <- function(x) {
|
foo <- function(x) {
|
||||||
ggplot(mtcars, aes(mpg, disp)) +
|
ggplot(mtcars, aes(mpg, disp)) +
|
||||||
geom_point() +
|
geom_point() +
|
||||||
facet_wrap(vars({{ x }}))
|
facet_wrap(vars({{ x }}))
|
||||||
}
|
}
|
||||||
|
|
||||||
foo(cyl)
|
foo(cyl)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
@ -833,12 +847,12 @@ histogram <- function(df, var, binwidth = NULL) {
|
||||||
```
|
```
|
||||||
|
|
||||||
Wouldn't it be nice if we could label the output with the variable and the bin width that was used?
|
Wouldn't it be nice if we could label the output with the variable and the bin width that was used?
|
||||||
To do so, we're going to have to go under the covers of tidy evaluation and use a function from package we haven't talked about before: rlang.
|
To do so, we're going to have to go under the covers of tidy evaluation and use a function from a package we haven't talked about yet: rlang.
|
||||||
rlang is a low-level package that's used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools).
|
rlang is a low-level package that's used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools).
|
||||||
|
|
||||||
To solve the labeling problem we can use `rlang::englue()`.
|
To solve the labeling problem we can use `rlang::englue()`.
|
||||||
This works similarly to `str_glue()`, so any value wrapped in `{ }` will be inserted into the string.
|
This works similarly to `str_glue()`, so any value wrapped in `{ }` will be inserted into the string.
|
||||||
But it also understands `{{ }}`, which automatically insert the appropriate variable name:
|
But it also understands `{{ }}`, which automatically inserts the appropriate variable name:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
histogram <- function(df, var, binwidth) {
|
histogram <- function(df, var, binwidth) {
|
||||||
|
@ -853,16 +867,17 @@ histogram <- function(df, var, binwidth) {
|
||||||
diamonds |> histogram(carat, 0.1)
|
diamonds |> histogram(carat, 0.1)
|
||||||
```
|
```
|
||||||
|
|
||||||
You can use the same approach any other place that you might supply a string in a ggplot2 plot.
|
You can use the same approach in any other place where you want to supply a string in a ggplot2 plot.
|
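To see the string that `rlang::englue()` produces on its own, here is a small sketch that builds a label without drawing a plot. The helper name `histogram_title()` is hypothetical, and the sketch assumes the rlang and glue packages are installed.

```r
library(rlang)

# Hypothetical helper: builds a label from the name of the variable passed as
# `var` (via {{ }}) and the value of `binwidth` (via { }).
histogram_title <- function(var, binwidth) {
  englue("A histogram of {{ var }} with binwidth {binwidth}")
}

# `carat` is never evaluated here; only its name is used in the label
histogram_title(carat, 0.1)
```

The resulting string could then be passed to any ggplot2 argument that accepts text, such as a title or axis label in `labs()`.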
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
1. Build up a rich plotting function by incrementally implementing each of the steps below.
|
Build up a rich plotting function by incrementally implementing each of the steps below:
|
||||||
1. Draw a scatterplot given dataset and `x` and `y` variables.
|
|
||||||
|
|
||||||
2. Add a line of best fit (i.e. a linear model with no standard errors).
|
1. Draw a scatterplot given dataset and `x` and `y` variables.
|
||||||
|
|
||||||
3. Add a title.
|
2. Add a line of best fit (i.e. a linear model with no standard errors).
|
||||||
|
|
||||||
|
3. Add a title.
|
||||||
|
|
||||||
## Style
|
## Style
|
||||||
|
|
||||||
|
@ -923,6 +938,7 @@ This makes it very obvious that something unusual is happening.
|
||||||
f1 <- function(string, prefix) {
|
f1 <- function(string, prefix) {
|
||||||
substr(string, 1, nchar(prefix)) == prefix
|
substr(string, 1, nchar(prefix)) == prefix
|
||||||
}
|
}
|
||||||
|
|
||||||
f3 <- function(x, y) {
|
f3 <- function(x, y) {
|
||||||
rep(y, length.out = length(x))
|
rep(y, length.out = length(x))
|
||||||
}
|
}
|
||||||
|
@ -935,8 +951,8 @@ This makes it very obvious that something unusual is happening.
|
||||||
|
|
||||||
## Summary
|
## Summary
|
||||||
|
|
||||||
In this chapter you learned how to write functions for three useful scenarios: creating a vector, creating a data frames, or creating a plot.
|
In this chapter, you learned how to write functions for three useful scenarios: creating a vector, creating a data frame, or creating a plot.
|
||||||
Along the way your saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.
|
Along the way you saw many examples, which hopefully started to get your creative juices flowing, and gave you some ideas for where functions might help your analysis code.
|
||||||
|
|
||||||
We have only shown you the bare minimum to get started with functions and there's much more to learn.
|
We have only shown you the bare minimum to get started with functions and there's much more to learn.
|
||||||
A few places to learn more are:
|
A few places to learn more are:
|
||||||
|
|