More polishing + some exercises

Hadley Wickham 2022-10-20 10:52:34 -05:00
parent 8078a9c0f7
commit f004297d8c
1 changed files with 64 additions and 11 deletions


@ -20,17 +20,15 @@ Writing a function has three big advantages over using copy-and-paste:
3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
A good rule of thumb is to consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).
The goal of this chapter is to get you started on your journey with three useful types of functions:
In this chapter, you'll learn about three useful types of functions:
- Vector functions take one or more vectors as input and return a vector as output.
- Data frame functions take a data frame as input and return a data frame as output.
- Plot functions that take a data frame as input and return a plot as output.
The chapter concludes with some advice on function style.
This chapter includes many examples to help you generalize the patterns that you see.
Many of the examples were inspired by real data analysis code supplied by folks on twitter; follow the links in the comment to see original inspiration.
And if you want to see even more examples, check out the motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680).
Each of these sections includes many examples to help you generalize the patterns that you see.
These examples wouldn't be possible without the help of folks on Twitter, and we encourage you to follow the links in the comments to see the original inspirations.
You might also want to read the original motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680) to see even more functions.
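
As a minimal sketch of the first type listed above, here is a vector function written in base R. The name `rescale01()` and the sample inputs are illustrative assumptions, not something taken from this diff.

```r
# Vector function: takes a numeric vector and returns a vector of the same length.
# `rescale01()` is an illustrative name chosen for this sketch.
rescale01 <- function(x) {
  # Ignore missing and infinite values when finding the range.
  rng <- range(x, na.rm = TRUE, finite = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescaled <- rescale01(c(0, 5, 10))
rescaled_na <- rescale01(c(1, NA, 3))
```

Because the body appears only once, a change to the rescaling logic has to be made in a single place rather than in every pasted copy.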
### Prerequisites
@ -549,7 +547,7 @@ flights_sub(dest == "IAH", contains("time"))
### Data-masking vs tidy-selection
Sometimes you want to select variables inside a function that uses data-masking.
For example, imagine you want to write `count_missing()` that counts the number of missing observations in rows.
For example, imagine you want to write a `count_missing()` that counts the number of missing observations in rows.
You might try writing something like:
```{r}
@ -577,7 +575,7 @@ flights |>
```
Another convenient use of `pick()` is to make a 2d table of counts.
Here we count using all the variables in the `rows` and `columns`, then use `pivot_wider()` to rearrange into a grid:
Here we count using all the variables in the `rows` and `columns`, then use `pivot_wider()` to rearrange the counts into a grid:
```{r}
# https://twitter.com/pollicipes/status/1571606508944719876
@ -595,10 +593,58 @@ diamonds |> count_wide(clarity, cut)
diamonds |> count_wide(c(clarity, color), cut)
```
While our examples have mostly focused on dplyr, the tidy evaluation also underpins tidyr, and if you look at the `pivot_wider()` docs you can see that `names_from` uses tidy-selection.
While our examples have mostly focused on dplyr, tidy evaluation also underpins tidyr, and if you look at the `pivot_wider()` docs you can see that `names_from` uses tidy-selection.
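
A minimal sketch of what this means in practice, assuming dplyr and tidyr are installed: because `names_from` uses tidy-selection, a user-supplied column can be passed to it by embracing. The helper `widen_counts()` and the `toy` data are hypothetical and exist only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper for this sketch: counts combinations of two columns,
# then spreads the second column's values across the column headers.
widen_counts <- function(df, row_var, col_var) {
  df |>
    count({{ row_var }}, {{ col_var }}) |>  # count() uses data-masking
    pivot_wider(
      names_from = {{ col_var }},           # names_from uses tidy-selection
      values_from = n,
      values_fill = 0
    )
}

# Made-up data, not from the chapter.
toy <- tibble(
  g = c("a", "a", "b", "b", "b"),
  h = c("x", "y", "x", "x", "y")
)

wide <- widen_counts(toy, g, h)
```

The same embracing syntax works in both places, even though `count()` and `pivot_wider()` interpret the embraced argument differently.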
### Exercises
1. Using the datasets from nycflights13, write functions that:
1. Finds all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
```{r}
#| eval: false
flights |> filter_severe()
```
2. Counts the number of cancelled flights and the number of flights delayed by more than an hour.
```{r}
#| eval: false
flights |> group_by(dest) |> summarise_severe()
```
3. Finds all flights that were cancelled or delayed by more than a user-supplied number of hours:
```{r}
#| eval: false
flights |> filter_severe(hours = 2)
```
4. Summarizes the weather to compute the minimum, mean, and maximum of a user-supplied variable:
```{r}
#| eval: false
weather |> summarise_weather(temp)
```
5. Converts the user supplied variable that uses clock time (e.g. `dep_time`, `arr_time`, etc) into a decimal time (i.e. hours + minutes / 60).
```{r}
#| eval: false
flights |> standardise_time(sched_dep_time)
```
2. For each of the following functions, list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-selection: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
3. Generalize the following function so that you can supply any number of variables to count.
```{r}
count_prop <- function(df, var, sort = FALSE) {
df |>
count({{ var }}, sort = sort) |>
mutate(prop = n / sum(n))
}
```
## Plot functions
Instead of returning a data frame, you might want to return a plot.
@ -812,6 +858,13 @@ You can use the same approach any other place that you might supply a string in
### Exercises
1. Build up a rich plotting function by incrementally implementing each of the steps below.
1. Draw a scatterplot given a dataset and `x` and `y` variables.
2. Add a line of best fit (i.e. a linear model with no standard errors).
3. Add a title.
## Style
R doesn't care what your function or arguments are called but the names make a big difference for humans.
@ -890,9 +943,9 @@ Along the way you saw many examples, which hopefully started to get your creati
We have only shown you the bare minimum to get started with functions and there's much more to learn.
A few places to learn more are:
- To learn more about programming with tidy evaluation, see useful recipes in `vignette("programming", package = "dplyr")` and `vignette("programming", package = "tidyr")` and learn more about the theory in <https://rlang.r-lib.org/reference/topic-data-mask.html>.
- To learn more about programming with tidy evaluation, see useful recipes in [programming with dplyr](https://dplyr.tidyverse.org/articles/programming.html) and [programming with tidyr](https://tidyr.tidyverse.org/articles/programming.html) and learn more about the theory in [What is data-masking and why do I need {{?](https://rlang.r-lib.org/reference/topic-data-mask.html).
- To learn more about reducing duplication in your ggplot2 code, read the [Programming with ggplot2](https://ggplot2-book.org/programming.html){.uri} chapter of the ggplot2 book.
- To learn more about good function style, read <https://style.tidyverse.org/functions.html>.
- For more advice on function style, see the [tidyverse style guide](https://style.tidyverse.org/functions.html){.uri}.
In the next chapter, we'll dive into some of the details of R's vector data structures that we've omitted so far.
These are not immediately useful by themselves, but are a necessary foundation for the following chapter on iteration, which gives you further tools for reducing code duplication.