More polishing + some exercises
parent
8078a9c0f7
commit
f004297d8c
|
@ -20,17 +20,15 @@ Writing a function has three big advantages over using copy-and-paste:
|
||||||
3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
|
3. You eliminate the chance of making incidental mistakes when you copy and paste (i.e. updating a variable name in one place, but not in another).
|
||||||
|
|
||||||
A good rule of thumb is to consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).
|
A good rule of thumb is to consider writing a function whenever you've copied and pasted a block of code more than twice (i.e. you now have three copies of the same code).
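As a minimal sketch of that rule of thumb (the data frame `df` and the helper name `rescale01()` are assumptions made up for this illustration), here are three pasted copies of the same computation collapsed into one function. The commented-out lines show the kind of incidental mistake that copy-and-paste invites.

```r
# Hypothetical toy data, invented for this sketch
df <- data.frame(a = c(1, 5, 10), b = c(2, 4, 6), c = c(0, 50, 100))

# Copy-and-paste version (shown as comments only). The second line still
# refers to df$a at the end, which is easy to miss when skimming:
# df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
# df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$a))
# df$c <- (df$c - min(df$c)) / (max(df$c) - min(df$c))

# Function version: the logic lives in one place, so there is only one
# variable name to get right
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df$a <- rescale01(df$a)
df$b <- rescale01(df$b)
df$c <- rescale01(df$c)
```

If the requirements change later (say, a different treatment of missing values), only the body of `rescale01()` needs to be updated.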
|
||||||
The goal of this chapter is to get you started on your journey with three useful types of functions:
|
In this chapter, you'll learn about three useful types of functions:
|
||||||
|
|
||||||
- Vector functions take one or more vectors as input and return a vector as output.
|
- Vector functions take one or more vectors as input and return a vector as output.
|
||||||
- Data frame functions take a data frame as input and return a data frame as output.
|
- Data frame functions take a data frame as input and return a data frame as output.
|
||||||
- Plot functions that take a data frame as input and return a plot as output.
|
- Plot functions that take a data frame as input and return a plot as output.
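To make the three categories concrete, here is one small sketch of each. The function names `z_score()`, `summarise_mean()`, and `histogram()` are hypothetical and chosen only for this illustration; the data frame and plot functions assume dplyr and ggplot2 are installed and use embracing (`{{ }}`) to pass a column through.

```r
library(dplyr)
library(ggplot2)

# Vector function: takes a vector, returns a vector of the same length
z_score <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Data frame function: takes a data frame, returns a data frame
summarise_mean <- function(df, var) {
  df |>
    summarise(mean = mean({{ var }}, na.rm = TRUE), n = n())
}

# Plot function: takes a data frame, returns a ggplot object
histogram <- function(df, var, binwidth = NULL) {
  df |>
    ggplot(aes(x = {{ var }})) +
    geom_histogram(binwidth = binwidth)
}

z_score(c(1, 2, 3))
mtcars |> summarise_mean(mpg)
mtcars |> histogram(mpg, binwidth = 2)
```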
|
||||||
|
|
||||||
The chapter concludes with some advice on function style.
|
Each of these sections includes many examples to help you generalize the patterns that you see.
|
||||||
|
These examples wouldn't be possible without the help of folks on Twitter, and we encourage you to follow the links in the comments to see the original inspirations.
|
||||||
This chapter includes many examples to help you generalize the patterns that you see.
|
You might also want to read the original motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680) to see even more functions.
|
||||||
Many of the examples were inspired by real data analysis code supplied by folks on twitter; follow the links in the comment to see original inspiration.
|
|
||||||
And if you want to see even more examples, check out the motivating tweets for [general functions](https://twitter.com/hadleywickham/status/1571603361350164486) and [plotting functions](https://twitter.com/hadleywickham/status/1574373127349575680).
|
|
||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
|
||||||
|
@ -549,7 +547,7 @@ flights_sub(dest == "IAH", contains("time"))
|
||||||
### Data-masking vs tidy-selection
|
### Data-masking vs tidy-selection
|
||||||
|
|
||||||
Sometimes you want to select variables inside a function that uses data-masking.
|
Sometimes you want to select variables inside a function that uses data-masking.
|
||||||
For example, imagine you want to write `count_missing()` that counts the number of missing observations in rows.
|
For example, imagine you want to write a `count_missing()` that counts the number of missing observations in rows.
|
||||||
You might try writing something like:
|
You might try writing something like:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
|
@ -577,7 +575,7 @@ flights |>
|
||||||
```
|
```
|
||||||
|
|
||||||
Another convenient use of `pick()` is to make a 2d table of counts.
|
Another convenient use of `pick()` is to make a 2d table of counts.
|
||||||
Here we count using all the variables in the `rows` and `columns`, then use `pivot_wider()` to rearrange into a grid:
|
Here we count using all the variables in the `rows` and `columns`, then use `pivot_wider()` to rearrange the counts into a grid:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
# https://twitter.com/pollicipes/status/1571606508944719876
|
# https://twitter.com/pollicipes/status/1571606508944719876
|
||||||
|
@ -595,10 +593,58 @@ diamonds |> count_wide(clarity, cut)
|
||||||
diamonds |> count_wide(c(clarity, color), cut)
|
diamonds |> count_wide(c(clarity, color), cut)
|
||||||
```
|
```
|
||||||
|
|
||||||
While our examples have mostly focused on dplyr, the tidy evaluation also underpins tidyr, and if you look at the `pivot_wider()` docs you can see that `names_from` uses tidy-selection.
|
While our examples have mostly focused on dplyr, tidy evaluation also underpins tidyr, and if you look at the `pivot_wider()` docs you can see that `names_from` uses tidy-selection.
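As a small sketch of what tidy-selection in `names_from` means in practice (the `sales` data below is invented for this illustration), a bare column name and a selection helper can pick out the same column:

```r
library(tidyr)

# Hypothetical toy data, invented for this sketch
sales <- tibble::tibble(
  store = c("north", "north", "south", "south"),
  quarter = c("q1", "q2", "q1", "q2"),
  revenue = c(10, 20, 30, 40)
)

# A bare column name is the simplest tidy-selection
wide_bare <- sales |>
  pivot_wider(names_from = quarter, values_from = revenue)

# Selection helpers also work, because names_from uses tidy-selection
wide_helper <- sales |>
  pivot_wider(names_from = starts_with("quart"), values_from = revenue)
```

Both calls should produce the same grid, with one row per `store` and one column per `quarter`.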
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
|
1. Using the datasets from nycflights13, write functions that:
|
||||||
|
|
||||||
|
1. Finds all flights that were cancelled (i.e. `is.na(arr_time)`) or delayed by more than an hour.
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
#| eval: false
|
||||||
|
flights |> filter_severe()
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Counts the number of cancelled flights and the number of flights delayed by more than an hour.
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
#| eval: false
|
||||||
|
flights |> group_by(dest) |> summarise_severe()
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Finds all flights that were cancelled or delayed by more than a user-supplied number of hours:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
#| eval: false
|
||||||
|
flights |> filter_severe(hours = 2)
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Summarizes the weather to compute the minimum, mean, and maximum of a user-supplied variable:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
#| eval: false
|
||||||
|
weather |> summarise_weather(temp)
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Converts the user-supplied variable that uses clock time (e.g. `dep_time`, `arr_time`, etc.) into a decimal time (i.e. hours + minutes / 60).
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
flights |> standardise_time(sched_dep_time)
|
||||||
|
```
|
||||||
|
|
||||||
|
2. For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-select: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.
|
||||||
|
|
||||||
|
3. Generalize the following function so that you can supply any number of variables to count.
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
count_prop <- function(df, var, sort = FALSE) {
|
||||||
|
df |>
|
||||||
|
count({{ var }}, sort = sort) |>
|
||||||
|
mutate(prop = n / sum(n))
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
## Plot functions
|
## Plot functions
|
||||||
|
|
||||||
Instead of returning a data frame, you might want to return a plot.
|
Instead of returning a data frame, you might want to return a plot.
|
||||||
|
@ -812,6 +858,13 @@ You can use the same approach any other place that you might supply a string in
|
||||||
|
|
||||||
### Exercises
|
### Exercises
|
||||||
|
|
||||||
|
1. Build up a rich plotting function by incrementally implementing each of the steps below.
|
||||||
|
1. Draw a scatterplot given a dataset and `x` and `y` variables.
|
||||||
|
|
||||||
|
2. Add a line of best fit (i.e. a linear model with no standard errors).
|
||||||
|
|
||||||
|
3. Add a title.
|
||||||
|
|
||||||
## Style
|
## Style
|
||||||
|
|
||||||
R doesn't care what your function or arguments are called but the names make a big difference for humans.
|
R doesn't care what your function or arguments are called but the names make a big difference for humans.
|
||||||
|
@ -890,9 +943,9 @@ Along the way you saw many examples, which hopefully started to get your creati
|
||||||
We have only shown you the bare minimum to get started with functions and there's much more to learn.
|
We have only shown you the bare minimum to get started with functions and there's much more to learn.
|
||||||
A few places to learn more are:
|
A few places to learn more are:
|
||||||
|
|
||||||
- To learn more about programming with tidy evaluation, see useful recipes in `vignette("programming", package = "dplyr")` and `vignette("programming", package = "tidyr")` and learn more about the theory in <https://rlang.r-lib.org/reference/topic-data-mask.html>.
|
- To learn more about programming with tidy evaluation, see useful recipes in [programming with dplyr](https://dplyr.tidyverse.org/articles/programming.html) and [programming with tidyr](https://tidyr.tidyverse.org/articles/programming.html) and learn more about the theory in [What is data-masking and why do I need {{?](https://rlang.r-lib.org/reference/topic-data-mask.html).
|
||||||
- To learn more about reducing duplication in your ggplot2 code, read the [Programming with ggplot2](https://ggplot2-book.org/programming.html){.uri} chapter of the ggplot2 book.
|
- To learn more about reducing duplication in your ggplot2 code, read the [Programming with ggplot2](https://ggplot2-book.org/programming.html){.uri} chapter of the ggplot2 book.
|
||||||
- To learn more about good function style, read <https://style.tidyverse.org/functions.html>.
|
- For more advice on function style, see the [tidyverse style guide](https://style.tidyverse.org/functions.html){.uri}.
|
||||||
|
|
||||||
In the next chapter, we'll dive into some of the details of R's vector data structures that we've omitted so far.
|
In the next chapter, we'll dive into some of the details of R's vector data structures that we've omitted so far.
|
||||||
These are not immediately useful by themselves, but are a necessary foundation for the following chapter on iteration which gives you further tools for reducing code duplication.
|
These are not immediately useful by themselves, but are a necessary foundation for the following chapter on iteration which gives you further tools for reducing code duplication.
|
||||||
|
|