I -> we
parent faeeb564a4
commit be5905a09c
@@ -32,9 +32,9 @@ The goal of this chapter is to get you started on your journey with functions wi
 The chapter concludes with some advice on function style.
 
 Many of the examples in this chapter were inspired by real data analysis code supplied by folks on twitter.
-We've often simplified the code from the original so you might want to look at the original tweets which I list in the comments.
-If you want just to see a huge variety of funcitons, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
-WI won't fully explain all of the functions that I use here, so you might need to do some reading of the documentation.
+We've often simplified the code from the original so you might want to look at the original tweets which we list in the comments.
+If you want just to see a huge variety of functions, check out the motivating tweets: https://twitter.com/hadleywickham/status/1574373127349575680, https://twitter.com/hadleywickham/status/1571603361350164486 A big thanks to everyone who contributed!
+WI won't fully explain all of the functions that we use here, so you might need to do some reading of the documentation.
 
 ### Prerequisites
 
@@ -101,14 +101,14 @@ If we take the code above and pull it outside of `mutate()` it's a little easier
 (d - min(d, na.rm = TRUE)) / (max(d, na.rm = TRUE) - min(d, na.rm = TRUE))
 ```
 
-To make this a bit clearer I can replace the bit that varies with `█`:
+To make this a bit clearer we can replace the bit that varies with `█`:
 
 ```{r}
 #| eval: false
 (█ - min(█, na.rm = TRUE)) / (max(█, na.rm = TRUE) - min(█, na.rm = TRUE))
 ```
 
-There's only one thing that varies which implies I'm going to need a function with one argument.
+There's only one thing that varies which implies we're going to need a function with one argument.
 
 To turn this into an actual function you need three things:
 
@@ -473,7 +473,7 @@ summary6 <- function(data, var) {
 diamonds |> summary6(carat)
 ```
 
-(Whenever you wrap `summarise()` in a helper, I think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
+(Whenever you wrap `summarise()` in a helper, we think it's good practice to set `.groups = "drop"` to both avoid the message and leave the data in an ungrouped state.)
 
 The nice thing about this function is because it wraps `summarise()` you can used it on grouped data:
 
@@ -563,7 +563,7 @@ We didn't discuss `pivot_wider()` above, but you can read the docs to discover t
 ### Selecting rows and columns
 
 Or maybe you want to find the sorted unique values of a variable for a subset of the data.
-Rather than supplying a variable and a value to do the filtering, I'll allow the user to supply an condition:
+Rather than supplying a variable and a value to do the filtering, we'll allow the user to supply an condition:
 
 ```{r}
 unique_where <- function(df, condition, var) {
@@ -582,7 +582,7 @@ flights |> unique_where(tailnum == "N14228", month)
 
 Here we embrace `condition` because it's passed to `filter()` and `var` because its passed to `distinct()`, `arrange()`, and `pull()`.
 
-I've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
+We've made all these examples take a data frame as the first argument, but if you're working repeatedly with the same data frame, it can make sense to hard code it.
 For example, this function always works with the flights dataset, make it easy to grab the subset that you want to work with.
 It always includes `time_hour`, `carrier`, and `flight` since these are the primary key that allows you to identify a row.
 
@@ -682,7 +682,7 @@ diamonds |> hex_plot(carat, price, depth)
 
 Some of the most useful helpers combine a dash of dplyr with ggplot2.
 For example, if you might want to do a bar chart where you automatically sort the bars in frequency order using `fct_infreq()`.
-And I'm drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:
+And we're drawing the vertical bars, so you need to reverse the usual order to get the highest values at the top:
 
 ```{r}
 sorted_bars <- function(df, var) {
@@ -748,7 +748,7 @@ foo <- function(x) {
 }
 ```
 
-I've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
+We've written these functions so that you can supply any data frame, but there are also advantages to hardcoding a data frame, if you're using it repeatedly:
 
 ```{r}
 # https://twitter.com/yutannihilat_en/status/1574387230025875457
@@ -764,7 +764,7 @@ density(species)
 density(island, sex)
 ```
 
-Also note that I hardcoded the `x` variable but allowed the fill to vary.
+Also note that we hardcoded the `x` variable but allowed the fill to vary.
 
 ```{r}
 bars <- function(df, condition, var) {
@@ -595,7 +595,7 @@ write_csv(gapminder, "gapminder.csv")
 unlink("gapminder.csv")
 ```
 
-If you're working in a project, I'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R.` The `0` in the file name suggests that this should be run before anything else.
+If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R.` The `0` in the file name suggests that this should be run before anything else.
 
 If your input data files change of over time, you might consider learning a tool like [targets](https://docs.ropensci.org/targets/) to set up your data cleaning code to automatically re-run when ever one of the input files is modified.
 
@@ -921,7 +921,7 @@ parties <- tibble(
 ```
 
 Now we can match each employee to their party.
-This is a good place to use `unmatched = "error"` because I want to quickly find out if any employees didn't get assigned a party.
+This is a good place to use `unmatched = "error"` because we want to quickly find out if any employees didn't get assigned a party.
 
 ```{r}
 employees |>
@@ -939,7 +939,7 @@ employees |>
 x |> full_join(y, by = "key", keep = TRUE)
 ```
 
-2. When finding if any party period overlapped with another party period I used `q < q` in the `join_by()`?
+2. When finding if any party period overlapped with another party period we used `q < q` in the `join_by()`?
 Why?
 What happens if you remove this inequality?
 
@@ -58,12 +58,12 @@ not_cancelled |>
 Instead of running your code expression-by-expression, you can also execute the complete script in one step with Cmd/Ctrl + Shift + S.
 Doing this regularly is a great way to ensure that you've captured all the important parts of your code in the script.
 
-I recommend that you always start your script with the packages that you need.
+We recommend that you always start your script with the packages that you need.
 That way, if you share your code with others, they can easily see which packages they need to install.
 Note, however, that you should never include `install.packages()` in a script that you share.
 It's very antisocial to change settings on someone else's computer!
 
-When working through future chapters, I highly recommend starting in the script editor and practicing your keyboard shortcuts.
+When working through future chapters, we highly recommend starting in the script editor and practicing your keyboard shortcuts.
 Over time, sending code to the console in this way will become so natural that you won't even think about it.
 
 ### RStudio diagnostics
@@ -333,7 +333,7 @@ You should **never** use absolute paths in your scripts, because they hinder sha
 There's another important difference between operating systems: how you separate the components of the path.
 Mac and Linux uses slashes (e.g. `plots/diamonds.pdf`) and Windows uses backslashes (e.g. `plots\diamonds.pdf`).
 R can work with either type (no matter what platform you're currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes!
-That makes life frustrating, so I recommend always using the Linux/Mac style with forward slashes.
+That makes life frustrating, so we recommend always using the Linux/Mac style with forward slashes.
 
 ## Summary
 