Use dev tidyverse (#1240)

* lubridate is now core
* briefly mention conflicted

Fixes #1105
Hadley Wickham 2023-01-23 07:48:36 -06:00 committed by GitHub
parent a3b40d8dd8
commit d3a9919967
5 changed files with 17 additions and 15 deletions

@ -49,6 +49,7 @@ Remotes:
tidyverse/dplyr,
tidyverse/dbplyr,
tidyverse/tidyr,
tidyverse/purrr
tidyverse/purrr,
tidyverse/tidyverse
Encoding: UTF-8
License: CC NC ND 3.0

@ -24,7 +24,7 @@ We'll finish off with saving your plots and troubleshooting tips.
### Prerequisites
This chapter focuses on ggplot2, one of the core packages in the tidyverse.
To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running this code:
To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running:
```{r}
#| label: setup
@ -32,8 +32,11 @@ To access the datasets, help pages, and functions used in this chapter, load the
library(tidyverse)
```
That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
That one line of code loads the core tidyverse; the packages that you will use in almost every data analysis.
It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded)[^data-visualize-1].
[^data-visualize-1]: You can eliminate that message and force conflict resolution to happen on demand by using the conflicted package, which becomes more important as you load more packages.
You can learn more about conflicted at <https://conflicted.r-lib.org>.
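
As a rough sketch of what this footnote describes (the example is not part of the commit and assumes conflicted 1.2.0 or later and dplyr are installed), loading conflicted turns an ambiguous call such as `filter()` into an error until you declare which package's version you want:

```r
# Sketch only: assumes conflicted (>= 1.2.0) and dplyr are installed.
library(conflicted)
library(dplyr)

# dplyr::filter() and stats::filter() share a name, so state a preference once.
conflicts_prefer(dplyr::filter)

# With the preference declared, filter() resolves to dplyr::filter().
four_cyl <- mtcars |> filter(cyl == 4)
nrow(four_cyl)
```

Without the `conflicts_prefer()` call, the `filter()` line should stop with an error asking you to choose, which is the "on demand" resolution the footnote mentions.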
If you run this code and get the error message `there is no package called 'tidyverse'`, you'll need to first install it, then run `library()` once again.
@ -44,7 +47,7 @@ install.packages("tidyverse")
library(tidyverse)
```
You only need to install a package once, but you need to reload it every time you start a new session.
You only need to install a package once, but you need to load it every time you start a new session.
In addition to tidyverse, we will also use the **palmerpenguins** package, which includes the `penguins` dataset containing body measurements for penguins on three islands in the Palmer Archipelago.
@ -68,9 +71,9 @@ And how about by the island where the penguin lives.
You can test your answer with the `penguins` **data frame** found in palmerpenguins (a.k.a. `palmerpenguins::penguins`).
A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
`penguins` contains `r nrow(penguins)` observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER[^data-visualize-1].
`penguins` contains `r nrow(penguins)` observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER[^data-visualize-2].
[^data-visualize-1]: Horst AM, Hill AP, Gorman KB (2020).
[^data-visualize-2]: Horst AM, Hill AP, Gorman KB (2020).
palmerpenguins: Palmer Archipelago (Antarctica) penguin data.
R package version 0.1.0.
<https://allisonhorst.github.io/palmerpenguins/>.
@ -741,10 +744,10 @@ However adding too many aesthetic mappings to a plot makes it cluttered and diff
Another way, which is particularly useful for categorical variables, is to split your plot into **facets**, subplots that each display one subset of the data.
To facet your plot by a single variable, use `facet_wrap()`.
The first argument of `facet_wrap()` is a formula[^data-visualize-2], which you create with `~` followed by a variable name.
The first argument of `facet_wrap()` is a formula[^data-visualize-3], which you create with `~` followed by a variable name.
The variable that you pass to `facet_wrap()` should be categorical.
[^data-visualize-2]: Here "formula" is the name of the type of thing created by `~`, not a synonym for "equation".
[^data-visualize-3]: Here "formula" is the name of the type of thing created by `~`, not a synonym for "equation".
```{r}
#| warning: false

@ -35,14 +35,13 @@ We'll conclude with a brief discussion of the additional challenges posed by tim
### Prerequisites
This chapter will focus on the **lubridate** package, which makes it easier to work with dates and times in R.
lubridate is not part of core tidyverse because you only need it when you're working with dates/times.
As of the latest tidyverse release, lubridate is part of the core tidyverse.
We will also need nycflights13 for practice data.
```{r}
#| message: false
library(tidyverse)
library(lubridate)
library(nycflights13)
```
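
A minimal sketch of what the change means in practice, assuming that tidyverse 2.0.0 is the first release to attach lubridate (the version guard and the `commit_date` name are illustrative, not part of the commit):

```r
library(tidyverse)

# Assumption: tidyverse >= 2.0.0 attaches lubridate as a core package;
# on older versions it still has to be loaded explicitly.
if (packageVersion("tidyverse") < "2.0.0") {
  library(lubridate)
}

# lubridate helpers are now available without a separate library() call.
commit_date <- ymd("2023-01-23")
year(commit_date)
```

The guard keeps the snippet working on installations that predate the change, at the cost of one extra line.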

@ -200,7 +200,7 @@ Once you have installed a package, you can load it using the `library()` functio
library(tidyverse)
```
This tells you that tidyverse loads eight packages: ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, and forcats.
This tells you that tidyverse loads nine packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, tidyr.
These are considered the **core** of the tidyverse because you'll use them in almost every analysis.
Packages in the tidyverse change fairly frequently.

@ -308,8 +308,6 @@ df_miss |> filter(if_all(a:d, is.na))
For example, [Jacob Scott](https://twitter.com/_wurli/status/1571836746899283969) uses this little helper which wraps a bunch of lubridate functions to expand all date columns into year, month, and day columns:
```{r}
library(lubridate)
expand_dates <- function(df) {
df |>
mutate(
@ -687,7 +685,8 @@ Now when you come back to this problem in the future, you can read in a single c
unlink("gapminder.csv")
```
If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R`. The `0` in the file name suggests that this should be run before anything else.
If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R`.
The `0` in the file name suggests that this should be run before anything else.
If your input data files change over time, you might consider learning a tool like [targets](https://docs.ropensci.org/targets/) to set up your data cleaning code to automatically re-run whenever one of the input files is modified.
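
To give a feel for what that might look like, here is a hypothetical `_targets.R` file. The path `data/raw.csv` and the target names are made up for illustration, and the cleaning step is a placeholder for whatever your data prep script actually does:

```r
# _targets.R: hypothetical pipeline; the file path and target names are illustrative.
library(targets)

# Packages made available to every target's command.
tar_option_set(packages = c("readr", "dplyr"))

list(
  # Track the input file itself, so that changes to it invalidate downstream targets.
  tar_target(raw_file, "data/raw.csv", format = "file"),
  # Re-read the data only when raw_file has changed.
  tar_target(raw_data, read_csv(raw_file, show_col_types = FALSE)),
  # Placeholder cleaning step: drop exact duplicate rows.
  tar_target(clean_data, distinct(raw_data))
)
```

Running `targets::tar_make()` in the project directory then rebuilds only the targets whose inputs have changed since the last run.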