diff --git a/DESCRIPTION b/DESCRIPTION
index 2ca14b7..01cde74 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -49,6 +49,7 @@ Remotes:
     tidyverse/dplyr,
     tidyverse/dbplyr,
     tidyverse/tidyr,
-    tidyverse/purrr
+    tidyverse/purrr,
+    tidyverse/tidyverse
 Encoding: UTF-8
 License: CC NC ND 3.0
diff --git a/data-visualize.qmd b/data-visualize.qmd
index d4c6bbb..4963750 100644
--- a/data-visualize.qmd
+++ b/data-visualize.qmd
@@ -24,7 +24,7 @@ We'll finish off with saving your plots and troubleshooting tips.
 ### Prerequisites
 
 This chapter focuses on ggplot2, one of the core packages in the tidyverse.
-To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running this code:
+To access the datasets, help pages, and functions used in this chapter, load the tidyverse by running:
 
 ```{r}
 #| label: setup
@@ -32,8 +32,11 @@ To access the datasets, help pages, and functions used in this chapter, load the
 library(tidyverse)
 ```
 
-That one line of code loads the core tidyverse; packages which you will use in almost every data analysis.
-It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded).
+That one line of code loads the core tidyverse; the packages that you will use in almost every data analysis.
+It also tells you which functions from the tidyverse conflict with functions in base R (or from other packages you might have loaded)[^data-visualize-1].
+
+[^data-visualize-1]: You can eliminate that message and force conflict resolution to happen on demand by using the conflicted package, which becomes more important as you load more packages.
+    You can learn more about conflicted at .
 
 If you run this code and get the error message `there is no package called 'tidyverse'`, you'll need to first install it, then run `library()` once again.
 
@@ -44,7 +47,7 @@ install.packages("tidyverse")
 library(tidyverse)
 ```
 
-You only need to install a package once, but you need to reload it every time you start a new session.
+You only need to install a package once, but you need to load it every time you start a new session.
 
 In addition to tidyverse, we will also use the **palmerpenguins** package, which includes the `penguins` dataset containing body measurements for penguins on three islands in the Palmer Archipelago.
 
@@ -68,9 +71,9 @@ And how about by the island where the penguin lives.
 
 You can test your answer with the `penguins` **data frame** found in palmerpenguins (a.k.a. `palmerpenguins::penguins`).
 A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).
-`penguins` contains `r nrow(penguins)` observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER[^data-visualize-1].
+`penguins` contains `r nrow(penguins)` observations collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER[^data-visualize-2].
 
-[^data-visualize-1]: Horst AM, Hill AP, Gorman KB (2020).
+[^data-visualize-2]: Horst AM, Hill AP, Gorman KB (2020).
     palmerpenguins: Palmer Archipelago (Antarctica) penguin data.
     R package version 0.1.0.
     .
@@ -741,10 +744,10 @@ However adding too many aesthetic mappings to a plot makes it cluttered and diff
 Another way, which is particularly useful for categorical variables, is to split your plot into **facets**, subplots that each display one subset of the data.
 
 To facet your plot by a single variable, use `facet_wrap()`.
-The first argument of `facet_wrap()` is a formula[^data-visualize-2], which you create with `~` followed by a variable name.
+The first argument of `facet_wrap()` is a formula[^data-visualize-3], which you create with `~` followed by a variable name.
 The variable that you pass to `facet_wrap()` should be categorical.
 
-[^data-visualize-2]: Here "formula" is the name of the type of thing created by `~`, not a synonym for "equation".
+[^data-visualize-3]: Here "formula" is the name of the type of thing created by `~`, not a synonym for "equation".
 
 ```{r}
 #| warning: false
diff --git a/datetimes.qmd b/datetimes.qmd
index df4c463..32af40a 100644
--- a/datetimes.qmd
+++ b/datetimes.qmd
@@ -35,14 +35,13 @@ We'll conclude with a brief discussion of the additional challenges posed by tim
 ### Prerequisites
 
 This chapter will focus on the **lubridate** package, which makes it easier to work with dates and times in R.
-lubridate is not part of core tidyverse because you only need it when you're working with dates/times.
+As of the latest tidyverse release, lubridate is part of core tidyverse.
 We will also need nycflights13 for practice data.
 
 ```{r}
 #| message: false
 
 library(tidyverse)
-library(lubridate)
 library(nycflights13)
 ```
 
diff --git a/intro.qmd b/intro.qmd
index a007f44..70adb96 100644
--- a/intro.qmd
+++ b/intro.qmd
@@ -200,7 +200,7 @@ Once you have installed a package, you can load it using the `library()` functio
 library(tidyverse)
 ```
 
-This tells you that tidyverse loads eight packages: ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, and forcats.
+This tells you that tidyverse loads nine packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, and tidyr.
 These are considered the **core** of the tidyverse because you'll use them in almost every analysis.
 
 Packages in the tidyverse change fairly frequently.
diff --git a/iteration.qmd b/iteration.qmd
index fe948ce..49c314b 100644
--- a/iteration.qmd
+++ b/iteration.qmd
@@ -308,8 +308,6 @@ df_miss |> filter(if_all(a:d, is.na))
 For example, [Jacob Scott](https://twitter.com/_wurli/status/1571836746899283969) uses this little helper which wraps a bunch of lubridate function to expand all date columns into year, month, and day columns:
 
 ```{r}
-library(lubridate)
-
 expand_dates <- function(df) {
   df |>
     mutate(
@@ -687,7 +685,8 @@ Now when you come back to this problem in the future, you can read in a single c
 unlink("gapminder.csv")
 ```
 
-If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R`. The `0` in the file name suggests that this should be run before anything else.
+If you're working in a project, we'd suggest calling the file that does this sort of data prep work something like `0-cleanup.R`.
+The `0` in the file name suggests that this should be run before anything else.
 
 If your input data files change over time, you might consider learning a tool like [targets](https://docs.ropensci.org/targets/) to set up your data cleaning code to automatically re-run whenever one of the input files is modified.
 