diff --git a/lists.Rmd b/lists.Rmd
index c40f6fd..e253169 100644
--- a/lists.Rmd
+++ b/lists.Rmd
@@ -165,6 +165,7 @@ This is such a common use of for loops, that the purrr package has five function
 * `map_int()`: integer vector
 * `map_dbl()`: double vector
 * `map_chr()`: character vector
+* `map_df()`: a data frame
 
 Each of these functions take a list as input, apply a function to each piece and then return a new vector that's the same length as the input. Because the first element is the list to transform, it also makes them particularly suitable for piping:
 
@@ -185,7 +186,6 @@ Other outputs:
 * `flatten()`
 * `map_int()` vs. `map()` + `flatten_int()`
 * `flatmap()`
-* `dplyr::bind_rows()`
 
 Need sidebar/callout about predicate functions somewhere. Better to use
 purrr's underscore variants because they tend to do what you expect, and
@@ -268,7 +268,6 @@ issues %>% map_chr(c("user", "login"))
 issues %>% map_int(c("user", "id"))
 ```
 
-
 ### Predicate functions
 
 Imagine we want to summarise each numeric column of a data frame. We could write this:
@@ -340,14 +339,13 @@ x[error]
 y[!error] %>% map("result")
 ```
 
-Challenge: read_csv all the files in this directory. Which ones failed
-and why? Potentially helpful digression into names() and bind_rows(id
-= "xyz"):
+Challenge: read all the csv files in this directory. Which ones failed
+and why? 
 
 ```{r, eval = FALSE}
 files <- dir("data", pattern = "\\.csv$")
 files %>%
-  set_names(basename(.)) %>%
+  set_names(., basename(.)) %>%
   map_df(readr::read_csv, .id = "filename") %>%
 ```
 
@@ -443,12 +441,12 @@ Then fit the models to each training dataset:
 mod <- trn %>% map(~lm(mpg ~ wt, data = .))
 ```
 
-If we wanted, we could extract the coefficients using broom, and make a single data frame with `bind_rows()` and then visualise the distributions with ggplot2:
+If we wanted, we could extract the coefficients using broom, and make a single data frame with `map_df()` and then visualise the distributions with ggplot2:
 
 ```{r}
 coef <- mod %>% 
   map(broom::tidy) %>% 
-  dplyr::bind_rows(.id = "i")
+  map_df(.id = "i")
 coef
 
 library(ggplot2)
@@ -483,7 +481,6 @@ Why you should store related vectors (even if they're lists!) in a data frame.
 Need example that has some covariates so you can (e.g.) select all models
 for females, or under 30s, ...
 
-
 ## "Tidying" lists
 
 I don't know know how to put this stuff in words yet, but I know it