Update post map_df

hadley 2015-11-11 11:31:15 -06:00
parent 1aea6351a9
commit eaf25f50c6
1 changed files with 6 additions and 9 deletions

@@ -165,6 +165,7 @@ This is such a common use of for loops, that the purrr package has five function
 * `map_int()`: integer vector
 * `map_dbl()`: double vector
 * `map_chr()`: character vector
+* `map_df()`: a data frame
 Each of these functions take a list as input, apply a function to each piece and then return a new vector that's the same length as the input. Because the first element is the list to transform, it also makes them particularly suitable for piping:
@@ -185,7 +186,6 @@ Other outputs:
 * `flatten()`
 * `map_int()` vs. `map()` + `flatten_int()`
 * `flatmap()`
-* `dplyr::bind_rows()`
 Need sidebar/callout about predicate functions somewhere. Better to use purrr's underscore variants because they tend to do what you expect, and
@@ -268,7 +268,6 @@ issues %>% map_chr(c("user", "login"))
 issues %>% map_int(c("user", "id"))
 ```
 ### Predicate functions
 Imagine we want to summarise each numeric column of a data frame. We could write this:
@@ -340,14 +339,13 @@ x[error]
 y[!error] %>% map("result")
 ```
-Challenge: read_csv all the files in this directory. Which ones failed
-and why? Potentially helpful digression into names() and bind_rows(id
-= "xyz"):
+Challenge: read all the csv files in this directory. Which ones failed
+and why?
 ```{r, eval = FALSE}
 files <- dir("data", pattern = "\\.csv$")
 files %>%
-  set_names(basename(.)) %>%
+  set_names(., basename(.)) %>%
   map_df(readr::read_csv, .id = "filename") %>%
 ```
@@ -443,12 +441,12 @@ Then fit the models to each training dataset:
 mod <- trn %>% map(~lm(mpg ~ wt, data = .))
 ```
-If we wanted, we could extract the coefficients using broom, and make a single data frame with `bind_rows()` and then visualise the distributions with ggplot2:
+If we wanted, we could extract the coefficients using broom, and make a single data frame with `map_df()` and then visualise the distributions with ggplot2:
 ```{r}
 coef <- mod %>%
   map(broom::tidy) %>%
-  dplyr::bind_rows(.id = "i")
+  map_df(.id = "i")
 coef
 library(ggplot2)
@@ -483,7 +481,6 @@ Why you should store related vectors (even if they're lists!) in a
 data frame. Need example that has some covariates so you can (e.g.)
 select all models for females, or under 30s, ...
 ## "Tidying" lists
 I don't know know how to put this stuff in words yet, but I know it