More about lists

Hadley Wickham 2022-11-08 16:19:14 -06:00
parent 0973a0dea8
commit 3b9d54db7a
1 changed files with 59 additions and 18 deletions

@ -406,10 +406,10 @@ You could do it with copy and paste:
```{r}
#| eval: false
data2019 <- readxl::read_excel("data/y2019.xls")
data2020 <- readxl::read_excel("data/y2020.xls")
data2021 <- readxl::read_excel("data/y2021.xls")
data2022 <- readxl::read_excel("data/y2022.xls")
data2019 <- readxl::read_excel("data/y2019.xlsx")
data2020 <- readxl::read_excel("data/y2020.xlsx")
data2021 <- readxl::read_excel("data/y2021.xlsx")
data2022 <- readxl::read_excel("data/y2022.xlsx")
```
And then use `dplyr::bind_rows()` to combine them all together:
@ -448,21 +448,45 @@ paths
### Lists
Now that we have these 12 paths, we could call `read_excel()` 12 times to get 12 data frames.
In general, we won't know how many files there are to read, so instead of saving each data frame to its own variable, we'll put them all into a list, something like this:
Now that we have these 12 paths, we could call `read_excel()` 12 times to get 12 data frames:
```{r}
#| eval: false
list(
readxl::read_excel("data/gapminder/1952.xls"),
readxl::read_excel("data/gapminder/1957.xls"),
readxl::read_excel("data/gapminder/1962.xls"),
gapminder_1952 <- readxl::read_excel("data/gapminder/1952.xlsx")
gapminder_1957 <- readxl::read_excel("data/gapminder/1957.xlsx")
gapminder_1962 <- readxl::read_excel("data/gapminder/1962.xlsx")
...
gapminder_2007 <- readxl::read_excel("data/gapminder/2007.xlsx")
```
But putting each sheet into its own variable is going to make it hard to work with them a few steps down the road.
Instead, they'll be easier to work with if we put them into a single object.
A list is the perfect tool for this job:
```{r}
#| eval: false
files <- list(
readxl::read_excel("data/gapminder/1952.xlsx"),
readxl::read_excel("data/gapminder/1957.xlsx"),
readxl::read_excel("data/gapminder/1962.xlsx"),
...,
readxl::read_excel("data/gapminder/2007.xls")
readxl::read_excel("data/gapminder/2007.xlsx")
)
```
Something about `[[`
```{r}
#| include: false
files <- map(paths, readxl::read_excel)
```
Now that you have these data frames in a list, how do you get one out?
You can use `files[[i]]` to extract the ith element:
```{r}
files[[3]]
```
We'll come back to `[[` in more detail in @sec-subset-one.
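As a minimal sketch of what `[[` does, the toy list below stands in for `files`; the data frames and the names `toy`, `third`, and `wrapped` are made up for illustration and are not part of the chapter. It contrasts `[[`, which returns the element itself, with `[`, which returns a shorter list:

```{r}
# A toy list standing in for `files`; the contents are invented.
toy <- list(
  data.frame(country = "A", pop = 1),
  data.frame(country = "B", pop = 2),
  data.frame(country = "C", pop = 3)
)

third <- toy[[3]]   # the third element itself: a data frame
wrapped <- toy[3]   # a list of length 1 that contains that data frame

class(third)
class(wrapped)
```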
### `purrr::map()` and `list_rbind()`
@ -530,17 +554,34 @@ The easiest way to do this is with the `set_names()` function, which can take a
Here we use `basename()` to extract just the file name from the full path:
```{r}
paths <- paths |> set_names(basename)
paths
paths |> set_names(basename)
```
Those names are automatically carried along by all the map functions, so the list of data frames will have those same names:
```{r}
files <- paths |>
set_names(basename) |>
map(readxl::read_excel)
```
That makes this call to `map()` shorthand for:
```{r}
#| eval: false
paths |>
map(readxl::read_excel) |>
names()
files <- list(
"1952.xlsx" = readxl::read_excel("data/gapminder/1952.xlsx"),
"1957.xlsx" = readxl::read_excel("data/gapminder/1957.xlsx"),
"1962.xlsx" = readxl::read_excel("data/gapminder/1962.xlsx"),
...,
"2007.xlsx" = readxl::read_excel("data/gapminder/2007.xlsx")
)
```
You can also use `[[` to extract elements by name:
```{r}
files[["1962.xlsx"]]
```
Then we use the `names_to` argument to `list_rbind()` to tell it to save the names into a new column called `year`, and use `readr::parse_number()` to extract the number from the string.
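The same two steps can be sketched on a toy named list. This assumes purrr (1.0.0 or later, for `list_rbind()`) and readr are installed; `toy_files` and `combined` are made-up names, and the data are invented:

```{r}
library(purrr)

# A named list of small, invented data frames standing in for `files`.
toy_files <- list(
  "1952.xlsx" = data.frame(country = c("A", "B"), pop = c(1, 2)),
  "1957.xlsx" = data.frame(country = c("A", "B"), pop = c(3, 4))
)

# Stack the data frames, storing each element's name in a `year` column.
combined <- list_rbind(toy_files, names_to = "year")

# Pull the number out of strings such as "1952.xlsx".
combined$year <- readr::parse_number(combined$year)

combined
```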
@ -921,7 +962,7 @@ unlink(by_clarity$paths)
In this chapter, you've learned iteration tools to solve three problems that come up frequently when doing data science: manipulating multiple columns, reading multiple files, and saving multiple outputs.
But in general, iteration is a superpower: if you know the right iteration technique, you can easily go from fixing one problem to fixing any number of problems.
Once you've mastered the techniques in this chapter, we highly recommend learning more by reading [Functionals chapter](https://adv-r.hadley.nz/functionals.html) of *Advanced R* and consulting the [purrr website](https://purrr.tidyverse.org and the).
Once you've mastered the techniques in this chapter, we highly recommend learning more by reading the [Functionals chapter](https://adv-r.hadley.nz/functionals.html) of *Advanced R* and consulting the [purrr website](https://purrr.tidyverse.org).
If you know much about iteration in other languages, you might be surprised that we didn't discuss the `for` loop.
That comes up in the next chapter where we'll discuss some important base R functions.