More about lists
parent
0973a0dea8
commit
3b9d54db7a
|
@@ -406,10 +406,10 @@ You could do it with copy and paste:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
data2019 <- readxl::read_excel("data/y2019.xls")
|
data2019 <- readxl::read_excel("data/y2019.xlsx")
|
||||||
data2020 <- readxl::read_excel("data/y2020.xls")
|
data2020 <- readxl::read_excel("data/y2020.xlsx")
|
||||||
data2021 <- readxl::read_excel("data/y2021.xls")
|
data2021 <- readxl::read_excel("data/y2021.xlsx")
|
||||||
data2022 <- readxl::read_excel("data/y2022.xls")
|
data2022 <- readxl::read_excel("data/y2022.xlsx")
|
||||||
```
|
```
|
||||||
|
|
||||||
And then use `dplyr::bind_rows()` to combine them all together:
|
And then use `dplyr::bind_rows()` to combine them all together:
|
||||||
|
@@ -448,21 +448,45 @@ paths
|
||||||
|
|
||||||
### Lists
|
### Lists
|
||||||
|
|
||||||
Now that we have these 12 paths, we could call `read_excel()` 12 times to get 12 data frames.
|
Now that we have these 12 paths, we could call `read_excel()` 12 times to get 12 data frames:
|
||||||
In general, we won't know how many files there are to read, so instead of saving each data frame to its own variable, we'll put them all into a list, something like this:
|
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
list(
|
gapminder_1952 <- readxl::read_excel("data/gapminder/1952.xlsx")
|
||||||
readxl::read_excel("data/gapminder/1952.xls"),
|
gapminder_1957 <- readxl::read_excel("data/gapminder/1957.xlsx")
|
||||||
readxl::read_excel("data/gapminder/1957.xls"),
|
gapminder_1962 <- readxl::read_excel("data/gapminder/1962.xlsx")
|
||||||
readxl::read_excel("data/gapminder/1962.xls"),
|
...
|
||||||
|
gapminder_2007 <- readxl::read_excel("data/gapminder/2007.xlsx")
|
||||||
|
```
|
||||||
|
|
||||||
|
But putting each sheet into its own variable is going to make it hard to work with them a few steps down the road.
|
||||||
|
Instead, they'll be easier to work with if we put them into a single object.
|
||||||
|
A list is the perfect tool for this job:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
#| eval: false
|
||||||
|
files <- list(
|
||||||
|
readxl::read_excel("data/gapminder/1952.xlsx"),
|
||||||
|
readxl::read_excel("data/gapminder/1957.xlsx"),
|
||||||
|
readxl::read_excel("data/gapminder/1962.xlsx"),
|
||||||
...,
|
...,
|
||||||
readxl::read_excel("data/gapminder/2007.xls")
|
readxl::read_excel("data/gapminder/2007.xlsx")
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
Something about `[[`
|
```{r}
|
||||||
|
#| include: false
|
||||||
|
files <- map(paths, readxl::read_excel)
|
||||||
|
```
|
||||||
|
|
||||||
|
Now that you have these data frames in a list, how do you get one out?
|
||||||
|
You can use `files[[i]]` to extract the ith element:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
files[[3]]
|
||||||
|
```
|
||||||
|
|
||||||
|
We'll come back to `[[` in more detail in @sec-subset-one.
|
||||||
|
|
||||||
### `purrr::map()` and `list_rbind()`
|
### `purrr::map()` and `list_rbind()`
|
||||||
|
|
||||||
|
@@ -530,17 +554,34 @@ The easiest way to do this is with the `set_names()` function, which can take a
|
||||||
Here we use `basename()` to extract just the file name from the full path:
|
Here we use `basename()` to extract just the file name from the full path:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
paths <- paths |> set_names(basename)
|
paths |> set_names(basename)
|
||||||
paths
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Those names are automatically carried along by all the map functions, so the list of data frames will have those same names:
|
Those names are automatically carried along by all the map functions, so the list of data frames will have those same names:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
files <- paths |>
|
||||||
|
set_names(basename) |>
|
||||||
|
map(readxl::read_excel)
|
||||||
|
```
|
||||||
|
|
||||||
|
That makes this call to `map()` shorthand for:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
#| eval: false
|
#| eval: false
|
||||||
paths |>
|
files <- list(
|
||||||
map(readxl::read_excel) |>
|
"1952.xlsx" = readxl::read_excel("data/gapminder/1952.xlsx"),
|
||||||
names()
|
"1957.xlsx" = readxl::read_excel("data/gapminder/1957.xlsx"),
|
||||||
|
"1962.xlsx" = readxl::read_excel("data/gapminder/1962.xlsx"),
|
||||||
|
...,
|
||||||
|
"2007.xlsx" = readxl::read_excel("data/gapminder/2007.xlsx")
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
You can also use `[[` to extract elements by name:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
files[["1962.xlsx"]]
|
||||||
```
|
```
|
||||||
|
|
||||||
Then we use the `names_to` argument to `list_rbind()` to tell it to save the names into a new column called `year`, then use `readr::parse_number()` to extract the number from the string.
|
Then we use the `names_to` argument to `list_rbind()` to tell it to save the names into a new column called `year`, then use `readr::parse_number()` to extract the number from the string.
|
||||||
|
@@ -921,7 +962,7 @@ unlink(by_clarity$paths)
|
||||||
|
|
||||||
In this chapter, you've learned iteration tools to solve three problems that come up frequently when doing data science: manipulating multiple columns, reading multiple files, and saving multiple outputs.
|
In this chapter, you've learned iteration tools to solve three problems that come up frequently when doing data science: manipulating multiple columns, reading multiple files, and saving multiple outputs.
|
||||||
But in general, iteration is a superpower: if you know the right iteration technique, you can easily go from fixing one problem to fixing any number of problems.
|
But in general, iteration is a superpower: if you know the right iteration technique, you can easily go from fixing one problem to fixing any number of problems.
|
||||||
Once you've mastered the techniques in this chapter, we highly recommend learning more by reading the [Functionals chapter](https://adv-r.hadley.nz/functionals.html) of *Advanced R* and consulting the [purrr website](https://purrr.tidyverse.org).
|
Once you've mastered the techniques in this chapter, we highly recommend learning more by reading the [Functionals chapter](https://adv-r.hadley.nz/functionals.html) of *Advanced R* and consulting the [purrr website](https://purrr.tidyverse.org).
|
||||||
|
|
||||||
If you know much about iteration in other languages you might be surprised that we didn't discuss the `for` loop.
|
If you know much about iteration in other languages you might be surprised that we didn't discuss the `for` loop.
|
||||||
That comes up in the next chapter where we'll discuss some important base R functions.
|
That comes up in the next chapter where we'll discuss some important base R functions.
|
||||||
|
|
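The list workflow this change documents (build a named list of data frames, pull one out with `[[`, then combine with `list_rbind()`) can be tried without any Excel files. The following is only a minimal sketch: the two small data frames are hypothetical stand-ins for what `readxl::read_excel()` would return, and it assumes purrr 1.0.0 or later (for `list_rbind()`) and readr are installed.

```r
library(purrr)

# Hypothetical stand-ins for the data frames read_excel() would return,
# named the way set_names(basename) would name them.
files <- list(
  "1952.xlsx" = data.frame(country = c("A", "B"), pop = c(1, 2)),
  "1957.xlsx" = data.frame(country = c("A", "B"), pop = c(3, 4))
)

# Extract one element by position or by name; both return the same data frame.
by_position <- files[[2]]
by_name <- files[["1957.xlsx"]]

# Stack the data frames, saving each element's name in a `year` column,
# then pull the number out of the file name.
combined <- list_rbind(files, names_to = "year")
combined$year <- readr::parse_number(combined$year)
combined
```

Because the names travel with the list, the `year` column can be recovered after the data frames are combined, which is why naming the paths before calling `map()` matters.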