This chapter belongs in [wrangle](#wrangle-intro): it will give you a set of tools for working with hierarchical data, such as the deeply nested lists you often get when working with JSON.
However, you can only learn it now because working with hierarchical structures requires some programming skills, particularly an understanding of data structures, functions, and iteration.
Now that you have those tools under your belt, you can learn how to work with hierarchical data.
As well as tools to simplify iteration, purrr provides tools for handling deeply nested lists.
There are several common sources of such data, including:
- JSON and XML
The map functions apply a function to every element in a list.
They are the most commonly used part of purrr, but not the only part.
Since lists are often used to represent complex hierarchies, purrr also provides tools to work with hierarchy:
- You can extract deeply nested elements in a single call by supplying a character vector to the map functions.
- You can remove a level of the hierarchy with the flatten functions.
- You can flip levels of the hierarchy with the transpose function.
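To make the first of these concrete, here is a minimal sketch of extracting a deeply nested element by passing a character vector to a map function. The `people` list and its field names are made up for illustration; they are not part of this book's data.

```r
library(purrr)

# Hypothetical records, shaped like parsed JSON (illustration only)
people <- list(
  list(name = "Ada", address = list(city = "London", country = "UK")),
  list(name = "Grace", address = list(city = "New York", country = "USA"))
)

# c("address", "city") means: take `address`, then `city` inside it
cities <- map_chr(people, c("address", "city"))
cities
```

Each string in the vector steps one level further into the hierarchy, so a single call replaces a chain of `[[` subsetting.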
### Prerequisites
This chapter focusses mostly on purrr.
As well as the tools for iteration that you've already learned about, purrr also provides a number of tools specifically designed to manipulate hierarchical data.
```{r setup}
library(purrr)
```
## Initial exploration
Sometimes you get data structures that are very deeply nested.
A common source of such data is JSON from a web API.
I've previously downloaded a list of GitHub issues related to this book and saved it as `issues.json`.
Now I'm going to load it into a list with jsonlite.
By default `fromJSON()` tries to be helpful and simplifies the structure a little for you.
Here I'm going to show you how to do it with purrr, so I set `simplifyVector = FALSE`:
```{r}
# From https://api.github.com/repos/hadley/r4ds/issues
issues <- jsonlite::fromJSON("issues.json", simplifyVector = FALSE)
```
Whenever you see an error from purrr complaining about the "type" of the result, it's because it's trying to shove it into a simple vector (here a character).
You can diagnose the problem more easily if you use `map()`:
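As a rough illustration, the sketch below uses a small hypothetical list, `records`, rather than the real GitHub data. Extracting `user` with `map_chr()` fails because each `user` is itself a list, while `map()` returns the pieces unchanged so you can inspect them.

```r
library(purrr)

# Hypothetical stand-in for two parsed API records (illustration only)
records <- list(
  list(number = 1L, user = list(login = "alice")),
  list(number = 2L, user = list(login = "bob"))
)

# map_chr() needs one string per element, but `user` is a list, so this
# errors; tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  map_chr(records, "user"),
  error = function(e) conditionMessage(e)
)
msg

# map() always returns a list, so it shows what each element really holds
users <- map(records, "user")
str(users)
```

Once `str()` shows that each `user` is a list with a `login` field, you know to extract one level deeper.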
(You might wonder why that isn't the default value, since it's so useful. If it were the default, you'd never get an error message when you had a typo in the names. You'd just get a vector of missing values, which is annoying to debug because it's a silent failure.)
It's possible to mix positional and named indexing by using a list.
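Here is a small sketch of that idea. The `posts` list and its `labels` field are hypothetical, chosen only to give a structure with both named and unnamed levels.

```r
library(purrr)

# Hypothetical records: `labels` is an unnamed list of named lists
posts <- list(
  list(labels = list(list(name = "bug"), list(name = "docs"))),
  list(labels = list(list(name = "question")))
)

# Step by name ("labels"), then by position (1), then by name ("name")
first_label <- map_chr(posts, list("labels", 1, "name"))
first_label
```

A character vector could not express this path, because the middle step selects by position rather than by name.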
You'll see an example of this in the next section, as `transpose()` is particularly useful in conjunction with adverbs like `safely()` and `quietly()`.
It's called transpose by analogy to matrices.
When you subset a transposed matrix, you switch indices: `x[i, j]` is the same as `t(x)[j, i]`.
It's the same idea when transposing a list, but the subsetting looks a little different: `x[[i]][[j]]` is equivalent to `transpose(x)[[j]][[i]]`.
Similarly, a transpose is its own inverse so `transpose(transpose(x))` is equal to `x`.
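Both identities can be checked on a small list. This is a sketch with made-up data, and it assumes a purrr version that still exports `transpose()`.

```r
library(purrr)

# Hypothetical row-like list: each element has the same fields
x <- list(
  list(p = 1, q = "u"),
  list(p = 2, q = "v")
)

tx <- transpose(x)

# Swapping the order of the indices reaches the same element
same_element <- identical(x[[2]][["q"]], tx[["q"]][[2]])

# Transposing twice gives back the original list
round_trip <- identical(transpose(tx), x)

c(same_element = same_element, round_trip = round_trip)
```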
Transpose is also useful when working with JSON APIs.
Many JSON APIs represent data frames in a row-based format, rather than R's column-based format.
`transpose()` makes it easy to switch between the two:
```{r}
df <- tibble::tibble(x = 1:3, y = c("a", "b", "c"))