Fix/rectangling probably typos (#1486)
* probably a typo
* probably a typo
* - "a" is not a factor but a character.
  - probably a typo
* a typo
* probably a typo
* probably a typo
* "ab" is a string, but is not a character, though can be an element of a character vector
* a typo
* probably a typo
* Update rectangling.qmd

---------

Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
parent b2c4c1d0d0
commit 0bd216b75a
@@ -9,7 +9,7 @@ status("complete")

 ## Introduction

-In this chapter, you'll learn the art of data **rectangling**, taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns.
+In this chapter, you'll learn the art of data **rectangling**: taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns.
 This is important because hierarchical data is surprisingly common, especially when working with data that comes from the web.

 To learn about rectangling, you'll need to first learn about lists, the data structure that makes hierarchical data possible.
@@ -263,12 +263,12 @@ df6 |> unnest_longer(y)
 ```

 We get zero rows in the output, so the row effectively disappears.
-If you want to preserve that row, adding add `NA` in `y` by setting `keep_empty = TRUE`.
+If you want to preserve that row, adding `NA` in `y`, set `keep_empty = TRUE`.

 ### Inconsistent types

 What happens if you unnest a list-column that contains different types of vector?
-For example, take the following dataset where the list-column `y` contains two numbers, a factor, and a logical, which can't normally be mixed in a single column.
+For example, take the following dataset where the list-column `y` contains two numbers, a character, and a logical, which can't normally be mixed in a single column.

 ```{r}
 df4 <- tribble(
@@ -292,7 +292,7 @@ Because `unnest_longer()` can't find a common type of vector, it keeps the origi
 You might wonder if this breaks the commandment that every element of a column must be the same type.
 It doesn't: every element is a list, even though the contents are of different types.

-Dealing with inconsistent types is challenging and the details depend on the precise nature of the problem and your goals, but you'll mostly likely need tools from @sec-iteration.
+Dealing with inconsistent types is challenging and the details depend on the precise nature of the problem and your goals, but you'll most likely need tools from @sec-iteration.

 ### Other functions

@@ -444,7 +444,7 @@ chars |>
 select(id, where(is.list))
 ```

-Lets explore the `titles` column.
+Let's explore the `titles` column.
 It's an unnamed list-column, so we'll unnest it into rows:

 ```{r}
@@ -509,7 +509,7 @@ locations

 Now we can see why two cities got two results: Washington matched both Washington state and Washington, DC, and Arlington matched Arlington, Virginia and Arlington, Texas.

-There are few different places we could go from here.
+There are a few different places we could go from here.
 We might want to determine the exact location of the match, which is stored in the `geometry` list-column:

 ```{r}
@@ -576,7 +576,7 @@ If these case studies have whetted your appetite for more real-life rectangling,
 Why can you only roughly estimate the date?

 2. The `owner` column of `gh_repo` contains a lot of duplicated information because each owner can have many repos.
-Can you construct a `owners` data frame that contains one row for each owner?
+Can you construct an `owners` data frame that contains one row for each owner?
 (Hint: does `distinct()` work with `list-cols`?)

 3. Follow the steps used for `titles` to create similar tables for the aliases, allegiances, books, and TV series for the Game of Thrones characters.
@@ -634,7 +634,7 @@ For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.

 Note that JSON doesn't have any native way to represent dates or date-times, so they're often stored as strings, and you'll need to use `readr::parse_date()` or `readr::parse_datetime()` to turn them into the correct data structure.
 Similarly, JSON's rules for representing floating point numbers in JSON are a little imprecise, so you'll also sometimes find numbers stored in strings.
-Apply `readr::parse_double()` as needed to the get correct variable type.
+Apply `readr::parse_double()` as needed to get the correct variable type.

 ### jsonlite

@@ -741,7 +741,7 @@ df |>

 In this chapter, you learned what lists are, how you can generate them from JSON files, and how turn them into rectangular data frames.
 Surprisingly we only need two new functions: `unnest_longer()` to put list elements into rows and `unnest_wider()` to put list elements into columns.
-It doesn't matter how deeply nested the list-column is, all you need to do is repeatedly call these two functions.
+It doesn't matter how deeply nested the list-column is; all you need to do is repeatedly call these two functions.

 JSON is the most common data format returned by web APIs.
 What happens if the website doesn't have an API, but you can see data you want on the website?