Fix/rectangling probably typos (#1486)

* probably a typo

* probably a typo

* - "a" is not a factor but a character.
- probably a typo

* a typo

* probably a typo

* probably a typo

* "ab" is a string, but is not a character, though can be an element of a character vector

* a typo

* probably a typo

* Update rectangling.qmd

---------

Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
Mitsuo Shiota 2023-05-22 00:26:53 +09:00 committed by GitHub
parent b2c4c1d0d0
commit 0bd216b75a
1 changed files with 9 additions and 9 deletions

@@ -9,7 +9,7 @@ status("complete")
## Introduction
-In this chapter, you'll learn the art of data **rectangling**, taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns.
+In this chapter, you'll learn the art of data **rectangling**: taking data that is fundamentally hierarchical, or tree-like, and converting it into a rectangular data frame made up of rows and columns.
This is important because hierarchical data is surprisingly common, especially when working with data that comes from the web.
To learn about rectangling, you'll need to first learn about lists, the data structure that makes hierarchical data possible.
@@ -263,12 +263,12 @@ df6 |> unnest_longer(y)
```
We get zero rows in the output, so the row effectively disappears.
-If you want to preserve that row, adding add `NA` in `y` by setting `keep_empty = TRUE`.
+If you want to preserve that row, adding `NA` in `y`, set `keep_empty = TRUE`.
### Inconsistent types
What happens if you unnest a list-column that contains different types of vector?
-For example, take the following dataset where the list-column `y` contains two numbers, a factor, and a logical, which can't normally be mixed in a single column.
+For example, take the following dataset where the list-column `y` contains two numbers, a character, and a logical, which can't normally be mixed in a single column.
```{r}
df4 <- tribble(
@@ -292,7 +292,7 @@ Because `unnest_longer()` can't find a common type of vector, it keeps the origi
You might wonder if this breaks the commandment that every element of a column must be the same type.
It doesn't: every element is a list, even though the contents are of different types.
-Dealing with inconsistent types is challenging and the details depend on the precise nature of the problem and your goals, but you'll mostly likely need tools from @sec-iteration.
+Dealing with inconsistent types is challenging and the details depend on the precise nature of the problem and your goals, but you'll most likely need tools from @sec-iteration.
### Other functions
@@ -444,7 +444,7 @@ chars |>
select(id, where(is.list))
```
-Lets explore the `titles` column.
+Let's explore the `titles` column.
It's an unnamed list-column, so we'll unnest it into rows:
```{r}
@@ -509,7 +509,7 @@ locations
Now we can see why two cities got two results: Washington matched both Washington state and Washington, DC, and Arlington matched Arlington, Virginia and Arlington, Texas.
-There are few different places we could go from here.
+There are a few different places we could go from here.
We might want to determine the exact location of the match, which is stored in the `geometry` list-column:
```{r}
@@ -576,7 +576,7 @@ If these case studies have whetted your appetite for more real-life rectangling,
Why can you only roughly estimate the date?
2. The `owner` column of `gh_repo` contains a lot of duplicated information because each owner can have many repos.
-Can you construct a `owners` data frame that contains one row for each owner?
+Can you construct an `owners` data frame that contains one row for each owner?
(Hint: does `distinct()` work with `list-cols`?)
3. Follow the steps used for `titles` to create similar tables for the aliases, allegiances, books, and TV series for the Game of Thrones characters.
@@ -634,7 +634,7 @@ For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
Note that JSON doesn't have any native way to represent dates or date-times, so they're often stored as strings, and you'll need to use `readr::parse_date()` or `readr::parse_datetime()` to turn them into the correct data structure.
Similarly, JSON's rules for representing floating point numbers in JSON are a little imprecise, so you'll also sometimes find numbers stored in strings.
-Apply `readr::parse_double()` as needed to the get correct variable type.
+Apply `readr::parse_double()` as needed to get the correct variable type.
### jsonlite
@@ -741,7 +741,7 @@ df |>
In this chapter, you learned what lists are, how you can generate them from JSON files, and how turn them into rectangular data frames.
Surprisingly we only need two new functions: `unnest_longer()` to put list elements into rows and `unnest_wider()` to put list elements into columns.
-It doesn't matter how deeply nested the list-column is, all you need to do is repeatedly call these two functions.
+It doesn't matter how deeply nested the list-column is; all you need to do is repeatedly call these two functions.
JSON is the most common data format returned by web APIs.
What happens if the website doesn't have an API, but you can see data you want on the website?