Update rectangling.qmd (#1069)
It is a very nicely written chapter. Here I've provided a few corrections for typos and errors.
parent 476f2c8282
commit d080f3279c

@@ -126,8 +126,8 @@ knitr::include_graphics("screenshots/View-3.png", dpi = 220)

### List-columns

Lists can also live inside a tibble, where we call them list-columns.
List-columns are useful because they allow you to shoehorn in objects that wouldn't usually belong in a tibble.
In particular, list-columns are used a lot in the [tidymodels](https://www.tidymodels.org) ecosystem, because they allow you to store things like models or resamples in a data frame.

Here's a simple example of a list-column:

@@ -187,7 +187,7 @@ It's easier to use list-columns with tibbles because `tibble()` treats lists lik

## Unnesting

Now that you've learned the basics of lists and list-columns, let's explore how you can turn them back into regular rows and columns.
We'll start with very simple sample data so you can get the basic idea, and then switch to more realistic examples in the next section.

List-columns tend to come in two basic forms: named and unnamed.

@@ -195,7 +195,7 @@ When the children are **named**, they tend to have the same names in every row.

When the children are **unnamed**, the number of elements tends to vary from row to row.
The following code creates an example of each.
In `df1`, every element of list-column `y` has two elements named `a` and `b`.
In `df2`, the elements of list-column `y` are unnamed and vary in length.

```{r}
df1 <- tribble(

@@ -316,7 +316,7 @@ You might wonder if this breaks the commandment that every element of a column m

What happens if you find this problem in a dataset you're trying to rectangle?
There are two basic options.
You could use the `transform` argument to coerce all inputs to a common type.
It's not particularly useful here because there's really only one class that these five classes can be converted to: character.

```{r}
df4 |>

@@ -371,7 +371,7 @@ These are good to know about when you're other people's code and for tackling ra

## Case studies

So far you've learned about the simplest case of list-columns, where rectangling only requires a single call to `unnest_longer()` or `unnest_wider()`.
The main difference between real data and these simple examples is that real data typically contains multiple levels of nesting that require multiple calls to `unnest_longer()` and `unnest_wider()`.
This section will work through four real rectangling challenges using datasets from the repurrrsive package that are inspired by datasets that we've encountered in the wild.
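
Here is a small illustration of what "multiple levels of nesting" means. The data is invented: the `df` tibble below is hypothetical and is not one of the repurrrsive datasets. Each row holds a named list, and that list's `scores` child is itself an unnamed list. Assuming a recent tidyr (1.2.0 or later), one `unnest_wider()` followed by one `unnest_longer()` should flatten it:

```{r}
library(tibble)
library(tidyr)

# Hypothetical toy data with two levels of nesting:
# level 1 is a named list (name, scores) in every row,
# level 2 is the unnamed, variable-length `scores` list.
df <- tibble(
  id = 1:2,
  info = list(
    list(name = "a", scores = list(1, 2)),
    list(name = "b", scores = list(3))
  )
)

flat <- df |>
  unnest_wider(info) |>   # named children become columns: name, scores
  unnest_longer(scores)   # unnamed children become rows, one per score

flat
```

Each call peels off one level. `unnest_wider()` handles the named level and `unnest_longer()` handles the unnamed one, so `flat` should end up with one row per score.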

### Very wide data

@@ -426,7 +426,7 @@ repos |>

You can use this to work back to understand how `gh_repos` was structured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.

`owner` is another list-column, and since it contains a named list, we can use `unnest_wider()` to get at the values:

```{r}
#| error: true

@@ -624,7 +624,7 @@ locations |>

  unnest_wider(location)
```

Extracting the bounds requires a few more steps:

```{r}
locations |>

@@ -649,7,7 +649,7 @@ locations |>

Note how we unnest two columns simultaneously by supplying a vector of variable names to `unnest_wider()`.
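
The following minimal sketch shows the same idea on hypothetical data. The `corners` tibble and its `ne`/`sw` columns are invented for illustration. It assumes tidyr 1.2.0 or later, where `unnest_wider()` accepts several columns at once:

```{r}
library(tibble)
library(tidyr)

# Hypothetical data: two list-columns whose children share the names lat/lng.
corners <- tibble(
  id = 1:2,
  ne = list(list(lat = 1, lng = 2), list(lat = 3, lng = 4)),
  sw = list(list(lat = 5, lng = 6), list(lat = 7, lng = 8))
)

# Unnest both columns in one call. names_sep prefixes each new column with
# its source column name, so the two sets of lat/lng don't collide.
wide <- corners |>
  unnest_wider(c(ne, sw), names_sep = "_")

wide
```

Both columns have children named `lat` and `lng`, so `names_sep` is what keeps the results apart. With the default `names_repair = "check_unique"`, leaving it out would be expected to fail on the duplicated names.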

This is where `hoist()`, mentioned briefly above, can be useful.
Once you've discovered the path to get to the components you're interested in, you can extract them directly using `hoist()`:

```{r}

@@ -711,17 +711,17 @@ Four of them are scalars:

- The simplest type is a **null**, written `null`, which plays the same role as both `NULL` and `NA` in R. It represents the absence of data.
- A **string** is much like a string in R, but must use double quotes, not single quotes.
- A **number** is similar to R's numbers: it can use integer (e.g. 123), decimal (e.g. 123.45), or scientific (e.g. 1.23e3) notation. JSON doesn't support `Inf`, `-Inf`, or `NaN`.
- A **boolean** is similar to R's `TRUE` and `FALSE`, but uses lowercase `true` and `false`.
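
To see how these scalars arrive in R, here is a short sketch using the jsonlite package's `parse_json()`. It assumes jsonlite is installed, and the sample inputs are made up. Each call parses a single top-level scalar:

```{r}
library(jsonlite)

# Parse one top-level JSON scalar per call.
null_val   <- parse_json("null")     # null: the absence of data
string_val <- parse_json('"hello"')  # strings use double quotes
number_val <- parse_json("1.23e3")   # scientific notation is allowed
bool_val   <- parse_json("true")     # booleans are lowercase in JSON

is.null(null_val)   # null arrives as R's NULL
class(string_val)   # "character"
class(number_val)   # "numeric"
class(bool_val)     # "logical"
```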

JSON's strings, numbers, and booleans are pretty similar to R's character, numeric, and logical vectors.
The main difference is that JSON's scalars can only represent a single value.
To represent multiple values, you need to use one of the two remaining types: arrays and objects.

Both arrays and objects are similar to lists in R; the difference is whether or not they're named.
An **array** is like an unnamed list, and is written with `[]`.
For example, `[1, 2, 3]` is an array containing 3 numbers, and `[null, 1, "string", false]` is an array that contains a null, a number, a string, and a boolean.
An **object** is like a named list, and is written with `{}`.
For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.
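
Parsing the array and object examples above with jsonlite's `parse_json()` makes the named/unnamed distinction visible. This is a sketch assuming jsonlite is installed. It relies on `parse_json()`'s default `simplifyVector = FALSE`, so arrays stay as lists rather than being collapsed into vectors:

```{r}
library(jsonlite)

arr   <- parse_json("[1, 2, 3]")                   # array  -> unnamed list
mixed <- parse_json('[null, 1, "string", false]')  # elements may differ in type
obj   <- parse_json('{"x": 1, "y": 2}')            # object -> named list

str(arr)
str(mixed)
names(obj)   # "x" "y"
obj$y        # 2
```

Keeping everything as lists preserves the original JSON structure, which is exactly what the rectangling tools in this chapter expect to work on.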

### jsonlite

@@ -729,7 +729,7 @@ For example, `{"x": 1, "y": 2}` is an object that maps `x` to 1 and `y` to 2.

To convert JSON into R data structures, we recommend that you use the jsonlite package, by Jeroen Ooms.
We'll use only two jsonlite functions: `read_json()` and `parse_json()`.
In real life, you'll use `read_json()` to read a JSON file from disk.
For example, the repurrrsive package also provides the source for `gh_users` as a JSON file:

```{r}
# A path to a json file inside the package: