Fixes for dev tidyr
parent 5ef6a6af54
commit 40a56c55ed
strings.qmd: 49 lines changed
````diff
@@ -261,10 +261,10 @@ Working from <https://github.com/tidyverse/tidyr/pull/1304>.
 It's very common for multiple variables to be crammed together into a single string.
 In this section you'll learn how to use four tidyr to extract them:
 
-- `df |> separate_by_longer(col, sep)`
-- `df |> separate_at_longer(col, width)`
-- `df |> separate_by_wider(col, sep, names)`
-- `df |> separate_at_wider(col, widths)`
+- `df |> separate_longer_delim(col, delim)`
+- `df |> separate_longer_position(col, width)`
+- `df |> separate_wider_delim(col, delim, names)`
+- `df |> separate_wider_(col, widths)`
 
 If you look closely you can see there's a common pattern here: `separate` followed by `by` or `at`, followed by longer or `wider`.
 `by` splits up a string with a separator like `", "` or `" "`.
````
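This hunk renames all four functions, and the later hunk changes `sep =` to `delim =` in the delimiter-based calls. For updating older code, a rough sketch of that mapping follows. `separate_renames` and `migrate_separate_calls()` are hypothetical names, not part of tidyr; the target name for `separate_at_wider()` is taken from the second hunk, and the helper is a line-by-line regex substitution rather than a parser, so a call whose `sep =` sits on a different line from the function name is not handled.

```r
# Old (pre-rename) function names mapped to the names used after this commit.
# Hypothetical lookup table, not exported by tidyr.
separate_renames <- c(
  separate_by_longer = "separate_longer_delim",
  separate_at_longer = "separate_longer_position",
  separate_by_wider  = "separate_wider_delim",
  separate_at_wider  = "separate_wider_position"
)

# Hypothetical helper: rewrite old-style calls in a character vector of code
# lines (for example, lines read with readLines()). Plain regex substitution.
migrate_separate_calls <- function(code) {
  # Only the delimiter-based functions renamed `sep =` to `delim =`, so only
  # touch lines that call one of them. `\\b` leaves `names_sep =` alone
  # because `_` counts as a word character.
  uses_by <- grepl("\\bseparate_by_(longer|wider)\\(", code, perl = TRUE)
  code[uses_by] <- gsub("\\bsep(\\s*)=", "delim\\1=", code[uses_by], perl = TRUE)

  # Swap each old function name for its new one.
  for (old in names(separate_renames)) {
    pattern <- paste0("\\b", old, "\\(")
    code <- gsub(pattern, paste0(separate_renames[[old]], "("), code, perl = TRUE)
  }
  code
}

migrate_separate_calls('separate_by_wider(x, sep = ",", names_sep = "_")')
```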
````diff
@@ -274,80 +274,63 @@ If you look closely you can see there's a common pattern here: `separate` follow
 There's one more member of this family, `separate_regex_wider()`, that we'll come back in @sec-regular-expressions.
 It's the most flexible of the `at` forms but you need to know a bit about regular expression in order to use it.
 
-```{r}
-#| include: false
-has_dev_tidyr <- packageVersion("tidyr") >= "1.2.1.9001"
-```
-
 The next two sections will give you the basic idea behind these separate functions, and then we'll work through a few case studies that require mutliple uses.
 
 ### Splitting into rows
 
-`separate_by_longer()` and `separate_at_longer()` are most useful when the number of components varies from row to row.
-`separate_by_longer()` arises most commonly:
+`separate_longer_delim()` and `separate_longer_position()` are most useful when the number of components varies from row to row.
+`separate_longer_delim()` arises most commonly:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df1 <- tibble(x = c("a,b,c", "d,e", "f"))
 df1 |>
-separate_by_longer(x, sep = ",")
+separate_longer_delim(x, delim = ",")
 ```
 
 (If the separators have some variation you can use a regular expression instead, if you know about it.)
 
-It's rarer to see `separate_at_longer()` in the wild, but some older datasets can adopt a very compact format where each character is used to record a value:
+It's rarer to see `separate_longer_position()` in the wild, but some older datasets can adopt a very compact format where each character is used to record a value:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df2 <- tibble(x = c("1211", "131", "21"))
 df2 |>
-separate_at_longer(x, width = 1)
+separate_longer_position(x, width = 1)
 ```
 
 ### Splitting into columns
 
-`separate_by_wider()` and `separate_at_wider()` are most useful when there are a fixed number of components in each string, and you want to spread them into columns.
+`separate_wider_delim()` and `separate_wider_position()` are most useful when there are a fixed number of components in each string, and you want to spread them into columns.
 They are more complicated that their `by` equivalents because you need to name the columns.
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df3 <- tibble(x = c("a,1,2022", "b,2,2011", "e,5,2015"))
 df3 |>
-separate_by_wider(x, sep = ",", names = c("letter", "number", "year"))
+separate_wider_delim(x, delim = ",", names = c("letter", "number", "year"))
 ```
 
 If a specific value is not useful you can use `NA` to omit it from the results:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df3 <- tibble(x = c("a,1,2022", "b,2,2011", "e,5,2015"))
 df3 |>
-separate_by_wider(x, sep = ",", names = c("letter", NA, "year"))
+separate_wider_delim(x, delim = ",", names = c("letter", NA, "year"))
 ```
 
-Alternatively, you can provide `names_sep` and `separate_by_wider()` will use that separator to name automatically:
+Alternatively, you can provide `names_sep` and `separate_wider_delim()` will use that separator to name automatically:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df3 |>
-separate_by_wider(x, sep = ",", names_sep = "_")
+separate_wider_delim(x, delim = ",", names_sep = "_")
 ```
 
-`separate_at_wider()` works a little differently, because you typically want to specify the width of each column.
+`separate_wider_position()` works a little differently, because you typically want to specify the width of each column.
 So you give it a named integer vector, where the name gives the name of the new column and the value is the number of characters it occupies.
 You can omit values from the output by not naming them:
 
 ```{r}
-#| eval: !expr has_dev_tidyr
-
 df4 <- tibble(x = c("202215TX", "202122LA", "202325CA"))
 df4 |>
-separate_at_wider(x, c(year = 4, age = 2, state = 2))
+separate_wider_position(x, c(year = 4, age = 2, state = 2))
 ```
 
 ### Case studies
````
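With the `has_dev_tidyr` setup chunk and the `eval` options gone, every chunk in this hunk now runs unconditionally, so rendering needs a tidyr that already ships the new names. A minimal standalone check, assuming tidyr 1.3.0 or later (the first CRAN release with these functions) plus tibble, runs the renamed calls on the hunk's own data so the results the prose describes can be asserted outside Quarto:

```r
library(tibble)
library(tidyr)

# Assumption: tidyr >= 1.3.0. Without the old guard, older versions simply
# fail on these calls, so stop early with a clear message instead.
stopifnot(packageVersion("tidyr") >= "1.3.0")

df1 <- tibble(x = c("a,b,c", "d,e", "f"))
df2 <- tibble(x = c("1211", "131", "21"))
df3 <- tibble(x = c("a,1,2022", "b,2,2011", "e,5,2015"))
df4 <- tibble(x = c("202215TX", "202122LA", "202325CA"))

# One row per comma-separated piece: 3 + 2 + 1 = 6 rows.
by_delim <- separate_longer_delim(df1, x, delim = ",")

# One row per character: 4 + 3 + 2 = 9 rows.
by_position <- separate_longer_position(df2, x, width = 1)

# Three named columns; the pieces come back as character vectors.
wide_named <- separate_wider_delim(df3, x, delim = ",",
                                   names = c("letter", "number", "year"))

# `NA` in `names` drops that piece from the result.
wide_dropped <- separate_wider_delim(df3, x, delim = ",",
                                     names = c("letter", NA, "year"))

# `names_sep` builds names from the source column: x_1, x_2, x_3.
wide_auto <- separate_wider_delim(df3, x, delim = ",", names_sep = "_")

# Fixed-width split: 4 + 2 + 2 characters.
wide_fixed <- separate_wider_position(df4, x, c(year = 4, age = 2, state = 2))
```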
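The paragraph before the last chunk in the second hunk says you can omit values from `separate_wider_position()` by not naming them, but the chunk itself names all three fields. A minimal sketch of the unnamed case, assuming tidyr 1.3.0 or later, where an unnamed entry in `widths` still consumes its characters but is left out of the result:

```r
library(tibble)
library(tidyr)

df4 <- tibble(x = c("202215TX", "202122LA", "202325CA"))

# The middle width has no name: its 2 characters (the age) are skipped,
# so only `year` and `state` are returned.
no_age <- separate_wider_position(df4, x, widths = c(year = 4, 2, state = 2))
no_age
```

This mirrors the `NA` entry in `names` for `separate_wider_delim()`: in both cases the piece is still matched, so the remaining columns line up correctly.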