Clarify str_view() color, closes #1479

Mine Çetinkaya-Rundel 2023-11-11 07:12:20 -05:00
parent 336b70bbc4
commit 10558e4d57
1 changed file with 16 additions and 13 deletions

@@ -129,9 +129,12 @@ x
str_view(x)
```
Note that `str_view()` uses a blue background for tabs to make them easier to spot.
Note that `str_view()` uses curly braces for tabs to make them easier to spot[^strings-3].
One of the challenges of working with text is that there's a variety of ways that white space can end up in the text, so this background helps you recognize that something strange is going on.
[^strings-3]: `str_view()` also uses color to bring tabs, spaces, matches, etc. to your attention.
The colors don't currently show up in the book, but you'll notice them when running code interactively.
### Exercises
1. Create strings that contain the following values:
@@ -189,9 +192,9 @@ df |>
### `str_glue()` {#sec-glue}
If you are mixing many fixed and variable strings with `str_c()`, you'll notice that you type a lot of `"`s, making it hard to see the overall goal of the code. An alternative approach is provided by the [glue package](https://glue.tidyverse.org) via `str_glue()`[^strings-3]. You give it a single string that has a special feature: anything inside `{}` will be evaluated like it's outside of the quotes:
If you are mixing many fixed and variable strings with `str_c()`, you'll notice that you type a lot of `"`s, making it hard to see the overall goal of the code. An alternative approach is provided by the [glue package](https://glue.tidyverse.org) via `str_glue()`[^strings-4]. You give it a single string that has a special feature: anything inside `{}` will be evaluated like it's outside of the quotes:
[^strings-3]: If you're not using stringr, you can also access it directly with `glue::glue()`.
[^strings-4]: If you're not using stringr, you can also access it directly with `glue::glue()`.
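As a minimal sketch of what that footnote describes, assuming the glue package is installed, you can call `glue::glue()` without loading stringr. The `name` vector here is made up for illustration and is not part of the original text:

```r
# Hypothetical example data; `name` is an assumption for illustration only.
name <- c("Flora", "David")

# glue::glue() evaluates anything inside {} as R code, like str_glue() does.
greeting <- glue::glue("Hi {name}!")
greeting
```

The result is a glue object with one element per input, so it can be used inside `mutate()` in the same way as `str_glue()`.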
```{r}
df |> mutate(greeting = str_glue("Hi {name}!"))
@@ -211,9 +214,9 @@ df |> mutate(greeting = str_glue("{{Hi {name}!}}"))
`str_c()` and `str_glue()` work well with `mutate()` because their output is the same length as their inputs.
What if you want a function that works well with `summarize()`, i.e. something that always returns a single string?
That's the job of `str_flatten()`[^strings-4]: it takes a character vector and combines each element of the vector into a single string:
That's the job of `str_flatten()`[^strings-5]: it takes a character vector and combines each element of the vector into a single string:
[^strings-4]: The base R equivalent is `paste()` used with the `collapse` argument.
[^strings-5]: The base R equivalent is `paste()` used with the `collapse` argument.
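A minimal sketch of the base R equivalent mentioned in the footnote, using only `paste()`. The `", "` separator is chosen purely for illustration:

```r
x <- c("x", "y", "z")

# collapse = "" joins every element into a single string with no separator.
flat <- paste(x, collapse = "")
flat
# → "xyz"

# A non-empty collapse value is placed between the elements.
flat_sep <- paste(x, collapse = ", ")
flat_sep
# → "x, y, z"
```

Either way the result has length one, which is what makes it suitable for use inside `summarize()`.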
```{r}
str_flatten(c("x", "y", "z"))
@@ -344,11 +347,11 @@ df4 |>
### Diagnosing widening problems
`separate_wider_delim()`[^strings-5] requires a fixed and known set of columns.
`separate_wider_delim()`[^strings-6] requires a fixed and known set of columns.
What happens if some of the rows don't have the expected number of pieces?
There are two possible problems, too few or too many pieces, so `separate_wider_delim()` provides two arguments to help: `too_few` and `too_many`. Let's first look at the `too_few` case with the following sample dataset:
[^strings-5]: The same principles apply to `separate_wider_position()` and `separate_wider_regex()`.
[^strings-6]: The same principles apply to `separate_wider_position()` and `separate_wider_regex()`.
```{r}
#| error: true
@@ -463,9 +466,9 @@ You'll learn how to find the length of a string, extract substrings, and handle
str_length(c("a", "R for data science", NA))
```
You could use this with `count()` to find the distribution of lengths of US babynames and then with `filter()` to look at the longest names, which happen to have 15 letters[^strings-6]:
You could use this with `count()` to find the distribution of lengths of US babynames and then with `filter()` to look at the longest names, which happen to have 15 letters[^strings-7]:
[^strings-6]: Looking at these entries, we'd guess that the babynames data drops spaces or hyphens and truncates after 15 letters.
[^strings-7]: Looking at these entries, we'd guess that the babynames data drops spaces or hyphens and truncates after 15 letters.
```{r}
babynames |>
@@ -547,9 +550,9 @@ readr uses UTF-8 everywhere.
This is a good default but will fail for data produced by older systems that don't use UTF-8.
If this happens, your strings will look weird when you print them.
Sometimes just one or two characters might be messed up; other times, you'll get complete gibberish.
For example here are two inline CSVs with unusual encodings[^strings-7]:
For example here are two inline CSVs with unusual encodings[^strings-8]:
[^strings-7]: Here I'm using the special `\x` to encode binary data directly into a string.
[^strings-8]: Here I'm using the special `\x` to encode binary data directly into a string.
```{r}
#| eval: false
@@ -630,10 +633,10 @@ str_to_upper(c("i", "ı"))
str_to_upper(c("i", "ı"), locale = "tr")
```
Sorting strings depends on the order of the alphabet, and the order of the alphabet is not the same in every language[^strings-8]!
Sorting strings depends on the order of the alphabet, and the order of the alphabet is not the same in every language[^strings-9]!
Here's an example: in Czech, "ch" is a compound letter that appears after `h` in the alphabet.
[^strings-8]: Sorting in languages that don't have an alphabet, like Chinese, is more complicated still.
[^strings-9]: Sorting in languages that don't have an alphabet, like Chinese, is more complicated still.
```{r}
str_sort(c("a", "c", "ch", "h", "z"))