Use metacharacter + literal character terms

Hadley Wickham 2022-11-07 09:50:57 -06:00
parent 45353e0a58
commit f4f739bccb
1 changed files with 9 additions and 9 deletions


@ -57,8 +57,8 @@ str_view(fruit, "berry")
str_view(fruit, "BERRY")
```
While letters and numbers match exactly, punctuation characters like `.`, `+`, `*`, `[`, `]`, `?` have special meanings[^regexps-2].
For example, `.`
Letters and numbers match exactly and so are called **literal characters**.
Punctuation characters like `.`, `+`, `*`, `[`, `]`, `?` have special meanings[^regexps-2] and are called **metacharacters**. For example, `.`
will match any character[^regexps-3], so `"a."` will match any string that contains an "a" followed by another character
:
@ -300,7 +300,7 @@ If the match fails, you can use `too_short = "debug"` to figure out what went wr
## Pattern details
Now that you understand the basics of the pattern language and how to use it with some stringr and tidyr functions, it's time to dig into more of the details.
First, we'll start with **escaping**, which allows you to match characters that the pattern language otherwise treats specially.
First, we'll start with **escaping**, which allows you to match metacharacters that would otherwise be treated specially.
Next you'll learn about **anchors**, which allow you to match the start or end of the string.
Then you'll learn more about **character classes** and their shortcuts, which allow you to match any character from a set.
Next you'll learn the final details of **quantifiers**, which control how many times a pattern can match.
@ -312,11 +312,11 @@ They're not always the most evocative of their purpose, but it's very helpful to
### Escaping {#sec-regexp-escaping}
In order to match a literal `.`, you need an **escape**, which tells the regular expression to ignore the special behavior and match exactly.
In order to match a literal `.`, you need an **escape**, which tells the regular expression to match metacharacters literally.
Like strings, regexps use the backslash for escaping, so to match a `.`, you need the regexp `\.`.
Unfortunately this creates a problem.
We use strings to represent regular expressions, and `\` is also used as an escape symbol in strings.
So, as the following example shows, to create the regular expression `\.` we need the string `"\\."`.
So to create the regular expression `\.` we need the string `"\\."`, as the following example shows.
```{r}
# To create the regular expression \., we need to use \\.
@ -350,7 +350,7 @@ That lets you to avoid one layer of escaping:
str_view(x, r"{\\}")
```
The full set of characters with special meanings that need to be escaped is `.^$\|*+?{}[]()`.
The full set of metacharacters is `.^$\|*+?{}[]()`.
In general, look at punctuation characters with suspicion; if your regular expression isn't matching what you think it should, check if you've used any of these characters.
### Anchors
@ -574,7 +574,7 @@ str_match(x, "gr(?:e|a)y")
## Pattern control
It's possible to exercise extra control over the details of the match by using a special pattern object instead of just a string.
It's possible to exercise extra control over the details of the match by using a pattern object instead of just a string.
This allows you to control the so-called regex flags and match various types of fixed strings, as described below.
### Regex flags {#sec-flags}
@ -809,8 +809,8 @@ pattern <- str_c("\\b(", str_flatten(cols, "|"), ")\\b")
str_view(sentences, pattern)
```
In this example `cols` only contains numbers and letters so you don't need to worry about special characters.
But in general, whenever you create patterns from existing strings it's wise to run them through `str_escape()` to escape any special behavior.
In this example `cols` only contains numbers and letters so you don't need to worry about metacharacters.
But in general, whenever you create patterns from existing strings it's wise to run them through `str_escape()` to ensure they match literally.
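
As a minimal sketch of why this matters, the following compares an unescaped pattern with one passed through `str_escape()`. The vector `words` and the pattern `"a.c"` are hypothetical values chosen for illustration and do not appear in the original text.

```{r}
library(stringr)

# Hypothetical inputs: the first string contains the metacharacter `.`
words <- c("a.c", "abc")

# Unescaped, `.` matches any character, so both strings match
unescaped <- str_detect(words, "a.c")

# str_escape() escapes metacharacters, so `.` only matches a literal dot
escaped <- str_detect(words, str_escape("a.c"))

unescaped
escaped
```

Only the escaped pattern distinguishes the literal `"a.c"` from `"abc"`, which is the behavior you usually want when the pattern comes from data rather than being written by hand.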
### Exercises