From 03eb8d06a97a89918083d848b1b7d422d465c9c9 Mon Sep 17 00:00:00 2001
From: "Jennifer (Jenny) Bryan"
Date: Wed, 20 Jun 2018 20:08:05 -0700
Subject: [PATCH] Mention the use of a character class for metacharacters (#687)

Closes #673
---
 strings.Rmd | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/strings.Rmd b/strings.Rmd
index 1d500a2..ae5207f 100644
--- a/strings.Rmd
+++ b/strings.Rmd
@@ -299,6 +299,17 @@ There are a number of special patterns that match more than one character. You'v
 
 Remember, to create a regular expression containing `\d` or `\s`, you'll need to escape the `\` for the string, so you'll type `"\\d"` or `"\\s"`.
 
+A character class containing a single character is a nice alternative to backslash escapes when you want to include a single metacharacter in a regex. Many people find this more readable.
+
+```{r}
+# Look for a literal character that normally has special meaning in a regex
+str_view(c("abc", "a.c", "a*c", "a c"), "a[.]c")
+str_view(c("abc", "a.c", "a*c", "a c"), ".[*]c")
+str_view(c("abc", "a.c", "a*c", "a c"), "a[ ]")
+```
+
+This works for most (but not all) regex metacharacters: `$` `.` `|` `?` `*` `+` `(` `)` `[` `{`. Unfortunately, a few characters have special meaning even inside a character class and must be handled with backslash escapes: `]` `\` `^` and `-`.
+
 You can use _alternation_ to pick between one or more alternative patterns. For example, `abc|d..f` will match either '"abc"', or `"deaf"`. Note that the precedence for `|` is low, so that `abc|xyz` matches `abc` or `xyz` not `abcyz` or `abxyz`. Like with mathematical expressions, if precedence ever gets confusing, use parentheses to make it clear what you want:
 
 ```{r}