diff --git a/strings.Rmd b/strings.Rmd
index 4cdd869..c272d79 100644
--- a/strings.Rmd
+++ b/strings.Rmd
@@ -418,18 +418,18 @@ Remember that when you use a logical vector in a numeric context, `FALSE` become
 
 ```{r}
 # How many common words start with t?
-sum(str_detect(common, "^t"))
+sum(str_detect(words, "^t"))
 # What proportion of common words end with a vowel?
-mean(str_detect(common, "[aeiou]$"))
+mean(str_detect(words, "[aeiou]$"))
 ```
 
 When you have complex logical conditions (e.g. match a or b but not c unless d) it's often easier to combine multiple `str_detect()` calls with logical operators, rather than trying to create a single regular expression. For example, here are two ways to find all words that don't contain any vowels:
 
 ```{r}
 # Find all words containing at least one vowel, and negate
-no_vowels_1 <- !str_detect(common, "[aeiou]")
+no_vowels_1 <- !str_detect(words, "[aeiou]")
 # Find all words consisting only of consonants (non-vowels)
-no_vowels_2 <- str_detect(common, "^[^aeiou]+$")
+no_vowels_2 <- str_detect(words, "^[^aeiou]+$")
 all.equal(no_vowels_1, no_vowels_2)
 ```
 
@@ -438,8 +438,8 @@ The results are identical, but I think the first approach is significantly easie
 A common use of `str_detect()` is to select the elements that match a pattern. You can do this with logical subsetting, or the convenient `str_subset()` wrapper:
 
 ```{r}
-common[str_detect(common, "x$")]
-str_subset(common, "x$")
+words[str_detect(words, "x$")]
+str_subset(words, "x$")
 ```
 
 A variation on `str_detect()` is `str_count()`: rather than a simple yes or no, it tells you how many matches there are in a string:
@@ -449,7 +449,7 @@ x <- c("apple", "banana", "pear")
 str_count(x, "a")
 
 # On average, how many vowels per word?
-mean(str_count(common, "[aeiou]"))
+mean(str_count(words, "[aeiou]"))
 ```
 
 Note that matches never overlap. For example, in `"abababa"`, how many times will the pattern `"aba"` match? Regular expressions say two, not three: