common is now words

hadley 2016-07-20 11:51:53 -05:00
parent fa8940ecab
commit e56032efe3
1 changed file with 7 additions and 7 deletions


@@ -418,18 +418,18 @@ Remember that when you use a logical vector in a numeric context, `FALSE` become
```{r}
# How many common words start with t?
-sum(str_detect(common, "^t"))
+sum(str_detect(words, "^t"))
# What proportion of common words end with a vowel?
-mean(str_detect(common, "[aeiou]$"))
+mean(str_detect(words, "[aeiou]$"))
```
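The `sum()` and `mean()` calls above work because R coerces a logical vector to numbers, with `FALSE` as 0 and `TRUE` as 1. A minimal sketch on a hand-made vector (the vector `x` is purely illustrative and not part of the original text):

```r
library(stringr)

# Illustrative vector, not part of the original text
x <- c("tiger", "apple", "table", "sky")

# str_detect() returns one logical per element
starts_t <- str_detect(x, "^t")
starts_t          # TRUE FALSE TRUE FALSE

# In a numeric context TRUE counts as 1 and FALSE as 0
sum(starts_t)     # number of matching elements: 2
mean(starts_t)    # proportion of matching elements: 0.5
```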
When you have complex logical conditions (e.g., match a or b but not c unless d), it's often easier to combine multiple `str_detect()` calls with logical operators, rather than trying to create a single regular expression. For example, here are two ways to find all words that don't contain any vowels:
```{r}
# Find all words containing at least one vowel, and negate
-no_vowels_1 <- !str_detect(common, "[aeiou]")
+no_vowels_1 <- !str_detect(words, "[aeiou]")
# Find all words consisting only of consonants (non-vowels)
-no_vowels_2 <- str_detect(common, "^[^aeiou]+$")
+no_vowels_2 <- str_detect(words, "^[^aeiou]+$")
all.equal(no_vowels_1, no_vowels_2)
```
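To make the "a or b but not c" idea concrete, here is a sketch that combines several `str_detect()` calls with `|`, `&` and `!`. The condition itself (starts with "t" or ends with "x", but contains no "e") is invented for illustration; it runs over the `words` vector that ships with stringr.

```r
library(stringr)

# Hypothetical condition, chosen only to illustrate the pattern:
# starts with "t" OR ends with "x", but does NOT contain an "e"
starts_t <- str_detect(words, "^t")
ends_x   <- str_detect(words, "x$")
has_e    <- str_detect(words, "e")

# Each piece stays a simple regex; the logic lives in R operators
keep <- (starts_t | ends_x) & !has_e
selected <- words[keep]
head(selected)
```

Each sub-condition is easy to read and test on its own, whereas folding the whole rule into one regular expression would need alternation plus a negative lookahead.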
@@ -438,8 +438,8 @@ The results are identical, but I think the first approach is significantly easie
A common use of `str_detect()` is to select the elements that match a pattern. You can do this with logical subsetting, or the convenient `str_subset()` wrapper:
```{r}
-common[str_detect(common, "x$")]
-str_subset(common, "x$")
+words[str_detect(words, "x$")]
+str_subset(words, "x$")
```
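`str_subset()` gives the same result as indexing with the output of `str_detect()`. A small check on an illustrative vector (the values are made up for the example):

```r
library(stringr)

# Illustrative vector, not from the original text
x <- c("box", "apple", "fox", "pear", "six")

# Logical subsetting and the str_subset() wrapper
via_detect <- x[str_detect(x, "x$")]
via_subset <- str_subset(x, "x$")

via_subset                          # "box" "fox" "six"
identical(via_detect, via_subset)   # TRUE
```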
A variation on `str_detect()` is `str_count()`: rather than a simple yes or no, it tells you how many matches there are in a string:
@@ -449,7 +449,7 @@ x <- c("apple", "banana", "pear")
str_count(x, "a")
# On average, how many vowels per word?
-mean(str_count(common, "[aeiou]"))
+mean(str_count(words, "[aeiou]"))
```
Note that matches never overlap. For example, in `"abababa"`, how many times will the pattern `"aba"` match? Regular expressions say two, not three:
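A quick way to see the non-overlapping rule in action is to count and extract the matches directly. A minimal sketch:

```r
library(stringr)

x <- "abababa"

# Matches are found left to right and never overlap:
# "aba" at positions 1-3, then the search resumes at position 4
# and finds the next "aba" at positions 5-7
str_count(x, "aba")         # 2
str_extract_all(x, "aba")   # a list holding c("aba", "aba")
```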