diff --git a/datetimes.qmd b/datetimes.qmd
index 68add8c..ba2e61e 100644
--- a/datetimes.qmd
+++ b/datetimes.qmd
@@ -545,7 +545,7 @@ dyears(1)
 Durations always record the time span in seconds.
 Larger units are created by converting minutes, hours, days, weeks, and years to seconds: 60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, and 7 days in a week.
 Larger time units are more problematic.
-A year is uses the "average" number of days in a year, i.e. 365.25.
+A year uses the "average" number of days in a year, i.e. 365.25.
 There's no way to convert a month to a duration, because there's just too much variation.
 
 You can add and multiply durations:
@@ -565,15 +565,15 @@ last_year <- today() - dyears(1)
 However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:
 
 ```{r}
-one_pm <- ymd_hms("2026-03-12 13:00:00", tz = "America/New_York")
+one_am <- ymd_hms("2026-03-08 01:00:00", tz = "America/New_York")
 
-one_pm
-one_pm + ddays(1)
+one_am
+one_am + ddays(1)
 ```
 
-Why is one day after 1pm March 12, 2pm March 13?
+Why is one day after 1am March 8, 2am March 9?
 If you look carefully at the date you might also notice that the time zones have changed.
-March 12 only has 23 hours because it's when DST starts, so if we add a full days worth of seconds we end up with a different time.
+March 8 only has 23 hours because it's when DST starts, so if we add a full days worth of seconds we end up with a different time.
 
 ### Periods
 
@@ -582,8 +582,8 @@ Periods are time spans but don't have a fixed length in seconds, instead they wo
 That allows them to work in a more intuitive way:
 
 ```{r}
-one_pm
-one_pm + days(1)
+one_am
+one_am + days(1)
 ```
 
 Like durations, periods can be created with a number of friendly constructor functions.
@@ -610,8 +610,8 @@ ymd("2024-01-01") + dyears(1)
 ymd("2024-01-01") + years(1)
 
 # Daylight Savings Time
-one_pm + ddays(1)
-one_pm + days(1)
+one_am + ddays(1)
+one_am + days(1)
 ```
 
 Let's use periods to fix an oddity related to our flight dates.
diff --git a/missing-values.qmd b/missing-values.qmd
index 196cbce..b41cd5a 100644
--- a/missing-values.qmd
+++ b/missing-values.qmd
@@ -179,7 +179,7 @@ This brings us to another important way of revealing implicitly missing observat
 You'll learn more about joins in @sec-joins, but we wanted to quickly mention them to you here since you can often only know that values are missing from one dataset when you compare it another.
 
 `dplyr::anti_join(x, y)` is a particularly useful tool here because it selects only the rows in `x` that don't have a match in `y`.
-For example, we can use two `anti_join()`s reveal to reveal that we're missing information for four airports and 722 planes mentioned in `flights`:
+For example, we can use two `anti_join()`s to reveal that we're missing information for four airports and 722 planes mentioned in `flights`:
 
 ```{r}
 library(nycflights13)
diff --git a/regexps.qmd b/regexps.qmd
index 16946a8..b55fae3 100644
--- a/regexps.qmd
+++ b/regexps.qmd
@@ -252,8 +252,8 @@ These functions are naturally paired with `mutate()` when doing data cleaning, a
 ### Extract variables {#sec-extract-variables}
 
 The last function we'll discuss uses regular expressions to extract data out of one column into one or more new columns: `separate_wider_regex()`.
-It's a peer of the `separate_wider_location()` and `separate_wider_delim()` functions that you learned about in @sec-string-columns.
-These functions live in tidyr because the operates on (columns of) data frames, rather than individual vectors.
+It's a peer of the `separate_wider_position()` and `separate_wider_delim()` functions that you learned about in @sec-string-columns.
+These functions live in tidyr because they operate on (columns of) data frames, rather than individual vectors.
 
 Let's create a simple dataset to show how it works.
 Here we have some data derived from `babynames` where we have the name, gender, and age of a bunch of people in a rather weird format[^regexps-5]:
@@ -377,9 +377,9 @@ str_view(fruit, "^a")
 str_view(fruit, "a$")
 ```
 
-It's tempting to think that `$` should matches the start of a string, because that's how we write dollar amounts, but it's not what regular expressions want.
+It's tempting to think that `$` should match the start of a string, because that's how we write dollar amounts, but it's not what regular expressions want.
 
-To force a regular expression to only the full string, anchor it with both `^` and `$`:
+To force a regular expression to match only the full string, anchor it with both `^` and `$`:
 
 ```{r}
 str_view(fruit, "apple")
@@ -387,7 +387,7 @@ str_view(fruit, "^apple$")
 ```
 
 You can also match the boundary between words (i.e. the start or end of a word) with `\b`.
-This can be particularly when using RStudio's find and replace tool.
+This can be particularly useful when using RStudio's find and replace tool.
 For example, if to find all uses of `sum()`, you can search for `\bsum\b` to avoid matching `summarize`, `summary`, `rowsum` and so on:
 
 ```{r}
@@ -496,7 +496,7 @@ But unlike algebra you're unlikely to remember the precedence rules for regexes,
 
 ### Grouping and capturing
 
-As well overriding operator precedence, parentheses have another important effect: they create **capturing groups** that allow you to use sub-components of the match.
+As well as overriding operator precedence, parentheses have another important effect: they create **capturing groups** that allow you to use sub-components of the match.
 The first way to use a capturing group is to refer back to it within a match with **back reference**: `\1` refers to the match contained in the first parenthesis, `\2` in the second parenthesis, and so on.
 For example, the following pattern finds all fruits that have a repeated pair of letters:
 
@@ -594,7 +594,7 @@ This allows you control the so called regex flags and match various types of fix
 
 ### Regex flags {#sec-flags}
 
-There are a number of settings that can use to control the details of the regexp.
+There are a number of settings that can be used to control the details of the regexp.
 These settings are often called **flags** in other programming languages.
 In stringr, you can use these by wrapping the pattern in a call to `regex()`.
 The most useful flag is probably `ignore_case = TRUE` because it allows characters to match either their uppercase or lowercase forms:
@@ -680,7 +680,7 @@ str_view("i İ ı I", coll("İ", ignore_case = TRUE, locale = "tr"))
 To put these ideas into practice we'll solve a few semi-authentic problems next.
 We'll discuss three general techniques:
 
-1. checking you work by creating simple positive and negative controls
+1. checking your work by creating simple positive and negative controls
 2. combining regular expressions with Boolean algebra
 3. creating complex patterns using string manipulation
 
@@ -830,7 +830,7 @@ str_view(sentences, pattern)
 ```
 
 In this example, `cols` only contains numbers and letters so you don't need to worry about metacharacters.
-But in general, whenever you create create patterns from existing strings it's wise to run them through `str_escape()` to ensure they match literally.
+But in general, whenever you create patterns from existing strings it's wise to run them through `str_escape()` to ensure they match literally.
 
 ### Exercises
 
@@ -862,10 +862,10 @@ There are three other particularly useful places where you might want to use a r
 - `matches(pattern)` will select all variables whose name matches the supplied pattern.
 It's a "tidyselect" function that you can use anywhere in any tidyverse function that selects variables (e.g. `select()`, `rename_with()` and `across()`).
 
-- `pivot_longer()'s` `names_pattern` argument takes a vector of regular expressions, just like `separate_with_regex()`.
+- `pivot_longer()'s` `names_pattern` argument takes a vector of regular expressions, just like `separate_wider_regex()`.
 It's useful when extracting data out of variable names with a complex structure
 
-- The `delim` argument in `separate_delim_longer()` and `separate_delim_wider()` usually matches a fixed string, but you can use `regex()` to make it match a pattern.
+- The `delim` argument in `separate_longer_delim()` and `separate_wider_delim()` usually matches a fixed string, but you can use `regex()` to make it match a pattern.
 This is useful, for example, if you want to match a comma that is optionally followed by a space, i.e. `regex(", ?")`.
 
 ### Base R