diff --git a/logicals.qmd b/logicals.qmd
index b95760d..9d1359d 100644
--- a/logicals.qmd
+++ b/logicals.qmd
@@ -480,7 +480,7 @@ Instead, you can switch to `dplyr::case_when()`.
 
 ### `case_when()`
 
-dplyr's `case_when()` is inspired by SQL's `CASE` statement and provides a flexible way of performing different computations for different computations.
+dplyr's `case_when()` is inspired by SQL's `CASE` statement and provides a flexible way of performing different computations for different conditions.
 It has a special syntax that unfortunately looks like nothing else you'll use in the tidyverse.
 It takes pairs that look like `condition ~ output`.
 `condition` must be a logical vector; when it's `TRUE`, `output` will be used.
diff --git a/numbers.qmd b/numbers.qmd
index de9e44b..70bb66b 100644
--- a/numbers.qmd
+++ b/numbers.qmd
@@ -18,6 +18,11 @@ We'll finish off by covering the summary functions that pair well with `summariz
 
 ### Prerequisites
 
+::: callout-important
+This chapter relies on features only found in dplyr 1.1.0, which is still in development.
+If you want to live on the edge, you can get the dev versions with `devtools::install_github("tidyverse/dplyr")`.
+:::
+
 This chapter mostly uses functions from base R, which are available without loading any packages.
 But we still need the tidyverse because we'll use these base R functions inside of tidyverse functions like `mutate()` and `filter()`.
 Like in the last chapter, we'll use real examples from nycflights13, as well as toy examples made with `c()` and `tribble()`.
@@ -395,7 +400,7 @@ cut(y, breaks = c(0, 5, 10, 15, 20))
 
 See the documentation for other useful arguments like `right` and `include.lowest`, which control if the intervals are `[a, b)` or `(a, b]` and if the lowest interval should be `[a, b]`.
 
-### Cumulative and rolling aggregates
+### Cumulative and rolling aggregates {#sec-cumulative-and-rolling-aggregates}
 
 Base R provides `cumsum()`, `cumprod()`, `cummin()`, `cummax()` for running, or cumulative, sums, products, mins and maxes.
 dplyr provides `cummean()` for cumulative means.
@@ -544,16 +549,15 @@ events
 ```
 
 But how do we go from that logical vector to something that we can `group_by()`?
-`consecutive_id()` comes to the rescue:
+`cumsum()` from @sec-cumulative-and-rolling-aggregates comes to the rescue as each occurring gap, i.e., `gap` is `TRUE`, increments `group` by one (see @sec-numeric-summaries-of-logicals on the numerical interpretation of logicals):
 
 ```{r}
 events |> mutate(
-  group = consecutive_id(gap)
+  group = cumsum(gap)
 )
 ```
 
-`consecutive_id()` starts a new group every time one of its arguments changes.
-That makes it useful both here, with logical vectors, and in many other place.
+Another approach for creating grouping variables is `consecutive_id()`, which starts a new group every time one of its arguments changes.
 For example, inspired by [this stackoverflow question](https://stackoverflow.com/questions/27482712), imagine you have a data frame with a bunch of repeated values:
 
 ```{r}