parent
1d0902c9bf
commit
5162de55ea
37
logicals.qmd
|
@ -15,7 +15,7 @@ It's relatively rare to find logical vectors in your raw data, but you'll create
|
||||||
|
|
||||||
We'll begin by discussing the most common way of creating logical vectors: with numeric comparisons.
|
We'll begin by discussing the most common way of creating logical vectors: with numeric comparisons.
|
||||||
Then you'll learn about how you can use Boolean algebra to combine different logical vectors, as well as some useful summaries.
|
Then you'll learn about how you can use Boolean algebra to combine different logical vectors, as well as some useful summaries.
|
||||||
We'll finish off with some tools for making conditional changes, and a cool hack for turning logical vectors into groups.
|
We'll finish off with some tools for making conditional changes, and a useful function for turning logical vectors into groups.
|
||||||
|
|
||||||
### Prerequisites
|
### Prerequisites
|
||||||
|
|
||||||
|
@ -546,13 +546,12 @@ flights |>
|
||||||
|
|
||||||
## Making groups {#sec-groups-from-logical}
|
## Making groups {#sec-groups-from-logical}
|
||||||
|
|
||||||
Before we move on to the next chapter, we want to show you one last trick.
|
Before we move on to the next chapter, we want to show you one last trick that's useful for grouping data.
|
||||||
We don't know exactly how to describe it, and it feels a little magical, but it's super handy so we wanted to make sure you knew about it.
|
Sometimes you want to start a new group every time some event occurs.
|
||||||
Sometimes you want to divide your dataset up into groups based on the occurrence of some event.
|
|
||||||
For example, when you're looking at website data, it's common to want to break up events into sessions, where a session is defined as a gap of more than x minutes since the last activity.
|
For example, when you're looking at website data, it's common to want to break up events into sessions, where a session is defined as a gap of more than x minutes since the last activity.
|
||||||
|
|
||||||
Here's some made up data that illustrates the problem.
|
Here's some made up data that illustrates the problem.
|
||||||
We've computed the time lag between the events, and figured out if there's a gap that's big enough to qualify.
|
So far we've computed the time lag between the events, and figured out if there's a gap that's big enough to qualify:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
events <- tibble(
|
events <- tibble(
|
||||||
|
@ -566,12 +565,32 @@ events <- events |>
|
||||||
events
|
events
|
||||||
```
|
```
|
||||||
|
|
||||||
How do we go from that logical vector to something that we can `group_by()`?
|
But how do we go from that logical vector to something that we can `group_by()`?
|
||||||
You can use the cumulative sum, `cumsum(),` to turn this logical vector into a unique group identifier.
|
`consecutive_id()` comes to the rescue:
|
||||||
Remember that whenever you use a logical vector in a numeric context `TRUE` becomes 1 and `FALSE` becomes 0, taking the cumulative sum of a logical vector creates a numeric index that increments every time it sees a `TRUE`.
|
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
events |> mutate(
|
events |> mutate(
|
||||||
group = cumsum(gap) + 1
|
group = consecutive_id(gap)
|
||||||
)
|
)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
`consecutive_id()` starts a new group every time one of its arguments changes.
|
||||||
|
That makes it useful both here, with logical vectors, and in many other places.
|
||||||
|
For example, inspired by [this Stack Overflow question](https://stackoverflow.com/questions/27482712), imagine you have a data frame with a bunch of repeated values:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
df <- tibble(
|
||||||
|
x = c("a", "a", "a", "b", "c", "c", "d", "e", "a", "a", "b", "b"),
|
||||||
|
y = c(1, 2, 3, 2, 4, 1, 3, 9, 4, 8, 10, 199)
|
||||||
|
)
|
||||||
|
df
|
||||||
|
```
|
||||||
|
|
||||||
|
You want to keep the first row from each repeated `x`.
|
||||||
|
That's easy to express with a combination of `consecutive_id()` and `slice_head()`:
|
||||||
|
|
||||||
|
```{r}
|
||||||
|
df |>
|
||||||
|
group_by(id = consecutive_id(x)) |>
|
||||||
|
slice_head(n = 1)
|
||||||
|
```
|
||||||
|
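When comparing the two versions of the sessions example, note that `cumsum()` and `consecutive_id()` do not number groups the same way. `cumsum()` increments on every `TRUE`, while `consecutive_id()` increments whenever the value differs from the previous one, including a change from `TRUE` back to `FALSE`. The following is a minimal sketch that assumes dplyr 1.1.0 or later; the `gap` vector is made up for illustration and is not the chapter's `events` data.

```r
library(dplyr)

# Hypothetical gap indicator: TRUE marks an event that follows a long pause.
gap <- c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)

# cumsum() treats TRUE as 1 and FALSE as 0, so the index goes up by one
# each time it sees a TRUE.
by_cumsum <- cumsum(gap) + 1

# consecutive_id() goes up by one each time the value differs from the
# previous element, in either direction.
by_change <- consecutive_id(gap)

tibble(gap, by_cumsum, by_change)
```

With this made-up vector, `by_cumsum` is 1, 1, 2, 2, 2, 3, 3, while `by_change` is 1, 1, 2, 3, 3, 4, 5, so a row flagged `TRUE` ends up in a group of its own under `consecutive_id()`. Whether that matters for the `events` data depends on values not shown in this diff.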
|