Final polishing
parent 14c267391c
commit f497d3d996
54 logicals.Rmd

@@ -1,7 +1,7 @@

# Logical vectors {#logicals}

```{r, results = "asis", echo = FALSE}
status("polishing")
```

## Introduction

@@ -412,39 +412,40 @@ Also note the difference in the group size: in the first chunk `n()` gives the n

## Conditional transformations

One of the most powerful features of logical vectors is their use in conditional transformations, i.e. doing one thing for condition x and something different for condition y.
There are two important tools for this: `if_else()` and `case_when()`.

### `if_else()`

If you want to use one value when a condition is `TRUE` and another value when it's `FALSE`, you can use `dplyr::if_else()`[^logicals-4].
You'll always use the first three arguments of `if_else()`.
The first argument, `condition`, is a logical vector; the second, `true`, gives the output when the condition is true; and the third, `false`, gives the output when the condition is false.

[^logicals-4]: dplyr's `if_else()` is very similar to base R's `ifelse()`.
There are two main advantages of `if_else()` over `ifelse()`: you can choose what should happen to missing values, and `if_else()` is much more likely to give you a meaningful error if your variables have incompatible types.

Let's begin with a simple example of labeling a numeric vector as either "+ve" or "-ve":

```{r}
x <- c(-3:3, NA)
if_else(x > 0, "+ve", "-ve")
```
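
The same call can also be written with every argument named, which can make longer calls easier to scan.
It also shows that the `NA` in `x` makes the condition `NA`, so by default the output is `NA` as well.
This is a small sketch that assumes dplyr is installed; `labels` is just an illustrative name:

```{r}
library(dplyr)

x <- c(-3:3, NA)

# The same call as above, with every argument named
labels <- if_else(condition = x > 0, true = "+ve", false = "-ve")
labels

# The last element of x is NA, so its condition (and its label) is NA too
is.na(labels)
```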

There's an optional fourth argument, `missing`, which will be used if the condition is `NA`:

```{r}
if_else(x > 0, "+ve", "-ve", "???")
```
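
To see the two advantages mentioned in the footnote, you can run base R's `ifelse()` and `if_else()` side by side.
This sketch assumes dplyr is installed and uses a made-up vector `y`: `ifelse()` has no `missing` argument, and it quietly turns the number `1` into the string `"1"`, while `if_else()` refuses to combine a double with a character.
The exact wording of the error depends on your dplyr version, so the sketch just captures the message:

```{r}
library(dplyr)

y <- c(1, NA, -1)

# Base R: an NA condition always gives NA, and there's no `missing` argument
ifelse(y > 0, "+ve", "-ve")

# dplyr: `missing` supplies the value to use where the condition is NA
if_else(y > 0, "+ve", "-ve", missing = "???")

# Base R quietly coerces the number 1 to the string "1"
ifelse(y > 0, 1, "negative")

# dplyr signals an error instead of mixing types; we capture its message
tryCatch(
  if_else(y > 0, 1, "negative"),
  error = function(e) conditionMessage(e)
)
```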

You can also use vectors for the `true` and `false` arguments.
For example, this allows us to create a minimal implementation of `abs()`:

```{r}
if_else(x < 0, -x, x)
```
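
As a quick sanity check, you can compare this against base R's `abs()`.
In the sketch below (which assumes dplyr is installed), `my_abs()` is a hypothetical helper name used only for illustration:

```{r}
library(dplyr)

# Hypothetical helper wrapping the if_else() version of abs()
my_abs <- function(x) {
  # Negate negative values and leave the rest unchanged
  if_else(x < 0, -x, x)
}

x <- c(-3:3, NA)
my_abs(x)

# Same values, same (integer) type, and NA stays NA
identical(my_abs(x), abs(x))
```

Because `-x` and `x` are both integers here, the result keeps the integer type, just like `abs()`.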

So far, all the arguments have used the same vector, but you can of course mix and match.
For example, you could implement a simple version of `coalesce()` like this:

```{r}
x1 <- c(NA, 1, 2, NA)
y1 <- c(3, NA, 4, 6)
if_else(is.na(x1), y1, x1)
```
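
dplyr also provides a real `coalesce()`, which gives the same answer here and additionally accepts more than two inputs, taking the first non-missing value at each position.
A small comparison, assuming dplyr is installed; the vector `c(7, NA, NA, NA)` and the fallback `0` are made up for illustration:

```{r}
library(dplyr)

x1 <- c(NA, 1, 2, NA)
y1 <- c(3, NA, 4, 6)

# The hand-rolled version and coalesce() agree for two inputs
if_else(is.na(x1), y1, x1)
coalesce(x1, y1)

# With more inputs, each position falls through to the first non-missing value
coalesce(x1, c(7, NA, NA, NA), 0)
```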

You might have noticed a small infelicity in our labeling: zero is neither positive nor negative.
We could resolve this by adding an additional `if_else()`:

```{r}
if_else(x == 0, "0", if_else(x < 0, "-ve", "+ve"), "???")
```

This is already a little hard to read, and you can imagine it would only get harder if you have more conditions.
Instead, you can switch to `dplyr::case_when()`.

### `case_when()`

dplyr's `case_when()` is inspired by SQL's `CASE` statement and provides a flexible way of performing different computations for different conditions.
It has a special syntax that unfortunately looks like nothing else you'll use in the tidyverse.
It takes pairs that look like `condition ~ output`.
`condition` must be a logical vector; when it's `TRUE`, `output` will be used.

This means we could recreate our previous nested `if_else()` as follows:

```{r}
@@ -478,8 +481,6 @@ case_when(
)
```

This is more code, but it's also more explicit.

To explain how `case_when()` works, let's explore some simpler cases.

@@ -492,7 +493,7 @@ case_when(
)
```
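
One rule is worth seeing on its own: an element that matches none of the conditions gets `NA`.
In this sketch (assuming dplyr is installed), `0` fails both tests, and the `NA` in `x` makes both conditions `NA`, which `case_when()` treats like `FALSE`:

```{r}
library(dplyr)

x <- c(-3:3, NA)

# Only strictly positive and strictly negative values are covered,
# so 0 and NA get NA
case_when(
  x > 0 ~ "+ve",
  x < 0 ~ "-ve"
)
```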

If you want to create a "default" or catch-all value, use `TRUE` on the left-hand side:

```{r}
case_when(
@@ -502,7 +503,7 @@ case_when(
)
```
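
Here's a self-contained version, assuming dplyr is installed.
Since `TRUE` matches every element, it only takes effect for the elements that none of the earlier conditions caught, here `0` and `NA`:

```{r}
library(dplyr)

x <- c(-3:3, NA)

case_when(
  x > 0 ~ "+ve",
  x < 0 ~ "-ve",
  TRUE  ~ "???" # catch-all for anything not matched above
)
```

If you're using dplyr 1.1.0 or later, the `.default` argument to `case_when()` serves the same purpose.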

And note that if multiple conditions match, only the first will be used:

```{r}
case_when(
@@ -512,7 +513,7 @@ case_when(
```
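
Because only the first match is used, the order of the conditions matters: put specific conditions before general ones.
In this sketch (assuming dplyr is installed; the "big" label is purely illustrative), swapping two conditions changes how `3` is labeled:

```{r}
library(dplyr)

x <- c(-3:3, NA)

# General condition first: `x > 2` never wins, because `x > 0` already matches 3
case_when(
  x > 0 ~ "+ve",
  x > 2 ~ "big"
)

# Specific condition first: 3 is labeled "big", while 1 and 2 are still "+ve"
case_when(
  x > 2 ~ "big",
  x > 0 ~ "+ve"
)
```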

Just like with `if_else()`, you can use variables on both sides of the `~`, and you can mix and match variables as needed for your problem.
For example, we could use `case_when()` to provide some human-readable labels for the arrival delay:

```{r}
flights |>
@@ -531,12 +532,14 @@ flights |>

## Making groups

Before we move on to the next chapter, I want to show you one last trick.
I don't know exactly how to describe it, and it feels a little magical, but it's super handy, so I wanted to make sure you knew about it.
Sometimes you want to divide your dataset up into groups based on the occurrence of some event.
For example, when you're looking at website data, it's common to want to break up events into sessions, where a new session starts after a gap of more than x minutes since the last activity.

Here's some made-up data that illustrates the problem.
I've computed the time lag between the events, and figured out whether there's a gap that's big enough to qualify.

```{r}
events <- tibble(
  time = c(0, 1, 2, 3, 5, 10, 12, 15, 17, 19, 20, 27, 28, 30)
@@ -549,7 +552,8 @@ events <- events |>
events
```

How do I go from that logical vector to something that I can `group_by()`?
You can use the cumulative sum, `cumsum()`, to turn this logical vector into a unique group identifier.
Remember that whenever you use a logical vector in a numeric context, `TRUE` becomes 1 and `FALSE` becomes 0, so taking the cumulative sum of a logical vector creates a numeric index that increments every time it sees a `TRUE`.

```{r}
events |> mutate(
  group = cumsum(gap) + 1
)
```
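
To tie the pieces together, here's an end-to-end sketch that assumes dplyr is installed.
The chunk that computes the lag and the `gap` flag isn't visible in this excerpt, so the `since_last` column name and the threshold of 5 are assumptions for illustration, not the chapter's actual values.
Once each row has a `group`, you can use `group_by()` as usual:

```{r}
library(dplyr)

# The core trick: each TRUE bumps the running total by one
cumsum(c(FALSE, TRUE, FALSE, FALSE, TRUE))

events <- tibble(
  time = c(0, 1, 2, 3, 5, 10, 12, 15, 17, 19, 20, 27, 28, 30)
)

sessions <- events |>
  mutate(
    # Time since the previous event (0 for the first event)
    since_last = time - lag(time, default = first(time)),
    # Assumed threshold: a gap of 5 or more starts a new session
    gap = since_last >= 5,
    group = cumsum(gap) + 1
  )

# One row per session: when it started, when it ended, how many events
sessions |>
  group_by(group) |>
  summarise(start = min(time), end = max(time), n = n())
```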