In this chapter, you'll learn useful tools for working with logical and numeric vectors.
You'll learn them together because they have an important connection: when you use a logical vector in a numeric context, `TRUE` becomes 1 and `FALSE` becomes 0, and when you use a numeric vector in a logical context, 0 becomes `FALSE` and everything else becomes `TRUE`.
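
To make that connection concrete, here is a small sketch using base R only. The vectors `flags` and `empty` are made up for illustration.

```r
# Logical -> numeric: TRUE counts as 1 and FALSE as 0
flags <- c(TRUE, FALSE, TRUE, TRUE)
n_true <- sum(flags)       # number of TRUEs
prop_true <- mean(flags)   # proportion of TRUEs

# Numeric -> logical: 0 becomes FALSE, everything else becomes TRUE
as_lgl <- as.logical(c(0, 1, -2.5))

# A common idiom that relies on this: a length of 0 is treated as FALSE
empty <- numeric()
status <- if (length(empty)) "has values" else "empty"

n_true
prop_true
as_lgl
status
```

Here `n_true` is 3, `prop_true` is 0.75, `as_lgl` is `FALSE TRUE TRUE`, and `status` is `"empty"`.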
Sometimes you can simplify complicated subsetting by remembering De Morgan's law: `!(x & y)` is the same as `!x | !y`, and `!(x | y)` is the same as `!x & !y`.
For example, if you wanted to find flights that weren't delayed (on arrival or departure) by more than two hours, you could use either of the following two filters:
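
As a sketch of what those two filters might look like, the code below uses a tiny made-up data frame, `delays`, in place of the real flights data. The column names `dep_delay` and `arr_delay` are assumed to hold delays in minutes.

```r
library(dplyr)

# Made-up stand-in for the flights data (delays in minutes)
delays <- tibble(
  flight    = c("A", "B", "C", "D"),
  dep_delay = c(5, 130, 10, 200),
  arr_delay = c(0, 90, 150, 240)
)

# Filter 1: negate "delayed by more than two hours on arrival or departure"
not_late_1 <- filter(delays, !(arr_delay > 120 | dep_delay > 120))

# Filter 2: De Morgan's law applied, so !(a | b) becomes !a & !b;
# comma-separated conditions in filter() are combined with &
not_late_2 <- filter(delays, arr_delay <= 120, dep_delay <= 120)

not_late_1
not_late_2
```

Both filters keep only flight `"A"`, the one row where neither delay exceeds 120 minutes.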
As well as `&` and `|`, R also has `&&` and `||`.
These are called short-circuiting operators, and you'll learn when you should use them in Section \@ref(conditional-execution) on conditional execution.
Whenever you start using complicated, multi-part expressions in `filter()`, consider making them explicit variables instead.
That makes it much easier to check your work.
When checking your work, a particularly useful `mutate()` argument is `.keep = "used"`: this will show you just the variables you've used, along with the variables that you created.
This makes it easy to see the variables involved side-by-side.
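
A minimal sketch of this workflow, assuming a made-up `delays` data frame and hypothetical intermediate variables `late_dep`, `late_arr`, and `on_time`:

```r
library(dplyr)

# Made-up data; `carrier` is included only to show that unused columns are dropped
delays <- tibble(
  flight    = c("A", "B", "C"),
  carrier   = c("UA", "AA", "DL"),
  dep_delay = c(5, 130, 10),
  arr_delay = c(0, 90, 150)
)

# Name each part of the condition, and keep only the columns involved
checked <- mutate(
  delays,
  late_dep = dep_delay > 120,
  late_arr = arr_delay > 120,
  on_time  = !(late_dep | late_arr),
  .keep = "used"
)

checked

# Once the intermediate columns look right, filter on the final one
kept <- filter(checked, on_time)
kept
```

Because `flight` and `carrier` were not used in any expression, they are left out of `checked`, so only the inputs and the new logical columns appear next to each other.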
If you want to use one value when a condition is `TRUE` and another value when it's `FALSE`, you can use `if_else()`[^logicals-numbers-1].
[^logicals-numbers-1]: This is equivalent to the base R function `ifelse()`.
There are two main advantages of `if_else()` over `ifelse()`: you can choose what should happen to missing values, and `if_else()` is much more likely to give you a meaningful error message if you use the wrong type of variable.
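
The following sketch illustrates `if_else()` and its `missing` argument on a made-up vector `x`. The labels are arbitrary.

```r
library(dplyr)

x <- c(-3, 0, 4, NA)

# By default, a missing condition gives a missing result
default_na <- if_else(x > 0, "positive", "not positive")

# The `missing` argument lets you choose what NA should become
labelled <- if_else(x > 0, "positive", "not positive", missing = "unknown")

# if_else() is strict about types: mixing a string and a number is an error,
# whereas base ifelse() would silently coerce the number to a string
type_error <- tryCatch(
  {
    if_else(x > 0, "positive", 0)
    FALSE
  },
  error = function(e) TRUE
)

default_na
labelled
type_error
```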
(Note that I usually add spaces to make the outputs line up so they're easier to scan.)
If none of the cases match, the output will be missing:
```{r}
x <- 1:10
case_when(
  x %% 2 == 0 ~ "even"
)
```
You can create a catch-all value by using `TRUE` as the condition:
```{r}
case_when(
  x %% 2 == 0 ~ "even",
  TRUE ~ "odd"
)
```
If multiple conditions are `TRUE`, the first is used:
```{r}
case_when(
  x < 5 ~ "< 5",
  x < 3 ~ "< 3"
)
```
### Summaries
There are four particularly useful summary functions for logical vectors: they all take a vector of logical values and return a single value, making them a good fit for use in `summarise()`.
`any()` and `all()`: `any()` will return `TRUE` if there's at least one `TRUE`, and `all()` will return `TRUE` if all values are `TRUE`.
Like most summary functions, they'll return `NA` if missing values are present and the answer depends on them, and as usual you can make the missing values go away with `na.rm = TRUE`.
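
Here is a short base R sketch of how `any()` and `all()` treat missing values. The input vectors are made up. Note that a missing value only produces `NA` when it could change the answer.

```r
# The NA cannot change these results, so they are not missing
any_known <- any(c(TRUE, FALSE, NA))   # one TRUE is enough
all_known <- all(c(TRUE, FALSE, NA))   # one FALSE settles it

# Here the NA could go either way, so the result is NA
any_unknown <- any(c(FALSE, NA))
all_unknown <- all(c(TRUE, NA))

# na.rm = TRUE drops the missing values before summarising
any_rm <- any(c(FALSE, NA), na.rm = TRUE)
all_rm <- all(c(TRUE, NA), na.rm = TRUE)

c(any_known, all_known, any_unknown, all_unknown, any_rm, all_rm)
```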
`sum()` and `mean()` are particularly useful with logical vectors because `TRUE` is converted to 1 and `FALSE` to 0.
This means that `sum(x)` gives the number of `TRUE`s in `x` and `mean(x)` gives the proportion of `TRUE`s:
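
For instance, with a made-up vector of arrival delays in minutes, counting and averaging a logical condition might look like this:

```r
arr_delay <- c(-5, 20, 95, NA, 0, 61)

# TRUE where the flight arrived no more than an hour late
within_hour <- arr_delay <= 60

# How many flights were within an hour, ignoring the missing value
n_within <- sum(within_hour, na.rm = TRUE)

# What proportion of the non-missing flights were within an hour
prop_within <- mean(within_hour, na.rm = TRUE)

# Without na.rm = TRUE, the missing value propagates
n_with_na <- sum(within_hour)

n_within
prop_within
n_with_na
```

Three of the five non-missing flights meet the condition, so `n_within` is 3 and `prop_within` is 0.6.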
2. What does `prod()` return when applied to a logical vector? What logical summary function is it equivalent to? What does `min()` return applied to a logical vector? What logical summary function is it equivalent to?
There are many functions for creating new variables that you can use with `mutate()`.
The key property is that the function must be vectorised: it must take a vector of values as input and return a vector with the same number of values as output.
There's no way to list every possible function that you might use, but here's a selection of functions that are frequently useful:
- Arithmetic operators: `+`, `-`, `*`, `/`, `^`.
These are all vectorised, using the so-called "recycling rules".
If one argument is shorter than the other, it will be automatically extended to be the same length.
This is most useful when one of the arguments is a single number: `air_time / 60`, `hours * 60 + minute`, etc.
- Trigonometry: R provides all the trigonometry functions that you might expect.
I'm not going to enumerate them here since it's rare that you need them for data science, but you can sleep soundly at night knowing that they're available if you need them.
- Logs: `log()`, `log2()`, `log10()`.
Logarithms are an incredibly useful transformation for dealing with data that ranges across multiple orders of magnitude.
They also convert multiplicative relationships to additive.
All else being equal, I recommend using `log2()` because it's easy to interpret: a difference of 1 on the log scale corresponds to doubling on the original scale and a difference of -1 corresponds to halving.
- Arithmetic operators are also useful in conjunction with the aggregate functions you'll learn about later. For example, `x / sum(x)` calculates the proportion of a total, and `y - mean(y)` computes the difference from the mean.
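
The following base R sketch touches on a few of the items above: recycling a single number, reading differences on the `log2()` scale, and combining arithmetic with aggregates. All values are made up for illustration.

```r
# Recycling: the single number 60 is extended to match air_time
air_time <- c(30, 90, 150)
hours <- air_time / 60

# log2(): each step of 1 is a doubling on the original scale
log_steps <- log2(c(1, 2, 4, 8))
doubling <- log2(8) - log2(4)   # +1 means "twice as big"
halving <- log2(2) - log2(4)    # -1 means "half as big"

# Logs turn multiplication into addition
additive <- log2(4 * 8) == log2(4) + log2(8)

# Arithmetic combined with aggregates
x <- c(2, 3, 5)
prop_of_total <- x / sum(x)
diff_from_mean <- x - mean(x)

hours
log_steps
prop_of_total
diff_from_mean
```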
If you're doing a complex sequence of logical operations it's often a good idea to store the interim values in new variables so you can check that each step is working as expected.
Computers use finite precision arithmetic (they obviously can't store an infinite number of digits!) so remember that every number you see is an approximation.
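
A quick base R sketch of what this approximation means in practice. The tolerance `1e-8` is an arbitrary choice for illustration.

```r
# Mathematically these are equal, but the stored values differ slightly
exact_sqrt <- sqrt(2)^2 == 2
exact_sum <- 0.1 + 0.2 == 0.3

# Printing more digits reveals the tiny discrepancy
print(sqrt(2)^2, digits = 17)

# Compare with a tolerance instead of using ==
close_sqrt <- isTRUE(all.equal(sqrt(2)^2, 2))
close_sum <- abs((0.1 + 0.2) - 0.3) < 1e-8

c(exact_sqrt, exact_sum, close_sqrt, close_sum)
```

Both exact comparisons are `FALSE`, while both tolerance-based comparisons are `TRUE`.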