# Missing values {#missing-values}
## Introduction
## Basics
### Missing values {#missing-values-filter}
One important feature of R that can make comparison tricky is missing values, or `NA`s ("not availables").
`NA` represents an unknown value, so missing values are "contagious": the result of almost any operation involving an unknown value is also unknown.
```{r}
NA > 5
10 == NA
NA + 10
NA / 2
```
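The same contagion carries over to summary functions. The following is a small sketch in base R; the `heights` vector is made up for illustration and does not come from the text above.

```{r}
# Hypothetical data: one height was never recorded.
heights <- c(150, NA, 170)

# A single NA makes the whole summary unknown.
sum(heights)
mean(heights)

# na.rm = TRUE drops the missing values before computing.
mean(heights, na.rm = TRUE)
```

Setting `na.rm = TRUE` is a deliberate choice: it answers "what is the mean of the values we know?", which is not always the same question as "what is the mean?".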
The most confusing result is this one:
```{r}
NA == NA
```
It's easiest to understand why the result is `NA` with a bit more context:
```{r}
# Let x be Mary's age. We don't know how old she is.
x <- NA
# Let y be John's age. We don't know how old he is.
y <- NA
# Are John and Mary the same age?
x == y
# We don't know!
```
If you want to determine if a value is missing, use `is.na()`:
```{r}
is.na(x)
```
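`is.na()` also works element by element on vectors, which is where it differs most visibly from a comparison with `==`. A minimal sketch, using a hypothetical `ages` vector:

```{r}
# Hypothetical vector of ages with one unknown entry.
ages <- c(25, NA, 31)

# Comparing with == never finds the missing value: every result is NA.
ages == NA

# is.na() checks each element and always returns TRUE or FALSE.
is.na(ages)

# Count the missing values (TRUE counts as 1).
sum(is.na(ages))
```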
## dplyr verbs
`filter()` only includes rows where the condition is `TRUE`; it excludes both `FALSE` and `NA` values.
If you want to preserve missing values, ask for them explicitly:
```{r}
library(dplyr) # provides tibble(), filter(), and arrange()
df <- tibble(x = c(1, NA, 3))
filter(df, x > 1)
filter(df, is.na(x) | x > 1)
```
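It can help to look at the condition on its own to see why the row with the missing value disappears. The sketch below assumes dplyr is installed; the names `keep`, `kept`, and `non_missing` are only illustrative.

```{r}
library(dplyr)

df <- tibble(x = c(1, NA, 3))

# The condition itself evaluates to FALSE, NA, TRUE.
keep <- df$x > 1
keep

# filter() keeps only the TRUE row, so the NA row is dropped
# along with the FALSE row.
kept <- filter(df, x > 1)
kept

# To drop only the rows with missing values, negate is.na().
non_missing <- filter(df, !is.na(x))
non_missing
```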
`arrange()` always sorts missing values to the end, even when you use `desc()`:
```{r}
df <- tibble(x = c(5, 2, NA))
arrange(df, x)
arrange(df, desc(x))
```
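If you want the missing values first instead, one possible idiom is to sort on `is.na()` before sorting on the column itself. This is a sketch of that approach, not something the text above prescribes, and `na_first` is an illustrative name.

```{r}
library(dplyr)

df <- tibble(x = c(5, 2, NA))

# is.na(x) is TRUE only for the missing row. desc() sorts TRUE before
# FALSE, so missing rows come first, then the rest in ascending order of x.
na_first <- arrange(df, desc(is.na(x)), x)
na_first
```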
## Exercises
1. Why is `NA ^ 0` not missing?
   Why is `NA | TRUE` not missing?
   Why is `FALSE & NA` not missing?
   Can you figure out the general rule?
   (`NA * 0` is a tricky counterexample!)