# Missing values {#missing-values}

## Introduction

## Basics

### Missing values {#missing-values-filter}

One important feature of R that can make comparison tricky is missing values, or `NA`s ("not availables").
Because `NA` represents an unknown value, missing values are "contagious": almost any operation involving an unknown value will also be unknown.

```{r}
NA > 5
10 == NA
NA + 10
NA / 2
```

The most confusing result is this one:

```{r}
NA == NA
```

It's easiest to understand why this returns `NA` rather than `TRUE` with a bit more context:

```{r}
# Let x be Mary's age. We don't know how old she is.
x <- NA

# Let y be John's age. We don't know how old he is.
y <- NA

# Are John and Mary the same age?
x == y
# We don't know!
```

If you want to determine whether a value is missing, use `is.na()`:

```{r}
is.na(x)
```

## dplyr verbs

`filter()` only includes rows where the condition is `TRUE`; it excludes both `FALSE` and `NA` values.
If you want to preserve missing values, ask for them explicitly:

```{r}
library(dplyr)

df <- tibble(x = c(1, NA, 3))

# Drops the NA row, because NA > 1 is NA rather than TRUE
filter(df, x > 1)

# Keeps the NA row by asking for it explicitly
filter(df, is.na(x) | x > 1)
```

`arrange()` always sorts missing values to the end, even when you sort in descending order with `desc()`:

```{r}
df <- tibble(x = c(5, 2, NA))
arrange(df, x)
arrange(df, desc(x))
```

## Exercises

1.  Why is `NA ^ 0` not missing?
    Why is `NA | TRUE` not missing?
    Why is `FALSE & NA` not missing?
    Can you figure out the general rule?
    (`NA * 0` is a tricky counterexample!)