This commit is contained in:
Hadley Wickham 2022-08-08 13:44:06 -05:00
parent 8dc95f9c6d
commit 01b86c5025
1 changed files with 38 additions and 2 deletions

@@ -335,6 +335,42 @@ round(x / 4) * 4
round(x / 0.25) * 0.25
```
### Cutting numbers into ranges
Use `cut()`[^numbers-1] to break up a numeric vector into discrete buckets:
[^numbers-1]: ggplot2 provides some helpers for common cases in `cut_interval()`, `cut_number()`, and `cut_width()`.
ggplot2 is an admittedly weird place for these functions to live, but they are useful as part of histogram computation and were written before any other parts of the tidyverse existed.
```{r}
x <- c(1, 2, 5, 10, 15, 20)
cut(x, breaks = c(0, 5, 10, 15, 20))
```
The breaks don't need to be evenly spaced:
```{r}
cut(x, breaks = c(0, 5, 10, 100))
```
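You can optionally supply your own `labels`.
Note that there should be one fewer `labels` than `breaks`, because each label names the interval between two adjacent breaks.
`right` controls whether the intervals are closed on the right (the default) or on the left, and `include.lowest` controls whether a value equal to the lowest break (or the highest break, when `right = FALSE`) is included.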
```{r}
cut(x,
  breaks = c(0, 5, 10, 15, 20),
  labels = c("sm", "md", "lg", "xl")
)
```
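As a minimal sketch of how `right` and `include.lowest` change the intervals, the following reuses the same values and breaks as above; the helper name `breaks` is only introduced here for illustration:

```{r}
x <- c(1, 2, 5, 10, 15, 20)
breaks <- c(0, 5, 10, 15, 20)

# Default (right = TRUE): intervals are closed on the right, e.g. (0,5]
cut(x, breaks = breaks)

# right = FALSE: intervals are closed on the left, e.g. [0,5),
# so 20 no longer falls inside any interval and becomes NA
cut(x, breaks = breaks, right = FALSE)

# include.lowest = TRUE additionally closes the last interval, [15,20],
# so 20 is kept
cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
```

Comparing the three results shows that a value sitting exactly on a break moves between neighbouring buckets depending on these two arguments.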
Any values outside of the range of the breaks will become `NA`:
```{r}
y <- c(NA, -10, 5, 10, 30)
cut(y, breaks = c(0, 5, 10, 15, 20))
```
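If you would rather keep out-of-range values than turn them into `NA`, one possible approach (a sketch, not something the text above prescribes) is to extend the breaks with `-Inf` and `Inf` so that every non-missing value lands in some interval:

```{r}
y <- c(NA, -10, 5, 10, 30)

# -Inf and Inf add catch-all buckets at both ends;
# only the genuinely missing value stays NA
cut(y, breaks = c(-Inf, 0, 5, 10, 15, 20, Inf))
```

Here `-10` falls in the first bucket and `30` in the last, while the original `NA` is preserved.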
### Cumulative and rolling aggregates
Base R provides `cumsum()`, `cumprod()`, `cummin()`, `cummax()` for running, or cumulative, sums, products, mins and maxes.
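As a small illustration of these four functions, the sketch below applies each to a short vector chosen only for this example:

```{r}
x <- c(3, 1, 4, 1, 5)

cumsum(x)   # running sum
cumprod(x)  # running product
cummin(x)   # smallest value seen so far
cummax(x)   # largest value seen so far
```

Each function returns a vector of the same length as its input, where element `i` summarises `x[1:i]`.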
@@ -560,9 +596,9 @@ flights |>
You might also wonder about the **mode**, or the most common value.
This is a summary that only works well for very simple cases (which is why you might have learned about it in high school), but it doesn't work well for many real datasets.
If the data is discrete, there may be multiple most common values, and if the data is continuous, there might be no most common value because every value is ever so slightly different.
For these reasons, the mode tends not to be used by statisticians and there's no mode function included in base R[^numbers-1].
For these reasons, the mode tends not to be used by statisticians and there's no mode function included in base R[^numbers-2].
[^numbers-1]: The `mode()` function does something quite different!
[^numbers-2]: The `mode()` function does something quite different!
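To make both points concrete, here is a hedged sketch: `mode()` reports how a vector is stored rather than its most common value, and a discrete vector can have several most common values. The `table()` approach and the vector `x` are illustrative assumptions, not part of the original text:

```{r}
x <- c(1, 2, 2, 3, 3)

# mode() describes the storage mode of the object, not a statistic
mode(x)

# One way to find the most common value(s): count, then keep the ties.
# Note that the result is a character vector of the tied values.
counts <- table(x)
names(counts)[counts == max(counts)]
```

Because `2` and `3` both appear twice, there is no single most common value here.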
### Minimum, maximum, and quantiles {#sec-min-max-summary}