Minimise iteration

Hadley Wickham 2023-02-07 15:48:02 -06:00
parent 4322a35d7c
commit cd6c68b5a9
2 changed files with 4 additions and 24 deletions

@@ -132,7 +132,6 @@ dbWriteTable(con, "diamonds", ggplot2::diamonds)
 If you're using duckdb in a real project, we highly recommend learning about `duckdb_read_csv()` and `duckdb_register_arrow()`.
 These give you powerful and performant ways to quickly load data directly into duckdb, without having to first load it into R.
 We'll also show off a useful technique for loading multiple files into a database in @sec-save-database.
 ## DBI basics

@@ -108,21 +108,6 @@ Note grouping columns (`grp` here) are not included in `across()`, because they'
 - `where(is.POSIXct)` selects all date-time columns.
 - `where(is.logical)` selects all logical columns.
-```{r}
-df_types <- tibble(
-  x1 = 1:3,
-  x2 = runif(3),
-  y1 = sample(letters, 3),
-  y2 = c("banana", "apple", "egg")
-)
-df_types |>
-  summarize(across(where(is.numeric), mean))
-df_types |>
-  summarize(across(where(is.character), str_flatten))
-```
 Just like other selectors, you can combine these with Boolean algebra.
 For example, `!where(is.numeric)` selects all non-numeric columns, and `starts_with("a") & where(is.logical)` selects all logical columns whose name starts with "a".
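
The Boolean combinations described in the context lines above can be tried on a small table. This is a minimal sketch, not code from the chapter: the tibble `df_demo` and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical demo data: two logical columns, one numeric, one character
df_demo <- tibble(
  a_flag = c(TRUE, FALSE, TRUE),
  a_num  = c(1, 2, 3),
  b_flag = c(FALSE, FALSE, TRUE),
  name   = c("x", "y", "z")
)

# `!where(is.numeric)` keeps every column that is not numeric
non_numeric <- df_demo |> select(!where(is.numeric))
names(non_numeric)

# `starts_with("a") & where(is.logical)` keeps only logical columns
# whose name starts with "a"
a_logical <- df_demo |> select(starts_with("a") & where(is.logical))
names(a_logical)
```

Here `a_num` is dropped by the second selection because it starts with "a" but is not logical, and `b_flag` is dropped because it is logical but does not start with "a".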
@@ -288,12 +273,10 @@ It's clear that `across()` can help to create multiple logical columns, but then
 So dplyr provides two variants of `across()` called `if_any()` and `if_all()`:
 ```{r}
-df_miss |> filter(is.na(a) | is.na(b) | is.na(c) | is.na(d))
-# same as:
+# same as df_miss |> filter(is.na(a) | is.na(b) | is.na(c) | is.na(d))
 df_miss |> filter(if_any(a:d, is.na))
-df_miss |> filter(is.na(a) & is.na(b) & is.na(c) & is.na(d))
-# same as:
+# same as df_miss |> filter(is.na(a) & is.na(b) & is.na(c) & is.na(d))
 df_miss |> filter(if_all(a:d, is.na))
 ```
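
To check the equivalence claimed in this hunk, here is a small sketch. It assumes a made-up tibble, `df_miss_demo`, because the chapter's own `df_miss` is not defined within this diff.

```r
library(dplyr)

# Hypothetical data: row 1 is entirely missing, rows 2 and 3 are
# partially missing, and row 4 is complete
df_miss_demo <- tibble(
  a = c(NA, 1, NA, 4),
  b = c(NA, 2, 3, 4),
  c = c(NA, 3, 4, 4),
  d = c(NA, NA, 5, 4)
)

# Rows with at least one missing value, written two ways
any_na      <- df_miss_demo |> filter(if_any(a:d, is.na))
any_na_long <- df_miss_demo |> filter(is.na(a) | is.na(b) | is.na(c) | is.na(d))

# Rows where every value is missing, written two ways
all_na      <- df_miss_demo |> filter(if_all(a:d, is.na))
all_na_long <- df_miss_demo |> filter(is.na(a) & is.na(b) & is.na(c) & is.na(d))

nrow(any_na)  # 3
nrow(all_na)  # 1
```

`if_any()` combines the per-column results with `|` and `if_all()` combines them with `&`, which is why each pair returns the same rows.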
@@ -332,11 +315,11 @@ summarize_means <- function(df, summary_vars = where(is.numeric)) {
     )
 }
 diamonds |>
-  group_by(clarity) |>
+  group_by(cut) |>
   summarize_means()
 diamonds |>
-  group_by(clarity) |>
+  group_by(cut) |>
   summarize_means(c(carat, x:z))
 ```
@@ -650,7 +633,6 @@ In more complicated cases, there might be other variables stored in the director
 In that case, use `set_names()` (without any arguments) to record the full path, and then use `tidyr::separate_wider_delim()` and friends to turn them into useful columns.
 ```{r}
-# NOTE: this chapter also depends on dev tidyr (in addition to dev purrr and dev dplyr)
 paths |>
   set_names() |>
   map(readxl::read_excel) |>
@@ -763,7 +745,6 @@ df_types <- function(df) {
 }
 df_types(starwars)
-df_types(nycflights13::flights)
 ```
 You can then apply this function to all of the files, and maybe do some pivoting to make it easier to see where the differences are.
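
The last context line, applying a type-summary function to every file and then pivoting, could look roughly like the sketch below. Everything in it is an assumption: the body of `df_types()` is not visible in this diff, so a stand-in helper named `col_types()` is used, and two in-memory tibbles stand in for the files.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for the chapter's df_types(): one row per column,
# recording the first class of each column
col_types <- function(df) {
  tibble(
    col_name = names(df),
    col_type = unname(map_chr(df, \(x) class(x)[[1]]))
  )
}

# Made-up "files": the second one stores `year` as text by mistake
files <- list(
  "a.xlsx" = tibble(year = c(1952, 1957), pop = c(1L, 2L)),
  "b.xlsx" = tibble(year = c("1952", "1957"), pop = c(1L, 2L))
)

# One row per file and one column per variable, so type mismatches line up
type_table <- files |>
  map(col_types) |>
  list_rbind(names_to = "file") |>
  pivot_wider(names_from = col_name, values_from = col_type)

type_table
```

In the wide layout, reading down the `year` column shows "numeric" for one file and "character" for the other, which is the kind of difference the pivot is meant to surface.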