From 1bf0b5d105485d2afd18298b62d914e55ad5f182 Mon Sep 17 00:00:00 2001
From: mine-cetinkaya-rundel
Date: Mon, 12 Dec 2022 13:37:08 -0500
Subject: [PATCH] Catch a few more UK spellings, closes #1160

---
 arrow.qmd     | 8 ++++----
 functions.qmd | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arrow.qmd b/arrow.qmd
index e203384..ceb4c80 100644
--- a/arrow.qmd
+++ b/arrow.qmd
@@ -208,7 +208,7 @@ For example, we could count the total number of books checked out in each month
 query <- seattle_pq |>
   filter(CheckoutYear >= 2018, MaterialType == "BOOK") |>
   group_by(CheckoutYear, CheckoutMonth) |>
-  summarise(TotalCheckouts = sum(Checkouts)) |>
+  summarize(TotalCheckouts = sum(Checkouts)) |>
   arrange(CheckoutYear, CheckoutMonth)
 ```
 
@@ -239,7 +239,7 @@ First, let's time how long it takes to calculate the number of books checked out
 seattle_csv |>
   filter(CheckoutYear == 2021, MaterialType == "BOOK") |>
   group_by(CheckoutMonth) |>
-  summarise(TotalCheckouts = sum(Checkouts)) |>
+  summarize(TotalCheckouts = sum(Checkouts)) |>
   arrange(desc(CheckoutMonth)) |>
   collect() |>
   system.time()
@@ -253,7 +253,7 @@ Now let's use our new version of the data set in which the Seattle library check
 seattle_pq |>
   filter(CheckoutYear == 2021, MaterialType == "BOOK") |>
   group_by(CheckoutMonth) |>
-  summarise(TotalCheckouts = sum(Checkouts)) |>
+  summarize(TotalCheckouts = sum(Checkouts)) |>
   arrange(desc(CheckoutMonth)) |>
   collect() |>
   system.time()
@@ -275,7 +275,7 @@ seattle_pq |>
   to_duckdb() |>
   filter(CheckoutYear >= 2018, MaterialType == "BOOK") |>
   group_by(CheckoutYear) |>
-  summarise(TotalCheckouts = sum(Checkouts)) |>
+  summarize(TotalCheckouts = sum(Checkouts)) |>
   arrange(desc(CheckoutYear)) |>
   collect()
 ```
diff --git a/functions.qmd b/functions.qmd
index e72089b..447e94b 100644
--- a/functions.qmd
+++ b/functions.qmd
@@ -424,7 +424,7 @@ df |> grouped_mean(group, x)
 df |> grouped_mean(group, y)
 ```
 
-Regardless of how we call `grouped_mean()` it always does `df |> group_by(group_var) |> summarise(mean(mean_var))`, instead of `df |> group_by(group) |> summarise(mean(x))` or `df |> group_by(group) |> summarise(mean(y))`.
+Regardless of how we call `grouped_mean()` it always does `df |> group_by(group_var) |> summarize(mean(mean_var))`, instead of `df |> group_by(group) |> summarize(mean(x))` or `df |> group_by(group) |> summarize(mean(y))`.
 This is a problem of indirection, and it arises because dplyr uses **tidy evaluation** to allow you to refer to the names of variables inside your data frame without any special treatment.
 Tidy evaluation is great 95% of the time because it makes your data analyses very concise as you never have to say which data frame a variable comes from; it's obvious from the context.