From 26a20c586a5eaba5dd05f4abacbfe00b74cbefd0 Mon Sep 17 00:00:00 2001
From: Mine Cetinkaya-Rundel
Date: Mon, 2 Jan 2023 20:44:46 -0500
Subject: [PATCH] Add exercise on group_by (#1203)

* Add exercise on group_by

* Don't eval the code chunks

* Edits + indentation
---
 data-transform.qmd | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/data-transform.qmd b/data-transform.qmd
index 9cc4d7d..916e27e 100644
--- a/data-transform.qmd
+++ b/data-transform.qmd
@@ -582,6 +582,88 @@ As you can see, when you summarize an ungrouped data frame, you get a single row
 5.  Explain what `count()` does in terms of the dplyr verbs you just learn.
     What does the `sort` argument to `count()` do?
 
+6.  Suppose we have the following tiny data frame:
+
+    ```{r}
+    df <- tibble(
+      x = 1:5,
+      y = c("a", "b", "a", "a", "b"),
+      z = c("K", "K", "L", "L", "K")
+    )
+    ```
+
+    a.  What does the following code do?
+        Run it, analyze the result, and describe what `group_by()` does.
+
+        ```{r}
+        #| eval: false
+
+        df |>
+          group_by(y)
+        ```
+
+    b.  What does the following code do?
+        Run it, analyze the result, and describe what `arrange()` does.
+        Also comment on how it's different from the `group_by()` in part (a)?
+
+        ```{r}
+        #| eval: false
+
+        df |>
+          arrange(y)
+        ```
+
+    c.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+
+        ```{r}
+        #| eval: false
+
+        df |>
+          group_by(y) |>
+          summarize(mean_x = mean(x))
+        ```
+
+    d.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+        Then, comment on what the message says.
+
+        ```{r}
+        #| eval: false
+
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x))
+        ```
+
+    e.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+        How is the output different from the one in part (d).
+
+        ```{r}
+        #| eval: false
+
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x), .groups = "drop")
+        ```
+
+    f.  What do the following pipelines do?
+        Run both, analyze the results, and describe what each pipeline does.
+        How are the outputs of the two pipelines different?
+
+        ```{r}
+        #| eval: false
+
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x))
+
+        df |>
+          group_by(y, z) |>
+          mutate(mean_x = mean(x))
+        ```
+
 ## Case study: aggregates and sample size {#sec-sample-size}
 
 Whenever you do any aggregation, it's always a good idea to include a count (`n()`).