From 26a20c586a5eaba5dd05f4abacbfe00b74cbefd0 Mon Sep 17 00:00:00 2001
From: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
Date: Mon, 2 Jan 2023 20:44:46 -0500
Subject: [PATCH] Add exercise on group_by (#1203)

* Add exercise on group_by

* Don't eval the code chunks

* Edits + indentation
---
 data-transform.qmd | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/data-transform.qmd b/data-transform.qmd
index 9cc4d7d..916e27e 100644
--- a/data-transform.qmd
+++ b/data-transform.qmd
@@ -582,6 +582,88 @@ As you can see, when you summarize an ungrouped data frame, you get a single row
 5.  Explain what `count()` does in terms of the dplyr verbs you just learn.
     What does the `sort` argument to `count()` do?
 
+6.  Suppose we have the following tiny data frame:
+
+    ```{r}
+    df <- tibble(
+      x = 1:5,
+      y = c("a", "b", "a", "a", "b"),
+      z = c("K", "K", "L", "L", "K")
+    )
+    ```
+
+    a.  What does the following code do?
+        Run it, analyze the result, and describe what `group_by()` does.
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y)
+        ```
+
+    b.  What does the following code do?
+        Run it, analyze the result, and describe what `arrange()` does.
+        Also, comment on how it's different from the `group_by()` in part (a).
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          arrange(y)
+        ```
+
+    c.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y) |>
+          summarize(mean_x = mean(x))
+        ```
+
+    d.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+        Then, comment on what the message says.
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x))
+        ```
+
+    e.  What does the following code do?
+        Run it, analyze the result, and describe what the pipeline does.
+        How is the output different from the one in part (d)?
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x), .groups = "drop")
+        ```
+
+    f.  What do the following pipelines do?
+        Run both, analyze the results, and describe what each pipeline does.
+        How are the outputs of the two pipelines different?
+
+        ```{r}
+        #| eval: false
+            
+        df |>
+          group_by(y, z) |>
+          summarize(mean_x = mean(x))
+            
+        df |>
+          group_by(y, z) |>
+          mutate(mean_x = mean(x))
+        ```
+
 ## Case study: aggregates and sample size {#sec-sample-size}
 
 Whenever you do any aggregation, it's always a good idea to include a count (`n()`).