Add exercise on group_by (#1203)

* Add exercise on group_by
* Don't eval the code chunks
* Edits + indentation
2023-01-02 20:44:46 -05:00 · 2023-01-02 20:44:46 -05:00 · 26a20c586a
parent 29c8822d3b
commit 26a20c586a
1 changed files with 82 additions and 0 deletions
--- a/data-transform.qmd
+++ b/data-transform.qmd
@@ -582,6 +582,88 @@ As you can see, when you summarize an ungrouped data frame, you get a single row
 5.  Explain what `count()` does in terms of the dplyr verbs you just learned.
    What does the `sort` argument to `count()` do?
 6.  Suppose we have the following tiny data frame:
    ```{r}
    df <- tibble(
      x = 1:5,
      y = c("a", "b", "a", "a", "b"),
      z = c("K", "K", "L", "L", "K")
    )
    ```
    a.  What does the following code do?
        Run it, analyze the result, and describe what `group_by()` does.
        ```{r}
        #| eval: false
        df |>
          group_by(y)
        ```
    b.  What does the following code do?
        Run it, analyze the result, and describe what `arrange()` does.
        Also, comment on how it's different from the `group_by()` in part (a).
        ```{r}
        #| eval: false
        df |>
          arrange(y)
        ```
    c.  What does the following code do?
        Run it, analyze the result, and describe what the pipeline does.
        ```{r}
        #| eval: false
        df |>
          group_by(y) |>
          summarize(mean_x = mean(x))
        ```
    d.  What does the following code do?
        Run it, analyze the result, and describe what the pipeline does.
        Then, comment on what the message says.
        ```{r}
        #| eval: false
        df |>
          group_by(y, z) |>
          summarize(mean_x = mean(x))
        ```
    e.  What does the following code do?
        Run it, analyze the result, and describe what the pipeline does.
        How is the output different from the one in part (d)?
        ```{r}
        #| eval: false
        df |>
          group_by(y, z) |>
          summarize(mean_x = mean(x), .groups = "drop")
        ```
    f.  What do the following pipelines do?
        Run both, analyze the results, and describe what each pipeline does.
        How are the outputs of the two pipelines different?
        ```{r}
        #| eval: false
        df |>
          group_by(y, z) |>
          summarize(mean_x = mean(x))
        df |>
          group_by(y, z) |>
          mutate(mean_x = mean(x))
        ```
 ## Case study: aggregates and sample size {#sec-sample-size}
 Whenever you do any aggregation, it's always a good idea to include a count (`n()`).
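
 As a minimal sketch of that advice, the chunk below reuses the tiny `df` from the exercises above and adds `n()` next to the mean.
 The name `summary_with_n` is only illustrative, and the chunk assumes dplyr is installed.

 ```{r}
 library(dplyr)

 # Same tiny data frame as in the exercises
 df <- tibble(
   x = 1:5,
   y = c("a", "b", "a", "a", "b"),
   z = c("K", "K", "L", "L", "K")
 )

 # Report the group size alongside the aggregate, so you can see
 # how many rows each mean is based on
 summary_with_n <- df |>
   group_by(y) |>
   summarize(
     mean_x = mean(x),
     n = n()
   )

 summary_with_n
 ```

 Here the `n` column shows that the mean for group `"a"` rests on three rows and the mean for group `"b"` on two.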