Make table3 match tidyr version, adjust ex accordingly

mine-cetinkaya-rundel 2023-04-12 20:54:06 -04:00
parent d2b27da862
commit b6277d08fc
1 changed files with 2 additions and 12 deletions

@@ -44,16 +44,6 @@ You can represent the same underlying data in multiple ways.
The example below shows the same data organized in three different ways.
Each dataset shows the same values of four variables: *country*, *year*, *population*, and number of documented *cases* of TB (tuberculosis), but each dataset organizes the values in a different way.
-```{r}
-#| echo: false
-table2 <- table1 |>
-  pivot_longer(cases:population, names_to = "type", values_to = "count")
-table3 <- table2 |>
-  pivot_wider(names_from = year, values_from = count)
-```
```{r}
table1
@@ -136,7 +126,7 @@ ggplot(table1, aes(x = year, y = cases)) +
1. For each of the sample tables, describe what each observation and each column represents.
-2. Sketch out the process you'd use to calculate the `rate` for `table2` and `table3`.
+2. Sketch out the process you'd use to calculate the `rate` from `table2`.
You will need to perform four operations:
a. Extract the number of TB cases per country per year.
@@ -360,7 +350,7 @@ There are two columns that are already variables and are easy to interpret: `cou
They are followed by 56 columns like `sp_m_014`, `ep_m_4554`, and `rel_m_3544`.
If you stare at these columns for long enough, you'll notice there's a pattern.
Each column name is made up of three pieces separated by `_`.
-The first piece, `sp`/`rel`/`ep`, describes the method used for the diagnosis, the second piece, `m`/`f` is the `gender` (coded as a binary variable in this dataset), and the third piece, `014`/`1524`/`2534`/`3544`/`4554`/`5564/``65` is the `age` range (`014` represents 0-14, for example).
+The first piece, `sp`/`rel`/`ep`, describes the method used for the diagnosis, the second piece, `m`/`f` is the `gender` (coded as a binary variable in this dataset), and the third piece, `014`/`1524`/`2534`/`3544`/`4554`/``` 5564/``65 ``` is the `age` range (`014` represents 0-14, for example).
So in this case we have six pieces of information recorded in `who2`: the country and the year (already columns); the method of diagnosis, the gender category, and the age range category (contained in the other column names); and the count of patients in that category (cell values).
To organize these six pieces of information in six separate columns, we use `pivot_longer()` with a vector of column names for `names_to` and instructions for splitting the original variable names into pieces for `names_sep` as well as a column name for `values_to`:
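
A minimal sketch of the call described above, not necessarily the exact code in the file. It assumes tidyr 1.3.0 or later (which ships the `who2` dataset) and R 4.1 or later for the native pipe. The new column names `diagnosis`, `gender`, `age`, and `count`, and the object name `who_long`, are illustrative choices and are not fixed by the data.

```r
library(tidyr)

# Illustrative names: "diagnosis", "gender", "age", "count", and who_long
# are assumptions made for this sketch.
who_long <- who2 |>
  pivot_longer(
    cols = !(country:year),                      # every column except the two existing variables
    names_to = c("diagnosis", "gender", "age"),  # one new column per piece of the old column name
    names_sep = "_",                             # split the old column names at each underscore
    values_to = "count"                          # cell values become the patient count
  )

names(who_long)
```

Because `names_to` has three entries, each original column name must split into exactly three pieces at `names_sep`, which is why this approach suits `who2`'s regular `method_gender_age` naming.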