diff --git a/tidy.Rmd b/tidy.Rmd
index be4c3e5..b4ae17e 100644
--- a/tidy.Rmd
+++ b/tidy.Rmd
@@ -440,7 +440,7 @@ The best place to start is almost always to gathering together the columns that
 * We don't know what all the other columns are yet, but given the structure
   in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`)
-  these are likely to be values, not variable.
+  these are likely to be values, not variables.

 So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells repesent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.