mine-cetinkaya-rundel 2023-04-09 22:23:13 -04:00
commit fc128fb591
3 changed files with 10 additions and 10 deletions

View File

@@ -291,7 +291,7 @@ df |>
How does the reshaping work?
It's easier to see if we think about it column by column.
-As shown in @fig-pivot-variables, the values in column that was already a variable in the original dataset (`var`) need to be repeated, once for each column that is pivoted.
+As shown in @fig-pivot-variables, the values in column that was already a variable in the original dataset (`id`) need to be repeated, once for each column that is pivoted.
```{r}
#| label: fig-pivot-variables
@@ -360,7 +360,7 @@ There are two columns that are already variables and are easy to interpret: `cou
They are followed by 56 columns like `sp_m_014`, `ep_m_4554`, and `rel_m_3544`.
If you stare at these columns for long enough, you'll notice there's a pattern.
Each column name is made up of three pieces separated by `_`.
-The first piece, `sp`/`rel`/`ep`, describes the method used for the diagnosis, the second piece, `m`/`f` is the `gender` (coded as a binary variable in this dataset), and the third piece, `014`/`1524`/`2535`/`3544`/`4554`/`65` is the `age` range (`014` represents 0-14, for example).
+The first piece, `sp`/`rel`/`ep`, describes the method used for the diagnosis, the second piece, `m`/`f` is the `gender` (coded as a binary variable in this dataset), and the third piece, `014`/`1524`/`2534`/`3544`/`4554`/`5564`/`65` is the `age` range (`014` represents 0-14, for example).
So in this case we have six pieces of information recorded in `who2`: the country and the year (already columns); the method of diagnosis, the gender category, and the age range category (contained in the other column names); and the count of patients in that category (cell values).
To organize these six pieces of information in six separate columns, we use `pivot_longer()` with a vector of column names for `names_to` and instructions for splitting the original variable names into pieces for `names_sep` as well as a column name for `values_to`:
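
As a minimal sketch of that call, the example below uses a small made-up tibble rather than the real `who2` data. The object names `toy_who` and `toy_long`, the values, and the output column names `diagnosis`, `gender`, `age`, and `count` are all assumptions chosen for illustration; only the column naming pattern mirrors the description above.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: two identifier columns plus three columns whose
# names follow the method_gender_age pattern described above.
toy_who <- tribble(
  ~country, ~year, ~sp_m_014, ~sp_f_014, ~rel_m_65,
  "A",      2000,  1,         2,         3,
  "B",      2001,  4,         5,         6
)

toy_long <- toy_who |>
  pivot_longer(
    cols = !(country:year),                      # pivot everything except the identifiers
    names_to = c("diagnosis", "gender", "age"),  # one new column per piece of the name
    names_sep = "_",                             # split the old column names on "_"
    values_to = "count"                          # cell values go into this column
  )

toy_long
```

Each of the two input rows contributes one output row per pivoted column, so the six pieces of information end up in six separate columns.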

View File

@@ -63,7 +63,7 @@ Doing this regularly is a great way to ensure that you've captured all the impor
We recommend you always start your script with the packages you need.
That way, if you share your code with others, they can easily see which packages they need to install.
Note, however, that you should never include `install.packages()` in a script you share.
-It's inconsiderate to hand off a script that will something on their computer if they're not being careful!
+It's inconsiderate to hand off a script that will change something on their computer if they're not being careful!
When working through future chapters, we highly recommend starting in the script editor and practicing your keyboard shortcuts.
Over time, sending code to the console in this way will become so natural that you won't even think about it.
@@ -138,7 +138,7 @@ There are a variety of problems here: it's hard to find which file to run first,
[^workflow-scripts-1]: Not to mention that you're tempting fate by using "final" in the name 😆 The comic Piled Higher and Deeper has a [fun strip on this](https://phdcomics.com/comics/archive.php?comicid=1531).
-Here's better way of naming and organizing the same set of files:
+Here's a better way of naming and organizing the same set of files:
01-load-data.R
02-exploratory-analysis.R
@@ -175,7 +175,7 @@ With your R scripts (and your data files), you can recreate the environment.
With only your environment, it's much harder to recreate your R scripts: you'll either have to retype a lot of code from memory (inevitably making mistakes along the way) or you'll have to carefully mine your R history.
To help keep your R scripts as the source of truth for your analysis, we highly recommend that you instruct RStudio not to preserve your workspace between sessions.
-You can do this either by running `usethis::use_blank_slate()`[^workflow-scripts-2] or by mimicking the options shown in @fig-blank-slate. This will cause you some short-term pain, because now when you restart RStudio, it will no longer remember the code that you ran last time nor will the objects you created or datasets you read be available to use.
+You can do this either by running `usethis::use_blank_slate()`[^workflow-scripts-2] or by mimicking the options shown in @fig-blank-slate. This will cause you some short-term pain, because now when you restart RStudio, it will no longer remember the code that you ran last time nor will the objects you created or the datasets you read be available to use.
But this short-term pain saves you long-term agony because it forces you to capture all important procedures in your code.
There's nothing worse than discovering three months after the fact that you've only stored the results of an important calculation in your environment, not the calculation itself in your code.
@@ -198,7 +198,7 @@ knitr::include_graphics("diagrams/rstudio/clean-slate.png", dpi = 270)
There is a great pair of keyboard shortcuts that will work together to make sure you've captured the important parts of your code in the editor:
-1. Press Cmd/Ctrl + Shift + 0 to restart R.
+1. Press Cmd/Ctrl + Shift + 0/F10 to restart R.
2. Press Cmd/Ctrl + Shift + S to re-run the current script.
We collectively use this pattern hundreds of times a week.

View File

@@ -105,7 +105,7 @@ This makes it easier to skim the code.
 flights |>
   mutate(
-    speed = air_time / distance,
+    speed = distance / air_time,
     dep_hour = dep_time %/% 100,
     dep_minute = dep_time %% 100
   )
@@ -151,7 +151,7 @@ flights |>
```
After the first step of the pipeline, indent each line by two spaces.
-RStudio will automatically put the spaces in for you after a line break following a `\>` .
+RStudio will automatically put the spaces in for you after a line break following a `|>` .
If you're putting each argument on its own line, indent by an extra two spaces.
Make sure `)` is on its own line, and un-indented to match the horizontal position of the function name.
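
The indentation rules above can be sketched with a short pipeline. This is an illustrative example only: it uses the built-in `mtcars` dataset, and the names `by_cyl`, `mean_mpg`, and `n_cars` are assumptions that do not come from the chapter.

```r
library(dplyr)

# Each step after the first is indented by two spaces; arguments placed on
# their own lines get two more; the closing `)` sits on its own line,
# aligned with the start of the function name.
by_cyl <- mtcars |>
  group_by(cyl) |>
  summarize(
    mean_mpg = mean(mpg),
    n_cars = n()
  )

by_cyl
```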
@@ -233,7 +233,7 @@ flights |>
   group_by(dest) |>
   summarize(
     distance = mean(distance),
-    speed = mean(air_time / distance, na.rm = TRUE)
+    speed = mean(distance / air_time, na.rm = TRUE)
   ) |>
   ggplot(aes(x = distance, y = speed)) +
   geom_smooth(
@@ -292,7 +292,7 @@ knitr::include_graphics("screenshots/rstudio-nav.png")
## Summary
-In this chapter, you've learn the most important principles of code style.
+In this chapter, you've learned the most important principles of code style.
These may feel like a set of arbitrary rules to start with (because they are!) but over time, as you write more code, and share code with more people, you'll see how important a consistent style is.
And don't forget about the styler package: it's a great way to quickly improve the quality of poorly styled code.