r4ds/workflow-style.Rmd

84 lines
4.3 KiB
Plaintext
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Workflow: code style {#workflow-style}
```{r, results = "asis", echo = FALSE}
status("drafting")
```
Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.
Even as a very new programmer it's a good idea to work on your code style.
Use a consistent style makes it easier for others (including future-you!) to read your work, and is particularly important if you need to get help from someone else.
Styling your code will feel a bit tedious at the start, but if you practice it, it will soon become second nature.
Additionally, there are some great tools available like the [styler](http://styler.r-lib.org) package which can get you 90% of the way there with a touch of a button.
An easy way to use style is via RStudio's "command palette", which you can access with Cmd/Ctrl + Shift + P.
If you type "styler" you'll see all the shortcuts provided by styler:
![](screenshots/rstudio-palette.png)
It's highly recommended to regularly spend some time just working on the clarity of your code.
The results might be exactly the same but it's not wasted effort: when you come back to the code in the future, you'll find it easier to remember what you did and easy to adapt to new demands.
Here I'll introduce you to the high points parts of the [tidyverse style guide](https://style.tidyverse.org).
I highly recommend you consult the full style guide if you have more questions as it goes into much more detail.
## Names
Variable names should use only lowercase letters, numbers, and `_`.
Use underscores (`_`) (so called snake case) to separate words within a name.
As a general rule of thumb, it's better to err on the side of overly long description names than concise names that are fast to type.
Short names save relatively little time when writing code (especially since autocomplete will often help you finish a long variable name), but will suck up time when you re-read code in the future and have to wrack your memory for what that now cryptic abbreviation means.
### Spaces
Put spaces on either side of mathematical operators (e.g `+`, `-`, `==`, `<` ; but not `^`) and the assignment operator (`<-`).
Don't put spaces inside or outside parentheses for regular function calls.
Always put a space after a comma, just like in regular English.
It's ok to add extra spaces if it improves alignment of [`=`](https://rdrr.io/r/base/assignOps.html).
### Pipes
`|>` should always have a space after and should usually be followed by a new line.
After the first step, each line should be indented by two spaces.
This structure makes it easier to add new steps (or rearrange existing steps) and harder to overlook a step.
If the function as named arguments (like `mutate()` or `summarise()`) then put each argument on a new line, indented by another two spaces.
Make sure the closing parentheses start a new line and are lined up with the start of the function name.
```{r, eval = FALSE}
df |> mutate(y = x + 1)
# vs
df |>
mutate(
y = x + 1
)
```
The same basic rules apply to ggplot2, just treat `+` the same way as `|>`.
```{r, eval = FALSE}
df |>
ggplot(aes())
```
It's ok to skip these rules if your snippet is fits easily on one line (e.g.) `mutate(df, y = x + 1)` or `df %>% mutate(df, y = x + 1)`.
But it's pretty common for short snippets to grow longer, so you'll save time in the long run by starting out as you wish to continue.
Be wary of writing very long pipes, say longer than 10-15 lines.
Try to break them up into logical subtasks, giving each part an informative name.
The names will help cue the reader into what's happening and gives convenient places to check that intermediate results are as expected.
Whenever you can give something an informative name, you should give it an informative name.
Don't expect to get it right the first time!
This means breaking up long pipelines if there are intermediate states that can get good names.
Strive to limit your code to 80 characters per line.
This fits comfortably on a printed page with a reasonably sized font.
If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
## Organisation
- Use empty lines to organize your code into "paragraphs" of related thoughts.
- In data analysis code, use comments to record important findings and analysis decisions.