Polishing style workflow

2022-02-21 15:39:02 -06:00 · 2022-02-21 15:39:02 -06:00 · 155aaf0593
parent 555edc81d5
commit 155aaf0593
3 changed files with 111 additions and 54 deletions
--- a/functions.Rmd
+++ b/functions.Rmd
@@ -271,21 +271,6 @@ However, your code can never capture the reasoning behind your decisions: why di
 What else did you try that didn't work?
 It's a great idea to capture that sort of thinking in a comment.

-Another important use of comments is to break up your file into easily readable chunks.
-Use long lines of `-` and `=` to make it easy to spot the breaks.
-
-```{r, eval = FALSE}
-# Load data --------------------------------------
-
-# Plot data --------------------------------------
-```
-
-RStudio provides a keyboard shortcut to create these headers (Cmd/Ctrl + Shift + R), and will display them in the code navigation drop-down at the bottom-left of the editor:
-
-```{r, echo = FALSE, out.width = NULL}
-knitr::include_graphics("screenshots/rstudio-nav.png")
-```
-
 ### Exercises

 1.  Read the source code for each of the following three functions, puzzle out what they do, and then brainstorm better names.
@@ -381,7 +366,7 @@ x == 2
 x - 2
 ```

-Instead use `dplyr::near()` for comparisons, as described in [comparisons].
+Instead use `dplyr::near()` for comparisons, as described in \[comparisons\].

 And remember, `x == NA` doesn't do anything useful!

--- a/workflow-scripts.Rmd
+++ b/workflow-scripts.Rmd
@@ -17,8 +17,6 @@ Nevertheless, it's a good idea to save your scripts regularly and to back them u

 TODO: Add file naming advice

-TODO: Add advice about creating sections
-
 ## Running code

 The script editor is also a great place to build up complex ggplot2 plots or long sequences of dplyr manipulations.
--- a/workflow-style.Rmd
+++ b/workflow-style.Rmd
@@ -7,48 +7,104 @@ status("drafting")
 Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.
 Even as a very new programmer it's a good idea to work on your code style.
 Using a consistent style makes it easier for others (including future-you!) to read your work, and is particularly important if you need to get help from someone else.
+This chapter will introduce you to the most important points of the [tidyverse style guide](https://style.tidyverse.org), which is used throughout this book.

-Styling your code will feel a bit tedious at the start, but if you practice it, it will soon become second nature.
-Additionally, there are some great tools available like the [styler](http://styler.r-lib.org) package which can get you 90% of the way there with a touch of a button.
-An easy way to use style is via RStudio's "command palette", which you can access with Cmd/Ctrl + Shift + P.
-If you type "styler" you'll see all the shortcuts provided by styler:
+Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature.
+Additionally, there are some great tools to quickly restyle existing code, like the [styler](http://styler.r-lib.org) package by Lorenz Walthert.
+Once you've installed it with `install.packages("styler")`, an easy way to use it is via RStudio's **command palette**.
+The command palette lets you use any built-in RStudio command, as well as many addins provided by packages.
+Open the palette by pressing Cmd/Ctrl + Shift + P, then type "styler" to see all the shortcuts provided by styler.
+Figure \@ref(fig:styler) shows the results.

-![](screenshots/rstudio-palette.png)
-
-It's highly recommended to regularly spend some time just working on the clarity of your code.
-The results might be exactly the same but it's not wasted effort: when you come back to the code in the future, you'll find it easier to remember what you did and easy to adapt to new demands.
-
-Here I'll introduce you to the high points parts of the [tidyverse style guide](https://style.tidyverse.org).
-I highly recommend you consult the full style guide if you have more questions as it goes into much more detail.
+```{r styler}
+#| echo: false
+#| out.width: NULL
+#| fig.cap: > 
+#|   RStudio's command palette makes it easy to access every RStudio command
+#|   using only the keyboard.
+#| fig.alt: >
+#|   A screenshot showing the command palette after typing "styler", showing
+#|   the four styling tools provided by the package.
+knitr::include_graphics("screenshots/rstudio-palette.png")
+```
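
+If you'd rather work from the console, styler's functions can also be called directly.
+As a minimal sketch (assuming styler is installed; the badly spaced input string is made up for illustration), `styler::style_text()` restyles code supplied as a string:
+
+```{r, eval = FALSE}
+# Restyle a hypothetical, badly spaced snippet passed in as text
+styler::style_text("short_flights<-filter(flights,air_time<60)")
+```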

 ## Names

-Variable names should use only lowercase letters, numbers, and `_`.
-Use underscores (`_`) (so called snake case) to separate words within a name.
+Variable names (those created by `<-` and those created by `mutate()`) should use only lowercase letters, numbers, and `_`.
+Use underscores (`_`) to separate words within a name.

-As a general rule of thumb, it's better to err on the side of overly long description names than concise names that are fast to type.
-Short names save relatively little time when writing code (especially since autocomplete will often help you finish a long variable name), but will suck up time when you re-read code in the future and have to wrack your memory for what that now cryptic abbreviation means.
+```{r, eval = FALSE}
+# Strive for:
+short_flights <- flights |> filter(air_time < 60)

-### Spaces
+# Avoid:
+SHORTFLIGHTS <- flights |> filter(air_time < 60)
+```
+
+As a general rule of thumb, prefer long, descriptive names that are easy to understand over concise names that are fast to type.
+Short names save relatively little time when writing code (especially since autocomplete will help you finish typing them), but can be expensive when you come back to old code and need to puzzle out what a cryptic abbreviation means.
+
+## Spaces

 Put spaces on either side of mathematical operators (e.g., `+`, `-`, `==`, `<`; but not `^`) and the assignment operator (`<-`).
 Don't put spaces inside or outside parentheses for regular function calls.
 Always put a space after a comma, just like in regular English.

-It's ok to add extra spaces if it improves alignment of [`=`](https://rdrr.io/r/base/assignOps.html).
+```{r, eval = FALSE}
+# Strive for
+(a + b)^2 / d
+mean(x, na.rm = TRUE)

-### Pipes
+# Avoid
+( a + b ) ^ 2/d
+mean (x ,na.rm=TRUE)
+```

-`|>` should always have a space after and should usually be followed by a new line.
+It's OK to add extra spaces if it improves alignment of `=`:
+
+```{r, eval = FALSE}
+flights |> 
+  mutate(
+    speed      = distance / air_time,
+    dep_hour   = dep_time %/% 100,
+    dep_minute = dep_time %% 100
+  )
+```
+
+## Pipes
+
+`|>` should always have a space after it and should usually be followed by a new line.
 After the first step, each line should be indented by two spaces.
-This structure makes it easier to add new steps (or rearrange existing steps) and harder to overlook a step.
-
-If the function as named arguments (like `mutate()` or `summarise()`) then put each argument on a new line, indented by another two spaces.
+If the function has named arguments (like `mutate()` or `summarise()`) then put each argument on a new line, indented by another two spaces.
 Make sure the closing parentheses start a new line and are lined up with the start of the function name.

 ```{r, eval = FALSE}
+# Strive for 
+flights |>  
+  filter(!is.na(arr_delay), !is.na(tailnum)) |> 
+  group_by(tailnum) |> 
+  summarise(
+    delay = mean(arr_delay, na.rm = TRUE),
+    n = n()
+  )
+
+# Avoid
+flights|> filter(!is.na(arr_delay), !is.na(tailnum)) |> 
+  group_by(tailnum) |> summarise(delay = mean(arr_delay, na.rm = TRUE), 
+                                 n = n())
+```
+
+This structure makes it easier to add new steps (or rearrange existing steps), modify elements within a step, and get a 50,000-foot view just by skimming the left-hand side.
+
+It's OK to shirk some of these rules if your snippet fits easily on one line.
+But in our experience, it's pretty common for short snippets to grow longer, so you'll usually save time in the long run by starting with all the vertical space you need.
+
+```{r, eval = FALSE}
+# This fits compactly on one line
 df |> mutate(y = x + 1)
-# vs
+
+# While this spacing feels breezy, it's easily extended to 
+# more variables and more steps
 df |> 
  mutate(
    y = x + 1
@@ -58,26 +114,44 @@ df |>
 The same basic rules apply to ggplot2, just treat `+` the same way as `|>`.

 ```{r, eval = FALSE}
-df |> 
-  ggplot(aes())
+flights |> 
+  group_by(month) |> 
+  summarise(delay = mean(arr_delay, na.rm = TRUE)) |> 
+  ggplot(aes(month, delay)) +
+  geom_point() + 
+  geom_line()
 ```

-It's ok to skip these rules if your snippet is fits easily on one line (e.g.) `mutate(df, y = x + 1)` or `df %>% mutate(df, y = x + 1)`.
-But it's pretty common for short snippets to grow longer, so you'll save time in the long run by starting out as you wish to continue.
-
 Be wary of writing very long pipes, say longer than 10-15 lines.
-Try to break them up into logical subtasks, giving each part an informative name.
-The names will help cue the reader into what's happening and gives convenient places to check that intermediate results are as expected.
+Try to break them up into smaller sub-tasks, giving each task an informative name.
+The names will help cue the reader in to what's happening and make it easier to check that intermediate results are as expected.
 Whenever you can give something an informative name, you should give it an informative name.
 Don't expect to get it right the first time!
 This means breaking up long pipelines if there are intermediate states that can get good names.
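
+As a rough sketch of what that might look like (the intermediate names `flights_complete` and `delay_by_plane` are made up for illustration), a pipeline over `flights` could be split at a natural resting point:
+
+```{r, eval = FALSE}
+library(dplyr)
+library(nycflights13)
+
+# Step 1: keep only flights with a known arrival delay and tail number
+flights_complete <- flights |> 
+  filter(!is.na(arr_delay), !is.na(tailnum))
+
+# Step 2: summarise the arrival delay for each plane
+delay_by_plane <- flights_complete |> 
+  group_by(tailnum) |> 
+  summarise(
+    delay = mean(arr_delay, na.rm = TRUE),
+    n = n()
+  )
+```
+
+Each named object is also a convenient place to stop and inspect the data before moving on to the next step.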

-Strive to limit your code to 80 characters per line.
-This fits comfortably on a printed page with a reasonably sized font.
-If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
-
 ## Organisation

-   Use empty lines to organize your code into "paragraphs" of related thoughts.
+Where possible, use comments to explain the "why" of your code, not the "how" or the "what".
+If you simply describe what your code is doing in prose, you'll have to be careful to update the comment and code in tandem: if you change the code and forget to update the comment, the two will be inconsistent, which will lead to confusion when you come back to your code in the future.
+For data analysis code, use comments to explain your overall plan of attack and record important insights as you encounter them.
+There's no way to re-capture this knowledge from the code itself.
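
+For instance, in the sketch below the first comment merely restates the code, while the second records a decision that the code can't express (the stated reason is hypothetical, made up for illustration):
+
+```{r, eval = FALSE}
+library(dplyr)
+library(nycflights13)
+
+# Avoid: says "what", which the code already shows
+# Keep flights with air_time less than 60
+short_flights <- flights |> filter(air_time < 60)
+
+# Strive for: says "why" (hypothetical reasoning)
+# Analyse short hops separately, since a fixed taxi delay makes up a
+# much larger share of their total travel time
+short_flights <- flights |> filter(air_time < 60)
+```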

-   In data analysis code, use comments to record important findings and analysis decisions.
+As your scripts get longer, use **sectioning** comments to break up your file into manageable pieces:
+
+```{r, eval = FALSE}
+# Load data --------------------------------------
+
+# Plot data --------------------------------------
+```
+
+RStudio provides a keyboard shortcut to create these headers (Cmd/Ctrl + Shift + R), and will display them in the code navigation drop-down at the bottom-left of the editor, as shown in Figure \@ref(fig:rstudio-sections).
+
+```{r rstudio-sections}
+#| echo: false
+#| out.width: NULL
+#| fig.cap: > 
+#|   After adding sectioning comments to your script, you can
+#|   easily navigate to them using the code navigation tool in the
+#|   bottom-left of the script editor.
+knitr::include_graphics("screenshots/rstudio-nav.png")
+```