Polishing workflow

This commit is contained in:
hadley 2016-08-19 14:42:43 -05:00
parent ecc6dc7909
commit 0ca5faca4e
13 changed files with 60 additions and 59 deletions

BIN
diagrams/rstudio-editor.png Normal file

View File

@ -109,10 +109,10 @@ To run the code in this book, you will need to install both R and the RStudio ID
### RStudio
RStudio is an integrated development environment, or IDE, for R programming. There are three key regions in the interface:
RStudio is an integrated development environment, or IDE, for R programming. When you get started, there are two key regions in the interface:
```{r echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/intro-rstudio.png")
knitr::include_graphics("diagrams/rstudio-console.png")
```
For now, all you need to know is that you type R code in the console pane, and press enter to run it. You'll learn more as we go along!

View File

@ -1,15 +1,18 @@
# Workflow: basics
You've now have some experience running R code. I didn't give you many details, but you've obviously figured out the basics, or you would've thrown this book away in frustration! Before we go any further, let's make sure you've got a solid foundation in running R code and, and that you know about the most helpful RStudio features.
You now have some experience running R code. I didn't give you many details, but you've obviously figured out the basics, or you would've thrown this book away in frustration! Before we go any further, let's make sure you've got a solid foundation in running R code, and that you know about some of the most helpful RStudio features.
Let's review the basics: you can use R as a calculator:
## Coding basics
Let's review some basics we've so far omitted in the interests of getting you plotting as quickly as possible. You can use R as a calculator:
```{r}
1 / 200 * 30
(59 + 73 + 2) / 3
sin(pi / 2)
```
And you can create new objects with `<-`:
You can create new objects with `<-`:
```{r}
x <- 3 * 4
@ -23,20 +26,20 @@ object_name <- value
When reading that code say "object_name gets value" in your head.
You will make lots of assignments and the operator `<-` is a pain to type. Don't be lazy and use `=`. It will work, but it will sow confusion later. Instead, use RStudio's keyboard shortcut: Alt + - (the minus sign). RStudio offers many handy keyboard shortcuts. To get the full list, use the one keyboard shortcut to rule them all: Alt + Shift + K brings up a keyboard shortcut reference card.
You will make lots of assignments and `<-` is a pain to type. Don't be lazy and use `=`: it will work, but it will cause confusion later. Instead, use RStudio's keyboard shortcut: Alt + - (the minus sign). Notice that RStudio automagically surrounds `<-` with spaces, which is a good code formatting practice. Code is miserable to read on a good day, so giveyoureyesabreak and use spaces.
Notice that RStudio automagically surrounds `<-` with spaces, which is a good code formatting practice. Code is miserable to read on a good day, so giveyoureeyesabreak and use spaces.
## What's in a name?
Object names must start with a letter, and cannot contain characters like commas or spaces. You want your object names to be descriptive, so it's a good idea to adopt a convention for demarcating words in names. I recommend __snake_case__ where you separate lowercase words with `_`.
Object names must start with a letter, and can only contain letters, numbers, `_` and `.`. You want your object names to be descriptive, so you'll need a convention for multiple words. I recommend __snake_case__ where you separate lowercase words with `_`.
```{r, eval = FALSE}
i_use_snake_case
otherPeopleUseCamelCase
some.people.use.periods
And_aFew.People_HATEconventions
And_aFew.People_RENOUNCEconvention
```
We'll come back to code style in [functions].
We'll come back to code style later, in [functions].
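The naming rule above can be checked mechanically. Here is a minimal sketch, assuming a hypothetical helper `is_simple_name()` (it is not part of R or of this book) that applies the rule exactly as stated. It ignores reserved words and other corner cases:

```r
# Hypothetical helper: TRUE when a name starts with a letter and then uses
# only letters, numbers, `_` and `.`. Reserved words are not considered.
is_simple_name <- function(x) {
  grepl("^[A-Za-z][A-Za-z0-9_.]*$", x)
}

candidates <- c("i_use_snake_case", "otherPeopleUseCamelCase",
                "some.people.use.periods", "2fast", "has space", "a,b")

# Label each result with the name it belongs to.
name_check <- is_simple_name(candidates)
names(name_check) <- candidates
name_check
```

The first three names pass; the last three fail because they start with a digit, contain a space, or contain a comma.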
You can inspect an object by typing its name:
@ -44,7 +47,6 @@ You can inspect an object by typing its name:
x
```
Make another assignment:
```{r}
@ -63,22 +65,24 @@ r_rocks <- 2 ^ 3
Let's try to inspect it:
```{r, error = TRUE}
```{r, eval = FALSE}
r_rock
#> Error: object 'r_rock' not found
R_rocks
#> Error: object 'R_rocks' not found
```
There's an implicit contract between you and R: it will do the tedious computation for you, but in return, you must be completely precise in your instructions. Typos matter. Case matters. Improving your touch typing skills will pay off!
There's an implied contract between you and R: it will do the tedious computation for you, but in return, you must be completely precise in your instructions. Typos matter. Case matters.
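If you'd like to see that precision without triggering an error, base R's `exists()` reports whether a name can be found. This is only a sketch, and it assumes that you have not separately created objects called `r_rock` or `R_rocks`:

```r
r_rocks <- 2 ^ 3

# exists() reports whether an object with exactly this name can be found.
found <- c(
  r_rocks = exists("r_rocks"),  # the name as typed in the assignment
  r_rock  = exists("r_rock"),   # typo: missing "s"
  R_rocks = exists("R_rocks")   # wrong case
)
found
```

Only the exact spelling and case match the object you created.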
## Calling functions
R has a large collection of built-in functions that are called like this:
```{r eval = FALSE}
functionName(arg1 = val1, arg2 = val2, and so on)
functionName(arg1 = val1, arg2 = val2, ...)
```
Let's try using `seq()` which makes regular sequences of numbers and, while we're at it, learn more helpful features of RStudio.
Type `se` and hit TAB. A pop up shows you possible completions. Specify `seq()` by typing more (a "q") to disambiguate or using the up/down arrows to select. Notice the floating tooltip that pops up, reminding you of the function's arguments and purpose. If you want more help, press F1 to get all the details in help tab in the lower right pane.
Let's try using `seq()`, which makes regular **seq**uences of numbers and, while we're at it, learn more helpful features of RStudio. Type `se` and hit TAB. A popup shows you possible completions. Specify `seq()` by typing more (a "q") to disambiguate, or by using the ↑/↓ arrows to select. Notice the floating tooltip that pops up, reminding you of the function's arguments and purpose. If you want more help, press F1 to get all the details in the help tab in the lower right pane.
Press TAB once more when you've selected the function you want. RStudio will add matching opening (`(`) and closing (`)`) parentheses for you. Type the arguments `1, 10` and hit return.
@ -99,9 +103,9 @@ Quotation marks and parentheses must always come in a pair. RStudio does it's be
+
```
The `+` tells you that R is waiting for more input; it doesn't think you're done yet. Usually that means you've forgotten either a `"` or a `)`. Either add missing pair, or press ESCAPE to abort the expression and try again.
The `+` tells you that R is waiting for more input; it doesn't think you're done yet. Usually that means you've forgotten either a `"` or a `)`. Either add the missing pair, or press ESCAPE to abort the expression and try again.
If you make an assignment, you don't get to see the value. You're then tempted to immediately double check the result: inspect.
If you make an assignment, you don't get to see the value. You're then tempted to immediately double check the result:
```{r}
y <- seq(1, 10, length = 5)
@ -120,7 +124,7 @@ Now look at your environment in the upper right pane:
knitr::include_graphics("screenshots/rstudio-env.png")
```
The environment is where user-defined objects accumulate.
Here you can see all of the objects that you've created.
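The console has a rough equivalent of the environment pane: `ls()` returns the names of the objects in an environment. The sketch below uses a separate, throwaway environment (`demo_env`, a name chosen only for this illustration) so that it doesn't touch your own workspace. At the console you would normally just call `ls()` with no arguments:

```r
# A throwaway environment standing in for your workspace.
demo_env <- new.env()
assign("x", 3 * 4, envir = demo_env)
assign("r_rocks", 2 ^ 3, envir = demo_env)

# ls() returns the names of the objects stored in the environment.
objects_created <- ls(demo_env)
objects_created
```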
## Practice
@ -151,3 +155,6 @@ The environment is where user-defined objects accumulate.
filter(carat > 3)
```
1. Press Alt + Shift + K. What happens? How can you get to the same place
using the menus?

View File

@ -1,38 +1,36 @@
# Workflow: projects
One day you will need to quit R, go do something else and return to your analysis later. One day you will have multiple analyses going that use R and you want to keep them separate. One day you will need to bring data from the outside world into R and send numerical results and figures from R back out into the world.
One day you will need to quit R, go do something else and return to your analysis the next day. One day you will be working on multiple analyses simultaneously that all use R and you want to keep them separate. One day you will need to bring data from the outside world into R and send numerical results and figures from R back out into the world. To handle these real life situations, you need to make two decisions:
To handle these real life situations, you need to make two decisions:
1. What about your analysis is "real", i.e. you will save it as your
1. What about your analysis is "real", i.e. what will you save as your
lasting record of what happened?
1. Where does your analysis "live"?
## What is real?
As a beginning R user, it's OK to consider your environment (i.e. the objects listed in the environment pane) "real". However, in the long-run, you'll be much better off if you consider your R scripts as "real". With the input data and the R code you used, you can reproduce _everything_. You can make your analysis fancier. You can get to the bottom of puzzling results and discover and fix bugs in your code. You can reuse the code to conduct similar analyses in new projects. You can remake a figure with different aspect ratio or save is as TIFF instead of PDF. You are ready to take questions. You are ready for the future.
As a beginning R user, it's OK to consider your environment (i.e. the objects listed in the environment pane) "real". However, in the long run, you'll be much better off if you consider your R scripts as "real".
If you regard your environment as "real" (saving and reloading all the time), it's hard to reproduce an analysis after the fact. You'll either need to retype a lot of code (making mistakes all the way) or will have to mine your R history for the commands you used. Rather than [becoming an expert on managing the R history](https://support.rstudio.com/hc/en-us/articles/200526217-Command-History), a better use of your time and psychic energy is to keep your "good" R code in a script for future reuse.
With your R scripts (and your data files), you can recreate the environment. It's much harder to recreate your R scripts from your environment! You'll either have to retype a lot of code from memory (making mistakes all the way) or you'll have to carefully mine your R history.
To foster this behaviour, I highly recommend that you tell RStudio not to preserve your workspace between sessions:
To foster this behaviour, I highly recommend that you instruct RStudio not to preserve your workspace between sessions:
```{r, echo = FALSE, out.width = "75%"}
knitr::include_graphics("screenshots/rstudio-workspace.png")
```
This ensures that every time you restart RStudio you get a completely clean slate. That's good practice because it encourages you to capture all important interactions in your code. There's nothing worse than discovering three months after the fact that you've only stored the results of an important calculation in your workspace, not the calculation itself in your code.
This will cause you some short-term pain, because now when you restart RStudio it will not remember the results of the code that you ran last time. But this short-term pain will save you long-term agony because it forces you to capture all important interactions in your code. There's nothing worse than discovering three months after the fact that you've only stored the results of an important calculation in your workspace, not the calculation itself in your code.
There is a great pair of keyboard short cuts that will work together to make sure you've captured the important parts of your code in the editor:
There is a great pair of keyboard shortcuts that will work together to make sure you've captured the important parts of your code in the editor:
1. Press Cmd/Ctrl + Shift + F10 to restart the R session.
2. Press Cmd/Ctrl + Shift + S to rerun the current script.
I do this probably hundreds of times a day.
I use this pattern hundreds of times a week.
## Where does your analysis live?
R has a powerful notion of the __working directory__. This is where R looks, by default, for files that you ask it to load, and where it will put any files that you save to disk. RStudio shows your current working directory at the top of the console:
R has a powerful notion of the __working directory__. This is where R looks for files that you ask it to load, and where it will put any files that you ask it to save. RStudio shows your current working directory at the top of the console:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("screenshots/rstudio-wd.png")
@ -45,23 +43,21 @@ getwd()
#> [1] "/Users/hadley/Documents/r4ds/r4ds"
```
As a beginning R user, it's OK let your home directory or any other weird directory on your computer be R's working directory. But _very soon_ you should evolve to organising your analytical projects into directories and, when working on project A, set R's working directory to the associated directory.
As a beginning R user, it's OK to let your home directory or any other weird directory on your computer be R's working directory. But you're six chapters into this book, and you're no longer a rank beginner. Very soon now you should evolve to organising your analytical projects into directories and, when working on project A, set R's working directory to the associated directory.
__Although I do not recommend it__, in case you're curious, you can set R's working directory at the command line like so:
__I do not recommend it__, but you can also set the working directory from within R:
```{r eval = FALSE}
setwd("~/myCoolProject")
```
But there's a better way. A way that also puts you on the path to managing your R work like an expert.
But you should never do this because there's a better way; a way that also puts you on the path to managing your R work like an expert.
## RStudio projects
Keeping all the files associated with a project organized together -- input data, R scripts, analytical results, figures -- is such a wise and common practice that RStudio has built-in support for this via its _projects_.
R experts keep all the files associated with a project together --- input data, R scripts, analytical results, figures. This is such a wise and common practice that RStudio has built-in support for this via __projects__.
[Using Projects](https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects)
Let's make one for you to use for the rest of this book. Click File > New Project, then:
Let's make a project for you to use while you're working through the rest of this book. Click File > New Project, then:
```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("screenshots/rstudio-project-1.png")
@ -71,16 +67,16 @@ knitr::include_graphics("screenshots/rstudio-project-3.png")
Call your project `r4ds`.
Once this process is complete, you'll get a new RStudio project that just for this book. Check that the "home" directory for your project is the working directory of our current R process:
Once this process is complete, you'll get a new RStudio project just for this book. Check that the "home" directory of your project is the current working directory:
```{r eval = FALSE}
getwd()
#> [1] ~/Desktop/r4ds
```
Now, whenever you refer to a file (sans directory) it will look for it in this directory.
Now, whenever you refer to a file without specifying a directory, R will look for it here.
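You can convince yourself of this with a small experiment. The sketch below writes a throwaway file using a bare file name (`relative-path-demo.txt`, a name invented for this illustration), checks that it landed in the working directory, and then deletes it:

```r
file_name <- "relative-path-demo.txt"

# No directory given, so the file is written to the working directory.
writeLines("hello", file_name)

# Build the full path from getwd() and confirm the file is there.
found_in_wd <- file.exists(file.path(getwd(), file_name))
found_in_wd

# Tidy up so the experiment leaves nothing behind.
unlink(file_name)
```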
Now enter the following commands in the script editor, then save the file, calling it "diamonds.R". Next, run the complete script which will save a pdf and csv file into your project directory. Don't worry about the details --- you'll learn them later in the book.
Now enter the following commands in the script editor, and save the file, calling it "diamonds.R". Next, run the complete script, which will save a PDF and a CSV file into your project directory. Don't worry about the details; you'll learn them later in the book.
```{r toy-line, eval = FALSE}
library(ggplot2)
@ -93,15 +89,11 @@ ggsave("diamonds-hex.pdf")
write_csv(diamonds, "diamonds.csv")
```
Quit RStudio. Inspect the folder associated with your project --- notice the `.Rproj` file. You can click on that to re-open the project in the future (using projects even allows you to have multiple instances of RStudio open at the same time). Maybe view the PDF in an external viewer.
Quit RStudio. Inspect the folder associated with your project --- notice the `.Rproj` file. Double-click that file to re-open the project. Notice you get back to where you left off: it's the same working directory and command history, and all the files you were working on are still open. Because you followed my instructions above, you will, however, have a completely fresh environment, guaranteeing that you're starting with a clean slate.
Restart RStudio. Notice you get back to where you left off: it's the save working directory and command history, and all the files you were working on are still open. You will, however, have a completely fresh environment, guaranteeing that you're starting with a clean slate.
In your favorite OS-specific way, search your computer for `diamonds.pdf` and you will find the PDF (no surprise) but _also the script that created it_ (`diamonds.r`). This is huge win! One day you will want to remake a figure or just understand where it came from. If you rigorously save figures to file __with R code__ and never with the mouse or the clipboard, you will be able to reproduce old work with ease!
In your favorite OS-specific way, search your computer for `diamonds.pdf` and presumably you will find the PDF (no surprise) but _also the script that created it_ (`diamonds.R`). This is a huge win! One day you will want to remake a figure or simply understand where it came from. If you rigorously save figures to file __with R code__ and never with the mouse or the clipboard, you will be able to reproduce old work with ease!
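The same habit works without ggplot2. As a minimal sketch, here is the base R way of saving a figure with code, which is an alternative to the `ggsave()` call in the script above, not the approach that script takes. It writes to a temporary file, an assumption made only so the example leaves your project untouched:

```r
# Write to a temporary file; in a real script you would use a file name
# inside your project directory instead.
fig_path <- tempfile(fileext = ".pdf")

pdf(fig_path, width = 6, height = 4)  # open a PDF graphics device
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")
dev.off()                             # close the device to finish the file

file.exists(fig_path)
```

Because the figure comes from code, rerunning the script recreates it exactly.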
## Overall workflow
RStudio projects give you a solid workflow that will serve you well in the future:
In summary, RStudio projects give you a solid workflow that will serve you well in the future:
* Create an RStudio project for each data analysis project.
* Keep data files there; we'll talk about importing them a bit later in [import].

View File

@ -1,30 +1,36 @@
# Workflow: scripts
So far you've been using the console to run code. That's a great place to start, but you'll find it starts to get cramped pretty quickly as you create more complex ggplot2 graphics and dplyr pipes. To give yourself more room to work, it's a great idea to use the script editor. Open it up either clicking the File menu, and selecting New File, then R script, or using the keyboard shortcut Cmd/Ctrl + Shift + N. Now you'll see four panes:
So far you've been using the console to run code. That's a great place to start, but you'll find it gets cramped pretty quickly as you create more complex ggplot2 graphics and dplyr pipes. To give yourself more room to work, it's a great idea to use the script editor. Open it up either by clicking the File menu and selecting New File, then R script, or by using the keyboard shortcut Cmd/Ctrl + Shift + N. Now you'll see four panes:
```{r echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/intro-rstudio.png")
knitr::include_graphics("diagrams/rstudio-editor.png")
```
The script editor is a great place to put code you care about. Keep experimenting in the console, but once you get some code that does what you want, put it in the script editor.
The script editor is a great place to put code you care about. Keep experimenting in the console, but once you have written code that works and does what you want, put it in the script editor. RStudio will automatically save the contents of the editor when you quit RStudio, and will automatically load it when you re-open. Nevertheless, it's a good idea to save your scripts regularly and to back them up.
The script editor is also a great place to build up complex ggplot2 plots or long sequences of dplyr manipulations. The key to using the script editor effective is to memorise one of the most important keyboard shortcuts: Cmd/Ctrl + Enter. This executes the current R expression in the console.
## Running code
For example, take this code. If your cursor is at █, pressing Cmd + Enter will run the complete command that generates `not_cancelled`. It will also move the cursor to the next statement (beginning with `not_cancelled %>%`), which makes easy to run your script chunk by chunk.
The script editor is also a great place to build up complex ggplot2 plots or long sequences of dplyr manipulations. The key to using the script editor effectively is to memorise one of the most important keyboard shortcuts: Cmd/Ctrl + Enter. This executes the current R expression in the console. For example, take the code below. If your cursor is at █, pressing Cmd/Ctrl + Enter will run the complete command that generates `not_cancelled`. It will also move the cursor to the next statement (beginning with `not_cancelled %>%`). That makes it easy to run your complete script by repeatedly pressing Cmd/Ctrl + Enter.
```{r, eval = FALSE}
library(dplyr)
library(nycflights13)
not_cancelled <- flights %>%
filter(!is.na(dep_delay), !is.na(arr_delay))
filter(!is.na(dep_delay), !is.na(arr_delay))
not_cancelled %>%
group_by(year, month, day) %>%
summarise(mean = mean(dep_delay))
```
You can run the complete script with one press: Cmd/Ctrl + Shift + S. Doing this regularly is a great way to check that you've captured all the important parts of your code in the script. I recommend that you always start your script with the packages that you need. That way, if you share you code with others, they can easily see what packages they need to install.
Instead of running expression-by-expression, you can also execute the complete script in one step: Cmd/Ctrl + Shift + S. Doing this regularly is a great way to check that you've captured all the important parts of your code in the script.
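From the console, the closest base R counterpart to running a whole script is `source()`. The sketch below is illustrative only: it writes a two-line script to a temporary file and runs it inside a separate environment (`script_env`, a name made up here) so you can see exactly which objects the script creates. It does not give you the clean slate that restarting R does:

```r
# Create a tiny script in a temporary file, standing in for your own .R file.
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- 3 * 4",
  "y <- seq(1, 10, length.out = 5)"
), script_path)

# Run every expression in the script; results land in script_env.
script_env <- new.env()
source(script_path, local = script_env)

# The objects the script created.
ls(script_env)
```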
I recommend that you always start your script with the packages that you need. That way, if you share your code with others, they can easily see what packages they need to install.
When working through future chapters, I highly recommend starting in the editor and practicing your keyboard shortcuts. Over time, sending code to the console in this way will become so natural that you won't even think about it.
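If you share scripts often, you could go one step further and have the top of the script report which packages are missing. This is just one possible sketch, not something the book prescribes. The package names below are base packages chosen only so that the example runs anywhere; in your own script you would list the packages it actually loads:

```r
# Packages this script needs, listed first so readers see them immediately.
# (Base packages are used here only so the sketch runs anywhere.)
needed <- c("stats", "utils")

# requireNamespace() returns FALSE, rather than failing, when a package
# is not installed.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

# Anything listed here would need install.packages() before running the script.
missing_pkgs <- needed[!is_installed]
missing_pkgs
```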
## RStudio diagnostics
The script editor will also highlight syntax errors with a red squiggly line and a cross in the sidebar:
@ -44,10 +50,6 @@ RStudio will also let you know about potential problems:
knitr::include_graphics("screenshots/rstudio-diagnostic-warn.png")
```
RStudio will automatically save the contents of the editor when you quit RStudio, and will automatically load it when you re-open. Nevertheless, it's a good idea to regular save your scripts and back them up.
When working through future chapters, I highly recommend starting in the editor and practicing your the keyboard shortcuts. Over time, sending code to the console in this way will become so natural that you won't even think about it.
## Practice
1. Go to the RStudio Tips twitter account, <https://twitter.com/rstudiotips>