Discussion about paths

Prompted by @csgillespie
hadley 2016-10-04 08:28:30 -05:00
parent b3855be66c
commit 0b1e61c5ac
1 changed files with 41 additions and 10 deletions


@@ -32,7 +32,7 @@ I use this pattern hundreds of times a week.
R has a powerful notion of the __working directory__. This is where R looks for files that you ask it to load, and where it will put any files that you ask it to save. RStudio shows your current working directory at the top of the console:
```{r, echo = FALSE, out.width = NULL}
```{r, echo = FALSE, out.width = "50%"}
knitr::include_graphics("screenshots/rstudio-wd.png")
```
@@ -43,16 +43,40 @@ getwd()
#> [1] "/Users/hadley/Documents/r4ds/r4ds"
```
As a beginning R user, it's OK to let your home directory or any other weird directory on your computer be R's working directory. But you're six chapters into this book, and you're no longer a rank beginner. Very soon now you should evolve to organising your analytical projects into directories and, when working on a project, setting R's working directory to the associated directory.
As a beginning R user, it's OK to let your home directory, documents directory, or any other weird directory on your computer be R's working directory. But you're six chapters into this book, and you're no longer a rank beginner. Very soon now you should evolve to organising your analytical projects into directories and, when working on a project, setting R's working directory to the associated directory.
__I do not recommend it__, but you can also set the working directory from within R:
```{r eval = FALSE}
setwd("~/myCoolProject")
setwd("/path/to/my/CoolProject")
```
But you should never do this because there's a better way; a way that also puts you on the path to managing your R work like an expert.
## Paths and directories
Paths and directories are a little complicated because there are two basic styles of paths: Mac/Linux and Windows. There are three chief ways in which they differ:
1. The most important difference is how you separate the components of the
path. Mac and Linux use slashes (e.g. `plots/diamonds.pdf`) and Windows
uses backslashes (e.g. `plots\diamonds.pdf`). R can work with either type
(no matter what platform you're currently using), but unfortunately,
backslashes mean something special to R, and to get a single backslash
in the path, you need to type two backslashes! That makes life frustrating,
so I recommend always using the Linux/Mac style with forward slashes.
1. Absolute paths (i.e. paths that point to the same place regardless of
your working directory) look different. In Windows they start with a drive
letter (e.g. `C:`) or two backslashes (e.g. `\\servername`) and in
Mac/Linux they start with a slash `/` (e.g. `/users/hadley`). You should
__never__ use absolute paths in your scripts, because they hinder sharing:
no one else will have exactly the same directory configuration as you.
1. The last minor difference is the place that "~" points to. "~" is a
convenient shortcut to your home directory. Windows doesn't really have
the notion of a home directory, so it instead points to your documents
directory.
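The three differences above can be tried out in a few lines of base R. This is only a rough sketch: `looks_absolute()` is a hypothetical helper written for this illustration (it is not part of base R), and its pattern covers only the path styles mentioned in the list.

```r
# Inside an R string, "\\" stands for ONE backslash character.
win_path <- "plots\\diamonds.pdf"
nchar(win_path)  # the escaped backslash is counted once

# Forward slashes need no escaping; file.path() joins components with "/".
unix_path <- file.path("plots", "diamonds.pdf")

# "~" expands to your home directory (on Windows, your documents directory).
home <- path.expand("~")

# Hypothetical helper, NOT a base R function: a rough test for the absolute
# path styles listed above (leading "/", "~", a drive letter, or "\\server").
looks_absolute <- function(path) {
  grepl("^(/|~|[A-Za-z]:|\\\\\\\\)", path)
}

looks_absolute(c("/users/hadley", "C:/Users", "plots/diamonds.pdf"))
```

If a check like this flags a path in one of your scripts, that is a hint to replace it with a path relative to the project directory.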
## RStudio projects
R experts keep all the files associated with a project together --- input data, R scripts, analytical results, figures. This is such a wise and common practice that RStudio has built-in support for this via __projects__.
@ -71,16 +95,15 @@ Once this process is complete, you'll get a new RStudio project just for this bo
```{r eval = FALSE}
getwd()
#> [1] ~/Desktop/r4ds
#> [1] /Users/hadley/Documents/r4ds/r4ds
```
Now, whenever you refer to a file (sans directory) it will look for it here.
Whenever you refer to a file with a relative path, R will look for it here.
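To see what "look for it here" means in practice, here is a small sketch in base R. It assumes the working directory is writable, and the file name `relative-path-demo.csv` is made up for this illustration.

```r
# A relative path: no leading "/", drive letter, or "~".
rel_path <- "relative-path-demo.csv"  # hypothetical file name

# Writing with a relative path puts the file in the working directory.
write.csv(data.frame(x = 1:3), rel_path, row.names = FALSE)

# The same file can be reached through the absolute path built from getwd().
abs_path <- file.path(getwd(), rel_path)
found <- file.exists(abs_path)
found

# Tidy up the demo file.
unlink(rel_path)
```

The absolute path is built here only to show where the file went; in your own scripts, the relative path is the one to use.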
Now enter the following commands in the script editor and save the file, calling it "diamonds.R". Next, run the complete script, which will save a PDF and a CSV file into your project directory. Don't worry about the details; you'll learn them later in the book.
```{r toy-line, eval = FALSE}
library(ggplot2)
library(readr)
library(tidyverse)
ggplot(diamonds, aes(carat, price)) +
geom_hex()
@@ -93,11 +116,19 @@ Quit RStudio. Inspect the folder associated with your project --- notice the `.R
In your favorite OS-specific way, search your computer for `diamonds.pdf` and you will find the PDF (no surprise) but _also the script that created it_ (`diamonds.R`). This is a huge win! One day you will want to remake a figure or just understand where it came from. If you rigorously save figures to files __with R code__ and never with the mouse or the clipboard, you will be able to reproduce old work with ease!
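As a minimal illustration of saving a figure with code instead of the mouse, the sketch below uses base R graphics and the built-in `mtcars` data rather than the ggplot2 script from this chapter. The file name `figure-demo.pdf` is made up, and the file goes to a temporary directory so that your project is left untouched.

```r
# Hypothetical output location, used for illustration only.
fig_path <- file.path(tempdir(), "figure-demo.pdf")

# Open a PDF device, draw the plot, then close the device to write the file.
pdf(fig_path)
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
invisible(dev.off())

# The figure now exists on disk, and these lines record how it was made.
file.exists(fig_path)
```

Because the code that draws the plot also writes the file, rerunning the script recreates the figure.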
## Summary
In summary, RStudio projects give you a solid workflow that will serve you well in the future:
* Create an RStudio project for each data analysis project.
* Keep data files there; we'll talk about a bit later importing in [data import].
* Create an RStudio project for each data analysis project.
* Keep data files there; we'll talk about loading them into R in
[data import].
* Keep scripts there; edit them, run them in bits or as a whole.
* Save your outputs there.
* Save your outputs (plots and cleaned data) there.
* Only ever use relative paths, not absolute paths.
Everything you need is in one place, and cleanly separated from all the other projects that you are working on.