RStudio automatically saves the contents of the script editor when you quit, and automatically reloads it when you re-open.
Nevertheless, it's a good idea to avoid Untitled1, Untitled2, Untitled3, and so on, and instead save your scripts and give them informative names.
It might be tempting to name your files `code.R` or `myscript.R`, but you should think a bit harder before choosing a name for your file.
Three important principles for file naming are as follows:
1. File names should be **machine** readable: avoid spaces, symbols, and special characters. Don't rely on case sensitivity to distinguish files.
2. File names should be **human** readable: use file names to describe what's in the file.
3. File names should play well with default ordering: start file names with numbers so that alphabetical sorting puts them in the order they get used.
For example, suppose you have the following files in a project folder.
alternative model.R
code for exploratory analysis.r
finalreport.qmd
FinalReport.qmd
fig 1.png
Figure_02.png
model_first_try.R
run-first.r
temp.txt
There are a variety of problems here: it's hard to find which file to run first, file names contain spaces, there are two files with the same name but different capitalization (`finalreport` vs. `FinalReport`[^workflow-scripts-1]), and some names don't describe their contents (`run-first` and `temp`).
[^workflow-scripts-1]: Not to mention that you're tempting fate by using "final" in the name 😆 The comic Piled Higher and Deeper has a [fun strip on this](https://phdcomics.com/comics/archive.php?comicid=1531).
Here's a better way of naming and organizing the same set of files:
01-load-data.R
02-exploratory-analysis.R
03-model-approach-1.R
04-model-approach-2.R
fig-01.png
fig-02.png
report-2022-03-20.qmd
report-2022-04-02.qmd
report-draft-notes.txt
Numbering the key scripts makes it obvious in which order to run them, and a consistent naming scheme makes it easier to see what varies.
Additionally, the figures are labelled similarly, the reports are distinguished by dates included in the file names, and `temp` is renamed to `report-draft-notes` to better describe its contents.
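If you want to check a folder against the first principle, a few lines of base R can flag the problem names. The following is only a sketch: the "allowed characters" rule (letters, digits, dots, hyphens, and underscores) is our assumption about what counts as machine readable, and the vector `files` simply repeats the messy listing from earlier.

```r
# File names copied from the messy example listing.
files <- c(
  "alternative model.R",
  "code for exploratory analysis.r",
  "finalreport.qmd",
  "FinalReport.qmd",
  "fig 1.png",
  "Figure_02.png",
  "model_first_try.R",
  "run-first.r",
  "temp.txt"
)

# Assumed rule: anything other than letters, digits, ".", "_", or "-" is suspicious.
has_odd_chars <- grepl("[^A-Za-z0-9._-]", files)

# Names that become identical once case is ignored.
lower <- tolower(files)
case_clash <- lower %in% lower[duplicated(lower)]

files[has_odd_chars]  # names containing spaces or other special characters
files[case_clash]     # names that differ only by capitalization
```

In a real project you could replace the hard-coded vector with `list.files()`.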
With only your environment, it's much harder to recreate your R scripts: you'll either have to retype a lot of code from memory (inevitably making mistakes along the way) or you'll have to carefully mine your R history.
To help keep your R scripts as the source of truth for your analysis, we highly recommend that you instruct RStudio not to preserve your workspace between sessions.
You can do this either by running `usethis::use_blank_slate()`[^workflow-scripts-2] or by mimicking the options shown in @fig-blank-slate. This will cause you some short-term pain, because now when you restart RStudio, it will no longer remember the code that you ran last time.
But this short-term pain saves you long-term agony because it forces you to capture all important interactions in your code.
There's nothing worse than discovering three months after the fact that you've only stored the results of an important calculation in your workspace, not the calculation itself in your code.
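To make the idea concrete, here is a small sketch of what "capturing the calculation in your code" looks like. The function name `compute_result()` and the calculation itself are hypothetical; the point is only that the object can be rebuilt from the script after the workspace is gone.

```r
# Hypothetical calculation, written down in the script rather than typed at the console.
compute_result <- function() {
  mean(mtcars$mpg[mtcars$cyl == 4])
}

result <- compute_result()

# Simulate losing the workspace: the object disappears ...
rm(result)
exists("result")  # FALSE in a session where nothing else is called `result`

# ... but the script can recreate it, because the calculation itself was saved.
result <- compute_result()
```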
As a beginning R user, it's OK to let your working directory be your home directory, documents directory, or any other weird directory on your computer.
But you're nine chapters into this book, and you're no longer a rank beginner.
Very soon now you should evolve to organizing your projects into directories and, when working on a project, setting R's working directory to the associated directory.
Keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is such a wise and common practice that RStudio has built-in support for this via **projects**.
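As a rough illustration of "everything together in one directory", the sketch below builds an empty project skeleton. The project name and the sub-folder names are our own assumptions, not a layout that RStudio requires, and the skeleton is created inside a temporary directory so that nothing on your computer is touched.

```r
# Hypothetical project name and sub-folders; tempdir() keeps the sketch side-effect free.
project_dir <- file.path(tempdir(), "diamonds-analysis")
subdirs <- c("data", "scripts", "results", "figures")

for (d in subdirs) {
  dir.create(file.path(project_dir, d), recursive = TRUE, showWarnings = FALSE)
}

# Everything that belongs to the project now lives under one directory.
list.files(project_dir)
```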
Because you followed our instructions above, you will, however, have a completely fresh environment, guaranteeing that you're starting with a clean slate.
In your favorite OS-specific way, search your computer for `diamonds.pdf` and you will find the PDF (no surprise) but *also the script that created it* (`diamonds.R`).
Absolute paths point to the same place regardless of your working directory.
They look a little different depending on your operating system.
On Windows they start with a drive letter (e.g. `C:`) or two backslashes (e.g. `\\servername`) and on Mac/Linux they start with a slash "/" (e.g. `/users/hadley`).
You should **never** use absolute paths in your scripts, because they hinder sharing: no one else will have exactly the same directory configuration as you.
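One way to spot absolute paths is to look for the prefixes just described. The helper `is_absolute()` below is a hypothetical name of ours, and it is only a rough heuristic built from those three prefixes. The example paths are made up for illustration.

```r
# Hypothetical helper: TRUE if a path starts with "/", a drive letter, or two backslashes.
is_absolute <- function(path) {
  grepl("^(/|[A-Za-z]:|\\\\\\\\)", path)
}

# Made-up example paths.
paths <- c(
  "/users/hadley/plots/diamonds.pdf",     # Mac/Linux absolute path
  "C:/Users/hadley/plots/diamonds.pdf",   # Windows drive letter
  "\\\\servername\\plots\\diamonds.pdf",  # Windows network path
  "plots/diamonds.pdf"                    # relative path
)

is_absolute(paths)
```

Only the last path is relative, so it is the only one that would keep working if the project moved to another computer.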
There's another important difference between operating systems: how you separate the components of the path.
Mac and Linux use slashes (e.g. `plots/diamonds.pdf`) and Windows uses backslashes (e.g. `plots\diamonds.pdf`).
R can work with either type (no matter what platform you're currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes!
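A short sketch shows the doubling in action. The variable names here are ours; the functions are all from base R.

```r
win_style  <- "plots\\diamonds.pdf"  # typed with two backslashes, stored as one
unix_style <- "plots/diamonds.pdf"

writeLines(win_style)                  # shows the path as R stores it, with one backslash
nchar(win_style) == nchar(unix_style)  # the doubled backslash counts as a single character

# file.path() joins path components with forward slashes on every platform.
built <- file.path("plots", "diamonds.pdf")

# Turn a backslash-separated path into a forward-slash one.
converted <- gsub("\\", "/", win_style, fixed = TRUE)
```

Building paths with `file.path()` or writing them with forward slashes sidesteps the escaping problem entirely.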
- Create one RStudio project for each data analysis project.
- Save your scripts (with informative names) in the project, edit them, run them in bits or as a whole. Restart R frequently to make sure you've captured everything in your scripts.
In this chapter, you've learned how to organize your R code in scripts (files) and projects (directories).
Much like code style, this may feel like busywork at first.
But as you accumulate more code across multiple projects, you'll learn to appreciate how a little up-front organization can save you a bunch of time down the road.
Next up, we'll switch back to data science tooling to talk about exploratory data analysis (or EDA for short), a philosophy and set of tools that you can use with your data to start to get a sense of what's going on.