2. For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code).
3. As an environment in which to *do* data science, as a modern-day lab notebook where you can capture not only what you did, but also what you were thinking.
If you're an R Markdown user, you might be thinking "Quarto sounds a lot like R Markdown".
You're not wrong!
Quarto unifies the functionality of many packages from the R Markdown ecosystem (rmarkdown, bookdown, distill, xaringan, etc.) into a single consistent system, and extends it with native support for multiple programming languages like Python and Julia in addition to R.
In a way, Quarto reflects everything that was learned from expanding and supporting the R Markdown ecosystem over a decade.
### Prerequisites
You need the Quarto command line interface (Quarto CLI), but you don't need to explicitly install it or load it, as RStudio automatically does both when needed.
If you don't like seeing your plots and output in your document and would rather make use of RStudio's console and plot panes, you can click on the gear icon next to "Render" and switch to "Chunk Output in Console", as shown in @fig-diamond-sizes-console-output.
When you render the document, Quarto sends the `.qmd` file to **knitr**, [https://yihui.name/knitr](https://yihui.name/knitr/){.uri}, which executes all of the code chunks and creates a new markdown (`.md`) document which includes the code and its output.
The markdown file generated by knitr is then processed by **pandoc**, [https://pandoc.org](https://pandoc.org/){.uri}, which is responsible for creating the finished file.
The Visual editor in RStudio provides a [WYSIWYM](https://en.wikipedia.org/wiki/WYSIWYM) interface for authoring Quarto documents.
Under the hood, prose in Quarto documents (`.qmd` files) is written in Markdown, a lightweight set of conventions for formatting plain text files.
In fact, Quarto uses Pandoc markdown (a slightly extended version of Markdown that Quarto understands), including tables, citations, cross-references, footnotes, divs/spans, definition lists, attributes, raw HTML/TeX, and more as well as support for executing code cells and viewing their output inline.
While Markdown is designed to be easy to read and write, as you will see in @sec-source-editor, it still requires learning new syntax.
Therefore, if you're new to computational documents like `.qmd` files but have experience using tools like Google Docs or MS Word, the easiest way to get started with Quarto in RStudio is the visual editor.
In the visual editor you can either use the buttons on the menu bar to insert images, tables, cross-references, etc. or you can use the catch-all <kbd>⌘ /</kbd> shortcut to insert just about anything.
Inserting images and customizing how they are displayed is also facilitated with the visual editor.
You can either paste an image from your clipboard directly into the visual editor (and RStudio will place a copy of that image in the project directory and link to it) or you can use the visual editor's Insert \> Figure / Image menu to browse to the image you want to insert or paste its URL.
In addition, using the same menu you can resize the image as well as add a caption, alternative text, and a link.
The visual editor has many more features that we haven't enumerated here that you might find useful as you gain experience authoring with it.
Most importantly, while the visual editor displays your content with formatting, under the hood, it saves your content in plain Markdown and you can switch back and forth between the visual and source editors to view and edit your content using either tool.
4. In the visual editor, go to Insert \> Citation and insert a citation to the paper titled [Welcome to the Tidyverse](https://joss.theoj.org/papers/10.21105/joss.01686) using its DOI (digital object identifier), which is [10.21105/joss.01686](https://doi.org/10.21105/joss.01686). Render the document and observe how the reference shows up in the document. What change do you observe in the YAML of your document?
You can also edit Quarto documents using the Source editor in RStudio, without the assistance of the Visual editor.
While the Visual editor will feel familiar to those with experience writing in tools like Google Docs, the Source editor will feel familiar to those with experience writing R scripts or R Markdown documents.
The Source editor can also be useful for debugging any Quarto syntax errors since it's often easier to catch these in plain text.
The guide below shows how to use Pandoc's Markdown for authoring Quarto documents in the source editor.
4. Create a document in Google Docs or MS Word (or locate a document you have created previously) with some content in it such as headings, hyperlinks, formatted text, etc.
Copy the contents of this document and paste it into a Quarto document in the visual editor.
Then, switch over to the source editor and inspect the source code.
To run code inside a Quarto document, you need to insert a chunk.
There are three ways to do so:
1. The keyboard shortcut Cmd + Option + I / Ctrl + Alt + I.
2. The "Insert" button icon in the editor toolbar.
3. By manually typing the chunk delimiters ```` ```{r} ```` and ```` ``` ````.
We'd recommend you learn the keyboard shortcut.
It will save you a lot of time in the long run!
You can continue to run the code using the keyboard shortcut that by now (we hope!) you know and love: Cmd/Ctrl + Enter.
However, chunks get a new keyboard shortcut: Cmd/Ctrl + Shift + Enter, which runs all the code in the chunk.
Think of a chunk like a function.
A chunk should be relatively self-contained, and focused around a single task.
The following sections describe the chunk header which consists of ```` ```{r} ````, followed by an optional chunk label and various other chunk options, each on their own line, marked by `#|`.
### Chunk label
Chunks can be given an optional label, e.g.
```{r}
#| echo: fenced
#| label: simple-addition
1 + 1
```
This has three advantages:
1. You can more easily navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor:
```{r}
#| echo: false
#| out-width: "30%"
#| fig-alt: |
#| Snippet of RStudio IDE showing only the drop-down code navigator
#| which shows three chunks. Chunk 1 is setup. Chunk 2 is cars and
#| it is in a section called Quarto. Chunk 3 is pressure and it is in
- `error: true` causes the render to continue even if code returns an error.
This is rarely something you'll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your `.qmd`.
It's also useful if you're teaching R and want to deliberately include an error.
Each of these chunk options gets added to the header of the chunk, following `#|`, e.g., in the following chunk the result is not printed since `eval` is set to false.
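A minimal sketch of such a chunk is shown here; the label `simple-multiplication` is only an illustrative assumption, and `echo: fenced` is included so that the chunk header stays visible in the rendered output:

```{r}
#| echo: fenced
#| label: simple-multiplication
#| eval: false
2 * 2
```

Because `eval` is false, the code is displayed but never run, so no result appears beneath it.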
As you work more with knitr, you will discover that some of the default chunk options don't fit your needs and you want to change them.
You can do this by adding the preferred options in the document YAML, under `execute`.
For example, if you are preparing a report for an audience who does not need to see your code but only your results and narrative, you might set `echo: false` at the document level.
That will hide the code by default, so only the chunks you deliberately choose to show (with `echo: true`) will appear.
You might consider setting `message: false` and `warning: false`, but that would make it harder to debug problems because you wouldn't see any messages in the final document.
Since Quarto is designed to be multi-lingual (it works with R as well as other languages like Python, Julia, etc.), not all of the knitr options are available at the document execution level: some of them only work with knitr and not with the other engines Quarto uses for running code in other languages (e.g., Jupyter).
To embed an image from an external file, you can use the Insert menu in RStudio and select Figure / Image.
This will pop open a menu where you can browse to the image you want to insert as well as add alternative text or caption to it and adjust its size.
In the visual editor you can also simply paste an image from your clipboard into your document and RStudio will place a copy of that image in your project folder.
If you include a code chunk that generates a figure (e.g., includes a `ggplot()` call), the resulting figure will be automatically included in your Quarto document.
Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e. height, width, and aspect ratio: pick two of three).
If you find that you're having to squint to read the text in your plot, you need to tweak `fig-width`.
If `fig-width` is larger than the size the figure is rendered in the final doc, the text will be too small; if `fig-width` is smaller, the text will be too big.
You'll often need to do a little experimentation to figure out the right ratio between the `fig-width` and the eventual width in your document.
To illustrate the principle, the following three plots have `fig-width` of 4, 6, and 8 respectively:
If you want to make sure the font size is consistent across all your figures, whenever you set `out-width`, you'll also need to adjust `fig-width` to maintain the same ratio with your default `out-width`.
For example, if your default `fig-width` is 6 and `out-width` is "70%", when you set `out-width: "50%"` you'll need to set `fig-width` to 4.3 (6 \* 0.5 / 0.7).
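The arithmetic above can be sketched as a small helper.
The function name `scaled_fig_width()` is hypothetical (it is not part of knitr or Quarto), and it assumes output widths are expressed as fractions of the page width:

```{r}
# Hypothetical helper: scale fig-width so that text size stays constant
# when out-width changes. Output widths are fractions of the page width.
scaled_fig_width <- function(default_fig_width, default_out_width, new_out_width) {
  default_fig_width * new_out_width / default_out_width
}

# Default fig-width of 6 at 70% output width, shrinking to 50%
round(scaled_fig_width(6, 0.7, 0.5), 1)
#> [1] 4.3
```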
Figure sizing and scaling is an art and science and getting things right can require an iterative trial-and-error approach.
You can learn more about figure sizing in the [taking control of plot scaling blog post](https://www.tidyverse.org/blog/2020/08/taking-control-of-plot-scaling/).
The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse them in other circumstances (e.g., if you want to quickly drop a single plot into an email).
1. Open `diamond-sizes.qmd` in the visual editor, find an image of a diamond, copy it, and paste it into the document. Double click on the image and add a caption. Resize the image and render your document. Observe how the image is saved in your current working directory.
2. Edit the label of the code chunk in `diamond-sizes.qmd` that generates a plot to start with the prefix `fig-` and add a caption to the figure with the chunk option `fig-cap`. Then, edit the text above the code chunk to add a cross-reference to the figure with Insert \> Cross Reference.
3. Change the size of the figure with the following chunk options, one at a time, render your document, and describe how the figure changes.
They can be markdown tables that you create directly in your Quarto document (using the Insert Table menu) or they can be tables generated as a result of a code chunk.
In this section we will focus on the latter, tables generated via computation.
By default, Quarto prints data frames and matrices as you'd see them in the console:
```{r}
mtcars[1:5, ]
```
If you prefer that data be displayed with additional formatting you can use the `knitr::kable()` function.
The code below generates @tbl-kable.
```{r}
#| label: tbl-kable
#| tbl-cap: A knitr kable.
knitr::kable(mtcars[1:5, ])
```
Read the documentation for `?knitr::kable` to see the other ways in which you can customize the table.
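As a rough illustration of that customization, the sketch below uses three `kable()` arguments: `digits`, `col.names`, and `caption`.
The choice of columns and the label text are our own assumptions, made only for the example:

```{r}
# Keep two columns so the relabelling is easy to follow
cars_subset <- mtcars[1:5, c("mpg", "wt")]

# Round to one decimal place, relabel the columns, and add a caption
formatted <- knitr::kable(
  cars_subset,
  digits = 1,
  col.names = c("Miles per gallon", "Weight (1000 lbs)"),
  caption = "First five rows of mtcars."
)
formatted
```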
For even deeper customization, consider the **gt**, **huxtable**, **reactable**, **kableExtra**, **xtable**, **stargazer**, **pander**, **tables**, and **ascii** packages.
Each provides a set of tools for returning formatted tables from R code.
1. Open `diamond-sizes.qmd` in the visual editor, insert a code chunk, and add a table with `knitr::kable()` that shows the first 5 rows of the `diamonds` data frame.
2. Display the same table with `gt::gt()` instead.
3. Add a chunk label that starts with the prefix `tbl-` and add a caption to the table with the chunk option `tbl-cap`. Then, edit the text above the code chunk to add a cross-reference to the table with Insert \> Cross Reference.
mutate(new_variable = complicated_transformation(x, y, z))
`r chunk`
Caching the `processed_data` chunk means that it will get rerun if the dplyr pipeline is changed, but it won't get rerun if the `read_csv()` call changes.
You can avoid that problem with the `dependson` chunk option:
```{r}
#| label: processed-data
#| cache: true
#| dependson: "raw-data"
processed_data <- rawdata |>
  filter(!is.na(import_var)) |>
  mutate(new_variable = complicated_transformation(x, y, z))
```
`dependson` should contain a character vector of *every* chunk that the cached chunk depends on.
Knitr will update the results for the cached chunk whenever it detects that one of its dependencies has changed.
Note that the chunks won't update if `a_very_large_file.csv` changes, because knitr caching only tracks changes within the `.qmd` file.
If you want to also track changes to that file you can use the `cache.extra` option.
This is an arbitrary R expression that will invalidate the cache whenever it changes.
A good function to use is `file.info()`: it returns a bunch of information about the file including when it was last modified.
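To see why `file.info()` works as a cache key, here is a small sketch that uses a temporary file as a stand-in for `a_very_large_file.csv`.
In a chunk header, we believe the option would be written along the lines of `#| cache.extra: !expr file.info("a_very_large_file.csv")`, but treat that exact form as an assumption to check against the Quarto documentation:

```{r}
# Stand-in for a_very_large_file.csv: a small temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), path)

# file.info() returns a data frame with size, modification time, and more
before <- file.info(path)

# Appending a row changes the size (and typically the modification time),
# so the value of file.info(path) differs and the cache would be invalidated
write("3,4", path, append = TRUE)
after <- file.info(path)

identical(before$size, after$size)
#> [1] FALSE
```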
We've followed the advice of [David Robinson](https://twitter.com/drob/status/738786604731490304) to name these chunks: each chunk is named after the primary object that it creates.
This makes it easier to understand the `dependson` specification.
1. Set up a network of chunks where `d` depends on `c` and `b`, and both `b` and `c` depend on `a`. Have each chunk print `lubridate::now()`, set `cache: true`, then verify your understanding of caching.
Troubleshooting Quarto documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks.
Additionally, the error could be due to issues with the Quarto document itself or due to the R code in the Quarto document.
One common error in documents with code chunks is duplicated chunk labels, which are especially pervasive if your workflow involves copying and pasting code chunks.
To address this issue, all you need to do is to change one of your duplicated labels.
If the errors are due to the R code in the document, the first thing you should always try is to recreate the problem in an interactive session.
Restart R, then "Run all chunks", either from the Code menu (under Run region) or with the keyboard shortcut Ctrl + Alt + R.
If you're lucky, that will recreate the problem, and you can figure out what's going on interactively.
If that doesn't help, there must be something different between your interactive environment and the Quarto environment.
You're going to need to systematically explore the options.
The most common difference is the working directory: the working directory of a Quarto document is the directory in which it lives.
Check that the working directory is what you expect by including `getwd()` in a chunk.
Next, brainstorm all the things that might cause the bug.
You'll need to systematically check that they're the same in your R session and your Quarto session.
The easiest way to do that is to set `error: true` on the chunk causing the problem, then use `print()` and `str()` to check that settings are as you expect.
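A diagnostic chunk along these lines might look like the following sketch.
The particular settings inspected here are our own suggestions, not an exhaustive list; run the same code in the console and compare the output with what appears in the rendered document:

```{r}
#| error: true
# Print settings that commonly differ between an interactive session
# and a Quarto render
print(getwd())          # working directory used for relative paths
str(.libPaths())        # where packages are loaded from
str(Sys.getlocale())    # locale, which can affect sorting and parsing
str(options("digits"))  # one example of a global option worth comparing
```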
## YAML header
You can control many other "whole document" settings by tweaking the parameters of the YAML header.
You might wonder what YAML stands for: it's "YAML Ain't Markup Language", which is designed for representing hierarchical data in a way that's easy for humans to read and write.
HTML documents typically have a number of external dependencies (e.g., images, CSS style sheets, JavaScript, etc.) and, by default, Quarto places these dependencies in a `_files` folder in the same directory as your `.qmd` file.
If you publish the HTML file on a hosting platform (e.g., Quarto Pub, <https://quartopub.com/>), the dependencies in this directory are published with your document and hence are available in the published report.
However, if you want to email the report to a colleague, you might prefer to have a single, self-contained, HTML document that embeds all of its dependencies.
You can do this by specifying the `embed-resources` option.
For example, to render `report.qmd` as a single self-contained HTML file, add the following to its YAML header:
``` yaml
format:
  html:
    embed-resources: true
```
The resulting file will be self-contained, such that it will need no external files and no internet access to be displayed properly by a browser.
### Parameters
Quarto documents can include one or more parameters whose values can be set when you render the report.
Parameters are useful when you want to re-render the same report with distinct values for various key inputs.
For example, you might be producing sales reports per branch, exam results by student, or demographic summaries by country.
To declare one or more parameters, use the `params` field.
This example uses a `my_class` parameter to determine which class of cars to display:
```{r}
#| echo: false
#| out-width: "100%"
#| comment: ""
cat(readr::read_file("quarto/fuel-economy.qmd"))
```
As you can see, parameters are available within the code chunks as a read-only list named `params`.
You can write atomic vectors directly into the YAML header.
You can also run arbitrary R expressions by prefacing the parameter value with `!r`.
This is a good way to specify date/time parameters.
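The sketch below illustrates how a chunk might read a parameter.
Because `params` only exists while a parameterized document is being rendered, the example builds a hand-made stand-in called `demo_params`; both that name and the use of `ggplot2::mpg` as the data source are assumptions made for illustration:

```{r}
# Stand-in for the read-only `params` list available while rendering;
# in a real document you would write params$my_class instead
demo_params <- list(my_class = "suv")

# Keep only the cars in the requested class (assumes ggplot2 is installed)
class_data <- subset(ggplot2::mpg, class == demo_params$my_class)
nrow(class_data)
```

Rendering the same document with a different `my_class` value would change which rows end up in `class_data` without touching the code.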
If you add a citation using one of the first three methods, the visual editor will automatically create a `bibliography.bib` file for you and add the reference to it.
It will also add a `bibliography` field to the document YAML.
As you add more references, this file will get populated with their citations.
You can also directly edit this file using many common bibliography formats including BibLaTeX, BibTeX, EndNote, and Medline.
To create a citation within your .qmd file in the source editor, use a key composed of '\@' + the citation identifier from the bibliography file.
Then place the citation in square brackets.
Here are some examples:
``` markdown
Separate multiple citations with a `;`: Blah blah [@smith04; @doe99].
You can add arbitrary comments inside the square brackets:
Blah blah [see @doe99, pp. 33-35; also @smith04, ch. 1].
Remove the square brackets to create an in-text citation: @smith04
says blah, or @smith04 [p. 33] says blah.
Add a `-` before the citation to suppress the author's name:
Smith says blah [-@smith04].
```
When Quarto renders your file, it will build and append a bibliography to the end of your document.
The bibliography will contain each of the cited references from your bibliography file, but it will not contain a section heading.
As a result it is common practice to end your file with a section header for the bibliography, such as `# References` or `# Bibliography`.
You can change the style of your citations and bibliography by referencing a CSL (citation style language) file in the `csl` field:
``` yaml
bibliography: rmarkdown.bib
csl: apa.csl
```
As with the `bibliography` field, the `csl` field should contain a path to the file.
Here we assume that the csl file is in the same directory as the .qmd file.
A good place to find CSL style files for common bibliography styles is <https://github.com/citation-style-language/styles>.
Earlier, we discussed a basic workflow for capturing your R code where you work interactively in the *console*, then capture what works in the *script editor*.
Quarto brings together the console and the script editor, blurring the lines between interactive exploration and long-term code capture.
You can rapidly iterate within a chunk, editing and re-executing with Cmd/Ctrl + Shift + Enter.
When you're happy, you move on and start a new chunk.
Quarto is also important because it so tightly integrates prose and code.
This makes it a great **analysis notebook** because it lets you develop code and record your thoughts.
An analysis notebook shares many of the same goals as a classic lab notebook in the physical sciences.
It:
- Records what you did and why you did it.
Regardless of how great your memory is, if you don't record what you do, there will come a time when you have forgotten important details.
Write them down so you don't forget!
- Supports rigorous thinking.
You are more likely to come up with a strong analysis if you record your thoughts as you go, and continue to reflect on them.
This also saves you time when you eventually write up your analysis to share with others.
- Helps others understand your work.
It is rare to do data analysis by yourself, and you'll often be working as part of a team.
A lab notebook helps you share not only what you've done, but why you did it with your colleagues or lab mates.
Much of the good advice about using lab notebooks effectively can also be translated to analysis notebooks.
We've drawn on our own experiences and Colin Purrington's advice on lab notebooks (<https://colinpurrington.com/tips/lab-notebooks>) to come up with the following tips:
- Ensure each notebook has a descriptive title, an evocative file name, and a first paragraph that briefly describes the aims of the analysis.
- Use the YAML header date field to record the date you started working on the notebook:
``` yaml
date: 2016-08-23
```
Use ISO8601 YYYY-MM-DD format so that there's no ambiguity.
Use it even if you don't normally write dates that way!
- If you spend a lot of time on an analysis idea and it turns out to be a dead end, don't delete it!
Write up a brief note about why it failed and leave it in the notebook.
That will help you avoid going down the same dead end when you come back to the analysis in the future.
- Generally, you're better off doing data entry outside of R.
But if you do need to record a small snippet of data, clearly lay it out using `tibble::tribble()`.
- If you discover an error in a data file, never modify it directly, but instead write code to correct the value.
Explain why you made the fix.
- Before you finish for the day, make sure you can render the notebook.
If you're using caching, make sure to clear the caches.
That will let you fix any problems while the code is still fresh in your mind.
- If you want your code to be reproducible in the long-run (i.e. so you can come back to run it next month or next year), you'll need to track the versions of the packages that your code uses.
A rigorous approach is to use **renv**, <https://rstudio.github.io/renv/index.html>, which stores packages in your project directory.
A quick and dirty hack is to include a chunk that runs `sessionInfo()` --- that won't let you easily recreate your packages as they are today, but at least you'll know what they were.
- You are going to create many, many, many analysis notebooks over the course of your career.
How are you going to organize them so you can find them again in the future?
We recommend storing them in individual projects, and coming up with a good naming scheme.
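Two of the tips above lend themselves to a short sketch: entering a small snippet of data with `tibble::tribble()`, and recording package versions with `sessionInfo()`.
The data values and the name `field_notes` are hypothetical, invented only to show the layout:

```{r}
library(tibble)

# Hypothetical hand-entered measurements, laid out one row per line
field_notes <- tribble(
  ~site,   ~visit_date,  ~count,
  "north", "2016-08-23", 12,
  "south", "2016-08-23",  7,
  "east",  "2016-08-24",  9
)
field_notes

# Record the R and package versions used for this analysis
sessionInfo()
```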
This chapter introduced you to Quarto for authoring and publishing reproducible computational documents that include your code and your prose in one place.
You've learned about writing Quarto documents in RStudio with the visual or the source editor, how code chunks work and how to customize options for them, how to include figures and tables in your Quarto documents, and options for caching computations.
Additionally, you've learned about adjusting YAML header options for creating self-contained or parameterized documents as well as including citations and a bibliography.
We have also given you some troubleshooting and workflow tips.
To improve your writing, we highly recommend reading either [*Style: Lessons in Clarity and Grace*](https://www.amazon.com/Style-Lessons-Clarity-Grace-12th/dp/0134080416) by Joseph M. Williams & Joseph Bizup, or [*The Sense of Structure: Writing from the Reader's Perspective*](https://www.amazon.com/Sense-Structure-Writing-Readers-Perspective/dp/0205296327) by George Gopen.
Both books will help you understand the structure of sentences and paragraphs, and give you the tools to make your writing clearer.
(These books are rather expensive if purchased new, but they're used by many English classes so there are plenty of cheap second-hand copies).
George Gopen also has a number of short articles on writing at <https://www.georgegopen.com/the-litigation-articles.html>.
They are aimed at lawyers, but almost everything applies to data scientists too.