<section data-type="chapter" id="chp-quarto">
<h1><span id="sec-quarto" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Quarto</span></span></h1>
<section id="quarto-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>Quarto provides a unified authoring framework for data science, combining your code, its results, and your prose. Quarto documents are fully reproducible and support dozens of output formats, like PDFs, Word files, presentations, and more.</p>
<p>Quarto files are designed to be used in three ways:</p>
<ol type="1"><li><p>For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.</p></li>
<li><p>For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code).</p></li>
<li><p>As an environment in which to <em>do</em> data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.</p></li>
</ol><p>Quarto is a command line interface tool, not an R package. This means that help is, by-and-large, not available through <code>?</code>. Instead, as you work through this chapter, and use Quarto in the future, you should refer to the Quarto documentation page at <a href="https://quarto.org/" class="uri">https://quarto.org</a> for help.</p>
<p>If you’re an R Markdown user, you might be thinking “Quarto sounds a lot like R Markdown”. You’re not wrong! Quarto unifies the functionality of many packages from the R Markdown ecosystem (rmarkdown, bookdown, distill, xaringan, etc.) into a single consistent system and extends it with native support for multiple programming languages, such as Python and Julia, in addition to R. In a way, Quarto reflects everything that was learned from expanding and supporting the R Markdown ecosystem over a decade.</p>
<section id="quarto-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>You need the Quarto command line interface (Quarto CLI), but you don’t need to explicitly install it or load it, as RStudio automatically does both when needed.</p>
</section>
</section>
<section id="quarto-basics" data-type="sect1">
<h1>
Quarto basics</h1>
<p>This is a Quarto file, a plain text file that has the extension <code>.qmd</code>:</p>
<div class="cell">
<pre><code>---
title: "Diamond sizes"
date: 2022-09-12
format: html
---
```{r}
#| label: setup
#| include: false
library(tidyverse)
smaller &lt;- diamonds |&gt;
  filter(carat &lt;= 2.5)
```
We have data about `r nrow(diamonds)` diamonds.
Only `r nrow(diamonds) - nrow(smaller)` are larger than 2.5 carats.
The distribution of the remainder is shown below:
```{r}
#| label: plot-smaller-diamonds
#| echo: false
smaller |&gt;
  ggplot(aes(x = carat)) +
  geom_freqpoly(binwidth = 0.01)
```</code></pre>
</div>
<p>It contains three important types of content:</p>
<ol type="1"><li>An (optional) <strong>YAML header</strong> surrounded by <code>---</code>s.</li>
<li>
<strong>Chunks</strong> of R code surrounded by <code>```</code>.</li>
<li>Text mixed with simple text formatting like <code># heading</code> and <code>_italics_</code>.</li>
</ol><p>When you open a <code>.qmd</code>, you get a notebook interface where code and output are interleaved. You can run each code chunk by clicking the Run icon (it looks like a play button at the top of the chunk), or by pressing Cmd/Ctrl + Shift + Enter. RStudio executes the code and displays the results inline with the code:</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto/diamond-sizes-notebook.png" class="img-fluid" style="width:90.0%" alt="RStudio window with a Quarto document titled &quot;diamond-sizes.qmd&quot; on the left and a blank Viewer window on the right. The Quarto document has a code chunk that creates a frequency plot of diamonds that weigh less than 2.5 carats. The plot shows that the frequency decreases as the weight increases."/></p>
</div>
</div>
<p>If you don’t like seeing your plots and output in your document and would rather make use of RStudio’s console and plot panes, you can click on the gear icon next to “Render” and switch to “Chunk Output in Console”.</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto/diamond-sizes-console-output.png" class="img-fluid" style="width:90.0%" alt="RStudio window with a Quarto document titled &quot;diamond-sizes.qmd&quot; on the left and the Plot pane on the bottom right. The Quarto document has a code chunk that creates a frequency plot of diamonds that weigh less than 2.5 carats. The plot is displayed in the Plot pane and shows that the frequency decreases as the weight increases. The RStudio option to show Chunk Output in Console is also highlighted."/></p>
</div>
</div>
<p>To produce a complete report containing all text, code, and results, click “Render” or press Cmd/Ctrl + Shift + K. You can also do this programmatically with <code>quarto::quarto_render("diamond-sizes.qmd")</code>. This will display the report in the viewer pane and create an HTML file.</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto/diamond-sizes-report.png" class="img-fluid" style="width:90.0%" alt="RStudio window with a Quarto document titled &quot;diamond-sizes.qmd&quot; on the left and the Plot pane on the bottom right. The rendered document does not show any of the code, but the code is visible in the source document."/></p>
</div>
</div>
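<p>If you render the same report repeatedly, the programmatic route can be wrapped in a small helper. The sketch below is illustrative only: <code>render_report()</code> is a hypothetical name, and it assumes the quarto R package and the Quarto CLI are installed. It relies on the <code>output_format</code> argument of <code>quarto::quarto_render()</code>, which falls back to the format declared in the YAML header when left as <code>NULL</code>.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r"># Hypothetical helper, not part of the quarto package.
render_report &lt;- function(input, format = NULL) {
  # Fail early with a readable message if the file is missing
  if (!file.exists(input)) {
    stop("Can't find ", input, call. = FALSE)
  }
  # format = NULL renders the format declared in the YAML header
  quarto::quarto_render(input, output_format = format)
}

# Example calls (commented out so that nothing renders by accident):
# render_report("diamond-sizes.qmd")          # HTML, as set in the header
# render_report("diamond-sizes.qmd", "docx")  # override with Word output</pre>
</div>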
<p>When you render the document, Quarto sends the <code>.qmd</code> file to <strong>knitr</strong>, <a href="https://yihui.name/knitr/" class="uri">https://yihui.name/knitr</a>, which executes all of the code chunks and creates a new markdown (<code>.md</code>) document which includes the code and its output. The markdown file generated by knitr is then processed by <strong>pandoc</strong>, <a href="https://pandoc.org/" class="uri">https://pandoc.org</a>, which is responsible for creating the finished file. The advantage of this two-step workflow is that you can create a very wide range of output formats, as you’ll learn about in <a href="#chp-quarto-formats" data-type="xref">#chp-quarto-formats</a>.</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="images/quarto-flow.png" class="img-fluid" style="width:75.0%" alt="Workflow diagram starting with a qmd file, then knitr, then md, then pandoc, then PDF, MS Word, or HTML."/></p>
</div>
</div>
<p>To get started with your own <code>.qmd</code> file, select <em>File &gt; New File &gt; Quarto Document…</em> in the menu bar. RStudio will launch a wizard that you can use to pre-populate your file with useful content that reminds you how the key features of Quarto work.</p>
<p>The following sections dive into the three components of a Quarto document in more detail: the markdown text, the code chunks, and the YAML header.</p>
<section id="quarto-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Create a new Quarto document using <em>File &gt; New File &gt; Quarto Document</em>. Read the instructions. Practice running the chunks individually. Then render the document by clicking the appropriate button and then by using the appropriate keyboard shortcut. Verify that you can modify the code, re-run it, and see modified output.</p></li>
<li><p>Create one new Quarto document for each of the three built-in formats: HTML, PDF and Word. Render each of the three documents. How do the outputs differ? How do the inputs differ? (You may need to install LaTeX in order to build the PDF output — RStudio will prompt you if this is necessary.)</p></li>
</ol></section>
</section>
<section id="visual-editor" data-type="sect1">
<h1>
Visual editor</h1>
<p>The Visual editor in RStudio provides a <a href="https://en.wikipedia.org/wiki/WYSIWYM">WYSIWYM</a> interface for authoring Quarto documents. Under the hood, prose in Quarto documents (<code>.qmd</code> files) is written in Markdown, a lightweight set of conventions for formatting plain text files. In fact, Quarto uses Pandoc markdown (a slightly extended version of Markdown that Quarto understands), which includes tables, citations, cross-references, footnotes, divs/spans, definition lists, attributes, raw HTML/TeX, and more, as well as support for executing code cells and viewing their output inline. While Markdown is designed to be easy to read and write, as you will see in <a href="#sec-source-editor" data-type="xref">#sec-source-editor</a>, it still requires learning new syntax. Therefore, if you’re new to computational documents like <code>.qmd</code> files but have experience using tools like Google Docs or MS Word, the easiest way to get started with Quarto in RStudio is the visual editor.</p>
<p>In the visual editor you can either use the buttons on the menu bar to insert images, tables, cross-references, etc. or you can use the catch-all <kbd>⌘ /</kbd> shortcut to insert just about anything. If you are at the beginning of a line (as shown below), you can also enter just <kbd>/</kbd> to invoke the shortcut.</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto/quarto-visual-editor.png" class="img-fluid" style="width:75.0%" alt="A Quarto document displaying various features of the visual editor such as text formatting (italic, bold, underline, small caps, code, superscript, and subscript), first through third level headings, bulleted and numbered lists, links, linked phrases, and images (along with a pop-up window for customizing image size, adding a caption and alt text, etc.), tables with a header row, and the insert anything tool with options to insert an R code chunk, a Python code chunk, a div, a bullet list, a numbered list, or a first level heading (the top few choices in the tool)."/></p>
</div>
</div>
<p>Inserting images and customizing how they are displayed is also facilitated with the visual editor. You can either paste an image from your clipboard directly into the visual editor (and RStudio will place a copy of that image in the project directory and link to it) or you can use the visual editor’s Insert &gt; Figure / Image menu to browse to the image you want to insert or paste its URL. In addition, using the same menu you can resize the image as well as add a caption, alternative text, and a link.</p>
<p>The visual editor has many more features that we haven’t enumerated here that you might find useful as you gain experience authoring with it.</p>
<p>Most importantly, while the visual editor displays your content with formatting, under the hood, it saves your content in plain Markdown and you can switch back and forth between the visual and source editors to view and edit your content using either tool.</p>
<section id="quarto-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<!--# TO DO: Add exercises. -->
</section>
</section>
<section id="sec-source-editor" data-type="sect1">
<h1>
Source editor</h1>
<p>You can also edit Quarto documents using the Source editor in RStudio, without the assistance of the Visual editor. While the Visual editor will feel familiar to those with experience writing in tools like Google Docs, the Source editor will feel familiar to those with experience writing R scripts or R Markdown documents. The Source editor can also be useful for debugging any Quarto syntax errors since it’s often easier to catch these in plain text.</p>
<p>The guide below shows how to use Pandoc’s Markdown for authoring Quarto documents in the source editor.</p>
<div class="cell">
<pre><code>## Text formatting

*italic* **bold** [underline]{.underline} ~~strikeout~~ [small caps]{.smallcaps} `code` superscript^2^ and subscript~2~

## Headings

# 1st Level Header

## 2nd Level Header

### 3rd Level Header

## Lists

- Bulleted list item 1
- Item 2
    - Item 2a
    - Item 2b

1. Numbered list item 1
2. Item 2.
   The numbers are incremented automatically in the output.

## Links and images

&lt;http://example.com&gt;

[linked phrase](http://example.com)

![optional caption text](quarto.png){fig-alt="Quarto logo and the word quarto spelled in small case letters"}

## Tables

| First Header | Second Header |
|--------------|---------------|
| Content Cell | Content Cell  |
| Content Cell | Content Cell  |</code></pre>
</div>
<p>The best way to learn these is simply to try them out. It will take a few days, but soon they will become second nature, and you won’t need to think about them. If you forget, you can get to a handy reference sheet with <em>Help &gt; Markdown Quick Reference</em>.</p>
<section id="quarto-exercises-2" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Practice what you’ve learned by creating a brief CV. The title should be your name, and you should include headings for (at least) education or employment. Each of the sections should include a bulleted list of jobs/degrees. Highlight the year in bold.</p></li>
<li>
<p>Using the visual editor, figure out how to:</p>
<ol type="a"><li>Add a footnote.</li>
<li>Add a horizontal rule.</li>
<li>Add a block quote.</li>
</ol></li>
<li>
<p>Now, using the source editor and the Markdown quick reference, figure out how to:</p>
<ol type="a"><li>Add a footnote.</li>
<li>Add a horizontal rule.</li>
<li>Add a block quote.</li>
</ol></li>
<li><p>Copy and paste the contents of <code>diamond-sizes.qmd</code> from <a href="https://github.com/hadley/r4ds/tree/main/quarto" class="uri">https://github.com/hadley/r4ds/tree/main/quarto</a> into a local Quarto document. Check that you can run it, then add text after the frequency polygon that describes its most striking features.</p></li>
</ol></section>
</section>
<section id="code-chunks" data-type="sect1">
<h1>
Code chunks</h1>
<p>To run code inside a Quarto document, you need to insert a chunk. There are three ways to do so:</p>
<ol type="1"><li><p>The keyboard shortcut Cmd + Option + I / Ctrl + Alt + I.</p></li>
<li><p>The “Insert” button icon in the editor toolbar.</p></li>
<li><p>By manually typing the chunk delimiters <code>```{r}</code> and <code>```</code>.</p></li>
</ol><p>We’d recommend you learn the keyboard shortcut. It will save you a lot of time in the long run!</p>
<p>You can continue to run the code using the keyboard shortcut that by now (we hope!) you know and love: Cmd/Ctrl + Enter. However, chunks get a new keyboard shortcut: Cmd/Ctrl + Shift + Enter, which runs all the code in the chunk. Think of a chunk like a function. A chunk should be relatively self-contained, and focused around a single task.</p>
<p>The following sections describe the chunk header which consists of <code>```{r}</code>, followed by an optional chunk label and various other chunk options, each on their own line, marked by <code>#|</code>.</p>
<section id="chunk-label" data-type="sect2">
<h2>
Chunk label</h2>
<p>Chunks can be given an optional label, e.g.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="markdown">```{r}
#| label: simple-addition
1 + 1
```</pre>
<pre><code>#&gt; [1] 2</code></pre>
</div>
<p>This has three advantages:</p>
<ol type="1"><li>
<p>You can more easily navigate to specific chunks using the drop-down code navigator in the bottom-left of the script editor:</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="screenshots/quarto-chunk-nav.png" class="img-fluid" style="width:30.0%" alt="Snippet of RStudio IDE showing only the drop-down code navigator which shows three chunks. Chunk 1 is setup. Chunk 2 is cars and it is in a section called Quarto. Chunk 3 is pressure and it is in a section called Including plots."/></p>
</div>
</div>
</li>
<li><p>Graphics produced by the chunks will have useful names that make them easier to use elsewhere. More on that in <a href="#sec-figures" data-type="xref">#sec-figures</a>.</p></li>
<li><p>You can set up networks of cached chunks to avoid re-performing expensive computations on every run. More on that in <a href="#sec-caching" data-type="xref">#sec-caching</a>.</p></li>
</ol><p>Your chunk labels should be short but evocative and should not contain spaces. We recommend using dashes (<code>-</code>) to separate words (instead of underscores, <code>_</code>) and avoiding other special characters in chunk labels.</p>
<p>You are generally free to label your chunk however you like, but there is one chunk name that imbues special behavior: <code>setup</code>. When you’re in notebook mode, the chunk named <code>setup</code> will be run automatically once, before any other code is run.</p>
<p>Additionally, chunk labels cannot be duplicated. Each chunk label must be unique.</p>
</section>
<section id="chunk-options" data-type="sect2">
<h2>
Chunk options</h2>
<p>Chunk output can be customized with <strong>options</strong>, fields supplied to the chunk header. Knitr provides almost 60 options that you can use to customize your code chunks. Here we’ll cover the most important chunk options that you’ll use frequently. You can see the full list at <a href="https://yihui.name/knitr/options/" class="uri">https://yihui.name/knitr/options</a>.</p>
<p>The most important set of options controls if your code block is executed and what results are inserted in the finished report:</p>
<ul><li><p><code>eval: false</code> prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.</p></li>
<li><p><code>include: false</code> runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.</p></li>
<li><p><code>echo: false</code> prevents code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.</p></li>
<li><p><code>message: false</code> or <code>warning: false</code> prevents messages or warnings from appearing in the finished file.</p></li>
<li><p><code>results: hide</code> hides printed output; <code>fig-show: hide</code> hides plots.</p></li>
<li><p><code>error: true</code> causes the render to continue even if code returns an error. This is rarely something you’ll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your <code>.qmd</code>. It’s also useful if you’re teaching R and want to deliberately include an error. The default, <code>error: false</code>, causes rendering to fail if there is a single error in the document.</p></li>
</ul><p>Each of these chunk options gets added to the header of the chunk, following <code>#|</code>, e.g. in the following chunk the result is not printed since <code>eval</code> is set to false.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="markdown">```{r}
#| label: simple-multiplication
#| eval: false
2 * 2
```</pre>
</div>
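<p>Options can be stacked, one per <code>#|</code> line. As a further sketch (the label and the failing expression are made up for illustration), the chunk below sets <code>error: true</code> so that the render carries on and shows the error message instead of stopping:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="markdown">```{r}
#| label: deliberate-error
#| error: true

# Adding a number to a string is an error in R
"a" + 1
```</pre>
</div>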
<p>The following table summarizes which types of output each option suppresses; a dash marks what is suppressed:</p>
<table class="table"><colgroup><col style="width: 24%"/><col style="width: 13%"/><col style="width: 14%"/><col style="width: 10%"/><col style="width: 9%"/><col style="width: 13%"/><col style="width: 13%"/></colgroup><thead><tr class="header"><th>Option</th>
<th style="text-align: center;">Run code</th>
<th style="text-align: center;">Show code</th>
<th style="text-align: center;">Output</th>
<th style="text-align: center;">Plots</th>
<th style="text-align: center;">Messages</th>
<th style="text-align: center;">Warnings</th>
</tr></thead><tbody><tr class="odd"><td><code>eval: false</code></td>
<td style="text-align: center;">-</td>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
</tr><tr class="even"><td><code>include: false</code></td>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
<td style="text-align: center;">-</td>
</tr><tr class="odd"><td><code>echo: false</code></td>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
</tr><tr class="even"><td><code>results: hide</code></td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
</tr><tr class="odd"><td><code>fig-show: hide</code></td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
</tr><tr class="even"><td><code>message: false</code></td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
<td style="text-align: center;"/>
</tr><tr class="odd"><td><code>warning: false</code></td>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;"/>
<td style="text-align: center;">-</td>
</tr></tbody></table></section>
<section id="global-options" data-type="sect2">
<h2>
Global options</h2>
<p>As you work more with knitr, you will discover that some of the default chunk options don’t fit your needs and you want to change them.</p>
<p>You can do this by adding the preferred options in the document YAML, under <code>execute</code>. For example, if you are preparing a report for an audience who does not need to see your code but only your results and narrative, you might set <code>echo: false</code> at the document level. That will hide the code by default, showing only the chunks you deliberately choose to show (with <code>echo: true</code>). You might consider setting <code>message: false</code> and <code>warning: false</code>, but that would make it harder to debug problems because you wouldn’t see any messages in the final document.</p>
<pre data-type="programlisting" data-code-language="yaml">title: "My report"
execute:
  echo: false</pre>
<p>Since Quarto is designed to be multi-lingual (it works with R as well as other languages like Python, Julia, etc.), not all knitr options are available at the document execution level: some of them only work with knitr and not with the other engines Quarto uses for running code in other languages (e.g. Jupyter). You can, however, still set these as global options for your document under the <code>knitr</code> field, under <code>opts_chunk</code>. For example, when writing books and tutorials we set:</p>
<pre data-type="programlisting" data-code-language="yaml">title: "Tutorial"
knitr:
  opts_chunk:
    comment: "#&gt;"
    collapse: true</pre>
<p>This uses our preferred comment formatting and ensures that the code and output are kept closely entwined.</p>
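<p>With the knitr engine, the same defaults can also be set from R code, which you may recognize from older R Markdown documents. This is a sketch of that alternative, assuming it is placed in a chunk near the top of the document (for example the <code>setup</code> chunk); it is not needed if you already set the options in the YAML header.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r"># Set knitr chunk defaults for every chunk that runs after this one
knitr::opts_chunk$set(
  comment = "#&gt;",   # prefix printed output with #&gt;
  collapse = TRUE   # merge code and its output into one block
)

# Check what is currently set
knitr::opts_chunk$get("comment")</pre>
</div>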
</section>
<section id="inline-code" data-type="sect2">
<h2>
Inline code</h2>
<p>There is one other way to embed R code into a Quarto document: directly into the text, with: <code>`r `</code>. This can be very useful if you mention properties of your data in the text. For example, the example document used at the start of the chapter had:</p>
<blockquote class="blockquote">
<p>We have data about <code>`r nrow(diamonds)`</code> diamonds. Only <code>`r nrow(diamonds) - nrow(smaller)`</code> are larger than 2.5 carats. The distribution of the remainder is shown below:</p>
</blockquote>
<p>When the report is rendered, the results of these computations are inserted into the text:</p>
<blockquote class="blockquote">
<p>We have data about 53940 diamonds. Only 126 are larger than 2.5 carats. The distribution of the remainder is shown below:</p>
</blockquote>
<p>When inserting numbers into text, <code><a href="https://rdrr.io/r/base/format.html">format()</a></code> is your friend. It allows you to set the number of <code>digits</code> so you don’t print to a ridiculous degree of accuracy, and a <code>big.mark</code> to make numbers easier to read. You might combine these into a helper function:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">comma &lt;- function(x) format(x, digits = 2, big.mark = ",")
comma(3452345)
#&gt; [1] "3,452,345"
comma(.12358124331)
#&gt; [1] "0.12"</pre>
</div>
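<p>As a small sketch of how the helper fits into prose, the code below builds the first sentence of the example report as a string. The row count is hard-coded from the rendered text above so that the snippet runs without loading any data.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">comma &lt;- function(x) format(x, digits = 2, big.mark = ",")

# Row count taken from the rendered report above
n_diamonds &lt;- 53940

sentence &lt;- paste0("We have data about ", comma(n_diamonds), " diamonds.")
sentence
#&gt; [1] "We have data about 53,940 diamonds."</pre>
</div>
<p>In a <code>.qmd</code> file you would get the same effect by writing <code>`r comma(nrow(diamonds))`</code> inline, provided <code>comma()</code> is defined in an earlier chunk.</p>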
</section>
<section id="quarto-exercises-3" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Add a section that explores how diamond sizes vary by cut, color, and clarity. Assume you’re writing a report for someone who doesn’t know R, and instead of setting <code>echo: false</code> on each chunk, set a global option.</p></li>
<li><p>Download <code>diamond-sizes.qmd</code> from <a href="https://github.com/hadley/r4ds/tree/main/quarto" class="uri">https://github.com/hadley/r4ds/tree/main/quarto</a>. Add a section that describes the largest 20 diamonds, including a table that displays their most important attributes.</p></li>
<li><p>Modify <code>diamond-sizes.qmd</code> to use <code>label_comma()</code> to produce nicely formatted output. Also include the percentage of diamonds that are larger than 2.5 carats.</p></li>
</ol></section>
</section>
<section id="sec-figures" data-type="sect1">
<h1>
Figures</h1>
<p>The figures in a Quarto document can be embedded (e.g. a PNG or JPEG file) or generated as a result of a code chunk.</p>
<p>To embed an image from an external file, you can use the Insert menu in RStudio and select Figure / Image. This will pop open a menu where you can browse to the image you want to insert as well as add alternative text or caption to it and adjust its size. In the visual editor you can also simply paste an image from your clipboard into your document and RStudio will place a copy of that image in your project folder.</p>
<p>If you include a code chunk that generates a figure (e.g. includes a <code>ggplot()</code> call), the resulting figure will be automatically included in your Quarto document.</p>
<section id="figure-sizing" data-type="sect2">
<h2>
Figure sizing</h2>
<p>The biggest challenge of graphics in Quarto is getting your figures the right size and shape. There are five main options that control figure sizing: <code>fig-width</code>, <code>fig-height</code>, <code>fig-asp</code>, <code>out-width</code> and <code>out-height</code>. Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e. height, width, and aspect ratio: pick two of three).</p>
<!-- TODO: https://www.tidyverse.org/blog/2020/08/taking-control-of-plot-scaling/ -->
<p>We recommend three of the five options:</p>
<ul><li><p>Plots tend to be more aesthetically pleasing if they have consistent width. To enforce this, set <code>fig-width: 6</code> (6 inches) and <code>fig-asp: 0.618</code> (the golden ratio) in the defaults. Then in individual chunks, only adjust <code>fig-asp</code>.</p></li>
<li><p>Control the output size with <code>out-width</code> and set it to a percentage of the line width. We suggest <code>out-width: "70%"</code> and <code>fig-align: "center"</code>. That gives plots room to breathe, without taking up too much space.</p></li>
<li><p>To put multiple plots in a single row, set the <code>out-width</code> to <code>50%</code> for two plots, <code>33%</code> for three plots, or <code>25%</code> for four plots, and set <code>fig-align: "default"</code>. Depending on what you’re trying to illustrate (e.g. show data or show plot variations), you might also tweak <code>fig-width</code>, as discussed below.</p></li>
</ul><p>If you find that you’re having to squint to read the text in your plot, you need to tweak <code>fig-width</code>. If <code>fig-width</code> is larger than the size the figure is rendered in the final doc, the text will be too small; if <code>fig-width</code> is smaller, the text will be too big. You’ll often need to do a little experimentation to figure out the right ratio between the <code>fig-width</code> and the eventual width in your document. To illustrate the principle, the following three plots have <code>fig-width</code> of 4, 6, and 8 respectively:</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid" width="384"/></p>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid" width="576"/></p>
</div>
</div>
<div class="cell">
<div class="cell-output-display">
<p><img src="quarto_files/figure-html/unnamed-chunk-17-1.png" class="img-fluid" width="768"/></p>
</div>
</div>
<p>If you want to make sure the font size is consistent across all your figures, whenever you set <code>out-width</code>, you’ll also need to adjust <code>fig-width</code> to maintain the same ratio with your default <code>out-width</code>. For example, if your default <code>fig-width</code> is 6 and <code>out-width</code> is 70%, when you set <code>out-width: "50%"</code> you’ll need to set <code>fig-width</code> to 4.3 (6 * 0.5 / 0.7).</p>
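<p>The arithmetic in that example can be captured in a one-line helper. This is only a sketch: <code>scaled_fig_width()</code> is a hypothetical name, and output widths are passed as proportions of the line width (0.7 for 70%).</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r"># Keep the ratio fig-width / out-width constant so that plot text
# renders at the same size in every figure.
scaled_fig_width &lt;- function(new_out_width,
                             default_fig_width = 6,
                             default_out_width = 0.7) {
  default_fig_width * new_out_width / default_out_width
}

round(scaled_fig_width(0.5), 1)
#&gt; [1] 4.3</pre>
</div>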
</section>
<section id="other-important-options" data-type="sect2">
<h2>
Other important options</h2>
<p>When mingling code and text, like in this book, you can set <code>fig-show: "hold"</code> so that plots are shown after the code. This has the pleasant side effect of forcing you to break up large blocks of code with their explanations.</p>
<p>To add a caption to the plot, use <code>fig-cap</code>. In Quarto this will change the figure from inline to “floating”.</p>
<p>If you’re producing PDF output, the default graphics type is PDF. This is a good default because PDFs are high quality vector graphics. However, they can produce very large and slow plots if you are displaying thousands of points. In that case, set <code>fig-format: "png"</code> to force the use of PNGs. They are slightly lower quality, but will be much more compact.</p>
<p>It’s a good idea to name code chunks that produce figures, even if you don’t routinely label other chunks. The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse them in other circumstances (e.g. if you want to quickly drop a single plot into an email or a tweet).</p>
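<p>Putting the sizing advice and these options together, a figure chunk might look like the sketch below. The label and caption are made up for illustration, and the chunk assumes the <code>smaller</code> data frame from the example document at the start of the chapter.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="markdown">```{r}
#| label: carat-distribution
#| fig-cap: "Distribution of carat for diamonds of 2.5 carats or less."
#| fig-width: 6
#| fig-asp: 0.618
#| out-width: "70%"
#| fig-align: center

ggplot(smaller, aes(x = carat)) +
  geom_freqpoly(binwidth = 0.01)
```</pre>
</div>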
</section>
<section id="quarto-exercises-4" data-type="sect2">
<h2>
Exercises</h2>
<!--# TO DO: Add exercises -->
</section>
</section>
<section id="quarto-tables" data-type="sect1">
<h1>
Tables</h1>
<p>Similar to figures, you can include two types of tables in a Quarto document. They can be markdown tables that you create directly in your Quarto document (using the Insert Table menu) or they can be tables generated as a result of a code chunk. In this section we will focus on the latter, tables generated via computation.</p>
<p>By default, Quarto prints data frames and matrices as you’d see them in the console:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">mtcars[1:5, ]
#&gt;                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#&gt; Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#&gt; Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#&gt; Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#&gt; Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#&gt; Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2</pre>
</div>
<p>If you prefer that data be displayed with additional formatting you can use the <code><a href="https://rdrr.io/pkg/knitr/man/kable.html">knitr::kable()</a></code> function. The code below generates <a href="#tbl-kable" data-type="xref">#tbl-kable</a>.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">knitr::kable(mtcars[1:5, ])</pre>
<div class="cell-output-display">
<div id="tbl-kable" class="anchored">
<table class="table table-sm table-striped"><caption>Table 30.1: A knitr kable.</caption>
<colgroup><col style="width: 26%"/><col style="width: 7%"/><col style="width: 5%"/><col style="width: 7%"/><col style="width: 5%"/><col style="width: 7%"/><col style="width: 8%"/><col style="width: 8%"/><col style="width: 4%"/><col style="width: 4%"/><col style="width: 7%"/><col style="width: 7%"/></colgroup><thead><tr class="header"><th style="text-align: left;"/>
<th style="text-align: right;">mpg</th>
<th style="text-align: right;">cyl</th>
<th style="text-align: right;">disp</th>
<th style="text-align: right;">hp</th>
<th style="text-align: right;">drat</th>
<th style="text-align: right;">wt</th>
<th style="text-align: right;">qsec</th>
<th style="text-align: right;">vs</th>
<th style="text-align: right;">am</th>
<th style="text-align: right;">gear</th>
<th style="text-align: right;">carb</th>
</tr></thead><tbody><tr class="odd"><td style="text-align: left;">Mazda RX4</td>
<td style="text-align: right;">21.0</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">160</td>
<td style="text-align: right;">110</td>
<td style="text-align: right;">3.90</td>
<td style="text-align: right;">2.620</td>
<td style="text-align: right;">16.46</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">4</td>
</tr><tr class="even"><td style="text-align: left;">Mazda RX4 Wag</td>
<td style="text-align: right;">21.0</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">160</td>
<td style="text-align: right;">110</td>
<td style="text-align: right;">3.90</td>
<td style="text-align: right;">2.875</td>
<td style="text-align: right;">17.02</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">4</td>
</tr><tr class="odd"><td style="text-align: left;">Datsun 710</td>
<td style="text-align: right;">22.8</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">108</td>
<td style="text-align: right;">93</td>
<td style="text-align: right;">3.85</td>
<td style="text-align: right;">2.320</td>
<td style="text-align: right;">18.61</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">4</td>
<td style="text-align: right;">1</td>
</tr><tr class="even"><td style="text-align: left;">Hornet 4 Drive</td>
<td style="text-align: right;">21.4</td>
<td style="text-align: right;">6</td>
<td style="text-align: right;">258</td>
<td style="text-align: right;">110</td>
<td style="text-align: right;">3.08</td>
<td style="text-align: right;">3.215</td>
<td style="text-align: right;">19.44</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">1</td>
</tr><tr class="odd"><td style="text-align: left;">Hornet Sportabout</td>
<td style="text-align: right;">18.7</td>
<td style="text-align: right;">8</td>
<td style="text-align: right;">360</td>
<td style="text-align: right;">175</td>
<td style="text-align: right;">3.15</td>
<td style="text-align: right;">3.440</td>
<td style="text-align: right;">17.02</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">3</td>
<td style="text-align: right;">2</td>
</tr></tbody></table></div>
</div>
</div>
<p>Read the documentation for <code><a href="https://rdrr.io/pkg/knitr/man/kable.html">?knitr::kable</a></code> to see the other ways in which you can customize the table. For even deeper customization, consider the <strong>gt</strong>, <strong>huxtable</strong>, <strong>reactable</strong>, <strong>kableExtra</strong>, <strong>xtable</strong>, <strong>stargazer</strong>, <strong>pander</strong>, <strong>tables</strong>, and <strong>ascii</strong> packages. Each provides a set of tools for returning formatted tables from R code.</p>
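<p>As a small sketch of that customization, the call below rounds the numbers, supplies readable column headers, and adds a caption. The column selection, header text, and caption are illustrative choices rather than anything prescribed here; <code>digits</code>, <code>col.names</code>, and <code>caption</code> are documented arguments of <code>knitr::kable()</code>.</p>
<pre data-type="programlisting" data-code-language="r"># Keep a few columns so the table stays narrow.
cars_subset &lt;- mtcars[1:5, c("mpg", "cyl", "hp")]

knitr::kable(
  cars_subset,
  digits = 1,  # round numeric columns to one decimal place
  col.names = c("Miles per gallon", "Cylinders", "Horsepower"),
  caption = "Fuel economy for five cars."  # illustrative caption text
)</pre>
<p>Row names are kept as the first column, so <code>col.names</code> only needs one entry per data column.</p>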
<p>There is also a rich set of options for controlling how figures are embedded. Youll learn about these in <span class="quarto-unresolved-ref">?sec-graphics-communication</span>.</p>
<section id="quarto-exercises-5" data-type="sect2">
<h2>
Exercises</h2>
<!--# TO DO: Add exercises -->
</section>
</section>
<section id="sec-caching" data-type="sect1">
<h1>
Caching</h1>
<p>Normally, each render of a document starts from a completely clean slate. This is great for reproducibility, because it ensures that youve captured every important computation in code. However, it can be painful if you have some computations that take a long time. The solution is <code>cache: true</code>.</p>
<p>You can enable the Knitr cache at the document level for caching the results of all computations in a document using standard YAML options:</p>
<pre data-type="programlisting" data-code-language="yaml">---
title: "My Document"
execute:
  cache: true
---</pre>
<p>You can also enable caching at the chunk level for caching the results of computation in a specific chunk:</p>
<div class="cell" data-hash="quarto_cache/html/unnamed-chunk-20_0ece1c5566ef654926248351b9afb313">
<pre data-type="programlisting" data-code-language="markdown">```{r}
#| cache: true
# code for lengthy computation...
```</pre>
</div>
<p>When set, this will save the output of the chunk to a specially named file on disk. On subsequent runs, knitr will check to see if the code has changed, and if it hasnt, it will reuse the cached results.</p>
<p>The caching system must be used with care, because by default it is based on the code only, not its dependencies. For example, here the <code>processed-data</code> chunk depends on the <code>raw-data</code> chunk:</p>
<pre><code>```{r}
#| label: raw-data
rawdata &lt;- readr::read_csv("a_very_large_file.csv")
```

```{r}
#| label: processed-data
#| cache: true
processed_data &lt;- rawdata |&gt;
  filter(!is.na(import_var)) |&gt;
  mutate(new_variable = complicated_transformation(x, y, z))
```</code></pre>
<p>Caching the <code>processed-data</code> chunk means that it will get rerun if the dplyr pipeline is changed, but it won’t get rerun if the <code>read_csv()</code> call changes. You can avoid that problem with the <code>dependson</code> chunk option:</p>
<pre><code>```{r}
#| label: processed-data
#| cache: true
#| dependson: "raw-data"
processed_data &lt;- rawdata |&gt;
  filter(!is.na(import_var)) |&gt;
  mutate(new_variable = complicated_transformation(x, y, z))
```</code></pre>
<p><code>dependson</code> should contain a character vector of <em>every</em> chunk that the cached chunk depends on. Knitr will update the results for the cached chunk whenever it detects that one of its dependencies has changed.</p>
<p>Note that the chunks won’t update if <code>a_very_large_file.csv</code> changes, because knitr caching only tracks changes within the <code>.qmd</code> file. If you want to also track changes to that file you can use the <code>cache.extra</code> option. This is an arbitrary R expression that will invalidate the cache whenever it changes. Because chunk options written with <code>#|</code> are YAML, tag the expression with <code>!expr</code> so that it is evaluated as R code rather than stored as a string. A good function to use is <code><a href="https://rdrr.io/r/base/file.info.html">file.info()</a></code>: it returns a bunch of information about the file including when it was last modified. Then you can write:</p>
<pre><code>```{r}
#| label: raw-data
#| cache.extra: !expr file.info("a_very_large_file.csv")
rawdata &lt;- readr::read_csv("a_very_large_file.csv")
```</code></pre>
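<p>To see what such an expression has to work with, the sketch below inspects a throwaway file and then simulates an edit by moving its modification time forward. The temporary file stands in for <code>a_very_large_file.csv</code>. Using <code>file.mtime()</code>, which returns only the modification time, is a suggestion on our part for a narrower signal, not something required by knitr.</p>
<pre data-type="programlisting" data-code-language="r"># Create a small throwaway file so the example is self-contained.
path &lt;- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), path)

# file.info() returns one row of metadata for the file.
info &lt;- file.info(path)
info$size   # size in bytes
info$mtime  # time of last modification

# file.mtime() returns just the modification time.
before &lt;- file.mtime(path)

# Simulate an edit by setting the modification time one minute ahead.
Sys.setFileTime(path, before + 60)
after &lt;- file.mtime(path)

# The value changed, so a cache keyed on it would be invalidated.
after &gt; before</pre>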
<p>As your caching strategies get progressively more complicated, its a good idea to regularly clear out all your caches with <code><a href="https://rdrr.io/pkg/knitr/man/clean_cache.html">knitr::clean_cache()</a></code>.</p>
<p>Weve followed the advice of <a href="https://twitter.com/drob/status/738786604731490304">David Robinson</a> to name these chunks: each chunk is named after the primary object that it creates. This makes it easier to understand the <code>dependson</code> specification.</p>
<section id="exercises-6" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>Set up a network of chunks where <code>d</code> depends on <code>c</code> and <code>b</code>, and both <code>b</code> and <code>c</code> depend on <code>a</code>. Have each chunk print <code><a href="https://lubridate.tidyverse.org/reference/now.html">lubridate::now()</a></code>, set <code>cache: true</code>, then verify your understanding of caching.</li>
</ol></section>
</section>
<section id="troubleshooting" data-type="sect1">
<h1>
Troubleshooting</h1>
<p>Troubleshooting Quarto documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks. Additionally, the error could be due to issues with the Quarto document itself or due to the R code in the Quarto document.</p>
<p>One common error in documents with code chunks is duplicated chunk labels, which are especially common if your workflow involves copying and pasting code chunks. To address this issue, all you need to do is change one of your duplicated labels.</p>
<p>If the errors are due to the R code in the document, the first thing you should always try is to recreate the problem in an interactive session. Restart R, then “Run all chunks” (either from the Code menu, under Run region, or with the keyboard shortcut Ctrl + Alt + R). If you’re lucky, that will recreate the problem, and you can figure out what’s going on interactively.</p>
<p>If that doesn’t help, there must be something different between your interactive environment and the Quarto environment. You’re going to need to systematically explore the options. The most common difference is the working directory: the working directory of a Quarto document is the directory in which it lives. Check that the working directory is what you expect by including <code><a href="https://rdrr.io/r/base/getwd.html">getwd()</a></code> in a chunk.</p>
<p>Next, brainstorm all the things that might cause the bug. You’ll need to systematically check that they’re the same in your R session and your Quarto session. The easiest way to do that is to set <code>error: true</code> on the chunk causing the problem, then use <code><a href="https://rdrr.io/r/base/print.html">print()</a></code> and <code><a href="https://rdrr.io/r/utils/str.html">str()</a></code> to check that settings are as you expect.</p>
</section>
<section id="yaml-header" data-type="sect1">
<h1>
YAML header</h1>
<p>You can control many other “whole document” settings by tweaking the parameters of the YAML header. You might wonder what YAML stands for: its “YAML Aint Markup Language”, which is designed for representing hierarchical data in a way thats easy for humans to read and write. Quarto uses it to control many details of the output. Here well discuss three: self-contained documents, document parameters, and bibliographies.</p>
<section id="self-contained" data-type="sect2">
<h2>
Self-contained</h2>
<p>HTML documents typically have a number of external dependencies (e.g. images, CSS style sheets, JavaScript, etc.) and, by default, Quarto places these dependencies in a <code>_files</code> folder in the same directory as your <code>.qmd</code> file. If you publish the HTML file on a hosting platform (e.g. Quarto Pub, <a href="https://quartopub.com/" class="uri">https://quartopub.com/</a>), the dependencies in this directory are published with your document and hence are available in the published report. However, if you want to email the report to a colleague, you might prefer to have a single, self-contained, HTML document that embeds all of its dependencies. You can do this by specifying the <code>embed-resources</code> option:</p>
<pre data-type="programlisting" data-code-language="yaml">format:
  html:
    embed-resources: true</pre>
<p>The resulting file will be self-contained, such that it will need no external files and no internet access to be displayed properly by a browser.</p>
</section>
<section id="parameters" data-type="sect2">
<h2>
Parameters</h2>
<p>Quarto documents can include one or more parameters whose values can be set when you render the report. Parameters are useful when you want to re-render the same report with distinct values for various key inputs. For example, you might be producing sales reports per branch, exam results by student, or demographic summaries by country. To declare one or more parameters, use the <code>params</code> field.</p>
<p>This example uses a <code>my_class</code> parameter to determine which class of cars to display:</p>
<div class="cell">
<pre><code>---
format: html
params:
  my_class: "suv"
---

```{r}
#| label: setup
#| include: false
library(tidyverse)
class &lt;- mpg |&gt; filter(class == params$my_class)
```

# Fuel economy for `r params$my_class`s

```{r}
#| message: false
ggplot(class, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)
```</code></pre>
</div>
<p>As you can see, parameters are available within the code chunks as a read-only list named <code>params</code>.</p>
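<p>To set a parameter at render time from R, one option is the quarto R package. The sketch below is illustrative only: the helper <code>render_class_report()</code> and the file name <code>fuel-economy.qmd</code> are hypothetical, and it assumes both the quarto package and the Quarto command line tool are installed.</p>
<pre data-type="programlisting" data-code-language="r"># Hypothetical helper: render one report for a single vehicle class.
# Assumes the document above is saved as "fuel-economy.qmd".
render_class_report &lt;- function(my_class, input = "fuel-economy.qmd") {
  quarto::quarto_render(
    input = input,
    output_file = paste0("fuel-economy-", my_class, ".html"),
    execute_params = list(my_class = my_class)  # overrides the YAML default
  )
}

# Example usage (not run here): one HTML file per class.
# for (my_class in c("suv", "compact", "pickup")) {
#   render_class_report(my_class)
# }</pre>
<p>Giving each render its own <code>output_file</code> keeps one report from overwriting another.</p>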
<p>You can write atomic vectors directly into the YAML header. You can also run arbitrary R expressions by prefacing the parameter value with <code>!expr</code>. This is a good way to specify date/time parameters.</p>
<pre data-type="programlisting" data-code-language="yaml">params:
  start: !expr lubridate::ymd("2015-01-01")
  snapshot: !expr lubridate::ymd_hms("2015-01-01 12:30:00")</pre>
</section>
<section id="bibliographies-and-citations" data-type="sect2">
<h2>
Bibliographies and Citations</h2>
<p>Quarto can automatically generate citations and a bibliography in a number of styles. The most straightforward way of adding citations and bibliographies to a Quarto document is using the visual editor in RStudio.</p>
<p>To add a citation using the visual editor, go to Insert &gt; Citation. Citations can be inserted from a variety of sources:</p>
<ol type="1"><li><p><a href="https://quarto.org/docs/visual-editor/technical.html#citations-from-dois">DOI</a> (Document Object Identifier) references.</p></li>
<li><p><a href="https://quarto.org/docs/visual-editor/technical.html#citations-from-zotero">Zotero</a> personal or group libraries.</p></li>
<li><p>Searches of <a href="https://www.crossref.org/">Crossref</a>, <a href="https://datacite.org/">DataCite</a>, or <a href="https://pubmed.ncbi.nlm.nih.gov/">PubMed</a>.</p></li>
<li><p>Your document bibliography (a <code>.bib</code> file in the directory of your document).</p></li>
</ol><p>Under the hood, the visual mode uses the standard Pandoc markdown representation for citations (e.g. <code>[@citation]</code>).</p>
<p>If you add a citation using one of the first three methods, the visual editor will automatically create a <code>bibliography.bib</code> file for you and add the reference to it. It will also add a <code>bibliography</code> field to the document YAML. As you add more references, this file will get populated with their citations. You can also directly edit this file using many common bibliography formats, including BibLaTeX, BibTeX, EndNote, and Medline.</p>
<p>To create a citation within your .qmd file in the source editor, use a key composed of @ + the citation identifier from the bibliography file. Then place the citation in square brackets. Here are some examples:</p>
<pre data-type="programlisting" data-code-language="markdown">Separate multiple citations with a `;`: Blah blah [@smith04; @doe99].

You can add arbitrary comments inside the square brackets:
Blah blah [see @doe99, pp. 33-35; also @smith04, ch. 1].

Remove the square brackets to create an in-text citation: @smith04
says blah, or @smith04 [p. 33] says blah.

Add a `-` before the citation to suppress the author's name:
Smith says blah [-@smith04].</pre>
<p>When Quarto renders your file, it will build and append a bibliography to the end of your document. The bibliography will contain each of the cited references from your bibliography file, but it will not contain a section heading. As a result it is common practice to end your file with a section header for the bibliography, such as <code># References</code> or <code># Bibliography</code>.</p>
<p>You can change the style of your citations and bibliography by referencing a CSL (citation style language) file in the <code>csl</code> field:</p>
<pre data-type="programlisting" data-code-language="yaml">bibliography: rmarkdown.bib
csl: apa.csl</pre>
<p>As with the <code>bibliography</code> field, the <code>csl</code> field should contain a path to the file. Here we assume that the CSL file is in the same directory as the <code>.qmd</code> file. A good place to find CSL style files for common bibliography styles is <a href="https://github.com/citation-style-language/styles" class="uri">https://github.com/citation-style-language/styles</a>.</p>
</section>
</section>
<section id="quarto-learning-more" data-type="sect1">
<h1>
Learning more</h1>
<p>Quarto is still relatively young, and is still growing rapidly. The best place to stay on top of innovations is the official Quarto website: <a href="https://quarto.org/" class="uri">https://quarto.org</a>.</p>
<p>There are two important topics that we haven’t covered here: collaboration and the details of accurately communicating your ideas to other humans. Collaboration is a vital part of modern data science, and you can make your life much easier by using version control tools, like Git and GitHub. We recommend “Happy Git with R”, a user-friendly introduction to Git and GitHub for R users, by Jenny Bryan. The book is freely available online: <a href="https://happygitwithr.com" class="uri">https://happygitwithr.com</a>.</p>
<p>We have also not touched on what you should actually write in order to clearly communicate the results of your analysis. To improve your writing, we highly recommend reading either <a href="https://www.amazon.com/Style-Lessons-Clarity-Grace-12th/dp/0134080416"><em>Style: Lessons in Clarity and Grace</em></a> by Joseph M. Williams &amp; Joseph Bizup, or <a href="https://www.amazon.com/Sense-Structure-Writing-Readers-Perspective/dp/0205296327"><em>The Sense of Structure: Writing from the Reader’s Perspective</em></a> by George Gopen. Both books will help you understand the structure of sentences and paragraphs, and give you the tools to make your writing clearer. (These books are rather expensive if purchased new, but they’re used by many English classes so there are plenty of cheap second-hand copies.) George Gopen also has a number of short articles on writing at <a href="https://www.georgegopen.com/the-litigation-articles.html" class="uri">https://www.georgegopen.com/the-litigation-articles.html</a>. They are aimed at lawyers, but almost everything applies to data scientists too.</p>
</section>
</section>