From f232a81806472636a4f402140bbb639783de4241 Mon Sep 17 00:00:00 2001 From: Mine Cetinkaya-Rundel Date: Fri, 4 Nov 2022 16:30:13 +0100 Subject: [PATCH] Quarto edits (#1118) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Address review comments * And now a newline 🙄 * Why this now? --- quarto-formats.qmd | 1 - quarto-workflow.qmd | 5 +- quarto.qmd | 293 ++++++++++++++++++++++++-------------------- 3 files changed, 165 insertions(+), 134 deletions(-) diff --git a/quarto-formats.qmd b/quarto-formats.qmd index b2c612f..b9085b3 100644 --- a/quarto-formats.qmd +++ b/quarto-formats.qmd @@ -313,4 +313,3 @@ To learn more about effective communication in these different formats we recomm - Effectively communicating your ideas often benefits from some knowledge of graphic design. Robin Williams' [*The Non-Designer's Design Book*](https://www.amazon.com/Non-Designers-Design-Book-4th/dp/0133966151) is a great place to start. - diff --git a/quarto-workflow.qmd b/quarto-workflow.qmd index b291343..c7a3ac4 100644 --- a/quarto-workflow.qmd +++ b/quarto-workflow.qmd @@ -1,4 +1,4 @@ -# Quarto workflow {#sec-rmarkdown-workflow} +# Quarto workflow {#sec-quarto-workflow} ```{r} #| results: "asis" @@ -62,5 +62,4 @@ We've drawn on our own experiences and Colin Purrington's advice on lab notebook - You are going to create many, many, many analysis notebooks over the course of your career. How are you going to organise them so you can find them again in the future? - We recommend storing them in individual projects, and coming up with a good naming scheme. - + We recommend storing them in individual projects, and coming up with a good naming scheme. 
\ No newline at end of file diff --git a/quarto.qmd b/quarto.qmd index b93e5a5..3b90856 100644 --- a/quarto.qmd +++ b/quarto.qmd @@ -1,4 +1,4 @@ -# Quarto {#sec-rmarkdown} +# Quarto {#sec-quarto} ```{r} #| results: "asis" @@ -140,12 +140,13 @@ The following sections dive into the three components of a Quarto document in mo How do the inputs differ? (You may need to install LaTeX in order to build the PDF output --- RStudio will prompt you if this is necessary.) -## Visual Editor +## Visual editor -RStudio provides a [WYSIWYM](https://en.wikipedia.org/wiki/WYSIWYM) editor, the Visual Editor, for Quarto documents. -This tool provides an interface for authoring Quarto documents using Pandoc markdown (a slightly extended version of Markdown that Quarto understands), including tables, citations, cross-references, footnotes, divs/spans, definition lists, attributes, raw HTML/TeX, and more as well as support for executing code cells and viewing their output inline. - -If you're new to computational documents like `.qmd` files but have experience using tools like Google Docs or MS Word, the easiest way to get started with Quarto in RStudio is the visual editor. +The Visual editor in RStudio provides a [WYSIWYM](https://en.wikipedia.org/wiki/WYSIWYM) interface for authoring Quarto documents. +Under the hood, prose in Quarto documents (`.qmd` files) is written in Markdown, a lightweight set of conventions for formatting plain text files. +In fact, Quarto uses Pandoc markdown (a slightly extended version of Markdown that Quarto understands), including tables, citations, cross-references, footnotes, divs/spans, definition lists, attributes, raw HTML/TeX, and more as well as support for executing code cells and viewing their output inline. +While Markdown is designed to be easy to read and write, as you will see in @sec-source-editor, it still requires learning new syntax. 
+Therefore, if you're new to computational documents like `.qmd` files but have experience using tools like Google Docs or MS Word, the easiest way to get started with Quarto in RStudio is the visual editor. In the visual editor you can either use the buttons on the menu bar to insert images, tables, cross-references, etc. or you can use the catch-all ⌘ / shortcut to insert just about anything. If you are at the beginning of a line (as shown below), you can also enter just / to invoke the shortcut. @@ -177,12 +178,17 @@ The visual editor has many more features that we haven't enumerated here that yo Most importantly, while the visual editor displays your content with formatting, under the hood, it saves your content in plain Markdown and you can switch back and forth between the visual and source editors to view and edit your content using either tool. -## Source editor +### Exercises -Prose in `.qmd` files is written in Markdown, a lightweight set of conventions for formatting plain text files. -Markdown is designed to be easy to read and easy to write. -It is also very easy to learn. -The guide below shows how to use Pandoc's Markdown, a slightly extended version of Markdown that Quarto understands. + + +## Source editor {#sec-source-editor} + +You can also edit Quarto documents using the Source editor in RStudio, without the assistance of the Visual editor. +While the Visual editor will feel familiar to those with experience writing in tools like Google Docs, the Source editor will feel familiar to those with experience writing R scripts or R Markdown documents. +The Source editor can also be useful for debugging any Quarto syntax errors since it's often easier to catch these in plain text. + +The guide below shows how to use Pandoc's Markdown for authoring Quarto documents in the source editor. ```{r} #| echo: false @@ -266,17 +272,20 @@ This has three advantages: ``` 2. Graphics produced by the chunks will have useful names that make them easier to use elsewhere. 
- More on that in @sec-graphics-communication. + More on that in @sec-figures. 3. You can set up networks of cached chunks to avoid re-performing expensive computations on every run. - More on that below. + More on that in @sec-caching. Your chunk labels should be short but evocative and should not contain spaces. -We recommend using dashes (`-`) to separate words and avoiding other special characters in chunk labels. +We recommend using dashes (`-`) to separate words (instead of underscores, `_`) and avoiding other special characters in chunk labels. You are generally free to label your chunk however you like, but there is one chunk name that imbues special behavior: `setup`. When you're in a notebook mode, the chunk named setup will be run automatically once, before any other code is run. +Additionally, chunk labels cannot be duplicated. +Each chunk label must be unique. + ### Chunk options Chunk output can be customized with **options**, fields supplied to chunk header. @@ -303,7 +312,7 @@ The most important set of options controls if your code block is executed and wh - `error: true` causes the render to continue even if code returns an error. This is rarely something you'll want to include in the final version of your report, but can be very useful if you need to debug exactly what is going on inside your `.qmd`. It's also useful if you're teaching R and want to deliberately include an error. - The default, `error: false` causes knitting to fail if there is a single error in the document. + The default, `error: false` causes rendering to fail if there is a single error in the document. Each of these chunk options get added to the header of the chunk, following `#|`, e.g., in the following chunk the result is not printed since `eval` is set to false. 
@@ -327,112 +336,6 @@ The following table summarizes which types of output each option suppresses: | `message: false` | | | | | \- | | | `warning: false` | | | | | | \- | -### Table - -By default, Quarto prints data frames and matrices as you'd see them in the console: - -```{r} -mtcars[1:5, ] -``` - -If you prefer that data be displayed with additional formatting you can use the `knitr::kable()` function. -The code below generates @tbl-kable. - -```{r} -#| label: tbl-kable -#| tbl-cap: A knitr kable. - -knitr::kable(mtcars[1:5, ], ) -``` - -Read the documentation for `?knitr::kable` to see the other ways in which you can customize the table. -For even deeper customization, consider the **gt**, **huxtable**, **reactable**, **kableExtra**, **xtable**, **stargazer**, **pander**, **tables**, and **ascii** packages. -Each provides a set of tools for returning formatted tables from R code. - -There is also a rich set of options for controlling how figures are embedded. -You'll learn about these in @sec-graphics-communication. - -### Caching - -Normally, each render of a document starts from a completely clean slate. -This is great for reproducibility, because it ensures that you've captured every important computation in code. -However, it can be painful if you have some computations that take a long time. -The solution is `cache: true`. - -You can enable the Knitr cache at the document level for caching the results of all computations in a document using standard YAML options: - -``` yaml ---- -title: "My Document" -execute: - cache: true ---- -``` - -You can also enable caching at the chunk level for caching the results of computation in a specific chunk: - -```{r} -#| echo: fenced -#| cache: true - -# code for lengthy computation... -``` - -When set, this will save the output of the chunk to a specially named file on disk. -On subsequent runs, knitr will check to see if the code has changed, and if it hasn't, it will reuse the cached results. 
- -The caching system must be used with care, because by default it is based on the code only, not its dependencies. -For example, here the `processed_data` chunk depends on the `raw_data` chunk: - - `r chunk`{r} - #| label: raw_data - - rawdata <- readr::read_csv("a_very_large_file.csv") - `r chunk` - - `r chunk`{r} - #| label: processed_data - #| cache: true - - processed_data <- rawdata |> - filter(!is.na(import_var)) |> - mutate(new_variable = complicated_transformation(x, y, z)) - `r chunk` - -Caching the `processed_data` chunk means that it will get re-run if the dplyr pipeline is changed, but it won't get rerun if the `read_csv()` call changes. -You can avoid that problem with the `dependson` chunk option: - - `r chunk`{r} - #| label: processed_data - #| cache: true - #| dependson: "raw_data" - - processed_data <- rawdata |> - filter(!is.na(import_var)) |> - mutate(new_variable = complicated_transformation(x, y, z)) - `r chunk` - -`dependson` should contain a character vector of *every* chunk that the cached chunk depends on. -Knitr will update the results for the cached chunk whenever it detects that one of its dependencies have changed. - -Note that the chunks won't update if `a_very_large_file.csv` changes, because knitr caching only tracks changes within the `.qmd` file. -If you want to also track changes to that file you can use the `cache.extra` option. -This is an arbitrary R expression that will invalidate the cache whenever it changes. -A good function to use is `file.info()`: it returns a bunch of information about the file including when it was last modified. -Then you can write: - - `r chunk`{r} - #| label: raw_data - #| cache.extra: file.info("a_very_large_file.csv") - - rawdata <- readr::read_csv("a_very_large_file.csv") - `r chunk` - -As your caching strategies get progressively more complicated, it's a good idea to regularly clear out all your caches with `knitr::clean_cache()`. 
- -We've followed the advice of [David Robinson](https://twitter.com/drob/status/738786604731490304) to name these chunks: each chunk is named after the primary object that it creates. -This makes it easier to understand the `dependson` specification. - ### Global options As you work more with knitr, you will discover that some of the default chunk options don't fit your needs and you want to change them. @@ -472,7 +375,7 @@ For example, the example document used at the start of the chapter had: > Only `r inline('nrow(diamonds) - nrow(smaller)')` are larger than 2.5 carats. > The distribution of the remainder is shown below: -When the report is knit, the results of these computations are inserted into the text: +When the report is rendered, the results of these computations are inserted into the text: > We have data about 53940 diamonds. > Only 126 are larger than 2.5 carats. @@ -496,18 +399,21 @@ comma(.12358124331) 2. Download `diamond-sizes.qmd` from . Add a section that describes the largest 20 diamonds, including a table that displays their most important attributes. -3. Modify `diamonds-sizes.qmd` to use `comma()` to produce nicely formatted output. +3. Modify `diamonds-sizes.qmd` to use `label_comma()` to produce nicely formatted output. Also include the percentage of diamonds that are larger than 2.5 carats. -4. Set up a network of chunks where `d` depends on `c` and `b`, and both `b` and `c` depend on `a`. - Have each chunk print `lubridate::now()`, set `cache: true`, then verify your understanding of caching. +## Figures {#sec-figures} -## Figures +The figures in a Quarto document can be embedded (e.g., a PNG or JPEG file) or generated as a result of a code chunk. + +To embed an image from an external file, you can use the Insert menu in RStudio and select Figure / Image. +This will pop open a menu where you can browse to the image you want to insert as well as add alternative text or caption to it and adjust its size. 
+In the visual editor you can also simply paste an image from your clipboard into your document and RStudio will place a copy of that image in your project folder. + +If you include a code chunk that generates a figure (e.g., includes a `ggplot()` call), the resulting figure will be automatically included in your Quarto document. ### Figure sizing - - The biggest challenge of graphics in Quarto is getting your figures the right size and shape. There are five main options that control figure sizing: `fig-width`, `fig-height`, `fig-asp`, `out-width` and `out-height`. Image sizing is challenging because there are two sizes (the size of the figure created by R and the size at which it is inserted in the output document), and multiple ways of specifying the size (i.e., height, width, and aspect ratio: pick two of three). @@ -579,10 +485,137 @@ They are slightly lower quality, but will be much more compact. It's a good idea to name code chunks that produce figures, even if you don't routinely label other chunks. The chunk label is used to generate the file name of the graphic on disk, so naming your chunks makes it much easier to pick out plots and reuse in other circumstances (i.e. if you want to quickly drop a single plot into an email or a tweet). +### Exercises + + + +## Tables + +Similar to figures, you can include two types of tables in a Quarto document. +They can be markdown tables that you create directly in your Quarto document (using the Insert Table menu) or they can be tables generated as a result of a code chunk. +In this section we will focus on the latter, tables generated via computation. + +By default, Quarto prints data frames and matrices as you'd see them in the console: + +```{r} +mtcars[1:5, ] +``` + +If you prefer that data be displayed with additional formatting you can use the `knitr::kable()` function. +The code below generates @tbl-kable. + +```{r} +#| label: tbl-kable +#| tbl-cap: A knitr kable. 
+ +knitr::kable(mtcars[1:5, ], ) +``` + +Read the documentation for `?knitr::kable` to see the other ways in which you can customize the table. +For even deeper customization, consider the **gt**, **huxtable**, **reactable**, **kableExtra**, **xtable**, **stargazer**, **pander**, **tables**, and **ascii** packages. +Each provides a set of tools for returning formatted tables from R code. + +There is also a rich set of options for controlling how figures are embedded. +You'll learn about these in @sec-graphics-communication. + +### Exercises + + + +## Caching {#sec-caching} + +Normally, each render of a document starts from a completely clean slate. +This is great for reproducibility, because it ensures that you've captured every important computation in code. +However, it can be painful if you have some computations that take a long time. +The solution is `cache: true`. + +You can enable the Knitr cache at the document level for caching the results of all computations in a document using standard YAML options: + +``` yaml +--- +title: "My Document" +execute: + cache: true +--- +``` + +You can also enable caching at the chunk level for caching the results of computation in a specific chunk: + +```{r} +#| echo: fenced +#| cache: true + +# code for lengthy computation... +``` + +When set, this will save the output of the chunk to a specially named file on disk. +On subsequent runs, knitr will check to see if the code has changed, and if it hasn't, it will reuse the cached results. + +The caching system must be used with care, because by default it is based on the code only, not its dependencies. 
+For example, here the `processed-data` chunk depends on the `raw-data` chunk: + + `r chunk`{r} + #| label: raw-data + + rawdata <- readr::read_csv("a_very_large_file.csv") + `r chunk` + + `r chunk`{r} + #| label: processed-data + #| cache: true + + processed_data <- rawdata |> + filter(!is.na(import_var)) |> + mutate(new_variable = complicated_transformation(x, y, z)) + `r chunk` + +Caching the `processed-data` chunk means that it will get re-run if the dplyr pipeline is changed, but it won't get re-run if the `read_csv()` call changes. +You can avoid that problem with the `dependson` chunk option: + + `r chunk`{r} + #| label: processed-data + #| cache: true + #| dependson: "raw-data" + + processed_data <- rawdata |> + filter(!is.na(import_var)) |> + mutate(new_variable = complicated_transformation(x, y, z)) + `r chunk` + +`dependson` should contain a character vector of *every* chunk that the cached chunk depends on. +Knitr will update the results for the cached chunk whenever it detects that one of its dependencies has changed. + +Note that the chunks won't update if `a_very_large_file.csv` changes, because knitr caching only tracks changes within the `.qmd` file. +If you want to also track changes to that file you can use the `cache.extra` option. +This is an arbitrary R expression that will invalidate the cache whenever it changes. +A good function to use is `file.info()`: it returns a bunch of information about the file including when it was last modified. +Then you can write: + + `r chunk`{r} + #| label: raw-data + #| cache.extra: file.info("a_very_large_file.csv") + + rawdata <- readr::read_csv("a_very_large_file.csv") + `r chunk` + +As your caching strategies get progressively more complicated, it's a good idea to regularly clear out all your caches with `knitr::clean_cache()`. 
+ +We've followed the advice of [David Robinson](https://twitter.com/drob/status/738786604731490304) to name these chunks: each chunk is named after the primary object that it creates. +This makes it easier to understand the `dependson` specification. + +### Exercises + +1. Set up a network of chunks where `d` depends on `c` and `b`, and both `b` and `c` depend on `a`. Have each chunk print `lubridate::now()`, set `cache: true`, then verify your understanding of caching. + ## Troubleshooting Troubleshooting Quarto documents can be challenging because you are no longer in an interactive R environment, and you will need to learn some new tricks. -The first thing you should always try is to recreate the problem in an interactive session. +Additionally, the error could be due to issues with the Quarto document itself or due to the R code in the Quarto document. + +One common error in documents with code chunks is duplicated chunk labels, which are especially pervasive if your workflow involves copying and pasting code chunks. +To address this issue, all you need to do is to change one of your duplicated labels. + +If the errors are due to the R code in the document, the first thing you should always try is to recreate the problem in an interactive session. Restart R, then "Run all chunks" (either from Code menu, under Run region), or with the keyboard shortcut Ctrl + Alt + R. If you're lucky, that will recreate the problem, and you can figure out what's going on interactively. @@ -604,7 +637,7 @@ Here we'll discuss three: self-contained documents, document parameters, and bib ### Self-contained -HTML documents typically have a number of external dependencies (e.g. images, CSS style sheets, JavaScript, etc.) and, by default, Quarto places these dependencies in a `_files` folder in the same directory as your `.qmd` file. +HTML documents typically have a number of external dependencies (e.g. images, CSS style sheets, JavaScript, etc.) 
and, by default, Quarto places these dependencies in a `_files` folder in the same directory as your `.qmd` file. If you publish the HTML file on a hosting platform (e.g., QuartoPub, ), the dependencies in this directory are published with your document and hence are available in the published report. However, if you want to email the report to a colleague, you might prefer to have a single, self-contained, HTML document that embeds all of its dependencies. You can do this by specifying the `embed-resources` option: