Complete pass through Rmarkdown

hadley 2016-08-22 15:27:54 -05:00
parent 166a1b96d7
commit 545ca5b13c
5 changed files with 235 additions and 156 deletions

@ -4,15 +4,11 @@ output: html_document
---
```{r include = FALSE}
# colorFunc <- "heat.colors"
colorFunc <- "terrain.colors"
# colorFunc <- "topo.colors"
# colorFunc <- "cm.colors"
# colorFunc <- "rainbow"
```
Base R comes with many functions for generating colors. The code below demonstrates the `r colorFunc` function.
Base R comes with many functions for generating colors. The code
below demonstrates the `r colorFunc` function.
## `r colorFunc`

@ -10,10 +10,11 @@ library(marmap)
library(ggplot2)
```
The [marmap](https://cran.r-project.org/web/packages/marmap/index.html) package provides tools and data for visualizing the ocean floor. Here is an example contour plot of marmap's ``r params$data`` dataset.
The [marmap](https://cran.r-project.org/web/packages/marmap/index.html)
package provides tools and data for visualizing the ocean floor. Here
is an example contour plot of marmap's ``r params$data`` dataset.
```{r echo = FALSE}
data(list = params$data)
autoplot(get(params$data))
```
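The chunk above turns a string parameter into a dataset in two steps: `data(list = ...)` loads a dataset by name, and `get()` looks the object up by that same name. Here is a minimal base R sketch of the same pattern. It assumes the built-in `volcano` dataset as a stand-in for a marmap dataset, and a hand-built `params` list in place of the one R Markdown creates from the YAML header:

```r
# Stand-in for the `params` list that R Markdown builds from the YAML header
# (assumption: in a real document you would not create this yourself).
params <- list(data = "volcano")

# Load the dataset whose name is stored in the string.
data(list = params$data)

# Look the object up by name so it can be passed to a plotting function.
elevation <- get(params$data)
dim(elevation)
```

Because the dataset is only ever referred to through `params$data`, changing the parameter value is enough to point the whole report at a different dataset.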

@ -20,8 +20,6 @@ Headings
### 3rd Level Header
#### 4th Level Header
Lists
------------------------------------------------------------
@ -35,11 +33,13 @@ Lists
1. Numbered list item 1
2. Item 2
1. Item 2. The numbers are incremented automatically in the output.
3. Item 3
+ Item 3a
+ Item 3b
1. Item 3
1. Item 3a
1. Item 3b
Links
------------------------------------------------------------
@ -58,33 +58,11 @@ Images
![optional caption text](path/to/img.png)
```
Footnotes
Tables
------------------------------------------------------------
A [linked phrase][id].
Then below:
[id]: text of the note
Block quotes
------------------------------------------------------------
As George Box said:
> All models are wrong
> but some are useful.
Tables ---------------------------------------------------
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
Reference Style Links and Images
Equations -----------------------------------------------
$E = mc^{2}$
$$E = mc^{2}$$

@ -1,3 +1,54 @@
```{r include=FALSE, cache=FALSE}
set.seed(1014)
options(digits = 3)
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE,
cache = TRUE,
out.width = "70%",
fig.align = 'center',
fig.width = 6,
fig.asp = 0.618, # 1 / phi
fig.show = "hold"
)
options(dplyr.print_min = 6, dplyr.print_max = 6)
```
# R Markdown
## Introduction
@ -25,6 +76,7 @@ You need the __rmarkdown__ package, but you don't need to explicit install it or
```{r setup, include = FALSE}
chunk <- "```"
inline <- function(x = "") paste0("`` `r ", x, "` ``")
```
## R Markdown basics
@ -53,7 +105,7 @@ To produce a complete report containing all text, code, and results click "Knit"
knitr::include_graphics("screenshots/rmarkdown-report.png")
```
When you __knit__ the document R Markdown sends the .Rmd file to [knitr](http://yihui.name/knitr/), which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by [pandoc](http://pandoc.org/) which is responsible for creating the finished format. The big advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in XYZ.
When you __knit__ the document R Markdown sends the .Rmd file to [knitr](http://yihui.name/knitr/), which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by [pandoc](http://pandoc.org/) which is responsible for creating the finished format. The big advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in XYZ. Knitting is performed in a fresh instance of R which ensures that your reports are completely reproducible.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/RMarkdownFlow.png")
@ -61,56 +113,121 @@ knitr::include_graphics("images/RMarkdownFlow.png")
To get started with your own `.Rmd` file, select *File > New File > R Markdown...* in the menubar. RStudio will launch a wizard that you can use to pre-populate your file with useful content that reminds you how the key features of R Markdown work.
The following sections dive into the three components of an R Markdown document in more detail: the code chunks, the text, and the YAML header.
The following sections dive into the three components of an R Markdown document in more detail: the markdown text, the code chunks, and the YAML header.
### Exercises
1. Create a new notebook using File | New File | R Notebook. Read the
instructions. Practice running the chunks. Verify that you can modify
the code, re-run it, and see modified output.
1. Create a new R Markdown document with File | New File | R Markdown...
Knit it by clicking the appropriate button. Knit it by using the
appropriate keyboard shortcut. Verify that you can modify the
input and see the output update.
1. Compare and contrast the R notebook and R markdown files you created
above. How are the outputs similar? How are they different? How are
the inputs similar? How are they different? What happens if you
copy the YAML heading from one to the other?
1. Create one new R Markdown document for each of the three built-in
formats: HTML, PDF and Word. Knit each of the three documents.
How does the output differ? How does the input differ? (You may need
to install MiKTeX in order to build the PDF output.)
## Text formatting
Prose in `.Rmd` files is written in Markdown, a lightweight set of conventions for formatting plain text files. Markdown is designed to be easy to read and easy to write. It is also very easy to learn. The guide below shows how to use Pandoc's Markdown, a slightly extended version of Markdown that R Markdown understands.
The following code shows you the most important R Markdown commands:
```{r, echo = FALSE, comment = ""}
cat(readr::read_file("rmarkdown-demos/markdown.Rmd"))
```
The best way to learn these is simply to try them out. It will take a few days, but soon they will become second nature, and you won't need to think about them.
### Exercises
1. Practice what you've learned by creating a brief CV. The title should be
your name, and you should include headings for (at least) education or
employment. Each of the sections should include a bulleted list of
jobs/degrees. Highlight the year in bold.
1. Using the R Markdown cheatsheet, figure out how to:
1. Add a footnote.
1. Add a horizontal rule.
1. Add a block quote.
## Code chunks
You can quickly insert code chunks into your file in three ways:
To run code inside an R Markdown document, you need to insert a chunk. There are three ways to do so:
1. The keyboard shortcut Cmd/Ctrl + Alt + I.
2. The Add Chunk icon in the editor toolbar (it looks like a green box with a C in it).
3. By manually typing the chunk delimiters ` ```{r} ` and ` ``` `.
Test code as you write by clicking the "Run Current Chunk" icons at the top of each chunk, or by pressing Cmd/Ctrl + Shift + Enter (Cmd/Ctrl + Enter still also works if you just want to run a single command). R Markdown will run the code in the chunks in your current environment and display the results in your file editor. To turn off this behavior, click the gear icon at the top of the .Rmd file and select "Chunk Output in the Console." RStudio will then run code chunks at the command line as if your .Rmd file were an R Script.
1. The "Insert" button icon in the editor toolbar.
When you render your .Rmd file, R Markdown will create a fresh environment to run the code chunks in. It will run each chunk, in order, and embed the results beneath the chunk in your final report.
1. By manually typing the chunk delimiters ` ```{r} ` and ` ``` `.
Obviously, I'd recommend you learn the keyboard shortcut. It will save you a lot of time in the long run!
You can continue to run code using the keyboard shortcut that by now (I hope!) you know and love: Cmd/Ctrl + Enter. However, chunks get a new keyboard shortcut: Cmd/Ctrl + Shift + Enter. That's often more convenient because it runs the complete chunk, in even fewer keypresses than before.
Think of a chunk like a function. A chunk should be relatively self-contained, and focussed around a single task. You can iterate quickly in the script editor, and then run the whole thing with a single keyboard shortcut.
### Chunk name
`setup` chunk will be run automatically by RStudio. Typically set `include = FALSE`
Chunks can be given an optional name: ```` ```{r by-name} ````. This has three advantages:
1. You can more easily navigate to specific chunks using the drop-down
code navigator in the bottom-left of the script editor:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("screenshots/rmarkdown-chunk-nav.png")
```
1. Graphics produced by the chunks will have useful names that make
them easier to use elsewhere. More on that in [other important options].
1. You can set up chains of cached chunks to avoid re-performing expensive
computations on every run. More on that below.
There is one chunk name that imbues special behaviour: `setup`. When you're in notebook mode, the chunk named `setup` will be run automatically once, before you run any other code.
### Chunk options
Chunk output can be customized with __options__, arguments supplied to the chunk header. Knitr provides almost 60 options that you can use to customize your code chunks. Since the options are not associated with an R function, it can be difficult to figure out where to learn about them. The best place is the knitr options web page at <http://yihui.name/knitr/options/>. You can also find a list of knitr options with concise descriptions in the *R Markdown Reference Guide*, which is available in the RStudio IDE under *Help > Cheatsheets > R Markdown Reference Guide*.
Chunk output can be customized with __options__, arguments supplied to the chunk header. Knitr provides almost 60 options that you can use to customize your code chunks. Here we'll cover some of the most important chunk options that you're likely to need frequently. You can see the full list at <http://yihui.name/knitr/options/>.
Here are some of the most useful options not listed above.
* `child = "file.Rmd"` renders a file and inserts the results into the main
document at the chunk location.
* `comment = "#>"` changes the prefix to put before each line of output.
* `error = TRUE` causes the render to continue even if code returns an error.
This is rarely useful for data analysis reports, but it's useful for
teaching so that you can illustrate important errors.
### Controlling output
The most important set of options controls if your code block is executed and what results are inserted in the finished report:
* `eval = FALSE` prevents code from being evaluated. (And obviously if the
code is not run, no results will be generated). Useful for displaying
example code.
example code, or for disabling a large block of code without commenting
each line.
* `include = FALSE` runs the code, but doesn't show the code or results
in the final document.
in the final document. Use this for setup code that you don't want
cluttering your report.
* `echo = FALSE` prevents code, but not the results from appearing in the
finished file. This is a useful way to embed figures.
finished file. Use this to create reports which don't show the R
code that generates them.
* `message = FALSE` and `warning = FALSE` prevent messages and warnings
generated by code from appearing in the finished file.
* `results = 'hide'` prevents the results, but not the code, from
appearing in the final document. Knitr still runs the code.
* `error = TRUE` causes the render to continue even if code returns an error.
This is rarely useful for data analysis reports, but it's useful for
teaching so that you can illustrate important errors. The default,
`error = FALSE`, causes knitting to fail if there is a single error
in the document.
* `results = 'hide'` hides printed output; `fig.show = 'hide'` hides
plots.
The following table summarises the options:
Option | Run code | Show code | Output | Plots | Messages | Warnings
-------------------|----------|-----------|--------|-------|----------|---------
@ -122,33 +239,7 @@ Option | Run code | Show code | Output | Plots | Messages | Warnings
`message = FALSE` | | | | | - |
`warning = FALSE` | | | | | | -
There is also a rich set of options for controlling how figures are embedded. You'll learn about those in [saving your plots].
### Global options
To set global options that apply to every chunk in your file, call `knitr::opts_chunk$set()` in a code chunk. When writing books and tutorials I set:
```{r, eval = FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE
)
```
This uses my preferred comment formatting, and ensures that the code and output are kept closely entwined.
If you were preparing a report, you might set:
```{r eval = FALSE}
knitr::opts_chunk$set(
echo = FALSE,
cache = TRUE
)
```
You could consider setting `message = FALSE`, `warning = FALSE`, and `results = "hide"` too, but that would make it harder to debug problems because you wouldn't see any messages in the final document.
### Tables
### Table
By default, R Markdown displays data frames and matrices as they would be in the R terminal (in a monospaced font):
@ -169,121 +260,132 @@ Read the documentation at `?knitr::kable` to see the other ways that you can cus
If you'd like to customize your tables at a deeper level, consider the __xtable__, __stargazer__, __pander__, __tables__, and __ascii__ packages. Each provides a set of tools for returning formatted tables from R code.
There is also a rich set of options for controlling how figures are embedded. You'll learn about those in [saving your plots].
### Caching
If document rendering becomes time consuming due to long computations, you can use knitr caching to improve performance. Knitr will save the output of any chunk that contains the option `cache = TRUE` along with an MD5 digest of its contents to a folder alongside your .Rmd file. On subsequent renders, knitr will check the digest to see if the chunk contents have changed. If they have not, knitr will skip the chunk and insert the cached contents. If they have changed (i.e. if the chunk has been modified), knitr will execute the chunk, embed the results, and save the new results to use for future renders.
Normally, each knit of a document starts from a completely clean slate. This is great for reproducibility, because it ensures that you've captured every important computation in code. However, it can be painful if you have some computations that take a long time. The solution is `cache = TRUE`. When set, this will save the output of the chunk to a specially named file on disk. On subsequent runs, knitr will check to see if the code has changed, and if it hasn't, it will re-use the cached results.
Knitr's caching system is straightforward but can become complicated when one code chunk depends on the contents of another. For example, here chunk 2 depends on chunk 1.
The caching system must be used with care, because it is based on only the code itself, not its dependencies. For example, here the `processed_data` chunk depends on the `raw_data` chunk:
# chunk 1
`r chunk`{r}
a <- 1
`r chunk`{r raw_data}
rawdata <- readr::read_csv("a_very_large_file.csv")
`r chunk`
# chunk 2
`r chunk`{r cache=TRUE}
a + 1
`r chunk`{r processed_data, cache = TRUE}
processed_data <- rawdata %>%
filter(!is.na(import_var)) %>%
mutate(new_variable = complicated_transformation(x, y, z))
`r chunk`
To ensure that caching works properly in this situation, give each chunk a chunk label (Knitr assumes that the first unnamed option in the chunk header is a label). Then set the `dependson` option of the cached chunk.
Caching the `processed_data` chunk means that it will get re-run if the dplyr pipeline is changed, but it won't get rerun if the `read_csv()` call changes. You can avoid that problem with the `dependson` chunk option:
# chunk 1
`r chunk`{r chunk1}
a <- 1
`r chunk`
# chunk 2
`r chunk`{r chunk2, cache = TRUE, dependson = "chunk1"}
a + 1
`r chunk`{r processed_data, cache = TRUE, dependson = "raw_data"}
processed_data <- rawdata %>%
filter(!is.na(import_var)) %>%
mutate(new_variable = complicated_transformation(x, y, z))
`r chunk`
`dependson` should contain a character vector of *every* chunk that the cached chunk depends on. Knitr will update the results for the cached chunk whenever it detects that one of its dependencies has changed.
## Text formatting
Note that the chunks won't update if `a_very_large_file.csv` changes, because knitr caching only tracks changes within the `.Rmd` file. That means it's a good idea to periodically clear out all your caches and start from scratch by running `knitr::clean_cache()`.
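To see why a cache that looks only at the code can go stale, here is a toy sketch in base R. It is not how knitr is implemented: `cache` and `run_cached()` are hypothetical names invented for this illustration, and the "cache" is just an environment keyed on the text of the code.

```r
# Toy cache keyed only on the text of the code
# (hypothetical helper, not knitr's implementation).
cache <- new.env()

run_cached <- function(code) {
  caller <- parent.frame()
  key <- paste(deparse(code), collapse = "\n")
  if (!exists(key, envir = cache, inherits = FALSE)) {
    # Cache miss: evaluate the code where run_cached() was called
    # and remember the result under the code's text.
    assign(key, eval(code, envir = caller), envir = cache)
  }
  get(key, envir = cache, inherits = FALSE)
}

x <- 1
first <- run_cached(quote(x + 1))   # new code, so it is evaluated

x <- 10                             # the dependency changes, the code does not
second <- run_cached(quote(x + 1))  # same code text, so the old result is reused

third <- run_cached(quote(x + 2))   # editing the code forces a re-run
```

Here `second` still holds the result computed when `x` was 1, which is the same kind of staleness you get when an upstream chunk or an input file changes but the cached chunk's code does not.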
Format the text in your R Markdown files with Markdown, a set of markup annotations for plain text files. When you render your file, Pandoc transforms the marked up text into formatted text in your final file format. Markdown is designed to be easy to read and easy to write. It is also very easy to learn. The guide below shows how to use Pandoc's Markdown, a slightly extended version of Markdown that R Markdown understands.
Note that I've used the advice of [David Robinson](https://twitter.com/drob/status/738786604731490304) for naming these chunks. It's a good idea to name chunks that create objects after the primary object that they create. That makes it easier to understand the `dependson` specification.
```{r, echo = FALSE, comment = ""}
cat(readr::read_file("rmarkdown-demos/markdown.Rmd"))
### Global options
As you work more with knitr, you will discover that some of the default chunk options don't fit with your workflow and you want to change them. You can do so by calling `knitr::opts_chunk$set()` in a code chunk. For example, when writing books and tutorials I set:
```{r, eval = FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE
)
```
This uses my preferred comment formatting, and ensures that the code and output are kept closely entwined.
If you were preparing a report, you might set:
```{r eval = FALSE}
knitr::opts_chunk$set(
echo = FALSE
)
```
That will hide the code by default, only showing the chunks you deliberately choose to show (with `echo = TRUE`). You could consider setting `message = FALSE`, `warning = FALSE`, and `results = "hide"` too, but that would make it harder to debug problems because you wouldn't see any messages in the final document.
### Inline code
Code results can be inserted directly into the *text* of a .Rmd file by enclosing the code with `` `r ` ``. The [file below](http://github.com/hadley/r4ds/tree/master/rmarkdown-demos/3-inline.Rmd) uses `` `r ` `` twice to call `colorFunc`, which returns "heat.colors." This makes it easy to update the report to refer to another function.
There is one other way to embed R code into an R Markdown document: directly into the text, with: `r inline()`. This can be very useful if you want to reduce duplication in the text. For example, the following `.Rmd` uses `r inline("colorFunc")` in the text, making it easier to update if you switch to another colour palette:
```{r, echo = FALSE, out.width = "100%"}
```{r echo = FALSE, comment = "", out.width = "100%"}
cat(htmltools::includeText("rmarkdown-demos/1-example.Rmd"))
knitr::include_graphics("images/inline-1-heat.png")
```
Inline expressions do not take knitr options. When processing inline code, R Markdown will always display the results of inline code, but not the code, and apply relevant text formatting to the results. As a result, inline output is indistinguishable from the surrounding text.
```{r, echo = FALSE, out.width = "100%"}
```
Inline expressions do not take knitr options. When processing inline code, R Markdown will always display the results of inline code, so that inline output is indistinguishable from the surrounding text.
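The `colorFunc` demo works because the same string does two jobs: it is printed inline as text, and it is used to look up the function to call. Here is a minimal sketch of that idea in plain R. The names `palette_fn`, `cols`, and `sentence` are made up for illustration, and `sprintf()` stands in for what inline code does when the document is knit:

```r
# The name of the palette function is stored as a string.
colorFunc <- "terrain.colors"

# Job 1: look the function up by name and call it.
palette_fn <- match.fun(colorFunc)
cols <- palette_fn(5)

# Job 2: drop the same string into a sentence, as inline code would.
sentence <- sprintf("The code below demonstrates the %s function.", colorFunc)
sentence
```

Switching to another palette only requires changing the one assignment to `colorFunc`; both the text and the colours follow.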
## YAML header
You can control many other "whole document" settings by tweaking the parameters of the YAML header. You might wonder what YAML stands for: it's "yet another markup language", which is designed for representing hierarchical data in a way that's easy for humans to read and write. R Markdown uses it to control many details of the output. Here we'll discuss two: document parameters, and bibliographies.
### Parameters
R Markdown documents can include one or more parameters whose values can be set when you render the report. Parameters are useful when you want to re-render the same report with distinct values for various key inputs, for example, to run:
* a report specific to a department or geographic region.
* A report specific to a department or geographic region.
* A report that covers a specific period in time.
* Multiple versions of a report for distinct sets of core assumptions.
* a report that covers a specific period in time.
To declare one or more parameters for your file, use the `params` field in the header. This example uses a `data` parameter to determine which data set to plot:
* multiple versions of a report for distinct sets of core assumptions.
To declare one or more parameters for your file, use the `params` field within the YAML header of the document. For example, the [file below](http://github.com/hadley/r4ds/tree/master/rmarkdown-demos/5-parameters.Rmd) uses a `data` parameter that determines which data set to plot.
```{r, echo = FALSE, out.width = "100%"}
```{r, echo = FALSE, out.width = "100%", comment = ""}
cat(readr::read_file("rmarkdown-demos/5-parameters.Rmd"))
knitr::include_graphics("images/params-1-hawaii.png")
```
R Markdown recognizes the atomic data types: numerics, character strings, logicals, etc. You can also pass an R expression as a parameter by prefacing the parameter value with `!r`, e.g.
```{r eval = FALSE}
---
params:
start: !r lubridate::ymd("2015-01-01")
snapshot: !r lubridate::ymd_hms("2015-01-01 12:30:00")
---
```
Parameters are available within the knit environment as a read-only list named `params`. To access a parameter in code, call `params$<parameter name>`.
Add a `params` argument to `render()` to create a report that uses a different set of parameter values. Here we modify our report to use the `aleutians` data set with:
You can write atomic vectors directly into the YAML header. You can also run an arbitrary R expression by prefacing the parameter value with `!r`. This is a good way to specify date and time values.
```{r eval = FALSE}
render("5-parameters.Rmd", params = list(data = "aleutians"))
```yaml
params:
start: !r lubridate::ymd("2015-01-01")
snapshot: !r lubridate::ymd_hms("2015-01-01 12:30:00")
```
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/params-2-aleutians.png")
```
Better yet, click the "Knit with Parameters" option in the dropdown menu next to the RStudio IDE knit button to set parameters, render, and preview the report in a single user-friendly step.
In RStudio, you can click the "Knit with Parameters" option in the Knit dropdown menu to set parameters, render, and preview the report in a single user-friendly step:
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/params-3-florida.png")
```
### Bibliographies and Citations
You can provide even more options in the header if you want to control exactly how that dialog appears. See <http://rmarkdown.rstudio.com/developer_parameterized_reports.html#parameter_user_interfaces> for more details.
Pandoc can automatically generate citations and a bibliography in a number of styles. To use this feature, specify a bibliography file using the `bibliography` field in your file's header. The field should contain a filepath from the directory that contains your .Rmd file to the bibliography file:
Alternatively, if you need to produce many such parameterised reports, you can call `rmarkdown::render()` with a list of `params`:
```yaml
---
title: "Markdown Demo"
output: html_document
bibliography: rmarkdown.bib
---
```{r eval = FALSE}
rmarkdown::render("5-parameters.Rmd", params = list(data = "aleutians"))
```
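If you need one report per dataset, you can wrap that call in a loop. The sketch below is only an illustration: the dataset names are assumed from the marmap examples used in this chapter, the `report-*.html` file names are invented, and the rendering step is skipped unless `5-parameters.Rmd` and the rmarkdown package are actually available.

```r
# Dataset names assumed from the marmap examples; adjust to your own data.
datasets <- c("hawaii", "aleutians", "florida")

# One output file per dataset (hypothetical naming scheme).
output_files <- paste0("report-", datasets, ".html")

# Only render when the input file and the rmarkdown package are available.
can_render <- file.exists("5-parameters.Rmd") &&
  requireNamespace("rmarkdown", quietly = TRUE)

if (can_render) {
  for (i in seq_along(datasets)) {
    rmarkdown::render(
      "5-parameters.Rmd",
      params = list(data = datasets[[i]]),
      output_file = output_files[[i]]
    )
  }
}
```

Giving each report its own `output_file` matters here; otherwise every render would overwrite the previous one.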
You can use any of the following formats: .bib (BibLaTeX), .bibtex (BibTeX), .copac (Copac), .enl (EndNote), .json (JSON citeproc), .medline (MEDLINE), .mods (MODS), .ris (RIS),
.wos (ISI), .xml (XML).
### Bibliographies and Citations
To create a citation within your .Rmd file, use a key composed of @ + the citation identifier from the bibliography file. Then place the citation in square brackets. Here are some example citations from rmarkdown.rstudio.com. Notice that you can
Pandoc can automatically generate citations and a bibliography in a number of styles. To use this feature, specify a bibliography file using the `bibliography` field in your file's header. The field should contain a path from the directory that contains your .Rmd file to the bibliography file:
* Separate multiple citations with a `;`
* Remove the square brackets to create an in-text citation
* Add a `-` before the citation to suppress the author's name
```yaml
bibliography: rmarkdown.bib
```
You can use many common bibliography formats including BibLaTeX, BibTeX, EndNote, and MEDLINE.
To create a citation within your .Rmd file, use a key composed of @ + the citation identifier from the bibliography file. Then place the citation in square brackets. Here are some example citations from <rmarkdown.rstudio.com>:
```markdown
Blah blah [see @doe99, pp. 33-35; also @smith04, ch. 1].
@ -299,17 +401,19 @@ Blah blah [@smith04; @doe99].
Smith says blah [-@smith04].
```
Notice that you can:
* Separate multiple citations with a `;`.
* Remove the square brackets to create an in-text citation.
* Add a `-` before the citation to suppress the author's name.
When R Markdown renders your file, it will build and append a bibliography to the end of your document. The bibliography will contain each of the cited references from your bibliography file, but it will not contain a section heading. As a result it is common practice to end your file with a section header for the bibliography, such as `# References` or `# Bibliography`.
You can change the style of your citations and bibliography by adding a CSL 1.0 style file to the `csl` field of your file's header.
You can change the style of your citations and bibliography by referencing a CSL (citation style language) file in the `csl` field:
```yaml
---
title: "Markdown Demo"
output: html_document
bibliography: rmarkdown.bib
csl: apa.csl
---
```
As with the bibliography field, the `csl` field should contain a filepath to the file (here I assume that the csl file is in the same directory as the .Rmd file). http://github.com/citation-style-language/styles contains a useful repository of CSL style files.
As with the bibliography field, the `csl` field should contain a path to the file. Here I assume that the csl file is in the same directory as the .Rmd file. <http://github.com/citation-style-language/styles> contains many useful CSL style files.

Binary file not shown.