r4ds/rmarkdown.Rmd

323 lines
15 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# R Markdown
## Introduction
R Markdown provides an unified authoring framework for data science. R Markdown documents are fully reproducible and support dozens of output formats, like PDFs, Word files, slideshows, and more. They also provide a useful notebook interface for R and are easy to track with version control software like git.
R Markdown documents rely on several technologies that go beyond R functions. To help you navigate these, the developers of RStudio have placed three R Markdown references in the RStudio IDE.
* Go to *File > Help > Cheatsheets > R Markdown Cheat Sheet* to open the main *[R Markdown Cheat Sheet](http://www.rstudio.com/wp-content/uploads/2016/03/rmarkdown-cheatsheet-2.0.pdf)*.
* Go to *File > Help > Cheatsheets > R Markdown Reference Guide* to open the main *[R Markdown Reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf)*.
* Go to *File > Help > Markdown Quick Reference* to open the *Markdown Quick Reference* in your help pane.
### Prerequisites
You will need the __rmarkdown__ package, but RStudio will automatically install it for you, and load it when needed.
```{r setup, include = FALSE}
chunk <- "```"
```
## R Markdown basics
This is an R Markdown file, a plain text file that has the extension `.Rmd`:
```{r echo = FALSE, comment = ""}
cat(htmltools::includeText("rmarkdown-demos/1-example.Rmd"))
```
It contains three types of content:
1. An (optional) __YAML header__ surrounded by `---`s.
1. __Chunks__ of R code surrounded by ```` ``` ````.
1. Text mixed with simple text formatting like `##` and .
RMarkdown files are designed to be used in two ways:
1. In __notebook__ mode, to provide a convenient way to mingle prose,
code, and output during an analysis.
1. To produce a final report by __rendering__ the document to produce a HTML,
PDF, or word file.
When you open an `.Rmd` you get a notebook interface for R. You can run each code chunk by clicking the Run icon (it looks like a play button at the top of the chunk), or by pressing Cmd/Ctrl + Shift + Enter. RStudio executes the code and displays the results inline with your file.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/how-1-file.png")
```
To generate a report from an R Markdown file, run the `render()` command:
```{r eval = FALSE}
rmarkdown::render("1-example.Rmd")
```
Or better, use the "Knit" button in the RStudio IDE to render the file and preview the output with a single click or keyboard shortcut (Cmd/Ctrl + Shift + K). This will generates an output file that intermingles prose, code, and results.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/how-2-chunk.png")
```
When you run `render()`, R Markdown feeds the .Rmd file to [knitr](http://yihui.name/knitr/), which executes all of the code chunks and creates a new markdown (.md) document which includes the code and its output. The markdown file generated by knitr is then processed by [pandoc](http://pandoc.org/) which is responsible for creating the finished format.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/RMarkdownFlow.png")
```
This may sound complicated, but R Markdown makes it extremely simple by encapsulating all of the above processing into a single `render()` function.
To get started with your own `.Rmd` file, select *File > New File > R Markdown...* in the menubar. RStudio will launch a wizard that you can use to pre-populate your file with useful content that reminds you how the key features of R markdown work.
## Code chunks
You can quickly insert code chunks into your file in three ways:
1. The keyboard shortcut Cmd/Ctrl + Alt + I.
2. The Add Chunk icon in the editor toolbar (it looks like a green box with a C in it).
3. By manually typing the chunk delimiters ` ```{r} ` and ` ``` `.
Test code as you write by clicking the "Run Current Chunk" icons at the top of each chunk, or by pressing Cmd/Ctrl + Shift + Enter (Cmd/Ctrl + Enter still also works if you just want to run a single command). R Markdown will run the code in the chunks in your current environment and display the results in your file editor. To turn off this behavior, click the gear icon at the top of the .Rmd file and select "Chunk Output in the Console." RStudio will then run code chunks at the command line as if your .Rmd file were an R Script.
When you render your .Rmd file, R Markdown will create a fresh environment to run the code chunks in. It will run each chunk, in order, and embed the results beneath the chunk in your final report.
### Chunk name
`setup` chunk will be run automatically by RStudio. Typically set `include = FALSE`
### Chunk options
Chunk output can be customized with __options__, arguments supplied to chunk header. Knitr provides almost 60 options that you can use to customize your code chunks. Since the options are not associated with an R function, it can be difficult to figure out where to learn about them. The best place is the knitr options web page at <http://yihui.name/knitr/options/>. You can also find a list of knitr options with concise descriptions in the *R Markdown Reference Guide*, which is available in the RStudio IDE under *Help > Cheatsheets > R Markdown Reference Guide*.
Here are some of the most useful options not listed above.
* `child = "file.Rmd"` renders a file and inserts the results into the main
document at the chunk location.
* `comment = "#>"` changes the prefix to put before each line of output.
* `error = TRUE` causes the render to continue even if code returns an error.
This is rarely useful for data analysis reports, but it's useful for
teaching so that you can illustrate important errors.
### Controlling output
* `eval = FALSE` prevents code from being evaluated. (And obviously if the
code is not run, no results will be generated). Useful for displaying
example code.
* `include = FALSE` runs the code, but doesn't show the code or results
in the final document.
* `echo = FALSE` prevents code, but not the results from appearing in the
finished file. This is a useful way to embed figures.
* `message = FALSE` and `warning = FALSE` prevents messages and warnings
generated by code from appearing in the finished file.
* `results = 'hide'` prevents the results, but not the code, from
appearing in the final document. Knitr still runs the code.
Option | Run code | Show code | Output | Plots | Messages | Warnings
-------------------|----------|-----------|--------|-------|----------|---------
`eval = FALSE` | - | - | - | - | - | -
`include = FALSE` | | - | - | - | - | -
`echo = FALSE` | | - | | | |
`results = "hide"` | | | - | | |
`fig.show = "hide"`| | | | - | |
`message = FALSE` | | | | | - |
`warning = FALSE` | | | | | | -
There are also a rich set of options for controlling how figures embedded. You'll learn about those in [saving your plots].
### Global options
To set global options that apply to every chunk in your file, call `knitr::opts_chunk$set` in a code chunk. When writing books and tutorial I set:
```{r, eval = FALSE}
knitr::opts_chunk$set(
comment = "#>",
collapse = TRUE
)
```
This uses my preferred comment formatting, and ensures that the code and output are kept closely entwined.
If you were preparing a report, you might set:
```{r eval = FALSE}
knitr::opts_chunk$set(
echo = FALSE,
cache = TRUE
)
```
You could consider setting `message = FALSE`, `warning = FALSE`, and `results = "hide"` too, but that would make it harder to debug problems because you wouldn't see any messages in the final document.
### Tables
By default, R Markdown displays data frames and matrixes as they would be in the R terminal (in a monospaced font):
```{r}
mtcars[1:5, ]
```
If you prefer that data be displayed with additional formatting you can use the `knitr::kable` function:
```{r}
knitr::kable(
mtcars[1:5, ],
caption = "A knitr kable."
)
```
Read the documentation at `?knitr::kable` to see the other ways that you can customise the table.
If you'd like to customize your tables at a deeper level, consider the __xtable__, __stargazer__, __pander__, __tables__, and __ascii__ packages. Each provides a set of tools for returning formatted tables from R code.
### Caching
If document rendering becomes time consuming due to long computations, you can use knitr caching to improve performance. Knitr will save the output of any chunk that contains the option `cache = TRUE` along with a MD5 digest of its contents to a folder alongside your .Rmd file. On subsequent renders, knitr will check the digest to see if the chunk contents have changed. If they have not, knitr will skip the chunk and insert the cached contents. If they have changed (i.e. if the chunk has been modified) Knitr will execute the chunk, embed the results, and save the new results to use for future renders.
Knitr's caching system is straightforward but can become complicated when one code chunk depends on the contents of another. For example, here chunk 2 depends on chunk 1.
# chunk 1
`r chunk`{r}
a <- 1
`r chunk`
# chunk 2
`r chunk`{r cached=TRUE}
a + 1
`r chunk`
To ensure that caching works properly in this situation, give each chunk a chunk label (Knitr assumes that the first unnamed option in the chunk header is a label). Then set the `dependson` option of the cached chunk.
# chunk 1
`r chunk`{r chunk1}
a <- 1
`r chunk`
# chunk 2
`r chunk`{r chunk2, cached = TRUE, dependson = "chunk1"}
a + 1
`r chunk`
`dependson` should contain a character vector of *every* chunk that the cached chunk depends on. Knitr will update the results for the cached chunk whenever it detects that one of its dependencies have changed.
## Inline code
Code results can be inserted directly into the *text* of a .Rmd file by enclosing the code with `` `r ` ``. The [file below](http://github.com/hadley/r4ds/tree/master/rmarkdown-demos/3-inline.Rmd) uses `` `r ` `` twice to call `colorFunc`, which returns "heat.colors." This makes it easy to update the report to refer to another function.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/inline-1-heat.png")
```
Inline expressions do not take knitr options. When processing inline code, R Markdown will always display the results of inline code, but not the code, and apply relevant text formatting to the results. As a result, inline output is indistinguishable from the surrounding text.
## Text formatting
Format the text in your R Markdown files with Markdown, a set of markup annotations for plain text files. When you render your file, Pandoc transforms the marked up text into formatted text in your final file format. Markdown is designed to be easy to read and easy to write. It is also very easy to learn. The guide below shows how to use Pandoc's Markdown, a slightly extended version of Markdown that R Markdown understands.
```{r, echo = FALSE, comment = ""}
cat(readr::read_file("rmarkdown-demos/markdown.Rmd"))
```
## YAML header
### Parameters
R Markdown documents can include one or more parameters whose values can be set when you render the report. Parameters are useful when you want to re-render the same report with distinct values for various key inputs, for example, to run:
* a report specific to a department or geographic region.
* a report that covers a specific period in time.
* multiple versions of a report for distinct sets of core assumptions.
To declare one or more parameters for your file, use the `params` field within the YAML header of the document. For example, the [file below](http://github.com/hadley/r4ds/tree/master/rmarkdown-demos/5-parameters.Rmd) uses a `data` parameter that determines which data set to plot.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/params-1-hawaii.png")
```
R Markdown recognizes the atomic data types: numerics, character strings, logicals, etc. You can also pass an R expression as a parameter by prefacing the parameter value with `!R`, e.g.
```{r eval = FALSE}
---
params:
start: !r lubridate::ymd("2015-01-01")
snapshot: !r lubridate::ymd_hms("2015-01-01 12:30:00")
---
```
Parameters are available within the knit environment as a read-only list named `params`. To access a parameter in code, call `params$<parameter name>`.
Add a `params` argument to `render()` to create a report that uses a different set of parameter values. Here we modify our report to use the `aleutians` data set with:
```{r eval = FALSE}
render("5-parameters.Rmd", params = list(data = "aleutians"))
```
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/params-2-aleutians.png")
```
Better yet, click the "Knit with Parameters" option in the dropdown menu next to the RStudio IDE knit button to set parameters, render, and preview the report in a single user friendly step.
```{r, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/params-3-florida.png")
```
### Bibliographies and Citations
Pandoc can automatically generate citations and a bibliography in a number of styles. To use this feature, specify a bibliography file using the `bibliography` field in your file's header. The field should contain a filepath from the directory that contains your .Rmd file to the file that contains the bibliography file:
```yaml
---
title: "Markdown Demo"
output: html_document
bibliography: rmarkdown.bib
---
```
You can use any of the following formats: .bib (BibLaTeX), .bibtex (BibTeX), .copac (Copac), .enl (EndNote), .json (JSON citeproc), .medline (MEDLINE), .mods (MODS), .ris (RIS),
.wos (ISI), .xml (XML).
To create a citation within your .Rmd file, use a key composed of @ + the citation identifier from the bibliography file. Then place the citation in square brackets. Here are some example citations from rmarkdown.rstudio.com. Notice that you can
* Separate multiple citations with a `;`
* Remove the square brackets to create an in-text citation
* Add a `-` before the citation to supress the author's name
```markdown
Blah blah [see @doe99, pp. 33-35; also @smith04, ch. 1].
Blah blah [@doe99, pp. 33-35, 38-39 and *passim*].
Blah blah [@smith04; @doe99].
@smith04 says blah.
@smith04 [p. 33] says blah.
Smith says blah [-@smith04].
```
When R Markdown renders your file, it will build and append a bibliography to the end of your document. The bibliography will contain each of the cited references from your bibiliography file, but it will not contain a section heading. As a result it is common practice to end your file with a section header for the bibliography, such as `# References` or `# Bibliography`.
You can change the style of your citations and bibliography by adding a CSL 1.0 style file to the `csl` field of your file's header.
```yaml
---
title: "Markdown Demo"
output: html_document
bibliography: rmarkdown.bib
csl: apa.csl
---
```
As with the bibliography field, your csl file should contain a filepath to the file (here I assume that the csl file is in the same directory as the .Rmd file). http://github.com/citation-style-language/styles contains a useful repository of CSL style files.