More writing on R Markdown chapter.

Garrett 2016-08-04 17:02:04 -04:00
parent bff90c3ed5
commit 59ce717f7d
1 changed files with 113 additions and 70 deletions

@@ -1,105 +1,148 @@
# R Markdown
R Markdown is an authoring framework that does something incredibly useful: it provides a single file format that you can use to do everything from running code to publishing finished reports. In other words, you can use a single R Markdown file to
R Markdown is an authoring framework that provides a versatile file format for data science. You can use a single R Markdown file to do everything that you do in your data analyses, from importing data, to tidying it, to visualizing, transforming, and modeling it, to communicating your results to an audience. Once you have done these things, your R Markdown file remains as a reproducible record of your work, easy to track in a version control system like Git.
* import data
* tidy it
* visualize, transform and model it
* and then communicate the results
R Markdown files are also seamlessly integrated into the RStudio IDE, where they create a notebook interface for R (with multi-language support).
R Markdown is also exceptionally easy to learn and is based on a simple plain text file format, which means that R Markdown files are unusually easy to track with version control software like Git and GitHub. On top of all of this, R Markdown features are seamlessly integrated into the RStudio IDE, turning the IDE into a type of R Markdown editor. Did I mention that R Markdown files also provide a multi-language notebook interface for R?
This chapter will show you how to use R Markdown. Section 1 provides a quick tour of the basic features in R Markdown. This section is all that you need to read to get started. The remainder of the chapter will show you how to customize details of the R Markdown workflow.
This chapter will show you how to use this versatile piece of technology. Section 1 provides a quick tour of all of the basic features in R Markdown. This section is all that you need to read to get started. The remainder of the chapter will show you how to customize details of the R Markdown workflow.
## Basics
## Using R Markdown in the RStudio IDE
An R Markdown file is a plain text file saved with the extension .Rmd. .Rmd files contain three types of content:
An R Markdown file is a simple plain text file saved with the extension .Rmd. You can open an R Markdown file in the RStudio IDE by going to File > New File > R Markdown. The editor will open a window that looks like this, which you can ignore. The IDE pre-populates your file with content based on what you choose in the window. If this is your first time using R Markdown, just click OK.
* An optional header of meta-data (written in YAML)
* Text formatted with Markdown syntax
* Executable code embedded into the document as knitr code chunks
<!--- ![](images/rmarkdown-wizard.png) --->
Think of a .Rmd file as the data science equivalent of [literate programming](https://en.wikipedia.org/wiki/Literate_programming), an idea promulgated by Donald Knuth. A literate program contains code that creates the program, intermixed with human-readable text that explains what the code does in the context of the program. A .Rmd file contains code that runs an analysis, intermixed with human-readable text that explains what the code does in the context of the analysis and what the results of the code mean, i.e. literate data science.
RStudio will open a new file that contains the text below, which describes how to use R Markdown. In practice, you would simply delete this content and start writing in your file. Since this is our first R Markdown file, let's take a look at the content. The content itself is a viable R Markdown document.
To open a .Rmd file, open the RStudio IDE and select File > New File > R Markdown... in the menubar. RStudio will launch a wizard that you can use to pre-populate your file with useful content.
```{r echo = FALSE, comment = ""}
cat(htmltools::includeText("extra/sample-rmarkdown.Rmd"))
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-wizard.png")
```
When you write an R Markdown file, you include everything that you would need to rerun your analysis, as well as everything that you would need to write a report about your analysis.
Since this is our first time using R Markdown, just click OK. RStudio will open a new file and place into it the text below.
R Markdown files contain three types of content:
```{r echo = FALSE, comment = ""}
cat(htmltools::includeText("~/Documents/r4ds/extra/sample-rmarkdown.Rmd"))
```
1. An optional header of YAML values
The file contains everything that you need to reproduce a (here trivial) analysis, as well as everything you need to generate a finished report about the analysis to export.
These key value pairs contain metadata that R Markdown can use to generate a finished report from your file. If your file contains a header it must appear at the start of the file and it must begin and end with a line that contains three dashes, e.g. `---`.
2. Text formatted with Markdown cues
### Running code
These sections of text look like plain text, but they may contain unobtrusive formatting markup written in the [Markdown](http://rmarkdown.rstudio.com/authoring_basics.html) syntax. For example, line 12 begins with two hash symbols (`##`), which identify the line as a second level header.
3. Code chunks
Notice that chunks of executable R code appear in the file. Each chunk begins and ends with a line that contains three backticks (knitr::inline_expr(```)). A pair of braces follows the first set of backticks, which provides a space to set chunk parameters.
Code chunks contain executable code, often in the R language. Each chunk begins with a line that contains three backticks, knitr::inline_expr(```), and then the name of a programming language in braces. Some chunks may contain optional chunk arguments inserted inside the braces and separated by commas. Each code chunk ends with a line of three backticks.
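To make the three content types described above concrete, here is a minimal sketch that writes a tiny .Rmd file from R. The file name, title, and chunk contents are made up for illustration; they are not the sample file that RStudio generates.

```r
# Assemble a minimal .Rmd file line by line (all content is illustrative).
fence <- strrep("`", 3)  # three backticks, built in code for readability

rmd_lines <- c(
  "---",                        # the YAML header begins with three dashes
  "title: \"Example report\"",  # hypothetical metadata values
  "output: html_document",
  "---",                        # and ends with three dashes
  "",
  "## Summary",                 # Markdown text: a second level header
  "",
  "The chunk below summarizes the built-in `cars` data set.",
  "",
  paste0(fence, "{r}"),         # a chunk opens with backticks, then {r}
  "summary(cars)",
  fence                         # and closes with a line of three backticks
)

# Write the file to a temporary directory so nothing is overwritten.
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)
```

Opening the resulting file in the RStudio IDE should show the header, the formatted text, and one runnable chunk.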
You can create a new chunk by typing code and surrounding it with these lines in your document. Or, you can click the "Insert a new chunk" icon at the top of the .Rmd file, which will insert the lines for you at your cursor's location.
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-new-chunk.png")
```
You can run the chunks in your document one at a time by clicking the "Run Current Chunk" icon, which looks like a play button, at the top of the chunk. RStudio will run the code in the chunk in your current environment and display the results in the R Markdown file editor, turning the file editor into a code notebook.
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-run-chunk.png")
```
To turn off this behavior, click the gear icon at the top of the .Rmd file and select "Chunk Output in the Console". RStudio will then run code chunks at the command line as if your .Rmd file were an R script.
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-chunk-console.png")
```
You can run all of the chunks that precede a code chunk, in order, by clicking the "Run All Chunks Above" icon that appears to the left of the "Run Current Chunk" icon.
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-run-above-chunks.png")
```
You can run every chunk in your document, in order, at the command line by selecting Run All in the Run menu that appears at the top of your .Rmd file.
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-run-all.png")
```
## The benefits of R Markdown
### Generating reports
* Literate Data Science
* Reproducible Research.
As a data scientist, you don't run experiments; you run code. What do you need to reproduce? The whole process, which includes communication.
* Dynamic Documents
To generate a report of this analysis that you would feel comfortable presenting to a non-technical audience, click the knit button at the top of the file.
## Using R Markdown in the RStudio IDE
```{r, eval = FALSE, echo = FALSE, out.width = "65%"}
knitr::include_graphics("images/rmarkdown-knit.png")
```
If you need to get started immediately...
When you do this, two things will happen in the background.
## Write text
```{r, eval = FALSE, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/rmarkdown-flow.png")
```
Markdown is easy to use.
1. R Markdown will invoke the knitr package to run the code chunks in your document. Knitr will execute the code in a fresh R environment and then create a new markdown document. This document will contain the same text as in your .Rmd file, but each code chunk will be replaced by the markdown equivalent of the code chunk followed by the results of the code embedded into the document. If you choose to export your report as a pdf, knitr will return a tex version of your report instead of a markdown document.
## Embed code
2. R Markdown will invoke the pandoc program to convert the markdown file that knitr returns into a finished document. Here this is the html document seen below. Pandoc will remove the markdown markup contained in your text, replacing the text with formatted text. Pandoc will also use the meta-data in your file to customize your report when sensible. For example, here it adds a title, author, and date to the beginning of your report. If you choose to output your report as a pdf, R Markdown will use your computer's installation of TeX to generate a pdf from the tex file supplied by knitr.
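The two stages described above can be approximated by calling knitr and pandoc yourself. This is only a rough sketch, assuming an html target and that the knitr and rmarkdown packages and pandoc are installed; the function name `knit_then_convert()` is made up, and the real knit button does more (for example, it applies the output format's options and templates).

```r
# A hypothetical helper that spells out the two stages for an html report.
knit_then_convert <- function(rmd_file) {
  # Stage 1: knitr runs the code chunks and writes a markdown file.
  # knit() returns the path of the markdown file it created.
  md_path <- normalizePath(knitr::knit(rmd_file, quiet = TRUE))

  # Stage 2: pandoc converts the markdown file into a standalone html file
  # saved next to it.
  html_path <- sub("\\.md$", ".html", md_path)
  rmarkdown::pandoc_convert(
    md_path,
    to = "html",
    output = html_path,
    options = "--standalone"
  )

  html_path
}

# Example call (not run here): knit_then_convert("untitled.Rmd")
```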
Use knitr syntax to customize the output.
R Markdown saves the output file in the same directory as your .Rmd file, and RStudio opens a preview of the file in your IDE. Use the drop down menu accessible in the gear icon to select where in the IDE to display the preview.
## Use the metadata
```{r, eval = FALSE, echo = FALSE, out.width = "100%"}
knitr::include_graphics("images/rmarkdown-html.png")
```
### Set Parameters
### Change and customize Output formats
#### Output formats
## Extensions
To generate a report in a different output format, change the output field in your file's header and then re-knit. R Markdown will repeat the process above, re-executing your code each time you render a report (you can cache results if this is expensive).
Here we list some extensions to the R Markdown format. Some of these extensions are so deep, like building Shiny Apps, that I can't possibly cover everything you need to know here (but I can tell you where to learn more); others are so simple to use, like flexdashboards, that this brief entry is all you will need to get started.
R Markdown recognizes the following output fields, and more are available from other packages.
Since R Markdown is designed to be extended, you should expect more extensions to appear over time.
TO DO
* Flexdashboards
* Bookdown
* Shiny apps
You can also use the knitr dropdown menu to quickly knit to several different formats. The menu will display options that are similar to the current setting for `output:`, updating if you save the file with a new output value.
#### Generating reports from the command line
To generate a finished report from the command line, run the `render()` command from the `rmarkdown` package. The first argument of `render()` is the filepath to the .Rmd file that you wish to render. You can set the name and directory of the output file with the `output_file` and `output_dir` arguments. You can use `output_format` to select the format of the exported report. The code below supplies a vector of formats, creating several documents at once.
```{r eval = FALSE}
library(rmarkdown)
render("untitled.Rmd",
  output_format = c("pdf_document", "word_document"),
  output_dir = "reports", output_file = "untitled-report")
```
Behind the scenes, R Markdown does the exact same thing whether you use `render()` or the knit button. However, the RStudio IDE will not automatically open a preview of the document when you use `render()`.
`render()` provides an easy way to generate multiple documents from the same .Rmd file; you can call `render()` from a for loop or a purrr function. If your .Rmd file uses parameters, you can supply a new set of parameters to `render()` for each version of the document (see PARAMETERS).
You can also call `render()` from a cron job to create an R Markdown report that automatically updates on a scheduled basis.
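The loop described above might look like the sketch below. It assumes that the .Rmd file declares a parameter named `region` in its header and renders to html; both the parameter and the helper `render_reports()` are hypothetical. The `render_fn` argument exists only so that the loop can be tried with a stand-in function before rendering anything for real.

```r
# Render one report per parameter value.
# `render_fn` defaults to rmarkdown::render(); pass a stand-in to dry-run.
render_reports <- function(input, regions, render_fn = rmarkdown::render) {
  for (region in regions) {
    render_fn(
      input,
      params = list(region = region),                    # hypothetical parameter
      output_file = paste0("report-", region, ".html"),  # one file per value
      output_dir = "reports"
    )
  }
  invisible(regions)
}

# Example call (not run here):
# render_reports("untitled.Rmd", c("north", "south"))
```

A script that contains a call like this is also the kind of thing you could hand to a scheduler to refresh reports on a regular basis.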
## Executing code
### Code display
### Caching
### Languages
### Inline code
### Parameters
## Formatting text
### Markdown
### Tables
### Citations
### Equations
## Output formats
### Customizing output
### R Markdown extensions
As a data scientist, you don't run experiments, you run code.
As a data scientist, you are the link between data, computers, colleagues, and human decision makers. Your many roles require many tools, but now there is a powerful authoring framework that lets you do everything with a single file. It's called R Markdown. It's incredibly easy to use, and, like the rest of R, it is absolutely free.
With R Markdown, you record your work in a plain text file that contains narrative, code, and metadata. Open the file in your RStudio IDE and you have a true notebook for R. You don't even need to write your code in R. You can use Python, JavaScript, SQL, and many more languages within your file.
To share your work, generate an html, pdf, or Microsoft Word report straight from your file. Or a beamer, ioslides, slidy, or reveal.js slideshow. Or a notebook that colleagues can view in a web browser. Or an administrative dashboard, or a book, or a website, or an interactive web app. R Markdown makes all of these and more.
In every case, R Markdown executes the code in your file and inserts the results into your finished report.
You can set output options, like a table of contents, or apply reusable templates that quickly shape the appearance of your report.
You can also set parameters each time you render a new report, which turns your file into a reusable data product that you can write once and deploy multiple times.
To go beyond the basics, enhance your R Markdown files with HTML based interactivity. Create client-side interactions with R's htmlwidgets, like leaflet, dygraphs, and other JavaScript visualizations. Or create more sophisticated interactions that are processed on the server side with Shiny.
R Markdown supports anything that you can do in R, and it creates a reproducible record of your work as you go. Build models, connect to databases, or run Spark code on Big Data with the sparklyr package. R Markdown handles it all. And yet, at heart, it remains a simple plain text file.
Learn more at rmarkdown.rstudio.com.