Minor edits based on PDF review (#1359)

Mine Cetinkaya-Rundel 2023-03-10 16:52:40 -05:00 committed by GitHub
parent 8cb1037e1e
commit cfd608cf47
6 changed files with 24 additions and 20 deletions

View File

@@ -518,7 +518,7 @@ Here's a quick example from the diamonds dataset:
```{r}
#| dev: png
#| fig-width: 3
#| fig-width: 4
#| fig-asp: 1
#| layout-ncol: 2

View File

@@ -36,6 +36,7 @@ Don't forget that you'll need to install those packages with `install.packages()
#| message: false
library(tidyverse)
library(scales)
library(ggrepel)
library(patchwork)
```
@@ -394,13 +395,13 @@ Note that `breaks` is in the original scale of the data.
# Left
ggplot(diamonds, aes(x = price, y = cut)) +
geom_boxplot(alpha = 0.05) +
scale_x_continuous(labels = scales::label_dollar())
scale_x_continuous(labels = label_dollar())
# Right
ggplot(diamonds, aes(x = price, y = cut)) +
geom_boxplot(alpha = 0.05) +
scale_x_continuous(
labels = scales::label_dollar(scale = 1/1000, suffix = "K"),
labels = label_dollar(scale = 1/1000, suffix = "K"),
breaks = seq(1000, 19000, by = 6000)
)
```
@@ -415,7 +416,7 @@ Another handy label function is `label_percent()`:
ggplot(diamonds, aes(x = cut, fill = clarity)) +
geom_bar(position = "fill") +
scale_y_continuous(name = "Percentage", labels = scales::label_percent())
scale_y_continuous(name = "Percentage", labels = label_percent())
```
Another use of `breaks` is when you have relatively few data points and want to highlight exactly where the observations occur.

View File

@@ -684,7 +684,7 @@ However, there are three reasons why you might need to use a stat explicitly:
1. You might want to override the default stat.
In the code below, we change the stat of `geom_bar()` from count (the default) to identity.
This lets us map the height of the bars to the raw values of a $y$ variable.
This lets us map the height of the bars to the raw values of a y variable.
```{r}
#| warning: false
@@ -1024,14 +1024,20 @@ To see how this works, consider how you could build a basic plot from scratch: y
Next, you could choose a geometric object to represent each observation in the transformed data.
You could then use the aesthetic properties of the geoms to represent variables in the data.
You would map the values of each variable to the levels of an aesthetic.
These steps are illustrated in @fig-visualization-grammar.
You'd then select a coordinate system to place the geoms into, using the location of the objects (which is itself an aesthetic property) to display the values of the x and y variables.
```{r}
#| label: fig-visualization-grammar
#| echo: false
#| fig-alt: >
#| A figure demonstrating the steps for going from raw data to table of counts
#| where each row represents one level of cut and a count column shows how many
#| diamonds are in that cuit level.
#| A figure demonstrating the steps for going from raw data to table of
#| frequencies where each row represents one level of cut and a count column
#| shows how many diamonds are in that cut level. Then, these values are
#| mapped to heights of bars.
#| fig-cap: >
#| Steps for going from raw data to a table of frequencies to a bar plot where
#| the heights of the bar represent the frequencies.
knitr::include_graphics("images/visualization-grammar.png")
```

View File

@@ -680,6 +680,7 @@ This suggests that the mean is unlikely to be a good summary and we might prefer
#| leave a couple of minutes early), but there's still a very steep
#| decay after that.
#| fig-asp: 0.5
library(patchwork)
full <- flights |>
@@ -695,19 +696,15 @@ full + delayed120
```
It's also a good idea to check that distributions for subgroups resemble the whole.
@fig-flights-dist-daily overlays a frequency polygon for each day.
In the following plot 365 frequency polygons of `dep_delay`, one for each day, are overlaid.
The distributions seem to follow a common pattern, suggesting it's fine to use the same summary for each day.
```{r}
#| label: fig-flights-dist-daily
#| fig-cap: >
#| 365 frequency polygons of `dep_delay`, one for each day. The frequency
#| polygons appear to have the same shape, suggesting that it's reasonable
#| to compare days by looking at just a few summary statistics.
#| fig-alt: >
#| The distribution of `dep_delay` is highly right skewed with a strong
#| peak slightly less than 0. The 365 frequency polygons are mostly
#| overlapping forming a thick black bland.
flights |>
filter(dep_delay < 120) |>
ggplot(aes(x = dep_delay, group = interaction(day, month))) +

View File

@@ -109,14 +109,18 @@ knitr::include_graphics("quarto/diamond-sizes-report.png")
When you render the document, Quarto sends the `.qmd` file to **knitr**, [https://yihui.name/knitr](https://yihui.name/knitr/){.uri}, which executes all of the code chunks and creates a new markdown (`.md`) document which includes the code and its output.
The markdown file generated by knitr is then processed by **pandoc**, [https://pandoc.org](https://pandoc.org/){.uri}, which is responsible for creating the finished file.
The advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in @sec-quarto-formats.
This process is shown in @fig-quarto-flow. The advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in @sec-quarto-formats.
```{r}
#| label: fig-quarto-flow
#| echo: false
#| out-width: "75%"
#| fig-alt: |
#| fig-alt: >
#| Workflow diagram starting with a qmd file, then knitr, then md,
#| then pandoc, then PDF, MS Word, or HTML.
#| fig-cap: >
#| Diagram of Quarto workflow from qmd, to knitr, to md, to pandoc,
#| to output in PDF, MS Word, or HTML formats.
knitr::include_graphics("images/quarto-flow.png")
```

View File

@@ -136,10 +136,6 @@ It looks like they've radically increased in popularity lately!
[^regexps-4]: This gives us the proportion of **names** that contain an "x"; if you wanted the proportion of babies with a name containing an x, you'd need to perform a weighted mean.
```{r}
#| label: fig-x-names
#| fig-cap: >
#| A time series showing the proportion of baby names that contain a
#| lower case "x".
#| fig-alt: >
#| A time series showing the proportion of baby names that contain the letter x.
#| The proportion declines gradually from 8 per 1000 in 1880 to 4 per 1000 in