Minor edits based on PDF review (#1359)
parent 8cb1037e1e
commit cfd608cf47
@@ -518,7 +518,7 @@ Here's a quick example from the diamonds dataset:
 
 ```{r}
 #| dev: png
-#| fig-width: 3
+#| fig-width: 4
 #| fig-asp: 1
 #| layout-ncol: 2
 
@@ -36,6 +36,7 @@ Don't forget that you'll need to install those packages with `install.packages()
 #| message: false
 
 library(tidyverse)
+library(scales)
 library(ggrepel)
 library(patchwork)
 ```
@@ -394,13 +395,13 @@ Note that `breaks` is in the original scale of the data.
 # Left
 ggplot(diamonds, aes(x = price, y = cut)) +
   geom_boxplot(alpha = 0.05) +
-  scale_x_continuous(labels = scales::label_dollar())
+  scale_x_continuous(labels = label_dollar())
 
 # Right
 ggplot(diamonds, aes(x = price, y = cut)) +
   geom_boxplot(alpha = 0.05) +
   scale_x_continuous(
-    labels = scales::label_dollar(scale = 1/1000, suffix = "K"),
+    labels = label_dollar(scale = 1/1000, suffix = "K"),
     breaks = seq(1000, 19000, by = 6000)
   )
 ```
@@ -415,7 +416,7 @@ Another handy label function is `label_percent()`:
 
 ggplot(diamonds, aes(x = cut, fill = clarity)) +
   geom_bar(position = "fill") +
-  scale_y_continuous(name = "Percentage", labels = scales::label_percent())
+  scale_y_continuous(name = "Percentage", labels = label_percent())
 ```
 
 Another use of `breaks` is when you have relatively few data points and want to highlight exactly where the observations occur.
layers.qmd (14 lines changed)
@@ -684,7 +684,7 @@ However, there are three reasons why you might need to use a stat explicitly:
 
 1. You might want to override the default stat.
 In the code below, we change the stat of `geom_bar()` from count (the default) to identity.
-This lets us map the height of the bars to the raw values of a $y$ variable.
+This lets us map the height of the bars to the raw values of a y variable.
 
 ```{r}
 #| warning: false
@@ -1024,14 +1024,20 @@ To see how this works, consider how you could build a basic plot from scratch: y
 Next, you could choose a geometric object to represent each observation in the transformed data.
 You could then use the aesthetic properties of the geoms to represent variables in the data.
 You would map the values of each variable to the levels of an aesthetic.
+These steps are illustrated in @fig-visualization-grammar.
 You'd then select a coordinate system to place the geoms into, using the location of the objects (which is itself an aesthetic property) to display the values of the x and y variables.
 
 ```{r}
+#| label: fig-visualization-grammar
 #| echo: false
 #| fig-alt: >
-#| A figure demonstrating the steps for going from raw data to table of counts
-#| where each row represents one level of cut and a count column shows how many
-#| diamonds are in that cuit level.
+#| A figure demonstrating the steps for going from raw data to table of
+#| frequencies where each row represents one level of cut and a count column
+#| shows how many diamonds are in that cut level. Then, these values are
+#| mapped to heights of bars.
+#| fig-cap: >
+#| Steps for going from raw data to a table of frequencies to a bar plot where
+#| the heights of the bar represent the frequencies.
 
 knitr::include_graphics("images/visualization-grammar.png")
 ```
@@ -680,6 +680,7 @@ This suggests that the mean is unlikely to be a good summary and we might prefer
 #| leave a couple of minutes early), but there's still a very steep
 #| decay after that.
 #| fig-asp: 0.5
+
 library(patchwork)
 
 full <- flights |>
@@ -695,19 +696,15 @@ full + delayed120
 ```
 
 It's also a good idea to check that distributions for subgroups resemble the whole.
-@fig-flights-dist-daily overlays a frequency polygon for each day.
+In the following plot 365 frequency polygons of `dep_delay`, one for each day, are overlaid.
 The distributions seem to follow a common pattern, suggesting it's fine to use the same summary for each day.
 
 ```{r}
-#| label: fig-flights-dist-daily
-#| fig-cap: >
-#| 365 frequency polygons of `dep_delay`, one for each day. The frequency
-#| polygons appear to have the same shape, suggesting that it's reasonable
-#| to compare days by looking at just a few summary statistics.
 #| fig-alt: >
 #| The distribution of `dep_delay` is highly right skewed with a strong
 #| peak slightly less than 0. The 365 frequency polygons are mostly
 #| overlapping forming a thick black bland.
+
 flights |>
   filter(dep_delay < 120) |>
   ggplot(aes(x = dep_delay, group = interaction(day, month))) +
@@ -109,14 +109,18 @@ knitr::include_graphics("quarto/diamond-sizes-report.png")
 
 When you render the document, Quarto sends the `.qmd` file to **knitr**, [https://yihui.name/knitr](https://yihui.name/knitr/){.uri}, which executes all of the code chunks and creates a new markdown (`.md`) document which includes the code and its output.
 The markdown file generated by knitr is then processed by **pandoc**, [https://pandoc.org](https://pandoc.org/){.uri}, which is responsible for creating the finished file.
-The advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in @sec-quarto-formats.
+This process is shown in @fig-quarto-flow. The advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in @sec-quarto-formats.
 
 ```{r}
+#| label: fig-quarto-flow
 #| echo: false
 #| out-width: "75%"
-#| fig-alt: |
+#| fig-alt: >
 #| Workflow diagram starting with a qmd file, then knitr, then md,
 #| then pandoc, then PDF, MS Word, or HTML.
+#| fig-cap: >
+#| Diagram of Quarto workflow from qmd, to knitr, to md, to pandoc,
+#| to output in PDF, MS Word, or HTML formats.
 
 knitr::include_graphics("images/quarto-flow.png")
 ```
@@ -136,10 +136,6 @@ It looks like they've radically increased in popularity lately!
 [^regexps-4]: This gives us the proportion of **names** that contain an "x"; if you wanted the proportion of babies with a name containing an x, you'd need to perform a weighted mean.
 
 ```{r}
-#| label: fig-x-names
-#| fig-cap: >
-#| A time series showing the proportion of baby names that contain a
-#| lower case "x".
 #| fig-alt: >
 #| A time series showing the proportion of baby names that contain the letter x.
 #| The proportion declines gradually from 8 per 1000 in 1880 to 4 per 1000 in