diff --git a/base-R.qmd b/base-R.qmd
index 8331b1f..276fde8 100644
--- a/base-R.qmd
+++ b/base-R.qmd
@@ -518,7 +518,7 @@ Here's a quick example from the diamonds dataset:
 
 ```{r}
 #| dev: png
-#| fig-width: 3
+#| fig-width: 4
 #| fig-asp: 1
 #| layout-ncol: 2
 
diff --git a/communication.qmd b/communication.qmd
index 1733929..1363105 100644
--- a/communication.qmd
+++ b/communication.qmd
@@ -36,6 +36,7 @@ Don't forget that you'll need to install those packages with `install.packages()
 #| message: false
 
 library(tidyverse)
+library(scales)
 library(ggrepel)
 library(patchwork)
 ```
@@ -394,13 +395,13 @@ Note that `breaks` is in the original scale of the data.
 # Left
 ggplot(diamonds, aes(x = price, y = cut)) +
   geom_boxplot(alpha = 0.05) +
-  scale_x_continuous(labels = scales::label_dollar())
+  scale_x_continuous(labels = label_dollar())
 
 # Right
 ggplot(diamonds, aes(x = price, y = cut)) +
   geom_boxplot(alpha = 0.05) +
   scale_x_continuous(
-    labels = scales::label_dollar(scale = 1/1000, suffix = "K"),
+    labels = label_dollar(scale = 1/1000, suffix = "K"),
     breaks = seq(1000, 19000, by = 6000)
   )
 ```
@@ -415,7 +416,7 @@ Another handy label function is `label_percent()`:
 
 ggplot(diamonds, aes(x = cut, fill = clarity)) +
   geom_bar(position = "fill") +
-  scale_y_continuous(name = "Percentage", labels = scales::label_percent())
+  scale_y_continuous(name = "Percentage", labels = label_percent())
 ```
 
 Another use of `breaks` is when you have relatively few data points and want to highlight exactly where the observations occur.
diff --git a/layers.qmd b/layers.qmd
index 5b86cd5..15868ac 100644
--- a/layers.qmd
+++ b/layers.qmd
@@ -684,7 +684,7 @@ However, there are three reasons why you might need to use a stat explicitly:
 
 1.  You might want to override the default stat.
     In the code below, we change the stat of `geom_bar()` from count (the default) to identity.
-    This lets us map the height of the bars to the raw values of a $y$ variable.
+    This lets us map the height of the bars to the raw values of a y variable.
 
     ```{r}
     #| warning: false
@@ -1024,14 +1024,20 @@ To see how this works, consider how you could build a basic plot from scratch: y
 Next, you could choose a geometric object to represent each observation in the transformed data.
 You could then use the aesthetic properties of the geoms to represent variables in the data.
 You would map the values of each variable to the levels of an aesthetic.
+These steps are illustrated in @fig-visualization-grammar.
 You'd then select a coordinate system to place the geoms into, using the location of the objects (which is itself an aesthetic property) to display the values of the x and y variables.
 
 ```{r}
+#| label: fig-visualization-grammar
 #| echo: false
 #| fig-alt: >
-#|   A figure demonstrating the steps for going from raw data to table of counts
-#|   where each row represents one level of cut and a count column shows how many
-#|   diamonds are in that cuit level.
+#|   A figure demonstrating the steps for going from raw data to table of
+#|   frequencies where each row represents one level of cut and a count column
+#|   shows how many diamonds are in that cut level. Then, these values are
+#|   mapped to heights of bars.
+#| fig-cap: >
+#|   Steps for going from raw data to a table of frequencies to a bar plot where
+#|   the heights of the bar represent the frequencies.
 
 knitr::include_graphics("images/visualization-grammar.png")
 ```
diff --git a/numbers.qmd b/numbers.qmd
index 97f475a..5f1e9d6 100644
--- a/numbers.qmd
+++ b/numbers.qmd
@@ -680,6 +680,7 @@ This suggests that the mean is unlikely to be a good summary and we might prefer
 #|   leave a couple of minutes early), but there's still a very steep
 #|   decay after that.
 #| fig-asp: 0.5
+
 library(patchwork)
 
 full <- flights |>
@@ -695,19 +696,15 @@ full + delayed120
 ```
 
 It's also a good idea to check that distributions for subgroups resemble the whole.
-@fig-flights-dist-daily overlays a frequency polygon for each day.
+In the following plot 365 frequency polygons of `dep_delay`, one for each day, are overlaid.
 The distributions seem to follow a common pattern, suggesting it's fine to use the same summary for each day.
 
 ```{r}
-#| label: fig-flights-dist-daily
-#| fig-cap: >
-#|   365 frequency polygons of `dep_delay`, one for each day. The frequency
-#|   polygons appear to have the same shape, suggesting that it's reasonable
-#|   to compare days by looking at just a few summary statistics.
 #| fig-alt: >
 #|   The distribution of `dep_delay` is highly right skewed with a strong
 #|   peak slightly less than 0. The 365 frequency polygons are mostly
 #|   overlapping forming a thick black bland.
+
 flights |>
   filter(dep_delay < 120) |>
   ggplot(aes(x = dep_delay, group = interaction(day, month))) +
diff --git a/quarto.qmd b/quarto.qmd
index 42e6580..d2bf9fc 100644
--- a/quarto.qmd
+++ b/quarto.qmd
@@ -109,14 +109,18 @@ knitr::include_graphics("quarto/diamond-sizes-report.png")
 
 When you render the document, Quarto sends the `.qmd` file to **knitr**, [https://yihui.name/knitr](https://yihui.name/knitr/){.uri}, which executes all of the code chunks and creates a new markdown (`.md`) document which includes the code and its output.
 The markdown file generated by knitr is then processed by **pandoc**, [https://pandoc.org](https://pandoc.org/){.uri}, which is responsible for creating the finished file.
-The advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in @sec-quarto-formats.
+This process is shown in @fig-quarto-flow. The advantage of this two step workflow is that you can create a very wide range of output formats, as you'll learn about in @sec-quarto-formats.
 
 ```{r}
+#| label: fig-quarto-flow
 #| echo: false
 #| out-width: "75%"
-#| fig-alt: |
+#| fig-alt: >
 #|   Workflow diagram starting with a qmd file, then knitr, then md,
 #|   then pandoc, then PDF, MS Word, or HTML.
+#| fig-cap: >
+#|   Diagram of Quarto workflow from qmd, to knitr, to md, to pandoc,
+#|   to output in PDF, MS Word, or HTML formats.
 
 knitr::include_graphics("images/quarto-flow.png")
 ```
diff --git a/regexps.qmd b/regexps.qmd
index 12aa35b..912a296 100644
--- a/regexps.qmd
+++ b/regexps.qmd
@@ -136,10 +136,6 @@ It looks like they've radically increased in popularity lately!
 [^regexps-4]: This gives us the proportion of **names** that contain an "x"; if you wanted the proportion of babies with a name containing an x, you'd need to perform a weighted mean.
 
 ```{r}
-#| label: fig-x-names
-#| fig-cap: >
-#|   A time series showing the proportion of baby names that contain a
-#|   lower case "x".
 #| fig-alt: >
 #|   A time series showing the proportion of baby names that contain the letter x.
 #|   The proportion declines gradually from 8 per 1000 in 1880 to 4 per 1000 in