<section data-type="chapter" id="chp-communication">
<h1><span id="sec-communication" class="quarto-section-identifier d-none d-lg-block"><span class="chapter-title">Communication</span></span></h1>
<section id="communication-introduction" data-type="sect1">
<h1>
Introduction</h1>
<p>In <a href="#chp-EDA" data-type="xref">#chp-EDA</a>, you learned how to use plots as tools for <em>exploration</em>. When you make exploratory plots, you know, even before looking, which variables the plot will display. You made each plot for a purpose, could quickly look at it, and then move on to the next plot. In the course of most analyses, you’ll produce tens or hundreds of plots, most of which are immediately thrown away.</p>
<p>Now that you understand your data, you need to <em>communicate</em> your understanding to others. Your audience will likely not share your background knowledge and will not be deeply invested in the data. To help others quickly build up a good mental model of the data, you will need to invest considerable effort in making your plots as self-explanatory as possible. In this chapter, you’ll learn some of the tools that ggplot2 provides to do so.</p>
<p>This chapter focuses on the tools you need to create good graphics. We assume that you know what you want, and just need to know how to do it. For that reason, we highly recommend pairing this chapter with a good general visualization book. We particularly like <a href="https://www.amazon.com/gp/product/0321934075/">The Truthful Art</a>, by Alberto Cairo. It doesn’t teach the mechanics of creating visualizations, but instead focuses on what you need to think about in order to create effective graphics.</p>
<section id="communication-prerequisites" data-type="sect2">
<h2>
Prerequisites</h2>
<p>In this chapter, we’ll focus once again on ggplot2. We’ll also use a little dplyr for data manipulation, <strong>scales</strong> to override the default breaks, labels, transformations and palettes, and a few ggplot2 extension packages, including <strong>ggrepel</strong> (<a href="https://ggrepel.slowkow.com/">https://ggrepel.slowkow.com</a>) by Kamil Slowikowski and <strong>patchwork</strong> (<a href="https://patchwork.data-imaginist.com/">https://patchwork.data-imaginist.com</a>) by Thomas Lin Pedersen. Don’t forget that you’ll need to install those packages with <code><a href="https://rdrr.io/r/utils/install.packages.html">install.packages()</a></code> if you don’t already have them.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)
library(ggrepel)
library(patchwork)</pre>
</div>
</section>
</section>
<section id="labels" data-type="sect1">
<h1>
Labels</h1>
<p>The easiest place to start when turning an exploratory graphic into an expository graphic is with good labels. You add labels with the <code><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs()</a></code> function. This example adds a plot title:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(title = "Fuel efficiency generally decreases with engine size")</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-3-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars, where points are colored according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of cars is overlaid. The plot is titled &quot;Fuel efficiency generally decreases with engine size&quot;." width="576"/></p>
</div>
</div>
<p>The purpose of a plot title is to summarize the main finding. Avoid titles that just describe what the plot is, e.g. “A scatterplot of engine displacement vs. fuel economy”.</p>
<p>If you need to add more text, there are two other useful labels:</p>
<ul><li><p><code>subtitle</code> adds additional detail in a smaller font beneath the title.</p></li>
<li><p><code>caption</code> adds text at the bottom right of the plot, often used to describe the source of the data.</p></li>
</ul><div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov"
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-4-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars, where points are colored according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of cars is overlaid. The plot is titled &quot;Fuel efficiency generally decreases with engine size&quot;. The subtitle is &quot;Two seaters (sports cars) are an exception because of their light weight&quot; and the caption is &quot;Data from fueleconomy.gov&quot;." width="576"/></p>
</div>
</div>
<p>You can also use <code><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs()</a></code> to replace the axis and legend titles. It’s usually a good idea to replace short variable names with more detailed descriptions, and to include the units.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
color = "Car type"
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-5-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars, where points are colored according to the car class. A smooth curve following the trajectory of the relationship between highway fuel efficiency versus engine size of cars is overlaid. The x-axis is labelled &quot;Engine displacement (L)&quot; and the y-axis is labelled &quot;Highway fuel economy (mpg)&quot;. The legend is labelled &quot;Car type&quot;." width="576"/></p>
</div>
</div>
<p>It’s possible to use mathematical equations instead of text strings. Just switch <code>""</code> out for <code><a href="https://rdrr.io/r/base/substitute.html">quote()</a></code> and read about the available options in <code><a href="https://rdrr.io/r/grDevices/plotmath.html">?plotmath</a></code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(
x = 1:10,
y = x ^ 2
)
ggplot(df, aes(x, y)) +
geom_point() +
labs(
x = quote(sum(x[i] ^ 2, i == 1, n)),
y = quote(alpha + beta + frac(delta, theta))
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-6-1.png" style="width:50.0%" alt="Scatterplot with math text on the x and y axis labels. X-axis label says sum of x_i squared, for i from 1 to n. Y-axis label says alpha + beta + delta over theta."/></p>
</div>
</div>
<section id="communication-exercises" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Create one plot on the fuel economy data with customized <code>title</code>, <code>subtitle</code>, <code>caption</code>, <code>x</code>, <code>y</code>, and <code>color</code> labels.</p></li>
<li>
<p>Recreate the following plot using the fuel economy data. Note that both the colors and shapes of points vary by type of drive train.</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-7-1.png" alt="Scatterplot of highway versus city fuel efficiency. Shapes and colors of points are determined by type of drive train." width="576"/></p>
</div>
</div>
</li>
<li><p>Take an exploratory graphic that you’ve created in the last month, and add informative titles to make it easier for others to understand.</p></li>
</ol></section>
</section>
<section id="annotations" data-type="sect1">
<h1>
Annotations</h1>
<p>In addition to labelling major components of your plot, it’s often useful to label individual observations or groups of observations. The first tool you have at your disposal is <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code>. <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code> is similar to <code><a href="https://ggplot2.tidyverse.org/reference/geom_point.html">geom_point()</a></code>, but it has an additional aesthetic: <code>label</code>. This makes it possible to add textual labels to your plots.</p>
<p>There are two possible sources of labels. First, you might have a tibble that provides labels. In the following plot we pull out the cars with the highest engine size in each drive type and save their information as a new data frame called <code>label_info</code>. In order to create the <code>label_info</code> data frame we used a number of new dplyr functions. You’ll learn more about each of these soon!</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">label_info &lt;- mpg |&gt;
group_by(drv) |&gt;
arrange(desc(displ)) |&gt;
slice_head(n = 1) |&gt;
mutate(
drive_type = case_when(
drv == "f" ~ "front-wheel drive",
drv == "r" ~ "rear-wheel drive",
drv == "4" ~ "4-wheel drive"
)
) |&gt;
select(displ, hwy, drv, drive_type)
label_info
#&gt; # A tibble: 3 × 4
#&gt; # Groups: drv [3]
#&gt; displ hwy drv drive_type
#&gt; &lt;dbl&gt; &lt;int&gt; &lt;chr&gt; &lt;chr&gt;
#&gt; 1 6.5 17 4 4-wheel drive
#&gt; 2 5.3 25 f front-wheel drive
#&gt; 3 7 24 r rear-wheel drive</pre>
</div>
<p>Then, we use this new data frame to label the three groups directly on the plot, replacing the legend. Using the <code>fontface</code> and <code>size</code> arguments we can customize the look of the text labels. They’re larger than the rest of the text on the plot and bolded. (<code>theme(legend.position = "none")</code> turns the legend off; we’ll talk about it more shortly.)</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
geom_text(
data = label_info,
aes(x = displ, y = hwy, label = drive_type),
fontface = "bold", size = 5, hjust = "right", vjust = "bottom"
) +
theme(legend.position = "none")
#&gt; `geom_smooth()` using method = 'loess' and formula = 'y ~ x'</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-9-1.png" alt="Scatterplot of highway mileage versus engine size where points are colored by drive type. Smooth curves for each drive type are overlaid. Text labels identify the curves as front-wheel, rear-wheel, and 4-wheel." width="576"/></p>
</div>
</div>
<p>Note the use of <code>hjust</code> and <code>vjust</code> to control the alignment of the label. <a href="#fig-just" data-type="xref">#fig-just</a> shows all nine possible combinations.</p>
<div class="cell">
<div class="cell-output-display">
<figure id="fig-just"><p><img src="communication_files/figure-html/fig-just-1.png" style="width:60.0%" alt="A 1x1 grid. At (0,0) hjust is set to left and vjust is set to bottom. At (0.5, 0) hjust is center and vjust is bottom and at (1, 0) hjust is right and vjust is bottom. At (0, 0.5) hjust is left and vjust is center, at (0.5, 0.5) hjust is center and vjust is center, and at (1, 0.5) hjust is right and vjust is center. Finally, at (0, 1) hjust is left and vjust is top, at (0.5, 1) hjust is center and vjust is top, and at (1, 1) hjust is right and vjust is top."/></p>
<figcaption>All nine combinations of <code>hjust</code> and <code>vjust</code>.</figcaption>
</figure>
</div>
</div>
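<p>A grid like this one can be drawn by treating <code>hjust</code> and <code>vjust</code> as aesthetics. The following is a minimal sketch rather than the code used for the figure; the <code>just_grid</code> tibble and its <code>x</code>, <code>y</code>, and <code>label</code> columns are names made up for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)

# Hypothetical helper data: one row per hjust/vjust combination
just_grid &lt;- expand_grid(
  hjust = c("left", "center", "right"),
  vjust = c("bottom", "center", "top")
) |&gt;
  mutate(
    # Place each combination at the matching corner, edge, or center
    x = unname(c(left = 0, center = 0.5, right = 1)[hjust]),
    y = unname(c(bottom = 0, center = 0.5, top = 1)[vjust]),
    label = paste0("hjust = ", hjust, "\nvjust = ", vjust)
  )

ggplot(just_grid, aes(x = x, y = y)) +
  # Grey points mark the anchor position of each label
  geom_point(color = "grey60", size = 4) +
  geom_text(aes(label = label, hjust = hjust, vjust = vjust))</pre>
</div>
<p>Each grey point marks the position a label is anchored to, so you can see how the text moves relative to that position as the justification changes.</p>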
<p>However, the annotated plot we made above is hard to read because the labels overlap with each other, and with the points. We can make things a little better by switching to <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_label()</a></code>, which draws a rectangle behind the text. We also use the <code>nudge_y</code> parameter to move the labels slightly above the corresponding points:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
geom_label(
data = label_info,
aes(x = displ, y = hwy, label = drive_type),
fontface = "bold", size = 5, hjust = "right", alpha = 0.5, nudge_y = 2,
) +
theme(legend.position = "none")
#&gt; `geom_smooth()` using method = 'loess' and formula = 'y ~ x'</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-11-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars, where points are colored according to drive type. Smooth curves for each drive type are overlaid. Boxed labels with a semi-transparent white background identify the curves as front-wheel, rear-wheel, and 4-wheel drive." width="576"/></p>
</div>
</div>
<p>That helps a bit, but two of the labels still overlap with each other. This is difficult to fix by applying the same transformation for every label. Instead, we can use the <code><a href="https://rdrr.io/pkg/ggrepel/man/geom_text_repel.html">geom_label_repel()</a></code> function from the ggrepel package. This useful package will automatically adjust labels so that they don’t overlap:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point(alpha = 0.3) +
geom_smooth(se = FALSE) +
geom_label_repel(
data = label_info,
aes(x = displ, y = hwy, label = drive_type),
fontface = "bold", size = 5, nudge_y = 2,
) +
theme(legend.position = "none")
#&gt; `geom_smooth()` using method = 'loess' and formula = 'y ~ x'</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-12-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars, where points are colored according to drive type. Smooth curves for each drive type are overlaid. Boxed labels identify the curves as front-wheel, rear-wheel, and 4-wheel drive and are positioned so that they do not overlap." width="576"/></p>
</div>
</div>
<p>You can also use the same idea to highlight certain points on a plot with <code><a href="https://rdrr.io/pkg/ggrepel/man/geom_text_repel.html">geom_text_repel()</a></code> from the ggrepel package. Note another handy technique used here: we added a second layer of large, hollow points to further highlight the labelled points.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">potential_outliers &lt;- mpg |&gt;
filter(hwy &gt; 40 | (hwy &gt; 20 &amp; displ &gt; 5))
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_text_repel(data = potential_outliers, aes(label = model)) +
geom_point(data = potential_outliers, color = "red") +
geom_point(data = potential_outliers, color = "red", size = 3, shape = "circle open")</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-13-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. Points where highway mileage is above 40 as well as above 20 with engine size above 5 are red, with a hollow red circle, and labelled with model name of the car." width="576"/></p>
</div>
</div>
<p>Alternatively, you might just want to add a single label to the plot, but you’ll still need to create a data frame. Often, you want the label in the corner of the plot, so it’s convenient to create a new data frame using <code><a href="https://dplyr.tidyverse.org/reference/summarise.html">summarize()</a></code> to compute the maximum values of x and y.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">label_info &lt;- mpg |&gt;
summarize(
displ = max(displ),
hwy = max(hwy),
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_text(
data = label_info, aes(label = label),
vjust = "top", hjust = "right"
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-14-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. On the top right corner, inset a bit from the corner, is an annotation that reads &quot;increasing engine size is related to decreasing fuel economy&quot;. The text spans two lines." width="576"/></p>
</div>
</div>
<p>If you want to place the text exactly on the borders of the plot, you can use <code>+Inf</code> and <code>-Inf</code>. Since we’re no longer computing the positions from <code>mpg</code>, we can use <code><a href="https://tibble.tidyverse.org/reference/tibble.html">tibble()</a></code> to create the data frame:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">label_info &lt;- tibble(
displ = Inf,
hwy = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy."
)
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
geom_text(data = label_info, aes(label = label), vjust = "top", hjust = "right")</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-15-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. On the top right corner, flush against the corner, is an annotation that reads &quot;increasing engine size is related to decreasing fuel economy&quot;. The text spans two lines." width="576"/></p>
</div>
</div>
<p>Alternatively, we can add the annotation without creating a new data frame, using <code><a href="https://ggplot2.tidyverse.org/reference/annotate.html">annotate()</a></code>. This function adds a geom to a plot, but it doesn’t map variables of a data frame to an aesthetic. The first argument of this function, <code>geom</code>, is the geometric object you want to use for annotation.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
annotate(
geom = "text", x = Inf, y = Inf,
label = "Increasing engine size is \nrelated to decreasing fuel economy.",
vjust = "top", hjust = "right"
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-16-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. On the top right corner, flush against the corner, is an annotation that reads &quot;increasing engine size is related to decreasing fuel economy&quot;. The text spans two lines." width="576"/></p>
</div>
</div>
<p>You can also use a label geom instead of a text geom, as we did earlier, and set aesthetics like color. Another approach for drawing attention to a plot feature is using a segment geom with the <code>arrow</code> argument. The <code>x</code> and <code>y</code> aesthetics define the starting location of the segment, and <code>xend</code> and <code>yend</code> define the end location.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
annotate(
geom = "label", x = 3.5, y = 38,
label = "Increasing engine size is \nrelated to decreasing fuel economy.",
hjust = "left", color = "red"
) +
annotate(
geom = "segment",
x = 3, y = 35, xend = 5, yend = 25, color = "red",
arrow = arrow(type = "closed")
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-17-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. A red arrow pointing down follows the trend of the points and the annotation placed next to the arrow reads &quot;increasing engine size is related to decreasing fuel economy&quot;. The arrow and the annotation text are red." width="576"/></p>
</div>
</div>
<p>In these examples, we manually broke the label up into lines using <code>"\n"</code>. Another approach is to use <code><a href="https://stringr.tidyverse.org/reference/str_wrap.html">stringr::str_wrap()</a></code> to automatically add line breaks, given the number of characters you want per line:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">"Increasing engine size is related to decreasing fuel economy." |&gt;
str_wrap(width = 40) |&gt;
writeLines()
#&gt; Increasing engine size is related to
#&gt; decreasing fuel economy.</pre>
</div>
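<p>The wrapped string can be passed straight to an annotation. This is a small sketch that reuses the corner annotation from earlier; the <code>wrapped</code> variable is a name chosen here for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)

# Let str_wrap() choose the line breaks instead of typing "\n" by hand
wrapped &lt;- str_wrap(
  "Increasing engine size is related to decreasing fuel economy.",
  width = 40
)

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  annotate(
    geom = "text", x = Inf, y = Inf,
    label = wrapped,
    vjust = "top", hjust = "right"
  )</pre>
</div>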
<p>Remember, in addition to <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code>, you have many other geoms in ggplot2 available to help annotate your plot. A couple ideas:</p>
<ul><li><p>Use <code><a href="https://ggplot2.tidyverse.org/reference/geom_abline.html">geom_hline()</a></code> and <code><a href="https://ggplot2.tidyverse.org/reference/geom_abline.html">geom_vline()</a></code> to add reference lines. We often make them thick (<code>linewidth = 2</code>) and white (<code>color = "white"</code>), and draw them underneath the primary data layer. That makes them easy to see, without drawing attention away from the data.</p></li>
<li><p>Use <code><a href="https://ggplot2.tidyverse.org/reference/geom_tile.html">geom_rect()</a></code> to draw a rectangle around points of interest. The boundaries of the rectangle are defined by aesthetics <code>xmin</code>, <code>xmax</code>, <code>ymin</code>, <code>ymax</code>.</p></li>
<li><p>Use <code><a href="https://ggplot2.tidyverse.org/reference/geom_segment.html">geom_segment()</a></code> with the <code>arrow</code> argument to draw attention to a point with an arrow. Use aesthetics <code>x</code> and <code>y</code> to define the starting location, and <code>xend</code> and <code>yend</code> to define the end location.</p></li>
</ul><p>The only limit is your imagination (and your patience with positioning annotations to be aesthetically pleasing)!</p>
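<p>As a rough sketch of the first two ideas, the plot below adds white reference lines underneath the points and a rectangle around the large-engine cars with unusually high fuel economy. The reference values (30 mpg, 4 L) and the rectangle boundaries are arbitrary choices made for illustration, not values taken from the analysis:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)

annotated &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Reference lines come first so that they are drawn underneath the data
  geom_hline(yintercept = 30, linewidth = 2, color = "white") +
  geom_vline(xintercept = 4, linewidth = 2, color = "white") +
  # An unfilled rectangle around a hand-picked region of interest
  annotate(
    geom = "rect",
    xmin = 5.5, xmax = 7.2, ymin = 22, ymax = 27,
    fill = NA, color = "red"
  ) +
  geom_point()

annotated</pre>
</div>
<p>Layers are drawn in the order they are added, which is why the reference lines appear before <code>geom_point()</code>.</p>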
<section id="communication-exercises-1" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li><p>Use <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code> with infinite positions to place text at the four corners of the plot.</p></li>
<li><p>Use <code><a href="https://ggplot2.tidyverse.org/reference/annotate.html">annotate()</a></code> to add a point geom in the middle of your last plot without having to create a tibble. Customize the shape, size, or color of the point.</p></li>
<li><p>How do labels with <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_text()</a></code> interact with faceting? How can you add a label to a single facet? How can you put a different label in each facet? (Hint: Think about the underlying data.)</p></li>
<li><p>What arguments to <code><a href="https://ggplot2.tidyverse.org/reference/geom_text.html">geom_label()</a></code> control the appearance of the background box?</p></li>
<li><p>What are the four arguments to <code><a href="https://rdrr.io/r/grid/arrow.html">arrow()</a></code>? How do they work? Create a series of plots that demonstrate the most important options.</p></li>
</ol></section>
</section>
<section id="scales" data-type="sect1">
<h1>
Scales</h1>
<p>The third way you can make your plot better for communication is to adjust the scales. Scales control the mapping from data values to things that you can perceive.</p>
<section id="default-scales" data-type="sect2">
<h2>
Default scales</h2>
<p>Normally, ggplot2 automatically adds scales for you. For example, when you type:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class))</pre>
</div>
<p>ggplot2 automatically adds default scales behind the scenes:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_color_discrete()</pre>
</div>
<p>Note the naming scheme for scales: <code>scale_</code> followed by the name of the aesthetic, then <code>_</code>, then the name of the scale. The default scales are named according to the type of variable they align with: continuous, discrete, datetime, or date. There are lots of non-default scales which you’ll learn about below.</p>
<p>The default scales have been carefully chosen to do a good job for a wide range of inputs. Nevertheless, you might want to override the defaults for two reasons:</p>
<ul><li><p>You might want to tweak some of the parameters of the default scale. This allows you to do things like change the breaks on the axes, or the key labels on the legend.</p></li>
<li><p>You might want to replace the scale altogether, and use a completely different algorithm. Often you can do better than the default because you know more about the data.</p></li>
</ul></section>
<section id="axis-ticks-and-legend-keys" data-type="sect2">
<h2>
Axis ticks and legend keys</h2>
<p>There are two primary arguments that affect the appearance of the ticks on the axes and the keys on the legend: <code>breaks</code> and <code>labels</code>. Breaks controls the position of the ticks, or the values associated with the keys. Labels controls the text label associated with each tick/key. The most common use of <code>breaks</code> is to override the default choice:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-21-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. The y-axis has breaks starting at 15 and ending at 40, increasing by 5." width="576"/></p>
</div>
</div>
<p>You can use <code>labels</code> in the same way (a character vector the same length as <code>breaks</code>), but you can also set it to <code>NULL</code> to suppress the labels altogether. This is useful for maps, or for publishing plots where you can’t share the absolute numbers.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
scale_x_continuous(labels = NULL) +
scale_y_continuous(labels = NULL)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-22-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars. The x and y-axes do not have any labels at the axis ticks." width="576"/></p>
</div>
</div>
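<p>The example above only shows the <code>NULL</code> case. A minimal sketch of the character vector form follows; the break positions and the label text are arbitrary choices for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)

labelled &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_y_continuous(
    breaks = c(20, 30, 40),
    # One label per break, in the same order
    labels = c("20 mpg", "30 mpg", "40 mpg")
  )

labelled</pre>
</div>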
<p>The <code>labels</code> argument coupled with labelling functions from the scales package is also useful for formatting numbers as currency, percent, etc. The plot on the left shows default labelling with <code>label_dollar()</code>, which adds a dollar sign as well as a thousand separator comma. The plot on the right adds further customization by dividing dollar values by 1,000 and adding a suffix “K” (for “thousands”) as well as adding custom breaks. Note that <code>breaks</code> is in the original scale of the data.</p>
<div>
<pre data-type="programlisting" data-code-language="r"># Left
ggplot(diamonds, aes(x = cut, y = price)) +
geom_boxplot(alpha = 0.05) +
scale_y_continuous(labels = scales::label_dollar())
# Right
ggplot(diamonds, aes(x = cut, y = price)) +
geom_boxplot(alpha = 0.05) +
scale_y_continuous(
labels = scales::label_dollar(scale = 1/1000, suffix = "K"),
breaks = seq(1000, 19000, by = 6000)
)</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-23-1.png" alt="Two side-by-side box plots of price versus cut of diamonds. The outliers are transparent. On both plots the y-axis labels are formatted as dollars. The y-axis labels on the left plot start at $0 and go to $15,000, increasing by $5,000. The y-axis labels on the right plot start at $1K and go to $19K, increasing by $6K." width="576"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-23-2.png" alt="Two side-by-side box plots of price versus cut of diamonds. The outliers are transparent. On both plots the y-axis labels are formatted as dollars. The y-axis labels on the left plot start at $0 and go to $15,000, increasing by $5,000. The y-axis labels on the right plot start at $1K and go to $19K, increasing by $6K." width="576"/></p>
</div>
</div>
</div>
</div>
<p>Another handy label function is <code>label_percent()</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(diamonds, aes(x = cut, fill = clarity)) +
geom_bar(position = "fill") +
scale_y_continuous(
name = "Percentage",
labels = scales::label_percent()
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-24-1.png" alt="Segmented bar plots of cut, filled with levels of clarity. The y-axis labels start at 0% and go to 100%, increasing by 25%. The y-axis label name is &quot;Percentage&quot;." width="576"/></p>
</div>
</div>
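<p>It can help to know that <code>label_dollar()</code> and <code>label_percent()</code> return functions, so you can try them on a few numbers before handing them to a scale. A short sketch, where <code>dollars</code> and <code>thousands</code> are names chosen for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(scales)

# Each label_*() call returns a function that turns numbers into strings
dollars &lt;- label_dollar()
dollars(c(1000, 19000))
#&gt; [1] "$1,000"  "$19,000"

# scale is applied to the values before formatting; suffix is appended
thousands &lt;- label_dollar(scale = 1/1000, suffix = "K")
thousands(c(1000, 19000))
#&gt; [1] "$1K"  "$19K"

label_percent()(c(0, 0.25, 1))
#&gt; [1] "0%"   "25%"  "100%"</pre>
</div>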
<p>You can also use <code>breaks</code> and <code>labels</code> to control the appearance of legends. Collectively axes and legends are called <strong>guides</strong>. Axes are used for x and y aesthetics; legends are used for everything else.</p>
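<p>For instance, here is a sketch of a legend whose keys are reordered with <code>breaks</code> and renamed with <code>labels</code>. The label text is our own wording, chosen for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)

legend_plot &lt;- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_color_discrete(
    # breaks picks the keys and their order; labels renames them
    breaks = c("f", "r", "4"),
    labels = c("front-wheel", "rear-wheel", "4-wheel")
  )

legend_plot</pre>
</div>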
<p>Another use of <code>breaks</code> is when you have relatively few data points and want to highlight exactly where the observations occur. For example, take this plot that shows when each US president started and ended their term.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">presidential |&gt;
mutate(id = 33 + row_number()) |&gt;
ggplot(aes(x = start, y = id)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_x_date(name = NULL, breaks = presidential$start, date_labels = "'%y")</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-25-1.png" alt="Line plot of id number of presidents versus the year they started their presidency. Start year is marked with a point and a segment that starts there and ends at the end of the presidency. The x-axis labels are formatted as two digit years starting with an apostrophe, e.g., '53." width="576"/></p>
</div>
</div>
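<p>The plot above supplies explicit break positions. When regularly spaced breaks are good enough, date scales also accept a <code>date_breaks</code> string. A minimal sketch, in which the eight-year interval and the four-digit year format are arbitrary choices for illustration:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(tidyverse)

terms &lt;- presidential |&gt;
  mutate(id = 33 + row_number()) |&gt;
  ggplot(aes(x = start, y = id)) +
  geom_point() +
  geom_segment(aes(xend = end, yend = id)) +
  # One tick every eight years, labelled with the full year
  scale_x_date(name = NULL, date_breaks = "8 years", date_labels = "%Y")

terms</pre>
</div>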
<p>Note that the specification of breaks and labels for date and datetime scales is a little different:</p>
<ul><li><p><code>date_labels</code> takes a format specification, in the same form as <code><a href="https://readr.tidyverse.org/reference/parse_datetime.html">parse_datetime()</a></code>.</p></li>
<li><p><code>date_breaks</code> (not shown here) takes a string like “2 days” or “1 month”.</p></li>
</ul></section>
<section id="legend-layout" data-type="sect2">
<h2>
Legend layout</h2>
<p>You will most often use <code>breaks</code> and <code>labels</code> to tweak the axes. While they both also work for legends, there are a few other techniques you are more likely to use.</p>
<p>To control the overall position of the legend, you need to use a <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme()</a></code> setting. We’ll come back to themes at the end of the chapter, but in brief, they control the non-data parts of the plot. The theme setting <code>legend.position</code> controls where the legend is drawn:</p>
<div>
<pre data-type="programlisting" data-code-language="r">base &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class))
base + theme(legend.position = "left")
base + theme(legend.position = "top")
base + theme(legend.position = "bottom")
base + theme(legend.position = "right") # the default</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-26-1.png" alt="Four scatterplots of highway fuel efficiency versus engine size of cars where points are colored based on class of car. Clockwise, the legend is placed on the left, top, bottom, and right of the plot." width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-26-2.png" alt="Four scatterplots of highway fuel efficiency versus engine size of cars where points are colored based on class of car. Clockwise, the legend is placed on the left, top, bottom, and right of the plot." width="384"/></p>
</div>
</div>
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-26-3.png" alt="Four scatterplots of highway fuel efficiency versus engine size of cars where points are colored based on class of car. Clockwise, the legend is placed on the left, top, bottom, and right of the plot." width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-26-4.png" alt="Four scatterplots of highway fuel efficiency versus engine size of cars where points are colored based on class of car. Clockwise, the legend is placed on the left, top, bottom, and right of the plot." width="384"/></p>
</div>
</div>
</div>
</div>
<p>You can also use <code>legend.position = "none"</code> to suppress the display of the legend altogether.</p>
<p>To control the display of individual legends, use <code><a href="https://ggplot2.tidyverse.org/reference/guides.html">guides()</a></code> along with <code><a href="https://ggplot2.tidyverse.org/reference/guide_legend.html">guide_legend()</a></code> or <code><a href="https://ggplot2.tidyverse.org/reference/guide_colourbar.html">guide_colorbar()</a></code>. The following example shows two important settings: controlling the number of rows the legend uses with <code>nrow</code>, and overriding one of the aesthetics to make the points bigger. This is particularly useful if you have used a low <code>alpha</code> to display many points on a plot.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme(legend.position = "bottom") +
guides(color = guide_legend(nrow = 1, override.aes = list(size = 4)))
#&gt; `geom_smooth()` using method = 'loess' and formula = 'y ~ x'</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-27-1.png" alt="Scatterplot of highway fuel efficiency versus engine size of cars where points are colored based on class of car. Overlaid on the plot is a smooth curve. The legend is in the bottom and classes are listed horizontally in a row. The points in the legend are larger than the points in the plot." width="576"/></p>
</div>
</div>
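<p>For a continuous color mapping, the corresponding helper is <code>guide_colorbar()</code>. The following is a small sketch rather than a recommended setting: it maps <code>cty</code> to color (a choice made only for this illustration), gives the color bar a title, and reverses its direction. <code>p_colorbar</code> is a hypothetical name.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)

p_colorbar &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
  # A continuous variable mapped to color produces a color bar legend
  geom_point(aes(color = cty)) +
  # title and reverse are arguments of guide_colorbar()
  guides(color = guide_colorbar(title = "City mpg", reverse = TRUE))
p_colorbar</pre>
</div>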
</section>
<section id="replacing-a-scale" data-type="sect2">
<h2>
Replacing a scale</h2>
<p>Instead of just tweaking the details a little, you can replace the scale altogether. There are two types of scales you’re most likely to want to switch out: continuous position scales and color scales. Fortunately, the same principles apply to all the other aesthetics, so once you’ve mastered position and color, you’ll be able to quickly pick up other scale replacements.</p>
<p>It’s very useful to plot transformations of your variable. For example, it’s easier to see the precise relationship between <code>carat</code> and <code>price</code> if we log transform them:</p>
<div>
<pre data-type="programlisting" data-code-language="r"># Left
ggplot(diamonds, aes(x = carat, y = price)) +
geom_bin2d()
# Right
ggplot(diamonds, aes(x = log10(carat), y = log10(price))) +
geom_bin2d()</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-28-1.png" alt="Two plots of price versus carat of diamonds. The data are binned, and the color of the rectangle representing each bin is based on the number of points that fall into that bin. In the plot on the right, price and carat values are logged and the axis labels show the logged values." width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-28-2.png" alt="Two plots of price versus carat of diamonds. The data are binned, and the color of the rectangle representing each bin is based on the number of points that fall into that bin. In the plot on the right, price and carat values are logged and the axis labels show the logged values." width="384"/></p>
</div>
</div>
</div>
</div>
<p>However, the disadvantage of this transformation is that the axes are now labelled with the transformed values, making it hard to interpret the plot. Instead of doing the transformation in the aesthetic mapping, we can instead do it with the scale. This is visually identical, except the axes are labelled on the original data scale.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(diamonds, aes(x = carat, y = price)) +
geom_bin2d() +
scale_x_log10() +
scale_y_log10()</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-29-1.png" alt="Plot of price versus carat of diamonds. The data are binned, and the color of the rectangle representing each bin is based on the number of points that fall into that bin. The axis labels are on the original data scale." width="576"/></p>
</div>
</div>
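<p>Because the scale still works in the original units, it can be combined with a label function from the scales package. As a hedged sketch, the version below formats the price axis as dollar amounts with <code>label_dollar()</code>; the choice of labeller and the name <code>p_log</code> are assumptions made for this example.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(scales)

p_log &lt;- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_bin2d() +
  scale_x_log10() +
  # The axis is log-transformed, but labels show untransformed dollar values
  scale_y_log10(labels = label_dollar())
p_log</pre>
</div>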
<p>Another scale that is frequently customized is color. The default categorical scale picks colors that are evenly spaced around the color wheel. Useful alternatives are the ColorBrewer scales which have been hand tuned to work better for people with common types of color blindness. The two plots below look similar, but there is enough difference in the shades of red and green that the dots on the right can be distinguished even by people with red-green color blindness.</p>
<div>
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = drv))
ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = drv)) +
scale_color_brewer(palette = "Set1")</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-30-1.png" alt="Two scatterplots of highway mileage versus engine size where points are colored by drive type. The plot on the left uses the default ggplot2 color palette and the plot on the right uses a different color palette." width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-30-2.png" alt="Two scatterplots of highway mileage versus engine size where points are colored by drive type. The plot on the left uses the default ggplot2 color palette and the plot on the right uses a different color palette." width="384"/></p>
</div>
</div>
</div>
</div>
<p>Don’t forget simpler techniques. If there are just a few colors, you can add a redundant shape mapping. This will also help ensure your plot is interpretable in black and white.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = drv, shape = drv)) +
scale_color_brewer(palette = "Set1")</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-31-1.png" alt="Scatterplot of highway mileage versus engine size where both color and shape of points are based on drive type. The color palette is not the default ggplot2 palette." width="576"/></p>
</div>
</div>
<p>The ColorBrewer scales are documented online at <a href="https://colorbrewer2.org/" class="uri">https://colorbrewer2.org/</a> and made available in R via the <strong>RColorBrewer</strong> package, by Erich Neuwirth. <a href="#fig-brewer" data-type="xref">#fig-brewer</a> shows the complete list of all palettes. The sequential (top) and diverging (bottom) palettes are particularly useful if your categorical values are ordered, or have a “middle”. This often arises if you’ve used <code><a href="https://rdrr.io/r/base/cut.html">cut()</a></code> to make a continuous variable into a categorical variable.</p>
<div class="cell">
<div class="cell-output-display">
<figure id="fig-brewer"><p><img src="communication_files/figure-html/fig-brewer-1.png" alt="All ColorBrewer scales. One group goes from light to dark colors. Another group is a set of non-ordinal colors. And the last group has diverging scales (from dark to light to dark again). Within each set there are a number of palettes." width="576"/></p>
<figcaption>All ColorBrewer scales.</figcaption>
</figure>
</div>
</div>
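<p>To make the ordered case concrete, here is a minimal sketch that bins engine size with <code>cut()</code> and colors the points with a sequential palette. The break points and the <code>"Blues"</code> palette are arbitrary choices for illustration; <code>mpg_binned</code>, <code>displ_bin</code>, and <code>p_seq</code> are hypothetical names.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(dplyr)

# Turn the continuous displ variable into three ordered bins
mpg_binned &lt;- mpg |&gt;
  mutate(displ_bin = cut(displ, breaks = c(1, 3, 5, 7)))

p_seq &lt;- ggplot(mpg_binned, aes(x = cty, y = hwy)) +
  geom_point(aes(color = displ_bin)) +
  # A sequential palette: darker shades correspond to later bins
  scale_color_brewer(palette = "Blues")
p_seq</pre>
</div>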
<p>When you have a predefined mapping between values and colors, use <code><a href="https://ggplot2.tidyverse.org/reference/scale_manual.html">scale_color_manual()</a></code>. For example, if we map presidential party to color, we want to use the standard mapping of red for Republicans and blue for Democrats:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">presidential |&gt;
mutate(id = 33 + row_number()) |&gt;
ggplot(aes(x = start, y = id, color = party)) +
geom_point() +
geom_segment(aes(xend = end, yend = id)) +
scale_color_manual(values = c(Republican = "red", Democratic = "blue"))</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-33-1.png" alt="Line plot of id number of presidents versus the year they started their presidency. Start year is marked with a point and a segment that starts there and ends at the end of the presidency. Democratic presidents are represented in blue and Republicans in red." width="576"/></p>
</div>
</div>
<p>For continuous color, you can use the built-in <code><a href="https://ggplot2.tidyverse.org/reference/scale_gradient.html">scale_color_gradient()</a></code> or <code><a href="https://ggplot2.tidyverse.org/reference/scale_gradient.html">scale_fill_gradient()</a></code>. If you have a diverging scale, you can use <code><a href="https://ggplot2.tidyverse.org/reference/scale_gradient.html">scale_color_gradient2()</a></code>. That allows you to give, for example, positive and negative values different colors. That’s sometimes also useful if you want to distinguish points above or below the mean.</p>
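<p>As a rough sketch of the above-or-below-the-mean idea, the code below centers <code>hwy</code> on its mean and uses a diverging gradient with its midpoint at zero. The colors and the names <code>mpg_centered</code>, <code>hwy_diff</code>, and <code>p_div</code> are assumptions for this illustration.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(dplyr)

# Positive values are above the mean highway mileage, negative values below
mpg_centered &lt;- mpg |&gt;
  mutate(hwy_diff = hwy - mean(hwy))

p_div &lt;- ggplot(mpg_centered, aes(x = displ, y = cty)) +
  geom_point(aes(color = hwy_diff)) +
  # Diverging scale: blue below the mean, red above, neutral gray at zero
  scale_color_gradient2(low = "blue", mid = "grey80", high = "red", midpoint = 0)
p_div</pre>
</div>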
<p>Another option is to use the viridis color scales. The designers, Nathaniel Smith and Stéfan van der Walt, carefully tailored continuous color schemes that are perceptible to people with various forms of color blindness as well as perceptually uniform in both color and black and white. These scales are available as continuous (<code>c</code>), discrete (<code>d</code>), and binned (<code>b</code>) palettes in ggplot2.</p>
<div>
<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(
x = rnorm(10000),
y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
labs(title = "Default, continuous")
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
scale_fill_viridis_c() +
labs(title = "Viridis, continuous")
ggplot(df, aes(x, y)) +
geom_hex() +
coord_fixed() +
scale_fill_viridis_b() +
labs(title = "Viridis, binned")</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-34-1.png" alt="Three hex plots where the color of the hexes show the number of observations that fall into that hex bin. The first plot uses the default, continuous ggplot2 scale. The second plot uses the viridis, continuous scale, and the third plot uses the viridis, binned scale." width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-34-2.png" alt="Three hex plots where the color of the hexes show the number of observations that fall into that hex bin. The first plot uses the default, continuous ggplot2 scale. The second plot uses the viridis, continuous scale, and the third plot uses the viridis, binned scale." width="384"/></p>
</div>
</div>
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-34-3.png" alt="Three hex plots where the color of the hexes show the number of observations that fall into that hex bin. The first plot uses the default, continuous ggplot2 scale. The second plot uses the viridis, continuous scale, and the third plot uses the viridis, binned scale." width="384"/></p>
</div>
</div>
</div>
</div>
<p>Note that all color scales come in two varieties: <code>scale_color_x()</code> and <code>scale_fill_x()</code> for the <code>color</code> and <code>fill</code> aesthetics respectively (the color scales are available in both UK and US spellings).</p>
</section>
<section id="zooming" data-type="sect2">
<h2>
Zooming</h2>
<p>There are three ways to control the plot limits:</p>
<ol type="1"><li>Adjusting what data are plotted.</li>
<li>Setting the limits in each scale.</li>
<li>Setting <code>xlim</code> and <code>ylim</code> in <code><a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html">coord_cartesian()</a></code>.</li>
</ol><p>To zoom in on a region of the plot, it’s generally best to use <code><a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html">coord_cartesian()</a></code>. Compare the following two plots:</p>
<div>
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
mpg |&gt;
filter(displ &gt;= 5, displ &lt;= 7, hwy &gt;= 10, hwy &lt;= 30) |&gt;
ggplot(aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth()</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-35-1.png" width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-35-2.png" width="384"/></p>
</div>
</div>
</div>
</div>
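<p>The two plots differ because <code><a href="https://ggplot2.tidyverse.org/reference/coord_cartesian.html">coord_cartesian()</a></code> only changes the visible region, while <code>filter()</code> removes rows before the smoother is fit. A quick way to see this, sketched here with the hypothetical name <code>zoomed_data</code>, is to compare how many rows each approach passes to <code>geom_smooth()</code>:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(dplyr)

# The same subset used in the second plot above
zoomed_data &lt;- mpg |&gt;
  filter(displ &gt;= 5, displ &lt;= 7, hwy &gt;= 10, hwy &lt;= 30)

# coord_cartesian() keeps every row available to geom_smooth();
# filter() leaves only the rows inside the zoomed region
nrow(mpg)
nrow(zoomed_data)</pre>
</div>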
<p>You can also set the <code>limits</code> on individual scales. Reducing the limits is basically equivalent to subsetting the data. It is generally more useful if you want to <em>expand</em> the limits, for example, to match scales across different plots. For example, if we extract two classes of cars and plot them separately, it’s difficult to compare the plots because all three scales (the x-axis, the y-axis, and the color aesthetic) have different ranges.</p>
<div>
<pre data-type="programlisting" data-code-language="r">suv &lt;- mpg |&gt; filter(class == "suv")
compact &lt;- mpg |&gt; filter(class == "compact")
ggplot(suv, aes(x = displ, y = hwy, color = drv)) +
geom_point()
ggplot(compact, aes(x = displ, y = hwy, color = drv)) +
geom_point()</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-36-1.png" width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-36-2.png" width="384"/></p>
</div>
</div>
</div>
</div>
<p>One way to overcome this problem is to share scales across multiple plots, training the scales with the <code>limits</code> of the full data.</p>
<div>
<pre data-type="programlisting" data-code-language="r">x_scale &lt;- scale_x_continuous(limits = range(mpg$displ))
y_scale &lt;- scale_y_continuous(limits = range(mpg$hwy))
col_scale &lt;- scale_color_discrete(limits = unique(mpg$drv))
ggplot(suv, aes(x = displ, y = hwy, color = drv)) +
geom_point() +
x_scale +
y_scale +
col_scale
ggplot(compact, aes(x = displ, y = hwy, color = drv)) +
geom_point() +
x_scale +
y_scale +
col_scale</pre>
<div class="cell quarto-layout-panel">
<div class="quarto-layout-row quarto-layout-valign-top">
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-37-1.png" width="384"/></p>
</div>
<div class="cell-output-display quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="communication_files/figure-html/unnamed-chunk-37-2.png" width="384"/></p>
</div>
</div>
</div>
</div>
<p>In this particular case, you could have simply used faceting, but this technique is useful more generally if, for instance, you want to spread plots over multiple pages of a report.</p>
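<p>For comparison, a faceted version might look like the following sketch. By default, <code>facet_wrap()</code> shares the position scales across panels and uses a single color scale for the whole plot, so no explicit limits are needed. <code>p_facet</code> is a hypothetical name.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(dplyr)

p_facet &lt;- mpg |&gt;
  # Keep the two classes that were plotted separately above
  filter(class %in% c("suv", "compact")) |&gt;
  ggplot(aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # One panel per class, with shared x, y, and color scales
  facet_wrap(~class)
p_facet</pre>
</div>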
</section>
<section id="communication-exercises-2" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
<p>Why doesn’t the following code override the default scale?</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">df &lt;- tibble(
x = rnorm(10000),
y = rnorm(10000)
)
ggplot(df, aes(x, y)) +
geom_hex() +
scale_color_gradient(low = "white", high = "red") +
coord_fixed()</pre>
</div>
</li>
<li><p>What is the first argument to every scale? How does it compare to <code><a href="https://ggplot2.tidyverse.org/reference/labs.html">labs()</a></code>?</p></li>
<li>
<p>Change the display of the presidential terms by:</p>
<ol type="a"><li>Combining the two variants shown above.</li>
<li>Improving the display of the y axis.</li>
<li>Labelling each term with the name of the president.</li>
<li>Adding informative plot labels.</li>
<li>Placing breaks every 4 years (this is trickier than it seems!).</li>
</ol></li>
<li>
<p>Use <code>override.aes</code> to make the legend on the following plot easier to see.</p>
<div class="cell" data-fig.format="png">
<pre data-type="programlisting" data-code-language="r">ggplot(diamonds, aes(x = carat, y = price)) +
geom_point(aes(color = cut), alpha = 1/20)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-39-1.png" style="width:50.0%" alt="Scatterplot of price versus carat of diamonds. The points are colored by cut of the diamonds and they're very transparent."/></p>
</div>
</div>
</li>
</ol></section>
</section>
<section id="sec-themes" data-type="sect1">
<h1>
Themes</h1>
<p>Finally, you can customize the non-data elements of your plot with a theme:</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
theme_bw()</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-40-1.png" width="576"/></p>
</div>
</div>
<p>ggplot2 includes eight themes by default, as shown in <a href="#fig-themes" data-type="xref">#fig-themes</a>. Many more are included in add-on packages like <strong>ggthemes</strong> (<a href="https://jrnold.github.io/ggthemes" class="uri">https://jrnold.github.io/ggthemes</a>), by Jeffrey Arnold. You can also create your own themes, if you are trying to match a particular corporate or journal style.</p>
<div class="cell">
<div class="cell-output-display">
<figure id="fig-themes"><p><img src="images/visualization-themes.png" alt="Eight barplots created with ggplot2, each with one of the eight built-in themes: theme_bw() - White background with grid lines, theme_light() - Light axes and grid lines, theme_classic() - Classic theme, axes but no grid lines, theme_linedraw() - Only black lines, theme_dark() - Dark background for contrast, theme_minimal() - Minimal theme, no background, theme_gray() - Gray background (default theme), theme_void() - Empty theme, only geoms are visible." width="1600"/></p>
<figcaption>The eight themes built into ggplot2.</figcaption>
</figure>
</div>
</div>
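<p>One simple way to create your own theme is to wrap a built-in theme and a few <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme()</a></code> settings in a function. The sketch below is only an illustration: <code>theme_report()</code> is a hypothetical name that is not part of ggplot2, and the specific settings are arbitrary.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)

# theme_report() is a hypothetical helper, not a ggplot2 function
theme_report &lt;- function(base_size = 12) {
  # Start from a built-in theme, then override a few components
  theme_minimal(base_size = base_size) +
    theme(
      legend.position = "bottom",
      plot.title = element_text(face = "bold")
    )
}

p_themed &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  labs(title = "Highway mileage by engine size") +
  theme_report()
p_themed</pre>
</div>
<p>Keeping the settings in one function means every plot in a report can pick up the same look by adding a single layer.</p>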
<p>Many people wonder why the default theme has a gray background. This was a deliberate choice because it puts the data forward while still making the grid lines visible. The white grid lines are visible (which is important because they significantly aid position judgments), but they have little visual impact and we can easily tune them out. The gray background gives the plot a similar typographic color to the text, ensuring that the graphics fit in with the flow of a document without jumping out with a bright white background. Finally, the gray background creates a continuous field of color which ensures that the plot is perceived as a single visual entity.</p>
<p>It’s also possible to control individual components of each theme, like the size and color of the font used for the y axis. We’ve already seen that <code>legend.position</code> controls where the legend is drawn. There are many other aspects of the legend that can be customized with <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme()</a></code>. For example, in the plot below we change the direction of the legend as well as put a black border around it. A few other helpful <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme()</a></code> components are used to change the placement and format of the title and caption text.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
geom_point() +
labs(
title = "Highway mileage decreases as engine size increases",
caption = "Source: https://fueleconomy.gov."
) +
theme(
legend.position = c(0.6, 0.7),
legend.direction = "horizontal",
legend.box.background = element_rect(color = "black"),
plot.title = element_text(face = "bold"),
plot.title.position = "plot",
plot.caption.position = "plot",
plot.caption = element_text(hjust = 0)
)</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-42-1.png" width="576"/></p>
</div>
</div>
<p>For an overview of all <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">theme()</a></code> components, see help with <code><a href="https://ggplot2.tidyverse.org/reference/theme.html">?theme</a></code>. The <a href="https://ggplot2-book.org/">ggplot2 book</a> is also a great place to go for the full details on theming.</p>
<section id="communication-exercises-3" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>Pick a theme offered by the ggthemes package and apply it to the last plot you made.</li>
<li>Make the axis labels of your plot blue and bolded.</li>
</ol></section>
</section>
<section id="layout" data-type="sect1">
<h1>
Layout</h1>
<p>So far we talked about how to create and modify a single plot. What if you have multiple plots you want to lay out in a certain way? The patchwork package allows you to combine separate plots into the same graphic. We loaded this package earlier in the chapter.</p>
<p>To place two plots next to each other, you can simply add them to each other. Note that you first need to create the plots and save them as objects (in the following example they’re called <code>p1</code> and <code>p2</code>). Then, you place them next to each other with <code>+</code>.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">p1 &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
labs(title = "Plot 1")
p2 &lt;- ggplot(mpg, aes(x = drv, y = hwy)) +
geom_boxplot() +
labs(title = "Plot 2")
p1 + p2</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-43-1.png" alt="Two plots (a scatterplot of highway mileage versus engine size and a side-by-side boxplots of highway mileage versus drive train) placed next to each other." width="576"/></p>
</div>
</div>
<p>It’s important to note that in the above code chunk we did not use a new function from the patchwork package. Instead, the package added new functionality to the <code>+</code> operator.</p>
<p>You can also create arbitrary plot layouts with patchwork. In the following, <code>|</code> places <code>p1</code> and <code>p3</code> next to each other and <code>/</code> moves <code>p2</code> to the next line.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">p3 &lt;- ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
labs(title = "Plot 3")
(p1 | p3) / p2</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-44-1.png" alt="Three plots laid out such that the first and third plots are next to each other and the second plot is stretched beneath them. The first plot is a scatterplot of highway mileage versus engine size, the third plot is a scatterplot of highway mileage versus city mileage, and the second plot is side-by-side boxplots of highway mileage versus drive train." width="576"/></p>
</div>
</div>
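<p>Relative sizes can be adjusted as well. As a small sketch, <code>plot_layout()</code> accepts a <code>widths</code> argument, so the scatterplot below is given twice the width of the boxplot. The plots are redefined here only to keep the example self-contained, the 2:1 ratio is arbitrary, and <code>p_wide</code> is a hypothetical name.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(patchwork)

p1 &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Plot 1")
p2 &lt;- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  labs(title = "Plot 2")

# widths are relative: the first column is twice as wide as the second
p_wide &lt;- p1 + p2 + plot_layout(widths = c(2, 1))
p_wide</pre>
</div>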
<p>Additionally, patchwork allows you to collect legends from multiple plots into one common legend, customize the placement of the legend as well as dimensions of the plots, and add a common title, subtitle, caption, etc. to your plots. In the following, we have five plots. We have turned off the legends on the box plots and the scatterplot and collected the legends for the density plots at the top of the plot with <code>&amp; theme(legend.position = "top")</code>. Note the use of the <code>&amp;</code> operator here instead of the usual <code>+</code>. This is because we’re modifying the theme for the patchwork plot as opposed to the individual ggplots. The legend is placed on top, inside the <code><a href="https://patchwork.data-imaginist.com/reference/guide_area.html">guide_area()</a></code>. Finally, we have also customized the heights of the various components of our patchwork: the guide has a height of 1, the box plots 3, the density plots 2, and the faceted scatterplot 4. Patchwork divides up the area you have allotted for your plot using this scale and places the components accordingly.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">p1 &lt;- ggplot(mpg, aes(x = drv, y = cty, color = drv)) +
geom_boxplot(show.legend = FALSE) +
labs(title = "Plot 1")
p2 &lt;- ggplot(mpg, aes(x = drv, y = hwy, color = drv)) +
geom_boxplot(show.legend = FALSE) +
labs(title = "Plot 2")
p3 &lt;- ggplot(mpg, aes(x = cty, color = drv, fill = drv)) +
geom_density(alpha = 0.5) +
labs(title = "Plot 3")
p4 &lt;- ggplot(mpg, aes(x = hwy, color = drv, fill = drv)) +
geom_density(alpha = 0.5) +
labs(title = "Plot 4")
p5 &lt;- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
geom_point(show.legend = FALSE) +
facet_wrap(~drv) +
labs(title = "Plot 5")
(guide_area() / (p1 + p2) / (p3 + p4) / p5) +
plot_annotation(
title = "City and highway mileage for cars with different drive trains",
caption = "Source: https://fueleconomy.gov."
) +
plot_layout(
guides = "collect",
heights = c(1, 3, 2, 4)
) &amp;
theme(legend.position = "top")</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-45-1.png" alt="Five plots laid out such that first two plots are next to each other. Plots three and four are underneath them. And the fifth plot stretches under them. The patchworked plot is titled &quot;City and highway mileage for cars with different drive trains&quot; and captioned &quot;Source: https://fueleconomy.gov&quot;. The first two plots are side-by-side box plots. Plots 3 and 4 are density plots. And the fifth plot is a faceted scatterplot. Each of these plots shows geoms colored by drive train, but the patchworked plot has only one legend that applies to all of them, above the plots and beneath the title." width="576"/></p>
</div>
</div>
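<p>The difference between <code>+</code> and <code>&amp;</code> is easiest to see on a small patchwork. In the sketch below, adding a theme with <code>+</code> changes only the last plot, while <code>&amp;</code> applies it to every plot in the patchwork. The names <code>p_scatter</code>, <code>p_box</code>, <code>last_only</code>, and <code>all_plots</code> are hypothetical.</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">library(ggplot2)
library(patchwork)

p_scatter &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
p_box &lt;- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot()

# `+` adds the theme only to the last plot in the patchwork (p_box)
last_only &lt;- p_scatter + p_box + theme_minimal()

# `&amp;` applies the theme to every plot in the patchwork
all_plots &lt;- p_scatter + p_box &amp; theme_minimal()

last_only
all_plots</pre>
</div>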
<p>If you’d like to learn more about combining and laying out multiple plots with patchwork, we recommend looking through the guides on the package website: <a href="https://patchwork.data-imaginist.com" class="uri">https://patchwork.data-imaginist.com</a>.</p>
<section id="communication-exercises-4" data-type="sect2">
<h2>
Exercises</h2>
<ol type="1"><li>
<p>What happens if you omit the parentheses in the following plot layout? Can you explain why this happens?</p>
<div class="cell">
<pre data-type="programlisting" data-code-language="r">p1 &lt;- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point() +
labs(title = "Plot 1")
p2 &lt;- ggplot(mpg, aes(x = drv, y = hwy)) +
geom_boxplot() +
labs(title = "Plot 2")
p3 &lt;- ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point() +
labs(title = "Plot 3")
(p1 | p2) / p3</pre>
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-46-1.png" width="576"/></p>
</div>
</div>
</li>
<li>
<p>Using the three plots from the previous exercise, recreate the following patchwork.</p>
<div class="cell">
<div class="cell-output-display">
<p><img src="communication_files/figure-html/unnamed-chunk-47-1.png" alt="Three plots: Plot 1 is a scatterplot of highway mileage versus engine size. Plot 2 is side-by-side box plots of highway mileage versus drive train. Plot 3 is a scatterplot of highway mileage versus city mileage. Plot 1 is on the first row. Plots 2 and 3 are on the next row, each spanning half the width of Plot 1. Plot 1 is labelled &quot;Fig. A&quot;, Plot 2 is labelled &quot;Fig. B&quot;, and Plot 3 is labelled &quot;Fig. C&quot;." width="576"/></p>
</div>
</div>
</li>
</ol></section>
</section>
<section id="communication-summary" data-type="sect1">
<h1>
Summary</h1>
<p>In this chapter, you’ve learned about adding plot labels such as title, subtitle, and caption, as well as modifying default axis labels; using annotation to add informational text to your plot or to highlight specific data points; customizing the axis scales; and changing the theme of your plot. You’ve also learned about combining multiple plots in a single graph using both simple and complex plot layouts.</p>
<p>While you’ve so far learned about how to make many different types of plots and how to customize them using a variety of techniques, we’ve barely scratched the surface of what you can create with ggplot2. If you want to get a comprehensive understanding of ggplot2, we recommend reading the book, <a href="https://ggplot2-book.org"><em>ggplot2: Elegant Graphics for Data Analysis</em></a>. Other useful resources are the <a href="https://r-graphics.org"><em>R Graphics Cookbook</em></a> by Winston Chang and <a href="https://clauswilke.com/dataviz/"><em>Fundamentals of Data Visualization</em></a> by Claus Wilke.</p>
</section>
</section>