-Figure 6.2: A line plot showing how the rank of a song changes over time.
+
+A line plot showing how the rank of a song changes over time.
@@ -315,8 +315,8 @@ How does pivoting work?
-
-Figure 6.3: Columns that are already variables need to be repeated, once for each column that is pivoted.
+
+Columns that are already variables need to be repeated, once for each column that is pivoted.
@@ -324,8 +324,8 @@ How does pivoting work?
-
-Figure 6.4: The column names of pivoted columns become a new column.
+
+The column names of pivoted columns become a new column.
@@ -333,8 +333,8 @@ How does pivoting work?
-
-Figure 6.5: The number of values is preserved (not repeated), but unwound row-by-row.
+
+The number of values is preserved (not repeated), but unwound row-by-row.
@@ -389,8 +389,8 @@ Many variables in column names
-
-Figure 6.6: Pivoting with many variables in the column names means that each column name now fills in values in multiple output columns.
+
+Pivoting with many variables in the column names means that each column name now fills in values in multiple output columns.
@@ -439,8 +439,8 @@ Data and variable names in the column headers
-
-Figure 6.7: Pivoting with names_to = c(".value", "id") splits the column names into two components: the first part determines the output column name (x or y), and the second part determines the value of the id column.
+
+Pivoting with names_to = c(".value", "id") splits the column names into two components: the first part determines the output column name (x or y), and the second part determines the value of the id column.
-Figure 2.1: R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.
+
+R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.
@@ -499,8 +499,8 @@ Statistical transformations
-
-Figure 2.2: When creating a bar chart, we first start with the raw data, then aggregate it to count the number of observations in each bar, and finally map those computed variables to plot aesthetics.
+
+When creating a bar chart, we first start with the raw data, then aggregate it to count the number of observations in each bar, and finally map those computed variables to plot aesthetics.
diff --git a/oreilly/intro.html b/oreilly/intro.html
index 5735e90..3d940dd 100644
--- a/oreilly/intro.html
+++ b/oreilly/intro.html
@@ -7,8 +7,8 @@ What you will learn
-
-Figure 1.1: In our model of the data science process you start with data import and tidying. Next you understand your data with an iterative cycle of transforming, visualizing, and modeling. You finish the process by communicating your results to other humans.
+
+In our model of the data science process you start with data import and tidying. Next you understand your data with an iterative cycle of transforming, visualizing, and modeling. You finish the process by communicating your results to other humans.
@@ -79,8 +79,8 @@ RStudio
-
-Figure 1.2: The RStudio IDE has two key regions: type R code in the console pane on the left, and look for plots in the output pane on the right.
+
+The RStudio IDE has two key regions: type R code in the console pane on the left, and look for plots in the output pane on the right.
diff --git a/oreilly/joins.html b/oreilly/joins.html
index 5e41366..cd0280a 100644
--- a/oreilly/joins.html
+++ b/oreilly/joins.html
@@ -116,8 +116,8 @@ Primary and foreign keys
-
-Figure 19.1: Connections between all five data frames in the nycflights13 package. Variables making up a primary key are coloured grey, and are connected to their corresponding foreign keys with arrows.
+
+Connections between all five data frames in the nycflights13 package. Variables making up a primary key are coloured grey, and are connected to their corresponding foreign keys with arrows.
@@ -500,8 +500,8 @@ y <- tribble(
-
-Figure 19.2: Graphical representation of two simple tables. The coloured key columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.
+
+Graphical representation of two simple tables. The coloured key columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.
@@ -509,8 +509,8 @@ y <- tribble(
-
-Figure 19.3: To understand how joins work, it’s useful to think of every possible match. Here we show that with a grid of connecting lines.
+
+To understand how joins work, it’s useful to think of every possible match. Here we show that with a grid of connecting lines.
@@ -518,8 +518,8 @@ y <- tribble(
-
-Figure 19.4: An inner join matches each row in x to the row in y that has the same value of key. Each match becomes a row in the output.
+
+An inner join matches each row in x to the row in y that has the same value of key. Each match becomes a row in the output.
@@ -529,8 +529,8 @@ y <- tribble(
-
-Figure 19.5: A visual representation of the left join where every row in x appears in the output.
+
+A visual representation of the left join where every row in x appears in the output.
@@ -540,8 +540,8 @@ y <- tribble(
-
-Figure 19.6: A visual representation of the right join where every row of y appears in the output.
+
+A visual representation of the right join where every row of y appears in the output.
@@ -551,8 +551,8 @@ y <- tribble(
-
-Figure 19.7: A visual representation of the full join where every row in x and y appears in the output.
+
+A visual representation of the full join where every row in x and y appears in the output.
@@ -561,8 +561,8 @@ y <- tribble(
-
-Figure 19.8: Venn diagrams showing the difference between inner, left, right, and full joins.
+
+Venn diagrams showing the difference between inner, left, right, and full joins.
@@ -574,8 +574,8 @@ Row matching
-
-Figure 19.9: The three ways a row in x can match. x1 matches one row in y, x2 matches two rows in y, x3 matches zero rows in y. Note that while there are three rows in x and three rows in the output, there isn’t a direct correspondence between the rows.
+
+The three ways a row in x can match. x1 matches one row in y, x2 matches two rows in y, x3 matches zero rows in y. Note that while there are three rows in x and three rows in the output, there isn’t a direct correspondence between the rows.
@@ -683,16 +683,16 @@ Filtering joins
-
-Figure 19.10: In a semi-join it only matters that there is a match; otherwise values in y don’t affect the output.
+
+In a semi-join it only matters that there is a match; otherwise values in y don’t affect the output.
-
-Figure 19.11: An anti-join is the inverse of a semi-join, dropping rows from x that have a match in y.
+
+An anti-join is the inverse of a semi-join, dropping rows from x that have a match in y.
@@ -716,8 +716,8 @@ Non-equi joins
-
-Figure 19.12: A left join showing both x and y keys in the output.
+
+A left join showing both x and y keys in the output.
@@ -725,8 +725,8 @@ Non-equi joins
-
-Figure 19.13: A non-equi join where the x key must be greater than or equal to the y key. Many rows generate multiple matches.
+
+A non-equi join where the x key must be greater than or equal to the y key. Many rows generate multiple matches.
@@ -748,8 +748,8 @@ Cross joins
-
-Figure 19.14: A cross join matches each row in x with every row in y.
+
+A cross join matches each row in x with every row in y.
@@ -777,8 +777,8 @@ Inequality joins
-
-Figure 19.15: An inequality join where x is joined to y on rows where the key of x is less than the key of y. This makes a triangular shape in the top-left corner.
+
+An inequality join where x is joined to y on rows where the key of x is less than the key of y. This makes a triangular shape in the top-left corner.
@@ -807,8 +807,8 @@ Rolling joins
-
-Figure 19.16: A following join is similar to a greater-than-or-equal inequality join but only matches the first value.
+
+A following join is similar to a greater-than-or-equal inequality join but only matches the first value.
-Figure 12.1: The complete set of boolean operations. x is the left-hand circle, y is the right-hand circle, and the shaded regions show which parts each operator selects.
+
+The complete set of boolean operations. x is the left-hand circle, y is the right-hand circle, and the shaded regions show which parts each operator selects.
-Figure 13.1: A line plot with scheduled departure hour on the x-axis, and proportion of cancelled flights on the y-axis. Cancellations seem to accumulate over the course of the day until 8pm, very late flights are much less likely to be cancelled.
+
+A line plot with scheduled departure hour on the x-axis, and proportion of cancelled flights on the y-axis. Cancellations seem to accumulate over the course of the day until 8pm, very late flights are much less likely to be cancelled.
@@ -641,8 +641,8 @@ Center
#> ℹ Please use `linewidth` instead.
-
-Figure 13.2: A scatterplot showing the differences of summarising hourly departure delay with median instead of mean.
+
+A scatterplot showing the differences of summarising hourly departure delay with median instead of mean.
@@ -716,18 +716,18 @@ flights |>
-
-(a) Histogram shows the full range of delays.
+
+(a) Histogram shows the full range of delays.
-
-(b) Histogram is zoomed in to show delays less than 2 hours.
+
+(b) Histogram is zoomed in to show delays less than 2 hours.
-Figure 13.3: The distribution of dep_delay appears highly skewed to the right in both histograms.
+Figure 13.3: The distribution of dep_delay appears highly skewed to the right in both histograms.
It’s also a good idea to check that distributions for subgroups resemble the whole. #fig-flights-dist-daily overlays a frequency polygon for each day. The distributions seem to follow a common pattern, suggesting it’s fine to use the same summary for each day.
-Figure 13.4: 365 frequency polygons of dep_delay, one for each day. The frequency polygons appear to have the same shape, suggesting that it’s reasonable to compare days by looking at just a few summary statistics.
+
+365 frequency polygons of dep_delay, one for each day. The frequency polygons appear to have the same shape, suggesting that it’s reasonable to compare days by looking at just a few summary statistics.
-Figure 22.1: The RStudio view lets you interactively explore a complex list. The viewer opens showing only the top level of the list.
+
+The RStudio view lets you interactively explore a complex list. The viewer opens showing only the top level of the list.
-
-Figure 22.2: Clicking on the rightward facing triangle expands that component of the list so that you can also see its children.
+
+Clicking on the rightward facing triangle expands that component of the list so that you can also see its children.
-
-Figure 22.3: You can repeat this operation as many times as needed to get to the data you’re interested in. Note the bottom-left corner: if you click an element of the list, RStudio will give you the subsetting code needed to access it, in this case x4[[2]][[2]][[2]].
+
+You can repeat this operation as many times as needed to get to the data you’re interested in. Note the bottom-left corner: if you click an element of the list, RStudio will give you the subsetting code needed to access it, in this case x4[[2]][[2]][[2]].
-Figure 15.1: A time series showing the proportion of baby names that contain a lower case “x”.
+A time series showing the proportion of baby names that contain a lower case “x”.
-Figure 5.1: To insert |>, make sure the “Use native pipe operator” option is checked.
+To insert |>, make sure the “Use native pipe operator” option is checked.
-Figure 9.1: Opening the script editor adds a new pane at the top-left of the IDE.
+
+Copy these options in your RStudio options to always start your RStudio session with a clean slate.
@@ -115,8 +115,8 @@ What is the source of truth?
-
-Figure 9.2: Copy these options in your RStudio options to always start your RStudio session with a clean slate.
+
+(a) First click New Directory.
@@ -159,22 +159,22 @@ Where does your analysis live?
RStudio projects
Keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is such a wise and common practice that RStudio has built-in support for this via projects. Let’s make a project for you to use while you’re working through the rest of this book. Click File > New Project, then follow the steps shown in #fig-new-project.
-
+
-
-(a) First click New Directory.
+
+(a) First click New Directory.
-
-(b) Then click New Project.
+
+(b) Then click New Project.
-
-(c) Finally, fill in the directory (project) name, choose a good subdirectory for its home and click Create Project.
+
+Opening the script editor adds a new pane at the top-left of the IDE.
Figure 9.3: Create a new project by following these three steps.
diff --git a/oreilly/workflow-style.html b/oreilly/workflow-style.html
index dba3248..01e238e 100644
--- a/oreilly/workflow-style.html
+++ b/oreilly/workflow-style.html
@@ -10,8 +10,8 @@
Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. Even as a very new programmer it’s a good idea to work on your code style. Using a consistent style makes it easier for others (including future-you!) to read your work, and is particularly important if you need to get help from someone else. This chapter will introduce you to the most important points of the tidyverse style guide, which is used throughout this book.
Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature. Additionally, there are some great tools to quickly restyle existing code, like the styler package by Lorenz Walthert. Once you’ve installed it with install.packages("styler"), an easy way to use it is via RStudio’s command palette. The command palette lets you use any built-in RStudio command, as well as many addins provided by packages. Open the palette by pressing Cmd/Ctrl + Shift + P, then type “styler” to see all the shortcuts provided by styler. #fig-styler shows the results.
-
-Figure 7.1: RStudio’s command palette makes it easy to access every RStudio command using only the keyboard.
+
+RStudio’s command palette makes it easy to access every RStudio command using only the keyboard.
@@ -180,8 +180,8 @@ Sectioning comments
-
-Figure 7.2: After adding sectioning comments to your script, you can easily navigate to them using the code navigation tool in the bottom-left of the script editor.
+
+After adding sectioning comments to your script, you can easily navigate to them using the code navigation tool in the bottom-left of the script editor.
diff --git a/preface-2e.qmd b/preface-2e.qmd
index ec2dece..468439d 100644
--- a/preface-2e.qmd
+++ b/preface-2e.qmd
@@ -22,4 +22,3 @@ Welcome to the second edition of "R for Data Science".
## Acknowledgements {.unnumbered}
*TO DO: Add acknowledgements.*
-