From 89a854b7d04a37d6ce5b2f11471aa7d1f423005d Mon Sep 17 00:00:00 2001
From: Hadley Wickham
Date: Fri, 18 Nov 2022 10:42:43 -0600
Subject: [PATCH] Fix figure manipulation

---
 oreilly/base-R.html            | 12 +++----
 oreilly/communicate-plots.html | 12 +++----
 oreilly/data-tidy.html         | 28 +++++++--------
 oreilly/data-visualize.html    |  8 ++---
 oreilly/intro.html             |  8 ++---
 oreilly/joins.html             | 64 +++++++++++++++++-----------------
 oreilly/logicals.html          |  8 ++---
 oreilly/numbers.html           | 22 ++++++------
 oreilly/rectangling.html       | 12 +++----
 oreilly/regexps.html           |  2 +-
 oreilly/spreadsheets.html      | 20 +++++------
 oreilly/workflow-pipes.html    |  2 +-
 oreilly/workflow-scripts.html  | 22 ++++++------
 oreilly/workflow-style.html    |  8 ++---
 preface-2e.qmd                 |  1 -
 15 files changed, 114 insertions(+), 115 deletions(-)

diff --git a/oreilly/base-R.html b/oreilly/base-R.html
index 422713e..27cdbcf 100644
--- a/oreilly/base-R.html
+++ b/oreilly/base-R.html
@@ -333,24 +333,24 @@ str(l$a)
-

A photo of a glass pepper shaker. Instead of the pepper shaker containing pepper, it contains many packets of pepper.

-
Figure 26.1: A pepper shaker that Hadley once found in his hotel room.
+

A photo of a glass pepper shaker. Instead of the pepper shaker containing pepper, it contains many packets of pepper.

+
A pepper shaker that Hadley once found in his hotel room.
-

A photo of the glass pepper shaker containing just one packet of pepper.

-
Figure 26.2: pepper[1]
+

A photo of the glass pepper shaker containing just one packet of pepper.

+
pepper[1]
-

A photo of single packet of pepper.

-
Figure 26.3: pepper[[1]]
+

A photo of single packet of pepper.

+
pepper[[1]]
diff --git a/oreilly/communicate-plots.html b/oreilly/communicate-plots.html
index add1f3f..eb18d33 100644
--- a/oreilly/communicate-plots.html
+++ b/oreilly/communicate-plots.html
@@ -204,8 +204,8 @@ ggplot(mpg, aes(displ, hwy)) +
-

-
Figure 28.1: All nine combinations of hjust and vjust.
+

+
All nine combinations of hjust and vjust.
@@ -398,8 +398,8 @@ ggplot(mpg, aes(displ, hwy)) +
-

-
Figure 28.2: All ColourBrewer scales.
+

+
All ColourBrewer scales.
@@ -591,8 +591,8 @@ Themes
-

Eight barplots created with ggplot2, each with one of the eight built-in themes: theme_bw() - White background with grid lines, theme_light() - Light axes and grid lines, theme_classic() - Classic theme, axes but no grid lines, theme_linedraw() - Only black lines, theme_dark() - Dark background for contrast, theme_minimal() - Minimal theme, no background, theme_gray() - Gray background (default theme), theme_void() - Empty theme, only geoms are visible.

-
Figure 28.3: The eight themes built-in to ggplot2.
+

Eight barplots created with ggplot2, each with one of the eight built-in themes: theme_bw() - White background with grid lines, theme_light() - Light axes and grid lines, theme_classic() - Classic theme, axes but no grid lines, theme_linedraw() - Only black lines, theme_dark() - Dark background for contrast, theme_minimal() - Minimal theme, no background, theme_gray() - Gray background (default theme), theme_void() - Empty theme, only geoms are visible.

+
The eight themes built-in to ggplot2.
diff --git a/oreilly/data-tidy.html b/oreilly/data-tidy.html
index 4a87acc..4978583 100644
--- a/oreilly/data-tidy.html
+++ b/oreilly/data-tidy.html
@@ -97,8 +97,8 @@ table4b # population
-

Three panels, each representing a tidy data frame. The first panel shows that each variable is a column. The second panel shows that each observation is a row. The third panel shows that each value is a cell.

-
Figure 6.1: The following three rules make a dataset tidy: variables are columns, observations are rows, and values are cells.
+

Three panels, each representing a tidy data frame. The first panel shows that each variable is a column. The second panel shows that each observation is a row. The third panel shows that each value is a cell.

+
The following three rules make a dataset tidy: variables are columns, observations are rows, and values are cells.
@@ -274,8 +274,8 @@ billboard_tidy scale_y_reverse()
-

A line plot with week on the x-axis and rank on the y-axis, where each line represents a song. Most songs appear to start at a high rank, rapidly accelerate to a low rank, and then decay again. There are suprisingly few tracks in the region when week is >20 and rank is >50.

-
Figure 6.2: A line plot showing how the rank of a song changes over time.
+

A line plot with week on the x-axis and rank on the y-axis, where each line represents a song. Most songs appear to start at a high rank, rapidly accelerate to a low rank, and then decay again. There are suprisingly few tracks in the region when week is >20 and rank is >50.

+
A line plot showing how the rank of a song changes over time.
@@ -315,8 +315,8 @@ How does pivoting work?
-

A diagram showing how `pivot_longer()` transforms a simple dataset, using color to highlight how the values in the `var` column ("A", "B", "C") are each repeated twice in the output because there are two columns being pivotted ("col1" and "col2").

-
Figure 6.3: Columns that are already variables need to be repeated, once for each column that is pivotted.
+

A diagram showing how `pivot_longer()` transforms a simple dataset, using color to highlight how the values in the `var` column ("A", "B", "C") are each repeated twice in the output because there are two columns being pivotted ("col1" and "col2").

+
Columns that are already variables need to be repeated, once for each column that is pivotted.
@@ -324,8 +324,8 @@ How does pivoting work?
-

A diagram showing how `pivot_longer()` transforms a simple data set, using color to highlight how column names ("col1" and "col2") become the values in a new `var` column. They are repeated three times because there were three rows in the input.

-
Figure 6.4: The column names of pivoted columns become a new column.
+

A diagram showing how `pivot_longer()` transforms a simple data set, using color to highlight how column names ("col1" and "col2") become the values in a new `var` column. They are repeated three times because there were three rows in the input.

+
The column names of pivoted columns become a new column.
@@ -333,8 +333,8 @@ How does pivoting work?
-

A diagram showing how `pivot_longer()` transforms data, using color to highlight how the cell values (the numbers 1 to 6) become the values in a new `value` column. They are unwound row-by-row, so the original rows (1,2), then (3,4), then (5,6), become a column running from 1 to 6.

-
Figure 6.5: The number of values is preserved (not repeated), but unwound row-by-row.
+

A diagram showing how `pivot_longer()` transforms data, using color to highlight how the cell values (the numbers 1 to 6) become the values in a new `value` column. They are unwound row-by-row, so the original rows (1,2), then (3,4), then (5,6), become a column running from 1 to 6.

+
The number of values is preserved (not repeated), but unwound row-by-row.
@@ -389,8 +389,8 @@ Many variables in column names
-

A diagram that uses color to illustrate how supplying `names_sep` and multiple `names_to` creates multiple variables in the output. The input has variable names "x_1" and "y_2" which are split up by "_" to create name and number columns in the output. This is is similar case with a single `names_to`, but what would have been a single output variable is now separated into multiple variables.

-
Figure 6.6: Pivotting with many variables in the column names means that each column name now fills in values in multiple output columns.
+

A diagram that uses color to illustrate how supplying `names_sep` and multiple `names_to` creates multiple variables in the output. The input has variable names "x_1" and "y_2" which are split up by "_" to create name and number columns in the output. This is is similar case with a single `names_to`, but what would have been a single output variable is now separated into multiple variables.

+
Pivotting with many variables in the column names means that each column name now fills in values in multiple output columns.
@@ -439,8 +439,8 @@ Data and variable names in the column headers
-

A diagram that uses color to illustrate how the special ".value" sentinel works. The input has names "x_1", "x_2", "y_1", and "y_2", and we want to use the first component ("x", "y") as a variable name and the second ("1", "2") as the value for a new "id" column.

-
Figure 6.7: Pivoting with names_to = c(".value", "id") splits the column names into two components: the first part determines the output column name (x or y), and the second part determines the value of the id column.
+

A diagram that uses color to illustrate how the special ".value" sentinel works. The input has names "x_1", "x_2", "y_1", and "y_2", and we want to use the first component ("x", "y") as a variable name and the second ("1", "2") as the value for a new "id" column.

+
Pivoting with names_to = c(".value", "id") splits the column names into two components: the first part determines the output column name (x or y), and the second part determines the value of the id column.
diff --git a/oreilly/data-visualize.html b/oreilly/data-visualize.html
index f5ae204..185e5a5 100644
--- a/oreilly/data-visualize.html
+++ b/oreilly/data-visualize.html
@@ -177,8 +177,8 @@ ggplot(data = mpg) +
-

Mapping between shapes and the numbers that represent them: 0 - square, 1 - circle, 2 - triangle point up, 3 - plus, 4 - cross, 5 - diamond, 6 - triangle point down, 7 - square cross, 8 - star, 9 - diamond plus, 10 - circle plus, 11 - triangles up and down, 12 - square plus, 13 - circle cross, 14 - square and triangle down, 15 - filled square, 16 - filled circle, 17 - filled triangle point-up, 18 - filled diamond, 19 - solid circle, 20 - bullet (smaller circle), 21 - filled circle blue, 22 - filled square blue, 23 - filled diamond blue, 24 - filled triangle point-up blue, 25 - filled triangle point down blue.

-
Figure 2.1: R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.
+

Mapping between shapes and the numbers that represent them: 0 - square, 1 - circle, 2 - triangle point up, 3 - plus, 4 - cross, 5 - diamond, 6 - triangle point down, 7 - square cross, 8 - star, 9 - diamond plus, 10 - circle plus, 11 - triangles up and down, 12 - square plus, 13 - circle cross, 14 - square and triangle down, 15 - filled square, 16 - filled circle, 17 - filled triangle point-up, 18 - filled diamond, 19 - solid circle, 20 - bullet (smaller circle), 21 - filled circle blue, 22 - filled square blue, 23 - filled diamond blue, 24 - filled triangle point-up blue, 25 - filled triangle point down blue.

+
R has 25 built in shapes that are identified by numbers. There are some seeming duplicates: for example, 0, 15, and 22 are all squares. The difference comes from the interaction of the color and fill aesthetics. The hollow shapes (0–14) have a border determined by color; the solid shapes (15–20) are filled with color; the filled shapes (21–24) have a border of color and are filled with fill.
@@ -499,8 +499,8 @@ Statistical transformations
-

A figure demonstrating three steps of creating a bar chart. Step 1. geom_bar() begins with the diamonds data set. Step 2. geom_bar() transforms the data with the count stat, which returns a data set of cut values and counts. Step 3. geom_bar() uses the transformed data to build the plot. cut is mapped to the x-axis, count is mapped to the y-axis.

-
Figure 2.2: When create a bar chart we first start with the raw data, then aggregate it to count the number of observations in each bar, and finally map those computed variables to plot aesthetics.
+

A figure demonstrating three steps of creating a bar chart. Step 1. geom_bar() begins with the diamonds data set. Step 2. geom_bar() transforms the data with the count stat, which returns a data set of cut values and counts. Step 3. geom_bar() uses the transformed data to build the plot. cut is mapped to the x-axis, count is mapped to the y-axis.

+
When create a bar chart we first start with the raw data, then aggregate it to count the number of observations in each bar, and finally map those computed variables to plot aesthetics.
diff --git a/oreilly/intro.html b/oreilly/intro.html
index 5735e90..3d940dd 100644
--- a/oreilly/intro.html
+++ b/oreilly/intro.html
@@ -7,8 +7,8 @@ What you will learn
-

A diagram displaying the data science cycle: Import -> Tidy -> Understand (which has the phases Transform -> Visualize -> Model in a cycle) -> Communicate. Surrounding all of these is Communicate.

-
Figure 1.1: In our model of the data science process you start with data import and tidying. Next you understand your data with an iterative cycle of transforming, visualizing, and modeling. You finish the process by communicating your results to other humans.
+

A diagram displaying the data science cycle: Import -> Tidy -> Understand (which has the phases Transform -> Visualize -> Model in a cycle) -> Communicate. Surrounding all of these is Communicate.

+
In our model of the data science process you start with data import and tidying. Next you understand your data with an iterative cycle of transforming, visualizing, and modeling. You finish the process by communicating your results to other humans.
@@ -79,8 +79,8 @@ RStudio
-

The RStudio IDE with the panes Console and Output highlighted.

-
Figure 1.2: The RStudio IDE has two key regions: type R code in the console pane on the left, and look for plots in the output pane on the right.
+

The RStudio IDE with the panes Console and Output highlighted.

+
The RStudio IDE has two key regions: type R code in the console pane on the left, and look for plots in the output pane on the right.
diff --git a/oreilly/joins.html b/oreilly/joins.html
index 5e41366..cd0280a 100644
--- a/oreilly/joins.html
+++ b/oreilly/joins.html
@@ -116,8 +116,8 @@ Primary and foreign keys
-

The relationships between airports, planes, flights, weather, and airlines datasets from the nycflights13 package. airports$faa connected to the flights$origin and flights$dest. planes$tailnum is connected to the flights$tailnum. weather$time_hour and weather$origin are jointly connected to flights$time_hour and flights$origin. airlines$carrier is connected to flights$carrier. There are no direct connections between airports, planes, airlines, and weather data frames.

-
Figure 19.1: Connections between all five data frames in the nycflights13 package. Variables making up a primary key are coloured grey, and are connected to their corresponding foreign keys with arrows.
+

The relationships between airports, planes, flights, weather, and airlines datasets from the nycflights13 package. airports$faa connected to the flights$origin and flights$dest. planes$tailnum is connected to the flights$tailnum. weather$time_hour and weather$origin are jointly connected to flights$time_hour and flights$origin. airlines$carrier is connected to flights$carrier. There are no direct connections between airports, planes, airlines, and weather data frames.

+
Connections between all five data frames in the nycflights13 package. Variables making up a primary key are coloured grey, and are connected to their corresponding foreign keys with arrows.
@@ -500,8 +500,8 @@ y <- tribble(
-

x and y are two data frames with 2 columns and 3 rows, with contents as described in the text. The values of the keys are coloured: 1 is green, 2 is purple, 3 is orange, and 4 is yellow.

-
Figure 19.2: Graphical representation of two simple tables. The coloured key columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.
+

x and y are two data frames with 2 columns and 3 rows, with contents as described in the text. The values of the keys are coloured: 1 is green, 2 is purple, 3 is orange, and 4 is yellow.

+
Graphical representation of two simple tables. The coloured key columns map background colour to key value. The grey columns represent the “value” columns that are carried along for the ride.
@@ -509,8 +509,8 @@ y <- tribble(
-

x and y are placed at right-angles, with horizonal lines extending from x and vertical lines extending from y. There are 3 rows in x and 3 rows in y, which leads to nine intersections representing nine potential matches.

-
Figure 19.3: To understand how joins work, it’s useful to think of every possible match. Here we show that with a grid of connecting lines.
+

x and y are placed at right-angles, with horizonal lines extending from x and vertical lines extending from y. There are 3 rows in x and 3 rows in y, which leads to nine intersections representing nine potential matches.

+
To understand how joins work, it’s useful to think of every possible match. Here we show that with a grid of connecting lines.
@@ -518,8 +518,8 @@ y <- tribble(
-

x and y are placed at right-angles with lines forming a grid of potential matches. Keys 1 and 2 appear in both x and y, so we get a match, indicated by a dot. Each dot corresponds to a row in the output, so the resulting joined data frame has two rows.

-
Figure 19.4: An inner join matches each row in x to the row in y that has the same value of key. Each match becomes a row in the output.
+

x and y are placed at right-angles with lines forming a grid of potential matches. Keys 1 and 2 appear in both x and y, so we get a match, indicated by a dot. Each dot corresponds to a row in the output, so the resulting joined data frame has two rows.

+
An inner join matches each row in x to the row in y that has the same value of key. Each match becomes a row in the output.
@@ -529,8 +529,8 @@ y <- tribble(
-

Compared to the previous diagram showing an inner join, the y table gets a new virtual row containin NA that will match any row in x that didn't otherwise match. This means that the output now has three rows. For key = 3, which matches this virtual row, val_y takes value NA.

-
Figure 19.5: A visual representation of the left join where every row in x appears in the output.
+

Compared to the previous diagram showing an inner join, the y table gets a new virtual row containin NA that will match any row in x that didn't otherwise match. This means that the output now has three rows. For key = 3, which matches this virtual row, val_y takes value NA.

+
A visual representation of the left join where every row in x appears in the output.
@@ -540,8 +540,8 @@ y <- tribble(
-

Compared to the previous diagram showing an left join, the x table now gains a virtual row so that every row in y gets a match in x. val_x contains NA for the row in y that didn't match x.

-
Figure 19.6: A visual representation of the right join where every row of y appears in the output.
+

Compared to the previous diagram showing an left join, the x table now gains a virtual row so that every row in y gets a match in x. val_x contains NA for the row in y that didn't match x.

+
A visual representation of the right join where every row of y appears in the output.
@@ -551,8 +551,8 @@ y <- tribble(
-

Now both x and y have a virtual row that always matches. The result has 4 rows: keys 1, 2, 3, and 4 with all values from val_x and val_y, however key 2, val_y and key 4, val_x are NAs since those keys don't have a match in the other data frames.

-
Figure 19.7: A visual representation of the full join where every row in x and y appears in the output.
+

Now both x and y have a virtual row that always matches. The result has 4 rows: keys 1, 2, 3, and 4 with all values from val_x and val_y, however key 2, val_y and key 4, val_x are NAs since those keys don't have a match in the other data frames.

+
A visual representation of the full join where every row in x and y appears in the output.
@@ -561,8 +561,8 @@ y <- tribble(
-

Venn diagrams for inner, full, left, and right joins. Each join represented with two intersecting circles representing data frames x and y, with x on the right and y on the left. Shading indicates the result of the join.

-
Figure 19.8: Venn diagrams showing the difference between inner, left, right, and full joins.
+

Venn diagrams for inner, full, left, and right joins. Each join represented with two intersecting circles representing data frames x and y, with x on the right and y on the left. Shading indicates the result of the join.

+
Venn diagrams showing the difference between inner, left, right, and full joins.
@@ -574,8 +574,8 @@ Row matching
-

A join diagram where x has key values 1, 2, and 3, and y has key values 1, 2, 2. The output has three rows because key 1 matches one row, key 2 matches two rows, and key 3 matches zero rows.

-
Figure 19.9: The three ways a row in x can match. x1 matches one row in y, x2 matches two rows in y, x3 matches zero rows in y. Note that while there are three rows in x and three rows in the output, there isn’t a direct correspondence between the rows.
+

A join diagram where x has key values 1, 2, and 3, and y has key values 1, 2, 2. The output has three rows because key 1 matches one row, key 2 matches two rows, and key 3 matches zero rows.

+
The three ways a row in x can match. x1 matches one row in y, x2 matches two rows in y, x3 matches zero rows in y. Note that while there are three rows in x and three rows in the output, there isn’t a direct correspondence between the rows.
@@ -683,16 +683,16 @@ Filtering joins
-

A join diagram with old friends x and y. In a semi join, only the presence of a match matters so the output contains the same columns as x.

-
Figure 19.10: In a semi-join it only matters that there is a match; otherwise values in y don’t affect the output.
+

A join diagram with old friends x and y. In a semi join, only the presence of a match matters so the output contains the same columns as x.

+
In a semi-join it only matters that there is a match; otherwise values in y don’t affect the output.
-

An anti-join is the inverse of a semi-join so matches are drawn with red lines indicating that they will be dropped from the output.

-
Figure 19.11: An anti-join is the inverse of a semi-join, dropping rows from x that have a match in y.
+

An anti-join is the inverse of a semi-join so matches are drawn with red lines indicating that they will be dropped from the output.

+
An anti-join is the inverse of a semi-join, dropping rows from x that have a match in y.
@@ -716,8 +716,8 @@ Non-equi joins
-

A join diagram showing an inner join betwen x and y. The result now includes four columns: key.x, val_x, key.y, and val_y. The values of key.x and key.y are identical, which is why we usually only show one.

-
Figure 19.12: An left join showing both x and y keys in the output.
+

A join diagram showing an inner join betwen x and y. The result now includes four columns: key.x, val_x, key.y, and val_y. The values of key.x and key.y are identical, which is why we usually only show one.

+
An left join showing both x and y keys in the output.
@@ -725,8 +725,8 @@ Non-equi joins
-

A join diagram illustrating join_by(key >= key). The first row of x matches one row of y and the second and thirds rows each match two rows. This means the output has five rows containing each of the following (key.x, key.y) pairs: (1, 1), (2, 1), (2, 2), (3, 1), (3, 2).

-
Figure 19.13: A non-equi join where the x key must greater than or equal to than the y key. Many rows generate multiple matches.
+

A join diagram illustrating join_by(key >= key). The first row of x matches one row of y and the second and thirds rows each match two rows. This means the output has five rows containing each of the following (key.x, key.y) pairs: (1, 1), (2, 1), (2, 2), (3, 1), (3, 2).

+
A non-equi join where the x key must greater than or equal to than the y key. Many rows generate multiple matches.
@@ -748,8 +748,8 @@ Cross joins
-

A join diagram showing a dot for every combination of x and y.

-
Figure 19.14: A cross join matches each row in x with every row in y.
+

A join diagram showing a dot for every combination of x and y.

+
A cross join matches each row in x with every row in y.
@@ -777,8 +777,8 @@ Inequality joins
-

-
Figure 19.15: An inequality join where x is joined to y on rows where the key of x is less than the key of y. This makes a triangular shape in the top-left corner.
+

+
An inequality join where x is joined to y on rows where the key of x is less than the key of y. This makes a triangular shape in the top-left corner.
@@ -807,8 +807,8 @@ Rolling joins
-

A rolling join is a subset of an inequality join so some matches are grayed out indicating that they're not used because they're not the "closest".

-
Figure 19.16: A following join is similar to a greater-than-or-equal inequality join but only matches the first value.
+

A rolling join is a subset of an inequality join so some matches are grayed out indicating that they're not used because they're not the "closest".

+
A following join is similar to a greater-than-or-equal inequality join but only matches the first value.
diff --git a/oreilly/logicals.html b/oreilly/logicals.html
index e66ae5c..80379e6 100644
--- a/oreilly/logicals.html
+++ b/oreilly/logicals.html
@@ -252,8 +252,8 @@ Boolean algebra
-

Six Venn diagrams, each explaining a given logical operator. The circles (sets) in each of the Venn diagrams represent x and y. 1. y & !x is y but none of x; x & y is the intersection of x and y; x & !y is x but none of y; x is all of x none of y; xor(x, y) is everything except the intersection of x and y; y is all of y and none of x; and x | y is everything.

-
Figure 12.1: The complete set of boolean operations. x is the left-hand circle, y is the right-hand circle, and the shaded region show which parts each operator selects.
+

Six Venn diagrams, each explaining a given logical operator. The circles (sets) in each of the Venn diagrams represent x and y. 1. y & !x is y but none of x; x & y is the intersection of x and y; x & !y is x but none of y; x is all of x none of y; xor(x, y) is everything except the intersection of x and y; y is all of y and none of x; and x | y is everything.

+
The complete set of boolean operations. x is the left-hand circle, y is the right-hand circle, and the shaded region show which parts each operator selects.
@@ -427,8 +427,8 @@ Numeric summaries of logical vectors geom_histogram(binwidth = 0.05)
-

The distribution is unimodal and mildly right skewed. The distribution peaks around 30% delayed flights.

-
Figure 12.2: A histogram showing the proportion of delayed flights each day.
+

The distribution is unimodal and mildly right skewed. The distribution peaks around 30% delayed flights.

+
A histogram showing the proportion of delayed flights each day.
diff --git a/oreilly/numbers.html b/oreilly/numbers.html
index 6279f8f..df4f523 100644
--- a/oreilly/numbers.html
+++ b/oreilly/numbers.html
@@ -315,8 +315,8 @@ Modular arithmetic geom_point(aes(size = n))
-

A line plot showing how proportion of cancelled flights changes over the course of the day. The proportion starts low at around 0.5% at 6am, then steadily increases over the course of the day until peaking at 4% at 7pm. The proportion of cancelled flights then drops rapidly getting down to around 1% by midnight.

-
Figure 13.1: A line plot with scheduled departure hour on the x-axis, and proportion of cancelled flights on the y-axis. Cancellations seem to accumulate over the course of the day until 8pm, very late flights are much less likely to be cancelled.
+

A line plot showing how proportion of cancelled flights changes over the course of the day. The proportion starts low at around 0.5% at 6am, then steadily increases over the course of the day until peaking at 4% at 7pm. The proportion of cancelled flights then drops rapidly getting down to around 1% by midnight.

+
A line plot with scheduled departure hour on the x-axis, and proportion of cancelled flights on the y-axis. Cancellations seem to accumulate over the course of the day until 8pm, very late flights are much less likely to be cancelled.
@@ -641,8 +641,8 @@ Center #> ℹ Please use `linewidth` instead.
-

All points fall below a 45° line, meaning that the median delay is always less than the mean delay. Most points are clustered in a dense region of mean [0, 20] and median [0, 5]. As the mean delay increases, the spread of the median also increases. There are two outlying points with mean ~60, median ~50, and mean ~85, median ~55.

-
Figure 13.2: A scatterplot showing the differences of summarising hourly depature delay with median instead of mean.
+

All points fall below a 45° line, meaning that the median delay is always less than the mean delay. Most points are clustered in a dense region of mean [0, 20] and median [0, 5]. As the mean delay increases, the spread of the median also increases. There are two outlying points with mean ~60, median ~50, and mean ~85, median ~55.

+
A scatterplot showing the differences of summarising hourly depature delay with median instead of mean.
@@ -716,18 +716,18 @@ flights |>
-

Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that.

-
(a) Histogram shows the full range of delays.
+

Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that.

+
(a) Histogram shows the full range of delays.
-

Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that.

-
(b) Histogram is zoomed in to show delays less than 2 hours.
+

Two histograms of `dep_delay`. On the left, it's very hard to see any pattern except that there's a very large spike around zero, the bars rapidly decay in height, and for most of the plot, you can't see any bars because they are too short to see. On the right, where we've discarded delays of greater than two hours, we can see that the spike occurs slightly below zero (i.e. most flights leave a couple of minutes early), but there's still a very steep decay after that.

+
(b) Histogram is zoomed in to show delays less than 2 hours.
-
Figure 13.3: The distribution of dep_delay appears highly skewed to the right in both histograms.
+

Figure 13.3: The distribution of dep_delay appears highly skewed to the right in both histograms.

It’s also a good idea to check that distributions for subgroups resemble the whole. #fig-flights-dist-daily overlays a frequency polygon for each day. The distributions seem to follow a common pattern, suggesting it’s fine to use the same summary for each day.

@@ -738,8 +738,8 @@ flights |> geom_freqpoly(binwidth = 5, alpha = 1/5)
-

The distribution of `dep_delay` is highly right skewed with a strong peak slightly less than 0. The 365 frequency polygons are mostly overlapping forming a thick black band.

-
Figure 13.4: 365 frequency polygons of dep_delay, one for each day. The frequency polygons appear to have the same shape, suggesting that it’s reasonable to compare days by looking at just a few summary statistics.
+

The distribution of `dep_delay` is highly right skewed with a strong peak slightly less than 0. The 365 frequency polygons are mostly overlapping forming a thick black band.

+
365 frequency polygons of dep_delay, one for each day. The frequency polygons appear to have the same shape, suggesting that it’s reasonable to compare days by looking at just a few summary statistics.
diff --git a/oreilly/rectangling.html b/oreilly/rectangling.html index 8f4c4f7..287d74e 100644 --- a/oreilly/rectangling.html +++ b/oreilly/rectangling.html @@ -133,24 +133,24 @@ str(x5)
-

A screenshot of RStudio showing the list-viewer. It shows the two children of x4: the first child is a double vector and the second child is a list. A rightward facing triangle indicates that the second child itself has children but you can't see them.

-
Figure 22.1: The RStudio view lets you interactively explore a complex list. The viewer opens showing only the top level of the list.
+

A screenshot of RStudio showing the list-viewer. It shows the two children of x4: the first child is a double vector and the second child is a list. A rightward facing triangle indicates that the second child itself has children but you can't see them.

+
The RStudio view lets you interactively explore a complex list. The viewer opens showing only the top level of the list.
-

Another screenshot of the list-viewer having expanded the second child of x2. It also has two children, a double vector and another list.

-
Figure 22.2: Clicking on the rightward facing triangle expands that component of the list so that you can also see its children.
+

Another screenshot of the list-viewer having expanded the second child of x2. It also has two children, a double vector and another list.

+
Clicking on the rightward facing triangle expands that component of the list so that you can also see its children.
-

Another screenshot, having expanded the grandchild of x4 to see its two children, again a double vector and a list.

-
Figure 22.3: You can repeat this operation as many times as needed to get to the data you’re interested in. Note the bottom-left corner: if you click an element of the list, RStudio will give you the subsetting code needed to access it, in this case x4[[2]][[2]][[2]].
+

Another screenshot, having expanded the grandchild of x4 to see its two children, again a double vector and a list.

+
You can repeat this operation as many times as needed to get to the data you’re interested in. Note the bottom-left corner: if you click an element of the list, RStudio will give you the subsetting code needed to access it, in this case x4[[2]][[2]][[2]].
diff --git a/oreilly/regexps.html b/oreilly/regexps.html index 80dfa3a..105694a 100644 --- a/oreilly/regexps.html +++ b/oreilly/regexps.html @@ -198,7 +198,7 @@ Detect matches

A timeseries showing the proportion of baby names that contain the letter x. The proportion declines gradually from 8 per 1000 in 1880 to 4 per 1000 in 1980, then increases rapidly to 16 per 1000 in 2019.

-
Figure 15.1: A time series showing the proportion of baby names that contain a lower case “x”.
+
A time series showing the proportion of baby names that contain a lower case “x”.
diff --git a/oreilly/spreadsheets.html b/oreilly/spreadsheets.html index efc2261..80d6765 100644 --- a/oreilly/spreadsheets.html +++ b/oreilly/spreadsheets.html @@ -50,8 +50,8 @@ Reading spreadsheets
-

A look at the students spreadsheet in Excel. The spreadsheet contains information on 6 students, their ID, full name, favourite food, meal plan, and age.

-
Figure 20.1: Spreadsheet called students.xlsx in Excel.
+

A look at the students spreadsheet in Excel. The spreadsheet contains information on 6 students, their ID, full name, favourite food, meal plan, and age.

+
Spreadsheet called students.xlsx in Excel.
@@ -188,8 +188,8 @@ Reading individual sheets
-

A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island.

-
Figure 20.2: Spreadsheet called penguins.xlsx in Excel.
+

A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island.

+
Spreadsheet called penguins.xlsx in Excel.
@@ -270,8 +270,8 @@ Reading part of a sheet
-

A look at the deaths spreadsheet in Excel. The spreadsheet has four rows on top that contain non-data information; the text 'For the sake of consistency in the data layout, which is really a beautiful thing, I will keep making notes up here.' is spread across cells in these top four rows. Then, there is a data frame that includes information on deaths of 10 famous people, including their names, professions, ages, whether they have kids or not, date of birth and death. At the bottom, there are four more rows of non-data information; the text 'This has been really fun, but we're signing off now!' is spread across cells in these bottom four rows.

-
Figure 20.3: Spreadsheet called deaths.xlsx in Excel.
+

A look at the deaths spreadsheet in Excel. The spreadsheet has four rows on top that contain non-data information; the text 'For the sake of consistency in the data layout, which is really a beautiful thing, I will keep making notes up here.' is spread across cells in these top four rows. Then, there is a data frame that includes information on deaths of 10 famous people, including their names, professions, ages, whether they have kids or not, date of birth and death. At the bottom, there are four more rows of non-data information; the text 'This has been really fun, but we're signing off now!' is spread across cells in these bottom four rows.

+
Spreadsheet called deaths.xlsx in Excel.
@@ -398,8 +398,8 @@ write_xlsx(bake_sale, path = "data/bake-sale.xlsx")
-

Bake sale data frame created earlier in Excel.

-
Figure 20.4: Spreadsheet called bake_sale.xlsx in Excel.
+

Bake sale data frame created earlier in Excel.

+
Spreadsheet called bake_sale.xlsx in Excel.
@@ -477,8 +477,8 @@ writeDataTable(
-

A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island.

-
Figure 20.5: Spreadsheet called penguins.xlsx in Excel.
+

A look at the penguins spreadsheet in Excel. The spreadsheet has three sheets: Torgersen Island, Biscoe Island, and Dream Island.

+
Spreadsheet called penguins.xlsx in Excel.
diff --git a/oreilly/workflow-pipes.html b/oreilly/workflow-pipes.html index 40793c0..d2ca1b3 100644 --- a/oreilly/workflow-pipes.html +++ b/oreilly/workflow-pipes.html @@ -11,7 +11,7 @@

Screenshot showing the "Use native pipe operator" option which can be found on the "Editing" panel of the "Code" options.

-
Figure 5.1: To insert |>, make sure the “Use native pipe operator” option is checked.
+
To insert |>, make sure the “Use native pipe operator” option is checked.
diff --git a/oreilly/workflow-scripts.html b/oreilly/workflow-scripts.html index ec55024..05f51a1 100644 --- a/oreilly/workflow-scripts.html +++ b/oreilly/workflow-scripts.html @@ -17,8 +17,8 @@ Scripts
-

RStudio IDE with Editor, Console, and Output highlighted.

-
Figure 9.1: Opening the script editor adds a new pane at the top-left of the IDE.
+

RStudio IDE with Editor, Console, and Output highlighted.

+
Copy these options in your RStudio options to always start your RStudio session with a clean slate.
@@ -115,8 +115,8 @@ What is the source of truth?
-

RStudio preferences window where the option Restore .RData into workspace at startup is not checked. Also, the option Save workspace to .RData on exit is set to Never.

-
Figure 9.2: Copy these options in your RStudio options to always start your RStudio session with a clean slate.
+

RStudio preferences window where the option Restore .RData into workspace at startup is not checked. Also, the option Save workspace to .RData on exit is set to Never.

+
(a) First click New Directory.
@@ -159,22 +159,22 @@ Where does your analysis live? RStudio projects

Keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is such a wise and common practice that RStudio has built-in support for this via projects. Let’s make a project for you to use while you’re working through the rest of this book. Click File > New Project, then follow the steps shown in #fig-new-project.

-
+
-

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown, the directory name is given as r4ds, and the project is being created as a subdirectory of the Desktop.

-
(a) First click New Directory.
+

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown, the directory name is given as r4ds, and the project is being created as a subdirectory of the Desktop.

+
(a) First click New Directory.
-

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown, the directory name is given as r4ds, and the project is being created as a subdirectory of the Desktop.

-
(b) Then click New Project.
+

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown, the directory name is given as r4ds, and the project is being created as a subdirectory of the Desktop.

+
(b) Then click New Project.
-

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown, the directory name is given as r4ds, and the project is being created as a subdirectory of the Desktop.

-
(c) Finally, fill in the directory (project) name, choose a good subdirectory for its home and click Create Project.
+

Three screenshots of the New Project menu. In the first screenshot, the Create Project window is shown and New Directory is selected. In the second screenshot, the Project Type window is shown and Empty Project is selected. In the third screenshot, the Create New Project window is shown, the directory name is given as r4ds, and the project is being created as a subdirectory of the Desktop.

+
Opening the script editor adds a new pane at the top-left of the IDE.
Figure 9.3: Create a new project by following these three steps.
diff --git a/oreilly/workflow-style.html b/oreilly/workflow-style.html index dba3248..01e238e 100644 --- a/oreilly/workflow-style.html +++ b/oreilly/workflow-style.html @@ -10,8 +10,8 @@

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. Even as a very new programmer it’s a good idea to work on your code style. Using a consistent style makes it easier for others (including future-you!) to read your work, and is particularly important if you need to get help from someone else. This chapter will introduce you to the most important points of the tidyverse style guide, which is used throughout this book.

Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature. Additionally, there are some great tools to quickly restyle existing code, like the styler package by Lorenz Walthert. Once you’ve installed it with install.packages("styler"), an easy way to use it is via RStudio’s command palette. The command palette lets you use any built-in RStudio command, as well as many addins provided by packages. Open the palette by pressing Cmd/Ctrl + Shift + P, then type “styler” to see all the shortcuts provided by styler. #fig-styler shows the results.

-

A screenshot showing the command palette after typing "styler", showing the four styling tools provided by the package.

-
Figure 7.1: RStudio’s command palette makes it easy to access every RStudio command using only the keyboard.
+

A screenshot showing the command palette after typing "styler", showing the four styling tools provided by the package.

+
RStudio’s command palette makes it easy to access every RStudio command using only the keyboard.
@@ -180,8 +180,8 @@ Sectioning comments
-

-
Figure 7.2: After adding sectioning comments to your script, you can easily navigate to them using the code navigation tool in the bottom-left of the script editor.
+

+
After adding sectioning comments to your script, you can easily navigate to them using the code navigation tool in the bottom-left of the script editor.
diff --git a/preface-2e.qmd b/preface-2e.qmd index ec2dece..468439d 100644 --- a/preface-2e.qmd +++ b/preface-2e.qmd @@ -22,4 +22,3 @@ Welcome to the second edition of "R for Data Science". ## Acknowledgements {.unnumbered} *TO DO: Add acknowledgements.* -