From 5d912aaed825ec9ac1b176f80f0aba2ce8852a77 Mon Sep 17 00:00:00 2001
From: Stephen Balogun <77954949+stephenbalogun@users.noreply.github.com>
Date: Mon, 23 Jan 2023 23:23:26 +0100
Subject: [PATCH] Minor corrections (#1244)

---
 joins.qmd        | 8 ++++----
 spreadsheets.qmd | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/joins.qmd b/joins.qmd
index 7e02faa..4f1a861 100644
--- a/joins.qmd
+++ b/joins.qmd
@@ -35,7 +35,7 @@ library(nycflights13)
 
 ## Keys
 
-To understand joins, you need to first understand how two tables can be connected through a pair of keys, with on each table.
+To understand joins, you need to first understand how two tables can be connected through a pair of keys, within each table.
 In this section, you'll learn about the two types of key and see examples of both in the datasets of the nycflights13 package.
 You'll also learn how to check that your keys are valid, and what to do if your table lacks a key.
 
@@ -138,7 +138,7 @@ weather |>
 ### Surrogate keys
 
 So far we haven't talked about the primary key for `flights`.
-It's not super important here, because there are no data frames that use it as a foreign key, but it's still useful to consider because it's easier to work with observations if have some way to describe them to others.
+It's not super important here, because there are no data frames that use it as a foreign key, but it's still useful to consider because it's easier to work with observations if we have some way to describe them to others.
 
 After a little thinking and experimentation, we determined that there are three variables that together uniquely identify each flight:
 
@@ -194,7 +194,7 @@ Surrogate keys can be particular useful when communicating to other humans: it's
 ## Basic joins {#sec-mutating-joins}
 
 Now that you understand how data frames are connected via keys, we can start using joins to better understand the `flights` dataset.
-dplyr provides six join functions: `left_join()`, `inner_join()`, `right_join()`, `semi_join()`, and `anti_join()`.
+dplyr provides six join functions: `left_join()`, `inner_join()`, `right_join()`, `semi_join()`, `anti_join()`, and `full_join()`.
 They all have the same interface: they take a pair of data frames (`x` and `y`) and return a data frame.
 The order of the rows and columns in the output is primarily determined by `x`.
 
@@ -321,7 +321,7 @@ airports |>
 **Anti-joins** are the opposite: they return all rows in `x` that don't have a match in `y`.
 They're useful for finding missing values that are **implicit** in the data, the topic of @sec-missing-implicit.
 Implicitly missing values don't show up as `NA`s but instead only exist as an absence.
-For example, we can find rows that as missing from `airports` by looking for flights that don't have a matching destination airport:
+For example, we can find rows that are missing from `airports` by looking for flights that don't have a matching destination airport:
 
 ```{r}
 flights2 |>
diff --git a/spreadsheets.qmd b/spreadsheets.qmd
index 3906104..7b76bff 100644
--- a/spreadsheets.qmd
+++ b/spreadsheets.qmd
@@ -404,7 +404,7 @@ read_excel("data/bake-sale.xlsx")
 
 ### Formatted output
 
-The readxl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
+The writexl package is a light-weight solution for writing a simple Excel spreadsheet, but if you're interested in additional features like writing to sheets within a spreadsheet and styling, you will want to use the **openxlsx** package.
 Note that this package is not part of the tidyverse so the functions and workflows may feel unfamiliar.
 For example, function names are camelCase, multiple functions can't be composed in pipelines, and arguments are in a different order than they tend to be in the tidyverse.
However, this is ok.