From 3e167168e7a4849c4947c9bd2b2610f54545ab94 Mon Sep 17 00:00:00 2001
From: Hadley Wickham
Date: Wed, 12 Oct 2022 10:36:02 -0500
Subject: [PATCH] Joins proofing

---
 joins.qmd | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/joins.qmd b/joins.qmd
index 937ba38..4fe895f 100644
--- a/joins.qmd
+++ b/joins.qmd
@@ -13,24 +13,22 @@ It's rare that a data analysis involves only a single data frame.
 Typically you have many data frames, and you must **join** them together to answer the questions that you're interested in.
 This chapter will introduce you to two important types of joins:
 
-- Mutating joins, add new variables to one data frame from matching observations in another.
-- Filtering joins, filter observations from one data frame based on whether or not they match an observation in another.
+- Mutating joins, which add new variables to one data frame from matching observations in another.
+- Filtering joins, which filter observations from one data frame based on whether or not they match an observation in another.
 
 We'll begin by discussing keys, the variables used to connect a pair of data frames in a join.
-You'll then see how to use joins to tackle a variety of challenges from the nycflights13 dataset.
+We cement the theory with an examination of the keys in the nycflights13 datasets, then use that knowledge to start joining data frames together.
 Next we'll discuss how joins work, focusing on their action on the rows.
 We'll finish up with a discussion of non-equi-joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.
 
-If you're familiar with SQL, you should find the ideas in this chapter familiar, as their realization in dplyr is very similar.
-
 ### Prerequisites
 
 ::: callout-important
 This chapter relies on features only found in dplyr 1.1.0, which is still in development.
-If you want to live life on the edge you can get the dev version with `devtools::install_github("tidyverse/dplyr")`.
+If you want to live life on the edge, you can get the dev version with `devtools::install_github("tidyverse/dplyr")`.
 :::
 
-We'll explore the five related datasets from nycflights13 using the join functions from dplyr.
+In this chapter, we'll explore the five related datasets from nycflights13 using the join functions from dplyr.
 
 ```{r}
 #| label: setup
@@ -42,8 +40,7 @@ library(nycflights13)
 
 ## Keys
 
-To understand joins, you need to first understand how two tables might be connected.
-The connection between a pair of tables is defined by a pair of keys, which each consist of one or more variables.
+To understand joins, you need to first understand how two tables can be connected through a pair of keys, with on each table.
 In this section, you'll learn about the two types of key and their realization in the datasets of the nycflights13 package.
 You'll also learn how to check that your keys are valid, and what to do if your table lacks a key.
 