diff --git a/joins.qmd b/joins.qmd
index acce9e4..bd8ce5a 100644
--- a/joins.qmd
+++ b/joins.qmd
@@ -164,7 +164,7 @@ airports |>
   filter(n > 1)
 ```
 
-Identifying an airport by it's altitude and latitude is clearly a bad idea, and in general it's not possible to know from the data alone whether or not a combination of variables makes a good a primary key.
+Identifying an airport by its altitude and latitude is clearly a bad idea, and in general it's not possible to know from the data alone whether or not a combination of variables makes a good a primary key.
 But for flights, the combination of `time_hour`, `carrier`, and `flight` seems reasonable because it would be really confusing for an airline and its customers if there were multiple flights with the same flight number in the air at the same time.
 That said, we might be better off introducing a simple numeric surrogate key using the row number: