From 03f1c6c6f4d38e49605c24b718975c9735436e2c Mon Sep 17 00:00:00 2001
From: Hadley Wickham
Date: Tue, 7 Feb 2023 10:37:50 -0600
Subject: [PATCH] Consistently use `smaller` dataset in section

Fixes #1252
---
 EDA.qmd | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/EDA.qmd b/EDA.qmd
index 4825608..3ed05be 100644
--- a/EDA.qmd
+++ b/EDA.qmd
@@ -560,7 +560,7 @@ For larger plots, you might want to try the heatmaply package, which creates int
 
 You've already seen one great way to visualize the covariation between two numerical variables: draw a scatterplot with `geom_point()`.
 You can see covariation as a pattern in the points.
-For example, you can see an exponential relationship between the carat size and price of a diamond.
+For example, you can see an exponential relationship between the carat size and price of a diamond:
 
 ```{r}
 #| dev: "png"
@@ -568,10 +568,12 @@ For example, you can see an exponential relationship between the carat size and
 #|   A scatterplot of price vs. carat. The relationship is positive, somewhat
 #|   strong, and exponential.
 
-ggplot(diamonds, aes(x = carat, y = price)) +
+ggplot(smaller, aes(x = carat, y = price)) +
   geom_point()
 ```
 
+(In this section we'll use the `smaller` dataset to stay focused on the bulk of the diamonds that are smaller than 3 carats)
+
 Scatterplots become less useful as the size of your dataset grows, because points begin to overplot, and pile up into areas of uniform black (as above).
 You've already seen one way to fix the problem: using the `alpha` aesthetic to add transparency.
 
@@ -583,7 +585,7 @@ You've already seen one way to fix the problem: using the `alpha` aesthetic to a
 #|   the number of points is higher than other areas, The most obvious clusters
 #|   are for diamonds with 1, 1.5, and 2 carats.
 
-ggplot(diamonds, aes(x = carat, y = price)) +
+ggplot(smaller, aes(x = carat, y = price)) +
   geom_point(alpha = 1 / 100)
 ```