diff --git a/EDA.qmd b/EDA.qmd
index 4825608..3ed05be 100644
--- a/EDA.qmd
+++ b/EDA.qmd
@@ -560,7 +560,7 @@ For larger plots, you might want to try the heatmaply package, which creates int
 
 You've already seen one great way to visualize the covariation between two numerical variables: draw a scatterplot with `geom_point()`.
 You can see covariation as a pattern in the points.
-For example, you can see an exponential relationship between the carat size and price of a diamond.
+For example, you can see an exponential relationship between the carat size and price of a diamond:
 
 ```{r}
 #| dev: "png"
@@ -568,10 +568,12 @@ For example, you can see an exponential relationship between the carat size and
 #|   A scatterplot of price vs. carat. The relationship is positive, somewhat
 #|   strong, and exponential.
 
-ggplot(diamonds, aes(x = carat, y = price)) +
+ggplot(smaller, aes(x = carat, y = price)) +
   geom_point()
 ```
 
+(In this section we'll use the `smaller` dataset to stay focused on the bulk of the diamonds that are smaller than 3 carats)
+
 Scatterplots become less useful as the size of your dataset grows, because points begin to overplot, and pile up into areas of uniform black (as above).
 You've already seen one way to fix the problem: using the `alpha` aesthetic to add transparency.
 
@@ -583,7 +585,7 @@ You've already seen one way to fix the problem: using the `alpha` aesthetic to a
 #|   the number of points is higher than other areas, The most obvious clusters
 #|   are for diamonds with 1, 1.5, and 2 carats.
 
-ggplot(diamonds, aes(x = carat, y = price)) +
+ggplot(smaller, aes(x = carat, y = price)) +
   geom_point(alpha = 1 / 100)
 ```
 
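
Review note: the changed chunks plot `smaller`, but none of the hunks show where that object is created. The sketch below is one way it could be defined, assuming it is simply `diamonds` filtered to stones under 3 carats, as the added parenthetical describes. The exact definition used elsewhere in `EDA.qmd` is an assumption here, so it is worth checking that `smaller` exists before these chunks run.

```r
library(ggplot2) # provides the diamonds dataset, ggplot(), and geom_point()
library(dplyr)   # provides filter()

# Assumed definition (not shown in this diff): keep diamonds under 3 carats.
smaller <- diamonds |>
  filter(carat < 3)

# Same plot as the second changed chunk, now drawn from the filtered data.
ggplot(smaller, aes(x = carat, y = price)) +
  geom_point(alpha = 1 / 100)
```

If `smaller` is already defined earlier in the chapter, no extra code is needed. Only the placement of the explanatory parenthetical matters then, since the first plot uses `smaller` before the text introduces it.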