Eliminate two plots in EDA.qmd
Noticed these in passing. cc @mine-cetinkaya-rundel.
parent
03f1c6c6f4
commit
504db47630
37
EDA.qmd
@@ -637,20 +637,6 @@ ggplot(smaller, aes(x = carat, y = price)) +
 By default, boxplots look roughly the same (apart from number of outliers) regardless of how many observations there are, so it's difficult to tell that each boxplot summaries a different number of points.
 One way to show that is to make the width of the boxplot proportional to the number of points with `varwidth = TRUE`.
 
-Another approach is to display approximately the same number of points in each bin.
-That's the job of `cut_number()`:
-
-```{r}
-#| fig-alt: >
-#| Side-by-side box plots of price by carat. Each box plot represents 20
-#| diamonds. The box plots show that as carat increases the median price
-#| increases as well. Cheaper, smaller diamonds have outliers on the higher
-#| end, more expensive, bigger diamonds have outliers on the lower end.
-
-ggplot(smaller, aes(x = carat, y = price)) +
-geom_boxplot(aes(group = cut_number(carat, 20)))
-```
-
 #### Exercises
 
 1. Instead of summarizing the conditional distribution with a boxplot, you could use a frequency polygon.
@@ -665,21 +651,26 @@ ggplot(smaller, aes(x = carat, y = price)) +
 4. Combine two of the techniques you've learned to visualize the combined distribution of cut, carat, and price.
 
 5. Two dimensional plots reveal outliers that are not visible in one dimensional plots.
-For example, some points in the plot below have an unusual combination of `x` and `y` values, which makes the points outliers even though their `x` and `y` values appear normal when examined separately.
+For example, some points in the following plot have an unusual combination of `x` and `y` values, which makes the points outliers even though their `x` and `y` values appear normal when examined separately.
+Why is a scatterplot a better display than a binned plot for this case?
 
 ```{r}
-#| dev: "png"
-#| fig-alt: >
-#| A scatterplot of widths vs. lengths of diamonds. There is a positive,
-#| strong, linear relationship. There are a few unusual observations
-#| above and below the bulk of the data, more below it than above.
-
-ggplot(diamonds, aes(x = x, y = y)) +
+#| eval: false
+diamonds |>
+filter(x >= 4) |>
+ggplot(aes(x = x, y = y)) +
 geom_point() +
 coord_cartesian(xlim = c(4, 11), ylim = c(4, 11))
 ```
 
-Why is a scatterplot a better display than a binned plot for this case?
+6. Instead of creating boxes of equal width with `cut_width()`, we could create boxes that contain roughly equal number of points with `cut_number()`.
+What are the advantages and disadvantages of this approach?
+
+```{r}
+#| eval: false
+ggplot(smaller, aes(x = carat, y = price)) +
+geom_boxplot(aes(group = cut_number(carat, 20)))
+```
 
 ## Patterns and models
 