From fa8218035f973e0edc54dc7fe40a69ea51de8bd0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Mine=20=C3=87etinkaya-Rundel?=
Date: Mon, 9 May 2022 12:31:46 -0400
Subject: [PATCH] Clarify why histogram starts below 0, closes #724

---
 EDA.Rmd | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/EDA.Rmd b/EDA.Rmd
index e2eb117..783d14e 100644
--- a/EDA.Rmd
+++ b/EDA.Rmd
@@ -141,7 +141,9 @@ diamonds |>
 ```
 
 A histogram divides the x-axis into equally spaced bins and then uses the height of a bar to display the number of observations that fall in each bin.
-In the graph above, the tallest bar shows that almost 30,000 observations have a `carat` value between 0.25 and 0.75, which are the left and right edges of the bar.
+Note that even though it's not possible to have a `carat` value that is smaller than 0 (since weights of diamonds, by definition, are positive values), the bins start at a negative value (-0.25) in order to create bins of equal width across the range of the data with the center of the first bin at 0.
+This behavior is also apparent in the histogram above, where the first bar ranges from -0.25 to 0.25.
+The tallest bar shows that almost 30,000 observations have a `carat` value between 0.25 and 0.75, which are the left and right edges of the bar centered at 0.5.
 
 You can set the width of the intervals in a histogram with the `binwidth` argument, which is measured in the units of the `x` variable.
 You should always explore a variety of binwidths when working with histograms, as different binwidths can reveal different patterns.