From 3c6c28d44ca5bb4ee7089df5f70fe99bea751493 Mon Sep 17 00:00:00 2001
From: Garrett
Date: Thu, 7 Jan 2016 11:58:03 -0500
Subject: [PATCH 1/3] Adds section intro for Essentials

---
 _includes/package-nav.html |  2 +-
 essentials.Rmd             | 28 ++++++++++++++++++++++++++++
 intro.Rmd                  | 11 -----------
 3 files changed, 29 insertions(+), 12 deletions(-)
 create mode 100644 essentials.Rmd

diff --git a/_includes/package-nav.html b/_includes/package-nav.html
index b6f0236..a586ab5 100644
--- a/_includes/package-nav.html
+++ b/_includes/package-nav.html
@@ -1,6 +1,6 @@
  • Introduction
  • - +
  • Visualize
  • Transform
  • Tidy
  •
diff --git a/essentials.Rmd b/essentials.Rmd
new file mode 100644
index 0000000..2dea311
--- /dev/null
+++ b/essentials.Rmd
@@ -0,0 +1,28 @@
+---
+layout: default
+title: Essentials
+---
+
+If you measure any quantity twice---and precisely enough, you will get two different results. This is true even for quantities that should be constant, like the speed of light (below).
+
+This phenomenon, called _variation_, is the beginning of data science. To understand anything you must decipher patterns of variation. But variation does more than just obscure, it is an incredibly useful tool. Patterns of variation provide evidence of causal relationships.
+
+The best way to study variation is to collect data, particularly rectangular data: data that is made up of variables, observations, and values.
+
+* A _variable_ is a quantity, quality, or property that you can measure.
+
+* A _value_ is the state of a variable when you measure it. The value of a
+  variable may change from measurement to measurement.
+
+* An _observation_ is a set of measurements you make under similar conditions
+  (usually all at the same time or on the same object). Observations contain
+  values that you measure on different variables.
+
+Rectangular data provides a clear record of variation, but that doesn't mean it is easy to understand. The human mind isn't built to process tables of data. This section will show you the best ways to comprehend your own data, which is the most important challenge of data science.
+
+```{r, echo = FALSE}
+
+mat <- as.data.frame(matrix(morley$Speed + 299000, ncol = 10))
+
+knitr::kable(mat, caption = "*The speed of light is* the *universal constant, but variation obscures its value, here demonstrated by Albert Michelson in 1879. Michelson measured the speed of light 100 times and observed 30 different values (in km/sec).*", col.names = c("\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s"))
+```
diff --git a/intro.Rmd b/intro.Rmd
index 1d989ab..7dda0ae 100644
--- a/intro.Rmd
+++ b/intro.Rmd
@@ -82,17 +82,6 @@ However, we strongly believe that it's best to master one tool at a time. You wi
 
 ### Non-rectangular data
 
-This book focusses exclusively on rectangular data, data made up of variables, observations, and values:
-
-* A _variable_ is a quantity, quality, or property that you can measure.
-
-* A _value_ is the state of a variable when you measure it. The value of a
-  variable may change from measurement to measurement.
-
-* An _observation_ is a set of measurments you make under similar conditions
-  (usually all at the same time or on the same object). Observations contain
-  values that you measure on different variables.
-
 This book focuses exclusively on structured data sets: collections of values that are each associated with a variable and an observation. There are lots of data sets that do not naturally fit in this paradigm: images, sounds, trees, text. But data frames are extremely common in science and in industry and we believe that they're a great place to start your data analysis journey.
 
 ### Formal Statistics and Machine Learning
From 387d65c4f1af255050fa6409e03cdc7ada96bfa5 Mon Sep 17 00:00:00 2001
From: "Jennifer (Jenny) Bryan"
Date: Thu, 7 Jan 2016 10:00:03 -0800
Subject: [PATCH 2/3] extra `)`

---
 relational-data.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/relational-data.Rmd b/relational-data.Rmd
index ea10328..283bea2 100644
--- a/relational-data.Rmd
+++ b/relational-data.Rmd
@@ -236,7 +236,7 @@ dplyr | merge
 `inner_join(x, y)` | `merge(x, y)`
 `left_join(x, y)` | `merge(x, y, all.x = TRUE)`
 `right_join(x, y)` | `merge(x, y, all.y = TRUE)`,
-`full_join(x, y)` | `merge(x, y, all.x = TRUE), all.y = TRUE)`
+`full_join(x, y)` | `merge(x, y, all.x = TRUE, all.y = TRUE)`
 
 The advantages of the specific dplyr verbs is that they more clearly convey the intent of your code: the difference between the joins is really important but concealed in the arguments of `merge()`. dplyr's joins are considerably faster and don't mess with the order of the rows.
 
From 967de5e604cd1c9c5a9d7b4a677e0beaffff56f0 Mon Sep 17 00:00:00 2001
From: "Jennifer (Jenny) Bryan"
Date: Thu, 7 Jan 2016 10:46:27 -0800
Subject: [PATCH 3/3] swap x and y

---
 relational-data.Rmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/relational-data.Rmd b/relational-data.Rmd
index ea10328..f236ddc 100644
--- a/relational-data.Rmd
+++ b/relational-data.Rmd
@@ -140,7 +140,7 @@ There are three ways that the keys might match: one-to-one, one-to-many, and man
     inner_join(x, y, by = "key")
     ```
 
-* In a one-to-many match, each key in `x` matches multiple keys in `y`. This
+* In a one-to-many match, each key in `y` matches multiple keys in `x`. This
   is useful when you want to add in additional information.
 
     ```{r, echo = FALSE, out.width = "100%"}
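
Reviewer note (not part of the patches): the two wording fixes in `relational-data.Rmd` can be checked with a small sketch. It assumes two toy data frames `x` and `y` that are invented here and do not come from the book. It uses base `merge()` so that it runs without extra packages, and calls the dplyr verbs from the table only if dplyr happens to be installed.

```r
# Hypothetical toy tables, invented for illustration (not from the book).
# x has several rows per key; y has exactly one row per key.
x <- data.frame(key = c(1, 1, 2, 2, 3), val_x = c("a", "b", "c", "d", "e"))
y <- data.frame(key = c(1, 2, 4), val_y = c("y1", "y2", "y4"))

# One-to-many: each key in y matches multiple rows in x, so the join
# attaches y's information to every matching row of x (inner join).
inner_base <- merge(x, y, by = "key")
print(inner_base)

# Full join as written in the corrected table row: keep unmatched rows
# from both sides and fill the gaps with NA.
full_base <- merge(x, y, by = "key", all.x = TRUE, all.y = TRUE)
print(full_base)

# dplyr counterparts from the table, run only if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::inner_join(x, y, by = "key"))
  print(dplyr::full_join(x, y, by = "key"))
}
```

With these toy tables the inner join keeps the four rows of `x` whose key also appears in `y`, and the full join adds one row each for the unmatched keys 3 and 4.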