From 9137dd91b5d45e3599f800c7915cec408d890903 Mon Sep 17 00:00:00 2001
From: Garrett
Date: Wed, 20 Jan 2016 16:18:50 -0500
Subject: [PATCH] Reorders outline. Model section becomes do science section.

---
 _includes/package-nav.html | 17 +++++++++--------
 communicate.Rmd            |  6 ++++++
 essentials.Rmd             | 28 ----------------------------
 program.Rmd                |  6 ++++++
 science.Rmd                |  6 ++++++
 understand.Rmd             |  6 ++++++
 eda.Rmd => variation.Rmd   | 33 ++++++++++++++++++++++++++++++---
 work.Rmd                   |  6 ++++++
 8 files changed, 69 insertions(+), 39 deletions(-)
 create mode 100644 communicate.Rmd
 delete mode 100644 essentials.Rmd
 create mode 100644 program.Rmd
 create mode 100644 science.Rmd
 create mode 100644 understand.Rmd
 rename eda.Rmd => variation.Rmd (93%)
 create mode 100644 work.Rmd

diff --git a/_includes/package-nav.html b/_includes/package-nav.html
index eca9916..98a6850 100644
--- a/_includes/package-nav.html
+++ b/_includes/package-nav.html
@@ -1,29 +1,30 @@
  • Introduction
  • - +
  • Visualize
  • Transform
  • Model
  • -
  • Explore
  • +
  • Variation
  • - +
  • Import
  • Tidy
  • Relational data
  • Strings
  • Dates and times
  • - +
  • Express yourself with code
  • Data structures
  • Lists and functional programming
  • Robust code
  • - -
  • Models and visualisation
  • -
  • Model assessment
  • + +
  • Discover
  • +
  • Predict
  • +
  • Test
  • - +
  • R Markdown
  • Shiny
  •
diff --git a/communicate.Rmd b/communicate.Rmd
new file mode 100644
index 0000000..1e70684
--- /dev/null
+++ b/communicate.Rmd
@@ -0,0 +1,6 @@
+---
+layout: default
+title: Communicate your work
+---
+
+Reproducible, literate code is the data science equivalent of the scientific report (i.e., Introduction, Methods and Materials, Results, Discussion).
diff --git a/essentials.Rmd b/essentials.Rmd
deleted file mode 100644
index 2dea311..0000000
--- a/essentials.Rmd
+++ /dev/null
@@ -1,28 +0,0 @@
----
-layout: default
-title: Essentials
----
-
-If you measure any quantity twice---and precisely enough, you will get two different results. This is true even for quantities that should be constant, like the speed of light (below).
-
-This phenomenon, called _variation_, is the beginning of data science. To understand anything you must decipher patterns of variation. But variation does more than just obscure, it is an incredibly useful tool. Patterns of variation provide evidence of causal relationships.
-
-The best way to study variation is to collect data, particularly rectangular data: data that is made up of variables, observations, and values.
-
-* A _variable_ is a quantity, quality, or property that you can measure.
-
-* A _value_ is the state of a variable when you measure it. The value of a
-  variable may change from measurement to measurement.
-
-* An _observation_ is a set of measurements you make under similar conditions
-  (usually all at the same time or on the same object). Observations contain
-  values that you measure on different variables.
-
-Rectangular data provides a clear record of variation, but that doesn't mean it is easy to understand. The human mind isn't built to process tables of data. This section will show you the best ways to comprehend your own data, which is the most important challenge of data science.
-
-```{r, echo = FALSE}
-
-mat <- as.data.frame(matrix(morley$Speed + 299000, ncol = 10))
-
-knitr::kable(mat, caption = "*The speed of light is* the *universal constant, but variation obscures its value, here demonstrated by Albert Michelson in 1879. Michelson measured the speed of light 100 times and observed 30 different values (in km/sec).*", col.names = c("\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s"))
-```
diff --git a/program.Rmd b/program.Rmd
new file mode 100644
index 0000000..62563db
--- /dev/null
+++ b/program.Rmd
@@ -0,0 +1,6 @@
+---
+layout: default
+title: Save time by programming
+---
+
+Computer-human communication matters.
diff --git a/science.Rmd b/science.Rmd
new file mode 100644
index 0000000..816bdb8
--- /dev/null
+++ b/science.Rmd
@@ -0,0 +1,6 @@
+---
+layout: default
+title: Do science with data
+---
+
+The scientific method guides data science. Data science solves known problems with the scientific method.
diff --git a/understand.Rmd b/understand.Rmd
new file mode 100644
index 0000000..d233976
--- /dev/null
+++ b/understand.Rmd
@@ -0,0 +1,6 @@
+---
+layout: default
+title: Understand your data
+---
+
+Data poses a cognitive problem; data comprehension is a skill.
diff --git a/eda.Rmd b/variation.Rmd
similarity index 93%
rename from eda.Rmd
rename to variation.Rmd
index 952f89e..5f6c075 100644
--- a/eda.Rmd
+++ b/variation.Rmd
@@ -1,18 +1,45 @@
 ---
 layout: default
-title: Exploratory data analysis
+title: Variation
 ---
 
-# Exploratory data analysis
+# Variation
 
 ```{r, include = FALSE}
 library(ggplot2)
 knitr::opts_chunk$set(
   cache = TRUE,
-  fig.path = "figures/eda/"
+  fig.path = "figures/variation/"
 )
 ```
 
+
+If you measure any quantity twice, and precisely enough, you will get two different results. This is true even for quantities that should be constant, like the speed of light (below).
+
+This phenomenon, called _variation_, is the beginning of data science. To understand anything you must decipher patterns of variation. But variation does more than obscure; it is also an incredibly useful tool. Patterns of variation provide evidence of causal relationships.
+
+The best way to study variation is to collect data, particularly rectangular data: data that is made up of variables, observations, and values.
+
+* A _variable_ is a quantity, quality, or property that you can measure.
+
+* A _value_ is the state of a variable when you measure it. The value of a
+  variable may change from measurement to measurement.
+
+* An _observation_ is a set of measurements you make under similar conditions
+  (usually all at the same time or on the same object). Observations contain
+  values that you measure on different variables.
+
+Rectangular data provides a clear record of variation, but that doesn't mean it is easy to understand. The human mind isn't built to process tables of data. This section will show you the best ways to comprehend your own data, which is the most important challenge of data science.
+
+```{r, echo = FALSE}
+
+mat <- as.data.frame(matrix(morley$Speed + 299000, ncol = 10))
+
+knitr::kable(mat, caption = "*The speed of light is* the *universal constant, but variation obscures its value, here demonstrated by Albert Michelson in 1879. Michelson measured the speed of light 100 times and observed 30 different values (in km/sec).*", col.names = c("\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s", "\\s"))
+```
+
+
+
 ***
 
 *Tip*: Throughout this section, we will rely on a distinction between two types of variables:
diff --git a/work.Rmd b/work.Rmd
new file mode 100644
index 0000000..9fb6141
--- /dev/null
+++ b/work.Rmd
@@ -0,0 +1,6 @@
+---
+layout: default
+title: Work with your data
+---
+
+With data, the relationships between values matter as much as the values themselves. Tidy data encodes those relationships.