diff --git a/EDA.Rmd b/EDA.Rmd
index 8e1a6e4..7e5e6f7 100644
--- a/EDA.Rmd
+++ b/EDA.Rmd
@@ -623,6 +623,8 @@ It's possible to use a model to remove the very strong relationship between pric
 The following code fits a model that predicts `price` from `carat` and then computes the residuals (the difference between the predicted value and the actual value).
 The residuals give us a view of the price of the diamond, once the effect of carat has been removed.
 
+
+
 ```{r, dev = "png"}
 library(modelr)
 
@@ -643,8 +645,7 @@ ggplot(data = diamonds2) +
   geom_boxplot(mapping = aes(x = cut, y = resid))
 ```
 
-You'll learn how models, and the modelr package, work in the final part of the book, [model](#model-intro).
-We're saving modelling for later because understanding what models are and how they work is easiest once you have tools of data wrangling and programming in hand.
+We're not discussing modelling in this book because understanding what models are and how they work is easiest once you have tools of data wrangling and programming in hand.
 
 ## ggplot2 calls
 
diff --git a/_bookdown.yml b/_bookdown.yml
index 31655de..99f4872 100644
--- a/_bookdown.yml
+++ b/_bookdown.yml
@@ -30,11 +30,6 @@ rmd_files: [
   "vectors.Rmd",
   "iteration.Rmd",
 
-  "model.Rmd",
-  "model-basics.Rmd",
-  "model-building.Rmd",
-  "model-many.Rmd",
-
   "communicate.Rmd",
   "rmarkdown.Rmd",
   "communicate-plots.Rmd",
diff --git a/communicate-plots.Rmd b/communicate-plots.Rmd
index 63dae53..c66523d 100644
--- a/communicate-plots.Rmd
+++ b/communicate-plots.Rmd
@@ -99,6 +99,7 @@ ggplot(df, aes(x, y)) +
 2. The `geom_smooth()` is somewhat misleading because the `hwy` for large engines is skewed upwards due to the inclusion of lightweight sports cars with big engines.
    Use your modelling tools to fit and display a better model.
 
+
 3. Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.
diff --git a/communicate.Rmd b/communicate.Rmd
index 5d0b520..91c0afb 100644
--- a/communicate.Rmd
+++ b/communicate.Rmd
@@ -2,7 +2,7 @@
 
 # Introduction {#communicate-intro}
 
-So far, you've learned the tools to get your data into R, tidy it into a form convenient for analysis, and then understand your data through transformation, visualisation and modelling.
+So far, you've learned the tools to get your data into R, tidy it into a form convenient for analysis, and then understand your data through transformation, and visualisation.
 However, it doesn't matter how great your analysis is unless you can explain it to others: you need to **communicate** your results.
 
 ```{r echo = FALSE, out.width = "75%"}
diff --git a/explore.Rmd b/explore.Rmd
index 861993d..522a646 100644
--- a/explore.Rmd
+++ b/explore.Rmd
@@ -20,7 +20,6 @@ In this part of the book you will learn some useful tools that have an immediate
 - Finally, in [exploratory data analysis], you'll combine visualisation and transformation with your curiosity and scepticism to ask and answer interesting questions about data.
 
 Modelling is an important part of the exploratory process, but you don't have the skills to effectively learn or apply it yet.
-We'll come back to it in [modelling](#model-intro), once you're better equipped with more data wrangling and programming tools.
 
 Nestled among these three chapters that teach you the tools of exploration are three chapters that focus on your R workflow.
 In [workflow: basics], [workflow: scripts], and [workflow: projects] you'll learn good practices for writing and organising your R code.
diff --git a/model-basics.Rmd b/extra/model/model-basics.Rmd
similarity index 100%
rename from model-basics.Rmd
rename to extra/model/model-basics.Rmd
diff --git a/model-building.Rmd b/extra/model/model-building.Rmd
similarity index 100%
rename from model-building.Rmd
rename to extra/model/model-building.Rmd
diff --git a/model-many.Rmd b/extra/model/model-many.Rmd
similarity index 100%
rename from model-many.Rmd
rename to extra/model/model-many.Rmd
diff --git a/model.Rmd b/extra/model/model.Rmd
similarity index 100%
rename from model.Rmd
rename to extra/model/model.Rmd
diff --git a/import.Rmd b/import.Rmd
index ed51c16..a78f1b8 100644
--- a/import.Rmd
+++ b/import.Rmd
@@ -639,7 +639,7 @@ There are two alternatives:
     ```
 
 Feather tends to be faster than RDS and is usable outside of R.
-RDS supports list-columns (which you'll learn about in [many models]); feather currently does not.
+RDS supports list-columns (which you'll learn about in ); feather currently does not.
 
 ```{r, include = FALSE}
 file.remove("challenge-2.csv")
diff --git a/index.Rmd b/index.Rmd
index 33cdedd..7d5d962 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -14,7 +14,7 @@ documentclass: book
 # Welcome {.unnumbered}
 
 Buy from amazon This is the website for the work-in-progress 2nd edition of **"R for Data Science"**. This book will teach you how to do data science with R: You'll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it.
-In this book, you will find a practicum of skills for data science.
+ In this book, you will find a practicum of skills for data science.
 Just as a chemist learns how to clean test tubes and stock a lab, you'll learn how to clean data and draw plots---and many other things besides.
 These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R.
 You'll learn how to use the grammar of graphics, literate programming, and reproducible research to save time.
diff --git a/intro.Rmd b/intro.Rmd
index f5ba338..52d184c 100644
--- a/intro.Rmd
+++ b/intro.Rmd
@@ -140,7 +140,6 @@ Hypothesis confirmation is hard for two reasons:
 2. You can only use an observation once to confirm a hypothesis.
    As soon as you use it more than once you're back to doing exploratory analysis.
    This means to do hypothesis confirmation you need to "preregister" (write out in advance) your analysis plan, and not deviate from it even when you have seen the data.
-   We'll talk a little about some strategies you can use to make this easier in [modelling](#model-intro).
 
 It's common to think about modelling as a tool for hypothesis confirmation, and visualisation as a tool for hypothesis generation.
 But that's a false dichotomy: models are often used for exploration, and with a little care you can use visualisation for confirmation.
diff --git a/transform.Rmd b/transform.Rmd
index 73614ea..8575335 100644
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -423,7 +423,7 @@ There's no way to list every possible function that you might use, but here's a
 - Logs: `log()`, `log2()`, `log10()`.
   Logarithms are an incredibly useful transformation for dealing with data that ranges across multiple orders of magnitude.
-  They also convert multiplicative relationships to additive, a feature we'll come back to in modelling.
+  They also convert multiplicative relationships to additive.
   All else being equal, I recommend using `log2()` because it's easy to interpret: a difference of 1 on the log scale corresponds to doubling on the original scale and a difference of -1 corresponds to halving.