Remove modelling

- Move files to extras/ for now
- Adjust references to modelling
- Or add TO DO items to adjust later
Mine Çetinkaya-Rundel 2021-02-21 21:29:24 +00:00
parent f9109aadfe
commit 55803fc8a3
13 changed files with 8 additions and 13 deletions

@ -623,6 +623,8 @@ It's possible to use a model to remove the very strong relationship between pric
 The following code fits a model that predicts `price` from `carat` and then computes the residuals (the difference between the predicted value and the actual value).
 The residuals give us a view of the price of the diamond, once the effect of carat has been removed.
+<!--# TO DO: Replace modelr based workflow with tidymodels, as a sneak preview. -->
```{r, dev = "png"} ```{r, dev = "png"}
 library(modelr)
@ -643,8 +645,7 @@ ggplot(data = diamonds2) +
 geom_boxplot(mapping = aes(x = cut, y = resid))
``` ```
-You'll learn how models, and the modelr package, work in the final part of the book, [model](#model-intro).
-We're saving modelling for later because understanding what models are and how they work is easiest once you have tools of data wrangling and programming in hand.
+We're not discussing modelling in this book because understanding what models are and how they work is easiest once you have tools of data wrangling and programming in hand.
 ## ggplot2 calls

@ -30,11 +30,6 @@ rmd_files: [
 "vectors.Rmd",
 "iteration.Rmd",
-"model.Rmd",
-"model-basics.Rmd",
-"model-building.Rmd",
-"model-many.Rmd",
 "communicate.Rmd",
 "rmarkdown.Rmd",
 "communicate-plots.Rmd",

@ -99,6 +99,7 @@ ggplot(df, aes(x, y)) +
 2. The `geom_smooth()` is somewhat misleading because the `hwy` for large engines is skewed upwards due to the inclusion of lightweight sports cars with big engines.
 Use your modelling tools to fit and display a better model.
+<!--# TO DO: Reconsider this exercise in light of removing modeling chapters. -->
 3. Take an exploratory graphic that you've created in the last month, and add informative titles to make it easier for others to understand.

@ -2,7 +2,7 @@
 # Introduction {#communicate-intro}
-So far, you've learned the tools to get your data into R, tidy it into a form convenient for analysis, and then understand your data through transformation, visualisation and modelling.
+So far, you've learned the tools to get your data into R, tidy it into a form convenient for analysis, and then understand your data through transformation, and visualisation.
 However, it doesn't matter how great your analysis is unless you can explain it to others: you need to **communicate** your results.
```{r echo = FALSE, out.width = "75%"} ```{r echo = FALSE, out.width = "75%"}

@ -20,7 +20,6 @@ In this part of the book you will learn some useful tools that have an immediate
 - Finally, in [exploratory data analysis], you'll combine visualisation and transformation with your curiosity and scepticism to ask and answer interesting questions about data.
 Modelling is an important part of the exploratory process, but you don't have the skills to effectively learn or apply it yet.
-We'll come back to it in [modelling](#model-intro), once you're better equipped with more data wrangling and programming tools.
 Nestled among these three chapters that teach you the tools of exploration are three chapters that focus on your R workflow.
 In [workflow: basics], [workflow: scripts], and [workflow: projects] you'll learn good practices for writing and organising your R code.

@ -639,7 +639,7 @@ There are two alternatives:
``` ```
 Feather tends to be faster than RDS and is usable outside of R.
-RDS supports list-columns (which you'll learn about in [many models]); feather currently does not.
+RDS supports list-columns (which you'll learn about in <!--# TO DO: Link to to-be-added list columns chapter. -->); feather currently does not.
```{r, include = FALSE} ```{r, include = FALSE}
 file.remove("challenge-2.csv")

@ -14,7 +14,7 @@ documentclass: book
 # Welcome {.unnumbered}
 <a href="http://amzn.to/2aHLAQ1"><img src="cover.png" alt="Buy from amazon" class="cover" width="250" height="375"/></a> This is the website for the work-in-progress 2nd edition of **"R for Data Science"**. This book will teach you how to do data science with R: You'll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it.
-In this book, you will find a practicum of skills for data science.
+<!--# TO DO: Should "model it" stay here? Omitted? Mentioned with an explanation as to where to go for modeling? --> In this book, you will find a practicum of skills for data science.
 Just as a chemist learns how to clean test tubes and stock a lab, you'll learn how to clean data and draw plots---and many other things besides.
 These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R.
 You'll learn how to use the grammar of graphics, literate programming, and reproducible research to save time.

@ -140,7 +140,6 @@ Hypothesis confirmation is hard for two reasons:
 2. You can only use an observation once to confirm a hypothesis.
 As soon as you use it more than once you're back to doing exploratory analysis.
 This means to do hypothesis confirmation you need to "preregister" (write out in advance) your analysis plan, and not deviate from it even when you have seen the data.
-We'll talk a little about some strategies you can use to make this easier in [modelling](#model-intro).
 It's common to think about modelling as a tool for hypothesis confirmation, and visualisation as a tool for hypothesis generation.
 But that's a false dichotomy: models are often used for exploration, and with a little care you can use visualisation for confirmation.

@ -423,7 +423,7 @@ There's no way to list every possible function that you might use, but here's a
 - Logs: `log()`, `log2()`, `log10()`.
 Logarithms are an incredibly useful transformation for dealing with data that ranges across multiple orders of magnitude.
-They also convert multiplicative relationships to additive, a feature we'll come back to in modelling.
+They also convert multiplicative relationships to additive.
 All else being equal, I recommend using `log2()` because it's easy to interpret: a difference of 1 on the log scale corresponds to doubling on the original scale and a difference of -1 corresponds to halving.
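
To make the `log2()` interpretation in the context lines above concrete, here is a minimal base-R sketch. It is an illustration only and not part of this commit; the vector `x` and the values `a` and `b` are made-up examples.

```r
# Hypothetical values chosen for illustration; base R only.
x <- c(1, 2, 4, 8)

# Each doubling of x adds exactly 1 on the log2 scale.
diff(log2(x))
#> [1] 1 1 1

# Logs turn products into sums: log2(a * b) equals log2(a) + log2(b).
a <- 3
b <- 5
all.equal(log2(a * b), log2(a) + log2(b))
#> [1] TRUE
```

Reading the first result backwards gives the halving case: moving from 8 to 4 is a difference of -1 on the log2 scale.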