Rename chapters

hadley 2016-07-19 08:16:35 -05:00
parent 51f8c2d8b5
commit 028b165236
3 changed files with 2 additions and 22 deletions

@@ -552,23 +552,3 @@ ggplot(data = diamonds2, mapping = aes(x = carat, y = resid)) +
ggplot(data = diamonds2, mapping = aes(x = cut, y = resid)) +
geom_boxplot()
```
-## What's next?
-__Part 1__ (this part) of the book has given you the basic tools to do data science. Just by knowing how to transform and visualise data, you can gain a tremendous number of insights. And somewhat counterintuitively, these tools scale really well to big data: the bigger the data, the more important simple tools like binning and counting become.
-To see what's coming up in the rest of the book, it's useful to refer back to my model of data science:
-```{r echo = FALSE, out.width = "75%"}
-knitr::include_graphics("diagrams/data-science.png")
-```
-The main tool that you are missing is modelling. Modelling is important because once you have recognised a pattern, a model allows you to make that pattern quantitative and precise, and to partition it out from what remains. That supports a powerful iterative approach: you identify a pattern with visualisation, then subtract it with a model, allowing you to see the subtler trends that remain. I deliberately chose not to teach modelling yet, because understanding what models are and how they work is easiest once you have some other tools in hand: data wrangling and programming.
-__Part 2__, up next, covers data wrangling. So far we've focussed on datasets that are already in the right form in R. In real life, you'll need tools to get your data into R (import it), organise it into a consistent format (tidy it), and then work with specialised types of data (like strings and dates) using specialised tools.
-__Part 3__ teaches you more about programming. All of this work will involve a computer; you cannot do it in your head, nor with paper and pencil. To work efficiently, you will need to know how to program in a computer language, such as R.
-Now we can return to modelling in __Part 4__. You'll use your new tools of data wrangling and programming to fit many models and understand how they work. The focus of this book is on exploration, not confirmation or formal inference, but you'll learn a few basic tools that help you understand the variation within your models.
-By the end of a successful data science project, you will have built up a good understanding of what is going on with the data. It doesn't matter how brilliant your understanding is unless you can communicate it to others. You will need to share your work in a way that your audience can understand. Your audience might be fellow scientists who will want to reproduce the work, non-scientists who will want to understand your findings in plain terms, or yourself (in the future), who will be thankful if you make your work easy to re-learn and recreate. __Part 5__ discusses communication, and how you can use RMarkdown to generate reproducible artefacts that combine prose and code.

@@ -7,7 +7,7 @@ rmd_files: [
"explore.Rmd",
"visualize.Rmd",
"transform.Rmd",
-"variation.Rmd",
+"EDA.Rmd",
"wrangle.Rmd",
"import.Rmd",
@@ -19,7 +19,7 @@ rmd_files: [
"program.Rmd",
"pipes.Rmd",
"functions.Rmd",
-"data-structures.Rmd",
+"vectors.Rmd",
"iteration.Rmd",
"hierarchy.Rmd",