Manually integrate changes

Closes #424
hadley 2016-10-03 07:39:46 -05:00
parent 1468be51b5
commit c1f90f87e4
1 changed file with 4 additions and 4 deletions


@@ -10,7 +10,7 @@ knitr::include_graphics("diagrams/data-science-model.png")
The goal of a model is to provide a simple low-dimensional summary of a dataset. Ideally, the model will capture true "signals" (i.e. patterns generated by the phenomenon of interest), and ignore "noise" (i.e. random variation that you're not interested in). Here we only cover "predictive" models, which, as the name suggests, generate predictions. There is another type of model that we're not going to discuss: "data discovery" models. These models don't make predictions, but instead help you discover interesting relationships within your data. (These two categories of models are sometimes called supervised and unsupervised, but I don't think that terminology is particularly illuminating.)
-This book is not going to give you a deep understanding of the mathematical theory that underlies models. It will, however, build your intution about how statistical models work, and give you a family of useful tools that allow you to use models to better understand your data:
+This book is not going to give you a deep understanding of the mathematical theory that underlies models. It will, however, build your intuition about how statistical models work, and give you a family of useful tools that allow you to use models to better understand your data:
* In [model basics], you'll learn how models work mechanistically, focussing on
the important family of linear models. You'll learn general tools for gaining
@@ -19,7 +19,7 @@ This book is not going to give you a deep understanding of the mathematical theo
* In [model building], you'll learn how to use models to pull out known
patterns in real data. Once you have recognised an important pattern
-  it's useful to make it explicitly in a model, because then you can
+  it's useful to make it explicit in a model, because then you can
more easily see the subtler signals that remain.
* In [many models], you'll learn how to use many simple models to help
@@ -30,7 +30,7 @@ These topics are notable because of what they don't include: any tools for quant
## Hypothesis generation vs. hypothesis confirmation
-In this book, we are going to use models as a tool for exploration, completing the trifecta of the tools for tools EDA that were introduced in Part 1. This is not how models are usually taught, but as you will see, models are an important tool for exploration. Traditionally, the focus of modelling is on inference, or for confirming that an hypothesis is true. Doing this correctly is not complicated, but it is hard. There is a pair of ideas that you must understand in order to do inference correctly:
+In this book, we are going to use models as a tool for exploration, completing the trifecta of the tools for EDA that were introduced in Part 1. This is not how models are usually taught, but as you will see, models are an important tool for exploration. Traditionally, the focus of modelling is on inference, or for confirming that an hypothesis is true. Doing this correctly is not complicated, but it is hard. There is a pair of ideas that you must understand in order to do inference correctly:
1. Each observation can either be used for exploration or confirmation,
not both.
@@ -54,6 +54,6 @@ If you are serious about doing an confirmatory analysis, one approach is to spli
1. 20% is held back for a __test__ set. You can only use this data ONCE, to
test your final model.
-This partitioning allows you to explore the training data, occassionally generating candidate hypotheses that you check with the query set. When you are confident you have the right model, you can check it once with the test data.
+This partitioning allows you to explore the training data, occasionally generating candidate hypotheses that you check with the query set. When you are confident you have the right model, you can check it once with the test data.
(Note that even when doing confirmatory modelling, you will still need to do EDA. If you don't do any EDA you will remain blind to the quality problems with your data.)
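The 60%/20%/20% partition described in the final hunk can be sketched in a few lines of base R. This is a minimal illustration, not code from the book; the dataset (mtcars) and the seed are stand-ins chosen for the example.

```r
# Minimal sketch of the train/query/test partition described above.
# mtcars and the seed value are placeholders for illustration.
set.seed(1014)                    # make the partition reproducible

n <- nrow(mtcars)
idx <- sample(n)                  # shuffle the row indices once

train <- mtcars[idx[1:floor(0.6 * n)], ]                    # 60%: explore freely
query <- mtcars[idx[(floor(0.6 * n) + 1):floor(0.8 * n)], ] # 20%: compare candidate models
test  <- mtcars[idx[(floor(0.8 * n) + 1):n], ]              # 20%: use exactly ONCE
```

Shuffling the indices first and then slicing guarantees the three sets are disjoint and exhaust the data, which a per-row random draw with `prob = c(0.6, 0.2, 0.2)` would not.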