More modelling thoughts

hadley 2016-06-20 08:31:16 -05:00
parent c1ec9d4f1a
commit c8c9dc4d07
3 changed files with 8 additions and 1 deletions


@@ -6,7 +6,7 @@ set.seed(1014)
options(digits = 3)
```
In this chapter, you'll turn the tools of multiple models towards model assessment: learning how the model performs when giving new data. So far we've focussed on models as tools for description, using models to help us understand the patterns in the data we have collected so far. But ideally a model will do more than just describe what we have seen so far - it will also help predict what will come next.
In this chapter, you'll turn the tools of multiple models towards model assessment: learning how the model performs when given new data. So far we've focussed on models as tools for description, using models to help us understand the patterns in the data we have collected so far. But ideally a model will do more than just describe what we have seen so far - it will also help predict what will come next.
In other words, we want a model that doesn't just perform well on the sample, but also accurately summarises the underlying population.
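One way to see the difference between performing well on the sample and performing well on new data is to hold some rows back when fitting. The following is a minimal sketch, not the chapter's own method: the simulated data, the 75/25 split, and the `rmse()` helper are all assumptions made for illustration.

```r
# Hypothetical simulated data: a linear signal plus noise.
set.seed(1014)
n <- 200
sim <- data.frame(x = runif(n, 0, 10))
sim$y <- 2 + 0.5 * sim$x + rnorm(n, sd = 1)

# Hold out 25% of the rows as "new" data the model never sees while fitting.
test_rows <- sample(n, size = n / 4)
train <- sim[-test_rows, ]
test  <- sim[test_rows, ]

mod <- lm(y ~ x, data = train)

# Illustrative helper: root-mean-squared error of predictions on a data frame.
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

rmse_train <- rmse(mod, train)  # error on the sample used for fitting
rmse_test  <- rmse(mod, test)   # error on the held-out rows
c(train = rmse_train, test = rmse_test)
```

The held-out error is the number that speaks to how the model might behave on data it has not seen; the training error only describes the sample.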


@@ -2,6 +2,7 @@
A model is a function that summarizes how the values of one variable vary in relation to the values of other variables. Models play a large role in hypothesis testing and prediction, but for the moment you should think of models just like you think of statistics. A statistic summarizes a *distribution* in a way that is easy to understand; and a model summarizes *covariation* in a way that is easy to understand. In other words, a model is just another way to describe data.
Family of models vs fitted model. Set of possible values, vs. one specific model. A fitted model = family of models plus a dataset.
This chapter will explain how to build useful models with R.
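The distinction between a family of models and a fitted model can be sketched in a few lines of R. This is only an illustration under stated assumptions: `linear_family()` is a hypothetical helper, the data are simulated, and the straight-line family is just one convenient choice.

```r
# A family of models: every straight line y = a1 + a2 * x.
# Each choice of (a1, a2) picks out one specific member of the family.
linear_family <- function(a1, a2) {
  function(x) a1 + a2 * x
}

# Hypothetical dataset, simulated for illustration.
set.seed(1014)
df <- data.frame(x = 1:20)
df$y <- 3 + 1.5 * df$x + rnorm(20)

# Family + dataset -> one fitted model: lm() chooses the member of the
# family that is closest to the data in the least-squares sense.
fit <- lm(y ~ x, data = df)
a <- unname(coef(fit))
fitted_model <- linear_family(a[1], a[2])

# The fitted model is an ordinary function we can evaluate at any x.
fitted_model(c(0, 10))
```

Here `linear_family` stands for the set of possible models, while `fitted_model` is the single member selected by the data.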


@@ -25,8 +25,14 @@ In the course of modelling, you'll often discover data quality problems. Maybe a
<https://blog.engineyard.com/2014/pets-vs-cattle>.
<https://en.wikipedia.org/wiki/R/K_selection_theory>
## Exploring vs. confirming
In this book we are going to focus on models primarily as tools for description. This is rather non-standard because we're normally interested in models for their inferential power: their ability to make accurate predictions for observations that we haven't seen yet.
In other words, in this book, we're typically going to think about a good model as a model that does a good job of capturing the patterns that we see in the data. For now, a good model captures the majority of the patterns that are generated by the underlying mechanism of interest, and captures few patterns that are not generated by that mechanism. When you go on from this book and learn other ways of thinking about models, this will stand you in good stead: if you can't capture patterns in the data that you can see, it's unlikely you'll be able to make good predictions about data that you haven't seen.
It's not possible to both explore and confirm on the same data set.
Doing correct inference is hard!
Generally, however, judging a model by the same data it was fit to will tend to make us over-optimistic about its quality. In Chapter XXX you'll start to learn more about how we can judge the quality of a model on data that it wasn't fit to. You also have to beware of overfitting the data - in the next section we'll discuss some formal methods. But a healthy dose of scepticism can be as powerful as precise quantitative methods: do you believe that a pattern you see in your sample is going to generalise to a wider population?
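A small simulation can make the over-optimism concrete. The sketch below is an assumption-laden illustration, not one of the formal methods mentioned above: the data are simulated from a straight line, and `make_data()` and `rmse()` are hypothetical helpers. It fits increasingly flexible polynomials and compares the error on the data used for fitting with the error on a fresh sample from the same mechanism.

```r
set.seed(1014)

# Hypothetical data: the true mechanism is a straight line plus noise.
make_data <- function(n) {
  x <- runif(n, 0, 1)
  data.frame(x = x, y = 1 + 2 * x + rnorm(n, sd = 0.5))
}
train <- make_data(30)
test  <- make_data(30)  # a fresh sample from the same mechanism

# Illustrative helper: root-mean-squared error of predictions on a data frame.
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

# Fit polynomials of increasing degree to the training data only.
degrees <- 1:8
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d), data = train))

errors <- data.frame(
  degree = degrees,
  train  = sapply(fits, rmse, data = train),
  test   = sapply(fits, rmse, data = test)
)
errors
```

Because each polynomial contains the lower-degree ones as special cases, the training error can only stay the same or shrink as the degree grows, so it always flatters the more flexible model. The test column carries no such guarantee, which is why it is the more honest guide.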