model-vis -> model-building

This commit is contained in:
hadley 2016-06-08 09:34:14 -05:00
parent edcc140da8
commit f84f26a7ec
2 changed files with 12 additions and 12 deletions

View File

@ -4,6 +4,8 @@ A model is a function that summarizes how the values of one variable vary in rel
In this book, we'll focus on deep tools for rather simple models. We want to give you tools to help you build your intuition for what models do. There's little mathematical formalism, and only a relatively small overlap with how modelling is normally presented in statistics. Modelling is a huge topic and we can only scratch the surface here. You'll definitely need to consult other resources as the complexity of your models grow.
We're going to give you a basic strategy, and point you to places to learn more. The key is to think about data generated from your model as regular data - you're going to want to manipulate it and visualise it in many different ways. Being good at modelling is a mixture of having some good general principles and having a big toolbox of techniques. Here we'll focus on general techniques to help you undertand what your model is telling you.
This chapter will explain how to build useful models with R.
## Outline
@ -166,6 +168,15 @@ summary(h)
For simple models, like this one, you can figure out what the model says about the data by carefully studying the coefficients. If you ever take a formal statistics course on modelling, you'll spend a lot of time doing that. Here, however, we're going to take a different tack. In this book, we're going to focus on understanding a model by looking at its predictions. This has a big advantage: every type of model makes predictions (otherwise what use would it be?) so we can use the same set of techniques to understand simple linear models or complex random forrests. We'll see that advantage later on when we explore some other families of models.
1. Make predictions across a uniform grid to understand the "shape" of
the model.
1. Subtract predictions from the actual values to look at the residuals
and to learn what the model misses.
1. Create succinct numerical summaries when you need to compare
many models.
### Predictions
To visualise the predictions from a model, we start by generating an evenly spaced grid of values that covers the region where our data lies. The easiest way to do that is to use `tidyr::expand()`. It's first argument is a data frame, and for each subsequent argument it finds the unique variables and then generates all combinations:

View File

@ -8,25 +8,14 @@ library(nycflights13)
library(modelr)
```
# Model visualisation
In this chapter we will explore model visualisation from two different sides:
1. Use a model to make it easier to see important patterns in our data.
1. Use visualisation to understand what a model is telling us about our data.
We're going to give you a basic strategy, and point you to places to learn more. The key is to think about data generated from your model as regular data - you're going to want to manipulate it and visualise it in many different ways. Being good at modelling is a mixture of having some good general principles and having a big toolbox of techniques. Here we'll focus on general techniques to help you undertand what your model is telling you.
# Model building
<!-- residuals vs. predictions -->
Centered around looking at residuals and looking at predictions. You'll see those here applied to linear models (and some minor variations), but it's a flexible technique since every model can generate predictions and residuals.
Attack the problem from two directions: building up from a simple model, and subtracting off the full dataset.
Transition from implicit knowledge in your head and in data to explicit knowledge in the model. In other words, you want to make explicit your knowledge of the data and capture it explicitly in a model. This makes it easier to apply to new domains, and easier for others to use. But you must always remember that your knowledge is incomplete. Subtract patterns from the data, and add patterns to the model.
<!-- purpose of modelling -->
What is a good model? We'll think about that more in the next chapter. For now, a good model captures the majority of the patterns that are generated by the underlying mechanism of interest, and captures few patterns that are not generated by that mechanism. Another way to frame that is that you want your model to be good at inference, not just description. Inference is one of the most important parts of a model - you want to not just make statements about the data you have observed, but data that you have not observed (like things that will happen in the future).