r4ds/model-assess.Rmd

# Model assessment

```{r setup-model, include=FALSE}
library(purrr)
set.seed(1014)
options(digits = 3)
```

In this chapter, you'll turn the tools of multiple models towards model assessment: learning how the model performs when giving new data. So far we've focussed on models as tools for description, using models to help us understand the patterns in the data we have collected so far. But ideally a model will do more than just describe what we have seen so far - it will also help predict what will come next.  

In other words, we want a model that doesn't just perform well on the sample, but also accurately summarises the underlying population.

In some industries this is primarily the use of models: you spend relatively little time fitting the model compared to how many times you use it.

There are two basic ways that a model can fail with new data:

* You can under- or over-fit the model.  Underfitting is where you fail
  to model and important trend: you leave too much in the residuals, and not 
  enough in the model. Overfitting is the opposite: you fit a trend to
  what is actually random noise: you've too put much model and not left
  enough in the residuals. Generally overfitting tends to be more of a 
  problem than underfitting.

* The process that generates the data might change. There's nothing the 
  model can do about this. You can protect yourself against this to some
  extent by creating models that you understand and applying your knowledge
  to the problem. Are these fundamentals likely to change? If you have 
  a model that you are going to use again and again for a long time, you
  need to plan to maintain the model, regularly checking that it still 
  makes sense. i.e. is the population the same?
  
    <http://research.google.com/pubs/pub43146.html>
    <http://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/>
 
The most common problem with a model that causes it to do poorly with new data is overfitting. 


Obviously, there's a bit of a problem here: we don't have new data with which to check the model, and even if we did, we'd presumably use it to make the model better in the first place. One powerful technique of approaches can help us get around this problem: resampling.

There are two main resampling techniques that we're going to cover.

* We will use __cross-validation__ to assess model quality. In 
  cross-validation, you split the data into test and training sets. You fit 
  the data to the training set, and evaluate it on the test set. This avoids
  intrinsic bias of using the same data to both fit the model and assess it's
  quality. However it introduces a new bias: you're not using all the data to
  fit the model so it's going to be quite as good as it could be.
  
* We will use __boostrapping__ to understand how stable (or how variable)
  the model is. If you sample data from the same population multiple times, 
  how much does your model vary? Instead of going back to collect new data, 
  you can use the best estimate of the population data: the data you've
  collected so far. The amazing idea of the bootstrap is that you can resample
  from the data you already have.

There are lots of high-level helpers to do these resampling methods in R. We're going to use the tools provided by the modelr package because they are explicit - you'll see exactly what's going on at each step. 

<http://topepo.github.io/caret>. [Applied Predictive Modeling](https://amzn.com/1461468485), by Max Kuhn and Kjell Johnson.

If you're competing in competitions, like Kaggle, that are predominantly about creating good predicitons, developing a good strategy for avoiding overfitting is very important. Otherwise you risk tricking yourself into thinking that you have a good model, when in reality you just have a model that does a good job of fitting your data.

There is a closely related family that uses a similar idea: model ensembles. However, instead of trying to find the best models, ensembles make use of all the models, acknowledging that even models that don't fit all the data particularly well can still model some subsets well. In general, you can think of model ensemble techniques as functions that take a list of models, and a return a single model that attempts to take the best part of each.

### Confirmatory analysis

Split between exploratory and confirmatory analysis. The focus of this book is on using data to generate hypotheses and explore them.

Either is fine, but confirmatory is much much harder. If you want your confirmatory analysis to be correct, you need to take a stricter approach:

1.  60% of your data goes into a __training__ set. You're allowed to do
    anything you like with this data: visualise it, fit tons of models to it,
    cross-validate it.
  
1.  20% of goes into a __query__ set. You can use this data
    to compare models by hand, but you're not allowed to use it automatically.

1.  20% goes into  amount back for a __test__ set. You can only use this
    data ONCE, to test your final model. If you use this data more than
    once you're no longer doing confirmatory analysis, you're doing exploratory
    analysis.

### Prerequisites

```{r setup, message = FALSE}
# Standard data manipulation and visulisation
library(dplyr)
library(ggplot2)

# Tools for working with models
library(broom)
library(modelr)
library(splines)

# Tools for working with lots of models
library(purrr)
library(tidyr)
```

## Overfitting

Both bootstrapping and cross-validation help us to spot and remedy the problem of __over fitting__, where the model fits the data we've seen so far extremely well, but does a bad job of generalising to new data.

A classic example of over-fitting is to using a polynomial with too many degrees of freedom.

Bias - variance tradeoff.  Simpler = more biased. Complex = more variable.  Occam's razor.

```{r}
true_model <- function(x) {
  1 + 2 * x + rnorm(length(x), sd = 0.25)
}

df <- data_frame(
  x = seq(0, 1, length = 20),
  y = true_model(x)
)

df %>% 
  ggplot(aes(x, y)) +
  geom_point()
```

We can create a model that fits this data very well:

```{r, message = FALSE}
library(splines)
my_model <- function(df) {
  lm(y ~ poly(x, 7), data = df)
}

mod <- my_model(df)
rmse(mod, df)

grid <- df %>% expand(x = seq_range(x, 50))
preds <- grid %>% add_predictions(mod, var = "y")

df %>% 
  ggplot(aes(x, y)) +
  geom_line(data = preds) + 
  geom_point()
```

As we fit progressively more and more complicated models, the model error decreases:

```{r}
fs <- list(
  y ~ x,
  y ~ poly(x, 2),
  y ~ poly(x, 3),
  y ~ poly(x, 4),
  y ~ poly(x, 5),
  y ~ poly(x, 6),
  y ~ poly(x, 7)
)

models <- data_frame(
  n = 1:7, 
  f = fs,
  mod = map(f, lm, data = df),
  rmse = map2_dbl(mod, list(df), rmse)
)

models %>% 
  ggplot(aes(n, rmse)) + 
  geom_line(colour = "grey70") + 
  geom_point(size = 3)
```

But do you think this model will do well if we apply it to new data from the same population? 

In real-life you can't easily go out and recollect your data. There are two approach to help you get around this problem. I'll introduce them briefly here, and then we'll go into more depth in the following sections.

```{r}
boot <- bootstrap(df, 100) %>% mutate(
  mod = map(strap, my_model),
  pred = map2(list(grid), mod, add_predictions)
)

boot %>% 
  unnest(pred) %>% 
  ggplot(aes(x, pred, group = .id)) +
  geom_line(alpha = 1/3)
```

It's a little easier to see what's going on if we zoom on the y axis:

```{r}
last_plot() + 
  coord_cartesian(ylim = c(0, 5))
```

(You might notice that while each individual model varies a lot, the average of all the models seems like it might not be that bad. That gives rise to a model ensemble technique called model averaging.)

Bootstrapping is a useful tool to help us understand how the model might vary if we'd collected a different sample from the population. A related technique is cross-validation which allows us to explore the quality of the model. It works by repeatedly splitting the data into two pieces. One piece, the training set, is used to fit, and the other piece, the test set, is used to measure the model quality.

The following code generates 100 test-training splits, holding out 20% of the data for testing each time. We then fit a model to the training set, and evalute the error on the test set:

```{r}
cv <- crossv_mc(df, 100) %>% 
  mutate(
    mod = map(train, my_model),
    rmse = map2_dbl(mod, test, rmse)
  )
cv
```

Obviously, a plot is going to help us see distribution more easily. I've added our original estimate of the model error as a white vertical line (where the same dataset is used for both training and teseting), and you can see it's very optimistic.

```{r}
cv %>% 
  ggplot(aes(rmse)) +
  geom_ref_line(v = rmse(mod, df)) +
  geom_freqpoly(binwidth = 0.2) +
  geom_rug()
```

The distribution of errors is highly skewed: there are a few cases which have very high errors. These respresent samples where we ended up with a few cases on all with low values or high values of x.  Let's take a look:

```{r}
filter(cv, rmse > 1.5) %>% 
  unnest(map(train, as.data.frame)) %>% 
  ggplot(aes(x, .id)) + 
    geom_point() + 
    xlim(0, 1)
```

All of the models that fit particularly poorly were fit to samples that either missed the first one or two or the last one or two observation. Because polynomials shoot off to positive and negative, they give very bad predictions for those values.

Now that we've given you a quick overview and intuition for these techniques, lets dive in more more detail.

## Resamples

### Building blocks

Both the boostrap and cross-validation are build on top of a "resample" object. In modelr, you can access these low-level tools directly with the `resample_*` functions. 

These functions return an object of class "resample", which represents the resample in a memory efficient way. Instead of storing the resampled data set itself, it instead stores the integer indices, and a "pointer" to the original dataset. This makes resamples take up much less memory.

```{r}
x <- resample_bootstrap(as_data_frame(mtcars))
class(x)

x
```

Most modelling functions call `as.data.frame()` on the `data` argument. This generates a resampled data frame. Because it's called automatically you can just pass the object.

```{r}
lm(mpg ~ wt, data = x)
```

If you get a strange error, it's probably because the modelling function doesn't do this, and you need to do it yourself. You'll also need to do it yourself if you want to `unnest()` the data so you can visualise it.  If you want to just get the rows selected, you can use `as.integer()`.

### Dataframe API

`bootstrap()` and `crossv_mc()` are built on top of these simpler primitives. They are designed to work naturally in a model exploration environment by returning data frames. Each row of the data frame represents a single sample. They return slightly different columns:

*   `boostrap()` returns a data frame with two columns:

    ```{r}
    bootstrap(df, 3)
    ```
    
    `strap` gives the bootstrap sample dataset, and `.id` assigns a 
    unique identifer to each model (this is often useful for plotting)
    
*   `crossv_mc()` return a data frame with three columns:

    ```{r}
    crossv_mc(df, 3)
    ```
    `train` contains the data that you should use to fit (train) the model,
    and `test` contains the data you should use to validate the model. Together,
    the test and train columns form an exclusive partition of the full dataset.

## Bootstrapping


## Cross-validation
Make sure first element is heading 2015-12-12 02:34:20 +08:00			`# Model assessment`

Local bookdown working 2015-12-12 03:28:10 +08:00			```{r setup-model, include=FALSE}
Add needed packages to model-assess 2015-12-07 23:06:19 +08:00			`library(purrr)`
			`set.seed(1014)`
			`options(digits = 3)`
			```

More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`In this chapter, you'll turn the tools of multiple models towards model assessment: learning how the model performs when giving new data. So far we've focussed on models as tools for description, using models to help us understand the patterns in the data we have collected so far. But ideally a model will do more than just describe what we have seen so far - it will also help predict what will come next.`
Add needed packages to model-assess 2015-12-07 23:06:19 +08:00
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`In other words, we want a model that doesn't just perform well on the sample, but also accurately summarises the underlying population.`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`In some industries this is primarily the use of models: you spend relatively little time fitting the model compared to how many times you use it.`

			`There are two basic ways that a model can fail with new data:`

			`* You can under- or over-fit the model. Underfitting is where you fail`
			`to model and important trend: you leave too much in the residuals, and not`
			`enough in the model. Overfitting is the opposite: you fit a trend to`
			`what is actually random noise: you've too put much model and not left`
			`enough in the residuals. Generally overfitting tends to be more of a`
			`problem than underfitting.`

			`* The process that generates the data might change. There's nothing the`
			`model can do about this. You can protect yourself against this to some`
			`extent by creating models that you understand and applying your knowledge`
			`to the problem. Are these fundamentals likely to change? If you have`
			`a model that you are going to use again and again for a long time, you`
			`need to plan to maintain the model, regularly checking that it still`
			`makes sense. i.e. is the population the same?`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`<http://research.google.com/pubs/pub43146.html>`
			`<http://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/>`

			`The most common problem with a model that causes it to do poorly with new data is overfitting.`


			`Obviously, there's a bit of a problem here: we don't have new data with which to check the model, and even if we did, we'd presumably use it to make the model better in the first place. One powerful technique of approaches can help us get around this problem: resampling.`

			`There are two main resampling techniques that we're going to cover.`

			`* We will use __cross-validation__ to assess model quality. In`
			`cross-validation, you split the data into test and training sets. You fit`
			`the data to the training set, and evaluate it on the test set. This avoids`
			`intrinsic bias of using the same data to both fit the model and assess it's`
			`quality. However it introduces a new bias: you're not using all the data to`
			`fit the model so it's going to be quite as good as it could be.`

			`* We will use __boostrapping__ to understand how stable (or how variable)`
			`the model is. If you sample data from the same population multiple times,`
			`how much does your model vary? Instead of going back to collect new data,`
			`you can use the best estimate of the population data: the data you've`
			`collected so far. The amazing idea of the bootstrap is that you can resample`
			`from the data you already have.`

			`There are lots of high-level helpers to do these resampling methods in R. We're going to use the tools provided by the modelr package because they are explicit - you'll see exactly what's going on at each step.`

			`<http://topepo.github.io/caret>. [Applied Predictive Modeling](https://amzn.com/1461468485), by Max Kuhn and Kjell Johnson.`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`If you're competing in competitions, like Kaggle, that are predominantly about creating good predicitons, developing a good strategy for avoiding overfitting is very important. Otherwise you risk tricking yourself into thinking that you have a good model, when in reality you just have a model that does a good job of fitting your data.`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`There is a closely related family that uses a similar idea: model ensembles. However, instead of trying to find the best models, ensembles make use of all the models, acknowledging that even models that don't fit all the data particularly well can still model some subsets well. In general, you can think of model ensemble techniques as functions that take a list of models, and a return a single model that attempts to take the best part of each.`

More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`### Confirmatory analysis`

			`Split between exploratory and confirmatory analysis. The focus of this book is on using data to generate hypotheses and explore them.`

			`Either is fine, but confirmatory is much much harder. If you want your confirmatory analysis to be correct, you need to take a stricter approach:`

			`1. 60% of your data goes into a __training__ set. You're allowed to do`
			`anything you like with this data: visualise it, fit tons of models to it,`
			`cross-validate it.`

			`1. 20% of goes into a __query__ set. You can use this data`
			`to compare models by hand, but you're not allowed to use it automatically.`

			`1. 20% goes into amount back for a __test__ set. You can only use this`
			`data ONCE, to test your final model. If you use this data more than`
			`once you're no longer doing confirmatory analysis, you're doing exploratory`
			`analysis.`

Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`### Prerequisites`

			```{r setup, message = FALSE}
			`# Standard data manipulation and visulisation`
			`library(dplyr)`
			`library(ggplot2)`

			`# Tools for working with models`
			`library(broom)`
			`library(modelr)`
Update for modelr API changes 2016-06-15 21:27:35 +08:00			`library(splines)`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00
			`# Tools for working with lots of models`
			`library(purrr)`
			`library(tidyr)`
			```

			`## Overfitting`

			`Both bootstrapping and cross-validation help us to spot and remedy the problem of __over fitting__, where the model fits the data we've seen so far extremely well, but does a bad job of generalising to new data.`

More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`A classic example of over-fitting is to using a polynomial with too many degrees of freedom.`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00
			`Bias - variance tradeoff. Simpler = more biased. Complex = more variable. Occam's razor.`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
			```{r}
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`true_model <- function(x) {`
			`1 + 2 * x + rnorm(length(x), sd = 0.25)`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00			`}`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00
			`df <- data_frame(`
			`x = seq(0, 1, length = 20),`
			`y = true_model(x)`
			`)`

			`df %>%`
			`ggplot(aes(x, y)) +`
			`geom_point()`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00			```

More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`We can create a model that fits this data very well:`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			```{r, message = FALSE}
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`library(splines)`
			`my_model <- function(df) {`
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`lm(y ~ poly(x, 7), data = df)`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`}`

			`mod <- my_model(df)`
			`rmse(mod, df)`

			`grid <- df %>% expand(x = seq_range(x, 50))`
Update to latest modelr 2016-06-16 03:10:48 +08:00			`preds <- grid %>% add_predictions(mod, var = "y")`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`df %>%`
			`ggplot(aes(x, y)) +`
			`geom_line(data = preds) +`
			`geom_point()`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00			```

More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`As we fit progressively more and more complicated models, the model error decreases:`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
			```{r}
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`fs <- list(`
			`y ~ x,`
			`y ~ poly(x, 2),`
			`y ~ poly(x, 3),`
			`y ~ poly(x, 4),`
			`y ~ poly(x, 5),`
			`y ~ poly(x, 6),`
			`y ~ poly(x, 7)`
			`)`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`models <- data_frame(`
			`n = 1:7,`
			`f = fs,`
			`mod = map(f, lm, data = df),`
			`rmse = map2_dbl(mod, list(df), rmse)`
			`)`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`models %>%`
			`ggplot(aes(n, rmse)) +`
			`geom_line(colour = "grey70") +`
			`geom_point(size = 3)`
			```
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`But do you think this model will do well if we apply it to new data from the same population?`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`In real-life you can't easily go out and recollect your data. There are two approach to help you get around this problem. I'll introduce them briefly here, and then we'll go into more depth in the following sections.`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
			```{r}
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`boot <- bootstrap(df, 100) %>% mutate(`
			`mod = map(strap, my_model),`
			`pred = map2(list(grid), mod, add_predictions)`
			`)`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`boot %>%`
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`unnest(pred) %>%`
			`ggplot(aes(x, pred, group = .id)) +`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`geom_line(alpha = 1/3)`
			```
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`It's a little easier to see what's going on if we zoom on the y axis:`

			```{r}
			`last_plot() +`
			`coord_cartesian(ylim = c(0, 5))`
			```

			`(You might notice that while each individual model varies a lot, the average of all the models seems like it might not be that bad. That gives rise to a model ensemble technique called model averaging.)`

			`Bootstrapping is a useful tool to help us understand how the model might vary if we'd collected a different sample from the population. A related technique is cross-validation which allows us to explore the quality of the model. It works by repeatedly splitting the data into two pieces. One piece, the training set, is used to fit, and the other piece, the test set, is used to measure the model quality.`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`The following code generates 100 test-training splits, holding out 20% of the data for testing each time. We then fit a model to the training set, and evalute the error on the test set:`
Extract out multiple models to modelling chapter 2015-12-06 17:11:52 +08:00
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			```{r}
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`cv <- crossv_mc(df, 100) %>%`
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`mutate(`
			`mod = map(train, my_model),`
			`rmse = map2_dbl(mod, test, rmse)`
			`)`
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`cv`
			```

			`Obviously, a plot is going to help us see distribution more easily. I've added our original estimate of the model error as a white vertical line (where the same dataset is used for both training and teseting), and you can see it's very optimistic.`
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			```{r}
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			`cv %>%`
			`ggplot(aes(rmse)) +`
			`geom_ref_line(v = rmse(mod, df)) +`
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`geom_freqpoly(binwidth = 0.2) +`
Brain dump on model assessment 2016-06-14 22:02:17 +08:00			`geom_rug()`
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			```

			`The distribution of errors is highly skewed: there are a few cases which have very high errors. These respresent samples where we ended up with a few cases on all with low values or high values of x. Let's take a look:`

			```{r}
			`filter(cv, rmse > 1.5) %>%`
			`unnest(map(train, as.data.frame)) %>%`
			`ggplot(aes(x, .id)) +`
			`geom_point() +`
			`xlim(0, 1)`
			```

			`All of the models that fit particularly poorly were fit to samples that either missed the first one or two or the last one or two observation. Because polynomials shoot off to positive and negative, they give very bad predictions for those values.`

			`Now that we've given you a quick overview and intuition for these techniques, lets dive in more more detail.`

			`## Resamples`

			`### Building blocks`

			Both the boostrap and cross-validation are build on top of a "resample" object. In modelr, you can access these low-level tools directly with the `resample_*` functions.
Brain dump on model assessment 2016-06-14 22:02:17 +08:00
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00			`These functions return an object of class "resample", which represents the resample in a memory efficient way. Instead of storing the resampled data set itself, it instead stores the integer indices, and a "pointer" to the original dataset. This makes resamples take up much less memory.`

			```{r}
			`x <- resample_bootstrap(as_data_frame(mtcars))`
			`class(x)`

			`x`
			```

			Most modelling functions call `as.data.frame()` on the `data` argument. This generates a resampled data frame. Because it's called automatically you can just pass the object.

			```{r}
			`lm(mpg ~ wt, data = x)`
More brainstorming about role of assessment Clean up/simplify examples a lil 2016-06-17 02:36:02 +08:00			```
More work on model assessment * some thoughts from GA * finally decent over fitting example * talk about modelr data structures a bit 2016-06-17 21:49:46 +08:00
			If you get a strange error, it's probably because the modelling function doesn't do this, and you need to do it yourself. You'll also need to do it yourself if you want to `unnest()` the data so you can visualise it. If you want to just get the rows selected, you can use `as.integer()`.

			`### Dataframe API`

			`bootstrap()` and `crossv_mc()` are built on top of these simpler primitives. They are designed to work naturally in a model exploration environment by returning data frames. Each row of the data frame represents a single sample. They return slightly different columns:

			* `boostrap()` returns a data frame with two columns:

			```{r}
			`bootstrap(df, 3)`
			```

			`strap` gives the bootstrap sample dataset, and `.id` assigns a
			`unique identifer to each model (this is often useful for plotting)`

			* `crossv_mc()` return a data frame with three columns:

			```{r}
			`crossv_mc(df, 3)`
			```
			`train` contains the data that you should use to fit (train) the model,
			and `test` contains the data you should use to validate the model. Together,
			`the test and train columns form an exclusive partition of the full dataset.`

			`## Bootstrapping`


			`## Cross-validation`