r4ds/model.Rmd

---
layout: default
title: Model
---

# Model

A model is a function that summarizes how the values of one variable vary in response to the values of other variables. Models play a large role in hypothesis testing and prediction, but for the moment you should think of models just like you think of statistics. A statistic summarizes a *distribution* in a way that is easy to understand; and a model summarizes *covariation* in a way that is easy to understand. In other words, a model is just another way to describe data.

This chapter will explain how to build useful models with R.
 
### Outline

*Section 1* will explain what models are and what they can do for you.

*Section 2* will introduce R's model syntax, a general syntax that you can reuse with any of R's modeling functions. In this section, you will use the syntax to build linear models, the most commonly used type of model.

*Section 3* will teach you to build and interpret multivariate linear models, models that use more than one explanatory variable to explain the values of a response variable.

*Section 4* will explain how to use categorical variables in your models and how to interpret the results of models that use categorical variables.

*Section 5* will present a logical way to extend linear models to describe non-linear relationships.

### Prerequisites

To access the functions and data sets that we will use in the chapter, load the `ggplot2`, `dplyr`, `mgcv`, `splines`, and `broom` packages:

```{r}
# install.packages("")
library(ggplot2)
library(dplyr)
library(mgcv)
library(splines)
library(broom)
```
  
## What is a model?

Have you heard that a relationship exists between your height and your income? It sounds far-fetched---and maybe it is---but many people believe that taller people will be promoted faster and valued more for their work, an effect that directly inflates the income of the vertically gifted.

Do you think this is true? Could a relationship exist between a person's height and their income? Luckily, it is easy to measure someone's height as well as their income (and a swath of other related variables to boot), which means that we can collect data relevant to the question. In fact, the Bureau of Labor Statistics has been doing this in a controlled way for over 50 years with the [National Longitudinal Surveys (NLS)](https://www.nlsinfo.org/). The NLS tracks the income, education, and life circumstances of a large cohort of Americans across several decades. In case you are wondering, the point of the NLS is not to study the relationhip between height and income, that's just a lucky accident of the data.

You can load the latest cross-section of NLS data, collected in 2013 with the code below. 

```{r echo = FALSE}
heights <- readRDS("data/heights.RDS")
```

I've narrowed the data down to 10 variables:

* `id` - A number ot identify each subject
* `income` - The self-reported income of each subject
* `height` - The height of each subject in inches
* `weight` - The weight of each subject in inches
* `sex` - The sex of each subject
* `race` - The race of each subject
* `education` - The number of years of education completed by each subject
* `asvab` - Each subject's score on the Armed Services Vocational Aptitude Battery (ASVAB), an intelligence assessment, out of 100.
* `sat_math` - Each subject's score on the math portion of the Scholastic Aptitude Test (SAT), out of 800.
* `bdate` - Month of birth with 1 = January.

summary that describes a r, like a mean, median, or variance.
    + Example problem/data set
  

```{r}
head(heights)
```
    
2. As normally taught, modeling conflates three subjects
    + Models as summaries
    + Hypothesis testing
    + Predictive modeling
3. C. This chapter shows how to build a model and use it as a summary. The methods for building a model apply to all three subjects.
  
## How to build a model

1. Best fit
    + Best fit of what? A certain class of function.
    + But how do you know which class to use? In some cases, the data can provide suggestions. In other cases existing theory can provide suggestions. But ultimately, you'll never know for sure. But that's okay, good enough is good enough.
2. What does best fit mean?
    + It may or may not accurately describe the true relationship. Heck, there might not even be a true relationship. But it is the best guess given the data.
    + Example problem/data set
    + It does not mean causation exists. Causation is just one type of relations, which is difficult enough to define, let alone prove.
3. How do you find the best fit?
    + With an algorithm. There is an algorithm to fit each specific class of function. We will cover some of the most useful here.
4. How do you know how good the fit is? 
    + Adjusted $R^{2}$
5. Are we making assumptions when we fit a model?
    + No. Not unless you assume that you've selected the correct type of function (and I see no reason why you should assume that).
    + Assumptions come when you start hypothesis testing.

## Linear models

1. Linear models fit linear functions
2. How to fit in R
    + model syntax, which is reusable with all model functions

```{r}
income ~ height 
mod <- lm(income ~ height, data = heights)
mod
tidy(mod)
glance(mod)
augment(mod)
```

    + visualize
```{r}
ggplot(data = heights, mapping = aes(x = height, y = income)) +
  geom_point() +
  geom_smooth(method = lm)
```
    
    + intercept or no intercept
```{r}
0 + earn ~ height 
tidy(lm(income ~ 0 + height, data = heights))
tidy(lm(income ~ height, data = heights))
```

3. How to interpret
    + extract information. Resid. Predict.
```{r}
augment(mod)$.resid
augment(mod)$.fitted
```
    + Interpret coefficient
```{r}
tidy(mod)$estimate
```

```{r}
heights %>% 
  group_by(sex)  %>% 
  do(glance(lm(income ~ height, data = .)))
```

## Categorical data

```{r}
smod <- lm(income ~ sex, data = heights)
smod
```

1. Factors

```{r}
heights$sex <- factor(heights$sex, levels = c("female", "male"))

smod2 <- lm(income ~ sex, data = heights)

tidy(smod)
tidy(smod2)
```

2. How to interpret


## Multiple Variables

1. How to fit multivariate models in R

```{r}
mmod <- lm(income ~ height + sex, data = heights)
summary(mmod)
```

2. How to interpret

```{r}
tidy(mmod)
```

3. Interaction effects

```{r}
tidy(lm(income ~ height + sex, data = heights))
tidy(lm(income ~ height + sex + height:sex, data = heights))
tidy(lm(income ~ height * sex, data = heights))
```

```{r}
tidy(lm(income ~ height + education, data = heights))
tidy(lm(income ~ height * education, data = heights))
```

4. Partition variance
    + Checking residuals
```{r}
m1 <- lm(income ~ height, data = heights)
# plot histogram of residuals
# plot residulas vs. sex
m2 <- lm(income ~ height + sex, data = heights)
# plot histogram of residuals
# plot residuals vs. education
m3 <- lm(income ~ height + sex + education, data = heights)
# plot histogram of residuals
m4 <- lm(income ~ height + sex + race + education, 
  data = heights)
# plot histogram of residuals
m5 <- lm(income ~ ., data = heights)
```


## Non-linear functions (recipes?)

0. Transformations
    + Log
```{r}
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()
ggplot(diamonds, aes(x = log(carat), y = log(price))) +
  geom_point()
```

```{r}
lm(log(price) ~ log(carat), data = diamonds)
# visualize model line
```

    + Logit with `glm()`
    
    
What if no handy transformation exists?

```{r}
ggplot(data = heights, mapping = aes(x = education, y = income)) + 
  geom_boxplot(aes(group = education)) +
  geom_smooth() + 
  coord_cartesian(ylim = c(0, 125000))
```

1. Polynomials
    + How to fit
    + why it doesn't work with missing values
    
```{r}
heights2 <- na.omit(heights)
tidy(lm(income ~ poly(education, 3), data = heights2))

ggplot(data = heights2, mapping = aes(x= education, y = income)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ poly(x, 3))
```

    + How to interpret
    + Strengths and Weaknesses
2. Splines
    + How to fit. Knots. Different types of splines.
```{r eval = FALSE}
bs(degree = 1) # linear splines
bs()           # cubic splines
ns()           # natural splines
```

```{r}
tidy(lm(income ~ ns(education, knots = c(10, 17)), data = heights))
tidy(lm(income ~ ns(education, df = 4), data = heights))
```    

```{r}
ggplot(data = heights, mapping = aes(x= education, y = income)) +
  geom_point() +
  geom_smooth(method = lm, formula = y ~ ns(x, df = 4)) +  
  coord_cartesian(ylim = c(0, 125000))
```

    + How to interpret
    + Strengths and weaknesses
    
    
3. General Additive Models
    + How to fit
    
```{r}
gam(income ~ s(education), data = heights)

ggplot(data = heights, mapping = aes(x= education, y = income)) +
  geom_point() +
  geom_smooth(method = gam, formula = y ~ s(x))
```
    + How to interpret
    + Strengths and weaknesses
    
```{r eval = FALSE}
# Linear z
gam(y ~ s(x) + z, data = df)

# Smooth x and smooth z
gam(y ~ s(x) + s(z), data = df)

# Smooth surface of x and z 
# (a smooth function that takes both x and z)
gam(y ~ s(x, z), data = df)
```
Fix yaml metadata 2015-12-17 07:22:03 +08:00			`---`
Restore yaml metadata necessary for travis 2015-12-17 06:41:08 +08:00			`layout: default`
Fix yaml metadata 2015-12-17 07:22:03 +08:00			`title: Model`
			`---`
Restore yaml metadata necessary for travis 2015-12-17 06:41:08 +08:00
Sketch out complete book outline 2015-12-06 22:02:29 +08:00			`# Model`

Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`A model is a function that summarizes how the values of one variable vary in response to the values of other variables. Models play a large role in hypothesis testing and prediction, but for the moment you should think of models just like you think of statistics. A statistic summarizes a distribution in a way that is easy to understand; and a model summarizes covariation in a way that is easy to understand. In other words, a model is just another way to describe data.`
Sketch out complete book outline 2015-12-06 22:02:29 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`This chapter will explain how to build useful models with R.`

Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`### Outline`
Sketch out complete book outline 2015-12-06 22:02:29 +08:00
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`Section 1 will explain what models are and what they can do for you.`
Sketch out complete book outline 2015-12-06 22:02:29 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`Section 2 will introduce R's model syntax, a general syntax that you can reuse with any of R's modeling functions. In this section, you will use the syntax to build linear models, the most commonly used type of model.`
Sketch out complete book outline 2015-12-06 22:02:29 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`Section 3 will teach you to build and interpret multivariate linear models, models that use more than one explanatory variable to explain the values of a response variable.`
Restructuring 2015-12-08 04:57:08 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`Section 4 will explain how to use categorical variables in your models and how to interpret the results of models that use categorical variables.`
Restructuring 2015-12-08 04:57:08 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`Section 5 will present a logical way to extend linear models to describe non-linear relationships.`
Restructuring 2015-12-08 04:57:08 +08:00
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`### Prerequisites`
Restructuring 2015-12-08 04:57:08 +08:00
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			To access the functions and data sets that we will use in the chapter, load the `ggplot2`, `dplyr`, `mgcv`, `splines`, and `broom` packages:
Restructuring 2015-12-08 04:57:08 +08:00
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```{r}
			`# install.packages("")`
			`library(ggplot2)`
			`library(dplyr)`
			`library(mgcv)`
			`library(splines)`
			`library(broom)`
			```

			`## What is a model?`

Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`Have you heard that a relationship exists between your height and your income? It sounds far-fetched---and maybe it is---but many people believe that taller people will be promoted faster and valued more for their work, an effect that directly inflates the income of the vertically gifted.`

			Do you think this is true? Could a relationship exist between a person's height and their income? Luckily, it is easy to measure someone's height as well as their income (and a swath of other related variables to boot), which means that we can collect data relevant to the question. In fact, the Bureau of Labor Statistics has been doing this in a controlled way for over 50 years with the [National Longitudinal Surveys (NLS)](https://www.nlsinfo.org/). The NLS tracks the income, education, and life circumstances of a large cohort of Americans across several decades. In case you are wondering, the point of the NLS is not to study the relationhip between height and income, that's just a lucky accident of the data.

			`You can load the latest cross-section of NLS data, collected in 2013 with the code below.`

Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```{r echo = FALSE}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`heights <- readRDS("data/heights.RDS")`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`I've narrowed the data down to 10 variables:`

			* `id` - A number ot identify each subject
			* `income` - The self-reported income of each subject
			* `height` - The height of each subject in inches
			* `weight` - The weight of each subject in inches
			* `sex` - The sex of each subject
			* `race` - The race of each subject
			* `education` - The number of years of education completed by each subject
			* `asvab` - Each subject's score on the Armed Services Vocational Aptitude Battery (ASVAB), an intelligence assessment, out of 100.
			* `sat_math` - Each subject's score on the math portion of the Scholastic Aptitude Test (SAT), out of 800.
			* `bdate` - Month of birth with 1 = January.

			`summary that describes a r, like a mean, median, or variance.`
			`+ Example problem/data set`


Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```{r}
			`head(heights)`
			```

Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`2. As normally taught, modeling conflates three subjects`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`+ Models as summaries`
			`+ Hypothesis testing`
			`+ Predictive modeling`
			`3. C. This chapter shows how to build a model and use it as a summary. The methods for building a model apply to all three subjects.`

			`## How to build a model`

			`1. Best fit`
			`+ Best fit of what? A certain class of function.`
			`+ But how do you know which class to use? In some cases, the data can provide suggestions. In other cases existing theory can provide suggestions. But ultimately, you'll never know for sure. But that's okay, good enough is good enough.`
			`2. What does best fit mean?`
			`+ It may or may not accurately describe the true relationship. Heck, there might not even be a true relationship. But it is the best guess given the data.`
			`+ Example problem/data set`
			`+ It does not mean causation exists. Causation is just one type of relations, which is difficult enough to define, let alone prove.`
			`3. How do you find the best fit?`
			`+ With an algorithm. There is an algorithm to fit each specific class of function. We will cover some of the most useful here.`
			`4. How do you know how good the fit is?`
			`+ Adjusted $R^{2}$`
			`5. Are we making assumptions when we fit a model?`
			`+ No. Not unless you assume that you've selected the correct type of function (and I see no reason why you should assume that).`
			`+ Assumptions come when you start hypothesis testing.`

			`## Linear models`

			`1. Linear models fit linear functions`
			`2. How to fit in R`
			`+ model syntax, which is reusable with all model functions`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`income ~ height`
			`mod <- lm(income ~ height, data = heights)`
			`mod`
			`tidy(mod)`
			`glance(mod)`
			`augment(mod)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`+ visualize`
			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`ggplot(data = heights, mapping = aes(x = height, y = income)) +`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`geom_point() +`
			`geom_smooth(method = lm)`
			```

			`+ intercept or no intercept`
			```{r}
			`0 + earn ~ height`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`tidy(lm(income ~ 0 + height, data = heights))`
			`tidy(lm(income ~ height, data = heights))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`3. How to interpret`
			`+ extract information. Resid. Predict.`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			```{r}
			`augment(mod)$.resid`
			`augment(mod)$.fitted`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```
			`+ Interpret coefficient`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			```{r}
			`tidy(mod)$estimate`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			```{r}
			`heights %>%`
			`group_by(sex) %>%`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`do(glance(lm(income ~ height, data = .)))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`## Categorical data`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`smod <- lm(income ~ sex, data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`smod`
			```

			`1. Factors`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`heights$sex <- factor(heights$sex, levels = c("female", "male"))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`smod2 <- lm(income ~ sex, data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`tidy(smod)`
			`tidy(smod2)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`2. How to interpret`


			`## Multiple Variables`

			`1. How to fit multivariate models in R`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`mmod <- lm(income ~ height + sex, data = heights)`
			`summary(mmod)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`2. How to interpret`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`tidy(mmod)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`3. Interaction effects`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`tidy(lm(income ~ height + sex, data = heights))`
			`tidy(lm(income ~ height + sex + height:sex, data = heights))`
			`tidy(lm(income ~ height * sex, data = heights))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`tidy(lm(income ~ height + education, data = heights))`
			`tidy(lm(income ~ height * education, data = heights))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`4. Partition variance`
			`+ Checking residuals`
			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`m1 <- lm(income ~ height, data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`# plot histogram of residuals`
			`# plot residulas vs. sex`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`m2 <- lm(income ~ height + sex, data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`# plot histogram of residuals`
			`# plot residuals vs. education`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`m3 <- lm(income ~ height + sex + education, data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`# plot histogram of residuals`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`m4 <- lm(income ~ height + sex + race + education,`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`data = heights)`
			`# plot histogram of residuals`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`m5 <- lm(income ~ ., data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```


			`## Non-linear functions (recipes?)`

			`0. Transformations`
			`+ Log`
			```{r}
			`ggplot(diamonds, aes(x = carat, y = price)) +`
			`geom_point()`
			`ggplot(diamonds, aes(x = log(carat), y = log(price))) +`
			`geom_point()`
			```

			```{r}
			`lm(log(price) ~ log(carat), data = diamonds)`
			`# visualize model line`
			```

			+ Logit with `glm()`


			`What if no handy transformation exists?`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`ggplot(data = heights, mapping = aes(x = education, y = income)) +`
			`geom_boxplot(aes(group = education)) +`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`geom_smooth() +`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`coord_cartesian(ylim = c(0, 125000))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`1. Polynomials`
			`+ How to fit`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`+ why it doesn't work with missing values`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00
			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`heights2 <- na.omit(heights)`
			`tidy(lm(income ~ poly(education, 3), data = heights2))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`ggplot(data = heights2, mapping = aes(x= education, y = income)) +`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`geom_point() +`
			`geom_smooth(method = lm, formula = y ~ poly(x, 3))`
			```

			`+ How to interpret`
			`+ Strengths and Weaknesses`
			`2. Splines`
			`+ How to fit. Knots. Different types of splines.`
			```{r eval = FALSE}
			`bs(degree = 1) # linear splines`
			`bs() # cubic splines`
			`ns() # natural splines`
			```

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`tidy(lm(income ~ ns(education, knots = c(10, 17)), data = heights))`
			`tidy(lm(income ~ ns(education, df = 4), data = heights))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`ggplot(data = heights, mapping = aes(x= education, y = income)) +`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`geom_point() +`
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`geom_smooth(method = lm, formula = y ~ ns(x, df = 4)) +`
			`coord_cartesian(ylim = c(0, 125000))`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			```

			`+ How to interpret`
			`+ Strengths and weaknesses`


			`3. General Additive Models`
			`+ How to fit`

			```{r}
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`gam(income ~ s(education), data = heights)`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00
Updates data set for model chapter to BLS NLS cross-section. 2016-01-08 02:25:25 +08:00			`ggplot(data = heights, mapping = aes(x= education, y = income)) +`
Expends outline and code for models chapter. 2015-12-19 06:16:03 +08:00			`geom_point() +`
			`geom_smooth(method = gam, formula = y ~ s(x))`
			```
			`+ How to interpret`
			`+ Strengths and weaknesses`

			```{r eval = FALSE}
			`# Linear z`
			`gam(y ~ s(x) + z, data = df)`

			`# Smooth x and smooth z`
			`gam(y ~ s(x) + s(z), data = df)`

			`# Smooth surface of x and z`
			`# (a smooth function that takes both x and z)`
			`gam(y ~ s(x, z), data = df)`
			```
Restructuring 2015-12-08 04:57:08 +08:00