parent
f02e080778
commit
3a7882e602
|
@ -324,7 +324,7 @@ The following sections explore how this plays out in more detail.
|
||||||
|
|
||||||
### Categorical variables
|
### Categorical variables
|
||||||
|
|
||||||
Generating a function from a formula is straight forward when the predictor is continuous, but things get a bit more complicated when the predictor is categorical. Imagine you have a formula like `y ~ sex`, where sex could either be male or female. It doesn't make sense to convert that to a formula like `y = x_0 + x_1 * sex` because `sex` isn't a number - you can't multiply it! Instead what R does is convert it to `y = x_0 + x_1 * sex_male` where `sex_male` is one if `sex` is male and 0 otherwise.
|
Generating a function from a formula is straight forward when the predictor is continuous, but things get a bit more complicated when the predictor is categorical. Imagine you have a formula like `y ~ sex`, where sex could either be male or female. It doesn't make sense to convert that to a formula like `y = x_0 + x_1 * sex` because `sex` isn't a number - you can't multiply it! Instead what R does is convert it to `y = x_0 + x_1 * sex_male` where `sex_male` is one if `sex` is male and zero otherwise.
|
||||||
|
|
||||||
If you want to see what R actually does, you can use the `model.matrix()` function. It takes similar inputs to `lm()` but returns the numeric matrix that R uses to fit the model. This is useful if you ever want to understand exactly which equation is generated by your formula.
|
If you want to see what R actually does, you can use the `model.matrix()` function. It takes similar inputs to `lm()` but returns the numeric matrix that R uses to fit the model. This is useful if you ever want to understand exactly which equation is generated by your formula.
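
As a minimal sketch of the paragraph above, the following applies base R's `model.matrix()` to a small data frame. The data frame `df` and its columns `sex` and `response` are hypothetical, invented only for illustration. Note that R names the indicator column by pasting the variable name and the level together, so the `sex_male` of the text shows up as `sexmale`.

```r
# Hypothetical toy data, invented for illustration only
df <- data.frame(
  sex = c("male", "female", "male"),
  response = c(1, 2, 1)
)

# model.matrix() returns the numeric design matrix implied by the formula.
# The character column `sex` is treated as a factor; its first level
# ("female", alphabetically) becomes the baseline, so only an indicator
# column for "male" is created alongside the intercept.
mm <- model.matrix(response ~ sex, data = df)
mm
```

The `sexmale` column should be one for the rows where `sex` is male and zero otherwise, which is the encoding described above. There is no separate column for female because it would be perfectly predictable from the intercept and `sexmale`.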
|
||||||
|
|
||||||
|
@ -360,7 +360,7 @@ grid <- sim2 %>%
|
||||||
grid
|
grid
|
||||||
```
|
```
|
||||||
|
|
||||||
Effectively, a model with a categorical `x` will predict the mean value for each category. (Why? Because the mean minimise the root-mean-squared distance.) That's easy to see if we overlay the predictions on top of the original data:
|
Effectively, a model with a categorical `x` will predict the mean value for each category. (Why? Because the mean minimises the root-mean-squared distance.) That's easy to see if we overlay the predictions on top of the original data:
|
||||||
|
|
||||||
```{r}
|
```{r}
|
||||||
ggplot(sim2, aes(x)) +
|
ggplot(sim2, aes(x)) +
|
||||||
|
@ -400,7 +400,7 @@ To visualise these models we need two new tricks:
|
||||||
|
|
||||||
1. To generate predictions from both models simultaneously, we can use
|
1. To generate predictions from both models simultaneously, we can use
|
||||||
`gather_predictions()` which adds each prediction as a row. The
|
`gather_predictions()` which adds each prediction as a row. The
|
||||||
complete of `gather_predictions()` is `spread_predictions()` which adds
|
complement of `gather_predictions()` is `spread_predictions()` which adds
|
||||||
each prediction to a new column.
|
each prediction to a new column.
|
||||||
|
|
||||||
Together this gives us:
|
Together this gives us:
|
||||||
|
@ -539,9 +539,9 @@ Here we've focused on linear models, which is a fairly limited space (but it doe
|
||||||
Some extensions of linear models are:
|
Some extensions of linear models are:
|
||||||
|
|
||||||
* Generalised linear models, e.g. `stats::glm()`. Linear models assume that
|
* Generalised linear models, e.g. `stats::glm()`. Linear models assume that
|
||||||
the predictor is continuous and the errors has a normal distribution.
|
the response is continuous and the error has a normal distribution.
|
||||||
Generalised linear models extend linear models to include non-continuous
|
Generalised linear models extend linear models to include non-continuous
|
||||||
predictors (e.g. binary data or counts). They work by defining a distance
|
responses (e.g. binary data or counts). They work by defining a distance
|
||||||
metric based on the statistical idea of likelihood.
|
metric based on the statistical idea of likelihood.
|
||||||
|
|
||||||
* Generalised additive models, e.g. `mgcv::gam()`, extend generalised
|
* Generalised additive models, e.g. `mgcv::gam()`, extend generalised
|
||||||
|
|