Use new model_matrix

2016-07-25 07:24:10 -05:00 · 2016-07-25 07:24:10 -05:00 · f1cc2088f9
parent 3836b6b352
commit f1cc2088f9
1 changed files with 2 additions and 2 deletions
--- a/model-basics.Rmd
+++ b/model-basics.Rmd
@@ -325,7 +325,7 @@ The following sections explore how this plays out in more detail.

 Generating a function from a formula is straight forward when the predictor is continuous, but things get a bit more complicated when the predictor is categorical. Imagine you have a formula like `y ~ sex`, where sex could either be male or female. It doesn't make sense to convert that to a formula like `y = x_0 + x_1 * sex` because `sex` isn't a number - you can't multiply it! Instead what R does is convert it to `y = x_0 + x_1 * sex_male` where `sex_male` is one if `sex` is male and zero otherwise.

-If you want to see what R actually does, you can use the `model.matrix()` function. It takes similar inputs to `lm()` but returns the numeric matrix that R uses to fit the model. This is useful if you ever want to understand exactly which equation is generated by your formula.
+If you want to see what R actually does, you can use the `model_matrix()` function. It takes a data frame and a formula and returns a tibble that defines the model equation: each column in the output is associated with one coefficient in the model. This is useful if you ever want to understand exactly which equation is generated by your formula.

 ```{r, echo = FALSE}
 df <- frame_data(
@@ -334,7 +334,7 @@ df <- frame_data(
  "female", 2,
  "male", 1
 )
-as_tibble(model.matrix(response ~ sex, data = df))
+model_matrix(df, response ~ sex)
 ```

 The process of turning a categorical variable into a 0-1 matrix has different names. Sometimes the individual 0-1 columns are called dummy variables. In machine learning, it's called one-hot encoding. In statistics, the process is called creating a contrast matrix.  General example of "feature generation": taking things that aren't continuous variables and figuring out how to represent them.
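
As a rough illustration of the 0-1 encoding described in the changed paragraph, the sketch below uses base R's `model.matrix()` (the function the removed line called) on a small made-up data frame. The three rows are hypothetical and are not the elided rows of `df` in the chapter. It assumes R's default treatment contrasts.

```r
# Hypothetical data: three observations with a categorical predictor.
df <- data.frame(
  sex = c("male", "female", "male"),
  response = c(1, 2, 1)
)

# Build the numeric design matrix for response ~ sex.
# The character column is converted to a factor with levels
# "female" and "male" (alphabetical order), so "female" is the baseline.
mm <- model.matrix(response ~ sex, data = df)

# One column per coefficient: the intercept and the male indicator.
colnames(mm)

# The indicator is 1 for "male" rows and 0 otherwise.
mm[, "sexmale"]
```

Each column corresponds to one coefficient, so this matrix encodes `y = x_0 + x_1 * sex_male`, with R naming the indicator column `sexmale`.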