From 3104dbfca0120e183e6dc103c0f79330806c2802 Mon Sep 17 00:00:00 2001
From: hadley
Date: Wed, 27 Jul 2016 17:04:38 -0500
Subject: [PATCH] Move learning more to the end of model building

---
 model-building.Rmd | 27 +++++++++++++++++++++++++++
 model.Rmd          | 29 +++--------------------------
 2 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/model-building.Rmd b/model-building.Rmd
index 7ed99f5..be28e15 100644
--- a/model-building.Rmd
+++ b/model-building.Rmd
@@ -441,3 +441,30 @@ How do you decide how many parameters to use for the spline? You can either eith
 1. It's a little frustrating that Sunday and Saturday are on separate ends
    of the plot. Write a small function to set the manipulate the levels of the
    factor so that the week starts on Monday.
+
+## Learning more
+
+We have only scratched the absolute surface of modelling, but you have hopefully gained some simple, but general purpose tools that you can use to improve your own data analyses. It's ok to start simple! As you've seen, even very simple models can make a dramatic difference in your ability to tease out interactions between variables.
+
+These modelling chapters are even more opinionated than the rest of the book. I approach modelling from a somewhat different perspective to most others, and there is relatively little space devoted to it. Modelling really deserves a book on its own, so I'd highly recommend that you read at least one of these three books:
+
+* *Statistical Modeling: A Fresh Approach* by Danny Kaplan,
+  . This book provides
+  a gentle introduction to modelling, where you build your intuition,
+  mathematical tools, and R skills in parallel. The book replaces a traditional
+  "introduction to statistics" course, providing a curriculum that is up-to-date
+  and relevant to data science.
+
+* *An Introduction to Statistical Learning* by Gareth James, Daniela Witten,
+  Trevor Hastie, and Robert Tibshirani,
+  (available online for free). This book presents a family of modern modelling
+  techniques collectively known as statistical learning. For an even deeper
+  understanding of the math behind the models, read the classic
+  *Elements of Statistical Learning* by Trevor Hastie, Robert Tibshirani, and
+  Jerome Friedman, (also
+  available online for free).
+
+* *Applied Predictive Modeling* by Max Kuhn and Kjell Johnson,
+  . This book is a companion to the
+  __caret__ package, and provides practical tools for dealing with real-life
+  predictive modelling challenges.
diff --git a/model.Rmd b/model.Rmd
index b69871a..3201877 100644
--- a/model.Rmd
+++ b/model.Rmd
@@ -33,6 +33,8 @@ This book is not going to give you a deep understanding of the mathematical theo
 on the powerful idea of random resamples. These will help you understand
 how your model will behave on new datasets.
 
+## Hypothesis generation vs. hypothesis confirmation
+
 In this book, we are going to use models as a tool for exploration, completing the trifecta of EDA tools introduced in Part 1. This is not how models are usually taught, but they make for a particularly useful tool in this context. Every exploratory analysis will involve some transformation, modelling, and visualisation.
 
 Models are more common taught as tools for doing inference, or for confirming that an hypothesis is true. Doing this correctly is not complicated, but it is hard. There is a pair of ideas that you must understand in order to do inference correctly:
@@ -59,29 +61,4 @@ This is necessary because to confirm a hypothesis you must use data this is inde
 
 This partitioning allows you to explore the training data, occassionally generating candidate hypotheses that you check with the query set. When you are confident you have the right model, you can check it once with the test data.
 
-(Note that tven when doing confirmatory modelling, you will still need to do EDA. If you don't do any EDA you will remain blind to the quality problems with your data.)
-
-### Other references
-
-The modelling chapters are even more opinionated than the rest of the book. I approach modelling from a somewhat different perspective to most others, and there is relatively little space devoted to it. Modelling really deserves a book on its own, so I'd highly recommend that you read at least one of these three books:
-
-* *Statistical Modeling: A Fresh Approach* by Danny Kaplan,
-  . This book provides
-  a gentle introduction to modelling, where you build your intuition,
-  mathematical tools, and R skills in parallel. The book replaces a traditional
-  "introduction to statistics" course, providing a curriculum that is up-to-date
-  and relevant to data science.
-
-* *An Introduction to Statistical Learning* by Gareth James, Daniela Witten,
-  Trevor Hastie, and Robert Tibshirani,
-  (available online for free). This book presents a family of modern modelling
-  techniques collectively known as statistical learning. For an even deeper
-  understanding of the math behind the models, read the classic
-  *Elements of Statistical Learning* by Trevor Hastie, Robert Tibshirani, and
-  Jerome Friedman, (also
-  available online for free).
-
-* *Applied Predictive Modeling* by Max Kuhn and Kjell Johnson,
-  . This book is a companion to the
-  __caret__ package, and provides practical tools for dealing with real-life
-  predictive modelling challenges.
+(Note that even when doing confirmatory modelling, you will still need to do EDA. If you don't do any EDA you will remain blind to the quality problems with your data.)