Update model.Rmd (#226)

typos
This commit is contained in:
S'busiso Mkhondwane 2016-08-02 21:02:38 +02:00 committed by Hadley Wickham
parent 2985b6d617
commit 0eedd7a31a
1 changed file with 4 additions and 4 deletions


@@ -8,9 +8,9 @@ Now that you are equipped with powerful programming tools we can finally return
knitr::include_graphics("diagrams/data-science-model.png")
```
The goal of a model is to provide a simple low-dimensional summary of a dataset. Ideally, the model will capture true "signals" (i.e. patterns generated by the phenomenon of interest), and ignore "noise" (i.e. random variation that you're not interested in). Here we only cover "predictive" models, which, as the name suggests, generate predictions. There is another type of model that we're not going to discuss: "data discovery" models. These models don't make predictions, but instead help you discover interesting relationships within your data. (These two categories of models are sometimes called supervised and unsuperivsed, but I don't think that terminology is particularly illuminating.)
The goal of a model is to provide a simple low-dimensional summary of a dataset. Ideally, the model will capture true "signals" (i.e. patterns generated by the phenomenon of interest), and ignore "noise" (i.e. random variation that you're not interested in). Here we only cover "predictive" models, which, as the name suggests, generate predictions. There is another type of model that we're not going to discuss: "data discovery" models. These models don't make predictions, but instead help you discover interesting relationships within your data. (These two categories of models are sometimes called supervised and unsupervised, but I don't think that terminology is particularly illuminating.)
This book is not going to give you a deep understanding of the mathematical theory that underlies models. It will, however, build your intution about how statisitcal models work, and give you a family of useful tools that allow you to use models to better understand your data:
This book is not going to give you a deep understanding of the mathematical theory that underlies models. It will, however, build your intuition about how statistical models work, and give you a family of useful tools that allow you to use models to better understand your data:
* In [model basics], you'll learn how models work mechanistically, focussing on
the important family of linear models. You'll learn general tools for gaining
@@ -19,7 +19,7 @@ This book is not going to give you a deep understanding of the mathematical theo
* In [model building], you'll learn how to use models to pull out known
patterns in real data. Once you have recognised an important pattern
it's useful to make it explicit it in a model, because then you can
it's useful to make it explicit in a model, because then you can
more easily see the subtler signals that remain.
* In [many models], you'll learn how to use many simple models to help
@@ -46,7 +46,7 @@ Models are more commonly taught as tools for doing inference, or for confirming th
but you can only use it once for confirmation. As soon as you use an
observation twice, you've switched from confirmation to exploration.
This is necessary because to confirm a hypothesis you must use data this is independent of the data that you used to generate the hypothesis. Otherwise you will be over optimistic. There is absolutely nothing wrong with exploration, but you should never sell an exploratory analysis as a confirmatory analysis because it is fundamentally misleading. If you are serious about doing an confirmatory analysis, before you begin the analysis you should split your data up into three piecese:
This is necessary because to confirm a hypothesis you must use data that is independent of the data that you used to generate the hypothesis. Otherwise you will be over-optimistic. There is absolutely nothing wrong with exploration, but you should never sell an exploratory analysis as a confirmatory analysis because it is fundamentally misleading. If you are serious about doing a confirmatory analysis, before you begin the analysis you should split your data up into three pieces:
1. 60% of your data goes into a __training__ (or exploration) set. You're
allowed to do anything you like with this data: visualise it and fit tons