r4ds/explore.Rmd

52 lines
2.3 KiB
Plaintext
Raw Normal View History

2016-04-27 15:04:29 +08:00
# (PART) Explore {-}
2016-04-21 21:01:34 +08:00
# Introduction
2016-02-12 06:31:34 +08:00
2016-07-19 22:39:00 +08:00
The goal of the first part of this book is to get your up to speed with the basic tools of data exploration as quickly as possible:
```{r echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/data-science-explore.png")
```
2016-07-12 04:48:58 +08:00
```{r setup, include = FALSE}
2016-05-19 00:57:20 +08:00
library(ggplot2)
library(dplyr)
```
2016-07-12 04:48:58 +08:00
If you are like most humans, your brain is not designed to work with raw data. Your working memory can only pay attention to a few values at a time, which makes it difficult to discover patterns in raw data. For example, can you spot the striking relationship between $X$ and $Y$ in the table below?
2016-05-19 00:57:20 +08:00
```{r data, echo=FALSE}
2016-07-12 04:48:58 +08:00
circle <- tibble(
pi = sort(rep(seq(0.2, 1.8 * pi, length = 5), 2) + runif(10, -0.15, 0.15)),
x = cos(pi) + 1,
y = sin(pi) - 1
)
circle %>%
select(-pi) %>%
knitr::kable(digits = 2)
2016-05-19 00:57:20 +08:00
```
2016-07-19 22:39:00 +08:00
While we may stumble over raw data, we can easily process visual information. Visualization works because your brain processes visual information in a different (and much wider) channel than it processes symbolic information, like words and numbers. Within your brain is a powerful visual processing system fine-tuned by millions of years of evolution. As a result, often the quickest way to understand your data is to visualize it. Once you plot your data, you can instantly see the relationships between values. Here, we see that the values fall on a circle.
2016-05-19 00:57:20 +08:00
2016-07-18 22:52:55 +08:00
```{r echo=FALSE, dependson = data, fig.asp = 1, out.width = "30%", fig.width = 3}
2016-07-12 04:48:58 +08:00
ggplot(circle, aes(x, y)) +
geom_point() +
coord_fixed()
2016-05-19 00:57:20 +08:00
```
2016-07-19 22:39:00 +08:00
In the following chapters you will:
* Dive into ggplot2 in [data visualisation], learning powerful
and general techniques for turning raw data into visual insights.
2016-05-19 00:57:20 +08:00
2016-07-19 22:39:00 +08:00
* Visualisation alone is typically not enough, so in [data transformation]
you'll learn the key verbs that allow you select important variables,
filter out key observations, and create new variables and summaries.
* In [exploratory data analysis], you'll combine visualisation and
transformation with your curiosity and scepticism to ask and answer
interesting questions about data.
2016-05-19 00:57:20 +08:00
2016-07-19 22:39:00 +08:00
Modelling is an important part of the exploratory process, but you don't have the skills to effectively learn or apply it yet. We'll come back to modelling in [model], once you're better equipped with more data wrangling and programming tools.