# (PART) Wrangle {-}

# Introduction {#wrangle-intro}

In this part of the book, you'll learn about data wrangling, the art of getting your data into R in a useful form for visualisation and modelling. Data wrangling is very important: without it you can't work with your own data! There are three main parts to data wrangling:

```{r echo = FALSE, out.width = "75%"}
knitr::include_graphics("diagrams/data-science-wrangle.png")
```
This part of the book proceeds as follows:

* In [tibbles], you'll learn about the variant of the data frame that we use
  in this book: the __tibble__. You'll learn what makes tibbles different
  from regular data frames, and how you can construct them "by hand".

* In [data import], you'll learn how to get your data from disk and into R.
  We'll focus on plain-text rectangular formats, but will give you pointers
  to packages that help with other types of data.

* In [tidy data], you'll learn about tidy data, a consistent way of storing
  your data that makes transformation, visualisation, and modelling easier.
  You'll learn the underlying principles, and how to get your data into a
  tidy form.

Data wrangling also encompasses data transformation, which you've already learned a little about. Now we'll focus on new skills for four specific types of data you will frequently encounter in practice:

* [Relational data] will give you tools for working with multiple
  interrelated datasets.
* [Strings] will introduce regular expressions, a powerful tool for
  manipulating strings.

* [Factors] are how R stores categorical data. They are used when a variable
  has a fixed set of possible values, or when you want to sort strings in a
  non-alphabetical order.

* [Dates and times] will give you the key tools for working with
  dates and date-times.
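
To make the point about factors and non-alphabetical ordering concrete, here is a minimal sketch in base R. The month abbreviations and the `month_levels` vector are made-up illustration data, not something defined elsewhere in the book, and the factors chapter covers the details properly.

```{r}
# Hypothetical example data: month abbreviations stored as plain strings.
months_chr <- c("Dec", "Apr", "Jan", "Mar")

# Sorting a character vector is alphabetical, which is rarely what you want
# for months.
sort(months_chr)

# Assumed level order for this sketch: calendar order of the months used.
month_levels <- c("Jan", "Mar", "Apr", "Dec")

# A factor records the fixed set of allowed values and their order.
months_fct <- factor(months_chr, levels = month_levels)

# Sorting the factor follows the level order instead of the alphabet.
sort(months_fct)

# Values outside the fixed set ("Jam" is a deliberate typo) become NA.
typo_fct <- factor(c("Jan", "Jam"), levels = month_levels)
typo_fct
```

Sorting the character vector gives alphabetical order, while sorting the factor gives the order of its levels. The last call shows the "fixed set of possible values" side of factors: anything that is not one of the levels is converted to `NA`.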