Update transform part intro

Hadley Wickham 2022-09-29 11:22:45 -05:00
parent 86324b358d
commit 931e568be2
4 changed files with 36 additions and 18 deletions


@@ -6,33 +6,51 @@
source("_common.R")
```
In this part of the book, you'll learn about various types of data the columns of a data frame can contain and how to transform them.
The transformations you might want to apply to a column depend on the type of data you're working with: for example, if you have text strings you might want to extract or remove certain pieces, while if you have numerical data you might want to rescale them.
You've already learned a little about data wrangling in the previous part.
Now we'll focus on new skills for specific types of data you will frequently encounter in practice.
After reading the first part of the book, you understand (at least superficially) the most important tools for doing data science.
Now it's time to start diving into the details.
In this part of the book, you'll learn about important data types, and the tools you can use to work with them.
This is important because what you can do to a column depends on what type of column it is.
<!--# TO DO: Add a diagram? -->
```{r}
#| label: fig-ds-transform
#| echo: false
#| fig-cap: >
#| The options for data transformation depend heavily on the type of
#| data involved, the subject of this part of the book.
#| fig-alt: >
#| Our data science model, with transform highlighted in blue.
#| out.width: NULL
knitr::include_graphics("diagrams/data-science/transform.png", dpi = 270)
```
This part of the book proceeds as follows:
- In @sec-tibbles, you'll learn about the variant of the data frame that we use in this book: the **tibble**.
You'll learn what makes them different from regular data frames, and how you can construct them "by hand".
- In @sec-tibbles, you'll learn about **tibble**, the variant of the data frame that we use in this book.
You'll learn what makes tibbles different from regular data frames, and how you can construct them "by hand".
- @sec-joins will give you tools for working with multiple interrelated datasets.
- @sec-logicals teaches you about logical vectors.
These are the simplest type of vector in R, but are extremely powerful.
You'll learn how to create them with numeric comparisons, how to combine them with Boolean algebra, how to use them in summaries, and how to use them for conditional transformations.
- @sec-numbers ...
- @sec-numbers dives into tools for vectors of numbers, the powerhouse of data science.
You'll learn new counting techniques, important transformations and important summary functions.
- @sec-logicals ...
- @sec-strings will give you tools for working with strings: you'll slice them, you'll dice them, and you'll stick them back together again.
This chapter mostly focusses on the stringr package, but you'll also learn some more tidyr functions devoted to extracting data from strings.
- @sec-missing-values...
- @sec-regular-expressions goes into the details of regular expressions, a powerful tool for manipulating strings.
This chapter will take you from thinking "a cat just walked over my keyboard" to reading and writing complex string patterns.
- @sec-strings will give you tools for working with strings and introduce regular expressions, a powerful tool for manipulating strings.
- @sec-regular-expressions ...
- @sec-factors will introduce factors -- how R stores categorical data.
- @sec-factors will introduce factors -- the data type that R uses to store categorical data.
They are used when a variable has a fixed set of possible values, or when you want to use a non-alphabetical ordering of a string.
- @sec-dates-and-times will give you the key tools for working with dates and date-times.
Unfortunately, the more you learn about date-times, the more complicated they seem to get, but with the help of the lubridate package, you'll learn how to overcome the most common challenges.
<!-- TO DO: Add chapter descriptions -->
- We've discussed missing values a couple of times in isolation, but @sec-missing-values will go into detail, helping you come to grips with the difference between implicit and explicit missing values, and how and why you might convert between them.
- @sec-joins finishes up this part of the book by giving you tools to join two (or more) data frames together.
Learning about joins will force you to grapple with the idea of keys, and think about how you identify each row in a dataset.
You can read these chapters as you need them; they're designed to be largely standalone so that they can be read out of order.


@@ -18,7 +18,7 @@ But in more complex cases it encompasses both tidying and transformation as the
#| transforming.
#| fig-alt: >
#| Our data science model with import, tidy, and transform, highlighted
#| in blue.
#| in blue and labelled with "wrangle".
#| out.width: NULL
knitr::include_graphics("diagrams/data-science/wrangle.png", dpi = 270)