r4ds/wrangle.qmd

46 lines
1.8 KiB
Plaintext

# Wrangle {#sec-wrangle .unnumbered}
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```
In this part of the book, you'll learn about data wrangling, the art of getting your data into R in a useful form for further work.
In some cases, this is a relatively simple application of a package that does data import.
But in more complex cases it encompasses both tidying and transformation as the native structure of the data might be quite far from the tidy rectangle you'd prefer to work with.
```{r}
#| label: fig-ds-wrangle
#| echo: false
#| fig-cap: >
#| Data wrangling is the combination of importing, tidying, and
#| transforming.
#| fig-alt: >
#| Our data science model with import, tidy, and transform, highlighted
#| in blue and labelled with "wrangle".
#| out.width: NULL
knitr::include_graphics("diagrams/data-science/wrangle.png", dpi = 270)
```
This part of the book proceeds as follows:
- In @sec-rectangling, you'll learn how to get plain-text data in rectangular formats from disk and into R.
- In @sec-import-spreadsheets, you'll learn how to get data from Excel spreadsheets and Google Sheets into R.
- In @sec-import-databases, you'll learn about getting data into R from databases.
- In @sec-rectangling, you'll learn how to work with hierarchical data that includes deeply nested lists, as is often created we your raw data is in JSON.
- In @sec-scraping, you'll learn about harvesting data off the web and getting it into R.
Some other types of data are not covered in this book:
- **haven** reads SPSS, Stata, and SAS files.
- xml2 for **xml2** for XML
For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [**rio**](https://github.com/leeper/rio) package.