Wrangle -> import (#1177)

* Wrangle -> import

* Update import.qmd

* Update import.qmd

* Update import.qmd

* Update import.qmd

Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
Hadley Wickham 2022-12-08 15:48:59 +13:00 committed by GitHub
parent 281005a31c
commit 0743cbd3aa
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 42 additions and 45 deletions


@ -53,7 +53,7 @@ book:
- missing-values.qmd
- joins.qmd
- part: wrangle.qmd
- part: import.qmd
chapters:
- spreadsheets.qmd
- databases.qmd

Binary file not shown (image, 48 KiB after).

import.qmd (new file, 41 lines)

@ -0,0 +1,41 @@
# Import {#sec-import .unnumbered}
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```
In this part of the book, you'll learn how to import a wider range of data into R, as well as how to get it into a form that's useful for analysis.
Sometimes this is just a matter of calling a function from the appropriate data import package.
But in more complex cases it might require both tidying and transformation to get to the tidy rectangle you'd prefer to work with.

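In the simplest case, importing data really is a single function call. As a quick sketch (the file path here is hypothetical, and the chunk is not evaluated):

```{r}
#| eval: false
library(readr)

# Read a comma-separated file into a tibble; read_csv() guesses
# each column's type from the first rows of the file.
students <- read_csv("data/students.csv")
```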
```{r}
#| label: fig-ds-import
#| echo: false
#| fig-cap: >
#| Data import is the beginning of the data science process; without
#| data you can't do data science!
#| fig-alt: >
#| Our data science model with import highlighted in blue.
#| out.width: NULL
knitr::include_graphics("diagrams/data-science/import.png", dpi = 270)
```
In this part of the book you'll learn how to access data stored in the following ways:

- In @sec-import-spreadsheets, you'll learn how to import data from Excel spreadsheets and Google Sheets.
- In @sec-import-databases, you'll learn about getting data out of a database and into R (and you'll also learn a little about how to get data out of R and into a database).
- In @sec-arrow, you'll learn about Arrow, a powerful tool for working with out-of-memory data, particularly when it's stored in the parquet format.
-   In @sec-rectangling, you'll learn how to work with hierarchical data, including the deeply nested lists produced by data stored in the JSON format.
- In @sec-scraping, you'll learn web "scraping", the art and science of extracting data from web pages.

There are two important tidyverse packages that we don't discuss here: haven and xml2.
If you're working with data from SPSS, Stata, or SAS files, check out the **haven** package, <https://haven.tidyverse.org>.
If you're working with XML data, check out the **xml2** package, <https://xml2.r-lib.org>.
Otherwise, you'll need to do some research to figure out which package you'll need to use; Google is your friend here 😃.
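If you do need haven or xml2, usage follows the same pattern as the other import packages. A minimal sketch (file paths hypothetical, chunk not evaluated):

```{r}
#| eval: false
library(haven)
library(xml2)

# SPSS, Stata, and SAS files each have a dedicated reader in haven:
survey <- read_sav("data/survey.sav")  # SPSS
panel <- read_dta("data/panel.dta")    # Stata

# xml2 parses an XML document into a tree you can then query
# with XPath expressions:
doc <- read_xml("data/records.xml")
xml_find_all(doc, "//record")
```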

wrangle.qmd (deleted file)

@ -1,44 +0,0 @@
# Wrangle {#sec-wrangle .unnumbered}
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```
In this part of the book, you'll learn about data wrangling, the art of getting your data into R in a useful form for further work.
In some cases, this is a relatively simple application of a package that does data import.
But in more complex cases it encompasses both tidying and transformation as the native structure of the data might be quite far from the tidy rectangle you'd prefer to work with.
```{r}
#| label: fig-ds-wrangle
#| echo: false
#| fig-cap: >
#| Data wrangling is the combination of importing, tidying, and
#| transforming.
#| fig-alt: >
#| Our data science model with import, tidy, and transform, highlighted
#| in blue and labelled with "wrangle".
#| out.width: NULL
knitr::include_graphics("diagrams/data-science/wrangle.png", dpi = 270)
```
This part of the book proceeds as follows:

- In @sec-rectangling, you'll learn how to get plain-text data in rectangular formats from disk and into R.
- In @sec-import-spreadsheets, you'll learn how to get data from Excel spreadsheets and Google Sheets into R.
- In @sec-import-databases, you'll learn about getting data into R from databases.
- In @sec-arrow, you'll learn about Arrow, a powerful tool for working with large on-disk files.
-   In @sec-rectangling, you'll learn how to work with hierarchical data that includes deeply nested lists, as is often created when your raw data is in JSON.
- In @sec-scraping, you'll learn about harvesting data off the web and getting it into R.

There are two important tidyverse packages that we don't discuss here: haven and xml2.
If you're working with data from SPSS, Stata, or SAS files, check out the **haven** package, <https://haven.tidyverse.org>.
If you're working with XML, check out the **xml2** package, <https://xml2.r-lib.org>.
Otherwise, you'll need to do some research to figure out which package you'll need to use; Google is your friend here 😃.