2023-01-13 07:22:57 +08:00
<div data-type="part">
<h1><span id="sec-import" class="quarto-section-identifier d-none d-lg-block">Import</span></h1><p>In this part of the book, you’ll learn how to import a wider range of data into R, as well as how to get it into a form useful for analysis. Sometimes this is just a matter of calling a function from the appropriate data import package. But in more complex cases it might require both tidying and transformation in order to get to the tidy rectangle that you’d prefer to work with.</p><div class="cell">
<div class="cell-output-display">
<figure id="fig-ds-import"><p><img src="diagrams/data-science/import.png" alt="Our data science model with import highlighted in blue. " width="535"/></p>
<figcaption>Figure 1: Data import is the beginning of the data science process; without data you can’t do data science!</figcaption>
</figure>
</div>
</div><p>In this part of the book you’ll learn how to access data stored in the following ways:</p><ul><li><p>In <a href="#chp-spreadsheets" data-type="xref">#chp-spreadsheets</a>, you’ll learn how to import data from Excel spreadsheets and Google Sheets.</p></li>
<li><p>In <a href="#chp-databases" data-type="xref">#chp-databases</a>, you’ll learn about getting data out of a database and into R (and you’ll also learn a little about how to get data out of R and into a database).</p></li>
<li><p>In <a href="#chp-arrow" data-type="xref">#chp-arrow</a>, you’ll learn about Arrow, a powerful tool for working with out-of-memory data, particularly when it’s stored in the parquet format.</p></li>
<li><p>In <a href="#chp-rectangling" data-type="xref">#chp-rectangling</a>, you’ll learn how to work with hierarchical data, including the deeply nested lists produced by data stored in the JSON format.</p></li>
<li><p>In <a href="#chp-webscraping" data-type="xref">#chp-webscraping</a>, you’ll learn web “scraping”, the art and science of extracting data from web pages.</p></li>
</ul><p>There are two important tidyverse packages that we don’t discuss here: haven and xml2. If you’re working with data from SPSS, Stata, or SAS files, check out the <strong>haven</strong> package, <a href="https://haven.tidyverse.org" class="uri">https://haven.tidyverse.org</a>. If you’re working with XML data, check out the <strong>xml2</strong> package, <a href="https://xml2.r-lib.org" class="uri">https://xml2.r-lib.org</a>. Otherwise, you’ll need to do some research to figure out which package you’ll need to use; Google is your friend here 😃.</p></div>