<div data-type="part">
<h1><span id="sec-wrangle" class="quarto-section-identifier d-none d-lg-block">Wrangle</span></h1><p>In this part of the book, you’ll learn about data wrangling, the art of getting your data into R in a form that is useful for further work. In some cases, this is a relatively simple matter of using a package that does data import. In more complex cases, it also encompasses tidying and transformation, because the native structure of the data might be quite far from the tidy rectangle you’d prefer to work with.</p><div class="cell">
<div class="cell-output-display">
<figure id="fig-ds-wrangle"><p><img src="diagrams/data-science/wrangle.png" alt="Our data science model with import, tidy, and transform, highlighted in blue and labelled with &quot;wrangle&quot;. " width="535"/></p>
<figcaption>Figure 1: Data wrangling is the combination of importing, tidying, and transforming.</figcaption>
</figure>
</div>
</div><p>This part of the book proceeds as follows:</p><ul><li><p>In <a href="#chp-data-import" data-type="xref">#chp-data-import</a>, you’ll learn how to get plain-text data in rectangular formats from disk into R.</p></li>
<li><p>In <a href="#chp-spreadsheets" data-type="xref">#chp-spreadsheets</a>, you’ll learn how to get data from Excel spreadsheets and Google Sheets into R.</p></li>
<li><p>In <a href="#chp-databases" data-type="xref">#chp-databases</a>, you’ll learn how to get data into R from databases.</p></li>
<li><p>In <a href="#chp-rectangling" data-type="xref">#chp-rectangling</a>, you’ll learn how to work with hierarchical data that includes deeply nested lists, which is often what you get when your raw data is in JSON.</p></li>
<li><p>In <a href="#chp-webscraping" data-type="xref">#chp-webscraping</a>, you’ll learn how to harvest data from the web and get it into R.</p></li>
</ul><p>Some other types of data are not covered in this book, but the following packages can help:</p><ul><li><p><strong>haven</strong> reads SPSS, Stata, and SAS files.</p></li>
<li><p><strong>xml2</strong> reads XML files.</p></li>
</ul><p>For other file types, try the <a href="https://cran.r-project.org/doc/manuals/r-release/R-data.html">R data import/export manual</a> and the <a href="https://github.com/leeper/rio"><strong>rio</strong></a> package.</p></div>