Rough notes for import & transform

hadley 2015-09-21 08:41:14 -05:00
parent 4a70fc4a91
commit 1b42fe786a
3 changed files with 63 additions and 11 deletions


@ -1,14 +1,15 @@
<li><a href="intro.html">Introduction</a></li>
<li><a href="tidy.html">Tidy data</a></li>
<!--
<li class="dropdown-header">R for Data Science</li>
<li><a href="intro.html">Introduction to data science</a></li>
<li><a href="visualize.html">Visualize data</a></li>
<li><a href="transform.html">Transform data</a></li>
<li><a href="tidy.html">Tidy data</a></li>
<li><a href="import.html">Import data</a></li>
<li><a href="dates.html">Dates and times</a></li>
<li><a href="strings.html">Regular expressions</a></li>
<li><a href="models.html">Model data</a></li>
<li><a href="reports.html">Reporting and reproducible research</a></li>
<li><a href="visualize.html">Visualize</a></li>
-->
<li><a href="transform.html">Transform</a></li>
<!--
<li><a href="strings.html">Regular expressions</a></li>
<li><a href="dates.html">Dates and times</a></li>
-->
<li><a href="tidy.html">Tidy</a></li>
<li><a href="import.html">Import</a></li>
<!--
<li><a href="models.html">Model</a></li>
<li><a href="communicate.html">Communicate</a></li>
-->

24
import.Rmd Normal file

@ -0,0 +1,24 @@
---
layout: default
title: Data import
output: bookdown::html_chapter
---
## Overview
You can't apply any of the tools you've learned so far to your own work unless you can get your own data into R. In this chapter, you'll learn how to import:
* Flat files (like CSV) with readr.
* Database queries with DBI.
* Data from web APIs with httr.
* Binary file formats (like Excel or SAS) with haven and readxl.
## Flat files
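As a placeholder until this section is written, here is a minimal sketch of reading a flat file with readr. It assumes readr is installed; the file contents and the column names `id` and `name` are made up for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The columns (id, name) are hypothetical, not from a real data set.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,a", "2,NA"), path)

# "ic" is readr's compact column specification: integer, then character.
# By default readr treats the string "NA" (and empty fields) as missing.
df <- readr::read_csv(path, col_types = "ic")

df
is.na(df$name)
```

Stating the column types up front, rather than letting readr guess them, is one way to make an import script fail loudly when the file changes shape.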
## Databases
## Web APIs
## Binary files
Needs to discuss how data types from other languages are converted to R types, and how missing values are converted.

27
transform.Rmd Normal file

@ -0,0 +1,27 @@
---
layout: default
title: Data transformation
output: bookdown::html_chapter
---
## Missing values
* Why `NA == NA` is not `TRUE`.
* Why the default is `na.rm = FALSE`.
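
A short sketch of both points, using a made-up vector `x`. Comparing two unknown values gives an unknown result, and summary functions propagate missingness unless you explicitly ask them to drop it.

```r
# NA means "unknown", so comparing two unknowns is itself unknown.
NA == NA
#> [1] NA

# Use is.na() to test for missing values instead of ==.
x <- c(1, NA, 3)
is.na(x)
#> [1] FALSE  TRUE FALSE

# With the default na.rm = FALSE, a single NA makes the result NA,
# so missing data is never dropped silently.
sum(x)
#> [1] NA

# Dropping missing values has to be requested explicitly.
sum(x, na.rm = TRUE)
#> [1] 4
```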
## Data types
Overview of different data types and useful summary functions for working with them. Strings and dates covered in more detail in future chapters.
Need to mention `typeof()` vs. `class()`, mostly in the context of how date/times and factors are built on top of simpler structures.
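
One possible illustration, with example values chosen arbitrarily: `typeof()` reports the underlying storage, while `class()` reports the behaviour layered on top of it.

```r
# A date is a double (days since 1970-01-01) with a class attribute.
d <- as.Date("2015-09-21")
typeof(d)
#> [1] "double"
class(d)
#> [1] "Date"

# A date-time is a double (seconds since 1970-01-01 UTC) with two classes.
dt <- as.POSIXct("2015-09-21 08:41:14", tz = "UTC")
typeof(dt)
#> [1] "double"
class(dt)
#> [1] "POSIXct" "POSIXt"

# A factor is an integer vector of codes plus a "levels" attribute.
f <- factor(c("a", "b", "a"))
typeof(f)
#> [1] "integer"
class(f)
#> [1] "factor"
levels(f)
#> [1] "a" "b"

# unclass() strips the class and shows the simpler structure underneath.
unclass(f)
```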
### Logical
When used with numeric functions, `TRUE` is converted to 1 and `FALSE` to 0. This makes `sum()` and `mean()` particularly useful: `sum(x)` gives the number of `TRUE`s in `x`, and `mean(x)` gives the proportion of `TRUE`s.
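
For example, with made-up vectors `x` and `y`:

```r
x <- c(TRUE, FALSE, TRUE, TRUE)

# Number of TRUEs
sum(x)
#> [1] 3

# Proportion of TRUEs
mean(x)
#> [1] 0.75

# The same trick works on the result of a comparison:
# how many values of y are greater than 4, and what proportion?
y <- c(3, 8, 12, 5)
sum(y > 4)
#> [1] 3
mean(y > 4)
#> [1] 0.75
```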
### Numeric (integer and double)
### Strings (and factors)
### Date/times