From 1b42fe786ae13180df9a21e5ca9523a94e908fed Mon Sep 17 00:00:00 2001
From: hadley
Date: Mon, 21 Sep 2015 08:41:14 -0500
Subject: [PATCH] Rough notes for import & transform

---
 _includes/package-nav.html | 23 ++++++++++++-----------
 import.Rmd                 | 24 ++++++++++++++++++++++++
 transform.Rmd              | 27 +++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 11 deletions(-)
 create mode 100644 import.Rmd
 create mode 100644 transform.Rmd

diff --git a/_includes/package-nav.html b/_includes/package-nav.html
index c2bd686..1cd532a 100644
--- a/_includes/package-nav.html
+++ b/_includes/package-nav.html
@@ -1,14 +1,15 @@
  • Introduction
  • -
  • Tidy data
  • +
  • Transform
  • + +
  • Tidy
  • +
  • Import
  • +
diff --git a/import.Rmd b/import.Rmd
new file mode 100644
index 0000000..1cd2772
--- /dev/null
+++ b/import.Rmd
@@ -0,0 +1,24 @@
+---
+layout: default
+title: Data import
+output: bookdown::html_chapter
+---
+
+## Overview
+
+You can't apply any of the tools you've applied so far to your own work, unless you can get your own data into R. In this chapter, you'll learn how to import:
+
+* Flat files (like CSV) with readr.
+* Database queries with DBI.
+* Data from web APIs with httr.
+* Binary file formats (like Excel or SAS), with haven and readxl.
+
+## Flat files
+
+## Databases
+
+## Web APIs
+
+## Binary files
+
+Needs to discuss how data types in different languages are converted to R. Similarly for missing values.
diff --git a/transform.Rmd b/transform.Rmd
new file mode 100644
index 0000000..66ef687
--- /dev/null
+++ b/transform.Rmd
@@ -0,0 +1,27 @@
+---
+layout: default
+title: Data transformation
+output: bookdown::html_chapter
+---
+
+
+## Missing values
+
+* Why `NA == NA` is not `TRUE`
+* Why default is `na.rm = FALSE`.
+
+## Data types
+
+Overview of different data types and useful summary functions for working with them. Strings and dates covered in more detail in future chapters.
+
+Need to mention `typeof()` vs. `class()` mostly in context of how date/times and factors are built on top of simpler structures.
+
+### Logical
+
+When used with numeric functions, `TRUE` is converted to 1 and `FALSE` to 0. This makes `sum()` and `mean()` particularly useful: `sum(x)` gives the number of `TRUE`s in `x`, and `mean(x)` gives the proportion.
+
+### Numeric (integer and double)
+
+### Strings (and factors)
+
+### Date/times
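
Note on the `transform.Rmd` notes: the "Missing values" bullets and the "Logical" paragraph could be illustrated together with a short snippet. The following is only a sketch and is not part of the patch; the vector `x` and the variable names are made up for illustration.

```r
# Hypothetical example vector (not from the patch).
x <- c(TRUE, FALSE, TRUE, NA, TRUE)

# Numeric functions coerce TRUE to 1 and FALSE to 0.
n_true    <- sum(x, na.rm = TRUE)   # number of TRUEs among non-missing values
prop_true <- mean(x, na.rm = TRUE)  # proportion of TRUEs among non-missing values

# With the default na.rm = FALSE, a single NA makes the summary NA,
# so missing data is never dropped silently.
n_default <- sum(x)

# Comparing two unknown values gives an unknown result, not TRUE.
na_equal <- NA == NA

# is.na() is the reliable way to test for missingness.
print(c(n_true = n_true, prop_true = prop_true))
print(is.na(n_default))
print(is.na(na_equal))
```

Passing `na.rm = TRUE` explicitly keeps the decision to drop missing values visible in the code, which seems to be the motivation behind the `na.rm = FALSE` default mentioned in the notes.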