From 9d62e1c23e24b676bc718a04e0fa0c270f5b8e77 Mon Sep 17 00:00:00 2001
From: hadley
Date: Sun, 10 Jul 2016 07:56:06 -0500
Subject: [PATCH] Complete column parsing

---
 import.Rmd | 54 ++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 18 deletions(-)

diff --git a/import.Rmd b/import.Rmd
index 8eb9a59..c049b29 100644
--- a/import.Rmd
+++ b/import.Rmd
@@ -256,16 +256,16 @@ guess_encoding(charToRaw(x2))
 
 The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).
 
-If you'd like to learn more, I'd recommend .
+Encodings are a rich and complex topic, and I've only scratched the surface here. We'll come back to encodings again in [[Encoding]], but if you'd like to learn more I'd recommend reading the detailed explanation at .
 
 ### Dates, date times, and times
 
 You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (i.e. the number of seconds since midnight). The defaults read:
 
-* `parse_datetime()`: an
-  [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) date time. This
-  is the most important date/time standard, and I recommend that you get
-  a little familiar with it.
+* `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
+  international standard in which the components of a date are
+  organised from biggest to smallest: year, month, day, hour, minute,
+  second:
 
     ```{r}
     parse_datetime("2010-10-01T2010")
@@ -273,24 +273,29 @@ You pick between three parsers depending on whether you want a date (the number
     parse_datetime("20101010")
     ```
 
-* `parse_date()`: a year, optional separator, month, optional separator,
-  day.
+  This is the most important date/time standard, and if you work with
+  dates and times frequently, I recommend reading
+
+* `parse_date()` expects a year, an optional separator, a month,
+  an optional separator, and then a day:
 
     ```{r}
     parse_date("2010-10-01")
     ```
 
-* `parse_time()`: an hour, optional colon, hour, optional colon, minute,
-  optional colon, optional seconds, optional am/pm. Base R doesn't have
-  a great built in class for time data, so we use the one provided in the
-  hms package.
+* `parse_time()` expects an hour, an optional colon, a minute,
+  an optional colon, optional seconds, and optional am/pm specifier:
 
     ```{r}
     library(hms)
     parse_time("20:10:01")
     ```
+
+  Base R doesn't have a great built in class for time data, so we use
+  the one provided in the hms package.
 
-If these defaults don't work for your data you can supply your own date time formats, built up of the following pieces:
+If these defaults don't work for your data you can supply your own datetime formats, built up of the following pieces:
 
 Year
 : `%Y` (4 digits).
@@ -335,16 +340,16 @@ parse_date("01/02/15", "%y/%m/%d")
 If you're using `%b` or `%B` with non-English month names, you'll need to set the `lang` argument to `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.
 
 ```{r}
-locale("fr")
-
 parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
 ```
-,
+
 ### Exercises
 
-1. What are the most important options to locale? If you live outside the
-   US, create a new locale object that encapsulates the settings for the
-   data files you read most commonly.
+1. What are the most important arguments to `locale()`? If you live
+   outside the US, create a new locale object that encapsulates the
+   settings for the types of file you read most commonly.
+
+1. What's the difference between `read_csv()` and `read_csv2()`?
 
 1. I didn't discuss the `date_format` and `time_format` options to
    `locale()`. What do they do? Construct an example that shows when they
@@ -353,6 +358,19 @@ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
 1. What are the most common encodings used in Europe? What are the
    most common encodings used in Asia?
 
+1. Generate the correct format string to parse each of the following
+   dates and times:
+
+    ```{r}
+    d1 <- "January 1, 2010"
+    d2 <- "2015-Mar-07"
+    d3 <- "06-Jun-2017"
+    d4 <- "August 19 (2015)"
+    d5 <- "12/30/14" # Dec 30, 2014
+    t1 <- "1705"
+    t2 <- "11:15:10.12 PM"
+    ```
+
 ## Parsing a file
 
 Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section: