Complete column parsing

2016-07-10 07:56:06 -05:00 · 2016-07-10 07:56:06 -05:00 · 9d62e1c23e
parent 785018073b
commit 9d62e1c23e
1 changed files with 36 additions and 18 deletions
--- a/import.Rmd
+++ b/import.Rmd
@@ -256,16 +256,16 @@ guess_encoding(charToRaw(x2))

 The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).

-If you'd like to learn more, I'd recommend <http://kunststube.net/encoding/>.
+Encodings are a rich and complex topic, and I've only scratched the surface here. We'll come back to encodings again in [[Encoding]], but if you'd like to learn more I'd recommend reading the detailed explanation at <http://kunststube.net/encoding/>.

 ### Dates, date times, and times

 You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight 1970-01-01), or a time (i.e. the number of seconds since midnight). The defaults read:

-*   `parse_datetime()`: an 
-    [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) date time. This
-    is the most important date/time standard, and I recommend that you get
-    a little familiar with it.
+*   `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
+    international standard in which the components of a date are
+    organised from biggest to smallest: year, month, day, hour, minute, 
+    second:
    
    ```{r}
    parse_datetime("2010-10-01T2010")
@@ -273,24 +273,29 @@ You pick between three parsers depending on whether you want a date (the number
    parse_datetime("20101010")
    ```
    
-*   `parse_date()`: a year, optional separator, month, optional separator, 
-    day.
+    This is the most important date/time standard, and if you work with
+    dates and times frequently, I recommend reading
+    <https://en.wikipedia.org/wiki/ISO_8601>.
+    
+*   `parse_date()` expects a year, an optional separator, a month, 
+    an optional separator, and then a day:
    
    ```{r}
    parse_date("2010-10-01")
    ```

-*   `parse_time()`: an hour, optional colon, hour, optional colon, minute,
-    optional colon, optional seconds, optional am/pm. Base R doesn't have
-    a great built in class for time data, so we use the one provided in the
-    hms package.
+*   `parse_time()` expects an hour, an optional colon, a minute, 
+    an optional colon, optional seconds, and an optional am/pm specifier:
  
    ```{r}
    library(hms)
    parse_time("20:10:01")
    ```
+    
+    Base R doesn't have a great built-in class for time data, so we use 
+    the one provided in the hms package.

-If these defaults don't work for your data you can supply your own date time formats, built up of the following pieces:
+If these defaults don't work for your data you can supply your own datetime formats, built up of the following pieces:

 Year
 :  `%Y` (4 digits). 
@@ -335,16 +340,16 @@ parse_date("01/02/15", "%y/%m/%d")
 If you're using `%b` or `%B` with non-English month names, you'll need to set the  `lang` argument to `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.

 ```{r}
-locale("fr")
-
 parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
 ```
-,
+
 ### Exercises

-1.  What are the most important options to locale?  If you live outside the
-    US, create a new locale object that encapsulates the settings for the
-    data files you read most commonly.
+1.  What are the most important arguments to `locale()`?  If you live
+    outside the US, create a new locale object that encapsulates the
+    settings for the types of file you read most commonly.
+    
+1.  What's the difference between `read_csv()` and `read_csv2()`?
    
 1.  I didn't discuss the `date_format` and `time_format` options to
    `locale()`. What do they do? Construct an example that shows when they
@@ -353,6 +358,19 @@ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
 1.  What are the most common encodings used in Europe? What are the
    most common encodings used in Asia?

+1.  Generate the correct format string to parse each of the following 
+    dates and times:
+    
+    ```{r}
+    d1 <- "January 1, 2010"
+    d2 <- "2015-Mar-07"
+    d3 <- "06-Jun-2017"
+    d4 <- "August 19 (2015)"
+    d5 <- "12/30/14" # Dec 30, 2014
+    t1 <- "1705"
+    t2 <- "11:15:10.12 PM"
+    ```
+
 ## Parsing a file

 Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section: