Complete column parsing

This commit is contained in:
hadley 2016-07-10 07:56:06 -05:00
parent 785018073b
commit 9d62e1c23e
1 changed files with 36 additions and 18 deletions


@ -256,16 +256,16 @@ guess_encoding(charToRaw(x2))
The first argument to `guess_encoding()` can either be a path to a file, or, as in this case, a raw vector (useful if the strings are already in R).
Encodings are a rich and complex topic, and I've only scratched the surface here. We'll come back to encodings in [[Encoding]], but if you'd like to learn more, I recommend reading the detailed explanation at <http://kunststube.net/encoding/>.
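As a sketch of the file-path form, the chunk below writes a few Latin-1 bytes to a temporary file and passes the path, rather than a raw vector, to `guess_encoding()`. The example string and the temporary file are illustrative assumptions, not part of the original text, and the confidence values you see will depend on the input.

```{r}
library(readr)

# Illustrative bytes: in Latin-1, the byte 0xF1 is "ñ"
bytes <- charToRaw("El Ni\xf1o was particularly bad this year")

# Write the raw bytes to a temporary file so we can pass a path instead
path <- tempfile(fileext = ".txt")
writeBin(bytes, path)

# Same function, but this time the first argument is a file path
enc <- guess_encoding(path)
enc
```

The result is a data frame of candidate encodings and their confidences, so treat the top row as a guess to check, not a guarantee.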
### Dates, date times, and times
You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date time (the number of seconds since midnight UTC on 1970-01-01), or a time (the number of seconds since midnight). By default:
* `parse_datetime()` expects an ISO8601 date time. ISO8601 is an
international standard in which the components of a date are
organised from biggest to smallest: year, month, day, hour, minute,
second:
```{r}
parse_datetime("2010-10-01T2010")
@ -273,24 +273,29 @@ You pick between three parsers depending on whether you want a date (the number
parse_datetime("20101010")
```
This is the most important date/time standard, and if you work with
dates and times frequently, I recommend reading
<https://en.wikipedia.org/wiki/ISO_8601>.
* `parse_date()` expects a year, an optional separator, a month,
an optional separator, and then a day:
```{r}
parse_date("2010-10-01")
```
* `parse_time()` expects an hour, an optional colon, a minute,
an optional colon, optional seconds, and an optional am/pm specifier:
```{r}
library(hms)
parse_time("20:10:01")
```
Base R doesn't have a great built-in class for time data, so we use
the one provided in the hms package.
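To see the underlying representations described at the start of this section, the chunk below (an illustrative sketch, not from the original text) parses values one unit past each origin and strips the class with `as.numeric()`. It assumes readr's default locale, whose time zone is UTC.

```{r}
library(readr)

# A date is stored as the number of days since 1970-01-01
date_1 <- parse_date("1970-01-02")
as.numeric(date_1)

# A date time is stored as seconds since 1970-01-01 00:00:00 UTC
datetime_1 <- parse_datetime("1970-01-01T00:01:00")
as.numeric(datetime_1)

# A time (an hms object) is stored as seconds since midnight
time_1 <- parse_time("00:01:00")
as.numeric(time_1)

# The optional am/pm specifier shifts afternoon times by 12 hours
time_pm <- parse_time("01:10 pm")
as.numeric(time_pm)
```

Because all three are plain numbers underneath, arithmetic such as subtracting two dates works on these units directly.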
If these defaults don't work for your data you can supply your own datetime formats, built up of the following pieces:
Year
: `%Y` (4 digits).
@ -335,16 +340,16 @@ parse_date("01/02/15", "%y/%m/%d")
If you're using `%b` or `%B` with non-English month names, you'll need to set the `date_names` argument to `locale()`. See the list of built-in languages in `date_names_langs()`, or if your language is not already included, create your own with `date_names()`.
```{r}
locale("fr")
parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
```
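As a sketch of the last option, the chunk below builds a `date_names()` object by hand and passes it to `locale()` in place of a language code. Latin is a hypothetical stand-in for a language you might need to add yourself; the month and day names are typed in for illustration and are not something readr provides.

```{r}
library(readr)

# Check whether a language is already built in
"fr" %in% date_names_langs()

# Hypothetical custom names: 12 months and 7 days (starting with Sunday)
latin <- date_names(
  mon = c("Ianuarius", "Februarius", "Martius", "Aprilis", "Maius", "Iunius",
          "Iulius", "Augustus", "September", "October", "November", "December"),
  day = c("dies Solis", "dies Lunae", "dies Martis", "dies Mercurii",
          "dies Iovis", "dies Veneris", "dies Saturni")
)

# Pass the custom names to locale() just like a language code
march_1 <- parse_date("1 Martius 2015", "%d %B %Y", locale = locale(latin))
march_1
```

Abbreviated names (`mon_ab`, `day_ab`) default to the full names here; supply them separately if your data uses `%b` or `%a`.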
### Exercises
1. What are the most important arguments to `locale()`? If you live
outside the US, create a new locale object that encapsulates the
settings for the types of file you read most commonly.
1. What's the difference between `read_csv()` and `read_csv2()`?
1. I didn't discuss the `date_format` and `time_format` options to
`locale()`. What do they do? Construct an example that shows when they
@ -353,6 +358,19 @@ parse_date("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
1. What are the most common encodings used in Europe? What are the
most common encodings used in Asia?
1. Generate the correct format string to parse each of the following
dates and times:
```{r}
d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- "August 19 (2015)"
d5 <- "12/30/14" # Dec 30, 2014
t1 <- "1705"
t2 <- "11:15:10.12 PM"
```
## Parsing a file
Now that you've learned how to parse an individual vector, it's time to turn back and explore how readr parses a file. There are three new things that you'll learn about in this section: