Add brief mention of factors

Fixes #212
hadley 2016-10-03 12:45:50 -05:00
parent e8a9f8da6b
commit 5ea91ca7ef
1 changed file with 15 additions and 1 deletion

@@ -190,6 +190,9 @@ Using parsers is mostly a matter of understanding what's available and how they
1. `parse_character()` seems so simple that it shouldn't be necessary. But
one complication makes it quite important: character encodings.
1. `parse_factor()` creates factors, the data structure that R uses to represent
categorical variables with fixed and known values.
1. `parse_datetime()`, `parse_date()`, and `parse_time()` allow you to
parse various date & time specifications. These are the most complicated
because there are so many different ways of writing dates.
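As a rough sketch of the date and time parsers listed above, assuming readr is installed: the input values are made up for illustration, and each is in ISO 8601 form, so no format string is needed.

```r
library(readr)

# Illustrative inputs only; all three are in ISO 8601 form.
d  <- parse_date("2010-10-01")
tm <- parse_time("20:10:01")
dt <- parse_datetime("2010-10-01T2010")

# Each parser returns a different class of object.
class(d)
class(tm)
class(dt)
```

Inputs in other formats would need an explicit `format` argument.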
@@ -240,7 +243,7 @@ parse_number("123.456.789", locale = locale(grouping_mark = "."))
parse_number("123'456'789", locale = locale(grouping_mark = "'"))
```
### Character
### Strings {#readr-strings}
It seems like `parse_character()` should be really simple --- it could just return its input. Unfortunately life isn't so simple, as there are multiple ways to represent the same string. To understand what's going on, we need to dive into the details of how computers represent strings. In R, we can get at the underlying representation of a string using `charToRaw()`:
@@ -280,6 +283,17 @@ The first argument to `guess_encoding()` can either be a path to a file, or, as
Encodings are a rich and complex topic, and I've only scratched the surface here. If you'd like to learn more I'd recommend reading the detailed explanation at <http://kunststube.net/encoding/>.
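To make the encoding point concrete, here is a minimal sketch, assuming readr is installed. The string is a made-up example in which the byte `f1` stands for `ñ` in Latin-1. Telling `parse_character()` the source encoding through `locale()` should give back readable UTF-8 text:

```r
library(readr)

# "El Niño" written with the Latin-1 byte 0xf1 for ñ (illustrative input).
x <- "El Ni\xf1o"

# Inspect the raw bytes that R is actually holding.
charToRaw(x)

# Declare the source encoding so readr can re-encode the string as UTF-8.
y <- parse_character(x, locale = locale(encoding = "Latin1"))
y
```

If the encoding is declared wrongly, the result will typically be garbled rather than an error, so it is worth checking the output by eye.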
### Factors {#readr-factors}
R uses factors to represent categorical variables that have a known set of possible values. Give `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
```{r}
fruit <- c("apple", "banana")
parse_factor(c("apple", "banana", "bananana"), levels = fruit)
```
If you have problematic entries, it's often easier to read them in as strings and then use the tools you'll learn about in [strings] and [factors] to clean them up.
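One way that read-then-clean workflow might look is sketched below, assuming readr is installed. The clean-up rule (treating `"bananana"` as a misspelling of `"banana"`) is hypothetical and stands in for whatever repair your data actually needs:

```r
library(readr)

fruit <- c("apple", "banana")
raw <- c("apple", "banana", "bananana")

# Read the values as plain strings first, so nothing is flagged yet.
x <- parse_character(raw)

# Hypothetical clean-up rule: treat the known misspelling as "banana".
x[x == "bananana"] <- "banana"

# Every value is now a known level, so no unexpected values remain.
f <- parse_factor(x, levels = fruit)
f
```

Doing the repair on strings keeps the original values visible until you have decided how each one should be handled.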
### Dates, date-times, and times {#readr-datetimes}
You pick between three parsers depending on whether you want a date (the number of days since 1970-01-01), a date-time (the number of seconds since midnight 1970-01-01), or a time (the number of seconds since midnight). When called without any additional arguments: