Update import.Rmd (#251)

Typo
S'busiso Mkhondwane 2016-08-13 16:02:48 +02:00 committed by Hadley Wickham
parent 2c5a19e5ff
commit d67c997d21
1 changed file with 3 additions and 3 deletions


@@ -250,7 +250,7 @@ charToRaw("Hadley")
Each hexadecimal number represents a byte of information: `48` is H, `61` is a, and so on. The mapping from hexadecimal number to character is called the encoding, and in this case the encoding is called ASCII. ASCII does a great job of representing English characters, because it's the __American__ Standard Code for Information Interchange.
-Things get more complicated for languages other than English. In the early days of computing there were many competing standards for encoding non-English characters, and to correctly interpret a string you need to know both the values and the encoding. For example, two common encodings are Latin1 (aka ISO-8859-1, used for Western European languages) and Latin2 (aka ISO-8859-2, used for Eastern European languages). In Latin1, the byte `b1` is "±", but in Latin2, it's "ą"! Fortunately, today there is one standard that is supported almost everywhere: UTF-8. UTF-8 can encode just about every character used by humans today, as well as many extra symbols (like emoji!).
+Things get more complicated for languages other than English. In the early days of computing there were many competing standards for encoding non-English characters, and to correctly interpret a string you needed to know both the values and the encoding. For example, two common encodings are Latin1 (aka ISO-8859-1, used for Western European languages) and Latin2 (aka ISO-8859-2, used for Eastern European languages). In Latin1, the byte `b1` is "±", but in Latin2, it's "ą"! Fortunately, today there is one standard that is supported almost everywhere: UTF-8. UTF-8 can encode just about every character used by humans today, as well as many extra symbols (like emoji!).
readr uses UTF-8 everywhere: it assumes your data is UTF-8 encoded when you read it, and always uses it when writing. This is a good default, but will fail for data produced by older systems that don't understand UTF-8. If this happens to you, your strings will look weird when you print them. Sometimes just one or two characters might be messed up; other times you'll get complete gibberish. For example:
@@ -340,7 +340,7 @@ Time
: `%M` minutes.
: `%S` integer seconds.
: `%OS` real seconds.
-: `%Z` Time zone (as name, e.g. `America/Chicago`). Beware abbreviations:
+: `%Z` Time zone (as name, e.g. `America/Chicago`). Beware of abbreviations:
if you're American, note that "EST" is a Canadian time zone that does not
have daylight savings time. It is \emph{not} Eastern Standard Time! We'll
come back to this [time zones].
@@ -628,6 +628,6 @@ To get other types of data into R, we recommend starting with the tidyverse pack
__RSQLite__, __RPostgreSQL__ etc) allows you to run SQL queries against a
database and return a data frame.
-For hierarchical data: use __jsonlite__ (by Jeroen Ooms) for json, and __xml2__ for XML. whichYou will need to convert them to data frames using the tools on [handling hierarchy].
+For hierarchical data: use __jsonlite__ (by Jeroen Ooms) for json, and __xml2__ for XML. You will need to convert them to data frames using the tools on [handling hierarchy].
For other file types, try the [R data import/export manual](https://cran.r-project.org/doc/manuals/r-release/R-data.html) and the [__rio__](https://github.com/leeper/rio) package.