More @jennybc comments

hadley 2016-10-07 08:16:20 -05:00
parent daaa861f74
commit 6afdb03666
4 changed files with 14 additions and 10 deletions

@ -285,14 +285,14 @@ Encodings are a rich and complex topic, and I've only scratched the surface here
### Factors {#readr-factors}
R uses factors to represent categorical variables that have a known set of possible values. Given `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
R uses factors to represent categorical variables that have a known set of possible values. Give `parse_factor()` a vector of known `levels` to generate a warning whenever an unexpected value is present:
```{r}
fruit <- c("apple", "banana")
parse_factor(c("apple", "banana", "bananana"), levels = fruit)
```
If you have problematic entries, it's often easier to read in as strings and then use the tools you'll learn about in [strings] and [factors] to clean them up.
But if you have many problematic entries, it's often easier to leave them as character vectors and then use the tools you'll learn about in [strings] and [factors] to clean them up.
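A minimal sketch of that workflow, assuming readr is installed: keep the raw values as a character vector, repair the bad entry, and only then convert with `parse_factor()`. The misspelled value and its fix are made-up examples, not data from the text.

```r
library(readr)

fruit <- c("apple", "banana")
raw <- c("apple", "banana", "bananana")  # hypothetical input with one typo

# Keep the raw values as a character vector and repair the bad entry first
clean <- raw
clean[clean == "bananana"] <- "banana"

# Converting after cleaning: every value now matches a known level
fct <- parse_factor(clean, levels = fruit)
fct
```

Cleaning before conversion means the factor never contains the unexpected value as `NA`, so no information is lost along the way.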
### Dates, date-times, and times {#readr-datetimes}

@ -90,10 +90,6 @@ For nycflights13:
it contained weather records for all airports in the USA, what additional
relation would it define with `flights`?
1. You might expect that there's an implicit relationship between plane
and airline, because each plane is flown by a single airline. Confirm
or reject this hypothesis using data.
1. We know that some days of the year are "special", and fewer people than
usual fly on them. How might you represent that data as a data frame?
What would be the primary keys of that table? How would it connect to the
@ -531,6 +527,10 @@ flights %>%
1. What does `anti_join(flights, airports, by = c("dest" = "faa"))` tell you?
What does `anti_join(airports, flights, by = c("faa" = "dest"))` tell you?
1. You might expect that there's an implicit relationship between plane
and airline, because each plane is flown by a single airline. Confirm
or reject this hypothesis using the tools you've learned above.
## Join problems
The data you've been working with in this chapter has been cleaned up so that you'll have as few problems as possible. Your own data is unlikely to be so nice, so there are a few things that you should do with your own data to make your joins go smoothly.

@ -158,6 +158,9 @@ The main reason that some older functions don't work with tibble is the `[` func
df[, c("abc", "xyz")]
```
1. If you have the name of a variable stored in an object, e.g. `var <- "mpg"`,
how can you extract the referenced variable from a tibble?
1. Practice referring to non-syntactic names in the following data frame by:
1. Extracting the variable called `1`.

@ -340,7 +340,8 @@ table5 %>%
do? Why would you set it to `FALSE`?
1. Compare and contrast `separate()` and `extract()`. Why are there
three variations of separation, but only one unite?
three variations of separation (by position, by separator, and with
groups), but only one unite?
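The three variations named above can be sketched on a tiny made-up tibble, assuming tidyr and tibble are installed: `separate()` with a character `sep` splits on a separator, `separate()` with a numeric `sep` splits at a position, and `extract()` splits using regular-expression groups. `unite()` is the single inverse.

```r
library(tibble)
library(tidyr)

df <- tibble(x = c("a-1", "b-2"))  # hypothetical example data

# 1. By separator: `sep` is a regular expression
by_sep <- separate(df, x, into = c("letter", "number"), sep = "-")

# 2. By position: a numeric `sep` splits after that many characters
by_pos <- separate(df, x, into = c("letter", "rest"), sep = 1)

# 3. With groups: each capture group in `regex` becomes one column
by_groups <- extract(df, x, into = c("letter", "number"),
                     regex = "([a-z])-([0-9])")

# The one inverse: unite() pastes columns back together
united <- unite(by_sep, "x", letter, number, sep = "-")
```

Note that the split columns stay character vectors unless you pass `convert = TRUE`.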
## Missing values
@ -441,7 +442,7 @@ The best place to start is almost always to gather together the columns that are
in the variable names (e.g. `new_sp_m014`, `new_ep_m014`, `new_ep_f014`)
these are likely to be values, not variables.
So we need to gather together all the columns from `new_sp_m3544` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
So we need to gather together all the columns from `new_sp_m014` to `newrel_f65`. We don't know what those values represent yet, so we'll give them the generic name `"key"`. We know the cells represent the count of cases, so we'll use the variable `cases`. There are a lot of missing values in the current representation, so for now we'll use `na.rm` just so we can focus on the values that are present.
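To see what `na.rm` does during the gather, here is a sketch on a made-up miniature table (not the real `who` data), assuming tidyr and tibble are installed. Only the column names mimic the `who` naming scheme.

```r
library(tibble)
library(tidyr)

# Hypothetical miniature table: column names encode values, cells hold counts
toy <- tibble(
  country = c("A", "B"),
  year = c(2000L, 2000L),
  new_sp_m014 = c(5L, NA),
  newrel_f65 = c(NA, 3L)
)

# Gather the range of columns into key/cases pairs; na.rm = TRUE drops
# the rows whose count is missing instead of keeping them as NA
toy_long <- gather(toy, new_sp_m014:newrel_f65,
                   key = "key", value = "cases", na.rm = TRUE)
toy_long
```

Of the four gathered cells, only the two non-missing counts survive, one per country.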
```{r}
who1 <- who %>%
@ -539,10 +540,10 @@ who %>%
missing values? What's the difference between an `NA` and zero?
1. What happens if you neglect the `mutate()` step?
(`mutate(key = stringr::str_replace(key, "newrel", "new_rel"))`)
1. I claimed that `iso2` and `iso3` were redundant with `country`.
Confirm my claim by creating a table that uniquely maps from `country`
to `iso2` and `iso3`.
Confirm this claim.
1. For each country, year, and sex compute the total number of cases of
TB. Make an informative visualisation of the data.