From 445b1a0748f00f6abdb839321fd7eec744f0d693 Mon Sep 17 00:00:00 2001
From: hadley
Date: Fri, 20 Nov 2015 06:18:36 +1300
Subject: [PATCH] Add exercises. Update adverbs

---
 lists.Rmd | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/lists.Rmd b/lists.Rmd
index 87d8e15..f08a646 100644
--- a/lists.Rmd
+++ b/lists.Rmd
@@ -228,6 +228,24 @@ compute_summary(x, mean)
 
 Instead of hardcoding the summary function, we allow it to vary, by adding an addition argument that is a function. It can take a while to wrap your head around this, but it's very powerful technique. This is one of the reasons that R is known as a "functional" programming language.
 
+### Exercises
+
+1. Read the documentation for `apply()`. In the 2d case, what two for loops
+   does it generalise?
+
+1. It's common to see for loops that don't preallocate the output and instead
+   increase the length of a vector at each step:
+
+    ```{r}
+    results <- vector("integer", 0)
+    for (i in seq_along(x)) {
+      results <- c(results, lengths(x[[i]]))
+    }
+    results
+    ```
+
+   How does this impact performance?
+
 ## The map functions
 
 This pattern of looping over a list and doing something to each element is so common that the purrr package provides a family of functions to do it for you. Each function always returns the same type of output so there are six variations based on what sort of result you want:
@@ -304,14 +322,25 @@ If you're familiar with the apply family of functions in base R, you might have
   `map_lgl(df, is.numeric)`. One advantage to `vapply()` over the map functions is that it can also produce matrices.
 
+### Exercises
+
+1. How can you determine which columns in a data frame are factors?
+   (Hint: data frames are lists.)
+
+1. What happens when you use the map functions on vectors that aren't lists?
+   What does `map(1:5, runif)` do? Why?
+
+1. What does `map(-2:2, rnorm, n = 5)` do? Why?
+
 ## Pipelines
 
-`map()` is particularly useful when constructing more complex transformations because it both inputs and outputs a list. That makes it well suited for solving a problem a piece at a time. For example, imagine you want to fit a linear model to each individual in a dataset.
+`map()` is particularly useful when constructing more complex transformations because it both inputs and outputs a list. That makes it well suited for solving a problem a piece at a time.
 
-Let's start by working through the whole process on the complete dataset. It's always a good idea to start simple (with a single object), and figure out the basic workflow. Then you can generalise up to the harder problem of applying the same steps to multiple models.
 
 TODO: find interesting dataset
 
+For example, imagine you want to fit a linear model to each individual in a dataset. Let's start by working through the whole process on the complete dataset. It's always a good idea to start simple (with a single object), and figure out the basic workflow. Then you can generalise up to the harder problem of applying the same steps to multiple models.
+
 You could start by creating a list where each element is a data frame for a different person:
 
 ```{r}
@@ -407,12 +436,12 @@ Other predicate functionals: `head_while()`, `tail_while()`, `some()`, `every()`
 
 When you start doing many operations with purrr, you'll soon discover that not everything always succeeds. For example, you might be fitting a bunch of more complicated models, and not every model will converge. How do you ensure that one bad apple doesn't ruin the whole barrel?
 
-Dealing with errors is fundamentally painful because errors are sort of a side-channel to the way that functions usually return values. The best way to handle them is to turn them into a regular output with the `safe()` function. This function is similar to the `try()` function in base R, but instead of sometimes returning the original output and sometimes returning a error, `safe()` always returns the same type of object: a list with elements `result` and `error`. For any given run, one will always be `NULL`, but because the structure is always the same its easier to deal with.
+Dealing with errors is fundamentally painful because errors are sort of a side-channel to the way that functions usually return values. The best way to handle them is to turn them into a regular output with the `safely()` function. This function is similar to the `try()` function in base R, but instead of sometimes returning the original output and sometimes returning an error, `safely()` always returns the same type of object: a list with elements `result` and `error`. For any given run, one will always be `NULL`, but because the structure is always the same it's easier to deal with.
 
 Let's illustrate this with a simple example: `log()`:
 
 ```{r}
-safe_log <- safe(log)
+safe_log <- safely(log)
 str(safe_log(10))
 str(safe_log("a"))
 ```
@@ -459,10 +488,10 @@ dplyr::filter(all, is_ok)
 
 Other related functions:
 
-* `maybe()`: if you don't care about the error message, and instead
+* `possibly()`: if you don't care about the error message, and instead
   just want a default value on failure.
 
-* `outputs()`: does a similar job but for other outputs like printed
+* `quietly()`: does a similar job but for other outputs like printed
   ouput, messages, and warnings.
 
 Challenge: read all the csv files in this directory. Which ones failed
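
The three adverbs this patch renames can be compared side by side in a minimal sketch. It is not part of the commit: the `inputs` list and the variable names `is_ok`, `defaults`, and `captured` are made up for illustration, and the code assumes a purrr version that exports `safely()`, `possibly()`, and `quietly()`.

```r
library(purrr)

# Hypothetical inputs: two valid numbers and one value that makes log() fail.
inputs <- list(1, 10, "a")

# safely() wraps log() so every call returns list(result = , error = ).
safe_log <- safely(log)
out <- map(inputs, safe_log)

# Because the shape is always the same, failures can be detected uniformly.
is_ok <- map_lgl(out, function(res) is.null(res$error))

# possibly() drops the error and substitutes a default value instead.
defaults <- map_dbl(inputs, possibly(log, otherwise = NA_real_))

# quietly() captures warnings (and messages and printed output)
# instead of signalling them; log(-1) warns that NaNs were produced.
captured <- quietly(log)(-1)

is_ok
defaults
captured$warnings
```

Which adverb fits depends on what you need afterwards: `safely()` keeps the error object for inspection, `possibly()` is enough when a placeholder value will do, and `quietly()` is for calls that succeed but are noisy.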