diff --git a/import.Rmd b/import.Rmd
index 3b3bca0..9a07d15 100644
--- a/import.Rmd
+++ b/import.Rmd
@@ -548,7 +548,7 @@ There are a few other general strategies to help you parse files:
     frame.

     ```{r}
-    df <- tibble::tibble(
+    df <- tibble(
       x = c("1", "2", "3"),
       y = c("1.21", "2.32", "4.56")
     )
diff --git a/iteration.Rmd b/iteration.Rmd
index 64ca68f..56809ae 100644
--- a/iteration.Rmd
+++ b/iteration.Rmd
@@ -31,7 +31,7 @@ library(tidyverse)
 Imagine we have this simple tibble:

 ```{r}
-df <- tibble::tibble(
+df <- tibble(
   a = rnorm(10),
   b = rnorm(10),
   c = rnorm(10),
@@ -172,7 +172,7 @@ There are four variations on the basic theme of the for loop:
 Sometimes you want to use a for loop to modify an existing object. For example, remember our challenge from [functions]. We wanted to rescale every column in a data frame:

 ```{r}
-df <- tibble::tibble(
+df <- tibble(
   a = rnorm(10),
   b = rnorm(10),
   c = rnorm(10),
@@ -373,7 +373,7 @@ For loops are not as important in R as they are in other languages because R is
 To see why this is important, consider (again) this simple data frame:

 ```{r}
-df <- tibble::tibble(
+df <- tibble(
   a = rnorm(10),
   b = rnorm(10),
   c = rnorm(10),
@@ -781,7 +781,7 @@ knitr::include_graphics("diagrams/lists-pmap-named.png")
 Since the arguments are all the same length, it makes sense to store them in a data frame:

 ```{r}
-params <- tibble::tribble(
+params <- tribble(
   ~mean, ~sd, ~n,
   5, 1, 1,
   10, 5, 3,
@@ -818,7 +818,7 @@ knitr::include_graphics("diagrams/lists-invoke.png")

 The first argument is a list of functions or character vector of function names. The second argument is a list of lists giving the arguments that vary for each function. The subsequent arguments are passed on to every function.

-And again, you can use `tibble::tribble()` to make creating these matching pairs a little easier:
+And again, you can use `tribble()` to make creating these matching pairs a little easier:

 ```{r, eval = FALSE}
 sim <- tribble(
@@ -918,12 +918,12 @@ Sometimes you have a complex list that you want to reduce to a simple list by re

 ```{r}
 dfs <- list(
-  age = tibble::tibble(name = "John", age = 30),
-  sex = tibble::tibble(name = c("John", "Mary"), sex = c("M", "F")),
-  trt = tibble::tibble(name = "Mary", treatment = "A")
+  age = tibble(name = "John", age = 30),
+  sex = tibble(name = c("John", "Mary"), sex = c("M", "F")),
+  trt = tibble(name = "Mary", treatment = "A")
 )

-dfs %>% reduce(dplyr::full_join)
+dfs %>% reduce(full_join)
 ```

 Or maybe you have a list of vectors, and want to find the intersection:
@@ -970,7 +970,7 @@ x %>% accumulate(`+`)
     But it has a number of bugs as illustrated with the following inputs:

     ```{r, eval = FALSE}
-    df <- tibble::tibble(
+    df <- tibble(
       x = 1:3,
       y = 3:1,
       z = c("a", "b", "c")
diff --git a/model-assess.Rmd b/model-assess.Rmd
index 3d1a933..2957609 100644
--- a/model-assess.Rmd
+++ b/model-assess.Rmd
@@ -225,7 +225,7 @@ Both the boostrap and cross-validation are build on top of a "resample" object.
 These functions return an object of class "resample", which represents the resample in a memory efficient way. Instead of storing the resampled dataset itself, it instead stores the integer indices, and a "pointer" to the original dataset. This makes resamples take up much less memory.

 ```{r}
-x <- resample_bootstrap(tibble::as_tibble(mtcars))
+x <- resample_bootstrap(as_tibble(mtcars))
 class(x)

 x
@@ -268,7 +268,7 @@ When you start dealing with many models, it's helpful to have some rough way of
 One way to capture the quality of the model is to summarise the distribution of the residuals. For example, you could look at the quantiles of the absolute residuals. For this dataset, 25% of predictions are less than \$7,400 away, and 75% are less than \$25,800 away. That seems like quite a bit of error when predicting someone's income!

 ```{r}
-heights <- tibble::as_tibble(readRDS("data/heights.RDS"))
+heights <- as_tibble(readRDS("data/heights.RDS"))
 h <- lm(income ~ height, data = heights)
 h

diff --git a/model-basics.Rmd b/model-basics.Rmd
index 21e1c6d..d14dc6b 100644
--- a/model-basics.Rmd
+++ b/model-basics.Rmd
@@ -324,7 +324,7 @@ You've seen formulas before when using `facet_wrap()` and `facet_grid()`. In R,
 The majority of modelling functions in R use a standard conversion from formulas to functions. You've seen one simple conversion already: `y ~ x` is translated to `y = a_1 + a_2 * x`. If you want to see what R actually does, you can use the `model_matrix()` function. It takes a data frame and a formula and returns a tibble that defines the model equation: each column in the output is associated with one coefficient in the model, the function is always `y = a_1 * out1 + a_2 * out_2`. For the simplest case of `y ~ x1` this shows us something interesting:

 ```{r}
-df <- tibble::tribble(
+df <- tribble(
   ~y, ~x1, ~x2,
   4, 2, 5,
   5, 1, 6
@@ -353,7 +353,7 @@ The following sections expand on how this formula notation works for categorcal
 Generating a function from a formula is straight forward when the predictor is continuous, but things get a bit more complicated when the predictor is categorical. Imagine you have a formula like `y ~ sex`, where sex could either be male or female. It doesn't make sense to convert that to a formula like `y = x_0 + x_1 * sex` because `sex` isn't a number - you can't multiply it! Instead what R does is convert it to `y = x_0 + x_1 * sex_male` where `sex_male` is one if `sex` is male and zero otherwise:

 ```{r, echo = FALSE}
-df <- tibble::tribble(
+df <- tribble(
   ~ sex, ~ response,
   "male", 1,
   "female", 2,
@@ -665,7 +665,7 @@ sim6 %>%
 Missing values obviously can not convey any information about the relationship between the variables, so modelling functions will drop any rows that contain missing values. R's default behaviour is to silently drop them, but `options(na.action = na.warn)` (run in the prerequisites), makes sure you get a warning.

 ```{r}
-df <- tibble::frame_data(
+df <- tribble(
   ~x, ~y,
   1, 2.2,
   2, NA,
diff --git a/model-many.Rmd b/model-many.Rmd
index ac9c36d..21aeb79 100644
--- a/model-many.Rmd
+++ b/model-many.Rmd
@@ -368,7 +368,7 @@ df %>%
 Another example of this pattern is using the `map()`, `map2()`, `pmap()` from purrr. For example, we could take the final example from [Invoking different functions] and rewrite it to use `mutate()`:

 ```{r}
-sim <- tibble::tribble(
+sim <- tribble(
   ~f, ~params,
   "runif", list(min = -1, max = -1),
   "rnorm", list(sd = 5),
@@ -420,7 +420,7 @@ x <- list(
   c = 5:6
 )

-df <- tibble::enframe(x)
+df <- enframe(x)
 df
 ```

diff --git a/rmarkdown.Rmd b/rmarkdown.Rmd
index 748ff7a..0ef11d1 100644
--- a/rmarkdown.Rmd
+++ b/rmarkdown.Rmd
@@ -352,7 +352,7 @@ rmarkdown::render("fuel-economy.Rmd", params = list(my_class = "suv"))
 This is particularly powerful in conjunction with `purrr:pwalk()`. The following example creates a report for each value of `class` found in `mpg`.

 ```{r, eval = FALSE}
-reports <- tibble::tibble(
+reports <- tibble(
   class = unique(mpg$class),
   filename = stringr::str_c("fuel-economy-", class, ".html"),
   params = purrr::map(class, ~ list(my_class = .))
diff --git a/tidy.Rmd b/tidy.Rmd
index 9379740..770a78d 100644
--- a/tidy.Rmd
+++ b/tidy.Rmd
@@ -205,7 +205,7 @@ As you might have guessed from the common `key` and `value` arguments, `spread()
     Carefully consider the following example:

     ```{r, eval = FALSE}
-    stocks <- tibble::tibble(
+    stocks <- tibble(
       year = c(2015, 2015, 2016, 2016),
       half = c( 1, 2, 1, 2),
       return = c(1.88, 0.59, 0.92, 0.17)
@@ -231,7 +231,7 @@ As you might have guessed from the common `key` and `value` arguments, `spread()
     the problem?

     ```{r}
-    people <- tibble::tribble(
+    people <- tribble(
       ~name, ~key, ~value,
       #-----------------|--------|------
       "Phillip Woods", "age", 45,
@@ -246,7 +246,7 @@ As you might have guessed from the common `key` and `value` arguments, `spread()
     What are the variables?

     ```{r}
-    preg <- tibble::tribble(
+    preg <- tribble(
       ~pregnant, ~male, ~female,
       "yes", NA, 10,
       "no", 20, 12
@@ -329,10 +329,10 @@ table5 %>%
     Experiment with the various options for the following two toy datasets.

     ```{r, eval = FALSE}
-    tibble::tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
+    tibble(x = c("a,b,c", "d,e,f,g", "h,i,j")) %>%
       separate(x, c("one", "two", "three"))

-    tibble::tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
+    tibble(x = c("a,b,c", "d,e", "f,g,i")) %>%
       separate(x, c("one", "two", "three"))
     ```

@@ -352,7 +352,7 @@ Changing the representation of a dataset brings up an important subtlety of miss
 Let's illustrate this idea with a very simple data set:

 ```{r}
-stocks <- tibble::tibble(
+stocks <- tibble(
   year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
   qtr = c( 1, 2, 3, 4, 2, 3, 4),
   return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
@@ -396,7 +396,7 @@ stocks %>%
 There's one other important tool that you should know for working with missing values. Sometimes when a data source has primarily been used for data entry, missing values indicate that the previous value should be carried forward:

 ```{r}
-treatment <- tibble::tribble(
+treatment <- tribble(
   ~ person, ~ treatment, ~response,
   "Derrick Whitmore", 1, 7,
   NA, 2, 10,
diff --git a/transform.Rmd b/transform.Rmd
index b487493..3c6bb84 100644
--- a/transform.Rmd
+++ b/transform.Rmd
@@ -625,7 +625,7 @@ When I plot the skill of the batter (measured by the batting average, `ba`) agai

 ```{r}
 # Convert to a tibble so it prints nicely
-batting <- tibble::as_tibble(Lahman::Batting)
+batting <- as_tibble(Lahman::Batting)

 batters <- batting %>%
   group_by(playerID) %>%
diff --git a/vectors.Rmd b/vectors.Rmd
index 842fee5..52124cf 100644
--- a/vectors.Rmd
+++ b/vectors.Rmd
@@ -268,11 +268,11 @@ Here, R will expand the shortest vector to the same length as the longest, so ca
 While vector recycling can be used to create very succinct, clever code, it can also silently conceal problems. For this reason, the vectorised functions in tidyverse will throw errors when you recycle anything other than a scalar. If you do want to recycle, you'll need to do it yourself with `rep()`:

 ```{r, error = TRUE}
-tibble::tibble(x = 1:4, y = 1:2)
+tibble(x = 1:4, y = 1:2)

-tibble::tibble(x = 1:4, y = rep(1:2, 2))
+tibble(x = 1:4, y = rep(1:2, 2))

-tibble::tibble(x = 1:4, y = rep(1:2, each = 2))
+tibble(x = 1:4, y = rep(1:2, each = 2))
 ```

 ### Naming vectors
@@ -286,7 +286,7 @@ c(x = 1, y = 2, z = 4)
 Or after the fact with `purrr::set_names()`:

 ```{r}
-purrr::set_names(1:3, c("a", "b", "c"))
+set_names(1:3, c("a", "b", "c"))
 ```

 Named vectors are most useful for subsetting, described next.
diff --git a/visualize.Rmd b/visualize.Rmd
index 6d54577..9003474 100644
--- a/visualize.Rmd
+++ b/visualize.Rmd
@@ -498,7 +498,7 @@ Stats are the most subtle part of plotting because you can't see them directly.
     me map the height of the bars to the raw values of a $y$ variable.

     ```{r}
-    demo <- tibble::tibble(
+    demo <- tibble(
       a = c("bar_1", "bar_2", "bar_3"),
       b = c(20, 30, 40)
     )