r4ds/tibble.Rmd

# Tibbles

## Introduction

Throughout this book we work with "tibbles" instead of the traditional data frame. Tibbles _are_ data frames, but tweak some older behaviours to make life a littler easier. R is an old language, and some things that were useful 10 or 20 years ago now get in your way. It's difficult to change base R without breaking existing code, so most innovation occurs in packages. Here we will describe the tibble package, which provides opinionated data frames that make working in the tidyverse a little easier. 

If this chapter leaves you wanting to learn even more about tibbles, you can read more about them in the vignette that is include in the tibble package: `vignette("tibble")`.

### Prerequisites

In this chapter we'll specifically explore the __tibble__ package. Most chapters don't load tibble explicitly, because most of the functions you'll use from tibble are automatically provided by dplyr. You'll only need if you are creating tibbles "by hand".

```{r setup}
library(tibble)
```

## Creating tibbles {#tibbles}

The majority of the functions that you'll use in this book already produce tibbles. If you're working with functions from other packages, you might need to coerce a regular data frame to a tibble. You can do that with `as_tibble()`:

```{r}
as_tibble(iris)
```

`as_tibble()` knows how to convert data frames, lists (provided the elements are vectors with the same length), matrices, and tables. 

You can create a new tibble from individual vectors with `tibble()`:

```{r}
tibble(x = 1:5, y = 1, z = x ^ 2 + y)
```

`tibble()` automatically recycles inputs of length 1, and you can refer to variables that you just created. If you're already familiar with `data.frame()`, note that `tibble()` does much less: it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row names.

It's possible for a tibble to have column names that are not valid R variable names, called __non-syntactic__ names. For example, they might not start with a letter, or they might contain unusual values like a space. To refer to these variables, you need to surround them with backticks, `` ` ``:

```{r}
tb <- tibble(
  `:)` = "smile", 
  ` ` = "space",
  `2000` = "number"
)
tb
```

Another way to create a tibble is with `frame_data()`, which is customised for data entry in R code. Column headings are defined by formulas (`~`), and entries are separated by commas:

```{r}
frame_data(
  ~x, ~y, ~z,
  #--|--|----
  "a", 2, 3.6,
  "b", 1, 8.5
)
```

### Exercises

1.  How can you tell if an object is a tibble?

1.  What does `enframe()` do? When might you use it?

1.  Practice referring to non-syntactic names by:

    1.  Plotting a scatterplot of `1` vs `2`.

    1.  Creating a new column called `3` which is `2` divided by `1`.
        
    1.  Renaming the columns to `one`, `two` and `three`. 
    
    ```{r}
    annoying <- tibble(
      `1` = 1:10,
      `2` = `1` * 2 + rnorm(length(`1`))
    )
    ```

## Tibbles vs. data frames

There are two main differences in the usage of a data frame vs a tibble: printing, and subsetting.

### Printing

Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data. In addition to its name, each column reports its type, a nice feature borrowed from `str()`:

```{r}
tibble(
  a = lubridate::now() + runif(1e3) * 60,
  b = lubridate::today() + runif(1e3),
  c = 1:1e3,
  d = runif(1e3),
  e = sample(letters, 1e3, replace = TRUE)
)
```

To show all the columns in a single tibble, explicitly call `print()` with `width = Inf`:

```{r, eval = FALSE}
nycflights13::flights %>% 
  print(width = Inf)
```

You can also get a scrollable view of the complete data set using RStudio's built-in data viewer. This is often useful at the end of a long chain of manipulations.

```{r, eval = FALSE}
nycflights13::flights %>% View()
```

You can also control the default appearance globally, by setting options:

* `options(tibble.print_max = n, tibble.print_min = m)`: if more than `m`
  rows, print `n` rows. Use `options(dplyr.print_max = Inf)` to always
  show all rows.

* `options(tibble.width = Inf)` will always print all columns, regardless
   of the width of the screen.

You can see a complete list of options by looking at the package help: `package?tibble`.

### Subsetting

Tibbles are strict about subsetting. If you try to access a variable that does not exist, you'll get a warning. Unlike data frames, tibbles do not use partial matching on column names:

```{r}
df <- data.frame(
  abc = 1:10, 
  def = runif(10), 
  xyz = sample(letters, 10)
)
tb <- as_tibble(df)

df$a
tb$a
```

Tibbles clearly delineate `[` and `[[`: `[` always returns another tibble, `[[` always returns a vector.

```{r}
# With data frames, [ sometimes returns a data frame, and sometimes returns 
# a vector
df[, "abc"]

# With tibbles, [ always returns another tibble
tb[, "abc"]

# To extract a single element, you should always use [[
tb[["abc"]]
```

This is useful to know if you want to extract a single column at the end of dplyr pipeline.

### Exercises

1.  How can you print all rows of a tibble?

1.  What option controls how many additional column names are printed
    at the footer of a tibble?

## Interacting with legacy code

Some older functions don't work with tibbles because they expect `df[, 1]` to return a vector, not a data frame. If you encounter one of these functions, use `as.data.frame()` to turn a tibble back to a data frame:

```{r}
class(as.data.frame(tb))
```
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			`# Tibbles`

			`## Introduction`

Tibble tweaks 2016-07-25 04:04:41 +08:00			Throughout this book we work with "tibbles" instead of the traditional data frame. Tibbles _are_ data frames, but tweak some older behaviours to make life a littler easier. R is an old language, and some things that were useful 10 or 20 years ago now get in your way. It's difficult to change base R without breaking existing code, so most innovation occurs in packages. Here we will describe the tibble package, which provides opinionated data frames that make working in the tidyverse a little easier.

Tibble tweaks 2016-07-27 20:47:42 +08:00			If this chapter leaves you wanting to learn even more about tibbles, you can read more about them in the vignette that is include in the tibble package: `vignette("tibble")`.
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			`### Prerequisites`

Tibble tweaks 2016-07-27 20:47:42 +08:00			`In this chapter we'll specifically explore the __tibble__ package. Most chapters don't load tibble explicitly, because most of the functions you'll use from tibble are automatically provided by dplyr. You'll only need if you are creating tibbles "by hand".`
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			```{r setup}
			`library(tibble)`
			```

			`## Creating tibbles {#tibbles}`

Update tibble.Rmd (#193) Inserted to in "you might need to coerce a regular data frame a tibble" 2016-07-26 20:42:19 +08:00			The majority of the functions that you'll use in this book already produce tibbles. If you're working with functions from other packages, you might need to coerce a regular data frame to a tibble. You can do that with `as_tibble()`:
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			```{r}
			`as_tibble(iris)`
			```

Tibble tweaks 2016-07-25 04:04:41 +08:00			`as_tibble()` knows how to convert data frames, lists (provided the elements are vectors with the same length), matrices, and tables.
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			You can create a new tibble from individual vectors with `tibble()`:

			```{r}
			`tibble(x = 1:5, y = 1, z = x ^ 2 + y)`
			```

Tibble tweaks 2016-07-27 20:47:42 +08:00			`tibble()` automatically recycles inputs of length 1, and you can refer to variables that you just created. If you're already familiar with `data.frame()`, note that `tibble()` does much less: it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row names.
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
Tibble tweaks 2016-07-27 20:47:42 +08:00			It's possible for a tibble to have column names that are not valid R variable names, called __non-syntactic__ names. For example, they might not start with a letter, or they might contain unusual values like a space. To refer to these variables, you need to surround them with backticks, `` ` ``:
Tidy chapter updates 2016-07-26 00:28:05 +08:00
			```{r}
			`tb <- tibble(`
			`:)` = "smile",
			` ` = "space",
			`2000` = "number"
			`)`
			`tb`
			```

Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			Another way to create a tibble is with `frame_data()`, which is customised for data entry in R code. Column headings are defined by formulas (`~`), and entries are separated by commas:

			```{r}
			`frame_data(`
Tibble tweaks 2016-07-27 20:47:42 +08:00			`~x, ~y, ~z,`
			`#--\|--\|----`
			`"a", 2, 3.6,`
			`"b", 1, 8.5`
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			`)`
			```

Tibble tweaks 2016-07-25 04:04:41 +08:00			`### Exercises`

Tibble tweaks 2016-07-27 20:47:42 +08:00			`1. How can you tell if an object is a tibble?`
Tibble tweaks 2016-07-25 04:04:41 +08:00
			1. What does `enframe()` do? When might you use it?

Tidy chapter updates 2016-07-26 00:28:05 +08:00			`1. Practice referring to non-syntactic names by:`

			1. Plotting a scatterplot of `1` vs `2`.

			1. Creating a new column called `3` which is `2` divided by `1`.

			1. Renaming the columns to `one`, `two` and `three`.

			```{r}
			`annoying <- tibble(`
			`1` = 1:10,
			`2` = `1` * 2 + rnorm(length(`1`))
			`)`
			```

Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			`## Tibbles vs. data frames`

			`There are two main differences in the usage of a data frame vs a tibble: printing, and subsetting.`

			`### Printing`

			Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data. In addition to its name, each column reports its type, a nice feature borrowed from `str()`:

			```{r}
			`tibble(`
			`a = lubridate::now() + runif(1e3) * 60,`
			`b = lubridate::today() + runif(1e3),`
			`c = 1:1e3,`
			`d = runif(1e3),`
			`e = sample(letters, 1e3, replace = TRUE)`
			`)`
			```

Tibble printing tip 2016-07-22 22:33:12 +08:00			To show all the columns in a single tibble, explicitly call `print()` with `width = Inf`:

			```{r, eval = FALSE}
			`nycflights13::flights %>%`
			`print(width = Inf)`
			```

Tibble tweaks 2016-07-27 20:47:42 +08:00			`You can also get a scrollable view of the complete data set using RStudio's built-in data viewer. This is often useful at the end of a long chain of manipulations.`
Tidy chapter updates 2016-07-26 00:28:05 +08:00
			```{r, eval = FALSE}
			`nycflights13::flights %>% View()`
			```

Tibble tweaks 2016-07-27 20:47:42 +08:00			`You can also control the default appearance globally, by setting options:`

			* `options(tibble.print_max = n, tibble.print_min = m)`: if more than `m`
			rows, print `n` rows. Use `options(dplyr.print_max = Inf)` to always
			`show all rows.`

			* `options(tibble.width = Inf)` will always print all columns, regardless
			`of the width of the screen.`

			You can see a complete list of options by looking at the package help: `package?tibble`.

Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			`### Subsetting`

Tibble tweaks 2016-07-27 20:47:42 +08:00			`Tibbles are strict about subsetting. If you try to access a variable that does not exist, you'll get a warning. Unlike data frames, tibbles do not use partial matching on column names:`
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			```{r}
			`df <- data.frame(`
			`abc = 1:10,`
			`def = runif(10),`
			`xyz = sample(letters, 10)`
			`)`
			`tb <- as_tibble(df)`

			`df$a`
			`tb$a`
			```

			Tibbles clearly delineate `[` and `[[`: `[` always returns another tibble, `[[` always returns a vector.

			```{r}
			`# With data frames, [ sometimes returns a data frame, and sometimes returns`
			`# a vector`
Tibble tweaks 2016-07-25 04:04:41 +08:00			`df[, "abc"]`
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			`# With tibbles, [ always returns another tibble`
Tibble tweaks 2016-07-25 04:04:41 +08:00			`tb[, "abc"]`
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00
			`# To extract a single element, you should always use [[`
Tibble tweaks 2016-07-25 04:04:41 +08:00			`tb[["abc"]]`
Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			```

Tibble tweaks 2016-07-25 04:04:41 +08:00			`This is useful to know if you want to extract a single column at the end of dplyr pipeline.`

			`### Exercises`

			`1. How can you print all rows of a tibble?`

			`1. What option controls how many additional column names are printed`
			`at the footer of a tibble?`

Move tibble to own chapter. 2016-07-19 21:57:22 +08:00			`## Interacting with legacy code`

			Some older functions don't work with tibbles because they expect `df[, 1]` to return a vector, not a data frame. If you encounter one of these functions, use `as.data.frame()` to turn a tibble back to a data frame:

			```{r}
			`class(as.data.frame(tb))`
			```