Tibble tweaks

hadley 2016-07-27 07:47:42 -05:00
parent 2a93db4ff3
commit 6fdcf51930
1 changed files with 24 additions and 23 deletions

@ -4,11 +4,11 @@
Throughout this book we work with "tibbles" instead of the traditional data frame. Tibbles _are_ data frames, but they tweak some older behaviours to make life a little easier. R is an old language, and some things that were useful 10 or 20 years ago now get in your way. It's difficult to change base R without breaking existing code, so most innovation occurs in packages. Here we will describe the tibble package, which provides opinionated data frames that make working in the tidyverse a little easier.
If this chapter leaves you wanting to learn even more about tibbles, you can read more about them in the accompanying vignette: `vignette("tibble")`.
If this chapter leaves you wanting to learn even more about tibbles, you can read more about them in the vignette that is included in the tibble package: `vignette("tibble")`.
### Prerequisites
In this chapter we'll specifically explore the tibble package. Most chapters don't load tibble explicitly, because most of the functions you'll use from tibble are automatically provided by dplyr. You'll only need to load it yourself if you are creating tibbles "by hand".
In this chapter we'll specifically explore the __tibble__ package. Most chapters don't load tibble explicitly, because most of the functions you'll use from tibble are automatically provided by dplyr. You'll only need to load it yourself if you are creating tibbles "by hand".
```{r setup}
library(tibble)
@ -30,9 +30,9 @@ You can create a new tibble from individual vectors with `tibble()`:
tibble(x = 1:5, y = 1, z = x ^ 2 + y)
```
`tibble()` automatically recycles inputs of length 1, and you can refer to variables that you just created. Compared to `data.frame()`, `tibble()` does much less: it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row names.
`tibble()` automatically recycles inputs of length 1, and you can refer to variables that you just created. If you're already familiar with `data.frame()`, note that `tibble()` does much less: it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, and it never creates row names.
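As a rough illustration of the difference in name handling, the sketch below builds the same column with both constructors. The column name `my var` and the object names `df_demo` and `tb_demo` are made up for this example. Note that string-to-factor conversion in `data.frame()` depends on the R version (the default changed in R 4.0.0), so the sketch only inspects the tibble side for types.

```{r}
library(tibble)

# data.frame() rewrites a non-syntactic name by default (check.names = TRUE)
df_demo <- data.frame(`my var` = 1:3)
names(df_demo)

# tibble() keeps the name exactly as given and leaves strings as character
tb_demo <- tibble(`my var` = 1:3, label = c("a", "b", "c"))
names(tb_demo)
class(tb_demo$label)
```

Here `names(df_demo)` comes back as `"my.var"`, while the tibble keeps `"my var"` and the `label` column stays a character vector.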
It's possible for a tibble to have column names that are not valid R variables, or __non-syntactic__ names. For example, they might not start with a letter, or they might contain unusual characters like a space. To refer to these variables, you need to surround them with backticks, `` ` ``:
It's possible for a tibble to have column names that are not valid R variable names, called __non-syntactic__ names. For example, they might not start with a letter, or they might contain unusual characters like a space. To refer to these variables, you need to surround them with backticks, `` ` ``:
```{r}
tb <- tibble(
@ -47,15 +47,16 @@ Another way to create a tibble is with `frame_data()`, which is customised for d
```{r}
frame_data(
  ~x, ~y, ~z,
  "a", 2, 3.6,
  "b", 1, 8.5
  ~x, ~y, ~z,
  #--|--|----
  "a", 2, 3.6,
  "b", 1, 8.5
)
```
### Exercises
1. What function tells you if an object is a tibble?
1. How can you tell if an object is a tibble?
1. What does `enframe()` do? When might you use it?
@ -92,7 +93,20 @@ tibble(
)
```
You can control the default appearance with options:
To show all the columns in a single tibble, explicitly call `print()` with `width = Inf`:
```{r, eval = FALSE}
nycflights13::flights %>%
print(width = Inf)
```
You can also get a scrollable view of the complete data set using RStudio's built-in data viewer. This is often useful at the end of a long chain of manipulations.
```{r, eval = FALSE}
nycflights13::flights %>% View()
```
You can also control the default appearance globally, by setting options:
* `options(tibble.print_max = n, tibble.print_min = m)`: if more than `n`
rows, print only `m` rows. Use `options(tibble.print_max = Inf)` to always
@ -101,24 +115,11 @@ You can control the default appearance with options:
* `options(tibble.width = Inf)` will always print all columns, regardless
of the width of the screen.
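
One way to try these options without changing them for the rest of the session is to save the previous values and restore them afterwards. This is a minimal sketch: the 100-row tibble `big` and the values 20 and 5 are invented for illustration, and the exact printed output may vary with the installed version of tibble.

```{r}
library(tibble)

# Hypothetical 100-row tibble, used only for illustration
big <- tibble(id = 1:100)

# options() returns the previous values, so they can be restored afterwards
old <- options(tibble.print_max = 20, tibble.print_min = 5)

# big has more rows than print_max, so printing should fall back to
# showing only print_min rows
print(big)

# Restore whatever was set before
options(old)
```

Because `options()` returns the old settings invisibly, passing that list back to `options()` undoes the change.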
To show all the columns in a single tibble, explicitly call `print()` with `width = Inf`:
```{r, eval = FALSE}
nycflights13::flights %>%
print(width = Inf)
```
You can see a complete list of options by looking at the package help: `package?tibble`.
Remember, you can also get a nicer view of the data set using RStudio's built-in data viewer. This is often useful at the end of a long chain of manipulations.
```{r, eval = FALSE}
nycflights13::flights %>% View()
```
### Subsetting
Tibbles are stricter about subsetting. If you try to access a variable that does not exist, you'll get a warning. Unlike data frames, tibbles do not use partial matching on column names:
Tibbles are strict about subsetting. If you try to access a variable that does not exist, you'll get a warning. Unlike data frames, tibbles do not use partial matching on column names:
```{r}
df <- data.frame(