Update arrow chapter code to avoid errors (#1517)
* Add in `col_types` to specify schema
* Just use open_dataset()
parent 2674b870ae
commit c1e1437fd8
@@ -76,13 +76,15 @@ This means we want to avoid `read_csv()` and instead use the `arrow::open_datase
 ```{r open-dataset}
 seattle_csv <- open_dataset(
   sources = "data/seattle-library-checkouts.csv",
+  col_types = schema(ISBN = string()),
   format = "csv"
 )
 ```
 
 What happens when this code is run?
 `open_dataset()` will scan a few thousand rows to figure out the structure of the dataset.
-Then it records what it's found and stops; it will only read further rows as you specifically request them.
+The `ISBN` column contains blank values for the first 80,000 rows, so we have to specify the column type to help arrow work out the data structure.
+Once the data has been scanned by `open_dataset()`, it records what it's found and stops; it will only read further rows as you specifically request them.
 This metadata is what we see if we print `seattle_csv`:
 
 ```{r}
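The effect of the added `col_types` argument can be tried on a small file. The following is a minimal sketch, not part of the commit: the temporary file and its contents are hypothetical stand-ins for the Seattle checkouts data, and the object names `toy_csv`, `guessed`, and `typed` are made up for illustration. It assumes the arrow package is installed, in a version whose `open_dataset()` accepts `col_types` for CSV sources, as the commit relies on.

```r
library(arrow)

# Hypothetical toy file: ISBN is blank in every row, mimicking the
# blank ISBN values at the top of the Seattle checkouts data.
toy_csv <- tempfile(fileext = ".csv")
writeLines(c("ISBN,Title", ",Book A", ",Book B", ",Book C"), toy_csv)

# No type hint: arrow has only blank values to guess the ISBN type from.
guessed <- open_dataset(sources = toy_csv, format = "csv")

# With the hint from this commit: ISBN is declared as a string up front.
typed <- open_dataset(
  sources = toy_csv,
  col_types = schema(ISBN = string()),
  format = "csv"
)

# Compare the metadata recorded for each dataset.
print(guessed$schema)
print(typed$schema)
```

Printing both schemas side by side shows which type arrow settled on for `ISBN` in each case. Only the named column is overridden, so `Title` is still inferred from the data.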