Try simpler code with latest arrow (#1334)
parent c6edfb977e
commit 810b9f6a3c
@@ -75,18 +75,12 @@ A good rule of thumb is that you usually want at least twice as much memory as t
 This means we want to avoid `read_csv()` and instead use the `arrow::open_dataset()`:
 
 ```{r open-dataset}
-# partial schema for ISBN column only
-opts <- CsvConvertOptions$create(col_types = schema(ISBN = string()))
-
 seattle_csv <- open_dataset(
   sources = "data/seattle-library-checkouts.csv",
-  format = "csv",
-  convert_options = opts
+  format = "csv"
 )
 ```
 
-(Here we've had to use some relatively advanced code to parse the ISBN variable correctly: this is because the first \~83,000 rows don't contain any data so arrow guesses the wrong types. The arrow team is aware of this problem and there will hopefully be a better approach by the time you read this chapter.)
-
 What happens when this code is run?
 `open_dataset()` will scan a few thousand rows to figure out the structure of the dataset.
 Then it records what it's found and stops; it will only read further rows as you specifically request them.
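
For readers of this commit, the lazy behaviour that the last context lines of the hunk describe can be tried on a small file. The following is a minimal sketch, not part of the commit: the toy data frame, its column names, and the temporary file path are made up for illustration, and it assumes the `arrow` and `dplyr` packages are installed (R 4.1 or later for the `|>` pipe).

```r
library(arrow)
library(dplyr)

# Hypothetical toy data standing in for the Seattle checkouts file.
toy <- data.frame(
  CheckoutYear = c(2020L, 2021L, 2021L),
  Checkouts = c(5L, 3L, 7L)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# Opening the dataset infers and records the schema;
# the result is a Dataset object, not an in-memory data frame.
toy_csv <- open_dataset(sources = csv_path, format = "csv")
print(toy_csv$schema)

# Rows are only read when a result is requested with collect().
result <- toy_csv |>
  filter(CheckoutYear == 2021) |>
  summarise(total = sum(Checkouts)) |>
  collect()
print(result)
```

Because the query is only evaluated at `collect()`, the same pattern should carry over to files much larger than memory, which is the situation the changed chapter text is concerned with.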