Try simpler code with latest arrow (#1334)

Hadley Wickham 2023-03-07 08:05:43 -06:00 committed by GitHub
parent c6edfb977e
commit 810b9f6a3c
2 changed files with 3 additions and 9 deletions

@@ -75,18 +75,12 @@ A good rule of thumb is that you usually want at least twice as much memory as t
 This means we want to avoid `read_csv()` and instead use the `arrow::open_dataset()`:
 
 ```{r open-dataset}
-# partial schema for ISBN column only
-opts <- CsvConvertOptions$create(col_types = schema(ISBN = string()))
 seattle_csv <- open_dataset(
   sources = "data/seattle-library-checkouts.csv",
-  format = "csv",
-  convert_options = opts
+  format = "csv"
 )
 ```
 
-(Here we've had to use some relatively advanced code to parse the ISBN variable correctly: this is because the first \~83,000 rows don't contain any data so arrow guesses the wrong types.
-The arrow team is aware of this problem and there will hopefully be a better approach by the time you read this chapter.)
-
 What happens when this code is run?
 `open_dataset()` will scan a few thousand rows to figure out the structure of the dataset.
 Then it records what it's found and stops; it will only read further rows as you specifically request them.
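
For readers following along, here's a minimal sketch of that lazy behavior (assuming the `seattle_csv` object created in the chunk above, and that the CSV has a `CheckoutYear` column, as in the Seattle library checkouts data):

```{r}
# Assumes seattle_csv was created with open_dataset() as in the diff above.
library(arrow)
library(dplyr)

# Printing the dataset only reports the schema arrow inferred from its
# initial scan; no rows have been read into memory yet.
seattle_csv

# Rows are only read when a query forces them, e.g. via collect():
seattle_csv |>
  count(CheckoutYear) |>
  collect()
```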