Update arrow chapter code to avoid errors (#1517)

* Add in `col_types` to specify schema
* Just use open_dataset()
2023-07-16 13:29:21 +01:00 · c1e1437fd8
parent 2674b870ae
commit c1e1437fd8
1 changed file with 3 additions and 1 deletion
--- a/arrow.qmd
+++ b/arrow.qmd
@@ -76,13 +76,15 @@ This means we want to avoid `read_csv()` and instead use the `arrow::open_datase
 ```{r open-dataset}
 seattle_csv <- open_dataset(
  sources = "data/seattle-library-checkouts.csv", 
+  col_types = schema(ISBN = string()),
  format = "csv"
 )
 ```

 What happens when this code is run?
 `open_dataset()` will scan a few thousand rows to figure out the structure of the dataset.
-Then it records what it's found and stops; it will only read further rows as you specifically request them.
+The `ISBN` column contains blank values for the first 80,000 rows, so we have to specify the column type to help arrow work out the data structure.
+Once the data has been scanned by `open_dataset()`, it records what it's found and stops; it will only read further rows as you specifically request them.
 This metadata is what we see if we print `seattle_csv`:

 ```{r}