Update arrow chapter code to avoid errors (#1517)

* Add in `col_types` to specify schema
* Just use open_dataset()
2023-07-16 13:29:21 +01:00 · 2023-07-16 13:29:21 +01:00 · c1e1437fd8
parent 2674b870ae
1 changed file with 3 additions and 1 deletion
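A minimal sketch of what the new `col_types` argument does, assuming the arrow R package is installed. It writes a hypothetical toy CSV (not the Seattle checkouts data) whose `ISBN` column is blank in the first rows, opens it with `open_dataset()` and a schema that covers only `ISBN`, and reads back the type arrow recorded. The toy file is far too small to reproduce the original inference problem. It only shows how the override is passed.

```r
library(arrow)

# Hypothetical toy file (not the real dataset): ISBN is blank in the first rows
path <- tempfile(fileext = ".csv")
writeLines(
  c("ISBN,Title", ",Book A", ",Book B", "9780316769488,Book C"),
  path
)

# Only ISBN gets an explicit type; the other columns are left to inference
ds <- open_dataset(
  sources = path,
  col_types = schema(ISBN = string()),
  format = "csv"
)

# Inspect the type arrow recorded for ISBN (should be the string type)
ds$schema$ISBN$type$ToString()
```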
--- a/arrow.qmd
+++ b/arrow.qmd
@@ -76,13 +76,15 @@ This means we want to avoid `read_csv()` and instead use the `arrow::open_datase
 ```{r open-dataset}
 seattle_csv <- open_dataset(
  sources = "data/seattle-library-checkouts.csv", 
  col_types = schema(ISBN = string()),
  format = "csv"
 )
 ```
 What happens when this code is run?
 `open_dataset()` will scan a few thousand rows to figure out the structure of the dataset.
-Then it records what it's found and stops; it will only read further rows as you specifically request them.
+The `ISBN` column contains blank values for the first 80,000 rows, so we have to specify the column type to help arrow work out the data structure.
 Once the data has been scanned by `open_dataset()`, it records what it's found and stops; it will only read further rows as you specifically request them.
 This metadata is what we see if we print `seattle_csv`:
 ```{r}