diff --git a/DESCRIPTION b/DESCRIPTION
index c4ced78..deb6059 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -12,6 +12,7 @@ Depends:
 Imports:
   arrow,
   babynames,
+  curl (>= 5.0.0),
   dplyr,
   duckdb,
   gapminder,
diff --git a/arrow.qmd b/arrow.qmd
index b46c2f0..3393ba2 100644
--- a/arrow.qmd
+++ b/arrow.qmd
@@ -53,16 +53,18 @@ We begin by getting a dataset worthy of these tools: a data set of item checkout
 This dataset contains 41,389,465 rows that tell you how many times each book was checked out each month from April 2005 to October 2022.
 
 The following code will get you a cached copy of the data.
-The data is a 9GB CSV file, so it will take some time to download: simply getting the data is often the first challenge!
+The data is a 9GB CSV file, so it will take some time to download.
+I highly recommend using `curl::multi_download()` to get very large files as it's built for exactly this purpose: it gives you a progress bar and it can resume the download if it's interrupted.
 
 ```{r}
 #| eval: false
 dir.create("data", showWarnings = FALSE)
 
-url <- "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv"
-# Default timeout is 60s; bump it up to an hour
-options(timeout = 60 * 60)
-download.file(url, "data/seattle-library-checkouts.csv")
+curl::multi_download(
+  "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv",
+  "data/seattle-library-checkouts.csv",
+  resume = TRUE
+)
 ```
 
 ## Opening a dataset