Use curl::multi_download()

cc @djnavarro

Fixes #1226
Hadley Wickham 2023-01-23 08:18:36 -06:00
parent 707b332c3c
commit f09058420e
2 changed files with 8 additions and 5 deletions

DESCRIPTION

@@ -12,6 +12,7 @@ Depends:
Imports:
arrow,
babynames,
+curl (>= 5.0.0),
dplyr,
duckdb,
gapminder,
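
`multi_download()` needs curl 5.0.0 or later, which is what this version bound enforces. For readers following along, a minimal sketch of a pre-flight check (the upgrade step is an illustration, not part of this commit):

```{r}
#| eval: false
# multi_download() is only available in curl >= 5.0.0;
# install or upgrade the package if necessary
if (!requireNamespace("curl", quietly = TRUE) ||
    packageVersion("curl") < "5.0.0") {
  install.packages("curl")
}
```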

arrow.qmd

@@ -53,16 +53,18 @@ We begin by getting a dataset worthy of these tools: a dataset of item checkouts
This dataset contains 41,389,465 rows that tell you how many times each book was checked out each month from April 2005 to October 2022.
The following code will get you a cached copy of the data.
-The data is a 9GB CSV file, so it will take some time to download: simply getting the data is often the first challenge!
+The data is a 9GB CSV file, so it will take some time to download.
+I highly recommend using `curl::multi_download()` to get very large files as it's built for exactly this purpose: it gives you a progress bar and it can resume the download if it's interrupted.
```{r}
#| eval: false
dir.create("data", showWarnings = FALSE)
-url <- "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv"
-# Default timeout is 60s; bump it up to an hour
-options(timeout = 60 * 60)
-download.file(url, "data/seattle-library-checkouts.csv")
+curl::multi_download(
+  "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv",
+  "data/seattle-library-checkouts.csv",
+  resume = TRUE
+)
```
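
A detail worth knowing when you run this: `curl::multi_download()` returns a data frame with one row per file, including a `success` column (plus details such as the HTTP `status_code`), so you can confirm the download completed before reading the 9GB file. A minimal sketch assuming that documented return shape:

```{r}
#| eval: false
res <- curl::multi_download(
  "https://r4ds.s3.us-west-2.amazonaws.com/seattle-library-checkouts.csv",
  "data/seattle-library-checkouts.csv",
  resume = TRUE
)
# Stop early if the transfer failed; res$status_code holds the
# HTTP status (200 for a full download, 206 for a resumed one)
stopifnot(res$success)
```
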
## Opening a dataset