mine-cetinkaya-rundel 2023-03-26 10:03:01 -04:00
commit f023b8a2ee
14 changed files with 37 additions and 49 deletions


@ -19,7 +19,7 @@ But CSV files aren't very efficient: you have to do quite a lot of work to read
In this chapter, you'll learn about a powerful alternative: the [parquet format](https://parquet.apache.org/), an open standards-based format widely used by big data systems.
We'll pair parquet files with [Apache Arrow](https://arrow.apache.org), a multi-language toolbox designed for efficient analysis and transport of large datasets.
We'll use Apache Arrow via the the [arrow package](https://arrow.apache.org/docs/r/), which provides a dplyr backend allowing you to analyze larger-than-memory datasets using familiar dplyr syntax.
We'll use Apache Arrow via the [arrow package](https://arrow.apache.org/docs/r/), which provides a dplyr backend allowing you to analyze larger-than-memory datasets using familiar dplyr syntax.
As an additional benefit, arrow is extremely fast: you'll see some examples later in the chapter.
Both arrow and dbplyr provide dplyr backends, so you might wonder when to use each.


@ -336,7 +336,7 @@ l <- list(
```
The difference between `[` and `[[` is particularly important for lists because `[[` drills down into the list while `[` returns a new, smaller list.
To help you remember the difference, take a look at the an unusual pepper shaker shown in @fig-pepper.
To help you remember the difference, take a look at the unusual pepper shaker shown in @fig-pepper.
If this pepper shaker is your list `pepper`, then, `pepper[1]` is a pepper shaker containing a single pepper packet.
If we suppose this pepper shaker is a list called `pepper`, then `pepper[1]` is a pepper shaker containing a single pepper packet.
`pepper[2]` would look the same, but would contain the second packet.
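
The distinction can be sketched with a small list. The list below is hypothetical (its contents are made up for illustration); only the `[` versus `[[` behavior is the point:

```r
# Hypothetical list standing in for the pepper shaker; each element is a "packet".
pepper <- list(packet1 = 1:3, packet2 = "black", packet3 = c(TRUE, FALSE))

# `[` returns a new, smaller list: still a shaker, now holding one packet.
one_packet_shaker <- pepper[1]
class(one_packet_shaker)   # "list"
length(one_packet_shaker)  # 1

# `[[` drills down and returns the packet itself.
first_packet <- pepper[[1]]
class(first_packet)        # "integer"
```

Note that `pepper[2]` is also a one-element list, just holding the second packet instead of the first.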


@ -10,9 +10,9 @@ BarkleyBG,1,Brian G. Barkley,BarkleyBG.netlify.com
BinxiePeterson,1,Bianca Peterson,NA
BirgerNi,1,Birger Niklas,NA
DDClark,1,David Clark,NA
DOH-RPS1303,1,Russell Shean,
DOH-RPS1303,1,Russell Shean,NA
DSGeoff,1,NA,NA
Divider85,3,NA,
Divider85,3,NA,NA
EdwinTh,4,Edwin Thoen,thats-so-random.com
EricKit,1,Eric Kitaif,NA
GeroVanMi,1,Gerome Meyer,https://astralibra.ch
@ -20,12 +20,12 @@ GoldbergData,1,Josh Goldberg,https://twitter.com/GoldbergData
Iain-S,1,Iain,NA
JeffreyRStevens,2,Jeffrey Stevens,https://decisionslab.unl.edu/
JeldorPKU,1,蒋雨蒙,https://jeldorpku.github.io
KittJonathan,10,Jonathan Kitt,
KittJonathan,10,Jonathan Kitt,NA
MJMarshall,2,NA,NA
MarckK,1,Kara de la Marck,https://www.linkedin.com/in/karadelamarck
MattWittbrodt,1,Matt Wittbrodt,mattwittbrodt.com
MatthiasLiew,3,Matthias Liew,
NedJWestern,1,Ned Western,
MatthiasLiew,3,Matthias Liew,NA
NedJWestern,1,Ned Western,NA
Nowosad,6,Jakub Nowosad,https://nowosad.github.io
PursuitOfDataScience,14,Y. Yu,https://youzhi.netlify.app/
RIngyao,1,Jajo,NA
@ -44,7 +44,7 @@ a-rosenberg,1,NA,NA
a2800276,1,Tim Becker,NA
adam-gruer,1,Adam Gruer,adamgruer.rbind.io
adidoit,1,adi pradhan,http://adidoit.github.io
aephidayatuloh,1,Aep Hidyatuloh,
aephidayatuloh,1,Aep Hidyatuloh,NA
agila5,1,Andrea Gilardi,NA
ajay-d,1,Ajay Deonarine,http://deonarine.com/
aleloi,1,NA,NA
@ -70,7 +70,7 @@ bgreenwell,9,Brandon Greenwell,NA
bklamer,11,Brett Klamer,NA
boardtc,1,NA,NA
c-hoh,1,Christian,hohenfeld.is
caddycarine,1,Caddy,
caddycarine,1,Caddy,NA
camillevleonard,1,Camille V Leonard,https://www.camillevleonard.com/
canovasjm,1,NA,NA
cedricbatailler,1,Cedric Batailler,cedricbatailler.me
@ -84,7 +84,7 @@ curtisalexander,1,Curtis Alexander,https://www.calex.org
cwarden,2,Christian G. Warden,http://xn.pinkhamster.net/
cwickham,1,Charlotte Wickham,http://cwick.co.nz
darrkj,1,Kenny Darrell,http://darrkj.github.io/blogs
davidrsch,4,David,
davidrsch,5,David,NA
davidrubinger,1,David Rubinger,NA
derwinmcgeary,1,Derwin McGeary,http://derwinmcgeary.github.io
dgromer,2,Daniel Gromer,NA
@ -97,7 +97,7 @@ dylancashman,1,Dylan Cashman,https://www.eecs.tufts.edu/~dcashm01/
eddelbuettel,1,Dirk Eddelbuettel,http://dirk.eddelbuettel.com
elgabbas,1,Ahmed El-Gabbas,https://elgabbas.github.io
enryH,1,Henry Webel,NA
ercan7,1,Ercan Karadas,
ercan7,1,Ercan Karadas,NA
ericwatt,1,Eric Watt,www.ericdwatt.com
erikerhardt,2,Erik Erhardt,StatAcumen.com
etiennebr,2,Etienne B. Racine,NA
@ -112,7 +112,7 @@ garrettgman,103,Garrett Grolemund,NA
gl-eb,1,Gleb Ebert,glebsite.ch
gridgrad,1,bahadir cankardes,NA
gustavdelius,2,Gustav W Delius,NA
hadley,1151,Hadley Wickham,http://hadley.nz
hadley,1166,Hadley Wickham,http://hadley.nz
hao-trivago,2,Hao Chen,NA
harrismcgehee,7,Harris McGehee,https://gist.github.com/harrismcgehee
hendrikweisser,1,NA,NA
@ -145,7 +145,7 @@ jpetuchovas,1,Justinas Petuchovas,NA
jrdnbradford,1,Jordan,www.linkedin.com/in/jrdnbradford
jrnold,4,Jeffrey Arnold,http://jrnold.me
jroberayalas,7,Jose Roberto Ayala Solares,jroberayalas.netlify.com
jtr13,1,Joyce Robbins,
jtr13,1,Joyce Robbins,NA
juandering,1,NA,NA
jules32,1,Julia Stewart Lowndes,http://jules32.github.io
kaetschap,1,Sonja,NA
@ -168,10 +168,10 @@ matanhakim,1,Matan Hakim,NA
maurolepore,2,Mauro Lepore,https://fgeo.netlify.com/
mbeveridge,7,Mark Beveridge,https://twitter.com/mbeveridge
mcewenkhundi,1,NA,NA
mcsnowface,6,"mcsnowface, PhD",
mcsnowface,6,"mcsnowface, PhD",NA
mfherman,1,Matt Herman,mattherman.info
michaelboerman,1,Michael Boerman,https://michaelboerman.com
mine-cetinkaya-rundel,95,Mine Cetinkaya-Rundel,https://stat.duke.edu/~mc301
mine-cetinkaya-rundel,119,Mine Cetinkaya-Rundel,https://stat.duke.edu/~mc301
mitsuoxv,5,Mitsuo Shiota,https://mitsuoxv.rbind.io/
mjhendrickson,1,Matthew Hendrickson,https://about.me/matthew.j.hendrickson
mmhamdy,1,Mohammed Hamdy,NA
@ -188,10 +188,11 @@ nirmalpatel,2,Nirmal Patel,http://playpowerlabs.com
nischalshrestha,1,Nischal Shrestha,http://nischalshrestha.me
njtierney,1,Nicholas Tierney,http://www.njtierney.com
olivier6088,1,NA,NA
oliviercailloux,1,Olivier Cailloux,https://www.lamsade.dauphine.fr/~ocailloux/
p0bs,1,Robin Penfold,p0bs.com
pabloedug,1,Pablo E. Garcia,NA
padamson,1,Paul Adamson,padamson.github.io
penelopeysm,1,Penelope Y,
penelopeysm,1,Penelope Y,NA
peterhurford,1,Peter Hurford,http://www.peterhurford.com
pkq,4,Patrick Kennedy,NA
pooyataher,1,Pooya Taherkhani,https://gitlab.com/pooyat
@ -238,7 +239,7 @@ werkstattcodes,1,NA,http://werk.statt.codes
wibeasley,2,Will Beasley,http://scholar.google.com/citations?user=ffsJTC0AAAAJ&hl=en
yihui,4,Yihui Xie,https://yihui.name
yimingli,3,Yiming (Paul) Li,https://yimingli.net
yingxingwu,1,NA,
yingxingwu,1,NA,NA
yutannihilation,1,Hiroaki Yutani,https://twitter.com/yutannihilation
yuyu-aung,1,Yu Yu Aung,NA
zachbogart,1,Zach Bogart,zachbogart.com



@ -195,7 +195,7 @@ billboard |>
After the data, there are three key arguments:
- `cols` specifies which columns need to be pivoted, i.e. which columns aren't variables. This argument uses the same syntax as `select()` so here we could use `!c(artist, track, date.entered)` or `starts_with("wk")`.
- `names_to` names of the variable stored in the column names, we named that variable `week`.
- `names_to` names the variable stored in the column names; we named that variable `week`.
- `values_to` names the variable stored in the cell values; we named that variable `rank`.
Note that in the code `"week"` and `"rank"` are quoted because those are new variables we're creating, they don't yet exist in the data when we run the `pivot_longer()` call.
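
The three arguments can be seen together in a minimal sketch. It assumes a made-up miniature table called `songs` rather than the real `billboard` data, so the column values are illustrative only:

```r
library(tidyr)
library(tibble)

# Hypothetical miniature of a billboard-style table: one row per song,
# with weekly ranks spread across the wk1..wk3 columns.
songs <- tribble(
  ~artist, ~track, ~wk1, ~wk2, ~wk3,
  "A",     "x",     87,   82,   72,
  "B",     "y",     91,   87,   NA
)

songs_long <- songs |>
  pivot_longer(
    cols = starts_with("wk"),  # columns to pivot (their names are data, not variables)
    names_to = "week",         # new column that receives the old column names
    values_to = "rank"         # new column that receives the cell values
  )

songs_long
```

The result has one row per song per week, with the former column names in `week` and the former cell values in `rank`.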
@ -448,7 +448,7 @@ knitr::include_graphics("diagrams/tidy-data/names-and-values.png", dpi = 270)
## Widening data
So far we've used `pivot_longer()` to solve the common class of problems where values have ended up in column names.
Next we'll pivot (HA HA) to `pivot_wider()`, which which makes datasets **wider** by increasing columns and reducing rows and helps when one observation is spread across multiple rows.
Next we'll pivot (HA HA) to `pivot_wider()`, which makes datasets **wider** by increasing columns and reducing rows and helps when one observation is spread across multiple rows.
This seems to arise less commonly in the wild, but it does seem to crop up a lot when dealing with governmental data.
We'll start by looking at `cms_patient_experience`, a dataset from the Centers for Medicare & Medicaid Services that collects data about patient experiences:


@ -225,7 +225,7 @@ flights |>
### Exercises
1. In a singe pipeline, find all flights that meet all of the following conditions:
1. In a single pipeline, find all flights that meet all of the following conditions:
- Had an arrival delay of two or more hours
- Flew to Houston (`IAH` or `HOU`)


@ -795,4 +795,4 @@ Working with dates and times can seem harder than necessary, but hopefully this
Even if your data never crosses a daylight saving time boundary or involves a leap year, the functions need to be able to handle it.
The next chapter gives a round up of missing values.
You've seen them in a few places and have no doubt encounter in your own analysis, and it's how time to provide a grab bag of useful techniques for dealing with them.
You've seen them in a few places and have no doubt encountered them in your own analysis, and it's now time to provide a grab bag of useful techniques for dealing with them.


@ -72,7 +72,7 @@ df |> mutate(
You might be able to puzzle out that this rescales each column to have a range from 0 to 1.
But did you spot the mistake?
When Hadley wrote this code he made an error when copying-and-pasting and forgot to change an `a` to a `b`.
Preventing this type of mistake of is one very good reason to learn how to write functions.
Preventing this type of mistake is one very good reason to learn how to write functions.
### Writing a function
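
One way the repeated rescaling could be wrapped in a function is sketched below. The name `rescale01` and the sample input are assumptions for illustration, not necessarily the code this chapter goes on to develop:

```r
# Rescale a numeric vector to the range [0, 1].
# Assumes x has at least two distinct finite values; otherwise the
# denominator is zero and the result is NaN.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(0, 5, 10))
```

Because the column name now appears only as the argument `x`, there is no opportunity to forget to change an `a` to a `b` when applying the same logic to several columns.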
@ -611,7 +611,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
```{r}
#| eval: false
weather |> standardize_time(sched_dep_time)
flights |> standardize_time(sched_dep_time)
```
2. For each of the following functions list all arguments that use tidy evaluation and describe whether they use data-masking or tidy-selection: `distinct()`, `count()`, `group_by()`, `rename_with()`, `slice_min()`, `slice_sample()`.


@ -11,7 +11,7 @@ You'll also learn how to manage cognitive resources to facilitate discoveries wh
This website is and will always be free, licensed under the [CC BY-NC-ND 3.0](https://creativecommons.org/licenses/by-nc-nd/3.0/us/) License.
If you'd like a physical copy of the book, you can order the 1st edition on [Amazon](https://amzn.to/2aHLAQ1), or wait until mid-2023 for the 2nd edition.
If appreciate reading the book for free and would like to give back please make a donation to [Kākāpō Recovery](https://www.doc.govt.nz/kakapo-donate): the [kākāpō](https://www.youtube.com/watch?v=9T1vfsHYiKY) (which appears on the cover of R4DS) is a critically endangered native NZ parrot; there are only 252 left.
If you appreciate reading the book for free and would like to give back, please make a donation to [Kākāpō Recovery](https://www.doc.govt.nz/kakapo-donate): the [kākāpō](https://www.youtube.com/watch?v=9T1vfsHYiKY) (which appears on the cover of R4DS) is a critically endangered parrot native to New Zealand; there are only 248 left.
If you speak another language, you might be interested in the freely available translations of the 1st edition:


@ -199,7 +199,7 @@ In other words, the complement to the tidyverse is not the messyverse but many o
As you tackle more data science projects with R, you'll learn new packages and new ways of thinking about data.
We'll use many packages from outside the tidyverse in this book.
For example, we use the following packages to that provide interesting data sets:
For example, we'll use the following packages because they provide interesting data sets for us to work with in the process of learning R:
```{r}
#| eval: false
@ -252,20 +252,7 @@ Throughout the book, we use a consistent set of conventions to refer to code:
## Acknowledgments
This book isn't just the product of Hadley, Mine, and Garrett but is the result of many conversations (in person and online) that we've had with many people in the R community.
There are a few people we'd like to thank in particular because they have spent many hours answering our questions and helping us to better think about data science:
- Jenny Bryan and Lionel Henry for many helpful discussions around working with lists and list-columns.
- The three chapters on workflow were adapted (with permission) from <https://stat545.com/block002_hello-r-workspace-wd-project.html> by Jenny Bryan.
- Yihui Xie for his work on the [bookdown](https://github.com/rstudio/bookdown) package and for tirelessly responding to my feature requests.
- Bill Behrman for his thoughtful reading of the entire book and for trying it out with his data science class at Stanford.
- The #rstats Twitter community who reviewed all of the draft chapters and provided tons of helpful feedback.
This book was written in the open, and many people contributed pull requests to fix minor problems.
Special thanks go to everyone who contributed via GitHub:
We're incredibly grateful for all the conversations we've had with y'all; thank you so much!
```{r}
#| eval: false
@ -277,7 +264,7 @@ contribs_all_json <- gh::gh("/repos/:owner/:repo/contributors",
repo = "r4ds",
.limit = Inf
)
contribs_all <- tibble(
login = contribs_all_json %>% map_chr("login"),
n = contribs_all_json %>% map_int("contributions")
)
@ -319,7 +306,7 @@ contributors <- contributors %>%
desc = ifelse(is.na(name), login, paste0(name, " (", login, ")"))
)
cat("A big thank you to all ", nrow(contributors), " people who contributed specific improvements via GitHub pull requests (in alphabetical order by username): ", sep = "")
cat("This book was written in the open, and many people contributed via pull requests. A special thanks to all ", nrow(contributors), " of you who contributed improvements via GitHub pull requests (in alphabetical order by username): ", sep = "")
cat(paste0(contributors$desc, collapse = ", "))
cat(".\n")
```


@ -118,7 +118,7 @@ In simple cases, as above, this will be a single existing function.
This is a pretty special feature of R: we're passing one function (`median`, `mean`, `str_flatten`, ...) to another function (`across`).
This is one of the features that makes R a functional programming language.
It's important to note that we're passing this function to `across()`, so `across()` can call it; we're calling it ourselves.
It's important to note that we're passing this function to `across()`, so `across()` can call it; we're not calling it ourselves.
That means the function name should never be followed by `()`.
If you forget, you'll get an error:
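
The difference between passing a function and calling it can be sketched as follows. The data frame `df` is hypothetical, and `tryCatch()` is used only so that the failing call does not stop the script:

```r
library(dplyr)
library(tibble)

# Hypothetical data for illustration.
df <- tibble(a = c(1, 2, 3), b = c(10, 20, 60))

# Correct: pass the function itself, so across() can call it on each column.
ok <- df |> summarize(across(a:b, median))

# Incorrect: median() is called immediately with no arguments, which fails.
# tryCatch() converts the error into a plain string so the sketch keeps running.
bad <- tryCatch(
  df |> summarize(across(a:b, median())),
  error = function(e) "error"
)

ok
bad
```

In the first call `median` is handed over unevaluated; in the second, R tries to evaluate `median()` with no input before `across()` can apply it to any column.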
@ -538,7 +538,7 @@ list(
)
```
So we can use `map()` get a list of 12 data frames:
So we can use `map()` to get a list of 12 data frames:
```{r}
files <- map(paths, readxl::read_excel)


@ -373,7 +373,7 @@ x <- 1:10
cumsum(x)
```
If you need more complex rolling or sliding aggregates, try the [slider](https://davisvaughan.github.io/slider/) package by Davis Vaughan.
If you need more complex rolling or sliding aggregates, try the [slider](https://slider.r-lib.org/) package by Davis Vaughan.
### Exercises


@ -484,7 +484,7 @@ sentences |>
str_view()
```
If you want extract the matches for each group you can use `str_match()`.
If you want to extract the matches for each group you can use `str_match()`.
But `str_match()` returns a matrix, so it's not particularly easy to work with[^regexps-8]:
[^regexps-8]: Mostly because we never discuss matrices in this book!
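
A minimal sketch of what `str_match()` returns, using a made-up input vector and a pattern with two capturing groups (both are assumptions for illustration):

```r
library(stringr)

# Hypothetical input; the second string does not match the pattern.
x <- c("the quick fox", "no match here")
m <- str_match(x, "the (\\w+) (\\w+)")

# m is a character matrix with one row per input string:
# column 1 is the full match, columns 2 and 3 are the groups.
# Inputs that do not match produce a row of NA.
is.matrix(m)
dim(m)
m[1, ]
```

Since the result is a matrix rather than a data frame, it usually needs converting before it fits into a dplyr pipeline.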
@ -554,7 +554,7 @@ str_match(x, "gr(?:e|a)y")
## Pattern control
It's possible to exercise extra control over the details of the match by using a pattern object instead of just a string.
This allows you control the so called regex flags and match various types of fixed strings, as described below.
This allows you to control the so-called regex flags and match various types of fixed strings, as described below.
### Regex flags {#sec-flags}


@ -226,7 +226,7 @@ df <- tribble(
"Marvin", "nectarine",
"Terence", "cantaloupe",
"Terence", "papaya",
"Terence", "madarin"
"Terence", "mandarin"
)
df |>
group_by(name) |>


@ -70,7 +70,7 @@ Note, however, the situation is rather different in Europe where courts have fou
### Personally identifiable information
Even if the data is public, you should be extremely careful about scraping personally identifiable information like names, email addresses, phone numbers, dates of birth, etc.
Europe has particularly strict laws about the collection of storage of such data ([GDPR](https://gdpr-info.eu/)), and regardless of where you live you're likely to be entering an ethical quagmire.
Europe has particularly strict laws about the collection or storage of such data ([GDPR](https://gdpr-info.eu/)), and regardless of where you live you're likely to be entering an ethical quagmire.
For example, in 2016, a group of researchers scraped public profile information (e.g. usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempt at anonymization.
While the researchers felt that there was nothing wrong with this since the data were already public, this work was widely condemned due to ethics concerns around identifiability of users whose information was released in the dataset.
If your work involves scraping personally identifiable information, we strongly recommend reading about the OkCupid study[^webscraping-4] as well as similar studies with questionable research ethics involving the acquisition and release of personally identifiable information.
@ -81,7 +81,7 @@ If your work involves scraping personally identifiable information, we strongly
Finally, you also need to worry about copyright law.
Copyright law is complicated, but it's worth taking a look at the [US law](https://www.law.cornell.edu/uscode/text/17/102) which describes exactly what's protected: "\[...\] original works of authorship fixed in any tangible medium of expression, \[...\]".
It then goes on to describe specific categories that it applies like literary works, musical works, motions pictures and more.
It then goes on to describe specific categories that it applies to, like literary works, musical works, motion pictures, and more.
Notably absent from copyright protection are data.
This means that as long as you limit your scraping to facts, copyright protection does not apply.
(But note that Europe has a separate "[sui generis](https://en.wikipedia.org/wiki/Database_right)" right that protects databases.)