diff --git a/arrow.qmd b/arrow.qmd
index 0226ca9..1cff1c1 100644
--- a/arrow.qmd
+++ b/arrow.qmd
@@ -19,7 +19,7 @@ But CSV files aren't very efficient: you have to do quite a lot of work to read
 In this chapter, you'll learn about a powerful alternative: the [parquet format](https://parquet.apache.org/), an open standards-based format widely used by big data systems.
 
 We'll pair parquet files with [Apache Arrow](https://arrow.apache.org), a multi-language toolbox designed for efficient analysis and transport of large datasets.
-We'll use Apache Arrow via the the [arrow package](https://arrow.apache.org/docs/r/), which provides a dplyr backend allowing you to analyze larger-than-memory datasets using familiar dplyr syntax.
+We'll use Apache Arrow via the [arrow package](https://arrow.apache.org/docs/r/), which provides a dplyr backend allowing you to analyze larger-than-memory datasets using familiar dplyr syntax.
 As an additional benefit, arrow is extremely fast: you'll see some examples later in the chapter.
 
 Both arrow and dbplyr provide dplyr backends, so you might wonder when to use each.