parent 6124b65098
commit b2c4c1d0d0

@@ -119,7 +119,7 @@ That's not terrible given how much data we have, but we can make it much faster
 
 ## The parquet format {#sec-parquet}
 
-To make this data easier to work with, lets switch to the parquet file format and split it up into multiple files.
+To make this data easier to work with, let's switch to the parquet file format and split it up into multiple files.
 The following sections will first introduce you to parquet and partitioning, and then apply what we learned to the Seattle library data.
 
 ### Advantages of parquet
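
The hunk above edits the section that introduces parquet and partitioning. As a rough illustration of that workflow with the arrow package (a minimal sketch only: the file path and the `CheckoutYear` partitioning column are illustrative assumptions, not taken from this commit):

```r
library(arrow)
library(dplyr)

# Open the CSV lazily as an Arrow dataset instead of reading it into memory.
# The path is hypothetical; substitute the real location of the Seattle data.
seattle_csv <- open_dataset(
  sources = "data/seattle-library-checkouts.csv",
  format = "csv"
)

# Rewrite it as parquet, producing one file per value of the (assumed)
# CheckoutYear column, so later queries can skip irrelevant partitions.
seattle_csv |>
  group_by(CheckoutYear) |>
  write_dataset(
    path = "data/seattle-library-checkouts",
    format = "parquet"
  )
```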

@@ -290,9 +290,9 @@ The neat thing about `to_duckdb()` is that the transfer doesn't involve any memo
 ## Summary
 
 In this chapter, you've been given a taste of the arrow package, which provides a dplyr backend for working with large on-disk datasets.
-It can work with CSV files, its much much faster if you convert your data to parquet.
+It can work with CSV files, and it's much much faster if you convert your data to parquet.
 Parquet is a binary data format that's designed specifically for data analysis on modern computers.
-Far fewer tools can work with parquet files compared to CSV, but it's partitioned, compressed, and columnar structure makes it much more efficient to analyze.
+Far fewer tools can work with parquet files compared to CSV, but its partitioned, compressed, and columnar structure makes it much more efficient to analyze.
 
 Next up you'll learn about your first non-rectangular data source, which you'll handle using tools provided by the tidyr package.
 We'll focus on data that comes from JSON files, but the general principles apply to tree-like data regardless of its source.
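
The second hunk's context line mentions `to_duckdb()`, which the chapter describes as transferring data without a memory copy. A hedged sketch of how it slots into the same pipeline, assuming the partitioned parquet directory written above and that the duckdb and dbplyr packages are installed:

```r
library(arrow)
library(dplyr)
library(duckdb)

# Open the partitioned parquet dataset (path assumed, as above).
seattle_pq <- open_dataset("data/seattle-library-checkouts")

# to_duckdb() hands the data to DuckDB without copying it;
# subsequent dplyr verbs are executed by DuckDB, not by R.
seattle_pq |>
  to_duckdb() |>
  count(CheckoutYear) |>
  collect()
```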