parent 6124b65098
commit b2c4c1d0d0
@@ -119,7 +119,7 @@ That's not terrible given how much data we have, but we can make it much faster
 
 ## The parquet format {#sec-parquet}
 
-To make this data easier to work with, lets switch to the parquet file format and split it up into multiple files.
+To make this data easier to work with, let's switch to the parquet file format and split it up into multiple files.
 The following sections will first introduce you to parquet and partitioning, and then apply what we learned to the Seattle library data.
 
 ### Advantages of parquet
@@ -290,9 +290,9 @@ The neat thing about `to_duckdb()` is that the transfer doesn't involve any memo
 ## Summary
 
 In this chapter, you've been given a taste of the arrow package, which provides a dplyr backend for working with large on-disk datasets.
-It can work with CSV files, its much much faster if you convert your data to parquet.
+It can work with CSV files, and it's much much faster if you convert your data to parquet.
 Parquet is a binary data format that's designed specifically for data analysis on modern computers.
-Far fewer tools can work with parquet files compared to CSV, but it's partitioned, compressed, and columnar structure makes it much more efficient to analyze.
+Far fewer tools can work with parquet files compared to CSV, but its partitioned, compressed, and columnar structure makes it much more efficient to analyze.
 
 Next up you'll learn about your first non-rectangular data source, which you'll handle using tools provided by the tidyr package.
 We'll focus on data that comes from JSON files, but the general principles apply to tree-like data regardless of its source.