Fix/arrow typos (#1481)

* a typo

* typos
Mitsuo Shiota 2023-05-21 13:08:14 +09:00 committed by GitHub
parent 6124b65098
commit b2c4c1d0d0
1 changed file with 3 additions and 3 deletions


@@ -119,7 +119,7 @@ That's not terrible given how much data we have, but we can make it much faster
 ## The parquet format {#sec-parquet}
-To make this data easier to work with, lets switch to the parquet file format and split it up into multiple files.
+To make this data easier to work with, let's switch to the parquet file format and split it up into multiple files.
 The following sections will first introduce you to parquet and partitioning, and then apply what we learned to the Seattle library data.
 ### Advantages of parquet
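The hunk above touches the passage that introduces parquet and partitioning. As a minimal sketch of that workflow with the arrow package (not code from this commit; the file path and the `CheckoutYear` partitioning column are assumptions about the Seattle library data):

```r
library(arrow)
library(dplyr)

# Point at the CSV lazily instead of reading it all into memory
# (path is a placeholder)
seattle_csv <- open_dataset(
  sources = "data/seattle-library-checkouts.csv",
  format = "csv"
)

# Rewrite the data as parquet, partitioned so each checkout year gets
# its own file (CheckoutYear is an assumed column name)
seattle_csv |>
  group_by(CheckoutYear) |>
  write_dataset(path = "data/seattle-library-checkouts", format = "parquet")
```

Partitioning by a grouping variable before `write_dataset()` is what splits one large file into many smaller ones that queries can skip over.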
@@ -290,9 +290,9 @@ The neat thing about `to_duckdb()` is that the transfer doesn't involve any memo
 ## Summary
 In this chapter, you've been given a taste of the arrow package, which provides a dplyr backend for working with large on-disk datasets.
-It can work with CSV files, its much much faster if you convert your data to parquet.
+It can work with CSV files, and it's much much faster if you convert your data to parquet.
 Parquet is a binary data format that's designed specifically for data analysis on modern computers.
-Far fewer tools can work with parquet files compared to CSV, but it's partitioned, compressed, and columnar structure makes it much more efficient to analyze.
+Far fewer tools can work with parquet files compared to CSV, but its partitioned, compressed, and columnar structure makes it much more efficient to analyze.
 Next up you'll learn about your first non-rectangular data source, which you'll handle using tools provided by the tidyr package.
 We'll focus on data that comes from JSON files, but the general principles apply to tree-like data regardless of its source.
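As a rough illustration of the dplyr backend the summary describes (again not part of this commit; the path and the `CheckoutYear`, `MaterialType`, and `Checkouts` column names are assumptions about the Seattle library data):

```r
library(arrow)
library(dplyr)

# Open the partitioned parquet directory as an on-disk dataset
seattle_pq <- open_dataset("data/seattle-library-checkouts")

# dplyr verbs build a lazy query plan; nothing runs until collect()
seattle_pq |>
  filter(MaterialType == "BOOK") |>
  group_by(CheckoutYear) |>
  summarize(TotalCheckouts = sum(Checkouts)) |>
  arrange(CheckoutYear) |>
  collect()

# The same dataset can be handed to duckdb for anything arrow's
# engine doesn't support, using the to_duckdb() mentioned in the hunk header
seattle_pq |>
  to_duckdb() |>
  filter(MaterialType == "BOOK") |>
  count(CheckoutYear, wt = Checkouts) |>
  collect()
```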