From b2c4c1d0d05df095761096a39bfdb30deddcf1e6 Mon Sep 17 00:00:00 2001
From: Mitsuo Shiota <48662507+mitsuoxv@users.noreply.github.com>
Date: Sun, 21 May 2023 13:08:14 +0900
Subject: [PATCH] Fix/arrow typos (#1481)

* a typo

* typos
---
 arrow.qmd | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arrow.qmd b/arrow.qmd
index 1e525cc..b639551 100644
--- a/arrow.qmd
+++ b/arrow.qmd
@@ -119,7 +119,7 @@ That's not terrible given how much data we have, but we can make it much faster
 
 ## The parquet format {#sec-parquet}
 
-To make this data easier to work with, lets switch to the parquet file format and split it up into multiple files.
+To make this data easier to work with, let's switch to the parquet file format and split it up into multiple files.
 The following sections will first introduce you to parquet and partitioning, and then apply what we learned to the Seattle library data.
 
 ### Advantages of parquet
@@ -290,9 +290,9 @@ The neat thing about `to_duckdb()` is that the transfer doesn't involve any memo
 
 ## Summary
 
 In this chapter, you've been given a taste of the arrow package, which provides a dplyr backend for working with large on-disk datasets.
-It can work with CSV files, its much much faster if you convert your data to parquet.
+It can work with CSV files, and it's much much faster if you convert your data to parquet.
 Parquet is a binary data format that's designed specifically for data analysis on modern computers.
-Far fewer tools can work with parquet files compared to CSV, but it's partitioned, compressed, and columnar structure makes it much more efficient to analyze.
+Far fewer tools can work with parquet files compared to CSV, but its partitioned, compressed, and columnar structure makes it much more efficient to analyze.
 Next up you'll learn about your first non-rectangular data source, which you'll handle using tools provided by the tidyr package.
 We'll focus on data that comes from JSON files, but the general principles apply to tree-like data regardless of its source.
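
The corrected summary sentences describe a concrete workflow: convert a CSV file to partitioned parquet with arrow, query it through the dplyr backend, and hand it to duckdb with `to_duckdb()`. Below is a minimal sketch of that workflow in R; the file path and the `CheckoutYear` partitioning column are hypothetical placeholders chosen for illustration, as the patch itself names only the parquet conversion and `to_duckdb()`.

```r
library(arrow)
library(dplyr)

# Open the CSV lazily as an Arrow dataset; nothing is read into memory yet.
# The path is an assumption, not taken from the patch.
csv_ds <- open_dataset("data/seattle-library-checkouts.csv", format = "csv")

# Rewrite it as parquet, partitioned by the (hypothetical) CheckoutYear
# column; write_dataset() uses the dplyr groups as partition keys, producing
# one directory of parquet files per year.
csv_ds |>
  group_by(CheckoutYear) |>
  write_dataset(path = "data/seattle-library-checkouts", format = "parquet")

# Reopen the partitioned parquet dataset and query it with the dplyr backend;
# collect() triggers the actual computation.
pq_ds <- open_dataset("data/seattle-library-checkouts")

pq_ds |>
  count(CheckoutYear) |>
  collect()

# to_duckdb() exposes the same data as a duckdb table without copying it
# into R memory, so duckdb-only operations become available.
pq_ds |>
  to_duckdb() |>
  count(CheckoutYear)
```

Partitioning by year means a filter on `CheckoutYear` can skip entire files, which, together with parquet's compressed columnar layout, is where most of the speedup over CSV comes from.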