From 39132b9a742133fc35d00522ca8fb9483b8ec006 Mon Sep 17 00:00:00 2001
From: Olivier Cailloux
Date: Fri, 10 Mar 2023 01:00:58 +0100
Subject: [PATCH] Minors (#1339)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* comma
* Quarto link
* but instead of and, as it seems to be considered as a good thing then a bad thing
* Reduce repetition
* Typo ot ⇒ to
* Rm spurious comma
* TODO ref
* Comment about a strange sentence
* Comment not in my env
* Comment about create ≠ assign
* Argument about reading one’s mind
* Broken ref comment
* Argument about repetition
* Argue for reducing repetition
* Comment about dplyr
* Resolve to dos
* Resolve to dos
* Update intro.qmd
* Update intro.qmd
* Resolve to dos
* Fix number of workflow chapters
---------
Co-authored-by: Olivier Cailloux
Co-authored-by: Mine Cetinkaya-Rundel
---
 data-transform.qmd | 7 +++----
 index.qmd          | 2 +-
 intro.qmd          | 4 ++--
 preface-2e.qmd     | 2 +-
 whole-game.qmd     | 4 ++--
 5 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/data-transform.qmd b/data-transform.qmd
index 63957dc..e504257 100644
--- a/data-transform.qmd
+++ b/data-transform.qmd
@@ -72,7 +72,7 @@ But before we discuss their individual differences, it's worth stating what they
 3. The output is always a new data frame.
 Because each verb does one thing well, solving complex problems will usually require combining multiple verbs, and we'll do so with the pipe, `|>`.
-We'll discuss the pipe more in @the-pipe, but in brief, the pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
+We'll discuss the pipe more in @sec-the-pipe, but in brief, the pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
 The easiest way to pronounce the pipe is "then".
 That makes it possible to get a sense of the following code even though you haven't yet learned the details:
@@ -320,8 +320,7 @@ Often, the right answer is a new object that is named informatively to indicate
 It's not uncommon to get datasets with hundreds or even thousands of variables.
 In this situation, the first challenge is often just focusing on the variables you're interested in.
-`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.
-`select()` is not terribly useful with the `flights` data because we only have 19 variables, but you can still get the general idea of how it works:
+`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables:
 - Select columns by name:
@@ -467,7 +466,7 @@ ggplot(flights, aes(x = air_time - airtime2)) + geom_histogram()
 arrange(arr_delay)
 ```
-## The pipe {#the-pipe}
+## The pipe {#sec-the-pipe}
 We've shown you simple examples of the pipe above, but its real power arises when you start to combine multiple verbs.
 For example, imagine that you wanted to find the fast flights to Houston's IAH airport: you need to combine `filter()`, `mutate()`, `select()`, and `arrange()`:
diff --git a/index.qmd b/index.qmd
index 0b63558..5d1f471 100644
--- a/index.qmd
+++ b/index.qmd
@@ -13,7 +13,7 @@ This website is and will always be free, licensed under the [CC BY-NC-ND 3.0](ht
 If you'd like a physical copy of the book, you can order the 1st edition on [Amazon](https://amzn.to/2aHLAQ1), or wait until mid-2023 for the 2nd edition.
 If appreciate reading the book for free and would like to give back please make a donation to [Kākāpō Recovery](https://www.doc.govt.nz/kakapo-donate): the [kākāpō](https://www.youtube.com/watch?v=9T1vfsHYiKY) (which appears on the cover of R4DS) is a critically endangered native NZ parrot; there are only 252 left.
-If you speak, another language, you might be interested in the freely available translations of the 1st edition:
+If you speak another language, you might be interested in the freely available translations of the 1st edition:
 - [Spanish](https://es.r4ds.hadley.nz)
 - [Italian](https://it.r4ds.hadley.nz)
diff --git a/intro.qmd b/intro.qmd
index 922a59d..05fe41c 100644
--- a/intro.qmd
+++ b/intro.qmd
@@ -52,7 +52,7 @@ These have complementary strengths and weaknesses, so any real data analysis wil
 **Visualization** is a fundamentally human activity.
 A good visualization will show you things you did not expect or raise new questions about the data.
 A good visualization might also hint that you're asking the wrong question or that you need to collect different data.
-Visualizations can surprise you, and they don't scale particularly well because they require a human to interpret them.
+Visualizations can surprise you, but they don't scale particularly well because they require a human to interpret them.
 **Models** are complementary tools to visualization.
 Once you have made your questions sufficiently precise, you can use a model to answer them.
@@ -105,7 +105,7 @@ We'll also show you how to get data out of databases and parquet files, both of
 You won't necessarily be able to work with the entire dataset, but that's not a problem because you only need a subset or subsample to answer the question that you're interested in.
 If you're routinely working with larger data (10-100 Gb, say), we recommend learning more about [data.table](https://github.com/Rdatatable/data.table).
-We don't teach it here because it uses a different interface to the tidyverse and requires you ot learn some different conventions.
+We don't teach it here because it uses a different interface to the tidyverse and requires you to learn some different conventions.
 However, it is incredible faster and the performance payoff is worth investing some time learning it if you're working with large data.
 ### Python, Julia, and friends
diff --git a/preface-2e.qmd b/preface-2e.qmd
index 4df66e5..d68cf11 100644
--- a/preface-2e.qmd
+++ b/preface-2e.qmd
@@ -27,5 +27,5 @@ A brief summary of the biggest changes follows:
 We never had enough room to fully do modelling justice, and there are now much better resources available.
 We generally recommend using the [tidymodels](https://www.tidymodels.org/) packages and reading [Tidy Modeling with R](https://www.tmwr.org/) by Max Kuhn and Julia Silge.
-- The communicate part remains, but has been thoroughly updated to feature Quarto instead of R Markdown.
+- The communicate part remains, but has been thoroughly updated to feature [Quarto](https://quarto.org/) instead of R Markdown.
 This edition of the book has been written in quarto, and it's clearly the tool of the future.
diff --git a/whole-game.qmd b/whole-game.qmd
index 48257ee..d6d3c4a 100644
--- a/whole-game.qmd
+++ b/whole-game.qmd
@@ -8,7 +8,7 @@ source("_common.R")
 Our goal in this part of the book is to give you a rapid overview of the main tools of data science: **importing**, **tidying**, **transforming**, and **visualizing data**, as shown in @fig-ds-whole-game.
 We want to show you the "whole game" of data science giving you just enough of all the major pieces so that you can tackle real, if simple, datasets.
-The later parts of the book, will hit each of these topics in more depth, increasing the range of data science challenges that you can tackle.
+The later parts of the book will hit each of these topics in more depth, increasing the range of data science challenges that you can tackle.
 ```{r}
 #| label: fig-ds-whole-game
@@ -39,7 +39,7 @@ Five chapters focus on the tools of data science:
 - Before you can transform and visualize your data, you need to first get your data into R.
 In @sec-data-import you'll learn the basics of getting `.csv` files into R.
-Nestled among these chapters are five other chapters that focus on your R workflow.
+Nestled among these chapters are four other chapters that focus on your R workflow.
 In @sec-workflow-basics, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
 These will set you up for success in the long run, as they'll give you the tools to stay organized when you tackle real projects.
 Finally, @sec-workflow-getting-help will teach you how to get help and keep learning.