* comma

* Quarto link

* Use "but" instead of "and", since the sentence presents a good thing and then a bad thing

* Reduce repetition

* Typo ot ⇒ to

* Rm spurious comma

* TODO ref

* Comment about a strange sentence

* Comment not in my env

* Comment about create ≠ assign

* Argument about reading one’s mind

* Broken ref comment

* Argument about repetition

* Argue for reducing repetition

* Comment about dplyr

* Resolve to dos

* Resolve to dos

* Update intro.qmd

* Update intro.qmd

* Resolve to dos

* Fix number of workflow chapters

---------

Co-authored-by: Olivier Cailloux <olivier.cailloux@gmail.com>
Co-authored-by: Mine Cetinkaya-Rundel <cetinkaya.mine@gmail.com>
Olivier Cailloux 2023-03-10 01:00:58 +01:00 committed by GitHub
parent bac06d00f2
commit 39132b9a74
5 changed files with 9 additions and 10 deletions


@@ -72,7 +72,7 @@ But before we discuss their individual differences, it's worth stating what they
3. The output is always a new data frame.
Because each verb does one thing well, solving complex problems will usually require combining multiple verbs, and we'll do so with the pipe, `|>`.
-We'll discuss the pipe more in @the-pipe, but in brief, the pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
+We'll discuss the pipe more in @sec-the-pipe, but in brief, the pipe takes the thing on its left and passes it along to the function on its right so that `x |> f(y)` is equivalent to `f(x, y)`, and `x |> f(y) |> g(z)` is equivalent to into `g(f(x, y), z)`.
The easiest way to pronounce the pipe is "then".
That makes it possible to get a sense of the following code even though you haven't yet learned the details:
@@ -320,8 +320,7 @@ Often, the right answer is a new object that is named informatively to indicate
It's not uncommon to get datasets with hundreds or even thousands of variables.
In this situation, the first challenge is often just focusing on the variables you're interested in.
-`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables.
-`select()` is not terribly useful with the `flights` data because we only have 19 variables, but you can still get the general idea of how it works:
+`select()` allows you to rapidly zoom in on a useful subset using operations based on the names of the variables:
- Select columns by name:
@@ -467,7 +466,7 @@ ggplot(flights, aes(x = air_time - airtime2)) + geom_histogram()
arrange(arr_delay)
```
-## The pipe {#the-pipe}
+## The pipe {#sec-the-pipe}
We've shown you simple examples of the pipe above, but its real power arises when you start to combine multiple verbs.
For example, imagine that you wanted to find the fast flights to Houston's IAH airport: you need to combine `filter()`, `mutate()`, `select()`, and `arrange()`:
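For illustration, here is a minimal sketch of that kind of pipeline, assuming dplyr and the nycflights13 `flights` data are loaded; the chosen columns and the speed calculation are illustrative, not necessarily the book's exact example:

```r
library(nycflights13)  # provides the `flights` data frame
library(dplyr)

# Flights to IAH, ordered from fastest to slowest ground speed (mph)
flights |>
  filter(dest == "IAH") |>
  mutate(speed = distance / air_time * 60) |>
  select(year:day, dep_time, carrier, flight, speed) |>
  arrange(desc(speed))
```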


@@ -13,7 +13,7 @@ This website is and will always be free, licensed under the [CC BY-NC-ND 3.0](ht
If you'd like a physical copy of the book, you can order the 1st edition on [Amazon](https://amzn.to/2aHLAQ1), or wait until mid-2023 for the 2nd edition.
If appreciate reading the book for free and would like to give back please make a donation to [Kākāpō Recovery](https://www.doc.govt.nz/kakapo-donate): the [kākāpō](https://www.youtube.com/watch?v=9T1vfsHYiKY) (which appears on the cover of R4DS) is a critically endangered native NZ parrot; there are only 252 left.
-If you speak, another language, you might be interested in the freely available translations of the 1st edition:
+If you speak another language, you might be interested in the freely available translations of the 1st edition:
- [Spanish](https://es.r4ds.hadley.nz)
- [Italian](https://it.r4ds.hadley.nz)


@@ -52,7 +52,7 @@ These have complementary strengths and weaknesses, so any real data analysis wil
**Visualization** is a fundamentally human activity.
A good visualization will show you things you did not expect or raise new questions about the data.
A good visualization might also hint that you're asking the wrong question or that you need to collect different data.
-Visualizations can surprise you, and they don't scale particularly well because they require a human to interpret them.
+Visualizations can surprise you, but they don't scale particularly well because they require a human to interpret them.
**Models** are complementary tools to visualization.
Once you have made your questions sufficiently precise, you can use a model to answer them.
@@ -105,7 +105,7 @@ We'll also show you how to get data out of databases and parquet files, both of
You won't necessarily be able to work with the entire dataset, but that's not a problem because you only need a subset or subsample to answer the question that you're interested in.
If you're routinely working with larger data (10-100 Gb, say), we recommend learning more about [data.table](https://github.com/Rdatatable/data.table).
-We don't teach it here because it uses a different interface to the tidyverse and requires you ot learn some different conventions.
+We don't teach it here because it uses a different interface to the tidyverse and requires you to learn some different conventions.
However, it is incredible faster and the performance payoff is worth investing some time learning it if you're working with large data.
### Python, Julia, and friends


@@ -27,5 +27,5 @@ A brief summary of the biggest changes follows:
We never had enough room to fully do modelling justice, and there are now much better resources available.
We generally recommend using the [tidymodels](https://www.tidymodels.org/) packages and reading [Tidy Modeling with R](https://www.tmwr.org/) by Max Kuhn and Julia Silge.
-- The communicate part remains, but has been thoroughly updated to feature Quarto instead of R Markdown.
+- The communicate part remains, but has been thoroughly updated to feature [Quarto](https://quarto.org/) instead of R Markdown.
This edition of the book has been written in quarto, and it's clearly the tool of the future.


@@ -8,7 +8,7 @@ source("_common.R")
Our goal in this part of the book is to give you a rapid overview of the main tools of data science: **importing**, **tidying**, **transforming**, and **visualizing data**, as shown in @fig-ds-whole-game.
We want to show you the "whole game" of data science giving you just enough of all the major pieces so that you can tackle real, if simple, datasets.
-The later parts of the book, will hit each of these topics in more depth, increasing the range of data science challenges that you can tackle.
+The later parts of the book will hit each of these topics in more depth, increasing the range of data science challenges that you can tackle.
```{r}
#| label: fig-ds-whole-game
@@ -39,7 +39,7 @@ Five chapters focus on the tools of data science:
- Before you can transform and visualize your data, you need to first get your data into R.
In @sec-data-import you'll learn the basics of getting `.csv` files into R.
-Nestled among these chapters are five other chapters that focus on your R workflow.
+Nestled among these chapters are four other chapters that focus on your R workflow.
In @sec-workflow-basics, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
These will set you up for success in the long run, as they'll give you the tools to stay organized when you tackle real projects.
Finally, @sec-workflow-getting-help will teach you how to get help and keep learning.