Whole game edits (#1184)

* Reflect new part structure

* Mention all chapters

* Hide the ruler

* Crossref diagram

* Fix crossref

* Mention all import chapters

* Fix link to following chapter

* Fix title and summary

* Add intros

* Consistent chunk style?
Mine Cetinkaya-Rundel 2022-12-16 01:41:10 -05:00 committed by GitHub
parent 0b557e0da7
commit 69df813e31
GPG Key ID: 4AEE18F83AFDEB23
11 changed files with 40 additions and 20 deletions

View File

@ -9,8 +9,13 @@ status("polishing")
## Introduction
-Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to stop learning and start working with your own data.
-In this chapter, you'll learn how to read plain-text rectangular files into R.
+Working with data provided by R packages is a great way to learn the tools of data science, but at some point you want to apply what you've learned to your own data.
+In this chapter, you'll learn the basics of reading data files into R.
+Specifically, this chapter will focus on reading plain-text rectangular files.
+We'll start with some practical advice for handling features like column names and types and missing data.
+You will then learn about reading data from multiple files at once and writing data from R to a file.
+Finally, you'll learn how to hand craft data frames in R.
### Prerequisites
@ -25,7 +30,7 @@ library(tidyverse)
## Reading data from a file
-To begin we'll focus on the most rectangular data file type: the CSV, short for comma separate values.
+To begin we'll focus on the most rectangular data file type: the CSV, short for comma-separated values.
Here is what a simple CSV file looks like.
The first row, commonly called the header row, gives the column names, and the following six rows give the data.
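The header-then-rows layout described here can be demonstrated directly; a minimal sketch assuming the readr package (part of the tidyverse) is installed, with invented column names and values:

```r
library(readr)

# A small CSV supplied as a literal string; wrapping it in I() tells
# read_csv() to treat the input as data rather than as a file path.
csv_text <- I("name,age
alice,30
bob,25")

df <- read_csv(csv_text, show_col_types = FALSE)

# The first row became the header, so the tibble has columns
# name and age and two rows of data.
df
```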
@ -496,7 +501,7 @@ We'll use `tibble()` and `tribble()` later in the book to construct small exampl
In this chapter, you've learned how to load CSV files with `read_csv()` and to do your own data entry with `tibble()` and `tribble()`.
You've learned how csv files work, some of the problems you might encounter, and how to overcome them.
-We'll come to data import a few times in this book: @sec-import-databases will show you how to load data from databases, @sec-import-spreadsheets from Excel and googlesheets, @sec-rectangling from JSON, and @sec-scraping from websites.
+We'll come to data import a few times in this book: @sec-import-spreadsheets from Excel and googlesheets, @sec-import-databases will show you how to load data from databases, @sec-arrow from parquet files, @sec-rectangling from JSON, and @sec-scraping from websites.
Now that you're writing a substantial amount of R code, it's time to learn more about organizing your code into files and directories.
In the next chapter, you'll learn all about the advantages of scripts and projects, and some of the many tools that they provide to make your life easier.
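The `tibble()` and `tribble()` data entry mentioned in the summary can be sketched as follows; a toy example (the columns are invented for illustration) assuming the tibble package is installed:

```r
library(tibble)

# tibble() lays out a data frame column by column:
df1 <- tibble(
  x = c(1, 2, 3),
  y = c("a", "b", "c")
)

# tribble() ("transposed tibble") lays out the same data row by row,
# with ~name headers giving the column names:
df2 <- tribble(
  ~x, ~y,
   1, "a",
   2, "b",
   3, "c"
)

# Both constructors produce the same tibble.
all.equal(df1, df2)
```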

View File

@ -14,7 +14,9 @@ Often you'll need to create some new variables or summaries to see the most impo
You'll learn how to do all that (and more!) in this chapter, which will introduce you to data transformation using the **dplyr** package and a new dataset on flights that departed New York City in 2013.
The goal of this chapter is to give you an overview of all the key tools for transforming a data frame.
-We'll come back these functions in more detail in later chapters, as we start to dig into specific types of data (e.g. numbers, strings, dates).
+We'll start with functions that operate on rows and then columns of a data frame.
+We will then introduce the ability to work with groups.
+We will end the chapter with a case study that showcases these functions in action, and we'll come back to the functions in more detail in later chapters, as we start to dig into specific types of data (e.g. numbers, strings, dates).
### Prerequisites
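The rows → columns → groups progression described above might look like this in practice; a sketch using a tiny made-up data frame rather than the chapter's flights data, assuming dplyr is installed:

```r
library(dplyr)

toy_flights <- tibble::tibble(
  carrier   = c("AA", "AA", "UA", "UA", "DL"),
  dep_delay = c(5, 20, NA, 12, 3)
)

# Rows: keep only flights with a recorded departure delay
delayed <- toy_flights |> filter(!is.na(dep_delay))

# Columns: add a new variable derived from an existing one
delayed <- delayed |> mutate(delay_hours = dep_delay / 60)

# Groups: collapse each carrier down to one summary row
by_carrier <- delayed |>
  group_by(carrier) |>
  summarize(avg_delay = mean(dep_delay))

by_carrier
```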

View File

@ -15,7 +15,7 @@ R has several systems for making graphs, but ggplot2 is one of the most elegant
ggplot2 implements the **grammar of graphics**, a coherent system for describing and building graphs.
With ggplot2, you can do more and faster by learning one system and applying it in many places.
-This chapter will teach you how to visualize your data using ggplot2.
+This chapter will teach you how to visualize your data using **ggplot2**.
We will start by creating a simple scatterplot and use that to introduce aesthetic mappings and geometric objects -- the fundamental building blocks of ggplot2.
We will then walk you through visualizing distributions of single variables as well as visualizing relationships between two or more variables.
We'll finish off with saving your plots and troubleshooting tips.
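A first scatterplot along the lines described above; a sketch using the built-in mtcars data (rather than the dataset the chapter itself uses), assuming ggplot2 is installed:

```r
library(ggplot2)

# Map engine displacement to x and fuel efficiency to y (aesthetic
# mappings), then draw one point per car (a geometric object).
p <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# ggsave() writes the most recent (or a named) plot to disk, e.g.:
# ggsave("mpg-vs-disp.png", plot = p, width = 5, height = 4)
p
```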
@ -567,7 +567,7 @@ In the following sections you will learn about commonly used plots for visualizi
To visualize the relationship between a numerical and a categorical variable we can use side-by-side box plots.
A **boxplot** is a type of visual shorthand for a distribution of values that is popular among statisticians.
-Each boxplot consists of:
+As shown in @fig-eda-boxplot, each boxplot consists of:
- A box that stretches from the 25th percentile of the distribution to the 75th percentile, a distance known as the interquartile range (IQR).
In the middle of the box is a line that displays the median, i.e. 50th percentile, of the distribution.
@ -579,7 +579,10 @@ Each boxplot consists of:
- A line (or whisker) that extends from each end of the box and goes to the farthest non-outlier point in the distribution.
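The pieces of this anatomy can be computed by hand with base R alone; a sketch with made-up values:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # 100 is a likely outlier

q1  <- quantile(x, 0.25)  # 25th percentile: lower edge of the box
med <- median(x)          # the line drawn inside the box
q3  <- quantile(x, 0.75)  # 75th percentile: upper edge of the box
iqr <- IQR(x)             # height of the box: q3 - q1

# Points beyond 1.5 * IQR from the box edges are plotted individually
outliers <- x[x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr]
outliers
```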
```{r}
+#| label: fig-eda-boxplot
#| echo: false
+#| fig-cap: >
+#|   Diagram depicting how a boxplot is created.
#| fig-alt: >
#|   A diagram depicting how a boxplot is created following the steps outlined
#|   above.
@ -848,7 +851,7 @@ We started with the basic idea that underpins ggplot2: a visualization is a mapp
You then learned about increasing the complexity and improving the presentation of your plots layer-by-layer.
You also learned about commonly used plots for visualizing the distribution of a single variable as well as for visualizing relationships between two or more variables, by leveraging additional aesthetic mappings and/or splitting your plot into small multiples using faceting.
-We'll use visualizations again and again through out this book, introducing new techniques as we need them as well as do a deeper dive into creating visualizations with ggplot2 in @sec-layers through @sec-eda.
+We'll use visualizations again and again throughout this book, introducing new techniques as we need them, as well as doing a deeper dive into creating visualizations with ggplot2 in @sec-layers through @sec-exploratory-data-analysis.
With the basics of visualization under your belt, in the next chapter we're going to switch gears a little and give you some practical workflow advice.
We intersperse workflow advice with data science tools throughout this part of the book because it'll help you stay organized as you write increasing amounts of R code.

View File

@ -356,5 +356,7 @@ knitr::kable(df, format = "markdown")
```
```{r}
#| eval: false
cli:::ruler()
```

View File

@ -9,11 +9,14 @@ A brief summary of the biggest changes follows:
- The first part of the book has been renamed to "Whole game".
The goal of this section is to give you the rough details of the "whole game" of data science before we dive into the details.
-- The second part of the book is now called "Transform" and gains new chapters on numbers, logical vectors, and missing values.
+- The second part of the book is "Visualize".
+This part gives data visualization tools and best practices more thorough coverage than the first edition.
+- The third part of the book is now called "Transform" and gains new chapters on numbers, logical vectors, and missing values.
These were previously parts of the data transformation chapter, but needed much more room.
-- The third part of the book is called "Wrangle".
-It's a new set of chapters that goes beyond reading flat text files to now embrace working with spreadsheets, getting data out of databases, rectangling tree-like data, and scraping data from web sites.
+- The fourth part of the book is called "Import".
+It's a new set of chapters that goes beyond reading flat text files to now embrace working with spreadsheets, getting data out of databases, working with big data, rectangling hierarchical data, and scraping data from web sites.
- The "Program" part continues, but has been rewritten from top-to-bottom to focus on the most important parts of function writing and iteration.
Function writing now includes sections on how to wrap tidyverse functions (dealing with the challenges of tidy evaluation), since this has become much easier over the last few years.
@ -21,6 +24,8 @@ A brief summary of the biggest changes follows:
- The modeling part has been removed.
We never had enough room to fully do modelling justice, and there are now much better resources available.
-We geneally recommend using the [tidymodels](https://www.tidymodels.org/) packages and reading [Tidy Modeling with R](https://www.tmwr.org/) by Max Kuhn and Julia Silge.
+We generally recommend using the [tidymodels](https://www.tidymodels.org/) packages and reading [Tidy Modeling with R](https://www.tmwr.org/) by Max Kuhn and Julia Silge.
-Other changes include switching from magrittr's pipe (`%>%`) to the base pipe (`|>`) and switching from RMarkdown to Quarto.
+- The communicate part continues as well, but features Quarto instead of R Markdown as the tool of choice for authoring reproducible computational documents.
+Other changes include switching from magrittr's pipe (`%>%`) to the base pipe (`|>`) and switching the book's source from RMarkdown to Quarto.
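The pipe switch mentioned here, side by side; the base pipe needs R 4.1 or later, and the magrittr version is shown only as a comment so no package is required:

```r
# magrittr pipe (used in the first edition; requires library(magrittr)
# or the tidyverse):
# mtcars %>% subset(cyl == 4) %>% nrow()

# base pipe (used in this edition) -- built into R itself:
n <- mtcars |> subset(cyl == 4) |> nrow()
n
```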

View File

@ -40,5 +40,6 @@ Five chapters focus on the tools of data science:
In @sec-data-import you'll learn the basics of getting `.csv` files into R.
Nestled among these chapters are five other chapters that focus on your R workflow.
-In @sec-workflow-basics, @sec-workflow-pipes, @sec-workflow-style, and @sec-workflow-scripts-projects, you'll learn good workflow practices for writing and organizing your R code.
+In @sec-workflow-basics, @sec-workflow-pipes, @sec-workflow-style, and @sec-workflow-scripts-projects you'll learn good workflow practices for writing and organizing your R code.
+These will set you up for success in the long run, as they'll give you the tools to stay organised when you tackle real projects.
Finally, @sec-workflow-getting-help will teach you how to get help to keep learning.

View File

@ -3,6 +3,7 @@
```{r}
#| results: "asis"
#| echo: false
+source("_common.R")
status("polishing")
```
@ -250,4 +251,3 @@ knitr::include_graphics("screenshots/rstudio-env.png")
Now you've learned a little more about how R code works, and picked up some tips to help you understand your code when you come back to it in the future.
In the next chapter, we'll continue your data science journey by teaching you about dplyr, the tidyverse package that helps you transform data, whether it's selecting important variables, filtering down to rows of interest, or computing summary statistics.

View File

@ -1,8 +1,9 @@
-# Workflow: Getting help {#sec-workflow-getting-help}
+# Workflow: getting help {#sec-workflow-getting-help}
```{r}
#| results: "asis"
#| echo: false
+source("_common.R")
status("polishing")
```
@ -128,5 +129,4 @@ This chapter concludes the Whole Game part of the book.
You've now seen the most important parts of the data science process: visualization, transformation, tidying and importing.
Now you've got a holistic view of the whole process and we start to get into the details of small pieces.
-The next part of the book, Transform, goes into depth into the different types of variables that you might encounter: logical vectors, numbers, strings, factors, and date-times, and covers important related topics like tibbles, regular expression, missing values, and joins.
-There's no need to read these chapters in order; dip in and out as needed for the specific data that you're working with.
+The next part of the book, Visualize, does a deeper dive into the grammar of graphics and creating data visualizations with ggplot2, showcases how to use the tools you've learned so far to conduct exploratory data analysis, and introduces good practices for creating plots for communication.

View File

@ -3,6 +3,7 @@
```{r}
#| results: "asis"
#| echo: false
+source("_common.R")
status("complete")
```

View File

@ -359,4 +359,4 @@ In this chapter, you've learned how to organize your R code in scripts (files) a
Much like code style, this may feel like busywork at first.
But as you accumulate more code across multiple projects, you'll learn to appreciate how a little up front organisation can save you a bunch of time down the road.
-Next up, we'll switch back to data science tooling to talk about exploratory data analysis (or EDA for short), a philosophy and set of tools that you can use with your data to start to get a sense of what's going on.
+Next up, you'll learn about how to get help and how to ask good coding questions.

View File

@ -3,6 +3,7 @@
```{r}
#| results: "asis"
#| echo: false
+source("_common.R")
status("polishing")
```
@ -13,7 +14,7 @@ Using a consistent style makes it easier for others (including future-you!) to r
This chapter will introduce you to the most important points of the [tidyverse style guide](https://style.tidyverse.org), which is used throughout this book.
Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature.
-Additionally, there are some great tools to quickly restyle existing code, like the [styler](https://styler.r-lib.org) package by Lorenz Walthert.
+Additionally, there are some great tools to quickly restyle existing code, like the [**styler**](https://styler.r-lib.org) package by Lorenz Walthert.
Once you've installed it with `install.packages("styler")`, an easy way to use it is via RStudio's **command palette**.
The command palette lets you use any built-in RStudio command, as well as many addins provided by packages.
Open the palette by pressing Cmd/Ctrl + Shift + P, then type "styler" to see all the shortcuts provided by styler.
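Beyond the command palette, styler can also be called directly from the console; a sketch assuming the package is installed (`style_text()` restyles a string of code, `style_file()` a whole script, with the filename below being hypothetical):

```r
library(styler)

# Restyle a string of R code according to the tidyverse style guide
# (e.g. consistent spacing around operators and after commas):
out <- style_text("x=c(1,2,3)")
out

# For a whole file you would use, e.g.:
# style_file("my-script.R")
```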