Welcome to the second edition of “R for Data Science”! This is a major reworking of the first edition, removing material we no longer think is useful, adding material we wish we included in the first edition, and generally updating the text and code to reflect changes in best practices. We’re also very excited to welcome a new co-author: Mine Çetinkaya-Rundel, a noted data science educator and one of our colleagues at Posit (the company formerly known as RStudio).
A brief summary of the biggest changes follows:
The first part of the book has been renamed to “Whole game”. The goal of this section is to give you the rough details of the “whole game” of data science before we dive into the details.
The second part of the book is “Visualize”. This part gives data visualization tools and best practices a more thorough coverage compared to the first edition.
The third part of the book is now called “Transform” and gains new chapters on numbers, logical vectors, and missing values. These were previously parts of the data transformation chapter, but needed much more room.
The fourth part of the book is called “Import”. It’s a new set of chapters that goes beyond reading flat text files to now embrace working with spreadsheets, getting data out of databases, working with big data, rectangling hierarchical data, and scraping data from web sites.
The “Program” part continues, but has been rewritten from top-to-bottom to focus on the most important parts of function writing and iteration. Function writing now includes sections on how to wrap tidyverse functions (dealing with the challenges of tidy evaluation), since this has become much easier over the last few years. We’ve added a new chapter on important Base R functions that you’re likely to see when reading R code found in the wild.
The modeling part has been removed. We never had enough room to fully do modelling justice, and there are now much better resources available. We generally recommend using the tidymodels packages and reading Tidy Modeling with R by Max Kuhn and Julia Silge.
The communicate part continues as well, but features Quarto instead of R Markdown as the tool of choice for authoring reproducible computational documents.
Other changes include switching from magrittr’s pipe (%>%
) to the base pipe (|>
) and switching the book’s source from RMarkdown to Quarto.