Polish transform intros

Hadley Wickham 2022-11-07 17:06:29 -06:00
parent b1a0b8c39b
commit a3b606dc47
5 changed files with 23 additions and 10 deletions

@@ -15,12 +15,12 @@ options(warnPartialMatchArgs = FALSE)
This chapter will show you how to work with dates and times in R.
At first glance, dates and times seem simple.
You use them all the time in your regular life, and they don't seem to cause much confusion.
-However, the more you learn about dates and times, the more complicated they seem to get.
-To warm up think about how many days there are in a year, and how many hours there are in a day.
+However, the more you learn about dates and times, the more complicated they seem to get!
+To warm up, think about how many days there are in a year, and how many hours there are in a day.
You probably remembered that most years have 365 days, but leap years have 366.
Do you know the full rule for determining if a year is a leap year[^datetimes-1]?
-The number of hours in a day is a little less obvious: most days have 24 hours, but if you use daylight saving time (DST), one day each year has 23 hours and another has 25.
+The number of hours in a day is a little less obvious: most days have 24 hours, but in places that use daylight saving time (DST), one day each year has 23 hours and another has 25.
[^datetimes-1]: A year is a leap year if it's divisible by 4, unless it's also divisible by 100, except if it's also divisible by 400.
In other words, in every set of 400 years, there are 97 leap years.
@@ -28,6 +28,10 @@ The number of hours in a day is a little less obvious: most days have 24 hours,
Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and DST.
This chapter won't teach you every last detail about dates and times, but it will give you a solid grounding in practical skills that will help you with common data analysis challenges.
+We'll begin by showing you how to create date-times from various inputs and then, once you've got a date-time, how to extract components like year, month, and day.
+We'll then dive into the tricky topic of working with time spans, which come in a variety of flavors depending on what you're trying to do.
+We'll conclude with a brief discussion of the additional challenges posed by time zones.
### Prerequisites
This chapter will focus on the **lubridate** package, which makes it easier to work with dates and times in R.

@@ -12,9 +12,9 @@ status("complete")
Factors are used for categorical variables, variables that have a fixed and known set of possible values.
They are also useful when you want to display character vectors in a non-alphabetical order.
-If you want to learn more about factors after reading this chapter, we recommend reading Amelia McNamara and Nicholas Horton's paper, [*Wrangling categorical data in R*](https://peerj.com/preprints/3163/).
-This paper lays out some of the history discussed in [*stringsAsFactors: An unauthorized biography*](https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/) and [*stringsAsFactors = \<sigh\>*](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh), and compares the tidy approaches to categorical data outlined in this book with base R methods.
-An early version of the paper helped motivate and scope the forcats package; thanks Amelia & Nick!
+We'll start by motivating why factors are needed for data analysis and how you can create them with `factor()`.
+We'll then introduce you to the `gss_cat` dataset, which contains a bunch of categorical variables to experiment with.
+You'll then use that dataset to practice modifying the order and values of factors, before we finish up with a discussion of ordered factors.
### Prerequisites
@@ -66,6 +66,7 @@ Now you can create a factor:
```{r}
y1 <- factor(x1, levels = month_levels)
y1
sort(y1)
```
@@ -433,5 +434,9 @@ Given the arguable utility of these differences, we don't generally recommend us
This chapter introduced you to the handy forcats package for working with factors, covering its most commonly used functions.
forcats contains a wide range of other helpers that we didn't have space to discuss here, so whenever you're facing a factor-related challenge that you haven't encountered before, we highly recommend skimming the [reference index](https://forcats.tidyverse.org/reference/index.html) to see if there's a canned function that can help solve your problem.
+If you want to learn more about factors after reading this chapter, we recommend reading Amelia McNamara and Nicholas Horton's paper, [*Wrangling categorical data in R*](https://peerj.com/preprints/3163/).
+This paper lays out some of the history discussed in [*stringsAsFactors: An unauthorized biography*](https://simplystatistics.org/posts/2015-07-24-stringsasfactors-an-unauthorized-biography/) and [*stringsAsFactors = \<sigh\>*](https://notstatschat.tumblr.com/post/124987394001/stringsasfactors-sigh), and compares the tidy approaches to categorical data outlined in this book with base R methods.
+An early version of the paper helped motivate and scope the forcats package; thanks Amelia & Nick!
In the next chapter we'll switch gears to start learning about dates and times in R.
Dates and times seem deceptively simple, but as you'll soon see, the more you learn about them, the more complex they seem to get!

@@ -9,10 +9,12 @@ status("polishing")
## Introduction
In this chapter, you'll learn useful tools for creating and manipulating numeric vectors.
-We'll start by going into a little more detail of `count()` before diving into various numeric transformations.
+Numeric vectors are the backbone of data science, and you've already used them a bunch of times earlier in the book.
+Now it's time to systematically survey what you can do with them in R, ensuring that you're well situated to tackle any future problem involving numeric vectors.
+We'll start by going into a little more detail about `count()` before diving into various numeric transformations that pair well with `mutate()`.
You'll then learn about more general transformations that can be applied to other types of vector, but are often used with numeric vectors.
-Then you'll learn about a few more useful summaries and how they can also be used with `mutate()`.
+We'll finish off by covering the summary functions that pair well with `summarise()` and show you how they can also be used with `mutate()`.
### Prerequisites

@@ -18,7 +18,7 @@ The term "regular expression" is a bit of a mouthful, so most people abbreviate
The chapter starts with the basics of regular expressions and the most useful stringr functions for data analysis.
We'll then expand your knowledge of patterns to cover seven important new topics (escaping, anchoring, character classes, shorthand classes, quantifiers, precedence, and grouping).
Next we'll talk about some of the other types of pattern that stringr functions can work with, and the various "flags" that allow you to tweak the operation of regular expressions.
-We'll finish up with a survey of other places in stringr, the tidyverse, and base R where you might use regexes.
+We'll finish up with a survey of other places in the tidyverse and base R where you might use regexes.
### Prerequisites

@@ -16,6 +16,8 @@ We'll begin with the details of creating strings and character vectors.
You'll then dive into creating strings from data, and then the opposite: extracting strings from data.
The chapter finishes up with functions that work with individual letters and a brief discussion of where your expectations from English might steer you wrong when working with other languages.
+We'll keep working with strings in the next chapter, where you'll learn more about the power of regular expressions.
### Prerequisites
::: callout-important