Clarify plaintext and fix punctuation (#1251)

Zeki Akyol 2023-01-28 00:03:36 +03:00 committed by GitHub
parent 96f4ad4b7c
commit feaf9545fd
4 changed files with 5 additions and 5 deletions

@@ -275,7 +275,7 @@ It then works through the following questions:
[^data-import-2]: You can override the default of 1000 with the `guess_max` argument.
- Does it contain only `F`, `T`, `FALSE`, or `TRUE` (ignoring case)? If so, it's a logical.
-- Does it contain only numbers (e.g., `1`, `-4.5`, `5e6`, `Inf`)? If so, it's a number.
+- Does it contain only numbers (e.g. `1`, `-4.5`, `5e6`, `Inf`)? If so, it's a number.
- Does it match the ISO8601 standard? If so, it's a date or date-time. (We'll return to date-times in more detail in @sec-creating-datetimes).
- Otherwise, it must be a string.

@@ -17,7 +17,7 @@ This chapter will introduce you to two important types of joins:
- Filtering joins, which filter observations from one data frame based on whether or not they match an observation in another.
We'll begin by discussing keys, the variables used to connect a pair of data frames in a join.
-We cement the theory with an examination of the keys in the nycflights13 datasets, then use that knowledge to start joining data frames together.
+We cement the theory with an examination of the keys in the datasets from the nycflights13 package, then use that knowledge to start joining data frames together.
Next we'll discuss how joins work, focusing on their action on the rows.
We'll finish up with a discussion of non-equi-joins, a family of joins that provide a more flexible way of matching keys than the default equality relationship.

@@ -20,7 +20,7 @@ We'll finish off with `if_else()` and `case_when()`, two useful functions for ma
### Prerequisites
Most of the functions you'll learn about in this chapter are provided by base R, so we don't need the tidyverse, but we'll still load it so we can use `mutate()`, `filter()`, and friends to work with data frames.
-We'll also continue to draw examples from the nycflights13 dataset.
+We'll also continue to draw examples from the `nycflights13::flights` dataset.
```{r}
#| label: setup
@@ -404,7 +404,7 @@ This works, but what if we wanted to also compute the average delay for flights
We'd need to perform a separate filter step, and then figure out how to combine the two data frames together[^logicals-3].
Instead you could use `[` to perform an inline filtering: `arr_delay[arr_delay > 0]` will yield only the positive arrival delays.
-[^logicals-3]: We'll cover this in @sec-joins\]
+[^logicals-3]: We'll cover this in @sec-joins.
This leads to:

@@ -67,7 +67,7 @@ Note, however, the situation is rather different in Europe where courts have fou
Even if the data is public, you should be extremely careful about scraping personally identifiable information like names, email addresses, phone numbers, dates of birth, etc.
Europe has particularly strict laws about the collection of storage of such data (GDPR), and regardless of where you live you're likely to be entering an ethical quagmire.
-For example, in 2016, a group of researchers scraped public profile information (e.g., usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempts for anonymization.
+For example, in 2016, a group of researchers scraped public profile information (e.g. usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempts for anonymization.
While the researchers felt that there was nothing wrong with this since the data were already public, this work was widely condemned due to ethics concerns around identifiability of users whose information was released in the dataset.
If your work involves scraping personally identifiable information, we strongly recommend reading about the OkCupid study as well as similar studies with questionable research ethics involving the acquisition and release of personally identifiable information.[^webscraping-3]