Comma after e.g.

mine-cetinkaya-rundel 2023-04-10 11:22:08 -04:00
parent cae4b89e77
commit b01bb5a061
20 changed files with 40 additions and 40 deletions

View File

@ -221,7 +221,7 @@ We might also suspect that measurements of 32mm and 59mm are implausible: those
It's good practice to repeat your analysis with and without the outliers.
If they have minimal effect on the results, and you can't figure out why they're there, it's reasonable to omit them, and move on.
However, if they have a substantial effect on your results, you shouldn't drop them without justification.
You'll need to figure out what caused them (e.g. a data entry error) and disclose that you removed them in your write-up.
You'll need to figure out what caused them (e.g., a data entry error) and disclose that you removed them in your write-up.
### Exercises

View File

@ -73,7 +73,7 @@ ggplot(mpg, aes(x = displ, y = hwy)) +
```
The purpose of a plot title is to summarize the main finding.
Avoid titles that just describe what the plot is, e.g. "A scatterplot of engine displacement vs. fuel economy".
Avoid titles that just describe what the plot is, e.g., "A scatterplot of engine displacement vs. fuel economy".
If you need to add more text, there are two other useful labels: `subtitle` adds additional detail in a smaller font beneath the title and `caption` adds text at the bottom right of the plot, often used to describe the source of the data.
You can also use `labs()` to replace the axis and legend titles.

View File

@ -297,7 +297,7 @@ It then works through the following questions:
[^data-import-2]: You can override the default of 1000 with the `guess_max` argument.
- Does it contain only `F`, `T`, `FALSE`, or `TRUE` (ignoring case)? If so, it's a logical.
- Does it contain only numbers (e.g. `1`, `-4.5`, `5e6`, `Inf`)? If so, it's a number.
- Does it contain only numbers (e.g., `1`, `-4.5`, `5e6`, `Inf`)? If so, it's a number.
- Does it match the ISO8601 standard? If so, it's a date or date-time. (We'll return to date-times in more detail in @sec-creating-datetimes).
- Otherwise, it must be a string.
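
The questions above can be tried out on small character vectors. This is a minimal sketch, assuming readr is installed; `guess_parser()` is readr's exported helper for this kind of guess, and the `samples` list is made up purely for illustration.

```r
library(readr)

# Hypothetical sample columns, one per question in the list above
samples <- list(
  flags   = c("T", "FALSE", "TRUE", "F"),
  numbers = c("1", "-4.5", "5e6"),
  dates   = c("2023-04-10", "2023-12-25"),
  words   = c("apple", "banana")
)

# Ask readr which column type it would guess for each vector
guesses <- vapply(samples, guess_parser, character(1))
guesses
```

Each element of `guesses` names the parser readr would pick for that vector, so you can check a suspicious column before importing a whole file.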

View File

@ -423,7 +423,7 @@ household |>
)
```
We again use `values_drop_na = TRUE`, since the shape of the input forces the creation of explicit missing variables (e.g. for families with only one child).
We again use `values_drop_na = TRUE`, since the shape of the input forces the creation of explicit missing variables (e.g., for families with only one child).
@fig-pivot-names-and-values illustrates the basic idea with a simpler example.
When you use `".value"` in `names_to`, the column names in the input contribute to both values and variable names in the output.

View File

@ -17,7 +17,7 @@ You'll learn how to do all that (and more!) in this chapter, which will introduc
The goal of this chapter is to give you an overview of all the key tools for transforming a data frame.
We'll start with functions that operate on rows and then columns of a data frame, then circle back to talk more about the pipe, an important tool that you use to combine verbs.
We will then introduce the ability to work with groups.
We will end the chapter with a case study that showcases these functions in action and we'll come back to the functions in more detail in later chapters, as we start to dig into specific types of data (e.g. numbers, strings, dates).
We will end the chapter with a case study that showcases these functions in action and we'll come back to the functions in more detail in later chapters, as we start to dig into specific types of data (e.g., numbers, strings, dates).
### Prerequisites
@ -673,7 +673,7 @@ daily_flights <- daily |>
)
```
Alternatively, change the default behavior by setting a different value, e.g. `"drop"` to drop all grouping or `"keep"` to preserve the same groups.
Alternatively, change the default behavior by setting a different value, e.g., `"drop"` to drop all grouping or `"keep"` to preserve the same groups.
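
As a small illustration of the two values mentioned above, here is a sketch that uses the built-in `mtcars` data rather than the chapter's `flights` data; the object names are hypothetical.

```r
library(dplyr)

# mtcars ships with R; group by two variables so there is grouping to drop
by_cyl_am <- mtcars |> group_by(cyl, am)

# "drop" returns an ungrouped result
dropped <- by_cyl_am |> summarize(n = n(), .groups = "drop")

# "keep" preserves the grouping of the input
kept <- by_cyl_am |> summarize(n = n(), .groups = "keep")

group_vars(dropped) # no grouping variables left
group_vars(kept)    # still grouped by cyl and am
```

Both results contain the same rows; they differ only in the grouping metadata that later verbs will see.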
### Ungrouping

View File

@ -555,7 +555,7 @@ The easiest way to see the full set of what's currently available is to visit th
## Function translations {#sec-sql-expressions}
So far we've focused on the big picture of how dplyr verbs are translated to the clauses of a query.
Now we're going to zoom in a little and talk about the translation of the R functions that work with individual columns, e.g. what happens when you use `mean(x)` in a `summarize()`?
Now we're going to zoom in a little and talk about the translation of the R functions that work with individual columns, e.g., what happens when you use `mean(x)` in a `summarize()`?
To help see what's going on, we'll use a couple of little helper functions that run a `summarize()` or `mutate()` and show the generated SQL.
That will make it a little easier to explore a few variations and see how summaries and transformations can differ.

View File

@ -427,7 +427,7 @@ There are two terms to look for in the docs which correspond to the two most com
- **Tidy-selection**: this is used for functions like `select()`, `relocate()`, and `rename()` that select variables.
Your intuition about which arguments use tidy evaluation should be good for many common functions --- just think about whether you can compute (e.g. `x + 1`) or select (e.g. `a:x`).
Your intuition about which arguments use tidy evaluation should be good for many common functions --- just think about whether you can compute (e.g., `x + 1`) or select (e.g., `a:x`).
In the following sections, we'll explore the sorts of handy functions you might write once you understand embracing.
@ -607,7 +607,7 @@ While our examples have mostly focused on dplyr, tidy evaluation also underpins
weather |> summarize_weather(temp)
```
5. Converts the user supplied variable that uses clock time (e.g. `dep_time`, `arr_time`, etc.) into a decimal time (i.e. hours + (minutes / 60)).
5. Converts the user supplied variable that uses clock time (e.g., `dep_time`, `arr_time`, etc.) into a decimal time (i.e. hours + (minutes / 60)).
```{r}
#| eval: false

View File

@ -188,7 +188,7 @@ Surrogate keys can be particular useful when communicating to other humans: it's
3. The `year`, `month`, `day`, `hour`, and `origin` variables almost form a compound key for `weather`, but there's one hour that has duplicate observations.
Can you figure out what's special about that hour?
4. We know that some days of the year are special and fewer people than usual fly on them (e.g. Christmas Eve and Christmas Day).
4. We know that some days of the year are special and fewer people than usual fly on them (e.g., Christmas Eve and Christmas Day).
How might you represent that data as a data frame?
What would be the primary key?
How would it connect to the existing data frames?

View File

@ -429,7 +429,7 @@ ggplot(mpg, aes(x = hwy, y = drv, fill = drv, color = drv)) +
```
The best place to get a comprehensive overview of all of the geoms ggplot2 offers, as well as all functions in the package, is the reference page: <https://ggplot2.tidyverse.org/reference>.
To learn more about any single geom, use the help (e.g. `?geom_smooth`).
To learn more about any single geom, use the help (e.g., `?geom_smooth`).
### Exercises
@ -735,7 +735,7 @@ However, there are three reasons why you might need to use a stat explicitly:
```
ggplot2 provides more than 20 stats for you to use.
Each stat is a function, so you can get help in the usual way, e.g. `?stat_bin`.
Each stat is a function, so you can get help in the usual way, e.g., `?stat_bin`.
### Exercises

View File

@ -567,7 +567,7 @@ We don't expect you to memorize these rules, but they should become second natur
3. Use `ifelse()` to compute the absolute value of a numeric vector called `x`.
4. Write a `case_when()` statement that uses the `month` and `day` columns from `flights` to label a selection of important US holidays (e.g. New Year's Day, 4th of July, Thanksgiving, and Christmas).
4. Write a `case_when()` statement that uses the `month` and `day` columns from `flights` to label a selection of important US holidays (e.g., New Year's Day, 4th of July, Thanksgiving, and Christmas).
First create a logical column that is either `TRUE` or `FALSE`, and then create a character column that either gives the name of the holiday or is `NA`.
## Summary

View File

@ -71,7 +71,7 @@ coalesce(x, 0)
Sometimes you'll hit the opposite problem where some concrete value actually represents a missing value.
This typically arises in data generated by older software that doesn't have a proper way to represent missing values, so it must instead use some special value like 99 or -999.
If possible, handle this when reading in the data, for example, by using the `na` argument to `readr::read_csv()`, e.g. `read_csv(path, na = "99")`.
If possible, handle this when reading in the data, for example, by using the `na` argument to `readr::read_csv()`, e.g., `read_csv(path, na = "99")`.
If you discover the problem later, or your data source doesn't provide a way to handle it on read, you can use `dplyr::na_if()`:
```{r}

View File

@ -405,7 +405,7 @@ The following sections describe some general transformations which are often use
### Ranks
dplyr provides a number of ranking functions inspired by SQL, but you should always start with `dplyr::min_rank()`.
It uses the typical method for dealing with ties, e.g. 1st, 2nd, 2nd, 4th.
It uses the typical method for dealing with ties, e.g., 1st, 2nd, 2nd, 4th.
```{r}
x <- c(1, 2, 2, 3, 4, NA)

View File

@ -42,7 +42,7 @@ There are two ways to set the output of a document:
Quarto offers a wide range of output formats.
You can find the complete list at <https://quarto.org/docs/output-formats/all-formats.html>.
Many formats share some output options (e.g. `toc: true` for including a table of contents), but others have options that are format-specific (e.g. `code-fold: true` collapses code chunks into a `<details>` tag for HTML output so the user can display it on demand; it's not applicable in a PDF or Word document).
Many formats share some output options (e.g., `toc: true` for including a table of contents), but others have options that are format-specific (e.g., `code-fold: true` collapses code chunks into a `<details>` tag for HTML output so the user can display it on demand; it's not applicable in a PDF or Word document).
To override the default options, you need to use an expanded `format` field.
For example, if you wanted to render an `html` with a floating table of contents, you'd use:

View File

@ -337,7 +337,7 @@ The most important set of options controls if your code block is executed and wh
It's also useful if you're teaching R and want to deliberately include an error.
The default, `error: false` causes rendering to fail if there is a single error in the document.
Each of these chunk options gets added to the header of the chunk, following `#|`, e.g. in the following chunk the result is not printed since `eval` is set to false.
Each of these chunk options gets added to the header of the chunk, following `#|`, e.g., in the following chunk the result is not printed since `eval` is set to false.
```{r}
#| echo: fenced
@ -374,7 +374,7 @@ execute:
echo: false
```
Since Quarto is designed to be multi-lingual (works with R as well as other languages like Python, Julia, etc.), not all of the knitr options are available at the document execution level, since some of them only work with knitr and not with the other engines Quarto uses for running code in other languages (e.g. Jupyter).
Since Quarto is designed to be multi-lingual (works with R as well as other languages like Python, Julia, etc.), not all of the knitr options are available at the document execution level, since some of them only work with knitr and not with the other engines Quarto uses for running code in other languages (e.g., Jupyter).
You can, however, still set these as global options for your document under the `knitr` field, under `opts_chunk`.
For example, when writing books and tutorials we set:
@ -427,13 +427,13 @@ comma(.12358124331)
## Figures {#sec-figures}
The figures in a Quarto document can be embedded (e.g. a PNG or JPEG file) or generated as a result of a code chunk.
The figures in a Quarto document can be embedded (e.g., a PNG or JPEG file) or generated as a result of a code chunk.
To embed an image from an external file, you can use the Insert menu in RStudio and select Figure / Image.
This will pop open a menu where you can browse to the image you want to insert as well as add alternative text or caption to it and adjust its size.
In the visual editor you can also simply paste an image from your clipboard into your document and RStudio will place a copy of that image in your project folder.
If you include a code chunk that generates a figure (e.g. includes a `ggplot()` call), the resulting figure will be automatically included in your Quarto document.
If you include a code chunk that generates a figure (e.g., includes a `ggplot()` call), the resulting figure will be automatically included in your Quarto document.
### Figure sizing
@ -672,8 +672,8 @@ Here we'll discuss three: self-contained documents, document parameters, and bib
### Self-contained
HTML documents typically have a number of external dependencies (e.g. images, CSS style sheets, JavaScript, etc.) and, by default, Quarto places these dependencies in a `_files` folder in the same directory as your `.qmd` file.
If you publish the HTML file on a hosting platform (e.g. QuartoPub, <https://quartopub.com/>), the dependencies in this directory are published with your document and hence are available in the published report.
HTML documents typically have a number of external dependencies (e.g., images, CSS style sheets, JavaScript, etc.) and, by default, Quarto places these dependencies in a `_files` folder in the same directory as your `.qmd` file.
If you publish the HTML file on a hosting platform (e.g., QuartoPub, <https://quartopub.com/>), the dependencies in this directory are published with your document and hence are available in the published report.
However, if you want to email the report to a colleague, you might prefer to have a single, self-contained, HTML document that embeds all of its dependencies.
You can do this by specifying the `embed-resources` option:
@ -733,7 +733,7 @@ Citations can be inserted from a variety of sources:
4. Your document bibliography (a `.bib` file in the directory of your document)
Under the hood, the visual mode uses the standard Pandoc markdown representation for citations (e.g. `[@citation]`).
Under the hood, the visual mode uses the standard Pandoc markdown representation for citations (e.g., `[@citation]`).
If you add a citation using one of the first three methods, the visual editor will automatically create a `bibliography.bib` file for you and add the reference to it.
It will also add a `bibliography` field to the document YAML.

View File

@ -618,7 +618,7 @@ Four of them are scalars:
- The simplest type is a null (`null`) which plays the same role as `NA` in R. It represents the absence of data.
- A **string** is much like a string in R, but must always use double quotes.
- A **number** is similar to R's numbers: they can use integer (e.g. 123), decimal (e.g. 123.45), or scientific (e.g. 1.23e3) notation. JSON doesn't support `Inf`, `-Inf`, or `NaN`.
- A **number** is similar to R's numbers: they can use integer (e.g., 123), decimal (e.g., 123.45), or scientific (e.g., 1.23e3) notation. JSON doesn't support `Inf`, `-Inf`, or `NaN`.
- A **boolean** is similar to R's `TRUE` and `FALSE`, but uses lowercase `true` and `false`.
JSON's strings, numbers, and booleans are pretty similar to R's character, numeric, and logical vectors.
@ -669,7 +669,7 @@ This often works well, particularly in simple cases, but we think you're better
### Starting the rectangling process
In most cases, JSON files contain a single top-level array, because they're designed to provide data about multiple "things", e.g. multiple pages, or multiple records, or multiple results.
In most cases, JSON files contain a single top-level array, because they're designed to provide data about multiple "things", e.g., multiple pages, or multiple records, or multiple results.
In this case, you'll start your rectangling with `tibble(json)` so that each element becomes a row:
```{r}

View File

@ -86,7 +86,7 @@ str_view(c("a", "ab", "abb"), "ab+")
str_view(c("a", "ab", "abb"), "ab*")
```
**Character classes** are defined by `[]` and let you match a set of characters, e.g. `[abcd]` matches "a", "b", "c", or "d".
**Character classes** are defined by `[]` and let you match a set of characters, e.g., `[abcd]` matches "a", "b", "c", or "d".
You can also invert the match by starting with `^`: `[^abcd]` matches anything **except** "a", "b", "c", or "d".
We can use this idea to find the words containing an "x" surrounded by vowels, or a "y" surrounded by consonants:
@ -391,7 +391,7 @@ A **character class**, or character **set**, allows you to match any character i
As we discussed above, you can construct your own sets with `[]`, where `[abc]` matches "a", "b", or "c" and `[^abc]` matches any character except "a", "b", or "c".
Apart from `^` there are two other characters that have special meaning inside of `[]`:
- `-` defines a range, e.g. `[a-z]` matches any lower case letter and `[0-9]` matches any number.
- `-` defines a range, e.g., `[a-z]` matches any lower case letter and `[0-9]` matches any number.
- `\` escapes special characters, so `[\^\-\]]` matches `^`, `-`, or `]`.
Here are a few examples:
@ -416,7 +416,7 @@ There are three other particularly useful pairs[^regexps-7]:
- `\d` matches any digit;\
`\D` matches anything that isn't a digit.
- `\s` matches any whitespace (e.g. space, tab, newline);\
- `\s` matches any whitespace (e.g., space, tab, newline);\
`\S` matches anything that isn't whitespace.
- `\w` matches any "word" character, i.e. letters and numbers;\
`\W` matches any "non-word" character.
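
To make these pairs concrete, here is a minimal sketch in base R (the chapter itself uses stringr, so `grepl()` and `gsub()` are stand-ins chosen only to keep the example free of package dependencies). The vector `x` is made up for illustration. Note that each backslash has to be doubled inside an ordinary R string.

```r
# Hypothetical test strings: letters only, a space, a tab, digits only
x <- c("abc", "a 1", "tab\there", "42")

has_digit <- grepl("\\d", x)     # \d: contains at least one digit
has_space <- grepl("\\s", x)     # \s: contains whitespace (space or tab here)
all_word  <- grepl("^\\w+$", x)  # \w: consists only of "word" characters

# \D is the complement of \d, so this keeps just the digits
digits_only <- gsub("\\D", "", x)
```

The uppercase form of each escape matches exactly what the lowercase form does not, which is why removing every `\D` match leaves only the digits.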
@ -820,7 +820,7 @@ The following sections describe some other useful functions in the wider tidyver
There are three other particularly useful places where you might want to use a regular expression:
- `matches(pattern)` will select all variables whose name matches the supplied pattern.
It's a "tidyselect" function that you can use anywhere in any tidyverse function that selects variables (e.g. `select()`, `rename_with()` and `across()`).
It's a "tidyselect" function that you can use anywhere in any tidyverse function that selects variables (e.g., `select()`, `rename_with()` and `across()`).
- `pivot_longer()`'s `names_pattern` argument takes a vector of regular expressions, just like `separate_wider_regex()`.
It's useful when extracting data out of variable names with a complex structure.

View File

@ -43,7 +43,7 @@ Most of readxl's functions allow you to load Excel spreadsheets into R:
- `read_xlsx()` read Excel files with `xlsx` format.
- `read_excel()` can read files with both `xls` and `xlsx` format. It guesses the file type based on the input.
These functions all have similar syntax just like other functions we have previously introduced for reading other types of files, e.g. `read_csv()`, `read_table()`, etc.
These functions all have similar syntax just like other functions we have previously introduced for reading other types of files, e.g., `read_csv()`, `read_table()`, etc.
For the rest of the chapter we will focus on using `read_excel()`.
### Reading Excel spreadsheets {#sec-reading-spreadsheets-excel}
@ -314,13 +314,13 @@ For example, Excel has no notion of an integer.
All numbers are stored as floating points, but you can choose to display the data with a customizable number of decimal points.
Similarly, dates are actually stored as numbers, specifically serial day counts (in Excel's default 1900 date system, January 1, 1900 is day 1).
You can customize how you display the date by applying formatting in Excel.
Confusingly, it's also possible to have something that looks like a number but is actually a string (e.g. type `'10` into a cell in Excel).
Confusingly, it's also possible to have something that looks like a number but is actually a string (e.g., type `'10` into a cell in Excel).
These differences between how the underlying data are stored vs. how they're displayed can cause surprises when the data are loaded into R.
By default readxl will guess the data type in a given column.
A recommended workflow is to let readxl guess the column types, confirm that you're happy with the guessed column types, and if not, go back and re-import specifying `col_types` as shown in @sec-reading-spreadsheets-excel.
Another challenge is when you have a column in your Excel spreadsheet that has a mix of these types, e.g. some cells are numeric, others text, others dates.
Another challenge is when you have a column in your Excel spreadsheet that has a mix of these types, e.g., some cells are numeric, others text, others dates.
When importing the data into R, readxl has to make some decisions.
In these cases you can set the type for this column to `"list"`, which will load the column as a list of length 1 vectors, where the type of each element of the vector is guessed.
@ -632,7 +632,7 @@ write_sheet(bake_sale, ss = "bake-sale", sheet = "Sales")
While you can read from a public Google Sheet without authenticating with your Google account, reading a private sheet or writing to a sheet requires authentication so that googlesheets4 can view and manage *your* Google Sheets.
When you attempt to read in a sheet that requires authentication, googlesheets4 will direct you to a web browser with a prompt to sign in to your Google account and grant permission to operate on your behalf with Google Sheets.
However, if you want to specify a specific Google account, authentication scope, etc. you can do so with `gs4_auth()`, e.g. `gs4_auth(email = "mine@example.com")`, which will force the use of a token associated with a specific email.
However, if you want to specify a specific Google account, authentication scope, etc. you can do so with `gs4_auth()`, e.g., `gs4_auth(email = "mine@example.com")`, which will force the use of a token associated with a specific email.
For further authentication details, we recommend reading the googlesheets4 auth vignette: <https://googlesheets4.tidyverse.org/articles/auth.html>.
### Exercises

View File

@ -112,7 +112,7 @@ str_view(tricky)
```
A raw string usually starts with `r"(` and finishes with `)"`.
But if your string contains `)"` you can instead use `r"[]"` or `r"{}"`, and if that's still not enough, you can insert any number of dashes to make the opening and closing pairs unique, e.g. `` `r"--()--" ``, `` `r"---()---" ``, etc. Raw strings are flexible enough to handle any text.
But if your string contains `)"` you can instead use `r"[]"` or `r"{}"`, and if that's still not enough, you can insert any number of dashes to make the opening and closing pairs unique, e.g., `` `r"--()--" ``, `` `r"---()---" ``, etc. Raw strings are flexible enough to handle any text.
### Other special characters
@ -575,7 +575,7 @@ If you'd like to learn more, we recommend reading the detailed explanation at <h
### Letter variations
Working in languages with accents poses a significant challenge when determining the position of letters (e.g. with `str_length()` and `str_sub()`) as accented letters might be encoded as a single individual character (e.g. ü) or as two characters by combining an unaccented letter (e.g. u) with a diacritic mark (e.g. ¨).
Working in languages with accents poses a significant challenge when determining the position of letters (e.g., with `str_length()` and `str_sub()`) as accented letters might be encoded as a single individual character (e.g., ü) or as two characters by combining an unaccented letter (e.g., u) with a diacritic mark (e.g., ¨).
For example, this code shows two ways of representing ü that look identical:
```{r}

View File

@ -65,13 +65,13 @@ Generally, to be bound to the terms of service, you must have taken some explici
This is why whether or not the data is **public** is important; if you don't need an account to access them, it is unlikely that you are bound to the terms of service.
Note, however, the situation is rather different in Europe where courts have found that terms of service are enforceable even if you don't explicitly agree to them.
[^webscraping-3]: e.g. <https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn>
[^webscraping-3]: e.g., <https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn>
### Personally identifiable information
Even if the data is public, you should be extremely careful about scraping personally identifiable information like names, email addresses, phone numbers, dates of birth, etc.
Europe has particularly strict laws about the collection or storage of such data ([GDPR](https://gdpr-info.eu/)), and regardless of where you live you're likely to be entering an ethical quagmire.
For example, in 2016, a group of researchers scraped public profile information (e.g. usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempt at anonymization.
For example, in 2016, a group of researchers scraped public profile information (e.g., usernames, age, gender, location, etc.) about 70,000 people on the dating site OkCupid and they publicly released these data without any attempt at anonymization.
While the researchers felt that there was nothing wrong with this since the data were already public, this work was widely condemned due to ethics concerns around identifiability of users whose information was released in the dataset.
If your work involves scraping personally identifiable information, we strongly recommend reading about the OkCupid study[^webscraping-4] as well as similar studies with questionable research ethics involving the acquisition and release of personally identifiable information.
@ -111,7 +111,7 @@ HTML stands for **H**yper**T**ext **M**arkup **L**anguage and looks something li
</body>
```
HTML has a hierarchical structure formed by **elements** which consist of a start tag (e.g. `<tag>`), optional **attributes** (`id='first'`), an end tag[^webscraping-5] (like `</tag>`), and **contents** (everything in between the start and end tag).
HTML has a hierarchical structure formed by **elements** which consist of a start tag (e.g., `<tag>`), optional **attributes** (`id='first'`), an end tag[^webscraping-5] (like `</tag>`), and **contents** (everything in between the start and end tag).
[^webscraping-5]: A number of tags (including `<p>` and `<li>`) don't require end tags, but we think it's best to include them because it makes seeing the structure of the HTML a little easier.

View File

@ -332,11 +332,11 @@ This is why relative paths are important: they'll work regardless of where the R
Absolute paths point to the same place regardless of your working directory.
They look a little different depending on your operating system.
On Windows they start with a drive letter (e.g. `C:`) or two backslashes (e.g. `\\servername`) and on Mac/Linux they start with a slash "/" (e.g. `/users/hadley`).
On Windows they start with a drive letter (e.g., `C:`) or two backslashes (e.g., `\\servername`) and on Mac/Linux they start with a slash "/" (e.g., `/users/hadley`).
You should **never** use absolute paths in your scripts, because they hinder sharing: no one else will have exactly the same directory configuration as you.
There's another important difference between operating systems: how you separate the components of the path.
Mac and Linux use slashes (e.g. `data/diamonds.csv`) and Windows uses backslashes (e.g. `data\diamonds.csv`).
Mac and Linux use slashes (e.g., `data/diamonds.csv`) and Windows uses backslashes (e.g., `data\diamonds.csv`).
R can work with either type (no matter what platform you're currently using), but unfortunately, backslashes mean something special to R, and to get a single backslash in the path, you need to type two backslashes!
That makes life frustrating, so we recommend always using the Linux/Mac style with forward slashes.
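
A short base R sketch of the point about separators; the variable names are hypothetical and no file needs to exist for the code to run.

```r
# Build a relative path with forward slashes, which works on every platform
rel_path <- file.path("data", "diamonds.csv")

# The Windows-style spelling needs the backslash doubled in R source code
win_path <- "data\\diamonds.csv"

# The string itself still contains a single backslash
n_chars <- nchar(win_path)

# Convert backslashes to forward slashes without involving regular expressions
normalized <- gsub("\\", "/", win_path, fixed = TRUE)

identical(normalized, rel_path)
```

Using `file.path()` avoids typing separators by hand, so the doubled backslash never comes up.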