diff --git a/_common.R b/_common.R
index f779a56..e7af1c7 100644
--- a/_common.R
+++ b/_common.R
@@ -14,6 +14,7 @@ options(
   dplyr.print_min = 6,
   dplyr.print_max = 6,
   pillar.max_footer_lines = 2,
+  pillar.min_chars = 15,
   stringr.view_n = 6,
   # Activate crayon output - temporarily disabled for quarto
   # crayon.enabled = TRUE,
diff --git a/datetimes.qmd b/datetimes.qmd
index ba2e61e..f180240 100644
--- a/datetimes.qmd
+++ b/datetimes.qmd
@@ -41,7 +41,6 @@ We will also need nycflights13 for practice data.
 ```{r}
 #| message: false
 library(tidyverse)
-
 library(nycflights13)
 ```
 
diff --git a/factors.qmd b/factors.qmd
index 77299e5..d3ac8c4 100644
--- a/factors.qmd
+++ b/factors.qmd
@@ -140,17 +140,6 @@ gss_cat |>
   count(race)
 ```
 
-Or with a bar chart:
-
-```{r}
-#| fig-alt: >
-#|   A bar chart showing the distribution of race. There are ~2000
-#|   records with race "Other", 3000 with race "Black", and other
-#|   15,000 with race "White".
-ggplot(gss_cat, aes(x = race)) +
-  geom_bar()
-```
-
 When working with factors, the two most common operations are changing the order of the levels, and changing the values of the levels.
 Those operations are described in the sections below.
 
@@ -254,7 +243,7 @@ It takes a factor, `f`, and then any number of levels that you want to move to t
 #| fig-alt: >
 #|   The same scatterplot but now "Not Applicable" is displayed at the
 #|   bottom of the y-axis. Generally there is a positive association
-#|   between income and age, and the income band with the highest average
+#|   between income and age, and the income band with the highethst average
 #|   age is "Not applicable".
 ggplot(rincome_summary, aes(x = age, y = fct_relevel(rincome, "Not applicable"))) +
@@ -276,8 +265,8 @@ This makes the plot easier to read because the colors of the line at the far rig
 #|   There is one line for each category of marital status: no answer,
 #|   never married, separated, divorced, widowed, and married. It is
 #|   a little hard to read the plot because the order of the legend is
-#|   unrelated to the lines on the plot.
-#|
+#|   unrelated to the lines on the plot.
+#|
 #|   Rearranging the legend makes the plot easier to read because the
 #|   legend colors now match the order of the lines on the far right
 #|   of the plot. You can see some unsuprising patterns: the proportion
diff --git a/joins.qmd b/joins.qmd
index 4f1a861..b2060bd 100644
--- a/joins.qmd
+++ b/joins.qmd
@@ -56,6 +56,8 @@ When more than one variable is needed, the key is called a **compound key.** For
     You can identify each airport by its three letter airport code, making `faa` the primary key.
 
     ```{r}
+    #| R.options:
+    #|   width: 67
     airports
     ```
 
@@ -63,6 +65,8 @@ When more than one variable is needed, the key is called a **compound key.** For
     You can identify a plane by its tail number, making `tailnum` the primary key.
 
     ```{r}
+    #| R.options:
+    #|   width: 67
     planes
     ```
 
@@ -70,6 +74,8 @@ When more than one variable is needed, the key is called a **compound key.** For
     You can identify each observation by the combination of location and time, making `origin` and `time_hour` the compound primary key.
 
     ```{r}
+    #| R.options:
+    #|   width: 67
     weather
     ```
 
diff --git a/rectangling.qmd b/rectangling.qmd
index 9f7e2dc..28552f2 100644
--- a/rectangling.qmd
+++ b/rectangling.qmd
@@ -421,13 +421,14 @@ repos |>
 ```
 
 This has worked but the result is a little overwhelming: there are so many columns that tibble doesn't even print all of them!
-We can see them all with `names()`:
+We can see them all with `names()`; and here we look at the first 10:
 
 ```{r}
 repos |>
   unnest_longer(json) |>
   unnest_wider(json) |>
-  names()
+  names() |>
+  head(10)
 ```
 
 Let's select a few that look interesting:
@@ -439,7 +440,7 @@ repos |>
   select(id, full_name, owner, description)
 ```
 
-You can use this to work back to understand how `gh_repos` was strucured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.
+You can use this to work back to understand how `gh_repos` was structured: each child was a GitHub user containing a list of up to 30 GitHub repositories that they created.
 
 `owner` is another list-column, and since it contains a named list, we can use `unnest_wider()` to get at the values:
 
diff --git a/regexps.qmd b/regexps.qmd
index b55fae3..27ccf1d 100644
--- a/regexps.qmd
+++ b/regexps.qmd
@@ -123,8 +123,6 @@ Regular expressions are very compact and use a lot of punctuation characters, so
 Don't worry; you'll get better with practice, and simple patterns will soon become second nature.
 Let's kick off that process by practicing with some useful stringr functions.
 
-### Exercises
-
 ## Key functions {#sec-stringr-regex-funs}
 
 Now that you've got the basics of regular expressions under your belt, let's use them with some stringr and tidyr functions.
diff --git a/strings.qmd b/strings.qmd
index d8bfc6a..5597b3e 100644
--- a/strings.qmd
+++ b/strings.qmd
@@ -516,7 +516,12 @@ stringr provides two useful tools for cases where your string is too long:
 The following code shows these functions in action with a made-up string:
 
 ```{r}
-x <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
+x <- paste0(
+  "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod ",
+  "tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim ",
+  "veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea ",
+  "commodo consequat."
+)
 
 str_view(str_trunc(x, 30))
 str_view(str_wrap(x, 30))
diff --git a/webscraping.qmd b/webscraping.qmd
index 148ad9a..86779af 100644
--- a/webscraping.qmd
+++ b/webscraping.qmd
@@ -326,22 +326,10 @@ Here's a simple HTML table with two columns and three rows:
 ```{r}
 html <- minimal_html("
   <table>
-    <tr>
-      <th>x</th>
-      <th>y</th>
-    </tr>
-    <tr>
-      <td>1.5</td>
-      <td>2.7</td>
-    </tr>
-    <tr>
-      <td>4.9</td>
-      <td>1.3</td>
-    </tr>
-    <tr>
-      <td>7.2</td>
-      <td>8.1</td>
-    </tr>
+    <tr><th>x</th> <th>y</th></tr>
+    <tr><td>1.5</td> <td>2.7</td></tr>
+    <tr><td>4.9</td> <td>1.3</td></tr>
+    <tr><td>7.2</td> <td>8.1</td></tr>
   </table>
 ")
 ```
@@ -455,6 +443,7 @@ At the time we wrote this chapter, the page looked like @fig-scraping-imdb.
 
 ```{r}
 #| label: fig-scraping-imdb
+#| echo: false
 #| fig-cap: >
 #|   Screenshot of the IMDb top movies web page taken on 2022-12-05.
 #| fig-alt: >