数据可视化

《区域水环境污染数据分析实践》
Data analysis practice of regional water environment pollution

苏命、王为东
中国科学院大学资源与环境学院
中国科学院生态环境研究中心

2024-04-09

The ggplot2 Package


… is an R package to visualize data created by Hadley Wickham in 2005

# install.packages("ggplot2")
library(ggplot2)


… is part of the {tidyverse}

# install.packages("tidyverse")
library(tidyverse)

The Grammar of {ggplot2}

The Grammar of {ggplot2}


Component Function Explanation
Data ggplot(data)          The raw data that you want to visualise.
Aesthetics           aes() Aesthetic mappings between variables and visual properties.
Geometries geom_*() The geometric shapes representing the data.

The Grammar of {ggplot2}


Component Function Explanation
Data ggplot(data)          The raw data that you want to visualise.
Aesthetics           aes() Aesthetic mappings between variables and visual properties.
Geometries geom_*() The geometric shapes representing the data.
Statistics stat_*() The statistical transformations applied to the data.
Scales scale_*() Maps between the data and the aesthetic dimensions.
Coordinate System coord_*() Maps data into the plane of the data rectangle.
Facets facet_*() The arrangement of the data into a grid of plots.
Visual Themes theme() / theme_*() The overall visual defaults of a plot.

The Data

Bike sharing counts in London, UK, powered by TfL Open Data

  • covers the years 2015 and 2016
  • incl. weather data acquired from freemeteo.com
  • prepared by Hristo Mavrodiev for Kaggle


bikes <- readr::read_csv("../../data/ggplot2/london-bikes-custom.csv",
  ## or: "https://raw.githubusercontent.com/z3tt/graphic-design-ggplot2/main/data/london-bikes-custom.csv"
  col_types = "Dcfffilllddddc"
)

bikes$season <- forcats::fct_inorder(bikes$season)
Variable Description Class
date Date encoded as `YYYY-MM-DD` date
day_night `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am) character
year `2015` or `2016` factor
month `1` (January) to `12` (December) factor
season `winter`, `spring`, `summer`, or `autumn` factor
count Sum of reported bikes rented integer
is_workday `TRUE` being Monday to Friday and no bank holiday logical
is_weekend `TRUE` being Saturday or Sunday logical
is_holiday `TRUE` being a bank holiday in the UK logical
temp Average air temperature (°C) double
temp_feel Average feels like temperature (°C) double
humidity Average air humidity (%) double
wind_speed Average wind speed (km/h) double
weather_type Most common weather type character

ggplot2::ggplot()

The help page of the ggplot() function.

Data

ggplot(data = bikes)

Aesthetic Mapping(视觉映射):aes(.)


= link variables to graphical properties

  • positions (x, y)
  • colors (color, fill)
  • shapes (shape, linetype)
  • size (size)
  • transparency (alpha)
  • groupings (group)

Aesthetic Mapping(视觉映射):aes(.)

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)

aesthetics

aes() outside as component

ggplot(data = bikes) +
  aes(x = temp_feel, y = count)


aes() inside, explicit matching

ggplot(data = bikes, mapping = aes(x = temp_feel, y = count))


aes() inside, implicit matching

ggplot(bikes, aes(temp_feel, count))


aes() inside, mixed matching

ggplot(bikes, aes(x = temp_feel, y = count))

Geometrical Layers

Geometries(几何图层):geom_*


= interpret aesthetics as graphical representations

  • points
  • lines
  • polygons
  • text labels

Geometries(几何图层):geom_*

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point()

Visual Properties of Layers(图层属性)

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5,
    shape = "X",
    stroke = 1,
    size = 4
  )

Setting vs Mapping of Visual Properties

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )
ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )

Mapping Expressions

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = temp_feel > 20),
    alpha = .5
  )

Filter Data

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Filter Data

ggplot(
    bikes %>% filter(!is.na(weather_type)),
    aes(x = temp, y = temp_feel)
  ) +
  geom_point(
    aes(color = weather_type == "clear",
        size = count),
    shape = 18,
    alpha = .5
  )

Local vs. Global(应用至当前图层或所有图层)

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  )
ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  )

Adding More Layers

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Global Color Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Local Color Encoding

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

The `group` Aesthetic

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point(
    aes(color = season),
    alpha = .5
  ) +
  geom_smooth(
    aes(group = day_night),
    method = "lm"
  )

Set Both as Global Aesthetics

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm"
  )

Overwrite Global Aesthetics

ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm",
    color = "black"
  )

Statistical Layers

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = temp_feel, y = count)) +
  stat_smooth(geom = "smooth")
ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_smooth(stat = "smooth")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")
ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")

`stat_*()` and `geom_*()`

ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")
ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  stat_summary() 

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  stat_summary(
    fun.data = mean_se, ## the default
    geom = "pointrange"  ## the default
  ) 

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  geom_boxplot() +
  stat_summary(
    fun = mean,
    geom = "point",
    color = "#28a87d",
    size = 3
  ) 

Statistical Summaries

ggplot(
    bikes, 
    aes(x = season, y = temp_feel)
  ) +
  stat_summary(
    fun = mean, 
    fun.max = function(y) mean(y) + sd(y), 
    fun.min = function(y) mean(y) - sd(y) 
  ) 

Extending a ggplot

Store a ggplot as Object

g <-
  ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season,
        group = day_night)
  ) +
  geom_point(
    alpha = .5
  ) +
  geom_smooth(
    method = "lm",
    color = "black"
  )

class(g)
[1] "gg"     "ggplot"

Inspect a ggplot Object

g$data
# A tibble: 1,454 × 14
   date       day_night year  month season count is_workday is_weekend
   <date>     <chr>     <fct> <fct> <fct>  <int> <lgl>      <lgl>     
 1 2015-01-04 day       2015  1     winter  6830 FALSE      TRUE      
 2 2015-01-04 night     2015  1     winter  2404 FALSE      TRUE      
 3 2015-01-05 day       2015  1     winter 14763 TRUE       FALSE     
 4 2015-01-05 night     2015  1     winter  5609 TRUE       FALSE     
 5 2015-01-06 day       2015  1     winter 14501 TRUE       FALSE     
 6 2015-01-06 night     2015  1     winter  6112 TRUE       FALSE     
 7 2015-01-07 day       2015  1     winter 16358 TRUE       FALSE     
 8 2015-01-07 night     2015  1     winter  4706 TRUE       FALSE     
 9 2015-01-08 day       2015  1     winter  9971 TRUE       FALSE     
10 2015-01-08 night     2015  1     winter  5630 TRUE       FALSE     
# ℹ 1,444 more rows
# ℹ 6 more variables: is_holiday <lgl>, temp <dbl>, temp_feel <dbl>,
#   humidity <dbl>, wind_speed <dbl>, weather_type <chr>

Inspect a ggplot Object

g$mapping
Aesthetic mapping: 
* `x`      -> `temp_feel`
* `y`      -> `count`
* `colour` -> `season`
* `group`  -> `day_night`

Extend a ggplot Object: Add Layers

g +
  geom_rug(
    alpha = .2
  )

Remove a Layer from the Legend

g +
  geom_rug(
    alpha = .2,
    show.legend = FALSE
  )

Extend a ggplot Object: Add Labels

g +
  xlab("Feels-like temperature (°F)") +
  ylab("Reported bike shares") +
  ggtitle("TfL bike sharing trends")

Extend a ggplot Object: Add Labels

g +
  labs(
    x = "Feels-like temperature (°F)",
    y = "Reported bike shares",
    title = "TfL bike sharing trends"
  )

Extend a ggplot Object: Add Labels

g <- g +
  labs(
    x = "Feels-like temperature (°F)",
    y = "Reported bike shares",
    title = "TfL bike sharing trends",
    color = "Season:"
  )

g

Extend a ggplot Object: Add Labels

g +
  labs(
    x = "Feels-like temperature (°F)",
    y = "Reported bike shares",
    title = "TfL bike sharing trends",
    subtitle = "Reported bike rents versus feels-like temperature in London",
    caption = "Data: TfL",
    color = "Season:",
    tag = "Fig. 1"
  )

Extend a ggplot Object: Add Labels

g +
  labs(
    x = "",
    caption = "Data: TfL"
  )
g +
  labs(
    x = NULL,
    caption = "Data: TfL"
  )

Extend a ggplot Object: Themes

g + theme_light()
g + theme_minimal()

Change the Theme Base Settings

g + theme_light(
  base_size = 14
)

Set a Theme Globally

theme_set(theme_light())

g

Change the Theme Base Settings

theme_set(theme_light(
  base_size = 14
))

g

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank()
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold")
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "none"
  )

Overwrite Specific Theme Settings

g +
  theme(
    panel.grid.minor = element_blank(),
    plot.title = element_text(face = "bold"),
    legend.position = "top",
    plot.title.position = "plot"
  )

Overwrite Theme Settings Globally

theme_update(
  panel.grid.minor = element_blank(),
  plot.title = element_text(face = "bold"),
  legend.position = "top",
  plot.title.position = "plot"
)

g

Save the Graphic

ggsave(g, filename = "my_plot.png")
ggsave("my_plot.png")
ggsave("my_plot.png", width = 8, height = 5, dpi = 600)
ggsave("my_plot.pdf", width = 20, height = 12, unit = "cm", device = cairo_pdf)
grDevices::cairo_pdf("my_plot.pdf", width = 10, height = 7)
g
dev.off()


A comparison of vector and raster graphics.

Modified from canva.com

Facets(面)

Facets(面)


= split variables to multiple panels

Facets are also known as:

  • small multiples
  • trellis graphs
  • lattice plots
  • conditioning

Setup

g <-
  ggplot(
    bikes,
    aes(x = temp_feel, y = count,
        color = season)
  ) +
  geom_point(
    alpha = .3,
    guide = "none"
  )

g

Wrapped Facets

g +
  facet_wrap(
    vars(day_night)
  )

Wrapped Facets

g +
  facet_wrap(
    ~ day_night
  )

Facet Multiple Variables

g +
  facet_wrap(
    ~ is_workday + day_night
  )

Facet Options: Cols + Rows

g +
  facet_wrap(
    ~ day_night,
    ncol = 1
  )

Facet Options: Free Scaling

g +
  facet_wrap(
    ~ day_night,
    ncol = 1,
    scales = "free"
  )

Facet Options: Free Scaling

g +
  facet_wrap(
    ~ day_night,
    ncol = 1,
    scales = "free_y"
  )

Facet Options: Switch Labels

g +
  facet_wrap(
    ~ day_night,
    ncol = 1,
    switch = "x"
  )

Gridded Facets

g +
  facet_grid(
    rows = vars(day_night),
    cols = vars(is_workday)
  )

Gridded Facets

g +
  facet_grid(
    day_night ~ is_workday
  )

Facet Multiple Variables

g +
  facet_grid(
    day_night ~ is_workday + season
  )

Facet Options: Free Scaling

g +
  facet_grid(
    day_night ~ is_workday,
    scales = "free"
  )

Facet Options: Switch Labels

g +
  facet_grid(
    day_night ~ is_workday,
    scales = "free",
    switch = "y"
  )

Facet Options: Proportional Spacing

g +
  facet_grid(
    day_night ~ is_workday,
    scales = "free",
    space = "free"
  )

Facet Options: Proportional Spacing

g +
  facet_grid(
    day_night ~ is_workday,
    scales = "free_y",
    space = "free_y"
  )

Diamonds Facet

ggplot(
    diamonds,
    aes(x = carat, y = price)
  ) +
  geom_point(
    alpha = .3
  ) +
  geom_smooth(
    method = "lm",
    se = FALSE,
    color = "dodgerblue"
  )

Diamonds Facet

ggplot(
    diamonds,
    aes(x = carat, y = price)
  ) +
  geom_point(
    alpha = .3
  ) +
  geom_smooth(
    method = "lm",
    se = FALSE,
    color = "dodgerblue"
  ) +
  facet_grid(
    cut ~ clarity,
    space = "free_x",
    scales = "free_x"
  )

Diamonds Facet (Dark Theme Bonus)

ggplot(
    diamonds,
    aes(x = carat, y = price)
  ) +
  geom_point(
    alpha = .3,
    color = "white"
  ) +
  geom_smooth(
    method = "lm",
    se = FALSE,
    color = "dodgerblue"
  ) +
  facet_grid(
    cut ~ clarity,
    space = "free_x",
    scales = "free_x"
  ) +
  theme_dark(
    base_size = 14
  )

Scales(尺度)

Scales


= translate between variable ranges and property ranges

  • feels-like temperature  ⇄  x
  • reported bike shares  ⇄  y
  • season  ⇄  color
  • year  ⇄  shape

Scales

The scale_*() components control the properties of all the
aesthetic dimensions mapped to the data.


Consequently, there are scale_*() functions for all aesthetics such as:

  • positions via scale_x_*() and scale_y_*()

  • colors via scale_color_*() and scale_fill_*()

  • sizes via scale_size_*() and scale_radius_*()

  • shapes via scale_shape_*() and scale_linetype_*()

  • transparency via scale_alpha_*()

Scales

The scale_*() components control the properties of all the
aesthetic dimensions mapped to the data.


The extensions (*) can be filled by e.g.:

  • continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions

  • continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors

  • continuous(), discrete(), manual(), ordinal(), area(), date() for sizes

  • continuous(), discrete(), manual(), ordinal() for shapes

  • continuous(), discrete(), manual(), ordinal(), date() for transparency

Allison Horsts illustration ofthe correct use of continuous versus discrete; however, in {ggplot2} these are interpeted in a different way: as quantitative and qualitative.

Illustration by Allison Horst

Continuous vs. Discrete in {ggplot2}

Continuous:
quantitative or numerical data

  • height
  • weight
  • age
  • counts

Discrete:
qualitative or categorical data

  • species
  • sex
  • study sites
  • age group

Continuous vs. Discrete in {ggplot2}

Continuous:
quantitative or numerical data

  • height (continuous)
  • weight (continuous)
  • age (continuous or discrete)
  • counts (discrete)

Discrete:
qualitative or categorical data

  • species (nominal)
  • sex (nominal)
  • study site (nominal or ordinal)
  • age group (ordinal)

Aesthetics + Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point()

Aesthetics + Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date() +
  scale_y_continuous() +
  scale_color_discrete()

Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_continuous() +
  scale_y_continuous() +
  scale_color_discrete()

Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_continuous() +
  scale_y_log10() +
  scale_color_discrete()

Scales

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_continuous() +
  scale_y_log10() +
  scale_color_viridis_d()

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +  
  geom_point() +
  scale_y_continuous(
    trans = "log10"
  )

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares"
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = seq(0, 60000, by = 15000)
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = 0:4*15000
  )

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = c(0, 2:12*2500, 40000, 50000)
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares in thousands",
    breaks = 0:4*15000,
    labels = 0:4*15
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares in thousands",
    breaks = 0:4*15000,
    labels = paste(0:4*15000, "bikes")
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = 0:4*15000,
    limits = c(NA, 60000)
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = 0:4*15000,
    expand = c(0, 0)
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = -1:5*15000,
    expand = c(.5, .5) ## c(add, mult)
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = -1:5*15000,
    expand = expansion(add = 2000)
  ) 

`scale_x|y_continuous`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_y_continuous(
    name = "Reported bike shares",
    breaks = 0:4*15000,
    guide = "none"
  )

`scale_x|y_date`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date(
    name = NULL,
    date_breaks = "4 months"
  )

`scale_x|y_date`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date(
    name = NULL,
    date_breaks = "20 weeks"
  )

`scale_x|y_date` with `strftime()`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date(
    name = NULL,
    date_breaks = "6 months",
    date_labels = "%Y/%m/%d"
  )

`scale_x|y_date` with `strftime()`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_x_date(
    name = NULL,
    date_breaks = "6 months",
    date_labels = "%b '%y"
  )

`scale_x|y_discrete`

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  scale_x_discrete(
    name = "Period",
    labels = c("Dec-Feb", "Mar-May", "Jun-Aug", "Sep-Nov")
  )

`scale_x|y_discrete`

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  scale_x_discrete(
    name = "Season",
    expand = c(.5, 0) ## add, mult
  )

Discrete or Continuous?

ggplot(
    bikes,
    aes(x = as.numeric(season), y = count)
  ) +
  geom_boxplot(
    aes(group = season)
  )

Discrete or Continuous?

ggplot(
    bikes,
    aes(x = as.numeric(season),
        y = count)
  ) +
  geom_boxplot(
    aes(group = season)
  ) +
  scale_x_continuous(
    name = "Season",
    breaks = 1:4,
    labels = levels(bikes$season)
  )

Discrete or Continuous?

ggplot(
    bikes,
    aes(x = as.numeric(season) + 
            as.numeric(season) / 8,
        y = count)
  ) +
  geom_boxplot(
    aes(group = season)
  ) +
  scale_x_continuous(
    name = "Season",
    breaks = 1:4,
    labels = levels(bikes$season)
  )

`scale_color|fill_discrete`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_color_discrete(
    name = "Season:",
    type = c("#69b0d4", "#00CB79", "#F7B01B", "#a78f5f")
  )

Inspect Assigned Colors

g <- ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_color_discrete(
    name = "Season:",
    type = c("#3ca7d9", "#1ec99b", "#F7B01B", "#bb7e8f")
  )

gb <- ggplot_build(g)

gb$data[[1]][c(1:5, 200:205, 400:405), 1:5]
     colour     x     y PANEL group
1   #3ca7d9 16439  6830     1     1
2   #3ca7d9 16439  2404     1     1
3   #3ca7d9 16440 14763     1     1
4   #3ca7d9 16440  5609     1     1
5   #3ca7d9 16441 14501     1     1
200 #1ec99b 16538  8830     1     2
201 #1ec99b 16539 24019     1     2
202 #1ec99b 16539 10500     1     2
203 #1ec99b 16540 25640     1     2
204 #1ec99b 16540 11830     1     2
205 #1ec99b 16541 22216     1     2
400 #F7B01B 16638 12079     1     3
401 #F7B01B 16639 26646     1     3
402 #F7B01B 16639 12446     1     3
403 #F7B01B 16640 11312     1     3
404 #F7B01B 16640  4722     1     3
405 #F7B01B 16641 22748     1     3

`scale_color|fill_discrete`

my_colors <- c(
  `winter` = "#3c89d9",
  `spring` = "#1ec99b",
  `summer` = "#F7B01B",
  `autumn` = "#a26e7c"
)

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_color_discrete(
    name = "Season:",
    type = my_colors
  )

`scale_color|fill_discrete`

my_colors_alphabetical <- c(
  `autumn` = "#a26e7c",
  `spring` = "#1ec99b",
  `summer` = "#F7B01B",
  `winter` = "#3c89d9"
)

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_color_discrete(
    name = "Season:",
    type = my_colors_alphabetical
  )

`scale_color|fill_discrete`

library(RColorBrewer)

ggplot(
    bikes,
    aes(x = date, y = count,
        color = season)
  ) +
  geom_point() +
  scale_color_discrete(
    name = "Season:",
    type = brewer.pal(
      n = 4, name = "Dark2"
    )
  )

`scale_color|fill_manual`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = weather_type)
  ) +
  geom_point() +
  scale_color_manual(
    name = "Season:",
    values = brewer.pal(n = 6, name = "Pastel1"),
    na.value = "black"
  )

`scale_color|fill_carto_d`

ggplot(
    bikes,
    aes(x = date, y = count,
        color = weather_type)
  ) +
  geom_point() +
  rcartocolor::scale_color_carto_d(
    name = "Season:",
    palette = "Pastel",
    na.value = "black"
  )

Diamonds Facet

facet <-
  ggplot(
    diamonds,
    aes(x = carat, y = price)
  ) +
  geom_point(
    alpha = .3
  ) +
  geom_smooth(
    aes(color = cut),
    method = "lm",
    se = FALSE
  ) +
  facet_grid(
    cut ~ clarity,
    space = "free_x",
    scales = "free_x"
  )

facet

Diamonds Facet

facet +
  scale_x_continuous(
    breaks = 0:5
  ) +
  scale_y_continuous(
    limits = c(0, 30000),
    breaks = 0:3*10000,
    labels = c("$0", "$10,000", "$20,000", "$30,000")
  )

Diamonds Facet

facet +
  scale_x_continuous(
    breaks = 0:5
  ) +
  scale_y_continuous(
    limits = c(0, 30000),
    breaks = 0:3*10000,
    labels = paste0(
      "$", format(
        0:3*10000, 
        big.mark = ",", 
        trim = TRUE
      )
    )
  )

Diamonds Facet

facet +
  scale_x_continuous(
    breaks = 0:5
  ) +
  scale_y_continuous(
    limits = c(0, 30000),
    breaks = 0:3*10000,
    labels = function(y) paste0(
      "$", format(
        y, big.mark = ",",
        trim = TRUE
      )
    )
  )

Diamonds Facet

facet +
  scale_x_continuous(
    breaks = 0:5
  ) +
  scale_y_continuous(
    limits = c(0, 30000),
    breaks = 0:3*10000,
    labels = scales::dollar_format()
  )

Diamonds Facet

facet +
  scale_x_continuous(
    breaks = 0:5
  ) +
  scale_y_continuous(
    limits = c(0, 30000),
    breaks = 0:3*10000,
    labels = scales::dollar_format()
  ) +
  scale_color_brewer(
    palette = "Set2",
    guide = "none"
  )

Diamonds Facet

facet +
  scale_x_continuous(
    breaks = 0:5
  ) +
  scale_y_continuous(
    limits = c(0, 30000),
    breaks = 0:3*10000,
    labels = scales::dollar_format()
  ) +
  scale_color_brewer(
    palette = "Set2"
  ) +
  theme(
    legend.position = "none"
  )

Coordinate Systems(投影)

Coordinate Systems


= interpret the position aesthetics

  • linear coordinate systems: preserve the geometrical shapes
    • coord_cartesian()
    • coord_fixed()
    • coord_flip()
  • non-linear coordinate systems: likely change the geometrical shapes
    • coord_polar()
    • coord_map() and coord_sf()
    • coord_trans()

Cartesian Coordinate System

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian()

Cartesian Coordinate System

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000)
  )

Changing Limits

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000)
  )
ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  scale_y_continuous(
    limits = c(NA, 15000)
  )

Clipping

ggplot(
    bikes,
    aes(x = season, y = count)
  ) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000),
    clip = "off"
  )

Clipping

ggplot(
    filter(bikes, is_holiday == TRUE),
    aes(x = temp_feel, y = count)
  ) +
  geom_point() +
  geom_text(
    aes(label = season),
    nudge_x = .3,
    hjust = 0
  ) +
  coord_cartesian(
    clip = "off"
  )

… or better use {ggrepel}

ggplot(
    filter(bikes, is_holiday == TRUE),
    aes(x = temp_feel, y = count)
  ) +
  geom_point() +
  ggrepel::geom_text_repel(
    aes(label = season),
    nudge_x = .3,
    hjust = 0
  ) +
  coord_cartesian(
    clip = "off"
  )

Remove All Padding

ggplot(
    bikes,
    aes(x = temp_feel, y = count)
  ) +
  geom_point() +
  coord_cartesian(
    expand = FALSE,
    clip = "off"
  )

Fixed Coordinate System

ggplot(
    bikes,
    aes(x = temp_feel, y = temp)
  ) +
  geom_point() +
  coord_fixed()
ggplot(
    bikes,
    aes(x = temp_feel, y = temp)
  ) +
  geom_point() +
  coord_fixed(ratio = 4)

Flipped Coordinate System

ggplot(
    bikes,
    aes(x = weather_type)
  ) +
  geom_bar() +
  coord_cartesian()
ggplot(
    bikes,
    aes(x = weather_type)
  ) +
  geom_bar() +
  coord_flip()

Flipped Coordinate System

ggplot(
    bikes,
    aes(y = weather_type)
  ) +
  geom_bar() +
  coord_cartesian()
ggplot(
    bikes,
    aes(x = weather_type)
  ) +
  geom_bar() +
  coord_flip()

Reminder: Sort Your Bars!

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(y = fct_infreq(weather_type))
  ) +
  geom_bar()

Reminder: Sort Your Bars!

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(y = fct_rev(
      fct_infreq(weather_type)
    ))
  ) +
  geom_bar()

Circular Corrdinate System

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = weather_type,
        fill = weather_type)
  ) +
  geom_bar() +
  coord_polar()
ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = weather_type,
        fill = weather_type)
  ) +
  geom_bar() +
  coord_cartesian()

Circular Cordinate System

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = fct_infreq(weather_type),
        fill = weather_type)
  ) +
  geom_bar(width = 1) +
  coord_polar()
ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = fct_infreq(weather_type),
        fill = weather_type)
  ) +
  geom_bar(width = 1) +
  coord_cartesian()

Circular Corrdinate System

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = fct_infreq(weather_type),
        fill = weather_type)
  ) +
  geom_bar() +
  coord_polar(theta = "x")
ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = fct_infreq(weather_type),
        fill = weather_type)
  ) +
  geom_bar() +
  coord_polar(theta = "y")

Circular Corrdinate System

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = 1, fill = weather_type)
  ) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y") 
ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = 1, fill = weather_type)
  ) +
  geom_bar(position = "stack") +
  coord_cartesian() 

Circular Corrdinate System

ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = 1,
        fill = fct_rev(fct_infreq(weather_type)))
  ) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y") +
  scale_fill_discrete(name = NULL)
ggplot(
    filter(bikes, !is.na(weather_type)),
    aes(x = 1,
        fill = fct_rev(fct_infreq(weather_type)))
  ) +
  geom_bar(position = "stack") +
  coord_cartesian() +
  scale_fill_discrete(name = NULL)

Transform a Coordinate System

ggplot(
    bikes,
    aes(x = temp, y = count)
  ) +
  geom_point() +
  coord_trans(y = "log10")

Transform a Coordinate System

ggplot(
    bikes,
    aes(x = temp, y = count,
        group = day_night)
  ) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_trans(y = "log10")
ggplot(
    bikes,
    aes(x = temp, y = count,
        group = day_night)
  ) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_log10()

图形组合

Allison Horsts monster illustration of the patchwork extension package.

Illustration by Allison Horst

theme_std <- theme_set(theme_minimal(base_size = 18))
theme_update(
  # text = element_text(family = "Pally"),
  panel.grid = element_blank(),
  axis.text = element_text(color = "grey50", size = 12),
  axis.title = element_text(color = "grey40", face = "bold"),
  axis.title.x = element_text(margin = margin(t = 12)),
  axis.title.y = element_text(margin = margin(r = 12)),
  axis.line = element_line(color = "grey80", size = .4),
  legend.text = element_text(color = "grey50", size = 12),
  plot.tag = element_text(size = 40, margin = margin(b = 15)),
  plot.background = element_rect(fill = "white", color = "white")
)

bikes_sorted <-
  bikes %>%
  filter(!is.na(weather_type)) %>%
  group_by(weather_type) %>%
  mutate(sum = sum(count)) %>%
  ungroup() %>%
  mutate(
    weather_type = forcats::fct_reorder(
      str_to_title(str_wrap(weather_type, 5)), sum
    )
  )

p1 <- ggplot(
    bikes_sorted,
    aes(x = weather_type, y = count, color = weather_type)
  ) +
  geom_hline(yintercept = 0, color = "grey80", size = .4) +
  stat_summary(
    geom = "point", fun = "sum", size = 12
  ) +
  stat_summary(
    geom = "linerange", ymin = 0, fun.max = function(y) sum(y),
    size = 2, show.legend = FALSE
  ) +
  coord_flip(ylim = c(0, NA), clip = "off") +
  scale_y_continuous(
    expand = c(0, 0), limits = c(0, 8500000),
    labels = scales::comma_format(scale = .0001, suffix = "K")
  ) +
  scale_color_viridis_d(
    option = "magma", direction = -1, begin = .1, end = .9, name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL, y = "Sum of reported bike shares", tag = "P1",
  ) +
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_text(family = "Pally", color = "grey50", face = "bold",
                               margin = margin(r = 15), lineheight = .9)
  )

p1

p2 <- bikes_sorted %>%
  filter(season == "winter", is_weekend == TRUE, day_night == "night") %>%
  group_by(weather_type, .drop = FALSE) %>%
  mutate(id = row_number()) %>%
  ggplot(
      aes(x = weather_type, y = id, color = weather_type)
    ) +
    geom_point(size = 4.5) +
    scale_color_viridis_d(
      option = "magma", direction = -1, begin = .1, end = .9, name = NULL,
      guide = guide_legend(override.aes = list(size = 7))
    ) +
    labs(
      x = NULL, y = "Reported bike shares on\nweekend winter nights", tag = "P2",
    ) +
    coord_cartesian(ylim = c(.5, NA), clip = "off")

p2

my_colors <- c("#cc0000", "#000080")

p3 <- bikes %>%
  group_by(week = lubridate::week(date), day_night, year) %>%
  summarize(count = sum(count)) %>%
  group_by(week, day_night) %>%
  mutate(avg = mean(count)) %>%
  ggplot(aes(x = week, y = count, group = interaction(day_night, year))) +
    geom_line(color = "grey65", size = 1) +
    geom_line(aes(y = avg, color = day_night), stat = "unique", size = 1.7) +
    annotate(
      geom = "text", label = c("Day", "Night"), color = my_colors,
      x = c(5, 18), y = c(125000, 29000), size = 8, fontface = "bold", family = "Pally"
    ) +
    scale_x_continuous(breaks = c(1, 1:10*5)) +
    scale_y_continuous(labels = scales::comma_format()) +
    scale_color_manual(values = my_colors, guide = "none") +
    labs(
      x = "Week of the Year", y = "Reported bike shares\n(cumulative # per week)", tag = "P3",
    )

p3

{patchwork}

# install.packages("patchwork")
require(patchwork)
(p1 + p2) / p3

“Collect Guides”

(p1 + p2) / p3 + plot_layout(guides = "collect")

Apply Theming

((p1 + p2) / p3 & theme(legend.justification = "top")) +
plot_layout(guides = "collect")

Apply Theming

(p1 + p2) / p3 & theme(legend.position = "none",
  plot.background = element_rect(color = "black", size = 3))

Adjust Widths and Heights

((p1 + p2) / p3 & theme(legend.position = "none")) +
  plot_layout(heights = c(.2, .1), widths = c(2, 1))

Use A Custom Layout

picasso <- "
AAAAAA#BBBB
CCCCCCCCC##
CCCCCCCCC##"
(p1 + p2 + p3 & theme(legend.position = "none")) +
plot_layout(design = picasso)

Add Labels

pl1 <- p1 + labs(tag = NULL, title = "Plot One") + theme(legend.position = "none")
pl2 <- p2 + labs(tag = NULL, title = "Plot Two") + theme(legend.position = "none")
pl3 <- p3 + labs(tag = NULL, title = "Plot Three") + theme(legend.position = "none")

Add Labels

(pl1 + pl2) / pl3 +
  plot_annotation(tag_levels = "1",
  tag_prefix = "P",
  title = "An overarching title for all 3 plots, placed on the very top while all other titles are sitting below the tags.")

Add Text

text <- tibble::tibble(
  x = 0, y = 0, label = "Lorem ipsum dolor sit amet, **consectetur adipiscing elit**, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation <b style='color:#000080;'>ullamco laboris nisi</b> ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat <b style='color:#cc0000;'>cupidatat non proident</b>, sunt in culpa qui officia deserunt mollit anim id est laborum."
)
pt <- ggplot(text, aes(x = x, y = y)) +
  ggtext::geom_textbox(
    aes(label = label),
    box.color = NA, width = unit(23, "lines"),
    color = "grey40", size = 6.5, lineheight = 1.4
  ) +
  coord_cartesian(expand = FALSE, clip = "off") +
  theme_void()
pt

Add Text

(p1 + pt) / p3

Add Inset Plots

pl1 + inset_element(pl2,
  l = .6, b = .1, r = 1, t = .6)

Add Inset Plots

pl1 + inset_element(pl2,
  l = .6, b = 0, r = 1, t = .5, align_to = 'full')

Add Inset Plots

(pl1 + inset_element(pl2,
  l = .6, b = .1, r = 1, t = .6) + pt) / pl3

练习

library(palmerpenguins)
library(ggthemes)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

效果

练习

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
    color = "Species", shape = "Species"
  ) +
  scale_color_colorblind()

练习

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = bill_depth_mm)) +
  geom_smooth()

练习

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = island)
) +
  geom_point() +
  geom_smooth(se = FALSE)

分层展示

p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point() +
  geom_smooth()

柱状图

p <- ggplot(penguins, aes(x = species)) +
  geom_bar()

柱状图

p <- ggplot(penguins, aes(x = fct_infreq(species))) +
  geom_bar()

直方图

p <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 200)

直方图

p1 <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 20)
p2 <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 2000)
p <- p1 + p2

密度图

p <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_density()

箱图

p <- ggplot(penguins,
  aes(x = species, y = body_mass_g)) +
  geom_boxplot()

分组

p <- ggplot(penguins,
  aes(x = body_mass_g, color = species)) +
  geom_density(linewidth = 0.75)

分组

p <- ggplot(penguins,
  aes(x = body_mass_g, color = species, fill = species)) +
  geom_density(alpha = 0.5)

分组

p <- ggplot(penguins,
  aes(x = island, fill = species)) +
  geom_bar()

分组

p <- ggplot(penguins,
  aes(x = island, fill = species)) +
  geom_bar(position = "fill")

分组

p <- ggplot(penguins,
  aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

分组

p <- ggplot(penguins,
  aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = island))

分面

p <- ggplot(penguins,
  aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species)) +
  facet_wrap(~island)

分面

p <- ggplot(
  data = penguins,
  mapping = aes(
    x = bill_length_mm, y = bill_depth_mm, 
    color = species, shape = species
  )
) +
  geom_point() +
  labs(color = "Species")

练习

p1 <- ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(position = "fill")
p2 <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = "fill")
p <- p1 + p2

练习

p <- ggplot(penguins,
  aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

练习

p <- ggplot(mpg, aes(x = class)) +
geom_bar()

练习

p <- ggplot(mpg, aes(x = cty, y = hwy)) +
geom_point()

练习

p <- ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))

欢迎讨论!

苏命|https://drwater.rcees.ac.cn; https://drwater.rcees.ac.cn/bcard; Slides