Grammar of data visualization

Lecture 2

May 16, 2025

Reminders

  • I have office hours today! 1:00-3:00 PM in Old Chemistry 203/203B.

  • We will start grading your ae repositories next week - make sure you have them ready to go.

  • First ‘real’ lab is on Monday; the topic will be data visualization (what we are starting today).

Outline

  • Last time:

    • We introduced you to the course toolkit.

    • You cloned your ae repositories and started making some updates in your Quarto documents.

    • You committed and pushed your changes back.

  • Today:

    • We will introduce data visualization.

    • You will pull to get today’s application exercise file.

    • You will work on the new application exercise on data visualization, commit your changes, and push them.

From last time

ae-01-meet-the-penguins

Go to RStudio, confirm that you’re in the ae project, and open the document ae-01-meet-the-penguins.qmd.

Common problems:

  • The environment used by Quarto when rendering starts EMPTY - it does not see what you see in your environment.

  • Functions that open a popup (like View()) will not work when you render a document. Either comment them out (with #) or delete them before rendering!

  • Make sure you commit and then PUSH! Just committing is not enough!
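
Because rendering starts from an empty environment, the document has to load its own packages and create its own objects. The chunk below is a minimal sketch of that idea, not the contents of the actual ae file; the object name adelie is made up for illustration.

```r
# Load packages inside the document itself: rendering starts from a fresh
# R session, so packages loaded in the Console are not available.
library(palmerpenguins)
library(tidyverse)

# Create any objects you need inside the document too
# (adelie is a hypothetical name used only for this example).
adelie <- filter(penguins, species == "Adelie")

# View(adelie)    # opens a popup, so it is commented out before rendering
glimpse(adelie)   # prints a compact summary in the rendered document instead
```

If the document renders but the same code fails in the Console (or the other way around), a missing library() call or a missing object is a likely cause.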

Data visualization

Thoughts on this plot?

More Penguins

Start plotting!

How can you create something like this???

  • The ggplot2 package has the plotting functions you need!

  • ggplot2 is a part of the tidyverse package - when you load tidyverse, you also load ggplot2

Load Packages

library(palmerpenguins)
library(tidyverse)
library(ggthemes)

Look at the data
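
One possible way to look at the data before plotting, assuming the packages from the previous slide are installed. The specific functions are a suggestion, not necessarily the ones shown in lecture.

```r
library(palmerpenguins)
library(tidyverse)

glimpse(penguins)   # one line per column: name, type, and the first few values
nrow(penguins)      # number of rows (one row per penguin)
names(penguins)     # column names, which are the variables you can map to aesthetics
```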

Visualize the data

What are some steps you can take to visualize a data set?

  • What do you want on the x-axis?

  • What do you want on the y-axis?

Step 1. Prepare a canvas for plotting

ggplot(data = penguins)

Step 2. Map variables to aesthetics

Map bill_length_mm to the x aesthetic

ggplot(data = penguins, mapping = aes(x = bill_length_mm))

Step 3. Map variables to aesthetics

Map body_mass_g to the y aesthetic

ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = body_mass_g))

Argument names

It’s common practice in R to omit the names of the first two arguments of a function:

  • Instead of
ggplot(data = your_data, mapping = aes(x = x_variable, y = y_variable))
  • Use
ggplot(your_data, aes(x = x_variable, y = y_variable))
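
As a concrete sketch with the penguins data, the two calls below are intended to produce the same empty canvas. The object names p_named and p_short are made up for this example.

```r
library(palmerpenguins)
library(ggplot2)

# Both arguments named explicitly
p_named <- ggplot(data = penguins, mapping = aes(x = bill_length_mm, y = body_mass_g))

# Names omitted: R matches the arguments by position (data first, mapping second)
p_short <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g))

p_named   # canvas with bill_length_mm on x and body_mass_g on y
p_short   # same canvas
```

Omitting names only works when the arguments are given in the order the function expects, so keep the names whenever the order is in doubt.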

Step 3. Map variables to aesthetics

Map body_mass_g to the y aesthetic

ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g))

Step 4. Represent data on your canvas

with a geom

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale
range (`geom_point()`).

Step 4. Represent data on your canvas

  • Adding geom_point() resulted in the following warning:
Warning: Removed 2 rows containing missing values or values outside the scale
range (`geom_point()`)
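
The warning can be traced back to the data. The sketch below counts the rows where either plotted variable is missing, then uses the na.rm argument of geom_point() to drop those rows without a warning. The name n_missing is made up for this example.

```r
library(palmerpenguins)
library(tidyverse)

# Rows where at least one of the two plotted variables is missing
filter(penguins, is.na(bill_length_mm) | is.na(body_mass_g))

# Count those rows; this should match the number reported in the warning
n_missing <- sum(is.na(penguins$bill_length_mm) | is.na(penguins$body_mass_g))
n_missing

# na.rm = TRUE removes the missing values silently
ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)
```

The plot itself does not change either way; the warning is only a reminder that some rows could not be drawn.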

Step 5. Map variables to aesthetics

Map species to the color aesthetic

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point()

What exactly are aesthetics? They map from a variable to a plot feature.

  • x and y axes

  • color, shape, size of points
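
A short sketch of the difference between mapping a variable to a plot feature (inside aes()) and setting a feature to a fixed value (outside aes()). The particular choices here (shape = island, the color "steelblue", the object names) are illustrative assumptions, not part of the lecture plot.

```r
library(palmerpenguins)
library(ggplot2)

# Mapping: inside aes(), color and shape vary with variables in the data,
# and ggplot2 adds a legend for each of them.
p_mapped <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g,
                                 color = species, shape = island)) +
  geom_point(na.rm = TRUE)

# Setting: outside aes(), every point gets the same fixed color, shape, and size,
# so no legend is needed.
p_fixed <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point(color = "steelblue", shape = 17, size = 2, na.rm = TRUE)

p_mapped
p_fixed
```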

Step 6. Represent data on your canvas

with another geom

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 2 rows containing missing values or values outside the scale
range (`geom_point()`).

Warnings and messages

  • Adding geom_smooth() resulted in the following message:
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
  • It tells us the type of smoothing ggplot2 does under the hood when drawing the smooth curves that represent trends for each species.
  • Going forward we’ll suppress this message to save some space.
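
One way to avoid the note from geom_smooth(), sketched under the assumption that the default loess smoother is what you want: state the method and formula yourself, so ggplot2 has no defaults to report. The object name p_quiet is made up for this example.

```r
library(palmerpenguins)
library(ggplot2)

p_quiet <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point(na.rm = TRUE) +
  # method and formula are spelled out, matching the defaults reported above
  geom_smooth(method = "loess", formula = y ~ x, na.rm = TRUE)

p_quiet
```

In a Quarto document, another option is to hide the output rather than change the code, by placing the chunk options `#| message: false` and `#| warning: false` at the top of the code chunk.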

Step 7. Split plot into facets

Use facet_wrap to make sub-plots

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~island)

Step 7. Split plot into facets

We can facet by other variables!

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~species)

A note on facets:

Which plot do you think made it easier to compare between penguin species?

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() 

Step 8. Use a different color scale

With a scale_color_ function

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  scale_color_grey() 

Step 8. Use a different color scale

With another scale_color_ function

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  scale_color_colorblind() #this is from ggthemes 

Step 9. Apply a different theme

With a theme_ function

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  scale_color_colorblind() +
  theme_minimal()

Step 9. Apply a different theme

With a theme_ function

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  scale_color_colorblind() +
  theme_classic()

Step 9. Apply a different theme

With a theme_ function

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  scale_color_colorblind() +
  theme_solarized() #this is from ggthemes

Step 10. Add labels

With labs() function

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  geom_smooth() +
  scale_color_colorblind() +
  theme_minimal() +
  labs(
    x = "Bill Length (mm)",
    y = "Body Mass (g)",
    color = "Species",
    title = "Penguin Body Mass vs. Bill Length"
  )

Step 11. Set transparency of points

with alpha

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point(alpha = 0.1) +
  geom_smooth() +
  scale_color_colorblind() +
  theme_minimal() +
  labs(x = "Bill Length (mm)", y = "Body Mass (g)", color = "Species")

Step 11. Set transparency of points

with alpha

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point(alpha = 0.7) +
  geom_smooth() +
  scale_color_colorblind() +
  theme_minimal() +
  labs(x = "Bill Length (mm)", y = "Body Mass (g)", color = "Species")

Step 12. Hide standard errors of curves

with se = FALSE

ggplot(penguins, mapping = aes(x = bill_length_mm, y = body_mass_g, color = species)) +
  geom_point(alpha = 0.5) +
  geom_smooth(se = FALSE) +
  scale_color_colorblind() +
  theme_minimal() +
  labs(x = "Bill Length (mm)", y = "Body Mass (g)", color = "Species")

How am I supposed to remember all of this?!

You aren’t!!!

  • It’s important to (eventually) know and remember the key ideas: what does changing a theme do? What are aesthetics? What is a geom?
  • You do not need to memorize a comprehensive list of all of the different geoms, themes, color scales, etc.
  • There will be a few fundamentals we expect you to know – more on that later!
  • https://ggplot2.tidyverse.org is super helpful!

Grammar of graphics

We built a plot layer-by-layer

  • just as described in the book The Grammar of Graphics and
  • implemented in the ggplot2 package, the data visualization package of the tidyverse.
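
The layer-by-layer idea can be made explicit by saving the plot in an object and adding one layer at a time. This is a sketch that retraces the steps from this lecture; the object names (p, p_points, p_smooth, p_final) are made up for illustration.

```r
library(palmerpenguins)
library(ggplot2)

# Canvas and aesthetic mappings
p <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g, color = species))

# Add a layer of points
p_points <- p + geom_point(alpha = 0.5, na.rm = TRUE)

# Add smooth curves on top of the points
p_smooth <- p_points +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, na.rm = TRUE)

# Add a theme and labels
p_final <- p_smooth +
  theme_minimal() +
  labs(x = "Bill Length (mm)", y = "Body Mass (g)", color = "Species")

p_final
```

Printing p, p_points, and p_smooth in turn shows how each + adds something to the plot built so far.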

Application exercise

What if we want to use our own data?


read_csv("data_file.csv") (assuming the data is in CSV format)
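
A self-contained sketch of reading a CSV file and plotting it. The data here are made up and written to a temporary file only so the example can run anywhere; with your own data you would pass the path to your file instead. The names csv_path, my_data, hours, and score are all hypothetical.

```r
library(tidyverse)

# Write a small, made-up data set to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write_csv(tibble(hours = c(1, 2, 3, 4), score = c(52, 61, 70, 78)), csv_path)

# Read it back in; show_col_types = FALSE hides the column type summary
my_data <- read_csv(csv_path, show_col_types = FALSE)

# From here on, plotting works just as it does with penguins
ggplot(my_data, aes(x = hours, y = score)) +
  geom_point()
```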

ae-02-bechdel-dataviz

We will be looking at data on movies and the Bechdel test.

ae-02-bechdel-dataviz

  • Go to your ae project in RStudio.
  • Make sure all of your changes up to this point are committed and pushed, i.e., there’s nothing left in your Git pane.
  • If you haven’t yet done so, click Pull to get today’s application exercise file.
  • Work through the application exercise in class, and render, commit, and push your edits by the end of class.

Recap

  • Construct plots with ggplot().
  • Layers of ggplots are separated by +s.
  • The formula is (almost) always as follows:
ggplot(DATA, aes(x = X-VAR, y = Y-VAR, ...)) +
  geom_XXX() 
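
As one possible way to fill in the template, the sketch below uses penguins as DATA, species as X-VAR, body_mass_g as Y-VAR, and geom_boxplot() as the geom. The choice of a boxplot is only an illustration of swapping in a different geom.

```r
library(palmerpenguins)
library(ggplot2)

# DATA = penguins, X-VAR = species, Y-VAR = body_mass_g, geom_XXX = geom_boxplot
p_box <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot(na.rm = TRUE)

p_box
```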

Coming Up…

  • What are some other types of plots you can make?

  • How can you talk about the information conveyed by plots?