Visualizing Different Types of Data

Lecture 3

May 19, 2025

Announcements/Reminders

  • Check the office hours schedule! Updates are on the course overview page.

  • Answers to your survey questions will come tomorrow!

  • Today is your first ‘real’ lab … any questions?

  • Make sure you push changes to AE-03 today (more on that later).

  • Do the preparation reading!!!

Outline

  • Last time: We learned how to build plots, with a focus on scatter plots.

  • Today:

    • Discuss different methods for visualizing numerical and categorical data

    • Demonstrate code to plot these data

Warm-Up:

Review AE-02

Plotting Different Types of Data

Types of Data

  • Numerical Data:

    • Takes a wide range of numerical values

    • Makes sense to add, subtract, etc.

  • Categorical Data:

    • Values can be thought of as distinct categories

    • The possible values are called levels
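As a rough illustration of how the two types usually look in R (a sketch with made-up values; the object names `n_classes` and `favorite_food` are hypothetical and not from the lecture), numerical data is typically stored as a numeric vector and categorical data as a factor, whose levels can be listed with `levels()`:

```r
# Numerical data: arithmetic on the values is meaningful.
n_classes <- c(4, 5, 3, 4)   # made-up values
mean(n_classes)              # average number of classes

# Categorical data: values fall into distinct categories (levels).
favorite_food <- factor(c("pizza", "sushi", "pizza", "tacos"))  # made-up values
levels(favorite_food)        # the distinct categories
table(favorite_food)         # how many observations fall in each level
```

Note that a variable can be written with digits and still be categorical if arithmetic on it is meaningless.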

Identifying variable types

  • Favorite food

  • Number of classes you’re in this semester

  • Zip code

  • Age

Numerical Data: Some Key Terms

  • Center: A ‘typical’ value for the data

    • Mean, median, mode…

  • Variability and shape: How spread out are the data points, and is the distribution lopsided?

    • Quartiles, IQR, skew…

And so much more!
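Several of these terms map directly onto base R functions. The following is a minimal sketch on a made-up vector `x` (the values are for illustration only and are not course data):

```r
# Made-up data for illustration only.
x <- c(2, 4, 4, 5, 10)

# Center
mean(x)      # arithmetic average
median(x)    # middle value of the sorted data

# Variability
quantile(x)  # minimum, quartiles, and maximum
IQR(x)       # distance between the first and third quartiles
sd(x)        # standard deviation
```

With real data that contains missing values, these functions return `NA` unless `na.rm = TRUE` is supplied, for example `mean(x, na.rm = TRUE)`.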

Visualizing penguins

library(tidyverse)
── Attaching core tidyverse packages ────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(palmerpenguins)
library(ggthemes)

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm
   <fct>   <fct>              <dbl>         <dbl>             <int>
 1 Adelie  Torgersen           39.1          18.7               181
 2 Adelie  Torgersen           39.5          17.4               186
 3 Adelie  Torgersen           40.3          18                 195
 4 Adelie  Torgersen           NA            NA                  NA
 5 Adelie  Torgersen           36.7          19.3               193
 6 Adelie  Torgersen           39.3          20.6               190
 7 Adelie  Torgersen           38.9          17.8               181
 8 Adelie  Torgersen           39.2          19.6               195
 9 Adelie  Torgersen           34.1          18.1               193
10 Adelie  Torgersen           42            20.2               190
# ℹ 334 more rows
# ℹ 3 more variables: body_mass_g <int>, sex <fct>, year <int>
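The abbreviations under the column names (`<fct>`, `<dbl>`, `<int>`) show how each variable is stored, which hints at whether it is categorical or numerical. One possible way to check this in code, and to count the missing values visible in row 4 of the printout, is sketched below:

```r
library(palmerpenguins)

# How is each column stored? Factors are categorical;
# integer and numeric (double) columns are numerical.
sapply(penguins, class)

# Levels of a categorical variable
levels(penguins$species)

# Number of missing values (NA) in each column
colSums(is.na(penguins))
```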

Univariate analysis

Analyzing a single variable:

  • Numerical: histogram, box plot, density plot, etc.

  • Categorical: bar plot, pie chart, etc.

Histogram - Step 1

ggplot(
  penguins
  )

Histogram - Step 2

ggplot(
  penguins,
  aes(x = body_mass_g)
  )

Histogram - Step 3

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_histogram()

Histogram - Step 4

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_histogram(
    binwidth = 250
  )

Histogram - Step 4

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_histogram(
    binwidth = 1000
  )
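To see what changing `binwidth` does to the underlying counts, the sketch below pulls out the table of bins that ggplot2 computes for each plot. The helper `bin_table()` is hypothetical (it is not part of the lecture or of ggplot2), and `na.rm = TRUE` is added only to silence the warning about penguins with a missing body mass:

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical helper: build the histogram for a given binwidth and
# return the per-bin table that ggplot2 computes for it.
bin_table <- function(width) {
  p <- ggplot(penguins, aes(x = body_mass_g)) +
    geom_histogram(binwidth = width, na.rm = TRUE)
  layer_data(p)
}

narrow <- bin_table(250)
wide <- bin_table(1000)

nrow(narrow)        # number of bins with binwidth = 250
nrow(wide)          # number of bins with binwidth = 1000
sum(narrow$count)   # penguins counted across all narrow bins
sum(wide$count)     # same penguins, grouped into fewer, wider bins
```

Both versions count the same penguins; only the grouping changes. Narrow bins show more detail but can look noisy, while wide bins can hide features of the distribution.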

Histogram - Step 5

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_histogram(
    binwidth = 250
  ) +
  labs(
    title = "Weights of penguins",
    x = "Weight (grams)",
    y = "Count"
  )

Boxplot - Step 1

ggplot(
  penguins
  )

Boxplot - Step 2

ggplot(
  penguins,
  aes(x = body_mass_g)
  )

Boxplot - Step 3

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_boxplot()

Boxplot - Step 3

ggplot(
  penguins,
  aes(y = body_mass_g)
  ) +
  geom_boxplot()

Boxplot - Step 4

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_boxplot() +
  labs(
    x = "Weight (grams)",
    y = NULL
  )
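A box plot is a picture of a few summary statistics. As a sketch of what `geom_boxplot()` draws (the variable names `mass`, `q`, `iqr`, `lower_fence`, and `upper_fence` are chosen here for illustration), the quartiles and the usual 1.5 × IQR outlier fences can be computed directly:

```r
library(palmerpenguins)

mass <- penguins$body_mass_g

# Quartiles: the box spans the 25th to 75th percentiles,
# with a line at the median (50th percentile).
q <- quantile(mass, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
q

# Interquartile range: the width of the box
iqr <- IQR(mass, na.rm = TRUE)

# Points beyond these fences are drawn individually as outliers.
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr
c(lower_fence, upper_fence)
```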

Density plot - Step 1

ggplot(
  penguins
  )

Density plot - Step 2

ggplot(
  penguins,
  aes(x = body_mass_g)
  )

Density plot - Step 3

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_density()

Density plot - Step 4

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_density(
    fill = "lightblue1"
  )

Density plot - Step 5

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_density(
    fill = "lightblue1",
    linewidth = 2
  )

Density plot - Step 6

ggplot(
  penguins,
  aes(x = body_mass_g)
  ) +
  geom_density(
    fill = "lightblue1",
    linewidth = 2,
    color = "darkorchid3"
  )
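Just as `binwidth` controls the detail in a histogram, the smoothness of a density curve can be tuned. The sketch below uses the `adjust` argument, which is not covered in the steps above and is shown here only as one option. It multiplies the default smoothing bandwidth:

```r
library(ggplot2)
library(palmerpenguins)

# adjust < 1 gives a wigglier curve; adjust > 1 gives a smoother one.
wiggly <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_density(adjust = 0.5, na.rm = TRUE)

smooth <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_density(adjust = 2, na.rm = TRUE)

wiggly
smooth
```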

Let’s Discuss

Bar Charts - Step 1

ggplot(
  penguins
  )

Bar Charts - Step 2

ggplot(
  penguins, 
  aes(x = species)
  )

Bar Charts - Step 3

ggplot(
  penguins, 
  aes(x = species)
  ) +
  geom_bar()
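The bar heights are simply the number of rows in each category. One way to see those counts as a table, assuming dplyr is available (it is loaded as part of the tidyverse), is:

```r
library(dplyr)
library(palmerpenguins)

# One row per species; n is the number of penguins (the bar height)
species_counts <- count(penguins, species)
species_counts
```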

Bivariate analysis

Analyzing the relationship between two variables:

  • Numerical + numerical: scatter plot

  • Numerical + categorical: side-by-side box plots, violin plots, etc.

  • Categorical + categorical: stacked bar plots

  • Any plot: use an aesthetic (e.g., fill, color, or shape) or facets to represent the second variable
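Two of these options are sketched below: a scatter plot for two numerical variables, and facets for a numerical and a categorical variable. The choice of `flipper_length_mm` and the object names `scatter` and `faceted` are illustrative assumptions, not part of the lecture:

```r
library(ggplot2)
library(palmerpenguins)

# Numerical + numerical, with species mapped to color as an extra variable
scatter <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point(na.rm = TRUE)

# Numerical + categorical via facets: one histogram panel per species
faceted <- ggplot(penguins, aes(x = body_mass_g)) +
  geom_histogram(binwidth = 250, na.rm = TRUE) +
  facet_wrap(~species)

scatter
faceted
```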

Side-by-side box plots

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    y = species
    )
  ) +
  geom_boxplot()
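A violin plot is a possible alternative for the same pair of variables. As a sketch, swapping in `geom_violin()` shows the shape of each species' distribution rather than only its quartiles:

```r
library(ggplot2)
library(palmerpenguins)

# Same mapping as the side-by-side box plots, different geom
violins <- ggplot(
  penguins,
  aes(x = body_mass_g, y = species)
) +
  geom_violin(na.rm = TRUE)

violins
```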

Density plots

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    color = species
    )
  ) +
  geom_density()

Density plots

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    color = species,
    fill = species
    )
  ) +
  geom_density()

Density plots

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    color = species,
    fill = species
    )
  ) +
  geom_density(
    alpha = 0.5
  )

Density plots

ggplot(
  penguins,
  aes(
    x = body_mass_g,
    color = species,
    fill = species
    )
  ) +
  geom_density(
    alpha = 0.5
  ) +
  theme(
    legend.position = "bottom"
  )

Bar Charts

ggplot(
  penguins,
  aes(
    x = species,
    fill = island
    )
  ) +
  geom_bar()
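Stacked bars show counts, which makes species with different totals hard to compare. A common variation, sketched here as an option rather than as part of the lecture, is `position = "fill"`, which rescales every bar to the same height so the segments show proportions:

```r
library(ggplot2)
library(palmerpenguins)

# Each bar is rescaled to height 1, so segments show the share of
# each island within a species rather than raw counts.
p_fill <- ggplot(penguins, aes(x = species, fill = island)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

p_fill
```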

Application exercise

How to push

  • You must commit and push your changes to the current AE document before the end of class

  • Today, check your GitHub repository for changes to AE-03 under a commit message written by you!