Data types and classes

Lecture 8

Published

May 27, 2025

Announcements

  • Exam 1 in class next week on Tuesday – cheat sheet (1 page, both sides, hand-written or typed, must be prepared by you)
  • Exam 1 take home starts after class on Tuesday, due at 9:30 AM on Wednesday (open resources, internet, etc., closed to other humans)

Study tips for the exam

  • Review lectures/readings/labs/AEs
  • Make sure you understand why code has the output it does
  • Lab 3 will cover most of what we have done so far - think of this as a type of review!
  • General practice / in-class review coming on Friday!

Types and classes

An object’s type indicates how it is stored in memory.

Common data types

You’ll commonly encounter:

  • logical
  • integer
  • double
  • character

You’ll less commonly encounter:

  • list
  • NULL
  • complex
  • raw

Logical and Character

Logical: Boolean TRUE / FALSE values

  • `<lgl>`
typeof(TRUE)
[1] "logical"


typeof(FALSE)
[1] "logical"

Character: Character strings; in quotes

  • `<chr>`
typeof("Hello!")
[1] "character"


typeof("TRUE")
[1] "character"

Numeric: Double and integer

Double: floating point numerical values (default numerical type)

  • `<dbl>`
typeof(2.5)
[1] "double"


typeof(3)
[1] "double"

Integer: integer numerical values; indicated with an L

  • `<int>`
typeof(3L)
[1] "integer"
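
To make the difference between the two numeric types concrete, here is a minimal sketch (the variable names a and b are illustrative and not from the lecture). A whole number typed without the L suffix is still stored as a double:

```r
# A whole number without the L suffix is stored as a double
a <- 3
b <- 3L

typeof(a)        # "double"
typeof(b)        # "integer"

# The values compare equal, but the objects are not identical
# because their storage types differ
a == b           # TRUE
identical(a, b)  # FALSE

# The colon operator produces integers for whole-number endpoints
typeof(1:3)      # "integer"
```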

Type Compatibility

Can you use different types of data together? Sometimes… but be careful!

"3" + 3
Error in "3" + 3: non-numeric argument to binary operator
3L + 3
[1] 6
typeof(3L + 3)
[1] "double"
TRUE + 3
[1] 4
typeof(TRUE + 3)
[1] "double"
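
One way to avoid the error from "3" + 3 is to convert the string yourself before doing arithmetic. The sketch below assumes the string really does hold a number; the variable names are illustrative:

```r
# "3" is a character string, so arithmetic on it fails;
# converting it explicitly first makes the intent clear
x <- "3"
y <- as.numeric(x) + 3
y            # 6
typeof(y)    # "double"

# Logical values are treated as 0 (FALSE) and 1 (TRUE) in arithmetic
z <- TRUE + TRUE
z            # 2
typeof(z)    # "integer"
```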

Concatenation

Vectors are constructed using the c() function

  • Double vector:

    x <- c(1, 2, 3, 5)
    typeof(x)
    [1] "double"
  • Integer vector:

    x <- c(1L, 2L, 3L, 5L)
    typeof(x)
    [1] "integer"
  • Character vector:

    x <- c("1", "2", "3", "5")
    typeof(x)
    [1] "character"
  • Logical vector:

    x <- c(TRUE, FALSE, FALSE)
    typeof(x)
    [1] "logical"

Converting between types

without intention…

c(2, "Just this one!")
[1] "2"              "Just this one!"


R will happily convert between various types without complaint when different types of data are concatenated in a vector. This is NOT always a good thing.

Converting between types

without intention…

c(FALSE, 3L)
[1] 0 3


c(1.2, 3L)
[1] 1.2 3.0


c(2L, "two")
[1] "2"   "two"
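
The examples above follow a single pattern: when c() is given mixed types, it converts every element to the most flexible type present. A short sketch of that ordering (the vector name mixed is illustrative):

```r
# c() converts every element to the most flexible type present:
# logical -> integer -> double -> character
typeof(c(TRUE, 1L))              # "integer"
typeof(c(TRUE, 1L, 2.5))         # "double"
typeof(c(TRUE, 1L, 2.5, "a"))    # "character"

# Once everything is character, the original types are gone
mixed <- c(FALSE, 3L, "two")
mixed                            # "FALSE" "3" "two"
```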

Converting between types

with intention…

x <- 1:3
x
[1] 1 2 3
typeof(x)
[1] "integer"
y <- as.character(x)
y
[1] "1" "2" "3"
typeof(y)
[1] "character"

Converting between types

with intention…

x <- c(TRUE, FALSE)
x
[1]  TRUE FALSE
typeof(x)
[1] "logical"
y <- as.numeric(x)
y
[1] 1 0
typeof(y)
[1] "double"

Explicit vs. implicit coercion

Explicit coercion:

When you call a function like as.character() or as.numeric() to convert a vector to a specific type.

Implicit coercion:

Happens when you use a vector in a specific context that expects a certain type of vector.
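
A side-by-side sketch of the two kinds of coercion, using only base R functions. The vector x is illustrative:

```r
x <- c(TRUE, FALSE, TRUE)

# Explicit coercion: we ask for the conversion by name
as.numeric(x)      # 1 0 1
as.character(x)    # "TRUE" "FALSE" "TRUE"

# Implicit coercion: sum() and mean() expect numbers,
# so the logical vector is treated as 1s and 0s
sum(x)             # 2
mean(x)            # 0.6666667

# Explicit coercion that cannot succeed returns NA (with a warning)
suppressWarnings(as.numeric("two"))   # NA
```

The implicit case is useful in practice: sum() of a logical vector counts the TRUE values, and mean() gives the proportion of TRUE values.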

Data classes

  • Data types are like Lego building blocks
  • We can stick them together to build more complicated constructs, e.g. representations of data
  • The class determines this construct
  • Examples: factors, dates, and data frames

Data frames

We can think of data frames like vectors of equal length glued together

df <- data.frame(x = 1:2, y = 3:4)
df
  x y
1 1 3
2 2 4
typeof(df)
[1] "list"
class(df)
[1] "data.frame"

  • When we use the pull() function (from dplyr, part of the tidyverse), we extract a single column from the data frame as a vector
df |>
  pull(y)
[1] 3 4
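
The "vectors glued together" picture can be checked directly in base R. The sketch below is not from the lecture; it shows that a data frame behaves like a list of columns, and it uses the base R extractors $ and [[ ]] as alternatives to pull():

```r
df <- data.frame(x = 1:2, y = 3:4)

# Each column is a vector, and the data frame is a list of those columns
is.list(df)      # TRUE
names(df)        # "x" "y"
length(df)       # 2 (the number of columns)

# Base R ways to extract one column as a vector
df$y             # 3 4
df[["y"]]        # 3 4
typeof(df$y)     # "integer"
```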

Dates

today <- as.Date("2025-05-27")
today
[1] "2025-05-27"
typeof(today)
[1] "double"
class(today)
[1] "Date"

More on dates

We can think of a date as a whole number (the number of days since the origin, 1 Jan 1970) glued together with that origin. R stores the number as a double, which is why typeof(today) returns "double".

as.integer(today)
[1] 20235
as.integer(today) / 365 # roughly 55 yrs
[1] 55.43836
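
Because a date is a day count underneath, ordinary arithmetic works on it in units of days. A brief sketch using base R only (the specific dates other than today are chosen for illustration):

```r
today <- as.Date("2025-05-27")

# Removing the class exposes the underlying number of days since 1970-01-01
unclass(today)                              # 20235

# The origin itself is day zero
as.integer(as.Date("1970-01-01"))           # 0

# Adding a number moves the date forward by that many days
today + 7                                   # "2025-06-03"

# Subtracting two dates gives the number of days between them
as.integer(as.Date("2025-12-31") - today)   # 218
```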

Factors

R uses factors to handle categorical variables with a fixed and known set of possible values

months <- c("June", "July", "June", "August", "June")
months_factor <- factor(months)
months_factor
[1] June   July   June   August June  
Levels: August July June
typeof(months_factor)
[1] "integer"
class(months_factor)
[1] "factor"

More on factors

We can think of factors like a character vector (the level labels) and an integer vector (the level numbers) glued together

glimpse(months_factor)
 Factor w/ 3 levels "August","July",..: 3 2 3 1 3
as.integer(months_factor)
[1] 3 2 3 1 3
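
The two pieces can be pulled apart and put back together with base R. This sketch reuses months_factor from above; the years example is an added illustration of a common pitfall, not part of the lecture:

```r
months <- c("June", "July", "June", "August", "June")
months_factor <- factor(months)

# The labels are stored once, in alphabetical order by default
levels(months_factor)         # "August" "July" "June"

# The data are stored as integer codes that index into the levels
codes <- as.integer(months_factor)
codes                         # 3 2 3 1 3

# Looking up each code in the levels recovers the original strings
levels(months_factor)[codes]  # "June" "July" "June" "August" "June"

# Pitfall: as.integer() returns the codes, not the labels
years <- factor(c("2024", "2025", "2024"))
as.integer(years)                  # 1 2 1
as.integer(as.character(years))    # 2024 2025 2024
```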

More on factors

We can use the forcats package (in tidyverse) to work with factors!

Some commonly used functions are:

  • fct_relevel(): reorder factor levels by hand

  • fct_reorder(): reorder factor levels by the values of another variable

  • fct_infreq(): reorder factor levels by frequency (most common first)

  • fct_rev(): reverse the order of factor levels
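
A small sketch of the four functions listed above, assuming the forcats package is installed. The factor f and the numeric vector x are made up for illustration; only the level order is shown, since that is what each function changes:

```r
library(forcats)

f <- factor(c("b", "b", "b", "a", "a", "c"))
levels(f)                      # "a" "b" "c"

# By hand: move "c" to the front
levels(fct_relevel(f, "c"))    # "c" "a" "b"

# By frequency: most common level first
levels(fct_infreq(f))          # "b" "a" "c"

# Reverse the current order
levels(fct_rev(f))             # "c" "b" "a"

# By another variable: levels sorted by the median of x within each level
x <- c(1, 1, 1, 5, 5, 3)
levels(fct_reorder(f, x))      # "b" "c" "a"
```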

Example Factor Re-Order

amounts <- c("low", "medium", "high", "high", "medium")
amounts_factor <- factor(amounts)
amounts_factor
[1] low    medium high   high   medium
Levels: high low medium
fct_relevel(amounts_factor, c("low", "medium", "high"))
[1] low    medium high   high   medium
Levels: low medium high
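
If the intended order is known up front, the same result can be reached in base R by passing levels to factor() when the factor is created. This is offered as an alternative sketch, not as the lecture's method:

```r
amounts <- c("low", "medium", "high", "high", "medium")

# Setting levels at creation avoids the alphabetical default
amounts_factor <- factor(amounts, levels = c("low", "medium", "high"))
levels(amounts_factor)       # "low" "medium" "high"

# The integer codes now follow the intended order
as.integer(amounts_factor)   # 1 2 3 3 2

# table() reports counts in level order
table(amounts_factor)
```

Level order matters beyond printing: summaries such as table() and the axis or legend order in plots follow the levels, not the order in which values appear in the data.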

Application exercise

AE 08: Working with Factors