# A tibble: 5 × 3
C1 C2 C3
<chr> <chr> <int>
1 A X 1
2 A Y 2
3 A X 3
4 B X 4
5 B Y 5
Lecture 11
May 30, 2025
group_by/summarize/mutate
With summarize: condenses information into smaller result table
With mutate: adds a new column to the data
With groupby: results for each provided group
Compute the mean of C3.
Compute the mean of C3.
Group by C1.
Group by C1 and C2.
Compute the mean of C3 by value of C1.
Compute the mean of C3 by value of C1.
Compute the mean of C3 by value of C1 and C2.
Compute the mean of C3 by value of C1 and C2.
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |

df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
left_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
| 3 | X3 | NA |
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
left_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
| 3 | X3 | NA |

df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
right_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
| 4 | NA | Y4 |
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
right_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
| 4 | NA | Y4 |

df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
full_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
| 3 | X3 | NA |
| 4 | NA | Y4 |
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
full_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
| 3 | X3 | NA |
| 4 | NA | Y4 |

df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
inner_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
inner_join(df_X, df_Y)
| id | X | Y |
|---|---|---|
| 1 | X1 | Y1 |
| 2 | X2 | Y2 |

df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
semi_join(df_X, df_Y)
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
semi_join(df_X, df_Y)
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |

df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
anti_join(df_X, df_Y)
| id | X |
|---|---|
| 3 | X3 |
df_X
| id | X |
|---|---|
| 1 | X1 |
| 2 | X2 |
| 3 | X3 |
df_Y
| id | Y |
|---|---|
| 1 | Y1 |
| 2 | Y2 |
| 4 | Y4 |
anti_join(df_X, df_Y)
| id | X |
|---|---|
| 3 | X3 |


df_contacts
| person | phone |
|---|---|
| Marie | 1 |
| Katie | 2 |
| John | 3 |
| John | 4 |
df_texts
| phone | texts |
|---|---|
| 1 | 100 |
| 2 | 700 |
| 4 | 400 |
| 5 | 50 |
Remember: sometimes we need to specify which columns we are joining with.
How do we go from this…
# A tibble: 4 × 6
degree `2011` `2012` `2013` `2014` `2015`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AB2 NA 1 NA NA 4
2 AB 2 2 4 1 3
3 BS2 2 6 1 NA 5
4 BS 5 9 4 13 10
…to this?
# A tibble: 56 × 3
degree year n
<fct> <dbl> <dbl>
1 AB2 2011 NA
2 AB2 2012 1
3 AB2 2013 NA
4 AB2 2014 NA
5 AB2 2015 4
# ℹ 51 more rows
How do we get back to this…
# A tibble: 4 × 6
degree `2011` `2012` `2013` `2014` `2015`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AB2 NA 1 NA NA 4
2 AB 2 2 4 1 3
3 BS2 2 6 1 NA 5
4 BS 5 9 4 13 10
… from this?
# A tibble: 56 × 3
degree year n
<fct> <dbl> <dbl>
1 AB2 2011 NA
2 AB2 2012 1
3 AB2 2013 NA
4 AB2 2014 NA
5 AB2 2015 4
# ℹ 51 more rows
Kahoot! See website for questions.