# A tibble: 5 × 3
C1 C2 C3
<chr> <chr> <int>
1 A X 1
2 A Y 2
3 A X 3
4 B X 4
5 B Y 5
Lecture 11
May 30, 2025
group_by
/summarize
/mutate
With summarize: condenses information into smaller result table
With mutate: adds a new column to the data
With groupby: results for each provided group
Compute the mean of C3
.
Compute the mean of C3
.
Group by C1
.
Group by C1
and C2
.
Compute the mean of C3
by value of C1
.
Compute the mean of C3
by value of C1
.
Compute the mean of C3
by value of C1
and C2
.
Compute the mean of C3
by value of C1
and C2
.
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
left_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
3 | X3 | NA |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
left_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
3 | X3 | NA |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
right_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
4 | NA | Y4 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
right_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
4 | NA | Y4 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
full_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
3 | X3 | NA |
4 | NA | Y4 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
full_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
3 | X3 | NA |
4 | NA | Y4 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
inner_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
inner_join(df_X, df_Y)
id | X | Y |
---|---|---|
1 | X1 | Y1 |
2 | X2 | Y2 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
semi_join(df_X, df_Y)
id | X |
---|---|
1 | X1 |
2 | X2 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
semi_join(df_X, df_Y)
id | X |
---|---|
1 | X1 |
2 | X2 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
anti_join(df_X, df_Y)
id | X |
---|---|
3 | X3 |
df_X
id | X |
---|---|
1 | X1 |
2 | X2 |
3 | X3 |
df_Y
id | Y |
---|---|
1 | Y1 |
2 | Y2 |
4 | Y4 |
anti_join(df_X, df_Y)
id | X |
---|---|
3 | X3 |
df_contacts
person | phone |
---|---|
Marie | 1 |
Katie | 2 |
John | 3 |
John | 4 |
df_texts
phone | texts |
---|---|
1 | 100 |
2 | 700 |
4 | 400 |
5 | 50 |
Remember: sometimes we need to specify which columns we are joining with.
How do we go from this…
# A tibble: 4 × 6
degree `2011` `2012` `2013` `2014` `2015`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AB2 NA 1 NA NA 4
2 AB 2 2 4 1 3
3 BS2 2 6 1 NA 5
4 BS 5 9 4 13 10
…to this?
# A tibble: 56 × 3
degree year n
<fct> <dbl> <dbl>
1 AB2 2011 NA
2 AB2 2012 1
3 AB2 2013 NA
4 AB2 2014 NA
5 AB2 2015 4
# ℹ 51 more rows
How do we get back to this…
# A tibble: 4 × 6
degree `2011` `2012` `2013` `2014` `2015`
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AB2 NA 1 NA NA 4
2 AB 2 2 4 1 3
3 BS2 2 6 1 NA 5
4 BS 5 9 4 13 10
… from this?
# A tibble: 56 × 3
degree year n
<fct> <dbl> <dbl>
1 AB2 2011 NA
2 AB2 2012 1
3 AB2 2013 NA
4 AB2 2014 NA
5 AB2 2015 4
# ℹ 51 more rows
Kahoot! See website for questions.