AE 12: Modelling penguins
In this application exercise we will be studying penguins. The data can be found in the palmerpenguins package and we will use tidyverse and tidymodels for data exploration and modeling, respectively.
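Here is a minimal setup chunk for the packages named above. It assumes all three are already installed (for example with install.packages()):

```r
# tidyverse: data wrangling and plotting (ggplot2, dplyr, ...)
library(tidyverse)
# tidymodels: modeling (parsnip, broom, ...)
library(tidymodels)
# palmerpenguins: provides the penguins data frame
library(palmerpenguins)
```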
You’ve seen the penguins data far too much at this point (sorry!!!), but we’re going to bring it back one more time.
glimpse(penguins)
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
Part 1
Our goal is to better understand how various body measurements and attributes of penguins relate to their body mass. First, we are going to investigate the relationship between penguins’ flipper lengths and their body masses.
- Based on our research focus, body mass is the response variable and flipper length is the explanatory (predictor) variable.
Task 1 - Exploratory Data Analysis: Visualize the relationship between flipper length and body mass of penguins.
# Add code here!
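The sketch below makes the kind of plot this task calls for. It uses bill length (bill_length_mm) as a stand-in predictor, so the flipper-length version is still yours to write. The axis labels and title are illustrative choices. ggplot2 drops rows with missing values and prints a warning about them.

```r
library(tidyverse)
library(palmerpenguins)

# Scatterplot: numeric predictor on the x-axis, response (body mass) on the y-axis.
# bill_length_mm is only an illustrative stand-in predictor.
bill_plot <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point() +
  labs(
    x = "Bill length (mm)",
    y = "Body mass (g)",
    title = "Body mass vs. bill length of penguins"
  )

bill_plot
```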
Correlation
Task 2: Complete the following.
- Question: What is correlation? What values can correlation take?
Add answer here.
Note: Are you good at guessing correlation? Give it a try with this game!
Code: What is the correlation between flipper length and body mass of penguins?
# add code here
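The penguins data contains missing values (see the fourth row in the glimpse() output above). Because of this, cor() returns NA unless you tell it how to handle them. The sketch below uses bill length as a stand-in predictor and sets use = "complete.obs" so that incomplete pairs are dropped:

```r
library(tidyverse)
library(palmerpenguins)

# Default use = "everything": any NA in either column makes the result NA
cor(penguins$bill_length_mm, penguins$body_mass_g)

# Drop rows where either value is missing, then compute Pearson's r
r_bill <- penguins |>
  summarize(r = cor(bill_length_mm, body_mass_g, use = "complete.obs")) |>
  pull(r)

r_bill
```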
Defining, fitting, and summarizing a model
Task 3 (Demo): Write the population model (the model for the true values) that explains the relationship between body mass and flipper length below.
Add answer here!
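One common way to write a simple linear population model for this relationship is shown below. Here β0 and β1 are the unknown true intercept and slope, and ε is the random error for penguin i:

```latex
\text{body\_mass}_i = \beta_0 + \beta_1 \times \text{flipper\_length}_i + \varepsilon_i
```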
Task 4: Fit the linear regression model and display the results. Write the estimated model output below.
# add code here
Write the estimated model here.
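The sketch below fits and summarizes a simple linear regression with tidymodels. It uses bill length as a stand-in predictor so the flipper-length model is left for you. The estimate column gives b0 and b1 in the fitted equation body_mass_hat = b0 + b1 × bill_length.

```r
library(tidymodels)
library(palmerpenguins)

# Specify a linear regression (parsnip's default engine is "lm"),
# then fit it with a formula of the form response ~ predictor.
# Rows with missing values are dropped by lm().
bill_fit <- linear_reg() |>
  fit(body_mass_g ~ bill_length_mm, data = penguins)

# One row per term: the intercept and the slope
tidy(bill_fit)
```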
Task 5: Interpret the slope and the intercept in the context of the data.
Intercept:
Slope:
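A quick numeric check can help with interpreting the slope. In a simple linear model, the predicted response at x + 1 minus the prediction at x equals the slope, whatever x is. The sketch below uses bill length as an illustrative predictor. The values 40 and 41 mm are arbitrary.

```r
library(tidymodels)
library(palmerpenguins)

bill_fit <- linear_reg() |>
  fit(body_mass_g ~ bill_length_mm, data = penguins)

# Predictions for two bill lengths one millimetre apart (arbitrary choice)
preds <- predict(bill_fit, new_data = tibble(bill_length_mm = c(40, 41)))
change_per_mm <- diff(preds$.pred)

# Estimated slope from the tidy coefficient table
slope <- tidy(bill_fit) |>
  filter(term == "bill_length_mm") |>
  pull(estimate)

c(change_per_mm = change_per_mm, slope = slope)
```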
Task 6: Recreate the visualization from above, this time adding a regression line with geom_smooth(method = "lm").
# add code here
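The sketch below adds a least-squares line on top of a scatterplot, with bill length as a stand-in predictor. Setting se = FALSE, which hides the confidence band, is an optional choice and not part of the task.

```r
library(tidyverse)
library(palmerpenguins)

# Points plus a fitted least-squares line (method = "lm")
bill_line_plot <- ggplot(penguins, aes(x = bill_length_mm, y = body_mass_g)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Bill length (mm)", y = "Body mass (g)")

bill_line_plot
```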
Task 7 (Demo): What is the estimated body mass for a penguin with a flipper length of 210 mm?
# add code here
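Here is a sketch of this demo with tidymodels. It fits the model, then passes predict() a tibble whose column name matches the predictor. The model is refit in this chunk so it runs on its own.

```r
library(tidymodels)
library(palmerpenguins)

# Fit body mass on flipper length
flipper_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm, data = penguins)

# New data must contain a column named exactly like the predictor
pred_210 <- predict(flipper_fit, new_data = tibble(flipper_length_mm = 210))
pred_210
```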
Task 8: What is the estimated body mass for a penguin with a flipper length of 100 mm? Add code to find it! Is there anything weird about making this prediction?
# add code here
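One way to spot what might be off is to compare a new flipper length with the range of flipper lengths actually observed in the data. The sketch below does this with a small helper, in_observed_range(). It is hypothetical, written just for this check, and is not part of any package.

```r
library(palmerpenguins)

# Smallest and largest observed flipper lengths, ignoring missing values
flipper_range <- range(penguins$flipper_length_mm, na.rm = TRUE)
flipper_range

# Hypothetical helper: TRUE when x falls inside the observed range
in_observed_range <- function(x, observed = flipper_range) {
  x >= observed[1] & x <= observed[2]
}

in_observed_range(c(100, 210))
```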
Part 2: Another model
Task 9: A different researcher wants to model the body mass of penguins based on the island they were recorded on.
- Question: How are the variables involved in this analysis different?
Add answer here.
Code: Make an appropriate visualization below to investigate this relationship. Additionally, calculate the mean body mass by island.
# add plot here
# add mean by island
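This sketch pairs a categorical predictor with a numeric response, using species as a stand-in for island. Side-by-side boxplots are one reasonable choice of plot here, not the only one. The group means use na.rm = TRUE so that penguins with a missing body mass are skipped.

```r
library(tidyverse)
library(palmerpenguins)

# Categorical predictor on the x-axis, numeric response on the y-axis
species_plot <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body mass (g)")
species_plot

# Mean body mass within each group
species_means <- penguins |>
  group_by(species) |>
  summarize(mean_body_mass = mean(body_mass_g, na.rm = TRUE))
species_means
```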
Task 10: Change the geom of your previous plot to geom_point(). Is this plot useful?
# add code here
Task 11: Fit the linear regression model and display the results. Write the estimated model output below.
# add code here
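The sketch below fits a model with a categorical predictor, using species in place of island so the island model is left for you. With R's default treatment (dummy) coding, the first factor level becomes the baseline and is absorbed into the intercept. Each other level gets its own coefficient.

```r
library(tidymodels)
library(palmerpenguins)

# The first level listed here is the baseline
levels(penguins$species)

species_fit <- linear_reg() |>
  fit(body_mass_g ~ species, data = penguins)

# Terms: (Intercept) plus one indicator per non-baseline level
tidy(species_fit)
```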
Interpreting Categorical Predictors
Task 12: Fill in the blanks.
- The baseline island is ________.
- Intercept: Penguins from _________ island are expected to weigh, on average, _______ grams.
- Slopes:
  - Penguins from _________ island are expected to weigh, on average, _______ grams _____ than those from _______.
  - Penguins from _________ island are expected to weigh, on average, ________ grams _______ than those from _______.
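To see where these blanks come from, the sketch below again uses species as a stand-in for island. When the model has a single categorical predictor, the intercept equals the baseline group's mean. The intercept plus each slope equals the mean of the corresponding group.

```r
library(tidymodels)
library(palmerpenguins)

species_fit <- linear_reg() |>
  fit(body_mass_g ~ species, data = penguins)
# Intercept first, then one slope per non-baseline level
est <- tidy(species_fit)$estimate

# Group means computed directly from the data
group_means <- penguins |>
  group_by(species) |>
  summarize(mean_mass = mean(body_mass_g, na.rm = TRUE))

# Model-implied means next to the observed means
comparison <- tibble(
  species = group_means$species,
  from_model = c(est[1], est[1] + est[2], est[1] + est[3]),
  from_data = group_means$mean_mass
)
comparison
```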
Prediction:
Task 13 (Demo): What is the estimated body weight of a penguin on Biscoe island? What are the estimated body weights of penguins on Dream and Torgersen islands?
# add code here
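Here is a sketch of this demo, assuming the island model is fit with linear_reg(). The new data reuses the factor levels of penguins$island so that predict() sees exactly the categories the model was trained on.

```r
library(tidymodels)
library(palmerpenguins)

# Fit body mass on island
island_fit <- linear_reg() |>
  fit(body_mass_g ~ island, data = penguins)

# One row per island, with the same factor levels as the training data
new_islands <- tibble(
  island = factor(c("Biscoe", "Dream", "Torgersen"), levels = levels(penguins$island))
)

# Attach the island labels to the predictions
island_preds <- predict(island_fit, new_data = new_islands) |>
  bind_cols(new_islands)
island_preds
```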