Lab 6
Lemurs
The due date has been pushed back by one day to give you more time to work on your projects before they are due. The standard 5% penalty per day late applies, and the late submission deadline is a hard cutoff of Wednesday at 8:30 AM. We will release solutions on Wednesday morning to help you study for the final, so NO late submissions will be accepted after that point.
Introduction: Lemurs
In this lab, you’ll work with data from the Duke Lemur Center, which houses over 200 lemurs across 14 species – the most diverse population of lemurs on Earth, outside their native Madagascar.
Duke Lemur Center:
Lemurs are the most threatened group of mammals on the planet, and 95% of lemur species are at risk of extinction. Our mission is to learn everything we can about lemurs – because the more we learn, the better we can work to save them from extinction. They are endemic only to Madagascar, so it’s essentially a one-shot deal: once lemurs are gone from Madagascar, they are gone from the wild.
By studying the variables that most affect their health, reproduction, and social dynamics, the Duke Lemur Center learns how to most effectively focus their conservation efforts. And the more we learn about lemurs, the better we can educate the public around the world about just how amazing these animals are, why they need to be protected, and how each and every one of us can make a difference in their survival.
Source: TidyTuesday
You’ll work with a dataset of selected lemur species. The dataset, called lemurs.csv, can be found in the data folder. You can learn more about the data at: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-24.
If you ever watched the kids’ show Zoboomafoo, the lemur featured was from the Duke Lemur Center and is in our dataset! For one bonus point added to this lab, write code that clearly displays this lemur’s name, taxon, date of birth, and age at death. Place this code in a section titled bonus.
The packages you will need for this lab are:
Question 1
(a) Load the lemurs data from your data folder and save it as lemurs.
(b) In a single pipeline, write code to report which types of lemurs (in the taxon variable) are represented in the sample and how many of each. In addition to reporting the value of taxon, you should also report the common name. You should refer back to the linked data dictionary to understand what the different values of taxon mean.
    Your response should be a tibble with three columns: taxon, common_name (a new variable you create that contains the common name of the taxon, e.g., EMON is Mongoose lemur), and n (number of lemurs with that taxon).
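The general pattern of labelling a coded variable and counting each category in one pipeline can be sketched as follows. This is only an illustration on made-up data: the `animals` tibble, its `code` column, and the labels are hypothetical and are not part of the lemurs dataset or its data dictionary.

```r
library(dplyr)

# Hypothetical toy data: `code` stands in for any coded category column.
animals <- tibble(code = c("A", "B", "A", "C", "B", "A"))

code_counts <- animals |>
  # Attach a readable label to each code (labels invented for this sketch).
  mutate(
    label = case_when(
      code == "A" ~ "Alpha",
      code == "B" ~ "Beta",
      code == "C" ~ "Gamma"
    )
  ) |>
  # Count rows per code/label pair; the count lands in a column named n.
  count(code, label)

code_counts
```

Counting by both the code and its label keeps the two columns side by side in the result without changing the counts, because each code maps to exactly one label.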
Question 2
Compute a 95% bootstrap interval for the slope of the regression line for predicting weights of lemurs (weight_g) from the ages of lemurs (in years) when their weight was measured (age_at_wt_y). In your code, use 1,000 bootstrap samples when simulating your bootstrap distribution. Don’t forget to set a seed!
In narrative, provide an interpretation of this 95% confidence interval for the slope. Additionally, report your point estimate.
Below is a step-by-step recipe for constructing and visualizing a confidence interval. The code snippets shown are not “complete.” They include some blanks you need to fill in, and they are just intended to guide you in the right direction.
- Step 1: Calculate the point estimate for the slope and the intercept of the regression line.
  obs_fit <- lemurs |>
    specify(______ ~ ______) |>
    fit()
- Step 2: Simulate a bootstrap distribution of regression estimates.
  set.seed(___)
  boot_dist <- lemurs |>
    specify(_____ ~ ______) |>
    generate(reps = ____, type = _______) |>
    fit()
- Step 3: Calculate the bounds of the confidence interval.
  conf_ints <-
    get_confidence_interval(
      ____, level = ___,
      point_estimate = ____
    )
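For intuition about what the three steps above compute, here is a minimal base R sketch of a bootstrap interval for a regression slope. It is not the infer pipeline from the recipe: it uses the built-in mtcars data (mpg predicted from wt) rather than lemurs, and it assumes the percentile method, taking the middle 95% of the bootstrap slopes as the interval. The seed value is arbitrary.

```r
# Arbitrary seed so the resampling is reproducible.
set.seed(2024)

n <- nrow(mtcars)

# Step 1 analogue: point estimate of the slope from the observed data.
obs_slope <- coef(lm(mpg ~ wt, data = mtcars))[["wt"]]

# Step 2 analogue: resample rows with replacement 1,000 times and
# refit the regression on each bootstrap sample.
boot_slopes <- replicate(1000, {
  idx <- sample(n, size = n, replace = TRUE)
  coef(lm(mpg ~ wt, data = mtcars[idx, ]))[["wt"]]
})

# Step 3 analogue: percentile interval from the 2.5th and 97.5th percentiles.
ci <- quantile(boot_slopes, probs = c(0.025, 0.975))

obs_slope
ci
```

Each bootstrap sample has the same number of rows as the original data, so the spread of `boot_slopes` approximates the sampling variability of the slope.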
Question 3
What are the slopes of the regression line for an additive model predicting weights of lemurs (weight_g) from the ages of lemurs (in years) when their weight was measured (age_at_wt_y) and their types (taxon)? Calculate and interpret a 95% bootstrap confidence interval for each. Also report your point estimates. Don’t forget to set a seed and use 1,000 bootstrap samples (reps = 1000) when simulating your bootstrap distribution.
Question 4
Do female Coquerel’s sifaka lemurs have different weights than male Coquerel’s sifaka lemurs, on average?
(a) Create a new data frame cs_lemurs filtered to have only Coquerel’s sifaka lemurs with determined sex (sex is not determined when sex equals ND).
(b) Create an appropriate data visualization to compare the distribution of weights of male and female Coquerel’s sifaka lemurs. Provide a brief interpretation of this visualization.
(c) Conduct a hypothesis test to answer our question at the 5% significance level. While doing this, make sure your answer includes:
    - Your hypotheses, clearly stated in the context of the data and the research question
    - A visualization of the p-value and the null distribution
    - The computed p-value, clearly displayed
    - A one-sentence conclusion for your hypothesis test in the context of the data and the research question
    Make sure to set a seed and use 1,000 samples (reps = 1000) when simulating your permutation-based null distribution.
    If you are having trouble getting started, refer back to Wednesday’s slides and check out the example where we compared weights of babies born to smoking/non-smoking mothers.
(d) Based on your answer to part (c), would you expect a 95% confidence interval for the difference in means of female and male lemurs to include 0? Explain your reasoning.
(e) Construct and interpret a 95% bootstrap confidence interval for the difference in means of female and male lemurs. Does it include 0? Don’t forget to set a seed and use 1,000 bootstrap samples (reps = 1000) when simulating your bootstrap distribution.
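For intuition about the permutation-based hypothesis test described above, here is a minimal base R sketch of a two-sided permutation test for a difference in means. It is an illustration only: it uses the built-in mtcars data (mpg compared across the two values of am) instead of the lemurs data, the seed is arbitrary, and the p-value is taken as the share of shuffled differences at least as extreme in absolute value as the observed one, which may differ slightly from how a given package computes a two-sided p-value.

```r
# Arbitrary seed so the shuffling is reproducible.
set.seed(2024)

# Observed difference in mean mpg between the two am groups.
obs_diff <- mean(mtcars$mpg[mtcars$am == 1]) -
  mean(mtcars$mpg[mtcars$am == 0])

# Null distribution: shuffle the group labels 1,000 times, which breaks
# any association between group and response, and recompute the difference.
null_diffs <- replicate(1000, {
  shuffled <- sample(mtcars$am)
  mean(mtcars$mpg[shuffled == 1]) - mean(mtcars$mpg[shuffled == 0])
})

# Two-sided p-value: proportion of shuffled differences at least as
# extreme as the observed difference.
p_value <- mean(abs(null_diffs) >= abs(obs_diff))

obs_diff
p_value
```

A small p-value here means that differences as large as the observed one rarely arise when group labels are assigned at random.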