Lab 6

Lemurs

Due Monday June 23 11:59 PM
Important

The due date has been pushed back by one day to give you more time to work on your projects before they are due. The standard 5% penalty per day late applies, and the late submission deadline is a hard cutoff: Wednesday at 8:30 AM. We will release solutions on Wednesday morning to help you study for the final, so NO late submissions will be accepted after that point.

Introduction: Lemurs

In this lab, you’ll work with data from the Duke Lemur Center, which houses over 200 lemurs across 14 species – the most diverse population of lemurs on Earth, outside their native Madagascar.

Duke Lemur Center:

Lemurs are the most threatened group of mammals on the planet, and 95% of lemur species are at risk of extinction. Our mission is to learn everything we can about lemurs – because the more we learn, the better we can work to save them from extinction. They are endemic only to Madagascar, so it’s essentially a one-shot deal: once lemurs are gone from Madagascar, they are gone from the wild.

By studying the variables that most affect their health, reproduction, and social dynamics, the Duke Lemur Center learns how to most effectively focus their conservation efforts. And the more we learn about lemurs, the better we can educate the public around the world about just how amazing these animals are, why they need to be protected, and how each and every one of us can make a difference in their survival.

Source: TidyTuesday

You’ll work with a dataset of selected lemur species. The dataset, called lemurs.csv, can be found in the data folder. You can learn more about the data at: https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-08-24.

Zoboomafoo

If you ever watched the kids’ show Zoboomafoo, the lemur featured was from the Duke Lemur Center and is in our dataset! For one bonus point added to this lab, write code that clearly displays this lemur’s name, taxon, date of birth, and age at death. Place this code in a section titled bonus.

The packages you will need for this lab are:

Question 1

  1. Load the lemurs data from your data folder and save it as lemurs.

  2. In a single pipeline, write code to report which types of lemurs (in the taxon variable) are represented in the sample and how many of each. In addition to reporting the value of taxon, you should also report the common name. You should refer back to the linked data dictionary to understand what the different values of taxon mean.

    Your response should be a tibble with three columns: taxon, common_name (a new variable you create that contains the common name of the taxon, e.g., EMON is Mongoose lemur), and n (number of lemurs with that taxon).
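The shape of such a pipeline can be sketched on a tiny made-up data frame. This is only an illustration, not the expected answer: toy_lemurs is a hypothetical stand-in for the real data, the EMON label comes from the text above, and the PCOQ label is an assumption you should confirm in the data dictionary.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the lemurs data; the real file has many more rows and taxa.
toy_lemurs <- tibble(
  taxon = c("EMON", "EMON", "PCOQ", "PCOQ", "PCOQ")
)

taxon_counts <- toy_lemurs |>
  count(taxon) |>                                  # one row per taxon, with n
  mutate(
    common_name = case_when(
      taxon == "EMON" ~ "Mongoose lemur",          # given in the lab text
      taxon == "PCOQ" ~ "Coquerel's sifaka",       # assumption: check the data dictionary
      TRUE ~ NA_character_                         # any taxon not yet labeled
    )
  ) |>
  relocate(common_name, .after = taxon)            # column order: taxon, common_name, n

taxon_counts
```

With the real data you would need one case_when() branch per taxon that appears in the sample.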

Question 2

Compute a 95% bootstrap confidence interval for the slope of the regression line for predicting weights of lemurs (weight_g) from the ages of lemurs (in years) when their weight was measured (age_at_wt_y). In your code, use 1,000 bootstrap samples when simulating your bootstrap distribution. Don’t forget to set a seed!

In narrative, provide an interpretation of this 95% confidence interval for the slope. Additionally, report your point estimate.

Tip

Below is a step-by-step recipe for constructing and visualizing a confidence interval. The code snippets shown are not “complete.” They include some blanks you need to fill in, and they are just intended to guide you in the right direction.

  • Step 1: Calculate the point estimate for the slope and the intercept of the regression line.
obs_fit <- lemurs |>
  specify(______ ~ ______) |>
  fit()
  • Step 2: Simulate a bootstrap distribution of regression estimates.
set.seed(___)

boot_dist <- lemurs |>
  specify(_____ ~ ______) |>
  generate(reps = ____, type = _______) |>
  fit()
  • Step 3: Calculate the bounds of the confidence interval.
conf_ints <- 
  get_confidence_interval(
    ____, 
    level = ___, 
    point_estimate = ____
  )
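To see how the three steps fit together, here is the same recipe run end to end on R’s built-in mtcars data instead of the lemurs data. The variables (mpg, wt), the seed, and the use of the infer package are illustrative assumptions; substitute the lemur variables and your own seed in your answer.

```r
library(infer)

# Step 1: observed intercept and slope for mpg predicted from wt
obs_fit <- mtcars |>
  specify(mpg ~ wt) |>
  fit()

# Step 2: bootstrap distribution of the fitted coefficients
set.seed(123)  # arbitrary seed, chosen only for this example

boot_dist <- mtcars |>
  specify(mpg ~ wt) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# Step 3: 95% percentile interval for each term (intercept and wt)
conf_ints <-
  get_confidence_interval(
    boot_dist,
    level = 0.95,
    point_estimate = obs_fit
  )

conf_ints
```

The result has one row per term, so the interval for the slope is the row whose term matches the explanatory variable.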

Question 3

What are the slopes of the regression line for an additive model predicting weights of lemurs (weight_g) from the ages of lemurs (in years) when their weight was measured (age_at_wt_y) and their types (taxon)? Calculate and interpret a 95% bootstrap confidence interval for each. Also report your point estimates. Don’t forget to set a seed and use 1,000 bootstrap samples (reps = 1000) when simulating your bootstrap distribution.
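As a rough sketch of what an additive model with one numerical and one categorical predictor looks like in this workflow, the example below uses R’s built-in iris data. The variables, seed, and object names are illustrative assumptions, not part of the lab.

```r
library(infer)

# Point estimates for an additive model: one numerical and one categorical predictor
obs_fit_add <- iris |>
  specify(Sepal.Length ~ Petal.Length + Species) |>
  fit()

# Bootstrap distribution of every coefficient in the model
set.seed(456)  # arbitrary seed, chosen only for this example

boot_dist_add <- iris |>
  specify(Sepal.Length ~ Petal.Length + Species) |>
  generate(reps = 1000, type = "bootstrap") |>
  fit()

# One 95% percentile interval per term
conf_ints_add <-
  get_confidence_interval(
    boot_dist_add,
    level = 0.95,
    point_estimate = obs_fit_add
  )

conf_ints_add
```

In a model like this, the coefficient for each non-baseline level of the categorical variable is a difference from the baseline level, holding the numerical predictor constant. Interpret each interval with that in mind.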

Question 4

Do female Coquerel’s sifaka lemurs have different weights than male Coquerel’s sifaka lemurs, on average?

  1. Create a new data frame, cs_lemurs, filtered to include only Coquerel’s sifaka lemurs with determined sex (sex is not determined when sex equals ND).

  2. Create an appropriate data visualization to compare the distributions of weights of male and female Coquerel’s sifaka lemurs. Provide a brief interpretation of this visualization.

  3. Conduct a hypothesis test to answer our question at the 5% significance level. While doing this, you should make sure your answer includes:

    • Your hypotheses clearly stated in the context of the data and the research question

    • A visualization of the p-value and the null distribution

    • The computed p-value clearly displayed

    • A one-sentence conclusion for your hypothesis test in the context of the data and the research question.

Make sure to set a seed and use 1,000 samples (reps = 1000) when simulating your permutation-based null distribution.

Tip

If you are having trouble getting started, refer back to Wednesday’s slides and check out the example where we compared weights of babies born to smoking/non-smoking mothers.

  4. Based on your answer to part 3 (the hypothesis test), would you expect a 95% confidence interval for the difference in mean weights of female and male Coquerel’s sifaka lemurs to include 0? Explain your reasoning.

  5. Construct and interpret a 95% bootstrap confidence interval for the difference in mean weights of female and male Coquerel’s sifaka lemurs. Does it include 0? Don’t forget to set a seed and use 1,000 bootstrap samples (reps = 1000) when simulating your bootstrap distribution.
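The permutation test and the bootstrap interval for a difference in means follow the same pattern as the earlier questions. The sketch below uses R’s built-in ToothGrowth data (tooth length by supplement type) purely as a stand-in; the variables, group order, and seed are illustrative assumptions, and your answer should use cs_lemurs instead.

```r
library(infer)

# Observed difference in mean tooth length (OJ minus VC)
obs_diff <- ToothGrowth |>
  specify(len ~ supp) |>
  calculate(stat = "diff in means", order = c("OJ", "VC"))

# Null distribution: shuffle group labels under the independence hypothesis
set.seed(789)  # arbitrary seed, chosen only for this example

null_dist <- ToothGrowth |>
  specify(len ~ supp) |>
  hypothesize(null = "independence") |>
  generate(reps = 1000, type = "permute") |>
  calculate(stat = "diff in means", order = c("OJ", "VC"))

# Two-sided p-value and a plot of the null distribution with the p-value shaded
p_val <- get_p_value(null_dist, obs_stat = obs_diff, direction = "two-sided")
p_val

visualize(null_dist) +
  shade_p_value(obs_stat = obs_diff, direction = "two-sided")

# Bootstrap distribution of the difference in means, then a 95% percentile interval
set.seed(789)

boot_diffs <- ToothGrowth |>
  specify(len ~ supp) |>
  generate(reps = 1000, type = "bootstrap") |>
  calculate(stat = "diff in means", order = c("OJ", "VC"))

ci <- get_confidence_interval(boot_diffs, level = 0.95, type = "percentile")
ci
```

The order argument fixes the direction of the subtraction, so use the same order in the observed statistic, the null distribution, and the bootstrap distribution.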