Announcement: Office Hours/Exam
Teaching Evaluation
Teaching evaluations due tomorrow. If we get 13/17 responses, I’ll add +2 points to everyone’s final exam grade!
Current Grade Calculation
- AEs: 5% (completing 80% or more earns full credit)
- Lab Attendance: 5%, one drop given (score = number of labs attended, out of 10)
- Labs: 30%, lowest dropped (5 labs at 6% each)
- Midterm: 20% (in-class + take-home scores sum to a total out of 100)
- Project: 20%
- Final: 20%
. . .
If you do better on the final than the midterm, it will be weighted higher!
Final Overview
- All multiple choice (same format as the in-class midterm)
- Cumulative!!!
- Cheat sheet: 2 pages, front and back
Content Review
What model should I use?
First, look at what type of output you have:
- Numerical: Linear regression
- Binary: Logistic regression
- Other: We haven’t learned how to do this!
What model should I use?
Now, look at your predictors. Does the relationship between the output and each predictor stay the same regardless of other predictors?
- Yes: Additive
- No: Interaction
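In R's formula syntax, the additive and interaction versions differ only in one operator. A minimal sketch, where `y`, `x1`, `x2`, and `df` are placeholder names, not anything from the slides:

```r
# Additive model: each predictor's effect on the outcome is the same
# no matter the values of the other predictors
lm(y ~ x1 + x2, data = df)

# Interaction model: the effect of x1 is allowed to depend on x2
lm(y ~ x1 * x2, data = df)   # shorthand for x1 + x2 + x1:x2
```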
Practice
# A tibble: 6 × 3
  did_rain temperature humidity
  <fct>          <dbl>    <dbl>
1 no_rain         71.4     52.7
2 rain            66.3     80
3 no_rain         70.3     53.1
4 rain            63.3     88.8
5 rain            60.2     77.5
6 no_rain         76.7     65.9
- What type of model to predict did_rain from temperature?
- What type of model to predict temperature from did_rain?
- What type of model to predict humidity from did_rain and temperature?
- What type of model to predict did_rain from temperature and humidity?
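For concreteness, here is how each of these fits might be written in R; I'm assuming the tibble above is stored in a data frame called `weather` (that name is mine, not from the slides):

```r
# 1. Binary output (did_rain) from temperature: logistic regression
glm(did_rain ~ temperature, data = weather, family = binomial)

# 2. Numerical output (temperature) from did_rain: linear regression
lm(temperature ~ did_rain, data = weather)

# 3. Numerical output (humidity) from two predictors: linear regression
lm(humidity ~ did_rain + temperature, data = weather)

# 4. Binary output (did_rain) from two predictors: logistic regression
glm(did_rain ~ temperature + humidity, data = weather, family = binomial)
```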
Model comparison
- Linear Regression:
- Logistic Regression:
Review: Logistic Regression
Line of best fit? A little silly…
What is the s-curve??
It’s the logistic function:
\[
\text{Prob}(y = 1)
=
\frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}.
\]
. . .
If you set \(p = \text{Prob}(y = 1)\) and do some algebra, you get the simple linear model for the log-odds:
. . .
\[
\log\left(\frac{p}{1-p}\right)
=
\beta_0+\beta_1x.
\]
. . .
This is called the logistic regression model.
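In R this model is fit with `glm()` and `family = binomial`; a minimal sketch using the rain data from the practice slide (still assuming it is named `weather`):

```r
fit <- glm(did_rain ~ temperature, data = weather, family = binomial)
coef(fit)   # b0 and b1, reported on the log-odds scale
```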
Estimation
. . .
\[
\log\left(\frac{\widehat{p}}{1-\widehat{p}}\right)
=
b_0+b_1x.
\]
. . .
- \(\hat{p}\) is the predicted probability that \(y = 1\)
- In R, the second level of the factor is taken as \(y = 1\)
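A quick way to check, and change, which level R treats as \(y = 1\) (the `relevel()` call below assumes the factor levels from the practice data):

```r
levels(weather$did_rain)   # the second level listed plays the role of y = 1

# To force "rain" to be y = 1, make "no_rain" the first (reference) level:
weather$did_rain <- relevel(weather$did_rain, ref = "no_rain")
```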
Interpreting the intercept
Plug in \(x = 0\):
\[
\log\left(\frac{\widehat{p}}{1-\widehat{p}}\right)
=
b_0+b_1(0)
=
b_0.
\]
. . .
When \(x = 0\), the estimated log-odds is \(b_0\).
. . .
When \(x = 0\), the estimated probability that \(y = 1\) is
\[
\hat{p} = \frac{e^{b_0}}{1+e^{b_0}}
\]
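As a sanity check in R: `plogis()` is exactly this transform from log-odds to probability, so with the hypothetical `fit` from earlier:

```r
b0 <- coef(fit)["(Intercept)"]
exp(b0) / (1 + exp(b0))   # estimated Prob(y = 1) when x = 0
plogis(b0)                # built-in logistic function; same answer
```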
Interpreting the slope is tricky
Recall:
\[
\log\left(\frac{\widehat{p}}{1-\widehat{p}}\right)
=
b_0+b_1x.
\]
. . .
Alternatively:
\[
\frac{\widehat{p}}{1-\widehat{p}}
=
e^{b_0+b_1x}
=
\color{blue}{e^{b_0}e^{b_1x}}
.
\]
. . .
If we increase \(x\) by one unit, we have:
\[
\frac{\widehat{p}}{1-\widehat{p}}
=
e^{b_0}e^{b_1(x+1)}
=
e^{b_0}e^{b_1x+b_1}
=
{\color{blue}{e^{b_0}e^{b_1x}}}{\color{red}{e^{b_1}}}
.
\]
. . .
A one unit increase in \(x\) is associated with a change in odds by a factor of \(e^{b_1}\). Gross!
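In practice you exponentiate the slope to get this odds factor; a sketch, again using the hypothetical `fit` from earlier:

```r
b1 <- coef(fit)["temperature"]
exp(b1)   # multiplicative change in the odds of rain per one-degree increase
# A value below 1 would mean higher temperatures decrease the odds of rain.
```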
Sign of the slope is meaningful
A one unit increase in \(x\) is associated with a change in odds by a factor of \(e^{b_1}\).
. . .
- A positive slope means increasing \(x\) increases the odds (and probability!) that \(y = 1\)
- A negative slope means increasing \(x\) decreases the odds (and probability!) that \(y = 1\)
Threshold
Pick some probability threshold \(p^*\).
- If \(\hat{p} > p^*\), predict \(y = 1\)
- If \(\hat{p} < p^*\), predict \(y = 0\)
. . .
The higher the threshold, the harder it is to classify as \(y = 1\)!!!
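A sketch of the classification step with \(p^* = 0.5\) (the specific threshold and labels are my choices, not the slides'):

```r
p_hat <- predict(fit, type = "response")           # estimated probabilities
pred  <- ifelse(p_hat > 0.5, "rain", "no_rain")    # apply the threshold
table(predicted = pred, actual = weather$did_rain) # confusion matrix
```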
Classification Rates
- False negative rate = \(\frac{FN}{FN + TP}\)
- False positive rate = \(\frac{FP}{FP + TN}\)
- Sensitivity = \(\frac{TP}{FN + TP}\) = 1 − False negative rate
- Specificity = \(\frac{TN}{FP + TN}\) = 1 - False positive rate
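Given predictions like those above, the four counts and all four rates fall out directly; a sketch treating "rain" as \(y = 1\):

```r
actual <- weather$did_rain == "rain"
TP <- sum(pred == "rain" & actual)
FN <- sum(pred == "no_rain" & actual)
FP <- sum(pred == "rain" & !actual)
TN <- sum(pred == "no_rain" & !actual)

FN / (FN + TP)   # false negative rate
FP / (FP + TN)   # false positive rate
TP / (FN + TP)   # sensitivity = 1 - false negative rate
TN / (FP + TN)   # specificity = 1 - false positive rate
```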
ROC Curve + AUC
The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the threshold \(p^*\) sweeps from 0 to 1.
. . .
AUC: Area under the curve
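One way to trace the curve by hand is to sweep the threshold and recompute the two rates at each value (a base-R sketch, not the only way; packages such as pROC automate this):

```r
thresholds <- seq(0, 1, by = 0.01)
actual <- weather$did_rain == "rain"
roc <- t(sapply(thresholds, function(p_star) {
  pred <- p_hat > p_star
  c(fpr = sum(pred & !actual) / sum(!actual),  # false positive rate
    tpr = sum(pred & actual) / sum(actual))    # true positive rate
}))
plot(roc[, "fpr"], roc[, "tpr"], type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
```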
“Three branches of statistical government”
We have an unknown quantity we are trying to learn about (for example, \(\beta_1\)) using noisy, imperfect data. Learning comes in three flavors:
- POINT ESTIMATION: get a single-number best guess for \(\beta_1\)
- INTERVAL ESTIMATION: get a range of likely values for \(\beta_1\) that characterizes (sampling) uncertainty
- HYPOTHESIS TESTING: use the data to distinguish competing claims about \(\beta_1\)
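All three flavors appear in standard R output for a fitted model; for instance, with the hypothetical logistic fit from earlier:

```r
coef(fit)     # point estimates: single-number best guesses for each beta
confint(fit)  # interval estimates: 95% confidence intervals by default
summary(fit)  # hypothesis tests: z-statistics and p-values for each beta
```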