Final review

Suggested Answers

Part 1

  1. a, c, e - boxplots, histograms, and density plots are good ways of visualizing numerical variables
  2. c
  3. a, d - both return the same values; the version with specify() is the one to use when the point estimate feeds into a confidence interval or hypothesis test
  4. c - since less is the baseline level, the output will show the level more
  5. c
  6. c
  7. b, c, e - The p-value of 0.3 is greater than the 0.05 discernibility level, so we fail to reject the null hypothesis. About 30% of samples in the null distribution were more extreme than our observed value.

Option e is a little unclear; it would be phrased more precisely if something like it appeared on an exam. Here, “our observed value” means the observed value of the slope, since that is what the hypothesis test examined. About 30% of the null distribution is more extreme than the observed slope, counting both tails; about 15% of the null distribution values are greater than the observed value.
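
To make the 30% versus 15% distinction concrete, here is a minimal sketch in plain Python. The null distribution and the observed slope below are made-up numbers chosen so the tails come out to 15% each; they are not the values from the exam.

```python
# Hypothetical null distribution: 20 simulated slopes, evenly spaced
# from -0.95 to 0.95. These are illustrative values, not course data.
null_slopes = [(10 * i - 95) / 100 for i in range(20)]
observed_slope = 0.7  # hypothetical observed slope

n = len(null_slopes)

# Proportion of null slopes at or above the observed slope (upper tail only).
upper_tail = sum(s >= observed_slope for s in null_slopes) / n

# Proportion at least as far from 0 as the observed slope, in either direction.
two_sided = sum(abs(s) >= abs(observed_slope) for s in null_slopes) / n

print(f"upper tail: {upper_tail:.2f}")
print(f"two-sided p-value: {two_sided:.2f}")
```

With these made-up values, the upper tail holds 3 of the 20 slopes and the two tails together hold 6 of 20, which mirrors the 15% and 30% in the explanation above.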

Part 2

  1. (c) For every additional $1,000 of annual salary, the model predicts the raise to be higher, on average, by 0.0155%.

  2. $R^2$ of raise_2_fit is higher than $R^2$ of raise_1_fit since raise_2_fit has one more predictor and $R^2$ always increases (or at least never decreases) when a predictor is added to the model.

  3. The reference level of performance_rating is High, since it’s the first level alphabetically. Therefore, the coefficient -2.40% is the predicted difference in raise for Successful compared to High: employees rated Successful are predicted to get raises that are, on average, 2.40 percentage points lower than those of employees rated High. A negative coefficient makes sense in this context, since we would expect those with a High performance rating to get higher raises than those with a Successful rating.

  4. (a) “Poor”, “Successful”, “High”, “Top”.

  5. Option 3. It’s a linear model with no interaction effect, so the lines are parallel. And since the coefficient for salary_typeSalaried is positive, the Salaried line has the higher intercept. The equations of the lines are as follows:

-   Hourly:

    $$
    \begin{align*}
    \widehat{percent\_incr} &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times salary\_typeSalaried \\
    &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times 0 \\
    &= 1.24 + 0.0000137 \times annual\_salary
    \end{align*}
    $$

-   Salaried:

    $$
    \begin{align*}
    \widehat{percent\_incr} &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times salary\_typeSalaried \\
    &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times 1 \\
    &= 2.153 + 0.0000137 \times annual\_salary
    \end{align*}
    $$
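
The two equations above can be checked numerically. This is a minimal sketch in plain Python using the coefficients from the fitted model; the function name `predict_percent_incr` and the example salaries are assumptions made for illustration.

```python
# Coefficients taken from the model equations above.
INTERCEPT = 1.24
SLOPE_SALARY = 0.0000137
COEF_SALARIED = 0.913


def predict_percent_incr(annual_salary, salaried):
    """Predicted percent increase; `salaried` toggles the 0/1 indicator."""
    indicator = 1 if salaried else 0
    return INTERCEPT + SLOPE_SALARY * annual_salary + COEF_SALARIED * indicator


# Illustrative salaries (hypothetical, not from the data).
for salary in (40_000, 80_000):
    hourly_pred = predict_percent_incr(salary, salaried=False)
    salaried_pred = predict_percent_incr(salary, salaried=True)
    gap = salaried_pred - hourly_pred
    print(salary, round(hourly_pred, 3), round(salaried_pred, 3), round(gap, 3))
```

Because there is no interaction term, the gap between the Salaried and Hourly predictions is the same 0.913 at every salary, which is what parallel lines look like.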
    

Part 3

  1. d
  2. c - We are 95% confident that the mean number of texts per month of all American teens is between 1450 and 1550.
  3. a
  4. b
  5. False
  6. True
  7. c
  8. d
  9. TPR: 0.87; FPR: 0.12; TNR: 0.88; FNR: 0.13 (it’s okay if you estimated these numbers slightly differently!)

The sensitivity (which equals the true positive rate) is about 0.87 and 1 - specificity (which equals the false positive rate) is about 0.12.

Since true positive rate + false negative rate = 1, we know the false negative rate is 1 - TPR = 1 - 0.87 = 0.13.

Similarly, TNR + FPR = 1, so TNR = 1 - FPR = 1 - 0.12 = 0.88.
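
The same arithmetic as a short Python sketch, starting from the two rates read off the plot (approximately 0.87 and 0.12, as estimated above):

```python
# Rates estimated from the plot (approximate readings).
tpr = 0.87  # sensitivity
fpr = 0.12  # 1 - specificity

# Complements: each pair of rates sums to 1.
fnr = 1 - tpr
tnr = 1 - fpr

print(f"FNR = {fnr:.2f}, TNR = {tnr:.2f}")
```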