Final review

Part 1

a, c, e - boxplots, histograms, and density plots are good ways of visualizing numerical variables
c
a, d - both of these will return the same values; the version with specify() is the one to use if we also need the point estimate for building confidence intervals or doing hypothesis testing
c - since less is the baseline level, the output will show the level more
c
c
b, c, e - The p-value of 0.3 is greater than the 0.05 discernibility level, so we fail to reject the null hypothesis. About 30% of the samples in the null distribution were as or more extreme than our observed value.

Option e is a little unclear; if something like this appears on an exam, it will be phrased more precisely. Here, "our observed value" specifically means our observed value of the slope, since that is what the hypothesis test was looking at. About 30% of the null distribution values are as or more extreme than the observed slope (in either direction); of these, about 15% of the null distribution values are greater than the observed value.
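A minimal Python sketch of how a two-sided p-value is read off a null distribution, assuming the null distribution of the slope is centered at 0, so that "as extreme" means having an absolute value at least as large as the observed slope. The null values below are a hypothetical toy list, not the course data; they were chosen so the shares match the 30% / 15% split described above.

```python
def two_sided_p_value(null_stats, observed):
    """Share of null statistics at least as extreme as the observed value.

    Assumes the null distribution is centered at 0 (e.g. slope = 0),
    so "as extreme" means |stat| >= |observed|.
    """
    extreme = [s for s in null_stats if abs(s) >= abs(observed)]
    return len(extreme) / len(null_stats)


def upper_tail_share(null_stats, observed):
    """Share of null statistics greater than or equal to the observed value."""
    return sum(1 for s in null_stats if s >= observed) / len(null_stats)


# Hypothetical toy null distribution of 20 slopes (not the course data).
null_slopes = [-0.9, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, -0.05, 0.0,
               0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
observed_slope = 0.55
alpha = 0.05  # discernibility level

p = two_sided_p_value(null_slopes, observed_slope)
upper = upper_tail_share(null_slopes, observed_slope)
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(p, upper, decision)
```

With these toy values, 6 of the 20 null slopes are at least as far from 0 as the observed slope, and 3 of them lie above it, so the two-sided p-value is twice the upper-tail share.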

Part 2

(c) For every additional $1,000 of annual salary, the model predicts the raise to be higher, on average, by 0.0155%.
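As a quick arithmetic check of the per-$1,000 wording, assuming annual_salary is recorded in dollars and its fitted coefficient is about 0.0000155 (back-calculated from the 0.0155% figure, not read from the model output), a $1,000 increase changes the predicted raise by:

$$
\begin{align*}
\Delta \widehat{raise} &= 0.0000155 \times \Delta annual\_salary \\
&= 0.0000155 \times 1000 \\
&= 0.0155
\end{align*}
$$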
$R^2$ of raise_2_fit is higher than $R^2$ of raise_1_fit since raise_2_fit has one more predictor, and $R^2$ never decreases when a predictor is added to the model.
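A small simulation sketch of why adding a predictor cannot lower $R^2$, written in Python with NumPy rather than the course's own tools. The data are simulated, and noise_predictor is a hypothetical variable that is unrelated to y by construction; $R^2$ still does not go down when it is added.

```python
import numpy as np


def r_squared(X, y):
    """Fit OLS with an intercept by least squares and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot


rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
noise_predictor = rng.normal(size=n)  # hypothetical: unrelated to y
y = 2 + 0.5 * x1 + rng.normal(size=n)

r2_small = r_squared(x1.reshape(-1, 1), y)                       # one predictor
r2_big = r_squared(np.column_stack([x1, noise_predictor]), y)   # one more
print(round(r2_small, 4), round(r2_big, 4))
```

The bigger model can always reproduce the smaller one by setting the extra coefficient to 0, so its residual sum of squares is never larger.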
The reference level of performance_rating is High, since it’s the first level alphabetically. Therefore, the coefficient -2.40% is the predicted difference in raise for employees with a Successful rating compared to those with a High rating. In this context, a negative coefficient makes sense, since we would expect those with a High performance rating to get higher raises than those with a Successful rating.
(a) “Poor”, “Successful”, “High”, “Top”.
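A Python sketch of the reference-level logic above. It mimics, as an assumption, R's default of ordering character levels alphabetically and naming indicator columns by pasting the variable name and the level; indicator_columns is a hypothetical helper, not a library function. Listing the levels in the order from (a) makes Poor the reference level instead.

```python
# Levels of performance_rating as given in answer (a).
levels = ["Poor", "Successful", "High", "Top"]

# Default: alphabetical order, and the first level is the reference.
default_order = sorted(levels)
print(default_order)     # ['High', 'Poor', 'Successful', 'Top']
print(default_order[0])  # High


def indicator_columns(value, order):
    """Dummy-code one observation: one 0/1 column per non-reference level."""
    return {f"performance_rating{lvl}": int(value == lvl) for lvl in order[1:]}


# A High observation is all zeros, since High is the reference level.
print(indicator_columns("High", default_order))

# Using the order from (a) makes Poor the reference level instead.
custom_order = ["Poor", "Successful", "High", "Top"]
print(indicator_columns("High", custom_order))
```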
Option 3. It’s a linear model with no interaction effect, so the lines are parallel. And since the coefficient for salary_typeSalaried is positive, the Salaried line has the higher intercept. The equations of the lines are as follows:

-   Hourly:

    $$
    \begin{align*}
    \widehat{percent\_incr} &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times salary\_typeSalaried \\
    &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times 0 \\
    &= 1.24 + 0.0000137 \times annual\_salary
    \end{align*}
    $$

-   Salaried:

    $$
    \begin{align*}
    \widehat{percent\_incr} &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times salary\_typeSalaried \\
    &= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times 1 \\
    &= 2.153 + 0.0000137 \times annual\_salary
    \end{align*}
    $$
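A minimal Python sketch of the fitted equation above, with the coefficients copied from it; predict_percent_incr is a hypothetical helper name. It shows that the Salaried and Hourly predictions differ by the same 0.913 at every salary, which is why the lines are parallel.

```python
# Coefficients copied from the fitted equation above.
B0 = 1.24             # intercept (Hourly baseline)
B_SALARY = 0.0000137  # slope for annual_salary, shared by both groups
B_SALARIED = 0.913    # shift for salary_typeSalaried


def predict_percent_incr(annual_salary, salary_type):
    """Predicted percent increase from the no-interaction model."""
    salaried = 1 if salary_type == "Salaried" else 0  # indicator variable
    return B0 + B_SALARY * annual_salary + B_SALARIED * salaried


for salary in (30_000, 60_000, 90_000):
    hourly = predict_percent_incr(salary, "Hourly")
    salaried = predict_percent_incr(salary, "Salaried")
    print(salary, round(hourly, 4), round(salaried, 4), round(salaried - hourly, 4))
```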

Part 3

d
c - We are 95% confident that the mean number of texts per month of all American teens is between 1450 and 1550.
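One common way to build an interval like this is the bootstrap percentile method. Below is a minimal Python sketch of that method, using a hypothetical simulated sample rather than the actual teen survey data, so its endpoints will not match 1450 and 1550. The "95% confident" statement refers to this procedure: intervals built this way capture the true mean in about 95% of repeated samples.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, level=0.95, reps=10_000, seed=0):
    """Percentile bootstrap CI for the mean: resample with replacement,
    collect the resampled means, and keep the middle `level` share."""
    rng = random.Random(seed)
    n = len(sample)
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)
    )
    alpha = 1 - level
    lower_idx = round(alpha / 2 * reps)            # e.g. 2.5th percentile
    upper_idx = round((1 - alpha / 2) * reps) - 1  # e.g. 97.5th percentile
    return boot_means[lower_idx], boot_means[upper_idx]


# Hypothetical simulated monthly text counts (NOT the survey data).
data_rng = random.Random(42)
texts = [data_rng.gauss(1500, 300) for _ in range(200)]

low, high = bootstrap_percentile_ci(texts)
print(f"95% CI for the mean: ({low:.0f}, {high:.0f})")
```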
a
b
False
True
c
d
TPR: 0.87; FPR: 0.12; TNR: 0.88; FNR: 0.13 (it’s okay if you estimated these numbers slightly differently!!)

The sensitivity (which equals the true positive rate) is about 0.87 and 1 - specificity (which equals the false positive rate) is about 0.12.

Since true positive rate + false negative rate = 1, we know the false negative rate is 1 - TPR = 1 - 0.87 = 0.13.

Similarly, TNR + FPR = 1, so TNR = 1 - FPR = 1 - 0.12 = 0.88.
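The complement relationships above can be checked with a short Python sketch. rates_from_counts is a hypothetical helper, and the confusion-matrix counts are made up so that the rates match the estimates read off the ROC curve; they are not from the actual model.

```python
def rates_from_counts(tp, fn, fp, tn):
    """Compute TPR, FNR, FPR, TNR from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity
    fnr = fn / (tp + fn)  # 1 - sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    tnr = tn / (fp + tn)  # specificity
    return tpr, fnr, fpr, tnr


# Starting from the estimates read off the ROC curve:
tpr_est, fpr_est = 0.87, 0.12
fnr_est = round(1 - tpr_est, 2)  # TPR + FNR = 1
tnr_est = round(1 - fpr_est, 2)  # TNR + FPR = 1
print(fnr_est, tnr_est)

# Hypothetical counts consistent with those estimates.
tpr, fnr, fpr, tnr = rates_from_counts(tp=87, fn=13, fp=12, tn=88)
print(tpr, fnr, fpr, tnr)
```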