Final review
Suggested Answers
Part 1
- a, c, e - boxplots, histograms, and density plots are good ways of visualizing numerical variables
- c
- a, d - both of these return the same values; use the version with specify() when the point estimate will be carried forward into a confidence interval or a hypothesis test
- c - since “less” is the baseline level, only the level “more” appears in the output
- c
- c
- b, c, e - The p-value of 0.3 is greater than the 0.05 discernibility level, so we fail to reject the null hypothesis. About 30% of the values in the null distribution were as extreme as or more extreme than our observed value.
Option e is a little unclear; if something like this appears on an exam, it will be phrased more precisely. Here, “our observed value” means our observed value of the slope, since the slope is what the hypothesis test was looking at. The statement that about 30% of the null distribution is more extreme counts both tails of this two-sided test: about 15% of the null distribution values are greater than the observed value, and roughly another 15% lie at least as far from zero on the other side.
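The two-tail reasoning in option e can be checked with a small sketch. This is plain Python rather than the course's infer code, and the null distribution below is a hypothetical set of 20 slopes, symmetric around 0, chosen only so that the numbers mirror the 30% / 15% split described above; the function names are illustrative.

```python
def two_sided_p_value(null_stats, observed, null_value=0.0):
    """Share of null statistics at least as far from null_value as observed."""
    distance = abs(observed - null_value)
    extreme = [s for s in null_stats if abs(s - null_value) >= distance]
    return len(extreme) / len(null_stats)


def upper_tail_share(null_stats, observed):
    """Share of null statistics greater than or equal to the observed value."""
    return sum(s >= observed for s in null_stats) / len(null_stats)


# Hypothetical null distribution of 20 slopes, symmetric around 0,
# and a hypothetical observed slope (not the exam's numbers).
null_slopes = [i / 10 for i in range(-10, 11) if i != 0]
observed_slope = 0.8

p_two_sided = two_sided_p_value(null_slopes, observed_slope)
p_upper = upper_tail_share(null_slopes, observed_slope)
print(p_two_sided, p_upper)
```

With a symmetric null distribution, the two-sided p-value is about twice the share of values in the upper tail, which is why "about 30% more extreme" and "about 15% greater than the observed value" describe the same result.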
Part 2
(c) For every additional $1,000 of annual salary, the model predicts the raise to be higher, on average, by 0.0155%.
$R^2$ of raise_2_fit is higher than $R^2$ of raise_1_fit since raise_2_fit has one more predictor, and $R^2$ always increases (more precisely, never decreases) when a predictor is added.
The reference level of performance_rating is High, since it’s the first level alphabetically. Therefore, the coefficient of -2.40% is the predicted difference in raise for employees rated Successful compared to those rated High. In this context a negative coefficient makes sense, since we would expect employees with a High performance rating to get higher raises than those with a Successful rating.
(a) “Poor”, “Successful”, “High”, “Top”.
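To see why adding a predictor cannot lower $R^2$, here is a small pure-Python sketch (not the course's R code) that fits least-squares models by solving the normal equations. The data, including a column that has nothing to do with the raise, are made up for illustration; the point is only that the two-predictor fit has an $R^2$ at least as large as the one-predictor fit.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(predictors, y):
    """Least-squares fit of y on an intercept plus `predictors`; returns R^2."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: salary in $1,000s, an unrelated column, raise in %.
salary = [40, 55, 60, 72, 85, 90, 110, 120]
unrelated = [3, 1, 4, 1, 5, 9, 2, 6]
raise_pct = [2.1, 2.6, 2.4, 3.0, 3.1, 3.6, 3.9, 4.4]

r2_one = r_squared([salary], raise_pct)
r2_two = r_squared([salary, unrelated], raise_pct)
print(round(r2_one, 4), round(r2_two, 4))
```

The bigger model can always reproduce the smaller one by setting the extra coefficient to 0, so its residual sum of squares can only be the same or smaller. That is why $R^2$ alone is not a fair way to compare models with different numbers of predictors.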
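The reference-level reasoning can be sketched as treatment (dummy) coding. This Python sketch only mimics the default described above (distinct values sorted alphabetically, first one is the baseline); the helper name and the toy ratings are hypothetical, and Python's code-point sort matches alphabetical order here only because every level is capitalized.

```python
def reference_and_dummies(levels, variable):
    """Split ordered `levels` into the reference level and dummy-column names.

    The first level is the baseline: it gets no column (and no coefficient),
    so every other coefficient is a difference from it.
    """
    reference, *others = levels
    return reference, [variable + lvl for lvl in others]


# Hypothetical toy ratings.
ratings = ["Successful", "High", "Poor", "Top", "Successful"]

# Default behaviour sketched here: sort the distinct values alphabetically.
default_levels = sorted(set(ratings))
ref, dummies = reference_and_dummies(default_levels, "performance_rating")
print(ref, dummies)

# Releveled order from answer (a): "Poor" becomes the reference level instead.
releveled = ["Poor", "Successful", "High", "Top"]
ref2, dummies2 = reference_and_dummies(releveled, "performance_rating")
print(ref2, dummies2)
```

With the default order there is no performance_ratingHigh column, so the coefficient on performance_ratingSuccessful is read as Successful compared to High.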
Option 3. It’s a linear model with no interaction effect, so the lines are parallel. And since the coefficient for salary_typeSalaried is positive, the line for salaried employees has the higher intercept. The equations of the lines are as follows:
- Hourly:
$$
\begin{align*}
\widehat{percent\_incr} &= 1.24 + 0.0000137 \times annual\_salary + 0.913 salary\_typeSalaried \\
&= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times 0 \\
&= 1.24 + 0.0000137 \times annual\_salary
\end{align*}
$$
- Salaried:
$$
\begin{align*}
\widehat{percent\_incr} &= 1.24 + 0.0000137 \times annual\_salary + 0.913 salary\_typeSalaried \\
&= 1.24 + 0.0000137 \times annual\_salary + 0.913 \times 1 \\
&= 2.153 + 0.0000137 \times annual\_salary
\end{align*}
$$
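The two equations can be checked numerically with a short sketch. The coefficients are copied from the equations above; the function name is hypothetical, and salaried=False plays the role of the Hourly reference level (salary_typeSalaried = 0).

```python
INTERCEPT = 1.24
SLOPE_SALARY = 0.0000137  # change in percent_incr per extra dollar of annual_salary
COEF_SALARIED = 0.913     # shift applied when salary_typeSalaried = 1


def predicted_percent_incr(annual_salary, salaried):
    """Predicted percent increase; salaried=False is the Hourly reference level."""
    salary_type_salaried = 1 if salaried else 0
    return (INTERCEPT
            + SLOPE_SALARY * annual_salary
            + COEF_SALARIED * salary_type_salaried)


# Same gap between the two lines at every salary, i.e. parallel lines.
for salary in (40_000, 80_000, 120_000):
    gap = predicted_percent_incr(salary, True) - predicted_percent_incr(salary, False)
    print(salary, round(gap, 3))

# Both lines share one slope, so an extra $1,000 adds the same amount to either.
step = predicted_percent_incr(51_000, False) - predicted_percent_incr(50_000, False)
print(round(step, 4))
```

The gap never changes because the model has no interaction term: salary_typeSalaried only shifts the intercept (1.24 to 2.153), while the slope on annual_salary is shared by both groups.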
Part 3
- d
- c - We are 95% confident that the mean number of texts per month of all American teens is between 1450 and 1550.
- a
- b
- False
- True
- c
- d
- TPR: 0.87; FPR: 0.12; TNR: 0.88; FNR: 0.13 (it’s okay if you estimated these numbers slightly differently!!)
The sensitivity (which equals the true positive rate) is about 0.87 and 1 - specificity (which equals the false positive rate) is about 0.12.
Since true positive rate + false negative rate = 1, we know the false negative rate is 1 - TPR = 1 - 0.87 = 0.13.
Similarly, TNR + FPR = 1, so TNR = 1 - FPR = 1 - 0.12 = 0.88.
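The complement rules can be written out as a short sketch. It fills in FNR and TNR from the estimated TPR and FPR, then recomputes all four rates from hypothetical confusion-matrix counts (100 actual positives, 100 actual negatives) to show the two routes agree; both helper names are illustrative.

```python
def complete_rates(tpr, fpr):
    """Fill in FNR and TNR from the rules TPR + FNR = 1 and TNR + FPR = 1."""
    return {"TPR": tpr, "FNR": 1 - tpr, "FPR": fpr, "TNR": 1 - fpr}


def rates_from_counts(tp, fn, fp, tn):
    """Same four rates computed directly from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "FNR": fn / (tp + fn),
        "FPR": fp / (fp + tn),  # 1 - specificity
        "TNR": tn / (fp + tn),  # specificity
    }


from_estimates = complete_rates(tpr=0.87, fpr=0.12)
# Hypothetical counts out of 100 actual positives and 100 actual negatives.
from_counts = rates_from_counts(tp=87, fn=13, fp=12, tn=88)

for name in ("TPR", "FNR", "FPR", "TNR"):
    print(name, round(from_estimates[name], 2), round(from_counts[name], 2))
```

TPR and FNR share the denominator of actual positives, and FPR and TNR share the denominator of actual negatives, which is why each pair sums to 1.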