Mock Assessment for MM1F28
Make sure you can answer all of these questions. The final assessment will require every method you use here.
We will use the german_credit dataset (this can be found in the same folder as this document).
What is the mean, median, and SD of the variable credit_amount?
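As a minimal sketch of these three summary statistics, the snippet below uses Python's standard statistics module. The values are made up to stand in for credit_amount and are not taken from the dataset; any statistics package you use in the module will have equivalent functions.

```python
import statistics

# Hypothetical values standing in for the credit_amount column (not real data).
credit_amount = [1200, 2500, 800, 4300, 1500, 3100]

mean_amount = statistics.mean(credit_amount)
median_amount = statistics.median(credit_amount)
# stdev() is the sample SD (n - 1 denominator); pstdev() is the population SD.
sd_amount = statistics.stdev(credit_amount)

print(mean_amount, median_amount, round(sd_amount, 2))
```

Note which SD you report: most statistical software returns the sample SD by default.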
Is there a significant effect of foreign_worker on credit_amount? In other words:
credit_amount ~ foreign_worker
Report all the pre-tests and the tests you did to address this question. Next, report the results and answer the question.
Draw a confidence interval plot for the previous question, i.e., credit_amount ~ foreign_worker
Is there a significant effect of personal_status_sex on credit_amount? If so, between which groups?
Report all the pre-tests and the tests you did to address this question. Next, report the results and answer the question. Note that for this question you also need to report the results of the post-hoc tests.
Is there a correlation between credit_amount and duration_in_month?
Report all the pre-tests and the tests you did to address this question. Next, report the results and answer the question.
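For orientation, here is a rough sketch of how a Pearson correlation coefficient is computed by hand. The numbers are invented and only stand in for duration_in_month and credit_amount. The sketch gives the coefficient only: it does not run the pre-tests or produce a p-value, and if the pre-tests point to a rank-based method such as Spearman, the coefficient is computed differently.

```python
import math

# Hypothetical paired observations (not real data).
duration_in_month = [6, 12, 18, 24, 36]
credit_amount = [1000, 2000, 2500, 4000, 6000]

mean_x = sum(duration_in_month) / len(duration_in_month)
mean_y = sum(credit_amount) / len(credit_amount)

# Sum of cross-products and sums of squares of the deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(duration_in_month, credit_amount))
sxx = sum((x - mean_x) ** 2 for x in duration_in_month)
syy = sum((y - mean_y) ** 2 for y in credit_amount)

# Pearson's r: covariance scaled by both standard deviations.
r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))
```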
Draw a regression plot for credit_amount and duration_in_month. Set credit_amount as the Y variable (response variable).
Create the following linear model (multiple linear regression):
credit_amount ~ duration_in_month + housing
Now report the results of the linear model. Don’t forget to include whether the model is a good fit, whether the predictors are significant, the R², and an interpretation of how the coefficient of each predictor (x) affects y.
Upload a diagnostic plot for your linear model. How does this plot affect our trust in the results?
Here is the output of a logistic regression with ‘default’ as the response variable (y):
Interpret the output of the logit model. Make sure to explain how each x affects y where the effect is significant. The reference level (baseline) for the categorical variable ‘purpose’ is ‘vacation’.
Here is the ROC plot for the model above. What is the best trade-off between sensitivity and specificity? How do you know this?
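One common way to pick a trade-off point, though not necessarily the criterion expected in this module, is Youden's J statistic (sensitivity + specificity - 1). It selects the point on the ROC curve furthest above the diagonal. The sketch below uses made-up points, not values read from the plot.

```python
# Hypothetical ROC points as (threshold, sensitivity, specificity); not real data.
roc_points = [
    (0.2, 0.95, 0.40),
    (0.4, 0.85, 0.65),
    (0.6, 0.60, 0.85),
    (0.8, 0.30, 0.97),
]


def youden_j(point):
    """Return Youden's J = sensitivity + specificity - 1 for one ROC point."""
    _, sensitivity, specificity = point
    return sensitivity + specificity - 1


# The point with the largest J is the best trade-off under this criterion.
best_point = max(roc_points, key=youden_j)
print(best_point, round(youden_j(best_point), 2))
```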
Here is the confusion matrix when the model is applied to the data. Report the accuracy, misclassification rate, sensitivity, specificity, PPV, and NPV.
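The six measures all follow from the four cells of a 2x2 confusion matrix. The sketch below shows the standard definitions, assuming 'default' is treated as the positive class. The function name and the counts are hypothetical and are not taken from the matrix in this assessment.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute standard classification measures from 2x2 confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,  # share of wrong predictions
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "ppv": tp / (tp + fp),              # positive predictive value
        "npv": tn / (tn + fn),              # negative predictive value
    }


# Hypothetical counts (not real data).
metrics = confusion_metrics(tp=40, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Check which class your software treats as positive before reading the cells, since swapping the classes swaps sensitivity with specificity and PPV with NPV.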