UNIVERSITY OF CARDIFF

MAT012 Credit Risk Scoring

Assignment 2019/20

This forms your assessment (100%) of this module.
There are two parts to this assessment.

Part A contains THREE short essay-based questions and counts for 50% of the final mark.

Part B contains FIVE tasks to establish a scorecard using the given dataset and counts for
50% of the final mark. You may use Excel, SAS, R or Python to assist in the scorecard
preparation.

You must answer ALL questions.

Submission must be made by 3pm on Friday 20th March via Learning Central, and
instructions will follow shortly on how to do this. You will need to submit a single file
containing answers to all questions; any spreadsheet analysis, workings or coding necessary
can be shown in an Appendix in that file. Only the submitted file will be marked.

PART A

1. Critically examine what needs to be considered when developing a credit risk scoring
model.
[20 marks]

2. Explain how, in theory, Cox’s proportional hazards model for survival analysis can be
used for constructing a scorecard. Comment on the relative popularity of Cox’s PH
model versus logistic regression in scorecard construction.
[15 marks]
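
For reference, the model named in this question is usually written in the form below. This is the standard textbook notation rather than anything specified in this brief: the symbols h_0 (baseline hazard), S_0 (baseline survival function), x (applicant characteristics) and beta (coefficients) are assumptions introduced here for illustration.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right),
\qquad
S(t \mid \mathbf{x}) = S_0(t)^{\exp\left(\boldsymbol{\beta}^{\top}\mathbf{x}\right)}
```

Under this form the linear predictor \boldsymbol{\beta}^{\top}\mathbf{x} orders applicants by risk regardless of the baseline hazard, which is what allows it to be read as a score.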

3. Provide a brief literature review on the use of Markov models in credit risk
modelling, with a particular focus on those used in credit risk scoring.
[15 marks]

PART B

The dataset underpinning the analysis here is that used in the lab sessions during lectures. It
has been uploaded as a spreadsheet named ‘German’ together with the data dictionary
‘German data dictionary’ describing each attribute. You will recall that the dataset consists
of data for 1000 applicants along with a variable that says whether they were subsequently
Good or Bad from a credit perspective.

1. Split the dataset into two subsets as follows:

Subset 1: the applicants with Duration <= 12 months
Subset 2: the applicants with Duration > 12 months

Clean the subsets if necessary.
[5 marks]
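
A minimal Python sketch of such a split is given below. It uses a handful of made-up rows in place of the 'German' spreadsheet, and the column names "Duration" and "Good" are assumptions; the real names should be taken from the data dictionary. The only cleaning shown is dropping rows with a missing Duration, which is one possible choice and not a prescribed one.

```python
import pandas as pd

# Hypothetical toy rows standing in for the 'German' spreadsheet; the column
# names "Duration" and "Good" are assumptions, not the real data dictionary.
applicants = pd.DataFrame({
    "Duration": [6, 12, 24, 36, 12, 48, None],
    "Good": [1, 1, 0, 1, 0, 0, 1],
})

# Basic cleaning: a row with a missing Duration cannot be assigned to
# either subset, so it is dropped here.
cleaned = applicants.dropna(subset=["Duration"])

subset_short = cleaned[cleaned["Duration"] <= 12]  # Subset 1
subset_long = cleaned[cleaned["Duration"] > 12]    # Subset 2

print(len(subset_short), len(subset_long))
```

Because the two conditions are complementary, every cleaned row falls into exactly one subset.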

2. For each subset, establish a training set and validation set. Explain:
a. what principle you have used to decide on these;
b. why both training and validation sets are needed;
c. any issues encountered during the splitting exercise.
[5 marks]
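
One common principle is a stratified random split, sketched below under stated assumptions: the 70/30 ratio, the random seed, and the column names "Age" and "Good" are illustrative choices, not requirements of this brief. Sampling within each Good/Bad class keeps the bad rate similar in both sets.

```python
import pandas as pd

# Hypothetical toy subset; "Good" (1 = Good, 0 = Bad) is an assumed column name.
subset = pd.DataFrame({
    "Age": [22, 35, 47, 51, 29, 63, 41, 38, 26, 55],
    "Good": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
})

# Stratified split: sample about 70% within each Good/Bad class so both
# sets keep roughly the same bad rate as the full subset.
train = subset.groupby("Good").sample(frac=0.7, random_state=0)

# Everything not drawn into the training set forms the validation set.
valid = subset.drop(train.index)

print(train["Good"].value_counts().to_dict())
print(valid["Good"].value_counts().to_dict())
```

With very small classes the per-class rounding can leave few or no Bads in the validation set, which is one of the issues that may arise during splitting.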

3. For each training set, choose four variables which are suitable for building a
scorecard. For each training set, the chosen variables must include (i) at least one
variable that is continuous before binning; and (ii) at least one categorical variable
with more than two categories, so you can see whether categories can be combined.

Explain the rationale behind your choice of variables (using supporting statistics, e.g.
chi-square). Should you be unable to choose variables satisfying the above criteria,
explain the problem you have encountered and the compromise you have adopted in the
variable selection.
[10 marks]
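
As an illustration of a supporting statistic, the sketch below computes the Pearson chi-square statistic for one categorical variable against the Good/Bad outcome, together with the bad rate per category. The data are invented, and the column names "Purpose" and "Good" are assumptions.

```python
import pandas as pd

# Hypothetical training rows; "Purpose" and "Good" are assumed column names.
train = pd.DataFrame({
    "Purpose": ["car"] * 40 + ["furniture"] * 30 + ["education"] * 30,
    "Good": [1] * 30 + [0] * 10 + [1] * 21 + [0] * 9 + [1] * 12 + [0] * 18,
})

observed = pd.crosstab(train["Purpose"], train["Good"])

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1).to_numpy()[:, None]
col_totals = observed.sum(axis=0).to_numpy()[None, :]
expected = row_totals * col_totals / observed.to_numpy().sum()

# Pearson chi-square statistic (no continuity correction).
chi_square = ((observed.to_numpy() - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Bad rate per category: categories with similar bad rates are candidates
# for merging during coarse classification.
bad_rate = 1 - train.groupby("Purpose")["Good"].mean()

print(round(chi_square, 2), dof)
print(bad_rate.round(2).to_dict())
```

In this toy data "car" and "furniture" have similar bad rates, so they would be natural candidates for combining into one coarse class.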

4. Use the binary variables obtained from the coarse classification in the above
exercise to build two scorecards for each training set (so, two scorecards for those
applicants with Duration <= 12 months; another two for those with Duration > 12
months), one using linear regression and one using logistic regression.

Note that the file you submit should include, in the Appendix, a table that gives the
binary variables you used, together with the coefficients for those variables
calculated in each regression.
[15 marks]
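
The sketch below shows one way the two regressions and the coefficient table could be produced in Python with scikit-learn. Everything about the data is an assumption: the outcomes are simulated, and the bin labels and the column names "AgeBand", "Purpose" and "Good" are hypothetical. It is an outline of the mechanics, not a model answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical coarse-classified training data: the bin labels and column
# names are assumptions for illustration only.
n = 400
train = pd.DataFrame({
    "AgeBand": rng.choice(["<30", "30-45", ">45"], size=n),
    "Purpose": rng.choice(["car", "other"], size=n),
})

# Simulated outcome in which the ">45" band and "car" purpose are Good more often.
p_good = 0.5 + 0.2 * (train["AgeBand"] == ">45") + 0.1 * (train["Purpose"] == "car")
train["Good"] = (rng.random(n) < p_good).astype(int)

# One binary (0/1) variable per bin, dropping one bin per attribute as the
# reference level to avoid perfect collinearity with the intercept.
X = pd.get_dummies(train[["AgeBand", "Purpose"]], drop_first=True).astype(float)
y = train["Good"]

linear = LinearRegression().fit(X, y)

# A large C makes scikit-learn's default L2 penalty negligible, approximating
# an unpenalised maximum-likelihood fit.
logistic = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)

# Coefficient table: one row per binary variable, one column per regression.
coefficients = pd.DataFrame(
    {"linear": linear.coef_, "logistic": logistic.coef_[0]},
    index=X.columns,
)
print(coefficients)
```

The linear coefficients are on the probability scale while the logistic coefficients are on the log-odds scale, so the two columns are not directly comparable in magnitude.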

5. Derive ROC curves for all scorecards using the validation set applicable to each,
showing in detail how sensitivity and specificity have been calculated. Estimate the
Gini coefficient and KS values for each. Explain and comment on your results.
[15 marks]
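
The calculation can be sketched on a toy validation set as follows. The scores and outcomes are invented, and the convention that Good is the positive class (an applicant is predicted Good when the score is at or above the cutoff) is an assumption; the opposite convention is equally valid if applied consistently.

```python
import numpy as np

# Hypothetical validation scores (higher = more likely Good) and outcomes.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
good = np.array([1, 1, 0, 1, 1, 0, 1, 0])

sensitivity, specificity = [], []
cutoffs = np.sort(np.unique(scores))[::-1]  # highest cutoff first
for cutoff in cutoffs:
    predicted_good = scores >= cutoff
    true_pos = np.sum(predicted_good & (good == 1))   # Goods predicted Good
    true_neg = np.sum(~predicted_good & (good == 0))  # Bads predicted Bad
    sensitivity.append(true_pos / np.sum(good == 1))
    specificity.append(true_neg / np.sum(good == 0))

# ROC points: x = 1 - specificity, y = sensitivity, starting from (0, 0).
fpr = np.concatenate([[0.0], 1 - np.array(specificity)])
tpr = np.concatenate([[0.0], np.array(sensitivity)])

# Area under the ROC curve by the trapezoidal rule.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Gini coefficient from the AUC; KS as the largest gap between the
# cumulative Good and Bad rates across cutoffs.
gini = 2 * auc - 1
ks = float(np.max(np.abs(tpr - fpr)))

print(round(auc, 4), round(gini, 4), round(ks, 4))
```

Here the Gini coefficient is obtained as 2 x AUC - 1 and KS as the maximum vertical distance between the ROC curve and the diagonal.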