
UNIVERSITY OF CARDIFF

MAT012 Credit Risk Scoring

Assignment 2019/20
 
 
This forms your assessment (100%) of this module.
There are two parts to this assessment.

Part A contains THREE short essay-based questions and counts for 50% of the final mark.
 
Part B contains FIVE tasks to establish a scorecard using the given dataset and counts for
50% of the final mark. You may use Excel, SAS, R or Python to assist in the scorecard
preparation.
 
You must answer ALL questions.

Submission must be made by 3pm on Friday 20th March via Learning Central, and
instructions will follow shortly on how to do this. You will need to submit a single file
containing answers to all questions; any spreadsheet analysis, workings or coding necessary
can be shown in an Appendix in that file. Only the submitted file will be marked.
 
 
 
PART A

1. Critically examine what needs to be considered when developing a credit risk scoring
model.
[20 marks]
 
2. Explain how, in theory, Cox's proportional hazards model for survival analysis can be
used for constructing a scorecard. Comment on the relative popularity of Cox's PH
model versus logistic regression in scorecard construction.
[15 marks]
 
3. Provide a brief literature review on the use of Markov models in credit risk
modelling, with a particular focus on those used in credit risk scoring.
[15 marks]
 
   
PART B

The dataset underpinning the analysis here is that used in the lab sessions during lectures. It
has been uploaded as a spreadsheet named 'German' together with the data dictionary
'German data dictionary' describing each attribute. You will recall that the dataset consists
of data for 1000 applicants along with a variable that says whether they were subsequently
Good or Bad from a credit perspective.
 
1. Split the dataset into two subsets as follows:

Subset 1: the applicants with Duration <= 12 months
Subset 2: the applicants where Duration > 12 months

Clean the subsets if necessary.
[5 marks]
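
For illustration only, the Duration split in task 1 can be sketched as below. This is a minimal sketch under stated assumptions: the rows are hypothetical stand-ins for the 'German' spreadsheet, the field names `id`, `Duration` and `Good` are assumed rather than taken from the data dictionary, and treating a missing or non-positive Duration as a record to set aside is only one possible cleaning rule.

```python
# Hypothetical records; the real 'German' spreadsheet has 1000 rows
# and its own column names (see the data dictionary).
applicants = [
    {"id": 1, "Duration": 6, "Good": 1},
    {"id": 2, "Duration": 12, "Good": 0},
    {"id": 3, "Duration": 24, "Good": 1},
    {"id": 4, "Duration": 48, "Good": 0},
    {"id": 5, "Duration": None, "Good": 1},  # missing value, to illustrate cleaning
]


def split_by_duration(rows, threshold=12):
    """Return (subset1, subset2, rejected).

    subset1 holds rows with Duration <= threshold, subset2 holds rows with
    Duration > threshold, and rejected holds rows whose Duration is missing
    or non-positive (an assumed cleaning rule, not a prescribed one).
    """
    subset1, subset2, rejected = [], [], []
    for row in rows:
        duration = row.get("Duration")
        if duration is None or duration <= 0:
            rejected.append(row)
        elif duration <= threshold:
            subset1.append(row)
        else:
            subset2.append(row)
    return subset1, subset2, rejected


subset1, subset2, rejected = split_by_duration(applicants)
print(len(subset1), len(subset2), len(rejected))
```

Keeping the rejected rows in their own list, rather than silently dropping them, makes it easier to report what cleaning was done and why.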
 
2. For each subset, establish a training set and validation set. Explain:
a. what principle you have used to decide on these;
b. why both training and validation sets are needed;
c. any issues encountered during the splitting exercise.
[5 marks]
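
One possible principle for task 2 is a stratified random split, which keeps the Good/Bad proportions roughly equal in the training and validation sets. The sketch below is illustrative only: the 70/30 ratio, the fixed seed, the function name `stratified_split` and the `Good` field are all assumptions, and the rows are synthetic.

```python
import random


def stratified_split(rows, label_key, train_fraction=0.7, seed=0):
    """Split rows into (train, valid), sampling each class separately."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    train, valid = [], []
    for label in sorted(by_class):
        group = by_class[label][:]  # copy so the input order is untouched
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


# Synthetic subset: 70 Goods (Good = 1) and 30 Bads (Good = 0).
rows = [{"id": i, "Good": 1 if i < 70 else 0} for i in range(100)]
train, valid = stratified_split(rows, "Good")

train_bad_rate = sum(1 for r in train if r["Good"] == 0) / len(train)
valid_bad_rate = sum(1 for r in valid if r["Good"] == 0) / len(valid)
print(len(train), len(valid), train_bad_rate, valid_bad_rate)
```

With a small subset, a stratified split of this kind can still leave very few Bads in the validation set, which is one of the issues the question asks you to discuss.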
 
3. For each training set, choose four variables which are suitable for building a
scorecard. For each training set, the variables must include (i) at least one continuous
variable before binning; (ii) at least one categorical variable with more than two
categories, so you can see whether categories can be combined.

Explain the rationale behind your choice of variables (using supporting statistics, e.g.
chi-square). Should you be unable to choose variables satisfying the above criteria,
explain the problem you have encountered and the compromise you have adopted in the
variable selection.
[10 marks]
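
As a hedged illustration of the chi-square statistic mentioned in task 3, the sketch below computes Pearson's chi-square for a table of Good/Bad counts per category, then repeats the calculation after combining two categories. The counts and category labels are invented for the example and do not come from the German dataset.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    table is a list of rows, one per category, each [goods, bads].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Invented counts [goods, bads] for three categories A, B, C.
three_way = [[90, 10], [60, 40], [60, 40]]
stat3, dof3 = chi_square(three_way)

# B and C have the same bad rate, so try combining them into one bin.
two_way = [[90, 10], [120, 80]]
stat2, dof2 = chi_square(two_way)

print(round(stat3, 4), dof3)
print(round(stat2, 4), dof2)
```

In this constructed case the statistic is unchanged by the merge while the degrees of freedom fall from 2 to 1, which is the kind of evidence that can support combining categories during coarse classification.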
   
4. Use the binary variables obtained from the coarse classification in the above
exercise to build two scorecards for each training set (so, two scorecards for those
applicants with Duration <= 12 months; another two for those with Duration > 12
months), one using linear regression and one using logistic regression.

Note that the file you submit should include, in the Appendix, a table that gives the
binary variables you used, together with the coefficients for those variables
calculated in each regression.
[15 marks]
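
To show how the two regressions in task 4 differ, the sketch below uses the special case of a single binary indicator, where both fits have a closed form: the linear regression coefficient is the difference in Good rates, and the logistic regression coefficient is the difference in log-odds. This is a simplification for illustration, not a full scorecard build; with several indicators the coefficients have to come from a fitting routine in Excel, SAS, R or Python. The counts and the function name `fit_single_indicator` are assumptions.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def fit_single_indicator(goods0, bads0, goods1, bads1):
    """Fit Good ~ intercept + coef * x for one binary indicator x.

    goods0/bads0 are the counts where x = 0, goods1/bads1 where x = 1.
    Returns (linear, logistic) coefficient dictionaries.
    """
    p0 = goods0 / (goods0 + bads0)  # Good rate when x = 0
    p1 = goods1 / (goods1 + bads1)  # Good rate when x = 1
    linear = {"intercept": p0, "coef": p1 - p0}
    logistic = {"intercept": logit(p0), "coef": logit(p1) - logit(p0)}
    return linear, logistic


# Invented counts: 80 Goods / 20 Bads when x = 0, 50 / 50 when x = 1.
linear, logistic = fit_single_indicator(80, 20, 50, 50)
print(linear)
print(logistic)
```

Both coefficients are negative here, so the indicator lowers the score under either method, but they sit on different scales: probability for the linear regression and log-odds for the logistic regression.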
 
5. Derive ROC curves for all scorecards using the validation set applicable to each,
showing in detail how sensitivity and specificity have been calculated. Estimate the
Gini coefficient and KS values for each. Explain and comment on your results.
[15 marks]
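
The calculations in task 5 can be sketched as follows. The conventions here are assumptions and should be stated explicitly in any answer: Good is treated as the positive class, an applicant is predicted Good when the score is at or above the cutoff, the area under the ROC curve is estimated by the trapezoidal rule, Gini is taken as 2 * AUC - 1, and KS as the largest gap between sensitivity and (1 - specificity) over all cutoffs. The scores and labels are invented.

```python
def roc_points(scores, labels):
    """Return ROC points (1 - specificity, sensitivity), one per cutoff.

    labels: 1 = Good (positive), 0 = Bad (negative).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cutoff above every score: nobody predicted Good
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        sensitivity = tp / n_pos            # Goods correctly accepted
        specificity = (n_neg - fp) / n_neg  # Bads correctly rejected
        points.append((1 - specificity, sensitivity))
    return points


def auc_gini_ks(points):
    """Trapezoidal AUC, Gini = 2 * AUC - 1, KS = max |sensitivity - FPR|."""
    auc = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
    ks = max(abs(y - x) for x, y in points)
    return auc, 2 * auc - 1, ks


# Invented validation scores (higher = more likely Good) and outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
auc, gini, ks = auc_gini_ks(points)
print(points)
print(round(auc, 4), round(gini, 4), round(ks, 4))
```

Listing the true positive and false positive counts at each cutoff, as the loop does, is one way to show in detail how each sensitivity and specificity value was obtained.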