School of Mathematical Sciences
MATH4021 (G14SDS) Statistics Dissertation
2020
Contact details for staff
Code Name Room Email Address
FGB Prof FG Ball C08 [email protected]
CJB Dr CJ Brignell C48 [email protected]
KB Dr K Bharath C50a [email protected]
ID Prof I Dryden B09 [email protected]
CF Dr C Fallaize B05 [email protected]
KS Dr K Severn B51 katie.severn@nottingham.ac.uk
TK Dr T Kypraios C23 [email protected]
HL Prof H Le B13 [email protected]
PO Prof P O’Neill B48 [email protected]
SPP Dr SP Preston B37 [email protected]
DS Dr D Sirl B20 [email protected]
Statistics Dissertation MATH4021 (G14SDS) Project Booklet 2020
Timetable
Project Book issued on Moodle Tuesday 5 May 2020
Student’s own project description deadline Tuesday 12 May 2020
Submission of preferences Wednesday 20 May 2020 – 12.00 noon
Projects allocated to students Wednesday 3 June 2020
Work begins on project Monday 8 June 2020
Regular meetings with supervisor commence w/c 22 June 2020
Progress Report deadline Friday 10 July 2020 – 3.00 pm
Dissertation deadline Friday 4 September 2020 – 3.00 pm
Education Aims
The purpose of this module is to broaden and deepen the student’s knowledge and
understanding of mathematics by carrying out a detailed and substantial investigation.
Students will acquire knowledge and skills of relevance to a professional and/or
researcher working in an area where mathematics plays a major role. Research
experience will be broadened considerably by undertaking this independent but
supervised work, and summarising the analysis and findings in a written dissertation.
Procedure
The dissertation booklet contains a wide variety of topic areas that are available to
you. You should read through the booklet thoroughly and decide which topics you
would most like to do a project in. Each topic has a staff member to contact for more
information about the topic (see the front page for the staff initials). You will then be
required to submit your topic choices in order of preference on the electronic form for
which a link will be provided. We will then allocate projects and supervisors, such that
as many students as possible are able to do projects in one of their top topics of choice.
Note that your supervisor may be different from the information contact listed for the
topic. You will be informed by email on Wednesday 3 June 2020 of your supervisor and the
project you have been allocated.
You may suggest your own project description, although this is not generally
recommended. The arrangements for proposing your own project description were
circulated previously. They are repeated here for completeness. If you wish to suggest
your own project description, you should first find a member of staff who is prepared to
supervise your project and then write a brief description of the project, and finally
submit the description to the MATH4021 Convenor by Tuesday 12 May 2020 for
approval.

Your description should give a clear outline of the project. You should attach to the
description a written note by the prospective supervisor, affirming that he or she is
prepared to supervise the project. The MATH4021 Convenor will inform you whether or
not your project description is approved.
If you have suggested your own project, and it has been approved, then you should
indicate the project as your first choice on the electronic project choice form. In that
case you do not need to specify any other choices. If your description is not approved,
you should complete the project choice form as usual.
Work on the dissertation will begin after the Spring Semester exams. The first two
weeks are intended for you to conduct preliminary reading, review the literature and
begin some initial work as appropriate to the project, based on materials suggested by
your supervisor. You should then begin regular meetings with your supervisor from w/c
22 June, and you should come to your first meeting prepared to discuss your initial
reading. From then until the project is completed, you should aim to
meet your supervisor once a week in most weeks, depending on staff availability. It is
likely that the supervisor and you will agree a mutually convenient time to meet each
week. Supervisions should last for approximately half an hour.
If you are unable to contact your supervisor over several weeks and do not have an
appointment booked in the near future, then please contact the MATH4021 Course
Director, Professor Huiling Le.
Role of the Supervisor
The supervisor will introduce the project and the main objectives, but you are expected
to take ownership of the project and drive it in directions of your choosing. The
supervisor’s role is to help your progress by looking through and commenting on your
ideas. They may suggest books and papers for you to read and will give advice on
tackling the project. They will also advise you on writing the final report.
However, the project is your project and your supervisor is not expected to set detailed
week-by-week tasks, solve the problems or write the project up.
Computing Resources
It is anticipated that most students will be able to install any software their project
requires (e.g. R, Matlab, C++) on their own machines. This software can also be accessed
online via https://matlab.mathworks.com/ (for Matlab) and colcalc.com (R, C++, Python).
If you have any concerns about access to such resources, please contact your course
director immediately. Where appropriate, learning to use such software will form part
of the assessment.
Progress Report and Feedback
You must submit a progress report via the submission link on Moodle by the deadline
shown above. This report is primarily for your own benefit, to give you an indication of
how your work is progressing. It should be word processed and approximately three
sides of A4 in length and it should contain a statement of the topic under investigation,
a brief summary of progress so far and a plan for the rest of the project. You will find
that writing this report concentrates your mind on what you have done and what you
still have to do.
Your supervisor will give you verbal feedback and brief written comments on the
progress report, normally at your next meeting. They will also give the report a mark
on a three-point scale. However, this mark will not contribute to the module mark.
Assessment
The project is assessed by a written final report. The report must be word processed,
including equations, using software of your choice. We recommend that you learn LaTeX
(see below) and use it to write your report, but you do not have to use LaTeX. Unless
the mathematical content strongly justifies otherwise, the report should normally be
between 60 and 90 pages long in total, including any appendices, in double-spaced
12-point font with appropriate margins.
The dissertation is worth 60 credits. You are expected to put into the project an
amount of work comparable to that of taking a semester’s lecture courses. The final
report is your evidence of this work.
You should discuss the plan of your report with your supervisor before you begin to
write it. You are also encouraged to give your supervisor one draft of the report. Your
supervisor is not expected to read the draft in detail but can give general feedback.
Other staff members (such as your Personal Tutor) are not expected to read drafts or
provide other extensive help.
Submission of Final Report
You need to submit a pdf of your report via the submission link on Moodle, on or before
the deadline shown above.
If your final report is submitted late without good cause, your dissertation mark will be
reduced by 5 marks per working day on the standard University scale and if the report
is twenty or more working days late, you will receive a dissertation mark of zero. If,
because of extenuating circumstances, you need extra time to complete your final
report, then you should complete an Extenuating Circumstances Form (ECF) online,
attaching any evidence:
https://www.nottingham.ac.uk/studentservices/contact-us/extcirc-form.aspx
Decisions on ECFs are made by a committee chaired by the Senior Tutor in the School of
Mathematical Sciences.
In addition to the original copy of the final report, students are required to retain all
files used in the production of the final report in case the assessors request them. This
includes computer code to generate results, and all files and figures required to
produce the written report. These will not be assessed directly, but may be used to
verify the student’s work if the assessors require them. It is the student’s responsibility
to make sure they can supply these files if requested.
Plagiarism
You should be familiar with the MSc Student Handbook guidelines on plagiarism. In
particular, you should:
• Include on the title page of the final report a statement confirming that the work
is your own, apart from the acknowledged references.
• Acknowledge sources properly.
• Avoid extensive paraphrasing from sources.
You should be aware that ignorance of the plagiarism rules is no defence; it is up to
you to find out. If in doubt, consult your supervisor. You are strongly encouraged to
discuss regularly with your supervisor both the form and content of your final report.
The School will use the electronic file of your final report to compare it with
publicly available sources.
This will be done using the TurnitinUK software tool, which is available to students on
the Moodle page (under University Resources), via the link "TurnItInUK test your text".
Assessment Criteria
The components contributing to the module mark are:
63%: Final report mathematical content.
27%: Final report written presentation/introduction.
10%: Student initiative.
The mathematical content and the written presentation/introduction are marked
independently by your supervisor and a second assessor.
The student initiative mark is a reward for making progress beyond your supervisor’s
suggestions. However, you are entitled to help from your supervisor, and your
supervisor will in any event expect to be kept up-to-date on your progress. You are
strongly advised to discuss your ideas and progress with your supervisor at all stages of
the work.
Detailed assessment criteria are given at the end of the dissertation booklet.
Feedback on Dissertation
You will receive written feedback on your final report with your provisional marks,
which are emailed to you after the MSc Board of Examiners meeting that takes place in
mid to late October.
English Language Courses
The University of Nottingham runs courses on English through the Centre for English
Language Education (CELE), see their website http://www.nottingham.ac.uk/CELE/
These courses are free for overseas students whose native language is not English.
Information about participating has been circulated previously. All students whose first
language is NOT English are obliged to participate unless they have written permission
from their Course Director to be excluded from attending.
LaTeX
Training materials for using the mathematical typesetting system LaTeX will be
provided.
Use of LaTeX is not mandatory, but is recommended.
Dissertation Write-Up Presentation
A short presentation on dissertation writing will be given to all MSc students.
Reassessment Arrangements
Students who do not pass their Dissertation will be allowed to resubmit their final
report within 12 weeks of notification of the failure by the University. Such students
will be provided with written feedback on their first final report, together with a written
statement from their Supervisor, advising areas of improvement. Students should not
expect to have regular supervision meetings or feedback from their Supervisor.
Students are entitled to a single meeting with their Supervisor to discuss how they
should improve their final report.
Research Conduct and Ethics
All students and supervisors should be aware of and follow the University's Code of
Research Conduct and Research Ethics.
If you have any queries over ethics or research governance issues, please contact the
School Ethics Officer – see MSc Student Handbook.
Further Questions
For further questions, contact your supervisor. If this is not appropriate, contact the
MATH4021 Course Director, Professor Huiling Le.
MSc Project descriptions (with codes and initials of the information contact)
P1 Bootstrap. Contact: CJB
The first bootstrap method was introduced by Bradley Efron in the late 1970s. Since
then, it has become an important and widely-used tool for simulation-based inference
and has seen many developments and extensions for use in many areas of statistics.
One very important application of the bootstrap is the construction of confidence
intervals for a parameter when the (asymptotic) distribution of its estimator is difficult
to establish theoretically. Projects will first review the basic principles behind the
bootstrap. Possible directions then include an in-depth study of established bootstrap
techniques, investigating extensions of the bootstrap or implementing bootstrap
techniques for particular classes of models such as time series models. Projects can
focus on theory and/or computational investigations using simulations and real data.
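The basic idea can be illustrated with a minimal Python sketch of the percentile bootstrap interval (the booklet does not prescribe a language, and R would be equally suitable). It assumes independent, identically distributed data and uses the simple percentile construction, which is only one of several bootstrap intervals a project might compare. The function name and its default settings are illustrative choices, not a prescribed method.

```python
import random
import statistics


def percentile_bootstrap_ci(data, stat=statistics.fmean, n_boot=2000, level=0.95, seed=None):
    """Percentile bootstrap confidence interval for stat(data).

    Draws n_boot resamples of the data with replacement, recomputes the
    statistic on each resample, and returns the empirical (1 - level)/2
    and (1 + level)/2 quantiles of the bootstrap replicates.
    """
    rng = random.Random(seed)
    n = len(data)
    replicates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    k = int(round((1 - level) / 2 * n_boot))  # replicates cut from each tail
    return replicates[k], replicates[n_boot - 1 - k]


# Usage: 95% interval for the mean of a simulated, right-skewed sample
sample_rng = random.Random(1)
data = [sample_rng.expovariate(1.0) for _ in range(100)]
lower, upper = percentile_bootstrap_ci(data, seed=2)
print(lower, upper)
```

Passing a different `stat` (for example `statistics.median`) gives an interval for that statistic without any change to the resampling loop, which is the main appeal of the bootstrap when the estimator's sampling distribution is hard to derive.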
P2 Statistical machine learning. Contact: SPP
There has been great interest in recent years in developments in Machine Learning.
This project will investigate a selection of one or more topics within machine learning,
including, but not restricted to: boosting; bagging; tree-based methods; support vector
machines; kernel methods; Gaussian process models; generalisations of principal
component analysis; deep learning. Students will use software (e.g. R) to investigate
the techniques studied, applying the methods to real and/or simulated data.
References (both freely available online):
• Hastie, Tibshirani and Friedman (2008). The Elements of Statistical Learning.
• James, Witten, Hastie and Tibshirani (2013). An Introduction to Statistical
Learning with Applications in R.
P3 Multiple testing procedures. Contact: CJB
In a standard hypothesis test we control the probability of a "false positive", rejecting
the null hypothesis when it is true, by setting the significance level to, say, 0.05.
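Frequently in science, however, we wish to perform multiple hypothesis tests. If 100
independent tests are each carried out at the 0.05 level and every null hypothesis is
true, we would expect about 5 false rejections, and the probability of at least one false
rejection is 1 - 0.95^100 ≈ 0.994, which could be viewed as unacceptable.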
Many methods have been developed for correcting the significance level in the presence
of multiple, sometimes correlated, hypothesis tests. In this project students will carry
out a simulation study to investigate different methods for controlling the familywise
error rate (FWER) and false discovery rate (FDR). The aim is to compare the methods
as the number of tests, the correlation of the tests and the proportion of true null
hypotheses change. Different test statistics can also be examined. Students will be
required to implement these methods in R.
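Because the project itself asks for the methods to be coded in R, the following Python sketch is only an illustration of the logic of two standard procedures: Bonferroni, which controls the FWER, and the Benjamini-Hochberg step-up procedure, which controls the FDR. It also estimates the FWER by simulation when every null hypothesis is true. It assumes independent tests, so that null p-values are independent Uniform(0, 1) draws; correlated tests would need a different simulation. The function names and the example p-values are hypothetical.

```python
import random


def unadjusted(pvalues, alpha=0.05):
    """No correction: reject H_i whenever p_i <= alpha."""
    return [p <= alpha for p in pvalues]


def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: with ordered p-values
    p_(1) <= ... <= p_(m), reject the k hypotheses with the smallest
    p-values, where k is the largest rank with p_(k) <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def estimate_fwer(procedure, m=100, n_sim=2000, alpha=0.05, seed=0):
    """Monte Carlo estimate of P(at least one rejection) when all m null
    hypotheses are true and the tests are independent."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_sim):
        pvalues = [rng.random() for _ in range(m)]  # null p-values ~ U(0, 1)
        if any(procedure(pvalues, alpha)):
            false_alarms += 1
    return false_alarms / n_sim


# Usage with illustrative p-values from m = 10 tests
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
print(estimate_fwer(unadjusted), estimate_fwer(bonferroni))
```

When every null hypothesis is true the FDR coincides with the FWER, so under this independent-tests setup `estimate_fwer(benjamini_hochberg)` should also come out close to 0.05; the differences between the procedures only appear once some nulls are false, which is where a simulation study becomes informative.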
P4 Statistical Shape analysis. Contact: ID
The shape of an object is the geometric information that remains when the effects of
location, scale and rotation are removed. For example, if an object is translated,
rotated or rescaled, its shape is unchanged. Given a sample of objects, statistical questions
include estimating the mean shape, describing shape variation in a population, and
testing for shape differences between populations. However, traditional statistical
techniques (such as those from multivariate analysis) are not directly applicable, since
inferences need to be invariant to certain transformations of the data and independent
of the particular choice of reference coordinates.
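For planar landmark data, the required invariance can be illustrated by writing each landmark as a complex number, so that rotation and scaling together become a single complex multiplication. The Python sketch below computes the full Procrustes distance between two configurations; it assumes 2D data with the same number of landmarks listed in the same order, and the function names are hypothetical (see Dryden and Mardia (2016) for the general theory).

```python
import math


def centre(config):
    """Remove location by subtracting the centroid from each complex landmark."""
    c = sum(config) / len(config)
    return [z - c for z in config]


def full_procrustes_distance(x, y):
    """Full Procrustes distance between two planar configurations.

    Landmarks are complex numbers (x + iy). After centring, the quantity
    sqrt(1 - |z* w|^2 / (||z||^2 ||w||^2)) does not change when either
    configuration is translated, rotated or rescaled.
    """
    z, w = centre(x), centre(y)
    inner = sum(a.conjugate() * b for a, b in zip(z, w))  # complex inner product z* w
    nz = sum(abs(a) ** 2 for a in z)
    nw = sum(abs(b) ** 2 for b in w)
    value = 1.0 - abs(inner) ** 2 / (nz * nw)
    return math.sqrt(max(value, 0.0))  # guard against tiny negative rounding error


# Usage: the same triangle after scaling by 2, rotating (multiplying by the
# unit complex number 0.6 + 0.8i) and translating
triangle = [0 + 0j, 1 + 0j, 0.5 + 1j]
moved = [2 * (0.6 + 0.8j) * z + (3 - 2j) for z in triangle]
print(full_procrustes_distance(triangle, moved))  # zero up to rounding error
print(full_procrustes_distance(triangle, [0j, 1 + 0j, 0.2 + 0.3j]))
```

Note that a reflection is not removed by this construction: reflected configurations generally have a non-zero distance, which matches the usual convention that shape is invariant to rotation but not reflection.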
Appropriate methods for describing shapes statistically will first be reviewed. Possible
projects could then focus on statistical models for shape data, computational methods
for shape inference, or analysis of real data sets such as shapes of molecular data.
Reference: Dryden and Mardia (2016). Statistical Shape Analysis, 2nd edition. Available
electronically through the UoN Library.
P5 Theoretical topics in epidemic modelling. Contact: FGB
The current coronavirus epidemic highlights the need for mathematical models of
epidemics. There is a large literature of such models. Key questions addressed by the
models include: can an epidemic with few initial infectives take off and lead to a large
outbreak? (In many models this depends on whether the basic reproduction number R0
exceeds 1.) If so, what is the probability of a
large outbreak occurring and what will it look like? For example, how many people will
be infected and how long will the epidemic last? Related matters include the effects of
(a) mitigation strategies, such as vaccination prior to an epidemic and lockdown during
an epidemic, and (b) population structure. Projects in this area involve investigating
these issues within the framework of stochastic models of epidemics (see, for example,
Andersson and Britton (2000)). They are likely to involve extensive computing, using R
or Matlab, and some previous knowledge of stochastic processes (or willingness to learn
quickly) is essential.
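In the simplest homogeneously mixing SIR-type models, two of the questions above, how likely a major outbreak is and how many people are infected, both lead to the equation z = 1 - exp(-R0 z), where z is the fraction of a large population ultimately infected. The Python sketch below solves it by fixed-point iteration. It assumes one initial infective and, for the outbreak probability, a Poisson(R0) number of infections caused by each infective (as for a constant infectious period); with an exponentially distributed infectious period the offspring distribution is geometric and the major-outbreak probability becomes 1 - 1/R0 instead. The function names are illustrative.

```python
import math


def final_size_fraction(r0, tol=1e-12, max_iter=100_000):
    """Largest solution z of z = 1 - exp(-R0 * z).

    Fixed-point iteration from z = 1 decreases monotonically to the largest
    root, which is positive when R0 > 1 and zero when R0 <= 1 (convergence
    becomes very slow as R0 approaches 1).
    """
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


def major_outbreak_probability(r0):
    """Branching-process approximation with Poisson(R0) offspring.

    The extinction probability q solves q = exp(R0 * (q - 1)); substituting
    q = 1 - z gives the final-size equation, so P(major outbreak) = z.
    """
    return final_size_fraction(r0)


for r0 in (0.8, 1.5, 2.0, 3.0):
    print(r0, round(final_size_fraction(r0), 4))
```

The coincidence of the two answers is special to the Poisson-offspring case, which is one reason why the choice of infectious-period distribution matters in stochastic models even when the deterministic final size does not change.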
Example projects include: Epidemics among a population of households, which is
concerned with stochastic models for the spread of an epidemic among a community of
households, in which individuals mix uniformly within households and, in addition,
uniformly at a much lower rate within the population at large (see Ball and Lyne
(2006)); Stochastic modelling of endemic diseases (i.e. diseases that persist in a
community); Epidemics on random networks (see, for example, Newman (2002)).
References
Anderson R M, Jackson H C, May R M and Smith A M (1981) Population dynamics of fox
rabies in Europe. Nature 289, 765-770.
Andersson H and Britton T (2000) Stochastic Epidemic Models and Their Statistical
Analysis. Springer. (Available online; search for the title.)
Ball F G and Lyne O D (2006) Optimal vaccination schemes for epidemics among a
population of households, with application to variola minor in Brazil. Statistical Methods
in Medical Research 15, 481-497.
Newman M E J (2002) Spread of epidemic disease on networks. Physical Review E
66, 016128.

P6 Analysis of infectious disease data. Contact: PO
The COVID-19 pandemic demonstrates the impact that infectious diseases can have on
society. This area is concerned with learning about, and applying, methods for
analysing data from outbreaks of infectious disease. Many of the methods involve
developing disease transmission models and then fitting them to data.

Example projects include: Analysing time-dependent data from disease outbreaks;
Analysing data from household studies of disease; Analysing data from the COVID-19
outbreak.
P7 Advanced simulation-based inference methods. Contact: TK
The past 30 years have seen an explosion in advanced computational methods for
simulation-based inference. This has resulted in powerful techniques for performing
inference for complex models, in both frequentist and Bayesian settings. These
techniques include, but are not restricted to, advanced Markov chain Monte Carlo
methods, importance sampling, sequential Monte Carlo and approximate Bayesian
computation. Projects will review and implement some of these methods, investigate
performance using simulations, and/or apply the methods to perform inference using
real data.
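Random-walk Metropolis, the simplest Markov chain Monte Carlo method, is a natural baseline before the more advanced methods listed above. The Python sketch below is a minimal one-dimensional version. The example posterior (a normal mean with known unit variance and a N(0, 10^2) prior, applied to simulated data) is a hypothetical test case, chosen because its exact posterior mean, sum(y) / (n + 0.01), is available for comparison.

```python
import math
import random


def random_walk_metropolis(log_target, x0, n_iter, step=1.0, seed=None):
    """Random-walk Metropolis sampler for a one-dimensional target density.

    log_target(x) is the log of an unnormalised target density. Proposals
    are x' = x + N(0, step^2); because this proposal is symmetric, x' is
    accepted with probability min(1, pi(x') / pi(x)).
    Returns the chain and the acceptance rate.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_target(x0)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        log_p_prop = log_target(proposal)
        # Compare on the log scale; 1 - random() lies in (0, 1], so log is defined
        if math.log(1.0 - rng.random()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
            accepted += 1
        chain.append(x)
    return chain, accepted / n_iter


# Usage: simulated data y_i ~ N(1.5, 1), unknown mean mu with a N(0, 10^2) prior
data_rng = random.Random(0)
y = [data_rng.gauss(1.5, 1.0) for _ in range(50)]


def log_posterior(mu):
    return -0.5 * sum((yi - mu) ** 2 for yi in y) - mu ** 2 / 200.0


chain, acc_rate = random_walk_metropolis(log_posterior, 0.0, 20000, step=0.3, seed=1)
kept = chain[2000:]  # discard burn-in
print(sum(kept) / len(kept), sum(y) / (len(y) + 0.01), acc_rate)
```

The step size trades off acceptance rate against how far each move travels; tuning it, and diagnosing convergence, are exactly the issues that motivate the more advanced samplers.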
P8 Advanced Stochastic Processes. Contact: DS
Stochastic process models are widely used in many areas of science including Biology,
Physics, Finance, Epidemiology, etc. Projects in this area are concerned with learning
about some specific stochastic process models, including the underlying mathematical
theory, and possible areas of application. These projects typically have scope for
including some numerical/computational work to accompany and illustrate general
theory and/or specific applications. Example projects include Continuous-time Markov
chains and Stochastic models in genetics.
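A continuous-time Markov chain can be simulated by drawing an exponential holding time in the current state and then choosing the next state with probability proportional to its transition rate (often called the Gillespie algorithm). The Python sketch below is a minimal illustration assuming a finite state space with rates supplied as a dictionary; the function names and the two-state example are hypothetical.

```python
import random


def simulate_ctmc(rates, state, t_max, seed=None):
    """Simulate a continuous-time Markov chain on [0, t_max].

    rates[i] is a dict {j: q_ij} of transition rates out of state i. The
    holding time in state i is Exponential(sum_j q_ij), and the next state
    is j with probability q_ij / sum_j q_ij. Returns a list of (time, state)
    pairs starting with (0.0, state).
    """
    rng = random.Random(seed)
    t, path = 0.0, [(0.0, state)]
    while True:
        out = rates.get(state, {})
        total = sum(out.values())
        if total == 0:  # absorbing state: no further jumps
            return path
        t += rng.expovariate(total)
        if t > t_max:
            return path
        u, cumulative = rng.random() * total, 0.0
        for j, q in out.items():
            cumulative += q
            if u < cumulative:
                break
        state = j  # falls back to the last target if rounding skips the break
        path.append((t, state))


def fraction_of_time_in(path, target, t_max):
    """Proportion of [0, t_max] that the simulated path spends in `target`."""
    ends = [t for t, _ in path[1:]] + [t_max]
    return sum(t1 - t0 for (t0, s), t1 in zip(path, ends) if s == target) / t_max


# Usage: two-state chain with rate 1.0 for 0 -> 1 and 2.0 for 1 -> 0, whose
# long-run proportion of time in state 1 is 1.0 / (1.0 + 2.0) = 1/3
two_state = {0: {1: 1.0}, 1: {0: 2.0}}
path = simulate_ctmc(two_state, 0, 20000.0, seed=3)
print(fraction_of_time_in(path, 1, 20000.0))
```

Comparing simulated long-run proportions with the stationary distribution obtained from the generator matrix is a simple way to pair the theory with numerical work.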
P9 Functional data analysis. Contact: KS
Functional data analysis (FDA) deals with, roughly speaking, data living in an infinite-
dimensional vector space, for instance the space of continuous functions. In practice,
function values are only observed at a finite set of points, and hence FDA can be
viewed as a natural extension of multivariate statistical techniques when there is an
underlying ordering (e.g. over time) of the data collection procedure. However,
appropriate adjustments need to be made to account for the infinite-dimensional nature
of the true underlying functions.
Projects will first review mathematical representations of functional data and
descriptive statistical procedures on functional data. Possible directions could then
involve studying the alignment of functions, models for functional data, and/or analysis
of real functional data sets.
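A common first step in FDA is to turn discretely observed values into a function by expanding them in a basis. The Python sketch below assumes a periodic function observed on an equally spaced grid over one period and uses a truncated Fourier basis, for which least squares reduces to discrete inner products; other bases (B-splines, for instance) are common in practice, and the function names are illustrative.

```python
import math


def fourier_fit(y, n_harmonics):
    """Least-squares fit of a truncated Fourier basis to values y observed at
    m equally spaced points t_j = j / m on [0, 1) (one period).

    On this grid the basis 1, cos(2 pi k t), sin(2 pi k t), k = 1..K, is
    orthogonal when K < m / 2, so the least-squares coefficients are just
    discrete inner products. Returns (a0, [a_1..a_K], [b_1..b_K]).
    """
    m = len(y)
    if 2 * n_harmonics >= m:
        raise ValueError("need n_harmonics < m / 2")
    t = [j / m for j in range(m)]

    def coef(basis, k):
        return 2.0 / m * sum(yj * basis(2 * math.pi * k * tj) for yj, tj in zip(y, t))

    a0 = sum(y) / m
    a = [coef(math.cos, k) for k in range(1, n_harmonics + 1)]
    b = [coef(math.sin, k) for k in range(1, n_harmonics + 1)]
    return a0, a, b


def fourier_eval(a0, a, b, s):
    """Evaluate the fitted function at any s, including points off the grid."""
    return a0 + sum(
        ak * math.cos(2 * math.pi * k * s) + bk * math.sin(2 * math.pi * k * s)
        for k, (ak, bk) in enumerate(zip(a, b), start=1)
    )


# Usage: a smooth periodic curve observed only at 50 grid points
def curve(s):
    return 1 + 2 * math.cos(2 * math.pi * s) - 0.5 * math.sin(4 * math.pi * s)


y_obs = [curve(j / 50) for j in range(50)]
a0, a, b = fourier_fit(y_obs, n_harmonics=3)
print(a0, a, b)
print(fourier_eval(a0, a, b, 0.123), curve(0.123))
```

With noisy observations, keeping only a few harmonics acts as a smoother, and choosing how many to keep is itself a modelling decision.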
P10 Networks and graphs. Contact: ID
The statistical analysis of networks is of interest in many fields of study, including in
neuroscience, social science, and corpus linguistics. A network can be represented by a
graph consisting of nodes and edges, and it can be represented mathematically by
matrices such as the adjacency matrix or the graph Laplacian. The dissertation will consider models and methods for the
analysis of networks. Different types of networks could be explored, including random
graphs, small-world networks, and preferential attachment models. Practical analysis of
network data will be carried out using R.
Projects could focus on, for example: the statistical analysis of social network data,
such as Twitter; probability on networks; networks in corpus linguistics; brain networks
in neuroscience.
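The adjacency-matrix representation and the simplest random graph model can be sketched in a few lines. The Python example below generates an Erdős-Rényi G(n, p) graph, in which each node's expected degree is (n - 1)p. It uses plain lists rather than an R network package, and the function names are hypothetical.

```python
import random


def erdos_renyi(n, p, seed=None):
    """Adjacency matrix (a list of lists) of an Erdos-Renyi G(n, p) graph:
    each of the n(n - 1)/2 possible undirected edges is present independently
    with probability p, and there are no self-loops."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected: keep the matrix symmetric
    return adj


def degrees(adj):
    """Degree of each node = row sum of the adjacency matrix."""
    return [sum(row) for row in adj]


def edge_density(adj):
    """Observed edges divided by the n(n - 1)/2 possible edges."""
    n = len(adj)
    return sum(degrees(adj)) / (n * (n - 1))


# Usage: compare the observed mean degree with its expectation (n - 1) p
adj = erdos_renyi(400, 0.05, seed=1)
deg = degrees(adj)
print(sum(deg) / len(deg), (400 - 1) * 0.05, edge_density(adj))
```

The degree distribution of G(n, p) is binomial, which is one of the features that small-world and preferential attachment models were introduced to change.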
References:
Kolaczyk, E. (2009). Statistical Analysis of Network Data, Springer, New York.
Csárdi, G. and Kolaczyk, E. (2014). Statistical Analysis of Network Data with R,
Springer, New York.
P11 Statistical analysis of the National Lottery. Contact: FGB
The National Lottery raises a number of interesting statistical and probabilistic
questions. For example, are the balls drawn by the lottery machinery truly random? Is
it surprising that in one week there were 133 jackpot winners? Do the punters choose
their numbers uniformly at random? If not, in what ways are the punters' choices non-
random? This project will involve investigating these and related questions. In
particular, a variety of tests of whether the machinery draws balls fairly at random will
be developed and run on data from the UK National Lottery; weekly data on the
numbers of winners in the UK National Lottery will be used to explore whether or not
punters are choosing their numbers randomly; and models for number combinations
chosen by punters will be investigated. Essential background reading is Haigh (1997).
The project will involve extensive computing, using Matlab or R.
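A natural first check of whether the machine draws balls fairly is Pearson's chi-square goodness-of-fit test applied to how often each ball has been drawn. The Python sketch below computes the statistic and an upper-tail p-value using only the standard library. It is illustrative: the counts come from simulated fair draws, not real lottery results; the 6-from-49 format is an assumption; and because the balls within a single draw are sampled without replacement, the chi-square approximation for this statistic is itself something a project would need to examine. Function names are hypothetical.

```python
import math
import random


def chi_square_uniform(counts):
    """Pearson chi-square statistic for H0: all balls are equally likely to be
    drawn. Returns (statistic, degrees of freedom)."""
    expected = sum(counts) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, len(counts) - 1


def chi_square_sf(x, df, max_terms=10_000):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Uses the series for the regularised lower incomplete gamma function,
    P(a, h) = h^a e^(-h) / Gamma(a) * sum_n h^n / (a (a+1) ... (a+n)),
    with a = df / 2 and h = x / 2. Adequate for moderate x; very small
    p-values lose relative accuracy because of the subtraction 1 - P.
    """
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= h / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    lower = math.exp(a * math.log(h) - h - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Usage on simulated (not real) data: 1000 fair draws of 6 balls from 49
rng = random.Random(0)
counts = [0] * 49
for _ in range(1000):
    for ball in rng.sample(range(49), 6):
        counts[ball] += 1
stat, df = chi_square_uniform(counts)
print(stat, df, chi_square_sf(stat, df))
```

Repeating the simulation many times and comparing the distribution of the statistic with the chi-square(48) reference is one way to check how well the approximation holds for this sampling scheme.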
Reference:
Haigh J (1997) The Statistics of the National Lottery. J R Statist Soc A 160, 187-206.
P12 Statistical models in mathematical biology. Contact: CF
Statistics and stochastic processes are used extensively in many different areas of
mathematical medicine and biology. For instance, dynamical models involving systems
of differential equations are used to describe mechanisms such as the progress of
medicines in the body, immune systems and metabolic pathways, and tasks such as
model selection and parameter estimation need to be performed to draw conclusions
about the biological questions of interest. Dynamic stochastic models are used to
describe the evolution of populations and ecosystems over time. Statistical models are
employed to make inferences based on experimental data.
Example projects include statistical inference for systems of differential equations,
analysis of white matter data, simulation-based inference for biological models and
analysis of real experimental data sets.
P13 Contemporary Regression Methods. Contact: TK
Regression methods remain one of the most widely used tools in statistics, machine
learning and data science. This area is concerned with learning about, and applying,
contemporary regression methods in scenarios where traditional linear regression
methods are inadequate (for example, when the effects of covariates are not linear
and/or when the number of covariates is much larger than the number of
observations).
Projects will investigate one or more of these methods, including, but not
restricted to: non-parametric regression; regularised and sparse regression;
high-dimensional regression; measurement error models; local linear regression; quantile
regression; linear mixed-effects models; Bayesian regression; partial least squares
regression.
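Local linear regression, one of the methods listed above, is compact to write down because at each target point it only needs a weighted least-squares straight line. The Python sketch below assumes a single covariate and a Gaussian kernel with a fixed, user-chosen bandwidth (bandwidth selection, e.g. by cross-validation, is left out); the function name and the simulated data are illustrative.

```python
import math
import random


def local_linear(x, y, x0, bandwidth):
    """Local linear regression estimate of E[Y | X = x0].

    Fits a straight line by weighted least squares, with Gaussian kernel
    weights w_i = exp(-0.5 * ((x_i - x0) / bandwidth) ** 2), and returns the
    fitted value at x0 (the intercept after centring the covariate at x0).
    """
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    d = [xi - x0 for xi in x]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
    # Intercept from the 2x2 weighted normal equations
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


# Usage on simulated data: a nonlinear mean function plus Gaussian noise
rng = random.Random(0)
xs = [i / 200 for i in range(201)]
ys = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.2) for xi in xs]
grid = [0.1 * k for k in range(11)]
fit = [local_linear(xs, ys, g, bandwidth=0.05) for g in grid]
print([round(f, 2) for f in fit])
```

Fitting a local line rather than a local constant means straight-line trends are reproduced exactly and bias near the ends of the data range is reduced, which is the usual argument for local linear over Nadaraya-Watson smoothing.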
P14 Uncertainty Quantification. Contact: KB
Adopting a data-centric modelling approach to real-world problems requires dealing
with various sources of uncertainty: measurement errors in data; incomplete
knowledge of the process generating the data; incomplete knowledge of the values of model
parameters. Statistical modelling deals with explaining such uncertainties using only
the data and probability theory; mechanistic and physical modelling deals with the
uncertainties by developing dynamic equations based on various factors/variables
affecting the system under consideration. Uncertainty Quantification (UQ) is a marriage
of the two approaches, and provides a unified treatment of the various sources of
uncertainty.
Example projects include: Gaussian process regression models; nonlinear regression
models; UQ for predator-prey models (e.g. Lotka-Volterra); UQ for simple epidemic
models; more generally, UQ for ODE/PDE-based models arising in several applications.
Each project will involve computing in R or Matlab.
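One way to see uncertainty quantification at work for a predator-prey model is to treat a parameter as uncertain, push samples of it through an ODE solver, and summarise the spread of the output. The Python sketch below does this for the Lotka-Volterra equations. The parameter values, initial conditions and the Uniform(0.9, 1.1) distribution for alpha are hypothetical choices, and forward Monte Carlo propagation is only one of several UQ approaches (calibrating parameters to data is another).

```python
import math
import random


def lotka_volterra_rk4(alpha, beta, delta, gamma, x0, y0, t_end, dt=0.01):
    """Integrate the Lotka-Volterra equations
        dx/dt = alpha x - beta x y   (prey)
        dy/dt = delta x y - gamma y  (predators)
    from (x0, y0) with the classical fourth-order Runge-Kutta method and
    return the state (x, y) at time t_end."""
    def f(x, y):
        return alpha * x - beta * x * y, delta * x * y - gamma * y

    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        k1 = f(x, y)
        k2 = f(x + dt / 2 * k1[0], y + dt / 2 * k1[1])
        k3 = f(x + dt / 2 * k2[0], y + dt / 2 * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x, y


def lv_invariant(alpha, beta, delta, gamma, x, y):
    """Quantity conserved along exact solutions; useful for checking the solver."""
    return delta * x - gamma * math.log(x) + beta * y - alpha * math.log(y)


def prey_interval(n_samples=100, seed=None):
    """Monte Carlo propagation: draw alpha ~ Uniform(0.9, 1.1) (an assumed
    distribution), solve the ODE for each draw and return the empirical 5%
    and 95% quantiles of the prey population at t = 10."""
    rng = random.Random(seed)
    finals = sorted(
        lotka_volterra_rk4(rng.uniform(0.9, 1.1), 0.1, 0.075, 1.5, 10.0, 5.0, 10.0)[0]
        for _ in range(n_samples)
    )
    return finals[int(0.05 * n_samples)], finals[int(0.95 * n_samples) - 1]


print(prey_interval(seed=0))
```

Even a modest spread in alpha can translate into a wide interval for the prey population, because it shifts both the period and the amplitude of the oscillations; that amplification is exactly what a UQ analysis is meant to expose.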
Reference material:
• ‘Nonlinear regression with R’ by Ritz and Streibig, available online through UoN
library.
• ‘Gaussian processes for machine learning’ by Rasmussen and Williams,
available online on author’s website.
• ‘Dynamic Data Analysis’ by Ramsay and Hooker.
P15 Applied Statistical Modelling. Contact: CF
In practice, statistical work involves analysis of real data in a very wide variety of
disciplines and in collaboration with stakeholders such as employers or collaborators
from other academic disciplines in inter-disciplinary research projects. The aim of these
projects is to give experience of such statistical modelling work. The focus is on a
comprehensive analysis of real datasets rather than a study of prescribed methods. The
primary objective is to answer the questions of interest, as would be the case in applied
statistical work. Projects will involve identifying and applying appropriate techniques
and providing a thorough account of the analysis from the exploratory stage through to
results and conclusions relevant for the “client”. Projects include the analysis of: gene
expression data; energy consumption trends; data from biological experiments;
pairwise ranking data.
P16 Image Analysis. Contact: SPP
Modern statistical work increasingly involves the analysis of complex-structured data.
Images, such as those collected from satellites or in medical imaging, are one such
example. Images are often represented as pixel arrays, with a numerical value recorded at
each pixel on some scale (e.g. RGB value or grayscale intensity), and there is additional
spatial structure, since nearby pixels tend to be correlated.
Images are usually noisy (pixel values are recorded with error) and hence statistical
methods are required to distinguish signals in the images from noise. Tasks include
processing (denoising) the raw images, classification of images and recognising objects
in images. Projects may focus more on an in-depth study of probabilistic models for
achieving these tasks, or on practical analysis of real image data.
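Denoising a binary image with a spatial prior can be illustrated with iterated conditional modes (ICM), a simple deterministic method that balances agreement between neighbouring pixels against agreement with the observed image. The Python sketch below is a minimal version. The Ising-type model, the parameter values (beta, eta) and the function name are assumptions for illustration, and ICM only finds a local mode, so sampling-based alternatives would be natural comparisons in a project.

```python
import random


def icm_denoise(observed, beta=1.0, eta=1.0, max_sweeps=10):
    """Iterated conditional modes (ICM) restoration of a binary image.

    Pixels take values -1 or +1. The assumed model combines an Ising prior
    rewarding agreement between the four nearest neighbours (strength beta)
    with a likelihood term rewarding agreement with the observed pixel
    (strength eta). Each pixel is set in turn to the value maximising its
    conditional probability, i.e. the sign of
        beta * (sum of neighbours) + eta * (observed value),
    with ties broken towards +1. Sweeps stop once nothing changes.
    """
    rows, cols = len(observed), len(observed[0])
    x = [row[:] for row in observed]  # start from the observed image
    for _ in range(max_sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                neighbours = sum(
                    x[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < rows and 0 <= b < cols
                )
                new = 1 if beta * neighbours + eta * observed[i][j] >= 0 else -1
                if new != x[i][j]:
                    x[i][j] = new
                    changed = True
        if not changed:
            break
    return x


# Usage: a simulated two-region image with about 10% of pixels flipped
rng = random.Random(0)
clean = [[-1] * 6 + [1] * 6 for _ in range(12)]
noisy = [[-v if rng.random() < 0.1 else v for v in row] for row in clean]
restored = icm_denoise(noisy)
errors_before = sum(a != b for r1, r2 in zip(noisy, clean) for a, b in zip(r1, r2))
errors_after = sum(a != b for r1, r2 in zip(restored, clean) for a, b in zip(r1, r2))
print(errors_before, errors_after)
```

Larger beta smooths more aggressively, which removes isolated noise but can also erase genuine fine detail, so the choice of beta is a statistical question in its own right.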
P17 Statistics and Number Theory. Contact: CF.
Classical Number Theory is mainly the study of integers, prime numbers,
divisibility and other fundamental properties of integers and integer
arithmetic. Modern approaches to number theory make use of geometry,
combinatorics, probability theory and analysis.
Although the integers 1, 2, 3, ... are deterministic objects, many interesting
statistical properties arise when, for instance, we look at primes and prime factors.
As an example, the number of prime factors of n (counted with multiplicity) was shown
to be asymptotically normally distributed, with mean and variance both approximately
log log n (a form of the Erdős-Kac theorem). The Poisson distribution plays an
important part in number theory; for instance, the sequence of primes can be viewed in
a certain sense as a Poisson process. Another example is the Riemann zeta function,
whose non-trivial zeroes (or rather their spacings) are conjectured to follow a
particular distribution. This conjecture is mainly based on computer experiments and
partial results about moments and pair correlations.
The main goals of these projects would be to formulate “reasonable” statistical
hypotheses in the context of number theory and then verify such hypotheses by
analysing experimental data (either given or computed by yourself). Examples
of objects you could study include the number of distinct prime factors, the distribution
of prime numbers, and zeroes/values of the Riemann zeta function. Depending on your
interest the focus can be on statistical models, computing or even number theory.
Students should be prepared to learn a little number theory to understand the context,
but the projects are about the use of statistical methods to provide insight into
conjectures in number theory from the data available. Guidance on the number theory
required will be provided.
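Computing your own data for such hypotheses can be straightforward. The Python sketch below tabulates Omega(n), the number of prime factors counted with multiplicity, using a sieve, and standardises it as the Erdős-Kac theorem suggests. It is an illustration under stated assumptions: the standardisation uses log log n for each n, and because convergence is notoriously slow, the standardised values are not expected to look exactly standard normal at computationally accessible n (their mean, in particular, stays noticeably above 0). Function names are hypothetical.

```python
import math


def big_omega_table(n_max):
    """omega[n] = Omega(n), the number of prime factors of n counted with
    multiplicity, for 0 <= n <= n_max (entries for 0 and 1 are 0).

    A smallest-prime-factor sieve gives spf(n); then
    Omega(n) = 1 + Omega(n / spf(n)).
    """
    spf = list(range(n_max + 1))  # spf[n] = n until a smaller prime factor is found
    for p in range(2, math.isqrt(n_max) + 1):
        if spf[p] == p:  # p is prime
            for m in range(p * p, n_max + 1, p):
                if spf[m] == m:
                    spf[m] = p
    omega = [0] * (n_max + 1)
    for n in range(2, n_max + 1):
        omega[n] = 1 + omega[n // spf[n]]
    return omega


def standardised_omega(n_max):
    """Values (Omega(n) - log log n) / sqrt(log log n) for 3 <= n <= n_max
    (log log n is positive only for n >= 3)."""
    omega = big_omega_table(n_max)
    values = []
    for n in range(3, n_max + 1):
        ll = math.log(math.log(n))
        values.append((omega[n] - ll) / math.sqrt(ll))
    return values


z = standardised_omega(10**5)
mean = sum(z) / len(z)
var = sum((v - mean) ** 2 for v in z) / (len(z) - 1)
print(mean, var)
```

Comparing histograms of these values for increasing n_max, or repeating the exercise for the number of distinct prime factors, is one way to turn the asymptotic statement into testable statistical hypotheses.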

P18 Statistical Analysis of Random Matrices. Contact: CF.
This topic is concerned with the situation where the basic unit of observation is a
matrix, and this matrix is a random quantity. Thus, the elements of the matrix are
random variables, and in many respects, the study of random matrices is a type of
multivariate data analysis which accounts for the matrix structure of the data. Indeed,
the basic data matrix used in multivariate data analysis is the random matrix obtained
by collecting all the data vectors together in a matrix. The Wishart distribution is a
classical model for describing random matrices, and the statistical properties of
quantities derived from such matrices (such as eigenvalues and eigenvectors) can be obtained.
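The Wishart model is easy to simulate directly from its definition as a sum of outer products of multivariate normal vectors. The Python sketch below is restricted to 2x2 matrices so that both the Cholesky factor and the eigenvalues have closed forms and no linear-algebra library is needed. The scale matrix, degrees of freedom and function names are illustrative; the usage checks the known mean E[W] = n * sigma by simulation.

```python
import math
import random


def wishart_2x2(n, sigma, rng):
    """One draw of W = sum_{i=1}^n x_i x_i^T with independent x_i ~ N_2(0, sigma),
    i.e. a 2x2 Wishart matrix with n degrees of freedom and scale matrix sigma."""
    # Lower-triangular Cholesky factor L with sigma = L L^T
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    w11 = w12 = w22 = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1, x2 = l11 * z1, l21 * z1 + l22 * z2  # x = L z ~ N_2(0, sigma)
        w11 += x1 * x1
        w12 += x1 * x2
        w22 += x2 * x2
    return [[w11, w12], [w12, w22]]


def eigenvalues_2x2(m):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix, in closed form."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    mid = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mid + radius, mid - radius


rng = random.Random(0)
sigma = [[1.0, 0.5], [0.5, 1.0]]
draws = [wishart_2x2(10, sigma, rng) for _ in range(2000)]
mean_w12 = sum(w[0][1] for w in draws) / len(draws)  # E[W] = n * sigma, so about 5
largest = [eigenvalues_2x2(w)[0] for w in draws]
print(mean_w12, sum(largest) / len(largest))
```

The simulated largest eigenvalues are biased upwards relative to the largest eigenvalue of n * sigma (which is 15 here), a simple instance of the eigenvalue spreading that motivates much of random matrix theory.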
Projects in this topic will look at statistical models for random matrices and use them to
analyse output from mathematical models and/or real statistical data. The statistical
analysis of such output/data allows for very powerful insights into mathematical
problems where exact, analytical solutions are not possible. For example, the statistical
properties of eigenvalues and eigenfunctions resulting from approximations to
equations used in mathematical physics and the study of waves are extremely useful
for understanding the underlying mathematical problem.
P19 Mathematical Finance. Contact: HL
Many probability and statistical models and methods have been used in finance. For
example, mean-variance analysis is a common tool used in effective portfolio
selection, and stochastic calculus is the main mathematical language used in financial
engineering. The projects in this group will study and investigate the theories behind
some well-known results. Students may also use R or other software to fit models to
historical stock price data to, for example, analyse their goodness-of-fit or to draw
comparisons between different investment strategies.
Examples of projects include, but are not limited to: portfolio management;
arbitrage-free pricing of options under the Black-Scholes model; discrete approximation
of American put options using dynamic programming.
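Two of the example projects can be illustrated in a few lines. The Python sketch below prices a European call with the Black-Scholes formula and values a put on a Cox-Ross-Rubinstein binomial tree by backward induction, the dynamic-programming step that allows early exercise for the American put. It assumes no dividends and constant interest rate and volatility; the parameter values are illustrative, and the binomial tree is one of several possible discrete approximations.

```python
import math


def norm_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, r, sigma, t):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def black_scholes_put(s, k, r, sigma, t):
    """European put via put-call parity: P = C - S + K exp(-r t)."""
    return black_scholes_call(s, k, r, sigma, t) - s + k * math.exp(-r * t)


def binomial_put(s, k, r, sigma, t, steps=500, american=True):
    """Put value on a Cox-Ross-Rubinstein tree, computed by backward induction.

    With american=True each node takes the larger of the immediate exercise
    value and the discounted risk-neutral expected continuation value
    (the dynamic-programming step); with american=False it is European.
    """
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up-move
    disc = math.exp(-r * dt)
    # Payoffs at maturity; index j counts the number of up-moves
    values = [max(k - s * u ** j * d ** (steps - j), 0.0) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            value = disc * (p * values[j + 1] + (1 - p) * values[j])
            if american:
                value = max(value, k - s * u ** j * d ** (n - j))
            values[j] = value  # entries beyond index n are no longer used
    return values[0]


s0, strike, rate, vol, maturity = 100.0, 100.0, 0.05, 0.2, 1.0
call = black_scholes_call(s0, strike, rate, vol, maturity)
euro_put = black_scholes_put(s0, strike, rate, vol, maturity)
tree_euro_put = binomial_put(s0, strike, rate, vol, maturity, american=False)
american_put = binomial_put(s0, strike, rate, vol, maturity)
print(call, euro_put, tree_euro_put, american_put)
```

The European tree value should approach the Black-Scholes put as `steps` grows, which gives a check on the tree before it is used for the American put, where no closed-form price is available.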
Capital One Projects
The following two projects are offered in conjunction with Capital One and are available
on a selective basis only. If you would like to be considered for selection for either of
these projects, please email the course director, Prof Huiling Le, by 12 May 2020. You
will then be notified by Friday 15 May whether you have been selected for the project.
If you are not selected, you must then complete the electronic choice form as normal,
choosing from the topics listed above.
P20 Thompson Sampling. Contact: CJB
Key Areas: Statistical Testing, Design of Experiments, Bayesian Statistics,
Computationally Intensive Methods, Decision Theory
Description/Questions to get started:
Thompson Sampling is a relatively simple method of Bayesian testing which can
produce efficient results in certain situations, particularly where responses are
immediate and cell allocation is sequential and dynamic. The project should begin
with a review of existing techniques (including their benefits and drawbacks), but is
otherwise relatively open-ended.
Further areas for exploration could be:
• Decision Theory:
  - What's the best way to size a test upfront using this framework?
  - How do you make a decision based on the outcome of the test?
• Practicalities of applying the techniques:
  - What's the best way of accounting for tests where cell allocation is made in sequential batches (e.g. direct mail vs email)?
  - What happens when priors can't be updated immediately (or are updated collectively "at the end of the day")? What's the efficiency trade-off here, and is there a solution?
• Experimental Design:
  - Which techniques can best account for large numbers of test cells?
• Bayesian Statistics:
  - How does a "vanilla" Thompson Sampling algorithm (say, using Beta-Binomial priors) stack up against TS operating on a Bayesian model (as opposed to a single distribution)?
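A "vanilla" starting point for these questions might look like the Python/NumPy sketch below: Beta-Bernoulli Thompson Sampling (the per-response form of the Beta-Binomial model) on simulated test cells with binary responses. Each unit is allocated to the cell whose posterior draw is largest. To touch on the batching and "end of the day" questions, the sketch only updates the posteriors at the end of each batch, so `batch_size=1` is the fully sequential algorithm. The response rates, priors and function name are hypothetical choices for illustration.

```python
import numpy as np


def thompson_bernoulli(true_rates, n_batches, batch_size=1, prior=(1.0, 1.0), seed=0):
    """Beta-Bernoulli Thompson Sampling on simulated cells with binary responses.

    Posteriors are updated only at the end of each batch, so batch_size=1 is the
    fully sequential algorithm and larger batches mimic delayed (batched) updating.
    Returns the allocation counts and the final Beta posterior parameters."""
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    alpha = np.full(k, float(prior[0]))
    beta = np.full(k, float(prior[1]))
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_batches):
        successes = np.zeros(k)
        trials = np.zeros(k)
        for _ in range(batch_size):
            theta = rng.beta(alpha, beta)      # one draw from each cell's posterior
            cell = int(np.argmax(theta))       # allocate to the largest draw
            successes[cell] += float(rng.random() < true_rates[cell])  # simulated response
            trials[cell] += 1
        alpha += successes                     # conjugate Beta update
        beta += trials - successes
        pulls += trials.astype(int)
    return pulls, alpha, beta


if __name__ == "__main__":
    rates = [0.02, 0.05, 0.10]                 # hypothetical true response rates
    for b in (1, 100):
        pulls, a, bb = thompson_bernoulli(rates, n_batches=3000 // b, batch_size=b)
        print(b, pulls, a / (a + bb))          # allocations and posterior means
```

Comparing the allocation counts for `batch_size=1` and larger batches, at the same total sample size, gives a first empirical handle on the efficiency cost of delayed updating.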
Suggested Articles
Problem Introduction
• A Tutorial on Thompson Sampling https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf
• Solving Multi-Armed bandits: A comparison of epsilon-greedy and Thompson sampling https://towardsdatascience.com/solving-multiarmed-bandits-a-comparison-of-epsilon-greedy-and-thompson-sampling-d97167ca9a50
Research
• Learning the distribution with the largest mean: two bandit frameworks https://hal.archives-ouvertes.fr/hal-01449822/document
• Simple Bayesian Algorithms for Best Arm Identification https://arxiv.org/abs/1602.08448
• Fixed-Confidence Guarantees for Bayesian Best-Arm Identification https://arxiv.org/abs/1910.10945
• Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits https://arxiv.org/abs/1210.4862
• Bridging the gap between regret minimization and best arm identification, with application to A/B tests https://arxiv.org/abs/1810.04088
This project is available on a selective basis only. If you would like to be
considered for selection for this project, please email the course director, Prof Huiling
Le, by May 12 2020.
P21 Effectiveness of Counterfactual Explanations in Interpreting Credit Scoring Models. Contact: CJB
Project description
For a company operating in the financial services sector, rigorous justification standards
for credit and loan decisions largely dictate the need to use fully transparent and easily
understandable AI decision-support systems.
Explainable AI is one of the most discussed topics right now, and regulators are gearing
up to give advice to the industry on issues such as model interpretability and data
ethics. The ICO (Information Commissioner’s Office) and The Alan Turing Institute have
published guidance that aims to give organisations practical advice on explaining the
processes, services and decisions delivered or assisted by AI to the individuals affected
by them.
One set of guidelines concerns supplementary explanation tools that produce meaningful
information about an AI system’s results, specifically counterfactual tools for exploring
alternative possibilities and actionable recourse.
Counterfactual explanations are of special interest to us because they could help our
potential customers: for example, they could be used to explain to a customer why their
application for a credit card was rejected, and what specific actions they could take to
make acceptance more likely.
Project goals
This project is intended to answer the question: how effective are counterfactual
explanations in interpreting gradient boosting machine (GBM) models, and what actions
could be taken to help customers using these explanations?
This project is open-ended, with several avenues to explore:
• Implement a counterfactual explanation framework.
  - As the method is model-agnostic, the specific model it is built around does not matter; however, we are particularly interested in tree-based models such as random forests and Gradient Boosting Machines.
  - The underlying data should be a freely available credit scoring dataset, with the target outcome variable being a credit risk outcome such as default at a particular statement, e.g. the Lending Club Loan Data on Kaggle (https://www.kaggle.com/wendykan/lending-club-loan-data).
• Explore how the counterfactual explanation framework can be used to interpret the model, with different stakeholders in mind:
  - How do counterfactual explanations let us improve the predictive performance of models and develop more parsimonious models?
  - How do counterfactual explanations lead to explanations that customers can understand?
  - What are the risks of using counterfactual explanations, and what are their downsides?
  - Can counterfactual explanations be used in combination with a different explanation approach to counteract these downsides?
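For intuition, the Python/NumPy sketch below handles the simplest case, a linear scoring model such as logistic regression, where a customer is accepted when the score w·x + b reaches a threshold (for logistic regression, "predicted probability at least p" corresponds to a score threshold of log(p/(1 - p))). The closest counterfactual in Euclidean distance, changing only features flagged as mutable, then has a closed form: move x along the mutable part of w until the score reaches the threshold. This is an illustration under stated assumptions, not the framework the project asks for: tree-based models such as GBMs have piecewise-constant scores, so counterfactuals for them need a search rather than this formula. The feature meanings and coefficients in the usage example are hypothetical, and features are assumed to be standardised so that Euclidean distance is meaningful.

```python
import math

import numpy as np


def linear_counterfactual(x, w, b, threshold=0.0, mutable=None, margin=1e-6):
    """Smallest Euclidean change to x, touching only features with mutable[i] == 1,
    that raises the linear score w @ x + b to threshold + margin.
    Returns an unchanged copy of x if it already meets the threshold."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mask = np.ones_like(w) if mutable is None else np.asarray(mutable, dtype=float)
    score = w @ x + b
    if score >= threshold:
        return x.copy()
    w_m = w * mask                              # only mutable features may move
    norm2 = w_m @ w_m
    if norm2 == 0.0:
        raise ValueError("no mutable feature affects the score")
    step = (threshold + margin - score) / norm2
    return x + step * w_m                       # new score equals threshold + margin


def prob_to_score_threshold(p):
    """Score threshold equivalent to 'accept if predicted probability >= p'
    under a logistic model."""
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical standardised features: income, credit utilisation, age (immutable).
    w = np.array([0.8, -1.5, 0.3])
    b = -1.0
    x = np.array([0.2, 0.9, -0.5])
    cf = linear_counterfactual(x, w, b, threshold=prob_to_score_threshold(0.5),
                               mutable=[1, 1, 0])
    print(cf - x)   # suggested change: raise income, lower utilisation, age unchanged
```

Even in this simple setting, the recommended change depends on the distance measure and on which features are declared mutable, which is one way into the "risks and downsides" question.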
References:
• ICO and The Turing consultation on Explaining AI decisions guidance: https://ico.org.uk/about-the-ico/ico-and-stakeholder-consultations/ico-and-the-turing-consultation-on-explaining-ai-decisions-guidance/
• Interpretable Machine Learning, ‘Counterfactual Explanations’: https://christophm.github.io/interpretable-ml-book/counterfactual.html
• Kusner, M. J., Loftus, J., Russell, C., & Silva, R. (2017). Counterfactual fairness. In Advances in Neural Information Processing Systems (pp. 4066-4076). http://papers.nips.cc/paper/6995-counterfactual-fairness.pdf
• Ustun, B., Spangher, A., & Liu, Y. (2019). Actionable recourse in linear classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency (pp. 10-19). ACM. https://arxiv.org/pdf/1809.06514.pdf
• Evaluate recourse in linear classification models in Python: https://github.com/ustunb/actionable-recourse
This project is available on a selective basis only. If you would like to be
considered for selection for this project, please email the course director, Prof Huiling
Le, by May 12 2020.
MATH4021 (G14SDS)
Student final report assessment guidelines
When marking the student’s written final report, the supervisor and second assessor
are expected to return marks, with the indicated weightings, under the following categories:
• Written presentation / Introduction (30%)
• Statistical content (70%), comprising:
  - 10% Technical background: use and interpretation of background statistical/probabilistic concepts; if appropriate, use of a range of given reference materials and of further reading.
  - 20% Depth & progress: (written) evidence of an appropriate amount of progress made.
  - 20% Understanding & interpretation: logical development and selection of relevant material; critical appraisal/interpretation of results in the context of the dissertation; demonstration of deep understanding and interpretation of all aspects of the material presented.
  - 20% Development & results: use of specific statistical/probabilistic techniques and/or computing/statistical/mathematical packages (as appropriate) to develop models and/or produce “results”.
In addition, the supervisor has to return a mark for:
• Student initiative
Marks for each category should be provided on the University scale.
More detailed guidance for the individual categories and the overall mark is given
below.
Written presentation / Introduction (30%)
The written presentation is assessed on Introduction, Structure and Clarity, with equal
weights, and marked on the University scale. The relative weights of the elements within
“Structure” and within “Clarity” depend on the topic and will be determined by the
assessors.
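The guidelines give the category weights but not an explicit formula. Assuming the category marks are combined as a simple weighted average (30% for written presentation, itself split equally between Introduction, Structure and Clarity; then 10%, 20%, 20% and 20% for the four statistical-content categories), the arithmetic can be sketched in Python as follows. How the supervisor’s student initiative mark enters the final mark is not specified in these guidelines, so it is deliberately left out; the marks in the example are hypothetical.

```python
# Assumed weights, taken from the category percentages in these guidelines.
WEIGHTS = {
    "presentation": 0.30,   # Introduction, Structure and Clarity, equally weighted
    "technical_background": 0.10,
    "depth_progress": 0.20,
    "understanding_interpretation": 0.20,
    "development_results": 0.20,
}


def overall_mark(introduction, structure, clarity, technical_background,
                 depth_progress, understanding_interpretation, development_results):
    """Weighted combination of category marks, each on the 0-100 University scale.
    Assumes a simple weighted average; the student initiative mark is not included."""
    presentation = (introduction + structure + clarity) / 3.0
    return (WEIGHTS["presentation"] * presentation
            + WEIGHTS["technical_background"] * technical_background
            + WEIGHTS["depth_progress"] * depth_progress
            + WEIGHTS["understanding_interpretation"] * understanding_interpretation
            + WEIGHTS["development_results"] * development_results)


if __name__ == "__main__":
    # Hypothetical marks: presentation elements 70, 60, 65; content 60, 70, 65, 70.
    print(round(overall_mark(70, 60, 65, 60, 70, 65, 70), 1))  # 66.5
```

Here the presentation elements average to 65, contributing 19.5 marks, and the four content categories contribute 6 + 14 + 13 + 14 = 47, giving 66.5.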
Structure
1. Abstract
• States the topic(s) under investigation and the main results or conclusions, and also the methods or approaches where appropriate.
• Informative, self-contained and concise.
2. Sectioning
• Report broken into sections of digestible length, using subsections and appendices where appropriate.
• Text within sections appropriately broken into paragraphs.
• Sectioning effective in signposting the main aspects of the work.
3. Conclusions
• Summary of the results.
• Conclusions, with reflection and critical analysis.
• Results interpreted in the context of the project objectives where appropriate.
• Directions for further work indicated.
4. Other elements, expected to include:
• Title corresponds to content.
• Title page states author, date, University of Nottingham, module code and title.
• Plagiarism disclaimer.
• Table of contents.
• Equations, figures, tables etc. adequately and accurately cross-referenced.
• Captions to figures and tables appropriately informative.
• References, when appropriate for the topic, collected into a bibliography with a standard format. Number and selection of the references and the manner of citing references in the text appropriate for the topic.
0 – 29 Insufficient structure.
30 – 39 Rudimentary structure in place.
40 – 49 Basic structure in place.
50 – 59 Most structural elements in place.
60 – 69 All or almost all structural elements in place. Structure effective in signposting
the technical content.
70 – 79 Excellent, easy-to-follow structure. Evidence of self-confidence and
independence in the structural choices.
80+ Outstandingly well-structured report with substantial evidence of self-confidence
and independence in structural choices.
Clarity
1. How accurately and concisely does the text convey the meaning?
2. Appropriate use of technical versus non-technical style.
3. Grammar, punctuation and spelling. Proof-reading.
4. Technical production, including:
• Layout and typesetting: fonts, spacing, page breaks, page numbering, equation typesetting.
• Technical production of figures, diagrams and tables where appropriate.
0 – 29 Very little of the report can be followed.
30 – 39 Some parts of the report can be followed.
40 – 49 A substantial part of the report can be followed.
50 – 59 Report readable without difficulty. Technical production is accurate and
conforms to the subject standards.
60 – 69 Report easy to read and in appropriate style. Technical production effective
in signposting the content.
70 – 79 Report outstandingly clear. Originality in style and technical production if
appropriate.
80+ Report exceptionally clear with substantial evidence of originality in style and
production.
Introduction
The introduction should include the following components:
• Background and context of the work;
• Aim of the work;
• Overview of the work;
• Statement of main achievements, results or conclusions;
• Outline of the structure of the report.
A good introduction will cover the above points in a clear and concise manner with an
amount of depth appropriate to the topic.
After reading the Introduction, a reader (perhaps the external examiner) should be able
to answer the following questions:
(i) What is the background to the project?
(ii) What is the aim of the work?
(iii) What general methods/techniques have been used?
(iv) What are the main achievements, results or conclusions?
(v) Where in the report can details of each of the main achievements etc. be found?
0 – 29 None of the above questions (i)–(v) is answered.
30 – 39 The student shows a rudimentary grasp of the requirements of an Introduction. The reader will obtain rudimentary answers to some of the questions (i)–(v), which should go beyond those of a lay person.
40 – 49 The student shows a basic grasp of the requirements of an Introduction. The reader will obtain basic answers to some of the questions (i)–(v).
50 – 59 The student shows a coherent grasp of the requirements of an Introduction. Most of the five components are present. The reader will obtain satisfactory answers to some of the questions (i)–(v).
60 – 69 The student shows an assured grasp of the requirements of an Introduction. All five components are present. The reader will obtain satisfactory answers to most of the questions (i)–(v).
70 – 79 The student shows a full grasp of the requirements of an Introduction. All five components have been clearly and concisely covered to an appropriate depth. The reader will obtain clear and comprehensive answers to questions (i)–(v).
80+ The student shows a full and substantial grasp of the requirements of an Introduction. All five components have been covered outstandingly well and in depth. The reader will obtain outstandingly clear and comprehensive answers to questions (i)–(v).

Technical background (10%)
Is there evidence of the use and interpretation of appropriate background
statistical/probabilistic concepts?
Is there evidence, if appropriate, of the use of a range of given reference material and
further reading at an appropriate level?
Has the student introduced, explained and correctly used technical terms and methods
appropriate to the topic?
0 – 29 Lack of evidence.
30 – 39 Rudimentary evidence, which should still go beyond that of a lay person.
40 – 49 Basic evidence.
50 – 59 Moderate evidence.
60 – 69 Significant evidence.
70 – 79 Substantial evidence.
80+ Exceptional evidence.
Depth and progress (20%)
Has the student made appropriate progress bearing in mind level, breadth and depth?
0 – 29 Insufficient progress.
30 – 39 Minimal but still detectable progress made.
40 – 49 Little progress made.
50 – 59 Moderate progress made.
60 – 69 Significant progress made.
70 – 79 Substantial progress made.
80+ Substantial progress made showing considerable insight and/or originality.
Understanding and interpretation (20%)
Is there evidence of logical development and selection of relevant material?
Does the student demonstrate understanding and interpretation of all aspects of the
material?
Is there evidence in the report that the student understands and can interpret what
they have done, by the use of suitable illustrative examples or other means?
0 – 29 Lack of evidence.
30 – 39 Rudimentary evidence, which should still go beyond that of a lay person.
40 – 49 Basic evidence.
50 – 59 Moderate evidence.
60 – 69 Significant evidence.
70 – 79 Substantial evidence.
80+ Exceptional evidence showing an outstanding level of understanding.
Development and results (20%)
Is there evidence of the use of specific statistical/probabilistic techniques and/or
computing/statistical/mathematical packages (as appropriate) to produce “results”?
Has the student obtained sufficient results bearing in mind the nature of the topic?
Have the aims of the project/dissertation been met?
Is there evidence that the student has done more than simply regurgitate reference
material?
0 – 29 Lack of evidence.
30 – 39 Rudimentary evidence.
40 – 49 Basic evidence.
50 – 59 Moderate evidence.
60 – 69 Significant evidence.
70 – 79 Substantial evidence.
80+ Exceptional evidence.
Overall mark
When marking projects/dissertations it is often useful to ask oneself what would have
to be done to raise the overall mark by, say, 5 marks, or into the next class. For
example, if the overall mark is 67, what could the student have done that would have
raised it to 70? The same applies to each category.
Another way of viewing the overall mark is through “buzzwords/phrases” that characterise
the different mark bands. In the table below, column (A) describes the student’s
understanding and development of the topic as shown by the report, column (B) what the
student shows evidence of in approaching the topic, and column (C) the mistakes.

Mark      (A) Understanding     (B) Evidence of         (C) Mistakes
90 – 100  Exemplary             Mastery                 No substantial errors and no minor errors
80 – 89   Authoritative         Depth and confidence    No substantial errors but occasional minor errors
70 – 79   Highly competent      Confidence              No substantial but possibly a few minor errors
60 – 69   Assured & competent   Significant skills      Perhaps one substantial error and/or several minor errors
50 – 59   Coherent & sound      Accurate skills         A few substantial errors as well as several minor errors
40 – 49   Basically sound       Basic skills            Some substantial errors or little progress
30 – 39   Unsound               Lack of understanding   Several substantial errors or little progress
0 – 29    Non-existent          Lack of work            Unsatisfactory progress
Student Initiative (marked by supervisor only)
The student initiative mark measures the student’s input beyond the supervisor’s
suggestions. It is marked by the supervisor only and rewards progress made beyond
those suggestions.
The marking should take into account that students are entitled to help from the
supervisor and strongly encouraged to discuss their progress with the supervisor.
Presenting ideas to the supervisor for comment should not count negatively in the
initiative mark.
Tasks where initiative may be shown normally include some of the following, as
appropriate for the topic:
• Formulate the questions under investigation.
• Find appropriate breadth and depth for the investigation.
• Adjust the project in light of progress made.
• Devise original examples or applications.
• Search for and study the literature.
• Find connections to related areas of statistics/probability and beyond.
• Formulate the conclusions.
• Develop a personal viewpoint on the subject.
0 – 39 Negligible initiative.
40 – 49 Attempts at initiative, but with limited success.
50 – 59 Some initiative, with moderate success.
60 – 69 Substantial initiative, mostly with success.
70 – 79 After the initial consultations, the student has produced successful work with little additional guidance.
80+ After the initial consultations, the student has produced successful work essentially fully independently.