MA9070 Simulation and Machine Learning Project 2022


1 Methods for Asian options (50% of project credit)

1.1 Overview

An Asian option is an option whose payoff is determined not by the underlying price at maturity, but by the average underlying price over some preset time interval. Asian options originated in Asian markets to prevent option traders from attempting to manipulate the price of the underlying on the exercise date.

There are a variety of Asian options. We will consider one with the following payoff:

max( (1/N) ∑_{n=1}^{N} Sn − K, 0 ),

where Sn are daily closing prices of the underlying and K is the fixed strike price. The option

corresponding to this payoff function is called a Fixed Strike Asian Call Option with Discrete

Arithmetic Average.

For the underlying process we will use the geometric Brownian motion

dSt = r St dt + σ(St, t) St dWt, (1)

allowing for the possibility that the volatility can depend on the current time t and current

value of the underlying asset St. We refer to this as the local volatility model.

The aim of Part 1 of the project is to price Asian options by Monte-Carlo simulations,

employing different variance reduction techniques.

1.2 Particulars

Unless otherwise specified, use the following parameters:

• The strike price is K = 110.

• The interest rate is r = 0.05.

• The local volatility is given by the function

σ(S, t) = σ0 (1 + σ1 cos(2πt)) (1 + σ2 exp(−S/50)), (2)

where σ0 = 0.2, σ1 = 0.3 and σ2 = 0.5. Time t is in years.

• Assume there are 260 (working) days in a year.

• Fix the number of sample paths to be N_paths = 1000.
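As a concrete starting point, the local volatility of Eq. (2) with the parameters above might be coded as follows. This is only a sketch; the function and argument names are our own choice, not prescribed by the brief.

```python
import numpy as np

def local_vol(S, t, sigma0=0.2, sigma1=0.3, sigma2=0.5):
    # Local volatility of Eq. (2): sigma0 * (1 + sigma1*cos(2*pi*t)) * (1 + sigma2*exp(-S/50)).
    # Vectorised over S so it can act on a whole array of simulated paths at once.
    return sigma0 * (1.0 + sigma1 * np.cos(2.0 * np.pi * t)) * (1.0 + sigma2 * np.exp(-S / 50.0))
```

Because the function is vectorised, it can be applied directly to the array of all path values at a given time step.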


1.3 Computational Tasks

• Programme the local volatility Eq. (2) in a Python function and then write separate functions to price an Asian option:

– without variance reduction (naive method);

– with antithetic variance reduction;

– with control variates (see below).

Use Euler time stepping with a time step of one day. Each function should return the option price and variance.

• After you have fully tested your code, compare the different methods you have imple-

mented. For this, fix the time to maturity (expiry) to 3 years, i.e., T = 3. Then price

the option for three values of the spot price S0 = S(t = 0):

S0 < K, S0 = K, S0 > K.

You are free to choose sensible values of S0 to give a good assessment of how the

methods are performing under different situations. For each method you have imple-

mented, evaluate the option at three values of S0. From the variances you can obtain

95% confidence intervals for each case.

• Write code to plot option price as a function of spot price over the range S0 = 10

to S0 = 180. You only need to plot the option price using the method that gives the

smallest variance.

• Using a method of your choice, programme a function to compute the delta for the

Asian option. You only need to implement one method, but ideally it should be a

method with small variance. Write code to plot the delta over the same range of spot

prices as the previous item.
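To illustrate the kind of pricer the tasks above call for, here is a minimal sketch of an Euler-stepped Monte-Carlo pricer with antithetic variance reduction, using the parameters of Sec. 1.2. All names are illustrative and this is not a complete solution to the tasks; the naive method is recovered by dropping the antithetic path.

```python
import numpy as np

def local_vol(S, t, s0=0.2, s1=0.3, s2=0.5):
    # Local volatility of Eq. (2).
    return s0 * (1 + s1 * np.cos(2 * np.pi * t)) * (1 + s2 * np.exp(-S / 50.0))

def asian_call_antithetic(S0, K=110.0, r=0.05, T=3.0, days=260, n_paths=1000, seed=0):
    # Euler time stepping with a one-day step; each +dW path is paired with a -dW path.
    rng = np.random.default_rng(seed)
    N, dt = int(T * days), 1.0 / days
    Sp = np.full(n_paths, float(S0))     # paths driven by +dW
    Sm = np.full(n_paths, float(S0))     # antithetic paths driven by -dW
    run_p = np.zeros(n_paths)
    run_m = np.zeros(n_paths)
    for n in range(N):
        t = n * dt
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        Sp += r * Sp * dt + local_vol(Sp, t) * Sp * dW
        Sm += r * Sm * dt - local_vol(Sm, t) * Sm * dW
        run_p += Sp
        run_m += Sm
    # One averaged sample per antithetic pair, discounted back to t = 0.
    pay = 0.5 * (np.maximum(run_p / N - K, 0.0) + np.maximum(run_m / N - K, 0.0))
    est = np.exp(-r * T) * pay
    return est.mean(), est.var(ddof=1) / n_paths
```

The variance returned is that of the pair-averaged estimator, which is what should be compared against the naive method.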

1.4 Report contents

See general discussion of report contents in Sec. 3 and 4. The report should follow the

structure of the computational tasks with the aim to produce a report that leads the reader

clearly through the tasks undertaken. The report should summarize the overall picture of how

the various reduction methods perform and the dependence of option prices and deltas on S0.

A few specific things to consider for this part of the project are:

• Your Python code should be commented so that it is clear how you have implemented

each variance reduction technique.

In addition, to make the report understandable independently of the Python code, you should include markdown cells that briefly state which variance reduction methods you have


implemented. You do not need to give an analysis of the variance reduction. These can

be short explanations of a few sentences.

• Report the results of your runs for different methods and different S0 in a clear

and understandable form. Discuss the benefits and/or disadvantages of the different

methods. Taking into account additional cost of variance reduction computations,

determine which method is the most efficient for this problem.

• The plots of option price and corresponding delta should be clear. You can be creative

here and plot prices and deltas for a few values of the time to maturity T to show

the evolution with time to maturity. You could also contrast Asian option prices with

their European counterparts. You should summarize and discuss your plots, possibly from a financial perspective.

1.5 Control variates

There are three possible control variates one can consider:

1. ZT,

2. e^{−rT} max(ZT − K, 0),

3. max( ( ∏_{n=0}^{N} Zn )^{1/(N+1)} − K, 0 ),

where Zt is governed by the geometric Brownian motion

dZt = r Zt dt + σ Zt dWt, (3)

where r and σ are constant.

The volatility in our model varies, but not too much, so one can expect that the discounted

payoff for the Asian option computed along a path of the local volatility model will be highly correlated with the payoff along a corresponding constant-volatility geometric Brownian

path. In practice, one simulates (3) alongside the simulation of (1). From these simulations,

the different control variates depending on Zt are available. A simple choice for σ is σ(S0, 0) (why?). Other choices are possible and might be better.
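As an illustration of control variate 1, the following sketch simulates (3) alongside (1) with the same Brownian increments and applies the standard regression adjustment. All names are illustrative; note that estimating the optimal coefficient b from the same samples introduces a small bias, and the known mean E[Z_T] = Z0 e^{rT} holds exactly for GBM while the Euler scheme matches it only to O(dt).

```python
import numpy as np

def asian_call_cv1(S0, K=110.0, r=0.05, T=3.0, days=260, n_paths=1000, seed=0):
    """Sketch of control variate 1: Z_T from a constant-volatility GBM driven by
    the same Brownian increments as the local-volatility path."""
    rng = np.random.default_rng(seed)
    N, dt = int(T * days), 1.0 / days

    def vol(S, t, s0=0.2, s1=0.3, s2=0.5):
        # Local volatility of Eq. (2).
        return s0 * (1 + s1 * np.cos(2 * np.pi * t)) * (1 + s2 * np.exp(-S / 50.0))

    sigma_c = vol(S0, 0.0)                    # the simple constant-vol choice sigma(S0, 0)
    S = np.full(n_paths, float(S0))
    Z = np.full(n_paths, float(S0))
    running = np.zeros(n_paths)
    for n in range(N):
        t = n * dt
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        S += r * S * dt + vol(S, t) * S * dW
        Z += r * Z * dt + sigma_c * Z * dW    # same dW couples the two processes
        running += S
    Y = np.exp(-r * T) * np.maximum(running / N - K, 0.0)
    EZ = S0 * np.exp(r * T)                   # known mean of Z_T under exact GBM
    b = np.cov(Y, Z)[0, 1] / np.var(Z, ddof=1)
    Ycv = Y - b * (Z - EZ)                    # regression-adjusted estimator
    return Ycv.mean(), Ycv.var(ddof=1) / n_paths
```

Control variates 2 and 3 follow the same pattern, with the payoff of Z substituted for Z_T and their expectations supplied by the Black-Scholes and geometric-average formulas.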

The first control variate is just the value of ZT at the final time, and hence has a known expectation (mean), just as was used for European options. The second is the discounted payoff for a European call option, and hence its expectation is given by the Black-Scholes formula. The final one is the discounted payoff for a geometrically averaged Asian option, for which there is also a formula for the expectation:

Z0 exp((rg − r)T) N(d1) − K exp(−rT) N(d2),


where

N(·) denotes the cumulative distribution function of the standard normal distribution,

σg = σ √( (2N+1) / (6(N+1)) ),

rg = (1/2) ( r − (1/2) σg^2 ),

d1 = [ log(Z0/K) + (rg + (1/2) σg^2) T ] / ( σg √T ),

d2 = d1 − σg √T.
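Transcribing the formula above directly gives the following sketch. The helper norm_cdf is our own (it uses the error function to avoid extra dependencies), and the argument names are illustrative.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function: N(x) = (1 + erf(x / sqrt(2))) / 2.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def geometric_asian_call(Z0, K, r, sigma, T, N):
    """Closed-form expectation of control variate 3, as given above."""
    sigma_g = sigma * sqrt((2 * N + 1) / (6.0 * (N + 1)))
    r_g = 0.5 * (r - 0.5 * sigma_g ** 2)
    d1 = (log(Z0 / K) + (r_g + 0.5 * sigma_g ** 2) * T) / (sigma_g * sqrt(T))
    d2 = d1 - sigma_g * sqrt(T)
    return Z0 * exp((r_g - r) * T) * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
```

With this expectation in hand, control variate 3 uses the simulated geometric-average payoff in the same regression adjustment as the other control variates.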

This part of the project is challenging. You might not succeed at correctly implementing

all three methods. It is strongly recommended that you focus on control variate 1. Only after

you have completed other parts of the project should you attempt the other control variates.

2 Machine Learning: Credit Approval Data (50% of project

credit)

2.1 Overview

A popular use of machine learning is predicting credit risk. This talk [1] by Soledad Galli

of Zopa provides an excellent overview of the steps and procedures involved in an actual

deployment. While it would be far too much to attack all these steps in this project, we will

consider a limited set of tasks using a pre-processed dataset for credit card approvals.

The aim of Part 2 of the project is to train, test, and evaluate the performance of different

classifiers in predicting credit card approval.

2.2 Particulars

A popular dataset used to examine machine learning classifiers is the Australian Credit

Approval Dataset [2] hosted on the UC Irvine Machine Learning Repository. "This file concerns

credit card applications. All attribute names and values have been changed to meaningless

symbols to protect confidentiality of the data. This dataset is interesting because there is a

good mix of attributes – continuous, nominal with small numbers of values, and nominal

with larger numbers of values. There are also a few missing values."

We consider the dataset with all categorical values replaced by numerical values. Missing

values have been replaced by the mode of the attribute (categorical values) or by the mean of

[1] https://www.youtube.com/watch?v=KHGGlozsRtA&ab_channel=PyData

[2] https://archive-beta.ics.uci.edu/ml/datasets/statlog+australian+credit+approval


the attribute (continuous values). The dataset contains 690 examples. The credit card approval

information is contained in the last column and encoded as 0 for “not approved” and +1 for

“approved”. This column is the label vector. The remaining columns contain the features. The

dataset will be posted on my.wbs as a comma-separated-values file: australian.csv along

with a description of the dataset.

Scikit-learn will be used for all machine learning tasks. Pandas and seaborn will be useful

for importing, inspecting and visualizing the data.
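Reading and splitting the data might look like the following sketch. A small inline CSV stands in for australian.csv so the fragment is self-contained; the column layout (label in the last column) follows the description above, and header=None is an assumption about the posted file.

```python
import io
import pandas as pd

# Stand-in for australian.csv: three rows, label in the last column as described.
csv_text = "2.1,0.5,1\n1.3,0.7,0\n3.3,0.2,1\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)   # for the project: pd.read_csv("australian.csv", header=None)
print(df.describe())                                    # quick numerical summary of each column
X = df.iloc[:, :-1].to_numpy()                          # design matrix: all but the last column
y = df.iloc[:, -1].to_numpy()                           # label vector: the last column
```

Seaborn plots (e.g. pairplots or countplots by label) can then be built directly from df.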

2.3 Tasks

• Using pandas, read the dataset and verify that it is sensible. Using pandas and/or

seaborn provide a summary of the dataset. (See "Report contents" below.)

• Extract the design matrix X and vector of labels y from the data. Create a train-test

split. Scale the data appropriately.

• Perform a sanity check of the training data by running a cross validation score of the

SVC classifier with default parameters. Report the mean cross validation score. This

will provide a baseline score of what one can expect from a basic classifier without

any tuning of hyperparameters.

• Now consider the linear and rbf kernels for the SVC classifier and tune the hyperparameters for the two kernels. Standard tuning of hyperparameters would mean tuning the regularisation parameter C for the linear kernel and C and the scale parameter gamma

for the rbf kernel. You do not need to tune more than these hyperparameters, although

you may consider more if they do not require a large amount of computer time to tune.

Based on mean cross validation scores, decide final hyperparameter values for the two

kernels.

• Test and compare the two classifiers using the tuned hyperparameters. (See "Report

contents" for suggestions on what you might compare.)

• Now consider other classifiers from the scikit-learn library. You must consider the

MLP classifier but should in addition consider the Decision Tree and Random Forest

Classifiers.

For the MLP classifier, you should investigate tuning the hidden layers, but this can

result in large computation times, and you should not leave code in the notebook that

would take long run times. (See "Report contents" below.)

For the Decision Tree, Random Forest, or any other classifiers that you investigate,

you may briefly investigate different hyperparameters.

• Finally, it is possible to investigate which features are most important in determining

the classification. You are encouraged to investigate this. One useful approach is to look at permutation_importance in the scikit-learn library. Also, if you run

a Decision Tree with a small depth and output the tree, you can see what features

are important. You might want to use seaborn to visualize the connection between

important features and the label.
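The baseline check and hyperparameter tuning steps above can be sketched as follows. A synthetic stand-in dataset is used so the fragment runs on its own; with the real data, X and y come from australian.csv, and the parameter grids shown are illustrative choices, not prescribed values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the credit data (690 rows, 14 features).
X, y = make_classification(n_samples=690, n_features=14, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)   # fit the scaler on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Sanity check: mean cross validation score of a default SVC.
baseline = cross_val_score(SVC(), X_train_s, y_train, cv=5).mean()

# Tune C and gamma for the rbf kernel by grid search (linear kernel: tune C only).
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]},
                    cv=5)
grid.fit(X_train_s, y_train)
print(baseline, grid.best_params_, grid.best_score_)
```

The mean cross validation scores stored in grid.cv_results_ can be tabulated or plotted to justify the final hyperparameter choice in the report.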

2.4 Report contents

See general discussion of report contents in Sec. 3 and 4. The report should follow the

structure of the computational tasks with the aim to produce a report that leads the reader

clearly through the tasks undertaken and then summarizes the overall picture of how the

various classifiers perform and possibly connects this with the structure of the dataset.

A few specific points to consider are:

• After reading the dataset, you need to briefly summarize its contents to the reader

using pandas and/or seaborn. At a minimum you want to use the .describe() method,

but ideally you should include some useful plots.

• The Python code for tuning the hyperparameters for the SVC classifier should be

included in your submission. Make sure you explain, or print, or plot results from the

tuning of hyperparameters so that the final choice is clear from reading the report.

While you are strongly encouraged to investigate different choices of hidden layers in the MLP classifier, there are too many possibilities here for you to include Python code for this tuning in your submission. You should briefly summarize in words in the report what you tried. The Python code should contain only the final MLP that you decided on. (Other code can stay in as long as it is commented out and does not execute

when the notebook is run.)

For any other classifiers you run, please be succinct.

• When evaluating classifiers, you will surely want to generate confusion matrices

and classification reports. Since the goal is to predict credit card approval, false

positives (incorrectly predicting 1) are considered worse than false negatives (incorrectly

predicting 0). This means that the precision of predicting 1 and the recall (sensitivity)

of predicting 0 are especially important.

The complexity of models is also something that can be discussed when comparing

classifiers. This is a relatively small dataset and so there is some danger of overfitting.
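The two quantities singled out above can be read off with scikit-learn's metrics. The label arrays here are hypothetical stand-ins; with real classifiers, y_pred would come from clf.predict on the scaled test set.

```python
from sklearn.metrics import (classification_report, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical test labels and predictions, for illustration only.
y_test = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

print(confusion_matrix(y_test, y_pred))   # rows: true class, columns: predicted class
print(classification_report(y_test, y_pred))

# The two quantities emphasised above:
p1 = precision_score(y_test, y_pred, pos_label=1)   # precision of predicting "approved"
r0 = recall_score(y_test, y_pred, pos_label=0)      # recall of predicting "not approved"
print(p1, r0)
```

Comparing these two numbers across classifiers gives a direct view of how each handles the costlier error of approving a bad applicant.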


3 Report Notebooks

Your project work will be reported in two separate JupyterLab notebooks, one for each part

of the project. Each notebook should run without errors and produce your report.

Each notebook should begin with a concise introduction. These can typically be one or

at most two paragraphs and should describe what the notebook contains and/or give some

motivation to the work.

You should:

• Use section headings and possibly horizontal lines to give your report structure.

• Explain to the reader the purpose or goal of each section. Be brief, focusing on what is

being done and why.

• Python code should be commented. You want to communicate concisely at the top of

code cells what task is being performed in the cell. You also need to include comments

for blocks of code that compute specific tasks. You should assume that the reader

understands Python. Do not comment line-by-line what is obvious.

• Clearly label all plots!

• Explain parameter choices you have made. Describe and interpret your results. It is

important that you interpret your findings. Findings will often be in the form of a

plot. End individual sections and/or whole notebooks with a brief summary of your

findings.

A very useful guide to constructing a clear notebook is the following. Run the notebook

and then collapse all code cells. The introduction, results, plots, and any discussion should

be readable as a short report.

Further points:

• There is no specific guidance for length other than to include all the material in the descriptions above. It is better to produce a shorter report that clearly and concisely addresses

all the required points.

– Do Not include numerous non-illustrative plots.

– Do Not explain the Python code line-by-line.

– Do Not include irrelevant material and discussion.

• In developing and testing your codes you will surely need some Python code that does

not belong in your final report. This is normal. However, such things should not be

included in your submitted report. A useful way to approach this is to leave all code in

place until you have finalised your work. Then remove any code cells unnecessary to

the final report.


• It is not necessary to include citations in your report to numerical methods or to

example Python code covered in the module lectures and labs. You are permitted to

use sections of code directly from the examples in the scikit-learn documentation or

Users Guide. If you do, include a simple comment line in the code saying where the

code is from. For example:

# This follows the examples section of the

# sklearn.svm.SVC documentation.

In the unlikely event that you use methods or Python code examples not covered in the

module, then you must cite the source.

• Write in passive voice or use the editorial we (as in “We see that ...”). Do not use

contractions, e.g., “don’t”, “haven’t”, etc.

4 Further details

4.1 Marks

Marks will be awarded for the project in line with the Generic WBS Marking Criteria with Technical Capability, found at the bottom of this page [3]. Specifically, the criteria are:

• Technical Capability [40%]. This includes using appropriate and correct methods

and algorithms, implementing correct Python coding, and using appropriate external

libraries. Correctly completing all tasks is of primary importance.

• Academic Writing [20%]. Results should not only be accurate, but they must also be

presented in a clear, structured, and understandable form. Plots and other outputs must

be labelled and described. The use of relevant literature, referencing, and citation are

not normally significant factors for the project assessment.

• Analysis and Critical evaluation [20%]. WBS considers these to be separate criteria, but

we will consider this to be a single criterion. Results must be interpreted. Justification

must be given for the various choices made in the project work. Both parts of the

project should contain a concise introduction and concise and informative discussion

of the findings.

• Comprehension [20%]. Showing deep knowledge & understanding of the subject

matter and its context. Originality will also be assessed here.

As already emphasised, satisfying these criteria does not require lengthy reports.

[3] https://my.wbs.ac.uk/-/teaching/216161/resources/in/870142/item/690223/


4.2 Project Submission

• The project must be submitted electronically through my.wbs.

• The submission will consist of a single zip file named uxxxxxxx.zip, where xxxxxxx

are the digits of your University ID. The zip file will contain two Jupyter notebooks,

plus any modules and data needed to run the notebooks, e.g., you should include the australian.csv file.

• The marker should be able to unzip your submission and run each notebook without

error and without any additional input or files.

• Important: before submitting, you should restart the kernel and run all cells in

each notebook. You should then save the notebooks in the run state. This way

your submission contains two notebooks exactly in the state that you last ran

them.

• It is the student’s responsibility to ensure that the zip file is not corrupt.

• Marks will be deducted for not following these procedures.

4.3 Rules and Regulations

This project is to be completed by individuals only and is not a group exercise. Plagiarism is

taken extremely seriously and any student found to be guilty of plagiarism of fellow students

will be severely punished.

4.3.1 Plagiarism

Please ensure that any work submitted by you for assessment has been correctly referenced

as WBS expects all students to demonstrate the highest standards of academic integrity at all

times and treats all cases of poor academic practice and suspected plagiarism very seriously.

You can find information on these matters on my.wbs, in your student handbook and on the

library pages here.

It is important to note that it is not permissible to reuse work which has already been

submitted for credit either at WBS or at another institution (unless explicitly told that you

can do so). This would be considered self-plagiarism and could result in significant mark

reductions.

Upon submission of your assignment, you will be asked to sign a plagiarism declaration.

