
# MA9070 Simulation and Machine Learning Project 2022
## 1 Methods for Asian options (50% of project credit)
### 1.1 Overview
An Asian option is an option where the payoff is not determined by the underlying price at
maturity, but by the average underlying price over some preset time interval. Asian options
originated in Asian markets to prevent option traders from attempting to manipulate the
price of the underlying on the exercise date.
There are a variety of Asian options. We will consider one with the following payoff:
$$\max\!\left(\frac{1}{N}\sum_{n=1}^{N} S_n - K,\; 0\right),$$
where $S_n$ are the daily closing prices of the underlying and $K$ is the fixed strike price. The option
corresponding to this payoff function is called a Fixed Strike Asian Call Option with Discrete
Arithmetic Average.
For the underlying process we will use the geometric Brownian motion
$$dS_t = r S_t\,dt + \sigma(S_t, t)\, S_t\,dW_t, \qquad (1)$$
allowing for the possibility that the volatility can depend on the current time $t$ and the current
value of the underlying asset $S_t$. We refer to this as the local volatility model.
The aim of Part 1 of the project is to price Asian options by Monte-Carlo simulations,
employing different variance reduction techniques.
### 1.2 Particulars
Unless otherwise specified, use the following parameters:
• The strike price is K = 110.
• The interest rate is r = 0.05.
• The local volatility is given by the function
$$\sigma(S, t) = \sigma_0\left(1 + \sigma_1 \cos(2\pi t)\right)\left(1 + \sigma_2 \exp(-S/50)\right), \qquad (2)$$
where $\sigma_0 = 0.2$, $\sigma_1 = 0.3$ and $\sigma_2 = 0.5$. Time $t$ is in years.
• Assume there are 260 (working) days in a year.
• Fix the number of sample paths to be N_paths = 1000.
• Programme the local volatility Eq. (2) in a Python function and then write separate
functions to price an Asian option:
– without variance reduction (naive method);
– with antithetic variance reduction;
– with control variates (see below).
Use Euler time stepping with a time step of one day. Each function should return
the option price and variance.
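As one illustration, the local volatility of Eq. (2) and the naive Monte Carlo pricer might be sketched as follows. The function names and interface are illustrative assumptions, not part of the project specification:

```python
import numpy as np

def local_vol(S, t, sigma0=0.2, sigma1=0.3, sigma2=0.5):
    """Local volatility of Eq. (2); t is in years."""
    return sigma0 * (1 + sigma1 * np.cos(2 * np.pi * t)) * (1 + sigma2 * np.exp(-S / 50.0))

def asian_call_naive(S0, K=110.0, r=0.05, T=3.0, days_per_year=260,
                     n_paths=1000, rng=None):
    """Naive Monte Carlo price of the fixed-strike arithmetic Asian call,
    Euler stepping Eq. (1) with a time step of one day.
    Returns (price, sample variance of the discounted payoff)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T * days_per_year))
    dt = 1.0 / days_per_year
    S = np.full(n_paths, float(S0))
    running_sum = np.zeros(n_paths)
    for n in range(n_steps):
        t = n * dt
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        S = S + r * S * dt + local_vol(S, t) * S * dW  # Euler step for Eq. (1)
        running_sum += S                               # daily closing prices enter the average
    payoff = np.maximum(running_sum / n_steps - K, 0.0)
    discounted = np.exp(-r * T) * payoff
    return discounted.mean(), discounted.var(ddof=1)
```

The antithetic and control variate versions would reuse the same Euler loop, so it is worth factoring the path simulation into its own function.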
• After you have fully tested your code, compare the different methods you have imple-
mented. For this, fix the time to maturity (expiry) to 3 years, i.e., T = 3. Then price
the option for three values of the spot price $S_0 = S(t = 0)$:
$$S_0 < K, \qquad S_0 = K, \qquad S_0 > K.$$
You are free to choose sensible values of S0 to give a good assessment of how the
methods are performing under different situations. For each method you have imple-
mented, evaluate the option at three values of S0. From the variances you can obtain
95% confidence intervals for each case.
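Given the returned price and sample variance, the 95% confidence interval follows from the central limit theorem. A small illustrative helper:

```python
import math

def conf_interval_95(price, var, n_paths):
    """95% confidence interval for a Monte Carlo price, given the sample
    variance of the discounted payoff and the number of paths."""
    half_width = 1.96 * math.sqrt(var / n_paths)
    return price - half_width, price + half_width
```

For example, `conf_interval_95(10.0, 4.0, 400)` returns roughly `(9.804, 10.196)`.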
• Write code to plot option price as a function of spot price over the range S0 = 10
to S0 = 180. You only need to plot the option price using the method that gives the
smallest variance.
• Using a method of your choice, programme a function to compute the delta for the
Asian option. You only need to implement one method, but ideally it should be a
method with small variance. Write code to plot the delta over the same range of spot
prices as the previous item.
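One possible method is a central finite difference with common random numbers. The sketch below assumes a pricer with the signature `pricer(S0, rng=...)` returning `(price, variance)`; this interface is an illustrative convention, not prescribed by the project:

```python
import numpy as np

def delta_fd_crn(pricer, S0, h=1e-2, seed=0):
    """Central-difference delta with common random numbers: both bumped
    prices are computed with the same seed, so most of the Monte Carlo
    noise cancels in the difference."""
    up, _ = pricer(S0 + h, rng=np.random.default_rng(seed))
    down, _ = pricer(S0 - h, rng=np.random.default_rng(seed))
    return (up - down) / (2.0 * h)
```

Reusing the same random numbers for both bumped spots is what keeps the variance of the difference small; independent seeds would make the estimator far noisier.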
### 1.4 Report contents
See the general discussion of report contents in Secs. 3 and 4. The report should lead the reader
clearly through the tasks undertaken. It should summarize the overall picture of how
the various variance reduction methods perform and the dependence of option prices and deltas on $S_0$.
A few specific things to consider for this part of the project are:
• Your Python code should be commented so that it is clear how you have implemented
each variance reduction technique.
In addition, to make the report understandable independently of the Python code, you
should include markdown cells that briefly state which variance reduction methods you have
implemented. You do not need to give an analysis of the variance reduction; these can
be short explanations of a few sentences.
• Report the results of your runs for different methods and different S0 in a clear
and understandable form. Discuss the benefits and/or disadvantages of the different
methods. Taking into account the additional cost of the variance reduction computations,
determine which method is the most efficient for this problem.
• The plots of option price and corresponding delta should be clear. You can be creative
here and plot prices and deltas for a few values of the time to maturity T to show
the evolution with time to maturity. You could also contrast Asian option prices with
the European counterparts. You should summarize and discuss your plots, possibly
including comments from a financial perspective.
### 1.5 Control variates
There are three possible control variates one can consider:

1. $Z_T$,
2. $e^{-rT}\max(Z_T - K,\, 0)$,
3. $\max\!\left(\left(\prod_{n=0}^{N} Z_n\right)^{1/(N+1)} - K,\; 0\right)$,

where $Z_t$ is governed by the geometric Brownian motion
$$dZ_t = r Z_t\,dt + \sigma Z_t\,dW_t, \qquad (3)$$
where $r$ and $\sigma$ are constant.
The volatility in our model varies, but not too much, so one can expect that the discounted
payoff for the Asian option computed along a geometric Brownian path in the local volatility
model will be highly correlated with a corresponding constant-volatility geometric Brownian
path. In practice, one simulates (3) alongside the simulation of (1). From these simulations,
the different control variates depending on Zt are available. A simple choice for σ is σ(S0,0),
(why?). Other choices are possible and might be better.
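For control variate 1 the mean is known exactly, $E[Z_T] = Z_0 e^{rT}$, so the adjustment can be sketched generically. The helper below is illustrative, assuming arrays of simulated discounted payoffs `Y` and control samples `Z`:

```python
import numpy as np

def control_variate_adjust(Y, Z, EZ):
    """Control-variate estimate: Y are simulated discounted payoffs, Z the
    control samples, and EZ the exact mean of Z (for control variate 1,
    EZ = Z0 * exp(r * T)).  Returns (adjusted price, adjusted variance)."""
    c = np.cov(Y, Z)[0, 1] / np.var(Z, ddof=1)  # estimated optimal coefficient
    Y_cv = Y - c * (Z - EZ)
    return Y_cv.mean(), Y_cv.var(ddof=1)
```

The variance reduction achieved is governed by the correlation between `Y` and `Z`, which is why simulating (3) with the same Brownian increments as (1) matters.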
The first control variate is just the value of $Z_T$ at the final time, and hence has a known
expectation (mean), just as was used for European options. The second is the discounted
payoff for a European call option, and hence its expectation is given by the Black–Scholes
formula. The final one is the discounted payoff for a geometrically averaged Asian option, for
which there is also a formula for the expectation:
$$Z_0 \exp((r_g - r)T)\,N(d_1) - K\exp(-rT)\,N(d_2),$$

where $N(\cdot)$ denotes the cumulative distribution function of the standard normal distribution and

$$\sigma_g = \sigma\sqrt{\frac{2N+1}{6(N+1)}}, \qquad r_g = \frac{1}{2}\left(r - \frac{1}{2}\sigma_g^2\right),$$

$$d_1 = \frac{\log(Z_0/K) + \left(r_g + \frac{1}{2}\sigma_g^2\right)T}{\sigma_g\sqrt{T}}, \qquad d_2 = d_1 - \sigma_g\sqrt{T}.$$
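These formulas can be collected into a small function. The sketch below is illustrative and uses only the standard library (the normal CDF via `math.erf`):

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def geometric_asian_call(Z0, K, r, sigma, T, N):
    """Closed-form price of the discretely geometrically averaged Asian call,
    using the sigma_g, r_g, d1 and d2 defined above."""
    sigma_g = sigma * math.sqrt((2 * N + 1) / (6.0 * (N + 1)))
    r_g = 0.5 * (r - 0.5 * sigma_g ** 2)
    d1 = (math.log(Z0 / K) + (r_g + 0.5 * sigma_g ** 2) * T) / (sigma_g * math.sqrt(T))
    d2 = d1 - sigma_g * math.sqrt(T)
    return Z0 * math.exp((r_g - r) * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
```

This exact value serves as the known expectation for control variate 3.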
This part of the project is challenging. You might not succeed at correctly implementing
all three methods. It is strongly recommended that you focus on control variate 1. Only after
you have completed other parts of the project should you attempt the other control variates.
## 2 Machine Learning: Credit Approval Data (50% of project credit)
### 2.1 Overview
A popular use of machine learning is predicting credit risk. This talk1 by Soledad Galli
of Zopa provides an excellent overview of the steps and procedures involved in an actual
deployment. While it would be far too much to attack all these steps in this project, we will
consider a limited set of tasks using a pre-processed dataset for credit card approvals.
The aim of Part 2 of the project is to train, test, and evaluate the performance of different
classifiers in predicting credit card approval.
### 2.2 Particulars
A popular dataset used to examine machine learning classifiers is the Australian Credit
Approval Dataset2 hosted on the UC Irvine Machine Learning Repository. "This file concerns
credit card applications. All attribute names and values have been changed to meaningless
symbols to protect confidentiality of the data. This dataset is interesting because there is a
good mix of attributes – continuous, nominal with small numbers of values, and nominal
with larger numbers of values. There are also a few missing values."
We consider the dataset with all categorical values replaced by numerical values. Missing
values have been replaced by the mode of the attribute (categorical values) or by the mean of
the attribute (continuous values). The dataset contains 690 examples. The credit card approval
information is contained in the last column and encoded as 0 for “not approved” and +1 for
“approved”. This column is the label vector. The remaining columns contain the features. The
dataset will be posted on my.wbs as a comma-separated-values file, australian.csv, along
with a description of the dataset.

2: https://archive-beta.ics.uci.edu/ml/datasets/statlog+australian+credit+approval
Scikit-learn will be used for all machine learning tasks. Pandas and seaborn will be useful
for importing, inspecting and visualizing the data.
• Using pandas, read the dataset and verify that it is sensible. Using pandas and/or
seaborn provide a summary of the dataset. (See "Report contents" below.)
• Extract the design matrix X and vector of labels y from the data. Create a train-test
split. Scale the data appropriately.
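A sketch of this step follows; the helper name is illustrative, and the assumption that the label sits in the last column follows the dataset description above:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def prepare_data(df, test_size=0.2, random_state=0):
    """Split a credit-approval DataFrame (label assumed in the last column)
    into scaled train/test arrays.  The scaler is fitted on the training
    split only, so no information leaks from the test set."""
    X = df.iloc[:, :-1].to_numpy(dtype=float)
    y = df.iloc[:, -1].to_numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    scaler = StandardScaler().fit(X_tr)
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te

# In the project one would call, adjusting header handling to the file:
# X_tr, X_te, y_tr, y_te = prepare_data(pd.read_csv("australian.csv"))
```

Stratifying on `y` keeps the approved/not-approved proportions similar in both splits, which matters for a dataset of this size.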
• Perform a sanity check of the training data by running a cross validation score of the
SVC classifier with default parameters. Report the mean cross validation score. This
will provide a baseline score of what one can expect from a basic classifier without
any tuning of hyperparameters.
• Now consider the linear and rbf kernels for the SVC classifier and tune the hyperpa-
rameters for the two kernels. Standard tuning of hyperparameters would mean tuning
regularisation parameter C for the linear kernel and C and the scale parameter gamma
for the rbf kernel. You do not need to tune more than these hyperparameters, although
you may consider more if they do not require a large amount of computer time to tune.
Based on mean cross validation scores, decide final hyperparameter values for the two
kernels.
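The baseline score and the standard tuning can be sketched with `cross_val_score` and `GridSearchCV`. Synthetic stand-in data is used here so the fragment is self-contained, and the grids shown are illustrative choices only:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data; in the project, use the scaled training split of australian.csv.
X, y = make_classification(n_samples=400, n_features=14, random_state=0)
X = StandardScaler().fit_transform(X)

# Baseline: untuned SVC, mean 5-fold cross-validation score.
baseline = cross_val_score(SVC(), X, y, cv=5).mean()

# Illustrative grids; wider or log-spaced grids may be appropriate.
grid_lin = GridSearchCV(SVC(kernel="linear"),
                        {"C": [0.01, 0.1, 1, 10, 100]}, cv=5).fit(X, y)
grid_rbf = GridSearchCV(SVC(kernel="rbf"),
                        {"C": [0.1, 1, 10, 100],
                         "gamma": [0.001, 0.01, 0.1, 1]}, cv=5).fit(X, y)

print("baseline:", baseline)
print("linear:", grid_lin.best_params_, grid_lin.best_score_)
print("rbf:", grid_rbf.best_params_, grid_rbf.best_score_)
```

Comparing `best_score_` against `baseline` shows directly what the tuning has bought.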
• Test and compare the two classifiers using the tuned hyperparameters. (See "Report
contents" for suggestions on what you might compare.)
• Now consider other classifiers from the scikit-learn library. You must consider the
MLP classifier but should in addition consider the Decision Tree and Random Forest
Classifiers.
For the MLP classifier, you should investigate tuning the hidden layers, but this can
result in large computation times, and you should not leave code in the notebook that
would take long run times. (See "Report contents" below.)
For the Decision Tree, Random Forest, or any other classifiers that you investigate,
you may briefly investigate different hyperparameters.
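An illustrative fragment for trying a few hidden-layer layouts offline (again on synthetic stand-in data); only the final chosen MLP would remain in the submitted notebook:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Stand-in data; in the project, use the scaled training split of australian.csv.
X, y = make_classification(n_samples=400, n_features=14, random_state=0)

# Compare a few candidate hidden-layer layouts; the layouts are illustrative.
for layers in [(20,), (50,), (20, 10)]:
    score = cross_val_score(
        MLPClassifier(hidden_layer_sizes=layers, max_iter=2000, random_state=0),
        X, y, cv=3).mean()
    print(layers, round(score, 3))
```

A summary of such runs in a markdown cell, rather than the loop itself, is what the report should contain.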
• Finally, it is possible to investigate which features are most important in determining
the classification. You are encouraged to investigate this. A few useful approaches
are to look at permutation_importance in the scikit-learn library. Also, if you run
a Decision Tree with a small depth and output the tree, you can see what features
are important. You might want to use seaborn to visualize the connection between
important features and the label.
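The permutation-importance approach can be sketched as follows (synthetic stand-in data; in the project one would pass the fitted classifier and the test split):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Stand-in data with a few genuinely informative features.
X, y = make_classification(n_samples=400, n_features=14,
                           n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure how much the score drops.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
ranking = result.importances_mean.argsort()[::-1]
print("features by importance:", ranking[:5])
```

The features whose permutation degrades the score most are the ones the classifier actually relies on.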
### 2.4 Report contents
See the general discussion of report contents in Secs. 3 and 4. The report should lead the reader
clearly through the tasks undertaken and then summarize the overall picture of how the
various classifiers perform, possibly connecting this with the structure of the dataset.
A few specific points to consider are:
• After reading the dataset, you need to briefly summarize its contents to the reader
using pandas and/or seaborn. At a minimum you want to use the .describe() method,
but ideally you should include some useful plots.
• The Python code for tuning the hyperparameters for the SVC classifier should be
included in your submission. Make sure you explain, print, or plot results from the
tuning of hyperparameters so that the final choice is clear from reading the report.
While you are strongly encouraged to investigate different choices for the hidden layers
in the MLP classifier, there are too many possibilities here for you to include Python
code for this tuning in your submission. You should briefly summarize in words in
the report what you tried. The Python code should contain only the final MLP that you
decided on. (Other code can stay in as long as it is commented out and does not execute
when the notebook is run.)
For any other classifiers you run, please be succinct.
• When evaluating classifiers, you will surely want to generate confusion matrices
and classification reports. Since the goal is to predict credit card approval, false
positives (incorrectly predicting 1) are considered worse than false negatives (incorrectly
predicting 0). This means that the precision of predicting 1 and the recall (sensitivity)
of predicting 0 are especially important.
The complexity of models is also something that can be discussed when comparing
classifiers. This is a relatively small dataset and so there is some danger of overfitting.
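The two quantities highlighted above can be read off directly with scikit-learn. A small illustrative fragment, with hand-made arrays standing in for `y_test` and the classifier's test-set predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Illustrative labels and predictions; in the project these come from the test split.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])

print(confusion_matrix(y_true, y_pred))
# Precision of predicting 1 (how trustworthy an "approved" prediction is)
# and recall of predicting 0 (how many true rejections are caught):
print(precision_score(y_true, y_pred, pos_label=1))
print(recall_score(y_true, y_pred, pos_label=0))
```

`classification_report` prints both quantities for each class in one table.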
## 3 Report Notebooks
Your project work will be reported in two separate JupyterLab notebooks, one for each part
of the project. Each notebook should run without errors and produce your report.
Each notebook should begin with a concise introduction. These can typically be one or
at most two paragraphs and should describe what the notebook contains and/or give some
motivation to the work.
You should:
• Use section headings and possibly horizontal lines to give your report structure.
• Explain to the reader the purpose or goal of each section. Be brief, focusing on what is
being done and why.
• Python code should be commented. You want to communicate concisely at the top of
code cells what task is being performed in the cell. You also need to include comments
for blocks of code that perform specific tasks. You should assume that the reader
understands Python. Do not comment line-by-line what is obvious.
• Clearly label all plots!
• Explain parameter choices you have made. Describe and interpret your results. It is
important that you interpret your findings. Findings will often be in the form of a
plot. End individual sections and/or whole notebooks with a brief summary of your
findings.
A very useful guide to constructing a clear notebook is the following. Run the notebook
and then collapse all code cells. The introduction, results, plots, and any discussion should
be readable as a short report.
Further points:
• There is no specific guidance for length other than to include all the material in the
descriptions above. It is better to produce a shorter report that clearly and concisely
addresses all the required points.
– Do Not include numerous non-illustrative plots.
– Do Not explain the Python code line-by-line.
– Do Not include irrelevant material and discussion.
• In developing and testing your codes you will surely need some Python code that does
not belong in your final report. This is normal. However, such things should not be
included in your submitted report. A useful way to approach this is to leave all code in
place until you have finalised your work. Then remove any code cells unnecessary to
the final report.
• It is not necessary to include citations in your report to numerical methods or to
example Python code covered in the module lectures and labs. You are permitted to
use sections of code directly from the examples in the scikit-learn documentation or
Users Guide. If you do, include a simple comment line in the code saying where the
code is from. For example:
# This follows the examples section of the
# sklearn.svm.SVC documentation.
In the unlikely event that you use methods or Python code examples not covered in the
module, then you must cite the source.
• Write in the passive voice or use the editorial we (as in “We see that ...”). Do not use
contractions, e.g., “don’t”, “haven’t”, etc.
## 4 Further details
### 4.1 Marks
Marks will be awarded for the project in line with the Generic WBS Marking criteria
with technical capability, found at the bottom of this page3. Specifically, the criteria are:
• Technical Capability [40%]. This includes using appropriate and correct methods
and algorithms, implementing correct Python coding, and using appropriate external
libraries. Correctly completing all tasks is of primary importance.
• Academic Writing [20%]. Results should not only be accurate, but they must also be
presented in a clear, structured, and understandable form. Plots and other outputs must
be labelled and described. The use of relevant literature; referencing and citation are
not normally significant factors for the project assessment.
• Analysis and Critical evaluation [20%]. WBS considers these to be separate criteria, but
we will consider this to be a single criterion. Results must be interpreted. Justification
must be given for the various choices made in the project work. Both parts of the
project should contain a concise introduction and concise and informative discussion
of the findings.
• Comprehension [20%]. Showing deep knowledge & understanding of the subject
matter and its context. Originality will also be assessed here.
As already emphasised, satisfying these criteria does not require lengthy reports.
3: https://my.wbs.ac.uk/-/teaching/216161/resources/in/870142/item/690223/
### 4.2 Project Submission
• The project must be submitted electronically through my.wbs.
• The submission will consist of a single zip file named uxxxxxxx.zip, where xxxxxxx
are the digits of your University ID. The zip file will contain two Jupyter notebooks,
plus any modules and data needed to run the notebooks; e.g., you should include the
australian.csv file.
• The marker should be able to unzip your submission and run each notebook without
error and without any additional input or files.
• Important: before submitting, you should restart the kernel and run all cells in
each notebook. You should then save the notebooks in the run state. This way
your submission contains two notebooks exactly in the state that you last ran
them.
• It is the student’s responsibility to ensure that the zip file is not corrupt.
• Marks will be deducted for not following these procedures.
### 4.3 Rules and Regulations
This project is to be completed by individuals only and is not a group exercise. Plagiarism is
taken extremely seriously, and any student found to have plagiarised the work of fellow
students will be severely punished.
#### 4.3.1 Plagiarism
Please ensure that any work submitted by you for assessment has been correctly referenced
as WBS expects all students to demonstrate the highest standards of academic integrity at all
times and treats all cases of poor academic practice and suspected plagiarism very seriously.
You can find information on these matters on my.wbs, in your student handbook and on the
library pages here.
It is important to note that it is not permissible to reuse work which has already been
submitted for credit either at WBS or at another institution (unless explicitly told that you
can do so). This would be considered self-plagiarism and could result in significant mark
reductions.
Upon submission of your assignment, you will be asked to sign a plagiarism declaration.