MM923 Regression Modelling Project

1. Project Overview

In this project you will analyse and explore a large data set based on the daily closing stock prices of 28 large companies. In particular, you will build and assess linear regression models that explain variability in the daily stock price of BAE Systems using stock price data from the other companies.

You will be expected to use your own judgement, as well as the course content, to develop a predictive model for the BAE Systems stock price.

2. Getting Started

The dataset “lse” contains the closing share prices for BAE Systems (formerly British Aerospace) and 27 other companies in the FTSE (Financial Times Stock Exchange) 100 Index. The FTSE 100 Index lists the share prices of the 100 companies on the London Stock Exchange with the highest market capitalisation. Market capitalisation is a company’s market value, calculated by multiplying its share price by the number of shares it has issued. [1]

The data were taken from Yahoo Finance. The response variable, on which predictions will be made, is labelled BA (British Aerospace). The dataset covers daily data from January 2016 to January 2019. The other 27 company variables have been standardised to have mean 0 and variance 1. The BAE Systems share price is recorded one day ahead, so a regression model can be fitted to predict its closing share price at the end of day i+1 using the closing prices of the 27 other companies at the end of day i.

    # load the London Stock Exchange data set
    load("project_data.RData")

The data are now loaded and accessible in a data frame called lse. You can quickly inspect the columns by printing the first few rows with the head() function:

    # print the first rows of the London Stock Exchange data
    head(lse)

Full descriptions of the column contents are provided on the final page of this document.

3. Project Tasks

Part 1: Model Fitting & Interpretation (25 Marks)

(a) (i) Calculate the sample correlation coefficient between the closing price of each stock and that of BAE Systems, and use a single plot to summarise these.
    (ii) Find the 5 companies whose stock prices have the strongest correlation (in absolute value) with BAE Systems’ stock price.
(b) Fit a linear regression for BAE Systems’ stock price based on the five companies found in (a)(ii).
    (i) Write down the ANOVA table for this linear regression.
    (ii) Comment on the goodness of fit of the model.
    (iii) Provide an interpretation of the model coefficients.
    (iv) How do the estimated coefficients compare to the correlations from part (a)(i)? Investigate any differences and describe the potential issues with this model.
(d) Using an appropriate variable selection technique and any transformations of the independent variables, build an improved model for BAE Systems’ daily closing price.
(e) Using your final model from part (d), check the regression assumptions using appropriate summary plots, and comment on whether you think these are valid.

Part 2: Prediction & Validation (10 Marks)

(a) Use the following code template to create a function that makes future predictions of BAE Systems’ closing price using the model created in Part 1 (d). The first line of this script must not be altered. Save this as an R script, and upload it via the link on MyPlace to view your prediction result.

    predict_BAE <- function(lse, newdata){
      # Carry out any transformations prior to fitting your model
      # Add transformed variables to both lse and newdata. E.g.:
      lse$VOD.sq <- lse$VOD^2
      newdata$VOD.sq <- newdata$VOD^2

      # this is the part that fits your linear model
      BAE.lm <- lm(BA ~ VOD.sq + Year, data = lse)

      # this is the part that produces predictions using your linear model
      predictions <- predict(BAE.lm, newdata = newdata)
      return(predictions)
    }

(b) Try to improve the prediction accuracy of your model.
    Discuss how you did this and whether your prediction improved.

Presentation of report, clarity of language (5 Marks)

4. Report Structure, Content & Submission

The report itself should follow a logical structure, and any analysis you carry out should be clearly interpreted using full sentences. You should write as though your audience has some, but limited, knowledge of statistics. Further report guidelines are as follows:

• The report should be a maximum of 6 pages in length, including graphics and tables.
• Graphs should be suitably labelled, sensibly scaled and cropped.
• Numerical R outputs used to answer questions should be neatly presented in tables or in the text.
• You should submit your R script along with your report in the assignment submission on MyPlace.

Background knowledge of finance is not required, although when building and criticising your models it may help to consider basic factors that could contribute to the day-to-day variability in the value of a company.
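One possible approach to Part 1 (a) is sketched here in R. It is a minimal sketch, not a model answer. The helper names cor_with_BA and top_k_abs and the simulated data frame sim (columns C1 to C8) are hypothetical. The sketch also assumes that every column of lse other than BA is a numeric share price, so any non-price columns such as Year (which appears in the Part 2 template) should be dropped first.

```r
# Sketch for Part 1 (a). Assumes every column except the response is a
# numeric company price; drop non-price columns (e.g. Year) beforehand.
cor_with_BA <- function(prices, response = "BA") {
  predictors <- setdiff(names(prices), response)
  # Sample (Pearson) correlation of each company's price with the response
  sapply(prices[predictors], function(x) cor(x, prices[[response]]))
}

# Names of the k correlations that are largest in absolute value
top_k_abs <- function(r, k = 5) {
  names(sort(abs(r), decreasing = TRUE))[seq_len(k)]
}

# Hypothetical simulated stand-in for lse (NOT the real share prices)
set.seed(1)
n <- 200
sim <- as.data.frame(matrix(rnorm(n * 8), ncol = 8,
                            dimnames = list(NULL, paste0("C", 1:8))))
sim$BA <- 2 * sim$C1 - 1.5 * sim$C2 + 0.5 * sim$C3 + rnorm(n)

r <- cor_with_BA(sim)
# A single sorted bar chart summarises all correlations at once
barplot(sort(r), horiz = TRUE, las = 1, xlab = "Correlation with BA")
top5 <- top_k_abs(r, k = 5)
print(top5)
```

Sorting on abs(r) rather than r ensures that strong negative correlations are ranked alongside strong positive ones, as (a)(ii) requires.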
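For Part 1 (b)(i) and (ii), one way to build the overall ANOVA table (regression versus residual sums of squares) and the usual goodness-of-fit summaries is sketched here. The data frame sim and the columns C1 to C5 are hypothetical stand-ins for lse and the five companies chosen in (a)(ii). Note that R's anova() on a fitted model gives the sequential version of the table, with one row per predictor, whose regression rows sum to the overall regression sum of squares.

```r
# Sketch for Part 1 (b). `sim` and C1..C5 are hypothetical stand-ins for
# lse and the five companies selected in (a)(ii).
set.seed(2)
n <- 200
sim <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                            dimnames = list(NULL, paste0("C", 1:5))))
sim$BA <- 1 + 2 * sim$C1 - sim$C2 + rnorm(n)
top5 <- paste0("C", 1:5)

# reformulate() builds BA ~ C1 + C2 + C3 + C4 + C5 from the name vector
fit <- lm(reformulate(top5, response = "BA"), data = sim)

# Overall ANOVA table: regression vs residual sums of squares
y <- sim$BA
p <- length(top5)
df_res <- df.residual(fit)
ss_reg <- sum((fitted(fit) - mean(y))^2)
ss_res <- sum(residuals(fit)^2)
anova_tab <- data.frame(Df = c(p, df_res),
                        SumSq = c(ss_reg, ss_res),
                        MeanSq = c(ss_reg / p, ss_res / df_res),
                        row.names = c("Regression", "Residual"))
anova_tab$F <- c(anova_tab$MeanSq[1] / anova_tab$MeanSq[2], NA)
print(anova_tab)

# Sequential (one row per predictor) version, then R^2 and adjusted R^2
print(anova(fit))
s <- summary(fit)
print(c(R2 = s$r.squared, adjR2 = s$adj.r.squared))
```

The F value in the hand-built table should match the overall F-statistic printed by summary(fit), which is a useful check on the arithmetic.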
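Part 1 (b)(iv) asks why the fitted coefficients can disagree with the marginal correlations. A common cause is that the predictors are themselves strongly correlated (multicollinearity). The sketch here computes variance inflation factors by hand, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others. The function name vif_manual and the simulated data are hypothetical. car::vif() is an alternative if the car package is available.

```r
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on all the other predictors. Hand-rolled here; car::vif() is an
# alternative if the car package is installed.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Hypothetical example: C2 is almost a copy of C1, mimicking two companies
# whose prices move together
set.seed(3)
n <- 200
sim <- data.frame(C1 = rnorm(n))
sim$C2 <- sim$C1 + rnorm(n, sd = 0.1)
sim$C3 <- rnorm(n)
sim$BA <- sim$C1 + sim$C3 + rnorm(n)

# C1 and C2 have similar correlations with BA, yet their fitted
# coefficients can differ a lot because the model cannot separate them
print(round(cor(sim)["BA", c("C1", "C2", "C3")], 2))
print(coef(lm(BA ~ C1 + C2 + C3, data = sim)))
v <- vif_manual(sim, c("C1", "C2", "C3"))
print(round(v, 1))
```

A rule of thumb often quoted is that VIFs above about 10 indicate problematic collinearity, but this threshold is a convention rather than a strict test.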
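For Part 1 (d), one possible variable selection technique is stepwise selection by AIC using R's step() function, applied here to the original predictors plus their squares. This is only one option; the choice of squared terms as candidate transformations, and the names sim, C1 to C6 and the .sq suffix (borrowed from the Part 2 template), are assumptions for illustration. Because the company prices are standardised to mean 0, many values are negative, so a log transformation would need a shift before it could be applied.

```r
# Sketch for Part 1 (d): stepwise AIC selection over the original
# predictors plus their squares. `sim` and C1..C6 are hypothetical.
set.seed(4)
n <- 300
preds <- paste0("C", 1:6)
sim <- as.data.frame(matrix(rnorm(n * 6), ncol = 6,
                            dimnames = list(NULL, preds)))
sim$BA <- 1 + 2 * sim$C1 + 0.8 * sim$C2^2 + rnorm(n)

# Candidate transformation: squared terms (standardised prices can be
# negative, so a log would need a shift first)
sq_names <- paste0(preds, ".sq")
for (j in seq_along(preds)) sim[[sq_names[j]]] <- sim[[preds[j]]]^2

null_fit <- lm(BA ~ 1, data = sim)
full_fml <- reformulate(c(preds, sq_names), response = "BA")
# Search in both directions, starting from the intercept-only model
chosen <- step(null_fit, scope = full_fml, direction = "both", trace = 0)
print(formula(chosen))
```

Starting from the intercept-only model and allowing both additions and removals is a design choice; starting from the full model with direction = "backward" is a common alternative and may select a different set of terms.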
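For Part 1 (e), R's built-in diagnostic plots for a fitted lm object cover linearity, constant variance and normality. Because the observations are daily, the residuals are likely to be correlated from one day to the next, so it is also worth plotting them in time order. The sketch here uses hypothetical simulated data with AR(1) errors to mimic that dependence, and computes the Durbin-Watson statistic by hand rather than relying on an add-on package.

```r
# Sketch for Part 1 (e). `sim` is hypothetical data whose errors follow an
# AR(1) process, mimicking day-to-day dependence in prices.
set.seed(5)
n <- 300
sim <- data.frame(C1 = rnorm(n))
sim$BA <- 1 + 2 * sim$C1 + as.numeric(arima.sim(list(ar = 0.9), n = n))
fit <- lm(BA ~ C1, data = sim)

# Standard lm diagnostics: residuals vs fitted (linearity, constant
# variance), normal Q-Q (normality), scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Independence: residuals in time order and their autocorrelation function
e <- residuals(fit)
plot(e, type = "l", xlab = "Day index", ylab = "Residual")
acf(e, main = "ACF of residuals")

# Durbin-Watson statistic by hand: values near 2 suggest no lag-1
# autocorrelation; values well below 2 suggest positive autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```

This hand-computed statistic relies on the rows of lse being in date order, which should be checked against the column descriptions before interpreting it.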
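For Part 2 (b), one way to judge whether a change improves prediction accuracy is to compare candidate functions with the same signature as predict_BAE on a hold-out set, using root mean squared error (RMSE). Since the goal is to predict future prices, the sketch splits the data by time (earlier days for fitting, later days for validation) instead of at random. The helpers rmse, time_split, predict_linear and predict_squared, the 80/20 split and the simulated data are all hypothetical choices for illustration.

```r
# Sketch for Part 2 (b): compare candidate predict functions on a
# time-ordered hold-out set. Names and data here are hypothetical.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Fit on the earlier days, validate on the later ones (no shuffling,
# since the real task is to predict future prices)
time_split <- function(df, frac = 0.8) {
  n_train <- floor(frac * nrow(df))
  list(train = df[seq_len(n_train), ], test = df[-seq_len(n_train), ])
}

# Two candidates sharing the predict_BAE(lse, newdata) signature
predict_linear <- function(lse, newdata) {
  predict(lm(BA ~ C1 + C2, data = lse), newdata = newdata)
}
predict_squared <- function(lse, newdata) {
  lse$C2.sq <- lse$C2^2
  newdata$C2.sq <- newdata$C2^2
  predict(lm(BA ~ C1 + C2.sq, data = lse), newdata = newdata)
}

set.seed(6)
n <- 300
sim <- data.frame(C1 = rnorm(n), C2 = rnorm(n))
sim$BA <- 1 + 2 * sim$C1 + 0.8 * sim$C2^2 + rnorm(n)

sp <- time_split(sim)
errs <- c(linear = rmse(sp$test$BA, predict_linear(sp$train, sp$test)),
          squared = rmse(sp$test$BA, predict_squared(sp$train, sp$test)))
print(round(errs, 3))
```

A lower hold-out RMSE is evidence, not proof, that a model will predict better on the unseen data used by the MyPlace submission.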