MM923 Regression Modelling Project
1. Project Overview
In this project you will analyse and explore a large data set based on the daily closing stock prices of 28 large
companies. In particular, you will build and assess linear regression models that explain variability in the
daily stock price of BAE Systems using stock price data from other companies.
You will be expected to use your own judgement, as well as the course content, to come up with a
predictive model for the BAE Systems stock price.
2. Getting Started
The dataset “lse” contains the closing share prices of BAE Systems (formerly British Aerospace) and 27
other companies in the FTSE (Financial Times Stock Exchange) 100 Index. The FTSE 100 Index lists the
share prices of the 100 companies on the London Stock Exchange with the highest market capitalisation,
that is, the highest market value, calculated by multiplying a company’s share price by its number of
shares. [1] The data were taken from Yahoo Finance. The response variable, on which predictions will be
made, is labelled BA (British Aerospace). The dataset includes daily data from January 2016 to January 2019.
The other 27 company variables have been standardised to have mean 0 and variance 1. The BAE Systems
share price is recorded one day ahead, so a regression model can be fitted to predict the closing share price
at the end of day (i+1) using those of the 27 companies at the end of day (i).
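A minimal sketch of the data layout just described, in R: it simulates a few days of prices for BA and two placeholder predictors (`VOD` and `HSBA` are illustrative names here, not a statement about the columns of `lse`), standardises the predictors with `scale()` so that each has mean 0 and variance 1, and pairs the predictors on day i with the BAE price on day i+1. The brief states that `lse` is already supplied in this form, so this is only meant to make the structure concrete.

```r
# Toy illustration only: simulated prices and hypothetical column names
set.seed(1)
n <- 10
raw <- data.frame(
  BA   = 500 + cumsum(rnorm(n)),  # simulated BAE closing prices
  VOD  = 200 + cumsum(rnorm(n)),  # placeholder predictor
  HSBA = 600 + cumsum(rnorm(n))   # placeholder predictor
)

# Standardise each predictor column to mean 0 and variance 1
preds <- as.data.frame(scale(raw[, c("VOD", "HSBA")]))

# Pair predictors on day i with the BAE price on day i + 1:
# drop the first response value and the last row of predictors
toy <- data.frame(BA = raw$BA[-1], preds[-n, ])
head(toy)
```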
# load the London Stock Exchange data set
load("project_data.RData")
The data have now been loaded and are accessible in a data frame called lse. You can quickly inspect the
data columns by printing the first few rows using the head() function:
# print first rows of the London Stock Exchange data
head(lse)
Full descriptions of the column contents are provided on the final page of this document.
3. Project Tasks
Part 1: Model Fitting & Interpretation (25 Marks)
(a) (i) Calculate the sample correlation coefficient between the closing price of each stock and that of
BAE Systems and use a single plot to summarise these.
(ii) Find the 5 companies whose stock prices have the strongest correlation (in absolute value) with
BAE Systems’s.
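One possible way to approach Part 1 (a), sketched in R on simulated data: `cor()` gives the sample correlation of each predictor with `BA`, a sorted bar chart summarises all of them in a single plot, and ordering by absolute value picks out the top five. The column names `CO1` to `CO8` are hypothetical; if `lse` also contains non-price columns (for example a date, or the `Year` variable used in the Part 2 template), they would need to be removed from `predictors` first.

```r
# Sketch on simulated data: response "BA", hypothetical predictors CO1..CO8
set.seed(2)
n <- 200
sim <- data.frame(matrix(rnorm(n * 8), ncol = 8))
names(sim) <- paste0("CO", 1:8)
sim$BA <- 3 * sim$CO1 - 2 * sim$CO4 + rnorm(n)

# (i) sample correlation of every predictor with BA (a named vector)
predictors <- setdiff(names(sim), "BA")
cors <- sapply(predictors, function(v) cor(sim[[v]], sim$BA))

# Single summary plot: one bar per company, sorted by correlation
barplot(sort(cors), las = 2, ylab = "Correlation with BA",
        main = "Sample correlation with BAE Systems closing price")
abline(h = 0)

# (ii) the five companies with the largest absolute correlation
top5 <- names(sort(abs(cors), decreasing = TRUE))[1:5]
print(top5)
```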
(b) Fit a linear regression for BAE Systems’s stock price based on the five companies found in (a)(ii).
(i) Write down the ANOVA table for this linear regression
(ii) Comment on the goodness of fit of the model.
(iii) Provide an interpretation of the model coefficients.
(iv) How do the estimated coefficients compare to the correlations from part (a)(i)? Investigate any
differences and describe the potential issues with this model.
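A sketch of the quantities Part 1 (b) asks about, again on simulated data with hypothetical columns `A` to `E`, two of which are deliberately almost collinear. `anova()` on a single `lm` fit reports sequential (Type I) sums of squares, so its rows depend on the order of the terms; the overall regression F-test is printed at the bottom of `summary()`. Variance inflation factors, computed by hand here so that no extra package is needed, are one way to see why the coefficients of strongly correlated predictors can differ in size, or even sign, from their individual correlations with the response.

```r
# Sketch: five simulated predictors, A and B nearly collinear
set.seed(3)
n <- 250
base <- rnorm(n)
sim <- data.frame(
  A = base + rnorm(n, sd = 0.1),
  B = base + rnorm(n, sd = 0.1),
  C = rnorm(n), D = rnorm(n), E = rnorm(n)
)
sim$BA <- 2 * base + 0.5 * sim$C + rnorm(n)

fit5 <- lm(BA ~ A + B + C + D + E, data = sim)

anova(fit5)                            # sequential (Type I) ANOVA table
r2 <- summary(fit5)$r.squared          # proportion of variance explained
adj_r2 <- summary(fit5)$adj.r.squared  # penalised for number of terms

# Variance inflation factors by hand: VIF_j = 1 / (1 - R^2_j), where R^2_j
# comes from regressing predictor j on the other four predictors
X <- sim[, c("A", "B", "C", "D", "E")]
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2_j <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2_j)
})
round(vif, 1)
```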
(d) Using an appropriate variable selection technique and any transformations of the independent variables,
build an improved model for BAE Systems’s daily closing price.
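Stepwise selection by AIC with `step()` is one variable selection technique that would fit Part 1 (d); it is shown here only as an example, not as the required method. The sketch adds squared versions of each simulated predictor (named in the `.sq` style of the Part 2 template; the columns `P1` to `P4` are hypothetical) and searches in both directions starting from the intercept-only model. Passing `k = log(nrow(sim))` to `step()` would use a BIC-type penalty instead of AIC.

```r
# Sketch: stepwise AIC selection over main effects plus squared terms
set.seed(4)
n <- 300
sim <- data.frame(P1 = rnorm(n), P2 = rnorm(n), P3 = rnorm(n), P4 = rnorm(n))
sim$BA <- 1 + 2 * sim$P1 + 1.5 * sim$P2^2 + rnorm(n)

# Candidate transformations added as new columns (illustrative choice)
for (v in c("P1", "P2", "P3", "P4")) sim[[paste0(v, ".sq")]] <- sim[[v]]^2

full  <- lm(BA ~ ., data = sim)   # all main effects and squared terms
empty <- lm(BA ~ 1, data = sim)   # intercept-only starting point

# Add or drop one term at a time while AIC improves; trace = 0 hides the log
chosen <- step(empty, scope = formula(full), direction = "both", trace = 0)
formula(chosen)
```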
(e) Using your final model from part (d), check the regression assumptions using appropriate summary
plots, and comment on whether you think that these are valid.
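For Part 1 (e), calling `plot()` on a fitted `lm` object draws four standard diagnostic plots. Because the observations are consecutive trading days, it may also be worth checking the residuals for autocorrelation, which would call the independence assumption into question. The sketch below uses a small simulated model as a stand-in for the final model from Part 1 (d); `shapiro.test()` is an optional formal check and is not a substitute for inspecting the plots.

```r
# Sketch: diagnostics for a simulated stand-in model
set.seed(5)
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 2 * sim$x + rnorm(n)
fit <- lm(y ~ x, data = sim)

# Residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Residuals in row (time) order, and their autocorrelation function
plot(resid(fit), type = "l", xlab = "Day index", ylab = "Residual")
acf(resid(fit), main = "Autocorrelation of residuals")

# Optional formal test of residual normality
shapiro.test(resid(fit))
```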
Part 2: Prediction & Validation (10 Marks)
(a) Use the following code template to create a function which will make future predictions for BAE
Systems’s closing price using the model created in Part 1 (d). The first line of this script must not be
altered. Save this as an R script, and upload it via the link on MyPlace to view your prediction result.
predict_BAE <- function(lse, newdata){
  # Carry out any transformations prior to fitting your model
  # Add transformed variables to both lse and newdata. E.g.:
  lse$VOD.sq <- lse$VOD^2
  newdata$VOD.sq <- newdata$VOD^2
  # this is the part that fits your linear model
  BAE.lm <- lm(BA ~ VOD.sq + Year, data = lse)
  # this is the part that produces predictions using your linear model
  predictions <- predict(BAE.lm, newdata = newdata)
  return(predictions)
}
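A minimal sketch of how the template function could be checked locally before uploading: it reproduces the template, builds a simulated stand-in for `lse` (the `VOD` and `Year` values and the relationship used to generate `BA` are invented for illustration), holds out the last 20% of rows as pseudo-future days, and reports the root mean squared error (RMSE) of the predictions. If `Year` is stored as a factor in the real data, a year that appears only in the held-out rows would make `predict()` stop with a "new levels" error, so the split may need adjusting.

```r
# The template function, unchanged apart from comments
predict_BAE <- function(lse, newdata){
  lse$VOD.sq <- lse$VOD^2
  newdata$VOD.sq <- newdata$VOD^2
  BAE.lm <- lm(BA ~ VOD.sq + Year, data = lse)
  predictions <- predict(BAE.lm, newdata = newdata)
  return(predictions)
}

# Simulated stand-in for lse (hypothetical values)
set.seed(6)
n <- 300
lse <- data.frame(VOD = rnorm(n), Year = rep(2016:2018, each = n / 3))
lse$BA <- 500 + 20 * lse$VOD^2 + 10 * (lse$Year - 2016) + rnorm(n, sd = 5)

# Chronological split: fit on the first 80% of days, predict the rest
n_train <- floor(0.8 * n)
train <- lse[1:n_train, ]
test  <- lse[(n_train + 1):n, ]

pred <- predict_BAE(train, test)
rmse <- sqrt(mean((test$BA - pred)^2))
rmse
```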
(b) Try to improve the prediction accuracy of your model. Discuss how you did this and whether your
prediction improved.
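One way to judge whether a change made for Part 2 (b) actually improves prediction accuracy is to compare candidate models on data they were not fitted to. Since the data form a time series, the sketch below uses an expanding-window (rolling-origin) evaluation rather than random cross-validation: each model is refitted on all days up to an origin and then predicts the next block of days. The data, the column names `X1`, `X2` and `X2.sq`, the helper `rolling_rmse()` and the window sizes are all hypothetical.

```r
# Sketch: compare two candidate formulas by rolling-origin RMSE
set.seed(7)
n <- 400
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$BA <- 100 + 3 * dat$X1 + 2 * dat$X2^2 + rnorm(n)
dat$X2.sq <- dat$X2^2   # candidate transformation

# Fit on rows 1..origin, predict the next `horizon` rows, repeat
rolling_rmse <- function(fml, data, first = 200, horizon = 50) {
  origins <- seq(first, nrow(data) - horizon, by = horizon)
  errs <- unlist(lapply(origins, function(origin) {
    fit  <- lm(fml, data = data[1:origin, ])
    test <- data[(origin + 1):(origin + horizon), ]
    test$BA - predict(fit, newdata = test)
  }))
  sqrt(mean(errs^2))
}

rmse_linear <- rolling_rmse(BA ~ X1 + X2, dat)
rmse_square <- rolling_rmse(BA ~ X1 + X2.sq, dat)
c(linear = rmse_linear, squared = rmse_square)
```

A lower out-of-sample RMSE for one formula is evidence, though not proof, that it will also predict the uploaded test days better.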
Presentation of Report & Clarity of Language (5 Marks)
4. Report Structure, Content & Submission
The report itself should follow a logical structure, and any analysis you carry out should be clearly interpreted
using full sentences. You should write as though your audience has some limited knowledge of statistics.
Further report guidelines are as follows:
• The report should be a maximum of 6 pages in length, including graphics and tables.
• Graphs should be suitably labelled, sensibly scaled and cropped.
• Numerical R outputs used to answer questions should be neatly presented in tables or in the text.
• You should submit your R script along with your report in the assignment submission on MyPlace.
Background knowledge of finance is not required, although, when building and criticising your models, it may
help to consider basic factors that could contribute to day-to-day variability in the value of a company.