2019/2020 MSCI 523 Forecasting Coursework

Coursework Information & Submission
This is an individual assignment weighted 100%. Coursework deadline is April 28th 2020, 10:00am. Standard
departmental penalties will apply for late work unless you have been given an extension for exceptional reasons from
the course administrator. All submissions will be checked by plagiarism software. Coursework must be submitted
online on Moodle. Submit your report PLUS R scripts in the appendix through Moodle.

Task: Forecast 1 Real-World Time Series – 100%
You are required to forecast a dataset of a single time series individually assigned to you in a miniature competition
between the algorithms. Your objective is to:

(a) Develop the most accurate statistical forecasting models, demonstrating your modelling skills!
• Obtain your time series from the respective competition dataset
o Download the datasets from the Moodle website
o Select the 1 time series allocated to you from the competition file
• Construct forecasts and, for your time series,
o conduct a thorough exploratory data analysis (using graphs, statistical tests, etc. and a verbal analysis)
o build multiple potentially suitable forecasting models, including a suitable Exponential Smoothing,
ARIMA, and Time Series Regression model to predict the 14 next values (2 weeks ahead)
o Choose what you assume to be the "best" model from these models for a final submission, by
assessing errors and comparing them against Naive and Seasonal Naive benchmark models.
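
The two benchmarks named above can be sketched as follows. This is a minimal illustration only, assuming a daily series with a weekly cycle (period 7, suggested by the 14-step, 2-week horizon). The brief recommends R; Python is used here purely so the sketch has no dependencies. The function names and the example data are hypothetical and are not part of the competition dataset.

```python
def naive_forecast(history, horizon):
    """Naive benchmark: repeat the last observed value for every step."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, period):
    """Seasonal Naive benchmark: repeat the last full seasonal cycle."""
    if len(history) < period:
        raise ValueError("need at least one full seasonal cycle")
    last_cycle = history[-period:]
    return [last_cycle[h % period] for h in range(horizon)]


# Hypothetical daily series: three weeks with a weekend peak (assumption).
series = [10, 12, 11, 13, 15, 20, 22,
          11, 13, 12, 14, 16, 21, 23,
          12, 14, 13, 15, 17, 22, 24]

# 14 steps ahead, as required by the task; period=7 is an assumption.
naive_14 = naive_forecast(series, horizon=14)
snaive_14 = seasonal_naive_forecast(series, horizon=14, period=7)
```

Any contender model that cannot beat both of these forecasts on the hold-out data is hard to justify as the "best" model.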

(b) Document your forecasting process in a technical research report
• Write a technical report to document your model building skills & to justify your choice of model
• Critically discuss some findings and choices

Marking Scheme & Hints

20% of points - Data Exploration
- Explore the regular components and the irregular components of the time series using graphs, statistical
summaries and statistical tests.
- NOTE:
o Important tests relate to stationarity (e.g. the ADF test) and patterns (e.g. seasonality, trend, ACF &
PACF analysis) of the time series to support your visual analysis
o Document your findings comprehensively, making adequate use of (readable and correctly labelled)
graphs which you also discuss verbally to support your arguments.
o Excessive evidence (e.g. the complete information from statistical tests) may be placed in the
appendix, but must be referenced directly at the corresponding place in the main text, else it does
not count.
o Conclude by recommending one (or multiple) suitable model forms for forecasting.
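
As an illustration of the ACF analysis mentioned above, the sketch below computes the sample autocorrelation function by hand and compares one lag against the usual approximate 95% bound of 1.96/sqrt(n). It is a simplified example under assumptions: the data are hypothetical, the weekly period of 7 is assumed, and in practice you would use your software's built-in ACF/PACF and ADF routines rather than this hand-rolled function.

```python
import math


def sample_acf(values, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag.

    Uses the common estimator that divides every lag by the lag-0
    sum of squared deviations.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denom = sum(d * d for d in deviations)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for k in range(max_lag + 1):
        num = sum(deviations[t] * deviations[t - k] for t in range(k, n))
        acf.append(num / denom)
    return acf


# Hypothetical daily series with a weekly pattern (assumption).
series = [10, 12, 11, 13, 15, 20, 22,
          11, 13, 12, 14, 16, 21, 23,
          12, 14, 13, 15, 17, 22, 24]

acf = sample_acf(series, max_lag=7)

# Approximate 95% significance bound for a white-noise series.
bound = 1.96 / math.sqrt(len(series))
weekly_lag_is_significant = abs(acf[7]) > bound
```

A spike at the seasonal lag that exceeds the bound supports the visual impression of seasonality, which is the kind of evidence the report should pair with its plots.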

60% of points – Forecast Model Building
- Build multiple potential contender models (each one suitable for the identified data properties), leading to
o 1 good manually built Exponential Smoothing Model - in comparison to 1 automatically built
o 1 good manually built ARIMA model - in comparison to 1 automatically built
o 1 good manually built Time Series Regression model - in comparison to 1 automatically built
- Document the specification of each model
o Document the specification process of model building, i.e. the iterative steps to build this final model,
including the analysis of residuals of intermediate models that have led to better ones
o Document the final model form and parameters to allow a complete replication of your experiments
(i.e. specify what model forms or lags you selected, which transformations in which order & which
parameters were used so that others could replicate what you did exactly only from your document).
- NOTE:
o For each contender model, one or more different model forms may be feasible to produce forecasts
(e.g. for a seasonal time series with slight trend, Exponential Smoothing may use multiplicative
or additive seasonality, with or without trend, giving 4 candidates that could perform well; for ARIMA you
may consider models with seasonal differences or first differences or both; for regression models you
may consider capturing seasonality with dummy variables, or with a seasonal autoregressive lag, etc.).
Where applicable, you should always consider multiple plausible candidate models, and must justify
your choice of candidates in comparison to other potential models in each class of models (feel free to
explicitly rule out implausible ones). To get high marks it will not be sufficient to build a single
Exponential Smoothing model using an auto specification ZZZ, a single automatic ARIMA model
and a single Regression model with stepwise selection; high marks require developing a subset of
potentially useful models which you compare and accept/reject. Base your justification on evidence
and document your ITERATIVE modelling process throughout.
o We recommend using R, but you are free to use any other software, provided you report the software used.
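
Two of the transformations mentioned in the note above, differencing for ARIMA and seasonal dummy variables for regression, can be sketched as follows. This is an illustration under assumptions, not a prescribed method: the helper names are hypothetical, the period of 7 is assumed, and the example data are made up. Python is used only to keep the sketch dependency-free; the same steps are available in R.

```python
def difference(values, lag=1):
    """Return y_t - y_(t-lag); the result is `lag` observations shorter."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def seasonal_dummies(n_obs, period):
    """Build period-1 dummy columns per observation.

    Season 0 is the reference category, so its rows are all zeros.
    This avoids perfect collinearity with the regression intercept.
    """
    return [[1 if t % period == s else 0 for s in range(1, period)]
            for t in range(n_obs)]


# Hypothetical daily series: weekly pattern plus a slow upward drift.
series = [10, 12, 11, 13, 15, 20, 22,
          11, 13, 12, 14, 16, 21, 23,
          12, 14, 13, 15, 17, 22, 24]

first_diff = difference(series, lag=1)        # removes the level
seasonal_diff = difference(series, lag=7)     # removes the weekly pattern
both = difference(seasonal_diff, lag=1)       # seasonal, then first difference

dummies = seasonal_dummies(len(series), period=7)
```

Recording the order in which transformations were applied (here seasonal difference first, then first difference) is what allows a reader to replicate the model exactly from the report.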

10% of points – Discussion of expected Accuracy (Errors)
- Determine a suitable error metric and compute the expected in-sample and out-of-sample errors of your
recommended models and compare them. Comment on the suitability of the chosen metric for this task.
- Select one "best" model to be used for forecasting the time series across all algorithm families, and
provide your final out-of-sample forecasts.
- NOTE:
o To assess the future forecast accuracy of your models before the final future values become
available, consider creating a hold-out dataset at least as large as, or better larger than, the forecasting
horizon and assess both in-sample and (quasi) out-of-sample errors.
o Different error metrics are feasible. You should use at least two suitable error metrics for the
assessment, and justify the use of each error metric you use.
o To show the improvement in accuracy of your methods over an objective benchmark, you should
compare all of them to the Naive Level and the Naive Seasonal model. Use tables to provide a suitable
overview of the different methods’ accuracy and their uplift in accuracy over the 2 Naive models.
o Comment on the accuracy and suitability of each of the methods for the time series. Which
method would you recommend using? Comment on why you think some
methods perform better than others. Comment on the available data and the number of origins for
forecast evaluation on the withheld test set, given your fixed forecasting horizon.
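
The evaluation steps in the notes above (hold-out split, two error metrics, uplift over a benchmark, and counting forecast origins) can be sketched as below. Treat this as a hedged illustration: sMAPE is written in one common form, 2|a - f| / (|a| + |f|), and other definitions exist; the helper names, the period of 7 and the data are assumptions made for the example. Python is used only to keep the sketch dependency-free.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent, using 2|a - f| / (|a| + |f|) per point."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - f) / denom)
    return 100.0 * sum(terms) / len(terms)


def mae(actual, forecast):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def holdout_split(values, horizon):
    """Withhold the last `horizon` observations as a quasi out-of-sample set."""
    if len(values) <= horizon:
        raise ValueError("series too short for this hold-out size")
    return values[:-horizon], values[-horizon:]


def uplift(benchmark_error, model_error):
    """Percentage error reduction relative to the benchmark."""
    return 100.0 * (benchmark_error - model_error) / benchmark_error


def rolling_origins(n_obs, min_train, horizon):
    """Training-set sizes from which a full horizon can still be scored."""
    return list(range(min_train, n_obs - horizon + 1))


# Hypothetical four weeks of daily data (assumption).
week = [10, 12, 11, 13, 15, 20, 22]
series = [v + k for k in range(4) for v in week]

train, test = holdout_split(series, horizon=14)
naive_fc = [train[-1]] * 14
snaive_fc = [train[-7 + h % 7] for h in range(14)]

mae_naive, mae_snaive = mae(test, naive_fc), mae(test, snaive_fc)
snaive_uplift = uplift(mae_naive, mae_snaive)
origins_h14 = rolling_origins(len(series), min_train=14, horizon=14)
```

With only 28 observations and a 14-step horizon, a single origin is available, which is exactly the limitation the last note asks you to comment on; a shorter series or a longer horizon leaves fewer origins to average over.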

10% of points - General report writing skills
- General report writing skills such as critical discussion of findings, thoroughness of documentation, clarity of
arguments, structure of the report, readability of the report (i.e. lack of spelling and grammatical mistakes etc.)
will also be considered in marking each section. Please see next page for some more technical considerations
on report writing.

General suggestions on writing a report for Task II
The coursework requires you to document your analysis and critically discuss your chosen experimental design,
modelling approaches and the results in a technical report. This technical report should be written as if tailored to an
OR specialist (e.g. who has an MSc in Management Science from Lancaster University and has taken the MSCI 523
course, and who wants to evaluate your results AND your decision making process to determine your skills in
modelling and whether you have missed anything). This means that you are not required to write a general description
(e.g. "Exponential Smoothing is ...") as an OR expert would be aware of this! Consequently the report should document
the process of modelling, and allow an understanding of your choices and a replication of your experiments.
The report should contain an introduction and a summary with conclusions on your findings, numbered headings, list
of figures and tables and an executive summary (tailored to senior management) indicating the most relevant findings.
The report should display a logical and concise structure, be generally “readable” and support your argument using
plots of time series, forecasts and /or accuracy. Make adequate use of graphs to show time series, model fit /
predictions and residuals to support your arguments (for this graphs must be completely readable and with labels), as
well as tables to compare results.
The page limit for the report is 18 pages (note this is a maximum to make your life easier - you can produce shorter
reports! Pages count only for the main text incl. graphs and tables, but not for the cover sheet, executive summary,
contents sheet or appendices). Reports of excessive length will be penalised by deducting 10 marks (i.e. 10% of 100)
but only if they include unnecessary material. For formatting, use single spacing, format normal text in Times
New Roman font size 12, text in tables, figure and table headings in font size 10, and leave 2cm of margin left and right.
Include any technical details and hardcopies that support your arguments in a set of appendices (e.g. the printouts
from ADF tests in the appendix, with only the conclusion of significance / insignificance at a probability in the main
text), which will not count towards the page limit. You must ensure the main text is readable and that your argument is
coherent without needing to consult the appendices. All parts of the text supported by an appendix must cross-
reference directly to the relevant part.

Non-disclosure clause: these datasets and the coursework task are subject to copyright © by Sven Crone, all rights reserved. In
downloading the documents and submitting the assignment for assessment the copyright agreement is deemed accepted. Any
publication of the dataset, the coursework task, or its solution (e.g. on a coursework website or a social network site), or a part thereof,
will be considered a violation of copyright. The person breaking the copyright may be held liable for damages by international lawsuit.
Furthermore, the publication will count the assignment as plagiarism - even in retrospect after receiving the MSc degree - leading to a
mark of zero, with the usual right to appeal to the university court in an official hearing.

Contact details: Sven F. Crone, Lancaster University Management School, Centre for Forecasting, Room A53a, Tel 01524 5-92991,
[email protected] If you have any questions don’t hesitate to contact me! However, please make sure you arrange a meeting first via email
to avoid people queuing and being disappointed. You have over 4 weeks to complete this, but I think it is imperative that you start immediately with
the coursework! Please note that I am around and happy to answer all questions in person in the next weeks, but that I am heavily engaged in
teaching and finding MSc student projects for you, so I may not be able to answer very late questions! Also consider in your enquiries that I cannot
always react within a few hours, so don't leave questions to the last minute … start early! Best of luck! Sven

Department of Management Science
2020 TEMPLATE - MSCI523 Feedback & Assessment

Student ID: MARK: Max Yours
Task I: Forecasting of 1 Time Series 100

Exploratory Data Analysis (description, distributions, relevance) 20
- Plot of time series, seasonal plots, distributions, decomposed data 4
- Descriptive analysis of regular (level, trend, season) and irregular patterns (outliers, breaks) 4
- Plot of ACF / PACF of original and stationary data (based on ADF test) with correct interpretation 6
- Statistical analysis using tests (Stationarity, Trend, Seasonality, Structural breaks, etc.) 6

Model building & Forecasting with Exponential Smoothing 20
- Documentation of correct modelling process (steps taken) & info on final model (spec./pars.) 8
- Insightful residual analysis making good use of graphics and statistical tests 4
- Evidence of iterative model building process (analysis of residuals, refinement of model etc.) 4
- Critical discussion of all plausible contender models (in case multiple models are adequate) 4

Model building & Forecasting with ARIMA models 20
- Documentation of correct modelling process (steps taken) & info on final model (spec./pars.) 8
- Insightful residual analysis making good use of graphics and statistical tests 4
- Evidence of iterative model building process (analysis of residuals, refinement of model etc.) 4
- Critical discussion of all plausible contender models (in case multiple models are adequate) 4

Model building & Forecasting with (dynamic) Regression Models 20
- Documentation of correct modelling process (steps taken) & info on final model (spec./pars.) 8
- Insightful residual analysis making good use of graphics and statistical tests 4
- Evidence of iterative model building process (analysis of residuals, refinement of model etc.) 4
- Critical discussion of all plausible contender models (in case multiple models are adequate) 4

Evaluation of Accuracy across models (discussion → analysis → description → none) 10
- Use and justification of an unbiased error metric (e.g. sMAPE, not MAPE or RMSE, MSE) 2
- Compare errors of all models & against Naive to recommend “best” model for competition 2
- Compute expected multiple-step ahead errors of models (in sample and out-of-sample) 4
- Provide final forecasts for selected model 2

General Report Writing 10
- Including relevant report parts (Exec. summary, intro., lists of graphs & tables, appendix) 4
- General report writing structure & style (see below) 6

Sub-total Task I 100
- Late hand-in (-10% of 100 = -10 pts)
- Exceeds page limit (-1% of 100 for each page, e.g. 10 pages = -10 pts)
TOTAL 100

General issues considered in the marks given in each section
- Structure: coherence, logical sequencing of parts, distinct self-contained paragraphs
- Writing Style & presentation: clarity, precision, conciseness, grammar, spelling, punctuation, language
- Analysis / argument: Understanding of concepts; clarity of arguments; critical approach to concepts and theories
- Evidence: Relevance & depth in conceptual support/empirical evidence; avoiding unsupported assertion, claims & repetition
- Relevance: Addressing and interpreting the requirements of the question; keeping the question in focus
- Scope / breadth of approach: Adequate coverage of parts of the question; balance of different elements of question

Common mistakes
- no adequate use of plots in report (or in appendix with explicit link to appendix) to discuss time series, model fit, forecast & residuals
- no use & clear identification of ex-ante out-of-sample performance metrics (use of in-sample errors)
- no discussion of pros/cons of sMAPE error measure versus others; no identification of a single error measure used to identify the "best" method
- no discussion of discrepancies in method performance (i.e. relative ranks) across series, across origins & across different error measures
- detailed discussion of each algorithm (not required) instead of systematic analysis of data and developing models using graphs, tests & error metrics
- no evidence of iterative model building, i.e. starting with simple model, analyzing residuals, then refining model type & parameters further
- no evidence of residual analysis for checking model adequacy, contrasting this with errors (often contradictory or supporting evidence)
- no evidence of critical discussion of lab exercises for ETS nor Regression models, poor discussion of error (no data exploration asked)