
MSc Examination
Friday 8th May 2014 14:30 - 17:00
ECS764 Applied Statistics Duration: 2 hours 30 minutes
YOU ARE NOT PERMITTED TO READ THE CONTENTS OF THIS QUESTION PAPER UNTIL
INSTRUCTED TO DO SO BY AN INVIGILATOR
Answer FOUR questions
If you answer more questions than specified, only the first answers (up to the specified number) will
be marked. Cross out any answers that you do not wish to be marked
Calculators are/are not permitted in this examination. Please state on your answer book the name and type
of machine used.
Complete all rough workings in the answer book and cross through any work that is not to be assessed.

Possession of unauthorised material at any time when under examination conditions is an assessment offence
and can lead to expulsion from QMUL. Check now to ensure you do not have any notes, mobile phones or
unauthorised electronic devices on your person. If you do, raise your hand and give them to an invigilator
immediately. It is also an offence to have any writing of any kind on your person, including on your body. If
you are found to have hidden unauthorised material elsewhere, including toilets and cloakrooms it will be
treated as being found in your possession. Unauthorised material found on your mobile phone or other
electronic device will be considered the same as being in possession of paper notes. A mobile phone that
causes a disruption in the exam is also an assessment offence.

EXAM PAPERS MUST NOT BE REMOVED FROM THE EXAM ROOM
Examiners: Steve Uhlig

© Queen Mary, University of London, 2013
Page 2 ECS764 (2014)
Question 1 - Descriptive statistics & probability distributions
a) Consider the following popular centrality statistics: mode, mean, median and mid-
range. Explain the strengths and weaknesses of each of them to describe the centrality of a
normal random variable, depending on the number of data samples available.
Answer: When many samples are available, the choice matters little; all of them are more or less
equivalent, except the mid-range, which is always unstable and therefore a poor centrality
statistic. With a limited number of sample points, the mode is unlikely to be meaningful at
all. The mean will be biased but should still be meaningful. The median is the least biased
of all centrality statistics. The mid-range is always the most biased.
[10 marks]
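
As an illustration of the answer above, the following minimal sketch estimates how much the mean, median and mid-range fluctuate from one normal sample to the next, for a small and a large sample size. The function name `centrality_spread`, the number of trials and the seed are assumptions made for this example. The mode is left out because, for continuous data, every sampled value is typically unique.

```python
import random
import statistics

def centrality_spread(n, trials=500, seed=0):
    """Standard deviation of each centrality statistic across repeated
    samples of size n drawn from a standard normal distribution."""
    rng = random.Random(seed)
    means, medians, midranges = [], [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
        midranges.append((min(sample) + max(sample)) / 2)
    return {
        "mean": statistics.pstdev(means),
        "median": statistics.pstdev(medians),
        "mid-range": statistics.pstdev(midranges),
    }

# A smaller spread means a more stable statistic at that sample size.
for n in (10, 1000):
    print(n, centrality_spread(n))
```

The mid-range depends only on the two most extreme points, which is why its spread shrinks so slowly as the sample grows.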
b) Assume that a set of points, distributed according to a normal distribution, suffers from a
few unusually large or small values (referred to as “outliers”). Explain in what way such
“outliers” affect the variance, and why this is the case.

Answer: Outliers, whether very large or very small values, increase the variance. Because the
variance is a second-order statistic (the mean of squared deviations around the mean), any value
far from the mean, in either direction, increases the variance.
[5 marks]
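
A small worked example may make the effect concrete. The data values below are hypothetical and chosen only for illustration; the sketch compares the variance with and without a single outlier and reports the share of the total squared deviation that the outlier alone contributes.

```python
import statistics

clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]  # hypothetical measurements
with_outlier = clean + [25.0]                # same data plus one large value

var_clean = statistics.pvariance(clean)
var_outlier = statistics.pvariance(with_outlier)

# Share of the total squared deviation that comes from the outlier alone.
mean = statistics.fmean(with_outlier)
squared = [(x - mean) ** 2 for x in with_outlier]
outlier_share = squared[-1] / sum(squared)

print(var_clean, var_outlier, outlier_share)
```

Because deviations are squared, one point far from the mean dominates the sum, even though it is only one observation out of seven.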

c) Explain the main difference between exponential and heavy-tailed distributions. Illustrate
this difference by explaining how well the first (percentile 25) and third quartiles
(percentile 75) describe them.

Answer: The main difference between exponential and heavy-tailed distributions is the
decay of the tail probabilities. Exponential distributions have an exponential tail, meaning
that the probability of observing a large value decays exponentially fast. A heavy-tailed
distribution on the other hand has a heavier tail, in the sense that the probability of
observing a large value x is proportional to $x^{-a}$, where $a$ is a positive constant. Exponential
distributions are reasonably well described by the first and third quartile as their deviation
around the mean is limited, and these quartiles will likely capture most of the mass of the
distribution. Heavy-tailed distributions on the other hand will be described best by high
quantiles, e.g., the percentile 95 or 99, so the first and third quartile will not sample the
large values of a heavy-tailed distribution.

[10 marks]
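
The contrast can be sketched numerically by comparing quantiles of an exponential distribution with those of a Pareto distribution, used here as one example of a heavy-tailed law. The choice of the Pareto family, the parameter values (`rate=1.0`, `a=1.5`, `x_min=1.0`) and the function names are assumptions made for this illustration.

```python
import math

def exp_quantile(p, rate=1.0):
    """Quantile of an exponential distribution: solves 1 - exp(-rate * x) = p."""
    return -math.log(1.0 - p) / rate

def pareto_quantile(p, a=1.5, x_min=1.0):
    """Quantile of a Pareto distribution with tail P(X > x) = (x_min / x) ** a."""
    return x_min / (1.0 - p) ** (1.0 / a)

# Columns: probability level, exponential quantile, Pareto quantile.
for p in (0.25, 0.75, 0.95, 0.99, 0.999):
    print(p, round(exp_quantile(p), 2), round(pareto_quantile(p), 2))
```

For the exponential case the high quantiles stay within a small multiple of the third quartile, whereas for the Pareto case they grow far beyond it, so the quartiles say little about the large values.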
Question 2 – Fitting distributions
a) Explain the three main steps of the methodology through which some statistical variable
(e.g., data) is fitted to a given probability distribution. Describe each of the three steps and
how they relate to each other.

Answer: The 3 steps in fitting a probability distribution are: (1) finding the distribution
from which the data might be drawn, (2) fitting the parameters of this distribution, and (3)
evaluating the quality of the fit. The first step requires prior knowledge about the process that
generates the data or some guess about the likely distributions. The second step uses the
first and estimates the most appropriate values of the parameters of the considered
distribution (which could be one or multiple). Finally, the last step is used to quantify the
distance between the data and the fitted distribution.

[15 marks]
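
The three steps can be sketched on synthetic data. This is only one possible instance of the methodology: the exponential family as candidate, maximum likelihood for the parameter, and the Kolmogorov-Smirnov distance as the quality measure are all assumptions chosen for illustration, as are the function names.

```python
import math
import random

def fit_exponential(data):
    """Step 2: maximum-likelihood estimate of the rate, i.e. 1 / sample mean."""
    return len(data) / sum(data)

def ks_distance(data, cdf):
    """Step 3: largest gap between the empirical CDF and the fitted CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

rng = random.Random(42)
data = [rng.expovariate(2.0) for _ in range(1000)]  # synthetic data, true rate 2.0

# Step 1: pick the exponential family as the candidate (an assumption here).
rate = fit_exponential(data)
distance = ks_distance(data, lambda x: 1.0 - math.exp(-rate * x))
print(rate, distance)
```

A small distance suggests the candidate from step 1 was reasonable; a large one sends the analysis back to step 1 with a different family.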

b) Explain the purpose of the QQ-plot in the methodology of fitting a probability distribution
to some statistical variable (e.g., data). You may illustrate an answer with a diagram
showing an example of a QQ-plot.

Answer: The QQ-plot is a graphical technique to compare how the quantiles of two given
distributions relate to each other. It consists of plotting on the x-axis the values of the
quantiles of the first distribution, and on the y-axis the values of the quantiles of the second
distribution. The purpose of the QQ-plot is to assess how similar two empirical distributions
are, by visually comparing their quantiles. If the two distributions are similar, their
quantiles should fall on the diagonal of the plot.

[10 marks]
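
The points of a QQ-plot can be computed without any plotting library. The sketch below only produces the (x, y) quantile pairs described in the answer; the helper name `qq_points`, the sample sizes and the choice of distributions are assumptions for illustration.

```python
import random
import statistics

def qq_points(sample_a, sample_b, n_quantiles=19):
    """Pair up matching quantiles of two samples (x: first, y: second)."""
    qa = statistics.quantiles(sample_a, n=n_quantiles + 1)
    qb = statistics.quantiles(sample_b, n=n_quantiles + 1)
    return list(zip(qa, qb))

rng = random.Random(0)
normal_1 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
normal_2 = [rng.gauss(0.0, 1.0) for _ in range(5000)]
exponential = [rng.expovariate(1.0) for _ in range(5000)]

# Largest distance from the diagonal y = x over all quantile pairs.
same = max(abs(x - y) for x, y in qq_points(normal_1, normal_2))
different = max(abs(x - y) for x, y in qq_points(normal_1, exponential))
print(same, different)
```

Two samples from the same distribution give points that hug the diagonal, while the normal and exponential samples drift away from it.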

Question 3 – Hypothesis testing

a) Explain the notion of a statistical test in the case of a one-sample test, i.e., when some data
is compared to a known population. Describe the respective roles of the null hypothesis
(H0), the test statistic, and the p-value in the outcome of the statistical test. In particular,
explain when the null hypothesis will be rejected.

Answer: A statistical test is a procedure to test a hypothesis about a set of numerical values,
i.e., data. A statistical test relies on a null hypothesis (H0), i.e., a statement that is tested
about the data. Sometimes, an alternative hypothesis (often the complement of H0) will also
be stated that is hoped to be true in the case the null hypothesis is rejected. A one-sample
test relies on a “test statistic” that will provide a distance function between the data and the
known population that defines H0. The test statistic is specifically selected or defined in
such a way as to quantify, within observed data, behaviors that would distinguish H0 from
HA. Depending on the size of the data, a given distance of the test statistic will be translated
into a likelihood that the data is as extreme as the one observed, assuming that H0 is true,
called the p-value. In other words, the p-value is the probability of obtaining a test statistic
at least as extreme as the one that was actually observed, assuming that the null hypothesis
H0 is true. If the p-value is below a pre-defined threshold, the null hypothesis will be
rejected.

[15 marks]
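
As one concrete instance of a one-sample test, the sketch below runs a two-sided z-test of a sample against a population with known mean and standard deviation. The choice of the z-test, the data values and the threshold of 0.05 are assumptions made for illustration, not part of the question.

```python
import math
from statistics import NormalDist, fmean

def one_sample_z_test(data, pop_mean, pop_sd):
    """Two-sided z-test of H0: the data come from a population with mean
    pop_mean (the population standard deviation pop_sd is assumed known)."""
    n = len(data)
    z = (fmean(data) - pop_mean) / (pop_sd / math.sqrt(n))  # test statistic
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))        # P(|Z| >= |z|) under H0
    return z, p_value

data = [5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.9, 5.2, 5.5]  # hypothetical measurements
z, p = one_sample_z_test(data, pop_mean=5.0, pop_sd=0.5)
alpha = 0.05  # pre-defined threshold
print(z, p, "reject H0" if p < alpha else "fail to reject H0")
```

The test statistic measures how far the sample mean lies from the hypothesised mean in units of its standard error, and the p-value converts that distance into a probability under H0.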

b) A statistical test does not accept the null hypothesis, but either rejects or fails to reject it.
Imagine that you believe that the null hypothesis should be rejected, but with a given dataset
(of a given size) you fail to reject it. Explain two strategies that you may pursue to reject the
null hypothesis, but without changing the null hypothesis.

Answer: The first strategy is to increase the size of the dataset and hope that the distance
will increase and the p-value will decrease. The second strategy is not to change the size of
the dataset, but to rely on a stricter test statistic, which will therefore require a
smaller distance between the data and the known population, e.g., the KS test, which is very
strict.

[10 marks]
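
The first strategy can be illustrated with a two-sided z-test, under the assumption that the observed difference between the sample mean and the hypothesised mean stays the same as more data is collected. The function name and the numbers used (`effect=0.2`, `sd=1.0`) are hypothetical.

```python
import math
from statistics import NormalDist

def p_value_for(effect, sd, n):
    """Two-sided z-test p-value when the sample mean differs from the
    hypothesised mean by `effect`, with known standard deviation `sd`."""
    z = effect / (sd / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Same observed difference, increasing dataset size.
for n in (10, 50, 200):
    print(n, p_value_for(effect=0.2, sd=1.0, n=n))
```

With the difference held fixed, the standard error shrinks as the dataset grows, so the same difference eventually yields a p-value below the threshold.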
Question 4 – Time-series analysis

a) Explain the auto-correlation function, and its use in time-series analysis.

Answer: The auto-correlation function is the correlation between values of the process at
different times s,t, as a function of the two times s,t or of the time difference t-s. Its formula
is the following:
$$R(s, t) = \frac{E[(X_t - \mu)(X_s - \mu)]}{\sigma^2}$$
where E is the expectation, $\mu$ is the mean, and $\sigma$ is the standard deviation. The
auto-correlation is used to understand the dependence over time within time-series.
Auto-correlation decays (in absolute value) with time lag, so plotting how this decay depends on
the time lag gives insight into the properties of the time-series.

[10 marks]
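
A sample version of this function, written in terms of the time lag only, can be sketched as follows. The estimator shown (dividing both sums by the series length) is one common convention, and the AR(1) process used to generate test data, with coefficient 0.8, is an assumption chosen because its auto-correlation is known to decay with the lag.

```python
import random

def autocorrelation(series, lag):
    """Sample auto-correlation of a series at a given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / n
    return cov / var

# Synthetic AR(1) series: each value is 0.8 times the previous one plus noise.
rng = random.Random(1)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))

print([round(autocorrelation(series, k), 2) for k in (0, 1, 3, 6)])
```

Plotting these values against the lag gives the usual correlogram, from which the speed of the decay can be read.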
b) In time-series analysis, one often relies on decomposition, by which the time-series is
decomposed into a “trend” and a “remainder”. The remainder component is often expected
to be uncorrelated, i.e., close to random noise. Describe the auto-correlation of perfect
random noise.
Answer: The auto-correlation of a perfect random noise should be 1 at lag 0 (as is the case
for all time-series) and should be close to 0 for any non-0 lag. Close to 0 actually means
within the 1/sqrt(n) confidence intervals, where n is the length of the time-series.
[5 marks]
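
This behaviour can be checked on simulated Gaussian white noise. The sketch counts how many non-zero lags fall inside a band around zero; the band of 1.96/sqrt(n), a common approximation of a 95% interval, is an assumption about the exact constant, as are the series length and the number of lags.

```python
import math
import random

def sample_acf(series, max_lag):
    """Sample auto-correlation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]

rng = random.Random(7)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
acf = sample_acf(noise, max_lag=20)

band = 1.96 / math.sqrt(len(noise))  # approximate 95% band (assumed constant)
inside = sum(abs(r) <= band for r in acf[1:])
print(acf[0], inside, "of", len(acf) - 1, "lags inside the band")
```

A few lags may fall just outside the band by chance, which is expected for an interval of this kind.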
c) Stationarity is a fundamental concept in time-series analysis. Give one of its multiple
definitions, and give an example of a stationary process and of a non-stationary process.

Answer: Stationarity has multiple definitions (weak, strong, and statistical). Intuitively,
stationarity means that a set of statistics of the time-series do not vary over time. More
formally, it means that the probability laws that govern the process do not change over
time. For example, the mean or the variance should be constant over time for a time-series
to be considered second-order stationary. An example of a stationary time-series is white
noise or a moving average, and an example of a non-stationary time-series is the random
walk.
[10 marks]
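
The two examples can be contrasted by simulation. The sketch below estimates, across many independent realisations, the variance of the process at an early and at a late time index; the function names, the number of runs and the chosen indices are assumptions made for illustration.

```python
import random
import statistics

def simulate(kind, length, rng):
    """One realisation of white noise, or of a random walk (cumulative sum of noise)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    if kind == "white_noise":
        return noise
    walk, total = [], 0.0
    for e in noise:
        total += e
        walk.append(total)
    return walk

def variance_at(kind, t, runs=500, length=200, seed=3):
    """Variance, across independent realisations, of the value at time index t."""
    rng = random.Random(seed)
    return statistics.pvariance([simulate(kind, length, rng)[t] for _ in range(runs)])

# Columns: process, variance at an early time, variance at a late time.
for kind in ("white_noise", "random_walk"):
    print(kind, variance_at(kind, 9), variance_at(kind, 199))
```

For white noise the variance is roughly the same at both times, whereas for the random walk it grows with time, which violates second-order stationarity.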


End of Paper
