Paper ID 00395
FAMILY NAME:
OTHER NAME(S):
STUDENT ID:
SIGNATURE:
SCHOOL OF RISK AND ACTUARIAL STUDIES
TERM 1 2019
FINAL EXAM
ACTL 2131: Probability and Mathematical Statistics
INSTRUCTIONS:
1. TIME ALLOWED: 2 HOURS
2. READING TIME: 10 MINUTES
3. THIS EXAMINATION PAPER HAS 33 PAGES.
4. TOTAL NUMBER OF QUESTIONS: 7
5. TOTAL MARKS AVAILABLE: 100
6. MARKS AVAILABLE FOR EACH QUESTION ARE SHOWN IN THE EXAMINATION PAPER (AND OVERLEAF). NOT ALL QUESTIONS ARE OF EQUAL VALUE.
7. ANSWER ALL QUESTIONS IN THE SPACE ALLOCATED TO THEM. IF MORE
SPACE IS REQUIRED, USE THE ADDITIONAL PAGES AT THE END.
8. CANDIDATES MAY BRING
a. THE TEXT FORMULÆ AND TABLES FOR ACTUARIAL EXAMINATIONS (ANY
EDITION) INTO THE EXAMINATION. IT MUST BE WHOLLY UNANNOTATED.
b. THEIR OWN UNSW APPROVED CALCULATOR
9. ALL ANSWERS MUST BE WRITTEN IN INK. EXCEPT WHERE THEY ARE EXPRESSLY REQUIRED, PENCILS MAY BE USED ONLY FOR DRAWING, SKETCHING OR GRAPHICAL WORK.
10. THIS PAPER MAY NOT BE RETAINED BY THE CANDIDATE.
Question    Total available marks for the question    Total marks attained for the question
1 [8 marks]
2 [10 marks]
3 [15 marks]
4 [25 marks]
5 [10 marks]
6 [17 marks]
7 [15 marks]
[total: 100 marks]
Page 2 of 33
Question 1 [8 marks]
Suppose that random variables $X$ and $Y$ have the joint probability density function
$$f_{XY}(x,y) = \begin{cases} Cxy, & \text{for } 0 \le x \le a \text{ and } 0 \le y \le b,\\ 0, & \text{otherwise.} \end{cases}$$
(a) [2 marks] Show that $C$ has to be equal to $\dfrac{4}{a^2b^2}$ for $f_{XY}(x,y)$ to be an appropriate pdf.
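Solution: We use the property that a joint pdf integrates to one:
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1. \quad (1\text{ point})$$
Here,
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = \int_0^b\int_0^a Cxy\,dx\,dy = C\times\frac{a^2b^2}{4} = 1. \quad (1\text{ point}) \tag{1}$$
Therefore, $C = \dfrac{4}{a^2b^2}$.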
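As an optional numerical sanity check (not part of the expected exam answer), a midpoint-rule approximation of the double integral with $C = 4/(a^2b^2)$ should return a value very close to 1. The values `a = 2.0`, `b = 3.0` and the grid size are arbitrary illustrative choices.

```python
def integrate_pdf(a: float, b: float, n: int = 400) -> float:
    """Midpoint-rule approximation of the integral of C*x*y over [0, a] x [0, b]."""
    c = 4.0 / (a**2 * b**2)  # normalising constant from part (a)
    hx, hy = a / n, b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            total += c * x * y
    return total * hx * hy


print(integrate_pdf(2.0, 3.0))  # should be 1 up to floating-point error
```

Because the integrand is linear in each variable separately, the midpoint rule is exact here apart from rounding.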
(b) [2 marks] Determine the marginal density function of the random variable $Y$, i.e., $f_Y(y)$.
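Solution:
$$f_Y(y) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx = \int_0^a \frac{4}{a^2b^2}\,xy\,dx = \frac{2}{b^2}\,y, & \text{for } 0 \le y \le b, \quad (1.5\text{ points})\\ 0, & \text{otherwise.} \quad (0.5\text{ points}) \end{cases} \tag{2}$$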
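(c) [4 marks] Determine $E[XY]$ and express it in terms of $a$ and $b$.
Solution:
$$\begin{aligned}
E[XY] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x,y)\,dx\,dy && (1\text{ point})\\
&= \int_0^b\int_0^a \frac{4}{a^2b^2}\,x^2y^2\,dx\,dy && (1\text{ point})\\
&= \frac{4}{a^2b^2}\left[\frac{x^3}{3}\right]_0^a\left[\frac{y^3}{3}\right]_0^b && (1\text{ point})\\
&= \frac{4}{9}\,ab. && (1\text{ point})
\end{aligned} \tag{3}$$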
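The closed form $E[XY] = 4ab/9$ can be checked the same way, again with arbitrary illustrative values of $a$ and $b$. This is a hedged numerical sketch only: the midpoint rule is no longer exact for $x^2y^2$, but its relative error here is of order $1/n^2$.

```python
def expected_xy(a: float, b: float, n: int = 400) -> float:
    """Midpoint-rule approximation of E[XY] = double integral of x*y*f_XY(x, y)."""
    c = 4.0 / (a**2 * b**2)
    hx, hy = a / n, b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            total += x * y * (c * x * y)  # x*y times the joint pdf
    return total * hx * hy


a, b = 2.0, 3.0
print(expected_xy(a, b), 4 * a * b / 9)  # numerical value vs closed form 4ab/9
```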
Question 2 [10 marks]
You are provided with a sample of 20 observations. You have fitted a log-normal distribution to the sample, and the Maximum Likelihood estimates are $\hat{\mu} = 2$ and $\hat{\sigma}^2 = 1$.
Your manager asks you to perform a $\chi^2$ goodness-of-fit test on the estimated log-normal distribution and has kindly provided you with the following table:
cell     observed   observed²   expected   expected²   (observed-expected)   (observed-expected)²
0-1      0          0           0.46       0.21        -0.46                 0.21
1-2      2          4           1.46       2.12        0.54                  0.29
2-3      1          1           1.76       3.10        -0.76                 0.58
3-4      2          4           1.72       2.96        0.28                  0.08
4-5      1          1           1.57       2.46        -0.57                 0.32
5-6      3          9           1.39       1.93        1.61                  2.59
6-7      2          4           1.22       1.48        0.78                  0.61
7-8      0          0           1.06       1.13        -1.06                 1.13
8-9      2          4           0.93       0.87        1.07                  1.14
9-10     1          1           0.81       0.66        0.18                  0.03
10-11    1          1           0.71       0.51        0.29                  0.08
11-12    1          1           0.63       0.40        0.37                  0.14
12-13    0          0           0.56       0.31        -0.56                 0.31
13-14    1          1           0.49       0.24        0.51                  0.26
14-15    1          1           0.44       0.19        0.56                  0.32
>15      2          4           4.79       22.93       -2.79                 7.78
sum      20         36          20         186.0103    0                     22.1503
(a) [4 marks] Explain to your manager why the above table is not appropriate to perform a $\chi^2$-test. Propose changes to the table to make it suitable for a $\chi^2$-test.
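Solution: Cells should be chosen such that the expected number of observations in each cell $i$ satisfies $E_i \ge 5$; in the table above, almost all expected counts are below 2. (2 points)
The number of degrees of freedom, $c - k - 1$, must also be positive. With $k = 2$ estimated parameters ($\mu$ and $\sigma^2$), the minimum number of cells we could have is $1 + 2 + 1 = 4$. (2 points)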
Your manager has provided a new but incomplete table.
cell     observed   expected   (observed-expected)   (observed-expected)²
0-4      __         __         -0.39                 __
4-8      __         __         __                    0.58
__       __         __         2.42                  5.87
>15      __         __         -2.79                 7.78
(b) [6 marks] Complete the table and test whether the log-normal distribution is a valid distribution for this dataset. Use a level of significance of $\alpha = 5\%$ for your test.
Solution: We propose to use the following cells: (3 points)
cell     observed   expected   (observed-expected)   (observed-expected)²   (observed-expected)²/expected
0-4      5          5.39       -0.39                 0.16                   0.03
4-8      6          5.24       0.76                  0.58                   0.11
8-15     7          4.58       2.42                  5.87                   1.28
>15      2          4.79       -2.79                 7.78                   1.62
sum      20         20         0                     14.38                  3.05
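Under $H_0$, with $c = 4$ cells and $k = 2$ estimated parameters, we have approximately
$$T = \sum \frac{(\text{observed}-\text{expected})^2}{\text{expected}} \sim \chi^2_{c-k-1} = \chi^2_1. \quad (2\text{ points})$$
Hence, since $T = 3.05 \le \chi^2_{0.95}(1) = 3.84$, we cannot reject the null hypothesis that the log-normal distribution is the correct distribution. (1 point)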
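The test in part (b) can be reproduced from the merged table as a short sketch. It uses the fact that $\chi^2_1$ is the square of a standard Normal, so $P(\chi^2_1 > t) = \operatorname{erfc}(\sqrt{t/2})$ and no statistics library is needed. With the rounded expected counts the statistic comes out at about 3.04; the 3.05 in the table presumably comes from unrounded expected counts, and the conclusion is unchanged.

```python
import math

# Observed and expected counts for the four merged cells: 0-4, 4-8, 8-15, >15
observed = [5, 6, 7, 2]
expected = [5.39, 5.24, 4.58, 4.79]

# Pearson chi-squared statistic
T = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: c - k - 1 with c = 4 cells and k = 2 estimated parameters
df = len(observed) - 2 - 1

# For 1 degree of freedom, P(chi2_1 > t) = erfc(sqrt(t / 2)), since chi2_1 = Z^2
p_value = math.erfc(math.sqrt(T / 2))

critical_value = 3.84  # chi2_{0.95}(1), as quoted in the solution
print(f"T = {T:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if T > critical_value else "cannot reject H0")
```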
Question 3 [15 marks]
Consider an independent random sample $\{X_1, \dots, X_{200}\}$ with common density function given by:
$$f(x;\alpha) = \alpha(1+x)^{-\alpha-1}, \quad \text{for } \alpha > 0 \text{ and } x \ge 0.$$
From this sample, the following information has been collected:
$$\sum_{i=1}^{200} x_i = 159.711; \quad \sum_{i=1}^{200} x_i^2 = 474.998; \quad \sum_{i=1}^{200} \log(x_i) = -201.257; \quad \sum_{i=1}^{200} \log(1+x_i) = 99.266.$$
(a) [6 marks] Show that the log-likelihood is given by
$$\log(L(\alpha)) = n\log(\alpha) - (\alpha+1)\sum_{i=1}^{n}\log(1+X_i),$$
and prove that the Maximum Likelihood estimator of the parameter $\alpha$ is given by
$$\hat{\alpha}_{ML} = \frac{n}{\sum_{i=1}^{n}\log(1+X_i)}.$$
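Solution: We find the log-likelihood as follows:
$$\log(L(\alpha)) = \sum_{i=1}^{n}\log\left(\alpha(1+X_i)^{-\alpha-1}\right) = n\log(\alpha) - (\alpha+1)\sum_{i=1}^{n}\log(1+X_i). \quad (1\text{ point}) \tag{4}$$
We have to solve the following optimization problem:
$$\max_{\alpha > 0}\ \log(L(\alpha)). \quad (1\text{ point}) \tag{5}$$
We find the solution by differentiating the log-likelihood with respect to $\alpha$ and setting the first-order condition:
$$\frac{\partial \log(L(\alpha))}{\partial\alpha} = \frac{n}{\alpha} - \sum_{i=1}^{n}\log(1+X_i) = 0. \quad (2\text{ points}) \tag{6}$$
Thus,
$$\hat{\alpha}_{ML} = \frac{n}{\sum_{i=1}^{n}\log(1+X_i)}. \quad (1\text{ point}) \tag{7}$$
We should also check the second-order condition:
$$\frac{\partial^2 \log(L(\alpha))}{\partial\alpha^2} = -\frac{n}{\alpha^2} < 0, \quad \text{for } \alpha > 0. \quad (1\text{ point}) \tag{8}$$
Since the log-likelihood is strictly concave in $\alpha$, $\hat{\alpha}_{ML}$ is the global maximum of the log-likelihood function.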
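Plugging the given summary statistic into (7) gives the numerical estimate. As a hedged sanity check of the estimator itself, the sketch below also simulates from this density by inverting its cdf, $F(x) = 1 - (1+x)^{-\alpha}$, and re-estimates $\alpha$. The true value `2.0`, the simulation size and the seed are arbitrary illustrative choices.

```python
import math
import random


def alpha_mle(sum_log1p: float, n: int) -> float:
    """ML estimator (7): n / sum(log(1 + X_i))."""
    return n / sum_log1p


# Estimate from the exam's summary statistics
print(alpha_mle(99.266, 200))  # roughly 2.015

# Simulation check: X = (1 - U)^(-1/alpha) - 1 inverts F(x) = 1 - (1 + x)^(-alpha)
rng = random.Random(2019)
true_alpha, n_sim = 2.0, 100_000
sample = [(1.0 - rng.random()) ** (-1.0 / true_alpha) - 1.0 for _ in range(n_sim)]
sim_estimate = alpha_mle(sum(math.log1p(x) for x in sample), n_sim)
print(sim_estimate)  # should be close to 2.0
```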
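(b) [5 marks] To test the hypothesis
$$H_0: \alpha = \alpha_0 \quad \text{vs.} \quad H_1: \alpha \ne \alpha_0,$$
we define the log-likelihood ratio statistic to be $\log(\Lambda) = \log(L(\alpha_0)) - \log(L(\hat{\alpha}_{ML}))$, where $\hat{\alpha}_{ML}$ is the value of the Maximum Likelihood estimator of $\alpha$. Show that
$$\log(\Lambda) = n\log\left(\frac{\alpha_0\sum_{i=1}^{n}\log(1+X_i)}{n}\right) + n - \alpha_0\sum_{i=1}^{n}\log(1+X_i).$$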
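Solution: We have
$$\log(L(\alpha)) = n\log(\alpha) - (\alpha+1)\sum_{i=1}^{n}\log(1+X_i). \tag{9}$$
Using the result in part (a), we then have that $\log(\Lambda)$ is
$$\begin{aligned}
\log(\Lambda) &= n\log(\alpha_0) - (\alpha_0+1)\sum_{i=1}^{n}\log(1+X_i) - n\log(\hat{\alpha}) + (\hat{\alpha}+1)\sum_{i=1}^{n}\log(1+X_i) && (10)\\
&= n\log\left(\frac{\alpha_0}{n/\sum_{i=1}^{n}\log(1+X_i)}\right) + \left(\frac{n}{\sum_{i=1}^{n}\log(1+X_i)} - \alpha_0\right)\sum_{i=1}^{n}\log(1+X_i) && (11)\\
&= n\log\left(\frac{\alpha_0\sum_{i=1}^{n}\log(1+X_i)}{n}\right) + n - \alpha_0\sum_{i=1}^{n}\log(1+X_i). && (12)
\end{aligned}$$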
(c) [4 marks] Using the result in (b), perform the likelihood ratio test of
$$H_0: \alpha = 2 \quad \text{vs.} \quad H_1: \alpha \ne 2$$
at a 5% level of significance. Clearly state your conclusion.
[Hint: you are reminded that $-2\log(\Lambda)$ is asymptotically $\chi^2_1$, where $\Lambda$ is the likelihood ratio.]
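Solution: Using the result in part (b), the value of $-2\log(\Lambda)$ is
$$\begin{aligned}
-2\log(\Lambda) &= -2\left(n\log\left(\frac{\alpha_0\sum_{i=1}^{n}\log(1+X_i)}{n}\right) + n - \alpha_0\sum_{i=1}^{n}\log(1+X_i)\right) && (13)\\
&= -2\left(200\cdot\log(2\cdot 99.266/200) + 200 - 2\cdot 99.266\right) = 0.0108. && (14)
\end{aligned}$$
Then,
$$\text{p-value} = P_{\alpha_0}(\Lambda < \lambda) = P(-2\log(\Lambda) > 0.0108) = P(\chi^2_1 > 0.0108) = 0.917. \tag{15}$$
Since the p-value of 0.917 is well above 0.05 (equivalently, $0.0108 < \chi^2_{0.95}(1) = 3.84$), we fail to reject $H_0: \alpha = 2$ at the 5% level of significance.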
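The arithmetic in (14) and (15) can be reproduced with a few lines, as a sketch. As in part (b) of Question 2, it uses $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ instead of a statistical table; the function name `lr_statistic` is just an illustrative label.

```python
import math


def lr_statistic(alpha0: float, sum_log1p: float, n: int) -> float:
    """-2 log(Lambda) using the closed form from part (b)."""
    log_lambda = n * math.log(alpha0 * sum_log1p / n) + n - alpha0 * sum_log1p
    return -2.0 * log_lambda


stat = lr_statistic(alpha0=2.0, sum_log1p=99.266, n=200)
# Asymptotically chi2_1; for 1 df, P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2.0))
print(f"-2 log(Lambda) = {stat:.4f}, p-value = {p_value:.3f}")
```

This should print a statistic of about 0.0108 and a p-value of about 0.917, matching the solution.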
Question 4 [25 marks]
Let $X$ be a Normal$(0, \sigma^2)$ random variable. In this question we consider the random variable $Y = |X|$, and an i.i.d. sample $\{Y_1, Y_2, \dots, Y_n\}$ coming from this distribution.
[Note that all questions are independent and do not require solving previous question(s). However, you
may need information contained in the statements of other questions.]
(a) [2 marks] Explain why the probability density function of $Y$ is
$$f_Y(y) = \begin{cases} \dfrac{\sqrt{2}}{\sigma\sqrt{\pi}}\exp\left(-\dfrac{y^2}{2\sigma^2}\right), & \text{for } y \ge 0,\\ 0, & \text{otherwise.} \end{cases}$$
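The pdf of $X$ is $\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{x^2}{2\sigma^2}\right)$ for all $x$. Because $Y$ is the absolute value of $X$, its pdf is 0 for all $y < 0$. For $y \ge 0$, both $X = y$ and $X = -y$ give $Y = y$, so by symmetry of the Normal pdf about 0, $f_Y(y) = f_X(y) + f_X(-y) = 2f_X(y)$, i.e. exactly double that of $X$.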
(b) [3 marks] Show that the expectation of $Y$ is $E[Y] = \sigma\sqrt{\dfrac{2}{\pi}}$.
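By definition,
$$E[Y] = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\int_0^\infty y\exp\left(-\frac{y^2}{2\sigma^2}\right)dy = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\left[(-\sigma^2)\exp\left(-\frac{y^2}{2\sigma^2}\right)\right]_0^\infty = -\sigma\sqrt{\frac{2}{\pi}}\,(0-1) = \sigma\sqrt{\frac{2}{\pi}}.$$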
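A quick Monte Carlo sketch can confirm the formula for $E[Y]$. The value `sigma = 1.5`, the sample size and the seed are arbitrary assumptions; note that Python's `random.gauss(mu, sigma)` takes the standard deviation, not the variance.

```python
import math
import random

sigma = 1.5          # illustrative value of sigma (an assumption)
n = 200_000
rng = random.Random(42)

# Y = |X| with X ~ Normal(0, sigma^2)
ys = [abs(rng.gauss(0.0, sigma)) for _ in range(n)]
mc_mean = sum(ys) / n
closed_form = sigma * math.sqrt(2.0 / math.pi)
print(mc_mean, closed_form)  # the two values should agree to about 2 decimals
```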
(c) [2 marks] Show that the Method of Moments estimator of $\sigma$ is $\hat{\sigma}_{MM} = \sqrt{\dfrac{\pi}{2}}\,\dfrac{\sum_{i=1}^{n}Y_i}{n}$.
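From (b),
$$\sigma = \sqrt{\frac{\pi}{2}}\,E[Y],$$
and replacing the population mean $E[Y]$ by the sample mean $\bar{Y}$ gives
$$\hat{\sigma}_{MM} = \sqrt{\frac{\pi}{2}}\cdot\bar{Y}.$$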
(d) [2 marks] Explain how to use the central limit theorem to identify the asymptotic distribution of the estimator $\hat{\sigma}_{MM}$ given in (c), if the sample size is very large.
Because this Method of Moments estimator is simply a constant multiple of the sample mean $\bar{Y}$, we can invoke the CLT and say that
$$\hat{\sigma}_{MM} \approx \text{Normal}\left(\sigma,\ \frac{\pi}{2}\,\frac{V[Y]}{n}\right).$$
(Students don't need to specify the parameters to get full marks.)
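The variance in this approximation can be made explicit. Since $Y^2 = X^2$, we have $E[Y^2] = E[X^2] = \sigma^2$, and combining this with part (b) gives (the second line is exact because $\hat{\sigma}_{MM}$ is a constant multiple of $\bar{Y}$):
$$\begin{aligned}
V[Y] &= E[Y^2] - E[Y]^2 = \sigma^2 - \frac{2\sigma^2}{\pi} = \sigma^2\left(1 - \frac{2}{\pi}\right),\\
V[\hat{\sigma}_{MM}] &= \frac{\pi}{2}\cdot\frac{V[Y]}{n} = \frac{\sigma^2}{n}\left(\frac{\pi}{2} - 1\right).
\end{aligned}$$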
(e) [4 marks] The Maximum Likelihood estimator of $\sigma$ is $\hat{\sigma}_{ML} = \sqrt{\dfrac{\sum_{i=1}^{n}Y_i^2}{n}}$. This is different from the Method of Moments estimator $\hat{\sigma}_{MM}$ given in (c). Between these two estimators (the Maximum Likelihood estimator and the Method of Moments estimator), which one would you choose and why?
Solution:
This question has no single right answer; it is meant for students to explain some strengths and weaknesses of both estimators. Possible elements in favour of the MM estimator are:
- It is straightforward to see that this estimator is unbiased, which is desirable.
- It is straightforward to see that its variance converges to zero as $n \to \infty$, which, together with unbiasedness, means the estimator is consistent.
Possible elements in favour of the ML estimator are:
- The method on which it relies (finding the parameter value that makes the observations most likely) is sounder.
- Under standard regularity conditions, the ML estimator is automatically consistent.
- Asymptotically, the ML estimator attains the Cramér-Rao lower bound, hence it is asymptotically the best unbiased estimator.
(f) [4 marks] The two estimators from parts (c) and (e), $\hat{\sigma}_{MM}$ and $\hat{\sigma}_{ML}$ respectively, are different for a fixed $n$. Show that as $n \to \infty$, they converge in probability to the same quantity.
The MM estimator is unbiased and its variance goes to 0, hence it is consistent, and consistency means exactly convergence in probability to the true parameter $\sigma$. Alternatively, since the MM estimator is just $\sqrt{\pi/2}\cdot\bar{Y}$, the Law of Large Numbers gives $\bar{Y} \to E[Y]$ in probability, so that $\hat{\sigma}_{MM}$ converges to $\sqrt{\pi/2}\,E[Y] = \sigma$.
For the ML estimator, the MLE is consistent (under standard regularity conditions), hence it converges in probability to $\sigma$. This can also be seen directly: by the Law of Large Numbers, $\frac{1}{n}\sum_{i=1}^{n}Y_i^2 \to E[Y^2] = E[X^2] = \sigma^2$ in probability, and the square root is a continuous function.
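A small simulation sketch illustrates part (f): for small $n$ the two estimators differ, but both settle near the true $\sigma$ as $n$ grows. The true value `sigma = 3.0`, the sample sizes and the seed are illustrative assumptions.

```python
import math
import random


def sigma_mm(ys):
    """Method of Moments estimator from (c): sqrt(pi/2) * sample mean."""
    return math.sqrt(math.pi / 2.0) * sum(ys) / len(ys)


def sigma_ml(ys):
    """Maximum Likelihood estimator from (e): sqrt(mean of Y_i^2)."""
    return math.sqrt(sum(y * y for y in ys) / len(ys))


sigma = 3.0                 # illustrative true value (an assumption)
rng = random.Random(7)
for n in (10, 1_000, 100_000):
    ys = [abs(rng.gauss(0.0, sigma)) for _ in range(n)]
    print(n, round(sigma_mm(ys), 3), round(sigma_ml(ys), 3))
```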
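(g) [4 marks] Show that if $X \sim \text{Normal}(0, \sigma^2)$, then $X^2 \sim \text{Gamma}\left(\frac{1}{2}, \frac{1}{2\sigma^2}\right)$.
[Hint: you are reminded that $\chi^2_1$ is the same as a Gamma$\left(\frac{1}{2}, \frac{1}{2}\right)$.]
It seems wise to start with the hint. A $\chi^2_1$ is the square of a standard Normal, hence
$$\left(\frac{X}{\sigma}\right)^2 = \frac{X^2}{\sigma^2} \sim \chi^2_1 \equiv \text{Gamma}\left(\frac{1}{2}, \frac{1}{2}\right).$$
But then we can find the MGF of $X^2$ as
$$M_{X^2}(t) = E\left[e^{\frac{X^2}{\sigma^2}\sigma^2 t}\right] = M_{X^2/\sigma^2}(\sigma^2 t) = \left(1 - \frac{\sigma^2 t}{1/2}\right)^{-\frac{1}{2}},$$
which is exactly the MGF of a Gamma$\left(\frac{1}{2}, \frac{1}{2\sigma^2}\right)$ (in the shape-rate parametrisation, a Gamma$(a, \beta)$ has MGF $(1 - t/\beta)^{-a}$ for $t < \beta$).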
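(h) [4 marks] Use the result given in part (g) to construct a confidence interval for the parameter $\sigma$, at a level of significance $\alpha$.
We need a pivot for $\sigma$. Realising that $X^2 = Y^2$ and using part (g), we have
$$\frac{X^2}{\sigma^2} = \frac{Y^2}{\sigma^2} \sim \chi^2_1,$$
and so, since the $Y_i$ are independent and a sum of $n$ independent $\chi^2_1$ variables is $\chi^2_n$,
$$\frac{1}{\sigma^2}\sum Y_i^2 \sim \chi^2_n,$$
so that, writing $\chi^2_{n,p}$ for the $p$-quantile of the $\chi^2_n$ distribution,
$$\begin{aligned}
\Pr\left[\chi^2_{n,\alpha/2} \le \frac{1}{\sigma^2}\sum Y_i^2 \le \chi^2_{n,1-\alpha/2}\right] &= 1-\alpha\\
\Pr\left[\frac{\chi^2_{n,\alpha/2}}{\sum Y_i^2} \le \frac{1}{\sigma^2} \le \frac{\chi^2_{n,1-\alpha/2}}{\sum Y_i^2}\right] &= 1-\alpha\\
\Pr\left[\frac{\sum Y_i^2}{\chi^2_{n,1-\alpha/2}} \le \sigma^2 \le \frac{\sum Y_i^2}{\chi^2_{n,\alpha/2}}\right] &= 1-\alpha.
\end{aligned}$$
Taking square roots, a $100(1-\alpha)\%$ confidence interval for $\sigma$ is
$$\left[\sqrt{\frac{\sum Y_i^2}{\chi^2_{n,1-\alpha/2}}},\ \sqrt{\frac{\sum Y_i^2}{\chi^2_{n,\alpha/2}}}\,\right].$$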
Question 5 [10 marks]
Professional actuarial exams are hard and often require several attempts to pass. Out of 50 randomly selected actuaries who completed one specific exam, the table below summarises the number of attempts they needed to pass.
# Attempts Needed # Actuaries
1 19
2 13
3 7
4 4
5 1
6 4
7 2
For parts (a), (b) and (c), assume the following hold:
• Candidates are independent of each other.
• Candidates are of equal strength.
• All trials have a constant probability p of success and are independent of previous attempts.
(a) [2 marks] If X denotes the number of attempts a given candidate requires, what random variable
would you use to model X? Justify your answer.
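Given the assumptions of independence and a constant probability of success $p$ at each attempt, $X$ counts the number of independent Bernoulli($p$) trials up to and including the first success, so $X$ would follow a Geometric($p$) distribution, with $P(X = x) = p(1-p)^{x-1}$ for $x = 1, 2, \dots$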
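(b) [3 marks] Using a technique of your choice, propose an estimator for $p$ and compute its numerical value.
We can view the whole dataset as one big sample of Bernoulli trials. There have been 50 successes out of a total number of trials of
$$19 + 13\cdot 2 + 7\cdot 3 + \dots + 2\cdot 7 = 125.$$
Hence, a simple Method of Moments estimator is the proportion of successful attempts,
$$\hat{p} = \bar{Y} = \frac{50}{125} = 0.4,$$
where $\bar{Y}$ is the mean of the Bernoulli outcomes. Equivalently, $\hat{p} = 1/\bar{X}$, which is also the Maximum Likelihood estimator of $p$ for the Geometric distribution.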
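The count of trials and the resulting estimate follow directly from the frequency table; a minimal sketch:

```python
# Frequency table from Question 5: attempts needed -> number of actuaries
table = {1: 19, 2: 13, 3: 7, 4: 4, 5: 1, 6: 4, 7: 2}

n_candidates = sum(table.values())                     # one success per candidate
total_attempts = sum(x * count for x, count in table.items())
p_hat = n_candidates / total_attempts                  # = 1 / mean number of attempts
print(n_candidates, total_attempts, p_hat)             # 50 125 0.4
```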
(c) [2 marks] Let $X_1, X_2, \dots, X_{50}$ be the number of attempts needed by each of the 50 candidates. Say you wanted to conduct a hypothesis test of the hypothesis that
$$X_i \sim \text{Some Specified Distribution}, \quad i \in \{1, 2, \dots, 50\}.$$
Name two possible tests you could use.
The Anderson-Darling, Cramér-von Mises, Kolmogorov-Smirnov, Kuiper or chi-squared tests are all valid options.
(This is not required, but here is how the $\chi^2$ test would be done for real, for the fully specified hypothesis $H_0: X_i \sim \text{Geometric}(1/2)$.) We find the expected number of observations for each $x$. Using $p(x) = p(1-p)^{x-1} = (1/2)^x$, we have
x     # Expected
1     50 · 0.5 = 25
2     50 · 0.5² = 12.5
3     50 · 0.5³ = 6.25
4+    50 − 25 − 12.5 − 6.25 = 6.25
With observed counts 19, 13, 7 and 4 + 1 + 4 + 2 = 11, the test statistic is
$$T = \sum_i \frac{(E_i - O_i)^2}{E_i} = \frac{(25-19)^2}{25} + \frac{(12.5-13)^2}{12.5} + \frac{(6.25-7)^2}{6.25} + \frac{(6.25-11)^2}{6.25} = 1.44 + 0.02 + 0.09 + 3.61 = 5.16.$$
Because no parameters were estimated, the number of degrees of freedom for the $\chi^2$ is $4 - 1 = 3$. Next, $\chi^2_{3,95\%} = 7.815$, which is bigger than the test statistic, hence we do not reject $H_0$.
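The same optional $\chi^2$ calculation, written as a short sketch. The hypothesised $p = 1/2$ and the merging of attempts 4 to 7 into a single "4+" cell follow the worked answer; the critical value 7.815 is taken from the text rather than computed.

```python
# Chi-squared goodness-of-fit test of H0: X ~ Geometric(1/2), cells 1, 2, 3 and 4+
n, p = 50, 0.5
observed = [19, 13, 7, 4 + 1 + 4 + 2]          # attempts 1, 2, 3, and 4 or more

probs = [p * (1 - p) ** (x - 1) for x in (1, 2, 3)]
probs.append(1.0 - sum(probs))                  # P(X >= 4)
expected = [n * q for q in probs]

T = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 7.815                          # chi2 95% quantile with 3 df, as quoted above
print(expected, round(T, 2), "reject H0" if T > critical_value else "do not reject H0")
```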
(d) [3 marks] Are the three assumptions reasonable? Explain your answer for each assumption.
There is no 'perfect answer' here. Students should get the marks for coherent reasoning. Some elements of an answer:
- Candidates are independent: this is reasonable to a certain extent because the exams are done individually and each candidate is responsible for their own study. However, one can make the case that some candidates may be good friends studying together and encouraging each other to work hard. In that case the success of one candidate may be linked to that of another, so that some candidates are not perfectly independent.
- Candidates are of equal strength: this one is harder to justify. Among all the people attempting these exams, it would make sense that some are more able and more determined than others.
- All trials have a constant probability p of success, independently of previous attempts: this one is also hard to justify. It would seem that a candidate retaking an exam has a better chance of success, because they should know a bit more at each attempt.
Question 6 [17 marks]
You want to model students' results in the ACTL1101 final exam ($y$) using their results in the ACTL1101 midterm exam ($x$) through a simple linear regression model. For these 300 students, you first consider a model without intercept, i.e. for each student $i = 1, 2, \dots, 300$, your model is
$$y_i = \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$$
You also consider a model with an intercept, i.e.
$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$$
You fit both models and some of the numerical results are presented in the table below,
Model α (std error) β (std error) Residual std error SST SSE
No intercept NA 1.021 (0.011) 10.91 70612.82 35557.27
With intercept 21.823 (1.434) 0.671 (0.024) 8.19 70612.82 20005.12
where SST stands for the total sum of squares and SSE stands for the sum of squared errors.
(a) [4 marks] For the 'no intercept' model, derive the least squares estimator of $\beta$.
Call the sum of squared errors $S(\beta)$, with $S(\beta) = \sum (y_i - \beta x_i)^2$. We want to minimise this quantity, hence we take its derivative and set it to 0:
$$\frac{dS}{d\beta} = 2\sum_{i=1}^{n}(y_i - \beta x_i)(-x_i) = 0 \implies \sum x_i y_i = \beta\sum x_i^2 \implies \hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2}.$$
The second derivative $\dfrac{d^2S}{d\beta^2} = 2\sum x_i^2 > 0$, hence we have a minimum (this part is not necessary for students to get full marks).
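A minimal sketch of the closed form, using a small made-up dataset (the `xs`/`ys` values are hypothetical, not the exam data). It computes $\hat\beta = \sum x_i y_i / \sum x_i^2$; nudging $\beta$ in either direction should increase $S(\beta)$, as the second-derivative argument predicts.

```python
def sse(beta, xs, ys):
    """S(beta) = sum of (y_i - beta * x_i)^2 for the no-intercept model."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys))


def beta_no_intercept(xs, ys):
    """Least squares estimator through the origin: sum(x*y) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical midterm/final marks, for illustration only
xs = [40.0, 55.0, 62.0, 70.0, 85.0]
ys = [50.0, 58.0, 65.0, 72.0, 80.0]

b = beta_no_intercept(xs, ys)
print(b, sse(b, xs, ys))
```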
(b) [2 marks] Explain why SST is the same for both models.
By definition, SST is $\sum_{i=1}^{n}(y_i - \bar{y})^2$, which is unaffected by the choice of model since it involves only the observations $y_i$.
(c) [4 marks] Based on the information you have, which model do you consider better? Provide two
reasons.
Quite clearly, the model with intercept is better. Possible answers:
- The respective values of $R^2 = 1 - \frac{\text{SSE}}{\text{SST}}$ are 0.496 for the 'no intercept' model and 0.717 for the full model. Based on this criterion, which measures how much of the variability of the dependent variable $y$ the model captures, the full model (with intercept) is better.
- The residual standard error is smaller for the full model (8.19 vs 10.91), which is another sign that it better captures the trend in the data (the variability of the unexplained error is smaller).
- In the full model, both parameters $\alpha$ and $\beta$ are highly significant (their standard errors are small relative to the estimates, e.g. $21.823/1.434 \approx 15.2$ for $\alpha$), which suggests this model is an improvement over a model with no intercept.
We will accept as valid the argument that the 'no intercept' model still partially explains the trend in the data and is simpler, which is often preferred.
Likewise, we will accept as valid an argument that coherently claims it 'makes more sense' to have a null intercept (arguing, for instance, that someone who gets 0 in the midterm can be expected to get 0 in the final).
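The two $R^2$ values quoted above can be checked from the table figures. Note that for a no-intercept model some software reports an $R^2$ based on the uncentred total sum of squares instead; this sketch follows the solution in using the same SST for both models.

```python
# Figures from the regression output table in Question 6
sst = 70612.82
sse_no_intercept = 35557.27
sse_with_intercept = 20005.12

r2_no_intercept = 1 - sse_no_intercept / sst
r2_with_intercept = 1 - sse_with_intercept / sst
print(round(r2_no_intercept, 3), round(r2_with_intercept, 3))  # 0.496 0.717
```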
(d) [4 marks] What additional information would you like to obtain in order to refine your judgement
on the validity of these models? Provide two examples and justify why the information can be
useful.
The answer should revolve around the fact that we don't know what our residuals look like, so we don't know whether the assumptions of Normality and homoscedasticity hold. Elements that would help us are:
- Plot of residuals $\hat{\varepsilon}$ vs fitted values $\hat{y}$
- Plot of residuals $\hat{\varepsilon}$ vs the explanatory variable $x$
- Formal tests of homoscedasticity (e.g. the Breusch-Pagan test)
- QQ plot of the residuals against a Normal
- Formal tests of Normality of the residuals
Also accepted (as one additional piece of information): students mentioning that they would like to have the t-values and p-values associated with the estimates of $\alpha$ and $\beta$, to better judge the significance of the parameters.
Not accepted as a valid answer: investigation of multicollinearity (since this concept is relevant only in multiple linear regression).
(e) [3 marks] You forgot to include Josephine's mark in your fitting. She scored 85 in her midterm, and you would like to use the 'with intercept' model to build a 90% confidence interval for her specific mark in the final. You are given that
$$\bar{x} = \frac{1}{300}\sum_{i=1}^{300}x_i = 55.42 \quad \text{and} \quad S_{xx} = \sum_{i=1}^{300}x_i^2 = 1{,}033{,}989.$$
We need to compute
$$\sum x_i^2 - n\bar{x}^2 = 1{,}033{,}989 - 300\cdot 55.42^2 = 112{,}576.08.$$
The confidence interval for a single new observation with explanatory variable $x_0$ (also called a prediction interval) is given by
$$\hat{y} \pm t_{0.95,n-2}\times s\times\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x_i^2 - n\bar{x}^2}},$$
where $\hat{y} = \hat{\alpha} + \hat{\beta}x_0$, $s = 8.19$ is the residual standard error of the model with intercept, and $t_{0.95,298} \approx 1.65$. This gives
$$21.823 + 0.671\cdot 85 \pm 1.65\cdot 8.19\sqrt{1 + \frac{1}{300} + \frac{(85 - 55.42)^2}{112{,}576}},$$
$$78.858 \pm 13.588 = [65.270, 92.446].$$
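The interval arithmetic can be reproduced as below. As a sketch, `t_crit = 1.65` is the approximate $t_{0.95,298}$ quantile used in the solution, hard-coded rather than computed.

```python
import math

# Inputs from Question 6 (model with intercept)
alpha_hat, beta_hat = 21.823, 0.671
s = 8.19                 # residual standard error
n, x_bar, sum_x2 = 300, 55.42, 1_033_989
t_crit = 1.65            # approximate t_{0.95, 298}, as used in the solution
x0 = 85

s_xx_centred = sum_x2 - n * x_bar ** 2            # sum(x_i^2) - n * x_bar^2
y_hat = alpha_hat + beta_hat * x0
margin = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx_centred)
print(f"{y_hat:.3f} +/- {margin:.3f} -> [{y_hat - margin:.3f}, {y_hat + margin:.3f}]")
```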
Question 7 [15 marks]
You are investigating the key drivers of lawyers' weekly income ($y$). You have selected the following three variables as part of your analysis:
- x1: weekly hours worked (continuous variable)
- x2: age (continuous variable)
- x3: gender (categorical variable: 1 = male, 0 = female)
The model you fit is a multiple linear regression model with the following specification
$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$
Based on a sample of 250 lawyers, you have obtained the following results on the fit
Estimate Std. Error t-value p-value
(Intercept) 64.700 78.335 0.826 0.409635
x1 29.335 1.447 20.266 <0.000001
x2 4.870 1.247 3.905 0.000122
x3 587.286 28.442 20.649 <0.000001
In addition, a Normal Q-Q plot of the residuals from your model is given below.
[Figure: Normal Q-Q plot of the residuals. Horizontal axis: Theoretical Quantiles (from −3 to 3); vertical axis: Sample Quantiles (from −400 to 400).]
(a) [2 marks] Does the Normal Q-Q plot give you confidence in the validity of the fitted model?
Justify your answer.
This plot assesses the hypothesis of Normal residuals. If the residuals are Normally distributed, the points should follow the straight line. Although they do not perfectly follow the straight line at low and high quantiles, there is no real cause for concern here (especially because the sample quantiles in the tails are 'less extreme' than the Normal ones, i.e. the residuals have lighter tails than a Normal).
If students argue that the points at the extremes are not on the reference line, hence the residuals might not be perfectly Normal, this is considered a valid argument.
(b) [2 marks] From the fitted model, what is the estimated average weekly income of a 65 year old
male lawyer working 35 hours per week?
$$\hat{y} = 64.700 + 29.335\cdot 35 + 4.870\cdot 65 + 587.286\cdot 1 = 1{,}995.26,$$
i.e. an estimated average weekly income of 1,995.26 dollars.
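The same plug-in calculation as a small helper; the function name `predicted_income` and its keyword arguments are illustrative choices, not part of the exam. Setting `male=0` gives the corresponding female prediction, which differs by exactly $\hat\beta_3 = 587.286$.

```python
# Fitted coefficients from the regression output in Question 7
b0, b1, b2, b3 = 64.700, 29.335, 4.870, 587.286


def predicted_income(hours: float, age: float, male: int) -> float:
    """Estimated average weekly income; male = 1 for male, 0 for female."""
    return b0 + b1 * hours + b2 * age + b3 * male


print(round(predicted_income(hours=35, age=65, male=1), 2))  # 1995.26
```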
(c) [4 marks] Conduct a test of significance level 1% on the null hypothesis that gender has no effect
on lawyers' weekly income. State clearly H0 and H1 in terms of the parameters of the model
and state your conclusion.
If gender has no effect, then $\beta_3 = 0$. That is, we are testing
$$H_0: \beta_3 = 0 \quad \text{vs.} \quad H_1: \beta_3 \ne 0.$$
Importantly, this is a two-sided test (i.e. we want to detect a significant difference in either direction). The t-value and p-value of this test are given directly in the table as 20.649 and < 0.000001. Since the p-value is far below 0.01, we reject $H_0$ at the 1% level of significance and conclude that gender has a significant effect on lawyers' weekly income.
(d) [2 marks] The explanatory variable 'age' is significant here. Do you think age could be a confounding variable? Justify your answer.
This question is meant for students to discuss what a confounding variable is, in the context of this question. There is no single 'good' answer. Any coherent, on-topic reasoning should be awarded marks.
A confounding variable is one that affects both the predictor(s) and the response. Here, one could argue that age could affect income itself (older lawyers could be more competent, or perceived as more competent, and hence be paid more), but could also influence how many hours they work (perhaps younger lawyers need to impress and hence work longer hours).
We will also accept a different interpretation of this question, i.e. 'could there be a confounder of age?'. If students interpret this question that way, that is fine. An answer could then be:
Yes, there could be a confounder (not in the model), of which age is only an observable proxy. We see here that the higher the age, the higher the income. But can we conclude that being older in itself makes you earn more money? Possibly, but this is questionable. It could be reasonable to argue that older lawyers have simply had more years to gain expertise and develop a network of clients, and because of that, they earn more. Likewise, older lawyers have had more time to build a reputation and can therefore arguably charge their clients more per hour. Something like 'experience' or 'reputation' would then be the real driver (i.e. a confounder). Any answer along these lines is accepted.
(e) [2 marks] Provide an interpretation of the value $\hat{\beta}_2 = 4.870$. What does this value represent?
In this model, each additional year of age is associated with an average increase of 4.87 dollars in weekly income, holding the number of hours worked and gender fixed (the effect is the same for both genders in this model).
(f) [3 marks] Consider an alternative model with an interaction between $x_2$ and $x_3$, namely the model
$$y = \beta_0 + \beta_1x_1 + \beta_2x_2x_3 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$
What are the differences between this model and the old one?
The impact of 'hours worked' is the same, i.e. pay increases linearly as a function of hours worked. However, the effect of the other two variables (gender and age) is quite different. In this new model:
- The key point is: only for men ($x_3 = 1$) does income increase linearly with age. For women ($x_3 = 0$), age has no effect on income.
- Age alone has no impact on weekly income.
- Gender alone has no impact on weekly income.
This differs from the old model, where both gender and age had a 'stand-alone' effect on weekly income. Said otherwise: in the old model, for both men and women, age affects income, and for any age, gender affects income.
End of Paper
ADDITIONAL PAGE
Answer any unfinished questions here, or
use for rough working.