MATH3029-E1
The University of Nottingham
SCHOOL OF MATHEMATICAL SCIENCES
A LEVEL 3 MODULE, SPRING SEMESTER 2019-2020
APPLIED STATISTICAL MODELLING
Suggested time to complete: TWO Hours THIRTY Minutes
Paper set: 18/05/2020 - 10:00
Paper due: 26/05/2020 - 10:00
Answer ALL questions
Your solutions should be written on white paper using dark ink (not pencil), on a tablet, or
typeset. Do not write close to the margins. Your solutions should include complete
explanations and all intermediate derivations. Your solutions should be based on the material
covered in the module and its prerequisites only. Any notation used should be consistent with
that in the Lecture Notes.
Guidance on the Alternative Assessment Arrangements can be found on the Faculty of Science
Moodle page: https://moodle.nottingham.ac.uk/course/view.php?id=99154#section-2
Submit your answers as a single PDF with each page in the correct orientation, to the
appropriate dropbox on the module’s Moodle page. Use the standard naming
convention for your document: [StudentID]_[ModuleCode].pdf. Please check the
box indicated on Moodle to confirm that you have read and understood the statement
on academic integrity: https://moodle.nottingham.ac.uk/pluginfile.php/6288943/mod_tabbedcontent/tabcontent/8496/FoS%20Statement%20on%20Academic%20Integrity.pdf
A scan of handwritten notes is completely acceptable. Make sure your PDF is easily readable
and does not require magnification. Text which is not in focus or is not legible for any other
reason will be ignored. If your scan is larger than 20 MB, please see if it can easily be reduced
in size (e.g. scan in black & white, use a lower dpi — but not so low that readability is
compromised).
Staff are not permitted to answer assessment or teaching queries during the assessment
period. If you spot what you think may be an error on the exam paper, note this in your
submission but answer the question as written. Where necessary, minor clarifications or
general guidance may be posted on Moodle for all students to access.
Students with approved accommodations are permitted an extension of 3 days.
The standard University of Nottingham penalty of 5% deduction per working day will
apply to any late submission.
Academic Integrity in Alternative Assessments
The alternative assessment tasks for summer 2020 are to replace exams that would have
assessed your individual performance. You will work remotely on your alternative assessment
tasks and they will all be undertaken in “open book” conditions. Work submitted for
assessment should be entirely your own work. You must not collude with others or employ the
services of others to work on your assessment. As with all assessments, you also need to avoid
plagiarism. Plagiarism, collusion and false authorship are all examples of academic misconduct.
They are defined in the University Academic Misconduct Policy at: https://www.nottingham.ac.uk/academicservices/qualitymanual/assessmentandawards/academic-misconduct.aspx
Plagiarism: representing another person’s work or ideas as your own. You could do this by
failing to correctly acknowledge others’ ideas and work as sources of information in an
assignment or neglecting to use quotation marks. This also applies to the use of graphical
material, calculations etc. in that plagiarism is not limited to text-based sources. There is
further guidance about avoiding plagiarism on the University of Nottingham website.
False Authorship: where you are not the author of the work you submit. This may include
submitting the work of another student or submitting work that has been produced (in whole
or in part) by a third party such as through an essay mill website. As it is the authorship of an
assignment that is contested, there is no requirement to prove that the assignment has been
purchased for this to be classed as false authorship.
Collusion: cooperation in order to gain an unpermitted advantage. This may occur where you
have consciously collaborated on a piece of work, in part or whole, and passed it off as your
own individual effort or where you authorise another student to use your work, in part or
whole, and to submit it as their own. Note that working with one or more other students to
plan your assignment would be classed as collusion, even if you go on to complete your
assignment independently after this preparatory work. Allowing someone else to copy your
work and submit it as their own is also a form of collusion.
Statement of Academic Integrity
By submitting a piece of work for assessment you are agreeing to the following statements:
1. I confirm that I have read and understood the definitions of plagiarism, false authorship
and collusion.
2. I confirm that this assessment is my own work and is not copied from any other person’s
work (published or unpublished).
3. I confirm that I have not worked with others to complete this work.
4. I understand that plagiarism, false authorship, and collusion are academic offences and I
may be referred to the Academic Misconduct Committee if plagiarism, false authorship or
collusion is suspected.
Submission instructions
• Release and submission times are with respect to British Summer Time (BST). Please plan
accordingly.
• Please take time to write clearly and neatly. This is especially important since you will be
handing in scanned documents. If I can’t read your writing clearly, I will not be able to
mark appropriately.
• In accordance with University guidelines for this assessment, please write your name and
student ID on the first page of your submitted document.
• It is your responsibility to ensure that the requirements for a valid submission on Moodle
are met (e.g. file size; invalidity of ‘draft’ submissions). Please try to submit ahead of
time to avoid complications close to the deadline.
1. (a) Consider the one-way ANOVA model
$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad i = 1, 2, 3; \; j = 1, 2.$
Assume that $\epsilon_{ij}$ are IID Normal random variables with $E(\epsilon_{ij}) = 0$ and
$\mathrm{Var}(\epsilon_{ij}) = \sigma^2 > 0$ for all $i, j$.
i) Suppose the model is used to determine the efficacy of three drugs A, B and C on
cholesterol levels of patients. Interpret within this context each term in the model
above, and the corresponding assumptions.
ii) For the model above construct the corresponding design matrix $X$, the vector of
responses $y$, the vector of regression coefficients $\beta$, and the error vector $\epsilon$. Justify
why the least squares estimator $(X^T X)^{-1} X^T y$ of $\beta$ cannot be computed without
further constraints.
[15 marks]
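As a hedged illustration of the rank issue raised in part (a) ii), the sketch below assumes a dummy-coded design matrix with columns (intercept, drug A, drug B, drug C) and rows ordered by drug. This coding and the helper names `gram` and `rank` are assumptions made for illustration, not notation from the paper. It forms the cross-product matrix exactly and reports its rank.

```python
from fractions import Fraction

# Assumed dummy coding: 3 drugs, 2 patients per drug.
# Columns: intercept, drug A, drug B, drug C (an illustrative choice).
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]


def gram(matrix):
    """Return the cross-product matrix X^T X for a list-of-rows matrix."""
    cols = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(cols)]
            for i in range(cols)]


def rank(matrix):
    """Rank via exact Gaussian elimination (hypothetical helper)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        # Find a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r


xtx = gram(X)
xtx_rank = rank(xtx)
print(xtx)
print("rank of X^T X:", xtx_rank, "out of", len(xtx))
```

Under this coding the intercept column equals the sum of the three indicator columns, so the 4 x 4 cross-product matrix has rank 3 and cannot be inverted.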
(b) A farmer wanted to compare four types of wheat to find which gives the greatest yield.
Since he suspected growing conditions might vary across his field, he divided the field
into four plots and performed experiments which led to the following data on yield (in
tonnes).
          Plot 1  Plot 2  Plot 3  Plot 4    Sum
Wheat 1      6.5     6.6     6.3     5.9   25.3
Wheat 2      7.2     6.4     6.4     6.2   26.2
Wheat 3      6.3     6.1     5.9     5.9   24.2
Wheat 4      6.4     6.4     6.3     6.1   25.2
Sum         26.4    25.5    24.9    24.1  100.9
Note: $\sum_{i=1}^{4} \sum_{j=1}^{4} y_{ij}^2 = 637.85$.
i) What type of design has been used by the farmer?
ii) Explain how you would ensure this design is randomised.
iii) Write down an appropriate model for this experiment, clearly defining your notation
and explaining any assumptions you make.
iv) Calculate the ANOVA table for this data.
v) Test for the significance of wheat type and comment on your findings.
[25 marks]
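The arithmetic behind an ANOVA table for the yield data can be sketched as follows. This is a minimal sketch, assuming an additive model with wheat type as the treatment factor and plot as a blocking factor; the function name `rcbd_anova` and the dictionary keys are hypothetical, and the module's own notation may differ.

```python
# Yields in tonnes: rows are wheat types 1-4, columns are plots 1-4.
yields = [
    [6.5, 6.6, 6.3, 5.9],
    [7.2, 6.4, 6.4, 6.2],
    [6.3, 6.1, 5.9, 5.9],
    [6.4, 6.4, 6.3, 6.1],
]


def rcbd_anova(table):
    """Sums of squares for an additive treatment + block layout (illustrative helper)."""
    n_treat = len(table)
    n_block = len(table[0])
    n = n_treat * n_block
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / n  # correction factor (grand total squared over n)

    ss_total = sum(y * y for row in table for y in row) - correction
    ss_treat = sum(sum(row) ** 2 for row in table) / n_block - correction
    block_totals = [sum(row[j] for row in table) for j in range(n_block)]
    ss_block = sum(t * t for t in block_totals) / n_treat - correction
    ss_error = ss_total - ss_treat - ss_block

    df_treat = n_treat - 1
    df_block = n_block - 1
    df_error = df_treat * df_block
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "ss_treat": ss_treat, "df_treat": df_treat,
        "ss_block": ss_block, "df_block": df_block,
        "ss_error": ss_error, "df_error": df_error,
        "ss_total": ss_total,
        "f_treat": ms_treat / ms_error,
    }


anova = rcbd_anova(yields)
for name, value in anova.items():
    print(f"{name}: {value:.4f}")
```

The resulting F ratio for wheat type would then be compared with an F distribution on (3, 9) degrees of freedom, under the assumptions stated above.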
2. (a) Show that the pdf of a normal distribution with mean $\mu \in \mathbb{R}$ and variance 1 belongs to
the one-parameter GLM family. Clearly identify $\theta$, $b(\cdot)$, $c(\cdot, \cdot)$, and $a(\cdot)$. [5 marks]
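(b) Suppose $Z_i$, $i = 1, \dots, n$ are IID $N(0, 1)$ random variables. Denote by $\phi$ and $\Phi$ their
pdf and cdf (cumulative distribution function), respectively. For real numbers $\eta_i$, define
$Y_i = 1$ if $Z_i \le \eta_i$ or $Y_i = 0$ otherwise.
i) For fixed $\eta_i$, write down the joint distribution of $Y_1, \dots, Y_n$.
ii) Consider $\eta_i = \beta_1 + \beta_2 x_i$ with $i = 1, \dots, n$, where $x_i$ are real-valued. Using $Y_i$,
write down the log-likelihood function $\ell(\beta_1, \beta_2)$. Also show that the score statistic
$U = (U_1, U_2)^T$, where $U_j = \partial \ell / \partial \beta_j$, $j = 1, 2$ is:
$U_1 = \sum_{i=1}^{n} \left[ \frac{Y_i \, \phi(\beta_1 + \beta_2 x_i)}{\Phi(\beta_1 + \beta_2 x_i)} - \frac{(1 - Y_i) \, \phi(\beta_1 + \beta_2 x_i)}{1 - \Phi(\beta_1 + \beta_2 x_i)} \right]$
$U_2 = \sum_{i=1}^{n} \left[ \frac{x_i Y_i \, \phi(\beta_1 + \beta_2 x_i)}{\Phi(\beta_1 + \beta_2 x_i)} - \frac{x_i (1 - Y_i) \, \phi(\beta_1 + \beta_2 x_i)}{1 - \Phi(\beta_1 + \beta_2 x_i)} \right]$
iii) Verify that $E(U) = 0$.
iv) Why is $\Phi^{-1} : [0, 1] \to \mathbb{R}$ a valid link function for linking $E(Y_i)$ with $\eta_i$?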
[20 marks]
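One way to sanity-check a score expression of the kind asked for in part (b) ii) is to compare it with a numerical derivative of the log-likelihood. The sketch below assumes a Bernoulli log-likelihood with success probability given by the standard normal cdf of `b1 + b2 * x`; the toy data, parameter values and function names are hypothetical and are not taken from the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_lik(b1, b2, xs, ys):
    """Bernoulli log-likelihood with P(Y = 1) = normal_cdf(b1 + b2 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = normal_cdf(b1 + b2 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def score(b1, b2, xs, ys):
    """Analytic partial derivatives of log_lik with respect to b1 and b2."""
    u1 = u2 = 0.0
    for x, y in zip(xs, ys):
        eta = b1 + b2 * x
        term = (y * normal_pdf(eta) / normal_cdf(eta)
                - (1 - y) * normal_pdf(eta) / (1.0 - normal_cdf(eta)))
        u1 += term
        u2 += x * term
    return u1, u2


# Hypothetical toy data and parameter values, for illustration only.
xs = [-1.0, 0.0, 0.5, 2.0]
ys = [0, 1, 0, 1]
b1, b2 = 0.2, 0.7

# Central finite differences of the log-likelihood.
h = 1e-6
num_u1 = (log_lik(b1 + h, b2, xs, ys) - log_lik(b1 - h, b2, xs, ys)) / (2 * h)
num_u2 = (log_lik(b1, b2 + h, xs, ys) - log_lik(b1, b2 - h, xs, ys)) / (2 * h)
u1, u2 = score(b1, b2, xs, ys)

print("analytic score: ", u1, u2)
print("numerical score:", num_u1, num_u2)
```

Agreement between the two printed pairs supports the algebra but does not replace the derivation the question asks for.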
(c) In a study examining the relationship between Alzheimer’s disease (yes=1 and no=0) and
Age in 98 people, a binary logistic regression model was used. Output from R is given
below.
i) Using Output 1: (1) interpret, in the context of the problem, the estimate of the
Age parameter, and (2) explain the values obtained for the degrees of freedom.
ii) Using Output 1, explain, using the GLM form of a Bernoulli distribution, the statement:
‘Dispersion parameter for binomial family taken to be 1’.
iii) Information on economic status (‘Lower’, ‘Middle’, ‘Higher’) of each person was
added to the model containing Age. Using Output 1 and Output 2, perform a deviance
test to ascertain whether economic status has a significant relationship with the chances
of being diagnosed with Alzheimer’s.
iv) Using Output 2, predict the probability of being diagnosed with Alzheimer’s for a
person aged 48 and classified as having a ‘Lower’ economic status.
[15 marks]
Output 1:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.62437    0.40575  -4.003 6.25e-05 ***
Age          0.03183    0.01204   2.644  0.00819 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 114.91 on 96 degrees of freedom
Output 2:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.49037    0.52223  -2.854  0.00432 **
Age          0.03127    0.01247   2.507  0.01216 *
Lower       -0.70309    0.56145  -1.252  0.21047
Middle       0.37988    0.55692   0.682  0.49517
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 122.32 on 97 degrees of freedom
Residual deviance: 111.50 on 94 degrees of freedom
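The two outputs can be combined numerically along the following lines. This is a sketch under stated assumptions: that ‘Lower’ and ‘Middle’ are 0/1 indicator variables with ‘Higher’ as the baseline level, and that the drop in residual deviance is referred to a chi-square distribution. The variable and function names are hypothetical.

```python
import math

# Residual deviances and degrees of freedom copied from Output 1 and Output 2.
dev_age_only, df_age_only = 114.91, 96
dev_with_status, df_with_status = 111.50, 94

deviance_drop = dev_age_only - dev_with_status
df_drop = df_age_only - df_with_status

# For a chi-square variable with exactly 2 degrees of freedom the upper tail
# probability is exp(-x / 2). This shortcut applies only because df_drop == 2.
p_value = math.exp(-deviance_drop / 2.0)

# Estimates copied from Output 2.
coef = {"(Intercept)": -1.49037, "Age": 0.03127, "Lower": -0.70309, "Middle": 0.37988}


def predict_probability(age, lower, middle):
    """Inverse-logit of the linear predictor; lower and middle are assumed 0/1 indicators."""
    eta = (coef["(Intercept)"] + coef["Age"] * age
           + coef["Lower"] * lower + coef["Middle"] * middle)
    return 1.0 / (1.0 + math.exp(-eta))


prob_lower_48 = predict_probability(age=48, lower=1, middle=0)

print(f"deviance drop = {deviance_drop:.2f} on {df_drop} df, p = {p_value:.4f}")
print(f"predicted probability (age 48, Lower) = {prob_lower_48:.4f}")
```

If the indicator coding in the fitted model differs from the assumption above, the prediction step would need to change accordingly.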
3. (a) i) Give an example of an offset in a Poisson GLM.
ii) How would you test for the significance of an offset variable in a Poisson GLM?
iii) Suppose $Y_i$ are independent Poisson random variables with mean $\mu_i$, offset $t_i$, and
rate $\lambda_i$ for $i = 1, \dots, n$. With $Y_i$ as responses, consider a Poisson GLM with log link
function consisting of a single real-valued predictor $x_i$ with regression coefficient
$\beta_1$. Show that, for each $i = 1, \dots, n$, the rate parameter $\lambda_i$ changes by a factor of
$e^{\beta_1}$ when $x_i$ increases by one unit.
[15 marks]
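The multiplicative effect of a one-unit change in a predictor under a log link can be sketched as below. The notation is assumed for illustration: mean $\mu_i = t_i \lambda_i$ with offset $t_i$, an intercept $\beta_0$, and slope $\beta_1$ for predictor $x_i$; the module's Lecture Notes may use different symbols.

```latex
\begin{align*}
\log \mu_i &= \log t_i + \beta_0 + \beta_1 x_i, \qquad \mu_i = t_i \lambda_i \\
\lambda_i(x_i) &= \exp(\beta_0 + \beta_1 x_i) \\
\frac{\lambda_i(x_i + 1)}{\lambda_i(x_i)}
  &= \frac{\exp\{\beta_0 + \beta_1 (x_i + 1)\}}{\exp(\beta_0 + \beta_1 x_i)}
   = e^{\beta_1}
\end{align*}
```

The offset enters with a fixed coefficient of 1, so it cancels from the ratio of rates.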
(b) The data below is on the monthly accident counts on a major US highway for each of
the 12 months of 1970, then for each of the 12 months of 1971, and finally for the first
9 months of 1972.
1970 52 37 49 29 31 32 28 34 32 39 50 63
1971 35 22 27 27 34 23 42 30 36 56 48 40
1972 33 26 31 25 23 20 25 20 36
Output from R showing results from fitting a GLM modelling number of accidents with
appropriately defined predictors year and month is provided below.
Call:
glm(formula = y~year + month, family = poisson)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.81969    0.09896  38.600  < 2e-16 ***
Year1971    -0.12516    0.06694  -1.870 0.061521 .
Year1972    -0.28794    0.08267  -3.483 0.000496 ***
month2      -0.34484    0.14176  -2.433 0.014994 *
month3      -0.11466    0.13296  -0.862 0.388459
month4      -0.39304    0.14380  -2.733 0.006271 **
month5      -0.31015    0.14034  -2.210 0.027108 *
month6      -0.47000    0.14719  -3.193 0.001408 **
month7      -0.23361    0.13732  -1.701 0.088889 .
month8      -0.35667    0.14226  -2.507 0.012168 *
month9      -0.14310    0.13397  -1.068 0.285444
month10      0.10167    0.13903   0.731 0.464628
month11      0.13276    0.13788   0.963 0.335639
month12      0.18252    0.13607   1.341 0.179812
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 101.143 on 32 degrees of freedom
Residual deviance: 27.273 on 19 degrees of freedom
Number of Fisher Scoring iterations: 3
i) Write down the mathematical model fitted along with assumptions.
ii) Based on the output, is it fair to state that the average number of accidents appears
to have decreased from 1970 to 1972? Justify your answer.
iii) The Transport Authority wishes to check if the number of accidents tends to be
higher from September-December when compared to January. What would be
your recommendation? Justify accordingly.
iv) Construct a 95% confidence interval for the coefficient of Year1972 in the model
in i), and corroborate the conclusion obtained from the p-value corresponding to
Year1972 in the output.
v) What is your prediction for the number of accidents in October 1972?
[25 marks]
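The interval and prediction calculations based on the Poisson output can be sketched as follows. This is a minimal sketch, assuming a Wald interval with the normal quantile 1.96, and assuming that 1970 and month 1 (January) are the baseline levels because they have no coefficient in the output. The variable names are hypothetical.

```python
import math

# Estimates and standard error copied from the R output for the Poisson GLM.
intercept = 3.81969
coef_year1972, se_year1972 = -0.28794, 0.08267
coef_month10 = 0.10167

# Approximate 95% Wald interval: estimate +/- 1.96 * standard error.
z_975 = 1.96
ci_lower = coef_year1972 - z_975 * se_year1972
ci_upper = coef_year1972 + z_975 * se_year1972

# Predicted mean count for October 1972 under the log link:
# exp(intercept + Year1972 effect + month10 effect).
predicted_oct_1972 = math.exp(intercept + coef_year1972 + coef_month10)

print(f"95% CI for Year1972: ({ci_lower:.4f}, {ci_upper:.4f})")
print(f"Predicted accidents, October 1972: {predicted_oct_1972:.2f}")
```

An interval that excludes zero points in the same direction as a p-value below 0.05 for the same coefficient.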
MATH3029-E1 END