G12SMM/MATH 2011 Statistical Models and Methods
Linear Models, Assessed Coursework — 2019/2020
Please submit your work on Moodle as a PDF file by 3.00pm on Wednesday 13 May 2020.
A link “Assessed Coursework Submission” will appear on Moodle in due course for you to do
this. Your solutions should contain relevant R output needed to justify your answers/arguments,
together with appropriate discussion, but please do not include pages of irrelevant plots/output
which you do not discuss. The easiest way to include R output is to use R Markdown to produce
your solutions, but you do not have to do so. You do not need to include your R code, though
you can include it if you wish. If you are using R Markdown, and do not wish to include your
R code, then you can suppress it using the echo = FALSE chunk option, i.e. enclose the
code in an {r, echo=FALSE} chunk in the Markdown file.
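If every chunk should hide its code, one alternative to setting echo=FALSE chunk by chunk is to set it once as a default. The sketch below assumes the knitr package (which R Markdown uses to run chunks) is installed, and that the line is placed in the first chunk of the .Rmd file; it is only one possible setup, not a requirement of the coursework.

```r
# Intended for the first ("setup") chunk of an R Markdown file.
# Assumption: the knitr package is installed; the guard skips the call otherwise.
if (requireNamespace("knitr", quietly = TRUE)) {
  # Make echo = FALSE the default for all later chunks, so that
  # output and plots are shown but the R code itself is not.
  knitr::opts_chunk$set(echo = FALSE)
}
```

An individual chunk can still show its code by setting echo=TRUE in its own header.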
There will be a Moodle forum specifically for answering queries about the coursework, so you
may post questions and I will answer them there so that everyone receives the same assistance.
Please be careful not to inadvertently give away parts of your answer if you do post a question.
Note that as this is assessed work, I can only answer queries relating to clarification, and I will
only answer queries via the forum so that everyone can see my responses. You can change
your settings so that you get email notifications of new posts if you wish (I do not think that this
is the default setting). Otherwise, please check the forum to see if your query has already been
asked.
Unauthorised late submission will be penalised by 5% of the full mark per day. Work submitted
more than one week late will receive zero marks. You are reminded to familiarise yourself with the
guidelines concerning plagiarism in assessed coursework (see the student handbook), and note
that this applies equally to computer code as it does to written work.
The work contributes 15% to the overall module mark.
Please contact me if you have concerns about/problems with access to computing resources,
including R access and installation. (This does not include actually using R for the analysis, as
it is expected that you have developed the necessary skills through the computing classes and
unassessed/practice coursework. Questions regarding the actual work should be posted on the
forum.) An additional document, “R on your own machine”, is available on Moodle and
covers what you should need to complete this work.
The Data
You are a medical statistician who has been tasked with investigating associations between the
birthweight of children and various potential explanatory variables. Data are available regarding
the birthweight of 327 children, together with various other measurements. The data (referred
to as the training data below) are contained in the file BirthTrain.txt on Moodle. The
variables are:
age     Age of mother.
gest    Gestation period.
sex     Sex of child.
smokes  Whether the mother smoked during pregnancy, with levels 'No', 'Light' and 'Heavy'.
weight  Pre-pregnancy weight of mother.
rate    Rate of growth of child in the first trimester.
bwt     Birthweight of child.
You can read the data into R (after saving the file in your working directory) using
    Births <- read.table("BirthTrain.txt", header = TRUE)
The variables 'smokes' and 'sex' should be treated as factors, and the rest as numerical variables.
After reading in the data, you should first check that R is treating each variable as intended, and
change this behaviour if necessary.
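As a minimal sketch of such a check: str() shows how each column has been stored, and factor() converts a column where needed. Note that from R 4.0.0 onwards read.table leaves text columns as character rather than factor by default. So that the snippet runs without the file, it falls back to a tiny made-up data frame with the same column names; those values, and the 'Male'/'Female' labels, are invented for illustration and are not the coursework data.

```r
# Read the training data if the file is present; otherwise use a small
# made-up stand-in (illustrative values only, NOT the coursework data).
if (file.exists("BirthTrain.txt")) {
  Births <- read.table("BirthTrain.txt", header = TRUE)
} else {
  Births <- data.frame(
    age    = c(25, 31, 28, 35),
    gest   = c(39, 40, 38, 41),
    sex    = c("Male", "Female", "Female", "Male"),   # labels are an assumption
    smokes = c("No", "Light", "Heavy", "No"),
    weight = c(60, 72, 65, 58),
    rate   = c(1.1, 0.9, 1.0, 1.2),
    bwt    = c(3.2, 3.5, 2.9, 3.6)
  )
}

# Inspect how R has stored each column.
str(Births)

# Convert the two categorical variables to factors. The level order for
# smokes assumes the labels are spelled exactly as in the variable list;
# a mismatch would show up as NA values in the table below.
Births$sex    <- factor(Births$sex)
Births$smokes <- factor(Births$smokes, levels = c("No", "Light", "Heavy"))
table(Births$smokes, useNA = "ifany")

# Confirm the class of every column after conversion.
sapply(Births, class)
```

Listing the levels with 'No' first makes non-smokers the reference category in any later model, which is a modelling choice rather than a requirement.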
Interest lies in determining the variables associated with birthweight, which could then be
investigated further by medical professionals to understand any possible causal relationships.
Additionally, the file BirthTest.txt contains the same measurements for a further 100
individuals. This is to be used for testing the predictive ability of models, and should not be
used in any model development. This is referred to as the test data.
The Task
(a) Using only the training data, develop a model, or models, for assessing associations
between birthweight (the response variable) and the other variables, and discuss your
findings. See the notes below for what your analysis for this part should contain. [35]
(b) Use your chosen “best” model(s) from (a) to predict the birthweight of the 100 individuals
in the test set. Use appropriate numerical summaries/plots to evaluate the quality of your
predictions. How do the predictions compare to those from the model of the form
bwt = intercept + age + gest + sex + smokes + weight + rate? [15]
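The reference model in part (b) can be fitted with lm() and used for prediction with predict(). The sketch below is one way to do this, not the only one; root mean squared error and mean absolute error are offered as example summaries, not as the required ones. The helpers fake_births() and read_births() are hypothetical names introduced here so that the snippet runs without the data files: if a file is missing, simulated stand-in values are used, and these are not the coursework data.

```r
# Hypothetical helper: simulated stand-in data, used only if a file is absent.
# All numbers are invented for illustration and are NOT the coursework data.
fake_births <- function(n) {
  d <- data.frame(
    age    = round(runif(n, 18, 40)),
    gest   = round(runif(n, 35, 42)),
    sex    = factor(sample(c("Female", "Male"), n, replace = TRUE)),
    smokes = factor(sample(c("No", "Light", "Heavy"), n, replace = TRUE),
                    levels = c("No", "Light", "Heavy")),
    weight = runif(n, 45, 90),
    rate   = runif(n, 0.5, 1.5)
  )
  d$bwt <- 0.1 * d$gest + rnorm(n, sd = 0.4)   # arbitrary relationship
  d
}

# Hypothetical helper: read a data file, or fall back to the stand-in.
read_births <- function(file, n) {
  if (!file.exists(file)) return(fake_births(n))
  d <- read.table(file, header = TRUE)
  d$sex    <- factor(d$sex)
  d$smokes <- factor(d$smokes, levels = c("No", "Light", "Heavy"))
  d
}

set.seed(1)
Births     <- read_births("BirthTrain.txt", 327)   # training data
BirthsTest <- read_births("BirthTest.txt", 100)    # test data

# Reference model: main effects of all six explanatory variables,
# fitted to the training data only.
full_fit <- lm(bwt ~ age + gest + sex + smokes + weight + rate, data = Births)

# Predict birthweight for the test individuals (no refitting).
pred <- predict(full_fit, newdata = BirthsTest)

# Two numerical summaries of prediction error on the test set.
rmse <- sqrt(mean((BirthsTest$bwt - pred)^2))
mae  <- mean(abs(BirthsTest$bwt - pred))
c(RMSE = rmse, MAE = mae)

# Observed against predicted values; points close to the line y = x
# correspond to accurate predictions.
plot(pred, BirthsTest$bwt, xlab = "Predicted bwt", ylab = "Observed bwt")
abline(0, 1)
```

Computing the same summaries for a chosen model from part (a) gives a like-for-like comparison with this reference model.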
Notes
• For part (a), please structure your analysis as follows:
– An introduction and exploratory analysis, with appropriate plots and summaries which
highlight important/interesting aspects of the data [10 marks].
– A description of your modelling process, showing how you arrive at your final chosen
model(s) which best explain the data in a parsimonious way. Justification should
involve use of appropriate tests/numerical measures. There may well be more than
one good model [20 marks].
– A non-technical summary of your findings and conclusions, in a manner suitable for
reporting back to medical professionals [5 marks].
Whilst the overall merit of the analysis will also be considered as a whole, around half the
marks will be for doing technically correct and relevant things, and half for discussion and
interpretation of the output.
• You do not need to (and should not) submit all the output corresponding to everything
you do or try. For example, in the exploratory analysis, you may look at quite a number of
different plots, and you might do quite a bit of experimentation in the model development
stage. You only need to report the important plots/output which justify your decisions
and conclusions, and whilst there is no word or page limit, an overly verbose analysis with
unnecessary output will detract from the analysis.
• For the model fitting/selection, you can use any of the techniques we have covered this
semester to investigate potential models — including the automated methods of Chapter
6/Case Study 9 and/or manual hypothesis testing.
• Please make use of the help files for R commands. Many functions have optional arguments
which might be useful. (This is a good general habit to get into for future R use as well.)
• You do not have to use the methods of Chapter 5, i.e. you do not have to do any
transformations/diagnostic plots or assumption checking. You may do so if you wish, and
it could help to improve your model, but you will not be penalised for leaving it
out.
• For part (b), you should not be doing any additional model fitting. You are simply using
your final model(s) from part (a) to make predictions of birthweight for the individuals in
the test set, then comparing the predictions with the true known values.
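To illustrate the two styles of model selection mentioned in the notes above, the sketch below uses backward stepwise selection by AIC with step(), and an F-test between nested models with anova(). It is an assumption that these correspond to the automated methods of Chapter 6/Case Study 9; the module notes are the authority on that. If BirthTrain.txt is not found, simulated stand-in values are used so that the snippet runs; they are invented and are not the coursework data.

```r
set.seed(2)

# Training data, or a simulated stand-in if the file is absent
# (invented values for illustration, NOT the coursework data).
if (file.exists("BirthTrain.txt")) {
  Births <- read.table("BirthTrain.txt", header = TRUE)
} else {
  n <- 327
  Births <- data.frame(
    age    = round(runif(n, 18, 40)),
    gest   = round(runif(n, 35, 42)),
    sex    = sample(c("Female", "Male"), n, replace = TRUE),
    smokes = sample(c("No", "Light", "Heavy"), n, replace = TRUE),
    weight = runif(n, 45, 90),
    rate   = runif(n, 0.5, 1.5)
  )
  Births$bwt <- 0.1 * Births$gest + rnorm(n, sd = 0.4)   # arbitrary relationship
}
Births$sex    <- factor(Births$sex)
Births$smokes <- factor(Births$smokes, levels = c("No", "Light", "Heavy"))

# Starting point: main effects of all six explanatory variables.
full_fit <- lm(bwt ~ age + gest + sex + smokes + weight + rate, data = Births)

# Automated approach: backward stepwise selection by AIC.
# trace = 0 suppresses the step-by-step printout.
step_fit <- step(full_fit, direction = "backward", trace = 0)
formula(step_fit)

# Manual approach: F-test comparing nested models, here asking
# whether smokes can be dropped from the full model.
reduced_fit <- update(full_fit, . ~ . - smokes)
cmp <- anova(reduced_fit, full_fit)
cmp
```

A small p-value in the anova() table is evidence against dropping the term. The help pages ?step and ?anova.lm describe further options, such as the direction and scope arguments of step().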