STATS 2107
Statistical Modelling and Inference II
Project
Sharon Lee
Semester 2 2019
Details
• Due date: Friday 1st Nov 2019, 5pm.
• Group submission (groups of up to four people); students self-select into groups.
• A single report is to be submitted for each group.
• The report is to be typed in Rmarkdown.
• You must submit two files: the Rmarkdown (Rmd) file containing the full analysis report, and the
compiled PDF or HTML document. (Note that marking will be based on the Rmd file.)
• Each person must also complete the Self and Peer Learning and Assessment Tool (SPLAT) to indicate the
contribution made by each team member.
Data
The dataset provided is a subset of the 1986 Obstetric Audit, which includes births at the Flinders Medical
Centre, Bedford Park, South Australia, during 1986. Each year, patients of the Department of Obstetrics
and Gynaecology at the Flinders Medical Centre are interviewed regarding a wide range of medical history
and basic demographics. The results of this interview are recorded in the Obstetric Audit for the year. The
provided dataset includes the following variables:
Variable         Label                                  Details
id               ID number                              A four-digit unique identifier for each birth
twin_id          Twin ID                                Unique identifier for each twin pair
                                                        (twins will have the same Twin ID)
Twins            Twin                                   Yes=twins, No=singleton
AgeMother        Age of mother                          in years
AgeFather        Age of father                          in years
Sex              Gender of infant                       1=male, 2=female
WghtInf          Weight of infant                       in grams
Head             Head circumference of infant           in millimetres
Length           Length of infant (from crown to heel)  in millimetres
WghtMother       Weight of mother before delivery       in kilograms
Gravidity        Gravidity                              1=first baby, no previous incomplete pregnancy;
                                                        2=first baby, previous incomplete pregnancy;
                                                        3=second or subsequent baby
Gestation        Length of gestation                    in weeks
Marital          Marital status                         1=first marriage, 2=second or subsequent marriage,
                                                        3=unmarried
Smoking_1stTrim  Smoking during first trimester         0=no, 1=yes
Smoking_2ndTrim  Smoking during second trimester        0=no, 1=yes
Labour           Labour onset                           1=born before arrival at hospital, 2=spontaneous,
                                                        3=induced, 4=elective caesarian section
Insure           Health insurance                       1=private patient, 2=health service
Goal
Find a predictive model for the weight of infant (in singleton pregnancies only) by considering the following
predictors:
• age of mother,
• age of father,
• gender of infant,
• weight of mother before delivery,
• gravidity,
• length of gestation,
• any smoking during pregnancy [0=no, 1=yes] (Note: you will need to create this variable from the data
given), and
• marital status.
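As a hedged illustration of the "any smoking" predictor mentioned above, the sketch below combines the two trimester indicators into one variable. The data frame `births`, its values, and the column name `AnySmoking` are assumptions made for this example only; the real dataset and any required handling of missing values may differ.

```r
# Toy rows standing in for the audit data (values are made up for illustration).
births <- data.frame(
  Twins           = c("No", "No", "Yes", "No"),
  Smoking_1stTrim = c(0, 1, 0, NA),
  Smoking_2ndTrim = c(0, 0, 1, 0)
)

# AnySmoking (hypothetical name): 1 if smoking was recorded in either trimester.
# In R, NA | TRUE is TRUE and NA | FALSE is NA, so a missing trimester only
# produces NA when the other trimester does not already indicate smoking.
any_smoke <- births$Smoking_1stTrim == 1 | births$Smoking_2ndTrim == 1

# Store the result as a labelled factor so lm() treats it as categorical.
births$AnySmoking <- factor(as.integer(any_smoke),
                            levels = c(0, 1), labels = c("no", "yes"))

print(births)
```

The other integer-coded categorical variables (for example `Sex`, `Gravidity` and `Marital`) could be converted to factors in the same way before modelling.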
Marks
The project is worth 10% of the final mark for SMI. The breakdown is as follows:
Section Marks
Introduction 5
Data description 5
Data filtering 5
Variable description 5
Model fitting and selection 25
Final model 10
Assumption checking 10
Prediction 20
Conclusion 10
Formatting 5
Total 100
Each member of the group will receive the final mark for the project unless the SPLAT indicates that
individuals have not contributed to the project. In that case, penalties will apply.
Description of sections
In the following subsections, an indication is given of what each section in the final report should contain as
a minimum.
Introduction
The problem is introduced with an outline of the steps involved in the analysis.
Data description
The data is described including identifying the subjects, the variables, which of the variables are predictors,
and which are response variables. The number of subjects and variables must also be given. Each variable
and its levels are explained in context.
Data filtering
Each step of the data filtering performed is described with illustrative code and its output. The reasons for and
effect of the cleaning are described. For example, any excluded subjects should be noted and the numbers
excluded given along with summary statistics both before and after cleaning.
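One possible way to document a filtering step with its counts is sketched below. The data frame `births` holds made-up values, and the choice to drop incomplete rows is an assumption for illustration, not a prescribed cleaning rule.

```r
# Toy data frame standing in for the audit data (made-up values).
births <- data.frame(
  id        = 1001:1008,
  Twins     = c("No", "No", "Yes", "Yes", "No", "No", "No", "No"),
  WghtInf   = c(3400, 2950, 2300, 2450, NA, 3650, 3100, 3800),
  Gestation = c(40, 38, 36, 36, 39, NA, 40, 41)
)

n_before <- nrow(births)
print(summary(births$WghtInf))          # summary before filtering

# Step 1: keep singleton pregnancies only.
singletons <- subset(births, Twins == "No")
n_twins_removed <- n_before - nrow(singletons)

# Step 2: drop rows with a missing value in any modelling variable.
model_vars <- c("WghtInf", "Gestation")
clean <- singletons[complete.cases(singletons[, model_vars]), ]
n_missing_removed <- nrow(singletons) - nrow(clean)

print(summary(clean$WghtInf))           # summary after filtering
cat("Rows before:", n_before,
    "| twin rows removed:", n_twins_removed,
    "| incomplete rows removed:", n_missing_removed,
    "| rows kept:", nrow(clean), "\n")
```

Recording the counts in named objects makes it straightforward to quote them in the report text with inline R code.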
Variable description
For each variable considered, there should be a section giving the type of variable, summary statistics and a
plot to illustrate its distribution. Summary statistics are best given in a table and referred to in the text. A
discussion of the distribution of each variable must be given.
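A minimal sketch of a summary table and distribution plots follows, assuming a cleaned data frame called `clean`. The data here are simulated and do not reflect the real audit values.

```r
set.seed(1)
# Simulated stand-in for the cleaned singleton data (not the real audit values).
clean <- data.frame(
  WghtInf   = rnorm(100, mean = 3400, sd = 450),
  Gestation = round(rnorm(100, mean = 39.5, sd = 1.5)),
  Sex       = factor(sample(c("male", "female"), 100, replace = TRUE))
)

# Numeric variables: one row of summary statistics per variable.
num_vars <- c("WghtInf", "Gestation")
num_summary <- t(sapply(clean[num_vars], function(x) {
  c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x),
    min = min(x), median = median(x), max = max(x))
}))
print(round(num_summary, 1))

# Categorical variables: counts and proportions per level.
sex_counts <- table(clean$Sex)
print(sex_counts)
print(round(prop.table(sex_counts), 2))

# One plot per variable: histogram for numeric, bar plot for categorical.
hist(clean$WghtInf, main = "Infant weight", xlab = "Weight (g)")
barplot(sex_counts, main = "Sex of infant", ylab = "Count")
```

In an Rmd chunk, a matrix such as `num_summary` could be passed to `knitr::kable()` with a `caption` so that the table can be referred to in the text.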
Model fitting and selection
The model fitting process is described with the type of algorithm used, and the choice of heuristic discussed.
The various models explored are compared.
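As one example of an algorithm and heuristic pairing, the sketch below runs backward elimination with AIC and forward selection with BIC using `step()` from base R. The data and coefficients are simulated for illustration, and the project does not prescribe stepwise selection or these particular criteria.

```r
set.seed(2)
n <- 200
# Simulated stand-in data; the coefficients below are invented for illustration.
sim <- data.frame(
  AgeMother  = round(runif(n, 18, 40)),
  WghtMother = rnorm(n, 70, 10),
  Gestation  = round(rnorm(n, 39.5, 1.5)),
  AnySmoking = factor(sample(c("no", "yes"), n, replace = TRUE,
                             prob = c(0.7, 0.3)))
)
sim$WghtInf <- -3000 + 150 * sim$Gestation + 8 * sim$WghtMother -
  200 * (sim$AnySmoking == "yes") + rnorm(n, sd = 400)

full <- lm(WghtInf ~ AgeMother + WghtMother + Gestation + AnySmoking, data = sim)
null <- lm(WghtInf ~ 1, data = sim)

# Backward elimination from the full model, using AIC (the default, k = 2).
back_aic <- step(full, direction = "backward", trace = 0)

# Forward selection from the null model, using BIC (k = log(n)).
fwd_bic <- step(null, scope = formula(full), direction = "forward",
                k = log(n), trace = 0)

# Compare the candidate models side by side.
print(formula(back_aic))
print(formula(fwd_bic))
print(AIC(full, back_aic, fwd_bic))
print(BIC(full, back_aic, fwd_bic))
```

BIC penalises extra terms more heavily than AIC once the sample is moderately large, so the two heuristics can select different models; the report should state which was used and why.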
Final model
The final model is given and the coefficients interpreted in context.
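A small sketch of a coefficient table with 95% confidence intervals is given below. The model and data are simulated for illustration and are not the final model for this project.

```r
set.seed(6)
n <- 150
# Simulated stand-in data; the coefficients are invented for illustration.
sim <- data.frame(
  Gestation  = round(rnorm(n, 39.5, 1.5)),
  AnySmoking = factor(sample(c("no", "yes"), n, replace = TRUE))
)
sim$WghtInf <- -3000 + 150 * sim$Gestation -
  200 * (sim$AnySmoking == "yes") + rnorm(n, sd = 400)

fit <- lm(WghtInf ~ Gestation + AnySmoking, data = sim)

# Estimates with 95% confidence intervals, one row per coefficient.
coef_table <- cbind(estimate = coef(fit), confint(fit, level = 0.95))
print(round(coef_table, 1))
```

In such a table, the `Gestation` row is the expected change in infant weight (grams) per additional week of gestation with the other predictors held fixed, and the `AnySmokingyes` row is the expected difference between the "yes" and "no" groups.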
Assumption checking
The assumptions of the final model are checked with accompanying tables / figures to support the checking.
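The standard diagnostic plots for a linear model can be produced as in the sketch below, which uses simulated data rather than the audit dataset. The Shapiro-Wilk test is shown as one optional supplement to the Q-Q plot, not as a required check.

```r
set.seed(3)
n <- 150
# Simulated stand-in data; the coefficients are invented for illustration.
sim <- data.frame(Gestation  = round(rnorm(n, 39.5, 1.5)),
                  WghtMother = rnorm(n, 70, 10))
sim$WghtInf <- -3000 + 150 * sim$Gestation + 8 * sim$WghtMother +
  rnorm(n, sd = 400)
fit <- lm(WghtInf ~ Gestation + WghtMother, data = sim)

# Four default diagnostic plots in one figure:
# residuals vs fitted (linearity), normal Q-Q (normality),
# scale-location (constant variance), residuals vs leverage (influence).
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# A formal normality test on the residuals can accompany the Q-Q plot.
sw <- shapiro.test(residuals(fit))
print(sw)
```

Independence of observations cannot be read off these plots; it is argued from how the data were collected, which is why the singleton-only restriction matters here.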
Prediction
Generate a plot of the predicted weight of infant (with shaded 95% confidence interval bands) against each of
the following variables:
• weight of mother before delivery, and
• length of gestation.
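One way to draw a prediction curve with a shaded 95% confidence band in base R is sketched below for length of gestation. The data are simulated, and holding the other predictor at its sample mean is an assumption made for this illustration; the same pattern would apply to weight of mother.

```r
set.seed(4)
n <- 150
# Simulated stand-in data; the coefficients are invented for illustration.
sim <- data.frame(Gestation  = round(rnorm(n, 39.5, 1.5)),
                  WghtMother = rnorm(n, 70, 10))
sim$WghtInf <- -3000 + 150 * sim$Gestation + 8 * sim$WghtMother +
  rnorm(n, sd = 400)
fit <- lm(WghtInf ~ Gestation + WghtMother, data = sim)

# Grid over gestation, holding mother's weight at its sample mean.
grid <- data.frame(
  Gestation  = seq(min(sim$Gestation), max(sim$Gestation), length.out = 100),
  WghtMother = mean(sim$WghtMother)
)

# Columns "fit", "lwr", "upr": predicted mean and 95% confidence limits.
pred <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(sim$Gestation, sim$WghtInf, pch = 16, col = "grey60",
     xlab = "Length of gestation (weeks)", ylab = "Weight of infant (g)")
# Shaded band: trace the lower limit left to right, then the upper limit back.
polygon(c(grid$Gestation, rev(grid$Gestation)),
        c(pred[, "lwr"], rev(pred[, "upr"])),
        col = adjustcolor("steelblue", alpha.f = 0.3), border = NA)
lines(grid$Gestation, pred[, "fit"], col = "steelblue", lwd = 2)
```

Using `interval = "confidence"` gives a band for the mean response; `interval = "prediction"` would give the wider band for an individual infant.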
Conclusion
Summarise your analysis and findings in context.
Formatting
• All figures and tables should be appropriately captioned and cross-referenced in the text.
• Correct use of grammar and spelling.
• You need a title page with
– title,
– authors, and
– date.
Code
The Rmarkdown file of the report (including all analysis code) must be submitted along with the compiled
PDF or HTML document.
Bonus challenge (Twins) 50 marks
Repeat the analysis investigating the effect of the following predictors on the weight of infant, taking into
account both singletons and twins:
• age of mother,
• gender of infant,
• weight of mother before delivery,
• gravidity,
• length of gestation, and
• any smoking during pregnancy.
Note that the samples are no longer considered independent due to the inclusion of twins.
Hint: Look into Linear mixed effects models (LME) or Generalized estimating equations (GEE) for how to
handle clustered (i.e. dependent) data.
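As a rough sketch of the LME option in the hint, the example below fits a random intercept per pregnancy with `lme()` from the `nlme` package. The data are simulated, and the `cluster` column is a hypothetical identifier (one value per pregnancy, shared by both twins) that would need to be constructed from the real `id` and `twin_id` columns. This is one possible model structure, not the required one.

```r
library(nlme)  # a recommended package shipped with standard R installations

set.seed(5)
n_single <- 120   # singleton pregnancies
n_pairs  <- 30    # twin pregnancies (two infants each)
n_clusters <- n_single + n_pairs

# One cluster id per pregnancy; each twin pair shares an id (hypothetical column).
cluster <- c(seq_len(n_single), rep(n_single + seq_len(n_pairs), each = 2))
n <- length(cluster)

# Simulated stand-in data: gestation and a random effect are shared within a
# pregnancy, which is what makes twins correlated. Coefficients are invented.
gest_by_preg   <- round(rnorm(n_clusters, 39, 1.5))
effect_by_preg <- rnorm(n_clusters, sd = 300)
sim <- data.frame(cluster = factor(cluster),
                  Gestation = gest_by_preg[cluster])
sim$WghtInf <- -3000 + 150 * sim$Gestation + effect_by_preg[cluster] +
  rnorm(n, sd = 300)

# Random intercept for each pregnancy accounts for within-pair dependence.
fit <- lme(WghtInf ~ Gestation, random = ~ 1 | cluster, data = sim)

print(summary(fit)$tTable)   # fixed-effect estimates
print(VarCorr(fit))          # between-pregnancy and residual variances
```

A GEE fit with an exchangeable working correlation would be an alternative; that typically requires an add-on package such as `geepack`, which is not bundled with R.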