STAT331: Final Project

Due: August 4, 2021 at 5pm ET on Crowdmark

General Instructions

• Due: August 4 at 5pm.

• Each group consists of 3–4 students (see below). Students who have not enrolled in a group

by Wednesday July 12th will be randomly assigned to a group.

• Each project consists of a typed report of 7–10 pages (12-point font with standard

1-inch margins and single-spaced) including figures and tables, but excluding a mandatory

Appendix containing (but not limited to) all R code.

• Reports may be written in R Markdown, LaTeX, Word, or any other reasonable format as

long as all R code is included in the appendix.

• Reports must be submitted online via Crowdmark.

• Late Penalty: 10% per day. Projects turned in after August 8 will not be graded.

• Your project grade will be worth 35% of your final grade.

Group Enrolment

• Log in to LEARN and join a Group: At the top of the screen, click Connect > Groups.

• Agree on a Group number from 1 to 70 with your other team members, and select that group.

• The names of all collaborators must be written on your report.

Project Details

Data

The dataset pollution.Rdata (posted on LEARN) contains a sample of n = 1000 births included

in a study investigating the relationship between several chemical and non-chemical exposures

during pregnancy and birthweight. Specific variable names and descriptions (including any variable

transformations that have been applied, e.g. some variables are log-transformed) can be found

in codebook.csv. The outcome of interest is birthweight in grams (variable e3_bw). The data

include several possible exposures of interest, including chemical exposures measured in mother’s

blood/urine/hair, as well as outdoor exposures measured in the surrounding environment. The

data also include 7 other covariates (maternal age, education, BMI, weight gain during pregnancy,

child’s year of birth, gestational age at birth, and sex).
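
As a starting point, the data and codebook could be opened along the following lines. This is only a minimal sketch: it assumes both files sit in the R working directory, and load_rdata_objects is a hypothetical helper name, not something provided with the course materials. The handout does not say what the object stored inside pollution.Rdata is called, so the sketch prints whatever names it finds rather than assuming one.

```r
# Hypothetical helper: load an .Rdata file into its own environment and
# return the objects it contains as a named list (NULL if the file is absent).
load_rdata_objects <- function(path) {
  if (!file.exists(path)) return(NULL)
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored object names
  mget(obj_names, envir = env)
}

# Assumption: pollution.Rdata is in the working directory.
objs <- load_rdata_objects("pollution.Rdata")
if (!is.null(objs)) {
  print(names(objs))             # see what the file actually contains
  str(objs[[1]], list.len = 10)  # peek at the first object
}

# The codebook is a CSV file, so read.csv() should be enough to browse it.
if (file.exists("codebook.csv")) {
  codebook <- read.csv("codebook.csv", stringsAsFactors = FALSE)
  print(head(codebook))
}
```

Loading into a separate environment avoids silently overwriting objects that already exist in the workspace.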

Goals

The goal of this project is to analyze the pollution.Rdata dataset and write a report on your

analysis. The specific goals of your analysis are up to you to decide. Examples include: building

the best possible predictive model for birthweight; investigating interactions among chemical

exposures; identifying the most important predictors of low birthweight; and evaluating how much

chemical exposures improve predictions. Many other goals are possible. You can be creative here:

the more specific and interesting the goal, the better. You can use these data in any way you like,

but birthweight must be your outcome.

Report

Your 7–10 page report must contain the following components:

1. Summary:

• A maximum of 200 words describing the objective of the report, an overview of the

statistical analysis, and summary of the main results.

2. Objective:

• Describe your goals for the analysis.

3. Exploratory Data Analysis:

• Conduct exploratory data analyses: report summary statistics and visualize the data

(histograms, scatter plots, etc.). Report any interesting findings and comment on how

these inform the rest of your analysis.

4. Methods:

• Describe your statistical analysis: What is your model? Did you use any transformations

or extensions of the basic multiple linear regression model? How did you select a model?

Does the model fit the data well? Are the necessary assumptions met? Be sure to explain

and justify your decisions.

5. Results:

• Report on the findings of your analysis.

6. Discussion:

• Comment on your findings/conclusions; describe any limitations of your analysis.
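
The Exploratory Data Analysis and Methods components above might translate into R roughly as follows. This is an illustrative sketch of one possible workflow, not a required or recommended analysis. It runs on simulated stand-in data so that it is self-contained. The outcome column is assumed to be named e3_bw (with an underscore); the other column names (gest_age, mat_age, sex) and all simulated values are hypothetical and are not taken from codebook.csv. AIC-based stepwise selection via step() is used only as one example of a model selection method.

```r
set.seed(331)
n <- 200

# Simulated stand-in data. Only e3_bw is meant to match the handout;
# the remaining column names and all values are made up for illustration.
toy <- data.frame(
  gest_age = rnorm(n, mean = 39, sd = 1.5),
  mat_age  = rnorm(n, mean = 30, sd = 5),
  sex      = factor(sample(c("female", "male"), n, replace = TRUE))
)
toy$e3_bw <- 3300 + 150 * (toy$gest_age - 39) +
  100 * (toy$sex == "male") + rnorm(n, sd = 400)

# Exploratory data analysis: summary statistics and basic plots
print(summary(toy))
hist(toy$e3_bw, main = "Birthweight", xlab = "Birthweight (g)")
plot(e3_bw ~ gest_age, data = toy,
     xlab = "Gestational age (weeks)", ylab = "Birthweight (g)")

# Methods: fit a multiple linear regression, then select a model by AIC
full_fit <- lm(e3_bw ~ gest_age + mat_age + sex, data = toy)
sel_fit  <- step(full_fit, trace = 0)
print(summary(sel_fit))

# Assumption checks: residuals vs fitted values, and a normal Q-Q plot
old_par <- par(mfrow = c(1, 2))
plot(fitted(sel_fit), resid(sel_fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(resid(sel_fit))
qqline(resid(sel_fit))
par(old_par)
```

With the real data, the choice of predictors, transformations, and selection method would need to be justified in the Methods section rather than copied from a template like this one.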

Grading

• Project grades will consider the following:

– All required components are included.

– Ideas are well organized (please use the sections as described above, but you can further

divide material into subsections as appropriate)

– Ideas are clearly expressed, and written in complete sentences.

– Subjective analysis decisions are reasonable and well-justified.

– Statistical challenges are well described and addressed appropriately

– The most important/relevant results and findings are shown and discussed in the report

(optionally, any supporting analyses or results you wish to show can be included in the

Appendix).

– Results are presented and interpreted correctly

– Analysis limitations are acknowledged

– Conclusions are insightful and well-justified

– Choice of tables and figures is effective

– Tables and figures are well presented: captions and labels are informative; axes/scales

are appropriate; no needless digits (3 or 4 significant digits is usually ok); no wasted

space

– R Code is clear, well-commented and reproducible
