STAT331: Final Project
Due: August 4, 2021 at 5pm ET on Crowdmark

General Instructions
• Due: August 4 at 5pm.
• Each group consists of 3–4 students (see below). Students who have not enrolled in a group by Wednesday July 12th will be randomly assigned to a group.
• Each project consists of a typed report between 7–10 pages (12 point font with standard 1-inch margins and single-spaced) including figures and tables, but excluding a mandatory Appendix containing (but not limited to) all R code.
• Reports may be written in R Markdown, LaTeX, Word, or any other reasonable format as long as all R code is included in the appendix
• Reports must be submitted online via Crowdmark
• Late Penalty: 10% per day. Projects turned in after August 8 will not be graded.
• Your project grade will be worth 35% of your final grade

Group Enrolment
• Log in to LEARN and join a Group: At the top of the screen, click Connect > Groups.
• Agree on a Group number between 1–70 with your other team members, and select a group.
• The names of all collaborators must be written on your report.

Project Details

Data
The dataset pollution.Rdata (posted on LEARN) contains a sample of n = 1000 births included in a study investigating the relationship between several chemical and non-chemical exposures during pregnancy and birthweight. Specific variable names and descriptions (including any variable transformations that have been applied; e.g., some variables are log-transformed) can be found in codebook.csv. The outcome of interest is birthweight in grams (variable e3_bw). The data include several possible exposures of interest, including chemical exposures measured in mother's blood/urine/hair, as well as outdoor exposures measured in the surrounding environment. The data also include 7 other covariates (maternal age, education, BMI, weight gain during pregnancy, child's year of birth, gestational age at birth, and sex).

Goals
The goal of this project is to analyze the pollution.Rdata data and write a report on your analysis. The specific goals of your analysis are up to you to decide. Examples could include: building the best possible predictive model for birthweight; investigating interactions among chemical exposures; identifying the most important predictors of low birthweight; evaluating how much chemical exposures improve predictions; many others! You can be creative here: the more specific and interesting the goal, the better. You can use these data in any way you like, but birthweight must be your outcome.

Report
Your 7–10 page report must contain the following components:
1. Summary:
   • A maximum of 200 words describing the objective of the report, an overview of the statistical analysis, and a summary of the main results.
2. Objective:
   • Describe your goals for the analysis.
3. Exploratory Data Analysis:
   • Conduct exploratory data analyses: report summary statistics, visualize data (histograms, scatter plots, etc.). Report on any interesting findings and comment on how these inform the rest of your analysis.
4. Methods:
   • Describe your statistical analysis: What is your model? Did you use any transformations or extensions of the basic multiple linear regression model? How did you select a model? Does the model fit the data well? Are the necessary assumptions met? Be sure to explain and justify your decisions.
5. Results:
   • Report on the findings of your analysis.
6. Discussion:
   • Comment on your findings/conclusions; describe any limitations of your analysis.

Grading
• Project grades will consider the following:
  – All required components are included.
  – Ideas are well organized (please use the sections as described above, but you can further divide material into subsections as appropriate)
  – Ideas are clearly expressed, and written in complete sentences.
  – Subjective analysis decisions are reasonable and well-justified.
  – Statistical challenges are well described and addressed appropriately
  – The most important/relevant results and findings are shown and discussed in the report (optionally, any supporting analyses or results you wish to show can be included in the Appendix).
  – Results are presented and interpreted correctly
  – Analysis limitations are acknowledged
  – Conclusions are insightful and well-justified
  – Choice of tables and figures is effective
  – Tables and figures are well presented: captions and labels are informative; axes/scales are appropriate; no needless digits (3 or 4 significant digits is usually ok); no wasted space
  – R Code is clear, well-commented and reproducible