STAT GR5205 Project
10% of Final Grade + Extra Credit
Due TBA
The STAT GR5205 Project is a “guided-open-ended” case study
intended to be a capstone for the linear regression models course.
1 Data Description
The data comprise 166 countries and a number of variables. Most of the data come from
the CIA’s “The World Factbook.” For variable descriptions, please consult The World Factbook
website.
A few additional variables on the democracy index are also included. These data are taken from the
Wikipedia article “Democracy Index.” The overall democracy index is the average of the other five variables:
electoral process and pluralism, functioning of government, political participation, political culture,
and civil liberties.
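Because the handout states that the overall index is the plain average of the five components, the data can be checked for consistency before modeling. The sketch below is a minimal illustration only: the column names in `COMPONENTS` are hypothetical (the actual file may name them differently), and the 0.01 tolerance assumes published scores are rounded to two decimals.

```python
from statistics import mean

# Hypothetical column names for the five Democracy Index components;
# the course data file may use different names.
COMPONENTS = (
    "electoral_process_and_pluralism",
    "functioning_of_government",
    "political_participation",
    "political_culture",
    "civil_liberties",
)


def overall_index(row: dict) -> float:
    """Recompute the overall democracy index as the plain average of the five components."""
    return mean(row[c] for c in COMPONENTS)


def matches_reported(row: dict, reported: float, tol: float = 0.01) -> bool:
    """Compare a reported overall score with the recomputed average.

    Assumption: scores are rounded to two decimals, so a small tolerance is allowed.
    """
    return abs(overall_index(row) - reported) <= tol
```

Because the overall score is an exact linear combination of its components, using all five components as predictors of the overall index would reproduce it almost exactly (up to rounding), which is worth keeping in mind when choosing predictors.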
This project has two parts.
I. Test the two research questions stated in Section 2 below. To complete this part, you must fit
a multiple linear regression model that includes appropriate variables, functional forms, and
interactions. Once you have decided on a model, test only the variables related to the research
questions. You must also include appropriate diagnostics and remedial measures.
II. Build a “predictive linear regression model” intended to predict... further details will be
included soon.
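Deciding on a model usually involves comparing candidates on fit criteria, and Section 3 also asks for the AIC, R², and adjusted R² of the chosen model. A minimal NumPy sketch, assuming the design matrix `X` (intercept column plus any transformed or interaction columns already built) and the response `y` are NumPy arrays; the helper name `fit_criteria` is hypothetical. The second AIC variant is the Gaussian log-likelihood form that R's `AIC()` reports for an `lm` fit, so the two differ only by a constant for a fixed data set.

```python
import numpy as np


def fit_criteria(X: np.ndarray, y: np.ndarray) -> dict:
    """SSE, R^2, adjusted R^2 and two AIC variants for an OLS fit of y on X.

    Assumes X already contains an intercept column and has full column rank,
    so p = X.shape[1] counts every regression parameter (textbook notation).
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    ssto = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / ssto
    r2_adj = 1.0 - (n - 1) / (n - p) * sse / ssto
    # Textbook form: AIC_p = n ln(SSE_p) - n ln(n) + 2p
    aic = n * np.log(sse) - n * np.log(n) + 2 * p
    # Gaussian log-likelihood form, -2 logLik + 2(p + 1), which also counts sigma;
    # it differs from `aic` only by the constant n ln(2*pi) + n + 2.
    aic_loglik = n * np.log(2 * np.pi) + n + n * np.log(sse / n) + 2 * (p + 1)
    return {"sse": sse, "r2": r2, "r2_adj": r2_adj,
            "aic": float(aic), "aic_loglik": float(aic_loglik)}
```

Both AIC variants rank candidate models identically when they are fitted to the same response. Values computed for different transformations of the response (for example y versus log y) are not directly comparable.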
2 Part I: Research Questions
The goal of Part I is to run classic hypothesis testing procedures based on the multiple linear
regression model. As a researcher, your goal is to study how infant mortality and life expectancy
are related to the democracy index. The two research questions are:
1. Is there a statistically significant relationship between the democracy index and infant mortality
(without controlling for life expectancy)?
2. Is there a statistically significant relationship between the democracy index and life expectancy
(without controlling for infant mortality)?
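Each research question can be answered with a general linear (partial F) test that compares a full model with a reduced model. For Research Question 1, life expectancy is left out of both models; the full model holds every term involving infant mortality (a transformed version and any interactions you chose), and the reduced model drops exactly those terms. Research Question 2 swaps the roles. The NumPy sketch below assumes both design matrices already include an intercept column and have full column rank; `general_linear_test` is a hypothetical helper name.

```python
import numpy as np


def _sse(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def general_linear_test(X_full: np.ndarray, X_reduced: np.ndarray, y: np.ndarray):
    """F* = [(SSE(R) - SSE(F)) / (df_R - df_F)] / [SSE(F) / df_F].

    X_full contains every term; X_reduced drops all terms involving the
    variable under study. Returns (F*, numerator df, denominator df).
    """
    n = len(y)
    df_full = n - X_full.shape[1]
    df_red = n - X_reduced.shape[1]
    sse_full, sse_red = _sse(X_full, y), _sse(X_reduced, y)
    f_star = ((sse_red - sse_full) / (df_red - df_full)) / (sse_full / df_full)
    return f_star, df_red - df_full, df_full
```

In R, `anova(reduced_fit, full_fit)` carries out the same test for two nested `lm` fits; in Python, `scipy.stats.f.sf(f_star, df_num, df_den)` gives the p-value. When only a single slope is dropped, F* equals the square of that slope's t statistic in the summary output.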
3 Part I: Writeup
Students are required to type up a final report. The final report should be divided into the
following four sections. Section IV has several components.
I. Introduction: Include a brief description of the goals of this analysis, together with some
exploratory data analysis. Your exploratory analysis should include a few important plots
and basic summary statistics that help support the research questions. Be creative in the
exploratory analysis and include only items that you feel are informative.
II. Statistical Model: In this section, clearly state your statistical model along with the R
summary output. Be sure to describe all interactions, functional forms, and transformations
included in your model. Also include the AIC, R², and adjusted R² (R²ₐ).
III. Research Questions: Perform the relevant testing procedures to answer the two research
questions stated in Section 2. Also include a brief written summary of your results.
IV. Appendix
a. Model:
i. Explain in detail which interactions, functional forms, and variables you decided
to include in the model. Describe whether, and why, a transformation is applied to
the response variable. Without overwhelming the TAs, include the relevant R output and
plots that helped you arrive at your statistical model.
b. Diagnostics and Model Validation:
i. Include all relevant diagnostic plots.
ii. Include a section on influential observations. For this application, we only care about
testing the slopes related to the research questions, so you do not need to include
DFBETAS plots for every coefficient. Given the size of this data set, expect a large
number of observations to be flagged as influential.
iii. Anything else you feel is necessary.
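Appendix item b.ii only needs DFBETAS for the slopes tied to the research questions. The sketch below computes the textbook definition DFBETAS_k(i) = (b_k - b_k(i)) / sqrt(MSE_(i) c_kk), where c_kk is the k-th diagonal element of (X'X)^-1, by refitting the model once per deleted case, which is cheap for 166 countries. The function names and the 2/sqrt(n) cutoff (a common guideline for larger data sets, about 0.155 when n = 166) are assumptions for illustration; in R, `dfbetas()` applied to an `lm` fit returns the corresponding matrix.

```python
import numpy as np


def dfbetas(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """(n, p) array of DFBETAS_k(i) = (b_k - b_k(i)) / sqrt(MSE_(i) * c_kk).

    Brute-force leave-one-out refits. c_kk comes from (X'X)^{-1} of the full
    fit; MSE_(i) is the error mean square with case i deleted.
    Assumes X includes an intercept column and has full column rank.
    """
    n, p = X.shape
    c = np.diag(np.linalg.inv(X.T @ X))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    out = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i
        b_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid_i = y[keep] - X[keep] @ b_i
        mse_i = float(resid_i @ resid_i) / (n - 1 - p)
        out[i] = (b - b_i) / np.sqrt(mse_i * c)
    return out


def flag_influential(d: np.ndarray, cols: list) -> np.ndarray:
    """Row indices whose |DFBETAS| exceeds 2/sqrt(n) in any of the listed coefficient columns."""
    cutoff = 2.0 / np.sqrt(d.shape[0])
    return np.where((np.abs(d[:, cols]) > cutoff).any(axis=1))[0]
```

Because the cutoff shrinks as n grows, a fair number of countries may be flagged; treat the flags as a prompt to inspect those cases rather than a rule for deleting them.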
4 Part II: Predictive Model
Will be posted soon.
5 Grading
• This project will be graded on:
1. Completeness (don’t forget to also turn in your R file)
2. Correctness
3. Organization/neatness
4. Creativity
I want to see a neat, well-organized final report. It must be typed, with all graphs labeled.
Please do not make the report too long (10 pages or fewer).
• I am granting extra credit on this report. If the class TA feels that you did an excellent job,
your hard work will be reflected when final grades are calculated.
• This is NOT a group project.