STAT GR5205 Project

10% of Final Grade + Extra Credit (EC)

Due TBA

The STAT GR5205 Project is a “guided open-ended” case study intended as a capstone for the linear regression models course.

1 Data Description

The data comprise 166 countries and a number of variables. Most of the data come from the CIA’s “The World Factbook.” For variable descriptions, please consult The World Factbook website.

A few additional variables on the democracy index are also included; these data are taken from the Wikipedia article “Democracy Index.” The overall democracy index is the average of the other five democracy variables: electoral process and pluralism, function of government, political participation, political culture, and civil liberties.
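Written as a formula, the overall index for country i is the equally weighted mean below. The abbreviations EP, FG, PP, PC and CL are illustrative labels for the five components, not column names taken from the data file.

```latex
\mathrm{DI}_i = \frac{1}{5}\left(\mathrm{EP}_i + \mathrm{FG}_i + \mathrm{PP}_i + \mathrm{PC}_i + \mathrm{CL}_i\right)
```

Because the overall index is an exact linear combination of the five components, using those components as predictors of the overall index would reproduce it perfectly rather than model it.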

This project has two parts.

I. Test the two research questions stated in Section 2. To complete this part, you must fit a multiple linear regression model that includes appropriate variables, functional forms, and interactions. Once you have decided on a model, test only the variables related to the research questions. You must also include appropriate diagnostics and remedial measures.

II. Build a “predictive linear regression model” intended to predict... Further details will be posted soon.

2 Part I: Research Questions

The goal of Part I is to run classical hypothesis testing procedures based on the multiple linear regression model. As a researcher, your goal is to study how infant mortality rate and life expectancy impact the democracy index. The two research questions are:

1. Is there a statistically significant relationship between the democracy index and infant mortality (without controlling for life expectancy)?

2. Is there a statistically significant relationship between the democracy index and life expectancy (without controlling for infant mortality)?
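Because the final model may contain several terms involving the variable of interest (for example, a polynomial term or an interaction), a t-test on a single coefficient may not fully answer a research question. One standard option, offered here as a suggestion rather than a required method, is the general linear F test comparing the full model (F) with a reduced model (R) that drops every term involving that variable. In the sketch below, S is a hypothetical label for the set of coefficient indices on those terms.

```latex
H_0:\ \beta_j = 0 \ \text{for all } j \in S
\qquad \text{vs.} \qquad
H_a:\ \beta_j \neq 0 \ \text{for at least one } j \in S

F^{*} = \frac{\bigl(\mathrm{SSE}(R) - \mathrm{SSE}(F)\bigr) / (df_R - df_F)}{\mathrm{SSE}(F) / df_F},
\qquad \text{reject } H_0 \text{ if } F^{*} > F(1-\alpha;\ df_R - df_F,\ df_F)
```

In R, calling anova(reduced_fit, full_fit) on two nested lm fits carries out this comparison; the object names here are placeholders.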

3 Part I: Writeup

Each student must submit a typed final report, organized into the following four sections. Section IV has several components.

I. Introduction: Include a brief description of the goals of this analysis along with some exploratory data analysis. Your exploratory analysis should include a few important plots and basic summary statistics that help support the research questions. Be creative with the exploratory analysis, and include only items that you find informative.

II. Statistical Model: In this section, clearly state your statistical model along with the R summary output. Be sure to describe all interactions, functional forms, and transformations included in your model. Also report the AIC, R², and adjusted R² (R²a).
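For reference, the quantities requested above can be computed from the error and total sums of squares. The sketch below is written in Python rather than R and uses the textbook forms R² = 1 − SSE/SSTO, R²a = 1 − [(n − 1)/(n − p)]·SSE/SSTO and AIC = n ln SSE − n ln n + 2p, where p counts every regression parameter including the intercept. The function name fit_summaries and the numbers in the usage line are hypothetical. R's AIC() is based on the full Gaussian log-likelihood, so its value differs from this form by n(1 + ln 2π) + 2, a constant that depends only on n.

```python
import math


def fit_summaries(sse: float, ssto: float, n: int, p: int) -> dict[str, float]:
    """Return R^2, adjusted R^2 and AIC for a least-squares fit (hypothetical helper).

    sse:  error sum of squares of the fitted model
    ssto: total sum of squares of the response about its mean
    n:    number of observations
    p:    number of regression parameters, including the intercept
    """
    if not (0 < sse <= ssto) or n <= p:
        raise ValueError("need 0 < SSE <= SSTO and n > p")
    r2 = 1 - sse / ssto
    # Adjusted R^2 rescales SSE/SSTO by the ratio of degrees of freedom (n - 1)/(n - p).
    r2_adj = 1 - (n - 1) / (n - p) * sse / ssto
    # Textbook AIC form; R's AIC() adds n*(1 + ln(2*pi)) + 2 to this value.
    aic = n * math.log(sse) - n * math.log(n) + 2 * p
    return {"R2": r2, "R2a": r2_adj, "AIC": aic}


# Hypothetical illustration: n = 166 countries, p = 4 parameters.
print(fit_summaries(sse=40.0, ssto=100.0, n=166, p=4))
```

Because the constant is the same for every model fit to the same n observations, AIC rankings are unaffected by which form is used. AIC values are only comparable across models with the same response, so models fit to a transformed response should not be ranked against untransformed ones by AIC.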

III. Research Questions: Perform the relevant testing procedures to answer the two research questions stated in Section 2, and include a brief written summary of your results.

IV. Appendix

a. Model:

i. Explain in detail which interactions, functional forms, and variables you decided to include in the model. Describe whether and why a transformation is applied to the response variable. Without overwhelming the TAs, include the relevant R output and plots that helped you arrive at your statistical model.

b. Diagnostics and Model Validation:

i. Include all relevant diagnostic plots.

ii. Include a section on influential observations. For this application, we care only about testing the slopes related to the research questions, so you do not need to include plots for every (DFBETAS)_j, only for those slopes. Expect a large number of influential observations given the size of this data set.
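For reference, the textbook definition of DFBETAS for coefficient j and case i, together with the rule of thumb commonly quoted for large data sets (a guideline, not a formal test), is shown below. Here b_{j(i)} and MSE_{(i)} are the estimate and mean squared error refit with case i deleted.

```latex
(\mathrm{DFBETAS})_{j(i)} = \frac{b_j - b_{j(i)}}{\sqrt{\mathrm{MSE}_{(i)}\, c_{jj}}},
\qquad c_{jj} = \bigl[(\mathbf{X}^{\top}\mathbf{X})^{-1}\bigr]_{jj}

\text{flag case } i \text{ if } \bigl|(\mathrm{DFBETAS})_{j(i)}\bigr| > \frac{2}{\sqrt{n}} = \frac{2}{\sqrt{166}} \approx 0.155
```

With n = 166 the cutoff is small, which is one reason to expect many flagged cases. In R, dfbetas() applied to a fitted lm object returns this quantity as a matrix with one column per coefficient.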

iii. Anything else you feel is necessary.

4 Part II: Predictive Model

Details will be posted soon.

5 Grading

• This project will be graded on:

1. Completeness (don’t forget to also turn in your R file)

2. Correctness

3. Organization/neatness

4. Creativity

I want to see a well-organized final report. It must be typed, with all graphs labeled. Please do not make the report too long (10 pages or fewer).

• I am granting extra credit on this report. If the class TA feels that you did an excellent job, your hard work will be reflected in your final grade.

• This is NOT a group project.

