ECON 2300: INTRODUCTORY ECONOMETRICS

Coordinator: Dr. Antonio Peyrache

Research Project 1

Due: 4pm on 19 September

Background

You are interested in estimating the effect of education on earnings. The data file cps4_small.dta

contains 1,000 observations on hourly wage rates, education, and other variables from the 2008 Current

Population Survey (CPS):

• wage: earnings per hour

• educ: years of education

• exper: years of post-education experience

• hrswk: working hours per week

• married: dummy for married

• female: dummy for female

• metro, midwest, south, west: location dummies

• black: dummy for black

• asian: dummy for Asian

Submission of your report

Your report must be single-spaced and set in 12-point font. You should answer each of the

following questions in a format similar to that of the solutions to the tutorial problem sets. When you

are required to use R, you must show your R command and R outputs (screenshots or figures generated

from R). You will lose 2 points whenever you fail to provide R commands and outputs. For each

question, when you are asked to discuss or interpret, your answer has to be brief and compact. You

will lose 2 points if your answer is needlessly wordy. In addition, you may lose marks for any of the

following: failing to grasp or address core concepts and ideas; poor or ineffective structure; unclear or

illogical flow of ideas; fluffy or unclear arguments; and weak or badly composed arguments. You must

upload your assignment on the course webpage (Blackboard) in PDF format. (Do not submit a hard

copy.)

Research tasks

1. (20 points) Load and explore the main variables.

(a) (7 points) You are given a dataset in the .dta format. Figure out how to load this dataset

in R. Provide your R-commands to load the data. In particular, be clear about which

R-package you install and use. (Hint: use the Internet.)

(b) (13 points) Obtain summary statistics and histograms for the variables wage and educ. For

the histograms, give informative titles and variable names instead of just using the default

titles and variable names. For example, you could use Years of Education in place of

educ. Discuss the data characteristics.
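
One possible way to approach the loading and plotting steps is sketched below. It is only an illustration under stated assumptions: it uses the haven package (one of several packages that read .dta files), it assumes the file is saved as cps4_small.dta in the working directory, and the helper names load_cps and plot_hist are hypothetical.

```r
# Sketch only. Assumptions: the haven package is installed and the data file
# is in the working directory under the assumed name "cps4_small.dta".
load_cps <- function(path = "cps4_small.dta") {
  if (!requireNamespace("haven", quietly = TRUE)) {
    stop("Install haven first: install.packages('haven')")
  }
  as.data.frame(haven::read_dta(path))
}

# Histogram with an informative title and axis label instead of the defaults.
# `label` is the human-readable variable name, e.g. "Years of Education".
plot_hist <- function(x, label) {
  hist(x, main = paste("Distribution of", label), xlab = label, col = "grey80")
}

# Runs only if the data file is actually present.
if (file.exists("cps4_small.dta")) {
  cps <- load_cps()
  print(summary(cps[, c("wage", "educ")]))
  plot_hist(cps$educ, "Years of Education")
  plot_hist(cps$wage, "Hourly Wage")
}
```

The file.exists() guard lets the script run without error when the data file is missing.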

2. (25 points) Estimate the linear regression

ln(wage_i) = β_1 + β_2 educ_i + e_i,

where e_i is the error and β_1 and β_2 are the unknown population coefficients.

(a) (5 points) Report the estimation results in the common form introduced in Lecture note 3.

For example, see page 9 of Lecture note 3, where the estimates are presented in an equation

form, along with standard errors and some measures for model fit.

(b) (5 points) Construct a scatter diagram of educ and ln(wage) and plot the estimated

regression equation from (a) on the scatter diagram. Give an informative title and labels for the

variables; that is, do not use the default title and labels.

(c) (4 points) Assuming that E[e|educ] = 0, interpret the estimated coefficient on educ (2

points) and test whether or not the population coefficient is zero at the 1% significance level

(2 points).

(d) (6 points) You suspect that the hourly wage could depend on working hours per week.

Discuss under what condition(s) the estimated coefficients in (a) would be biased due to the

omission of the weekly working hours (2 points). Give a reasonable and intuitive story on

why omission of the weekly working hours would cause omitted variable bias in the regression

in (a) (2 points). Under your story, explain whether the estimated coefficient on educ in (a)

would be overestimated or underestimated (2 points). See pages 4 and 5 of Lecture note 4.

(e) (5 points) The variable hrswk is the average weekly working hours for each individual in the

data. Regress ln(wage) on educ and hrswk. Discuss the estimation results. In particular,

how would you revise your answer in (c)? Are the estimates statistically significant?
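
The mechanics of fitting a log-linear regression and overlaying the fitted line on a scatter diagram can be sketched as follows. The data here are simulated stand-ins, not the CPS sample, and every number in the simulation (sample size, coefficients, error spread) is an arbitrary assumption made for illustration.

```r
# Simulated stand-in data (NOT the CPS sample); all numbers are made up.
set.seed(1)
n    <- 200
educ <- sample(8:20, n, replace = TRUE)
wage <- exp(1.5 + 0.09 * educ + rnorm(n, sd = 0.5))
sim  <- data.frame(wage = wage, educ = educ)

# Log-linear model: ln(wage) regressed on years of education.
fit <- lm(log(wage) ~ educ, data = sim)

# Estimates, standard errors, t values and p-values in one table.
print(summary(fit)$coefficients)
cat("R-squared:", summary(fit)$r.squared, "\n")

# Scatter diagram with an informative title and labels, plus the fitted line.
plot(sim$educ, log(sim$wage),
     main = "Log Hourly Wage vs Years of Education (simulated)",
     xlab = "Years of Education", ylab = "ln(Hourly Wage)")
abline(fit, lwd = 2)
```

In a log-linear model of this kind, 100 times the slope estimate approximates the percentage change in wage associated with one more year of education.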

3. (40 points) You are concerned about omitted variable bias in the regressions of Question 2. For

that reason, you decide to regress ln(wage) on all other variables in the dataset and use this model

as a benchmark.

(a) (11 points) Report a 95% confidence interval for the estimated slope parameter of educ

(3 points), explain the relationship between confidence intervals and hypothesis testing (4

points), and test the hypothesis that one year of additional education would increase hourly

wage by 12% (4 points).

(b) (7 points) Assuming there is no omitted variable bias, discuss the estimated coefficient on

female in the benchmark model. In particular, explain what the estimated coefficient on

female means for hourly wage (3 points), compare the effect of being female on hourly

wage with the effect of one additional year of education on hourly wage (2 points),

and discuss whether the effect of being female on hourly wage is significantly different from

zero (2 points).

(c) (5 points) Using the estimation results of the benchmark model, test the hypothesis that

the hourly wage is not affected by the geographic location. Explain how you reach your

conclusion. (Hint: use package car.)

(d) (5 points) Using the estimation results of the benchmark model, test the hypothesis that the

wage differential associated with being African American is equal to the wage differential associated

with being Asian American. Explain how you reach your conclusion. (Hint: use package car.)

(e) (7 points) How would you modify the benchmark model to estimate the effects on hourly

wage of one additional year of education separately for each gender? (4 points) How do the

effects of education differ between the genders and is the difference statistically significant?

(3 points)

(f) (5 points) Keoka is an African American woman, working in a metropolitan area. After she

obtained her high school diploma, she got a job and started working instead of getting a

higher education. She has never been married. Now she has five years of experience in the

industry and is working full time (40 hours a week). Using the benchmark model, predict

her hourly wage.

Be careful: the left-hand side variable is ln(wage), but you should predict Keoka’s wage.
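
The general mechanics behind joint hypothesis tests and prediction from a log-wage model can be sketched on simulated data. This is an illustration only: the data are made up, the model is much smaller than the benchmark model, and the "person" being predicted is hypothetical. The base-R nested-model F-test is shown first. The car calls run only if that package is installed. Whether the course expects the simple or the corrected wage predictor is not stated here, so both are computed.

```r
# Simulated stand-in data (NOT the CPS sample); all numbers are made up.
# The two location dummies are drawn independently purely for simplicity.
set.seed(2)
n <- 500
sim <- data.frame(
  educ   = sample(8:20, n, replace = TRUE),
  female = rbinom(n, 1, 0.5),
  south  = rbinom(n, 1, 0.3),
  west   = rbinom(n, 1, 0.2)
)
sim$lwage <- 1.4 + 0.10 * sim$educ - 0.20 * sim$female + rnorm(n, sd = 0.5)

full       <- lm(lwage ~ educ + female + south + west, data = sim)
restricted <- lm(lwage ~ educ + female, data = sim)

# Joint F-test of H0: the coefficients on south and west are both zero,
# done by comparing the restricted and unrestricted models.
ftest <- anova(restricted, full)
print(ftest)

# Optional: the same kind of test through car, if it is installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::linearHypothesis(full, c("south = 0", "west = 0")))
  print(car::linearHypothesis(full, "south = west"))  # equality of two coefficients
}

# Prediction for a hypothetical person. The model predicts ln(wage),
# so the prediction must be transformed back to the wage scale.
person <- data.frame(educ = 12, female = 1, south = 0, west = 0)
ln_hat <- predict(full, newdata = person)
wage_natural   <- exp(ln_hat)                              # simple back-transform
wage_corrected <- exp(ln_hat + summary(full)$sigma^2 / 2)  # log-normal correction
print(c(natural = unname(wage_natural), corrected = unname(wage_corrected)))
```

The corrected predictor is always at least as large as the simple one, because the correction factor exp(s^2/2) is never below 1.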

4. (15 points) It may be more useful to estimate the effect on earnings of education by using the

highest diploma/degree rather than years of schooling. Define four dummy variables to indicate

educational achievements:

• lt_hs = 1 if educ < 12

• hs = 1 if educ = 12

• col = 1 if educ ≥ 16

• some_col = 1 for all other values of educ.

(a) (6 points) Create the dummy variables (lt_hs, hs, col, some_col) as defined above (3

points) and compute the sample means of hourly wage for each of the four education

categories (3 points).

(b) (9 points) Regress wage on the four dummies (lt_hs, hs, col, some_col). You will face a

problem. What is the problem here? Under what circumstances would you face the problem?

(4 points) To avoid it, now regress wage on three dummies (lt_hs, col, some_col)

excluding hs. Interpret the estimated coefficients and compare the estimation results with

the findings in (a) (5 points).
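
The construction of category dummies and the link between dummy coefficients and group means can be sketched on simulated data. The data below are made up and are not the CPS sample. The dummy names are written with underscores, and educ is assumed to take integer values.

```r
# Simulated stand-in data (NOT the CPS sample); all numbers are made up.
set.seed(3)
n   <- 300
sim <- data.frame(educ = sample(8:20, n, replace = TRUE))
sim$wage <- 5 + 1.2 * sim$educ + rnorm(n, sd = 4)

# Dummies as defined in the task; some_col collects all remaining values.
sim$lt_hs    <- as.integer(sim$educ < 12)
sim$hs       <- as.integer(sim$educ == 12)
sim$col      <- as.integer(sim$educ >= 16)
sim$some_col <- 1L - sim$lt_hs - sim$hs - sim$col

# Every observation belongs to exactly one category.
stopifnot(all(sim$lt_hs + sim$hs + sim$col + sim$some_col == 1))

# Sample mean of wage within each category.
grp <- with(sim, ifelse(lt_hs == 1, "lt_hs",
                 ifelse(hs == 1, "hs",
                 ifelse(col == 1, "col", "some_col"))))
group_means <- tapply(sim$wage, grp, mean)
print(group_means)

# All four dummies plus an intercept: the dummies sum to the constant,
# so lm() cannot estimate every coefficient and reports NA for one of them.
trap <- lm(wage ~ lt_hs + hs + col + some_col, data = sim)
print(coef(trap))

# Dropping hs makes it the base group: the intercept is the hs group mean,
# and each slope is the difference from that base group.
fit <- lm(wage ~ lt_hs + col + some_col, data = sim)
print(coef(fit))
```

Comparing coef(fit) with group_means shows that the intercept plus a dummy's coefficient reproduces that category's sample mean.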
