Problem 1 (10 points)

A firm that prefers to stay anonymous spent $10M on sponsored search advertising in 2020. When

a consumer searched for the firm’s product, the search engine showed the ad for the product on

some searches. The firm’s data showed that customers who searched for the product and saw the

ad generated $20M in profit for the firm. You were hired to make an evidence-based decision:

whether to spend another $10M on sponsored-search ads in 2021. You have access to all internal

data of the firm. You are about to submit a request to your data engineer to pull the data you

need to decide. Your goal in this problem is to describe this dataset and explain how you will use

it to make the decision.

(a) As precisely as you can, define the economic question you want to answer. Make sure your

question contains a question mark.

(b) Define the unit of observation (i.e. if your data are cross-sectional, define i; if your data are a

time series, define t; if your data are panel, define both i and t.)

(c) Define the outcome variable $y_{it}$. (This notation implies that your data are panel, but that does

not need to be the case).

(d) Define the key regressor of interest $x_{it}$. Explain why it varies from observation to observation.

(e) Define control variables $w_{it}$.

(f) Describe possible unobservable factors that will be picked up by $u_{it}$.

(g) Define the main estimating equation/model. What coefficient is your object of interest?

(h) State your estimation method: OLS, IV, logit, probit, panel data with fixed effects, etc. How

will you construct standard errors?

(i) Explain your decision rule, i.e. for what values of the coefficient you will decide to advertise

and for what values of the coefficient you will decide not to advertise.

(j) Will you pay attention to the confidence interval? If yes, how exactly? If not, why? (Be as

specific as possible.)


Problem 2 (10 points)

The following code in R was written to illustrate an econometric (statistical) concept related to

hypothesis testing. Unfortunately, the code has blanks (___).

(a) (1 point) Fill in the blanks.

(b) (1 point) What is the null hypothesis?

(c) (1 point) What is the alternative hypothesis?

(d) (2 points) What is the significance level?

(e) (2 points) What is the concept?

(f) (1 point) Suppose the output that the code generates is: 52. What exactly does this number

show?

(g) (2 points) Suppose we rerun the code with sigma1 = 10 (instead of the current value of 2).

Will the output generated by the code be higher or lower than the original output? Explain

why.

M = 1000
n = ___
beta0 = ___
beta1 = 0.5
sigma1 = 2
sigma2 = 1
outcome = matrix(0, nrow = M, ncol = 1)
threshold = ___
for (m in 1:M){
  x = rnorm(n, mean = ___, sd = sqrt(sigma1))
  u = rnorm(n, mean = 0, sd = sqrt(sigma2))
  y = beta0 + beta1*x + u
  bhat = summary(lm(y ~ x))$coefficients[2,___]
  se = summary(lm(y ~ x))$coefficients[2,___]
  outcome[m] = (bhat/se) > threshold
}
round(mean(outcome)*100)
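
For readers who want to experiment with the design, here is a minimal runnable sketch of this kind of simulation. The values of `n`, `beta0`, the mean of `x`, and `threshold` are illustrative assumptions chosen for this sketch and are not presented as the intended answers to part (a). The column labels "Estimate" and "Std. Error" are the standard names in the coefficient matrix returned by `summary(lm(...))`.

```r
# Illustrative values only: n, beta0, the mean of x, and threshold are
# assumptions made for this sketch, not the exam's intended answers.
set.seed(1)
M <- 1000                  # number of simulated samples
n <- 100                   # assumed sample size
beta0 <- 1                 # assumed intercept
beta1 <- 0.5               # true slope, as in the exam code
sigma1 <- 2                # variance of x
sigma2 <- 1                # variance of u
threshold <- qnorm(0.95)   # assumed cutoff: one-sided 5% normal critical value

reject <- logical(M)
for (m in 1:M) {
  x <- rnorm(n, mean = 0, sd = sqrt(sigma1))
  u <- rnorm(n, mean = 0, sd = sqrt(sigma2))
  y <- beta0 + beta1 * x + u
  coefs <- summary(lm(y ~ x))$coefficients  # fit once, reuse the table
  bhat <- coefs["x", "Estimate"]
  se <- coefs["x", "Std. Error"]
  reject[m] <- (bhat / se) > threshold      # TRUE when the t-ratio exceeds the cutoff
}

# Share of simulated samples (in percent) in which the t-ratio exceeded the cutoff
rejection_pct <- round(mean(reject) * 100)
print(rejection_pct)
```

Changing `sigma1`, `n`, or `beta1` and rerunning shows how the share of rejections responds to the amount of information in each simulated sample.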


Problem 3 (10 points)

Researchers from Columbia University wrote a study to address the following question: how does

political representation at the federal level affect a county’s economic development? The unit of

observation in the study is a New York state county in a given year. The sample includes observations

on all New York counties from 2006 to 2016. A county at the federal level may be represented

by one or more congressional delegates depending on how the state draws its congressional map.

Multiple counties may be represented by the same delegate and multiple delegates can represent

the same county. Congressional maps get redrawn after each decennial census of U.S. populations:

for this dataset it is shortly after 2010.

The study presented the following OLS estimates of a two-way panel data model:

$$\hat{y}_{it} = \underset{(0.33)}{1.66}\, u_{it} + \underset{(2.09)}{4.24}\, n_{it} + \hat{\alpha}_i + \hat{\lambda}_t; \qquad R^2 = 0.35, \quad SER = 9.57$$

where $y_{it}$ is the GDP of county i in year t, $u_{it}$ is the county-level unemployment level, $n_{it}$ is the

number of congressional delegates that represent county i in year t, $\alpha_i$ is the county fixed effect,

and $\lambda_t$ is the time fixed effect.
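
As a quick arithmetic aid, the t-ratios implied by the reported estimates can be computed directly. This assumes that the numbers in parentheses are standard errors, which is the usual convention but is not stated explicitly in the excerpt. The subscripts u and n refer to the coefficients on unemployment and on the number of delegates (1.66 and 4.24, respectively).

```latex
t_{u} = \frac{1.66}{0.33} \approx 5.03,
\qquad
t_{n} = \frac{4.24}{2.09} \approx 2.03
```

Under that assumption, both ratios exceed 1.96 in absolute value, the two-sided 5% critical value of the large-sample normal approximation.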

As is customary in economics, the validity of the study has been heavily criticized. For each

alleged drawback, respond professionally. If you disagree with the criticism, explain why. If you

agree, explain how you would re-estimate the regressions to take this criticism into account.

(a) (2 points) “All economies grow over time and economies of New York state counties are

no exception. The model should have at least included a linear trend. Without it, it suffers

intolerably from the omitted variable bias.”

(b) (2 points) “The model says nothing about the causal effect of congressional representation on

economic activity: just look at the $R^2$! It’s so low, it’s simply a joke.”

(c) (2 points) “The model doesn’t capture a very important channel: distance to the federal capital.

Counties that are closer to Washington, D.C. have higher political influence and therefore will

generate higher levels of economic activity. This is a fatal flaw of this study.”

(d) (2 points) “The analysis assumes that $n_{it}$ is the variable of interest, while $u_{it}$ is the control

variable. But the data reject this assumption: the coefficient on $u_{it}$ is more significant than the

coefficient on $n_{it}$.”

(e) (2 points) “The model doesn’t account for sampling uncertainty properly. Even if the model

correctly captures the true data generating process, our data sample may not be representative.

So, if we saw another data sample generated by the same data generating process, the results

would have been drastically different.”


Problem 4 (10 points)

The following code in R was written to illustrate an econometric (statistical) concept related to

time series. Unfortunately, the code has blanks (___).

(a) (1 point) Fill in the blanks.

(b) (2 points) What is the concept?

(c) (1 point) Suppose the output that the code generates is: 0.042. What exactly does this

number show?

(d) (2 points) You need to determine whether this number is statistically different from zero.

Based on all the available information, propose a test that answers this question.

(e) (1 point) Calculate the test statistic for the test you proposed.

(f) (1 point) Choose the significance level and determine the critical value.

(g) (1 point) Draw a conclusion, i.e. determine whether the number (0.042) is statistically different

from zero.

(h) (1 point) Determine whether you have committed a type 1 or type 2 error.

M = 1000
t = 50
t0 = 20
t1 = 30
a = 1
ro = 0.5
y = matrix(0, nrow = M, ncol = t)
for (m in 1:M){
  y[m,1] = rnorm(1, mean = ___, sd = ___)
  for (i in 2:t){
    y[m,i] = a + ro*y[m,i-1] + rnorm(1)
  }
}
mean(y[,t0]) - mean(y[,t1])
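
A runnable sketch of the same simulation is given below for experimentation. It assumes that the first observation is drawn from the stationary distribution of the AR(1) process, with mean a/(1 - ro) and variance 1/(1 - ro^2). That starting distribution is an illustrative choice, not necessarily the intended answer to part (a). The paired t-test is likewise only one possible way to compare the two cross-sectional means, and the name `n_periods` is introduced here in place of `t` to avoid shadowing R's transpose function.

```r
# Assumption for this sketch: y[m, 1] is drawn from the stationary
# distribution of y_t = a + ro * y_{t-1} + e_t with e_t ~ N(0, 1).
set.seed(1)
M <- 1000          # number of simulated paths
n_periods <- 50    # length of each path (called t in the exam code)
t0 <- 20
t1 <- 30
a <- 1
ro <- 0.5

stat_mean <- a / (1 - ro)          # stationary mean of the AR(1)
stat_sd <- sqrt(1 / (1 - ro^2))    # stationary standard deviation

y <- matrix(0, nrow = M, ncol = n_periods)
for (m in 1:M) {
  y[m, 1] <- rnorm(1, mean = stat_mean, sd = stat_sd)
  for (i in 2:n_periods) {
    y[m, i] <- a + ro * y[m, i - 1] + rnorm(1)
  }
}

# Difference between the cross-sectional means at periods t0 and t1
mean_diff <- mean(y[, t0]) - mean(y[, t1])

# One possible test: paired t-test across the M paths (same path, two dates)
test <- t.test(y[, t0], y[, t1], paired = TRUE)
print(mean_diff)
print(test$statistic)
```

Replacing `stat_mean` or `stat_sd` with other starting values and comparing early versus late periods shows how the choice of initial distribution affects the difference in means.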
