Problem 1 (10 points)
A firm that prefers to stay anonymous spent \$10M on sponsored search advertising in 2020. When
a consumer searched for the firm’s product, the search engine showed the ad for the product on
some searches. The firm’s data showed that customers who searched for the product and saw the
ad generated \$20M in profit for the firm. You were hired to make an evidence-based decision:
whether to spend another \$10M on sponsored-search ads in 2021. You have access to all internal
data of the firm. You are about to submit a request to your data engineer to pull the data you
need to decide. Your goal in this problem is to describe this dataset and explain how you will use
it to make the decision.
(a) As precisely as you can, define the economic question you want to answer. Make sure your
question contains a question mark.
(b) Define the unit of observation (i.e., if your data are cross-sectional, define i; if your data are a
time series, define t; if your data are panel, define both i and t).
(c) Define the outcome variable $y_{it}$. (This notation implies that your data are panel, but that does
not need to be the case.)
(d) Define the key regressor of interest $x_{it}$. Explain why it varies from observation to observation.
(e) Define control variables $w_{it}$.
(f) Describe possible unobservable factors that will be picked up by $u_{it}$.
(g) Define the main estimating equation/model. What coefficient is your object of interest?
(h) State your estimation method: OLS, IV, logit, probit, panel data with fixed effects, etc. How
will you construct standard errors?
(i) Explain your decision rule, i.e., for what values of the coefficient you will decide to advertise
and for what values of the coefficient you will decide not to advertise.
(j) Will you pay attention to the confidence interval? If yes, how exactly? If not, why? (Be as
specific as possible.)
Problem 2 (10 points)
The following code in R was written to illustrate an econometric (statistical) concept related to
hypothesis testing. Unfortunately, the code has blanks (___).
(a) (1 point) Fill in the blanks.
(b) (1 point) What is the null hypothesis?
(c) (1 point) What is the alternative hypothesis?
(d) (2 points) What is the significance level?
(e) (2 points) What is the concept?
(f) (1 point) Suppose the output that the code generates is: 52. What exactly does this number
show?
(g) (2 points) Suppose we rerun the code with sigma1 = 10 (instead of the current value of 2).
Will the output generated by the code be higher or lower than the original output? Explain
why.
M = 1000
n = ___
beta0 = ___
beta1 = 0.5
sigma1 = 2
sigma2 = 1
outcome = matrix(0, nrow = M, ncol = 1)
threshold = ___
for (m in 1:M) {
  x = rnorm(n, mean = ___, sd = sqrt(sigma1))
  u = rnorm(n, mean = 0, sd = sqrt(sigma2))
  y = beta0 + beta1*x + u
  bhat = summary(lm(y ~ x))\$coefficients[2, ___]
  se = summary(lm(y ~ x))\$coefficients[2, ___]
  outcome[m] = (bhat/se) > threshold
}
round(mean(outcome)*100)
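The coefficient table that `summary(lm(...))` returns can also be indexed by row and column name rather than by position. The following is a minimal sketch of that idiom on simulated data; the seed, sample size, and parameter values are illustrative assumptions and are not the values behind the blanks above.

```r
# Illustrative values only; these are NOT the values behind the blanks above.
set.seed(1)
n_obs <- 200
x <- rnorm(n_obs, mean = 0, sd = 1)
u <- rnorm(n_obs, mean = 0, sd = 1)
y <- 1 + 0.5 * x + u

# Fit the model once and reuse its coefficient table.
fit <- lm(y ~ x)
coefs <- summary(fit)$coefficients

# Rows are named after the model terms, columns after the reported statistics.
print(colnames(coefs))

# Extract the slope estimate and its standard error by name.
bhat <- coefs["x", "Estimate"]
se <- coefs["x", "Std. Error"]
t_stat <- bhat / se
```

Fitting once and indexing by name avoids re-estimating the same regression twice per replication and makes it explicit which statistic each line reads.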
Problem 3 (10 points)
Researchers from Columbia University wrote a study to address the following question: how does
political representation at the federal level affect a county’s economic development? The unit of
observation in the study is a New York state county in a given year. The sample includes
observations on all New York counties from 2006 to 2016. A county at the federal level may be represented
by one or more congressional delegates depending on how the state draws its congressional map.
Multiple counties may be represented by the same delegate, and multiple delegates can represent
the same county. Congressional maps get redrawn after each decennial census of the U.S. population:
for this dataset, that is shortly after 2010.
The study presented the following OLS estimates of a two-way panel data model:
$$
\hat{y}_{it} = \underset{(0.33)}{1.66}\, u_{it} + \underset{(2.09)}{4.24}\, n_{it} + \hat{\alpha}_i + \hat{\lambda}_t; \quad R^2 = 0.35, \quad SER = 9.57
$$
where $y_{it}$ is the GDP of county $i$ in year $t$, $u_{it}$ is the county-level unemployment level, $n_{it}$ is the
number of congressional delegates that represent county $i$ in year $t$, $\alpha_i$ is the county fixed effect,
and $\lambda_t$ is the time fixed effect.
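For readers less familiar with the term, a two-way panel data model can be estimated by OLS with one dummy per county and one per year. The following is a minimal sketch on simulated data, not the study's data or code: the panel dimensions (62 counties, 2006 to 2016), the variable names, and the true coefficients 1.5 and 4 are all assumptions made for illustration.

```r
# Simulated panel; every name and parameter value here is hypothetical.
set.seed(42)
n_counties <- 62
years <- 2006:2016
panel <- expand.grid(county = seq_len(n_counties), year = years)

# County and year effects that the fixed effects are meant to absorb.
alpha <- rnorm(n_counties, sd = 5)
lambda <- rnorm(length(years), sd = 2)

n_rows <- nrow(panel)
panel$u <- rnorm(n_rows, mean = 5, sd = 1)      # unemployment regressor
panel$n <- sample(1:3, n_rows, replace = TRUE)  # number of delegates
panel$y <- 1.5 * panel$u + 4 * panel$n +
  alpha[panel$county] + lambda[match(panel$year, years)] +
  rnorm(n_rows)

# Two-way fixed effects via county and year dummies (least-squares dummy variables).
fit <- lm(y ~ u + n + factor(county) + factor(year), data = panel)
print(coef(fit)[c("u", "n")])
```

Because the dummies absorb anything constant within a county or common to all counties in a year, the slopes on `u` and `n` are identified only from variation within counties over time.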
As is customary in economics, the validity of the study has been heavily criticized. For each
alleged drawback, respond professionally. If you disagree with the criticism, explain why. If you
agree, explain how you would re-estimate the regressions to take this criticism into account.
(a) (2 points) “All economies grow over time and economies of New York state counties are
no exception. The model should have at least included a linear trend. Without it, it suffers
intolerably from the omitted variable bias.”
(b) (2 points) “The model says nothing about the causal effect of congressional representation on
economic activity: just look at the $R^2$! It’s so low, it’s simply a joke.”
(c) (2 points) “The model doesn’t capture a very important channel: distance to the federal capital.
Counties that are closer to Washington, D.C. have higher political influence and therefore will
generate higher levels of economic activity. This is a fatal flaw of this study.”
(d) (2 points) “The analysis assumes that $n_{it}$ is the variable of interest, while $u_{it}$ is the control
variable. But the data reject this assumption: the coefficient on $u_{it}$ is more significant than the
coefficient on $n_{it}$.”
(e) (2 points) “The model doesn’t account for sampling uncertainty properly. Even if the model
correctly captures the true data generating process, our data sample may not be representative.
So, if we saw another data sample generated by the same data generating process, the results
would have been drastically different.”
Problem 4 (10 points)
The following code in R was written to illustrate an econometric (statistical) concept related to
time series. Unfortunately, the code has blanks (___).
(a) (1 point) Fill in the blanks.
(b) (2 points) What is the concept?
(c) (1 point) Suppose the output that the code generates is: 0.042. What exactly does this
number show?
(d) (2 points) You need to determine whether this number is statistically different from zero.
Based on all the available information, propose a test that answers this question.
(e) (1 point) Calculate the test statistic for the test you proposed.
(f) (1 point) Choose the significance level and determine the critical value.
(g) (1 point) Draw a conclusion, i.e., determine whether the number (0.042) is statistically different
from zero.
(h) (1 point) Determine whether you have committed a type 1 or type 2 error.
M = 1000
t = 50
t0 = 20
t1 = 30
a = 1
ro = 0.5
y = matrix(0, nrow = M, ncol = t)
for (m in 1:M) {
  y[m,1] = rnorm(1, mean = ___, sd = ___)
  for (i in 2:t) {
    y[m,i] = a + ro*y[m,i-1] + rnorm(1)
  }
}
mean(y[,t0]) - mean(y[,t1])