Problem 1 (10 points)
A firm that prefers to stay anonymous spent $10M on sponsored search advertising in 2020. When a consumer searched for the firm’s product, the search engine showed the ad for the product on some searches. The firm’s data showed that customers who searched for the product and saw the ad generated $20M in profit for the firm. You were hired to make an evidence-based decision: whether to spend another $10M on sponsored-search ads in 2021. You have access to all internal data of the firm. You are about to submit a request to your data engineer to pull the data you need to decide. Your goal in this problem is to describe this dataset and explain how you will use it to make the decision.
(a) As precisely as you can, define the economic question you want to answer. Make sure your question contains a question mark.
(b) Define the unit of observation (i.e., if your data are cross-sectional, define $i$; if your data are a time series, define $t$; if your data are panel, define both $i$ and $t$).
(c) Define the outcome variable $y_{it}$. (This notation implies that your data are panel, but that does not need to be the case.)
(d) Define the key regressor of interest $x_{it}$. Explain why it varies from observation to observation.
(e) Define control variables $w_{it}$.
(f) Describe possible unobservable factors that will be picked up by $u_{it}$.
(g) Define the main estimating equation/model. What coefficient is your object of interest?
(h) State your estimation method: OLS, IV, logit, probit, panel data with fixed effects, etc. How will you construct standard errors?
(i) Explain your decision rule, i.e., for what values of the coefficient you will decide to advertise and for what values of the coefficient you will decide not to advertise.
(j) Will you pay attention to the confidence interval? If yes, how exactly? If not, why? (Be as specific as possible.)

Problem 2 (10 points)
The following code in R was written to illustrate an econometric (statistical) concept related to hypothesis testing.
Unfortunately, the code has blanks (___).
(a) (1 point) Fill in the blanks.
(b) (1 point) What is the null hypothesis?
(c) (1 point) What is the alternative hypothesis?
(d) (2 points) What is the significance level?
(e) (2 points) What is the concept?
(f) (1 point) Suppose the output that the code generates is: 52. What exactly does this number show?
(g) (2 points) Suppose we rerun the code with sigma1 = 10 (instead of the current value of 2). Will the output generated by the code be higher or lower than the original output? Explain why.

M = 1000
n = ___
beta0 = ___
beta1 = 0.5
sigma1 = 2
sigma2 = 1
outcome = matrix(0, nrow = M, ncol = 1)
threshold = ___
for (m in 1:M){
  x = rnorm(n, mean = ___, sd = sqrt(sigma1))
  u = rnorm(n, mean = 0, sd = sqrt(sigma2))
  y = beta0 + beta1*x + u
  bhat = summary(lm(y ~ x))$coefficients[2,___]
  se = summary(lm(y ~ x))$coefficients[2,___]
  outcome[m] = (bhat/se) > threshold
}
round(mean(outcome)*100)

Problem 3 (10 points)
Researchers from Columbia University wrote a study to address the following question: how does political representation at the federal level affect a county’s economic development? The unit of observation in the study is a New York state county in a given year. The sample includes observations on all New York counties from 2006 to 2016. A county may be represented at the federal level by one or more congressional delegates, depending on how the state draws its congressional map. Multiple counties may be represented by the same delegate, and multiple delegates can represent the same county. Congressional maps are redrawn after each decennial census of the U.S. population; for this dataset, that is shortly after 2010.
The study presented the following OLS estimates of a two-way panel data model:

$$\hat{y}_{it} = \underset{(0.33)}{1.66}\, u_{it} + \underset{(2.09)}{4.24}\, n_{it} + \hat{\alpha}_i + \hat{\lambda}_t, \qquad R^2 = 0.35, \quad SER = 9.57$$

where $y_{it}$ is the GDP of county $i$ in year $t$, $u_{it}$ is the county-level unemployment level, $n_{it}$ is the number of congressional delegates that represent county $i$ in year $t$, $\alpha_i$ is the county fixed effect, and $\lambda_t$ is the time fixed effect.

As is customary in economics, the validity of the study has been heavily criticized. For each alleged drawback, respond professionally. If you disagree with the criticism, explain why. If you agree, explain how you would re-estimate the regressions to take this criticism into account.
(a) (2 points) “All economies grow over time and economies of New York state counties are no exception. The model should have at least included a linear trend. Without it, it suffers intolerably from the omitted variable bias.”
(b) (2 points) “The model says nothing about the causal effect of congressional representation on economic activity: just look at the $R^2$! It’s so low, it’s simply a joke.”
(c) (2 points) “The model doesn’t capture a very important channel: distance to the federal capital. Counties that are closer to Washington, D.C. have higher political influence and therefore will generate higher levels of economic activity. This is a fatal flaw of this study.”
(d) (2 points) “The analysis assumes that $n_{it}$ is the variable of interest, while $u_{it}$ is the control variable. But the data reject this assumption: the coefficient on $u_{it}$ is more significant than the coefficient on $n_{it}$.”
(e) (2 points) “The model doesn’t account for sampling uncertainty properly. Even if the model correctly captures the true data generating process, our data sample may not be representative.
So, if we saw another data sample generated by the same data generating process, the results would have been drastically different.”

Problem 4 (10 points)
The following code in R was written to illustrate an econometric (statistical) concept related to time series. Unfortunately, the code has blanks (___).
(a) (1 point) Fill in the blanks.
(b) (2 points) What is the concept?
(c) (1 point) Suppose the output that the code generates is: 0.042. What exactly does this number show?
(d) (2 points) You need to determine whether this number is statistically different from zero. Based on all the available information, propose a test that answers this question.
(e) (1 point) Calculate the test statistic for the test you proposed.
(f) (1 point) Choose the significance level and determine the critical value.
(g) (1 point) Draw a conclusion, i.e., determine whether the number (0.042) is statistically different from zero.
(h) (1 point) Determine whether you have committed a type 1 or type 2 error.

M = 1000
t = 50
t0 = 20
t1 = 30
a = 1
ro = 0.5
y = matrix(0, nrow = M, ncol = t)
for (m in 1:M){
  y[m,1] = rnorm(1, mean = ___, sd = ___)
  for (i in 2:t){
    y[m,i] = a + ro*y[m,i-1] + rnorm(1)
  }
}
mean(y[,t0]) - mean(y[,t1])
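For experimentation, the Problem 4 simulation can be run once the two blanks are given some value. The sketch below is only an illustration under stated assumptions: `init_mean = 0` and `init_sd = 1` are arbitrary placeholders and are not the intended answers to part (a), and the `set.seed` call and the names `init_mean`, `init_sd`, and `diff_means` are not part of the original code.

```r
# Illustrative sketch only. The starting-value choices below are ASSUMPTIONS
# made so that the code runs; they are not the intended answers to the blanks.
set.seed(1)            # assumed seed, added for reproducibility

M  <- 1000             # number of simulated paths
t  <- 50               # number of periods in each path
t0 <- 20               # first date at which the cross-path mean is taken
t1 <- 30               # second date at which the cross-path mean is taken
a  <- 1                # intercept of the AR(1) recursion
ro <- 0.5              # autoregressive coefficient

init_mean <- 0         # ASSUMPTION: placeholder for the first blank
init_sd   <- 1         # ASSUMPTION: placeholder for the second blank

y <- matrix(0, nrow = M, ncol = t)
for (m in 1:M) {
  # Draw the initial observation of path m
  y[m, 1] <- rnorm(1, mean = init_mean, sd = init_sd)
  # Build the rest of the path: y_i = a + ro * y_{i-1} + standard normal shock
  for (i in 2:t) {
    y[m, i] <- a + ro * y[m, i - 1] + rnorm(1)
  }
}

# Difference between the cross-path averages at dates t0 and t1
diff_means <- mean(y[, t0]) - mean(y[, t1])
print(diff_means)

# Cross-path averages at the first date and at the two comparison dates
print(colMeans(y)[c(1, t0, t1)])
```

Changing `init_mean` and `init_sd` and rerunning shows how the cross-path averages at different dates respond to the choice of starting distribution.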