Problem set
Causal inference
EBA 3530
Spring 2021
The goal of this problem set is to allow you to test your understanding of the materials covered in lectures
9, 10 and 11. If you master these problems (without relying on suggested solutions), you will most likely
be prepared to solve these kinds of problems on the final exam. A suggested solution will be provided, but
please note that there are in many cases not just one correct answer — it will depend on your own ability
to reason and explain your thinking. For example, it is expected at this stage that you will be able to
come up with possible omitted variables and set them in relation to the treatment and outcome variables.
It is expected that you also sketch/draw up simple flowcharts showing relationships between variables just
like we have done many times in class. This is a very efficient way for you to communicate your thinking.
However, everyone will come up with different examples which will lead to different conclusions wrt. the
case at hand. Furthermore, the suggested solution will not necessarily reflect a perfect answer that will
award 100% score. The official BI grading scale says the following about the grade of A (= Excellent):
An excellent performance, clearly outstanding. The candidate demonstrates excellent judgement and a
high degree of independent thinking. To let you showcase your aptitude for independent
thinking on the exam, the suggested solutions are deliberately not exhaustive and down to a T,
especially since this is going to be an open-book exam.
Good luck!
1. Randomised trials and linear regression
You work at a medium-sized consultancy firm that specialises in evaluating public transit projects
in Norway. During the COVID-19 pandemic, all workers were forced to work from home. Since
then, however, home office has no longer been mandatory as the pandemic has faded. Many of your colleagues
have continued to work from home rather than to come into the office. You are a leading data
scientist at your firm and your manager (who knows a lot about evaluating railway projects, but
not much else) approaches you with a task. He shows you a spreadsheet where he has recorded each
employee's work output (footnote 1) during the previous week, as well as whether each employee worked
from home or came into the office. He wants to revert to a pre-pandemic working environment as he
misses walking around the open landscape office to check in on his subordinates, but is unsure about
whether that will be good for the firm as some employees have voiced a high degree of satisfaction
with working from home. With the data that he has gathered, he wants to know whether he should
make coming into the office mandatory at the next general meeting (which to his dismay will be
held over Microsoft Office Teams).
(a) Explain to your manager (possible reasons) why you are not able to provide a causal estimate
of the productivity effect based on these two pieces of information. In your answer, you should
recast the problem into a causal inference problem. You can use a combination of words,
drawings and maths.
Footnote 1: Work output is measured as the fraction of completed work during a day divided by the
benchmark amount of work to be completed during a day. If the measure is equal to 1, the employee
produced exactly as much output as required. If the measure is 0.7, the employee underperformed by
30%, etc. This technicality allows for a better comparison across different positions and
departments in the firm.
Your manager is a bit confused by your reasoning, but pulls out the results of a questionable
non-anonymous commuting survey that your firm did back in 2019. The goal of the survey was
to gather information about commuting habits following an initiative in 2018 when employee car
parking facilities were removed to promote greener and healthier commutes. He argues that the
solutions to the problems in (a) will for sure be found in these data. In this survey, employees
provided information regarding their daily commute time from home to the office in terms
of number of minutes. He used Google a fair bit and figured out how to run linear regression in
Microsoft Office Excel:
Y_i = β0 + β1·HOME_i + β2·CTime_i + e_i
HOME_i is a dummy variable that is equal to 1 if individual i works from home and zero otherwise.
CTime_i is the individual’s commuting time in number of minutes.
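As a rough illustration of the mechanics only, the following Python sketch fits the specification above by ordinary least squares on synthetic data. It assumes that Y_i is the work output measure. The helper name ols, the sample size and the data-generating numbers (0.9, 0.05, −0.002) are assumptions made up for this example and do not come from the manager's spreadsheet. Recovering β1 on simulated data says nothing about whether β1 is causal in the real data.

```python
import random

def ols(X, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting, then eliminate this column from every other row.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data (hypothetical): columns are constant, HOME_i, CTime_i.
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    home = 1 if rng.random() < 0.5 else 0
    ctime = rng.uniform(5, 90)  # commuting time in minutes
    X.append([1.0, home, ctime])
    y.append(0.9 + 0.05 * home - 0.002 * ctime + rng.gauss(0, 0.02))

b0, b1, b2 = ols(X, y)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.4f}")
```

Because HOME_i is assigned by a coin flip in this simulation, the estimate of β1 lands close to the assumed 0.05. In the manager's data, employees chose where to work, which is exactly why the same regression needs careful interpretation.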
(b) Reassure your manager that he is on the right track. Then try to explain why even in this
case, you should be careful with interpreting β1 as a causal effect. In your answer, you can use
terms like omitted variable bias and reverse causality. You can use a combination of words,
drawings and maths.
Your manager is very displeased by your arguments but is very eager to learn about the effect of
home office on worker productivity. He gives you the mandate to obtain a causal effect by any
means necessary.
(c) Propose a research design that will sidestep the issues discussed above. Briefly discuss why
this approach works. Are there any ethical issues that you should consider? What would
happen if your colleagues do not cooperate? Is it really feasible? You can use a combination
of words, drawings and maths.
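One way to see why a randomised design sidesteps the selection problem is a small simulation. The Python sketch below is a hypothetical illustration, not a suggested solution: the true effect (0.05), the unobserved confounder u (think of it as motivation) and every other number are assumptions chosen for the example. It compares the difference in mean output between home and office workers when employees self-select and when a coin flip assigns home office.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.05  # hypothetical causal effect of home office on output

def diff_in_means(y, home):
    """Mean output of home workers minus mean output of office workers."""
    treated = [yi for yi, h in zip(y, home) if h == 1]
    control = [yi for yi, h in zip(y, home) if h == 0]
    return mean(treated) - mean(control)

def simulate(randomised, n=5000, seed=1):
    rng = random.Random(seed)
    y, home = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # unobserved confounder, e.g. motivation
        if randomised:
            # Coin flip: treatment is independent of u.
            h = 1 if rng.random() < 0.5 else 0
        else:
            # Self-selection: low-u employees tend to stay home.
            h = 1 if u + rng.gauss(0, 1) < 0 else 0
        home.append(h)
        y.append(0.9 + TRUE_EFFECT * h + 0.1 * u + rng.gauss(0, 0.05))
    return diff_in_means(y, home)

naive = simulate(randomised=False)
experimental = simulate(randomised=True)
print(f"self-selected: {naive:.3f}, randomised: {experimental:.3f}")
```

Under these assumptions the self-selected comparison even gets the sign wrong, while the randomised comparison lands near the assumed true effect. The simulation ignores non-compliance, which is one of the practical issues the question asks you to discuss.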
2. Instrumental variable methods
We build on the background information given in Problem 1 above. Since your consultancy firm
evaluates transit projects, the senior partners of the firm are huge train nerds. They really enjoy
spending time in the boardroom which features a panoramic view over a large train station and
train depot area. The area is thus served mostly by the railway and bus services are limited.
The commuting survey conducted in 2019 confirms that most employees (to the delight of senior
partners) commute to work by using local and regional trains that call at the station nearby. The
commuting survey also asked for information about what train lines the individual employee relied
on to get to work. However, there are employees who walk or bike to work as well despite having
the option to take the train. The removal of employee parking in 2018 eliminated the use of private
cars for commutes.
(a) Given this information, can you think of a valid instrument that could cause random variation
in treatment? That is, is there randomness that can induce employees to stay home rather
than going to work? Use words and drawings to support your discussion. Here you need to
be a bit creative, but start by discussing what characterises a valid instrument. Maybe that
will give you some inspiration.
(b) Identify who the compliers, always-takers, never-takers and defiers are in this case. Are there
likely to be defiers?
(c) Explain how the instrumental variable approach solves the reverse causality problem discussed
in Problem 1. In this case, it can be useful to again use a drawing and think about the
interpretation of the 1st stage fitted estimate.
(d) Define what we mean by a Local Average Treatment Effect and explain what the LATE will
represent in this case. How does this limit the interpretation of the estimated causal effect?
Is this LATE really the answer your manager is looking for?
(e) Suppose that your manager hands you the data for yesterday in which you observe worker
output, their treatment status that day as well as the value of their instrumental variable. You
estimate the reduced form and the first stage equations and obtain the relevant parameters:
π̂1 = −0.12 from the reduced form and γ̂1 = 0.4 from the first stage. Give an interpretation of
these estimates. Then compute the 2nd stage parameter β̂1 and give it an interpretation.
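With a single instrument and a single treatment, the 2nd stage parameter equals the reduced form coefficient divided by the first stage coefficient. The Python sketch below only carries out that arithmetic with the numbers given in (e). The function name wald_estimate is a hypothetical helper, and the interpretation of the result is left to you.

```python
def wald_estimate(reduced_form, first_stage):
    """Ratio of the reduced form to the first stage (just-identified IV)."""
    if first_stage == 0:
        raise ValueError("first stage is zero: the instrument is irrelevant")
    return reduced_form / first_stage

# Estimates quoted in part (e).
beta_1_hat = wald_estimate(reduced_form=-0.12, first_stage=0.4)
print(round(beta_1_hat, 3))  # prints -0.3
```

The guard against a zero first stage mirrors the relevance requirement: if the instrument does not move treatment, the ratio is undefined.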
3. Difference in differences
We continue where we left off in Problems 1 and 2. Your manager is really getting impatient
with all your talk about reverse causality and you being LATE for work. As you tried to explain to
him, the inclusion of control variables in the regression could not solve reverse causality. And the
IV approach, while technically valid, gave a LATE interpretation that really did not answer the
question he asked. Growing more and more frustrated by the day, your manager sends you his
Excel spreadsheet containing productivity data from the beginning of the pandemic until now.
You start to ponder whether it is possible to exploit the time dimension as well as the cross-section.
You decide to focus on the productivity data for the last day of mandatory home office as well as
yesterday’s data when some people worked from home and some came in to the office.
(a) Briefly describe an empirical strategy to evaluate the productivity of those who work from
home relative to those who do not using the Difference in differences methodology. In this
setting, you might have to switch around who is considered treated and who is considered
control (counterfactual).
(b) What necessary assumption must hold for the empirical strategy to be valid? How can you
check that the assumption holds?
(c) Prior to the lifting of the restrictions, the control group had an average productivity of 0.92
and the treatment group 0.67. When the treatment group returned to the office, they averaged
at 0.88. The control group that decided to stay at home had an average productivity of 0.97.
Compute the estimated treatment effect and interpret the result.
(d) You want a standard error on your estimate and would rather opt for a regression framework
where you incorporate both the cross-section and the time-dimension using dummy variables
with interactions. Set up this regression. With the numbers in the previous problem, compute
what the estimated parameters would likely be. For instance, for the post-treatment treatment group:
E(Y_it | TREAT_i = 1, POST_t = 1) = 0.88 = β0 + β1 + β2 + β3 = 0.92 − 0.25 + 0.05 + β3 ⇒ β3 = 0.16
(e) Make a graphical representation of the setup. Clearly mark the axes, observation points as
well as treatment-, control- and counterfactual graphs. Also mark the axes with appropriate
coefficients.
(f) You are now ready to approach your manager with your findings. What advice will you give
him regarding the future home office policy?
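The arithmetic behind parts (c) and (d) can be sketched in a few lines of Python. The sketch assumes the usual two-group, two-period regression Y_it = β0 + β1·TREAT_i + β2·POST_t + β3·(TREAT_i × POST_t) + e_it, in which the four coefficients map one-to-one onto the four group averages. The dictionary layout and the function name did are hypothetical choices for this illustration. The averages are those given in (c).

```python
# Average productivity by group and period, as given in part (c).
means = {
    ("control", "pre"): 0.92,
    ("treated", "pre"): 0.67,
    ("treated", "post"): 0.88,
    ("control", "post"): 0.97,
}

def did(m):
    """Change in the treated group minus change in the control group."""
    change_treated = m[("treated", "post")] - m[("treated", "pre")]
    change_control = m[("control", "post")] - m[("control", "pre")]
    return change_treated - change_control

# In the saturated dummy regression, coefficients follow from the cell means.
beta_0 = means[("control", "pre")]           # control group, pre-period
beta_1 = means[("treated", "pre")] - beta_0  # pre-period group gap
beta_2 = means[("control", "post")] - beta_0 # time trend in the control group
beta_3 = did(means)                          # difference-in-differences estimate

print([round(b, 2) for b in (beta_0, beta_1, beta_2, beta_3)])
# prints [0.92, -0.25, 0.05, 0.16]
```

The group averages alone give point estimates only. Standard errors require the individual-level data, which is the reason (d) asks for the regression framework.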