Problem Set #1
Causal Inference (PSC 8121), Spring 2025
Due: 2/20
We're going to analyze the replication data for Susan Hyde's article, "The Observer Effect in
International Politics: Evidence from a Natural Experiment." The data file is called
Hyde_WP_2007_Armenia2003electionresults.dta. It contains two rounds of election results
across 1,764 polling stations from the Armenian presidential election in 2003. Please answer
the following questions. Include a copy of any commands you use. Feel free to describe your
results in text (e.g., "the coefficient is 1.23 (t-value = 2.34)" or some equivalent).
1. Replicate rows 1-4 of Table 1. (Note: There are a few ways to do this.) Using the same
method, what is the effect of observation in Round 2 on Round 2 voting for Kocharian,
conditional on no observation in Round 1? What is the effect of observation in Round 2 on
Round 1 voting for Kocharian, conditional on no observation in Round 1?
2. A critical assumption of the design is that observers are as-if randomly monitoring polling
stations. Identify some plausible pre-Round-1-treatment variables that might predict Round
1 observation. Do any of these variables significantly predict observation? How about when
multiple variables are tested together?
3. Repeat this exercise for Round 2. Compare your findings against Hyde's attempt to do this
in Table 3.
4. Briefly, what do the results of steps 2 and 3 tell you? Do they point to a potential threat to the design?
5. Test the effect of Round 1 observation on Round 1 voting for Kocharian, controlling for the
potential confounders you identified in step 2. What do you find? Compare this to Hyde's
attempt to do this in Table 4. What are some of the problems with her approach?
6. Do the equivalent for Round 2, distinguishing the effect of Round 1 observation only, Round
2 observation only, and both. What do you find? Is there anything odd about this?
7. Can you think of a subsample within which the as-if random assumption is especially likely
to hold? If so, test the effect of observation within this subsample. What do you find?
8. Has this analysis changed your impression of the paper? What additional data could improve
the research design?
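For reference, the sketch below shows one way to set up the two basic tools these questions rely on, a difference in means and a covariate balance check, in Stata. It runs on simulated data, not the replication file: the variable names (mon1, kvote1, x1, x2) and every number used to generate the data are hypothetical placeholders, so substitute the actual variables after checking their labels.

```stata
* Illustrative sketch on SIMULATED data; this is not the Hyde replication file.
* All variable names and all numbers below are hypothetical placeholders.
clear
set seed 8121
set obs 1764
generate byte   x1     = runiform() < 0.5    // stand-in binary covariate
generate double x2     = runiform()          // stand-in continuous covariate
generate byte   mon1   = runiform() < 0.4    // stand-in "station was observed"
generate double kvote1 = 0.55 - 0.05*mon1 + rnormal(0, 0.15)

* (a) Difference in means, two routes to the same estimate.
*     ttest reports mean(mon1==0) - mean(mon1==1); the regress coefficient
*     on mon1 is the same difference with the opposite sign.
ttest kvote1, by(mon1)
regress kvote1 mon1

* (b) Balance check: does a covariate predict observation?
regress mon1 x1, vce(robust)

*     Several covariates at once, followed by a joint test that all
*     listed coefficients are zero.
regress mon1 x1 x2, vce(robust)
test x1 x2
```

Conditioning on a subgroup (for example, stations not observed in an earlier round) only requires adding an if qualifier to the same commands.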
Data Notes: There are 40 subregions (subregion) nested in 11 regions (regionmarzes). (Try running
tab regionmarzes, gen(_region) to get region dummies.) There are separate variables for the
Kocharian vote share in Rounds 1 and 2, as well as for monitored voting in each round. Check the
variable labels. Hyde's do-file is also posted if you want to confirm you are using the correct
variables. Don't worry about monitored counting for the problem set.
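A minimal sketch of getting started with the file, assuming it sits in Stata's current working directory. Only the file name and the variables subregion and regionmarzes come from the notes above; the keywords passed to lookfor are guesses at words that may appear in the variable names or labels.

```stata
* Load the replication data (assumes the file is in the working directory)
use "Hyde_WP_2007_Armenia2003electionresults.dta", clear

* List all variables together with their labels
describe

* Search variable names and labels; these keywords are guesses
lookfor kocharian
lookfor monitor

* Region dummies: creates _region1, _region2, ..., one per region
tabulate regionmarzes, generate(_region)
describe _region*
```

When the dummies enter a regression alongside a constant, leave one out (or let Stata drop it) to avoid perfect collinearity.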