Lecture 2: Diff-in-Diff and January 14, 2016
Causal Inference
There are 5 basic empirical methods to obtain causal inference:
1. Controls (includes matching/fixed-effects)
2. Randomized Experiments
3. Difference-in-Differences
4.
5. Regression Discontinuity
Difference-in-Differences
- Difference-in-differences (DD) extends the idea behind individual fixed-effects
- Instead of just comparing before and after, it compares before and after among a treated and a control group
- In fact, fixed-effects is a difference; DD just adds another layer
- Some people add even another layer, making it a difference-in-difference-in-differences
Example
- For a little variety, I will use different examples today from my usual education cases
- Consider the question of immigration
A View on Immigration
Ted Cruz (Nov 10, 2015 GOP debate):
"I will say, the politics of it [immigration] would be very, very different if a bunch of lawyers or bankers were crossing the Rio Grande. Or if a bunch of people with journalism degrees were coming over and driving down the wages in the press, then we would see stories about the economic calamity that is befalling our nation."
Immigration and Wages
- This is one of the major arguments against immigration: that immigrants will drive down the wages of locals
- Especially in the low-skilled sector (depending on the type of immigrant)
- We would like to look at this empirically
Framing the Research Question
Our question is: Does immigration drive down the wages (or the employment rate) of locals?
1. What is the unit of analysis?
2. What is the treatment?
3. What outcome are we interested in?
4. What are the counterfactual outcomes?
5. What is the causal link?
6. How could we mimic this? Can we do a randomized experiment?
The Mariel Boatlift
- What was the Mariel Boatlift?
DD
- The DD term refers to two levels of difference
- The first difference is (almost) always time
- The second difference can be: schools, grades, geography, individuals, etc.
First-Difference
- Let's look at the first-difference of time
- Notation:
  - $Y_{1,\text{city},t}$ is the outcome of a city at time $t$ when treated
  - $Y_{0,\text{city},t}$ is the outcome of a city at time $t$ when untreated (control)
- What outcomes do we observe?
- Write out the treatment effect and the selection bias:
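One way to work through this exercise, sketched under the assumption that the treated city is Miami and that the before and after years are 1979 and 1981 (the city and years used in the estimating equation later in the lecture). Adding and subtracting the unobserved counterfactual $Y_{0,\text{miami},1981}$ splits the observed before-after difference into two pieces:

```latex
\begin{align*}
\underbrace{Y_{1,\text{miami},1981} - Y_{0,\text{miami},1979}}_{\text{observed before-after difference}}
  &= \underbrace{Y_{1,\text{miami},1981} - Y_{0,\text{miami},1981}}_{\text{treatment effect}}
   + \underbrace{Y_{0,\text{miami},1981} - Y_{0,\text{miami},1979}}_{\text{selection bias (time trend)}}
\end{align*}
```

The middle term $Y_{0,\text{miami},1981}$ is never observed, so the first difference alone cannot separate the treatment effect from whatever would have happened to Miami over time anyway.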
Second-Difference I
- What is a reasonable second layer of difference?
- Two possibilities: David Card (1990) vs. George Borjas (2015)
Second-Difference II
- We now incorporate the second layer of difference
- What outcomes do we observe?
- What idea can we use to mimic the counterfactual?
Estimating Equations
$$Y_{1,\text{miami},1981} - Y_{0,\text{atlanta},1981} - \left[ Y_{0,\text{miami},1979} - Y_{0,\text{atlanta},1979} \right]$$
- In regression form:
- Open up STATA!
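The DD calculation above can be sketched in a few lines of Python (used here instead of Stata so the example is self-contained). The wage numbers are hypothetical and chosen only for illustration; they are not taken from Card (1990) or any other study, and the helper name `diff_in_diff` is likewise made up for this sketch.

```python
# Hypothetical average wages by (city, year). Illustrative numbers only.
wages = {
    ("miami", 1979): 10.0,
    ("miami", 1981): 10.5,
    ("atlanta", 1979): 9.0,
    ("atlanta", 1981): 9.8,
}


def diff_in_diff(means, treated, control, pre, post):
    """Change over time in the treated city minus change in the control city."""
    treated_change = means[(treated, post)] - means[(treated, pre)]
    control_change = means[(control, post)] - means[(control, pre)]
    return treated_change - control_change


dd = diff_in_diff(wages, "miami", "atlanta", 1979, 1981)

# The same quantity written in the order of the estimating equation:
# (treated - control, after) minus (treated - control, before).
dd_alt = (wages[("miami", 1981)] - wages[("atlanta", 1981)]) - (
    wages[("miami", 1979)] - wages[("atlanta", 1979)]
)

print(f"DD estimate: {dd:.3f} (alternative ordering: {dd_alt:.3f})")
```

With these made-up numbers the estimate is roughly -0.3: Miami wages rose, but by less than Atlanta wages did. In regression form, the same number is usually obtained as the coefficient on the interaction between a treated-city dummy and a post-period dummy, with both dummies also entered on their own.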
Interpretation of DD
- How do we interpret the DD estimate?
- We call this an Average Treatment Effect (ATE)
Internal Validity I
- What is our key assumption?
- Visual Representation:
Internal Validity II
- How do we check our identifying assumption?
- Possible failures?
  - Ashenfelter dip
  - Political Endogeneity (i.e. Reverse Causality)
- How can we weaken the key assumption?
http://www.motherjones.com/kevin-drum/2015/09/another-shot-fired-great-immigration-vs-wages-war
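A common way to probe the identifying assumption, assuming the assumption in question is the usual parallel-trends condition (the slides leave it as a question), is a placebo DD on pre-treatment years only. If the two cities were trending together before treatment, these placebo estimates should be close to zero. The sketch below uses hypothetical pre-period wages; the numbers and the helper name `placebo_dd` are assumptions for illustration.

```python
# Hypothetical pre-treatment average wages. Illustrative numbers only.
pre_period_wages = {
    "miami": {1977: 9.6, 1978: 9.8, 1979: 10.0},
    "atlanta": {1977: 8.6, 1978: 8.8, 1979: 9.0},
}


def placebo_dd(series_treated, series_control, year_a, year_b):
    """DD between two pre-treatment years; near zero if trends are parallel."""
    treated_change = series_treated[year_b] - series_treated[year_a]
    control_change = series_control[year_b] - series_control[year_a]
    return treated_change - control_change


years = sorted(pre_period_wages["miami"])
placebos = {
    (a, b): placebo_dd(
        pre_period_wages["miami"], pre_period_wages["atlanta"], a, b
    )
    for a, b in zip(years, years[1:])
}

for (a, b), value in placebos.items():
    print(f"placebo DD {a}->{b}: {value:+.3f}")
```

A placebo estimate far from zero would suggest the cities were already diverging before treatment (for example because of an Ashenfelter dip), which undermines the comparison. Passing this check is supporting evidence only; it cannot prove that trends would have stayed parallel after treatment.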
External Validity
- Generalizability?
- General Equilibrium Effects?
Pros and Cons of DD
- Pros:
- Cons:
Basic Intuition
- Basic idea: In a randomized experiment, we randomize people to treatment and control
- What happens if some external thing (geography, mass layoff, storms, etc.) strikes some people and not others (at random!)?
- Well, the external thing has randomized for us!
- We call this an Instrumental Variable (IV)
- Randomized experiments can thus be treated as IVs
Example I
- Does earlier colonization improve economic outcomes?
- Problem: European empires likely colonized the best places (most fertile, etc.) first
- Solution: Ships had to sail with the wind
Example II
- Thought Experiment: Compare two islands: Guam and Fefan (in Micronesia)
- Guam was directly on the East-West route across the Pacific (used by Magellan)
- Fefan was not on this route (was on the much more difficult West-East route)
https://www.google.ca/maps/dir/guam/fefan,+federated+States+of+Micronesia/@10.3941378,130.3823886,4z/data=!4m13!4m12!1m5!1m1!1s0x671f76ff930f24ef:0x5571ae91c5b3e5a6!2m2!1d144.793731!2d13.444304!1m5!1m1!1s0x6667a4d6a8ce100d:0xc47882f565ab012a!2m2!1d151.8379961!2d7.3487617
Example III
- Therefore, Guam was colonized before Fefan
- Wind randomized the colonization for us!
IV Basics I
- The (basic) math:
- What outcomes do we observe?
- What is our treatment and selection effect?
IV Basics II
- The (more complex) math: In IVs, randomization is not perfect
- Fefan could have been discovered by luck earlier
- For example, Pitcairn was discovered earlier even though it was not on the main wind route
  - This was because of the mutiny on the ship HMS Bounty
- We account for this by dividing by the probability that randomization affected your treatment status:
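In its standard Wald form, this division can be sketched as follows. The symbols are assumptions made for illustration: $Z = 1$ marks islands on a favorable wind route, $D = 1$ marks islands that were colonized early, and $Y$ is the economic outcome.

```latex
\[
\text{IV estimate}
  = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}
         {E[D \mid Z = 1] - E[D \mid Z = 0]}
\]
```

The numerator is the difference in outcomes between islands on and off the favorable route. The denominator is the share of islands whose colonization status was actually moved by the wind, so the ratio rescales the raw difference to account for imperfect randomization.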
IV Notation
- We call the instrument (here wind patterns) $Z$
- We call the endogenous regressor (here colonization date) $X$
IV Assumptions
For an IV to be valid it must satisfy both:
- Relevance: $\mathrm{Corr}(X, Z) \neq 0$
  - All this says is that there was randomization (i.e. islands on favorable wind routes were colonized first)
  - We can check this through the equation: $\text{ColonizationDate}_{\text{island},t} = \alpha + \beta \, \text{FavorableWind}_{\text{island},t} + \varepsilon_{it}$
- Exclusion: $\mathrm{Corr}(Z, \varepsilon) = 0$
  - This says that randomization was proper (i.e. islands on favorable wind routes had no other advantage, such as more fertile soil)
  - This assumption cannot be tested, as $\varepsilon$ is unobserved
- Do these assumptions seem likely to hold in this case?
IV as a Regression
- We have two terms to estimate: $E[D_1 - D_0]$ (first-stage) and $E[Y_1 - Y_0 \mid Z = 1]$ (reduced-form)
- First-stage is the effect of $Z$ on $X$ (i.e. how much more likely was it for islands on favorable wind routes to be colonized first?)
- Reduced-form is the effect of $Z$ on $Y$ (i.e. how much better off are islands on favorable wind routes?)
- First-stage: $\text{ColonizationDate}_{\text{island},t} = \alpha + \beta_1 \, \text{FavorableWind}_{\text{island},t} + \varepsilon_{it}$
  - In general: $X_i = \alpha + \beta_1 Z_i + \varepsilon_i$
- Reduced-form: $\text{EconomicOutcomes}_{\text{island},t} = \alpha + \beta_2 \, \text{FavorableWind}_{\text{island},t} + \varepsilon_{it}$
  - In general: $Y_i = \alpha + \beta_2 Z_i + \varepsilon_i$
- The IV estimate is then: $\dfrac{\beta_2}{\beta_1}$
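The ratio of the reduced-form to the first-stage coefficient can be sketched in plain Python (used here instead of Stata so the example is self-contained). The island data are hypothetical and invented for illustration, including the choice to measure colonization as "centuries since colonization"; the helper `ols_slope` is also just a name for this sketch.

```python
def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    return covariance / variance


# Hypothetical islands. Illustrative numbers only.
favorable_wind = [1, 1, 1, 1, 0, 0, 0, 0]        # instrument Z
centuries_colonized = [4, 5, 3, 4, 2, 1, 2, 3]   # endogenous regressor X
economic_outcome = [9, 11, 8, 8, 5, 4, 6, 5]     # outcome Y

beta_1 = ols_slope(favorable_wind, centuries_colonized)  # first-stage
beta_2 = ols_slope(favorable_wind, economic_outcome)     # reduced-form
iv_estimate = beta_2 / beta_1

print(f"first-stage: {beta_1}, reduced-form: {beta_2}, IV: {iv_estimate}")
```

Because the instrument is binary, each slope is just a difference in group means, so the ratio is the Wald estimator. A first-stage coefficient close to zero would make the ratio unstable, which is one practical reason to check relevance before trusting the IV estimate.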
For the Keeners
- In the last slide, we got the Wald estimator
- However, when implementing IVs we use the IV estimator
- Intuition is identical; the IV estimator is just more efficient
- The only difference is that in the reduced-form regression we plug in a predicted $X$ rather than $Z$
- First-stage: $X_i = \alpha + \beta_1 Z_i + \varepsilon_i$, which gives the predicted value $\hat{X}_i$
- Reduced-form (with $\hat{X}_i$ in place of $Z_i$): $Y_i = \alpha + \beta_2 \hat{X}_i + \varepsilon_i$
- In STATA you do all this by simply typing:
    ivregress 2sls y (x = z), vce(robust) first
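The two-step logic can be sketched by hand in plain Python to show that, with one binary instrument, plugging in the predicted $X$ gives the same point estimate as the Wald ratio. The data are hypothetical and the helper `ols` is a made-up name for this sketch; this is not a substitute for `ivregress`.

```python
def ols(x, y):
    """Return (intercept, slope) from regressing y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    return mean_y - slope * mean_x, slope


# Hypothetical data. Illustrative numbers only.
z = [1, 1, 1, 1, 0, 0, 0, 0]      # instrument
x = [4, 5, 3, 4, 2, 1, 2, 3]      # endogenous regressor
y = [9, 11, 8, 8, 5, 4, 6, 5]     # outcome

# Step 1: first-stage regression of X on Z, then form predicted X.
alpha_1, beta_1 = ols(z, x)
x_hat = [alpha_1 + beta_1 * zi for zi in z]

# Step 2: regress Y on predicted X instead of on Z.
_, beta_2sls = ols(x_hat, y)

# Wald ratio for comparison: reduced-form slope over first-stage slope.
_, reduced_form = ols(z, y)
wald = reduced_form / beta_1

print(f"2SLS by hand: {beta_2sls}, Wald: {wald}")
```

The point estimates agree, but the standard errors from a hand-run second step would be wrong because they ignore that $\hat{X}$ is itself estimated. That correction is one thing a dedicated command such as `ivregress 2sls` takes care of.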
Another IV
- Our question is: Does having another child affect a mother's labour supply?
- OLS: $\text{HasWorked}_i = \alpha + \beta \, \text{\#kids}_i + \varepsilon_i$
- Possible failures of OLS in this instance?
- Idea: Use the fact that parents like to have a boy AND a girl in their family (Angrist and Evans, 1998)
Framing the Research Question
Our question is: Does having another child affect the mother's labour supply?
1. What is the unit of analysis?
2. What is the treatment?
3. What outcome are we interested in?
4. What are the counterfactual outcomes?
5. What is the causal link?
To STATA
- How do we mimic this? (think of the IV as having your first 2 kids be the same gender)
- Open up STATA
Interpretation of IVs
- How do we interpret the IV estimate? (i.e. who complies with treatment?)
- We call this a Local Average Treatment Effect (LATE)
- This is (in my view) the biggest weakness of IVs
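A small sketch of why the IV estimate is "local": it recovers the effect for compliers, meaning the units whose treatment status is changed by the instrument (in the same-sex example, families who have another child only because their first two share a gender). The population below is entirely hypothetical, with made-up group sizes and effects, and assumes there are no defiers.

```python
from statistics import mean

# Hypothetical population. Group sizes and effects are illustrative only.
TYPES = {
    "complier": {"count": 4, "y0": 10.0, "effect": 2.0},
    "always_taker": {"count": 2, "y0": 10.0, "effect": 5.0},
    "never_taker": {"count": 2, "y0": 10.0, "effect": 1.0},
}


def takes_treatment(unit_type, z):
    """Treatment status D as a function of type and instrument value."""
    if unit_type == "always_taker":
        return 1
    if unit_type == "never_taker":
        return 0
    return z  # compliers follow the instrument


rows = []  # each row is (z, d, y)
for unit_type, spec in TYPES.items():
    for i in range(spec["count"]):
        z = i % 2  # alternate so the instrument is balanced within each type
        d = takes_treatment(unit_type, z)
        y = spec["y0"] + spec["effect"] * d
        rows.append((z, d, y))


def group_mean(data, z_value, column):
    """Mean of one column among rows with the given instrument value."""
    return mean(row[column] for row in data if row[0] == z_value)


reduced_form = group_mean(rows, 1, 2) - group_mean(rows, 0, 2)
first_stage = group_mean(rows, 1, 1) - group_mean(rows, 0, 1)
late = reduced_form / first_stage

total = sum(spec["count"] for spec in TYPES.values())
ate = sum(spec["count"] * spec["effect"] for spec in TYPES.values()) / total

print(f"IV (LATE): {late}, complier effect: {TYPES['complier']['effect']}")
print(f"population average effect: {ate}")
```

In this made-up population the IV estimate matches the complier effect and differs from the population average, because always-takers and never-takers do not respond to the instrument and so contribute nothing to either the first-stage or the reduced-form difference.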
Internal Validity
- Relevance?
- Exclusion?
External Validity
- Generalizability?
- Mechanisms?
Pros and Cons of IV
- Pros:
- Cons: