On-Line Appendix for Consistency without Inference: Instrumental Variables in Practical Application

Alwyn Young
London School of Economics
This draft: September 2017

Table A1: Dimensionality and Size Distortions: ln (Size in Test of All Instruments/Average Coefficient Size) Regressed on ln # Instruments

                             first stage regressions        Anderson-Rubin reduced forms
                  clustered/robust           default  clustered/robust           default
                      .01      .05      .01      .05      .01      .05      .01      .05
[no paper fe]        .586     .447     .458     .377     .661     .490     .376     .282
                   (.013)   (.007)   (.009)   (.027)   (.001)   (.001)   (.030)   (.035)
R2                   .909     .913     .749     .719     .870     .873     .660     .604
[paper fe]           .494     .400     .261     .207     .831     .597     .282     .167
                   (.016)   (.003)   (.042)   (.040)   (.010)   (.011)   (.284)   (.317)
R2                   .962     .970     .797     .768     .952     .949     .728     .693
N: 1358 1342 1396 1397 1375 1390

excluding observations with one instrument
[no fe]              .664     .532     .438     .380     .767     .586     .377     .298
                   (.009)   (.003)   (.032)   (.037)   (.005)   (.001)   (.020)   (.051)
R2                   .805     .839     .448     .435     .739     .762     .360     .330
[paper fe]           .749     .575    -.088    -.064     .748     .527    -.053    -.094
                   (.173)   (.198)   (.760)   (.774)   (.088)   (.040)   (.772)   (.560)
R2                   .924     .945     .583     .555     .927     .929     .565     .549
N                     272      272      272      272      310      310      307      308

Notes: Dependent variable is ln size of the joint test of significance of the instruments (in a first stage F-test or an Anderson-Rubin reduced form regression) divided by the average size of the coefficients when tested in individual tests, both measured at nominal size .01 or .05. Clustered/robust and default, at top, refer to the covariance estimate used in the dependent variable, not the method used to evaluate results in this paper. Independent variable is ln number of excluded instruments tested in the joint test. Values reported in parentheses are bootstrapped p-values based upon paper clustered t-statistics, not standard errors. fe denotes fixed effects. Number of observations (N) is the same with and without paper fixed effects. Maximum possible number of observations is for first stage F and 1397 for Anderson-Rubin. A few observations are dropped because estimated size is zero.

I. Size Distortions and the Dimensionality of Tests

As mentioned in the paper, I find that size is increasing in the dimensionality of tests, imparting greater bias to joint tests. In Table A1 I take ln size in a joint test divided by the average size of its individual coefficient components, to control for equation specific coverage error, and regress it on the ln number of terms in the joint test. I report results for the first-stage F-test for equations with only one endogenous variable [1] examined in Table VII of the paper, as well as the Anderson-Rubin tests of Table XII. In both cases the number of terms in the joint test equals the number of excluded instruments in the 2SLS regression. Because about ¾ of the sample has only one instrument, so that the dependent variable (ln size in the joint test divided by size in the individual test) is identically equal to zero, I also run regressions with this group excluded as a sensitivity test. Reported p-values (in parentheses) are based upon a bootstrap with paper clustered t-statistics.

As shown in the table, without paper fixed effects almost all specifications find that empirical size is positively and significantly associated with the number of terms in the test. Point estimates are much smaller when the dependent variable is based upon the default covariance estimate, however, and are rendered even smaller and statistically insignificant with the addition of paper fixed effects, whereas point estimates when the dependent variable is based upon the clustered/robust covariance estimate are larger and more robust to changing specifications. [2] Consequently, this appears to be more a property of tests based upon clustered/robust covariance estimates than of those based upon the default covariance estimate.

II. Mean Squared Error, Size and Bias Regressed on F-Statistics

As referenced in Section V of the paper, Table A2 below regresses ln relative 2SLS to OLS mean squared error and bias, and ln 2SLS size, on the default, clustered/robust and bootstrap-t equivalent ln F statistics using the data for all regressions, not merely those for which Stock and Yogo provide critical values. Values reported in parentheses are bootstrap-t p-values based upon the distribution of the t-statistic with standard errors clustered at the paper level. I report separately regressions with the full sample and with the weakest F statistics (less than 1) removed.
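As a minimal sketch of a regression of this form, and not the code used to produce Table A2, the following Python computes the slope of a ln outcome on ln F with paper fixed effects by demeaning both variables within paper. The function name within_slope, the tuple layout and all numbers are hypothetical, chosen only for illustration, and the paper-clustered bootstrap-t p-values reported in parentheses in the table are not reproduced here.

```python
from collections import defaultdict
from math import log


def within_slope(rows):
    """OLS slope of y on x after removing paper-level means.

    rows: iterable of (paper_id, x, y) tuples, e.g. x = ln F and
    y = ln relative mse. With a single regressor this equals the
    coefficient from OLS with a dummy variable for every paper.
    """
    groups = defaultdict(list)
    for paper, x, y in rows:
        groups[paper].append((x, y))

    sxy = 0.0  # sum of within-paper cross products
    sxx = 0.0  # sum of within-paper squared deviations of x
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0.0:
        raise ValueError("no within-paper variation in x")
    return sxy / sxx


# Hypothetical data: (paper, first-stage F, relative 2SLS/OLS mse).
data = [
    ("A", 0.5, 1.2), ("A", 2.0, 0.9), ("A", 20.0, 0.4),
    ("B", 5.0, 1.5), ("B", 50.0, 0.7),
]
sample = [(paper, log(f), log(mse)) for paper, f, mse in data]

slope_all = within_slope(sample)        # full sample
strong = [r for r in sample if r[1] > 0]  # drop F statistics below 1
slope_strong = within_slope(strong)     # observations with ln F > 0
```

Filtering on ln F > 0 before calling the function mirrors the two samples described above (all observations, and those with F statistics less than 1 removed). Papers that contribute a single observation add nothing to either sum, which is why within-paper variation is what identifies the slope.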
All regressions include paper fixed effects. As shown, the default F does very poorly, as it is only found to be significantly correlated with ln relative mse, and only in the full sample. The clustered/robust F is significantly correlated with both ln bias and ln mse, but again only in the full sample. The bootstrap F is significantly correlated with ln bias, mse and size in the full sample, but only ln bias once Fs less than 1 are removed. Excluding Fs less than 1, the point estimate of the relationship between size and the ln F statistic, of any sort, is positive, as mentioned in the paper. Putting aside statistical significance, point estimates for bias and mse are generally negative, with and without Fs less than 1, with the slope of the association becoming systematically steeper and statistically more significant as one moves from the default to the clustered/robust to the bootstrap-t Fs. This would be consistent with these representing improving measures of the strength of the first-stage relation.

[1] In the case of equations with more than one endogenous variable, the test of Table VII involves an estimate of the joint covariance across multiple first-stage equations. Since this is somewhat different from the usual joint test within an equation, I exclude those few observations from the analysis.

[2] In regressions with fixed effects limited to regressions with more than one instrument, there are only 7 papers in the sample which have any variation in the number of instruments, so it is not surprising that the clustered/robust coefficients, while remaining large, become insignificant when evaluated using a paper-clustered bootstrap.

Table A2: Relative 2SLS to OLS Bias and Mean Squared Error and 2SLS Size Regressed on First Stage Fs with Paper Fixed Effects

                            default F         clustered/robust F              bootstrap-t F
               bias     size      mse     bias     size      mse     bias     size      mse
all observations
ln F          -.240    -.067    -.440    -.405    -.076    -.619    -.566    -.165    -.799
             (.093)   (.309)   (.034)   (.034)   (.292)   (.016)   (.005)   (.041)   (.022)
N                       1358                       1358                       1358
R2             .367     .463     .602     .391     .464     .615     .402     .481     .619
observations with ln F > 0
ln F          -.214     .036    -.534    -.473     .038    -.808    -.891     .002    -1.94
             (.204)   (.238)   (.143)   (.268)   (.458)   (.111)   (.000)   (.971)   (.096)
N              1342     1341     1342     1336     1335     1336     1089     1088     1089
R2             .362     .477     .599     .388     .476     .614     .419     .476     .684

Notes: Dependent variables are ln absolute value of bias_2sls/bias_ols and ln mse_2sls/mse_ols (all around the 2SLS population moment), and ln size_2sls at the .05 level. All regressions include paper fixed effects. Values reported in parentheses are bootstrapped p-values based upon paper clustered t-statistics, not standard errors. One observation is dropped from the ln size sample because its estimated size is 0.

III. Tables for Section VI using Default Covariance-based P-values

Tables A3 and A4 below duplicate Tables XII and XIII in the paper, except that coefficient p-values are calculated using the default covariance estimate. The results are as described in the paper.

Table A3: Size Distortions with Anderson-Rubin Weak Instrument Robust Inference

                        exactly identified equations              overidentified equations
                               A-Rubin          2SLS                 A-Rubin          2SLS
                       N    .01    .05    .01    .05         N    .01    .05    .01    .05
all                 1100   .111   .191   .104   .171       297   .115   .197   .092   .178
F_d < 1               15   .032   .090   .289   .351         2   .105   .189   .239   .347
1 < F_d < 10         133   .046   .110   .033   .071       136   .069   .128   .071   .156
F_d > 10             952   .122   .203   .111   .182       159   .154   .257   .108
F_cl/r < 1            19   .035   .098   .245   .308         4   .076   .145   .139   .229
1 < F_cl/r < 10      209   .077   .146   .063   .110       138   .072   .133   .072   .157
F_cl/r > 10          872   .121   .203   .111   .183       155   .154   .256   .109
F_b < 1              255   .227   .330   .205   .294        15   .144   .230   .127   .235
1 < F_b < 10         379   .048   .110   .043   .084       250   .117   .200   .087   .169
F_b > 10             466   .100   .180   .098   .175        32   .082   .159   .117   .221

Notes: N = number of equations in each group; otherwise numbers reported are average size using default covariance estimates at the .01 or .05 levels. F_d, F_cl/r & F_b = default, clustered/robust and bootstrap-t equivalent 1st stage F statistics.

Table A4: Size Distortions with LIML and Fuller-k Inference

                            overidentified equations                         all equations
                                  LIML          2SLS                Fuller-k          2SLS
                       N    .01    .05    .01    .05         N    .01    .05    .01    .05
all                  385   .158   .240   .079   .161      1524   .103   .173   .095   .165
F_d < 1                2   .158   .253   .239   .347        17   .034   .076   .283   .350
1 < F_d < 10         136   .207   .298   .071   .156       269   .101   .169   .052   .114
F_d > 10             247   .131   .208   .083   .163      1238   .104   .176   .102   .173
F_cl/r < 1             4   .373   .436   .139   .229        23   .042   .091   .227   .294
1 < F_cl/r < 10      138          .288   .072   .157       347   .104   .170   .067   .129
F_cl/r > 10          243   .133   .209   .083   .162      1154   .104   .176   .101   .173
F_b < 1               15   .443   .505   .127   .235       270          .284   .201   .291
1 < F_b < 10         250   .144   .233   .087   .169       629   .078   .138   .061   .118
F_b > 10             120   .152   .219   .058   .135       625   .088   .161   .084   .157

Notes: As in Table A3.

IV. Papers in the Instrumental Variables Sample

Acconcia, Antonio, Giancarlo Corsetti, and Saverio Simonelli. 2014. Mafia and Public Spending: Evidence on the Fiscal Multiplier from a Quasi-Experiment. American Economic Review, 104 (7): 2185-2209.
Acemoglu, Daron, Simon Johnson, James A. Robinson, and Pierre Yared. 2008. Income and Democracy. American Economic Review, 98 (3): 808-842.
Albouy, David Y. 2012. The Colonial Origins of Comparative Development: An Empirical Investigation: Comment. American Economic Review, 102 (6): 3059-3076.
Alesina, Alberto, and Ekaterina Zhuravskaya. 2011. Segregation and the Quality of Government in a Cross Section of Countries. American Economic Review, 101 (5): 1872-1911.
Ananat, Elizabeth Oltmans. 2011. The Wrong Side(s) of the Tracks: The Causal Effects of Racial Segregation on Urban Poverty and Inequality. American Economic Journal: Applied Economics, 3 (2): 34-66.
Autor, David H., David Dorn, and Gordon H. Hanson. 2013. The China Syndrome: Local Labor Market Effects of Import Competition in the United States. American Economic Review, 103 (6): 2121-2168.
Bazzi, Samuel, and Michael A. Clemens. 2013. Blunt Instruments: Avoiding Common Pitfalls in Identifying the Causes of Economic Growth. American Economic Journal: Macroeconomics, 5 (2): 152-186.
Becker, Sascha O., Erik Hornung, and Ludger Woessmann. 2011. Education and Catch-up in the Industrial Revolution. American Economic Journal: Macroeconomics, 3 (3): 92-126.
Bedard, Kelly, and Olivier Deschênes. 2006. The Long-Term Impact of Military Service on Health: Evidence from World War II and Korean War Veterans. American Economic Review, 96 (1): 176-194.
Bleakley, Hoyt, and Aimee Chin. 2010. Age at Arrival, English Proficiency, and Social Assimilation Among US Immigrants. American Economic Journal: Applied Economics, 2 (1): 165-192.
Brown, Kristine M., and Ron A. Laschever. 2012. When They're Sixty-Four: Peer Effects and the Timing of Retirement. American Economic Journal: Applied Economics, 4 (3): 90-115.
Burke, Paul J., and Andrew Leigh. 2010. Do Output Contractions Trigger Democratic Change? American Economic Journal: Macroeconomics, 2 (4): 124-157.
Chalfin, Aaron. 2015. The Long-Run Effect of Mexican Immigration on Crime in US Cities: Evidence from Variation in Mexican Fertility Rates. American Economic Review: Papers & Proceedings, 105 (5): 220-225.
Chodorow-Reich, Gabriel, Laura Feiveson, Zachary Liscow, and William Gui Woolston. 2012. Does State Fiscal Relief During Recessions Increase Employment? Evidence from the American Recovery and Reinvestment Act. American Economic Journal: Economic Policy, 4 (3): 118-145.

Chou, Shin-Yi, Jin-Tan Liu, Michael Grossman, and Ted Joyce. 2010. Parental Education and Child Health: Evidence from a Natural Experiment in Taiwan. American Economic Journal: Applied Economics, 2 (1): 33-61.
Collins, William J., and Katharine L. Shester. 2013. Slum Clearance and Urban Renewal in the United States. American Economic Journal: Applied Economics, 5 (1): 239-273.
Decarolis, Francesco. 2015. Medicare Part D: Are Insurers Gaming the Low Income Subsidy Design? American Economic Review, 105 (4): 1547-1580.
Dinkelman, Taryn. 2011. The Effects of Rural Electrification on Employment: New Evidence from South Africa. American Economic Review, 101 (7): 3078-3108.
Draca, Mirko, Stephen Machin, and Robert Witt. 2011. Panic on the Streets of London: Police, Crime, and the July 2005 Terror Attacks. American Economic Review, 101 (5): 2157-2181.
Guryan, Jonathan, and Melissa S. Kearney. 2010. Is Lottery Gambling Addictive? American Economic Journal: Economic Policy, 2 (3): 90-110.
Hornung, Erik. 2014. Immigration and the Diffusion of Technology: The Huguenot Diaspora in Prussia. American Economic Review, 104 (1): 84-122.
Hunt, Jennifer, and Marjolaine Gauthier-Loiselle. 2010. How Much Does Immigration Boost Innovation? American Economic Journal: Macroeconomics, 2 (2): 31-56.
James, Alexander. 2015. US State Fiscal Policy and Natural Resources. American Economic Journal: Economic Policy, 7 (3): 238-257.
Kraay, Aart. 2014. Government Spending Multipliers in Developing Countries: Evidence from Lending by Official Creditors. American Economic Journal: Macroeconomics, 6 (4): 170-208.
Lipscomb, Molly, A. Mushfiq Mobarak, and Tania Barham. 2013. Development Effects of Electrification: Evidence from the Topographic Placement of Hydropower Plants in Brazil. American Economic Journal: Applied Economics, 5 (2): 200-231.
Miguel, Edward, and Shanker Satyanath. 2011. Re-examining Economic Shocks and Civil Conflict. American Economic Journal: Applied Economics, 3 (4): 228-232.
Moser, Petra, Alessandra Voena, and Fabian Waldinger. 2014. German Jewish Émigrés and US Invention. American Economic Review, 104 (10): 3222-3255.
Oreopoulos, Philip. 2006. Estimating Average and Local Average Treatment Effects of Education When Compulsory Schooling Laws Really Matter. American Economic Review, 96 (1): 152-175.
Saiz, Albert, and Susan Wachter. 2011. Immigration and the Neighborhood. American Economic Journal: Economic Policy, 3 (2): 169-188.
Stephens, Melvin Jr., and Dou-Yan Yang. 2014. Compulsory Education and the Benefits of Schooling. American Economic Review, 104 (6): 1777-1792.
Thornton, Rebecca L. 2008. The Demand for, and Impact of, Learning HIV Status. American Economic Review, 98 (5): 1829-1863.
Young, Alwyn. 2014. Structural Transformation, the Mismeasurement of Productivity Growth, and the Cost Disease of Services. American Economic Review, 104 (11): 3635-3667.