Unreported Trade Flows and Gravity Equation Estimation


Thomas Baranga*
May 15, 2009

Abstract

Some widely used trade databases do not distinguish between zero and unreported trade flows. Unreported trade flows are numerous but account for a small volume of world trade, so the distinction may be unimportant for traditional gravity equation estimation. However, techniques that separately estimate the intensive and extensive margins of trade may be more sensitive to it. This paper develops a methodology to consistently estimate the Helpman, Melitz and Rubinstein model when some trade is unreported. Doing so also breaks the relationship between the sample-selection and heterogeneity correction terms, reducing collinearity among the regressors. A natural exclusion restriction identifies the model, removing the need to distinguish fixed from variable costs of trade.

* My thanks for many invaluable conversations and support to Elhanan Helpman, Emilie Feldman, Ian Martin and Yona Rubinstein. Any mistakes remain my own.

1 Introduction

The new literature on firm-level heterogeneity has revived interest in distinguishing the intensive and extensive margins of trade. Helpman, Melitz and Rubinstein (2008), hereafter HMR, have demonstrated how, in the Melitz model in which firms vary in their productivity, traditional estimates of the gravity equation confound the effects of the two margins. HMR develop a methodology with which to consistently estimate both margins, separating the effect of trade barriers on firms' decisions to enter export markets from their influence on the quantities that firms export.

HMR's methodology exploits the presence of country pairs that do not trade at all to estimate the role of the fixed costs that prevent entry into export markets, using a probit to estimate the determinants of whether there is any trade in the aggregate. With consistent estimates of the role of fixed costs, one can also estimate the influence of trade barriers on the intensive margin, taking into account that exporting firms may have very different productivity levels.

The first stage of HMR's procedure relies on the presence of zero trade flows at the aggregate level to estimate the role of fixed costs in firms' entry decisions. However, the reliability of these data is questionable. The quality of trade reporting varies greatly over time and across countries. If all trading partners accurately reported their trade flows, we would have two corroborating reports for each flow, one from the exporter and one from the importer. It is well known, however, that the levels of trade reported by the two partners vary widely.¹ Less well recognised is that countries frequently fail to report their trade at all. In 1986, the baseline year of HMR's study, only 112 countries reported any of their trade to the UN's Comtrade database, which forms the basis of Feenstra et al.'s dataset, World Trade Flows, used by HMR. Of the 158 countries in HMR's original sample, only 103 reported their trade. Furthermore, reporting is not necessarily complete even among the countries that report some of their trade.
Of the 8927 positive trade flows between partners that both made some report of their trade to the UN, 2155 (24%) were reported by only one partner. Feenstra's dataset does not distinguish between flows that are zero and flows that are unreported, and for traditional estimates of the gravity equation this distinction was probably unimportant. However, if one does not take it into account when estimating HMR's model, a large number of flows are automatically classified as zero, with implications for the estimated fixed costs of trade. Since 55 countries in HMR's sample did not report their trade at all, 2970 observations for which there was definitely no report may be misclassified. Given that even countries that report some of their trade do not usually report all of it, the status of flows between country pairs in which only one side reports is also somewhat unreliable. The reliability of reporting depends partly on characteristics of the country and partly on the size of the trade flow: small flows may be more likely to go unreported than large ones. Since the reporting decision is correlated with the underlying trading relationship, failing to account for the sample selection driven by non-reporting may bias the estimates.

¹ For example, Feenstra and co-authors assume that reporting of imports is more accurate than reporting of exports, and reconcile the two figures by adopting the importer's report when it exists and the exporter's report when there is no report by the importer.

A second reason to take non-reporting into account is that doing so weakens the collinearity of the regressors in HMR's model. In HMR's original framework, the correction for sample selection is estimated from the same probit as the correction for omitted productivity heterogeneity. While this simplifies estimation, it generates collinearity in the model, as discussed further below. Controlling for the additional sample selection due to non-reporting breaks this connection.

HMR's original framework used factors that affect fixed but not variable costs of trade to separately identify the effects of productivity heterogeneity and selection. However, finding such variables can be challenging.
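The counts quoted above follow from simple combinatorics on directed country pairs. The quick check below assumes, as the paper's later figure of 24806 possible flows for 158 countries implies, that a trade flow is an ordered (importer, exporter) pair with importer ≠ exporter; the constant names are illustrative.

```python
# Back-of-the-envelope checks of the counts quoted in the introduction.
N_SAMPLE = 158                # countries in HMR's sample
N_REPORTERS = 103             # of those, countries reporting any trade to Comtrade in 1986
POSITIVE_BOTH_REPORT = 8927   # positive flows between two partners that both report something
ONE_SIDED = 2155              # ... of which only one partner reported the flow


def ordered_pairs(n: int) -> int:
    """Number of directed (importer, exporter) pairs among n countries, i != j."""
    return n * (n - 1)


non_reporters = N_SAMPLE - N_REPORTERS               # countries with no report at all
possible_flows = ordered_pairs(N_SAMPLE)             # all directed flows per year
certainly_unreported = ordered_pairs(non_reporters)  # flows where neither side reports
one_sided_share = ONE_SIDED / POSITIVE_BOTH_REPORT   # share reported by one partner only

print(non_reporters, possible_flows, certainly_unreported, round(one_sided_share, 3))
# 55 24806 2970 0.241
```

The 2970 figure counts only pairs in which neither country reports anything, which is why the text treats it as a lower bound on misclassification.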
The introduction of an additional source of sample selection allows the intensive and extensive margins to be identified without finding a factor that affects fixed but not variable trade costs. Some countries do not report any of their trade in a particular year, and for a pair of such countries we know for certain that a trade flow will not be observed. At the same time, a country's decision not to participate in the Comtrade database is uncorrelated with its bilateral trade, and so is excludable from the other two equations.

The following sections of the paper document the quality of reporting and develop a methodology to consistently control for the sample selection induced by some countries' failure to report their trade. The final section compares estimates derived from HMR's original technique with those from the modified approach.

2 Reporting of Trade Flows

There are three major databases of global trade flows. The UN and the IMF both collect trade data from their members, in the Comtrade and Direction of Trade Statistics (DOTS) databases respectively. In addition, Feenstra and co-authors have assembled and maintained a large database, World Trade Flows, which is derived from both the UN and IMF databases and supplemented by data from some national trade records. The Feenstra database makes a number of corrections to the original UN and IMF data, reconciling importers' and exporters' differing reports into a single number and correcting entrepôt trade flows. It also establishes concordances between SITC1, SITC2 and SIC codes, allowing disaggregated trade flows to be matched over time and between trade and industrial production.

As part of their procedure for adjusting commodity-level trade flows, Feenstra et al. benchmark the aggregate trade flow to the level reported in the IMF's DOTS data.² The number of countries reporting their trade to the IMF is consistently lower than the number reporting to the UN, and this procedure appears to lead Feenstra et al. to omit a large number of small trade flows.

² "The decision was to benchmark each country's total exports to the world to the world total of imports from that country reported in the International Monetary Fund volumes on The Direction of Trade... Data by partner country and by commodity were then adjusted in various ways so as to be compatible with these control totals" (Feenstra, Lipsey and Bowen [6], pp. 3-4; Feenstra [7], p. 3).

Figure 1: Taken from the DOTS database's documentation

Table 1. DATA COVERAGE, 2001-2005

| Flow | Year | Complete year: number of countries | Complete year: % of world trade | Part of the year: number of countries | Part of the year: % of world trade | Not reported: number of countries | Not reported: % of world trade |
|---|---|---|---|---|---|---|---|
| Exports | 2005 | 96 (72) | 92 | 0 | 0.00 | 86 | 8 |
| Exports | 2004 | 105 (81) | 93 | 1 | 0.01 | 76 | 7 |
| Exports | 2003 | 116 (92) | 95 | 0 | 0.00 | 66 | 5 |
| Exports | 2002 | 116 (92) | 96 | 0 | 0.00 | 66 | 4 |
| Exports | 2001 | 120 (97) | 95 | 0 | 0.00 | 62 | 5 |
| Imports | 2005 | 99 (75) | 95 | 0 | 0.00 | 83 | 5 |
| Imports | 2004 | 107 (83) | 95 | 1 | 0.01 | 74 | 5 |
| Imports | 2003 | 117 (93) | 96 | 1 | 0.04 | 64 | 4 |
| Imports | 2002 | 119 (95) | 97 | 0 | 0.00 | 63 | 3 |
| Imports | 2001 | 123 (100) | 97 | 0 | 0.00 | 59 | 3 |

Note: the figures in parentheses indicate the number of developing countries that reported complete data for the respective year.

Complete documentation of reporting to the UN is available online from the UN's Comtrade database. Less documentation is available for the IMF's DOTS, but Figure 1, taken from the DOTS supporting documentation, summarises the extent of reporting for 2001-2005. The striking feature of Figure 1 is that almost as many countries did not report their trade to the IMF as did. Although I have not been able to find data on the extent of reporting to the IMF for HMR's sample period, it is clear that significantly more countries reported their trade to the UN than to the IMF. Reporting to the UN, taken from the Comtrade database, is presented in Figure 2 and Table 1. Reporting to both institutions follows the same trend over 2001-2005 (the years for which the IMF figures are available), but the number reporting to the IMF is significantly lower.

Figure 3 illustrates the extent of the missing-data problem in the sample, comparing the trade flows in Feenstra's data for the 158 countries in HMR's sample with the data taken directly from the UN's Comtrade database.³

[Figure 2: Extent of Reporting to the UN and IMF. Number of countries reporting trade flows to the UN and to the IMF, by year, 1960-2010.]

Table 1: Comparison of Reporting of Trade Flows to the UN and IMF (number of reporting countries)

| Year | UN | IMF |
|---|---|---|
| 2001 | 166 | 123 |
| 2002 | 164 | 119 |
| 2003 | 162 | 117 |
| 2004 | 159 | 107 |
| 2005 | 153 | 99 |

[Figure 3: Positive and Missing Trade Flows in Feenstra's Data. Number of bilateral flows by year, 1970-2000. Series: unreported to UN; unreported to UN, positive in Feenstra; positive in UN, 0 in Feenstra; reported 0 to UN, positive in Feenstra; total missing in Feenstra; total missing or 0 in UN, positive in Feenstra.]

The blue line shows how many of the bilateral trade flows in the UN data are definitely missing because neither partner reported its trade to the UN in that year.⁴ This number moves inversely with the number of reporters shown in Figure 2.⁵ The missing-data problem peaks in the subset of Feenstra's data used by HMR. The green line shows the number of zero trade flows in Feenstra's data that the UN records as positive. These flows are almost certainly missing from Feenstra's data because they were not reported to the IMF and so were omitted under Feenstra's benchmarking procedure.⁶ The black line is the sum of the blue and green lines, and shows the number of observations treated as zero by HMR that are actually either positive or should be treated as missing. With 158 countries in the sample, there are 24806 possible trade flows per year. A conservative estimate of the average number misclassified over HMR's sample period of 1980-89 is about 5000, roughly 20% of the sample, a significant number.

The other three lines in Figure 3 show how many trade flows recorded as positive by Feenstra are missing or zero in the UN data. There are only a handful of such observations, many of which involve Taiwan, whose trade was not officially recorded by the UN but was for a time recorded by the IMF. It seems that any country that reported its trade to the IMF also reported it to the UN, but not necessarily vice versa.

³ To produce a single number from the importer's and exporter's reports, I followed Feenstra's convention of adopting the importer's report.

⁴ This is a conservative estimate. When neither partner reports any of its trade, we can be certain that the flow is not observed. Since reporting is incomplete even for countries that report some of their trade, it is likely that additional flows are also unreported, particularly those for which only one partner makes any report.

⁵ The correlation is not perfect because Figure 2 shows the total number of reporters in the world rather than in the sample. For example, in 1996 the number of reporters to the UN increased slightly while the amount of missing trade also increased, because the number of reporters in the sample fell slightly.

⁶ To be fair to Feenstra, he does not record non-positive trade flows as zero; this interpretation is imposed by HMR. In the documentation for the latest revision of his trade data, he explicitly recognises the problem. In Table 1 of Feenstra et al. (2005) [8] he lists the countries (only 65 of which are in HMR's sample) for which he has reported data for 1984-2000, and notes: "When the two countries are both not included in Table 1, however, the trade flows for 1984-2000 are entirely missing from the dataset" (pp. 2-3).

[Figure 4: Distribution of Positive Trade Flows. Number of positive flows in the UN and Feenstra data sets, 1970-1997, by size bin 10^x ≤ trade < 10^(x+1), x = 0, ..., 12.]

Figure 4 shows the distribution of positive trade flows in the two data sets: the positive flows missing from Feenstra's data (the green line in Figure 3) are mostly small. Feenstra's data contain no flows below $1000, and most of the missing flows are below $1 million.

[Figure 5: % of World Trade Missing from Feenstra by Volume. Volume of trade missing from Feenstra and recorded as positive by the UN, as a percentage of total world trade, 1970-2000.]

The small magnitude of the missing flows is reflected in the small share of world trade that is missing from the Feenstra data but included in the UN data, shown in Figure 5: it averages less than 1% of world trade over the whole of Feenstra's sample, and less than 1% for the subsample used by HMR. For many applications, ignoring these trade flows might be unproblematic: they are close to zero, and treating them as zero should not be a source of bias. HMR's procedure, however, draws an important distinction between zero and positive trade flows. Figure 6 shows how many zero trade flows there are in the two data sets; the difference is quite steady at around 5000, or 20%, in each year. Thus, although the missing trade is insignificant in terms of volume, about 20% of the observations in HMR's original dataset are misclassified.
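The comparison behind Figures 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: flows are keyed by directed (importer, exporter) pairs; the UN importer and exporter mirror reports sit in two hypothetical dictionaries; a country counts as a reporter if it reported any trade that year; and the Feenstra-style data are read with absent pairs as zero, which is the reading HMR effectively impose. Reconciliation follows the importer-first convention the paper attributes to Feenstra. All function names and the toy data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]  # (importer i, exporter j): imports of i from j


def reconcile(importer_report: Optional[float],
              exporter_report: Optional[float]) -> Optional[float]:
    """Importer-first convention: use the importer's report if it exists,
    otherwise the exporter's mirror report; None if neither partner reported."""
    return importer_report if importer_report is not None else exporter_report


def size_bin(value: float) -> int:
    """Return x such that 10**x <= value < 10**(x + 1), the bins of Figure 4.
    Assumes value >= 1."""
    x = 0
    while value >= 10 ** (x + 1):
        x += 1
    return x


def compare_with_feenstra(countries: Iterable[str],
                          reporters: Set[str],
                          un_imports: Dict[Pair, float],
                          un_exports: Dict[Pair, float],
                          feenstra: Dict[Pair, float]) -> Dict[str, int]:
    """Count directed flows that a zero-filled data set would wrongly treat as zero."""
    countries = list(countries)
    counts = {"neither_reports": 0, "positive_un_zero_feenstra": 0}
    for i in countries:
        for j in countries:
            if i == j:
                continue
            if i not in reporters and j not in reporters:
                # Neither partner reported any trade: the flow is missing, not zero.
                counts["neither_reports"] += 1
                continue
            un = reconcile(un_imports.get((i, j)), un_exports.get((i, j)))
            # No missing-value code: an absent pair reads as zero trade.
            fe = feenstra.get((i, j), 0.0)
            if un is not None and un > 0 and fe == 0:
                counts["positive_un_zero_feenstra"] += 1
    return counts


# Toy example: A and B report to the UN, C and D do not (hypothetical data).
counts = compare_with_feenstra(
    countries=["A", "B", "C", "D"],
    reporters={"A", "B"},
    un_imports={("A", "B"): 5e6, ("A", "C"): 2500.0},  # reported by importer A
    un_exports={("B", "A"): 800.0},                     # A's report of exports to B
    feenstra={("A", "B"): 5e6},
)
print(counts)  # {'neither_reports': 2, 'positive_un_zero_feenstra': 2}
print(size_bin(2500.0))  # 3, i.e. 10**3 <= 2500 < 10**4
```

The two counters correspond loosely to the blue and green lines of Figure 3; their sum plays the role of the black line.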

[Figure 6: Zero Trade Flows by Year. Number of zero trade flows in the UN and Feenstra data sets, 1970-2000.]

[Figure 7: Correlation of Trade in UN and Feenstra datasets, 1970-2000. Series: conditional on both flows positive; conditional on both flows observed.]
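Figure 7 compares the two data sets through two conditional correlations: one computed only where both sources record a positive flow, and one computed wherever the flow is observed in the UN data. A minimal sketch follows. The paper does not say whether the correlations are computed on levels or logs of trade, so this code simply correlates whatever values it is given; the function names and the toy values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]  # (importer, exporter)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length samples (needs non-constant data)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def conditional_correlations(un: Dict[Pair, Optional[float]],
                             feenstra: Dict[Pair, float]) -> Dict[str, Tuple[float, int]]:
    """Correlation (and sample size) of UN vs Feenstra values under two rules.

    `un` maps each pair to its reconciled UN value, or None if unreported.
    `feenstra` has no missing-value code, so an absent pair is read as 0.
    """
    observed = [(u, feenstra.get(p, 0.0)) for p, u in un.items() if u is not None]
    both_positive = [(u, f) for u, f in observed if u > 0 and f > 0]
    result = {}
    for name, sample in (("both_positive", both_positive), ("observed", observed)):
        xs = [u for u, _ in sample]
        ys = [f for _, f in sample]
        result[name] = (pearson(xs, ys), len(sample))
    return result


# Hypothetical toy values, not data from either source.
un_values = {("A", "B"): 100.0, ("A", "C"): 50.0, ("B", "A"): 10.0,
             ("B", "C"): None, ("C", "A"): 5.0}
feenstra_values = {("A", "B"): 110.0, ("A", "C"): 45.0, ("B", "A"): 12.0}
print(conditional_correlations(un_values, feenstra_values))
```

The "observed" sample includes flows that the zero-filled data report as 0 while the UN reports them as positive, so it is the rule under which Feenstra's omissions can pull the correlation down.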

Feenstra's dataset is therefore not appropriate for HMR's analysis, because one cannot distinguish flows that are missing from flows that are really zero. Fortunately, the UN's data come with thorough documentation showing which countries reported and which did not. A drawback of using the UN data is that the adjustments Feenstra made to them are lost. Many of these adjustments relate to the disaggregated trade data, with which we need not be concerned, but some relate to the aggregate data, notably the adjustments for entrepôt trade. I have not attempted to correct for entrepôt trade; however, such corrections affect only a small minority of trade flows, so on balance ignoring them seems a cost worth bearing.

Figure 7 shows the correlation between Feenstra's and the UN's data. The blue line shows the correlation conditional on both data sets recording a positive flow; the red line shows the correlation conditional on the flow being observed in the UN data set. The correlation is reassuringly high (above 0.99 for HMR's subsample), suggesting that Feenstra's adjustments to the aggregate trade flows are relatively small.

2.1 Estimating the Propensity to Report Trade

We can distinguish three groups of observations, based on whether the countries involved report their trade: (A) neither partner reports; (B) both partners report; (C) only one partner reports. For group (A), trade is certainly unreported. For group (B), if both countries fully reported their trade we would have two numbers for each trade flow, but, as noted above, 24% of the positive flows in this group are reported by only one partner. This variation in reporting standards allows one to estimate countries' propensity to report their trade. For the subset of countries that file some report of their trade, we can observe reports of a particular trade flow from both the importer and the exporter.
In particular, conditional on a trade flow being reported by at least one partner, we can observe whether it is also reported by the other side. The propensity of a country to report a flow, conditional on the flow being reported by its partner, can be estimated by probit.

Table 2: Influence of the level of trade on the propensity to report (probit of reporting trade > 0, conditional on partner reporting trade > 0; data for 1986)

| | [1] Importer | [2] Exporter | [3] Importer | [4] Exporter |
|---|---|---|---|---|
| log(trade) | 0.229** | 0.235** | 0.3** | 0.323** |
| | (0.009) | (0.006) | (0.012) | (0.009) |
| R-sq | 0.255 | 0.27 | 0.377 | 0.419 |
| n | 7301 | 8398 | 7301 | 8398 |

** p<0.01, * p<0.05. Columns [3] and [4] include reporter fixed effects.

Table 2 shows that the size of the underlying trade flow has a strong influence on a country's propensity to report it. Columns [3] and [4] include reporter fixed effects to control for idiosyncratic differences in reporting quality across countries. These account for a significant amount of the variation, raising the R-squared of the regressions. Since trade is by definition unobserved if it is unreported, we cannot extend this approach to estimate the unconditional probability of a flow being reported. But the strong correlation between trade and reporting is prima facie evidence that the attrition of trade flows from the sample due to non-reporting is not random, leading to a classic sample-selection problem.

3 Controlling for Unreported Trade Flows

HMR's technique addresses the sample selection caused by the omission of observations without trade using Heckman's (1976, 1979) sample-selection correction. This two-step procedure allows for the possibility that sample selection biases estimates of the main equation of interest, by first estimating a selection equation and then controlling for possible correlation between the selection and main equations. Treating non-reporting of trade as an additional level of sample selection, we face a simultaneous sample-selection problem, in which an observation can be missing from the log-linearised gravity equation either because it is zero or because it is unreported.

3.1 Sample Selection in the Original HMR Model

The first step in controlling for sample selection is to specify a structural model relating the selection and main equations. In the original HMR framework, this was

$$ m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \ln\left(e^{\delta(\hat{z}_{ij} + \hat{\eta}_{ij})} - 1\right) + e_{ij} \tag{1} $$

$$ T_{ij} = \mathbf{1}[z_{ij} > 0] \tag{2} $$

$$ z_{ij} = \gamma_0 + \xi_j + \zeta_i - \gamma d_{ij} - \kappa \phi_{ij} + \eta^Z_{ij} \tag{3} $$

Equation (1) is HMR's intensive-margin gravity equation for the imports $m_{ij}$ of country $i$ from country $j$. $\lambda_j$ and $\chi_i$ are exporter and importer dummies, controlling for the multilateral resistance terms analysed by Anderson and van Wincoop (2003). $d_{ij}$ is a vector of trade barriers that affect variable costs of trade. The error terms $e_{ij}$ and $\eta^Z_{ij}$ are jointly normally distributed,

$$ \begin{pmatrix} e_{ij} \\ \eta^Z_{ij} \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_e^2 & \sigma_{e\eta^Z} \\ \sigma_{e\eta^Z} & 1 \end{pmatrix}, $$

and $\hat{z}_{ij}$ is the fitted value of the latent variable of the probit specified in equations (2) and (3). $T_{ij}$ indicates whether there are any imports from $j$ to $i$, as determined by the latent variable $z_{ij}$. This may be influenced by the same trade barriers that affect the intensive margin, $d_{ij}$, and possibly by additional factors, denoted $\phi_{ij}$.

A sample-selection bias can arise if equation (1) is estimated by OLS, because what we can estimate is $E[m_{ij} \mid T_{ij} = 1]$.⁷ If $E[e_{ij} \mid T_{ij} = 1] = E[e_{ij}] = 0$, there is no sample-selection problem, and OLS is unbiased. However, if $E[e_{ij} \mid T_{ij} = 1] \neq 0$, there is a problem. Fortunately, under the assumptions of equations (1)-(3) we have a consistent estimator of $E[e_{ij} \mid T_{ij} = 1]$. Because the variance of $\eta^Z_{ij}$ is normalised to 1,

$$ E[e_{ij} \mid T_{ij} = 1] = \sigma_{e\eta^Z}\, E[\eta^Z_{ij} \mid T_{ij} = 1] = \sigma_{e\eta^Z}\, \hat{\eta}_{ij}, \qquad \hat{\eta}_{ij} = \frac{\phi(\hat{z}_{ij})}{\Phi(\hat{z}_{ij})}, $$

where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the standard normal density and distribution functions (not to be confused with the factor $\phi_{ij}$), and $\hat{\eta}_{ij}$ is the inverse Mills ratio estimated from the first-stage probit, which gives a consistent estimate of $E[\eta^Z_{ij} \mid T_{ij} = 1]$.

⁷ All the conditional expectations are also conditional on the full set of regressors; this is suppressed for notational convenience only.

The term $\ln\left(e^{\delta(\hat{z}_{ij} + \hat{\eta}_{ij})} - 1\right)$ is introduced by HMR to control for the potential correlation between the trade barriers $d_{ij}$ and the average productivity of the firms in $j$ that have chosen to enter the export market in $i$. The productivity of these firms affects the volume of their sales, and it is correlated with trade barriers because higher barriers induce less productive firms to exit the market. HMR show that the latent variable $z_{ij}$ determining whether or not there is trade is related to the productivity level of the marginal exporter, and so can be used to estimate the unobserved heterogeneous-productivity term in the aggregate gravity equation. Under the assumption that firm productivity is Pareto distributed, this term has the form $\max\{Z_{ij}^{\delta} - 1, 0\}$. To include it in an estimation we need to simplify the max. Fortunately, only observations with positive trade are observed, so in equation (1) all observations have $Z_{ij}^{\delta} - 1 > 0$. We do not observe $Z_{ij}$, but can estimate $E[z_{ij}]$ with $\hat{z}_{ij}$.⁸ To simplify the max we need $E[z_{ij} \mid T_{ij} = 1]$, which equals $E[z_{ij}] + E[\eta^Z_{ij} \mid T_{ij} = 1]$ and, as discussed above in the context of sample selection, can be estimated with the inverse Mills ratio, giving

$$ \hat{\bar{z}}_{ij} = \hat{z}_{ij} + \hat{\eta}_{ij}. $$

Thus HMR elegantly show how to address both the sample-selection and productivity-heterogeneity biases in a simple two-stage procedure that uses the same first-stage probit. However, this elegance comes at the cost of the model being potentially underidentified.
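Both correction terms can be computed directly from a fitted probit index $\hat{z}_{ij}$. The sketch below is illustrative only: it assumes $\hat{z}_{ij}$ is already available (it does not estimate the probit), uses only the Python standard library, and adds a Monte Carlo check of the selection formula $E[e_{ij} \mid T_{ij} = 1] = \sigma_{e\eta^Z}\hat{\eta}_{ij}$ under hypothetical parameter values ($\sigma_e = 1.5$, correlation 0.6) chosen purely for demonstration. The function names are not HMR's.

```python
import math
import random


def norm_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inverse_mills(z_hat: float) -> float:
    """eta_hat = phi(z_hat) / Phi(z_hat) = E[eta | eta > -z_hat] for eta ~ N(0, 1)."""
    return norm_pdf(z_hat) / norm_cdf(z_hat)


def heterogeneity_term(z_hat: float, delta: float) -> float:
    """HMR-style term ln(exp(delta * (z_hat + eta_hat)) - 1).

    z_hat + eta_hat = E[z | z > 0] is always positive, so for delta > 0 the
    argument of the log is positive; expm1 keeps precision when it is small.
    """
    z_bar = z_hat + inverse_mills(z_hat)
    return math.log(math.expm1(delta * z_bar))


def simulate_selection_bias(z_hat: float, sigma_e: float, rho: float,
                            n: int = 200_000, seed: int = 0) -> tuple:
    """Return (simulated mean of e among selected draws, sigma_e_eta * eta_hat).

    (e, eta) are bivariate normal with Var(e) = sigma_e**2, Var(eta) = 1 and
    Cov(e, eta) = rho * sigma_e; a draw is selected (T = 1) when z_hat + eta > 0.
    """
    rng = random.Random(seed)
    sigma_e_eta = rho * sigma_e
    resid_sd = math.sqrt(sigma_e ** 2 - sigma_e_eta ** 2)
    total, kept = 0.0, 0
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        if z_hat + eta > 0:  # T_ij = 1
            total += sigma_e_eta * eta + resid_sd * u
            kept += 1
    return total / kept, sigma_e_eta * inverse_mills(z_hat)


if __name__ == "__main__":
    simulated, predicted = simulate_selection_bias(z_hat=0.5, sigma_e=1.5, rho=0.6)
    print(f"simulated E[e | T=1] = {simulated:.3f}, formula = {predicted:.3f}")
    print("heterogeneity term at z_hat=0.5, delta=1.0:", heterogeneity_term(0.5, 1.0))
```

Because both terms are functions of the same $\hat{z}_{ij}$, the sketch also makes visible why they move together, which is the source of the identification concern raised in the text.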
The estimated latent variable $\hat{z}_{ij}$ is a linear combination of the regressors included in equation (3). If the regressors in equations (1) and (3) are the same, then $\hat{z}_{ij}$ is perfectly collinear with the trade barriers in equation (1). Since $\hat{\eta}_{ij}$ is also included as a regressor in the gravity equation to control for the zero-trade sample selection, this collinearity extends to $\hat{\bar{z}}_{ij} = \hat{z}_{ij} + \hat{\eta}_{ij}$. This collinearity is a particular problem for a non-parametric estimate of the productivity-heterogeneity bias, which would include a high-degree polynomial in $\hat{\bar{z}}_{ij}$ instead of the term derived from the assumption of Pareto-distributed firm productivity.

⁸ $z_{ij} = \log(Z_{ij})$.

Potentially, the non-linearity of $\ln\left(e^{\delta(\hat{z}_{ij} + \hat{\eta}_{ij})} - 1\right)$ means that the perfect collinearity between $\hat{z}_{ij} + \hat{\eta}_{ij}$ and the other trade barriers does not prevent identification of the parametric model. However, since $\ln(e^{x} - 1) = x + \ln(1 - e^{-x})$ and the second term vanishes as $x$ grows, for large $\delta$

$$ \ln\left(e^{\delta(\hat{z}_{ij} + \hat{\eta}_{ij})} - 1\right) \approx \delta(\hat{z}_{ij} + \hat{\eta}_{ij}), $$

and the regressors are collinear again. For HMR's sample the non-linearity of the heterogeneity-bias term is insufficient to identify the model, and this motivates their search for an additional exclusion restriction: a variable $\phi_{ij}$ that enters equation (3) but not equation (1). Such a variable breaks the collinearity by introducing an extra source of variation into $\hat{z}_{ij}$ that is not collinear with the regressors of equation (1). In economic terms, it would be a factor that affects the fixed, but not the variable, costs of trade.

HMR proposed two potential exclusion restrictions: measures of the costs of starting a firm, as compiled by Djankov et al. (2002), and an index of religious similarity. Both have drawbacks. Regulatory costs seem likely to be correlated with the fixed costs of entry into business and possibly, by extension, with the fixed costs of entering export markets, although this is less clear. However, they may also be correlated with factors affecting the variable costs of trade,⁹ violating the exclusion restriction.
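The large-$\delta$ argument can be checked numerically: the gap $\delta\bar{z} - \ln(e^{\delta\bar{z}} - 1) = -\ln(1 - e^{-\delta\bar{z}})$ shrinks as $\delta$ grows, so the HMR term becomes almost perfectly correlated with the linear index. The sketch below uses a hypothetical grid of positive $\bar{z}$ values rather than data, and the chosen values of $\delta$ are illustrative only.

```python
import math


def het_term(z_bar: float, delta: float) -> float:
    """ln(exp(delta * z_bar) - 1), defined for delta * z_bar > 0."""
    return math.log(math.expm1(delta * z_bar))


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical positive values of z_hat + eta_hat (illustrative grid, not data).
Z_BAR = [0.2 + 0.1 * k for k in range(30)]


def collinearity_by_delta(deltas):
    """For each delta: correlation of the linear index delta*z_bar with the HMR
    term, and the largest gap delta*z_bar - ln(exp(delta*z_bar) - 1)."""
    rows = []
    for delta in deltas:
        linear = [delta * z for z in Z_BAR]
        term = [het_term(z, delta) for z in Z_BAR]
        gap = max(l - t for l, t in zip(linear, term))
        rows.append((delta, pearson(linear, term), gap))
    return rows


for delta, corr, gap in collinearity_by_delta([0.5, 2.0, 10.0]):
    print(f"delta={delta:5.1f}  corr={corr:.5f}  max gap={gap:.3f}")
```

As the correlation approaches 1, the non-linear term adds almost no variation beyond $\hat{z}_{ij} + \hat{\eta}_{ij}$, which is why an extra exclusion restriction (or another selection source) is needed in practice.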
⁹ For example, a country with higher regulatory barriers may also be more likely to have higher taxes, which would be expected to reduce the profitability of exporting at the intensive margin too. Countries with more regulation might also be more likely to use quantitative trade restrictions, such as import or export licenses, or other non-tariff barriers, which would also affect the intensive margin but are typically not controlled for.

A second weakness of the regulatory data is that they are only available for a subset of countries (116 out of HMR's full sample of 158), and so cannot be used in a broad panel setting.¹⁰ The conceptual case for religion as a factor affecting fixed but not variable costs of trade is very unclear. In their original paper, HMR justify the exclusion on the grounds that their religion variable is not statistically significant in a benchmark OLS gravity equation, suggesting that it is broadly uncorrelated with trade. Unfortunately, there is a problem with their original data.¹¹ Replacing their data with a similar index compiled from Barrett et al. (2001) shows religion to be a highly significant variable in the benchmark gravity equation, undermining the prima facie case for the validity of the exclusion restriction.

The difficulty of finding valid or practical exclusion restrictions is a potential pitfall of HMR's methodology. One motivation for controlling for the sample selection induced by the non-reporting of trade is that it weakens the collinearity between the productivity-heterogeneity and sample-selection correction terms. This allows the model to be identified more generally, without relying on such an exclusion restriction.

¹⁰ Reduction of the sample size is a potentially serious problem for HMR's methodology, which relies on the presence of zero trade flows in the aggregate. If country j exports to all other countries in the sample, then its exporter-specific dummy perfectly predicts trade in the first stage. This is problematic because it implies a fitted value of infinity for $\hat{z}_{ij}$, which means that all observations of exports from j must be dropped from the second stage, as the heterogeneity correction term cannot be estimated. For 1986, the baseline year for HMR's study, the reduction in sample size only led to the dropping of 9 importers or exporters (11 when one uses the more complete UN data, which include some positive trade flows omitted from the Feenstra dataset). However, given the growth in trade over time, this reduction in the sample could become a critical problem for applying the technique to later years, as more countries will trade with all members of this subsample.

¹¹ The measure is an index of the religious similarity of a country pair, and as such should be the same for the observation of country i's exports to j as for the observation of i's imports from j. Unfortunately this is not the case, which indicates that their data have been corrupted.

3.2 Controlling for Non-Reporting of Trade

Heckman's sample-selection correction can be extended to the case of multiple selection decisions by jointly estimating the underlying selection relationships. HMR's system of equations (1)-(3) is extended to include an equation specifying the reporting decision:

$$R_{ij} = 1[r_{ij} > 0] \tag{4}$$

$$r_{ij} = \tau_0 + \varphi_j + \omega_i - \nu d_{ij} + \eta^R_{ij} \tag{5}$$

where the latent variable driving the reporting decision, $r_{ij}$, depends on the trade barriers $d_{ij}$, importer and exporter dummies, and a normally distributed error $\eta^R_{ij}$. The errors of equations (1), (3) and (5) are assumed to be jointly normally distributed:¹²

$$\begin{pmatrix} e_{ij} \\ \eta^Z_{ij} \\ \eta^R_{ij} \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma_e^2 & \sigma_{e\eta^Z} & \sigma_{e\eta^R} \\ \sigma_{e\eta^Z} & 1 & \sigma_{\eta^Z\eta^R} \\ \sigma_{e\eta^R} & \sigma_{\eta^Z\eta^R} & 1 \end{pmatrix}$$

The selection relationships follow Poirier's (1980) bivariate probit model with partial observability.¹³ This model treats the dependent variable as 1 if we observe a positive trade flow, and 0 otherwise: the dependent variable is 1 if both of the underlying probits are 1, and 0 if either is 0. This yields the log-likelihood function

$$\ln L = \sum_{i,j} \left[ y_{ij} \ln F(z_{ij}, r_{ij}, \rho) + (1 - y_{ij}) \ln\left(1 - F(z_{ij}, r_{ij}, \rho)\right) \right]$$

where, for notational convenience, $\rho$ denotes the covariance between $\eta^Z_{ij}$ and $\eta^R_{ij}$, previously denoted $\sigma_{\eta^Z\eta^R}$; $F(z_{ij}, r_{ij}, \rho)$ is the CDF of the bivariate normal distribution with unit variances and covariance $\rho$; and $y_{ij}$ is the dependent variable, equal to 1 if a positive trade flow is observed. The parameters of both underlying selection equations can be jointly estimated from this log-likelihood function.

Poirier (1980) discusses the identifiability of the model. The reduced-form parameters are locally identified except in pathological cases.¹⁴ There can, however, be a labelling problem: if the regressors are identical in both equations, the symmetry of the problem means that it is not possible to identify which coefficients correspond to which selection relationship. As long as at least one variable can be excluded from one of the selection equations, this labelling problem is resolved.

¹² The assumption of unit variances for $\eta^Z_{ij}$ and $\eta^R_{ij}$ is without loss of generality. The coefficients of a probit can only be estimated up to scale, so the coefficients in equations (3) and (5) are normalised by the standard deviation of their respective errors. The other key parameter in what follows is the covariance $\sigma_{\eta^Z\eta^R}$, but this enters the following equations as the correlation, which is the covariance scaled by the standard deviations.

¹³ The model is also treated very accessibly in Maddala (1983), pp. 278-283. See Grilli (2005) for an application.

¹⁴ Such as equality of the coefficients across the two equations.

Controlling for the simultaneous selection in the intensive-margin log-linearised gravity equation is straightforward. When estimating equation (1) by OLS, we estimate $E[m_{ij} \mid T_{ij} = 1, R_{ij} = 1]$. As in the single-selection case, we need to take into account the possibility that $E[e_{ij} \mid T_{ij} = 1, R_{ij} = 1] \neq E[e_{ij}] = 0$:

$$E[e_{ij} \mid T_{ij} = 1, R_{ij} = 1] = \beta_Z H^Z_{ij} + \beta_R H^R_{ij}$$

where

$$\beta_Z \equiv \sigma_{e\eta^Z}, \qquad \beta_R \equiv \sigma_{e\eta^R},$$

$$H^Z_{ij} \equiv \frac{\phi(z_{ij})\,\Phi\!\left(\frac{r_{ij} - \rho z_{ij}}{\sqrt{1-\rho^2}}\right)}{F(z_{ij}, r_{ij}, \rho)}, \qquad H^R_{ij} \equiv \frac{\phi(r_{ij})\,\Phi\!\left(\frac{z_{ij} - \rho r_{ij}}{\sqrt{1-\rho^2}}\right)}{F(z_{ij}, r_{ij}, \rho)}.$$

These expressions are analogous to the inverse Mills ratio, extended to the two-stage selection case. If there is no correlation between the two selection stages ($\rho = 0$), the expressions simplify to the single-variable selection correction: $F(z_{ij}, r_{ij}, 0) = \Phi(z_{ij})\Phi(r_{ij})$, so

$$H^Z_{ij} = \frac{\phi(z_{ij})}{\Phi(z_{ij})}, \qquad H^R_{ij} = \frac{\phi(r_{ij})}{\Phi(r_{ij})},$$

and the dual sample selection is controlled for simply by including a standard inverse Mills ratio for each stage. The objects $z_{ij}$, $r_{ij}$ and $\rho$ are all estimable from the first stage, so the sample-selection corrections can be made in a two-step procedure analogous to HMR's original method. Defining

$$\hat{\bar{\eta}}_{ij} \equiv E[\eta^Z_{ij} \mid T_{ij} = 1], \qquad \tilde{e}_{ij} \equiv e_{ij} - \beta_Z H^Z_{ij} - \beta_R H^R_{ij},$$

we consistently estimate the intensive-margin gravity equation (1) by

$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \beta_Z H^Z_{ij} + \beta_R H^R_{ij} + \ln\left(e^{\delta(\hat{z}_{ij} + \hat{\bar{\eta}}_{ij})} - 1\right) + \tilde{e}_{ij} \tag{6}$$

Comparing this to HMR's original corrected equation, in which $\tilde{e}_{ij} \equiv e_{ij} - \beta_{HMR}\hat{\bar{\eta}}_{ij}$,

$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \beta_{HMR}\hat{\bar{\eta}}_{ij} + \ln\left(e^{\delta(\hat{z}_{ij} + \hat{\bar{\eta}}_{ij})} - 1\right) + \tilde{e}_{ij} \tag{7}$$

the key difference is the change in the sample-selection corrections: $H^Z_{ij}$ and $H^R_{ij}$ instead of $\hat{\bar{\eta}}_{ij}$. This difference weakens the collinearity between $\hat{\bar{z}}_{ij} = \hat{z}_{ij} + \hat{\bar{\eta}}_{ij}$ and the other regressors. In HMR's original equation, the event that the productivity term is positive coincides with the event that trade is positive, so both were controlled for by the same inverse Mills ratio, $\hat{\bar{\eta}}_{ij}$. Using the modified inverse Mills ratios breaks this coincidence. The correction for the productivity term is the original inverse Mills ratio,¹⁵ which controls for the fact that productivity only enters the regression when it is above its cut-off. This inverse Mills ratio is conditional on $T_{ij} = 1$ but not on $R_{ij} = 1$, as the reporting decision is irrelevant to the underlying productivity cut-off once the parameters of equation (3) have been consistently estimated.

¹⁵ Original in the sense that it has the same functional form. Its value will be different, as the estimates of the parameters of equation (3) will have changed: they now come from the joint estimation, which controls for some observations being unreported.
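The partial-observability log-likelihood can be written down directly. The sketch below is illustrative only (the paper does not say which software was used): it assumes SciPy's `multivariate_normal` for the bivariate normal CDF $F(z, r, \rho)$ and takes the two indices $z_{ij}$ and $r_{ij}$ as already computed, whereas in practice they would be linear in the trade barriers and the importer and exporter dummies. The index values below are hypothetical. With $\rho = 0$ the result should match the product form $F = \Phi(z)\Phi(r)$.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def bvn_cdf(z, r, rho):
    """F(z, r, rho): bivariate standard normal CDF with correlation rho (|rho| < 1)."""
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    points = np.column_stack([np.atleast_1d(z), np.atleast_1d(r)])
    return np.atleast_1d(mvn.cdf(points))


def partial_obs_loglik(y, z, r, rho, eps=1e-12):
    """Poirier (1980) log-likelihood: y = 1 only if both latent indices are positive.

    By symmetry of the normal distribution, P(z + eta_Z > 0, r + eta_R > 0) = F(z, r, rho).
    """
    p = np.clip(bvn_cdf(z, r, rho), eps, 1.0 - eps)  # guard the logs
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Hypothetical indices for four country pairs.
y = np.array([1, 0, 1, 0])
z = np.array([0.5, -0.2, 1.1, 0.0])
r = np.array([0.3, 0.4, -0.6, 0.9])
ll = partial_obs_loglik(y, z, r, rho=0.0)

# With rho = 0, F(z, r, 0) = Phi(z) * Phi(r), so the same value can be computed directly.
p0 = norm.cdf(z) * norm.cdf(r)
ll_product = float(np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0)))
ll_corr = partial_obs_loglik(y, z, r, rho=0.5)
print(ll, ll_product, ll_corr)
```

Maximising this function over the coefficients behind $z_{ij}$ and $r_{ij}$ and over $\rho$ would give the joint first-stage estimates. Note that at $\rho = 1$ the covariance matrix is singular, so an estimate on that boundary would need special handling in a sketch like this one.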
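The correction terms $H^Z_{ij}$ and $H^R_{ij}$ are closed-form once $z_{ij}$, $r_{ij}$ and $\rho$ are available from the first stage. The sketch below assumes SciPy for the normal and bivariate normal distributions and uses hypothetical parameter values throughout. It computes both terms, checks that they collapse to ordinary inverse Mills ratios at $\rho = 0$, and verifies by simulation that the mean of $e_{ij}$ over the doubly selected sample equals $\sigma_{e\eta^Z} H^Z_{ij} + \sigma_{e\eta^R} H^R_{ij}$ when the two selection errors have unit variance.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm


def selection_terms(z, r, rho):
    """Return (H^Z, H^R) for scalar indices z, r and selection-error correlation rho (|rho| < 1)."""
    F = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]]).cdf([z, r])
    s = np.sqrt(1.0 - rho ** 2)
    h_z = norm.pdf(z) * norm.cdf((r - rho * z) / s) / F
    h_r = norm.pdf(r) * norm.cdf((z - rho * r) / s) / F
    return float(h_z), float(h_r)


def simulated_selected_mean(z, r, cov, n=400_000, seed=0):
    """Monte Carlo E[e | z + eta_Z > 0, r + eta_R > 0] with (e, eta_Z, eta_R) ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(3), cov, size=n)
    e, eta_z, eta_r = draws.T
    selected = (z + eta_z > 0) & (r + eta_r > 0)
    return float(e[selected].mean())


# Hypothetical parameters: sigma_e^2 = 1.44, cov(e, eta_Z) = 0.5, cov(e, eta_R) = 0.3, rho = 0.4.
sigma_ez, sigma_er, rho = 0.5, 0.3, 0.4
cov = np.array([[1.44, sigma_ez, sigma_er],
                [sigma_ez, 1.0, rho],
                [sigma_er, rho, 1.0]])
z, r = 0.2, -0.1

h_z, h_r = selection_terms(z, r, rho)
predicted = sigma_ez * h_z + sigma_er * h_r
simulated = simulated_selected_mean(z, r, cov)
print(f"H^Z = {h_z:.4f}, H^R = {h_r:.4f}")
print(f"predicted E[e | selected] = {predicted:.4f}, simulated = {simulated:.4f}")

# With rho = 0 the terms reduce to ordinary inverse Mills ratios.
h_z0, h_r0 = selection_terms(z, r, 0.0)
```

In a second stage, `selection_terms` would be evaluated at each pair's fitted indices, and the two resulting columns would enter the gravity regression alongside the heterogeneity term.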

However, the corrections for sample selection in the main gravity equation require both of the new modified inverse Mills ratios. Since $\hat{\bar{\eta}}_{ij}$ is a non-linear function of $\hat{z}_{ij}$ and is no longer itself included as a regressor in the main equation, $\hat{\bar{z}}_{ij}$ is now a non-linear function of the other regressors, and no longer collinear with them. Controlling for both dimensions of sample selection not only consistently estimates the parameters of the underlying structural models (assuming that they are correctly specified), but also separates the correction for sample selection from the correction for heterogeneity, allowing both effects to be identified without an exclusion restriction distinguishing fixed from variable trade costs.

4 Empirical Results

Table 3 reports traditional OLS gravity equation estimates on two samples. Column [1] reports estimates for the full sample of 175 countries: out of a possible 30450 trade flows, 14503 were observed to be positive. Column [2] repeats this for the subset of 116 countries for which there are data on the regulatory costs of entry: out of a possible 13340 flows, 8583 were observed to be positive. Some countries in the Reg sample export to or import from all other partners in the sample.¹⁶ This makes their exporter or importer dummy a perfect predictor of the outcome in the first-stage probit, which implies an infinite coefficient on the dummy and an infinite value for the latent variable in the probit. The second-stage estimation cannot proceed with an infinite value for $\hat{\bar{z}}_{ij}$, so these observations must be dropped, reducing the number of usable positive observations in the second stage to 7327.¹⁷ To maintain consistency between the second-stage sample and the benchmark gravity equation, these observations are also dropped here. Column [3] repeats the regression of column [2] but includes the measures of the regulatory costs of starting a business, which HMR use as their exclusion restriction to identify the intensive margin. The regulation variables are significant in the traditional OLS regression. Although this does not necessarily invalidate the exclusion restriction (their statistical significance could reflect omitted-variable bias, through their correlation with the omitted productivity-heterogeneity term), it is prima facie evidence that they are strongly correlated with the volume of trade, which suggests that they might affect both the intensive and the extensive margins.

¹⁶ The exporters are Japan, Hong Kong, Denmark, France, Germany, Italy, the Netherlands, Sweden, the UK and Norway. The importer is Japan.

¹⁷ The same issue arises in HMR's original paper; see the discussion on pp. 461-462.

Table 3: Benchmark Traditional OLS Gravity Equations

                  [1] All              [2] Reg Excl         [3] Reg Incl
log(distance)     -1.305*** (0.0269)   -1.278*** (0.0413)   -1.295*** (0.0415)
Border            0.0605 (0.123)       0.216 (0.154)        0.212 (0.154)
Island            0.719*** (0.0858)    0.750*** (0.176)     0.732*** (0.176)
Landlock          0.265 (0.186)        0.0838 (0.198)       0.0889 (0.197)
Colonial          0.925*** (0.110)     0.558*** (0.158)     0.553*** (0.157)
Language          0.371*** (0.0570)    0.356*** (0.0809)    0.351*** (0.0808)
Legal             0.321*** (0.0438)    0.372*** (0.0603)    0.384*** (0.0603)
Religion          0.443*** (0.0897)    0.682*** (0.127)     0.690*** (0.127)
CU                1.884*** (0.222)     1.466*** (0.370)     1.531*** (0.370)
FTA               0.446*** (0.116)     -0.273 (0.181)       -0.214 (0.181)
Reg: cost                                                   -0.331*** (0.0973)
Reg: days                                                   -0.234** (0.111)
Observations      14503                7327                 7327
R²                0.706                0.683                0.684

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Importer and exporter dummies included.
Column [1]: all 175 countries. Columns [2] and [3]: 116 countries with regulation data.

Table 4 reports estimates of the latent-variable equations for the first-stage probits on the regulation subsample. Column [1] gives the estimates for a univariate probit on the observed positive trade flows, following HMR's methodology. Columns [2] and [3] report the joint maximum-likelihood estimates of the positive-trade and reporting probits from the partially observed bivariate probit model. Comparing the coefficients of the underlying positive-trade and reporting probits with those of the univariate probit, the univariate coefficients generally lie in between those of the two bivariate probits, suggesting that the univariate probit reflects a mixture of the two selection processes. The trade barriers are strongly associated with the reporting decision, which suggests that the selection induced by non-reporting should not be ignored. The correlation between the errors in equations (3) and (5) is estimated to be 1.

One concern with partial observability and the Poirier model is that there is a loss of efficiency relative to full-information estimation.¹⁸ Unfortunately, we cannot compare the partial-information to the full-information estimates, but for most variables the standard errors are very similar to those of the univariate probit, suggesting that augmenting the first stage to control for non-reporting does not lead to a large efficiency loss. The exceptions are the Island, Colonial and Currency Union variables in the positive-trade probit, for which standard errors cannot be computed. In a univariate probit, a dummy variable that perfectly predicts the dependent variable is estimated to have an infinite coefficient, and the probit is estimated after dropping those observations. In the bivariate case, a variable can perfectly predict one of the underlying probits but not the other. The coefficient then cannot be precisely estimated for the probit that the variable perfectly predicts, but the variable will not perfectly predict the partially observed dependent variable of the bivariate probit, because a positive flow may go unobserved as a result of the second equation.¹⁹ It would be inappropriate to drop these observations from the bivariate probit: first, we do not know in advance which variables will be perfect predictors in one of the underlying equations; and second, these observations should still be included in the estimation of the other equation.

¹⁸ See Meng and Schmidt (1985).

¹⁹ For example, all colonial powers might trade with their former colonies, making the colonial dummy a perfect predictor in the positive-trade probit. However, if they do not all also report their trade, some colonial country pairs will not have positive trade flows recorded, and the colonial dummy will not perfectly predict the overall dependent variable.

Table 4: Zero-trade and Reporting Probits: Regulation Sample

                  [1] Univariate       [2] Bivariate:       [3] Bivariate:
                  Probit               Positive Trade       Reporting
log(distance)     -0.582*** (0.0356)   -1.296*** (0.0723)   -0.430*** (0.0540)
Border            -0.378*** (0.133)    0.168 (0.376)        0.0893 (0.226)
Island            0.314** (0.150)      40.38 ( )            -0.0725 (0.178)
Landlock          0.105 (0.132)        -0.0577 (0.180)      0.789*** (0.233)
Language          0.416*** (0.0632)    1.208*** (0.103)     -0.601*** (0.103)
Colonial          -0.0856 (0.292)      50.12 ( )            5.582 (530.3)
Legal             0.149*** (0.0440)    0.0757 (0.0678)      0.583*** (0.0745)
Religion          0.390*** (0.102)     0.202 (0.147)        0.578*** (0.163)
CU                0.844*** (0.230)     12.70 ( )            0.726 (0.445)
FTA               1.819*** (0.533)     -0.668 (0.841)       8.447 (75246)
Reg: Cost         -0.403*** (0.0857)   -0.127 (0.161)       -0.337*** (0.130)
Reg: Days         -0.0939 (0.0762)     0.106 (0.109)        -0.546*** (0.132)
Neither Reports - ρ                    1 (0)
Observations      13340                13340                13340

Standard errors in parentheses; empty parentheses indicate that the standard error could not be computed. *** p<0.01, ** p<0.05, * p<0.1. Importer and exporter dummies included.

Table 5 reports the estimates for the parametric specification of the gravity equation given in equation (1), based on a Pareto distribution of productivity. Column [1] is based on the estimate of $\hat{\bar{z}}_{ij}$ derived from column [1] of Table 4, following the standard HMR procedure, and excludes the regulation variables from the second stage in order to identify the model. Comparing column [1] of Table 5 with column [2] of Table 3 broadly replicates HMR's finding that the absolute magnitude of most trade barriers is smaller in the bias-corrected estimates than in the traditional OLS gravity equation.²⁰ This motivates their conclusion that heterogeneity bias inflates standard OLS estimates. The standard errors for all of the second-stage regressions should be treated as indicative only, as they have not been corrected for the generated regressors from the first stage. The coefficient reported on $\hat{\bar{z}}_{ij}$ is actually the estimate of $\log(\delta)$,²¹ which implies an estimate of $\delta$ of 0.2526. This is somewhat lower than HMR's original estimate of 0.84. I conjecture that one reason for this difference is that I do not follow their practice of censoring $\hat{\bar{z}}_{ij}$ above 5.199, which has the effect of increasing their estimate of $\delta$.²²

Column [2] of Table 5 re-estimates equation (6), maintaining the regulation variables as excluded from the second stage, but using the estimate of $\hat{\bar{z}}_{ij}$ derived from column [2] of Table 4 and the dual sample-selection correction terms $H^Z_{ij}$ and $H^R_{ij}$. The variables Island, Colonial and Currency Union, whose coefficients in the bivariate positive-trade probit were estimated very imprecisely, suffer a large loss of efficiency in the modified procedure, presumably reflecting the imprecision of the first stage. The opposite seems to be true for the other variables, whose standard errors diminish somewhat. The results support HMR's finding of a significant productivity-heterogeneity bias, as the coefficients in Table 5 generally have a smaller absolute magnitude than the OLS benchmarks. There is an interesting difference in the coefficient on FTA membership, which is negative in HMR's specification but positive and both economically and statistically significant using the modified procedure. A positive coefficient seems more economically intuitive. Column [3] of Table 4 shows that countries sharing an FTA are much more likely to report their trade, which is also intuitive, since most FTAs have strict rules-of-origin clauses that necessitate careful documentation of intra-FTA trade. Distinguishing this effect of higher reporting quality from the influence on the extensive trade margin also seems to make a significant difference to the intensive-margin estimates.

Column [3] repeats the estimation of column [2] but includes the measures of regulation in the second stage. As discussed above, the second stage is still identified even without an exclusion restriction, and relaxing the exclusion restriction causes no loss of precision in the standard errors. The point estimates are also very similar to those of column [2], which is encouraging. Column [3] gives mixed support for the validity of HMR's exclusion restriction, as one of the regulation variables (Reg Days) is statistically significant in the intensive margin, whereas the other (Reg Cost) is not. This suggests that Reg Cost can be validly excluded, but that Reg Days should not be used for identification.

²⁰ This is true for distance, island, landlock, legal, language, currency union and religion, but not for border, colonial or FTA.

²¹ $\delta$ must be positive, and to impose this constraint it is convenient to replace it with the unconstrained parameter $\log(\delta)$ and estimate $\ln\left(e^{\exp(\log\delta)\,(\hat{z}_{ij} + \hat{\bar{\eta}}_{ij})} - 1\right)$. To recover $\delta$, the estimated coefficient should be exponentiated. The delta method could be used to derive a standard error for $\delta$ from that of $\log(\delta)$.

²² This would affect 396 observations in this sample.

Table 5: Intensive Trade Margin: Regulation Sample

                          [1] HMR              [2] TB - Reg excl     [3] TB - Reg incl
ln(distance)              -0.994*** (0.1029)   -0.712*** (0.0692)    -0.700*** (0.0713)
Border                    0.430*** (0.1626)    0.174 (0.1440)        0.162 (0.1442)
Island                    0.540*** (0.1707)    -13.618*** (2.0205)   -14.363*** (2.1286)
Landlock                  0.030 (0.1994)       0.099 (0.1956)        0.106 (0.1952)
Legal                     0.301*** (0.0646)    0.377*** (0.0609)     0.385*** (0.0610)
Language                  0.144 (0.1065)       -0.203** (0.0954)     -0.224** (0.0974)
Colonial                  0.610*** (0.1193)    -16.867 (-16.8675)    -17.796 (-17.7961)
Currency union            1.051** (0.4842)     -3.608*** (0.7280)    -3.780*** (0.7516)
FTA                       -0.747* (0.3934)     0.357** (0.1461)      0.381*** (0.1467)
Religion                  0.489*** (0.1445)    0.475*** (0.1295)     0.480*** (0.1296)
Reg Cost                                                             -0.101 (0.0974)
Reg Days                                                             -0.284** (0.1129)
$\hat{\bar{\eta}}_{ij}$   -0.083 (0.1920)
$H^Z_{ij}$                                     -3.106*** (0.5354)    -0.063 (0.0978)
$H^R_{ij}$                                     0.268 (0.4254)        -0.005 (0.0041)
$\hat{\bar{z}}_{ij}$      -1.376 (1.0404)      -1.060*** (0.1470)    -1.008*** (0.1467)
Observations              7327                 7327                  7327

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Importer and exporter dummies included.

Table 6 repeats the estimation of the first-stage probits on the full sample. The results are broadly similar to those on the regulation sample, but the greater variation in the dataset means that none of the trade barriers appears to be a perfect predictor in either of the underlying probits, so all are estimated with relatively tight standard errors. There appears to be very little efficiency loss between the univariate and bivariate specifications, and the univariate coefficients mostly lie between those of the two bivariate equations. One noticeable difference between the two samples is that on the full sample $\rho$ is estimated to be negative (-0.548), whereas on the regulation sample it was estimated to be 1. It is hard to have a strong prior about the correct sign of this correlation, which is driven by unobserved variables. The estimate of 1 lies on the boundary of the parameter space, and an interior solution may be more appealing. Although the value could legitimately change a lot with the underlying sample, the difference between these results suggests that the correlation coefficient may not be estimated very precisely by this procedure.

Table 7 reports estimates from a non-parametric approximation of the heterogeneity-bias correction term, using a seventh-order polynomial of $\hat{\bar{z}}_{ij}$. Columns [1]-[3] replicate the specifications of columns [1]-[3] of Table 5, replacing the parametric term with the polynomial. Column [4] reports the polynomial approximation using the full sample and the first-stage estimates in columns [2] and [3] of Table 6. Columns [1] and [2] of Table 7 use the regulation data as an exclusion restriction to identify the second stage, while columns [3] and [4] are estimated without an additional exclusion restriction on fixed versus variable trade costs.
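The perfect-prediction problem for the first-stage dummies can be screened for before estimation. The following sketch uses hypothetical helpers, not the paper's code: it takes the set of observed positive flows as (exporter, importer) pairs and lists the exporters that sell to every other country and the importers that buy from every other country, whose dummies would perfectly predict $T_{ij} = 1$ in a univariate probit. It also reproduces the count of possible directed flows, $n(n-1)$, behind the 30450 and 13340 figures quoted above. The four-country data are invented for illustration.

```python
def possible_flows(n_countries: int) -> int:
    """Number of directed country pairs (i exports to j, i != j)."""
    return n_countries * (n_countries - 1)


def perfect_predictors(countries, positive_flows):
    """Return (exporters, importers) whose fixed effect perfectly predicts positive trade.

    positive_flows: set of (exporter, importer) pairs with an observed positive flow.
    """
    countries = list(countries)
    exporters = sorted(
        c for c in countries
        if all((c, other) in positive_flows for other in countries if other != c)
    )
    importers = sorted(
        c for c in countries
        if all((other, c) in positive_flows for other in countries if other != c)
    )
    return exporters, importers


# Hypothetical four-country example: A trades with everyone in both directions.
countries = ["A", "B", "C", "D"]
flows = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("C", "A"), ("D", "A"), ("B", "C")}
exp_pp, imp_pp = perfect_predictors(countries, flows)
print(exp_pp, imp_pp)                            # ['A'] ['A']
print(possible_flows(175), possible_flows(116))  # 30450 13340
```

Observations involving the flagged countries are the ones that, under HMR's univariate procedure, would have to be dropped from the second stage.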
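The back-transformation of footnote 21 is a short calculation. As a sketch, take the column [1] estimate reported on $\hat{\bar{z}}_{ij}$ in Table 5 (-1.376, standard error 1.0404) as the estimate of $\theta = \log(\delta)$, where $\theta$ is just a label introduced here. The point estimate is $\hat{\delta} = e^{\hat{\theta}}$, and because $d e^{\theta}/d\theta = e^{\theta}$, the delta method gives $\mathrm{se}(\hat{\delta}) \approx \hat{\delta}\,\mathrm{se}(\hat{\theta})$. Like the reported standard errors, this ignores the generated-regressor correction.

```python
import math


def delta_from_log(log_delta_hat: float, se_log_delta: float):
    """Back-transform an estimate of log(delta) and its standard error via the delta method."""
    delta_hat = math.exp(log_delta_hat)
    se_delta = delta_hat * se_log_delta  # |d exp(t)/dt| = exp(t), evaluated at the estimate
    return delta_hat, se_delta


delta_hat, se_delta = delta_from_log(-1.376, 1.0404)   # Table 5, column [1]
print(f"delta = {delta_hat:.4f}, se = {se_delta:.4f}")  # delta = 0.2526, se = 0.2628
```

The recovered standard error is about as large as the point estimate, so this $\delta$ is imprecisely estimated.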

Table 6: Zero-trade and Reporting Probits, Full Sample

                  [1] Univariate       [2] Bivariate:       [3] Bivariate:
                  Probit               Positive Trade       Reporting
log(distance)     -0.709*** (0.0198)   -1.027*** (0.0301)   0.0753 (0.0581)
Border            -0.505*** (0.0986)   0.754*** (0.293)     0.139 (0.211)
Island            0.274*** (0.0544)    0.209*** (0.0701)    0.416** (0.188)
Landlock          0.180 (0.111)        0.0900 (0.159)       0.582 (0.383)
Language          0.344*** (0.0371)    0.747*** (0.0520)    -0.876*** (0.131)
Colonial          -0.366** (0.157)     0.0760 (0.343)       0.186 (0.266)
Legal             0.107*** (0.0274)    0.0417 (0.0366)      0.631*** (0.114)
Religion          0.202*** (0.0580)    0.116 (0.0749)       0.293 (0.183)
CU                0.524*** (0.153)     0.422* (0.231)       3.380*** (0.657)
FTA               1.458*** (0.168)     1.307*** (0.257)     1.120*** (0.408)
Neither Reports - ρ                    -0.548 (0.103)
Observations      30450                30450                30450

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Importer and exporter dummies included.

Table 7: Polynomial Approximations of Intensive Margin

                            [1] HMR               [2] TB - Reg excl        [3] TB - Reg incl        [4] TB - Full
log(distance)               -0.919*** (0.137)     0.0230 (0.197)           0.119 (0.202)            -3.252*** (0.486)
Border                      0.572*** (0.175)      0.0408 (0.156)           0.0107 (0.156)           1.578*** (0.378)
Island                      0.279 (0.190)         -35.94*** (6.064)        -39.21*** (6.193)        1.055*** (0.128)
Landlock                    0.0157 (0.196)        0.186 (0.196)            0.200 (0.196)            0.543*** (0.187)
Colonial                    0.591*** (0.156)      -44.65*** (7.530)        -48.67*** (7.688)        1.073*** (0.114)
Language                    0.120 (0.126)         -0.857*** (0.194)        -0.954*** (0.198)        1.771*** (0.361)
Legal                       0.262*** (0.0677)     0.330*** (0.0611)        0.335*** (0.0611)        0.407*** (0.0467)
Religion                    0.420*** (0.153)      0.416*** (0.129)         0.406*** (0.129)         0.644*** (0.104)
CU                          0.989** (0.405)       -10.75*** (1.928)        -11.73*** (1.969)        2.343*** (0.290)
FTA                         0.689 (0.505)         0.753*** (0.206)         0.821*** (0.207)         3.884*** (0.630)
Reg: cost                                                                  -0.0362 (0.0995)
Reg: days                                                                  -0.377*** (0.112)
$\hat{\bar{\eta}}_{ij}$     0.130 (0.668)
$H^R_{ij}$                                        0.000186 (0.00743)       0.000827 (0.00743)       -0.605*** (0.169)
$H^Z_{ij}$                                        -0.450*** (0.0963)       -0.483*** (0.0977)       3.741*** (0.426)
$\hat{\bar{z}}_{ij}$        -2.683 (4.398)        1.519*** (0.186)         1.610*** (0.191)         1.918** (0.877)
$\hat{\bar{z}}_{ij}^2$      3.385 (3.204)         -0.0572*** (0.00718)     -0.0583*** (0.00730)     -1.110*** (0.280)
$\hat{\bar{z}}_{ij}^3$      -1.345 (1.238)        0.00225*** (0.000348)    0.00230*** (0.000352)    0.150** (0.0630)
$\hat{\bar{z}}_{ij}^4$      0.269 (0.270)         -4.45e-05*** (8.07e-06)  -4.54e-05*** (8.14e-06)  -0.0105 (0.00751)
$\hat{\bar{z}}_{ij}^5$      -0.0293 (0.0331)      4.64e-07*** (9.59e-08)   4.74e-07*** (9.66e-08)   0.000357 (0.000484)
$\hat{\bar{z}}_{ij}^6$      0.00164 (0.00212)     -2.44e-09*** (5.63e-10)  -2.49e-09*** (5.66e-10)  -4.22e-06 (1.59e-05)
$\hat{\bar{z}}_{ij}^7$      -3.69e-05 (5.49e-05)  0*** (0)                 0*** (0)                 -1.60e-08 (2.07e-07)
Observations                7327                  7327                     7327                     14503
R²                          0.695                 0.690                    0.691                    0.719

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. Importer and exporter dummies included.
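The polynomial specifications in Table 7 replace the Pareto-based term with a seventh-order polynomial in $\hat{\bar{z}}_{ij}$. The sketch below illustrates that substitution with NumPy least squares, assuming a hypothetical uniform spread of $\hat{\bar{z}}_{ij}$ values and $\delta = 0.25$ purely for illustration. It fits polynomials of order 3 and 7 to $\ln(e^{\delta\hat{\bar{z}}_{ij}} - 1)$ and reports the condition number of the raw-power design matrix. The poor conditioning is one plausible reason why the coefficients on the highest powers are tiny and numerically fragile.

```python
import numpy as np


def poly_design(z_bar, order):
    """Intercept plus z_bar, z_bar^2, ..., z_bar^order (raw, unscaled powers)."""
    return np.column_stack([z_bar ** k for k in range(order + 1)])


def poly_fit_rss(z_bar, target, order):
    """Least-squares fit of target on a polynomial in z_bar; returns (coefficients, RSS)."""
    X = poly_design(z_bar, order)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef, float(resid @ resid)


rng = np.random.default_rng(1)
z_bar = rng.uniform(0.2, 4.0, size=2000)  # hypothetical corrected index values
delta = 0.25                               # illustrative value only
target = np.log(np.expm1(delta * z_bar))   # parametric heterogeneity term

_, rss3 = poly_fit_rss(z_bar, target, order=3)
coef7, rss7 = poly_fit_rss(z_bar, target, order=7)
cond7 = np.linalg.cond(poly_design(z_bar, 7))
print(f"RSS order 3: {rss3:.3e}, order 7: {rss7:.3e}, condition number: {cond7:.2e}")
```

Rescaling $\hat{\bar{z}}_{ij}$ or using orthogonal polynomials would improve the conditioning. The sketch cannot say whether doing so would change the substantive estimates.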

There is very little loss of efficiency from including the regulation data in column [3] relative to column [2], or from estimating the polynomial equation on the full sample without an additional exclusion restriction. For the regulation sample, the variables Island, Colonial and Currency Union continue to be poorly estimated, as in the non-linear estimation. The higher-order polynomial terms in this sample are highly statistically significant, which suggests that the approximation may be somewhat inaccurate and that even higher-order terms should be included. This may also explain why the coefficients on some of the trade barriers here differ from those in the non-linear specification (notably distance, whose coefficient is close to zero in columns [2] and [3]). The implications for the validity of HMR's exclusion restriction are similar to those from the non-linear specification, with Reg Cost appearing excludable but not Reg Days. Column [4] presents results for the full sample. The polynomial approximation appears to be less dependent on higher-order terms here, and the coefficients on all of the trade barriers are relatively tightly estimated. The sample-selection correction and heterogeneity-bias terms are much more statistically significant under the modified procedures than under HMR's original, which suggests that the modifications help to identify these effects more accurately.

5 Conclusions

Distinguishing the effects of trade barriers on the intensive and extensive margins of trade is a growing area of research, and HMR have provided an elegant framework with which to disentangle these effects. However, when bringing their approach to the data, it is important to recognise the limitations of existing databases. In particular, insufficient attention has so far been given to distinguishing trade flows that are actually zero from those that are unreported. This issue is likely to be even more pressing for scholars using disaggregated trade data, for which the likelihood of under-reporting is presumably higher.