Systematic unexplained variance of standards-based human rights measures 1


Anita Rosemary Gohdes
University of Mannheim & Peace Research Institute Oslo
Email: anita.gohdes@uni-mannheim.de

Todd Landman
University of Essex
Director, Institute for Democracy and Conflict Resolution
Email: todd@essex.ac.uk

Draft, October 2012. Please do not cite without permission (comments welcome).

Abstract

Research, policy analysis and conditional aid policy among some donor countries rely on standards-based measures of country human rights performance. These measures code annual performance based on narrative reports published by the US State Department and Amnesty International. In particular, these include the Political Terror Scales (PTS), the Cingranelli and Richards (CIRI) Physical Integrity Rights Index, and the Scale of Torture by Oona Hathaway. In this paper, we compare these measures in two steps. First, we use each measure to estimate a model of human rights performance that includes explanatory variables for the protection of civil and political rights for which there is widespread consensus in the literature. We then use this model to produce in-sample and out-of-sample predictions of human rights performance for each country between 1982 and 2004. Comparing these predictions across different measures, time periods, and geographical regions allows us to assess differences and commonalities of these most widely used indicators for human rights protection in a way that moves beyond mere case comparisons or correlations. Our analysis reveals that the predictions we make of the state of human rights in the world depend on the human rights measures we use. Furthermore, we find that the models tend to systematically over-predict the level of human rights protection in the 1990s, both at the global and the regional level.

1 Paper prepared for the Workshop on Comparative Observation with Numbers: A critical assessment of standards-based human rights measures, Blankenbach, 26-27 October 2012. This paper builds on: Todd Landman, David Kernohan and Anita Gohdes (forthcoming) Relativising Human Rights, Journal of Human Rights.

Introduction

The quantitative analysis of human rights often features standards-based scales of human rights, which rate country performance on limited ordinal scales coded from narrative accounts of human rights conditions (see Jabine and Claude, 1992; Cingranelli and Richards, 1999; Landman, 2004; Landman and Carvalho, 2009). These scales have also been used by policy analysts in the international donor community in studies on governance (UNDP, 2006), in the allocation of aid through such initiatives as the Millennium Challenge Account, and in other projects that need to assess the human rights performance of global samples of countries (Landman and Carvalho, 2009). Two of the most popular are the Political Terror Scale, which comprises two separate scales that code US State Department Country Reports and Amnesty International Annual Reports respectively (see Poe, Carey, and Vazquez, 2001), and the Cingranelli and Richards physical integrity rights index, a composite index comprising individual measures of disappearances, extrajudicial killing, political imprisonment, and torture coded from US State Department Country Reports. In addition, Oona Hathaway (2002) devised a scale of torture using the same coding methods used for the Political Terror Scale.

These scales have served primarily as dependent variables in increasingly complex statistical models using pooled cross-section time-series (PCTS) data sets (see Landman, 2005; 2009a; 2009b) that identify key explanatory variables. In our previous work (Landman, Kernohan, and Gohdes, 2012), we argued that absolute comparison of these different scales to judge country performance ignores the different socio-economic and institutional conditions that may have an effect on the protection of human rights, and we created a relativised score of human rights in two stages. First, we created a human rights factor index from these different scales given their high degree of inter-correlation (see below).
Second, we estimated a model of human rights performance, where the factor index was modeled as a function of explanatory variables about which there is a large degree of academic consensus. We then used the residuals of this model as a meaningful measure of the over- and under-performance of countries given the presence of this set of variables. The analysis provided an alternative way of ranking country performance and showed that many countries actually have much better human rights records than expected, while others have worse human rights records than we would expect, given their socio-economic situation.

In this paper, we build on this previous work, but focus more specifically on differences between the standards-based measures. Whereas our previous paper aimed at providing a better understanding of relative human rights protection by constructing a composite index and comparing the performance of different countries and regions, we now examine how the predictions of human rights change depending on the measure we use. We analyze the systematic unexplained variance of the two versions of the Political Terror Scale, the Cingranelli and Richards physical integrity rights index, and the scale of torture. Each individual scale is used to estimate a model of human rights that includes the collection of explanatory variables from our previous work. As these scales are limited ordinal scales, we use an ordered logistic regression model as our main method of estimation. For each model, we produce in-sample and out-of-sample predictions of human rights performance for each country-year observation between 1982 and 2004. Comparing these predictions across different measures, time periods, and geographical regions allows us to assess differences and commonalities of these most widely used indicators for human rights protection.

We begin by discussing the basic features of the scales and their high degree of inter-correlation. We then outline the consensus model of human rights and explain each of the variables that make up that model. We then discuss the procedure and results of the in-sample and out-of-sample predictions, and compare the results for the different scales. We conclude by discussing the implications of our analysis for the use of standards-based scales in academic and policy research.
Existing scales of human rights performance

The development of standards-based measures of human rights has moved from fairly broad conceptions of the relative freedom in a country 2 to more narrowly defined sets of human rights that have in some cases included worker rights, women's economic rights, and women's social rights 3, as well as measures of the de jure commitment of states to human rights through measuring the treaty ratification behavior of states (Keith 1999; Landman 2005; Landman and Carvalho 2009). This present paper is concerned with comparing measures that capture the variable de facto protection of civil and political rights using what are known as standards-based scales. These scales use source material on human rights practices within countries and apply coding protocols (based on international human rights standards) to the information to derive a set of comparable measures for cross-national and time-series analysis.

2 www.freedomhouse.org
3 www.humanrightsdata.com

[Table 1 here]

Table 1 summarizes the four standards-based measures compared in this paper. These measures include the two versions of the Political Terror Scale (one coded using US State Department Reports and one coded using Amnesty International Reports), the scale of torture from Oona Hathaway (2002), and the Physical Integrity Rights Index from Cingranelli and Richards. Each of the scales provides a measure of violations of civil and political rights, including such rights as freedom from arbitrary detention, torture, extra-judicial killings, disappearances, and exile, freedom of speech, freedom of expression and belief, and freedom of assembly and association. The Political Terror Scales and the Torture scale both range from 1 to 5, whereas the physical integrity index ranges from 0 to 8. For easier comparability, we transformed all measures to range from low (bad rights protection) to high (good rights protection). The table shows that despite the differences in emphasis across the scales, there is considerable overlap between them, as evidenced by the statistically significant inter-correlations. The correlations for the torture scale are the lowest across the board, which reflects the scale's narrower focus on this form of human rights abuse, but the values within the table range from 0.52 to 0.79, and are all statistically significant at the 1% level.

Building a model of human rights performance

We build our model of human rights performance on existing research in the social and political sciences, which has led to a general consensus on the basic model of human rights protection (see Landman 2005a, Davenport, 2007).
Since the first cross-national statistical analysis of human rights in the late 1980s (Mitchell and McCormick 1988), there has been a proliferation of studies using increasingly large and complex data sets and an expanding list of independent variables (see Landman 2005a; Moore 2006). These variables most notably include the level, pace, and quality of economic development (e.g. Henderson 1991; Poe and Tate 1994; Poe, Tate, and Keith 1999); the level, timing, and quality of democratization (e.g. Davenport, 1999; Zanger, 2000b; Davenport and Armstrong, 2004; Mesquita, Downs, Smith, and Sherif, 2005); involvement in internal and external conflict (Poe and Tate, 1994; Poe, Tate, and Keith, 1999); the size and growth of the population (Henderson, 1993; Poe and Tate, 1994; Poe, Tate and Keith, 1999); a country's history of repressive activities (Davenport, 1996; Poe and Tate, 1994; Poe, Tate and Keith, 1999); foreign direct investment and/or the presence of multinationals (Meyer, 1996; 1998; 1999a; 1999b; Smith, Bolyard, and Ippolito, 1999); the level of global interdependence (Landman 2005b); and the growth and effectiveness of international human rights law (Keith 1999; Hathaway 2002; Landman 2005b; Neumayer 2005; Hafner-Burton and Tsutsui 2005, 2007; Simmons 2009).

Our selection of explanatory variables includes income and land inequality, the level of democracy, the level of economic development, domestic conflict, population size, and previous levels of repression. This collection of variables represents the set that has received the most support or generated the highest consensus within the cross-national quantitative research on human rights (see Landman 2005a, 2009 for a summary). Each of these variables and the ways in which they are operationalized are discussed in turn.

Inequality

For income inequality, we use a new measure based on the University of Texas Inequality Project (UTIP) developed by James K. Galbraith and Hyunsub Kum at the University of Texas, Austin. In an effort to overcome the well-known deficiencies of the Deininger and Squire (1996) data set on income inequality (i.e. sparse coverage, problematic measurements, and the combination of diverse data types into a single data set), Galbraith and Kum use the UTIP-UNIDO measures of manufacturing pay inequality as an instrument to create a new panel data set of Estimated Household Income Inequality (EHII), which covers a large panel of countries from 1963 through 1999, for nearly 3200 country-years. This new data set provides comparable and consistent measurements across space and through time, and is thus a more valid proxy of income inequality than the Deininger and Squire data usually employed by cross-national empirical studies (Galbraith and Kum 2004). For our estimations, a linear interpolation of the original EHII variable has been computed for each country series to increase the number of observations.

For land inequality, we use a measure that is expressed as the area of family farms as a percentage of the total area of land holdings (Vanhanen 1997). The reasoning behind this measure is that the higher the percentage of family farms, the more widely economic power resources based on ownership patterns of agricultural land are distributed (Vanhanen 1997: 47). Family farms are defined as holdings that are mainly cultivated by the holder family and that are owned by the cultivator family or held in ownerlike possession (Vanhanen 1997: 49). The data on landownership were mainly derived from the FAO World Censuses of Agriculture (from the 1960s to the 1980s) and Vanhanen's own estimations for the 1990s. As with our income inequality data, these data have been interpolated to fill in missing time points for those countries where two or more time points of data were available. To make this variable point in the same direction as income inequality, it has been inverted by subtracting the original percentage value from 100, such that a low score means a more favorable distribution of land.

Democracy

For the level of democracy, we use a modified version of the Polity IV 20-point combined democracy score (DEMOC - AUTOC), which ranges from -10 to +10. Following Vreeland (2008), we use the X-POLITY variable, which includes most of the components of the combined POLITY score, but takes out the components for competitiveness of political participation (PARCOMP) and regulation of political participation (PARREG), since both of these components contain elements of political violence and suppression. Vreeland argues that their inclusion does not make sense for research on civil war, and we agree that the same holds true for research on the violation of civil and political rights, since both components contain features that are also found in measures of human rights.
Even though a large number of human rights studies use the original Polity measure (see Poe and Tate 1994; Poe, Tate, and Keith 1999; Davenport 1999; Zanger 2000a; Davenport and Armstrong 2004; Mesquita, Downs, Smith, and Sherif 2005), we are persuaded by Vreeland's argument, and we also expect a positive relationship between his modified Polity measure and our measure of human rights. Certainly, the work of Mesquita et al. (2005) shows that particular components of the Polity measure are indeed related to human rights.

Previous Levels of Repression

Extensive studies find past levels of personal integrity abuse to be a reliable predictor of future repressive behavior of states (Davenport, 1996; Poe, Tate and Keith, 1999; Poe and Tate, 1994, to name but a few). Institutional change is slow, and coercive behavior, once exercised, tends to become part of a government's portfolio of responses to internal challenges (Poe, 2004). To control for previous levels of repression, we include lagged binary indicators of the dependent variable in each model (see Hafner-Burton, 2005), to account for the non-linearity of the repression measures. For example, for the Political Terror Scale, which ranges from 1 to 5, we include repression=1(t-1), repression=2(t-1), repression=3(t-1), and repression=4(t-1), where the indicator matching the previous year's level of repression takes on (1) and the other indicators take on (0). 4

Domestic conflict

As in the research on human rights and political violence, we include a variable for internal domestic conflict, which is specified as an independent variable alongside the other variables in our model. We do not use the conventional dummy variable for civil war from the Correlates of War project, nor do we use events-based measures of the kind coded from single and multiple news sources found in the literature on political violence. The civil war dummy is still a fairly crude variable that tends to absorb quite a lot of the explanatory space in most human rights literature (see Poe and Tate, 1994), and events-based measures are not available for all countries and years under consideration in this study. We thus employ the International Country Risk Guide (ICRG) measure of internal conflict, which is an aggregate 12-point scale that comprises the overall risk levels for civil war and threat of a military coup, terrorism and political violence, and general levels of civil disorder. We consider this measure superior in some respects since it provides greater variance than the civil war dummy. We expect this variable to have a negative relationship with the protection of human rights, which is consistent with the findings in both the literature on human rights abuse and the literature on political violence.

4 If the previous level of repression is 5, then all four variables take on (0).
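The lagged indicator coding for previous levels of repression, including the reference category described in footnote 4, can be sketched as follows. This is a minimal illustration in Python and not the authors' own code: the function name lagged_indicators and the toy score series are hypothetical. For a scale with J categories it builds J-1 binary indicators of the previous year's score, with the highest category as the omitted reference.

```python
def lagged_indicators(scores, n_categories=5):
    """Build J-1 lagged binary indicators for one country's yearly scores.

    scores: ordinal scores (1..n_categories), ordered by year.
    Returns one row per year from the second year onwards. Position k-1 of
    a row is 1 if the previous year's score was k, else 0. The highest
    category is the omitted reference, so a previous score equal to
    n_categories yields a row of zeros (compare footnote 4).
    """
    rows = []
    for previous in scores[:-1]:
        row = [1 if previous == k else 0 for k in range(1, n_categories)]
        rows.append(row)
    return rows


# Hypothetical PTS series for one country over four consecutive years.
example_scores = [3, 4, 5, 2]
example_rows = lagged_indicators(example_scores)
for offset, row in enumerate(example_rows, start=1):
    print(f"year t0+{offset}: lagged indicators = {row}")
```

In an actual estimation these rows would be appended to the other explanatory variables for the matching country-year; the first year of each country series has no lag and would drop out.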

Other control variables

The level of economic development is measured through the natural log of the value of real per capita income (GDP, constant 2000 US $), and is taken from the World Bank Development Indicators. We expect this variable to have a positive relationship with the protection of human rights. Total population size is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship, except for refugees not permanently settled in the country of asylum, who are generally considered part of the population of their country of origin. The variable is taken from the World Bank and has been logged to correct for its skewed distribution. We expect this variable to have a negative relationship with the protection of human rights, since more populous countries tend to have greater difficulty in protecting personal integrity rights.

Modeling In-sample and Out-of-sample predictions

Our data set follows what has by now become a standard construction of cross-section and time-series units, where variation in the variables and the number of observations is maximized across time and space. Such data sets do, however, present a number of problems for estimating parameters using standard regression techniques. First, the error terms tend to be correlated from one time period to the next, which is known as autocorrelation. Second, the error terms tend to be heteroskedastic, which means that their variances tend to differ across units (Stimson 1985: 19; Beck and Katz 1995: 637-638). To control for autocorrelation, we model the dynamics of our data by including J-1 lagged dummy variables that account for the J categories of each standards-based measure (see Hafner-Burton, 2005). We use the lagged dummy variables in order to relax the assumption that respect for human rights protection, temporally speaking, follows a linear trend.
The inclusion of previous levels of repression in the model is thus both theoretically and statistically relevant. To control for heteroskedasticity, we adopt a variation of White's (1980) estimator of robust standard errors that adjusts for clustering across countries. 5

5 The final adjustment made to the data set was to use a popular method to address the problem of missing data. Some of our variables have less frequent observations than others and create patches in the data set with missing values. We used Gary King's multiple imputation method in R to estimate values for which data are missing (Honaker, King and Blackwell 2012). The method uses algorithms to impute missing values for all variables. Since the algorithms randomly draw from the distributions they assume the variables to follow, the values imputed are a sample, and he thus recommends multiple imputations to make sure that the results are not driven by the imputed values themselves.

In order to compare our measures, we calculate in-sample and out-of-sample predictions of human rights performance, relying on the set of explanatory variables introduced above. Since all four standards-based measures are ordinal, we estimate ordered logistic models. The ordered logistic model provides us with a predicted probability for each level of human rights respect in a country. This is the probability that, given a certain set of explanatory variables, a country will fall into a specific category of our human rights measure. To give an example, if we look at the model using the PTS Amnesty data, our model would tell us that in 1982, Haiti has a predicted probability of 3% to have a PTS score of 1, 27% to have a score of 2, 52% to have a score of 3, 15% to have a score of 4, and 1% to have the most repressive score of 5. The actual score for Haiti in 1982 is 3, so in this case our model does a good job of predicting the human rights score, since 3 has the highest probability associated with it. We use the highest predicted probability to determine the predicted value for each observation, which in the case of Haiti in 1982 would be category 3 with a predicted probability of 52%.

In other cases, the highest predicted probability might fall into a different category than the one actually assigned by the original coding of the human rights measure. Here, we are interested to see whether our model over- or under-predicts the human rights performance of a country. If our model predicts a lower category than the actual value, a country is more repressive than we would predict, and thus our model over-predicts human rights protection in this country. If our model predicts a higher category than the actual value, then our model under-predicts human rights protection; this country is in fact performing better than we would expect, based on our model. The process is called in-sample prediction, as we use the information contained in the statistical model to predict the level of human rights respect of observations in our sample.

Whereas in-sample predictions are a useful way of comparing how well our different human rights measures are explained by the explanatory variables, we are also interested in assessing how well our models would perform in explaining levels of human rights protection and abuse that occurred outside of the period under investigation. We thus want to compare the out-of-sample predictions of our measures. Ideally, we would like to see how well our models fare in predicting levels of repression in the future or how well they would explain human rights protection 50, 100 or 200 years ago. Unfortunately, such an endeavor is not possible due to the lack of data on these events. We therefore artificially create an out-of-sample data set by dividing our observations into two samples, the first including observations running from 1982 to 1990, and the second from 1991 to 2004. We then use the first sample to estimate our model of human rights protection for each of the four standards-based measures. The parameters estimated in these models are then used to predict levels of human rights performance for the second sample, from 1991 to 2004. Finally, we compare these predictions with actual measures in this period and assess when and where our model over- and under-predicts human rights performance.

Since the Cingranelli and Richards physical integrity scale has a wider range of values, we cannot directly compare its predictions to those of the other three measures. Figures 1 through 6, therefore, display the aggregated in-sample and out-of-sample predictions for the PTS and Torture human rights measures, plotting them against actual mean scores.

[Figures 1-6 about here]

Figure 1 shows the three scales plotted over time for the whole world and includes the predicted mean, the in-sample predicted means and the out-of-sample predicted means. Across the three scales, we can see that the PTS coded from Amnesty International reports has the tightest overall plot, suggesting that for this scale, our model has a good overall prediction and good in-sample and out-of-sample predictions. The torture scale also has reasonably tight plots, while the State Department PTS has greater dispersion over time among the three lines. While our model over-predicts performance across all the scales, the over-prediction is greatest for the State Department PTS scale, and greater for the out-of-sample predictions, where it increases over time.
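The prediction rule and the estimation/prediction split that underlie these figures can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the linear predictor and cutpoints in the toy example are invented, and only the Haiti 1982 probabilities are taken from the text. The over- and under-prediction labels follow the convention stated above, on a coding where a higher category means more repression.

```python
import math


def ordered_logit_probs(linear_predictor, cutpoints):
    """Category probabilities of an ordered logit model.

    P(y <= j) = 1 / (1 + exp(-(cutpoint_j - linear_predictor))); each
    category probability is the difference between adjacent cumulative
    probabilities. cutpoints must be strictly increasing.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(c - linear_predictor))) for c in cutpoints]
    cumulative.append(1.0)
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


def predicted_category(probs):
    """Return the 1-based category with the highest predicted probability."""
    return max(range(len(probs)), key=lambda i: probs[i]) + 1


def classify(predicted, actual):
    """Label a prediction on a scale where a higher score = more repression."""
    if predicted < actual:
        return "over-predicts protection"
    if predicted > actual:
        return "under-predicts protection"
    return "correct"


def split_by_period(rows, last_estimation_year=1990):
    """Split country-year rows into estimation and prediction samples."""
    estimation = [r for r in rows if r["year"] <= last_estimation_year]
    prediction = [r for r in rows if r["year"] > last_estimation_year]
    return estimation, prediction


# Probabilities reported in the text for Haiti in 1982 (PTS Amnesty, scores 1-5).
haiti_1982 = [0.03, 0.27, 0.52, 0.15, 0.01]
haiti_prediction = predicted_category(haiti_1982)
print(haiti_prediction, classify(haiti_prediction, actual=3))  # prints: 3 correct

# Hypothetical linear predictor and cutpoints, for illustration only.
toy_probs = ordered_logit_probs(0.4, [-2.0, -0.5, 1.0, 2.5])
print([round(p, 3) for p in toy_probs])

# Hypothetical country-year rows split at the 1990/1991 boundary.
toy_rows = [{"country": "A", "year": y} for y in (1982, 1990, 1991, 2004)]
estimation_rows, prediction_rows = split_by_period(toy_rows)
print(len(estimation_rows), len(prediction_rows))
```

In the out-of-sample setting, the cutpoints and coefficients would be estimated on the 1982 to 1990 rows only and then applied unchanged to the 1991 to 2004 rows before the predicted and actual categories are compared.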
Figures 2 through 6 comprise similar plots broken down by region, where the general findings for the global analysis are replicated at the regional level. The Amnesty International PTS and torture scales have much tighter plots over time, while the State Department PTS shows greater dispersion. In all cases, the out-of-sample predictions show the greatest difference from the predicted means from our modeling, suggesting that countries do worse over time than expected. Indeed, across the plots, the out-of-sample predictions are predominantly lower than the actual values (on these scales, lower scores indicate better protection), which means that our model, trained on data pre-dating the end of the Cold War, over-predicts the human rights performance of different regions after 1990. Across regions, overall performance gets worse towards the end of the period in Asia, slightly better in Africa, slightly better in the Middle East, much better in Eastern Europe, and much better in Latin America.

[Figures 7a and 7b about here]

Figures 7a and 7b compare the mean human rights performance with the in-sample and out-of-sample predictions of the inverted physical integrity index by Cingranelli and Richards (0 = high protection, 8 = total repression). At the global level, the in-sample predictions under-predict the level of human rights protection prior to 1990, but from 1991 onwards switch to over-predicting the level of respect for human rights in the world. The out-of-sample predictions almost continuously over-predict human rights performance, which means that our model would assume the world to be in a much better state of protecting human rights than it actually was and is. Comparing different regions, we see that both the in-sample and out-of-sample predictions for Asia are much more optimistic from the mid-1990s onwards than the actual measure indicates. The model does a slightly better job of predicting levels of human rights abuse in Latin America and Sub-Saharan Africa, although it generally paints a more peaceful picture of these regions than the actual measure of physical integrity rights does.

Discussion and Implications

The analysis here raises important questions that have implications for using standards-based scales of human rights. First, the differences between the scales coded from Amnesty International reports and those coded from US State Department reports suggest that the Amnesty-coded scales are more consistent in terms of explanatory modeling, in-sample predictions and out-of-sample predictions.
In this way, the Amnesty-coded scales are more reflective of the underlying socio-economic and institutional conditions that have predominantly been discussed in the quantitative human rights literature, whereas the other measures display higher levels of systematic unexplained variance. Second, for the other scales, our modeling tends to over-predict performance both in general terms and over time, with the difference between actual and expected values growing for both in-sample and out-of-sample predictions. In general, all out-of-sample predictions, based on the Cold War era of the 1980s, paint a much more optimistic picture of the world and its regions than what was actually observed in the 1990s and early 2000s. Third, these differences suggest that analyses that rely solely on one of these scales may well misrepresent global and regional trends in human rights performance over time. Using all four measures and comparing model estimations across them is the superior strategy for human rights research seeking to understand general cross-national and time-series patterns.

Bibliography

Amnesty International (2007) From burning buses to caveirões: the search for human security, London: Amnesty International, 2 May 2007; AMR 19/010/2007.
Arat, Zehra (1991) Democracy and Human Rights in Developing Countries, Boulder: Lynne Rienner.
Beck, N. and Katz, J. N. (1995) 'What to Do (And Not to Do) with Time-Series Cross-Section Data', American Political Science Review, 89 (3): 634-47.
Brockett, Charles D. (1992) Measuring Political Violence and Land Inequality in Central America, American Political Science Review, 86 (1): 169-176.
Bueno de Mesquita, B., Downs, G. W., Smith, A. and Cherif, F. M. (2005) Thinking Inside the Box: A Closer Look at Democracy and Human Rights, International Studies Quarterly, 49: 439-457.
Cingranelli, D. and Richards, D. (1999) Measuring the Level, Pattern and Sequence of Government Respect for Physical Integrity Rights, International Studies Quarterly, 43: 407-417.
Cingranelli, D. L. and Richards, D. (2007) Measuring government effort to respect economic and social human rights: A peer benchmark, in S. Hertel and L. Minkler (eds) Economic Rights: Conceptual, Measurement, and Policy Issues, Cambridge: Cambridge University Press.
Coppedge, M. (2005) Explaining Democratic Deterioration in Venezuela Through Nested Inference, in Frances Hagopian and Scott Mainwaring (eds) The Third Wave of Democratization in Latin America, Cambridge: Cambridge University Press, 289-316.
Davenport, C. (1995) Multi-dimensional threat perception and state repression, American Journal of Political Science, 39(3): 683-713.
Davenport, C. (1996) Constitutional promises and repressive reality: A cross-national time-series investigation of why political and civil liberties are suppressed, The Journal of Politics, 58(3): 627-54.

Davenport, C. (1996) The Weight of the Past: Exploring Lagged Determinants of Political Repression, Political Research Quarterly, 49: 377-403.
Davenport, C. and Armstrong, D. A. (2004) Democracy and the violation of human rights: A statistical analysis from 1976 to 1996, American Journal of Political Science, 48(3): 538-554.
Davenport, C. (2007) State repression and political order, Annual Review of Political Science, 10: 1-23.
Deininger, K. and Squire, L. (1996) A New Data Set Measuring Income Inequality, World Bank Economic Review, 10: 565-591.
Duvall, R. and Shamir, M. (1980) Indicators from Errors: Cross-National Time-Serial Measures of the Repressive Disposition of Government, in Charles Lewis Taylor (ed) Indicator Systems for Political, Economic, and Social Analysis, Cambridge, MA: Oelgeschlager, Gunn and Hain Publishers, Inc., 155-82.
Foweraker, J. and Landman, T. (1997) Citizenship Rights and Social Movements: A Comparative and Statistical Analysis, Oxford: Oxford University Press.
Foweraker, J. and Landman, T. (2004) Economic Development and Democracy Revisited: Why Dependency Theory Is Not Yet Dead, Democratization, 11: 1-20.
Galbraith, J. K. and Kum, H. (2004) Estimating the Inequality of Household Incomes: A Statistical Approach to the Creation of a Dense and Consistent Global Data Set, UTIP Working Paper N.22 (revised version), [online] http://utip.gov.utexas.edu/papers/utip_22rv5.pdf
Gibney, M. and Dalton, M. (1996) The Political Terror Scale, in D. Cingranelli, Human Rights and Developing Countries, Greenwich, CT: JAI Press, 73-84.
Gibney, M., Dalton, M. and Vockell, M. (1992) USA refugee policy: A human rights analysis update, Journal of Refugee Studies, 5(1): 37-46.
Gibney, M. and Stohl, M. (1988) Human rights and US refugee policy, in M. Gibney (ed.) Open Borders? Closed Societies? The Ethical and Political Issues, Westport, CT: Greenwood Press.

Hafner-Burton, E. M. and Tsutsui, K. (2005) Human rights practices in a globalizing world: The paradox of empty promises, American Journal of Sociology, 110(5): 1373-1411.
Hafner-Burton, E. M. and Tsutsui, K. (2007) Justice Lost! The failure of international human rights law to matter where needed most, Journal of Peace Research, 44(4): 407-425.
Hathaway, O. (2002) Do treaties make a difference? Human rights treaties and the problem of compliance, Yale Law Journal, 111: 1932-2042.
Helliwell, J. F. (1994) Empirical linkages between democracy and economic growth, British Journal of Political Science, 24: 225-48.
Henderson, C. (1991) Conditions affecting the use of political repression, Journal of Conflict Resolution, 35(1): 120-142.
Henderson, C. (1993) Population pressures and political repression, Social Science Quarterly, 74: 322-33.
Honaker, J., King, G. and Blackwell, M. (2012) AMELIA II: A Programme for Missing Data, Version 1.6 (February 23, 2012).
Keith, L. C. (1999) The United Nations International Covenant on Civil and Political Rights: Does it make a difference in human rights behaviour? Journal of Peace Research, 36(1): 95-118.
Knack, S. (2002) Governance and Growth: Measurement and Evidence, Forum Series on the Role of Institutions in Promoting Growth, Washington DC: IRIS Center and USAID.
Landman, T. (2004) Measuring Human Rights: Principle, Practice, and Policy, Human Rights Quarterly, 26: 906-931.
Landman, T. (2005a) Protecting Human Rights: A Comparative Study, Washington D.C.: Georgetown University Press.
Landman, T. (2005b) Review article: The political science of human rights, British Journal of Political Science, 35(3): 549-572.

Landman, T. (2009a) Measuring Human Rights, in Michael Goodhart (ed) Human Rights: Politics and Practice, Oxford: Oxford University Press, pp. 47-60.
Landman, T. (2009b) Political Science and Human Rights, in Rhiannon Morgan and Bryan Turner (eds) Interpreting Human Rights: Social Science Perspectives, London: Routledge, pp. 23-43.
Landman, T. and Carvalho, E. (2009) Measuring Human Rights, London: Routledge.
Landman, T. and Larizza, M. (2009) Inequality and human rights: Who controls what, when, and how, International Studies Quarterly, 53 (3): 715-736.
Landman, T., Kernohan, D. and Gohdes, A. (forthcoming) Relativising Human Rights, Journal of Human Rights.
Larizza, M. (2008) The State, Democracy and Human Rights: Latin America in Comparative Perspective, Unpublished PhD Dissertation, University of Essex, DXN 121856.
Mitchell, N. and McCormick, J. M. (1988) Economic and political explanations of human rights violations, World Politics, 40: 476-498.
Moore, W. (2006) Synthesis v. Purity and Large-N Studies: How Might We Assess the Gap between Promise and Performance? Human Rights and Human Welfare, 6: 89-97.
Meyer, W. (1996) Human rights and MNCs: Theory versus quantitative analysis, Human Rights Quarterly, 18(2): 368-397.
Meyer, W. (1998) Human Rights and International Political Economy in Third World Nations, London: Praeger.
Meyer, W. (1999a) Confirming, infirming, and "falsifying" theories of human rights: Reflections on Smith, Bolyard, and Ippolito through the lens of Lakatos, Human Rights Quarterly, 21(1): 220-228.
Meyer, W. (1999b) Human rights and international political economy in third world nations: Multinational corporations, foreign aid, and repression, Human Rights Quarterly, 21(3): 824-830.

Muller, E. N. and Seligson, M. A. (1987) Inequality and insurgency, American Political Science Review, 81(2): 425-451.
Neumayer, E. (2005) Do international human rights treaties improve respect for human rights?, Journal of Conflict Resolution, 49(6): 925-953.
Poe, S. and Sirirangsi, R. (1993) Human rights and U.S. economic aid to Africa, International Interactions, 18(4): 1-14.
Poe, S. and Sirirangsi, R. (1994) Human rights and U.S. economic aid during the Reagan years, Social Science Quarterly, 75: 494-509.
Poe, S. and Tate, C. N. (1994) Repression of human rights to personal integrity in the 1980s: A global analysis, American Political Science Review, 88: 853-872.
Poe, S., Tate, C. N. and Keith, L. C. (1999) Repression of the human right to personal integrity revisited: A global cross-national study covering the years 1976-1993, International Studies Quarterly, 43: 291-313.
Poe, S. C., Carey, S. C. and Vazquez, T. C. (2001) How are These Pictures Different? A Quantitative Comparison of the US State Department and Amnesty International Human Rights Reports, 1976-1995, Human Rights Quarterly, 23: 650-677.
Poe, S. C. (2004) The Decision to Repress: An Integrative Theoretical Approach to the Research on Human Rights and Repression, in Sabine C. Carey and Steven C. Poe (eds) Understanding Human Rights Violations: New Systematic Studies, 16-38.
Prosterman, R. L. and Riedinger, J. M. (1987) Land Reform and Democratic Development, Baltimore: Johns Hopkins University Press.
Rabushka, A. and Shepsle, K. A. (eds) (1972) Politics in Plural Societies: A Theory of Democratic Instability, Columbus, Ohio: Merrill.
Risse, T., Ropp, S. C. and Sikkink, K. (eds) (1999) The Power of Human Rights: International Norms and Domestic Change, Cambridge: Cambridge University Press.
Russett, B. M., Alker, H. R., Deutsch, K. W. and Lasswell, H. D. (1964) World Handbook of Political and Social Indicators, New Haven: Yale University Press.

Simmons, B. (2009) Mobilizing for Human Rights: International Law in Domestic Politics, Cambridge: Cambridge University Press.
Stimson, J. (1985) Regression in Space and Time: A Statistical Essay, American Journal of Political Science, 29: 914-47.
Vanhanen, T. (1997) The Prospects of Democracy, London: Routledge.
Vreeland, J. R. (2008) The Effect of Political Regime on Civil War: Unpacking Anocracy, Journal of Conflict Resolution, 52: 401-425.
White, H. (1980) A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity, Econometrica, 48 (4): 817-838.
Zanger, S. C. (2000a) Good governance and European aid: The impact of political conditionality, European Union Politics, 1(3): 293-317.
Zanger, S. C. (2000b) A global analysis of the effect of regime change on life integrity violations, 1977-93, Journal of Peace Research, 37(2): 213-233.

Tables and Figures

Table 1: Correlations of Human Rights Measures

                   PTS (AI)   PTS (SD)   Torture
PTS (AI)           1.0000
PTS (SD)           0.7811*    1.0000
Torture            0.5217*    0.5876*    1.0000
CIRI Phys. Int.    0.7473*    0.7917*    0.5923*

Note: * significant at the .001 level.

Figure 1: Global Human Rights Performance. Panels: PTS (AI), PTS (SD), Torture Scale; each plots the global mean.

Figure 2: Human Rights Performance: Asia. Panels: PTS (AI), PTS (SD), Torture Scale; each plots the Asian mean.

Figure 3: Human Rights Performance: Sub-Saharan Africa. Panels: PTS (AI), PTS (SD), Torture Scale; each plots the Sub-Saharan Africa mean.

Figure 4: Human Rights Performance: Middle East. Panels: PTS (AI), PTS (SD), Torture Scale; each plots the Middle East mean.

Figure 5: Human Rights Performance: Eastern Europe. Panels: PTS (AI), PTS (SD), Torture Scale; each plots the Eastern Europe mean.

Figure 6: Human Rights Performance: Latin America. Panels: PTS (AI), PTS (SD), Torture Scale; each plots the Latin America mean.

Figure 7a: Global and Regional Comparison of CIRI Physical Integrity Index. Inverted Physical Integrity Index (CIRI). Panels: Global, Asia, Eastern Europe; each plots the mean.

Figure 7b: Regional Comparison of CIRI Physical Integrity Index. Inverted Physical Integrity Index (CIRI). Panels: Latin America, Sub-Saharan Africa, Middle East; each plots the mean.