PSYCHOLOGICAL SCIENCE
Research Article

The Unexpected Empirical Consensus Among Consensus Methods

Michel Regenwetter,¹ Aeri Kim,¹ Arthur Kantor,¹ and Moon-Ho R. Ho²
¹University of Illinois at Urbana-Champaign and ²Nanyang Technological University, Singapore

ABSTRACT

In economics and political science, the theoretical literature on social choice routinely highlights worst-case scenarios and emphasizes the nonexistence of a universally best voting method. Behavioral social choice is grounded in psychology and tackles consensus methods descriptively and empirically. We analyzed four elections of the American Psychological Association using a state-of-the-art multimodel, multimethod approach. These elections provide rare access to (likely sincere) preferences of large numbers of decision makers over five choice alternatives. We determined the outcomes according to three classical social choice procedures: Condorcet, Borda, and plurality. Although the literature routinely depicts these procedures as irreconcilable, we found strong statistical support for an unexpected degree of empirical consensus among them in these elections. Our empirical findings stand in contrast to two centuries of pessimistic thought experiments and computer simulations in social choice theory and demonstrate the need for more systematic descriptive and empirical research on social choice than exists to date.

Social choice theory has long been dominated by normative, rational choice results from economics and political science. In contrast, behavioral social choice (Regenwetter, Grofman, Marley, & Tsetlin, 2006) is grounded in psychology. It puts social choice theory to the empirical test. As is the case for descriptive theories of individual choice (Tversky & Kahneman, 1974), descriptive theories of social choice may complement, and possibly contrast with, rational theory.
Although behavioral decision research, especially the heuristics and biases literature, documents how individuals frequently fall short of rationality benchmarks, the findings from our research program, including those reported here, suggest that social choice by real people may bypass the pessimistic predictions of normative theory.

Address correspondence to Michel Regenwetter, Department of Psychology, University of Illinois at Urbana-Champaign, 603 E. Daniel St., Champaign, IL 61820, e-mail: regenwet@uiuc.edu.

We analyzed the ballots of the five-candidate American Psychological Association (APA) presidential elections from 1998 through 2001. In these elections, the number of ballots (N) ranged from 17,482 in 2001 to 20,239 in 2000 (see Table 1). Such values of N are two to three orders of magnitude larger than the sample size of most laboratory studies on consensus methods (for a review of group consensus research, see Hastie & Kameda, 2005). Our data are also exceptionally rich in that they provide (partial) preference rankings of five choice alternatives, not just single candidate choices. Another major asset of these data over laboratory data is that, as Chamberlin, Cohen, and Coombs (1984) explained, APA elections are excellent proxies for major political elections with real stakes. Although the presidencies of many academic societies are largely honorary, the APA president actively represents and promotes the diverse, and sometimes conflicting, interests of a constituency ranging from practitioners to basic research scientists.

A centerpiece of this project resides in our advanced and comprehensive analysis techniques. Because it is difficult to gain access to empirical ballot data (other than plurality ballots), the classical literature is dominated by theoretical work that relies on thought experiments, computer simulations, and mathematical theorems whose underlying assumptions are heavily debated.
We are among the first to systematically integrate mathematical modeling and statistical inference with the descriptive analysis of real social choice data. We formulated very general probabilistic models of the ballot-casting process, grounded in mathematical psychology, and we evaluated our statistical confidence in the inferred social choice outcomes. We also checked how our results depended on the model of decision making underlying the analysis. This methodology was spearheaded by Regenwetter et al. (2006).

Our linear-order model assumes, in accordance with most social choice theory, that individual preferences are linear orders (rankings without ties, i.e., without indifference) of the candidates. Our weak-order model assumes that individual preferences are weak orders (rankings with possible ties, i.e., with possible indifference) of the candidates. Whereas linear orders are the most common representation of preferences in social choice theory, weak orders are the second most common representation. Most contemporary models of individual decision making also imply that each person's preferences have a linear or weak order.

Volume 18, Number 7. Copyright © 2007 Association for Psychological Science.

TABLE 1. Behavioral Social Choice Analysis of the American Psychological Association's Presidential Election Ballots, 1998–2001

| Election | Analysis | Model | Condorcet | Borda | Plurality | AV |
|---|---|---|---|---|---|---|
| 1998 (N = 18,723) | Point estimate | Weak-order | 32145 | 32145 | 35124 | 3 |
| 1998 (N = 18,723) | Point estimate | Linear-order | 32415 | 32415 | 35124 | 3 |
| 1998 (N = 18,723) | Bootstrap | Weak-order | 32145 (86%), 32415, 31245, ... | 32145 (99.1%), 31245 | 35124 (67%), 31524 | 3 |
| 1998 (N = 18,723) | Bootstrap | Linear-order | 32415 (99.8%), ... | 32415 (98%), 32145 | 35124 (67%), 31524 | 3 |
| 1999 (N = 18,398) | Point estimate | Weak-order | 43215 | 43125 | 43152 | 4 |
| 1999 (N = 18,398) | Point estimate | Linear-order | 43215 | 43215 | 43152 | 4 |
| 1999 (N = 18,398) | Bootstrap | Weak-order | 43215 (52%), 43125, ... | 43125 (94%), 43215, 34125 | 43152 (80%), 34152 | 4 (99.6%), ... |
| 1999 (N = 18,398) | Bootstrap | Linear-order | 43215 | 43215 (99.6%), ... | 43152 (80%), 34152 | 4 |
| 2000 (N = 20,239) | Point estimate | Weak-order | 52134 | 52134 | 53214 | 5 |
| 2000 (N = 20,239) | Point estimate | Linear-order | 52134 | 52134 | 53214 | 5 |
| 2000 (N = 20,239) | Bootstrap | Weak-order | 52134 | 52134 (99.7%), ... | 53214 (69%), 52314 | 5 |
| 2000 (N = 20,239) | Bootstrap | Linear-order | 52134 (99.3%), 51234 | 52134 (99.8%), ... | 53214 (67%), 52314 | 5 |
| 2001 (N = 17,482) | Point estimate | Weak-order | 53124 | 53124 | 53124 | 5 |
| 2001 (N = 17,482) | Point estimate | Linear-order | 51324 | 51324 | 53124 | 5 |
| 2001 (N = 17,482) | Bootstrap | Weak-order | 53124 (68%), 51324 | 53124 | 53124 (99.7%), ... | 5 |
| 2001 (N = 17,482) | Bootstrap | Linear-order | 51324 | 51324 (96%), 53124 | 53124 (99.5%), 53214 | 5 |

Note. For the Condorcet, Borda, and plurality procedures, the table presents the social welfare order; for the alternative vote procedure (AV), the table presents the winning candidate. The Condorcet method is used as a benchmark, and disagreements between another procedure's social welfare order and the Condorcet order are underlined. Results with bootstrapped confidence of at least 95% are in boldface. For each bootstrap analysis, the table first reports the outcome with the highest bootstrapped confidence, with its confidence level in parentheses, followed by the other outcomes in decreasing order of confidence. The bootstrapped confidence level is reported for the most likely outcome only, and confidence levels of 100% are omitted. Outcomes with confidence levels below 0.5% are not listed, but their presence is indicated by three dots.

CONSENSUS METHODS

The most famous historic debate over what constitutes a fair election method is the 18th-century argument between Condorcet (1785) and Borda (1770, cited in McLean & Urken, 1995). Because Nobel Laureate Arrow's (1951) impossibility theorem is usually interpreted to mean that there cannot exist a universally, unambiguously optimal consensus method, this debate is ongoing. Different consensus methods can yield different aggregate orders. Regardless of the method, the latter are called social welfare orders.

The Condorcet criterion is also called majority rule (although there exist other common interpretations of what "majority" means). A candidate is a Condorcet winner if he or she wins all pair-wise contests against all other candidates. In a Condorcet (social welfare) order, the candidates are ordered by the outcomes of all pair-wise majority contests. The Condorcet procedure need not always generate a winner, because it is susceptible to the notorious Condorcet paradox of majority cycles, in which each candidate loses against some other candidate by a majority (see, e.g., Mueller, 2003; Riker, 1982; Saari, 2001).

The most commonly used, simple, and intuitive voting method is plurality. In this procedure, each voter casts a single vote for a single candidate, and each candidate's plurality score is the total number of votes received. The resulting plurality (social welfare) order is the aggregate ordering of the candidates by decreasing plurality scores. The plurality winner is the first candidate in the plurality order, regardless of the actual proportion of votes received.

Like plurality, the Borda method relies on numerical scores, but it uses each individual voter's entire preference order.
Given weak- or linear-order ballots, the Borda score of a given candidate equals the sum over all ballots of the difference between the number of candidates that candidate is preferred to on any given ballot and the number of candidates who are preferred to that candidate on that ballot. The Borda (social welfare) order is the aggregate candidate order by Borda scores. Social welfare orders by the Condorcet and Borda methods are unambiguously computable from the ballots only if each voter provides a preference judgment for each possible pair of candidates (e.g., via a linear or weak order).

For two-candidate contests, the Condorcet, plurality, and Borda methods are indistinguishable. However, the social choice literature contains a substantial body of mathematical and simulation results on how dramatically the three methods may or should differ in multicandidate contests (for overviews, see, e.g., Mueller, 2003; Riker, 1982; Saari, 2001). Social choice theorists generally agree that plurality has drawbacks, but debate the relative merits of the Condorcet and Borda methods. Generally, the theoretical literature has placed its main emphasis on examining the mathematical possibilities and impossibilities, as well as other mathematical properties, of consensus methods.

Our data originate from APA elections conducted under the alternative vote (AV) election method (a.k.a. Hare system, instant runoff). This single-seat election method differs substantially from the three classical procedures we just reviewed. In principle, each AV ballot records a complete ranking of all candidates. In APA elections, voters may partially rank any number of candidates, starting with their most-preferred candidate. To be elected, a candidate must tally in excess of 50% of first-rank votes.
Failure to elect a candidate leads to an iterative instant runoff using the original ballots; in the runoff, the candidate with the lowest number of first-rank votes is eliminated, and all ballots on which this candidate is ranked first are transferred to their next-ranked candidate. The AV method is used by many universities and professional organizations, including APA.

MODELS AND ANALYSIS METHODS

We write $C$ for the set of candidates. A partial ranking $r$ is a one-to-one mapping from a subset $C' \subseteq C$ into the set $\{1, 2, \ldots, |C'|\}$. We write $|r|$ for the number of candidates that are partially ranked. As a consequence of our notation, $r(c)$ is the rank of candidate $c$ in partial ranking $r$, and $r^{-1}(i)$ is the candidate ranked at position $i$ in partial ranking $r$. The models we discuss in this section build on similar models for approval voting presented elsewhere (Regenwetter et al., 2006).

When a decision maker completely ranks all choice alternatives, we have access to their full (ordinal) preference among those alternatives. However, suppose that a decision maker indicates only that his or her rank 1 choice is candidate K, rank 2 choice is candidate L, and rank 3 choice is candidate M, with $C = \{K, L, M, N, O\}$. Our linear-order model assumes that this decision maker has a latent complete linear order of the candidates, and that this order starts with KLM (i.e., the preference state is either KLMNO or KLMON). The weak-order model assumes that this decision maker has a weak order in which the unranked candidates are tied at the bottom. More precisely, the weak-order model assumes that this decision maker prefers K to L, K to M, and L to M, but is indifferent between N and O, and prefers K, L, and M to both N and O.

Linear-Order Model

Let $\Pi$ denote the collection of all linear orders on $C$. Each $\pi \in \Pi$ can be viewed as a complete ranking, that is, as a one-to-one mapping $\pi: C \to \{1, 2, \ldots, |C|\}$.
Given a partial ranking $r$, we denote by $\Pi_r$ the collection of complete rankings that coincide with $r$ on the candidates that are partially ranked by $r$, that is, $\Pi_r = \{\pi \in \Pi \mid \pi^{-1}(i) = r^{-1}(i),\ i = 1, \ldots, |r|\}$. According to the linear-order model, the decision process is governed by two jointly distributed random variables, $S$ and $L$, with $S$ (as in "size") taking its values in $\{1, 2, \ldots, |C|\}$ and $L$ (as in "linear order") having outcomes in $\Pi$. The probability $p_r$ that a randomly sampled ballot exhibits partial ranking $r$ is

$$\underbrace{p_r}_{\text{Prob. of casting partial ranking } r} = \underbrace{P(S = |r|)}_{\text{Prob. of partially ranking } |r| \text{ many candidates}} \cdot \underbrace{P(L \in \Pi_r)}_{\text{Prob. of a latent linear-order preference that starts with } r} \tag{1}$$

This model has an extremely general equivalent formulation as a distribution-free random utility model including, as parametric special cases, econometric logit, probit, and multinomial logit or probit models. A downside is its reliance on a (however modest) independence assumption inherent in the product form of Equation 1. The weak-order model circumvents that assumption, but introduces different restrictions.

Weak-Order Model

In this model, each voter's preference state is a strict weak order. The decision maker is assumed (a) to strictly prefer any better-ranked candidate to any worse-ranked candidate on his or her ballot, (b) to be indifferent between all candidates not ranked on the ballot, and (c) to prefer all ranked to all nonranked candidates. This model disallows some weak orders, because only candidates at the bottom can be tied. For a partial ranking $r$ and for all $c \in C$ not partially ranked by $r$, define $r(c) = |C|$. Then $\succ_r = \{(x, y) \mid x, y \in C,\ r(x) < r(y)\}$ denotes the strict weak order that corresponds to $r$ by the three assumptions just listed. Let $WO'$ denote the collection of all such weak orders. According to the weak-order model, the observed partial rankings originate from a random variable $W$, taking its values in $WO'$.
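The two readings of a partial ranking can be made concrete with a small sketch. The following Python is only an illustration under stated assumptions: the function names `linear_completions` and `weak_order` are hypothetical (they do not come from the article), candidates are represented by labels, and the example reuses the K, L, M, N, O ballot discussed earlier. The first function enumerates the complete rankings compatible with a partial ranking under the linear-order model; the second builds the strict weak order in which unranked candidates are tied at the bottom.

```python
from itertools import permutations


def linear_completions(candidates, partial):
    """Complete rankings that start with the partial ranking (linear-order model)."""
    unranked = [c for c in candidates if c not in partial]
    return [tuple(partial) + tail for tail in permutations(unranked)]


def weak_order(candidates, partial):
    """Strict weak order as a set of pairs (x, y), read as 'x is strictly preferred to y'.

    Unranked candidates all receive rank |C|, so they are tied at the bottom.
    """
    n = len(candidates)
    rank = {c: n for c in candidates}
    for position, c in enumerate(partial, start=1):
        rank[c] = position
    return {(x, y) for x in candidates for y in candidates if rank[x] < rank[y]}


# Hypothetical ballot from the text: K first, L second, M third; N and O unranked.
candidates = ["K", "L", "M", "N", "O"]
partial = ["K", "L", "M"]

completions = linear_completions(candidates, partial)  # KLMNO and KLMON
relation = weak_order(candidates, partial)              # N and O are not compared
```

Under the linear-order reading the ballot is compatible with two latent states, whereas under the weak-order reading it corresponds to exactly one preference state.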
The probability of partial ranking $r$ is

$$\underbrace{p_r}_{\text{Prob. of casting partial ranking } r} = \underbrace{P(W = \succ_r)}_{\text{Prob. of a latent weak order } \succ_r \text{ that corresponds to } r} \tag{2}$$

Model-Based Social Choice Functions and Bootstrap

We used maximum likelihood estimation to compute best-fitting predicted frequencies under each model. These, in turn, were our input to compute Condorcet, Borda, and plurality social orders. This approach circumvents more ad hoc methods for dealing with missing data that may artificially inflate disagreements among consensus methods.

To move beyond point estimates for social welfare orders, we generated statistical confidence levels through a nonparametric bootstrap (Efron & Tibshirani, 1993). The purpose of the bootstrap was to evaluate how the social choice outcomes would be affected by small perturbations of the ballot frequencies and thus to quantify the confidence we can have that the social welfare orders computed from a model's parameter point estimates would be unaffected by such perturbations. For each bootstrap, we sampled, with replacement, the same number of ballots as there were in the original election. For instance, for the 1998 bootstrap, we sampled 18,723 ballots with replacement and recomputed all social welfare orders. For each bootstrap, model, and election, we repeated this process 1,000 times. We report in Table 1 the proportion of times a given social order or AV outcome occurred in such a sample. We think of the bootstrapped confidence as a first approximation of the uncertainty that is inherent in ballot data because of various factors, such as voters' uncertainty about their preference and election officials' uncertainty about the intended vote for some ballots.

RESULTS

In Table 1, for each set of ballots and each analysis method, we report the outcomes under the Condorcet, Borda, plurality, and AV procedures.
For instance, the point estimate results for the 1998 election show that under the weak-order model, the Condorcet procedure yielded candidate 3 as the winner, followed by candidates 2, 1, 4, and 5. The Borda order perfectly matched the Condorcet order. Plurality yielded the same (single-seat) winner as did the Condorcet and Borda methods, but the full plurality order differed from the Condorcet and Borda orders, in that candidate 5 was ranked second, followed by candidates 1, 2, and 4. The underlining of the last four positions in the plurality results highlights the disagreement with the Condorcet outcome, which we used as a benchmark. The standard algorithm for computing the AV outcome from partial rankings is mathematically identical to that under the weak-order model, and thus the AV outcome shown in the table for the weak-order model is the same as the actual AV outcome. In 1998, APA elected candidate 3.

The small literature on empirical studies of social choice procedures rarely goes beyond the type of point-estimate results we report in Table 1. Empirical investigations of social choice data usually take the empirical data as a given and avoid statistical considerations in their analysis and interpretation (see, e.g., Chamberlin et al., 1984¹; Felsenthal & Machover, 1995, and earlier work by these authors). Because the original data cannot be accessed for most such studies, it is impossible to evaluate the extent to which the empirical findings may have been subject to statistical error, or how sensitive the findings would have been to small perturbations of the ballot counts. The existing literature also, by and large, does not treat the ballot-casting and -counting processes as probabilistic processes. Voters and election officials may experience uncertainty or make occasional errors when they cast or tally votes. Therefore, we consider it fundamentally important to evaluate the statistical confidence one can have in the social outcomes under any aggregation method (see Regenwetter et al., 2006, for a deeper discussion).

¹ Unfortunately, the data of Chamberlin et al. are not accessible for reanalysis.
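As an illustration of how a bootstrapped confidence of this kind can be obtained, here is a minimal Python sketch of a nonparametric bootstrap over ballots. It is a simplification and not the authors' pipeline: it applies an outcome function directly to the resampled ballots instead of refitting a probabilistic model by maximum likelihood, and the names `plurality_order` and `bootstrap_confidence`, the seed, and the toy ballots are all assumptions made for the example.

```python
import random
from collections import Counter


def plurality_order(ballots):
    """Candidates by decreasing number of first-rank votes (ties broken by label)."""
    scores = Counter()
    for ballot in ballots:
        for c in ballot:
            scores[c] += 0          # register every candidate that appears on a ballot
        scores[ballot[0]] += 1      # count the first-ranked candidate
    return tuple(sorted(scores, key=lambda c: (-scores[c], c)))


def bootstrap_confidence(ballots, outcome, n_resamples=1000, seed=0):
    """Proportion of resamples (drawn with replacement, same size) yielding each outcome."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_resamples):
        resample = rng.choices(ballots, k=len(ballots))
        counts[outcome(resample)] += 1
    return {o: n / n_resamples for o, n in counts.most_common()}


# Hypothetical toy ballots, not APA data.
ballots = [("A", "B", "C")] * 60 + [("B", "A", "C")] * 40
confidence = bootstrap_confidence(ballots, plurality_order, n_resamples=200, seed=1)
```

The returned dictionary lists outcomes in decreasing order of bootstrapped confidence, which mirrors the way the bootstrap rows of Table 1 are organized.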

In Table 1, we report bootstrapped statistical confidences in the social orders under each model and for each aggregation procedure. For each analysis, we report the outcome with the highest bootstrapped confidence (with its confidence level in parentheses), followed by the other outcomes in decreasing order of confidence. Furthermore, we omit any outcomes that have confidence below 0.5%, but indicate their presence by three dots. For example, for 1998, the most likely Condorcet social order under the weak-order model was 32145, with 86% bootstrapped confidence; social orders 32415 and 31245 were alternative possibilities whose bootstrapped confidence levels were at least 0.5%, and there were additional, unlisted alternatives with confidence levels below 0.5%. Although the table leaves out some low-confidence outcomes, our analyses found the Condorcet winner, namely, candidate 3, and the Condorcet loser, namely, candidate 5, with 100% bootstrapped confidence. However, we are not very confident about the other three ranks in the Condorcet order. Under the linear-order model, the most likely Condorcet order was 32415, with bootstrapped confidence of 99.8%.

Absence of the Condorcet Paradox

We found no trace of the (in)famous Condorcet paradox. All Condorcet point estimates were complete linear orders, and all analyses avoided the paradox with 100% bootstrapped confidence. (Among the 8,000 bootstrapped samples across all data sets and models, not one yielded a cycle.) This finding diametrically contrasts with the notoriety of majority cycles in the theoretical literature, and it is consistent with previous empirical work suggesting that the Condorcet procedure may perform well in reality (see Regenwetter et al., 2006, for a discussion, e.g., of how the literature's focus on worst-case scenarios inflates the perceived likelihood of cycles).
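Checking a set of ballots for a majority cycle can be sketched as follows. This Python example is a hedged illustration, not the authors' code: it assumes complete linear-order ballots (every ballot ranks every candidate), the function names `pairwise_wins` and `condorcet_order` are hypothetical, and it relies on the standard fact that a tie-free pairwise majority relation is a linear order exactly when the candidates' numbers of pairwise wins are n-1, n-2, ..., 0.

```python
from itertools import combinations


def pairwise_wins(ballots, candidates):
    """Map each candidate to the set of candidates it beats by a strict pairwise majority."""
    wins = {c: set() for c in candidates}
    for x, y in combinations(candidates, 2):
        x_over_y = sum(1 for b in ballots if b.index(x) < b.index(y))
        y_over_x = len(ballots) - x_over_y
        if x_over_y > y_over_x:
            wins[x].add(y)
        elif y_over_x > x_over_y:
            wins[y].add(x)
    return wins


def condorcet_order(ballots, candidates):
    """Return the Condorcet order, or None if pairwise majorities contain a cycle or a tie."""
    wins = pairwise_wins(ballots, candidates)
    order = sorted(candidates, key=lambda c: len(wins[c]), reverse=True)
    n = len(candidates)
    if [len(wins[c]) for c in order] != list(range(n - 1, -1, -1)):
        return None
    return tuple(order)


# Hypothetical three-voter profile producing the classic majority cycle.
cycle = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]
# Hypothetical profile with a transitive majority relation.
transitive = [("A", "B", "C"), ("A", "B", "C"), ("B", "A", "C")]
```

On the first profile `condorcet_order` returns `None`, because each candidate loses one pairwise contest; on the second it returns the order A, B, C.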
Violation of Domain-Restriction Conditions

A standard explanation for the absence of a Condorcet paradox is that the electorate may be highly structured (e.g., by a political spectrum) and that preferences may be single peaked (Black, 1958) or otherwise value restricted (Sen, 1970). Any restriction on the domain (not just on the distribution) of preferences that is aimed at eliminating the Condorcet paradox would have to rule out some linear-order preference states because otherwise the collection of linear orders alone could generate a cycle. Thus, we can concentrate on the full rankings to rule out every conceivable type of domain restriction. In each of the four data sets, every possible complete ranking was used by at least one voter. It is not straightforward to assess a statistical confidence in the violation of all conceivable domain-restriction conditions. By using a reasonable proxy for confidence, we are able to conclude that the violation of all domain restrictions carries high confidence (we leave out the details for brevity). This means that the most prominent theoretical explanations for the absence of a Condorcet paradox fail to account for any of these data. Regenwetter et al. (2006) discussed alternative explanations for the absence of a Condorcet paradox in the context of national election surveys and approval voting, and these explanations likely apply here, too.

Condorcet Versus Borda

Despite the centuries-old debate about the relative merits of the Borda and Condorcet procedures, we found nearly perfect agreement between the two. In the analysis using only point estimates, there was only one case in which we failed to have perfect agreement. The discrepancy affected neither the winner nor the loser; rather, it consisted of one reversal of two neighboring candidates in the middle of the social order. The bootstrap results paint a more detailed picture.
The only disagreement between the Condorcet and Borda procedures that involved a Borda order with a nonnegligible confidence value had a modal Condorcet order with only 52% bootstrapped confidence (1999, weak-order model). By and large, the most likely Condorcet and Borda orders could be determined with extremely high confidence, and they matched. We determined the Condorcet winner and Borda winner with at least 99% bootstrapped confidence, and they always matched. Similarly, we determined the Condorcet loser and Borda loser with 100% confidence, and they matched throughout. Given the historic debate about how very different these methods are, the empirical agreement in these data is surprising, and deserves further investigation (see, however, Saari, 1999, for theoretical arguments why, in the absence of cycles, the Condorcet and Borda procedures stand a good chance of agreeing).

Plurality Versus Alternative Aggregation Rules

The standard critique of the plurality method is that it discards precious information by using only each voter's single top preference. Our results are consistent with that critique in that we found we lacked statistical confidence in assessing the plurality social order. However, with the caveat that the plurality winner occasionally carried a bootstrapped confidence on the order of only 80%, we found that in all cases, the most likely plurality winner matched the Condorcet winner. This congruence among methods did not fully extend beyond the winner. Compared with the Borda method, the plurality method disagreed substantially more with the Condorcet method, but none of the discrepancies were as dramatic as in the ubiquitous thought experiments in the literature (Mueller, 2003; Riker, 1982; Saari, 2001).
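To make the comparison of score-based rules concrete, the sketch below computes plurality and Borda scores from partial rankings, treating unranked candidates as tied at the bottom (the weak-order reading). The Borda score follows the definition given earlier in the article: for each ballot, the number of candidates a candidate is preferred to minus the number preferred to it. The function names and the toy profile are hypothetical; the profile is a contrived thought-experiment case in which the two rules pick different winners, not a pattern found in the APA data.

```python
from collections import Counter


def plurality_scores(ballots, candidates):
    """Number of first-rank votes per candidate."""
    firsts = Counter(ballot[0] for ballot in ballots)
    return {c: firsts[c] for c in candidates}


def borda_scores(ballots, candidates):
    """Sum over ballots of (#candidates ranked below) minus (#candidates ranked above)."""
    n = len(candidates)
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        for position, c in enumerate(ballot):
            below = n - 1 - position   # later-ranked plus unranked candidates
            above = position           # earlier-ranked candidates
            scores[c] += below - above
        for c in candidates:
            if c not in ballot:
                scores[c] -= len(ballot)   # every ranked candidate is preferred to c
    return scores


def social_order(scores):
    """Candidates by decreasing score (ties broken by label)."""
    return tuple(sorted(scores, key=lambda c: (-scores[c], c)))


# Hypothetical toy profile with one group of partial rankings.
candidates = ("A", "B", "C")
ballots = [("A", "B", "C")] * 4 + [("B", "C")] * 3 + [("C", "B", "A")] * 2

plurality = social_order(plurality_scores(ballots, candidates))  # ('A', 'B', 'C')
borda = social_order(borda_scores(ballots, candidates))          # ('B', 'A', 'C')
```

Because each ballot contributes as many positive as negative points, the Borda scores always sum to zero, which is a convenient sanity check on any implementation.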
All in all, we found some weaknesses in the plurality aggregation procedure in that we found low confidence in correct election outcomes under this rule (primarily for multiseat elections), but we emphasize that we found a rather surprising degree of agreement between the plurality method and both the Condorcet and Borda methods. Thus, the problem with the plurality rule does not so much appear to be its disagreement with the benchmark; rather, the problem is the low statistical confidence one can have that it has yielded the correct outcome. For lack of a statistical perspective, the low statistical confidence in plurality outcomes has received little attention in the literature.

AV Versus Alternative Aggregation Rules

We determined the AV winner with at least 99% confidence in all four elections, and it matched the Condorcet and Borda winners, regardless of analysis method. We concluded with a variable degree of confidence that the AV winner matched the plurality winner. For the agreement between AV and plurality, the sometimes low confidence was entirely attributable to the lack of confidence in plurality.

Model Dependence of the Results

Any analysis of voting ballots is limited to sparse empirical information about an extremely complex decision process that often involves a large number of people. Fortunately, AV ballots provide much more preference information than do the ballots of almost any other voting method. In addition, because the complexity of the AV iterative tally procedure makes strategizing difficult, it is plausible that AV ballots sincerely reflect the electorate's true preferences. Nonetheless, any analysis of ballot data must rely on simplifying assumptions that are unlikely to give adequate credit to the complexity of the actual situation. As a consequence, empirical findings risk becoming artifacts of technical assumptions. Despite substantial differences in modeling assumptions, we found striking agreement among analysis methods, and the disagreements we did find did not affect our substantive conclusions. Our substantive findings were also replicated via three other models, as well as a parametric bootstrap, which we do not discuss here for the sake of brevity.
The surprising empirical consensus among consensus methods that we found in these data also extends to several other prominent consensus methods that we likewise do not discuss here. This multimethod, multimodel reliability adds credibility to the validity of our conclusions.

DISCUSSION

Because of the difficulty in accessing large and information-rich, real-world election data, empirical social choice research on real elections has been limited. The contrast between our empirical findings and the standard theory demonstrates the urgent need for a systematic empirical counterpart to social choice theory. The empirical consensus even dwarfs the relatively optimistic assessment Hastie and Kameda (2005) derived from a computer-simulated signal detection environment. However, the observed low level of confidence in the plurality method contrasts with Hastie and Kameda's optimism.

Highly homogeneous electorates are unlikely to yield cycles or substantial disagreements among social choice rules. Despite APA's similarities with a multiparty system, this electorate might be much more homogeneous than expected. Yet our analysis ruled out value restriction, including single-peakedness. The theoretical literature has investigated the role of voter homogeneity (for classical examples, see, e.g., Gehrlein & Fishburn, 1976; Kuga & Nagatani, 1974) in producing agreement among methods. A promising current avenue does not reduce voter homogeneity to a one-dimensional numerical index. Rather, Saari's (1999, 2000a, 2000b, 2001) decomposition of voter profile space carves the space of ballot profiles into regions that classify all possible agreements and disagreements among certain social choice rules. To our knowledge, such a decomposition is not yet available for the five-candidate case or for weak orders. Future work in psychology may consider such decompositions and investigate testable properties using laboratory, survey, or ballot data.
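To make concrete what agreement or disagreement among social choice rules means for a single ballot profile, here is a minimal sketch that computes plurality, Borda, and Condorcet winners from the same ballots. It assumes every ballot is a complete strict ranking of all candidates, which real ballots need not be; the function names and the example profiles are hypothetical and are not taken from the elections analyzed here.

```python
def pairwise_support(profile, a, b):
    """Number of voters who rank candidate a above candidate b."""
    return sum(n for ranking, n in profile.items()
               if ranking.index(a) < ranking.index(b))


def condorcet_winner(profile, candidates):
    """Candidate who beats every other by strict pairwise majority, else None."""
    for a in candidates:
        if all(pairwise_support(profile, a, b) > pairwise_support(profile, b, a)
               for b in candidates if b != a):
            return a
    return None


def top_scorers(scores):
    """Set of candidates sharing the highest score (more than one on a tie)."""
    best = max(scores.values())
    return {c for c, s in scores.items() if s == best}


def compare_rules(profile):
    """Apply three rules to one profile.

    profile maps each complete strict ranking (a tuple, best first)
    to the number of voters who submitted it.
    """
    candidates = sorted(next(iter(profile)))
    m = len(candidates)
    plurality = {c: 0 for c in candidates}
    borda = {c: 0 for c in candidates}
    for ranking, n in profile.items():
        plurality[ranking[0]] += n                # one point for first place
        for position, c in enumerate(ranking):
            borda[c] += (m - 1 - position) * n    # m-1 points for first, 0 for last
    cw = condorcet_winner(profile, candidates)
    result = {
        "plurality": top_scorers(plurality),
        "borda": top_scorers(borda),
        "condorcet": cw,
    }
    # Consensus: a Condorcet winner exists and is the unique winner of both scoring rules.
    result["consensus"] = (cw is not None
                           and result["plurality"] == {cw}
                           and result["borda"] == {cw})
    return result


# Hypothetical three-candidate profiles.
agreeing = {("A", "B", "C"): 40, ("B", "A", "C"): 35, ("C", "A", "B"): 25}
disagreeing = {("A", "B", "C"): 40, ("B", "C", "A"): 35, ("C", "B", "A"): 25}
cyclic = {("A", "B", "C"): 1, ("B", "C", "A"): 1, ("C", "A", "B"): 1}

for name, profile in [("agreeing", agreeing), ("disagreeing", disagreeing), ("cyclic", cyclic)]:
    print(name, compare_rules(profile))
```

In the second profile, plurality selects a candidate whom a majority ranks last, while Borda and Condorcet select the same alternative candidate. The third profile is a majority cycle with no Condorcet winner. The empirical question raised in this article is how often real electorates produce profiles like the second and third.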
Future work will also reveal how well our findings generalize. Regenwetter et al. (2006) have already thoroughly discussed the empirical rarity of cycles. Note also that several studies have found high agreement among consensus methods (Regenwetter & Grofman, 1998; Regenwetter, Ho, & Tsetlin, in press; Regenwetter & Rykhlevskaia, in press; Regenwetter & Tsetlin, 2004); these studies include an analysis of survey data from the American National Election Studies (Sapiro, Rosenstone, & Miller, 1998).

Future work would benefit from more detailed and sophisticated large-scale data. For instance, one standard argument in favor of the AV method is that its complicated tally discourages strategic voting. We lacked the necessary data to evaluate this question empirically in the present study, as we were forced to evaluate all voting procedures from the same ballots. Assuming sincere votes, we had access to the sincere outcomes under many methods. More elaborate data may enable researchers to evaluate how an election campaign and the resulting ballot counts would have been affected if a different voting method had been used (see, e.g., the 2002 French approval-voting study by Laslier, 2003, run concurrently with the national election).

We conclude with an interesting policy suggestion made by one of the referees for this article: In an election, compare multiple social choice rules for their outcomes on the same ballots. High consistency suggests an election winner supported by multiple rational choice criteria. Inconsistency suggests the need for a runoff.

Acknowledgments

This research was supported by the University of Illinois Research Board. We thank the American Psychological Association for access to their ballots. We thank S. Brams, D. Budescu, J. Dana, J.-C. Falmagne, D. Felsenthal, B. Grofman, Y.-F. Hsu, P. Laughlin, R.D. Luce, M. Machover, A.A.J. Marley, A. Rapoport, D. Saari, I. Tsetlin, A. Urken, two referees, and the action editor for their comments.

REFERENCES

Arrow, K.J. (1951). Social choice and individual values. New York: Wiley.

Black, D. (1958). The theory of committees and elections. Cambridge, England: Cambridge University Press.

Chamberlin, J.R., Cohen, J.L., & Coombs, C.H. (1984). Social choice observed: Five presidential elections of the American Psychological Association. The Journal of Politics, 46, 479–502.

Condorcet, M. (1785). Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix [Essay on the application of the probabilistic analysis of majority vote decisions]. Paris: Imprimerie Royale.

Efron, B., & Tibshirani, R.J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Felsenthal, D.S., & Machover, M. (1995). Who ought to be elected and who is actually elected? An empirical investigation of 92 elections under 3 procedures. Electoral Studies, 14, 143–169.

Gehrlein, W.V., & Fishburn, P.C. (1976). Condorcet's paradox and anonymous preference profiles. Public Choice, 26, 1–18.

Hastie, R., & Kameda, T. (2005). The robust beauty of majority rules in group decisions. Psychological Review, 112, 494–508.

Kuga, K., & Nagatani, H. (1974). Voter antagonism and the paradox of voting. Econometrica, 42, 1045–1067.

Laslier, J.-F. (2003). Analysing a preference and approval profile. Social Choice and Welfare, 20, 229–242.

McLean, I., & Urken, A. (Eds.). (1995). Classics of social choice. Ann Arbor: University of Michigan Press.

Mueller, D.C. (2003). Public choice III. Cambridge, England: Cambridge University Press.

Regenwetter, M., & Grofman, B. (1998). Approval voting, Borda winners and Condorcet winners: Evidence from seven elections. Management Science, 44, 520–533.

Regenwetter, M., Grofman, B., Marley, A., & Tsetlin, I. (2006). Behavioral social choice. Cambridge, England: Cambridge University Press.

Regenwetter, M., Ho, M.-H., & Tsetlin, I. (in press). Sophisticated approval voting, ignorance priors and plurality heuristics: A behavioral social choice analysis in a Thurstonian framework. Psychological Review.

Regenwetter, M., & Rykhlevskaia, E. (in press). A general concept of scoring rules: General definitions, statistical inference, and empirical illustrations. Social Choice and Welfare.

Regenwetter, M., & Tsetlin, I. (2004). Approval voting and positional voting methods: Inference, relationship, examples. Social Choice and Welfare, 22, 539–566.

Riker, W.H. (1982). Liberalism against populism. San Francisco: W.H. Freeman.

Saari, D.G. (1999). Explaining all three-alternative voting outcomes. Journal of Economic Theory, 87, 313–355.

Saari, D.G. (2000a). Mathematical structure of voting paradoxes 1: Pairwise vote. Economic Theory, 15, 1–53.

Saari, D.G. (2000b). Mathematical structure of voting paradoxes 2: Positional voting. Economic Theory, 15, 55–101.

Saari, D.G. (2001). Decisions and elections: Explaining the unexpected. Cambridge, England: Cambridge University Press.

Sapiro, V., Rosenstone, S., & Miller, W. (1998). American national election studies, 1948–1997. Ann Arbor, MI: Inter-University Consortium for Political and Social Research.

Sen, A.K. (1970). Collective choice and social welfare. San Francisco: Holden-Day.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.

(RECEIVED 5/3/06; REVISION ACCEPTED 10/20/06; FINAL MATERIALS RECEIVED 11/20/06)

Volume 18 Number 7 635