Learning in the Judicial Hierarchy

Deborah Beim
Department of Politics, Princeton University
dbeim@princeton.edu
September 27, 2012

Abstract

In this paper, I develop and empirically test a theory of judicial learning. In the model, the Supreme Court uses the Courts of Appeals as laboratories of law, learning from their decisions to determine how best to advance doctrine. The model shows how the Supreme Court leverages multiple Courts of Appeals decisions to identify which cases will be most informative to review, and what decision to make upon review. Because an unbiased judge only makes an extremist decision when there is an imbalance in the parties' arguments, the Supreme Court is able to draw inferences from cases it chooses not to review. Applying two of the model's new empirical predictions to data, I show the Supreme Court is more likely to review moderate than extremist decisions, and also show that when the Supreme Court is resolving conflicts between lower courts it is more likely to strike down doctrines it reviews. Together, the results depict the judicial hierarchy as an institution focused on doctrinal development, rather than doctrinal discipline.

For generously sharing data, I thank Cliff Carrubba, Tom Clark, John Kastellec, David Klein, Bill Landes, Stefanie Lindquist, Richard Posner, and John Summers (of Hangley Aronchick Segal Pudlin and Schillar). Thanks also to Chuck Cameron, Brandice Canes-Wrone, Justin Fox, Alex Hirsch, John Kastellec, Lewis Kornhauser, Adam Meirowitz, Jim Rogers, and seminar participants at Princeton and Emory Universities for their helpful comments.

1 Introduction

The opportunity to learn from subordinates' successes and failures is one of the fundamental strengths of hierarchical organizations. American states are referred to as "laboratories of democracy" for just this reason: "It is one of the happy incidents of the federal system that a single courageous State may, if its citizens choose, serve as a laboratory; and try novel social and economic experiments without risk to the rest of the country" (New State Ice Co. v. Liebmann, 285 U.S. 262 (1932), Justice Brandeis, dissenting). The federal government can watch states, observe their choices, and adopt the best of their social and economic experiments. The same is true in the federal courts, where new law is developed in the lower courts as the Supreme Court watches. This hierarchy of experimentation can help the judges at the top develop informed opinions and make good decisions. In short, hierarchy can help superiors learn.

But that learning is not always straightforward. Aggregating the results of many agents' experiments, and understanding the causes of their successes and failures, requires careful supervision and strategic review. In this paper, I explore how a supervisor can best learn from a group of agents in the context of the federal judicial hierarchy. I show how the Supreme Court uses the Courts of Appeals as laboratories of law, observing their decisions and reviewing cases to learn about doctrine.

I begin by presenting a formal model in which a high court learns about doctrine by aggregating the decisions of multiple lower courts. Although the high court can review only one case, it can see the results of many cases. Allowing the high court to learn from a group of lower courts yields a nuanced relationship between rules and dispositions that is substantively resonant, and leads directly to the conclusion that the high court's review decisions hinge on estimates of which case will be most informative to review.
The insight that the high court interacts with a group of lower courts in a learning environment has several empirical implications. The model makes a series of novel predictions, including which case will be most informative and what the probability of reversal will be conditional on review. I take two of these new empirical predictions to data. First, the theory suggests that moderate decisions, where each party prevails on some counts, provide much more information than decisions where one party prevails on all counts. Using an existing dataset of about 6,000 Courts of Appeals decisions, I show the Supreme Court is indeed more likely to review decisions where each party prevails on some counts than decisions where only one party wins. Second, when the Supreme Court is resolving a conflict between lower courts, the model offers a prediction for which side of the conflict the Supreme Court will review and which it will endorse: usually, it will endorse the side it did not choose to review, especially if the lower courts are ideologically distant from each other. I use two existing, independently collected datasets to show that empirical evidence is consistent with this prediction. On the whole, the results suggest that the Court is not primarily concerned with ensuring lower courts follow established doctrine, but with the establishment and extension of new doctrine.

2 Learning, supervision, and decision making

In the attitudinalist line of research on the Supreme Court as a national policy-maker, justices are assumed to know what outcomes they seek and to know how to achieve their goals (Segal and Spaeth 2002). This literature has taken a broad view of the Supreme Court's task, with the perspective that the Court's opinion resolves a question for all to hear.
Typically, studies of the politics of judicial decision making focus on understanding how justices struggle to achieve their known goals: how they overcome differences of opinion with their colleagues, how they ensure lower court judges abide by their precedents, and how they respond to the institutional constraints of the judiciary in general.

One strand of this literature considers how justices learn to make law, that is, how they discover how to accomplish their goals. But these studies consider mostly solitary learning, that is, learning outside a hierarchy. These models focus on whether and how law converges to optimality as one Supreme Court judge hears cases alone; the models use this focus to describe the dynamics of legal learning and law creation (Cooter, Kornhauser and Lane 1979, Baker and Mezzetti 2011, Niblett 2010). As such, they tend to ignore the lower courts that must apply the law and to abstract away from the lower levels of the hierarchy that generate the cases Supreme Court justices might hear.

There is a growing literature that uses this technology within the context of a hierarchy, considering how repeated experimentation in lower courts aids law-creation (Clark and Kastellec 2012). However, even these models are not structured to examine which of multiple cases the Supreme Court will review. Some models, like Bueno de Mesquita and Stephenson (2002), Lax (2012) and Staton and Vanberg (2008) (see also Jacobi and Tiller (2007)), consider how the anticipation of lower courts' application affects rule-making. Clark and Carrubba (2012) and Carrubba and Clark (2012) consider how lower courts' rule development influences the Supreme Court, but in their models the Supreme Court adopts lower courts' rules because it is cheaper to do so, not because it is informative to do so. Even in research that explores how lower courts contribute to law creation, the question of which case will be most informative to review remains unanswered.

There is, however, a large literature on strategic review of lower court decisions. In that line of work the hierarchy has been largely understood as a disciplinary organization, in which the Supreme Court aims to ensure lower courts are following its preferences. The advantage of learning from subordinates is generally ignored to focus on the difficulty of monitoring them. A classic example of this is Cameron, Segal and Songer (2000).
Using a traditional principal-agent framework, these models consider how the Supreme Court ensures lower courts comply with its preferences. The sources and communication of judicial doctrine are largely left out. Much of this literature considers the role of information transmission in the hierarchy, but the Court learns the particular facts about individual cases rather than learning how to make doctrine. With few exceptions, this is because these models are dyadic: the Supreme Court supervises only one lower court. A small set of models consider the supervision of multiple lower courts. In these "tournament" models of the judiciary, lower courts compete to be least non-compliant (Cameron 1993, McNollgast 1995, Lax 2003). Still, these do not consider how the plurality of lower courts' decisions can be useful in concert.

While the judicial literature has largely considered rule-making absent the informational value of lower courts' decisions, the model here is nested in a broader tradition of learning from agents. The most relevant models from this literature are Dewatripont and Tirole (1999) and Calvert (1985). The model in this paper builds directly on the technology of Dewatripont and Tirole (1999). In that paper, an unbiased judge tries to dispose of a case whose correct resolution depends on the balance of facts; some facts may point toward the defendant's innocence and others toward his guilt. Dewatripont and Tirole derive the conditions under which it is more efficient for the judge to attend to two biased advocates, one for guilt and one for innocence, than to attend to one unbiased information collector. They demonstrate that creating an adversarial procedure is almost always more efficient. Calvert (1985) presents a model reminiscent of the one presented here (albeit with a different technology). In Calvert (1985) the principal has two potential sources of advice and can choose to learn from only one; however, the principal does not observe anything before choosing which advisor to consult. In this paper, the principal does observe some information before making this choice.
A number of papers in the signaling literature analyze how agents' messages can interact with and sometimes counteract one another (Epstein 1998, Minozzi 2011, Battaglini 2002). That literature, however, tends not to allow or require the principal to further investigate either of the agents' messages. Studying the judiciary makes the possibility of review explicit, but also requires attention to learning from self-interested agents, since lower courts are not advising the Supreme Court on law-making.

Certiorari and learning in the federal courts

While this is the first paper to develop a formal model of learning from lower courts, the theory builds on a rich body of empirical research on the Supreme Court's relationship with lower courts. The Supreme Court often grants certiorari on (that is, chooses to review) the decision of an ideological ally. Lindquist, Haire and Songer (2007) and Walson (2011) both show that while the Supreme Court reviews distant lower courts more often than allied lower courts, it still reviews its allies at a significant rate. Justices also prefer to review the decisions of high-quality judges. Perry (1991) writes of his interviews with clerks, "Contrary to what I had expected, justices would rather take a case where the opinion below is from a well-respected judge, because they will start from a more informed point... so that, if they take the case, there is less chance for surprises."

There is also more direct empirical evidence that justices learn from lower courts. When lower courts encounter new areas of the law, the Supreme Court adopts their rules after allowing them to percolate sufficiently (Klein 2002). More narrowly, lower courts' citation practices are informative to the Supreme Court about how doctrines have been interpreted (Hansford, Spriggs and Stenger 2010), and the language from lower courts' opinions finds its way to the opinions of the Supreme Court (Corley, Collins and Calvin 2011). Importantly, this is true about a group of decisions: most Supreme Court opinions cite at least one Courts of Appeals opinion other than that of the case they are reviewing (George and Berger 2005). That is to say, in most cases the Supreme Court is aware of, and informed by, multiple lower courts' decisions. Importantly, these decisions are often in conflict with one another.
Empirical patterns suggest the Supreme Court uses conflict to its advantage: for example, when lower courts are in disagreement, the Supreme Court follows the side that more circuits agree with (Klein and Hume 2003, Lindquist and Klein 2006). A particularly robust pattern in cert decisions remains theoretically under-explored: the Supreme Court is most likely to grant cert in order to resolve a conflict between two lower courts. Rule 10 of the Rules of the Supreme Court of the United States mentions conflict in the lower courts as a reason to consider granting cert, and indeed, conflict is an excellent predictor of review (see e.g. Caldeira and Wright (1988) and Estreicher and Sexton (1984)). Black and Owens (2009) show that conflict is a better predictor of review than is lower court ideology. The relationship between conflict and learning has long been recognized, but there has been much less work offering explanations for the relationship. The model presented here aims to explain it.

3 The model

Before describing the formal structure of the game, I briefly discuss the intuition behind it. In the model, the Supreme Court supervises two lower courts. Each of the three courts wishes to choose the best doctrine to fit a new legal question. For example, warrantless searches have been conducted in motorhomes and the courts must decide whether to apply doctrine for searching houses or cars (see California v. Carney and Friedman (2006)). Or, drivers bring tort claims against car manufacturers and the courts must decide upon an appropriate standard of care for automobile safety. The lower court judges hear lawyers' arguments for both sides of the dispute, then decide their cases. The Supreme Court sees the decisions the lower court judges make, but does not hear the arguments that led to those decisions. Still, although the lawyers' arguments are masked, the Supreme Court can draw simple inferences about them from the judges' choices.

This is the crux of the model's intuition: even if the Supreme Court can review only one case, it can observe the results of many cases. Therefore, learning can take place before cert is granted: much more learning than is usually assumed or considered.
This learning allows the Supreme Court to make informed choices about which case to review. In some instances, it is obvious what arguments must have been presented: an unbiased judge only makes an extremist decision if one party's evidence was much stronger than the other's. Other decisions are ambiguous: moderate decisions can arise either because strong arguments were presented for both liberal and conservative positions or because both sides' arguments were weak. Before announcing the final doctrine, the Supreme Court can choose to review one of the lower courts' decisions, at some cost. Reviewing the ambiguous case will always be more informative for choosing optimal doctrine; therefore, the ambiguous decision is more likely to be reviewed. After review, some information allows the Supreme Court to make dispositive rulings while other information is only suggestive. As a result, the Supreme Court may either reverse or affirm after review. The sections that follow present equilibria describing what choices lower court judges will make, which cases the Supreme Court will review, and what the Supreme Court will do upon review.

3.1 Play of game

There are three players in the game: two lower court judges, LC I and LC II, and one Supreme Court justice. The lower court judges are referred to as "he"; the Supreme Court justice is referred to as "she." The goal is to choose one of three doctrines (A, M, or B) to apply. The area of law is relatively new, so the judges do not know which doctrine they prefer. Which doctrine is best is summarized by a random variable, θ. I assume that there are two unknown state variables, θ_A and θ_B, and they determine which doctrine is best. Payoffs to the courts depend on the conjunction of both variables and the choice of doctrine. A sufficient summary of the state is θ = θ_A + θ_B. It is common knowledge that:

\[
\theta_A = \begin{cases} 0 & \text{with prob. } 1-\alpha \\ -1 & \text{with prob. } \alpha \end{cases}
\qquad
\theta_B = \begin{cases} 0 & \text{with prob. } 1-\alpha \\ 1 & \text{with prob. } \alpha \end{cases}
\]
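The distribution of θ = θ_A + θ_B implied by these priors can be checked by direct enumeration. The following is a minimal sketch, not part of the paper: it assumes θ_A takes values in {-1, 0} and θ_B in {0, 1}, independently, each nonzero with probability α, and the function name `theta_distribution` is a hypothetical helper introduced only for illustration.

```python
from fractions import Fraction
from itertools import product


def theta_distribution(alpha):
    """Distribution of theta = theta_A + theta_B.

    Assumption (illustrative): theta_A is -1 with prob. alpha and 0 otherwise;
    theta_B is +1 with prob. alpha and 0 otherwise; the two are independent.
    """
    dist = {-1: 0, 0: 0, 1: 0}
    outcomes_A = [(-1, alpha), (0, 1 - alpha)]
    outcomes_B = [(1, alpha), (0, 1 - alpha)]
    for (a, p_a), (b, p_b) in product(outcomes_A, outcomes_B):
        dist[a + b] += p_a * p_b
    return dist


alpha = Fraction(1, 4)  # arbitrary example value
dist = theta_distribution(alpha)
print(dist)
```

With exact fractions, the probability of θ = 0 can be compared directly against 1 - 2α + 2α², and the two extreme states against α(1 - α).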

Thus

\[
\theta = \begin{cases} -1 & \text{with prob. } \alpha(1-\alpha) \\ 0 & \text{with prob. } 1-2\alpha+2\alpha^2 \\ 1 & \text{with prob. } \alpha(1-\alpha) \end{cases}
\]

For every state of the world there is an associated doctrine: A if θ = -1, M if θ = 0, and B if θ = 1. A, M, and B represent existing doctrines or approaches; the Court can be thought to be extending these by deciding which is most applicable for a new fact pattern. An example of this is sex discrimination law, in which judges struggled with the choice between rational basis review and strict scrutiny and ultimately created the doctrine of intermediate scrutiny. Of course, most cases at the Courts of Appeals are simple applications of existing law; this model focuses on the subset of difficult, law-creating cases, either gap filling or cases of first impression in which multiple doctrines could plausibly be applied.

The game proceeds in five stages, two in the lower courts and three in the Supreme Court. First, lawyers present evidence to the lower courts about the value of θ. Second, each lower court judge resolves his dispute based on the evidence he sees. Third, the Supreme Court sees the lower court judges' decisions, but does not see the evidence that led to those decisions. She uses this information to update her beliefs about θ. Fourth, the Supreme Court decides whether, and which, case to review. If the Supreme Court reviews, she learns the signals that lower court saw. She then makes her decision whether to affirm or reverse the lower court whose decision she reviewed and what doctrine to choose. I discuss each of these steps in detail below, and the game is summarized in Figure 1.

[Figure 1: Play of game. The Supreme Court (Don't Review; Review, Uphold; or Review, Overturn) sits above Lower Court I, with Advocates IA and IB, and Lower Court II, with Advocates IIA and IIB.]

3.2 Decision making in the lower courts

Simultaneously, the lower courts each hear a case. Both cases depend on the value of θ, which is common across both courts.[1] To decide the case correctly, a judge must learn the value of θ, which implies learning about θ_A and θ_B. Two lawyers, one in each lower court, search for evidence about θ_A.[2] Their searches are independent. The same is true for θ_B: two lawyers independently search for evidence, one in each lower court. The lawyers then privately present the results of their searches to their lower court judge. m_A denotes the signals of the lawyers for θ_A; m_B denotes the signals of the lawyers for θ_B.

Each signal takes on one of two values: for i ∈ {A, B}, a lawyer either finds and presents hard evidence, |m_i| = 1 (that is, m_A = -1 or m_B = 1), to the judge, or does not find any conclusive evidence and so presents m_i = 0. If θ_i = 0, both lawyers are unable to find any hard evidence and send signals m_i = 0. If |θ_i| = 1, each lawyer finds hard evidence of this with probability q. When he finds evidence that |θ_i| = 1, a lawyer sends signal m_i = θ_i. Even if |θ_i| = 1, however, a lawyer may fail to find evidence of this fact. That is, the lawyer may not find evidence that exists, even when he is searching for it. This happens with probability 1 - q. In this instance, the lawyer sends signal m_i = 0 even though |θ_i| = 1.

Therefore, when a lawyer for θ_A presents no hard evidence, this suggests, but does not prove, that θ_A = 0, as it is also possible that θ_A = -1 but the lawyer did not find the evidence. In contrast, a signal of m_A = -1 proves θ_A = -1. In other words, presenting evidence perfectly reveals the state of the world, but failing to present evidence is merely suggestive. Notice also that if θ_i = 0 both lawyers will send m_i = 0, but if |θ_i| = 1 the lawyers may send different signals if one's search is successful and the other's is not. However, each lower court judge observes only his own lawyers' signals: he cannot learn what the other lower court did or what messages the other lower court received.

Thus, a lower court judge observes one of four possible message pairs: (0, 0), (0, 1), (-1, 0), or (-1, 1). After observing one of these pairs, each lower court judge makes an inference about the value of θ, which incorporates the primitive probability that |θ_i| = 1, α, and the conditional probability that a lawyer's search is successful, q. After establishing a posterior belief about the value of θ, each lower court judge chooses a doctrine, A, M, or B, to correspond to his belief. Denote this ruling D: D_I for LC I and D_II for LC II.

[1] In this sense, arguments are interpreted in the same sense as Che and Kartik (2009): "The signal could take the form of scientific evidence obtainable by conducting an experiment, witnesses or documents locatable by investigation, a mathematical proof, or a convincing insight that can reveal something about the state." Legally, they are appropriately interpreted as legislative facts (which are often solved by expertise and may pertain to many cases, as opposed to adjudicative facts, which pertain to a particular party; see Davis (1942)).

[2] I discuss the game as if lawyers are presenting evidence to the court, but abstract away from strategic advocacy by the lawyers: I assume that incentives are such that a lawyer presents any evidence he finds and assume lawyers cannot fabricate evidence, so lawyers' messages are always truthful. The incentives that maintain this condition are the focus of Dewatripont and Tirole (1999). From their results it is possible to deduce that promising the lawyers sufficiently high wages can always satisfy this condition, so long as the lawyers care only about winning their own case.

3.3 Learning and decision making at the Supreme Court

Both cases are then automatically appealed to the Supreme Court.
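Returning briefly to the lower courts' inference in Section 3.2: the asymmetry between conclusive evidence and mere silence can be made concrete with Bayes' rule. The sketch below is illustrative only and not taken from the paper. It assumes the sign convention m_A ∈ {0, -1} and m_B ∈ {0, 1}, and the function names `posterior_nonzero` and `lower_court_beliefs` are hypothetical. A silent lawyer leaves the judge with posterior α(1 - q)/(1 - αq) that hard evidence nonetheless exists.

```python
from fractions import Fraction


def posterior_nonzero(alpha, q, found):
    """P(|theta_i| = 1 | one lawyer's signal).

    Hard evidence is conclusive; silence is only suggestive, because a
    search fails with probability 1 - q even when evidence exists.
    """
    if found:
        return Fraction(1)
    return alpha * (1 - q) / (1 - alpha * q)


def lower_court_beliefs(alpha, q, m_A, m_B):
    """Posterior over theta in {-1, 0, 1} for a judge who saw (m_A, m_B).

    Assumed convention (illustrative): m_A is 0 or -1, m_B is 0 or 1.
    """
    p_a = posterior_nonzero(alpha, q, m_A == -1)  # P(theta_A = -1 | m_A)
    p_b = posterior_nonzero(alpha, q, m_B == 1)   # P(theta_B = +1 | m_B)
    return {
        -1: p_a * (1 - p_b),
        0: p_a * p_b + (1 - p_a) * (1 - p_b),
        1: (1 - p_a) * p_b,
    }


half = Fraction(1, 2)  # arbitrary example values for alpha and q
one_sided = lower_court_beliefs(half, half, -1, 0)
silent = lower_court_beliefs(half, half, 0, 0)
print(one_sided, silent)
```

In this example, one-sided evidence rules out θ = 1 entirely, while two silent lawyers leave positive probability on all three states.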
The Supreme Court sees both lower courts' rulings; these are thought to be presented in the briefs petitioning for review. However, she does not directly observe the evidence the judges saw, as this is thought to be contained in lawyers' briefs on the merits, which are only submitted if the Supreme Court chooses to review the case. After seeing the lower courts' rulings, the Supreme Court updates her beliefs about θ and decides whether to review either of the lower courts' decisions. The Supreme Court can review either one of the lower courts' decisions, or neither, but not both.[3] If she chooses not to review a case, the lower courts' decisions stand and the game ends.

If the Supreme Court does choose to review a case, she pays c and learns the signals that judge saw. She uses these signals to update her beliefs about θ.[4] Based on her estimates of θ, she then chooses a disposition and a rule. The disposition, to reverse or affirm, pertains only to the case she is reviewing. The rule, A, M, or B, is a universally binding precedent that can effectively reverse or affirm the decision she did not review. That is, issuing a universally binding doctrine changes the outcome of all cases, even those the Supreme Court did not review. Like the lower court judges, the Supreme Court justice chooses doctrine to match her beliefs about θ. Her decision to reverse or affirm the lower court's ruling follows immediately from this doctrinal choice: she affirms the lower court's decision if she agrees it is the appropriate doctrinal response based on her estimate of θ. Of course, her estimate of θ may be different from the lower court's estimate: although she cannot see the arguments presented in the other lower court, the Supreme Court's beliefs are also based on the additional information provided by the other lower court's decision.

3.4 Preferences and beliefs

Before the game begins, each judge believes pr(θ_A = -1) = pr(θ_B = 1) = α, and believes that if |θ_i| = 1 a search is successful with probability q, that is, pr(m_A = -1 | θ_A = -1) = pr(m_B = 1 | θ_B = 1) = q.

[3] In practice, the Supreme Court may consolidate cases and hear them together. I ignore this option to maintain a focus on the Supreme Court's choices when it does not have the resources to read every lower court's case on a particular question.

[4] Note that the signals are preserved perfectly between the Courts of Appeals stage and the Supreme Court stage. There is no additional information collection between the stages.
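The Supreme Court's pre-review updating from the two rulings alone can be sketched as follows. This is an illustration under a labeled assumption rather than the paper's equilibrium: it assumes each lower court rules A only on one-sided evidence (-1, 0), B only on (0, 1), and M otherwise, which matches the intuition that moderate rulings arise from either two strong arguments or two weak ones. The names `assumed_ruling`, `signal_prob`, and `court_posterior` are hypothetical, and q is taken to lie strictly between 0 and 1.

```python
from fractions import Fraction
from itertools import product


def assumed_ruling(m_A, m_B):
    # Illustrative assumption, not derived here: extreme rulings only
    # follow one-sided hard evidence; everything else yields M.
    if (m_A, m_B) == (-1, 0):
        return "A"
    if (m_A, m_B) == (0, 1):
        return "B"
    return "M"


def signal_prob(theta_i, m_i, q):
    """P(m_i | theta_i): evidence cannot be fabricated; searches succeed w.p. q."""
    if theta_i == 0:
        return 1 if m_i == 0 else 0
    return q if m_i != 0 else 1 - q


def court_posterior(alpha, q, D_I, D_II):
    """Posterior over theta given only the two rulings (D_I, D_II)."""
    weights = {-1: 0, 0: 0, 1: 0}
    for t_A, t_B in product((-1, 0), (0, 1)):
        prior = (alpha if t_A else 1 - alpha) * (alpha if t_B else 1 - alpha)
        likelihood = 1
        for ruling in (D_I, D_II):  # the two courts' searches are independent
            likelihood *= sum(
                signal_prob(t_A, m_A, q) * signal_prob(t_B, m_B, q)
                for m_A, m_B in product((-1, 0), (0, 1))
                if assumed_ruling(m_A, m_B) == ruling
            )
        weights[t_A + t_B] += prior * likelihood
    total = sum(weights.values())
    return {theta: w / total for theta, w in weights.items()}


half = Fraction(1, 2)  # arbitrary example values for alpha and q
post_AM = court_posterior(half, half, "A", "M")
post_MM = court_posterior(half, half, "M", "M")
print(post_AM, post_MM)
```

Under these assumptions, a single extreme ruling already rules out the opposite extreme state before any review, whereas two moderate rulings leave all three states possible.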

                 Types of Judicial Preferences
              Unbiased                      Biased
       θ = −1   θ = 0   θ = 1       θ = −1   θ = 0   θ = 1
  A       0      −L      −1            0      −L      −1
  M      −L       0      −L           −L       0      −L
  B      −1      −L       0           −1      −1       0

Table 1: Judges' preferences over doctrine, A, M, or B, conditional on the state of the world θ, −1, 0, or 1. All judges get the maximum utility, 0, from choosing the right doctrine: A when θ = −1, M when θ = 0, and B when θ = 1. Mistakes cost 1 or L, where 0 < L < 1. The left panel shows the preferences of unbiased judges who lose more utility from large mistakes than small ones, but have symmetric preferences otherwise. The right panel shows judges who are biased against B, so that wrongly choosing B is more costly than wrongly choosing A.

After seeing signals from his advocates, a lower court judge is able to update his beliefs about θ. Lower court judges update their beliefs based only on their own advocates' signals. Thus, after hearing arguments, LC_I's beliefs about θ are a function of (α, q, m_AI, m_BI) and LC_II's beliefs about θ are a function of (α, q, m_AII, m_BII). The Supreme Court is able to update her beliefs about θ based on both lower courts' decisions. After seeing the lower courts' decisions, the Supreme Court's beliefs about θ are a function of (α, q, D_I, D_II). If the Supreme Court chooses to review one of the lower courts, she learns the signals that lower court received. This allows her to update her beliefs again. Then, her beliefs are a function of (α, q, m_AI, m_BI, D_II) (if she reviews LC_I) or (α, q, D_I, m_AII, m_BII) (if she reviews LC_II).

All judges agree on the best doctrine when they know the value of θ with certainty: A is best when θ = −1, M is best when θ = 0, and B is best when θ = 1. But judges may differ in how costly certain types of mistakes are, so when there is uncertainty about the value of θ they may disagree about what doctrine to choose. Consider a suit brought by an injured car-owner against the manufacturer, where the judge must decide if the manufacturer's safety efforts met a standard of care.
If the manufacturer is indeed liable for some injury he should have prevented, all judges agree he should be penalized. But if there is uncertainty about whether or not he is liable, judges might disagree: some might not want to burden the manufacturer with too many requirements, while others might find the injured party's claims more important. This is formalized by letting some judges suffer more from choosing A than B when the correct decision is M. Furthermore, under certain conditions, a judge's fear under uncertainty can be so extreme that one lawyer could never provide enough evidence to convince him to choose a particular doctrine. For example, a judge biased in favor of drivers might only be willing to choose a low standard of care if all evidence suggests manufacturers are never liable, so that one lawyer could never present enough evidence in one case to convince him of such.

Because all judges agree what they should do if the facts are clear, a judge who chooses the doctrine that corresponds to the state of the world always gets utility 0. If the doctrine he chooses is wrong, he incurs some cost; these costs vary across judges and doctrines. The panels of Table 1 show different arrangements of these costs. Consider the left-hand panel. In that panel, a judge loses 1 if he chooses A when θ = 1, or B when θ = −1. This is a bad mistake, where there is a large mismatch between doctrine and the state of the world. If he makes a smaller mistake (choosing A when θ = 0 or M when θ = −1), the judge loses L, where 0 < L < 1. Likewise for B, smaller mistakes cost only L while large mistakes cost 1. Thus if a judge chooses doctrine M, for example, his expected utility is −L pr(θ = −1) − L pr(θ = 1).

In the right-hand panel, the judge is wary of choosing doctrine B. This is formalized by making a small mistake as costly as a large one, so that choosing B when θ = 0 costs 1. But choosing A when θ = 0 still only costs this judge L. This imbalance captures judicial bias: the judge is more willing to choose doctrine A, even if it is the wrong doctrine, and less willing to choose doctrine B, even if it is the right doctrine.
Note that a lower court judge values the disposition he chooses while a Supreme Court justice values the disposition she chooses. In other words, lower court judges care only about resolving the dispute correctly based on the evidence they see and the arguments they hear, without concern for future doctrine or response from the Supreme Court.⁵

⁵ I choose to assume lower court judges do not fear reversal for two reasons. First, even if Courts of Appeals judges fear reversal, this is not likely to come into play in cases of first impression. Second, the assumption highlights the challenge of learning from agents who are purely self-interested.

As a baseline, I begin by considering lower courts whose preferences are identical to one another and to the Supreme Court. Presented with the same information, every judge in this version of the game would make the same decision. The equilibrium from the game with these homogeneous lower courts is presented in Section 4.1. I then consider a scenario where the Supreme Court supervises one ideological ally and one judge who is biased. Section 4.2 presents the equilibrium under these conditions. All proofs are gathered in the appendix.

4 Optimally learning from agents' decisions

4.1 Supervising two perfectly faithful agents

I begin with the choices of the lower court judges, who attempt to resolve cases based on the evidence the lawyers bring to bear. A lower court judge wants to choose the doctrine that best corresponds to the state of the world. He learns the probability of each state, θ ∈ {−1, 0, 1}, from the advocates' signals. Before he sees the results of an advocate's search, the judge's prior belief that θ_A = −1 is α. He also believes the probability that θ_B = 1 is α. Suppose an advocate's search is unsuccessful, so the judge receives a message m_i = 0. This new information makes the judge believe it is more likely that θ_i = 0. Define the judge's posterior belief pr(θ_A = −1 | m_A = 0) ≡ α̂ = (α − αq)/(1 − αq) (and likewise pr(θ_B = 1 | m_B = 0) ≡ α̂). This posterior belief encapsulates the chances that |θ_i| = 1 and the lawyer was simply unsuccessful in proving this. I place restrictions on α̂ so that after observing m_i = 0 the lower court judge is more inclined to believe that θ_i = 0 than |θ_i| = 1. This condition is α̂ < 1/2.

Because of this assumption, if a lower court judge receives a message pair of (−1, 0), he believes it is more likely that θ = −1 than that θ = 0. (Since he knows θ = θ_A + θ_B, θ_A = −1, and θ_B ∈ {0, 1}, he knows θ ≠ 1.) In other words, after seeing (−1, 0) he believes it is more likely that A is the best doctrine than that M is. But he is not sure: it is possible that θ_B = 1 and the lawyer failed to find evidence of this, in which case θ = 0 and M is the best doctrine.

Since judges are unbiased in this version of the game, a lower court judge suffers equal utility loss either if he chooses A when he should have chosen M or if he chooses M when he should have chosen A (and likewise for B). As a result, after receiving his signals the lower court judge will simply choose the doctrine associated with whichever state he thinks is most likely. Messages (−1, 0) and α̂ < 1/2 imply θ = −1 is most likely; therefore a judge who sees (−1, 0) will choose A. The same is true for (0, 1): this will lead the lower court judge to choose B. If he receives a message pair of (−1, 1), a lower court judge will choose doctrine M, for he knows θ = 0 with certainty.

If the lower court judge receives a message pair of (0, 0), there is still a strictly positive probability on all values of θ. If the lower court judge chooses A, and θ = 1, he will experience a large loss in utility. Likewise, it will be very costly to choose B if it happens that θ = −1. Choosing M guarantees the lower court will not incur too large a loss, no matter what the value of θ is. After (0, 0), therefore, the lower court judge will choose doctrine M.

Therefore, the lower court judge will choose A if and only if he received signals (−1, 0). The same is true for B: he will choose B if and only if he receives signals (0, 1). But he will choose M after either (0, 0) or (−1, 1). This leads to the first stage of Supreme Court inference. If lower courts are behaving optimally, then after some histories the Supreme Court can perfectly infer what signals a judge must have received.
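The decision rule above can be checked numerically by computing a judge's posterior over θ from his message pair and minimizing expected loss under the left panel of Table 1. The following is a minimal sketch, not the paper's own derivation: the function names are hypothetical, and it assumes θ_A and θ_B are independent so that each unsuccessful search leaves posterior α̂ on its own dimension.

```python
def alpha_hat(alpha, q):
    """Posterior that evidence exists on a dimension after an unsuccessful search."""
    return (alpha - alpha * q) / (1 - alpha * q)

def state_beliefs(m_a, m_b, alpha, q):
    """Posterior over theta = theta_A + theta_B given one judge's message pair.

    Assumption: theta_A and theta_B are independent.
    """
    a_hat = alpha_hat(alpha, q)
    p_a = 1.0 if m_a == -1 else a_hat  # pr(theta_A = -1 | m_a)
    p_b = 1.0 if m_b == 1 else a_hat   # pr(theta_B = 1 | m_b)
    return {
        -1: p_a * (1 - p_b),
        0: p_a * p_b + (1 - p_a) * (1 - p_b),
        1: (1 - p_a) * p_b,
    }

def unbiased_choice(m_a, m_b, alpha, q, L):
    """Doctrine minimizing expected loss under the unbiased panel of Table 1."""
    loss = {
        "A": {-1: 0.0, 0: L, 1: 1.0},
        "M": {-1: L, 0: 0.0, 1: L},
        "B": {-1: 1.0, 0: L, 1: 0.0},
    }
    beliefs = state_beliefs(m_a, m_b, alpha, q)
    expected_loss = {
        doctrine: sum(beliefs[theta] * loss[doctrine][theta] for theta in beliefs)
        for doctrine in loss
    }
    return min(expected_loss, key=expected_loss.get)
```

For example, with the arbitrary illustrative values α = 0.4, q = 0.5 (so that α̂ = 0.25 < 1/2) and L = 0.5, calling `unbiased_choice` on each of the four message pairs reproduces the rule in the text: A after (−1, 0), B after (0, 1), and M after (0, 0) or (−1, 1).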
Because preferences are common knowledge, she can make this inference without reviewing the case. This occurs after a lower court reaches a decision of A, in which case the Supreme Court can be sure that lower court must have received messages of (−1, 0), or after a lower court makes a decision of B, in which case the Supreme Court can be sure that lower court received messages of (0, 1). However, when the Supreme Court observes a decision of M, she does not know if it was reached because of messages (0, 0) or (−1, 1). This uncertainty is the primary driving force behind the results that follow. The Supreme Court can only learn if M was reached because of messages (0, 0) or (−1, 1) by paying c to review the case, at which time she may choose to uphold or reverse the decision of M.

Because the Supreme Court sees the results of two cases, she can make an informed decision of whether it is worthwhile to review a decision of M. For example, suppose LC_I has made a decision of A and LC_II has made a decision of M. Under these conditions, the Supreme Court can perfectly infer the messages LC_I saw: they must have been (−1, 0). Based only on the fact that LC_I chose doctrine A, the Supreme Court knows for sure that θ_A = −1 and is slightly more confident that θ_B = 0. She uses this information to make an inference about the messages LC_II saw, then decides whether to review LC_II's decision. If she discovers LC_II's decision was generated by messages of (−1, 1), the Supreme Court learns with certainty that a decision of M is correct. If LC_II's decision was generated by messages of (0, 0), the Supreme Court is much more inclined to believe the appropriate doctrine is A than M, but she still does not know this with certainty and so finds it less beneficial to reverse the decision. She will review LC_II's decision if the probability of learning (−1, 1) is sufficiently high or the costs from an incorrect decision are sufficiently high.

If the Supreme Court observes one lower court choose doctrine A and the other choose B, the Supreme Court concludes with certainty that θ = 0 without reviewing either case, but she still must pay c in order to communicate this to the lower courts.
Since either case is an equally good vehicle and she is indifferent between the two, she may choose which case to review at random. She reverses the decision and announces a doctrine of M.

When the Supreme Court observes two decisions of A, two decisions of B, or two decisions of M, there is nothing to gain from review. Any of these decisions could be wrong, but no review will lead the Supreme Court to learn enough to change the lower courts' doctrine. Whenever the lower courts in this version of the game are in agreement, the Supreme Court lets their decisions stand and does not review a case. Together, these beliefs and actions describe the equilibrium in the game with homogeneous agents, which is summarized in the following proposition.

Proposition 1 (Equilibrium with homogeneous agents) In the game with homogeneous agents, the following occurs in the unique equilibrium.

Each lower court chooses doctrine:
- A iff his advocates send messages (−1, 0)
- B iff his advocates send messages (0, 1)
- M if his advocates send messages (0, 0) or (−1, 1)

After seeing the lower courts' decisions, the Supreme Court does the following.
- If the lower courts chose (A, A), (B, B), or (M, M), the Supreme Court does not review a case.
- If the lower courts chose (A, M), the Supreme Court reviews LC_II if ( ) α(1 q) 2 c < L 1. q 2 + 1 + α(1 q) 2 otherwise she does not review either case. If she discovers M was generated by messages (−1, 1), she determines θ = 0, affirms the decision of M, and issues universal precedent M. If she discovers M was generated by (0, 0), she believes θ = −1 with p > 1/2, reverses the decision of M, and issues universal precedent A. Parallel equilibrium strategies hold for (M, A), (B, M), and (M, B).
- If the lower courts chose (A, B) or (B, A), the Supreme Court determines θ = 0. If c < 2L, she reviews a case (either case), reverses the decision, and issues universal precedent M.
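The Supreme Court's inference after observing the two decisions can be illustrated by brute-force enumeration over states and message pairs. This is a sketch under stated assumptions rather than the paper's proof: the function names are hypothetical, θ_A and θ_B are taken to be independent, and the two courts' searches are taken to be independent given the state. It reproduces only the beliefs, not the cost threshold for review.

```python
from itertools import product

def lower_court(m_a, m_b):
    """Unbiased lower court rule from Proposition 1."""
    if (m_a, m_b) == (-1, 0):
        return "A"
    if (m_a, m_b) == (0, 1):
        return "B"
    return "M"

def message_prob(m, theta_i, q):
    """pr(m | theta_i) for one advocate: success w.p. q only if evidence exists."""
    if theta_i == 0:
        return 1.0 if m == 0 else 0.0
    if m == theta_i:
        return q
    return 1 - q if m == 0 else 0.0

def posterior_given_decisions(d1, d2, alpha, q):
    """Posterior over (theta, LC II's message pair) given decisions (d1, d2)."""
    weights = {}
    for th_a, th_b in product((-1, 0), (0, 1)):
        prior = (alpha if th_a == -1 else 1 - alpha) * (alpha if th_b == 1 else 1 - alpha)
        for ma1, mb1, ma2, mb2 in product((-1, 0), (0, 1), (-1, 0), (0, 1)):
            if lower_court(ma1, mb1) != d1 or lower_court(ma2, mb2) != d2:
                continue  # history inconsistent with the observed decisions
            p = (prior
                 * message_prob(ma1, th_a, q) * message_prob(mb1, th_b, q)
                 * message_prob(ma2, th_a, q) * message_prob(mb2, th_b, q))
            key = (th_a + th_b, (ma2, mb2))
            weights[key] = weights.get(key, 0.0) + p
    total = sum(weights.values())
    if total == 0:
        raise ValueError("decision pair has zero probability")
    return {key: w / total for key, w in weights.items() if w > 0}
```

With the arbitrary illustrative values α = 0.4 and q = 0.5, `posterior_given_decisions("A", "M", 0.4, 0.5)` puts probability 0.125 on LC_II having seen (−1, 1), and conditional on (0, 0) the probability that θ = −1 is 0.75/0.875 > 1/2, in line with the proposition's statement that she then issues A. After ("A", "B") all posterior mass is on θ = 0.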

Given that the Supreme Court can afford to review any one case, she will be most likely to review decisions of M. However, the affordability requirement is stricter for decisions of M, so that as costs rise the Supreme Court no longer finds it sufficiently beneficial to review decisions of M and reviews only decisions of A or B. The reasons for this are twofold.

First, when costs are low, one lower court has made a decision of M, and the other has not, the Supreme Court is more likely to review the decision of M than the other. In fact, when costs are low a decision of M is reviewed unless both lower courts reach a decision of M. This is because reviewing a decision of M is always informative, and outcome-consequential unless both lower courts make that decision. In contrast, a decision of A is never informative to review. Thus, decisions of A are only reviewed if the other lower court makes a decision of B; even then, the Supreme Court may randomly choose to review the other case.

Second, as the cost of review rises, decisions of M become less likely to be reviewed. This is because observing simultaneous decisions of A and B guarantees maximum utility upon review, while reviewing a decision of M is less beneficial in expectation. Therefore, the Supreme Court is more likely to review extreme conflict, where one lower court chooses A and the other chooses B, than moderate conflict, where one lower court chooses M and the other does not. Moderately high costs will make the Supreme Court willing to review extreme conflict but unwilling to review moderate conflict. Furthermore, in this game the Supreme Court will never review if there is no conflict. Black and Owens (2009) find that the Supreme Court is particularly likely to resolve conflicts that are neither shallow nor tolerable.
The model provides a persuasive theoretical explanation for why conflicts between the lower courts are so good at predicting certiorari, including a justification for why the extremity of conflict matters.

Even though all judges have the same preferences, the Supreme Court still reviews and reverses lower courts' decisions. In fact, if costs are moderate, all of the Supreme Court's decisions will be reversals, even though the lower courts are perfectly faithful agents. Recall that the purpose of reviewing a moderate decision is informational, not reversal. As a result, it is not always true that the Supreme Court is more likely to reverse decisions than affirm them. The model offers conditions under which the Supreme Court is more likely to affirm: with c low enough that the Supreme Court is willing to review decisions of M, the likelihood of reversal exceeds the likelihood of affirmance when α and q are jointly relatively small. Furthermore, given that the Supreme Court reverses one and only one lower court, it is more likely to be the court she reviews when αq < 1/2.

4.2 Bias in the lower courts

Next, to investigate how ideological heterogeneity affects learning, I consider a scenario in which the Supreme Court is supervising one lower court that is its unbiased ideological twin (LC_I) and one lower court that is biased against outcome B (LC_II). LC_II prefers to choose B if θ = 1, but if θ = 0 he incurs a large loss from choosing B. LC_II will therefore only choose B if he is very sure θ = 1. To consider the worst effects of this bias, I put an additional condition on α̂ so that after seeing (0, 1) LC_II is not sure enough that θ = 1 to be willing to choose B. This condition is L < α̂/(1 − α̂).⁶ Because of this assumption, LC_II would lose so much if it turned out that θ_A = −1 and thus θ = 0 that he chooses M after seeing (0, 1).

LC_II's bias means the Supreme Court cannot learn as much about θ before deciding whether to review. Still, after some histories, the Supreme Court's equilibrium responses are the same as with unbiased agents. After (A, A), for example, the Supreme Court still does not intervene.
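The condition L < α̂/(1 − α̂) can be read off a direct expected-loss comparison for the biased judge after messages (0, 1). The sketch below is illustrative only: the function names are hypothetical, the losses are those of the right panel of Table 1, and the posterior α̂ = (α − αq)/(1 − αq) is the one defined in Section 4.1.

```python
def alpha_hat(alpha, q):
    """Posterior that evidence exists on a dimension after an unsuccessful search."""
    return (alpha - alpha * q) / (1 - alpha * q)

def biased_expected_losses_after_0_1(alpha, q, L):
    """Expected losses for the B-averse judge after seeing messages (0, 1).

    After (0, 1) the judge knows theta_B = 1, so theta is 0 (if theta_A = -1 and
    advocate A's search failed) or 1 (if theta_A = 0).
    """
    p_zero = alpha_hat(alpha, q)  # pr(theta = 0 | messages (0, 1))
    p_one = 1 - p_zero            # pr(theta = 1 | messages (0, 1))
    return {
        "A": L * p_zero + 1.0 * p_one,  # small mistake if theta = 0, large if theta = 1
        "M": L * p_one,                 # small mistake if theta = 1
        "B": 1.0 * p_zero,              # for this judge, B at theta = 0 costs 1
    }

def biased_choice_after_0_1(alpha, q, L):
    """Doctrine with the lowest expected loss for the biased judge after (0, 1)."""
    losses = biased_expected_losses_after_0_1(alpha, q, L)
    return min(losses, key=losses.get)
```

M beats B exactly when L(1 − α̂) < α̂, which rearranges to the stated condition. With the arbitrary values α = 0.4 and q = 0.5, α̂/(1 − α̂) = 1/3, so the judge chooses M when L = 0.2 but B when L = 0.5.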
Now, however, the Supreme Court is willing to review after all histories other than (A, A) (so long as costs are low enough). Furthermore, nearly all of the additional review falls to the biased lower court.

⁶ It would also be sufficient to increase the loss from B to a loss still greater than 1. I choose to manipulate α̂ instead for algebraic simplicity.

Proposition 2 (Equilibrium with Heterogeneous Agents) In the game with heterogeneous agents and a biased lower court, the following occurs in the unique equilibrium.

Lower Court I chooses:
- A iff he receives messages (−1, 0)
- B iff he receives messages (0, 1)
- M if he receives messages (0, 0) or (−1, 1)

Lower Court II chooses:
- A iff he receives messages (−1, 0)
- M if he receives messages (0, 1), (0, 0) or (−1, 1)

After seeing the lower courts' decisions, the Supreme Court does the following.

If the lower courts chose (A, A), the Supreme Court does not review a case.

If the lower courts chose (A, M), then the Supreme Court reviews LC_II if ( ) α(1 q) 2 α(1 q) 2 c < L 1 2, α(1 q) 2 + αq 2 α(1 q) 2 + 1 α otherwise she does not review a case. If she reviews and discovers M was generated by messages
- (0, 0), then the Supreme Court reverses LC_II's decision and issues doctrine A.
- (−1, 1), then the Supreme Court affirms LC_II's decision and issues doctrine M.
- (0, 1), then the Supreme Court affirms LC_II's decision and issues doctrine M.

If the lower courts chose (M, A), then the Supreme Court reviews LC_I if ( ) α(1 q) 2 c < L 1, q 2 + 1 + α(1 q) 2 otherwise she does not review a case. If she reviews and discovers M was generated by messages
- (−1, 1), then the Supreme Court affirms LC_I's decision and issues doctrine M.
- (0, 0), then the Supreme Court reverses LC_I's decision and issues doctrine A.

If the lower courts chose (M, M), then the Supreme Court reviews LC_II if α < 2q(1 − q) and c is low enough, reviews LC_I if α > 2q(1 − q) and c is low enough, otherwise she does not review a case. If she reviews and discovers M was generated by messages
- (0, 0) or (−1, 1), then she affirms LC_II's decision and issues doctrine M.
- (0, 1), then she reverses LC_II's decision and issues doctrine B.

If the lower courts chose (B, A), then the Supreme Court takes either case if c < 2L, reverses the case, and issues doctrine M. If c ≥ 2L, she does not review.

If the lower courts chose (B, M), then the Supreme Court reviews LC_II if c < L (1 − α)/((1 − q)²α + 1 − α), otherwise she does not review a case. If she reviews and discovers M was generated by messages
- (0, 0), then the Supreme Court reverses LC_II and issues doctrine B.
- (−1, 1), then the Supreme Court affirms LC_II and issues doctrine M.
- (0, 1), then the Supreme Court reverses LC_II and issues doctrine B.

This equilibrium is different from that with two unbiased courts in two important ways. First, recall that under ideological homogeneity, the Supreme Court grants cert only if there is conflict in the lower courts. Even if the Supreme Court might learn the fact pattern that led to a doctrinal choice, review without conflict will never be outcome-consequential. This result does not hold when one of the lower courts is biased: the Supreme Court does review after multiple courts have reached the same conclusion. This is because lower courts might reach the same conclusion for different reasons, and that possibility merits the Supreme Court's attention.

Second, as a result of this, the Supreme Court may affirm all cases on some matter. Resolving conflict requires the Supreme Court to reverse at least one of the lower courts' decisions, so without biased lower courts an affirmance is always accompanied by an implicit reversal of the other court's decision.
Now, however, the Supreme Court may affirm two decisions that are similar on their face. A closely related result is that the probability of an affirmance is higher when one lower court is biased. In all histories in which the Supreme Court affirms with two unbiased lower courts, she also affirms when one lower court is biased. With a biased lower court, however, she also (1) affirms cases she would have denied and (2) affirms cases she would have reversed, had both lower courts been unbiased.

That the Supreme Court affirms cases she would have denied highlights an important point: the probability of review is higher with a biased lower court than without. The Supreme Court grants cert to all cases she would review under homogeneity and also grants after additional histories. Furthermore, most of this additional review falls to the biased agent, who now chooses M and earns review when his signals are (0, 1). As a result, the Supreme Court is more likely to review the biased lower court than her ideological ally. This result is similar to previous models of the hierarchy (like Cameron, Segal and Songer (2000)), but the intuition is more subtle: occasionally the Supreme Court will prefer reviewing its ideological ally to reviewing the biased lower court. In fact, when both lower courts have made a decision of M, under some conditions (when α > 2q(1 − q)) the Supreme Court will prefer to review her ally over the biased lower court.

The second of these effects (that, conditional on review, the likelihood of an affirmance is higher with a biased lower court) arises because biased decisions are sometimes affirmed in equilibrium. This occurs when the unbiased lower court chooses doctrine A, and the biased lower court chooses M despite receiving signals which would lead an unbiased judge to choose doctrine B. Together, these messages guarantee that the appropriate doctrine is M. LC_I's decision of A implies θ_A = −1, and the signals from LC_II's lawyers (0, 1) imply θ_B = 1. Thus, after seeing (A, M) and reviewing LC_II's decision, the Supreme Court knows θ = −1 + 1 = 0.
Therefore, even though LC_II behaved contrary to what the Supreme Court would have wanted, the Supreme Court upholds his decision.
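The logic of this affirmance can be sketched by listing which message pairs are consistent with each decision under the two lower court rules of Proposition 2, and then pooling the verifiable evidence. The helper names below are hypothetical, and the sketch assumes, as in the text, that a successful search is conclusive evidence on its dimension.

```python
from itertools import product

# All message pairs (m_A, m_B) a lower court can receive.
MESSAGE_PAIRS = list(product((-1, 0), (0, 1)))

def unbiased_rule(m_a, m_b):
    """LC I's rule in Proposition 2."""
    if (m_a, m_b) == (-1, 0):
        return "A"
    if (m_a, m_b) == (0, 1):
        return "B"
    return "M"

def biased_rule(m_a, m_b):
    """LC II's rule in Proposition 2: he never chooses B."""
    return "A" if (m_a, m_b) == (-1, 0) else "M"

def consistent_messages(rule, decision):
    """Message pairs that would lead a court using `rule` to `decision`."""
    return [pair for pair in MESSAGE_PAIRS if rule(*pair) == decision]

def verified_theta(lc1_messages, lc2_messages):
    """Return theta if pooled evidence proves both dimensions, else None."""
    found_a = -1 in (lc1_messages[0], lc2_messages[0])
    found_b = 1 in (lc1_messages[1], lc2_messages[1])
    return -1 + 1 if found_a and found_b else None
```

Here `consistent_messages(unbiased_rule, "A")` contains only (−1, 0), whereas `consistent_messages(biased_rule, "M")` contains three pairs, which is why reviewing the biased court's M is informative. Pooling LC_I's inferred (−1, 0) with LC_II's reviewed (0, 1) yields `verified_theta` equal to 0, so M is the appropriate doctrine.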

4.3 Predicting certiorari

The model generates a number of empirical predictions. I focus on two: which outcome the Supreme Court wishes to review (A, M, or B, representing what one could think of as liberal, moderate, and conservative decisions) and the relationship between review and reversal when lower courts are in conflict. Both arise because the Supreme Court prefers to review decisions that are most likely to be informative.

First, recall the following from the game: when one lower court has made a decision of M and the other has not, if the Supreme Court reviews it will review the decision of M. Decisions of A will only be reviewed if the other lower court chooses B, and even then review is not certain. Furthermore, recall from Proposition 2 that LC_II's decisions of M are most likely to be reviewed. This is because all other decisions are reviewed only conditional on the other court's ruling. The Supreme Court is more likely to review decisions of M than decisions of A or B, particularly for biased lower courts. Assuming the cost of review is not very high, it thus follows that:

Moderation Hypothesis The Supreme Court is more likely to review moderate than extremist decisions.

The second prediction relies on the Supreme Court's treatment of both the court she reviews and the court she does not review. Recall the Supreme Court's choice of doctrine can effectively reverse the decision of a court she does not review. This is because the Supreme Court's choice of doctrine is binding on all lower courts. If LC_I makes a decision of A and LC_II makes a decision of M, a ruling of M from the Supreme Court reverses LC_I's decision, regardless of whether the Supreme Court chose to review LC_I's or LC_II's decision before making her own. Thus, by considering reversals for cases that are not reviewed, the model is able to make predictions about which doctrines are more likely to be struck down: those that are reviewed or those that are not.
If the Supreme Court reverses one and only one of the lower courts' rulings, she must reverse either the case she reviews or the case she does not review. The model predicts the Supreme Court will tend to reverse the case she reviews. This tendency is more pronounced when the lower courts are ideologically heterogeneous than when the lower courts are homogeneous. Figure 2 illustrates the parameter space in which the Supreme Court is more likely to reverse the cases it reviews than the cases it does not review. This space is larger when lower courts are ideologically heterogeneous.

Figure 2: Parameter space in which taken cases are more likely to be reversed than not-taken cases (α on the horizontal axis and q on the vertical axis). The lighter-shaded area is the space in which this is true only for ideologically heterogeneous lower courts. The darker-shaded area is the space in which this is true for homogeneous and heterogeneous lower courts.

Reversal Hypothesis The probability of reversing cases that are reviewed is higher than the probability of reversing cases that are not reviewed, especially when lower courts are ideologically heterogeneous.

While no existing work predicts the Court has these incentives, some empirical work suggests the results might hold. In particular, Wasby (2005), Lindquist and Klein (2006), and Summers and Newman (2011) argue the Supreme Court's reversal rate is lower than it seems. Each study shows that the Supreme Court often mentions other Courts of Appeals