Partisan Influence in Congress and Institutional Change

Scott de Marchi, Duke University, demarchi@duke.edu
Michael Ensley, Indiana University, ensley@indiana.edu
Michael Tofias, UW-Milwaukee, tofias@uwm.edu

Draft: March 31, 2008

Abstract

We suggest using cluster analysis as an alternative to NOMINATE (Poole and Rosenthal 2000) for summarizing legislative voting behavior. NOMINATE is difficult to implement, and it assumes that votes are independently and identically distributed, while many theories of congressional organization explicitly model the non-independence of vote choice. Cluster analysis is a parsimonious, well-understood statistical technique that is available in many statistical packages. Cluster analysis produces roll call scores that are highly correlated with NOMINATE scores. This lightweight procedure encourages data analysis and a fresh look at the assumptions typically made when working with roll call data. We explore the consequences of several historical events, as well as changes in legislative organization, for congressional voting patterns over time. We also present some Monte Carlo results that demonstrate the benefits of the cluster analysis methodology and some issues with NOMINATE.

Authors listed in alphabetical order. Paper prepared for the 2008 annual meeting of the Midwest Political Science Association, April 4-6, 2008, Chicago, IL. Draft very preliminary; please do not quote without permission. Thank you to Keith Poole for making NOMINATE and related data available on his website.

Introduction

Among the scholars in our discipline who cannot be heralded enough stands Keith Poole. His work with Howard Rosenthal on the various NOMINATE techniques for scaling congressional roll call votes is ample enough and important enough that we need not recount it in a conference paper. There is of course a cottage industry in roll call voting data reduction methods (including, but not limited to, Heckman and Snyder 1997; Clinton et al. 2004). But like several recent authors (Roberts 2007 notable among them), we want to focus on NOMINATE and suggest that its overwhelming popularity and the assumptions it carries with it are getting in the way of congressional data analysis and hypothesis testing.

For instance, Aldrich (1995) argues that political parties are negotiated solutions designed to combat the fundamental instability of collective decision making confronting legislators. If the social choice impossibility results have any empirical force, this suggests that current attempts to uncover the effects of significant changes in congressional organization caused by partisan maneuvering may be compromised (Smith 2008). In particular, the groundbreaking NOMINATE methodology developed by Poole and Rosenthal (2000) to estimate legislator ideal points might not be well suited for analyzing changes in congressional organization.

NOMINATE is difficult to implement (the program is over 200 pages of Fortran code and requires considerable amounts of machine time to run), and it assumes that votes are independently and identically distributed, while many theories of congressional organization explicitly model the non-independence of voting. The number of parameters (both those inherent in the statistical model and additional parameters such as the threshold for lopsided votes) is considerable, as is the amount of data dropped from their analyses. Replicating and extending Poole and Rosenthal's groundbreaking work is harder than one might think, given constraints of data and the time that would be required to parse their code. This overhead, both mechanical and theoretical, makes it difficult to conduct certain types of data analysis. Therefore we propose using cluster analysis, a simpler yet more robust statistical procedure for summarizing voting behavior.

We take a new look at using cluster analysis to analyze roll call voting. We are by no means the first to use cluster analysis to analyze roll call data. Cluster analysis is a parsimonious, well-understood statistical technique available in many statistical packages (including Stata). Perhaps the most recent analysis of roll calls using cluster analysis was conducted by Wilcox and Clausen (1991), who were interested simply in the overall question of dimensionality. We use cluster analysis as a tool to test for the partisan structure of voting behavior. We introduce a technique to get cluster analysis to produce summary scores of voting behavior, which the literature often refers to as the ideal points of legislators. Importantly, our cluster analyses produce roll call scores that are highly correlated with NOMINATE scores. We suspect that this lightweight procedure for data analysis makes it easier to explore the consequences of historical events and changes in legislative organization for congressional voting patterns, and we discuss several examples. We identify precise time periods in the US Congress where we expect to observe significant changes in the pattern of roll call voting and examine whether our measure of legislators' ideal points uncovers the expected relationship. We also present several Monte Carlo simulations that demonstrate the benefits of the cluster analysis methodology and some weaknesses associated with NOMINATE.

A New Measure with an Old Technique

The object of our cluster analysis is to classify legislators into voting blocs. We investigate classifications with 2 clusters based on the idea that the two-party system structures voting in any Congress. When 2 clusters fit the data well, parties and the party system are influencing voting behavior. We are agnostic on whether this structure is coming from internal institutions or electoral forces.
We use Stata's k-means cluster analysis procedure, with k=2 clusters and a city-block distance metric. Results were largely invariant to related choices (e.g., k-medians, Euclidean distance). K-means clustering uses a hill-climbing algorithm to minimize the within-cluster distance across all legislators, repeatedly classifying each into one of the two clusters (for more on cluster analysis and k-means clustering see Everitt et al. 2001). Distance is defined as the difference between a legislator's vote choice and the mean vote choice for all members of the legislator's group. Each roll call is an observation. We use Stata's default number of iterations per analysis (10,000) and start with a random assignment of legislators into clusters. Roll call classification for each legislator is then made based on the mean of the cluster to which that legislator has been assigned. We impute missing data by assigning the roll call mean to each member who abstained or otherwise did not vote (these observations are not used for classification success). Unlike the more familiar NOMINATE procedures, we do not discard lopsided roll calls. We use the entire voting history of the House.

We take the percent of roll calls correctly classified to be a measure of the influence that the partisan structure of Congress has on roll call behavior. If two clusters can correctly classify all votes, then we would take behavior to be perfectly partisan. As the error rate in classification increases, we take that to be an indication that systematic or idiosyncratic factors outside of party politics are influencing vote choices.

To achieve cardinal measures of ideology, we use each legislator's normalized distance from the opposing cluster minus 1/2 the total distance between the two clusters. This measure correctly distinguishes between two partisans who are equidistant from their own cluster but at varying distances from the opposing cluster. We term our cluster-based voting scores Demento scores (de Marchi Ensley Tofias). We proceed to discuss voting behavior in the context of our scores.

Historical Analysis

To establish the face validity of our measure, we present evidence of the method's ability to recover what Poole and Rosenthal (2007) call the arrival of the unidimensional Congress.
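The clustering, classification, and scoring steps described under "A New Measure with an Old Technique" can be sketched in a few lines. This is a minimal illustration in Python, not the Stata procedure used in the paper. All function names are hypothetical, votes are assumed to be coded 1 (yea), 0 (nay), or None (missing), a cluster's predicted vote is assumed to be its majority vote, and the score formula is one reading of "normalized distance from the opposing cluster minus 1/2," with cluster 0 arbitrarily signed negative.

```python
import random


def impute_roll_call_means(votes):
    """Fill missing votes (None) with that roll call's mean among members who voted."""
    filled = [list(row) for row in votes]
    for j in range(len(votes[0])):
        cast = [row[j] for row in votes if row[j] is not None]
        mean_j = sum(cast) / len(cast) if cast else 0.5  # 0.5 fallback is an assumption
        for row in filled:
            if row[j] is None:
                row[j] = mean_j
    return filled


def city_block(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_cluster_kmeans(data, init=None, seed=0, max_iter=10000):
    """Two-cluster k-means with city-block assignment; returns (labels, centers)."""
    rng = random.Random(seed)
    labels = list(init) if init is not None else [rng.randrange(2) for _ in data]
    if len(set(labels)) < 2:  # make sure both clusters start non-empty
        labels[0] = 1 - labels[0]
    for _ in range(max_iter):
        centers = []
        for k in (0, 1):
            rows = [r for r, lab in zip(data, labels) if lab == k]
            centers.append([sum(col) / len(rows) for col in zip(*rows)])
        new = [0 if city_block(r, centers[0]) <= city_block(r, centers[1]) else 1
               for r in data]
        if new == labels or len(set(new)) < 2:  # converged, or a cluster would empty
            break
        labels = new
    return labels, centers


def classification_error(votes, labels, centers):
    """Share of cast votes that differ from the majority vote of the member's cluster."""
    wrong = total = 0
    for row, lab in zip(votes, labels):
        for vote, mean in zip(row, centers[lab]):
            if vote is None:  # imputed votes are not scored
                continue
            total += 1
            wrong += int((1 if mean >= 0.5 else 0) != vote)
    return wrong / total


def cluster_scores(data, labels, centers):
    """Signed score: distance to the opposing center / distance between centers - 1/2."""
    gap = city_block(centers[0], centers[1])
    scores = []
    for row, lab in zip(data, labels):
        magnitude = city_block(row, centers[1 - lab]) / gap - 0.5
        scores.append(-magnitude if lab == 0 else magnitude)
    return scores


# Toy chamber: two blocs of three; one member abstains once, one defects once.
votes = [[1, 1, 0, 0], [1, 1, 0, None], [1, 1, 0, 0],
         [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
data = impute_roll_call_means(votes)
# An explicit start keeps the toy run reproducible; init=None gives a random start.
labels, centers = two_cluster_kmeans(data, init=[0, 0, 1, 1, 1, 0])
print(labels)                                         # [0, 0, 0, 1, 1, 1]
print(classification_error(votes, labels, centers))   # 1 error in 23 cast votes
print(cluster_scores(data, labels, centers))
```

In the toy run the defecting member of the second bloc receives a score closer to zero than her bloc-mates, which is the behavior the text asks of the measure. Because a member is never closer to the opposing center than to her own, the triangle inequality keeps the unsigned magnitude at or above zero under this reading.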

The familiar pattern of a decrease in the error rate of roll call classification is readily apparent. We take the trend to be indicative of the increase in the partisan structure of roll calls. A similar figure (omitted) describes roll call voting by congressional session. We can compare our classification success to that of DW-NOMINATE (as downloaded from voteview.com). Here we fit a locally weighted regression to show trends.

The cluster analysis achieves an error rate over the entire history of the House of about 17.2%. The DW-NOMINATE procedure, with 2 dimensions and fewer roll calls, has an error rate of about 14.3%. In seven Congresses the cluster technique outperforms DW-NOMINATE: the 60th and 65th Congresses, as well as most of the recent Republican-controlled Congresses (the run from the 105th through the 109th Congresses).

Researchers who mainly want summary scores of roll call behavior for use in statistical analyses might be most interested in the correlation between Demento scores and DW-NOMINATE. The overall correlation is quite high at .893. As Congress has settled into a more unidimensional structure, the scores from the two systems have become more similar. It is not surprising that the two scores agree more as dimensionality collapses, since the cluster analysis can recover only a single dimension (recall that we assume only two clusters). The suggestion is that for recent Congresses one could use Demento scores for many purposes.

We can further explore the cluster analysis method and the Demento scores by looking at key dates and periods in history where we might expect a change in the partisan structure of roll call voting behavior. For instance, Aldrich (1995) identifies two key instances in the development of the party system: the June 20, 1790 dinner where Jefferson, Hamilton, and Madison agreed to trade votes in Congress, and the birth of the modern party system in 1828, reaching maturity by 1840. We also examine the revolt against Speaker Joseph Cannon (March 17, 1910) and the failed Republican coup against Newt Gingrich (July 9, 1997). In addition, we examine the effect on roll call voting of truly exogenous shocks to the political system by examining behavior around Pearl Harbor (December 7, 1941) and the Black Monday stock market crash (October 19, 1987).

Table 1: Key Dates and Error Rates in Roll Call Classification Success

Key Date            Error Rate Before   Error Rate After   Change
June 20, 1790       25.6%               23.4%              2.1%
March 17, 1910      10.2%               12.8%              2.59%
July 9, 1997        10.4%               11%                0.6%
December 7, 1941    16.3%               13.4%              2.9%
October 19, 1987    14.0%               11.1%              2.9%

To examine the effect of each key date, we conducted a cluster analysis on all of the votes taken before the date and compared the error rate in roll call classification to a parallel analysis on all the votes taken during the remainder of that Congress. While most of the effects are small, they tend to be larger than the similar changes observed between sessions of a Congress. The average absolute change between sessions of a Congress is about 1.9%. This figure was higher and more variable in the past, and smaller and less variable in the modern era.

We expected that the famous dinner and the deal struck there would pave the way to increased proto-partisan structure in voting during the remainder of the first Congress. We observe a change in the right direction, and a 1-tailed t-test is significant (p-value = .0475). This suggests that increased structure was introduced over the course of the rest of the first Congress. The internal party revolt against Speaker Cannon significantly decreased the partisan structure as power devolved to more members of the majority party. The modern-day failed coup against Newt Gingrich led to a much smaller increase in the error rate, which we would expect relative to the successful coup against Cannon.
We examine the effects of war and economic crisis on the partisan nature of roll call behavior, using the Pearl Harbor attack and the Black Monday stock market crash as key dates. Both events led to similarly sized decreases in the error rate, suggesting an increase in the partisan structure of voting behavior. Interestingly, the effects seem larger for these two external events than for the institutionally internal break points. This might be because we are not directly controlling for changes in the agenda.

Pearl Harbor presents us with an interesting opportunity to examine changes in voting behavior. We can compare directly the Demento scores of representatives before and after the attack. The scores before and after the attack are correlated at .71, and the biggest change in behavior comes from a group of mostly Southern Democrats who assume voting behavior more similar to Republicans for the remainder of the Congress. For the rest of the Democratic Party (as well as the entire GOP), the dispersion of voting behavior decreases as measured by the standard deviation of the scores. This hiving off of Southern Democrats might be a sign of agenda change within the context of a unity coalition. Voting occasionally with Republicans stands out a great deal more when, most of the time, (all) Democrats and Republicans are voting together on war issues. This analysis has been quite preliminary and invites a more extensive treatment, and perhaps a more rigorous framework, going forward.

Some Simulations

We conducted several simulations to examine the performance of NOMINATE as well as cluster analysis (here using W-NOMINATE as downloaded from voteview.com). The first simulation was designed to examine how cluster analysis performs relative to NOMINATE when the data-generating assumptions match those made by NOMINATE. These can be considered the ideal circumstances for the NOMINATE procedure.

We created a data set with 435 hypothetical legislators and 500 roll call votes. The legislators' ideal points were drawn from a standard (mean 0, standard deviation 1) bivariate Normal distribution. In one variation of the simulations, a legislator's ideal point on the first dimension is uncorrelated with her ideal point on the second dimension (rho = 0). In the other variation, the correlation between the legislators' ideal points on the two dimensions is rho = 0.3. The ratio of roll call votes on the two dimensions was varied, with the number of roll calls on the first dimension ranging from 450 down to 250 of the 500 total roll calls.

For each roll call, a status quo and a new policy position were generated by taking random draws from a standard (mean 0, standard deviation 1) univariate Normal distribution. It is important to recognize that each vote is over policies in a single dimension; policy proposals are germane to one dimension. For each legislator, we determine the utility of the status quo and of the new policy position using the squared Euclidean distance between the position and the legislator's ideal point on the relevant dimension. A legislator votes for the position that she is closer to, with the caveat that there is a random component for each legislator on each roll call. The error component is a significant but small fraction of the total utility. Thus, legislators near the point of indifference may switch their vote depending on the size and direction of the random component.
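The data-generating process just described can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for the reported results: the function name and arguments are hypothetical, the random component is assumed to be Normal, and its scale (`noise_sd=0.25`) is a placeholder because the text says only that it is a small fraction of total utility.

```python
import math
import random


def simulate_two_dim_roll_calls(n_leg=435, n_votes=500, n_first=450,
                                rho=0.0, noise_sd=0.25, seed=0):
    """Simulate spatial voting where each roll call concerns a single dimension.

    Returns (ideal, dims, alternatives, votes); votes[i][j] is 1 when
    legislator i prefers the new proposal on roll call j, else 0.
    """
    rng = random.Random(seed)

    # Standard bivariate Normal ideal points with correlation rho.
    ideal = []
    for _ in range(n_leg):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        ideal.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))

    # The first n_first roll calls lie on dimension 0, the rest on dimension 1.
    dims = [0] * n_first + [1] * (n_votes - n_first)

    # One (status quo, proposal) pair per roll call, each a standard Normal draw.
    alternatives = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in dims]

    votes = []
    for point in ideal:
        row = []
        for d, (status_quo, proposal) in zip(dims, alternatives):
            # Utility is negative squared distance plus a random shock (assumed Normal).
            u_sq = -(point[d] - status_quo) ** 2 + rng.gauss(0, noise_sd)
            u_new = -(point[d] - proposal) ** 2 + rng.gauss(0, noise_sd)
            row.append(1 if u_new > u_sq else 0)
        votes.append(row)
    return ideal, dims, alternatives, votes


# The 450/50 condition with correlated dimensions.
ideal, dims, alternatives, votes = simulate_two_dim_roll_calls(rho=0.3)
print(len(votes), len(votes[0]))  # 435 500
```

Setting `noise_sd=0.0` removes the random component, so every legislator votes for the strictly closer alternative; that makes it easy to check the generator before adding noise.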
In Table 2, we report the results of these simulations. We compare the percent of roll calls predicted correctly and the estimated correlation between the simulated ideal points on the first dimension and the estimates provided by NOMINATE and the cluster procedure. Note that the percent predicted correctly by NOMINATE is based on the two-dimension model, whereas the scores from the cluster analysis essentially assume there is one dimension.

The NOMINATE procedure is accurate, as we would expect under these circumstances. Since NOMINATE essentially maximizes the likelihood function under the set of assumptions we have used to generate the data, we should expect the close fit (correlation between ideal points and estimates provided by NOMINATE).

The estimates provided by the cluster analysis are encouraging. First, the percent predicted correctly mirrors the measure of fit provided by the NOMINATE procedure. Thus, if one were interested in creating a relative measure of the extent to which patterns of legislative voting are unidimensional, the cluster procedure performs in a manner remarkably similar to NOMINATE (under conditions that are ideal for NOMINATE). These results thus support the earlier findings using the House roll call data. Further, in the one condition in which NOMINATE appears to perform poorly (when rho = 0 and roll calls are evenly divided between the two dimensions), the cluster procedure provides estimates of the first-dimension scores that are consistent with the other simulation conditions. Thus, this simple Monte Carlo simulation suggests that the cluster procedure we advocate in this paper is an accurate and efficient estimator for scaling roll call data.

Table 2: Simulation Results for W-NOMINATE and Cluster Analysis

rho = 0
Votes on First Dimension/Second Dimension       450/50   400/100   350/150   300/200   250/250
% Predicted Correctly
  NOMINATE                                      90       87        85        83        81
  Cluster                                       82       80        79        77        76
Correlation with First Dimension Ideal Points
  NOMINATE                                      0.99     0.99      0.99      0.98      0.98
  Cluster                                       0.95     0.95      0.94      0.95      0.91

rho = 0.3
Votes on First Dimension/Second Dimension       450/50   400/100   350/150   300/200   250/250
% Predicted Correctly
  NOMINATE                                      90       88        86        84        81
  Cluster                                       82       80        79        78        77
Correlation with First Dimension Ideal Points
  NOMINATE                                      0.99     0.99      0.99      0.98      0.76
  Cluster                                       0.95     0.95      0.94      0.93      0.90

We also ran some simulations to get an idea of how many roll calls are needed to use cluster analysis successfully. We used a simple model with 100 voters, each with an ideal point in 1 dimension drawn from a Normal(0,1) distribution. Roll calls are votes between two alternatives, each drawn from a Normal(0,1) distribution. Voters vote for the alternative closer to them, with a stochastic term such that about 5% of the time voters make the wrong choice. We analyzed 10,000 simulations for legislatures with 5, 10, and 20 roll calls. Table 3 presents the results.

Table 3: Simulation with Cluster Analysis by Number of Roll Calls

              Error Rate in Vote Classification    Correlation to True Ideal Points
Roll Calls    Mean        Standard Deviation       Mean        Standard Deviation
5             .099796     .03518                   .90272      .03662
10            .11078      .0248                    .93007      .02514
20            .116298     .01798                   .94347      .01890

Our simulations suggest that we can recover meaningful results with as few as 5 votes, but prudence suggests that the bar be set at 10 votes. This bodes well for using cluster analysis to compute scores for issue areas where there are many fewer votes than in the overall roll call vote matrix.

One potential critique of using cluster analysis to generate cardinal scores of voting behavior is that the hill-climbing algorithm may converge on a local maximum, or perhaps hide the existence of a global maximum, which might suggest measurement error issues when using the scores in regression analysis. However, this problem is not actually unique to cluster analysis. For any technique, we should want to know how robust it is to small perturbations in parameters -- particularly seemingly innocuous parameters that are not chosen based on theory or the sample itself.
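One way to probe the local-optimum concern is to rerun the clustering from several random starts on the same simulated chamber and compare the within-cluster distance each run reaches. The sketch below is a minimal Python illustration under stated assumptions, not the simulation code behind Table 3: the function names are hypothetical, the "wrong choice" is modeled as an independent 5% flip of each sincere vote, and centroids are cluster means with city-block distance used for assignment.

```python
import random


def simulate_votes(n_voters=100, n_calls=10, flip_prob=0.05, seed=0):
    """One-dimensional spatial voting; each sincere vote is flipped with flip_prob."""
    rng = random.Random(seed)
    ideal = [rng.gauss(0, 1) for _ in range(n_voters)]
    votes = [[0] * n_calls for _ in range(n_voters)]
    for j in range(n_calls):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)  # the two alternatives
        for i, x in enumerate(ideal):
            sincere = 1 if abs(x - b) < abs(x - a) else 0
            votes[i][j] = 1 - sincere if rng.random() < flip_prob else sincere
    return ideal, votes


def city_block(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def two_means_objective(data, rng, max_iter=1000):
    """One 2-means run from a random start; returns (within-cluster distance, labels)."""
    labels = [rng.randrange(2) for _ in data]
    labels[0], labels[1] = 0, 1  # guarantee both clusters start non-empty
    for _ in range(max_iter):
        centers = []
        for k in (0, 1):
            rows = [r for r, lab in zip(data, labels) if lab == k]
            centers.append([sum(col) / len(rows) for col in zip(*rows)])
        new = [0 if city_block(r, centers[0]) <= city_block(r, centers[1]) else 1
               for r in data]
        if new == labels or len(set(new)) < 2:
            break
        labels = new
    objective = sum(city_block(r, centers[lab]) for r, lab in zip(data, labels))
    return objective, labels


def restart_spread(data, n_starts=20, seed=1):
    """Smallest and largest objective reached across independent random starts."""
    rng = random.Random(seed)
    objectives = [two_means_objective(data, rng)[0] for _ in range(n_starts)]
    return min(objectives), max(objectives)


ideal, votes = simulate_votes(n_calls=10)
best, worst = restart_spread(votes)
print(best, worst)  # a wide gap would indicate sensitivity to the starting assignment
```

If the smallest and largest objectives coincide across starts, the starting assignment is not driving the scores for that chamber; a wide gap is a signal to keep the best of many runs rather than a single one.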

In W-NOMINATE, for example, there is a setting for which senator (or MC) you believe is "left" on each dimension. This choice should not produce different outcomes -- it is meant as a way to orient each dimension from left to right: "...the record number of a Senator/Legislator that you believe is on the "Left" of the first dimension (this just sets "liberal" on the left and "conservative" on the right of the first dimension), and the record number of a Senator/Legislator that you want to be "Up" on the second dimension (this feature is handy if you want, for example, Southern Democrats to be "Up" or positive on the second dimension -- in other countries, this could be another salient division such as religion). In the 107th 10 is Senator Boxer (D-CA) and 18 is Senator Graham (D-FL)" (Poole, W-NOMINATE help file).

To see if this parameter matters, we changed both values. For the first dimension, Boxer was changed to Feinstein (which surely cannot matter); for the second, Graham was changed to Specter (to be a bit more difficult). Given that many researchers use the ideological locations generated by NOMINATE as independent variables, one would like these values to be at least ordinal and ideally interval. By changing a parameter that should have no impact on NOMINATE's output, do we violate either of these two conditions?

Unfortunately, both conditions are violated in varying degrees. The order of the ideology scores on the first dimension is largely the same, with only four pairs changing places: {Boxer and Reed}, {Rockefeller and Jeffords}, {Landrieu and Carper}, and {Ensign and Hagel}. For the second dimension, things are dicier -- half of the Senate changes location, and the movement is greater than pairs swapping places.

To examine whether or not the data are interval, a simple measure of error is used: the absolute value of the difference between each senator's score under Poole and Rosenthal's parameter values (Boxer and Graham) and under our choice (Feinstein and Specter). Given the scaling of the NOMINATE scores, the maximum error is 1 (which is bad) and the minimum is 0 (which is good). On the first dimension, the mean error is .001, which is excellent. On the second dimension, the mean error is .06, which is not as good -- as we saw in the test of ordinality, .06 is a large enough magnitude to substantially change the order of senator NOMINATE scores.

A final test is whether or not any senators change signs -- by and large, Democrats are on one side of 0 and Republicans are on the other (n.b., in the 107th Senate, there are two exceptions: Miller is located at -.04 and Jeffords is at .033). As one would hope, no one changes sign given this parameter change.

At this point we are left with a somewhat curious puzzle. A substantively innocuous parameter change violated both the ordinal and interval properties of the NOMINATE data, albeit by (mostly) small amounts. The question remains, however, as to why this occurred at all, and whether or not a larger parameter change would produce more of a train wreck. In general, if seemingly innocent parameter changes disrupt the results, it indicates that the technique is overfitting the sample to some degree -- the question is by how much?

An easy test to examine this question directly is to divide the sample in half and compare NOMINATE results. For the 107th Senate, there are 633 votes (n.b., Poole and Rosenthal drop votes that produce large supermajorities, so the actual number is less than this). What would happen if one ran NOMINATE on the first 300 votes and compared the results to the next 300 votes?

Using our three tests from the preceding section, one can easily verify that the train has gone off the tracks and exploded in an alarming fashion. Ordinality is grossly violated; on the first dimension 89 senators change their place, and when one visually inspects the data, it is difficult even to find where senators are in the two side-by-side columns. Mean absolute error reflects this. On the first dimension, it is .25 (!) and on the second .53 (!!!). By any measure, these are departures that are as large as the ocean and just as deep. Finally, seven senators change signs on the first dimension, and 56 do so on the second. Overall, one is left with the sense that NOMINATE recovered two entirely different senates, which is not a good thing. Repeating this Monte Carlo with uniform draws to create the two subsamples (rather than dividing them by time) produces the same chaos.
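The three checks used in this section (ordinality, interval error, and sign changes) are simple to compute for any two sets of scores on the same legislators. The following is a minimal Python sketch; the function names are hypothetical and the example scores are invented for illustration, not taken from the 107th Senate runs.

```python
def rank_positions(scores):
    """Rank of each legislator (0 = lowest score); ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        ranks[i] = position
    return ranks


def compare_scores(first, second):
    """Return (legislators whose rank changed, mean absolute error, sign changes)."""
    ranks_first = rank_positions(first)
    ranks_second = rank_positions(second)
    moved = sum(1 for a, b in zip(ranks_first, ranks_second) if a != b)
    mean_abs_error = sum(abs(a - b) for a, b in zip(first, second)) / len(first)
    sign_changes = sum(1 for a, b in zip(first, second) if a * b < 0)
    return moved, mean_abs_error, sign_changes


# Invented scores for five legislators under two runs of a scaling procedure.
run_a = [-0.8, -0.3, -0.04, 0.033, 0.5]
run_b = [-0.7, -0.2, 0.1, -0.1, 0.5]
print(compare_scores(run_a, run_b))  # 2 ranks moved, MAE about 0.0946, 2 sign changes
```

The same function applies unchanged to a split-sample comparison: scale each half of the roll calls separately and pass the two score vectors, in the same legislator order, to `compare_scores`.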

Discussion

Often in both the food service industry and engineering we are told that we may have our products cheap, fast, and good, but that we must pick only two. We suggest that cluster analysis provides us with the option of having all three desirable characteristics.

This cluster analysis method has several interesting features. First, by classifying roll calls for every legislator based on cluster assignment, we do not estimate cut-points for each roll call, and yet the success of the procedure has been shown to be quite similar to that of DW-NOMINATE, which uses 2 dimensions and estimates cut-points. Second, this method is more useful than simply assigning vote choice based on party identification, because we are letting observed behavior, not the claim of a party label, assign the group choice. We should do more to investigate this relationship in future work. Finally, we are able to recover cardinal scores of voting behavior, which are quite similar to the so-called ideal points estimated by NOMINATE. To the extent that many scholars are interested only in having variables that capture partisan voting behavior for some other analysis, we think that the Demento scores provide a nice alternative.

However, we must exercise caution when analyzing the dimensionality of congressional voting behavior and when estimating legislator ideal points. Koford (1989) has argued that NOMINATE overstates the unidimensionality in the structure of roll calls. If we are serious about discovering the number of issue dimensions underlying elite behavior in Congress, as well as the ideal points that correspond to those dimensions, we need to consider alternative models of voting behavior. Further, we need to pay closer attention to how the nature of the policy agenda affects our ability to estimate ideal points from roll calls. Roberts (2007), for example, has demonstrated how scaling procedures are sensitive to the agenda.
We have uncovered a similar issue in our analysis of the 107th Senate using NOMINATE. Future research needs to consider which statistical procedures are the most appropriate, given that the assumptions imposed by NOMINATE may be too unrealistic (e.g., independent roll call votes).

While it is tempting to suggest that researchers have exhausted the amount of information in the roll call data set and that any new algorithms or refinements are wasted efforts, the truth is that researchers' enthusiasm to explore the roll calls and utilize scores for hypothesis testing in a wide set of circumstances outstrips the limitations in the data. We hope that future work in roll call analysis focuses on the problems common to all of the extant methods, namely, the ability to make out-of-sample predictions and forecasts, as well as the vexing problem that moderate roll call summary scores conflate true ideological moderates with legislators voting in a higher-dimensional space (or randomly). Our future work might include a return to the older idea of constructing more interpretable issue-based dimensions and otherwise bringing more data into the roll call choice classification problem.

References

Aldrich, John H. 1995. Why Parties? The Origin and Transformation of Political Parties in America. Chicago: University of Chicago Press.

Clinton, Joshua, Simon Jackman, and Doug Rivers. 2004. The Statistical Analysis of Roll Call Data. American Political Science Review 98: 355-370.

Everitt, Brian S., Sabine Landau, and Morven Leese. 2001. Cluster Analysis. Fourth Edition. London: Arnold Press.

Heckman, James, and James Snyder. 1997. Linear Probability Models of the Demand for Attributes with Empirical Application to Estimating the Preferences of Legislators. The Rand Journal of Economics 28: 142-89.

Koford, Kenneth. 1989. Dimensions in Congressional Voting. American Political Science Review 83: 949-62.

Poole, Keith, and Howard Rosenthal. 2000. Congress: A Political-Economic History of Roll Call Voting. Oxford University Press.

Poole, Keith, and Howard Rosenthal. 2007. Ideology and Congress: Second, Revised Edition of A Political-Economic History of Roll Call Voting. New Brunswick: Transaction Publishers.

Roberts, Jason M. 2007. The Statistical Analysis of Roll-Call Data: A Cautionary Tale. Legislative Studies Quarterly 32(3): 341-360.

Smith, Steven S. 2008. Party Influence in Congress. New York: Cambridge University Press.

Wilcox, Clyde, and Aage Clausen. 1991. The Dimensionality of Roll-Call Voting Reconsidered. Legislative Studies Quarterly 16(3): 393-406.