arxiv: v1 [stat.ap] 11 Jul 2017


Evidence of Fraud in Brazil's Electoral Campaigns Via the Benford's Law

Daniel Gamermann (a) and Felipe Leite Antunes (a)

(a) Universidade Federal do Rio Grande do Sul (UFRGS) - Instituto de Física, Av. Bento Gonçalves 9500, Porto Alegre, Rio Grande do Sul

July 28, 2017

Abstract

The principle of democracy is that the people govern through elected representatives. A democracy is therefore healthy only as long as the elected politicians do represent the people. We have analyzed data from the Brazilian electoral court (Tribunal Superior Eleitoral, TSE) concerning money donations to electoral campaigns and the election results. Our work points to two disturbing conclusions. First, money, rather than representativeness, is a determining factor in whether a candidate is elected. Second, applying Benford's Law to the declared donations received by parties and electoral campaigns shows evidence of fraud in the declarations. A better term for Brazil's system of government is what we define as chrimatocracy (government by money).

Keywords: Benford's Law, Logistic regression, Electoral campaign, Politics, Fraud

1 Introduction

Modern society's dependence on technology, in particular the Internet and mobile phones, results in the generation of huge amounts of raw data. Apart from the problems involved in processing and storing this data, its volume, structure and variety call for the development of new analysis methodologies to extract the important information (knowledge) behind it. Also, as scientific fields that have traditionally adopted qualitative approaches slowly take up quantitative analyses, a vast new horizon opens for applications of methodologies long known to the physics community. This interaction of physics with other sciences has been fruitful in apparently distant fields such as economics [1, 2, 3, 4], biology [5], medicine [6] and political science [7, 8].
In this context, Statistical Physics has much to offer, particularly in understanding, quantifying and modeling the dynamics and properties of a large number of elements. Big Data [9], with its unprecedented scale and much finer resolution, provides a powerful experimental apparatus to challenge our existing models, explore new tools and frameworks, and lead research to new areas [10].

At the moment, the private sector benefits most from the rising field of data science. Companies invest heavily in studying customer profiles and needs in order to offer more attractive services and increase their profits, or to optimize decision making and minimize risks. The public sector, on the other hand, could benefit enormously from knowledge obtained with these new information technologies. Objective analysis could guide public policies for preventing the spread of epidemics [11, 12], minimizing traffic jams [13], decreasing unemployment [14], and fighting corruption [15, 16, 17], crime [18, 19] or violence [20].

An interesting result applied in the detection of fraud is Benford's Law. It was first noted by the astronomer and mathematician Simon Newcomb [21] and later postulated empirically by Benford, who compared data collected from a variety of sources, ranging from the statistics of the American baseball league to the atomic weights of the elements. The law of probability of occurrence of numbers, as observed by Newcomb, is such that all the mantissas (see footnote 1) of their logarithms are equiprobable. For the first digit d, this observation can be stated as follows (see footnote 2) [23]:

$$P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right), \tag{1}$$

$$d \in \{1, 2, \ldots, 9\}. \tag{2}$$

Despite its simplicity, the first rigorous proof was only developed by Hill in 1995 [24]. In the original work, Hill proves, based on probability theory, that scale invariance implies base invariance and that base invariance, in turn, implies Benford's Law. Sets of numbers tend to follow this law when they are naturally occurring (random) numbers, drawn from multiple different distributions and spanning many orders of magnitude.
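As a small illustration of the first-digit law stated above, the following Python sketch tabulates P(d) = log10(1 + 1/d) and extracts the leading digit of a number. The function names are hypothetical and chosen only for this example; they are not taken from the original work.

```python
import math


def benford_first_digit_probs():
    """Return {d: P(d)} with P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the first significant digit of a positive number."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Scientific notation always starts with the leading digit: 'd.ddd...e+XX'
    return int(f"{x:.15e}"[0])


if __name__ == "__main__":
    for d, p in benford_first_digit_probs().items():
        print(d, round(p, 4))
```

The nine probabilities telescope to log10(10) = 1, and digit 1 is expected in about 30.1% of the cases while digit 9 is expected in only about 4.6%.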
By naturally occurring numbers we mean numbers that are not sequential or man-made, as would be, for example, serial numbers or car license plates, which are not random but cover a given range uniformly. It is interesting to note that this law is scale invariant, so it does feel like a natural law (independent of man-made measurement systems or concepts). Take, for example, the heights of all mountains in a country: if they tend to follow Benford's law, they will do so no matter whether the measurements are made in meters, feet or inches. The distribution of the first digit will have approximately the same shape in any unit system. Were the distribution uniform in a given measurement system, it would have a completely different shape in another one, and the distribution would then depend on the measurement system.

Benford's law may be an important tool for searching large amounts of data for anomalies. It has already been used to detect evidence of fraud in electoral results [25] and in tax revenue declarations [26, 27]. Being an important forensic accounting tool, it could be admissible as evidence in courts of law.

Footnote 1: The mantissa m of a given number x is such that $x = m \times 10^{n}$, where n is an integer.

Footnote 2: More accurately, the uniform distribution of the log of the mantissa is equivalent to the generalized Benford's distribution for n digits [22].

In this work we analyze publicly available data on Brazilian elections. Brazil's superior electoral court (TSE, from Tribunal Superior Eleitoral) freely provides all statistics on election results and on the financial declarations made by parties, candidates and electoral committees. This information can be downloaded from the TSE webpage [28] (see also the appendix).

Ideally, in a democracy, the people elect their leaders based on representativeness. The politicians who best represent the population, or groups within the population, and best defend their interests should end up elected. The electoral campaign is the opportunity for candidates to express their ideas and for voters to get acquainted with the candidates and choose those who best represent their interests.

In practice, Brazil's system faces many problems. On one hand, not all candidates have the same opportunity to appear in front of the population and express their plans. On the other hand, no matter what a politician promises during the campaign, once elected he can follow a completely different line. The first problem, we believe, can be traced to a single factor: money. Electoral campaigns are much closer to plain publicity than to ideological debate. The more money a candidate or a party has, the better the marketing professionals he can hire and the more time he can buy in private media, and consequently the more he is remembered by the voters. Public media time is shared by the candidates and parties, but in proportion to the number of congressmen each party has. This creates a positive feedback effect: the more time a party has, the bigger its opportunity to influence the voters, and therefore the bigger its probability of electing its members and the more media time it will have in the next election.
It is easy to realize the nasty effect money has on an electoral campaign, completely perverting the principles of democracy. At this point, we would like to define the term Chrimatocracy (see footnote 3), from the Greek word χρήματα, which means money. Chrimatocracy is the system of government where those who receive more money govern. The principle of democracy would therefore be broken: in a country where the majority of the population is relatively poor, those who have large amounts of money to donate to politicians do not represent the people.

After obtaining some descriptive statistics on the data from Brazil's TSE, two analyses are performed. First, using logistic regression, we determine the relationship between the money a candidate declares to have received as donations and his probability of being elected. Second, we study the set of all single donations received by each player (party, candidate or committee) and search it for anomalies, that is, deviations from Benford's law. We evaluate the statistical significance of this discrepancy. We also construct a random model for donations and create random sets of donations with descriptive statistics similar to those of the declared donations, in order to perform the same test on the modeled sets of numbers.

In the next section we describe the data and the analyses performed, in the section after that we present and discuss our results, and in the last section we give a short overview and present our conclusions.

Footnote 3: The fact that this word appears to have its roots in the word crime is just a happy coincidence.

2 Materials and Methods

In this section we explain the data used. All data used in this study is publicly available from Brazil's superior electoral court (TSE). In the appendix, we describe how to obtain the data from Brazil's TSE, exactly which files were used and how to download them. Based on the statistics of the data, a model is elaborated to generate artificial data against which the results of the Benford analysis are compared. Last in this section, we briefly explain the logistic regression model.

2.1 Data

Brazil has elections every two years, alternating between two types of election, each occurring every four years. There are municipal elections, where mayors and city council members are elected (the last one occurred in 2016), and general elections, where the president, governors, senators and congressmen (regional and national) are elected (the last one occurred in 2014). Brazil has 26 federal units plus the federal district. Each one of these units (regions) elects its senators, congressmen and governors.

For each federal unit, Brazil's TSE provides information on the donations declared by three kinds of entities: candidates, parties and committees. The data describes every donation received. The donations can be divided into two categories with respect to the donor: they can come from legal persons (private citizens, identified by the CPF number; see footnote 4) or from legal entities (i.e. companies, identified by the CNPJ number). Some entities can also make donations among themselves (a party can pass part of the money from a given donation on to a candidate). In this type of transaction, the information on the original donor is also specified in the declarations. From now on, donations of this type will be referred to as non-original donations.
Apart from the information concerning each Brazilian federal unit separately, one can also obtain the information declared by the parties and committees at national level and for the presidential campaign (which has national and not regional scope).

In parallel to the financial information on the donations declared for the electoral campaigns, we also obtained information concerning the election results (valid votes obtained by each candidate, and his situation: elected or not) and the party coalitions in each federal unit. This information matters because, for some offices, the most voted candidates are not necessarily the ones elected. The number of congressmen elected for a given party coalition does not depend only on the votes obtained by a single candidate: all votes given to candidates in the coalition determine how many seats the coalition receives, and these seats are then distributed among the most voted candidates within the coalition. A candidate can therefore have more votes than the minimum needed to be elected, and the excess votes in effect go to less voted candidates of his coalition. In practice, a candidate is sometimes elected having received fewer votes than some of his non-elected competitors (yes, this is our "democracy").

Footnote 4: CPF is an identification number used by the Brazilian tax revenue office. It is roughly the Brazilian analogue of a social security number. With the same purpose, companies are identified with a similar number called CNPJ.

We present analyses done with data for the 2014 elections. In this election, Brazil's president was elected, along with the national congress, senate and regional governments. When analyzing the data, we do not mix information for national and regional elections. Therefore, we first present three different sets of data: the donations specific to the presidential campaign (donations received by committees and candidates), the donations received directly by the parties (which end up distributed among candidates or committees as non-original donations), and the donations received by the regional governor campaign in one federal unit, the state of Rio Grande do Sul. In our analyses, the donations are divided into four categories according to their nature: CNPJ, CPF, Non-original and Unknown (donations to which neither a CPF nor a CNPJ has been attributed).

For each set of donations, the distribution of the first digits of the donated amounts is obtained and compared to Benford's Law by performing a standard $\chi^2$ test. The p-value obtained from this test is the probability that a fluctuation as big as or bigger than the observed one arises from a distribution with the assumed shape. So the bigger the p-value, the better the observed frequencies agree with the expectation of Benford's Law.

But then, it is fair to ask: why should the first digits of the amounts declared as donations follow Benford's law? Although in some cases a satisfactory explanation for the manifestation of Benford's law in sets of naturally occurring numbers has eluded mathematicians, explanations have been given for this phenomenon in sets of numbers that come from multiple distributions spanning many orders of magnitude [24]. We argue that this is the case with electoral donations. Donations are not made in fixed amounts. They are, in principle, spontaneous: the donor chooses the amount he wants to donate. The amounts donated are, in this sense, random and not sequential or uniform.
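The first-digit comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function names are hypothetical, and the p-value uses the closed-form survival function of the $\chi^2$ distribution for an even number of degrees of freedom (8 here, since the nine digit frequencies are constrained to sum to the sample size), which avoids any dependency outside the standard library.

```python
import math
from collections import Counter

# Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9
BENFORD = [math.log10(1.0 + 1.0 / d) for d in range(1, 10)]


def leading_digit(x):
    """First significant digit of a positive amount."""
    return int(f"{x:.15e}"[0])


def chi2_sf_even(stat, dof):
    """P(chi2 >= stat) for a positive, even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = stat / 2.0
    term = total = 1.0
    for j in range(1, dof // 2):
        term *= half / j          # builds (stat/2)^j / j!
        total += term
    return math.exp(-half) * total


def benford_chi2(amounts):
    """Return (chi2 statistic, p-value) for the first digits of `amounts`."""
    counts = Counter(leading_digit(a) for a in amounts if a > 0)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no positive amounts to test")
    stat = sum((counts.get(d, 0) - n * p) ** 2 / (n * p)
               for d, p in zip(range(1, 10), BENFORD))
    return stat, chi2_sf_even(stat, dof=8)
```

As a sanity check under these assumptions, amounts spread evenly in logarithm over several decades give a p-value close to 1, while amounts spread uniformly between 1000 and 9999 give a p-value that is numerically zero.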
Electoral campaign rules only determine that the amount a legal person donates should not exceed 10% of his revenue in the year before the election, and that donations from a legal entity should not exceed 2% of its gross revenue (see footnote 5). As a result, the donated values span many orders of magnitude: richer persons or companies can donate more, much more, than poor citizens or small businesses. In fact, in figure 1 we show the cumulative distribution of all donations declared at national (not regional) level, with the horizontal axis in logarithmic scale. This figure clearly shows that a range encompassing 7 orders of magnitude is covered (the smallest and largest declared donations are listed in table 1). In the detail, one can see that the values donated by legal entities are on average much bigger than those donated by common citizens (multiple different distributions). The statistics for the four distributions are shown in table 1 (note, from the table, that some donations have been attributed to neither a legal person nor a legal entity).

In table 2 we show the descriptive statistics for the donations made to the central directories of the parties. This is the money that the parties have to redistribute among all their candidates and campaigns. In figure 2 and table 3 we show the same histograms and statistics, evaluated for the regional governor campaign in one federal unit, Rio Grande do Sul (RS). Data for the other 26 federal units show very similar patterns.

Footnote 5: Actually, this rule was valid only until the 2014 election. After that year, legal entities are (officially) forbidden to donate.

Figure 1: Cumulative distributions for donations declared for the presidential campaigns. Top left: all donations; Top right: non-original donations; Bottom left: CNPJ donations; Bottom right: CPF donations.

Table 1: Statistics for all donations declared for the presidential campaigns.
Donations Min [R$] Max [R$] Average [R$] STD [R$] N Total [R$]
All . 4. 8773.96 43582.697 4 932222528.3
CNPJ .56 4. 252883.227 87437.765 2242 56696494.65
CPF . 2. 2777.926 3722.847 598 443966.54
Non-original 2.38 7. 88825.673 29793.854 3949 3577258.57
Unknown . 8826.29 49 588.4 469.55

Table 2: Statistics for all donations declared by the parties central directories.
Donations Min [R$] Max [R$] Average [R$] STD [R$] N Total [R$]
All .2 3. 36697.292 7566.9 249 9362456.53
CNPJ 3. 3. 422592.562 8285.53 229 899699564.27
CPF . . 429.83 33472.945 336 38937.87
Unknown .2 262.2 424.86 674.7 25 52.39

Table 3: Statistics for all donations declared in the RS Governor electoral campaigns.
Type Min [R$] Max [R$] Average [R$] STD [R$] N Total [R$]
All .5 5. 79.73 668.435 2595 44694.9
CNPJ 8. 5. 4365.57 8942.678 545 23525.55
CPF .5 75. 523.75 45374 229 87859.8
Non-original 4. 95. 23458.969 7986.723 89 922895.27
Unknown 33. . 665. 473.762 2 33.

Figure 2: Top: Histogram in logarithmic scale for all donations declared in a regional campaign. Bottom left: only donations from legal entities. Bottom right: only donations from legal persons.

2.2 Donation Model

In figures 1 and 2 one can see that the cumulative distribution of the donations tends to have a sigmoidal shape when the horizontal axis is in logarithmic scale. We therefore fit a truncated sigmoidal function to the cumulative distributions. Take the function

$$F(\xi) = \frac{\xi^{\gamma}}{\xi^{\gamma} + \xi_0^{\gamma}} = \frac{1}{1 + \left(\frac{\xi_0}{\xi}\right)^{\gamma}}, \tag{3}$$

$$\xi \in (0; \infty), \tag{4}$$

where $\gamma$ and $\xi_0$ are the parameters to be fitted. To shift the minimum possible value of the variable $\xi$, one makes the replacement $\xi \to \xi - \xi_{\min}$, where $\xi_{\min}$ is now the new minimum possible value of the variable. To truncate the distribution at a maximum value ($F(\xi)$ should be equal to 1 for $\xi \geq \xi_{\max}$), one makes the transformation $F(\xi) \to F(\xi)/F(\xi_{\max})$. Since the horizontal axis is in logarithmic scale, the variable of the distribution is related to the amounts $x$ present in the data by $\xi = \ln(x)$. The density actually needed is the derivative of this cumulative distribution with respect to $x$.
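The construction just described (shift, truncate, change variable to the amount x) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, the parameter values used in any call are illustrative rather than fitted, and drawing random amounts by inverting the cumulative distribution is one simple choice that the text does not prescribe.

```python
import math
import random


def make_donation_model(gamma, xi0, xi_min, xi_max):
    """Build cdf, pdf, quantile and sampler for the truncated, shifted
    sigmoid in xi = ln(x). All four names are illustrative."""
    span = xi_max - xi_min
    # norm = 1 / F(xi_max) for the shifted sigmoid, so that cdf -> 1 at xi_max
    norm = 1.0 + (xi0 / span) ** gamma
    xi0_g = xi0 ** gamma

    def cdf(x):
        if x <= 0.0:
            return 0.0
        u = math.log(x) - xi_min            # shifted log-amount
        if u <= 0.0:
            return 0.0
        if u >= span:
            return 1.0
        return norm * u ** gamma / (u ** gamma + xi0_g)

    def pdf(x):
        if x <= 0.0:
            return 0.0
        u = math.log(x) - xi_min
        if u <= 0.0 or u >= span:
            return 0.0
        # derivative of cdf with respect to x (chain rule through ln x)
        return gamma * norm * xi0_g * u ** (gamma - 1.0) / (
            x * (u ** gamma + xi0_g) ** 2)

    def quantile(q):
        """Amount x with cdf(x) = q, for 0 < q <= 1."""
        u = xi0 * (q / (norm - q)) ** (1.0 / gamma)
        return math.exp(u + xi_min)

    def sample(rng):
        # 1 - random() lies in (0, 1], which keeps q strictly positive
        return quantile(1.0 - rng.random())

    return cdf, pdf, quantile, sample
```

For instance, `make_donation_model(5.0, 8.5, math.log(0.01), math.log(1e7))` returns functions for amounts between one cent and ten million, and `sample(random.Random(0))` draws one artificial donation from that range.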

In terms of the amount $x$, and with $\xi_{\min}$ denoting the minimum possible value of $\xi = \ln(x)$, the truncated and shifted cumulative distribution and the corresponding density read

$$F(x) = \begin{cases} 0, & x < e^{\xi_{\min}} \\ \dfrac{1 + \left(\frac{\xi_0}{\xi_{\max} - \xi_{\min}}\right)^{\gamma}}{1 + \left(\frac{\xi_0}{\ln(x) - \xi_{\min}}\right)^{\gamma}}, & e^{\xi_{\min}} < x < e^{\xi_{\max}} \\ 1, & x > e^{\xi_{\max}} \end{cases} \tag{5}$$

$$f(x) = \frac{d}{dx}F(x) = \frac{\gamma}{x}\left(1 + \left(\frac{\xi_0}{\xi_{\max} - \xi_{\min}}\right)^{\gamma}\right)\frac{\left(\frac{\xi_0}{\ln(x) - \xi_{\min}}\right)^{\gamma}}{\left(1 + \left(\frac{\xi_0}{\ln(x) - \xi_{\min}}\right)^{\gamma}\right)^{2}\left(\ln(x) - \xi_{\min}\right)}. \tag{6}$$

Now, given a set of numbers ($x_i$, $i = 1, 2, \ldots, N$), we set $\xi_{\max}$ from the largest value in the set, $\max(x_i)$, and $\xi_{\min} = \log(0.01)$ (the smallest possible donation is one cent), and determine the values of $\xi_0$ and $\gamma$ that maximize the likelihood for the set:

$$L = \prod_{i=1}^{N} f(x_i) \tag{7}$$

$$\ln L = \sum_{i=1}^{N} \ln f(x_i) = N\left(\ln\gamma + \ln\left(1 + \left(\frac{\xi_0}{\xi_{\max} - \xi_{\min}}\right)^{\gamma}\right) + \gamma\ln\xi_0\right) - \sum_{i=1}^{N}\ln x_i - (\gamma + 1)\sum_{i=1}^{N}\ln\left(\ln x_i - \xi_{\min}\right) - 2\sum_{i=1}^{N}\ln\left(1 + \left(\frac{\xi_0}{\ln x_i - \xi_{\min}}\right)^{\gamma}\right). \tag{8}$$

A steepest ascent algorithm was implemented to maximize the likelihood. Starting from arbitrary initial values for $\gamma$ and $\xi_0$, the gradient of the likelihood in parameter space is evaluated and the parameters are increased (or decreased) by small amounts following this gradient. This iteration is repeated until the norm of the gradient approaches zero (within some numerical precision). In figure 3, one can see the same distributions as in figure 1 together with the fitted distributions. The parameters obtained for these fits are in table 4.

Table 4: Fitted parameters to the distributions in the presidential campaign.
Donations Max [R$] N γ e^ξ0 [R$]
All 4. 4 5.28926 7747.23
CNPJ 4. 2242 5.33468 53965.67
CPF 2. 598 74493 5463.28
Non-original 7. 3949 .98249 68336.77

Figure 3: Cumulative distributions for donations declared for the presidential campaigns with fitted distributions. Top left: all donations; Top right: non-original donations; Bottom left: CNPJ donations; Bottom right: CPF donations.

2.3 Logistic Regression

Before presenting our results on the first digit distribution of the donations declared in the electoral campaigns, we would like to establish the importance this money has for the outcome of the election. For this purpose, we fit a logistic model (logit regression), setting as dependent variable the result of the election for a given candidate (elected or not elected) and as independent variable the fraction of all money declared as donations in the whole campaign for the same office (see footnote 6) that each candidate received.

Footnote 6: We use the fraction instead of the absolute amount for numerical stability in the calculations. A simple variable transformation, which has no influence on the statistics of the regression results, can be done by multiplying the fractions by the total amounts shown in the tables.

The logistic model comes from the assumption that one can determine the probability of success ($p$) in a given process from a set of $k$ predictor factors ($x_i$, $i = 1, 2, \ldots, k$) with the logistic function [29]:

$$p(x) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{k}\beta_i x_i\right)}} \tag{9}$$

where $p$ is the probability of success, $x_i$ are the values of the predictors and $\beta_i$ are parameters to be fitted from data. In our case, we fit two parameters, $\beta_0$ and $\beta_1$, by correlating the probability of a candidate being elected with the total fraction of donations $x$ received by him during the campaign. To obtain the best fit to the model, we use the Newton-Raphson method to find the values of the parameters that maximize the likelihood of the observed data given the model.

Regarding the logistic regression, we perform two statistical tests. The first, for each fitted parameter ($\beta_0$ and $\beta_1$), is the Wald test, where we compare the squared ratio of the parameter to its uncertainty, $(\beta/\sigma)^2$, against a $\chi^2$ distribution with one degree of freedom (this is equivalent to the z-test). Small p-values in this test indicate a significant value for the parameter obtained (that it is in fact different from zero). In the second test we evaluate the deviance of the model given the obtained parameters [30] and compare it to a $\chi^2$ distribution. This test assesses the overall quality of the fit: high p-values (close to 1) indicate a good fit (a good description of the data by the model).

3 Results and Discussion

For the logistic regression analysis, the election results for each one of the 27 Brazilian federal units were obtained. We analyze data for the regional and federal congress offices because many candidates are elected for these offices and, therefore, there are enough statistics to perform the analysis. In each federal unit, for each candidate to a given office, one can obtain from the data the total amount of money received by him as donations and whether he was elected or not.

In table 5 we show the results of the logistic regression fit for the federal congressmen campaign in each one of Brazil's federal units. It is astonishing how well the model fits the data (p-values associated with the deviance close to 1) and how significant the obtained parameters are (Wald p-values close to 0). This result indicates that money certainly is a good predictor of whether a candidate is elected or not. From these results, the value of $\beta_1$ can be used to estimate how much more likely a candidate is to be elected (by how much the odds increase) if he receives an extra amount of money $y$:

$$\mathrm{oddr}(y) = \frac{p(x+y)/\left(1 - p(x+y)\right)}{p(x)/\left(1 - p(x)\right)} = e^{\beta_1 y}. \tag{10}$$

Take, for example, the federal congress campaign in RS, where $\beta_1 = 329.49$. If a candidate receives a donation of R$ 100,000, which corresponds to a fraction $y = 100000/T$ of the total $T$ declared in that campaign (last column of table 5), the odds of him being elected are multiplied by $e^{329.49\,y} = 1.82$. In other words, the odds of being elected rise by 82% for every R$ 100,000 a candidate receives as donation. Receiving nothing, the chance of being elected is $p(0) = 0.0133$ (around 1.3%). In table 6 we show the results of the logistic regression for the regional congress seats.
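To make the fitting procedure and the odds ratio of equation (10) concrete, here is a minimal pure-Python sketch of a two-parameter logistic fit by Newton-Raphson, with the Wald p-value and the deviance. It is an illustration under stated assumptions rather than the code behind tables 5 and 6: all function names are hypothetical, and the exponential is not protected against overflow for extreme linear predictors.

```python
import math


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of p(x) = 1 / (1 + exp(-(b0 + b1 * x))).

    Returns (b0, b1, se0, se1); the standard errors come from the
    inverse of the information matrix at the last iterate."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p                 # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w                    # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1, math.sqrt(h11 / det), math.sqrt(h00 / det)


def wald_p_value(beta, sigma):
    """P(chi2 with 1 dof >= (beta / sigma)^2), same as a two-sided z-test."""
    return math.erfc(abs(beta / sigma) / math.sqrt(2.0))


def deviance(b0, b1, xs, ys):
    """-2 times the log-likelihood of the 0/1 outcomes under the fit."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total


def odds_ratio(b1, y):
    """Factor by which the odds change when the predictor grows by y."""
    return math.exp(b1 * y)
```

In this sketch `xs` would hold each candidate's fraction of the campaign money and `ys` the outcome (1 for elected, 0 otherwise). The fit fails with a division by zero if all `xs` are identical, and diverges if the outcomes are perfectly separated by the predictor.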

Table 5: Results of the logistic regression fits for the federal congress elections. For each federal unit (UF), the first line shows the value of the fitted intercept (β0) with its uncertainty and the corresponding Wald p-value, and the second line shows the value of the parameter associated with the money variable (β1) with its corresponding uncertainty and p-value. The column N shows the number of candidates and the column n the number of disputed seats (elected candidates). The last p-value column is the result of the test performed with the deviance, which reflects the overall quality of the fit. The last column gives the total amount of money donated to the campaigns in each UF.

UF β ± σ p-value (Wald) N n Deviance p-value Total Money [R$]
RS -4.374 ± 8548. 38 3 74.527927. 559659.3
   329.49389 ± 52.56683.
AC -3.44999 ±.75749.5 62 8 32.33329.99868 84335.29
   55.66227 ± 6.37827.582
AL -4.69582 ±.99996.2 9 27256. 7434.7
   92.58538 ± 23.36522.74
AM -4.7782 ±.6766.55 79 8 7.98722. 2298648.5
   9.22557 ± 26.324.524
AP -2.86732 ± 373. 4 8 49.45353.999998 86682.78
   22.82996 ±.65373.322
BA -3.896939 ± 76. 32 39 9.93468. 757732.26
   48.67858 ± 64.287666.
CE -3.53933 ± 33627. 95 22 74.93382. 3486757.57
   37.94233 ± 23.663.
DF -4.3338 ±.75693. 28 8 36.354865. 928865.4
   78.96265 ± 8.8565.27
ES -4.24638 ±.69728. 57 4.35258. 8836757.57
   87.468974 ± 9.952579.2
GO -3.7569 ±.534978. 95 7 5.8892.99987 63643888.3
   93.675674 ± 2.56259.5
MA -3.28799 ±.334342. 235 8 2.435737. 2227657.48
   79.4432 ± 9.6368.34
MG -4.252349 ±.33755. 62 53 4.95935. 5863223.27
   53.22668 ± 56.34858.
MS -5.97387 ±.58883.58 6 8 4.792692. 2969885.35
   28.623347 ± 35.97929.258
MT -4.49448 ±.947234.2 96 8 2.76442. 2572237.33
   83.39765 ± 23.946738.497
PA -3.764366 ±.57372. 74 7 64.372699. 923789.65
   29.29794 ± 24.536572.
PB -4.795535 ±.83687. 96 2 7.65824. 3639599.77
   57.28575 ± 44.88445.449
PE -2.89783 ±.38326. 55 25 85.84997.999998 5797.79
   26.7287 ± 23.957652.
PI -3.362935 ±.62656. 88 38.327.999998 2443449.38
   8.469 ± 23.8655.556
PR -3.52287 ±.34836. 295 3 99.829366. 69943697.86
   98.37 ± 3.4852.
RJ -4.395 ± 78953. 953 46 82.6865. 99346.6
   497.533828 ± 56.28842.
RN -5.6925 ±.756848.96 83 8 3.736874. 4278235.86
   .8578 ± 35.288882.68
RO -3.4228 ±.64683. 8 8 35.835.999993 7787.46
   56.756842 ± 8.38973.226
RR -4.5292 ±.84656.2 8 8 25.82922. 8356682.9
   96.267954 ± 29.868455.268
SC -3.75785 ±.57954. 28 6 49.832673. 373733.92
   5.394467 ± 23.2984.
SE -2.98286 ±.564645. 73 8 36.336.999795 762264.24
   37.83953 ± 3.895874.648
SP -4.354 ± 5463. 38 7 3.8793. 234478433.27
   55.4672 ± 47.4222.
TO -3.35332 ±.853338.85 47 8 25.37784.992752 4895843.7
   53.979352 ± 7.65884.2237

Table 6: Results of the logistic regression fits for the regional congress elections. For each federal unit (UF), the row gives first the fitted intercept ($\beta_0$) with its uncertainty and the corresponding Wald p-value, then the number of candidates (N), the number of disputed chairs, i.e. elected candidates (n), the deviance and the p-value of the deviance test, which reflects the overall quality of the fit, and the total amount of money donated to the campaigns in the UF. The row ends with the parameter associated with the money variable ($\beta_1$), with its uncertainty and Wald p-value.

UF β ± σ p-value (Wald) N n Deviance p-value Total Money [R$]
RS -3.895895 ± 6878. 67 55 229.72642. 5389747.35 52.6353 ± 57.562793.
AC -4.26827 ±.3686. 497 24 22.25355. 74882.23 28.62445 ± 42.665487.
AL -3.85559 ± 33952. 262 27 93.457895. 952.69 2.48857 ± 3.624984.
AM -5.42684 ±.56394. 57 24 76.85367. 24576432.8 46.4723 ± 6.8589.
AP -3.838433 ±.373426. 338 24 2.24578. 562758.62 26.887652 ± 34.298.
BA -3.59355 ± 6449. 578 63 235.229572. 479849.37 488.226 ± 54.57845.
CE -3.866844 ± 9654. 558 46 77.253. 3258388.46 368.87924 ± 42.44498.
DF -4.473329 ± 92542. 979 24 58.9482. 3564845.99 26.27656 ± 4.36546.
ES -4.86863 ±.35796. 472 3 38.579334. 2323725.98 285.562 ± 37.78774.
GO -3.736298 ± 453. 77 4 2.477478. 7933.98 27.582549 ± 32.7466.
MA -4.34746 ±.38674. 49 42 9.6959. 2576975. 47.8926 ± 6.73738.
MG -3.8226 ± 9294. 55 77 325.29458. 48882.89 57.56349 ± 49.725496.
MS -5.252 ±.569786. 395 24 69.93927. 4593385.85 338.564 ± 5.48838.
MT -3.863263 ± 4347. 292 24.8479. 56968.29 99.2668 ± 3.8483.
PA -3.59982 ± 43462. 65 4 223.96945. 3697472.64 276.65226 ± 36.73893.
PB -4.5842 ± 8954. 333 36 9532. 723837.47 35.883453 ± 4.9545.
PE -3.65668 ± 88765. 489 49 75.6559. 4646.64 342.5497 ± 39.459.
PI -5.64635 ±.8848. 226 3 57.843668. 224773.9 42.376534 ± 69.7243.
PR -4.628 ±.348345. 738 54 66.5573. 62695.5 692.29375 ± 7.533492.
RJ -4.36389 ±.99879. 846 7 342.42533. 2952722.7 63.8462 ± 54.92459.
RN -4.52926 ±.59765. 244 24 69.343989. 833268.55 98.54737 ± 3.987689.
RO -3.658539 ±.325359. 382 24 3.8295. 252575.3 76.974288 ± 328.
RR -4.62472 ±.39427. 393 24 3352. 34568.74 227.6773 ± 32.68995.
SC -3.662375 ±.357. 49 4 59.84989. 593583. 323.89886 ± 43.323434.
SE -3.379372 ± 56329. 62 24 83.8699. 842739.38 6.55846 ± 28.39777.
SP -4.9356 ±.854. 878 94 432.635429. 22936447.6 836.92236 ± 6.696497.
TO -4.478863 ±.587435. 238 24 74.475. 9956645.63 29.77473 ± 5.52657.

The results clearly indicate that money is an excellent predictor of whether a given candidate will be elected to office or not. Note that the model deals with probabilities, so we are not saying that those who receive more money are surely elected; what the analysis shows is that the more money a candidate receives, the more probable it is for him to be elected. This might (sadly) sound like an obvious statement, but it actually obliterates the principle of democracy: our representatives are not elected because they represent our interests; they are elected because they gather huge amounts of money in order to hire marketing professionals and flood media time with spurious publicity. In other words, candidates know that, to get elected, they do not need to have previously done a good job, to have developed good projects or even to be honest; the most important thing they need is to get money, lots of it.

One could try to argue that the candidates who receive the most money are those who best represent the population, since it is the population that makes the donations. Actually, most of the money comes from legal entities (CNPJ) or from the parties (non-original), which received most of their money from companies. Moreover, it is common knowledge that most parties and candidates receive unofficial donations (in Portuguese referred to by the term caixa dois, "second cashier") that must pass through some kind of money laundering before being used. It is quite easy to compare the huge amounts donated by companies with plain bribes: the candidates, once elected, pass laws that benefit the economic sectors that helped them to be elected in the first place; from the companies' point of view, the donations are investments [3].

The next analysis is based on the idea that the set of numbers in an honest financial declaration should follow Benford's law for the first significant digit distribution.
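As a minimal sketch of what following Benford's law means in practice, the snippet below computes the Benford probabilities $\log_{10}(1 + 1/d)$ for the digits 1 to 9 and the first-significant-digit frequencies of a list of amounts. The helper names and the sample amounts are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter

def benford_expected():
    """Benford probabilities for the first digits 1..9: log10(1 + 1/d)."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}

def first_significant_digit(amount):
    """First non-zero digit of a positive amount."""
    # Scientific notation with 15 decimals, e.g. '1.999990000000000e+03';
    # the leading character is the first significant digit.
    return int(f"{abs(amount):.15e}"[0])

def first_digit_frequencies(amounts):
    """Proportion of amounts starting with each digit 1..9."""
    positives = [a for a in amounts if a > 0]  # zero has no first digit
    counts = Counter(first_significant_digit(a) for a in positives)
    return {d: counts.get(d, 0) / len(positives) for d in range(1, 10)}

# Hypothetical donation amounts, for illustration only.
sample = [7.0, 45.0, 1200.0, 0.5, 300.0, 1999.99]
observed = first_digit_frequencies(sample)
expected = benford_expected()
for d in range(1, 10):
    print(d, round(observed[d], 3), round(expected[d], 3))
```

The expected values start at about 0.301 for digit 1 and fall to about 0.046 for digit 9.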
Some arguments for this have already been presented: the declarant should have no real control over the amounts (random donations), and the amounts span many orders of magnitude and come from multiple different distributions. Therefore, deviations from Benford's law would be evidence of fraud, an indication that the declaration has been cooked (money laundering).

In Table 7 we show the first significant digit distribution for all donations received by each political party specifically for the 2014 presidential campaign. For each party, we present the results for all donations together and then for the amounts classified according to the donor (CNPJ, CPF, non-original). We also present results for artificial data generated from the distributions fitted to each data set; those are indicated by the tag Rand. The last line for each party, indicated by the tag Model, is the result for the different sets of artificial data for each category combined. In this analysis, we have considered only sets of data with more than 2 elements (some parties, or some specific categories in a given party, had N < 2 and were omitted from the table).

In Table 8 we present a similar table for all the donations declared by each political party's central directory. These are the donations that end up distributed as non-original to various campaigns. The $\chi^2$ values in the tables are calculated via:

$$\chi^2 = \sum_{i=1}^{9} \frac{(O_i - E_i)^2 N}{E_i}$$

where $O_i$ is the observed frequency of each digit, $E_i$ is the expected frequency according to Benford's law and $N$ is the total number of elements in the analyzed set. The p-values are the probabilities that a fluctuation equal to or larger than the observed $\chi^2$ is obtained in a set of numbers of the same size that does agree with the Benford empirical distribution.

The results in Tables 7 and 8 show that most declarations have sets of numbers that do not follow Benford's law, while the artificial data generated from the fitted distributions do result in p-values greater than . The few exceptions where the declared amounts seem to fit the Benford distribution are those with very small campaigns (few donations). The artificial data generated in every case, however, render sets of numbers (of the same size as the declared donations) that do fit the Benford distribution, which indicates that the disagreement between the data and the Benford prediction is not a bias peculiar to the statistical distributions of the donations. The $\chi^2$ values for the most prominent parties are strikingly high, making it hard to accept that these deviations from the expected distribution are mere statistical fluctuations. Note that the critical value of the $\chi^2$ statistic for a p-value of 1% is around $\chi^2 = 20.09$, and the p-value drops to zero very fast for higher values of $\chi^2$.

4 Conclusions and Overview

We have analyzed data from Brazil's Superior Electoral Court (TSE) regarding campaign donations and election results. First, the data were fitted to a logistic regression model, which showed with high significance that the money a candidate receives to run his campaign is a good predictor of whether he is elected or not.
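The $\chi^2$ comparison with Benford's law used in Section 3 can be sketched as follows. The sketch assumes that the p-value is taken from the asymptotic $\chi^2$ distribution with 8 degrees of freedom (nine digits minus one constraint); the text does not state how the p-values in the tables were actually computed, so this is an assumption, and the observed frequencies below are hypothetical.

```python
import math

# Benford probabilities for the first digits 1..9.
BENFORD = [math.log10(1.0 + 1.0 / d) for d in range(1, 10)]

def benford_chi2(observed_freq, n):
    """chi^2 = sum_i N (O_i - E_i)^2 / E_i, with O_i and E_i as proportions."""
    return sum(n * (o - e) ** 2 / e for o, e in zip(observed_freq, BENFORD))

def chi2_sf_8dof(x):
    """Upper-tail probability of a chi-square variable with 8 degrees of freedom.

    For an even number of degrees of freedom 2m the tail has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!, used here with m = 4.
    """
    h = x / 2.0
    return math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(4))

# Hypothetical set of 900 amounts whose first digits are uniformly spread.
uniform_freq = [1.0 / 9.0] * 9
statistic = benford_chi2(uniform_freq, 900)
p_value = chi2_sf_8dof(statistic)
print(statistic, p_value)
```

Under this assumption the tail probability at $\chi^2 = 20.09$ comes out very close to 1%.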
Then, assuming that the first digits of the donations should naturally follow the Benford distribution, as genuine financial declarations are argued to do, we found strong evidence that fraud may have been committed in declarations made by candidates, parties or committees.

Applying well-established statistical techniques to data concerning the financing and results of Brazil's election campaigns, it is possible to identify strong evidence that democratic principles are corrupted: the determining factor in whether a candidate is elected or not is money, and there is strong evidence that fraud has been committed in the financial declarations made by the players. If fraud has been committed in these declarations, it is not possible to really determine how the money came to the candidates, and therefore it is impossible to know which interests they will defend once elected.

At this point, we would like to make a small digression. Objective analysis of data (Big Data) has an amazing potential to benefit society in many different ways. Close monitoring of individuals' medical data could guide public policies that would greatly improve the population's

Table 7: First significant digit distribution for all donations received for the presidential campaign by the different parties. The columns -9 indicate the proportion each one of the nine digits is observed, N the total number of donations, χ 2 is the statistic for the fluctuation, p-val its correspondent p-value, Min and Max are respectively the minimum and maximum amounts donated and sum is the sum of all donations, γ and ξ are the model parameters. Partido 2 3 4 5 6 7 8 9 N χ 2 p-val Min Max Sum γ ξ Benford.3.76 5.97.79.67.58.5.46 PSDB - All.34.83.5.9.88.6.49.68.43 3545 36.968. 7. 4. 42837499.87.6744 3.958 PSDB - All Rand 92.8 9.92.84.65.54.49.54 3545.23 63.58 3553439.45 6476868.7.6744 3.958 PSDB - CNPJ 63 22.8.86.98.47.27.38.38 338 8.72. 2. 4. 7684668.46 9.985 6.427 PSDB - CNPJ Rand 43.8.5.8.77.53.7.8.65 338 6.872.3 27.87 3656868.89 295885.73 9.985 6.427 PSDB - CPF 68.39.98.49 2.6.49..8 23 53.99. 45. 2. 793333.8.489 3.343 PSDB - CPF Rand 85 28.38.73.57.49.4.49.8 23 8.28 7 5.78 428568.2 46226.49.489 3.343 PSDB - Non-original.3.69.93.74.63.5.74.45 378 39.22. 7. 7. 24336969.95.858 3.797 PSDB - Non-original Rand.35.87 6.9.83.69.48.48.45 378 9.838 77 3.77 6998795.84 32248463.76.858 3.797 PSDB - Model 98.88 9.89.8.67.5.5.48 3545 9.759 82 3.77 3656868.89 622449. - - PV - All 67 6.7.79.9.42.67.3.3 65.57 3.56. 792228.69 5.9339 2.622 PV - All Rand.352.52 5.55.3.48.6.42.42 65 7.95 43 8 97956.62 5255748.7 5.9339 2.622 PV - CNPJ 42 26.69.89.56.56.8.4.4 24 7.698 64.56. 7893364.47 5.543 3.227 PV - CNPJ Rand.339.53.3.5.8.73.65.24.48 24 3..927.95 93572.73 42984.38 5.543 3.227 PV - CPF.34 6.7.49 68..24.. 4 28.3.. 7. 2896.22 9.928.454 PV - CPF Rand 68.37.49.73.49 2.49.73. 4.572.7 6.36 5294.4 28733.8 9.928.454 PV - Model.32.94.97.97.73.85.6.36.36 65 3.475.9.95 93572.73 777.47 - - PSTU - All 96.33.67.8.3.5.22.5 35 3.598. 2. 2. 7676.75 9.762 68 PSTU - All Rand 8.78 8.8.4.67.52.37.52 35 2.836.944 3.6 5837.56 962.5 9.762 68 PSTU - CPF.546.2.39.74.83.28.9..9 8 39.99. 2. 2. 
445..369 9.7449 PSTU - CPF Rand.36 22.76.74.46.56.9.56.46 8 8.769.362 2.6 5493.99 4624.56.369 9.7449 PSTU - Model.356 8.67.44.44.3.59.52 35 8.862.354 2.6 9258.32 3229.6 - - PT - All.82.59.7.7.22.34.28.22 324 75.337... 352746 6.549 56 PT - All Rand.33.82 6.9.8.7.63.45.4 324 8.29 2. 999329.98 242479.22 6.549 56 PT - CNPJ.365.33.9 9.99.4.6.53.3 478 96.44. 2.8. 37724.56 5.888.843 PT - CNPJ Rand.37.72.36.85.77.62.66.46.39 478 9.835 77 999764.38 2454372.53 5.888.843 PT - CPF 85 35.7.4 64.2.2..2 32 2954.. 5. 8697..387 8.846 PT - CPF Rand 78.8.37.9.76.69.67.53.48 32 6.855.552. 6284.98 466397.2.387 8.846 PT - Non-original 83.95.59.88.84.3.44.27.88 226 9.852. 6.2 38. 42688.7.3475 4.7289 PT - Non-original Rand.367.33 8.88.88.4.7.35.49 226.399 38 5.32 3722448.46 3569479.64.3475 4.7289 PT - Model.34.73.36.88.77.63.67.48.44 324.365.82 999764.38 28486249.37 - - PSB - All 72.32.4 44.77..4.5.6 3876 24.93.. 5. 232325.46 6.6782 9.822 PSB - All Rand.33.74.7.2.79.68.6.49.47 3876 4.537.86.3 47824.53 559564.3 6.6782 9.822 PSB - CNPJ.353.99.96.38 2.32.45..26 56 55.538. 3.33 5. 5432667.36 5.868 7.5564 PSB - CNPJ Rand.3.3.54.5.7.7.7.5.64 56 8.29 2 6.25 469762.88 84952.53 5.868 7.5564 PSB - CPF.58.8.8 82.65.4.3.2. 342 256.399.. 5. 5387549.24.46 8.49 PSB - CPF Rand 8.94 8..79.67.56.5.43 342.228.89.5 34842.86 4922.92.46 8.49 PSB - Non-original.37.86.85.6.47.68.26.33 574 2.3.7 2.38 3. 633493.98.22 4.395 PSB - Non-original Rand 87.79.57..68.75.45.4.37 574.799.6 26.64 266538.45 6738.8.22 4.395 PSB - Model 82.88.33.3.77.69.55.49.43 3876 3.33..5 469762.88 4897729.29 - - PDT - All 9 36.64.9.73.55.36.8.36 55 3.77.883 4. 2. 8938826.29 5.78 6.6327 PDT - All Rand 36 7.36.9.9.55.55.73 55 6.4.647 2 93866.96 3948573.89 5.78 6.6327 PDT - CNPJ 78 4.67.93.74.56.37.9.37 54 3.887.867 4. 2. 892. 
5.49 6.679 PDT - CNPJ Rand 4.3.93.3.74.93.93.74.74 54 5.527.7 835.23 822964.94 83674.4 5.49 6.679 PDT - Model 55 7.9 7.73.9.9.73.73 55 5.75.75 835.23 822964.94 932836.54 - - PRTB - All.328.38.55.69.69.86...55 58 23.4.3 43.3. 44985.26.68.5622 PRTB - All Rand.9.38.72.3.7.55.34.69 58 7.787.23 6.6 5457.4 246427.4.68.5622 PRTB - Non-original.368.5.32.79.79.5...32 38 2.893.6 43.3. 959.32 2.5753.372 PRTB - Non-original Rand 89.368.79.53.5.26..79. 38 5.9.56 29.2 87262.96 367477.89 2.5753.372 PRTB - Model 59.3.52.52.86.52.86.69.34 58.548.73 29.2 94.86 238572.93 - - PCB - All 73.52.52..52.6.6.3 33 7.592 74 7. 6. 6554.69 3.525.83 PCB - All Rand.33 2.6.3.82.3..6 33 8 33 54.45 325.84 35234.7 3.525.83 PSOL - All 37 65.28.6 5.2.4.4.2 47 393.53.. 94953.74 4555.67 9.8255 8.4428 PSOL - All Rand 9.95 3.72.79.64.62.53.62 47 6.923.545.5 4644.2 2953.55 9.8255 8.4428 PSOL - CNPJ.67 7..42 8..83.42.42 24 2.6. 8.4 94953.74 24699.27 9.8356 3.237 PSOL - CNPJ Rand.542.67.42.83.42..83..42 24 9.542 99 56.2 43474.7 267.25 9.8356 3.237 PSOL - CPF 5 6.27.5 55.2... 443 393.235.. 2. 3538..878 8.339 PSOL - CPF Rand.3.87.88.8.5.7.52.52 443 4.25.834 5 837.6 7625.52.878 8.339 PSOL - Model.36.85.5.87.79.47.7.49.53 47 6.6.635 5 43474.7 363857.42 - - 5

Table 8: First significant digit distribution for all donations received for the presidential campaign by the different parties. The columns -9 indicate the proportion each one of the nine digits is observed, N the total number of donations, χ 2 is the statistic for the fluctuation, p-val its correspondent p-value, Min and Max are respectively the minimum and maximum amounts donated and sum is the sum of all donations. Partido 2 3 4 5 6 7 8 9 N χ 2 p-val Min Max Sum Benford.3.76 5.97.79.67.58.5.46 PT do B - All 7..83..333..67.. 2 8.75.2 2. 5. 4842.23 PT do B - CNPJ 7..83..333..67.. 2 8.75.2 2. 5. 4842.23 PEN - All.667....333.... 3 5.639.688 5. 5. 53. PEN - CNPJ..........629.69 5. 5. 5. PEN - CPF......... 2 4.644.795 5. 5. 3. DEM - All.364.93.7.79.79.2.2.2.4 4 33.4. 6. 33. 59277. DEM - CNPJ.35.97.9.8.82.22.22.22.5 37 32.995. 6. 33. 56277. DEM - CPF......... 3 6.966.54.. 3. PMDB - All.32.8 9.55.89.3.58.24.3 38 79.46. 53.58 7. 86799588.38 PMDB - CNPJ.38.88 8.45.76.3.6.26.28 352 65.57. 3.95 7. 8236886.6 PMDB - CPF.3.3.38.72.345.34.34..69 29 34.778. 53.58. 4434.77 PR - All.339 7.59.85 3.25.8..8 8 5.5. 542.39 5. 6327599.49 PR - CNPJ.339 84.64.83.93.8.9..9 9 46.59. 25. 5. 623243.2 PR - CPF.333.67...333.67... 6 8.7 27 5. 65. 93. PTN - All 92.67.42 5.83. 5.42. 24 9.556.2 2.38 8. 565982.94 PTN - CNPJ 5.62.62.88.62..375.. 6 33.569. 5. 73. 485435. PTN - CPF.667.......333. 3 7.946 39. 8. 82. PP - All 66.53.48 8.24.4.24.6 24 43.363.. 3. 72497495.3 PP - CNPJ 54 3.56.49 2.25.4.25.6 22 44.62.. 3. 72297495.3 PP - CPF......... 2 4.644.795.. 2. PV - All.354 28.76.38.25.25.25 79 6.47.595.. 799375.23 PV - CNPJ.346 3.5.5.77.38.26.26.26 78 6.224.622 228.62. 799365.23 PV - CPF......... 2.322.97... PT - All.379.77 7.42.69.23.44.23.6 385 84.742... 928848.27 PT - CNPJ.378.82.33.44.55.22.47.25.4 362 68.58. 326.26. 88973838.27 PT - CPF.39.87.43..39.43...43 23 36... 5. 255. PRB - All.5.58.58.4.9...5. 22 75.35..2. 58326.73 PRB - CNPJ.39 5..83.8.4.28.4. 72 2.37.6 4.. 565. 
PRB - CPF.69.95.83.8.87.8... 26 86.824.. 5. 4865. PTC - All....5.5.... 2 9.474.34 5. 4. 45. PTC - CNPJ....5.5.... 2 9.474.34 5. 4. 45. PTB - All 67.333.67.33.67..33.. 3 4.29.78. 3. 88. PTB - CNPJ 67.333.67.33.67..33.. 3 4.29.78. 3. 88. PPL - All 3 3 29. 3... 3 7 9.53.32 3 33343.93 43989.53 PPL - CNPJ 5..5..... 5 4 98 45 9. 33343.93 43736 PPL - CPF..5...5.... 2 7.54.52 5. 2. 25. PPS - All.32.4 4..4..4 25 2.227 5.5 5. 2598945.87 PPS - CNPJ 63.58.58.53 63..53..53 9.7.98 3. 5. 2545. PPS - CPF.5.5....... 4 5..757 5. 2. 55. PRP - All..5.5...... 2 4.84.774 25. 3. 325. PRP - CNPJ......... 7.4.536 3. 3. 3. PRP - CPF......... 4.679.79 25. 25. 25. PMN - All.33 33 67.67.67..33.33.67 3 2.546 8 3. 3. 2485542.3 PMN - CNPJ 8 59 96.74.74..37.37.74 27 2.247 3. 3. 2472242.3 PMN - CPF......... 3 27.957. 4. 47. 33. PSDC - All 4 86.7. 4 3..7. 4 8.98.344 85.96. 65885.96 PSDC - CNPJ 3.38.77..54.54..77. 3 6.89.548 85.96. 65385.96 PSDC - CPF..........629.69 5. 5. 5. PSDB - All.322 2.9.57.69.44.44.25.8 366 69.67.. 9. 5678898.33 PSDB - CNPJ.37 26.6.66.39.48.24.9 332 59.653. 935.6 9. 52935.5 PSDB - CPF.536 3..36 4.36..36. 28 9.777... 4565. PDT - All. 22.667...... 9 25.99. 3. 35. 5. PDT - CNPJ. 22.667...... 9 25.99. 3. 35. 5. PRTB - All.53.58.5 63.58.53.. 9 4.66.66 5. 32. 5663.94 PRTB - CNPJ 22.56.67. 78.67... 8 6.573.35 5. 32. 5597.94 PRTB - CPF......... 6.244.39 724. 724. 724. PSL - All.34.9 9.43.7.64..2.2 47.42.99 5. 4. 77799. PSL - CNPJ.34.9 9.43.7.64..2.2 47.42.99 5. 4. 77799. PSTU - All...... 5.743 7 986. 8. 353.8 PSTU - CNPJ.333.....667... 3 8.24.2 6. 8. 3995.8 PSTU - CPF.5.5....... 2 2.5.962 986. 25. 436. PSB - All.9.7 4.76.324.38.29.9.29 5 89.765. 627.32 3. 359865.2 PSB - CNPJ.92.73 5.77.327.38.29.9.9 4 9.587. 627.32 3. 358965.2 PSB - CPF......... 2.854.8 9. 9. 9. PSC - All.34.52.9.43.34.22.22..43 46 35.862. 5. 5. 5969.99 PSC - CNPJ 68 6 2.24.34.24.24..49 4 42.23.. 5. 589965.99 PSC - CPF.6...... 5 4.79.84 5. 49. 725. PCO - All.5.......5. 2 9.436.37 83.33. 83.33 PCO - CNPJ.5.......5. 
2 9.436.37 83.33. 83.33 PROS - All...5..5.... 6 24.95.2 3. 3. 483. PROS - CNPJ....6.... 5 24.36.2 3. 3. 48. PROS - CPF......... 7.4.536 3. 3. 3. PSD - All.348.97.82.45.5.76..5 66 3.373. 3. 5. 5573638.58 PSD - CNPJ.354.85.46.8.5.77..5 65 2.68 3 3. 5. 5568638.58 PSD - CPF..........629.69 5. 5. 5. PC do B - All.333 42.52.3.9..3. 33 7.36 98 25. 5. 72754.99 PC do B - CNPJ.355.94.6.32 9.97..32. 3 6.834.555 2. 5. 7274.99 PC do B - CPF......... 2 9.358.33 25. 25. 5. PHS - All......... 7.4.536 3. 3. 3. PHS - CPF......... 7.4.536 3. 3. 3. PSOL - All.34.87.8 28.24.24.65.8 23 6.4.. 94953.74 38998.38 PSOL - CNPJ.39 78..28 22.83.83.28.28 36 7.573.25 72.65 94953.74 288473.38 PSOL - CPF 25.46 8. 3...8. 87 68.65... 2525. SD - All.393 5.7.24.55.48.48..2 84 24.6.2 44.8 3. 2799286.64 SD - CNPJ.392 66.76.3.65.38.5.. 79 29... 3. 2786667. SD - CPF......... 2.322.97 8. 8. 8. 6

health. Mobile technology allows tracking individuals and monitoring conversations, so that crime and terrorism could be more easily solved or, in some cases, even avoided. Consumption data could be used to balance the needs and expenses of a population and to optimize the production and industry of a country, minimizing its environmental impact. Nevertheless, people are usually afraid of sharing data and many argue strongly against it. The reason is simple: no one trusts authority. Politicians do not use information for the benefit of the public they rule over, security services usually abuse the power they have, and the only thing companies like to optimize is their profits. All the potential science has to benefit society relies on its good public use and, therefore, on the leaders the people have. It is of the utmost importance to understand the flaws in our political system so that they can be corrected. In this sense, Brazil has good public information policies and laws. Most government data are made public and are waiting to be properly analyzed and scrutinized.

A Files from TSE

The data used for the analysis in this paper are publicly available and can be downloaded from the TSE website. Here we provide information on exactly which files were used to produce the tables, together with the md5sums of these files. The information concerning the presidential election comes from files with the BR tag (Brazil) in their names. The other tags (RS, for example) refer to each one of the 27 Brazilian federal units. For each one of these tags, one can identify the information on donations received (receitas) and the information about expenses (despesas); since our analysis was about the money the candidates received, we only worked with the former. Moreover, for each category there are three different files, concerning the different players: the parties (partidos), committees (comites) and candidates (candidatos).
For analyzing the presidential campaign, data from the candidates and committees were considered (the two files with the tag BR). For the logistic regression model, only data from the candidates were considered. In Table 9 we show a list of all files used in the analysis and their md5sums. The webpage to obtain these files is http://www.tse.jus.br/eleicoes/estatisticas/repositorio-de-dados-eleitorais, under the menu Prestação de Contas, by choosing the year 2014. There, a zip file containing all the files in the table (and others) can be downloaded.

For performing the logistic regression, one also needs to know the situation of each candidate (elected or not; see footnote 7). This information was also obtained from the TSE webpage, in CSV file format, by filling in the form at: http://www.tse.jus.br/eleicoes/estatisticas/estatisticas-candidaturas-24/estatisticas-eleitorais-24-resultados

Footnote 7: There are actually two different ways to be elected: by QP or by média. We did not differentiate between the two in the analysis.

Table 9: Md5sum for the data files used in the study.

File MD5SUM
receitas candidatos 24 AC.txt eb798e94258ead3a7b6ee6f8bc6b
receitas candidatos 24 AL.txt 2928ae7f9377ca289bd49f5c7e
receitas candidatos 24 AM.txt d49a75f25aebc6a944a5974ef86e8b
receitas candidatos 24 AP.txt b2d3f59996236e5bffe38a4f6c237d
receitas candidatos 24 BA.txt 69b592e34ace294c82e3bdb2e7d
receitas candidatos 24 BR.txt 27426c7ec86d3dd884e9948c5db37
receitas candidatos 24 CE.txt 8ddf6c9ae62f3e5efd7b9326a36fa
receitas candidatos 24 DF.txt 4ba9bdc4f9e89773db8c757b95ff
receitas candidatos 24 ES.txt fc52db93854ac6cc324b57ca582
receitas candidatos 24 GO.txt a6275ba4cbf99866ee4ee9988db8c7d
receitas candidatos 24 MA.txt 99eeb5659d695f9ab576a399a
receitas candidatos 24 MG.txt e82ece2e9d26f8cc2bee9d8e56e
receitas candidatos 24 MS.txt a839bdeb249e76593ca4b5f39789
receitas candidatos 24 MT.txt e8b8f32f6f3b953c24266956c
receitas candidatos 24 PA.txt 4a2f22b33f2c66ebf8977f48e4baeba6
receitas candidatos 24 PB.txt 735cf5dd4955d2f44dba36c5a4
receitas candidatos 24 PE.txt 85844593ae55a68765e38fcd9ec2
receitas candidatos 24 PI.txt 3d89349fef663985deb256dccb53
receitas candidatos 24 PR.txt 867fc85ec35af9e699bd2b6e7bfc
receitas candidatos 24 RJ.txt 59f77c495af6be9db94bb5ff7f4
receitas candidatos 24 RN.txt 9bec87ec49beec5b9445ad673d3
receitas candidatos 24 RO.txt 3462f966f3d46492443a6d64c4fea2e
receitas candidatos 24 RR.txt 7fcc4fafeec92a6ba67577f8ab
receitas candidatos 24 RS.txt 685dd6448b333e783bc84e9e8773
receitas candidatos 24 SC.txt 89275859d3bc8bd5ced22acad8e
receitas candidatos 24 SE.txt be5cba6864f97f2bfb26a7e627
receitas candidatos 24 SP.txt 4af6dec3dee88e2bc58fd2aea6c
receitas candidatos 24 TO.txt 56e3bda5244fea3adc23feac434d8
receitas comites 24 BR.txt 292b8be8e66d9e888cc8c3e4a82d
receitas partidos 24 BR.txt 55f37484653e96b9a99cabfc3ca2
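A downloaded file can be compared against the checksums of Table 9 with a few lines of Python. This is a minimal sketch: the function name `md5sum` is our own, and the demonstration runs on a temporary file with known content rather than on the TSE files themselves.

```python
import hashlib
import os
import tempfile

def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file (stand-in for a downloaded data file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_path = tmp.name
demo_digest = md5sum(demo_path)
os.remove(demo_path)
print(demo_digest)
```

To check one of the data files, call `md5sum` with the path of the downloaded file and compare the returned string with the corresponding entry of the table.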