Universality of election statistics and a way to use it to detect election fraud.

Peter Klimek
http://www.complex-systems.meduniwien.ac.at
P. Klimek (COSY @ CeMSIIS), Election statistics, 26. 2. 2013

Background: Elections in Russia

Background: Elections in Russia
"It's not the people who vote that count; it's the people who count the votes." (Joseph Stalin)

Background: Russian legislative election, 2011
During and after the Dec 4, 2011 Russian legislative election, more than 1,100 official complaints were filed.
International observers (OSCE) reported undue interference by state authorities due to a convergence of the state and the governing party, in particular the government's control over the Central Election Commission.
Protests started soon after the election (with more than 15,000 people gathering at the Red Square) and peaked in May 2012, with about 20,000 people protesting in Moscow on the day before Putin's inauguration.
What ignited these public upheavals?

Background: Ballot boxes already filled before the polling station opens?

Background: Very motivated voters?

Background: Strange equipment?

A research question (purely hypothetical)
If practices like ballot stuffing and the re-casting of votes were widespread, would this leave a detectable impact on the election statistics?

A statistical perspective
Elections can be seen as large-scale social experiments. A country is segmented into a large number of electoral units. Each unit represents an experiment, where each citizen articulates his/her political preference through a ballot.
What are the statistics of such a process? How does ballot stuffing influence these statistics?
P. Klimek, Y. Yegorov, R. Hanel, S. Thurner, Statistical detection of systematic election irregularities. Proc. Natl. Acad. Sci. USA, 109, 19151-4 (2012).

A statistically educated protest rally
Make a histogram of the votes for a specific party over all electoral districts and you get...
"We do not trust Churov, we trust Gauss!"

Introductory election statistics

Elections as experiments
Let us start with the simplest imaginable case, where each electoral unit contains the same number of people with the same distribution of preferences (we will later relax all these assumptions).
An experiment is any procedure which can be infinitely repeated and has a well-defined set of mutually exclusive outcomes. The set of all possible outcomes is the sample space $\Omega$.
Example I: Roll a six-sided die (experiment). The possible outcomes can be labeled 1, 2, 3, 4, 5, 6 and give the sample space $\Omega = \{1, 2, 3, 4, 5, 6\}$.
Example II: Ask N people if they vote for a party (experiment). The possible outcomes (the number of people who say yes) can be labeled 0, 1, 2, ..., N and give the sample space $\Omega = \{0, 1, 2, \ldots, N\}$.

Elections as experiments
An event A is a subset of the sample space $\Omega$, i.e. it is a set of outcomes.
Assume all outcomes are equally likely. Then the probability for event A, P(A), is defined by
$P(A) = \frac{|A|}{|\Omega|}$, (1)
(here $|\cdot|$ denotes the number of elements in the respective set).
Example I: What is the probability to toss a head with a fair coin? A = {head}, and $P(A) = \frac{1}{2}$.
Example II: What is the probability to roll at most a 2 with a die? A = {1, 2}, and $P(A) = \frac{2}{6} = \frac{1}{3}$.
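A minimal sketch of Eq. (1) in Python, counting outcomes for the coin and die events. The function name `classical_probability` is introduced here for illustration only. It also shows that the event {1, 2} ("at most a 2") has probability 1/3, whereas "at least a 2", i.e. {2, ..., 6}, would have probability 5/6.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = |A| / |Omega|, valid only if all outcomes are equally likely."""
    sample_space = set(sample_space)
    event = set(event) & sample_space  # an event is a subset of the sample space
    return Fraction(len(event), len(sample_space))

coin = {"head", "tail"}
die = {1, 2, 3, 4, 5, 6}

p_head = classical_probability({"head"}, coin)
p_at_most_2 = classical_probability({1, 2}, die)
p_at_least_2 = classical_probability({x for x in die if x >= 2}, die)
print(p_head, p_at_most_2, p_at_least_2)
```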

Elections as experiments
Let $n_T$ be the number of repetitions of the random experiment. Let $n_A$ be the number of times an outcome of the event A is observed.
Frequentist position: In the long run, i.e. as the number of trials approaches infinity, the relative frequency of event A, $n_A / n_T$, will approach a true frequency, the probability P(A),
$P(A) = \lim_{n_T \to \infty} \frac{n_A}{n_T}$. (2)
The probability of the impossible event, i.e. the empty set of outcomes, is zero: $P(\emptyset) = 0$.
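The frequentist limit in Eq. (2) can be illustrated with a short simulation. This is only a sketch: the helper `relative_frequency`, the choice of a die as the experiment and the fixed seed are assumptions made for the example.

```python
import random

def relative_frequency(event, n_trials, seed=0):
    """Roll a fair die n_trials times and return n_A / n_T for the given event."""
    rng = random.Random(seed)
    n_event = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return n_event / n_trials

# The relative frequency of A = {1, 2} should settle near P(A) = 1/3
# as the number of trials grows.
frequencies = {n: relative_frequency({1, 2}, n) for n in (100, 10_000, 200_000)}
for n, freq in frequencies.items():
    print(n, freq)
```

The empty event never occurs, so its relative frequency is exactly zero for any number of trials.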

Elections as experiments: Random variables
A random variable X is a function from the sample space $\Omega$ to the real numbers, $X : \Omega \to \mathbb{R}$.
X can be interpreted as a quantity whose value depends on the outcome of an experiment.
X is a discrete random variable if it takes one out of a countable set of values. For each x in this countable set, define the probability mass function p(x) as
$p(x) := P(X = x)$. (3)
Intuition: Small x is a concrete outcome of a random experiment (the election result in a unit). We use capital X as a placeholder for x. We can then make general statements about the process/experiment generating x, without having to refer to the actual outcomes.

Elections as experiments: Random variables
The expectation value of the random variable X, denoted E(X), is defined as
$E(X) = \sum_{x : p(x) > 0} x\, p(x)$. (4)
Example I: Toss two coins. Let X be the number of heads; what is E(X)?
$\Omega = \{(H, H), (H, T), (T, H), (T, T)\}$
$E(X) = \frac{1}{4} \cdot 2 + \frac{1}{4} \cdot 1 + \frac{1}{4} \cdot 1 + \frac{1}{4} \cdot 0 = 1$.
Example II: What is the expectation value of rolling a six-sided die?
$E(X) = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = 3.5$.
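Both examples can be checked by evaluating Eq. (4) directly on the probability mass function. The following is a small sketch; the dictionary representation of p(x) and the name `expectation` are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

def expectation(pmf):
    """E(X) = sum of x * p(x) over all x with p(x) > 0."""
    return sum(x * p for x, p in pmf.items() if p > 0)

# Example I: number of heads when tossing two fair coins.
outcomes = list(product("HT", repeat=2))  # (H,H), (H,T), (T,H), (T,T)
pmf_heads = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf_heads[heads] = pmf_heads.get(heads, 0) + Fraction(1, len(outcomes))

# Example II: a fair six-sided die.
pmf_die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(pmf_heads), expectation(pmf_die))
```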

Elections as experiments: Random variables
Let X be a random variable with expectation value $\mu$, $E(X) = \mu$.
The variance of X, denoted Var(X), is defined as
$\mathrm{Var}(X) = E\left((X - \mu)^2\right)$, (5)
which can be written as
$\mathrm{Var}(X) = \sum_{x : p(x) > 0} (x - \mu)^2\, p(x)$. (6)
Intuition: The variance Var(X) measures how much the random variable X varies from its mean over consecutive trials; it is the expectation value of the quadratic distance from the mean.
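As a worked instance of the variance formula, take the fair six-sided die, whose expectation value is 3.5:

```latex
\begin{align*}
\mathrm{Var}(X) &= \sum_{x=1}^{6} (x - 3.5)^2 \cdot \tfrac{1}{6} \\
                &= \tfrac{1}{6}\left(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25\right)
                 = \tfrac{17.5}{6} = \tfrac{35}{12} \approx 2.92 .
\end{align*}
```

The corresponding standard deviation is $\sqrt{35/12} \approx 1.71$, so a single roll typically lands a bit less than two pips away from the mean.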

Central Limit Theorem

Central limit theorem (CLT)
The central limit theorem is arguably the single most theoretically striking and practically important result of probability theory.
Let $\{X_i\}$ be random variables which are independent of each other and drawn from identical distributions (i.i.d. variables).
The only things we know about the distribution of the $X_i$ are the mean, $E(X_i) = \mu$, and the variance, $\mathrm{Var}(X_i) = \sigma^2 < \infty$.
What can we say about the normalized sum $S_n$ of i.i.d. variables, $S_n = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$?
CLT: we can say a lot!
But before we understand this, let us first understand the sum of two random variables...

Central limit theorem: Rolling dice
Example I: We roll two six-sided dice, a red and a blue one, with random variables $X_1$ and $X_2$. Both dice are fair, so each outcome has equal probability. What is the probability mass function for the sum of the two dice, i.e. $P(X_1 + X_2 = t)$?
We will visualize the two probability mass functions of the dice in the following way:

Central limit theorem: Rolling dice
[Figure sequence: $P(X_1 + X_2 = t)$ for t = 2, 3, ..., 12, shown one value of t per slide, followed by $P(X_1 + X_2 + X_3 = t)$ and $P(X_1 + X_2 + X_3 + X_4 = t)$.]
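The probability mass function of a sum of independent dice is the convolution of the individual mass functions. The sketch below computes it exactly for two, three and four dice; the function names `convolve` and `sum_of_dice` are introduced here for illustration.

```python
from fractions import Fraction

def convolve(pmf_a, pmf_b):
    """PMF of A + B for independent A and B, each given as {value: probability}."""
    result = {}
    for a, p_a in pmf_a.items():
        for b, p_b in pmf_b.items():
            result[a + b] = result.get(a + b, 0) + p_a * p_b
    return result

def sum_of_dice(n_dice):
    """PMF of the sum of n_dice fair six-sided dice."""
    die = {face: Fraction(1, 6) for face in range(1, 7)}
    pmf = die
    for _ in range(n_dice - 1):
        pmf = convolve(pmf, die)
    return pmf

two_dice = sum_of_dice(2)
for total in sorted(two_dice):
    print(total, two_dice[total])
```

Calling `sum_of_dice(3)` or `sum_of_dice(4)` gives the distributions for three and four dice, which already look increasingly bell-shaped.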

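Central limit theorem
Suppose $\{X_1, X_2, \ldots\}$ is a sequence of i.i.d. variables with $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$, and let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then, as $n \to \infty$,
$\sqrt{n}\,(S_n - \mu) \to N(0, \sigma^2)$ in distribution, (7)
where $N(\mu, \sigma^2)$ is the Gauss distribution or normal distribution with mean $\mu$ and variance $\sigma^2$, with density $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$. In other words, for large n the mean $S_n$ is approximately distributed as $N(\mu, \sigma^2/n)$.
So why did the protester trust Gauss?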

Central limit theorem: Interpretation of the CLT for election results
Assume that we have an infinite (or very large) number of electoral units, each having the same number of voters.
Assume that in each unit the people's preferences to vote for a party have the same expectation value and variance.
CLT: the distribution of votes over electoral units must be normal!
Straightforward: the same holds not only for the vote count, but also for the turnout.
Note that we only constrain the mean and variance of the underlying distribution, not its shape!
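A quick simulation can make this interpretation concrete. The sketch below assumes, purely for illustration, that every voter independently votes for the party with the same probability `p`; the unit size, number of units and seed are arbitrary choices, not values from any real election.

```python
import random
import statistics

def simulate_vote_shares(n_units, voters_per_unit, p, seed=1):
    """Vote share per unit when each voter independently votes yes with probability p."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_units):
        votes = sum(1 for _ in range(voters_per_unit) if rng.random() < p)
        shares.append(votes / voters_per_unit)
    return shares

shares = simulate_vote_shares(n_units=2000, voters_per_unit=500, p=0.4)
mean_share = statistics.fmean(shares)
sd_share = statistics.pstdev(shares)

# For a normal distribution about 68% of the values lie within one
# standard deviation of the mean.
within_one_sd = sum(abs(s - mean_share) <= sd_share for s in shares) / len(shares)
print(mean_share, sd_share, within_one_sd)
```

The spread across units should be close to $\sqrt{p(1-p)/500}$, the value the CLT predicts for a mean over 500 voters.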

Central limit theorem: Faith in Gauss
[Three figure slides.]

Central limit theorem: Putting the fun in distribution functions

Modeling election outcomes

Modeling election outcomes
Assume a country is segmented into n electoral units, labeled by i. We are interested in:
$N_i$: the vote-eligible population in unit i.
$V_i$: the number of valid votes cast in unit i.
$W_i$: the number of votes for the winning party in unit i.
$\bar{v} = \frac{1}{n} \sum_i \frac{W_i}{N_i}$: the mean of votes for the winning party.
$\sigma_v$: the variance of votes for the winning party.
$\bar{a} = \frac{1}{n} \sum_i \frac{V_i}{N_i}$: the turnout (percentage).
$\sigma_a$: the variance of turnout.
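These summary quantities can be computed directly from per-unit counts. The sketch below uses three made-up units given as tuples of eligible population, valid votes and votes for the winner; the data, the function name `summarize` and the use of the population variance are assumptions for illustration.

```python
import statistics

def summarize(units):
    """units: list of (N_i, V_i, W_i). Returns means and variances of W_i/N_i and V_i/N_i."""
    votes = [w / n for n, v, w in units]
    turnout = [v / n for n, v, w in units]
    return {
        "v_mean": statistics.fmean(votes),
        "v_var": statistics.pvariance(votes),
        "a_mean": statistics.fmean(turnout),
        "a_var": statistics.pvariance(turnout),
    }

# Hypothetical data, not taken from any real election.
units = [(1000, 600, 300), (800, 400, 160), (1200, 900, 540)]
summary = summarize(units)
print(summary)
```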

Modeling election outcomes
An extremely simple null model for fair election outcomes, using $\{N_i\}$, $\bar{v}$, $\bar{a}$:
For each unit i, take the electorate size $N_i$ from the data.
Draw the model votes for unit i, $v_i^{(m)}$, from the normal distribution with mean and variance estimated by $\bar{v}$, $\sigma_v$.
Draw the model turnout for unit i, $a_i^{(m)}$, from the normal distribution with mean and variance estimated by $\bar{a}$, $\sigma_a$.
How well does this model describe actual empirical vote and turnout distributions?
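The null model can be sketched in a few lines. Two choices below are assumptions made for this example and are not stated on the slide: the width parameters are passed to `random.gauss` as standard deviations, and draws outside the interval [0, 1] are simply redrawn. All names and numbers are hypothetical.

```python
import random

def draw_fraction(rng, mean, width):
    """Draw from a normal distribution, redrawing until the value lies in [0, 1] (assumption)."""
    while True:
        x = rng.gauss(mean, width)
        if 0.0 <= x <= 1.0:
            return x

def fair_model(electorates, v_mean, v_width, a_mean, a_width, seed=0):
    """Return (N_i, model vote fraction, model turnout) for each unit, without fraud."""
    rng = random.Random(seed)
    units = []
    for n_i in electorates:
        a_m = draw_fraction(rng, a_mean, a_width)
        v_m = draw_fraction(rng, v_mean, v_width)
        units.append((n_i, v_m, a_m))
    return units

# Hypothetical parameters; in practice they would be estimated from the data.
model_units = fair_model([1000] * 5000, v_mean=0.3, v_width=0.05, a_mean=0.6, a_width=0.08)
print(model_units[:3])
```

Because votes and turnout are drawn independently, this model produces no correlation between the two, which is the property the comparison with real data probes.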

Modeling election outcomes: Comparison of election results from France to the model without fraud

Modeling election outcomes: Further election results
[Two figure slides.]

Modeling election outcomes: Further election results
Obviously, our model does not explain the Russian data well...
Why is there a substantial correlation between vote and turnout?
Why is there a large number of districts with almost one hundred percent votes for the winner and one hundred percent turnout?
We have to extend our model.

Ballot stuffing: A mechanism for electoral fraud
How would ballot stuffing influence these election statistics?
Assume a large number of ballots with votes for one party were stuffed into an urn.
More ballots in the urn → inflated turnout.
If all stuffed ballots count for the same party → inflated vote numbers, always in conjunction with inflated turnout.
The data also suggest a mode of extreme fraud, where all ballots are counted for only one party.
Let us introduce these mechanisms in the model.

Modeling election outcomes: Estimating effects of ballot stuffing
For each unit i, take the electorate size $N_i$ from the data.
Draw the model votes for unit i, $\hat{v}_i$, from the normal distribution with mean and variance estimated by $\bar{v}$, $\sigma_v$.
Draw the model turnout for unit i, $\hat{a}_i$, from the normal distribution with mean and variance estimated by $\bar{a}$, $\sigma_a$.
Incremental fraud: With probability $f_i$, ballots are taken away from both the non-voters and the opposition and added to the winning party's ballots.
Extreme fraud: With probability $f_e$, almost all ballots from the non-voters and the opposition are added to the winning party's ballots.

Modeling election outcomes: Estimating effects of ballot stuffing
The parameters $f_i$ and $f_e$ quantify how often incremental and extreme fraud take place.
If incremental or extreme fraud takes place, its intensity is again estimated from the data.
The model is executed for each pair of $(f_i, f_e)$ values.
The result for the fraud parameters is the pair which offers the highest overlap with the data (as measured by a test statistic comparing observed and modeled vote distributions).
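The parameter scan can be sketched as a plain grid search. The slides do not specify the test statistic, so the squared difference between normalized histograms used below is only a stand-in, and `toy_simulate` is a deterministic placeholder for the real model, included so that the example runs on its own.

```python
from itertools import product

def histogram(values, bins=20):
    """Normalized histogram of values in [0, 1]."""
    counts = [0] * bins
    for x in values:
        counts[min(int(x * bins), bins - 1)] += 1
    return [c / len(values) for c in counts]

def distance(observed, modeled, bins=20):
    """Stand-in test statistic (assumption): squared distance between histograms."""
    h_obs, h_mod = histogram(observed, bins), histogram(modeled, bins)
    return sum((o - m) ** 2 for o, m in zip(h_obs, h_mod))

def fit_fraud_parameters(observed, simulate, grid):
    """Run simulate(f_inc, f_ext) for each pair on the grid and keep the best match."""
    best = None
    for f_inc, f_ext in product(grid, repeat=2):
        if f_inc + f_ext > 1.0:
            continue
        d = distance(observed, simulate(f_inc, f_ext))
        if best is None or d < best[0]:
            best = (d, f_inc, f_ext)
    return best[1], best[2]

def toy_simulate(f_inc, f_ext, n_units=100):
    """Placeholder model: fixed vote shares for extreme, incremental and fair units."""
    n_ext = round(n_units * f_ext)
    n_inc = round(n_units * f_inc)
    return [0.99] * n_ext + [0.7] * n_inc + [0.4] * (n_units - n_ext - n_inc)

grid = [0.0, 0.1, 0.2, 0.3]
observed = toy_simulate(0.2, 0.1)  # pretend this is the empirical vote distribution
best_f_inc, best_f_ext = fit_fraud_parameters(observed, toy_simulate, grid)
print(best_f_inc, best_f_ext)
```

With a stochastic model one would typically average the statistic over several runs per parameter pair before picking the minimum.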

Modeling election outcomes: Estimating effects of ballot stuffing
The left-handed variance $\sigma_v^L$ estimates the normal scatter of the voters' preferences.
The right-handed variance $\sigma_v^R$ estimates the intensity of incremental ballot stuffing.
$\sigma_x$ estimates the intensity of the extreme fraud mechanism.
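One-sided variances can be computed by splitting the units at the mean. In the sketch below, the normalization (averaging over the units on each side separately) is an assumption of this example, and the input values are made up.

```python
def one_sided_variances(values):
    """Mean squared deviation from the overall mean, separately for values below and above it."""
    mean = sum(values) / len(values)
    left = [(x - mean) ** 2 for x in values if x < mean]
    right = [(x - mean) ** 2 for x in values if x > mean]
    var_left = sum(left) / len(left) if left else 0.0
    var_right = sum(right) / len(right) if right else 0.0
    return mean, var_left, var_right

# Hypothetical vote fractions W_i / N_i: a tight cluster plus one inflated unit.
vote_fractions = [0.3, 0.3, 0.3, 0.3, 0.8]
v_mean, var_left, var_right = one_sided_variances(vote_fractions)
print(v_mean, var_left, var_right)
```

A right-handed value much larger than the left-handed one signals a distribution that is smeared out towards high vote shares.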

Modeling election outcomes: Results
The incremental fraud mechanism explains the smearing out of the main blob towards the upper right.
Extreme fraud explains the peak near one hundred percent vote and turnout.
The data from Russia and Uganda are better explained by the model with ballot stuffing than by the model without electoral fraud.
In all other studied countries the fair model describes the data best.
Not discussed here: The results of this method are robust with respect to the aggregation level of the data and the country size!

Modeling election outcomes: All this with a relatively simple model
Estimate the means of the vote distribution ($\bar{v}$) and the turnout distribution ($\bar{a}$).
Estimate variances:
$\sigma_v^{L/R} = \left\langle (\bar{v} - W_i/N_i)^2 \right\rangle_{W_i/N_i \lessgtr \bar{v}}$,
$\sigma_a = \left\langle (\bar{a} - V_i/N_i)^2 \right\rangle_{(V_i/N_i < \bar{a}) \wedge (W_i/N_i < \bar{v})}$,
$\sigma_x = 0.075$.
Estimate the model turnout of unit i, $a_i^{(m)} \sim N(\bar{a}, \sigma_a)$, and the fair vote number $v_i^{(m)} \sim N(\bar{v}, 2\sigma_v^L)$.
Incremental fraud: with probability $f_i$ choose $x_i \sim N(0, \sigma_v^R)$.
Extreme fraud: with probability $f_e$ choose $x_i \sim 1 - N(0, \sigma_x)$.
Apply correction for fraud: $v_i^{(m)} \to N_i \left( v_i^{(m)} a_i^{(m)} + x_i \left(1 - a_i^{(m)}\right) + x_i^{\alpha} \left(1 - v_i^{(m)}\right) a_i^{(m)} \right)$.
Apply a goodness-of-fit test to derive values for $f_i$ and $f_e$.
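The steps above can be sketched as a small simulation. Several details are assumptions of this example rather than statements from the slides: widths are passed to `random.gauss` as standard deviations, draws are clipped to [0, 1], $x_i$ is taken as the absolute value of the normal draw, extreme fraud takes precedence over incremental fraud, the turnout after fraud is taken as $a + x(1 - a)$, and the exponent `alpha` (whose value the slides do not give) defaults to 1 as a placeholder.

```python
import random

def clip01(x):
    """Clip a value to the interval [0, 1] (assumption of this sketch)."""
    return min(1.0, max(0.0, x))

def simulate_unit(rng, n_i, v_mean, v_width, a_mean, a_width,
                  inc_width, f_inc, f_ext, ext_width=0.075, alpha=1.0):
    """Return (turnout fraction, winner's vote count) for one unit."""
    a = clip01(rng.gauss(a_mean, a_width))   # fair model turnout
    v = clip01(rng.gauss(v_mean, v_width))   # fair model vote fraction
    u = rng.random()
    if u < f_ext:                            # extreme fraud: x close to 1
        x = clip01(1.0 - abs(rng.gauss(0.0, ext_width)))
    elif u < f_ext + f_inc:                  # incremental fraud: x close to 0
        x = clip01(abs(rng.gauss(0.0, inc_width)))
    else:                                    # fair unit
        x = 0.0
    # Correction for fraud as on the slide: ballots from non-voters, x(1 - a),
    # and from the opposition, x^alpha (1 - v) a, are added to the winner.
    votes = n_i * (v * a + x * (1.0 - a) + x ** alpha * (1.0 - v) * a)
    turnout = a + x * (1.0 - a)              # assumption: stuffed ballots raise turnout
    return turnout, votes

def simulate_election(electorates, seed=0, **params):
    rng = random.Random(seed)
    return [simulate_unit(rng, n_i, **params) for n_i in electorates]

# Hypothetical parameters, not estimated from any real election.
params = dict(v_mean=0.5, v_width=0.1, a_mean=0.6, a_width=0.1, inc_width=0.2)
fair = simulate_election([1000] * 300, f_inc=0.0, f_ext=0.0, **params)
stuffed = simulate_election([1000] * 300, f_inc=0.0, f_ext=1.0, **params)
print(fair[0], stuffed[0])
```

With `f_ext=1.0` every unit ends up near one hundred percent turnout and vote share, which mimics the isolated peak attributed to extreme fraud.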

FIN