Quantitative Prediction of Electoral Vote for United States Presidential Election in 2016

Gang Xu
Senior Research Scientist in Machine Learning
Houston, Texas
(prepared on November 7, 2016)

Abstract

This paper reports a quantitative prediction of the electoral vote for the 2016 United States presidential election. The prediction is based on Google Trends (GT) data, which are publicly available on the internet, and a simple heuristic statistical model is applied to analyze them. The work is intended as an experiment exploring a plausible dependency between the GT data and the electoral vote results of US presidential elections. The model's performance was tested by comparing its predictions with the actual electoral votes in 2004, 2008 and 2012. For 2016, the Google Trends data project that Mr. Trump will win the White House in a landslide. This paper documents the exploratory experiment so that it can be put to a real test: the actual election result can be compared with the prediction after tomorrow (November 8, 2016).

Introduction

Tomorrow, November 8, 2016, is election day, when the votes of the American people will determine the next US president. Several forecasts are already available on the internet, and reviewing all of this work in detail is beyond the scope of this report. Lichtman developed a pattern recognition method in the early 1980s and has correctly predicted the past 30 years of presidential outcomes using this method [1]. Based on his method, Lichtman predicted that Trump is headed for a win in 2016 [2]. However, his method cannot give a quantitative prediction of the electoral vote. Silver maintains a web site for presidential forecasting [3]. As of 6:15 PM ET on November 7, 2016, his model gave the results shown in Fig 1, predicting that Clinton would win the election with 299 electoral votes.

Pepper analyzed the Google Trends (GT) data for keywords consisting of a candidate's last name followed by the word "sign". He discovered an interesting dependency pattern, going back to 2004, between the election result and the Google interest scores [4]. The work in this paper was inspired by Pepper's analysis, which raises a curious question: can this dependency be used to predict the electoral vote quantitatively?

Figure 1. Silver's election forecast [3] at 6:15 PM ET, November 7, 2016

Data Description

The GT data used in the current work were retrieved with the same type of keywords as in Pepper's analysis: the candidate's last name combined with the word "sign". The GT data for the two candidates were downloaded in CSV format, and the original CSV data were slightly reformatted for further processing and analysis. The restricting conditions for the GT data search were:

United States (region)
2004 to present (time)
All categories (type)
Web search (search)

Although all the GT data from 2004 to the present were downloaded, only a 3-year window of data up to October of the election year was used for predicting the electoral vote. As an illustrative example, the GT data for predicting the 2008 US presidential election are shown in Fig 2 and Tab 1.

Figure 2. Google Trends Data for Predicting the US Electoral Vote in 2008

Table 1. Google Trends Data for Predicting the US Electoral Vote in 2008

Year  Month  Obama-sign  McCain-sign
2005  10     1           1
2005  11     1           1
2005  12     0           0
2006  1      0           0
2006  2      0           0
2006  3      0           0
2006  4      0           1
2006  5      0           1
2006  6      0           0
2006  7      0           0
2006  8      0           0
2006  9      0           1
2006  10     0           0
2006  11     0           0
2006  12     0           0
2007  1      1           0
2007  2      2           1
2007  3      1           0
2007  4      0           0
2007  5      0           0
2007  6      0           0
2007  7      2           0
2007  8      0           0
2007  9      1           0
2007  10     1           0
2007  11     1           0
2007  12     1           0
2008  1      5           1
2008  2      11          2
2008  3      7           2
2008  4      6           2
2008  5      8           2
2008  6      13          4
2008  7      9           3
2008  8      32          10
2008  9      51          36
2008  10     100         58
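As a quick check on the Table 1 data, the two columns can be totaled and converted into each candidate's share of the combined interest score, which is the quantity the heuristic model compares with the electoral vote. The following is a minimal sketch; the names `OBAMA_SIGN`, `MCCAIN_SIGN` and `interest_shares` are illustrative and not taken from the original analysis.

```python
# Monthly Google Trends interest scores copied from Table 1
# (October 2005 through October 2008, 37 months).
OBAMA_SIGN = [1, 1, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              1, 2, 1, 0, 0, 0, 2, 0, 1, 1, 1, 1,
              5, 11, 7, 6, 8, 13, 9, 32, 51, 100]
MCCAIN_SIGN = [1, 1, 0,
               0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,
               0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               1, 2, 2, 2, 2, 4, 3, 10, 36, 58]


def interest_shares(scores_a, scores_b):
    """Return each series' fraction of the combined total interest score."""
    total_a, total_b = sum(scores_a), sum(scores_b)
    combined = total_a + total_b
    return total_a / combined, total_b / combined


obama_share, mccain_share = interest_shares(OBAMA_SIGN, MCCAIN_SIGN)
print(f"Obama-sign:  total {sum(OBAMA_SIGN)}, share {obama_share:.1%}")
print(f"McCain-sign: total {sum(MCCAIN_SIGN)}, share {mccain_share:.1%}")
```

The totals are 254 and 126, giving shares of about 66.8% and 33.2%.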

A Heuristic Theory and Statistical Model

To explore the quantitative relation between the above GT data and the actual electoral vote, a heuristic model is applied. The essential idea can be shown by a simple estimate. Taking the data in Tab 1, summing the interest scores in the Obama-sign and McCain-sign columns gives overall scores of 254 and 126, respectively. The fraction of the overall interest score for Obama-sign is 254/(254+126) ~ 66.8%, and the fraction for McCain-sign is 126/(254+126) ~ 33.2%. These two results can be compared with the actual electoral vote in 2008:

Obama: 365 / 538 ~ 67.8%
McCain: 173 / 538 ~ 32.2%

Under the working assumption that this numerical match is not a coincidence, and that the electoral vote is instead statistically correlated with the GT interest scores for the presidential candidates, a statistical model can be developed to predict the electoral vote quantitatively. A few technical details of the heuristic statistical model are briefly summarized as follows:

(1) An ensemble of models, based on the bootstrapping approach, is adopted to account for the statistical uncertainty.

(2) A deterministic bootstrapping procedure is applied. The ensemble set of bootstrap samples is given by $\{X_t \mid t_{\min} \le t \le t_{\max}\}$ with $X_t = \{d_t, d_{t+1}, \ldots, d_{t_{\max}}\}$, where $d_t$ represents a single data point of the GT data at time $t$ (a certain month). The bootstrapping procedure is designed to give higher weight to the GT data samples that are closer to the election month (November of the election year), since later data points are contained in more of the samples $X_t$.

(3) The histogram of bootstrap samples is smoothed by a radial-basis kernel density model, so that a MAP-like (maximum a posteriori probability) estimate, as well as the mean estimate, can be obtained.

Comparison of Predictions with Electoral Vote in 2004, 2008 and 2012

The prediction performance has been evaluated using the historical ballot results in 2004 (Tab 2 and Fig 3), 2008 (Tab 3 and Fig 4) and 2012 (Tab 4 and Fig 5).

Table 2. Comparison of Prediction with Electoral Vote in 2004

        MAP Est.  Mean Est.  Actual Electoral Vote
Kerry   245       246        251
Bush    292       291        286

Table 3. Comparison of Prediction with Electoral Vote in 2008

        MAP Est.  Mean Est.  Actual Electoral Vote
Obama   359       353        365
McCain  179       185        173
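The deterministic bootstrap and kernel smoothing described in the model section can be sketched as follows, using the 2008 data of Table 1. This is one possible reading of the procedure, not the author's implementation. Three choices here are assumptions that the paper does not state: each sample X_t is turned into an electoral-vote estimate by multiplying its interest share by 538, the kernel is Gaussian with a fixed bandwidth of 3 votes, and the density is maximized over an integer grid. The output is therefore not expected to reproduce the tabulated estimates exactly.

```python
import math

ELECTORAL_VOTES = 538

# Monthly interest scores from Table 1 (October 2005 through October 2008).
OBAMA_SIGN = [1, 1, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              1, 2, 1, 0, 0, 0, 2, 0, 1, 1, 1, 1,
              5, 11, 7, 6, 8, 13, 9, 32, 51, 100]
MCCAIN_SIGN = [1, 1, 0,
               0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0,
               0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               1, 2, 2, 2, 2, 4, 3, 10, 36, 58]


def window_estimates(scores_a, scores_b, total=ELECTORAL_VOTES):
    """Electoral-vote estimate for candidate A from every sample X_t.

    X_t = {d_t, ..., d_tmax} is the suffix of the series starting at month t,
    so months near the election appear in more samples than early months.
    Assumption: estimate = total * (A's share of the summed scores in X_t).
    """
    estimates = []
    for t in range(len(scores_a)):
        sum_a, sum_b = sum(scores_a[t:]), sum(scores_b[t:])
        if sum_a + sum_b > 0:  # skip windows with no search interest at all
            estimates.append(total * sum_a / (sum_a + sum_b))
    return estimates


def gaussian_kde(samples, x, bandwidth):
    """Gaussian (radial-basis) kernel density estimate evaluated at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def map_and_mean(samples, bandwidth=3.0, total=ELECTORAL_VOTES):
    """MAP-like estimate (density mode on an integer grid) and sample mean."""
    map_est = max(range(total + 1), key=lambda x: gaussian_kde(samples, x, bandwidth))
    mean_est = sum(samples) / len(samples)
    return map_est, mean_est


ESTIMATES = window_estimates(OBAMA_SIGN, MCCAIN_SIGN)
MAP_EST, MEAN_EST = map_and_mean(ESTIMATES)
print(f"Obama 2008: MAP-like {MAP_EST}, mean {MEAN_EST:.1f}")
print(f"McCain 2008: MAP-like {ELECTORAL_VOTES - MAP_EST}, "
      f"mean {ELECTORAL_VOTES - MEAN_EST:.1f}")
```

Using suffix windows makes the recency weighting implicit: the October value enters all 37 samples, while the earliest month enters only one.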

Table 4. Comparison of Prediction with Electoral Vote in 2012

        MAP Est.  Mean Est.  Actual Electoral Vote
Obama   343       341        332
Romney  195       197        206

Figure 3. The Predicted Distribution of Electoral Vote (solid curves and histograms) versus The Actual Electoral Vote (vertical lines) in 2004

Figure 4. The Predicted Distribution of Electoral Vote (solid curves and histograms) versus The Actual Electoral Vote (vertical lines) in 2008
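The comparisons in Tables 2 to 4 can be condensed into absolute errors per candidate and year. The short sketch below only re-tabulates the published numbers; the names `RESULTS` and `absolute_errors` are illustrative.

```python
# (MAP estimate, mean estimate, actual electoral vote) copied from Tables 2-4.
RESULTS = {
    2004: {"Kerry": (245, 246, 251), "Bush": (292, 291, 286)},
    2008: {"Obama": (359, 353, 365), "McCain": (179, 185, 173)},
    2012: {"Obama": (343, 341, 332), "Romney": (195, 197, 206)},
}


def absolute_errors(results):
    """Return rows of (year, candidate, |MAP - actual|, |mean - actual|)."""
    rows = []
    for year in sorted(results):
        for name, (map_est, mean_est, actual) in results[year].items():
            rows.append((year, name, abs(map_est - actual), abs(mean_est - actual)))
    return rows


ERRORS = absolute_errors(RESULTS)
for year, name, map_err, mean_err in ERRORS:
    print(f"{year} {name:<7} MAP error {map_err:>2}  mean error {mean_err:>2}")

MAX_MAP_ERROR = max(row[2] for row in ERRORS)
MAX_MEAN_ERROR = max(row[3] for row in ERRORS)
print(f"largest MAP error: {MAX_MAP_ERROR}, largest mean error: {MAX_MEAN_ERROR}")
```

Across these three elections the MAP estimate is off by at most 11 electoral votes and the mean estimate by at most 12, and the candidate with the larger estimate is the actual winner in each year.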

Figure 5. The Predicted Distribution of Electoral Vote (solid curves and histograms) versus The Actual Electoral Vote (vertical lines) in 2012

Prediction of Presidential Electoral Vote in 2016

The quantitative prediction of the presidential electoral vote in 2016 is shown in Tab 5 and Fig 6. The distributions of the electoral vote for Clinton and Trump are well separated, falling in two different regions (124-163 for Clinton, 375-414 for Trump). This indicates that Trump will win the 2016 US presidential election in a landslide, with 70% - 77% of the total electoral vote. This model prediction is subject to falsification by the actual ballot result.

Discussions

Using Google Trends data to predict the electoral vote quantitatively appears to be a promising approach. Based on three historical ballot results for the US presidential election, the GT interest scores, taken as a measure of voters' sentiment and interest across the nation, seem to correlate well with the actual electoral vote. In the current exploratory experiment, I have applied a simple statistical model to some well-selected Google Trends data. In the future this work could be extended in several directions: (1) to rigorously assess whether the speculated correlations exist or not; (2) to understand why and how such quantitative correlations could be established, if they exist; (3) to improve the prediction accuracy by developing better models or using better-selected data (e.g., Google Trends combined with other data sources).

Table 5. MAP Estimates, Mean Estimates, and Estimated Vote Ranges for Clinton and Trump (2016)

         MAP Est.  Mean Est.  Estimated Vote Range
Clinton  147       143        124-163
Trump    391       395        375-414

Figure 6. The Predicted Distribution of Electoral Vote (solid curves and histograms) for Clinton and Trump in 2016

References

1. A. J. Lichtman and V. I. Keilis-Borok, Pattern recognition applied to presidential elections in the United States, 1860-1980: Role of integral social, economic, and political traits, Proc. Natl. Acad. Sci. USA, Vol. 78, No. 11, pp. 7230-7234 (1981)
2. https://www.washingtonpost.com/news/the-fix/wp/2016/09/23/trump-is-headed-for-a-win-says-professor-whos-predicted-30-years-of-presidential-outcomes-correctly/
3. http://projects.fivethirtyeight.com/2016-election-forecast/?ex_cid=rrpromo
4. Ethan Pepper, Google Trends Indicate Trump Landslide, August 30, 2016, http://regated.com/2016/08/googles-trends-indicate-trump-landslide/