Big Data, information and political campaigns: an application to the 2016 US Presidential Election

Presentation largely based on Politics and Big Data: Nowcasting and Forecasting Elections with Social Media, 2017

Preamble: Big Data are data labeled, for strange reasons, with a capitalized "Big". Nevertheless, they are still Data (with a capital letter too!). Therefore, good statistical techniques are required to extract meaningful results from such sources.

What Big Data are not: Big Data are not just a data collection with a very large n. That is, a very large cross-national survey of citizen participation is not, strictly speaking, Big Data.

What Big Data are: three main attributes, holding at the same time. Volume: data exceed the capacity of traditional computing methods to store and process them. Frequency: data come from streams or complex event processing, i.e., size per unit of time matters. Unpredictability: data come in many different forms; they are raw, messy, unstructured, not ready for processing, and so on. Their origin: administrative data, transaction data, social media data. Let's focus on the latter.

Why study Big Data? To make a long answer short: more and more data are available out there, across time and space! Can we really ignore this?

Big Data Analytics

The main approaches (with a specific emphasis on electoral campaigns) and their sub-approaches:
Computational: volume data; endorsement data
Sentiment Analysis (SA): automated sentiment analysis (ontological dictionaries); machine learning
Supervised Aggregated Sentiment Analysis (SASA): ReadMe (Hopkins & King 2010); iSA (Ceron, Curini & Iacus 2016)

Computational approaches: purely quantitative. Endorsement data: counting #followers, #likes. Volume data: counting the number of mentions related to a party or candidate, or the occurrences of particular hashtags (such as the party name), etc. More followers, likes, and mentions: more votes! Limits: endorsement and volume data measure the degree of public attention or awareness around each candidate or party. Anything more? So add some sentiment to that!
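A minimal sketch of the volume approach, assuming tweets arrive as plain strings and that a case-insensitive substring match is an acceptable proxy for a "mention" (real pipelines would need tokenization and disambiguation). The function name and the toy tweets are hypothetical; endorsement data would be handled the same way, with follower or like counts in place of mention counts.

```python
from collections import Counter


def volume_shares(tweets, candidates):
    """Volume approach: each candidate's predicted share is proportional
    to the number of tweets mentioning them (one tweet may count for both)."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for cand in candidates:
            if cand.lower() in lowered:  # naive substring match
                counts[cand] += 1
    total = sum(counts[c] for c in candidates)
    if total == 0:
        return {c: 0.0 for c in candidates}
    return {c: counts[c] / total for c in candidates}


# Hypothetical toy data
tweets = [
    "Watching the Trump rally tonight",
    "Clinton speech was strong",
    "#Trump again on TV",
    "Clinton vs Trump, who wins?",
]
print(volume_shares(tweets, ["Trump", "Clinton"]))  # {'Trump': 0.6, 'Clinton': 0.4}
```

Note that the last tweet counts for both candidates: the measure captures attention, not support, which is exactly the limit the slide points out.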

Sentiment analysis: analyzing the stance of the comments (positive, negative, or neutral) to infer voters' preferences. Usually, more positive, more votes! But there are also other approaches. How to map tweets into votes for the US Presidential Race 2012 (Ceron, Curini & Iacus 2014): a tweet counts as a vote when a) it includes an explicit statement of the intention to vote for a candidate/party; b) it includes a statement in favor of a candidate/party together with a hashtag connected to the electoral campaign of that candidate/party; c) it includes a negative statement opposing a candidate/party together with a hashtag connected to the electoral campaign of another candidate/party. Retweets that satisfy any of the previous conditions are counted as well.
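The three mapping rules can be sketched as a small function, assuming each tweet has already been annotated (for instance by human coders) with its stance and with the candidate whose campaign hashtag it contains. The `Tweet` fields and `map_to_vote` are hypothetical illustrations of the rules, not the authors' code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    """Hypothetical per-tweet annotations (e.g., assigned by human coders)."""
    vote_intention: Optional[str] = None    # explicit "I will vote for X"
    supports: Optional[str] = None          # positive statement about X
    opposes: Optional[str] = None           # negative statement about Y
    campaign_hashtag: Optional[str] = None  # whose campaign hashtag appears
    is_retweet: bool = False                # retweets follow the same rules


def map_to_vote(t: Tweet) -> Optional[str]:
    """Return the candidate a tweet counts for, or None if no rule applies."""
    if t.vote_intention:                                    # rule (a)
        return t.vote_intention
    if t.supports and t.campaign_hashtag == t.supports:     # rule (b)
        return t.supports
    if t.opposes and t.campaign_hashtag and t.campaign_hashtag != t.opposes:
        return t.campaign_hashtag                           # rule (c)
    return None


tweets = [
    Tweet(vote_intention="Clinton"),
    Tweet(supports="Trump", campaign_hashtag="Trump", is_retweet=True),
    Tweet(opposes="Trump", campaign_hashtag="Clinton"),
    Tweet(supports="Clinton"),  # no campaign hashtag: not counted
]
votes = Counter(v for v in map(map_to_vote, tweets) if v is not None)
print(votes)  # Counter({'Clinton': 2, 'Trump': 1})
```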

Sentiment analysis: ML vs. SASA. Notice that in social science, as in electoral studies, what matters in forecasting attempts is the aggregate distribution of opinion or share of votes, rather than individual opinion or voting behaviour. Estimating a good aggregate measure with the lowest possible error is what is relevant here! Therefore, are SASA approaches better?
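To see why the aggregate view matters, here is a deliberately simpler technique than ReadMe or iSA (neither is reproduced here): adjusted classify-and-count. If an individual-level classifier has true-positive rate tpr and false-positive rate fpr (estimated on a hand-coded validation set), the observed share of positive labels is q = tpr·p + fpr·(1 − p), so the true share p can be recovered as (q − fpr)/(tpr − fpr), even though many individual labels are wrong. The rates and labels below are hypothetical.

```python
def adjusted_share(predicted_labels, tpr, fpr):
    """Adjusted classify-and-count: correct the raw share of tweets a classifier
    labels positive (1) using its true- and false-positive rates."""
    if not predicted_labels:
        raise ValueError("no labels")
    if tpr == fpr:
        raise ValueError("classifier is uninformative (tpr == fpr)")
    raw = sum(predicted_labels) / len(predicted_labels)
    estimate = (raw - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion


labels = [1] * 580 + [0] * 420                   # hypothetical classifier output
print(sum(labels) / len(labels))                 # raw share: 0.58
print(adjusted_share(labels, tpr=0.8, fpr=0.3))  # corrected share ≈ 0.56
```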

Social media (SM) advantages for political campaigning and elections: you listen, you do not ask! SM data are arguably less affected by the social desirability bias that often plagues surveys on hot topics (e.g., racism or sympathy toward terrorism; more on this this afternoon!). Think also of Brexit and the Shy-Tory (Shy-Trump?) effect.

SM advantages for political campaigning and elections: a geo-localized analysis is possible, as well as a real-time analysis of the electoral campaign (e.g., what is the impact of a TV debate on the popularity of the candidates?). This makes it possible to capture (and often anticipate) sudden changes in public opinion (so-called momentum): nowcasting the present! Let's see an example based on the US 2016 Presidential Campaign.
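Real-time monitoring of this kind (e.g., minute by minute during a TV debate) can be sketched as a sliding-window share of pro-candidate tweets. The per-minute counts, the window length, and the function name are hypothetical; the counts are assumed to come from an upstream sentiment step.

```python
from collections import deque


def rolling_share(pro_a, pro_b, window=5):
    """Share of pro-A tweets among pro-A + pro-B tweets over the last `window` minutes."""
    buf_a, buf_b = deque(), deque()
    sum_a = sum_b = 0
    shares = []
    for a, b in zip(pro_a, pro_b):
        buf_a.append(a)
        buf_b.append(b)
        sum_a += a
        sum_b += b
        if len(buf_a) > window:  # drop the minute that left the window
            sum_a -= buf_a.popleft()
            sum_b -= buf_b.popleft()
        total = sum_a + sum_b
        shares.append(sum_a / total if total else float("nan"))
    return shares


# Hypothetical counts: candidate A surges in minutes 3-4
shares = rolling_share([10, 10, 10, 30, 30], [10] * 5, window=2)
print([round(s, 3) for s in shares])  # [0.5, 0.5, 0.5, 0.667, 0.75]
```

A short window reacts quickly to debate moments but is noisier; a longer one smooths out spikes at the cost of lag.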

US Presidential Race 2016: the Debates. US Presidential Debates: first debate. [Figure: "#Debates2016 minute by minute", share (%) of pro-Trump vs. pro-Hillary tweets by minute; annotated topics: Trade, Race, Women]

US Presidential Race 2016: First Debate

US Presidential Race 2016: the Debates. US Presidential Debates: second debate. [Figure: "Second #Debates2016 by minute", share of pro-Trump vs. pro-Hillary tweets by minute; annotated topics: Video, Email, WikiLeaks, Pence, Minorities, Deplorables]

US Presidential Race 2016: Second Debate

US Presidential Race 2016: the Debates. US Presidential Debates: third debate. [Figure]

Limits of social media data: the real profiles behind social media accounts are unknown in most cases. The population on social media is (or may be?) a biased sample of the general population (and what about surveys?). The population of social media users under observation changes according to the topic. Social media are not the same everywhere (no Facebook but VK in Russia, no Twitter but Sina Weibo in China, etc.). (Possible solutions to some of these issues exist.)

Beyond nowcasting: can we also forecast the final electoral result? This is actually quite fascinating, because to validate the predictive accuracy of a model we need an independent measure of the observed outcome that the model is trying to predict. In this respect, forecasting an election is one of the few exercises on collective social events where an independent measure of the outcome is clearly available, i.e., the vote share of candidates (and/or parties) at the ballot box.

A meta-analysis: 239 electoral forecasts related to 94 different elections, held between 2007 and 2015 in 22 countries, covering all five continents (Japan included!). Our dependent variable: the MAE (mean absolute error) of each social-media-based forecast (we focused only on vote share, not seat share!). Within our sample, the average MAE is 7.39, while the MAE of electoral surveys for the same elections is 2.22. Note, however, the variance of MAE within the social-media-based forecasts (s.d. 6.65): some social media forecasts were as good as (if not better than) surveys.
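For readers less familiar with the metric: the MAE of a forecast is the average, over candidates (or parties), of the absolute difference between predicted and actual vote share, in percentage points. A minimal sketch follows; the vote shares are made up, and averaging over all candidates in exactly this way is an assumption about how the meta-analysis computed it.

```python
def mae(predicted, actual):
    """Mean absolute error, in percentage points, between forecast and actual vote shares."""
    if predicted.keys() != actual.keys():
        raise ValueError("forecast and result must cover the same candidates")
    return sum(abs(predicted[k] - actual[k]) for k in actual) / len(actual)


forecast = {"A": 40.0, "B": 35.0, "C": 25.0}  # hypothetical forecast
result = {"A": 36.0, "B": 38.0, "C": 26.0}    # hypothetical outcome
print(round(mae(forecast, result), 2))  # 2.67
```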

A meta-analysis. Our research question: when is social media analysis able to provide accurate forecasts, and when is it not (method, context, other elements prompting coherence between online opinions and offline behavior, etc.)?

How to map posts/tweets into votes? Computational approach: more volume (more discussion), more votes! Sentiment approach: more positive posts, more votes! SASA approach: more positive posts (at the aggregate level), more votes!

Which factors matter? The method through which you extract information from social media is crucial! The SASA method decreases the MAE by 3.4 points compared to forecasts based on a mere computational approach, and by 2.2 points compared to other SA techniques, which are no more effective than computational methods at improving prediction accuracy (and iSA beats ReadMe!).

Which factors matter? Institutions do matter! When elections are held under proportional representation (PR), the MAE decreases by a remarkable 2.13 points compared to plurality systems (an online sincere vs. offline strategic vote effect?). Volume also matters! Having more information on citizens' preferences decreases the error, though only when turnout is sufficiently high, i.e., when we can expect to observe actual behavior that is somewhat consistent with the declared one.

Which factors matter? Other attributes of the prediction? No impact of academic vs. non-academic authorship; no time effect (whether the prediction was made ex ante or ex post); no impact of the number of comments per user.

Which factors matter? Other attributes of the prediction? Weighting the predicted vote shares according to the socio-demographic features of the users has a negligible impact. How is this possible? Limits of the weighting procedure? Social media users may lack socio-demographic representativeness, but what about political representativeness? And in terms of topic coverage?
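Socio-demographic weighting is typically done through post-stratification: estimate the vote share within each demographic cell, then average the cells using their proportions in the target population rather than their (skewed) share among social media users. A minimal sketch; the cells, user counts, and population proportions are hypothetical, and this is not necessarily the weighting used in the studies covered by the meta-analysis.

```python
def unweighted_share(cells):
    """Share implied by social media users as they are (cells weighted by user counts)."""
    total = sum(c["n"] for c in cells.values())
    return sum(c["n"] * c["share"] for c in cells.values()) / total


def poststratified_share(cells, population):
    """Cells re-weighted by their proportion in the target population (must sum to 1)."""
    if abs(sum(population.values()) - 1.0) > 1e-9:
        raise ValueError("population proportions must sum to 1")
    return sum(population[k] * cells[k]["share"] for k in cells)


# Hypothetical: young users are over-represented online
cells = {"18-34": {"n": 700, "share": 0.60}, "35+": {"n": 300, "share": 0.40}}
population = {"18-34": 0.30, "35+": 0.70}
print(round(unweighted_share(cells), 2))                  # 0.54
print(round(poststratified_share(cells, population), 2))  # 0.46
```

The gap between the two estimates depends on how much the cells differ in both composition and vote share; with these toy numbers it is large, whereas the meta-analysis found a negligible effect on real forecasts.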

Which factors matter? Finally, taking polls into account when making social media electoral forecasts reduces the MAE considerably. Let's go back to the US 2016 Presidential Campaign.

U.S. 2016 Presidential Elections

The method we employed: a three-step process. First, we focused on all posts coming from the US, written in English (no Spanish or other languages), that explicitly mentioned one of the two main candidates, collected via the Twitter API: between 3 and 5 million tweets on a daily basis.

The method we employed: a three-step process. To forecast the nationwide popular vote via iSA, we counted only tweets coming from the United States. To estimate the state-by-state results, we analyzed tweets geolocated in each state, using the geolocation metadata attached to each tweet. This means that while we were able to monitor the national Twitter discussion about the campaign effectively, we could not be as precise for individual states, because only a fraction of tweets (2 to 5 percent) carry location data in the US case.
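The national/state split can be sketched as follows, assuming each tweet has already been reduced to a dict with an optional `"state"` key derived from its geolocation metadata. The key name and the helper are hypothetical; the raw Twitter metadata is structured differently. The returned coverage is the fraction of tweets carrying any usable location.

```python
from collections import defaultdict


def split_by_state(tweets):
    """Group tweets by a hypothetical 'state' field; also return geolocation coverage."""
    by_state = defaultdict(list)
    located = 0
    for t in tweets:
        state = t.get("state")  # None when the tweet has no usable location
        if state:
            by_state[state].append(t)
            located += 1
    coverage = located / len(tweets) if tweets else 0.0
    return dict(by_state), coverage


tweets = [{"text": "..."} for _ in range(97)] + [
    {"text": "a", "state": "PA"},
    {"text": "b", "state": "PA"},
    {"text": "c", "state": "MI"},
]
by_state, coverage = split_by_state(tweets)
print({s: len(ts) for s, ts in by_state.items()}, coverage)  # {'PA': 2, 'MI': 1} 0.03
```

All 100 tweets can feed the national estimate, but only 3 reach the state-level estimates, which is the precision problem described above.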

The method we employed: a three-step process. Second: econometric calibration. From the 19th of September until the 2nd of October, we ran an econometric model explaining the national survey data in terms of our sentiment analysis estimates. In particular, we considered only the negative sentiment against Donald Trump and the online voting intentions expressed in support of Hillary Clinton. Together, these two variables explain 97% of the variance in the poll average taken from RealClearPolitics.
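A minimal sketch of the calibration step: ordinary least squares of the national poll average on the two social media indicators (negative sentiment toward Trump, online vote intention for Clinton), plus R². The authors' exact econometric specification is not given here, so plain OLS is an assumption, and the five daily observations are made up and constructed to fit exactly (real data would give R² below 1). Only the standard library is used, to keep it self-contained.

```python
def ols(X, y):
    """OLS with an intercept, solving the normal equations (X'X) b = X'y."""
    rows = [[1.0, *x] for x in X]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [row + [v] for row, v in zip(xtx, xty)]  # augmented matrix [X'X | X'y]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(beta, *x):
    """Calibrated poll estimate from the social media indicators."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def r_squared(X, y, beta):
    preds = [predict(beta, *x) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical daily data: (negative sentiment vs. Trump %, online intention for Clinton %)
X = [(30, 40), (32, 41), (35, 39), (33, 45), (31, 43)]
y = [37.0, 38.3, 39.2, 40.0, 38.4]  # hypothetical Clinton poll average
beta = ols(X, y)
print([round(b, 3) for b in beta])     # [10.0, 0.5, 0.3]
print(round(predict(beta, 34, 42), 2))  # 39.6
```

Once fitted, `predict` can be applied to new days' social media indicators without further polls, which is the logic of the third step.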

The method we employed: a three-step process. Third: from the 3rd of October onward, we relied on social media data only (i.e., the two previous variables, weighted according to the calibration just illustrated) to produce our final electoral forecast. Final results? Quite good at the national scale: +1.2% for Hillary (actual result: +2.1%).

How (not) to analyze Social Media Data

The method we employed. But what about the outcome of the election?

Big Data vs. the rest

The method we employed: Ohio, Florida, Nevada, and Colorado were basically not competitive (contrary to what state surveys suggested). Even in mid-October, the first two consistently leaned toward Trump and the latter two toward Clinton. The Pennsylvania race was much closer than expected, with predictions moving back and forth, favoring one candidate and then the other (Trump was ahead between the 3rd and 6th of November; in the end, Clinton was at 51.5%). The same holds for Michigan (Clinton: 50.5%). The Trump rise in the other Midwestern states (Wisconsin or Iowa) could not have been predicted via Twitter.

The method we employed: the inaccurate results were probably affected by the limits of the geolocated data and by the fact that, instead of calibrating the social media results with state-specific surveys, we relied on national data. This was not a deliberate choice: we did so because state polls were not available every day, while national ones were.

Conclusion: Public opinion has profoundly changed, and the way to measure it must change as well. There is no longer the option of ignoring social media information or, even worse, dismissing it as mere noise. But beware of the challenges!

Conclusion: To err is human; to really mess things up requires a computer.

Conclusion: For example, behind an ML algorithm there is no explanatory model!

Conclusion: Induction is not so much wrong as impossible. Without a theoretical understanding of the world, how would we even know what to describe? This remains true in a BIG DATA world too!

Conclusion: Telescope! Big data has the power to transform and expand the universe of answerable social science questions, but we need new questions now!

Two final sentences: Big Data are simply today's data to (better) understand our world. More data is better than less.