Clinton vs. Trump 2016: Analyzing and Visualizing Tweets and Sentiments of Hillary Clinton and Donald Trump

Siddharth Grover, Oklahoma State University, Stillwater

ABSTRACT

The 2016 United States presidential campaign has seen an unprecedented amount of media coverage, numerous presidential candidates, and acrimonious debates over wide-ranging topics among candidates of both the Republican and the Democratic parties. Twitter is a dominant social medium through which people understand, express, relate to, and support the policies proposed by their favorite political leaders. In this paper, we analyzed the sentiment of the tweets posted by Hillary Clinton and Donald Trump on their Twitter feeds. We also analyzed the most frequent policy-related keywords used by these candidates, along with the Twitter handles they mentioned most frequently in their tweets. This paper demonstrates the application of SAS Text Miner and SAS Sentiment Analysis Studio to perform text mining and sentiment analysis on tweets collected during the election fever. We found that Donald Trump was more concerned with media coverage: he frequently tweeted at and mentioned media handles, and he used the social space to talk negatively about other presidential nominees. Trump also fell short on positive sentiment, and his tweets had an overall negative sentiment. Hillary Clinton, on the other hand, used the same space to discuss some of her policies and current events. Although Clinton did not show Trump's ability to drive the mainstream media narrative on Twitter, her tweets generated an overall positive sentiment.

INTRODUCTION

The 2016 US presidential election has been the nation's biggest media fest for the past several months, and media coverage is poised to increase as Election Day nears. In this election year, Twitter has emerged as yet another battleground.
How this platform is used, how the candidates project themselves, and how they are perceived by the online community will ultimately influence voters on Election Day. This paper follows the Twitter journeys of the two presidential candidates by analyzing their views and sentiments through their tweets. We performed text mining and sentiment analysis to focus the comparison between the two candidates on the following key points:

- Most used phrases on Twitter
- Twitter handles each candidate most often tweeted at
- Policy keywords mentioned on Twitter
- Sentiment of Clinton's and Trump's tweets

DATA PREPARATION

We extracted about 200,000 tweets from Twitter's live streaming API using mytwitterscraper, an open-source, real-time Twitter scraper written in Java. The analysis covered April 2016 to June 2016. We concentrated on the @realdonaldtrump and @hillaryclinton Twitter handles and on trending hashtags such as #trump2016 and #clinton2016. We also collected the number of followers, re-tweets, and favorited tweets for both candidates.

Figure 1. Data Preparation: web scraping with mytwitterscraper, yielding Donald Trump tweets (@realdonaldtrump, #trump2016) and Hillary Clinton tweets (@hillaryclinton, #clinton2016)
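As an illustration of the data preparation step, the minimal Python sketch below splits collected tweets into the two candidate corpora by the handles and hashtags named above. The `Tweet` record and the `route` function are hypothetical: the paper does not describe mytwitterscraper's output format, and this is not that tool's API.

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    """Hypothetical minimal tweet record (author screen name and text)."""
    user: str
    text: str

# Handles and hashtags named in the paper, lower-cased for case-insensitive matching.
TRACKED = {
    "trump": {"@realdonaldtrump", "#trump2016"},
    "clinton": {"@hillaryclinton", "#clinton2016"},
}

def route(tweets):
    """Assign each tweet to every candidate corpus whose handle or hashtag it
    contains, or whose official account posted it."""
    corpora = {name: [] for name in TRACKED}
    for tw in tweets:
        # Lower-case tokens and strip trailing punctuation such as "#clinton2016."
        tokens = {tok.lower().rstrip(".,!?:;") for tok in tw.text.split()}
        tokens.add("@" + tw.user.lower())  # a tweet by the account itself also counts
        for name, keys in TRACKED.items():
            if tokens & keys:
                corpora[name].append(tw)
    return corpora
```

A tweet that mentions both candidates lands in both corpora, which mirrors the overlap the concept links later reveal (for example, trump2016 appearing among #hillary2016 tweets).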

METHODOLOGY

We implemented the text mining process flow shown in Figure 2.

Figure 2. Text Mining Process Flow

TEXT IMPORT

The text of around 200,000 tweets was extracted and saved as text files using the Text Import node of SAS Enterprise Miner. This node converts different types of files into text files and saves them in a destination folder specified by the user. Many of the tweets were redundant re-tweets; these were removed so that the remaining tweets covered a wider variety of topics.

TEXT PARSING

Text mining began by parsing the data to find tokens (terms), part-of-speech tags, entities, and so on. We ignored parts of speech such as prepositions, determiners, and auxiliary verbs, along with numeric values and punctuation, because these carry very little information. The term-by-document frequency matrix generated by this node was helpful for understanding how often each term occurs in the text and how many documents contain it.

Figure 3. SAS Text Parsing Node property settings and Terms output

TEXT FILTERING

We used the Text Filter node, with Inverse Document Frequency as the term weight property, to reduce the number of terms by eliminating those with the lowest frequencies across documents. An English dictionary was used to identify and correct spelling errors, and this spell-check option was helpful in removing redundant and incorrect terms. The Filter Viewer displays all the tweets containing a specific term and lets the user drill down further by creating concept links based on those terms.
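The import, parsing, and filtering steps above were done with SAS Enterprise Miner nodes. The Python sketch below only approximates them, under stated assumptions: re-tweets are assumed to start with "RT ", a small hand-written stop list stands in for SAS's part-of-speech filter, and IDF uses the common log2(N / df) form, which may differ from SAS Text Miner's exact weighting. All function names are illustrative.

```python
import math
import re
from collections import Counter

# Illustrative stop list standing in for the part-of-speech filter (prepositions,
# determiners, auxiliary verbs); SAS Text Miner uses real POS tagging instead.
STOP = {"the", "a", "an", "of", "to", "in", "on", "for", "at", "with",
        "and", "is", "are", "was", "be", "will", "can"}
# A token must start with a letter, so bare numbers and punctuation are dropped;
# a leading "#" or "@" is kept so hashtags and handles survive as terms.
TOKEN = re.compile(r"[#@]?[a-z][a-z0-9_]*")

def dedupe(texts):
    """Drop re-tweets (assumed to start with "RT ") and exact duplicates, keeping order."""
    seen, kept = set(), []
    for text in texts:
        key = text.strip().lower()
        if key.startswith("rt ") or key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept

def parse(text):
    """Lower-case, tokenize, and remove stop words."""
    return [tok for tok in TOKEN.findall(text.lower()) if tok not in STOP]

def term_doc_matrix(docs):
    """Term-by-document frequency matrix as {term: Counter({doc_index: count})}."""
    matrix = {}
    for i, doc in enumerate(docs):
        for term, count in Counter(parse(doc)).items():
            matrix.setdefault(term, Counter())[i] = count
    return matrix

def idf_weights(matrix, n_docs, min_docs=2):
    """Drop terms found in fewer than min_docs documents and weight the rest by
    log2(N / df), a common IDF form (SAS's exact formula may differ)."""
    return {
        term: math.log2(n_docs / len(postings))
        for term, postings in matrix.items()
        if len(postings) >= min_docs
    }
```

Under IDF weighting, a term that appears in every document gets weight 0, so very common terms contribute little while rarer, more distinctive terms are emphasized.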

Output 1. Text Filter node spell-check output

Output 2. Most frequent terms from the Text Filter node

CONCEPT LINKS

One of the useful functions of SAS Enterprise Miner is the ability to create concept links in the Interactive Filter Viewer of the Text Filter node. Concept links help visualize the association between terms that co-occur in the documents. The width of a line signifies the strength of the association between two terms; a thick line depicts a strong association. We created four concept links to understand the associations between words in our data set based on frequency. Two are built around the official Twitter accounts of the candidates, to reflect what each candidate talks about, and two are built around what Twitter users say about each candidate.
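To make the idea of link strength concrete, the sketch below ranks the terms linked to a center term by the number of documents in which both appear. This is a simplified proxy chosen for illustration; SAS computes concept-link strength with its own association statistic, which this code does not reproduce. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(docs_terms):
    """Count, for every unordered term pair, how many documents contain both terms."""
    pairs = Counter()
    for terms in docs_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def concept_links(center, docs_terms, top=5):
    """Return the terms most often co-occurring with `center`, strongest first.

    The count plays the role of line width in a concept-link diagram:
    a higher count would be drawn as a thicker line.
    """
    links = Counter()
    for (a, b), n in cooccurrence(docs_terms).items():
        if a == center:
            links[b] = n
        elif b == center:
            links[a] = n
    return links.most_common(top)
```

For example, with parsed tweets such as `["hillaryclinton", "supporter", "love"]` and `["hillaryclinton", "love"]`, the term love would receive the thickest link to hillaryclinton.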

Output 3. Official Hillary Clinton Twitter handle: @hillaryclinton

This concept link shows the association between terms in the tweets Hillary Clinton posted. There is a strong relationship among the terms status, hillaryclinton, supporter, and love, which might imply that she tweets about her love for her supporters. Terms like campaign, money, and voter might imply that she asks voters to vote or to donate money to her campaign.

Output 4. Concept link for #hillary2016

This concept link shows the association between terms in the tweets containing the hashtag #hillary2016. There is a strong relationship between the terms iamwithher and hillary, which might be because Hillary supporters mention the hashtags #iamwithher and #hillary2016 together in the same tweet. Some people tweeting with #hillary2016 might also have been Donald Trump or Bernie Sanders supporters, since terms like trump2016 and bernieorbust also appear.

Output 5. Official Donald Trump Twitter handle: @realdonaldtrump

This concept link shows the terms mentioned in the tweets posted by Donald Trump. There is a strong relationship among the terms foxnews, cnn, trump2016, and hillaryclinton, which might be the most mentioned terms in Trump's tweets. This might imply that Trump frequently tweets at media handles; he also uses terms like american, gun, and potus.

Output 6. Concept link for #trump2016

This concept link shows the association between terms in the tweets containing the hashtag #trump2016. There is a strong association among the terms gun, realdonaldtrump, neverhillary, and hillaryclinton. This might be due to Trump supporters expressing their support for Donald Trump with terms such as neverhillary and hillaryclinton. People tweeting with #trump2016 also mentioned terms like america, want, and cnn.

ANALYSIS OF MOST RECENT TWEETS

We further analyzed the 3,000 most recent original tweets from Hillary Clinton and Donald Trump.

Most Frequent Words

Figure 3. Most frequent words based on the recent 3,000 tweets

Most Mentioned Twitter Handles

Trump                   Clinton
@CNN            16%     @POTUS            36%
@FoxNews        14%     @billclinton      20%
@foxandfriends   8%     @realdonaldtrump  16%
@nytimes         7%     @BernieSanders    10%
@JebBush         7%     @HFA               4%

Figure 4. Most mentioned handles in the Twitter timeline
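The percentages in Figure 4 can be produced by counting the @handles in each candidate's tweets. The sketch below assumes each percentage is a handle's share of all mentions in that candidate's tweets; the paper does not state the denominator, so this is an assumption, and `mention_shares` is a hypothetical helper.

```python
import re
from collections import Counter

# "@" followed by word characters; Twitter handles use letters, digits, and "_".
HANDLE = re.compile(r"@(\w+)")

def mention_shares(tweets, top=5):
    """Top `top` mentioned handles, each with its share of all mentions in percent.

    Handles are lower-cased so @CNN and @cnn are counted together.
    """
    counts = Counter(h.lower() for text in tweets for h in HANDLE.findall(text))
    total = sum(counts.values())
    if total == 0:
        return []
    return [(f"@{h}", round(100 * n / total)) for h, n in counts.most_common(top)]
```

Because only the top handles are reported, the listed shares need not sum to 100%, which is also true of both columns in Figure 4.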

Top Policy-Focused Keywords

Trump         Count     Clinton       Count
Terror          133     Guns            250
Immigration      78     Health          150
Jobs             78     Taxes            83
Taxes            53     Immigration      80
Guns             13     Education        53
Education        13     Foreign          53
Health            8     Vets             28

Figure 5. Most frequently mentioned policy keywords

SENTIMENT ANALYSIS

We performed sentiment analysis on tweets posted from the official Twitter handles of both candidates (@realdonaldtrump and @hillaryclinton). From the data collected for text mining, we extracted two random samples of 5,000 tweets each as modeling data sets, and two additional random samples of 2,000 tweets each to test the results. We used the most recent tweets for data exploration and to identify initial trends before moving on to sentiment analytics.

We first built a basic statistical model to find the overall sentiment of the tweets. The statistical model is built from the training tweets, with term frequencies contributing to the weights of the terms, and the validation data is used to fine-tune the model for increased accuracy. The statistical model can predict the overall sentiment of a tweet but not sentiment at a more granular level; feature-level sentiment prediction can only be accomplished with rule-based models.

The rule-based model is more flexible and sophisticated than the statistical model, because it allows the analyst to write custom rules in addition to the rules learned from the statistical model. To predict the sentiment of the tweets with the rule-based model, we divided the tweets into two bins, positive and negative. For this, we took a random sample of around 2,000 tweets from the entire data set and categorized each one as either positive or negative, keeping for modeling only those tweets that were undisputedly coded as positive or negative. Finally, all of the models were scored against the test data to see how well they predict the overall sentiment expressed by each candidate.
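The rule-based approach can be illustrated with a tiny lexicon-and-negation scorer. This is only a sketch of the general technique: the word lists are hypothetical and not the paper's rules, SAS Sentiment Analysis Studio uses its own rule language, and returning "neutral" when no rule fires is an assumption (the paper reports a neutral share but does not say how it was assigned).

```python
import re

# Hypothetical rule set; the rules actually used in the paper are not published.
POSITIVE = {"great", "love", "proud", "win", "thank"}
NEGATIVE = {"bad", "failing", "crooked", "dishonest", "sad"}
NEGATORS = {"not", "never", "no"}
WORD = re.compile(r"[a-z']+")

def rule_score(text):
    """Add +1 per positive and -1 per negative match, flipping the polarity of
    the word that immediately follows a negator."""
    score, flip = 0, False
    for word in WORD.findall(text.lower()):
        if word in NEGATORS:
            flip = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if flip else polarity
        flip = False  # a negator only affects the next word
    return score

def classify(text):
    """Map the rule score to positive or negative; no net match gives neutral."""
    s = rule_score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"
```

Custom rules such as the negation flip are what the statistical model cannot express, which is one reason a rule-based model can be more precise on short, sarcastic, or negated tweets.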
MODEL TESTING

The statistical model has an overall precision of 84% for Trump and 83% for Clinton, whereas the rule-based model has a precision of 92% for Trump and 90% for Clinton on their respective test data sets. We ran different statistical models on the modeling data set, using Smoothed Relative Frequency with Chi-Square as the criterion to build the model. The resulting statistical model was then applied to the test data set to measure its accuracy in predicting overall sentiment. We used a total of 2,000 tweets from the candidates to test the accuracy of the statistical model.
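The paper does not describe how SAS implements "Smoothed Relative Frequency with Chi-Square". As a generic stand-in, the sketch below selects terms by their 2x2 chi-square association with the positive label and then scores tweets with add-one-smoothed relative term frequencies per class (in effect a multinomial naive Bayes with equal priors). Tweets are assumed to be already parsed into token lists; all names are illustrative.

```python
import math
from collections import Counter

def chi_square(term, docs, labels):
    """2x2 chi-square statistic between term presence and the positive label."""
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == "positive")
    b = sum(1 for d, y in zip(docs, labels) if term in d and y == "negative")
    c = labels.count("positive") - a  # positive docs without the term
    d = labels.count("negative") - b  # negative docs without the term
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def train(docs, labels, k=10):
    """Keep the k highest-chi-square terms and estimate add-one-smoothed
    relative frequencies of those terms within each class."""
    vocab = {t for d in docs for t in d}
    keep = sorted(vocab, key=lambda t: chi_square(t, docs, labels), reverse=True)[:k]
    keep_set = set(keep)
    model = {}
    for cls in ("positive", "negative"):
        counts = Counter(t for d, y in zip(docs, labels) if y == cls for t in d if t in keep_set)
        total = sum(counts.values())
        model[cls] = {t: (counts[t] + 1) / (total + len(keep)) for t in keep}
    return model

def predict(model, doc):
    """Pick the class with the larger summed log relative frequency.

    Equal class priors are assumed; unknown terms are ignored, and a tie
    (e.g. no known terms) goes to the first class, "positive".
    """
    scores = {cls: sum(math.log(p[t]) for t in doc if t in p) for cls, p in model.items()}
    return max(scores, key=scores.get)
```

Smoothing keeps a term that never appeared in one class from driving that class's score to negative infinity, which matters for short, sparse texts like tweets.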

Output 7. Model Testing Results (precision)

              Statistical Model                Rule-Based Model
         Negative  Positive  Overall     Negative  Positive  Overall
Trump       83%       85%      84%          94%       90%      92%
Clinton     79%       87%      83%          85%       95%      90%

Results: The rule-based model built for Trump shows 94% precision for positive sentiment and 90% precision for negative sentiment. The model built for Clinton shows 85% precision for positive sentiment and 95% precision for negative sentiment. Overall precision is 90% or higher for both rule-based models.

SENTIMENT ANALYSIS: TRUMP

The overall sentiment associated with Trump is negative. The sentiment distribution is 47% negative, 27% positive, and 25% neutral.

Output 8. Trump Sentiment Analysis Model Output

SENTIMENT ANALYSIS: CLINTON

The overall sentiment associated with Clinton is positive. The sentiment distribution is 37% positive, 35% neutral, and 28% negative.

Output 9. Clinton Sentiment Analysis Model Output
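The per-class figures in Output 7 are precisions: of the test tweets a model labeled with a given class, the share that truly belong to it. The sketch below computes them along with a sentiment distribution like those in Outputs 8 and 9. The paper does not define its "Overall" column; treating it as overall accuracy is an assumption made here for illustration.

```python
from collections import Counter

def precision_by_class(y_true, y_pred):
    """Precision per predicted class: correct predictions of c / all predictions of c."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in predicted.items()}

def overall_accuracy(y_true, y_pred):
    """Share of all test tweets classified correctly (one possible reading of "Overall")."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def distribution(labels):
    """Percentage of tweets in each sentiment class, rounded to whole percent."""
    counts = Counter(labels)
    return {c: round(100 * n / len(labels)) for c, n in counts.items()}
```

Because each class is rounded separately, a distribution can sum to 99% or 101%, as the Trump distribution above (47% + 27% + 25%) does.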

CONCLUSION

In a world of real-time information, Twitter plays an important role in disseminating news and opinions. To capture the varying political views of the leading presidential candidates, we performed text mining and sentiment analysis on Hillary Clinton's and Donald Trump's tweets from April 2016 to June 2016. We created concept links to understand the associations between terms in the tweets posted by both candidates and in the public tweets about them. We found that Donald Trump frequently tweeted at and mentioned media handles, and that his tweets had an overall negative sentiment. Hillary Clinton, on the other hand, used more policy-focused keywords and frequently tweeted at the political establishment. Clinton fell short on engagement, as measured by re-tweets, but generated more overall positive sentiment.

ACKNOWLEDGMENTS

We thank the WUSS 2016 conference committee for giving us the opportunity to present our work. We also thank Dr. Goutam Chakraborty for his continuous support and guidance.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Siddharth Grover
Oklahoma State University
Phone: 407-744-6809
Email: sid.grover@okstate.edu

Siddharth Grover is a Master's student in Business Analytics at Oklahoma State University. He has an MBA in Marketing from Xavier University, India, and works as a graduate teaching assistant at Oklahoma State University.
He interned with BMO Financial Group as an AML Model Management Intern during summer 2016. Earlier, he worked as a Media Planner for a year at GroupM, India. He has a year's experience using SAS tools for predictive modeling and data mining. He is a Base SAS 9 Certified Professional, SAS Certified Advanced Programmer, and SAS Certified Statistical Business Analyst.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.