Issues in Information Systems, Volume 18, Issue 2, pp. 80-86, 2017


IDENTIFYING TRENDING SENTIMENTS IN THE 2016 U.S. PRESIDENTIAL ELECTION: A CASE STUDY OF TWITTER ANALYTICS

Sri Hari Deep Kolagani, MBA Student, California State University, Chico, skolagani@mail.csuchico.edu
Arash Negahban, Ph.D., California State University, Chico, anegahban@csuchico.edu
Christine Witt, Ph.D., California State University, Chico, cwitt3@csuchico.edu

ABSTRACT

Social media provides a platform for people to share information, exchange thoughts, and discuss their views about various topics. Sentiment analysis techniques analyze the sentiments people express on social media. This study contributes to the emerging research on sentiment analysis of social media content related to a specific event. The goal of this research is to analyze public sentiments associated with the candidates in the United States Presidential Election of 2016. The authors collected more than 200,000 tweets via hashtag for the two major presidential candidates, customized the dictionary based on the political context of the study, and analyzed the tweets in terms of positive and negative polarity as well as eight types of sentiment (anger, anticipation, disgust, fear, joy, sadness, surprise, trust). The results show significant differences between the candidates in joy, fear, surprise, disgust, and trust, while the differences in the remaining sentiments were not significant. We also tested the difference in polarity between the candidates and found a significant difference in positive sentiments, while the difference in negative sentiments was not significant.

Keywords: Sentiment Analysis, Sentiments, Social Media, and Twitter

INTRODUCTION

In the past decade, a vast amount of data on public opinions has been collected and analyzed. Although more data on public opinions is accessible, extracting relevant information from it has proven to be difficult.
Sentiment analysis provides an overview of favorable and unfavorable opinions on various topics and subject matter. Sentiment analysis is sometimes referred to as opinion mining, and it assists researchers in analyzing opinions. Bing (2010) contends that sentiment analysis has tremendous value for real-time applications of data collection and analysis. Sentiment analysis provides an edge for analyzing opinions on important events such as political movements. It can also provide organizations with information on their competition, marketing, public relations, and risk management (Wang, Wei, Liu, Zhou, & Zhang, 2011; Ravi & Ravi, 2015). However, the interpretation of opinions can be debatable because determining the emotional tone or conjecture of text has proven to be difficult.

Sentiment analysis involves identifying sentiment expressions, their polarity and strength, and their relationship to the subject. Sentiments are classified into categories such as positive or negative, or onto an n-point scale where n represents the number of sentiment categories (Prabowo & Thelwall, 2009). Sentiment analysis lays the path to the computational study of people's opinions, appraisals, attitudes, and emotions. These opinions can be evaluated toward entities, individuals, issues, events, and topics. Bing and Zhang (2012) found sentiment analysis to be a useful technique despite being technically challenging. A specific challenge lies in developing a deep understanding of syntactical and semantic language rules. It can often be difficult to determine the explicit and implicit, regular and irregular rules needed for effective opinion and sentiment mining (Cambria, Schuller, Xia, & Havasi, 2013).

Social media has become a substitute for offline media, providing a medium for people to participate in political discussions and share political views. Opinions are shared on social media in many forms, including textual posts, news, images, emoticons, GIFs, and videos (Hu & Huan, 2012). Twitter is a popular social media platform known for massively spreading instant messages called tweets. Twitter is a microblogging system that allows users to publish tweets of up to 140 characters in length. In the first quarter of 2017, Twitter averaged 328 million monthly active users. Twitter has become a political platform where opinions are presented and exchanged (Agarwal, Xie, Vovsha, Rambow, & Passonneau, 2011; Jiang, Yu, Zhou, Liu, & Zhao, 2011; Cui, Zhang, Liu, & Ma, 2011). Therefore, Twitter provides real-time access to globally expressed political opinions and sentiments about the 2016 presidential election. The researchers examined the sentiments of tweets that used certain hashtags identifying the presidential nominees Senator Hillary Clinton and Donald Trump.

Twitter as a Medium to Measure Sentiments in Elections

Researchers have studied the effects of social media on issues in the world's political landscape. In 2012, Sounman & Nadler completed one of the first empirical studies of social media's potential impact on the U.S. election. Their study examined the 2012 presidential candidates' salience by using the number of mentions of the candidates' names on Twitter during the election. Interestingly, the authors found that "while social media does substantially expand the possible modes and methods of election campaigning, high levels of social media activity on the part of presidential candidates have, as of yet, resulted in minimal effects on the amount of public attention they receive online" (p. 455). However, other studies of Twitter's impact have contradicted the findings of Sounman & Nadler (2012).

Several researchers have used Twitter in the context of various elections, including a geolocation-based analysis of the Indian elections (Omaima et al., 2015) and the prediction of electoral results in the multi-party environment of the 2015 United Kingdom elections (Burnap et al., 2016). Previous research has yielded mixed results on the correlation between tweets and vote share (Bennett, 2016; Jansen & Koop, 2005). Other researchers have investigated the usefulness of parts of speech for determining sentiments in the context of microblogging and found that parts-of-speech and emoticons may not be useful for microblogs such as Twitter (Efthymios et al., 2011).

RESEARCH MODEL

Figure 1. Implemented Research Model for Text Mining, Data Analysis and Visualization

Data collection from Twitter was initiated using the Twitter Application Programming Interface (API), which requires an API key, API secret key, consumer key, and consumer secret key. The collection was run from R and SAP HANA Studio. The data from Twitter was requested using popular hashtags for each candidate. The data collection was completed daily from April 24, 2016 to November 28, 2016. The data contained more than 200,000 tweets, including the date of creation and the Tweet ID.

Figure 1 depicts the process we used in our text mining and sentiment analysis. First, we collected the tweets with the hashtags associated with each of the candidates. Then, we cleansed and sorted the data into tables of a columnar database. Finally, we exported the tables as CSV files into R and ran the sentiment analysis. The hashtags we used for each of the candidates are shown in Table 1.

Table 1. Hashtags Used By Candidate

Candidate Name     Hashtags Used
Donald Trump       #Trump, #DonaldTrump, #Trump2016, #DonaldTrumpforPresident
Hillary Clinton    #Hillary2016, #HillaryClinton

Dictionaries were used in this algorithm-based sentiment analysis approach to achieve consistency and accuracy. In the case of the 2016 presidential election, there were many positive terms (such as "great" or "stronger") in the candidates' campaign slogans. For this reason, the authors used a context-based custom dictionary built on the Stanford CoreNLP and Hu and Liu (KDD-2004) dictionaries. The dictionaries used to analyze the tweets were customized to exclude specific words related to the campaigns or slogans, such as the words "trump" and "great". We also added emoticons to the dictionaries to capture the sentiments that Twitter users expressed via emoticons. The twitteR, tm, syuzhet, ggplot2, sentiment, and stringr packages were used in R to collect, process, and present the data. Once the sentiments were obtained, Tableau was used to visualize the results.
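The first two steps of the process described above (grouping tweets by candidate hashtag, then cleansing them) can be sketched as follows. This is a minimal Python illustration, not the authors' R and SAP HANA workflow: the hashtag sets come from Table 1, while the function names, the regular expressions, and the specific cleansing rule (dropping URLs and collapsing whitespace) are assumptions made for the example.

```python
import re

# Hashtags from Table 1, lowercased so matching is case-insensitive.
CANDIDATE_HASHTAGS = {
    "Donald Trump": {"#trump", "#donaldtrump", "#trump2016",
                     "#donaldtrumpforpresident"},
    "Hillary Clinton": {"#hillary2016", "#hillaryclinton"},
}

HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def candidates_for(tweet_text):
    """Return the candidates whose Table 1 hashtags appear in the tweet."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(tweet_text)}
    return {name for name, wanted in CANDIDATE_HASHTAGS.items() if tags & wanted}


def clean(tweet_text):
    """Hypothetical cleansing step: drop URLs and collapse whitespace."""
    without_urls = URL_RE.sub("", tweet_text)
    return " ".join(without_urls.split())


def group_by_candidate(tweets):
    """Sort cleaned tweets into one list per candidate."""
    grouped = {name: [] for name in CANDIDATE_HASHTAGS}
    for text in tweets:
        for name in candidates_for(text):
            grouped[name].append(clean(text))
    return grouped
```

A tweet that carries hashtags for both candidates is placed in both groups in this sketch; the paper does not state how such tweets were handled.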
RESULTS

There was a significant difference in the sentiments for the candidates after the customized context-based dictionary was applied. The overall outcome, however, did not change significantly, although the customized dictionary improved accuracy. The results indicate a shift towards the negative axis. Each word within a tweet is given a sentiment score drawn from the dictionary of positive, negative, and neutral words described above. The value on the sentiment scale is the sum of the sentiment scores given to the words within a tweet. In our analysis, we studied the overall sentiments using the aggregated sentiment scores of all the tweets in our dataset. Table 2 shows, for each candidate, the number of tweets at each point on a sentiment scale of -9 to +9, where -9 is the most negative and +9 is the most positive.
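The scoring scheme described above (per-word dictionary scores summed into one value per tweet, then tallied across the dataset) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' R code: the toy lexicon, its scores, the emoticon entries, and all function names are hypothetical; only the excluded slogan words ("great", "stronger", "trump") are taken from the text.

```python
import re
from collections import Counter

# Toy lexicon for illustration only; words and scores are made up.
BASE_LEXICON = {"great": 1, "stronger": 1, "trump": 1, "trust": 1,
                "bad": -1, "fear": -1, "liar": -1}
SLOGAN_WORDS = {"great", "stronger", "trump"}  # examples named in the text
EMOTICONS = {":)": 1, ":(": -1}                # hypothetical emoticon scores

# Match the two emoticons first, then plain lowercase words.
TOKEN_RE = re.compile(r":\)|:\(|[a-z']+")


def customize(lexicon, excluded, emoticons):
    """Drop campaign-specific words and add emoticon entries."""
    custom = {w: s for w, s in lexicon.items() if w not in excluded}
    custom.update(emoticons)
    return custom


def score_tweet(text, lexicon):
    """Sum the scores of all tokens; unknown tokens count as 0 (neutral)."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


def score_distribution(tweets, lexicon):
    """Count how many tweets fall at each point of the sentiment scale."""
    return Counter(score_tweet(t, lexicon) for t in tweets)


custom = customize(BASE_LEXICON, SLOGAN_WORDS, EMOTICONS)
print(score_tweet("Trump is great", BASE_LEXICON))  # slogan words inflate the score
print(score_tweet("Trump is great", custom))        # neutral after customization
```

Comparing the distribution produced with the base lexicon against the one produced with the customized lexicon corresponds to the before and after columns of Table 2.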

Table 2. Sentiment analysis of tweets for each candidate before and after dictionary customization

Sentiment    Donald Trump          Hillary Clinton
Scale        Before     After      Before     After
-9           1          1          1          1
-8           3          3          3          3
-7           10         1          10         10
-6           28         22         30         28
-5           108        87         118        108
-4           434        367        461        434
-3           1457       1432       1578       1457
-2           4687       5527       5008       4687
-1           14034      15100      14936      14034
0            27264      29172      28603      27264
1            12860      10852      11364      12860
2            3575       2380       2646       3575
3            834        476        539        834
4            172        66         107        172
5            26         11         14         26
6            5          1          1          5
7            1          0          0          1
8            0          1          0          0
9            0          0          0          0

Figure 2. Emotional Analysis of Donald Trump's Tweets

Figure 3. Emotional Analysis of Sen. Clinton's Tweets

Figures 2 and 3 illustrate the emotional analysis of tweets related to each candidate. To validate the obtained sentiment analysis results, an Analysis of Variance (ANOVA) was performed using IBM SPSS. The results of the ANOVA suggest a significant difference in terms of disgust, fear, joy, surprise, trust, and positive sentiments. Social media, especially in this election, played a crucial role in exposing people's emotions. While both Democrats and Republicans had an equal share of fear and disgust towards the opposing candidates, it was interesting to see people expressing joy, surprise, and trust. Ill humor, trolls, memes, biased fake news and polls, and many other factors may explain why these emotions show significance. None of the remaining differences among the emotions were significant.

Table 3. ANOVA of extracted emotions

Emotion        Source            Sum of Squares   df        Mean Square   F           Sig.
anger          Between Groups    .438             1         .438          1.149       .284
               Within Groups     49969.319        130995    .381
               Total             49969.757        130996
anticipation   Between Groups    .351             1         .351          .974        .324
               Within Groups     47149.076        130995    .360
               Total             47149.427        130996
disgust        Between Groups    43.432           1         43.432        151.601     .000
               Within Groups     37528.444        130995    .286
               Total             37571.875        130996
fear           Between Groups    6.740            1         6.740         18.441      .000
               Within Groups     47879.066        130995    .366
               Total             47885.806        130996
joy            Between Groups    23.226           1         23.226        76.946      .000
               Within Groups     39540.522        130995    .302
               Total             39563.748        130996
sadness        Between Groups    .301             1         .301          .886        .347
               Within Groups     44571.395        130995    .340
               Total             44571.696        130996
surprise       Between Groups    17212.335        1         17212.335     54694.364   .000
               Within Groups     41224.170        130995    .315
               Total             58436.505        130996
trust          Between Groups    55.342           1         55.342        108.836     .000
               Within Groups     66609.396        130995    .508
               Total             66664.738        130996
negative       Between Groups    .180             1         .180          .241        .624
               Within Groups     98164.329        130995    .749
               Total             98164.510        130996
positive       Between Groups    10.962           1         10.962        14.978      .000
               Within Groups     95868.115        130995    .732
               Total             95879.077        130996

These sentiment analyses were compared to the Electoral College and popular vote results of the United States presidential election in 2016. The Twitter sentiment results favored Senator Hillary Clinton in terms of positive sentiments. It is possible that Twitter users' sentiments correlate with how they plan to vote. A strongly positive tweet for a candidate may ultimately result in a vote for that candidate. The results may also indicate that more strongly positive tweets for Clinton could correlate with the popular vote outcome. In the 2016 election, Senator Clinton won the popular vote by almost 2.9 million votes.
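The quantities reported in Table 3 (sum of squares, df, mean square, and F for each emotion) follow from the standard one-way ANOVA arithmetic, which can be sketched as below. The authors ran the test in IBM SPSS; this Python function is only an illustration, its name is hypothetical, and the sample scores are made up. It does not compute the Sig. column, which requires the F distribution (for example via `scipy.stats.f_oneway`).

```python
def one_way_anova(groups):
    """Compute one-way ANOVA sums of squares, df, mean squares, and F."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups: spread of each observation around its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Made-up per-tweet scores for one emotion, one list per candidate.
trump_scores = [1, 2, 3]
clinton_scores = [2, 3, 4]
print(one_way_anova([trump_scores, clinton_scores]))
```

With two candidates the between-groups df is 1, as in every row of Table 3, and the within-groups df equals the number of tweets minus 2.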
The strong positive sentiments for Senator Clinton could have been influenced by the debates, controversies, interviews, and other significant events.

LIMITATIONS

Contextualizing the sentiment of tweets is challenging due to the limited contextual information available in a 140-character tweet. In addition, the tweets in this study are only a small sample of the total tweets sent during the timeframe of this study. The sample had additional limitations because the Twitter API was used to collect the tweets. The maximum number of tweets allowed to be collected per day is 140,000.

Limited search query terms were used to generate the sample tweets. The use of a limited set of hashtags to query the data could have had a considerable effect on the quantity of tweets available for both candidates. This may have resulted in the loss of tweets containing positive or negative sentiments related to both major political party candidates. Demographics and geographical mapping of the sentiments in the tweets were not considered in this research.

FUTURE RESEARCH

Data collected from one social media platform may limit how far the results generalize. Future studies should extend this research to other social media. The tweets were collected without considering factors like demographics or electoral geography. Future studies should examine geographical patterns of contextual information. In addition, further customization of the context-based dictionaries could improve the accuracy of the results. Thus, the area of sentiment analysis offers ample opportunities for future research in terms of techniques, data collection, and dictionary customization. To account for the discrepancy between the popular vote and the electoral vote in the 2016 election, researchers could assign categories to determine correlations that more accurately reflect the electoral vote.

CONCLUSIONS

Our results indicate that people use social media platforms such as Twitter to express their positive or negative sentiments. Moreover, these sentiments may extend to the general population's opinion on the 2016 presidential election events, debates, and controversies. Few studies have been conducted on the efficacy of social media sentiment analysis and the outcomes of major political elections.

REFERENCES

Agarwal, A., Xie, B., Vovsha, I., Rambow, O., & Passonneau, R. (2011). Sentiment analysis of twitter data. In Proceedings of the workshop on languages in social media (pp. 30-38). Association for Computational Linguistics.

Bennett, S. (2016). Predicting elections with twitter: What 140 Characters Reveal about Political Sentiment. Retrieved from https://pdfs.semanticscholar.org/2888/d46d7ccfd844d0855dd90155b96ea93540a1.pdf

Bing, L. (2010). Sentiment analysis and subjectivity. Handbook of Natural Language Processing (2nd ed.). Chapman and Hall.

Bing, L., & Zhang, L. (2012). A survey of opinion mining and sentiment analysis. Mining text data, 415-463.

Borondo, J., Morales, A. J., Losada, J. C., & Benito, R. M. (2012). Characterizing and modeling an electoral campaign in the context of Twitter: 2011 Spanish Presidential election as a case study. Chaos, 22(2), 023138-023138-7. doi:10.1063/1.4729139

Burnap, P., Gibson, R., Sloan, L., Southern, R., & Williams, M. (2016). 140 characters to victory?: Using Twitter to predict the UK 2015 General Election. Electoral Studies, 41, 230-233. doi:10.1016/j.electstud.2015.11.017

Cambria, E., Schuller, B., Xia, Y., & Havasi, C. (2013). New avenues in opinion mining and sentiment analysis. IEEE Intelligent Systems, 28(2), 15-21.

Cui, A., Zhang, M., Liu, Y., & Ma, S. (2011). Emotion tokens: Bridging the gap among multilingual twitter sentiment analysis. Asia Information Retrieval Symposium (pp. 238-249). Springer Berlin Heidelberg.

Efthymios, K., Wilson, T., & Moore, J. D. (2011). Twitter sentiment analysis: The good the bad and the omg! Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media. Barcelona, Spain.

Hu, X., & Huan, L. (2012). Text analytics in social media. In C. C. Aggarwal & C. Zhai (Eds.), Mining text data (pp. 385-414). New York: Springer.

IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp.

Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent twitter sentiment classification. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 151-160). Association for Computational Linguistics.

Omaima, A., Parack, S., & Chavan, B. (2015). Application of location-based sentiment analysis using Twitter for identifying trends towards Indian general elections 2014. Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication. ACM.

Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.

Prabowo, R., & Thelwall, M. (2009). Sentiment analysis: A combined approach. Journal of Informetrics, 3(2), 143-157.

Ravi, K., & Ravi, V. (2015). A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowledge-Based Systems, 89, 14-46. doi:10.1016/j.knosys.2015.06.015

Sounman, H., & Nadler, D. (2012). Which candidates do the public discuss online in an election campaign?: The use of social media by 2012 presidential candidates and its impact on candidate salience. Government Information Quarterly, 29(4), 455-461. doi:10.1016/j.giq.2012.06.004

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.r-project.org/

Wang, X., Wei, F., Liu, X., Zhou, M., & Zhang, M. (2011). Proceedings of the 20th ACM International conference on Information and knowledge management. Glasgow, Scotland.