Media coverage in times of political crisis: a text mining approach

Enric Junqué de Fortuny (Faculty of Applied Economics), Tom De Smedt (Faculty of Arts), David Martens (Faculty of Applied Economics), Walter Daelemans (Faculty of Arts)

December 23, 2011

Abstract

At the end of 2011 Belgium formed a government, after a world-record period of 541 days of negotiations. We have gathered and analyzed all online news articles of Flemish newspapers. The results of our text mining analyses show interesting differences between media coverage and votes for some political parties and politicians. With opinion mining, we are able to automatically detect the sentiment of each article, which allows us to visualize how the tone of reporting evolved throughout the year, at the party, politician and even newspaper level. Since all analyses are based on text mining algorithms, a very objective overview of the manner of reporting is provided.

1 Introduction

Belgium has seen a unique governmental crisis during which both political parties and political figures have had wide media coverage. The media has played an important role in the courses taken by parties (for better or for worse). In this study, we analyze the attention and sentiments of popular media sources with respect to political parties and politicians. We argue that news sources are sentiment-rich resources and extract the sentiments using a technique called sentiment analysis. This gives us an unbiased view of the general tone towards politicians and the political crisis.

2 Methodology

2.1 Data acquisition

Our corpus comprises all articles published on the websites of popular Flemish newspapers during a 10-month period (from January 1, 2011 to October 31, 2011). An overview of all covered newspapers is displayed in Table 1. Some newspapers (e.g., metro.be) were left out because they did not have an accessible online version. All articles were gathered using a custom-built web crawler.
The crawler extracted articles from the sources' websites using their built-in search functionality. The keywords of interest are all Flemish party names and the leading figures of political parties (see Table 2). The criterion for being a party of interest is based on the votes for that party in the 2010 Chamber elections, normalized over the Flemish parties. A leading figure is a politician ranked in the top ten by number of preference votes in the 2010 Senate elections.

2.2 Data processing

A number of filtering steps are performed to clean up and whiten the raw data. First, the data is filtered to remove possible duplicate articles. Next, each article $a$ is converted to a bag-of-words representation $\{w_1, w_2, \ldots, w_n\}$ to allow computational processing. As a last preprocessing step, all stop-words are removed.

E. Junqué de Fortuny and T. De Smedt contributed equally to this work.
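The three preprocessing steps of Section 2.2 can be sketched as follows. This is a minimal illustration and not the authors' code: the stop-word list, the tokenizer, and the rule that two articles are duplicates when their whitespace-normalized text is identical are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not say which list was used.
STOP_WORDS = {"de", "het", "een", "en", "van", "in", "is"}

def deduplicate(articles):
    """Keep the first occurrence of each article, comparing whitespace-normalized text."""
    seen = set()
    unique = []
    for text in articles:
        key = " ".join(text.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

def bag_of_words(text):
    """Lower-case the text, split it into word tokens, drop stop-words, count the rest."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(w for w in tokens if w not in STOP_WORDS)

raw = [
    "De regering is gevormd.",
    "De  regering is gevormd.",  # same article up to whitespace
    "Het debat in de Kamer duurt voort.",
]
corpus = [bag_of_words(a) for a in deduplicate(raw)]
print(corpus)
```

A `Counter` keeps word multiplicities, which is what distinguishes a bag of words from a plain set of words.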

| News source | Regional (R) / global (G) | # Articles | # Readers [2] | d_H | Deviation |
| De Redactie | G | 5,767 | 146,250 | 2 | 16.16% |
| De Morgen | G | 11,303 | 256,800 | 2 | 21.53% |
| GVA | R | 9,154 | 395,700 | 4 | 27.30% |
| Het Belang Van Limburg | R | 3,511 | 423,700 | 2 | 37.43% |
| Nieuwsblad | G | 7,320 | 1,002,200 | 4 | 20.24% |
| De Standaard | G | 9,154 | 314,000 | 3 | 23.32% |
| De Tijd | G | 10,061 | 123,300 | 0 | 22.71% |
| Het Laatste Nieuws | G | 11,380 | 1,125,600 | 0 | 21.62% |

Table 1: News sources used for the analysis.

| Party | Political figures |
| CD&V | Marianne Thyssen, Rik Torfs, Etienne Schouppe, Sabine de Bethune, Peter Van Rompuy |
| N-VA | Bart De Wever, Helga Stevens |
| Open VLD | Alexander De Croo, Dirk Sterckx |
| SP.A | Johan Vande Lanotte, Frank Vandenbroucke, Marleen Temmerman |
| Vlaams Belang | Filip Dewinter |

Table 2: Keywords used to build up the corpus.

2.3 Sentiment analysis

For sentiment analysis, we used the Pattern web mining module for Python [1]. The module contains a subjectivity lexicon of 3,000+ Dutch adjectives that occur frequently in product reviews, with scores for polarity (positive or negative, between +1.0 and -1.0) and subjectivity (objective or subjective, between 0.0 and 1.0). For example: boeiend (fascinating) = +0.9 and belabberd (lousy) = -0.6. The lexicon has been evaluated using 2,000 Dutch book reviews, with a precision of 0.72 and a recall of 0.82 (De Smedt & Daelemans, submitted).

In each newspaper article, we look for occurrences of a Flemish political party. We then calculate the polarity of each adjective that occurs in a window of 2 sentences before and 2 sentences after. An article can mention several party names, or switch tone. The given interval ensures a more reliable correlation between the political party being mentioned and the adjective's polarity score than measuring all adjectives in the article. We furthermore exclude adjectives that score between -0.1 and +0.1 to reduce noise. This results in a set of 366,613 assessments, where one assessment corresponds to an adjective score linked to a party name.
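The windowing procedure of Section 2.3 can be illustrated with a small sketch. It does not call the Pattern module. Instead it uses a toy lexicon holding the two example adjectives from the text plus one made-up low-polarity entry ("neutraal"), a naive sentence splitter, and a lookup of every token rather than of part-of-speech-tagged adjectives. All function names are hypothetical.

```python
import re

# Toy lexicon: "boeiend" and "belabberd" are the examples quoted in the text;
# "neutraal" and its score are invented to demonstrate the noise threshold.
LEXICON = {"boeiend": 0.9, "belabberd": -0.6, "neutraal": 0.05}

def split_sentences(text):
    """Naive splitter on sentence-final punctuation (an assumption for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def assessments(text, party, window=2, threshold=0.1):
    """Polarity scores of lexicon words within `window` sentences of each party mention.

    Scores with absolute value not above `threshold` are dropped as noise.
    A word near two separate mentions is counted once per mention.
    """
    sentences = split_sentences(text)
    scores = []
    for i, sentence in enumerate(sentences):
        if party.lower() not in sentence.lower():
            continue
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        for nearby in sentences[lo:hi]:
            for word in re.findall(r"\w+", nearby.lower()):
                score = LEXICON.get(word)
                if score is not None and abs(score) > threshold:
                    scores.append(score)
    return scores

text = (
    "Het debat was boeiend. De N-VA reageerde snel. Het resultaat was belabberd. "
    "Daarna bleef alles neutraal. Er volgde een pauze. Iemand noemde het weer boeiend."
)
print(assessments(text, "N-VA"))
```

In this example the party is mentioned in the second sentence, so the window covers the first four sentences. The last "boeiend" falls outside the window, and "neutraal" is inside it but is filtered by the threshold.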
3 Results

3.1 Media coverage deviation

The coverage $c(e, s)$ of an entity $e$ by a newspaper $s$ is defined as the number of news articles published by the newspaper on that entity, normalized by the total number of articles $A_s$ of that newspaper in the corpus. The popularity $p(e)$ of a political party $e$ is defined as the relative number of preference votes $v(e)$ for that entity (as compared to other entities in the top ranking set $E$). The deviation of a media source is the difference between coverage and popularity. That is:

$$c(e, s) = \frac{\#\{a \mid a \in A_s \wedge e \in a\}}{\#A_s} \tag{1}$$

$$p(e) = \frac{v(e)}{\sum_{e' \in E} v(e')} \tag{2}$$

$$\mathrm{dev}(e, s) = c(e, s) - p(e) \tag{3}$$

$$\mathrm{dev}(s) = \sum_{e \in E} \left|\mathrm{dev}(e, s)\right| \tag{4}$$

[1] http://www.clips.ua.ac.be/pages/pattern
[2] Counted by the number of readers of the printed version, except for De Redactie, which does not exist in a printed format. Instead, we used the number of unique visitors per day in 2009 as an estimate. Source: belga/odbs
[3] Source: Federal Public Services Home Affairs (http://polling2010.belgium.be/)
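The coverage, popularity and deviation measures defined in Section 3.1 can be sketched in a few lines. The function names, the toy articles and the vote counts below are invented for illustration. The total per newspaper is computed here as a sum of absolute deviations, which is an assumption about how the per-newspaper total is formed (signed deviations would largely cancel out).

```python
def coverage(articles, entity):
    """c(e, s): share of one newspaper's articles (sets of words) that mention the entity."""
    return sum(1 for a in articles if entity in a) / len(articles)

def popularity(votes):
    """p(e): votes for each entity divided by the total votes over the set E."""
    total = sum(votes.values())
    return {e: v / total for e, v in votes.items()}

def deviations(articles, votes):
    """dev(e, s) = c(e, s) - p(e) for every entity e in E."""
    p = popularity(votes)
    return {e: coverage(articles, e) - p[e] for e in votes}

def total_deviation(articles, votes):
    """dev(s): sum of absolute per-entity deviations (assumed reading of the total)."""
    return sum(abs(d) for d in deviations(articles, votes).values())

# Made-up corpus of one newspaper: each article is a set of words.
articles = [
    {"n-va", "regering"},
    {"cd&v", "n-va"},
    {"sp.a", "begroting"},
    {"voetbal"},
]
votes = {"n-va": 50, "cd&v": 30, "sp.a": 20}  # made-up vote counts

dev = deviations(articles, votes)
print({e: round(d, 2) for e, d in dev.items()})
print(round(total_deviation(articles, votes), 2))
```

A positive value means the entity receives more coverage than its vote share would suggest, and a negative value means it receives less.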

In Equation 1, $a$ is an article, represented as a bag of words $\{w_1, w_2, \ldots, w_{n_a}\}$, with $n_a$ the number of words in the article. High values of the deviation can indicate a discrepancy between popularity in the media and popularity in votes.

| Party | Votes [3] | Coverage | Deviation |
| N-VA | 28.72% | 25.01% | -3.71% |
| CD&V | 17.91% | 24.16% | 6.25% |
| SP.A | 15.25% | 18.51% | 3.26% |
| VLD | 14.26% | 14.82% | 0.56% |
| VB | 12.81% | 6.07% | -6.74% |
| LDD | 7.23% | 7.79% | 0.56% |
| Groen! | 3.81% | 3.64% | -0.17% |

Figure 1: Discrepancy between media coverage and popularity for popular parties.

Figure 1 shows that for some parties a significant deviation is found, with a maximal positive deviation towards CD&V and a maximal negative deviation towards Vlaams Belang (VB, Flemish Interest). This is in accordance with the fact that CD&V ran the interim government while the new government formations took place. We repeat the analysis for politicians, using the relative number of preference votes for a party in 2010 as a comparison measure. As can be seen in Figure 2, the deviation with respect to a politician varies irrespective of the party from which the politician comes. For instance, the positive deviation towards Bart De Wever is not reflected in the (negative) deviation of his party (N-VA). This suggests either an underlying personality cult or under-coverage of all other party members, depending on the direction of causality. It is also interesting to note the differences between news sources. To this end we define a matrix, ranking all political parties by coverage per newspaper (Figure 3(a)).
The major tendencies are similar to our previous analysis, but some local differences do exist. We use the Hamming distance (Equation 5) to measure the amount of ranking difference for each newspaper, compared to the average ranking (see Table 1). As the Hamming distance increases, disagreement with the consensus ranking increases. A maximal Hamming distance of 4 is found for the regional newspaper GVA and for Nieuwsblad. When we look at the total deviation of newspapers (Equation 4), we see the same pattern emerge (regional newspapers deviate more than global ones).

$$d_H(v, w) = \sum_{i=1}^{\#E} \mu(v_i, w_i) \tag{5}$$

$$\mu(a, b) = \begin{cases} 0 & a = b \\ 1 & a \neq b \end{cases} \tag{6}$$

As we stated earlier, it follows from Figure 1 that the deviation is not uniform across parties (i.e., it is not equal to zero for every party). A more fine-grained analysis (Figure 3(b)) shows that general tendencies propagate to the local level (e.g., Vlaams Belang is under-represented in all newspapers). Interestingly though, significant local differences exist as well. For instance, the regional newspaper Het Belang Van Limburg has a large negative deviation towards N-VA.

| Political figure | Votes [3] | Coverage | Deviation |
| Bart De Wever | 34.02% | 42.39% | 8.37% |
| Marianne Thyssen | 13.63% | 6.04% | -7.59% |
| Alexander De Croo | 12.48% | 12.67% | 0.20% |
| Filip Dewinter | 8.40% | 6.21% | -2.20% |
| Johan Vande Lanotte | 7.79% | 13.53% | 5.75% |
| Frank Vandenbroucke | 7.27% | 4.89% | -2.38% |
| Rik Torfs | 6.09% | 2.83% | -3.26% |
| Etienne Schouppe | 2.96% | 4.91% | 1.95% |
| Helga Stevens | 2.89% | 0.54% | -2.34% |

Figure 2: Discrepancy between media coverage and popularity for popular politicians.

Figure 3: Comparison of media coverage of different parties by different newspapers: (a) ranking, (b) deviation.

3.2 Sentiments

3.2.1 Sentiments by party

For a given political party, we take the distribution of positive vs. negative assessments (i.e., adjective polarity scores) as an indicator of the party's overall sentiment in the media. Figure 4 shows the distribution for each party. Overall, 30-40% of newspaper coverage is assessed as negative. The highest negative scores are measured for the far-right Vlaams Belang (which is quarantined by the other parties): -30.4%, and for the N-VA: -28.8%. In 2010, the Dutch-speaking, right-wing N-VA emerged both as newcomer and as largest party of the Belgian federal elections. The second largest party was the French-speaking, left-wing PS. While the N-VA ultimately seeks secession of Flanders from Belgium, the PS is inclined towards state interventionism.
During the following year the two parties were unable to form a government coalition. This has sparked media controversy, a possible explanation for the score.
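The per-party distribution of positive and negative assessments described in Section 3.2.1 amounts to a simple count. The sketch below assumes assessments are available as (party, polarity) pairs with near-zero scores already filtered out. The function name and the sample data are invented for illustration.

```python
from collections import defaultdict

def sentiment_distribution(assessments):
    """Map each party to (share of positive, share of negative) assessments."""
    counts = defaultdict(lambda: [0, 0])  # party -> [positive count, negative count]
    for party, score in assessments:
        counts[party][0 if score > 0 else 1] += 1
    return {
        party: (pos / (pos + neg), neg / (pos + neg))
        for party, (pos, neg) in counts.items()
    }

# Made-up assessments: (party, adjective polarity score).
sample = [
    ("N-VA", 0.9),
    ("N-VA", -0.6),
    ("N-VA", 0.4),
    ("CD&V", 0.3),
]
print(sentiment_distribution(sample))
```

Because scores between -0.1 and +0.1 are excluded beforehand, every remaining assessment is either clearly positive or clearly negative, so the two shares sum to one.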

Figure 4: Sentiment for each political party, with the percentage of positive news items on the left and negative ones on the right.

3.2.2 Evolution of sentiments in time

For a given political party, we group assessments in subsets of one week. We then calculate the simple moving average (SMA) across all weeks to smooth fluctuations within individual parties and emphasize differences across parties. Figure 5(a) shows the SMA of each political party across all newspapers. It is interesting to note the peak for all parties (except Vlaams Belang) around July-August. During this time, the negotiating parties (negotiating for a government coalition since 2010) were on a three-week leave. Once negotiations resumed around August 15th, the peak drops. Figure 5(b) shows the SMA of each newspaper across political parties. The curves with the highest fluctuation are those for Het Belang Van Limburg and De Redactie. For these newspapers we measure a standard deviation on the SMA of 0.08 and 0.07 respectively, where other newspapers are in the 0.03-0.05 range. Het Belang Van Limburg also has the highest average sentiment: +0.15 against +0.13 to +0.14 for all other newspapers. De Standaard appears to deliver the most neutral political articles.

4 Conclusions

We have analysed Flemish newspapers quantitatively during a period of political crisis. We have shown that there exists a deviation from popularity to media coverage for both political parties and their influential figures. The sentiment analysis results further provided a graphical overview of the tone of reporting throughout this period, where interesting changes are observed at key moments in the negotiations.
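The weekly grouping and smoothing behind the curves of Figure 5 (Section 3.2.2) can be sketched as below. The paper does not state the SMA window length or how weeks are delimited, so the trailing window of 2 and the use of ISO calendar weeks are assumptions, and the dated scores are invented.

```python
from collections import defaultdict
from datetime import date

def weekly_means(dated_scores):
    """Average polarity per ISO (year, week), returned in chronological order."""
    buckets = defaultdict(list)
    for day, score in dated_scores:
        buckets[tuple(day.isocalendar()[:2])].append(score)
    return [sum(v) / len(v) for _, v in sorted(buckets.items())]

def simple_moving_average(values, window=2):
    """Trailing SMA: mean of the last `window` values (fewer at the start of the series)."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up assessments for one party: (publication date, polarity score).
scores = [
    (date(2011, 7, 4), 0.9),
    (date(2011, 7, 6), -0.6),
    (date(2011, 7, 11), 0.4),
    (date(2011, 7, 18), -0.2),
]
weekly = weekly_means(scores)
print([round(x, 3) for x in simple_moving_average(weekly)])
```

A longer window yields smoother curves at the cost of reacting more slowly to events such as the resumption of negotiations.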

Figure 5: Sentiment of news items during 2011, for each party (a), and for each newspaper (b).