Text Mining Analysis of State of the Union Addresses: With a focus on Republicans and Democrats between 1961 and 2014

Similar documents
US History B. Syllabus. Course Overview. Course Goals. General Skills

Pen Argyl Area High School. Modern American History

US History B. Course Overview. Course Goals. General Skills. Syllabus

AMERICAN GOVERNMENT POWER & PURPOSE

Contemporary United States

Submission of the President s Budget in Transition Years

Debates and the Race for the White House Script

5.1d- Presidential Roles

Domestic Policy: Nixon through G.W. Bush. In what ways were 20 th century presidents impacted by economic and personal challenges?

Lesson Plan for United States Presidents and their Wars Timeline

I Can Statements. Chapter 19: World War II Begins. Chapter 20: America and World War II. American History Part B. America and the World

REPUBLICANS VS. DEMOCRATS:

Cold War Part III. STANDARD VUS.13c THE PRESIDENCY OF RICHARD NIXON DECREASED PUBLIC TRUST IN THE PRESIDENCY.

Marietta City Schools Pacing Guide. Month / Week CCS Benchmarks Skills/Activities Resources Assessment

Solutions. Algebra II Journal. Module 3: Standard Deviation. Making Deviation Standard

25% Tests, Finals and long term projects 25% Homework 25% Class Participation/Classwork

Post-War United States

Allegheny-Clarion Valley School District

APAH Reading Guide Chapter 31. Directions: Read pages and answer the following questions using many details and examples from the text.

Period 9 Essential TEKS Texas Essential Knowledge and Skills Correlation to APUSH Unit 9 (Period 9 of College Board Framework)

We ve looked at presidents as individuals - Now,

The Constitution of the United States of America

Presidential Project

A More Perfect Union The Three Branches of the Federal Government

CHAPTER 8 - POLITICAL PARTIES

Rural America Competitive Bush Problems and Economic Stress Put Rural America in play in 2008

College, Career & Civic Life (C3) Frameworks for Social Studies State Standards

CRS Report for Congress

Macroeconomics and Presidential Elections

LSP In-Class Activity 5 Working with PASW 20 points Due by Saturday, Oct. 17 th 11:59 pm ANSWERS

THE UNITED STATES IN THE MODERN WORLD

How do presidential candidates use television?

The US Economy: Are Republicans or Democrats Better?

Franklin D. Roosevelt To George W. Bush (Education Of The Presidents) READ ONLINE

Know how Mao Zedong and the Communists win the Communist Civil War and took over China from Chang Kai Shek?

U.S. Court System. The U.S. Supreme Court Building in Washington D. C. Diagram of the U.S. Court System

Americans fear the financial crisis has far-reaching effects for the whole nation and are more pessimistic about the economy than ever.

American Government. Chapter 11. The Presidency


DIOCESE OF HARRISBURG SOCIAL STUDIES CURRICULUM GRADE 7/8 United States History: Westward Expansion to Present Day

By Vanessa Van Edwards. Science of People

working paper Spending UNder President George W. BUSh No March 2009 (corrected) by Veronique de Rugy

DPI 613 Polling in the Real World: Using Survey Research To Win Elections and Govern

2000-Present. Challenges of the 21 st century, THIS IS A TRADITIONAL ASSIGNMENT. PRINT AND COMPLETE IN INK.

2. A bitter battle between Theodore Roosevelt and his successor, William H. Taft, led to.

Reading Essentials and Study Guide

Study Guide for Modern America Final Exam

Rise and Fall of a President

Your web browser (Safari 7) is out of date. For more security, comfort and the best experience on this site: Update your browser Ignore

Harry Truman Dwight Eisenhower John F. Kennedy

JFK, Reagan, Clinton most popular recent ex-presidents

HI 283: The Twentieth Century American Presidency Boston University, Fall 2013 Wednesday 6-9 pm., CAS 229

SS7 CIVICS, CH. 8.1 THE GROWTH OF AMERICAN PARTIES FALL 2016 PP. PROJECT

United Nations. Marshall Plan. Israel. Mao Zedong. South Korea

Modern Presidents: President Nixon

Attack Politics Negativity in Presidential Campaigns since 1960 by Emmett H. Buell, Jr. and Lee Sigelman

Party Polarization: A Longitudinal Analysis of the Gender Gap in Candidate Preference

Analyse the reasons why slavery in the Americas was supported by different social and economic groups. 99

PLANNED COURSE 10th Grade Social Studies Wilkes-Barre Area School District

Review for U.S. History test tomorrow

The Obama/Romney Amendments

About the Survey. Rating and Ranking the Presidents

Should safety outweigh freedom?

U.S. HISTORY Mr. Walter

Stock Market Indicators: S&P 500 Presidential Cycles

TEKS 8C: Calculate percent composition and empirical and molecular formulas. Postwar Rebuilding and Growth

Guiding Question. Section 3 How did the process of choosing a President change over time?

Reagan s Ratings: Better in Retrospect

Georgia Studies. Unit 7: Modern Georgia and Civil Rights. Lesson 3: Georgia in Recent History. Study Presentation

Domestic Crises

Chapter 5: Political Parties Section 1

Willmar Public Schools Curriculum Mapping 7-12

Demographic Characteristics of U.S. Presidents

Young Voters in the 2010 Elections

AP GOVERNMENT CH. 13 READ pp

US History : Politics, Society, Culture and Religion. GCSE History. Revision Notes

The 2014 Legislative Elections

HISTORY 9769/03 Paper 3 US History Outlines, c May/June 2014

Research Skills. 2010, 2003 Copyright by Remedia Publications, Inc. All Rights Reserved. Printed in the U.S.A.

Objectives: CLASSROOM IDEAS: Research human rights violations since World War II and the United Nations response to them.

Chapter 12. The President. The historical development of the office of the President

Iowa Voting Series, Paper 4: An Examination of Iowa Turnout Statistics Since 2000 by Party and Age Group

Pacing Guide: Amory High School

even mix of Democrats and Republicans, Florida is often referred to as a swing state. A swing state is a

Content Connector. USH.2.4.a.1: Explain how the lives of American Indians changed with the development of the West.


Sul Ross State University Rio Grande College Political Science 3308 The Presidency (Web) Spring Semester 2017

Test-Taking Strategies and Practice

Political Circumstances and President Obama s Use of Statements of Administration Policy and. Signing Statements. Margaret Scarsdale

Western Europe: New Unity. After the end of World War II, most of Western Europe recovered economically and the region became more unified.

CHAPTER 29 & 30. Mr. Muller - APUSH

FEDERAL GOVERNMENT GOVT President & Domestic Policy October 11, Dr. Michael Sullivan. MoWe 5:30 6:50 MoWe 7 8:30

Copyrighted Material CHAPTER 1. Introduction

THE WORKMEN S CIRCLE SURVEY OF AMERICAN JEWS. Jews, Economic Justice & the Vote in Steven M. Cohen and Samuel Abrams

History, Evolution, and Practices of the President s State of the Union Address: Frequently Asked Questions

CLASSROOM Primary Documents

CHAPTER 17 NATIONAL SECURITY POLICYMAKING CHAPTER OUTLINE

Political Circumstances and President Obama s Use of Statements of Administration Policy and Signing Statements

Final Unit 3 Web Design President Project:

The White House and Press Timeline Compiled January 2017

Transcription:

Jonathan Tung
University of California, Riverside
Email: tung.jonathane@gmail.com

Abstract

Do the two major political parties, Republicans and Democrats, use certain terms more frequently than the opposing party? Do the different words used by the two parties reflect conflicting ideologies? Are there significant differences between the word usage patterns and text frequencies of the two parties? This paper examines raw text frequencies and the correlations between terms within documents to provide insight into the patterns hidden within the State of the Union addresses.

1 Introduction

The State of the Union address was not always a speech delivered to a joint session of Congress. While the United States Constitution requires the President to give Congress information on the State of the Union, nothing in it specifies that this must take the form of a speech. In fact, from the early 19th century, beginning with Thomas Jefferson, to the early 20th century, Presidents delivered the State of the Union as a written report. It was not until 1913 that Woodrow Wilson re-established the practice of delivering it as a speech to Congress. Nearly every President since has delivered at least one address in person. Today it is a major event, broadcast live on many networks and watched by millions across the nation.

The purpose of this report is to identify potential word usage patterns or trends in the transcripts of the State of the Union addresses. A second goal is to observe any differences in word usage between the two major political parties, Republicans and Democrats. I chose to focus on a more modern era, from 1961 to 2014, spanning the presidencies of John F. Kennedy through the incumbent, Barack Obama. The most compelling reason for selecting this period is that the 1960 debate between Kennedy and Nixon was the first televised presidential debate, ushering in an era of televised speeches. Another reason is that before the two World Wars, political ideology was dramatically different from what it is today: in the pre-World War era, Democrats were widely considered the more conservative party and Republicans the more liberal one, in stark contrast to today's political agendas.

It should be noted that word frequencies in texts tend to follow a power-law distribution: there are a small number of high-frequency words and a large number of low-frequency words.
This is especially true for a large collection of documents such as the State of the Union addresses, which span 225 years [1].

The breakdown of the presidencies in this time period is as follows:

Table 1. Breakdown of Presidencies from 1961-2014

| President | Term Start | Term End | Political Affiliation |
|---|---|---|---|
| John F. Kennedy | 1961 | 1963 | Democratic |
| Lyndon B. Johnson | 1963 | 1969 | Democratic |
| Richard Nixon | 1969 | 1974 | Republican |
| Gerald Ford | 1974 | 1977 | Republican |
| Jimmy Carter | 1977 | 1981 | Democratic |
| Ronald Reagan | 1981 | 1989 | Republican |
| George H. W. Bush | 1989 | 1993 | Republican |
| Bill Clinton | 1993 | 2001 | Democratic |
| George W. Bush | 2001 | 2009 | Republican |
| Barack Obama | 2009 | Present | Democratic |

As of this writing, Democrats have held the presidency for 25 years of this period and Republicans for 28. This is a fairly even distribution of presidential years, an unintentional but welcome result of starting the analysis with President Kennedy.
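As a quick check on the 25/28 split, the sketch below (in Python, not the author's R workflow) sums end year minus start year for each party using the rows of Table 1. It assumes Obama's term is counted through 2014, the end of the study period, and because it subtracts calendar years it ignores exact inauguration and resignation dates.

```python
from collections import defaultdict

# Rows of Table 1; Obama's term is counted through 2014 (assumption: end of study period).
PRESIDENCIES = [
    ("John F. Kennedy", 1961, 1963, "Democratic"),
    ("Lyndon B. Johnson", 1963, 1969, "Democratic"),
    ("Richard Nixon", 1969, 1974, "Republican"),
    ("Gerald Ford", 1974, 1977, "Republican"),
    ("Jimmy Carter", 1977, 1981, "Democratic"),
    ("Ronald Reagan", 1981, 1989, "Republican"),
    ("George H. W. Bush", 1989, 1993, "Republican"),
    ("Bill Clinton", 1993, 2001, "Democratic"),
    ("George W. Bush", 2001, 2009, "Republican"),
    ("Barack Obama", 2009, 2014, "Democratic"),
]


def years_by_party(presidencies):
    """Sum (end year - start year) per party; calendar-year approximation only."""
    totals = defaultdict(int)
    for _name, start, end, party in presidencies:
        totals[party] += end - start
    return dict(totals)


if __name__ == "__main__":
    print(years_by_party(PRESIDENCIES))
```

The calendar-year approximation happens to reproduce the 25 and 28 years quoted above, which together cover the 53 years from 1961 to 2014.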

2 Overview of all State of the Union Addresses (1789-2014)

To get a general understanding of word usage in the State of the Union addresses, I first analyzed all of the addresses together. This lays the foundation for what is to come. The time period of interest in this case is 1789 to 2014, during which there have been 44 presidencies held by 43 individuals, beginning with George Washington and ending with the incumbent, Barack Obama. While today's United States bears little resemblance to the United States at its inception, we can still get a glimpse of the issues and topics that the Presidents chose to focus on over this time frame. The breakdown of word usage can be seen in Figure 1.

Figure 1. Word cloud of words used in all State of the Union Addresses

From this breakdown, we can see that some of the most frequently used terms are government, states, congress, united, and people. In fact, these are the five most frequently used words across all of the State of the Union addresses. However, it would be unwise to focus solely on the most frequently used terms. Words like these appear in nearly every address, so they contribute little to our understanding of any particular document. It is more reasonable to select a set of substantively important words to focus on, regardless of their frequency. Ideally, these words will be pertinent to the issues of the respective time periods in which they were used.
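The counting behind a word cloud like Figure 1 can be sketched as follows. This is a Python approximation of a typical text-mining preprocessing pipeline (lower-casing, dropping punctuation and numbers, removing stopwords, then counting), not the author's actual R/tm code; the stopword list and the two sample sentences are invented for illustration. The `loglog_slope` helper is an added illustration of the power-law shape mentioned in the Introduction: it fits a least-squares line to log frequency against log rank, and Zipf's law predicts a slope of roughly -1 on large corpora (the toy input here is far too small to show that).

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption); real analyses use a full list.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "we", "our", "that", "is", "for", "will"}


def tokenize(text):
    """Lower-case, keep alphabetic runs only (drops punctuation and digits), remove stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_frequencies(speeches):
    """Total count of each remaining term across all speeches."""
    counts = Counter()
    for speech in speeches:
        counts.update(tokenize(speech))
    return counts


def loglog_slope(counts):
    """Least-squares slope of log(frequency) vs log(rank); needs at least two distinct ranks."""
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    speeches = [
        "The government of the United States and the people.",
        "Congress and the states will serve the people of the United States.",
    ]
    counts = term_frequencies(speeches)
    print(counts.most_common(5))
    print("log-log slope:", round(loglog_slope(counts), 2))
```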

The criterion I used to select these critical terms was whether they were listed on reputable sites about politics and political issues. I drew on http://www.ontheissues.org [2] and http://whitehouse.gov [3], both of which contain sections on the important issues we face today. While it may seem odd to use the issues of contemporary society when examining the entire collection of documents, the aim is merely to lay the foundation for the highlighted era (1961-2014) that we will examine later. Some of these critical terms are foreign policy, homeland security, war and peace, free trade, immigration, energy and oil, government reform, tax reform, education, health care, and civil rights. The full list consists of the issues listed on the two aforementioned sites. Multi-word phrases were broken down into their individual words (for example, health care became health and care). The frequencies of these critical terms can be seen in Figure 2; to reduce clutter and improve readability, only the 30 most frequently used critical terms are shown.

Figure 2. Plot of 30 Most Frequently Used Critical Terms and their Frequency

We can view the connections between the critical terms in Figure 3. The dendrogram in Figure 3 exhibits a tree-like structure of word clusters: words within the same immediate branch are more closely connected than words that are not.
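The term selection and the dendrogram idea can be illustrated with a small agglomerative clustering sketch in Python. The paper does not state which distance or linkage produced Figure 3, so this assumes each critical term is represented by its row of raw counts in a term-document matrix, with Euclidean distance and average linkage. The phrase list is a hypothetical subset of the issues above, and the three documents are invented.

```python
import math
import re
from itertools import combinations

# Hypothetical subset of issue phrases; phrases are split into single words.
CRITICAL_PHRASES = ["foreign policy", "free trade", "tax reform", "civil rights", "health care"]
CRITICAL_TERMS = sorted({word for phrase in CRITICAL_PHRASES for word in phrase.split()})


def term_document_matrix(docs, terms):
    """Map each term to its list of raw counts, one entry per document."""
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in docs]
    return {term: [tokens.count(term) for tokens in tokenized] for term in terms}


def top_terms(matrix, n):
    """Keep the n terms with the largest total count (the paper keeps 30)."""
    ranked = sorted(matrix, key=lambda t: sum(matrix[t]), reverse=True)[:n]
    return {t: matrix[t] for t in ranked}


def cluster_distance(matrix, a, b):
    """Average linkage: mean Euclidean distance over all cross-cluster term pairs."""
    pairs = [(x, y) for x in a for y in b]
    return sum(math.dist(matrix[x], matrix[y]) for x, y in pairs) / len(pairs)


def agglomerate(matrix):
    """Repeatedly merge the two closest clusters; return (left, right, height) merges."""
    clusters = [(term,) for term in matrix]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(matrix, *p))
        merges.append((a, b, cluster_distance(matrix, a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [a + b]
    return merges


if __name__ == "__main__":
    docs = [
        "Free trade and foreign policy shape our foreign trade.",
        "Tax reform and health care reform.",
        "Civil rights and health care for all.",
    ]
    matrix = top_terms(term_document_matrix(docs, CRITICAL_TERMS), 30)
    for left, right, height in agglomerate(matrix):
        print(left, "+", right, "at height", round(height, 2))
```

Each merge corresponds to a junction in the dendrogram, and its height is the distance at which the two branches join. Recomputing every pairwise cluster distance at each step is slow, but harmless for 30 terms; R's `hclust` and SciPy's `scipy.cluster.hierarchy.linkage` provide efficient implementations with a choice of linkage methods.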

Figure 3. Plot of the clusters between Critical Terms

From the chart, we observe several expected results, but also some rather surprising ones. The words foreign and trade are clustered together, which makes sense since they are often mentioned together as a phrase. Also, social and responsibility are clustered together but are less connected with tax. An unanticipated result is that civil and rights are not in the same immediate cluster. However, it is unwise to read too much into the clusters between these words, as the documents span 225 years. It is entirely possible for these terms to have been used in different contexts even as recently as 100 years ago. Again, the purpose of these preliminary analyses is to establish the groundwork for more detailed analysis later.

3 Overview of the Era from Kennedy to Obama (1961-2014)

We now have some understanding of what to expect from the text of the State of the Union addresses. It is now a good time to narrow our focus to the era of interest, beginning with Kennedy and ending with the incumbent, Obama. This era spans over 50 years, during which there were many historically significant events, among them the Cold War, the assassination of President Kennedy, the assassination of Dr. Martin Luther King Jr., the Moon landing, the Vietnam War, Watergate, Reaganomics, and 9/11. This is certainly not an exhaustive list; rather, it includes some of the most discussed topics of the period. Whether these events are specifically mentioned by name in the documents remains to be seen.

First, we can analyze the term frequencies of this time frame all together. A word cloud provides a broad overview of the most frequently used words during this time, as seen in Figure 4.

Figure 4. Word cloud of words used in State of the Union addresses between 1961 and 2014

Many of the same terms from Figure 1 appear to be used frequently during this time period. We can also look at the 30 most frequently used critical terms; their frequencies can be seen in Figure 5. There are some differences between the most frequently used critical terms over the entire collection of documents and those from the era of interest: three terms from Figure 2 do not appear in Figure 5, and three terms from Figure 5 do not appear in Figure 2. The terms principles, fiscal, and civil from Figure 2 are absent from Figure 5, while the terms values, technology, and oil from Figure 5 are absent from Figure 2. Some of these differences are easy to explain. We should not be surprised that oil was not among the most frequent terms over the entire span of these documents, because oil was not nearly as central to political discussion and daily life 100 years ago as it is today. Similarly, technology was not nearly as important a century or two ago as it is now.

Figure 5. Plot of 30 Most Frequently Used Critical Terms and their Frequency (1961-2014)

I find it interesting that the terms principles and values essentially swapped places. By contrast, civil and fiscal occurred only 65 and 79 times, respectively, during the highlighted period. Spread over 54 years, that is only slightly more than once per year (about 1.2 and 1.5 uses per year), so these terms were not used very often in this era.

Now that we have established the most frequently used critical terms, we can delve deeper into the text and examine whether there are differences in usage between the two main political parties, Republicans and Democrats. We say that a term belonged to whichever party used it more often. To visualize the usage of terms by each party between 1961 and 2014, we can use a comparison word cloud that distinguishes between the two parties, shown in Figure 6. Words that belonged to Republicans are shown in red, while words that belonged to Democrats are shown in blue. Words closer to the middle were more closely contested: both parties used them about equally, but each word is still displayed in the color of the party it belonged to.
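The rule for deciding which party a term belonged to can be sketched in Python as follows. It applies the rule exactly as stated (raw counts, larger count wins) and additionally reports each term's Democratic share D / (D + R), a quantity that could drive a blue-to-red blend like the one in Figure 7 (how that figure was actually colored is an assumption). The optional `normalize` flag, which divides by each party's total word count to offset the 28-versus-25-year imbalance, is not described in the paper, and the counts in the example are made up.

```python
from collections import Counter


def party_ownership(dem_counts, rep_counts, normalize=False):
    """Return {term: (owner, democratic_share)} using the "party that used it more" rule.

    democratic_share = D / (D + R): 1.0 means only Democrats used the term, 0.0 only Republicans.
    normalize=True divides by each party's total count first (an assumed, optional adjustment).
    """
    dem_total = sum(dem_counts.values()) or 1
    rep_total = sum(rep_counts.values()) or 1
    result = {}
    for term in set(dem_counts) | set(rep_counts):
        d, r = dem_counts.get(term, 0), rep_counts.get(term, 0)
        if normalize:
            d, r = d / dem_total, r / rep_total
        if d + r == 0:
            continue  # term never used by either party
        if d > r:
            owner = "Democratic"
        elif r > d:
            owner = "Republican"
        else:
            owner = "tie"
        result[term] = (owner, d / (d + r))
    return result


if __name__ == "__main__":
    # Invented counts for illustration only; not taken from the addresses.
    dem = Counter({"jobs": 40, "college": 25, "war": 10, "taxes": 12})
    rep = Counter({"jobs": 20, "terrorists": 30, "war": 35, "taxes": 30})
    for term, (owner, share) in sorted(party_ownership(dem, rep).items()):
        print(f"{term:12s} {owner:10s} democratic share = {share:.2f}")
```

A share near 0.5 marks a closely contested word, which is why such words end up near the middle of the comparison plots.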

Figure 6. Comparison Word Cloud of words used in State of the Union Addresses (1961-2014)

This word cloud shows some striking distinctions between the two political parties. One of the first things we notice is the term terrorists in red. On closer examination, the terms iraq, saddam, and war are also displayed in red. This suggests that during this time period the Republican party chose to mention, and subsequently focus its attention on, external issues, specifically those relating to military conflict. Among the Democrats, some of the terms that appear important are college, jobs, and education, which suggests that the Democratic party was more focused on internal issues, specifically those relating to the working class and the youth of the nation.

It should be noted that within this time period, the Persian Gulf War, 9/11, and the Iraq War all occurred under Republican presidents, so it makes sense that these were among the issues they focused on. However, since Democratic presidents also oversaw much of the Vietnam War (under Kennedy and Johnson) and the final years of the Iraq War (under Obama), it is surprising that terms closely related to military conflict appear in red. It is similarly unexpected that Democrats are more closely associated with issues such as businesses and companies. I had always considered Republicans to be more closely associated with businesses and companies, since I believed Republicans to be pro-business, being on the right end of the political spectrum, where the right is more conservative and the left is more liberal.

4 Further Analysis and Visualizations (1961-2014)

We can visualize which terms belonged to which party in another way, shown in Figure 7 [4]. In this figure, words more closely associated with Democrats are more blue than red, while words more closely associated with Republicans are more red than blue. More closely contested words sit near the middle of the plot and are roughly equal parts blue and red.

Figure 7. Word Usage in State of the Union (1961-2014): Republicans vs Democrats

At first glance, this plot is not easy to interpret, because many of these words were so closely contested. However, we can clearly see that the terms progress and community belonged to Democrats, while spending and taxes belonged to Republicans. This is in line with the common belief that Democrats are more progressive and that their stance on social issues is based on community responsibility [5]. Similarly, Republicans favor increased defense spending, a large portion of which is funded by taxes. It is not surprising that the attributes that define these political parties are reflected in the words they choose in the State of the Union addresses.

We can also visualize the frequency of the critical terms using a scatterplot with the frequency for Republicans on the x-axis and the frequency for Democrats on the y-axis, shown in Figure 8. This scatterplot confirms what we already know: government was the most frequently used term across the 54 speeches. Most of these terms appear to have been used quite evenly by the two parties; on visual inspection, there is no obvious outlier used far more by one party than the other. Still, some things should be noted. The terms education, jobs, health, and care were clearly used more often by Democrats than by Republicans, whereas it is not immediately apparent whether any terms were used noticeably more by Republicans. The terms favored by Democrats are in line with the Democratic party's beliefs and ideology, so it is no surprise that they emphasized these ideas in their speeches over the years.

Figure 8. Scatterplot of Critical Term Frequency in State of the Union (1961-2014): Republicans vs Democrats

Up to this point, we have only observed word frequencies. Analyzing text frequencies alone should not be considered adequate text analysis: while frequencies are undoubtedly useful, we must also be aware of their limitations and of the drawbacks of relying on them so heavily. It makes sense, then, to consider other methods of analyzing text data. We can also consider the topics mentioned and attempt to separate them. One method well suited to this is k-means clustering. The essence of k-means clustering is to partition the data into a specified number, k, of clusters, where each individual data point, or observation, is placed in the cluster with the nearest mean (centroid), that is, the cluster it is most similar to. In the resulting plot, the closer together a group of same-colored terms are, the more closely related they are to each other, and thus the more confident we can be about the topic they represent. The results of the k-means clustering are shown in Figure 9, with k arbitrarily chosen to be 6.

Figure 9. K-means Clustering for State of the Union Text Data (k = 6)

Within this plot, we can see some clearly distinct topics. For example, the teal cluster (towards the left side of the plot) is clearly about economics, public policy, and anything else that deals with the fiscal side of politics. Towards the bottom right of the plot, in orange, we can reasonably conclude that the topic is health care and the welfare of the citizens of the United States; in other words, this topic deals with the internal well-being of the nation. We can also notice terms of the same color that are not very similar in meaning. At the bottom of the plot, oil and immigration are the same color. These are not typically thought of together, but they share a color because they are more closely related to each other than to the other terms in the plot. It is important to be able to distinguish the different topics and ideas mentioned within the State of the Union addresses. Another way of visualizing how closely related the important terms are is a cluster dendrogram, shown in Figure 10. We can compare this dendrogram to the earlier one (Figure 3), but first we should take a close look at this dendrogram, which focuses on the period between 1961 and 2014.
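The k-means procedure described above can be sketched with Lloyd's algorithm, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The paper does not say how terms were represented as points or how Figure 9 was projected onto two dimensions; this Python sketch assumes each term is a vector of per-document counts and uses Euclidean distance. Centroids start at the first k points for reproducibility (random or k-means++ starts are more common), and the term vectors are invented.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: return a cluster label (0..k-1) for each point."""
    # Deterministic start for reproducibility: the first k points become the centroids.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels


if __name__ == "__main__":
    # Invented term vectors: counts of each term in three hypothetical speeches.
    vectors = {
        "tax": [5, 4, 0], "budget": [4, 5, 1], "spending": [5, 5, 0],
        "health": [0, 1, 6], "care": [1, 0, 5], "medicare": [0, 0, 4],
    }
    labels = kmeans(list(vectors.values()), k=2)
    for term, label in zip(vectors, labels):
        print(term, "-> cluster", label)
```

Lloyd's algorithm only finds a local optimum, so results can depend on the starting centroids; running several starts and keeping the one with the smallest within-cluster sum of squares is the usual remedy. The value of k is a free parameter, chosen arbitrarily as 6 for Figure 9.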

Figure 10. Plot of the Clusters between Critical Terms (1961-2014)

These results are not too surprising. Perhaps the most surprising result is that social and security are not grouped in the same immediate branch. Social Security was established in 1935, so it was well established throughout this period, which led me to expect the phrase to be mentioned particularly often. This does not appear to be the case. In fact, many of the groupings in this dendrogram are similar to those in Figure 9, the plot of the k-means clustering. This agreement between two different methods gives some reassurance about the validity of both figures. Most, if not all, of the critical terms we are interested in are closely connected to at least one other term, so even if a term or topic was mentioned very infrequently, we can look to closely related terms or topics for insight into how the two major political parties used it.

5 Limitations, Conclusions, and Further Studies

The entire collection of State of the Union addresses spans from 1789 to 2014. It is difficult to draw conclusions over such a long time frame, given the differences between the United States then, with only 13 states, and now, with 50 states and a booming population. However, if we shift our focus to the time frame from 1961 to 2014, a few patterns emerge. Immediately, we notice that Republicans seem to concentrate their addresses on external issues, such as terrorism and war, while Democrats appear to concentrate on internal issues, such as education and jobs. This passes the eye test, as these patterns are aligned with the party ideologies described on www.diffen.com [5]. Overall, it appears to me that Democrats are committed to the growth of the economy by promoting education and, subsequently, the ability to obtain better-paying jobs, while Republicans are committed to the prosperity of the nation by asserting military dominance over foreign countries.

The State of the Union addresses form a large collection of data that can be analyzed in a variety of ways. Based on my educational background and general knowledge, I analyzed the documents using the statistical software R, with the bulk of the computations carried out by the text mining (tm) package. I also used several other packages, including topicmodels, wordcloud, and cluster. They are all immensely useful, especially for analyzing large collections of documents like the State of the Union addresses.

It must be noted, however, that there are a number of limitations to the way I chose to approach the analysis. The bulk of my analysis revolves around raw text frequencies and similarities computed from them. Other approaches might look at trends in the length of speeches by counting the number of sentences in each speech, or assess the polarity (emotional content) of each speech, in particular mapping how polarity changes from the start to the finish of each speech. There are a multitude of things that can be done with this collection of documents, and I tackled only one approach.
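The two further approaches mentioned above, counting sentences and tracking polarity from the start to the finish of a speech, could be prototyped along the lines of the Python sketch below. It is only an illustration under loud assumptions: the sentence splitter is naive (abbreviations such as "U.S." will break it), the tiny sentiment lexicon is invented, and the sample speech is made up; a real study would use a published lexicon and a proper sentence tokenizer.

```python
import re

# Invented toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"strong": 1, "hope": 1, "prosperity": 1, "peace": 1,
           "crisis": -1, "war": -1, "threat": -1, "fear": -1}


def split_sentences(text):
    """Naive splitter on runs of '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_score(sentence):
    """Sum of lexicon scores for the words in one sentence."""
    return sum(LEXICON.get(word, 0) for word in re.findall(r"[a-z]+", sentence.lower()))


def polarity_arc(text, chunk_size=2):
    """Average sentence score for consecutive chunks of chunk_size sentences, start to finish."""
    scores = [sentence_score(s) for s in split_sentences(text)]
    chunks = [scores[i:i + chunk_size] for i in range(0, len(scores), chunk_size)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


if __name__ == "__main__":
    speech = ("We face a crisis. The threat is real. Yet we have hope. "
              "Our economy is strong. Prosperity and peace lie ahead.")
    print("sentences:", len(split_sentences(speech)))
    print("polarity arc:", polarity_arc(speech))
```

Plotting the arc for each address would show whether a speech opens on a somber note and closes optimistically, and the sentence count gives the length measure mentioned above.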

References

[1] Marko Grobelnik and Dunja Mladenic. Text-Mining Tutorial. http://eprints.pascalnetwork.org/archive/00000017/01/tutorial_marko.pdf
[2] Political Leaders' Views on the Issues. OnTheIssues. http://www.ontheissues.org/issues.htm
[3] Issues. The White House. http://www.whitehouse.gov/issues
[4] Modified Cloud. Mining Twitter with R. https://sites.google.com/site/miningtwitter/questions/talking-about/wordclouds/modified-cloud
[5] Democrat vs Republican: Difference and Comparison. Diffen. http://www.diffen.com/difference/democrat_vs_republican