LSE Department of Methodology, MY428/528 - LT 2014
Qualitative Text Analysis
Course Convenor: Dr. Aude Bicquelet (a.j.bicquelet@lse.ac.uk)
Office Hours: Thursday 11:30-13:30
EXPLORATORY CONTENT ANALYSIS Week 8
Lecture Outline
1. Definitions & Ambitions
2. Origins & Epistemological Foundations
3. Applications of Exploratory CA
   - Can Text-Mining help handle the data deluge in Public Policy Analysis? (Bicquelet and Weale, 2011)
   - In a different Parliamentary voice? (Bicquelet et al. 2012)
   - Right-Wing Nationalism in Political Manifestos
4. The Alceste Software
DEFINITIONS & AMBITIONS
Exploratory Content Analysis: Definition and Ambitions
Definition
Exploratory Content Analysis/Text Mining: the process of extracting information from large corpora with the aim of identifying patterns and relationships in textual data.
Ambitions
- To identify major patterns of argumentation within large corpora
- To reduce complex data through categorization and visualization techniques
Exploratory Content Analysis: Ambitions
When and why use it?
- To match passive variables with classes of words in large corpora.
- To help the elaboration of codebooks for inductive approaches (Summative Content Analysis).
- To generate hypotheses to be tested by hypothetico-deductive analyses (Classical Content Analysis).
- To triangulate results generated by inductive approaches.
Exploratory Content Analysis: Ambitions & Applications
Applications
- Parliamentary debates (Schonhardt-Bailey 2008; Weale et al. 2012; Bara et al. 2007)
- Online public consultations (Bicquelet and Weale 2011)
- Political manifestos (Bicquelet, 2007)
- Open-ended questionnaires (Brugidou 2003; Lahlou 1996)
- Social media analysis, sentiment analysis, marketing applications, etc.
Exploratory Content Analysis Example - Parliamentary debates about the EU.
Exploratory Content Analysis: Advantages & Critiques
Advantages
- Generates classifications free from preconceptions
- Produces fast results
- High reliability/replicability
Critiques
- Analysis of the classes is an interpretative process (not free from preconceptions)
- Danger of overlooking valuable information (impossible to evaluate weight or strength)
- Weak validity
ORIGINS & EPISTEMOLOGICAL FOUNDATIONS
Exploratory Content Analysis: Origins and Epistemological Foundations
Cognitive Anthropology (Cultural Domain Analysis)
+ Structural Linguistics
+ Statistics (Descriptive and Exploratory Data Analysis)
Exploratory Content Analysis: Origins and Epistemological Foundations
I. Cognitive Anthropology (Spradley, 1972)
Cultural Domain Analysis (Borgatti, 1999)
- Organised sets of words, concepts and sentences tell us a lot about how people think.
- Clusters of words are conceptual domains: each refers to a set of objects and reflects the everyday taxonomy of a native people.
- Free lists and pile sorts help to identify items in a cultural domain.
Exploratory Content Analysis: Origins and Epistemological Foundations
What is it like to live in London? (Free lists, 10 words)
Resp. 1: grey; cold; lonely; food; expensive; travel; tube; work; walk; park.
Resp. 2: fun; parties; art; movie; restaurant; rainy; shops; cheap; flatmates; exciting.
Resp. 3: tea; pub; galleries; sunny; cool; pricy; cycling; yoga; colleagues; kids.
Exploratory Content Analysis: Origins and Epistemological Foundations
What is it like to live in London? (Pile sorts / judged similarities)
Weather: Grey, Cold, Rainy, Sunny
Feelings: Lonely, Fun, Exciting, Cool
Relations: Family, Friends, Flatmate, Kids
Money: Cheap, Expensive, Pricy
Activities: Travel, Work, Walk, Job, Movie, Gallery (...)
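The pile sort above can be turned into an item-by-item similarity matrix of the kind used for scaling and clustering. What follows is a minimal sketch, assuming a single respondent's piles as listed on the slide and a binary rule (1 if two items share a pile, 0 otherwise). The function name `similarity_from_piles` is illustrative only.

```python
from itertools import combinations

# Piles copied from the slide; one respondent's sort is assumed.
piles = {
    "Weather": ["grey", "cold", "rainy", "sunny"],
    "Feelings": ["lonely", "fun", "exciting", "cool"],
    "Relations": ["family", "friends", "flatmate", "kids"],
    "Money": ["cheap", "expensive", "pricy"],
    "Activities": ["travel", "work", "walk", "job", "movie", "gallery"],
}


def similarity_from_piles(piles):
    """Return (items, matrix); matrix[i][j] is 1 if items i and j share a pile."""
    items = [item for pile in piles.values() for item in pile]
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    # Every item is trivially similar to itself (diagonal of ones).
    matrix = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for pile in piles.values():
        for a, b in combinations(pile, 2):
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    return items, matrix


items, sim = similarity_from_piles(piles)
grey, cold, cheap = (items.index(w) for w in ("grey", "cold", "cheap"))
print(sim[grey][cold], sim[grey][cheap])  # same pile vs. different piles
```

With several respondents, the same idea extends to counting (or averaging) how often each pair of items is placed in the same pile.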
Exploratory Content Analysis: Origins and Epistemological Foundations
II. Structural Linguistics
Semiotics (C.S. Peirce, 1877)
- We make sense of reality through word associations and co-occurrences.
- The meaning of a word is best understood through the set of words that co-occur with it.
- Two words that have similar co-occurrence patterns are semantically related (Semantic Network Analysis).
Exploratory Content Analysis: Origins and Epistemological Foundations
Knife, Restaurant, Italian, Waiter, Horrified, Pasta, Boyfriend, Smiled
Knife, Trial, Italy, Judge, Horrified, Murder, Boyfriend, Charged
Exploratory Content Analysis: Origins and Epistemological Foundations
Valentine Dinner: Knife, Restaurant, Italian, Waiter, Horrified, Pasta, Boyfriend, Smiled
Amanda Knox's trial: Knife, Trial, Italy, Judge, Horrified, Murder, Boyfriend, Charged
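The two word sets above can be used to illustrate co-occurrence patterns. This is a minimal sketch, assuming each set is treated as one context, that two words co-occur when they appear in the same context, and that cosine similarity is used to compare co-occurrence profiles. The cosine measure is a choice made for this illustration and is not taken from the slides.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# The two contexts from the slide, lower-cased.
contexts = {
    "Valentine dinner": ["knife", "restaurant", "italian", "waiter",
                         "horrified", "pasta", "boyfriend", "smiled"],
    "Amanda Knox's trial": ["knife", "trial", "italy", "judge",
                            "horrified", "murder", "boyfriend", "charged"],
}

# cooc[a][b] = number of contexts in which words a and b both appear.
cooc = defaultdict(Counter)
for words in contexts.values():
    for a, b in permutations(set(words), 2):
        cooc[a][b] += 1


def cosine(w1, w2):
    """Cosine similarity between the co-occurrence profiles of two words."""
    v1, v2 = cooc[w1], cooc[w2]
    dot = sum(count * v2[term] for term, count in v1.items())
    norm = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0


print("knife co-occurs with:", sorted(cooc["knife"]))
print("restaurant ~ pasta:", round(cosine("restaurant", "pasta"), 2))
print("restaurant ~ trial:", round(cosine("restaurant", "trial"), 2))
```

In this toy example "restaurant" has a profile closer to "pasta" than to "trial", while "knife", "horrified" and "boyfriend" appear in both contexts and only take on a meaning through the words around them.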
Exploratory Content Analysis: Methods of Analysis
III. Descriptive Statistics
Exploratory Data Analysis (J. Tukey, 1977)
- Word frequencies
- Ranking (how early an item gets mentioned)
- KWIC (key word in context)
- Multidimensional Scaling
- Cluster Analysis
- Correspondence Analysis
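Two of the simplest techniques listed above, word frequencies and KWIC (key word in context), can be sketched in a few lines. The example text is invented for illustration, and the tokeniser (lower-casing and keeping alphabetic strings) and the window size are assumptions rather than settings of any particular package.

```python
import re
from collections import Counter

# Invented example text, for illustration only.
text = ("Living in London is expensive. The tube is crowded but the parks "
        "are lovely. London is grey in winter.")

# Very simple tokeniser: lower-case, keep runs of letters and apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

# Word frequencies.
freq = Counter(tokens)


def kwic(tokens, keyword, window=3):
    """Return each occurrence of keyword with `window` words on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines


print(freq.most_common(3))
for line in kwic(tokens, "london"):
    print(line)
```

Reading the KWIC lines next to the frequency counts is a quick way of checking a frequent word against the raw text.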
Exploratory Content Analysis: Methods of Analysis (Multidimensional Scaling)
MDS maps the relations among items in a matrix. The algorithm works out the best spatial representation of a set of items whose relations are given by a set of similarities.
Similarity matrix
           Grey  Lone.   Kids   Sun.  Cheap Friend  Price    Fun   Cold   Cool  Rain.
Grey          1      0      0      1      0      0      0      0      1      0      1
Lone.         0      1      0      0      0      0      0      1      0      1      0
Kids          0      0      1      0      0      1      0      0      0      0      0
Sun.          1      0      0      1      0      0      0      0      1      0      1
Cheap         0      0      0      0      1      0      1      0      0      0      0
Friend        0      0      1      0      0      1      0      0      0      0      0
Price         0      0      0      0      1      0      1      0      0      0      0
Fun           0      1      0      0      0      0      0      1      0      1      0
Cold          1      0      0      1      0      0      0      0      1      0      1
Cool          0      1      0      0      0      0      0      1      0      1      0
Rain.         1      0      0      1      0      0      0      0      1      0      1
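To make the mapping step concrete, here is a minimal sketch of classical (metric) MDS applied to the similarity matrix above, using NumPy. It assumes that dissimilarity is defined as 1 minus similarity and uses the classical eigen-decomposition variant of MDS. Software packages may use other variants (for example non-metric MDS), so this is an illustration rather than the procedure of any specific tool.

```python
import numpy as np

labels = ["Grey", "Lone.", "Kids", "Sun.", "Cheap", "Friend",
          "Price", "Fun", "Cold", "Cool", "Rain."]

# Rows of the similarity matrix from the slide, one digit per column.
rows = ["10010000101", "01000001010", "00100100000", "10010000101",
        "00001010000", "00100100000", "00001010000", "01000001010",
        "10010000101", "01000001010", "10010000101"]
S = np.array([[int(c) for c in row] for row in rows], dtype=float)


def classical_mds(dist, k):
    """Embed items in k dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (dist ** 2) @ J             # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]           # indices of the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))


D = 1.0 - S                    # assumption: dissimilarity = 1 - similarity
coords = classical_mds(D, k=3)

for label, xyz in zip(labels, coords):
    print(f"{label:<7}", np.round(xyz, 2))
```

Items judged similar (for example Grey, Sun., Cold and Rain.) receive identical coordinates. Three dimensions are used here because the matrix contains four groups that are all equally distant from one another, which a flat two-dimensional map cannot represent without distortion.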
Exploratory Content Analysis: Methods of Analysis (Cluster Analysis)
Cluster analysis is another visualization method. Like MDS, it operates on similarity matrices. The aim is to divide a set of items into subgroups (clusters) such that members of each subgroup are more like each other than they are like members of other subgroups.
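The idea can be sketched with a small agglomerative procedure run on the similarity matrix from the MDS slide. This is a minimal illustration, assuming average linkage and a number of clusters fixed in advance (four). Both are choices made for this sketch, and the function name `agglomerate` is hypothetical.

```python
labels = ["Grey", "Lone.", "Kids", "Sun.", "Cheap", "Friend",
          "Price", "Fun", "Cold", "Cool", "Rain."]

# Rows of the similarity matrix from the MDS slide, one digit per column.
rows = ["10010000101", "01000001010", "00100100000", "10010000101",
        "00001010000", "00100100000", "00001010000", "01000001010",
        "10010000101", "01000001010", "10010000101"]
sim = [[int(c) for c in row] for row in rows]


def agglomerate(labels, sim, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(labels))]   # start: one item per cluster

    def avg_sim(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, x, y = max(
            (avg_sim(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        clusters[x] = clusters[x] + clusters[y]    # merge y into x
        del clusters[y]
    return [[labels[i] for i in cluster] for cluster in clusters]


clusters = agglomerate(labels, sim, n_clusters=4)
for cluster in clusters:
    print(cluster)
```

On this matrix the four clusters correspond to the weather, feelings, relations and money items. The labels themselves still have to be supplied by the analyst.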
Exploratory Content Analysis: Software
- Alceste. Website: http://www.image-zafar.com/en/alceste-software
- Iramuteq (R). Website: http://www.r-project.org/
- QDA Miner/Wordstat. Website: http://provalisresearch.com/products/qualitative-data-analysissoftware/
- T-Lab. Website: http://www.tlab.it/en/presentation.php
Exploratory Content Analysis: A Mixed-Method Approach
- Generating statistical outputs is only the first step towards an integrated analysis of the results.
- It is always necessary to check word frequencies, clusters, correspondence and multidimensional analyses against the raw data.
- No software can replace the interpretation of a corpus.
- Exploratory Content Analysis is thus a mixed-method approach to data analysis.
APPLICATIONS OF EXPLORATORY CONTENT ANALYSIS
BICQUELET AND WEALE'S ANALYSIS OF PUBLIC CONSULTATIONS (2011)
Bicquelet and Weale (2011)
Research question: What are the most commonly expressed arguments for and against funding end-of-life medicines among different cohorts (patients, carers, NHS professionals)?
Data: Public consultation on end-of-life medicines run by NICE in 2008.
Software: Alceste
Units of analysis: Digitised answers to the public consultation
Sampling strategy: Purposive
Bicquelet and Weale (2011)
Why Alceste?
- Suitable for very large corpora.
- Little coding (only requires the definition of units of analysis and variables).
- Works purely on the occurrence and co-occurrence of typical word pairs in a corpus.
- Automatically produces classes made up of key terms.
- Key terms and classes are matched with the variables.
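The matching of key terms with classes or variables can be illustrated with a simple association test. This is a minimal sketch, assuming a 2x2 table of text units (contains the term or not, belongs to the class or not) and a Pearson chi-square statistic without continuity correction. The counts are invented for illustration, and the sketch is not a description of Alceste's internal procedure.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must contain at least one unit")
    return n * (a * d - b * c) ** 2 / denom


# Invented counts of text units, for illustration only:
#                  in class   not in class
# term present        30          10
# term absent         20          40
chi2 = chi_square_2x2(30, 10, 20, 40)
print(round(chi2, 2))
```

With one degree of freedom, values above 3.84 indicate an association at the 5% level. A large value only says that the term is unevenly distributed across the class boundary, so the direction and meaning of the association still have to be read from the text.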
BICQUELET ET AL.'S ANALYSIS OF GENDER DIFFERENCES IN PARLIAMENT (2012)
Bicquelet et al. (2012)
Research question: Do men and women express similar (or different) types of arguments in parliamentary debates?
Data: Six second-reading debates from 1966 to 1988 in the UK House of Commons
Software: Alceste
Units of analysis: Speech acts
Sampling strategy: Purposive
Bicquelet et al. (2012)
Date | House | Type of debate | Initiator | Party | Government
22nd July 1966 | Commons | Second | Steel | Liberal | Labour (Wilson)
13th Feb. 1970 | Commons | Second | | Conservative | Labour (Wilson)
7th Feb. 1975 | Commons | Second | White | Labour | Labour (Wilson)
25th Feb. 1977 | Commons | Second | Benyon | Conservative | Labour (Callaghan)
13th July 1979 | Commons | Second | Corrie | Conservative | Conservative (Thatcher)
22nd Jan. 1988 | Commons | Second | | Liberal Democrat | Conservative (Thatcher)
Bicquelet et al. (2012)
Coding
For each debate: name of the speaker, year, gender (unit of analysis = speech act / utterance)
Corpus construction
- Integrated debate (all six debates compiled within a single corpus)
- Analysis of the six debates individually
Type of analysis
- Standard (integrated debate)
- Cross-data analysis
- Tri-croisé (cross-tabulation) on the variable gender (on each single debate)
Class of Sentences | Main Themes in Class
1. Moral concerns | Sanctity and value of life; moral status of child; effects of disability; unwanted pregnancy, strain on families
2. Operation of medical facilities | Permits and licensing; operation of medical facilities
3. Effects of 1967 legislation | Estimates of number of abortions, including illegal; legal status of abortion; some polling evidence
4. Rhetoric of debate and procedure | Congratulations or criticism of other speakers; character of procedure
5. Role of committees and reports | Reference to committees of enquiry; reference to parliamentary committees
6. Reflections on debate | Character of debate; role of parliament
Bicquelet et al. (2012)
Results
- Women have used a larger array of rhetorical strategies than men.
- Political alignment did not seem to influence the way arguments were framed.
- Women and men deploy similar rhetorical strategies; the key difference is in terms of their frequency.
- If women and men speak in a different voice, this is only from time to time.
RIGHT-WING NATIONALISM IN POLITICAL MANIFESTOS
Right-Wing Nationalism in Political Manifestos
Research question: Is there a rhetoric of right-wing nationalism that is distinct from the language deployed by self-declared mainstream parties?
Data: Manifestos of right-wing nationalist parties from three different countries (France, Italy and the UK).
Software: Alceste
Units of analysis: Manifestos
Sampling strategy: Purposive
Right-Wing Nationalism in Political Manifestos
Results
Right-wing political manifestos commonly use four types of argumentative patterns:
- The Rhetoric of Decadence
- The Rhetoric of Defence
- The Rhetoric of Tradition
- The Rhetoric of Difference
While immigration was expected to be a dominant theme, it was in fact subordinated within concerns about home countries in relation to the European Union.
Useful Resources
Bicquelet, A., Weale, A. & Bara, J. (2012) In a Different Parliamentary Voice? An Analysis of Gender Differences in UK Parliamentary Debates about Abortion. Politics & Gender, 8:1.
Schonhardt-Bailey, C. (2008) The Congressional Debate on Partial-Birth Abortion: Constitutional Gravitas and Moral Passion. British Journal of Political Science, 38, pp. 383-410.
Bicquelet, A. & Weale, A. (2011) Coping with the Cornucopia: Can Text Mining Help Handle the Data Deluge in Public Policy Analysis? Policy & Internet, 3:4.
Brugidou, M. (2003) Argumentation and Values: An Analysis of Ordinary Political Competence via an Open-Ended Question. International Journal of Public Opinion Research, 15:4, pp. 413-430.
Guérin-Pace, F. (1998) Textual Statistics. An Exploratory Tool for the Social Sciences. New Methodological Approaches in the Social Sciences, 10:1, pp. 73-95.
Lahlou, S. (1996) A Method to Extract Social Representations from Linguistic Corpora. Japanese Journal of Experimental Social Psychology, 36:1, pp. 278-291.
Schonhardt-Bailey, C. (2005) Measuring Ideas More Effectively: An Analysis of Bush and Kerry's National Security Speeches. PS: Political Science and Politics, 38:3, pp. 701-711.