Cluster Analysis. (see also: Segmentation)

1 Cluster Analysis (see also: Segmentation)

2 Cluster Analysis
Ø Unsupervised: no target variable for training
Ø Partition the data into groups (clusters) so that:
  Ø Observations within a cluster are similar in some sense
  Ø Observations in different clusters are different in some sense
  (That's not very specific.)
Ø There is no one correct answer, though there are good and bad clusters
Ø No method works best all the time

3 (Some) Applications of Clustering
Ø Customer segmentation: groups of customers with similar shopping or buying patterns
Ø Dimension reduction:
  Ø Cluster variables together
  Ø Cluster individuals together and use the cluster variable as a proxy for demographic or behavioral variables
Ø Image segmentation
Ø Gather stores with similar characteristics for sales forecasting
Ø Find related topics in text data
Ø Find communities in social networks

4 Methodology
Ø Hard vs. Fuzzy Clustering
  Ø Hard: objects can belong to only one cluster
    Ø k-means (PROC FASTCLUS)
    Ø DBSCAN
    Ø Hierarchical (PROC CLUSTER)
  Ø Fuzzy: objects can belong to more than one cluster (usually with some probability)
    Ø Gaussian Mixture Models

6 Methodology
Ø Hierarchical vs. Flat
  Ø Hierarchical: clusters form a tree, so you can see which clusters are most similar to each other.
    Ø Agglomerative: points start out as individual clusters, and they are combined until everything is in one cluster.
    Ø Divisive: all points start in the same cluster, and at each step a cluster is divided into two clusters.
  Ø Flat: clusters are created according to some other process, usually by iteratively updating cluster assignments.

7 Hierarchical Clustering (Agglomerative) Some Data [Figure: scatter plot of ten points labeled A through J]

8 Hierarchical Clustering (Agglomerative) First Step

9 Hierarchical Clustering (Agglomerative) Second Step

10 Hierarchical Clustering (Agglomerative) Third Step

11 Hierarchical Clustering (Agglomerative) Fourth Step

12 Hierarchical Clustering (Agglomerative) Fifth Step

13 Hierarchical Clustering (Agglomerative) Sixth Step

14 Hierarchical Clustering (Agglomerative) Seventh Step We might have known that we only wanted 3 clusters, in which case we'd stop once we had 3.

15 Hierarchical Clustering (Agglomerative) Eighth Step

16 Hierarchical Clustering (Agglomerative) Final Step
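The step-by-step merging above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure SAS uses: the data points, the function names `agglomerate` and `single_linkage`, and the choice of single linkage are all assumptions made for the example. Passing `n_clusters=3` stops early, as in the "we only wanted 3 clusters" remark.

```python
from itertools import combinations
from math import dist

def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(dist(p, q) for p in a for q in b)

def agglomerate(points, n_clusters=1, linkage=single_linkage):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the merge history (one entry per step).
    """
    clusters = [[p] for p in points]   # every point starts as its own cluster
    history = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Merging is final: the two clusters are replaced by their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, history

# Hypothetical 2-D data: three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
clusters, history = agglomerate(points, n_clusters=3)
print(len(history), "merges ->", sorted(sorted(c) for c in clusters))
```

With n points, reaching k clusters always takes n - k merges, which is why ten points reach three clusters at the seventh step.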

17 Hierarchical Clustering Levels of the Dendrogram

18 Resulting Dendrogram [Figure: dendrogram with leaves ordered A B C D E F G H I J]

19 Linkages Which clusters/points are closest to each other? How do I measure the distance between a point/cluster and a cluster?

20 Linkages Single Linkage: Distance between the closest points in the clusters. (Minimum Spanning Tree)

21 Linkages Complete Linkage: Distance between the farthest points in the clusters.

22 Linkages Centroid Linkage: Distance between the centroids (means) of each cluster.

23 Linkages Average Linkage: Average distance over all pairs of points, one from each cluster.
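As a rough illustration of how the four linkages so far differ, the sketch below computes each one for the same pair of clusters. The function names and the two tiny clusters are made up for the example.

```python
from math import dist
from statistics import fmean

def centroid(cluster):
    """Coordinate-wise mean of a cluster of points."""
    return tuple(fmean(coord) for coord in zip(*cluster))

def single(a, b):
    """Single linkage: distance between the closest points."""
    return min(dist(p, q) for p in a for q in b)

def complete(a, b):
    """Complete linkage: distance between the farthest points."""
    return max(dist(p, q) for p in a for q in b)

def average(a, b):
    """Average linkage: mean distance over all cross-cluster pairs."""
    return fmean([dist(p, q) for p in a for q in b])

def centroid_linkage(a, b):
    """Centroid linkage: distance between the cluster means."""
    return dist(centroid(a), centroid(b))

# Hypothetical clusters on a line, so the distances are easy to check by hand.
a = [(0.0, 0.0), (2.0, 0.0)]
b = [(5.0, 0.0), (9.0, 0.0)]
for name, fn in [("single", single), ("complete", complete),
                 ("average", average), ("centroid", centroid_linkage)]:
    print(name, fn(a, b))
```

Average and centroid linkage happen to agree on this one-dimensional example; in general they differ.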

24 Linkages Ward's Method: Increase in SSE (variance) when clusters are combined.
Ø Default in SAS PROC CLUSTER
Ø Shown to be mathematically similar to centroid linkage
Notation: $c_i$ is the centroid for cluster $i$; $x_1, x_2, \ldots, x_{N_i}$ are the data points in cluster $i$.
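In the slide's notation, the within-cluster SSE and the cost of merging two clusters A and B can be written as follows. The closed form on the right is the standard identity for Ward's criterion rather than something taken from the slide; it shows why the method behaves like a size-weighted squared centroid distance.

```latex
\mathrm{SSE}_i = \sum_{j=1}^{N_i} \lVert x_j - c_i \rVert^2,
\qquad
\Delta(A, B) = \mathrm{SSE}_{A \cup B} - \mathrm{SSE}_A - \mathrm{SSE}_B
             = \frac{N_A N_B}{N_A + N_B} \, \lVert c_A - c_B \rVert^2
```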

25 Hierarchical Clustering Summary
Ø Disadvantages
  Ø Lacks a global objective function: only makes decisions based on local criteria.
  Ø Merging decisions are final. Once a point is assigned to a cluster, it stays there.
  Ø Computationally intensive, with large storage requirements; not good for large datasets.
  Ø Poor performance on noisy or high-dimensional data such as text.
Ø Advantages
  Ø Lacks a global objective function: no complicated algorithm or problem with local minima.
  Ø Creates a hierarchy that can help choose the number of clusters and examine how those clusters relate to each other.
  Ø Can be used in conjunction with other, faster methods.

26 k-Means Clustering (PROC FASTCLUS in SAS)
Ø The most popular clustering algorithm
Ø Tries to minimize the sum of squared distances from each point to its cluster centroid. (Global objective function)
[Figure: data points in Cluster 1 ($C_1$) with centroid $c_1$, and data points in Cluster 2 ($C_2$) with centroid $c_2$]
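Written out, with $C_j$ the set of points in cluster $j$ and $c_j$ its centroid, the objective described above is the following (the index $j$ is introduced here only for the formula):

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert^2
```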

27 k-Means Algorithm
1. Start with k seed points
  Ø Randomly initialized (most software)
  Ø Determined methodically (SAS PROC FASTCLUS)
2. Assign each data point to the closest seed point.
3. Each seed point then represents a cluster of data.
4. Reset the seed points to be the centroids of their clusters.
5. Repeat steps 2-4, updating the cluster centroids until they do not change.
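The steps above can be sketched as follows. This is a minimal version with random initialization (the "most software" variant, not the methodical seeding of PROC FASTCLUS); the function name `kmeans`, its parameters, and the toy data are assumptions for illustration.

```python
import random
from math import dist
from statistics import fmean

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means: returns (centroids, clusters, sse)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 1: random seed points
    for _ in range(max_iter):
        # Step 2: assign each point to the closest seed point.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Step 4: reset each seed to its cluster centroid
        # (an empty cluster keeps its old seed).
        updated = [
            tuple(fmean(coord) for coord in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
        if updated == centroids:               # step 5: stop when nothing moves
            break
        centroids = updated
    sse = sum(dist(p, centroids[c]) ** 2
              for c, cl in enumerate(clusters) for p in cl)
    return centroids, clusters, sse

# Hypothetical data: two square groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters, sse = kmeans(points, k=2)
print(sorted(centroids), round(sse, 6))
```

Because the result depends on the initial seeds, a common practice is to run the function with several values of `seed` and keep the run with the lowest SSE.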

28 k-Means Interactive Demo (You may have to add the site to your exceptions list on the Java Control Panel to view.)

29 Choice of Distance Metric
Ø Most distances, such as Euclidean, Manhattan, or Max, will provide similar answers.
Ø Use cosine distance (really 1 - cos, since cosine measures similarity) for text data. This is called spherical k-means.
Ø Using Mahalanobis distance is essentially the Expectation-Maximization (EM) method for Gaussian mixtures.
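For reference, here is a small sketch of the distances named above, written as plain functions. The names and test vectors are chosen for the example; "Max" is implemented as the largest coordinate difference (often called Chebyshev distance).

```python
from math import sqrt

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def max_distance(u, v):
    """Largest absolute coordinate difference (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(u, v))

def cosine_distance(u, v):
    """1 - cosine similarity; depends only on direction, not on length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

origin, point = (0, 0), (3, 4)
print(euclidean(origin, point), manhattan(origin, point), max_distance(origin, point))

# Two hypothetical term-count vectors: the second document repeats the first
# twice, so its direction is the same and the cosine distance is about 0.
doc_a, doc_b = (1, 1, 0), (2, 2, 0)
print(cosine_distance(doc_a, doc_b))
```

The last example hints at why cosine distance suits text: document length changes the Euclidean distance but not the angle between term-count vectors.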

30-33 Determining Number of Clusters (SSE)
Ø Try the algorithm with k = 1, 2, 3, ...
Ø Examine the objective function values
Ø Look for a place where the marginal benefit to the objective function from adding a cluster becomes small
Ø k=1: objective function (SSE) is 902
Ø k=2: objective function (SSE) is 213
Ø k=3: objective function (SSE) is 193
[Figure: objective function plotted against k = 1, 2, 3, 4] Elbow => k=2
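The elbow can also be read off numerically. The sketch below uses only the three SSE values quoted on the slides (the k=4 value is not given, so it is left out) and computes the marginal benefit of each added cluster.

```python
# SSE values quoted on the slides for k = 1, 2, 3.
sse = {1: 902, 2: 213, 3: 193}

# Marginal benefit of going from k-1 to k clusters.
drops = {k: sse[k - 1] - sse[k] for k in sorted(sse) if k - 1 in sse}

for k, drop in drops.items():
    share = drop / sse[k - 1]
    print(f"k={k}: SSE falls by {drop} ({share:.1%} of the previous value)")
```

Going from one to two clusters removes most of the SSE, while the third cluster adds little, which matches the elbow at k=2.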

34 k-Means Summary
Ø Disadvantages
  Ø Dependent on initialization (initial seeds)
  Ø Can be sensitive to outliers
    Ø If this is a problem, consider k-medoids (uses a medoid, an actual data point, as the cluster center instead of the mean)
  Ø Have to input the number of clusters
  Ø Difficulty detecting non-spheroidal (non-globular) clusters
Ø Advantages
  Ø Modest time/storage requirements.
  Ø It has been shown that you can terminate the method after a small number of iterations with good results.
  Ø Good for a wide variety of data types

35 Cluster Validation
How do I know that my clusters are actually clusters?
Ø Lots of techniques/metrics have been proposed
  Ø Measure separation between clusters
  Ø Measure cohesion within clusters
Ø All have merit, but most are difficult to interpret in the context of statistical significance.

36 Cluster Validation
Ø To establish statistical significance:
  Ø Show that you can't do just as well with randomized data (i.e. assume the null hypothesis of no clusters)
  Ø Simulate ~1000 random data sets, choosing from the distributions or ranges of your variables.
  Ø Cluster them with the same number of clusters.
  Ø Record the SSE (k-means objective function) or validity metric of choice.
  Ø Use this to show that your actual SSE is far better than you could expect to achieve if no clusters exist.
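A rough sketch of this randomization check follows. Everything here is illustrative: the toy data, the helper names `kmeans_sse` and `null_sse_distribution`, the uniform-over-observed-range null model, and the use of 200 simulations instead of ~1000 (to keep the example fast) are assumptions, as is the simple Monte Carlo p-value at the end.

```python
import random
from math import dist
from statistics import fmean

def kmeans_sse(points, k, rng, n_init=5, max_iter=50):
    """Lowest k-means SSE found over a few random initializations."""
    best = float("inf")
    for _ in range(n_init):
        centroids = rng.sample(points, k)
        for _ in range(max_iter):
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
                clusters[nearest].append(p)
            updated = [tuple(fmean(x) for x in zip(*cl)) if cl else centroids[c]
                       for c, cl in enumerate(clusters)]
            if updated == centroids:
                break
            centroids = updated
        sse = sum(dist(p, centroids[c]) ** 2
                  for c, cl in enumerate(clusters) for p in cl)
        best = min(best, sse)
    return best

def null_sse_distribution(points, k, n_sims, rng):
    """SSEs for data drawn uniformly over each variable's observed range."""
    lows = [min(x) for x in zip(*points)]
    highs = [max(x) for x in zip(*points)]
    sims = []
    for _ in range(n_sims):
        fake = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in points]
        sims.append(kmeans_sse(fake, k, rng))
    return sims

rng = random.Random(42)
# Hypothetical data: two tight groups of 15 points each.
data = ([(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(15)] +
        [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(15)])

actual = kmeans_sse(data, k=2, rng=rng)
null = null_sse_distribution(data, k=2, n_sims=200, rng=rng)
# Share of random data sets that cluster at least as tightly as the real data.
p_value = (1 + sum(s <= actual for s in null)) / (1 + len(null))
print(f"actual SSE = {actual:.1f}, smallest null SSE = {min(null):.1f}, p = {p_value:.4f}")
```

If the actual SSE sits far below every simulated value, the data cluster much more tightly than structureless data with the same ranges would.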

37 Profiling Clusters
Now that we have clusters, how do we describe them?
Ø Use basic descriptives and hypothesis tests to show differences between clusters
Ø Use a decision tree to predict cluster
Ø SAS Enterprise Miner (EM) has a Segment Profiler node

38 Other Types of Clustering (self-study)
Ø DBSCAN: density-based algorithm designed to find dense areas of points. Capable of identifying noise points which do not belong to any cluster.
Ø Graph/Network Clustering: spectral clustering and modularity maximization. Covered in Social Network Analysis in Spring.

40 Some Explanation of SAS's Clustering Output (SELF-STUDY) Because it's not exceedingly easy to figure out online!

41 Cubic Clustering Criterion (CCC)
Ø Only available in SAS (to my knowledge)
Ø CCC > 2 means that the clustering is good
Ø 0 < CCC < 2 means the clustering requires examination
Ø If slightly negative, the risk of outliers is low
Ø If less than about -30, the risk of outliers is high
Ø Should not be used with single or complete linkage, but with centroid linkage or Ward's method.
Ø Each cluster must have >10 observations.
Source: Tufféry, Stéphane. Data Mining and Statistics for Decision Making. Wiley 2011

42 Determining Number of Clusters with the Cubic Clustering Criterion (CCC)
Ø A partition into k clusters is good when we see a dip in CCC for k-1 clusters and a peak for k clusters.
Ø After k clusters, the CCC should show either a gradual decrease or a gradual rise (the latter happens when more isolated groups or points are present).
Source: Tufféry, Stéphane. Data Mining and Statistics for Decision Making. Wiley 2011

43 Determining Number of Clusters with the Cubic Clustering Criterion (CCC) Image Source: Tufféry, Stéphane. Data Mining and Statistics for Decision Making. Wiley 2011

44 Determining Number of Clusters with the Cubic Clustering Criterion (CCC) WARNING: Do not expect the CCC to be common knowledge outside of the SAS domain.

45 Overall R-Squared and Pseudo-F
These statistics draw connections between a final clustering and ANOVA.
Ø Total Sum of Squares (SST)
Ø Between Group Sum of Squares (SSB)
Ø Within Group Sum of Squares (SSW)
  Ø This is the k-means objective previously referred to as SSE.
  Ø Minimizing SSW => Maximizing SSB
Ø SST = SSB + SSW
Ø Overall $R^2$ = SSB/SST
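The decomposition can be checked on a toy example. The sketch below assumes the usual ANOVA-style definition of pseudo-F, (SSB / (k - 1)) / (SSW / (n - k)), with k clusters and n observations; the function names and the data are made up for illustration.

```python
from statistics import fmean

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(fmean(col) for col in zip(*points))

def sum_sq(points, center):
    """Sum of squared Euclidean distances from the points to a center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)

def anova_stats(clusters):
    """Return (SST, SSB, SSW, R-squared, pseudo-F) for a final clustering."""
    all_points = [p for cl in clusters for p in cl]
    n, k = len(all_points), len(clusters)
    grand = mean_point(all_points)
    sst = sum_sq(all_points, grand)
    ssw = sum(sum_sq(cl, mean_point(cl)) for cl in clusters)
    # Between-group term: each centroid's squared distance to the grand mean,
    # weighted by cluster size.
    ssb = sum(len(cl) * sum_sq([mean_point(cl)], grand) for cl in clusters)
    r2 = ssb / sst
    pseudo_f = (ssb / (k - 1)) / (ssw / (n - k))
    return sst, ssb, ssw, r2, pseudo_f

# Hypothetical clustering: two clusters of two points each.
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
sst, ssb, ssw, r2, pseudo_f = anova_stats(clusters)
print(sst, ssb, ssw, round(r2, 4), pseudo_f)
```

Here SST = 104, SSB = 100 and SSW = 4, so the identity SST = SSB + SSW holds and nearly all of the variation is between clusters.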

46 Example: PenDigit Data
Ø Goal: Automatic recognition of handwritten digits
Ø Digit database of 250 samples from 44 writers
Ø Subjects wrote digits in random order inside boxes of 500 by 500 tablet pixel resolution
Ø Spatial resampling to obtain a constant number of regularly spaced points on the trajectory
  Ø ($x_1$, $x_2$) give the first point coordinate
  Ø ($x_3$, $x_4$) give the second point coordinate
  Ø etc.

47 Example: PenDigit Data
proc fastclus data=datasets.pendigittest maxclusters=10 out=clus;
   var x1--x16;
run;

48 Example: PenDigit Data The first step to creating your own hierarchical dendrogram.

49 Example: PenDigit Data
proc glm data=clus;
   class cluster;
   model x1 = cluster;
run;
quit;

50 Example: PenDigit Data

51 Example: PenDigit Data

52 Example: PenDigit Data Essentially using the centroids as predictions and then computing R-squared.

Dimension Reduction. Why and How

Dimension Reduction. Why and How Dimension Reduction Why and How The Curse of Dimensionality As the dimensionality (i.e. number of variables) of a space grows, data points become so spread out that the ideas of distance and density become

More information

No Adults Allowed! Unsupervised Learning Applied to Gerrymandered School Districts

No Adults Allowed! Unsupervised Learning Applied to Gerrymandered School Districts No Adults Allowed! Unsupervised Learning Applied to Gerrymandered School Districts Divya Siddarth, Amber Thomas 1. INTRODUCTION With more than 80% of public school students attending the school assigned

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Linearly Separable Data SVM: Simple Linear Separator hyperplane Which Simple Linear Separator? Classifier Margin Objective #1: Maximize Margin MARGIN MARGIN How s this look? MARGIN

More information

Instructors: Tengyu Ma and Chris Re

Instructors: Tengyu Ma and Chris Re Instructors: Tengyu Ma and Chris Re cs229.stanford.edu Ø Probability (CS109 or STAT 116) Ø distribution, random variable, expectation, conditional probability, variance, density Ø Linear algebra (Math

More information

Do two parties represent the US? Clustering analysis of US public ideology survey

Do two parties represent the US? Clustering analysis of US public ideology survey Do two parties represent the US? Clustering analysis of US public ideology survey Louisa Lee 1 and Siyu Zhang 2, 3 Advised by: Vicky Chuqiao Yang 1 1 Department of Engineering Sciences and Applied Mathematics,

More information

AMONG the vast and diverse collection of videos in

AMONG the vast and diverse collection of videos in 1 Broadcasting oneself: Visual Discovery of Vlogging Styles Oya Aran, Member, IEEE, Joan-Isaac Biel, and Daniel Gatica-Perez, Member, IEEE Abstract We present a data-driven approach to discover different

More information

Random Forests. Gradient Boosting. and. Bagging and Boosting

Random Forests. Gradient Boosting. and. Bagging and Boosting Random Forests and Gradient Boosting Bagging and Boosting The Bootstrap Sample and Bagging Simple ideas to improve any model via ensemble Bootstrap Samples Ø Random samples of your data with replacement

More information

A comparative analysis of subreddit recommenders for Reddit

A comparative analysis of subreddit recommenders for Reddit A comparative analysis of subreddit recommenders for Reddit Jay Baxter Massachusetts Institute of Technology jbaxter@mit.edu Abstract Reddit has become a very popular social news website, but even though

More information

Statistical Analysis of Corruption Perception Index across countries

Statistical Analysis of Corruption Perception Index across countries Statistical Analysis of Corruption Perception Index across countries AMDA Project Summary Report (Under the guidance of Prof Malay Bhattacharya) Group 3 Anit Suri 1511007 Avishek Biswas 1511013 Diwakar

More information

Probabilistic earthquake early warning in complex earth models using prior sampling

Probabilistic earthquake early warning in complex earth models using prior sampling Probabilistic earthquake early warning in complex earth models using prior sampling Andrew Valentine, Paul Käufl & Jeannot Trampert EGU 2016 21 st April www.geo.uu.nl/~andrew a.p.valentine@uu.nl A case

More information

Subreddit Recommendations within Reddit Communities

Subreddit Recommendations within Reddit Communities Subreddit Recommendations within Reddit Communities Vishnu Sundaresan, Irving Hsu, Daryl Chang Stanford University, Department of Computer Science ABSTRACT: We describe the creation of a recommendation

More information

Overview. Ø Neural Networks are considered black-box models Ø They are complex and do not provide much insight into variable relationships

Overview. Ø Neural Networks are considered black-box models Ø They are complex and do not provide much insight into variable relationships Neural Networks Overview Ø s are considered black-box models Ø They are complex and do not provide much insight into variable relationships Ø They have the potential to model very complicated patterns

More information

Web Mining: Identifying Document Structure for Web Document Clustering

Web Mining: Identifying Document Structure for Web Document Clustering Web Mining: Identifying Document Structure for Web Document Clustering by Khaled M. Hammouda A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of

More information

Classifier Evaluation and Selection. Review and Overview of Methods

Classifier Evaluation and Selection. Review and Overview of Methods Classifier Evaluation and Selection Review and Overview of Methods Things to consider Ø Interpretation vs. Prediction Ø Model Parsimony vs. Model Error Ø Type of prediction task: Ø Decisions Interested

More information

Computational challenges in analyzing and moderating online social discussions

Computational challenges in analyzing and moderating online social discussions Computational challenges in analyzing and moderating online social discussions Aristides Gionis Department of Computer Science Aalto University Machine learning coffee seminar Oct 23, 2017 social media

More information

Recommendations For Reddit Users Avideh Taalimanesh and Mohammad Aleagha Stanford University, December 2012

Recommendations For Reddit Users Avideh Taalimanesh and Mohammad Aleagha Stanford University, December 2012 Recommendations For Reddit Users Avideh Taalimanesh and Mohammad Aleagha Stanford University, December 2012 Abstract In this paper we attempt to develop an algorithm to generate a set of post recommendations

More information

Evaluating the Connection Between Internet Coverage and Polling Accuracy

Evaluating the Connection Between Internet Coverage and Polling Accuracy Evaluating the Connection Between Internet Coverage and Polling Accuracy California Propositions 2005-2010 Erika Oblea December 12, 2011 Statistics 157 Professor Aldous Oblea 1 Introduction: Polls are

More information

UTS:IPPG Project Team. Project Director: Associate Professor Roberta Ryan, Director IPPG. Project Manager: Catherine Hastings, Research Officer

UTS:IPPG Project Team. Project Director: Associate Professor Roberta Ryan, Director IPPG. Project Manager: Catherine Hastings, Research Officer IPPG Project Team Project Director: Associate Professor Roberta Ryan, Director IPPG Project Manager: Catherine Hastings, Research Officer Research Assistance: Theresa Alvarez, Research Assistant Acknowledgements

More information

POPULATION AGEING: a Cross-Disciplinary Approach Harokopion University, Tuesday 25 May 2010 Drawing the profile of elder immigrants in Greece

POPULATION AGEING: a Cross-Disciplinary Approach Harokopion University, Tuesday 25 May 2010 Drawing the profile of elder immigrants in Greece POPULATION AGEING: a Cross-Disciplinary Approach Harokopion University, Tuesday 25 May 2010 Drawing the profile of elder immigrants in Greece Alexandra TRAGAKI Department of Geography, Harokopion University

More information

Blockmodels/Positional Analysis Implementation and Application. By Yulia Tyshchuk Tracey Dilacsio

Blockmodels/Positional Analysis Implementation and Application. By Yulia Tyshchuk Tracey Dilacsio Blockmodels/Positional Analysis Implementation and Application By Yulia Tyshchuk Tracey Dilacsio Articles O Wasserman and Faust Chapter 12 O O Bearman, Peter S. and Kevin D. Everett (1993). The Structure

More information

Partition Decomposition for Roll Call Data

Partition Decomposition for Roll Call Data Partition Decomposition for Roll Call Data G. Leibon 1,2, S. Pauls 2, D. N. Rockmore 2,3,4, and R. Savell 5 Abstract In this paper we bring to bear some new tools from statistical learning on the analysis

More information

Probabilistic Latent Semantic Analysis Hofmann (1999)

Probabilistic Latent Semantic Analysis Hofmann (1999) Probabilistic Latent Semantic Analysis Hofmann (1999) Presenter: Mercè Vintró Ricart February 8, 2016 Outline Background Topic models: What are they? Why do we use them? Latent Semantic Analysis (LSA)

More information

Biogeography-Based Optimization Combined with Evolutionary Strategy and Immigration Refusal

Biogeography-Based Optimization Combined with Evolutionary Strategy and Immigration Refusal Biogeography-Based Optimization Combined with Evolutionary Strategy and Immigration Refusal Dawei Du, Dan Simon, and Mehmet Ergezer Department of Electrical and Computer Engineering Cleveland State University

More information

8 5 Sampling Distributions

8 5 Sampling Distributions 8 5 Sampling Distributions Skills we've learned 8.1 Measures of Central Tendency mean, median, mode, variance, standard deviation, expected value, box and whisker plot, interquartile range, outlier 8.2

More information

Compare Your Area User Guide

Compare Your Area User Guide Compare Your Area User Guide October 2016 Contents 1. Introduction 2. Data - Police recorded crime data - Population data 3. How to interpret the charts - Similar Local Area Bar Chart - Within Force Bar

More information

Response to the Report Evaluation of Edison/Mitofsky Election System

Response to the Report Evaluation of Edison/Mitofsky Election System US Count Votes' National Election Data Archive Project Response to the Report Evaluation of Edison/Mitofsky Election System 2004 http://exit-poll.net/election-night/evaluationjan192005.pdf Executive Summary

More information

Committee for Economic Development: October Business Leader Study. Submitted to:

Committee for Economic Development: October Business Leader Study. Submitted to: ZOGBY INTERNATIONAL Committee for Economic Development: October Business Leader Study Submitted to: Mike Petro Vice President of Business and Government Policy and Chief of Staff Submitted by: Zogby International

More information

Experiments on Data Preprocessing of Persian Blog Networks

Experiments on Data Preprocessing of Persian Blog Networks Experiments on Data Preprocessing of Persian Blog Networks Zeinab Borhani-Fard School of Computer Engineering University of Qom Qom, Iran Behrouz Minaie-Bidgoli School of Computer Engineering Iran University

More information

Research Statement. Jeffrey J. Harden. 2 Dissertation Research: The Dimensions of Representation

Research Statement. Jeffrey J. Harden. 2 Dissertation Research: The Dimensions of Representation Research Statement Jeffrey J. Harden 1 Introduction My research agenda includes work in both quantitative methodology and American politics. In methodology I am broadly interested in developing and evaluating

More information

Discovering Migrant Types Through Cluster Analysis: Changes in the Mexico-U.S. Streams from 1970 to 2000

Discovering Migrant Types Through Cluster Analysis: Changes in the Mexico-U.S. Streams from 1970 to 2000 Discovering Migrant Types Through Cluster Analysis: Changes in the Mexico-U.S. Streams from 1970 to 2000 Extended Abstract - Do not cite or quote without permission. Filiz Garip Department of Sociology

More information

Situational Analysis: Peterborough & the Kawarthas

Situational Analysis: Peterborough & the Kawarthas Canadian Centre for Economic Analysis Toronto Situational Analysis: February 2018 Geospatial Data Analysis Group ISBN: 978-1-989077-03-0 c 2018 Canadian Centre for Economic Analysis The Canadian Centre

More information

Comparison Sorts. EECS 2011 Prof. J. Elder - 1 -

Comparison Sorts. EECS 2011 Prof. J. Elder - 1 - Comparison Sorts - 1 - Sorting Ø We have seen the advantage of sorted data representations for a number of applications q Sparse vectors q Maps q Dictionaries Ø Here we consider the problem of how to efficiently

More information

Police patrol districting method and simulation evaluation using agent-based model & GIS

Police patrol districting method and simulation evaluation using agent-based model & GIS Zhang and Brown Security Informatics 2013, 2:7 RESEARCH Open Access Police patrol districting method and simulation evaluation using agent-based model & GIS Yue Zhang * and Donald E Brown Abstract Police

More information

Parties, Candidates, Issues: electoral competition revisited

Parties, Candidates, Issues: electoral competition revisited Parties, Candidates, Issues: electoral competition revisited Introduction The partisan competition is part of the operation of political parties, ranging from ideology to issues of public policy choices.

More information

Performance Evaluation of Cluster Based Techniques for Zoning of Crime Info

Performance Evaluation of Cluster Based Techniques for Zoning of Crime Info Performance Evaluation of Cluster Based Techniques for Zoning of Crime Info Ms. Ashwini Gharde 1, Mrs. Ashwini Yerlekar 2 1 M.Tech Student, RGCER, Nagpur Maharshtra, India 2 Asst. Prof, Department of Computer

More information

Instant Runoff Voting s Startling Rate of Failure. Joe Ornstein. Advisor: Robert Norman

Instant Runoff Voting s Startling Rate of Failure. Joe Ornstein. Advisor: Robert Norman Instant Runoff Voting s Startling Rate of Failure Joe Ornstein Advisor: Robert Norman June 6 th, 2009 --Abstract-- Instant Runoff Voting (IRV) is a sophisticated alternative voting system, designed to

More information

Potential alliances for Turkey in coming WTO agricultural negotiations. CIHEAM Analytic note. N 20 June Berna Türkekul

Potential alliances for Turkey in coming WTO agricultural negotiations. CIHEAM Analytic note. N 20 June Berna Türkekul CIHEAM Analytic note N 20 June 2007 Potential alliances for Turkey in coming WTO agricultural negotiations Berna Türkekul Ege University Faculty of Agriculture Agricultural Economics Department Potential

More information

REVEALING THE GEOPOLITICAL GEOMETRY THROUGH SAMPLING JONATHAN MATTINGLY (+ THE TEAM) DUKE MATH

REVEALING THE GEOPOLITICAL GEOMETRY THROUGH SAMPLING JONATHAN MATTINGLY (+ THE TEAM) DUKE MATH REVEALING THE GEOPOLITICAL GEOMETRY THROUGH SAMPLING JONATHAN MATTINGLY (+ THE TEAM) DUKE MATH gerrymander manipulate the boundaries of an electoral constituency to favor one party or class. achieve (a

More information

Ideological Perfectionism on Judicial Panels

Ideological Perfectionism on Judicial Panels Ideological Perfectionism on Judicial Panels Daniel L. Chen (ETH) and Moti Michaeli (EUI) and Daniel Spiro (UiO) Chen/Michaeli/Spiro Ideological Perfectionism 1 / 46 Behavioral Judging Formation of Normative

More information

Economics 470 Some Notes on Simple Alternatives to Majority Rule

Economics 470 Some Notes on Simple Alternatives to Majority Rule Economics 470 Some Notes on Simple Alternatives to Majority Rule Some of the voting procedures considered here are not considered as a means of revealing preferences on a public good issue, but as a means

More information

The 2017 TRACE Matrix Bribery Risk Matrix

The 2017 TRACE Matrix Bribery Risk Matrix The 2017 TRACE Matrix Bribery Risk Matrix Methodology Report Corruption is notoriously difficult to measure. Even defining it can be a challenge, beyond the standard formula of using public position for

More information

A Cluster-Based Approach for identifying East Asian Economies: A foundation for monetary integration

A Cluster-Based Approach for identifying East Asian Economies: A foundation for monetary integration A Cluster-Based Approach for identifying East Asian Economies: A foundation for monetary integration Hazel Yuen a, b a Department of Economics, National University of Singapore, email:hazel23@singnet.com.sg.

More information

A Retrospective Study of State Aid Control in the German Broadband Market

A Retrospective Study of State Aid Control in the German Broadband Market A Retrospective Study of State Aid Control in the German Broadband Market Tomaso Duso 1 Mattia Nardotto 2 Jo Seldeslachts 3 1 DIW Berlin, TU Berlin, Berlin Centre for Consumer Policies, CEPR, and CESifo

More information

Social Rankings in Human-Computer Committees

Social Rankings in Human-Computer Committees Social Rankings in Human-Computer Committees Moshe Bitan 1, Ya akov (Kobi) Gal 3 and Elad Dokow 4, and Sarit Kraus 1,2 1 Computer Science Department, Bar Ilan University, Israel 2 Institute for Advanced

More information

IDENTIFYING FAULT-PRONE MODULES IN SOFTWARE FOR DIAGNOSIS AND TREATMENT USING EEPORTERS CLASSIFICATION TREE

IDENTIFYING FAULT-PRONE MODULES IN SOFTWARE FOR DIAGNOSIS AND TREATMENT USING EEPORTERS CLASSIFICATION TREE IDENTIFYING FAULT-PRONE MODULES IN SOFTWARE FOR DIAGNOSIS AND TREATMENT USING EEPORTERS CLASSIFICATION TREE Bassey. A. Ekanem 1, Nseabasi Essien 2 1 Department of Computer Science, Delta State Polytechnic,

More information

The Seventeenth Amendment, Senate Ideology, and the Growth of Government

The Seventeenth Amendment, Senate Ideology, and the Growth of Government The Seventeenth Amendment, Senate Ideology, and the Growth of Government Danko Tarabar College of Business and Economics 1601 University Ave, PO BOX 6025 West Virginia University Phone: 681-212-9983 datarabar@mix.wvu.edu

More information

Political Economics II Spring Lectures 4-5 Part II Partisan Politics and Political Agency. Torsten Persson, IIES

Political Economics II Spring Lectures 4-5 Part II Partisan Politics and Political Agency. Torsten Persson, IIES Lectures 4-5_190213.pdf Political Economics II Spring 2019 Lectures 4-5 Part II Partisan Politics and Political Agency Torsten Persson, IIES 1 Introduction: Partisan Politics Aims continue exploring policy

More information

Agent Modeling of Hispanic Population Acculturation and Behavior

Agent Modeling of Hispanic Population Acculturation and Behavior Agent of Hispanic Population Acculturation and Behavior Agent Modeling of Hispanic Population Acculturation and Behavior Lyle Wallis Dr. Mark Paich Decisio Consulting Inc. 201 Linden St. Ste 202 Fort Collins

More information

A GENERAL TYPOLOGY OF PERSONAL NETWORKS OF IMMIGRANTS WITH LESS THAN 10 YEARS LIVING IN SPAIN

A GENERAL TYPOLOGY OF PERSONAL NETWORKS OF IMMIGRANTS WITH LESS THAN 10 YEARS LIVING IN SPAIN 1 XXIII International Sunbelt Social Network Conference 14-16th, February, Cancún (México) A GENERAL TYPOLOGY OF PERSONAL NETWORKS OF IMMIGRANTS WITH LESS THAN 10 YEARS LIVING IN SPAIN Isidro Maya Jariego

More information

* Source: Part I Theoretical Distribution

* Source:   Part I Theoretical Distribution Problem: A recent report from Pew Research Center (September 14, 2018) discussed key finding about U.S. immigrants. One result was that Mexico is the top origin country of the U.S. immigrant population.

More information

Structural Folds: Generative Disruption in Overlapping Groups. Balázs Vedres David Stark

Structural Folds: Generative Disruption in Overlapping Groups. Balázs Vedres David Stark Structural Folds: Generative Disruption in Overlapping Groups Balázs Vedres David Stark Columbia University Central European University Santa Fe Institute AJS, January 2010: Vedres, Balázs, and David Stark.

More information

THE PRIMITIVES OF LEGAL PROTECTION AGAINST DATA TOTALITARIANISMS

THE PRIMITIVES OF LEGAL PROTECTION AGAINST DATA TOTALITARIANISMS THE PRIMITIVES OF LEGAL PROTECTION AGAINST DATA TOTALITARIANISMS Mireille Hildebrandt Research Professor at Vrije Universiteit Brussel (Law) Parttime Full Professor at Radboud University Nijmegen (CS)

More information
