Analysis of the Reputation System and User Contributions on a Question Answering Website: StackOverflow


Analysis of the Reputation System and User Contributions on a Question Answering Website: StackOverflow. Dana Movshovitz-Attias, Yair Movshovitz-Attias, Peter Steenkiste, Christos Faloutsos. August 27, 2013, ASONAM.

Motivation: Q&A networks are gaining popularity. Most information is created by a small set of expert users. How can expert users be found and motivated? Case study: StackOverflow.

[Figure labels: User Reputation; User; Users that answered; Accepted Answer; Upvotes]

StackOverflow Analysis: In this work: analysis of the SO reputation system (expert users); participation patterns of expert and non-expert users; SVD and PageRank analysis of the SO interaction graph; prediction of influential users using their first months of activity.

StackOverflow Dataset: All actions performed in the years 2008-2012: 3.5 M questions, 6.9 M answers, 1.3 M users; 2.1 M accepted answers (62% of Q). Total votes: 5.5 M for Q, 13 M for A.

SO Reputation: Users gain reputation by participating in site activities. 2012 reputation range: 1-465K. http://stackoverflow.com/faq#reputation

SO Reputation: Assumption: reputation indicates expertise. Expert SO users: top 1% (13087 users), reputation >= 2400.
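
A top-1% cutoff such as the one above can be derived directly from the list of user reputations. The following is a minimal sketch on toy data; the function name `expert_threshold` and the synthetic reputations are illustrative assumptions, not part of the original analysis.

```python
def expert_threshold(reputations, top_fraction=0.01):
    """Return the lowest reputation among the top `top_fraction` of users."""
    ranked = sorted(reputations, reverse=True)
    # Number of users in the top slice; always keep at least one user.
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]

# Toy data: 1,000 hypothetical users with reputations 1..1000.
toy_reputations = list(range(1, 1001))
threshold = expert_threshold(toy_reputations)
experts = [r for r in toy_reputations if r >= threshold]
```

On the toy data the top 1% is the ten highest-reputation users, so every user at or above the returned threshold is labeled an expert.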

SO Reputation: 09-10: change in reputation scheme, rewarding users who provide good A rather than Q. Q upvote: +10 -> +5.

SO Reputation: Log-logistic pattern with some deviations: 1. The lower end is discretized (mixture of log-logistic functions).

SO Reputation: 2. User sharing among Stack Exchange websites: 100 rep bonus for users with rep > 200. New SO account: 101 rep. Old SO account: +100 rep.

SO Interaction Graph: Nodes = users. Edges define interactions: 1. User asked Q -> User answered. 2. User asked Q -> User answered accepted A. 3. User asked Q -> User answered upvoted A. The latter two graphs represent a more meaningful interaction, since the answerer is acknowledged for providing useful information.
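
The three interaction graphs can be built in one pass over the answer records. This is a minimal sketch, assuming a hypothetical record layout `(question_id, asker, answerer, is_accepted, upvotes)` and assuming that repeated interactions are counted as edge weights; neither detail is stated in the slides.

```python
from collections import defaultdict

# Hypothetical records: (question_id, asker, answerer, is_accepted, upvotes)
answers = [
    (1, "alice", "bob", True, 3),
    (1, "alice", "carol", False, 0),
    (2, "dave", "bob", False, 2),
    (3, "carol", "bob", True, 0),
]

def build_graphs(answer_records):
    """Build three weighted edge maps keyed by (asker, answerer)."""
    graphs = {
        "answered": defaultdict(int),  # any answer
        "accepted": defaultdict(int),  # accepted answers only
        "upvoted": defaultdict(int),   # answers with at least one upvote
    }
    for _qid, asker, answerer, is_accepted, upvotes in answer_records:
        edge = (asker, answerer)
        graphs["answered"][edge] += 1
        if is_accepted:
            graphs["accepted"][edge] += 1
        if upvotes > 0:
            graphs["upvoted"][edge] += 1
    return {name: dict(edges) for name, edges in graphs.items()}

graphs = build_graphs(answers)
```

Each edge points from the asker to the answerer, so the accepted and upvoted graphs are subsets of the plain answer graph.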

PageRank: Not Correlated with Reputation. PR is based on graph connectivity. PR is better correlated with degree than with reputation. The PR distribution is similar over all three interaction graphs. [Figure: six scatter plots of log(pr) against log(deg) and against log(rep), one pair each for the Answers, Accepted A, and Upvoted A graphs.]
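
Because PageRank depends only on graph connectivity, it can be computed from the edge list alone, without any reputation data. The sketch below uses plain power iteration on a toy asker -> answerer graph; the damping factor of 0.85 and the uniform handling of nodes without out-links are conventional choices assumed here, not settings reported in the slides.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Toy asker -> answerer edges (hypothetical users).
edges = [("alice", "bob"), ("alice", "carol"), ("dave", "bob"), ("carol", "bob")]
scores = pagerank(edges)
```

In this toy graph the user who answers the most distinct askers collects the highest score, which illustrates why PR tracks degree more closely than reputation.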

Explaining Anomalous Users with High PageRank: Highlighted: 5 users with high PR and rep = 1. These users had their accounts temporarily suspended for problematic behavior (e.g., serial up- or down-voting). 4/5 have high rep online and in an old SO snapshot (3K-47K). 1/5 is still suspended.

Singular Value Decomposition (SVD): The SVD of an adjacency matrix $A$ is $A = U \Sigma V^T$. Columns of $U$: left-singular vectors (eigenvectors of $A A^T$). Columns of $V$: right-singular vectors (eigenvectors of $A^T A$).

Singular Value Decomposition (SVD): Graph: User asked Q -> User answered accepted A, with adjacency matrix $A = U \Sigma V^T$. Identify anomalous questioners using the first columns of $U$ ($U_1, U_2, \ldots$). Identify anomalous answerers using the first columns of $V$ ($V_1, V_2, \ldots$).
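
To make the role of the first singular vectors concrete, the sketch below approximates the leading left and right singular vectors of a small asker-by-answerer matrix. It uses power iteration as a dependency-free stand-in for a full SVD routine, and it assumes that "anomalous" users are those with the largest components in these vectors; the slides do not spell out the selection rule, and the toy matrix is hypothetical.

```python
import math

def leading_singular_vectors(matrix, iterations=200):
    """Approximate the first left/right singular vectors and singular value
    of `matrix` by power iteration (a stand-in for a full SVD)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u = [0.0] * n_rows
    sigma = 0.0
    for _ in range(iterations):
        # u <- A v, normalized to unit length.
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- A^T u; its norm converges to the largest singular value.
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return u, sigma, v

users = ["alice", "bob", "carol", "dave"]
# Rows = askers, columns = answerers; entries = accepted answers (toy counts).
A = [
    [0, 2, 0, 0],  # alice asked, bob answered twice
    [0, 0, 1, 0],  # bob asked, carol answered
    [0, 1, 0, 0],  # carol asked, bob answered
    [1, 6, 2, 0],  # dave asked many questions
]
u1, sigma1, v1 = leading_singular_vectors(A)
top_questioner = users[u1.index(max(u1))]
top_answerer = users[v1.index(max(v1))]
```

With rows as askers and columns as answerers, the dominant entry of the left vector points at the heaviest questioner and the dominant entry of the right vector at the heaviest answerer.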

Anomalous Questioners: Have high reputation: 1K - 3K, mainly earned by asking questions. Answer-to-Question Z-score: All Users -0.04; Anomalous Questioners -9.84.

Anomalous Answerers: 29K nodes! Among the highest reputation of SO users: 194K - 465K, mainly earned for helpful (accepted) answers. Answer-to-Question Z-score: All Users -0.04; Anomalous Answerers 108.63.

User Contributions Over Time: [Figure: four panels plotted against log(# months): cumulative mean answers per month (log(# A)), cumulative mean questions per month (log(# Q)), cumulative mean upvoted answers per month (log(# upvoted A)), and cumulative mean accepted answers per month (log(# accepted A)). User groups shown: Rep >= 2400 and 1 < Rep < 400.]

Cumulative mean questions per month: Follows log-linear growth for most of a user's active time on the site, giving a predictable pattern of site usage. Expert users answer and ask more Q per month.
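
One way to check for this kind of growth is to fit a straight line to the cumulative counts on log-log axes. The sketch below assumes that "log-linear" refers to a straight line in the log(# months) versus log(count) plots; the series is synthetic and the function name is illustrative.

```python
import math

def fit_loglog_line(months, cumulative_counts):
    """Least-squares fit of log(count) = slope * log(month) + intercept."""
    xs = [math.log(m) for m in months]
    ys = [math.log(c) for c in cumulative_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic series that follows count = 3 * month**1.5 exactly.
months = list(range(1, 25))
counts = [3.0 * m ** 1.5 for m in months]
slope, intercept = fit_loglog_line(months, counts)
```

Under this reading, two user groups with similar slopes but different intercepts would grow at the same rate while one group contributes more in every month.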

Identifying Expert Users: The analysis shows that expert users contribute more to SO throughout their time on the site. This indicates that one can predict which users will become experts based on their early interaction patterns.

Identifying Expert Users: Problem statement: Given information on a user's activity on SO in the first N months, classify the user into one of two classes: expert or non-expert. Labels by reputation: Expert > 2400; Non-expert < 2400.

Experimental Setup: Filter out users who have not been active on SO for at least a year. Ground truth labels are based on the current reputation. Train/test sets are split such that the reputation r of users is distributed as follows:
% users   Min rep   Max rep
1/3       1         400
1/3       400       2400
1/3       2400      -

User Activity Model: Answers, Questions, Accepted, Upvoted, Upvotes, Comments, QA Ratio, AA Ratio, UA Ratio.
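
A feature vector along these lines could be assembled from a user's counts over the first N months. The slide does not define the three ratios or the difference between "Upvoted" and "Upvotes", so every definition below is an assumption made for illustration: QA as questions per answer, AA as the accepted share of answers, UA as the upvoted share of answers, "upvoted" as the number of answers with at least one upvote, and "upvotes" as the total upvotes received.

```python
def activity_features(questions, answers, accepted, upvoted, upvotes, comments):
    """Build a feature dict from a user's counts over the first N months.

    The three ratio definitions are assumptions, not taken from the slides.
    """
    def ratio(numerator, denominator):
        # Guard against users with no answers yet.
        return numerator / denominator if denominator else 0.0

    return {
        "answers": answers,
        "questions": questions,
        "accepted": accepted,
        "upvoted": upvoted,
        "upvotes": upvotes,
        "comments": comments,
        "qa_ratio": ratio(questions, answers),  # assumed: questions per answer
        "aa_ratio": ratio(accepted, answers),   # assumed: accepted share of answers
        "ua_ratio": ratio(upvoted, answers),    # assumed: upvoted share of answers
    }

features = activity_features(
    questions=4, answers=20, accepted=10, upvoted=15, upvotes=42, comments=7
)
```

The resulting dictionary can be turned into a fixed-order vector for whichever classifier is used.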

Results: [Figure: F-measure and ROC plots.] A. Pal, F. M. Harper, and J. A. Konstan, Exploring question selection bias to identify experts and potential experts in community question answering,
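
For reference, the F-measure on the results slide combines precision and recall for the expert class. The sketch below computes the standard balanced F-measure (F1) on made-up labels; the label strings and the example predictions are illustrative and do not reproduce any reported result.

```python
def f_measure(true_labels, predicted_labels, positive="expert"):
    """Balanced F-measure (F1) for the `positive` class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up ground truth and predictions for six users.
truth = ["expert", "expert", "expert", "non-expert", "non-expert", "non-expert"]
predicted = ["expert", "expert", "non-expert", "expert", "non-expert", "non-expert"]
score = f_measure(truth, predicted)
```

Here two of three experts are found and one non-expert is mislabeled, so precision and recall are both 2/3 and the F-measure equals 2/3 as well.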

Summary: We analyzed the SO reputation scheme. PageRank is not well correlated with user expertise but is effective in detecting anomalous users. Both experts and non-experts exhibit log-linear growth in their engagement on the site. Expert users contribute drastically more as soon as they join the site. They can be identified reliably within a month of use.