Combating Friend Spam Using Social Rejections

Qiang Cao (Duke University), Michael Sirivianos (Cyprus Univ. of Technology), Xiaowei Yang (Duke University), Kamesh Munagala (Duke University)

Friend spam in online social networks (OSNs)
- Friend spam: unwanted friend requests (e.g., sent by a fake account)
  - Degrades user experience (e.g., annoying)
  - Introduces false OSN links

False OSN links are harmful
- Pollute the underlying social graph
  - Detrimental to social search and online ad targeting
  - Jeopardize online privacy and safety

False OSN links undermine the effectiveness of Sybil defense
- The defense relies on genuine social links
  - SybilLimit [S&P '08], SybilRank [NSDI '12]
  - The number of undetected Sybils (fake accounts) is bounded by O(log |V|) per link between Sybils and legitimate users
[Figure: OSN links between the non-Sybil region and the Sybil region]

Existing countermeasures
- Privacy settings for OSN users
  - Restrict requests to friends of friends only
  - Detract from the openness of the OSN
- Spam request filtering using machine learning (ML)
  - Facebook Immune System [SNS '11]
  - Individual user features are manipulable

Rejecto: Combating friend spam using social rejections

Observation: the cost of connecting to real users
- False OSN links come with social rejections
  - Social rejections: rejected, ignored, and reported requests
  - Spam requests are less likely to be accepted
[Figure: friend spammers sending requests to legitimate users and receiving many rejections]

Live fake accounts in the wild
- Each has a significant number of pending requests
  - Fake Facebook accounts from an underground market
  - More measurement results in the paper
[Figure: number of requests (pending requests vs. friends) per anonymized fake account ID]

How reliable is social rejection?
- Attackers inevitably trigger rejections
  - Disproportionately large number of accounts and requests
  - Requests inevitably hit cautious users
- Rejection of innocent users is non-manipulable
  - A rejection is guarded by a feedback loop between the request sender and the receiver
  - Legitimate users rarely receive rejections
  - Fundamentally different from negative ratings on online services (e.g., YouTube)

Challenges in using social rejection
- Attack strategies
  - Collusion: fake accounts collude to accept each other's requests
    - Arbitrarily boosts the request acceptance rate of an individual account
  - Self-rejection: fake accounts mimic legitimate users by rejecting other fake accounts
    - Whitewashes the fake accounts that do the rejecting
- System challenge
  - Gigantic user base with an enormous number of requests and rejections
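
To make the collusion problem concrete, here is a minimal sketch of a per-account acceptance rate and of how colluding fakes can inflate it. The `(sender, receiver, accepted)` record layout, the account names, and the counts are assumptions made up for this illustration; they are not taken from the talk.

```python
def acceptance_rate(account, requests):
    """Fraction of `account`'s outgoing requests that were accepted.

    `requests` holds (sender, receiver, accepted) tuples; this record
    layout is a hypothetical one chosen for the example.
    """
    outcomes = [accepted for sender, _, accepted in requests if sender == account]
    return sum(outcomes) / len(outcomes) if outcomes else None


# Spammer "s" sends 10 requests to legitimate users; only 2 are accepted.
requests = [("s", f"user{i}", i < 2) for i in range(10)]
rate_before = acceptance_rate("s", requests)

# Collusion: "s" also sends 40 requests to fellow fakes, who accept them all.
requests += [("s", f"fake{i}", True) for i in range(40)]
rate_after = acceptance_rate("s", requests)

print(rate_before, rate_after)  # 0.2 0.84
```

Any detector that thresholds this individual rate can be defeated by adding more colluding acceptances, which is why a per-account feature is not enough.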

Rejecto in a nutshell
- A strategy-proof formulation
  - Graph cut on a rejection-augmented social graph
  - Low aggregate acceptance rate of the requests from spammers to legitimate users
- An effective and near-linear algorithm
  - Based on the Kernighan-Lin (KL) algorithm [The Bell System Technical Journal, 1970]
- A scalable implementation
  - Layered on top of Apache Spark [Zaharia et al., NSDI '12]

Outline
- Key insight
- System design
- Evaluation

Rejecto's formulation of spammer detection
- Main idea: put spamming accounts into groups
- Aggregate acceptance rate (AAR) of the requests from a group S to the remaining users H:

  $$\mathrm{AAR} = \frac{|F(H, S)|}{|F(H, S)| + |\vec{R}(H, S)|}$$

  where F(H, S) is the set of friendship links between H and S, and \vec{R}(H, S) is the set of rejections cast by users in H on requests from S
- Fake accounts cannot arbitrarily improve AAR
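
The AAR can be sketched in a few lines. The example below assumes friendships are stored as undirected pairs and rejections as directed `(rejecter, rejected)` pairs; that representation, the function name, and the toy accounts are illustrative choices, not the authors' implementation. It shows why links and rejections inside the suspect group leave the AAR unchanged.

```python
def aggregate_acceptance_rate(suspects, friendships, rejections):
    """AAR of the requests sent from `suspects` (S) to everyone else (H).

    friendships: undirected pairs (u, v), i.e. accepted requests.
    rejections:  directed pairs (rejecter, rejected).
    Only edges that cross the cut between H and S are counted.
    """
    suspects = set(suspects)
    accepted = sum(1 for u, v in friendships if (u in suspects) != (v in suspects))
    rejected = sum(
        1 for src, dst in rejections if src not in suspects and dst in suspects
    )
    total = accepted + rejected
    return accepted / total if total else None


# Legitimate users a, b, c; fake accounts x, y (all names are made up).
friendships = [("a", "b"), ("b", "c"), ("a", "x")]
rejections = [("b", "x"), ("c", "x"), ("a", "y"), ("c", "y")]
fakes = {"x", "y"}

baseline = aggregate_acceptance_rate(fakes, friendships, rejections)

# Collusion: the fakes befriend each other. The new link stays inside S.
colluding = friendships + [("x", "y")]
with_collusion = aggregate_acceptance_rate(fakes, colluding, rejections)

# Self-rejection: x rejects y. The rejection also stays inside S.
self_rejecting = rejections + [("x", "y")]
with_self_rejection = aggregate_acceptance_rate(fakes, colluding, self_rejecting)

print(baseline, with_collusion, with_self_rejection)  # 0.2 0.2 0.2
```

Only edges that cross the cut enter the ratio, so the fakes can change it only by getting real users to accept more requests or reject fewer.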

Spam requests lead to a low aggregate acceptance rate
- Lower than that of the requests from a set of legitimate users
  - Spam requests are less likely to be accepted
- Spam requests yield a cut with a small AAR

A graph cut model
- Augments a social graph with rejections
  - Directed rejection edges
- Finds the cut with the minimum aggregate acceptance rate (MAAR)
  - Graph partitioning based on requests and rejections
- Iteratively cuts off groups of suspicious accounts
  - Prunes their links and rejections from the social graph
- Immune to collusion and self-rejection strategies
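
As an illustration of what "the cut with the minimum aggregate acceptance rate" means, the sketch below finds it by exhaustive search on a six-node toy graph. Exhaustive search is a stand-in chosen for clarity and only works on tiny inputs; it is not the algorithm used by Rejecto. The graph, the names, and the helper functions are assumptions.

```python
from itertools import combinations


def aar(suspects, friendships, rejections):
    """Aggregate acceptance rate across the cut (V - suspects, suspects)."""
    f = sum(1 for u, v in friendships if (u in suspects) != (v in suspects))
    r = sum(1 for src, dst in rejections if src not in suspects and dst in suspects)
    return f / (f + r) if f + r else None


def brute_force_maar_cut(nodes, friendships, rejections):
    """Try every non-trivial subset as the suspect side; keep the lowest AAR."""
    best_set, best_rate = None, None
    ordered = sorted(nodes)
    for size in range(1, len(ordered)):
        for subset in combinations(ordered, size):
            suspects = set(subset)
            rate = aar(suspects, friendships, rejections)
            if rate is not None and (best_rate is None or rate < best_rate):
                best_set, best_rate = suspects, rate
    return best_set, best_rate


def prune(suspects, friendships, rejections):
    """Drop every link and rejection that touches a removed account."""
    kept_f = [(u, v) for u, v in friendships if u not in suspects and v not in suspects]
    kept_r = [(s, d) for s, d in rejections if s not in suspects and d not in suspects]
    return kept_f, kept_r


# Legitimate users a-d, colluding fakes x and y (toy data).
nodes = {"a", "b", "c", "d", "x", "y"}
friendships = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c"),
               ("a", "x"), ("x", "y")]
rejections = [("b", "x"), ("c", "x"), ("d", "x"), ("b", "y"), ("c", "y"),
              ("b", "d")]  # one rejection between legitimate users

cut, rate = brute_force_maar_cut(nodes, friendships, rejections)
remaining_friendships, remaining_rejections = prune(cut, friendships, rejections)
print(sorted(cut), round(rate, 3))  # ['x', 'y'] 0.167
```

After pruning, the same search could be repeated on the remaining graph; when to stop iterating is not specified on the slide, so no stopping rule is shown here.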

Finding the MAAR cut is challenging
- The MAAR cut problem is NP-hard
  - Reduced from the MIN-RATIO-CUT problem [Leighton & Rao, JACM '99]
  - Detailed reduction in the paper
- Existing work on cut-based problems in undirected graphs
  - State of the art: O(log |V|) approximation algorithms with complexity of O(|V|^2) [Madry, FOCS '10]
  - The approximation factor O(log |V|) is too loose
  - O(|V|^2) complexity is prohibitive
  - Does not support parallel graph processing

Our approach: an effective and efficient search algorithm
- Finds a MAAR cut by interchanging misplaced nodes
  - Based on the Kernighan-Lin (KL) algorithm
  - O(|V|) complexity
  - Can scale up to multimillion-node social graphs

A primer on the Kernighan-Lin (KL) algorithm
- Searches for a balanced cut in undirected graphs
  - Minimizes the number of cross-partition edges
  - Reduces cross-partition edges by swapping nodes
  - Fiduccia et al. improved the complexity to O(|V|) [DAC '82]
  - Widely used in VLSI layout design
- How to use KL to find the MAAR cut?
  - Additional directed rejection edges
  - Non-linear MAAR objective function
[Figure: a graph partitioned into U and V-U]
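
For readers who have not seen KL before, the following is a minimal sketch of one classic KL pass on an unweighted, undirected graph: pick the best unlocked pair to swap, lock it, repeat, then keep the best prefix of swaps. It is a plain textbook-style version written for this transcript, without the linear-time bucket structure of Fiduccia et al. and without any of Rejecto's extensions. All names are assumptions.

```python
def cut_size(edges, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])


def kl_pass(edges, side):
    """One Kernighan-Lin pass. `side` maps node -> 0 or 1; returns a new mapping."""
    adj = {v: set() for v in side}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    work = dict(side)  # tentative assignment, updated as pairs are swapped

    def d(v):
        # External minus internal degree of v under the tentative assignment.
        external = sum(1 for n in adj[v] if work[n] != work[v])
        return external - (len(adj[v]) - external)

    unlocked = set(work)
    swaps, gains = [], []
    while True:
        left = [v for v in unlocked if work[v] == 0]
        right = [v for v in unlocked if work[v] == 1]
        if not left or not right:
            break
        # Gain of swapping a and b: D[a] + D[b] - 2 * (1 if a-b is an edge).
        gain, a, b = max(
            ((d(a) + d(b) - 2 * (b in adj[a]), a, b) for a in left for b in right),
            key=lambda t: t[0],
        )
        work[a], work[b] = 1, 0
        unlocked -= {a, b}
        swaps.append((a, b))
        gains.append(gain)

    # Keep the prefix of swaps with the largest positive cumulative gain.
    best_k, best_total, total = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        total += g
        if total > best_total:
            best_k, best_total = k, total

    result = dict(side)
    for a, b in swaps[:best_k]:
        result[a], result[b] = 1, 0
    return result


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3,
# starting from a poor balanced split with nodes 2 and 3 misplaced.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
refined = kl_pass(edges, start)
print(cut_size(edges, start), cut_size(edges, refined))  # 5 1
```

Because nodes are swapped in pairs, the two sides keep their sizes, which is the "balanced cut" property mentioned on the slide.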

Transforming the MAAR cut problem
- Convert to a set of bipartition problems
  - Each with a parameterized linear objective function
  - Rejections and social links can be unified
- The non-linear MAAR objective

  $$\min_S \frac{|F(V - S, S)|}{|F(V - S, S)| + |\vec{R}(V - S, S)|}$$

  becomes, for each value of the parameter k,

  $$\min_S \; |F(V - S, S)| - k\,|\vec{R}(V - S, S)|$$

- Solvable by KL after unifying the rejections and OSN links according to the parameter k
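
The effect of the parameter k can be seen on a toy graph. The sketch below solves the linear problem "minimize (crossing friendships) minus k times (crossing rejections)" for a few values of k and then keeps the resulting cut with the lowest AAR. Two things are assumptions made for illustration: the inner solver is an exhaustive search standing in for KL, and the grid of k values is arbitrary; how Rejecto actually picks k is not stated on the slide.

```python
from itertools import combinations


def crossing(suspects, friendships, rejections):
    """Counts of friendships and of rejections (H -> S) that cross the cut."""
    f = sum(1 for u, v in friendships if (u in suspects) != (v in suspects))
    r = sum(1 for src, dst in rejections if src not in suspects and dst in suspects)
    return f, r


def solve_for_k(k, nodes, friendships, rejections):
    """Minimize the linear objective f - k * r by exhaustive search."""
    def objective(suspects):
        f, r = crossing(suspects, friendships, rejections)
        return f - k * r

    ordered = sorted(nodes)
    candidates = [
        set(c) for size in range(1, len(ordered)) for c in combinations(ordered, size)
    ]
    return min(candidates, key=objective)


# Legitimate users a-d plus a low-degree legitimate user p; fakes x and y.
nodes = {"a", "b", "c", "d", "p", "x", "y"}
friendships = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c"),
               ("a", "p"), ("a", "x"), ("b", "y"), ("x", "y")]
rejections = [("b", "x"), ("c", "x"), ("d", "x"),
              ("a", "y"), ("c", "y"), ("d", "y")]

results = {}
for k in (0.1, 0.5, 2.0):  # arbitrary grid, for illustration only
    suspects = solve_for_k(k, nodes, friendships, rejections)
    f, r = crossing(suspects, friendships, rejections)
    results[k] = (suspects, f / (f + r))

best_set, best_aar = min(results.values(), key=lambda item: item[1])
for k, (suspects, rate) in results.items():
    print(k, sorted(suspects), rate)
print(sorted(best_set), best_aar)  # ['x', 'y'] 0.25
```

With k too small the linear problem just cuts off the weakly connected user p, whose AAR is 1.0; with k at or above 0.5 it isolates the two fakes. This is why the transformation produces a family of problems rather than a single one.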

Why can we do the transformation?
- The MAAR cut is an optimal solution to one of the converted family of bipartition problems
  - The converted problem is determined by the MAAR cut ratio k*
- Theorem: In a rejection-augmented social graph, if the cut $C^* = (\bar{U}^*, U^*)$ is the minimum aggregate acceptance rate (MAAR) cut, and

  $$\frac{|F(\bar{U}^*, U^*)|}{|\vec{R}(\bar{U}^*, U^*)|} = k^* \quad (k^* > 0),$$

  then $C^*$ is the optimal solution to the bipartition problem that minimizes $|F(\bar{U}, U)| - k^*\,|\vec{R}(\bar{U}, U)|$.
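
A short sketch of why the theorem is plausible, using only the definitions on the slides. The shorthand f(U) and r(U) is introduced here for brevity and is not the authors' notation; the full proof is in the paper.

```latex
% Shorthand introduced for this sketch (not from the slides):
%   f(U) = number of friendship links crossing the cut around U
%   r(U) = number of rejections cast on U by users outside U
% U^* denotes the suspect side of the MAAR cut, with f(U^*)/r(U^*) = k^* > 0.
\begin{align*}
\mathrm{AAR}(U) &= \frac{f(U)}{f(U) + r(U)} = \frac{1}{1 + r(U)/f(U)}
  && \text{so minimizing AAR means minimizing } f(U)/r(U), \\
\frac{f(U)}{r(U)} &\ge \frac{f(U^*)}{r(U^*)} = k^*
  && \text{for every cut with } r(U) > 0, \\
f(U) - k^*\, r(U) &\ge 0 = f(U^*) - k^*\, r(U^*)
  && \text{for every cut, including those with } r(U) = 0.
\end{align*}
```

The first line assumes f(U) > 0, which holds for every cut with r(U) > 0 because the minimum AAR is positive when k* > 0. The last line says the MAAR cut attains the smallest possible value, zero, of the linear objective with parameter k*.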

Optimization and implementation
- Supports seed pre-placement to reduce false positives
  - Seeds of both legitimate users and spamming accounts
- Prototype on Apache Spark
  - Distributes the large social graph to workers
  - Keeps only a tractable set of algorithm states on the master

Evaluation
- Extensive simulations on real social networks
  - Sensitivity analysis
  - Resilience to attack strategies
  - Comparison with VoteTrust
- Simulations under Sybil attack
  - Defense in depth with social-graph-based Sybil defense
- A Rejecto prototype on an Amazon EC2 cluster
  - Performance analysis on large graph processing

Rejecto is insensitive to spam request volume
- Request flooding attacks on a Facebook sample graph
  - Fake accounts connect with each other as normal users do
- Rejecto uncovers fakes behind the actively spamming ones
[Figure: precision/recall of Rejecto and VoteTrust vs. number of requests per fake account (5 to 50). Two panels: (1) all fake accounts send out spam requests; (2) only half of the fake accounts send out spam requests.]

Rejecto is resilient to attack strategies
- Our MAAR cut model is immune to manipulation
[Figure: precision/recall of Rejecto and VoteTrust. Two panels: (1) collusion strategy to form dense connections among fake accounts, plotted against the number of non-attack edges per fake account; (2) self-rejection strategy to let half of the fakes reject the rest as legitimate users do, plotted against the self-rejection rate among fake accounts.]

Rejecto and social-graph-based Sybil detection form a defense in depth
- Rejecto makes it hard for fakes to get additional links
  - Defense in depth with SybilRank
[Figure: area under the ROC curve vs. number of accounts removed by Rejecto (1000 to 5000) on the Facebook and ca-AstroPh graphs, with the improvement annotated]

Rejecto can handle multimillion-user social graphs
- Performance on an EC2 cluster
  - Spark 0.9.2
  - 5 c3.8xlarge VMs
  - A larger cluster yields better performance
- Execution time grows gracefully with the graph size

# Users          0.5M     1M       2M        5M        10M
# Edges          ~8M      ~16M     ~32M      ~80M      ~160M
Execution time   288 sec  669 sec  1767 sec  8049 sec  7.7 hours
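
The reported timings can be normalized to make the growth easier to read. The snippet below only re-expresses the numbers from the table above; converting "7.7 hours" to 7.7 x 3600 seconds is an assumption about rounding, and the variable names are illustrative.

```python
# Figures copied from the slide's table (users in millions, time in seconds).
users_millions = [0.5, 1, 2, 5, 10]
seconds = [288, 669, 1767, 8049, 7.7 * 3600]

# Execution time normalized by graph size.
seconds_per_million_users = [t / u for t, u in zip(seconds, users_millions)]

# For each consecutive pair of runs: (growth in users, growth in time).
growth = [
    (users_millions[i + 1] / users_millions[i], seconds[i + 1] / seconds[i])
    for i in range(len(seconds) - 1)
]

for u, s in zip(users_millions, seconds_per_million_users):
    print(f"{u}M users: {s:.1f} s per million users")
for size_factor, time_factor in growth:
    print(f"graph x{size_factor:.1f} -> time x{time_factor:.2f}")
```

On this fixed five-VM cluster the per-user cost goes from roughly 576 to roughly 2772 seconds per million users across the table, which is the trend the "larger cluster yields better performance" remark addresses.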

Conclusion
- Rejecto uncovers friend spammers using social rejections
  - Immune to attack strategies
  - Efficient
  - Scalable