Combating Friend Spam Using Social Rejections

Qiang Cao (Duke University), Michael Sirivianos (Cyprus Univ. of Technology), Xiaowei Yang (Duke University), Kamesh Munagala (Duke University)

Friend spam in online social networks (OSNs)
Friend spam: unwanted friend requests
- Degrades user experience (e.g., annoying)
- Introduces false OSN links
(Figure label: fake account)

False OSN links are harmful
Pollute the underlying social graph
- Detrimental to social search and online ad targeting
- Jeopardize online privacy and safety

False OSN links undermine the effectiveness of Sybil defense
The defense relies on genuine social links
- SybilLimit [S&P 08], SybilRank [NSDI 12]
- The number of undetected Sybils (fake accounts) is bounded to O(log |V|) per link between Sybils and legitimate users
(Figure: OSN links between the non-Sybil region and the Sybil region)

Existing countermeasures
Privacy settings for OSN users
- Restrict requests to friends of friends
- Detract from the openness of the OSN
Spam request filtering using machine learning (ML)
- Facebook Immune System [SNS 11]
- Individual user features are manipulable

Rejecto: Combating friend spam using social rejections

Observation: the cost of connecting to real users
False OSN links come with social rejections
- Social rejections: rejected, ignored, and reported requests
- Spam requests are less likely to be accepted
(Figure: legitimate users and friend spammers)

Live fake accounts in the wild
Each has a significant number of pending requests
- Fake Facebook accounts from an underground market
- More measurement results in the paper
(Figure: number of pending requests and friends for each anonymized fake account; y-axis: number of requests, 0 to 120; x-axis: anonymized fake account ID, 0 to 40)

How reliable is social rejection?
Attackers inevitably trigger rejections
- Disproportionately large number of accounts and requests
- Requests inevitably hit cautious users
Rejection of innocent users is non-manipulable
- A rejection is guarded by a feedback loop between the request sender and the receiver
- Legitimate users rarely receive rejections
- Fundamentally different from negative ratings on online services (e.g., YouTube)

Challenges in using social rejection
Attack strategies
- Collusion: fake accounts collude to accept requests
- Arbitrarily boosts the request acceptance rate of an individual account
- Self-rejection: mimic legitimate users rejecting others
- Whitewashes the part of the fake accounts that does the rejecting
System challenge
- Gigantic user base with enormous numbers of requests and rejections
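The collusion strategy above can be illustrated with a minimal Python sketch. All counts are hypothetical and chosen only for illustration; the point is that a per-account acceptance rate can be pushed up at will by accepted requests among fake accounts.

```python
def acceptance_rate(accepted: int, rejected: int) -> float:
    """Fraction of an account's resolved friend requests that were accepted."""
    total = accepted + rejected
    return accepted / total if total else 0.0

# Hypothetical fake account: 20 requests to real users, 2 accepted, 18 rejected.
rate_alone = acceptance_rate(accepted=2, rejected=18)

# Collusion: 80 other fake accounts (hypothetical number) accept its requests.
rate_colluding = acceptance_rate(accepted=2 + 80, rejected=18)

print(f"without collusion: {rate_alone:.2f}")      # 0.10
print(f"with collusion:    {rate_colluding:.2f}")  # 0.82
```

A detector that thresholds this individual rate would be easy to evade, which motivates a group-level measure.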

Rejecto in a nutshell
A strategy-proof formulation
- Graph cut on a rejection-augmented social graph
- Low aggregate acceptance rate of the requests from spammers to legitimate users
An effective and near-linear algorithm
- Based on the Kernighan-Lin (KL) algorithm [The Bell System Technical Journal, 1970]
A scalable implementation
- Layered on top of Apache Spark [Zaharia et al., NSDI 12]

Outline
- Key insight
- System design
- Evaluation

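Rejecto's formulation of spammer detection
Main idea: put spamming accounts into groups
Aggregate acceptance rate (AAR) of the requests from a group S to the rest of the users H: $\frac{F(H, S)}{F(H, S) + \vec{R}(H, S)}$, where $F(H, S)$ counts the OSN links (accepted requests) between H and S, and $\vec{R}(H, S)$ counts the rejections cast by H on requests from S
Fake accounts cannot arbitrarily improve AAR
(Figure: the cut between H and S)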
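As a rough illustration of the AAR, here is a small Python sketch. The data layout (friendships as undirected pairs, rejections as `(rejecter, sender)` pairs), the function name `aar`, and the toy accounts are assumptions made for this example, not taken from the talk. It shows that links and rejections placed entirely inside the group S leave the AAR unchanged.

```python
def aar(friendships, rejections, S):
    """Aggregate acceptance rate of requests from group S to everyone else.

    friendships: iterable of undirected (u, v) pairs (accepted requests).
    rejections:  iterable of directed (rejecter, sender) pairs.
    Returns None when no link or rejection crosses the cut.
    """
    S = set(S)
    # Friendships with exactly one endpoint in S cross the cut.
    f = sum(1 for u, v in friendships if (u in S) != (v in S))
    # Only rejections cast from outside S on requests sent by S count.
    r = sum(1 for rejecter, sender in rejections
            if rejecter not in S and sender in S)
    return f / (f + r) if (f + r) else None

# Toy data: a, b, c are legitimate users; x, y are fake accounts.
friendships = [("a", "b"), ("b", "c"), ("a", "x")]
rejections = [("b", "x"), ("c", "x"), ("a", "y"), ("b", "y")]
S = {"x", "y"}

base = aar(friendships, rejections, S)
# Collusion: x and y befriend each other. The link stays inside S.
with_collusion = aar(friendships + [("x", "y")], rejections, S)
# Self-rejection: x rejects y. The rejection stays inside S.
with_self_rejection = aar(friendships, rejections + [("x", "y")], S)

print(base, with_collusion, with_self_rejection)  # 0.2 0.2 0.2
```

Only the one accepted attack link and the four rejections from legitimate users enter the ratio, so the group cannot dilute it from within.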

Spam requests lead to a low aggregate acceptance rate
Lower than the requests from a set of legitimate users
- Spam requests are less likely to be accepted
The result is a cut with a small AAR

A graph cut model
Augments a social graph with rejections
- Directed rejection edges
Finds the cut with the minimum aggregate acceptance rate (MAAR)
- Graph partitioning based on requests and rejections
Iteratively cuts off groups of suspicious accounts
- Prunes their links and rejections from the social graph
Immune to collusion and self-rejection strategies
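To make the MAAR objective concrete, the following Python sketch enumerates every cut of a tiny rejection-augmented graph and keeps the one with the lowest AAR. Brute force is exponential and is used here only to show the objective on a toy input; it is not the search algorithm Rejecto uses. The graph, the node names, and the function names are assumptions made for this example, and the iterative pruning step is not shown.

```python
from fractions import Fraction
from itertools import combinations

def cut_counts(friendships, rejections, S):
    """Return (F, R): crossing friendships and rejections cast on S from outside."""
    f = sum(1 for u, v in friendships if (u in S) != (v in S))
    r = sum(1 for rejecter, sender in rejections
            if rejecter not in S and sender in S)
    return f, r

def brute_force_maar(nodes, friendships, rejections):
    """Try every non-empty proper subset S and return the minimum-AAR cut."""
    best_aar, best_S = None, None
    for size in range(1, len(nodes)):
        for subset in combinations(sorted(nodes), size):
            S = frozenset(subset)
            f, r = cut_counts(friendships, rejections, S)
            if f + r == 0:
                continue  # AAR is undefined when nothing crosses the cut
            value = Fraction(f, f + r)
            if best_aar is None or value < best_aar:
                best_aar, best_S = value, S
    return best_S, best_aar

# Toy graph: a-d are legitimate users, x-z are colluding fake accounts.
nodes = {"a", "b", "c", "d", "x", "y", "z"}
friendships = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c"),
               ("x", "y"), ("y", "z"), ("x", "z"),  # collusion links
               ("d", "x")]                          # one accepted spam request
rejections = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "y"),
              ("c", "z"), ("d", "z"), ("a", "z")]

suspects, value = brute_force_maar(nodes, friendships, rejections)
print(sorted(suspects), value)  # ['x', 'y', 'z'] 1/9
```

On this input the colluding triangle is the unique minimum: one accepted link against eight rejections gives an AAR of 1/9.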

Finding the MAAR cut is challenging
MAAR cut is NP-hard
- Reduced from the MIN-RATIO-CUT problem [Leighton & Rao, JACM 99]
- Detailed reduction in the paper
Existing work on cut-based problems in undirected graphs
- State of the art: O(log |V|) approximation algorithms with complexity of O(|V|^2) [Madry, FOCS 10]
- The approximation factor O(log |V|) is too loose
- O(|V|^2) complexity is prohibitive
- Do not support parallel graph processing

Our approach: an effective and efficient search algorithm
Finds a MAAR cut by interchanging misplaced nodes
- Based on the Kernighan-Lin (KL) algorithm
- O(|V|) complexity
- Can scale up to multimillion-node social graphs

A primer on the Kernighan-Lin (KL) algorithm
Searches for a balanced cut in undirected graphs
- Minimizes the number of cross-partition edges
- Reduces cross-partition edges by swapping nodes
- Fiduccia et al. improved it to O(|V|) [DAC 82]
- Widely used in VLSI layout design
(Figure: a graph partitioned into U and V - U)
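The node-swapping idea can be sketched in a few lines of Python. This is a simplified variant that repeatedly applies the single best positive-gain swap; the full KL algorithm additionally locks swapped nodes and picks the best prefix of a whole pass, and the Fiduccia et al. refinement uses single-node moves with bucket structures. The function names and the example graph are assumptions made for this illustration.

```python
from itertools import product

def cut_size(adj, A):
    """Number of edges with exactly one endpoint in A."""
    return sum(1 for u in A for v in adj[u] if v not in A)

def d_value(adj, A, B, node):
    """KL 'D' value: external edges minus internal edges of a node."""
    own, other = (A, B) if node in A else (B, A)
    external = sum(1 for v in adj[node] if v in other)
    internal = sum(1 for v in adj[node] if v in own)
    return external - internal

def greedy_swaps(adj, A, B):
    """Swap the best pair (a in A, b in B) while a swap still shrinks the cut."""
    A, B = set(A), set(B)
    while True:
        best_gain, best_pair = 0, None
        for a, b in product(sorted(A), sorted(B)):
            # Classic KL gain of swapping a and b: D_a + D_b - 2 * w(a, b).
            gain = (d_value(adj, A, B, a) + d_value(adj, A, B, b)
                    - 2 * (b in adj[a]))
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return A, B
        a, b = best_pair
        A.remove(a); B.add(a)
        B.remove(b); A.add(b)

# Two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
       4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

# Start from a poor balanced partition with nodes 3 and 4 misplaced.
A_final, B_final = greedy_swaps(adj, {1, 2, 4}, {3, 5, 6})
print(sorted(A_final), sorted(B_final), cut_size(adj, A_final))
# [1, 2, 3] [4, 5, 6] 1
```

Swapping keeps both sides the same size, which is why KL searches for balanced cuts.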

Transforming the MAAR cut problem
Convert to a set of bipartition problems
- Each with a parameterized linear objective function
- Rejection and social links can be unified
Minimizing $\frac{F(V - S, S)}{F(V - S, S) + \vec{R}(V - S, S)}$ is converted to minimizing $F(V - S, S) - k \vec{R}(V - S, S)$
Solvable by KL after unifying the rejections and OSN links according to the parameter k
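For a fixed k, the linear objective can be attacked with KL-style node moves. The Python sketch below is a deliberately naive local search that flips the single node that lowers F - k R the most, recomputing the objective from scratch each time. It is not Rejecto's implementation: the pinned spam seed, the value k = 1, the function names, and the toy graph are all assumptions made for this example.

```python
def objective(friendships, rejections, S, k):
    """F(V - S, S) - k * R(V - S, S) for the cut defined by the set S."""
    f = sum(1 for u, v in friendships if (u in S) != (v in S))
    r = sum(1 for rejecter, sender in rejections
            if rejecter not in S and sender in S)
    return f - k * r

def local_search(nodes, friendships, rejections, k, seed_spam):
    """Greedy node moves in and out of S; seed nodes stay pinned in S."""
    S = set(seed_spam)
    current = objective(friendships, rejections, S, k)
    while True:
        best_value, best_node = current, None
        for node in sorted(nodes - set(seed_spam)):
            candidate = S ^ {node}            # flip the node's side
            if len(candidate) == len(nodes):
                continue                      # keep V - S non-empty
            value = objective(friendships, rejections, candidate, k)
            if value < best_value:
                best_value, best_node = value, node
        if best_node is None:
            return S, current                 # no single move improves
        S ^= {best_node}
        current = best_value

# Toy graph: a-d are legitimate users, x-z are colluding fake accounts.
nodes = {"a", "b", "c", "d", "x", "y", "z"}
friendships = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("a", "c"),
               ("x", "y"), ("y", "z"), ("x", "z"), ("d", "x")]
rejections = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "y"),
              ("c", "z"), ("d", "z"), ("a", "z")]

found, value = local_search(nodes, friendships, rejections, k=1, seed_spam={"x"})
print(sorted(found), value)  # ['x', 'y', 'z'] -7
```

Each friendship crossing the cut costs +1 and each counted rejection pays -k, so links and rejections are handled by one edge-weight rule, which is what lets a KL-style search apply.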

Why can we do the transformation?
The MAAR cut is an optimal solution to one of the converted family of bipartition problems
- The converted problem is determined by the MAAR cut ratio k*
Theorem: In a rejection-augmented social graph, if the cut $C^* = (U^*, \overline{U^*})$ is the minimum aggregate acceptance rate (MAAR) cut, and $\frac{F(U^*, \overline{U^*})}{\vec{R}(U^*, \overline{U^*})} = k^*$ ($k^* > 0$), then $C^*$ is the optimal solution to the bipartition problem that minimizes $F(U, \overline{U}) - k^* \vec{R}(U, \overline{U})$.
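A short argument for the theorem can be sketched as follows. This is a reconstruction under the stated definitions, not the proof from the paper; $F$ and $R$ are shorthand introduced here for the two counts of an arbitrary cut, and $F^*$, $R^*$ for those of the MAAR cut.

```latex
% Shorthand for an arbitrary cut (U, \overline{U}):
%   F = F(U, \overline{U}) >= 0,   R = \vec{R}(U, \overline{U}) >= 0.

% Step 1: for R > 0, the AAR is increasing in the ratio F / R,
% so the MAAR cut also minimizes F / R.
\[
  \frac{F}{F + R} \;=\; \frac{F/R}{F/R + 1}
\]

% Step 2: by minimality of k^* = F^* / R^*, every cut with R > 0 satisfies
\[
  \frac{F}{R} \;\ge\; k^* \quad\Longrightarrow\quad F - k^* R \;\ge\; 0 .
\]

% Step 3: a cut with R = 0 gives F - k^* R = F \ge 0 as well.

% Step 4: the MAAR cut attains the bound,
\[
  F^* - k^* R^* \;=\; F^* - \frac{F^*}{R^*}\, R^* \;=\; 0 ,
\]
% so C^* minimizes F - k^* R over all cuts.
```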

Optimization and implementation
Support seed pre-placement to reduce false positives
- Seeds of both legitimate users and spamming accounts
Prototype on Apache Spark
- Distribute the large social graph to workers
- Keep only a tractable set of algorithm states on the master

Evaluation
Extensive simulations on real social networks
- Sensitivity analysis
- Resilience to attack strategies
- Compared to VoteTrust
Simulations under Sybil attack
- In-depth defense with social-graph-based Sybil defense
A Rejecto prototype on an Amazon EC2 cluster
- Performance analysis on large graph processing

Rejecto is insensitive to spam request volume
Request flooding attacks on a Facebook sample graph
- Fake accounts connect with each other as normal users do
Rejecto uncovers fakes behind the actively spamming ones
(Figure, two panels: precision/recall of Rejecto and VoteTrust vs. number of requests per fake account, 5 to 50. Panel 1: all fake accounts send out spam requests. Panel 2: only half of the fake accounts send out spam requests.)

Rejecto is resilient to attack strategies
Our MAAR cut model is immune to manipulation
(Figure, two panels: precision/recall of Rejecto and VoteTrust. Panel 1: collusion strategy to form dense connections among fake accounts; x-axis: number of non-attack edges per fake account, 5 to 40. Panel 2: self-rejection strategy to let half of the fakes reject the rest as legitimate users do; x-axis: self-rejection rate among fake accounts, 0 to 1.)

Rejecto and social-graph-based Sybil detection form a defense in depth
Rejecto makes it hard for fakes to obtain additional links
- Defense in depth with SybilRank
(Figure: area under the ROC curve vs. number of accounts removed by Rejecto, 1000 to 5000, for the Facebook and ca-astroph graphs; annotated "Improvement")

Rejecto can handle multimillion-user social graphs
Performance on an EC2 cluster
- Spark 0.9.2
- 5 c3.8xlarge VMs
- A larger cluster yields better performance
Execution time grows gracefully with the graph size
# Users   # Edges   Execution time
0.5M      ~8M       288 sec
1M        ~16M      669 sec
2M        ~32M      1767 sec
5M        ~80M      8049 sec
10M       ~160M     7.7 hours

Conclusion
Rejecto: uncovers friend spammers using social rejections
- Immune to attack strategies
- Efficient
- Scalable