An Homophily-based Approach for Fast Post Recommendation in Microblogging Systems


Quentin Grossetti (1,2)
Supervised by Cédric du Mouza (2), Camelia Constantin (1) and Nicolas Travers (2)
(1) LIP6 - Université Pierre Marie Curie - Paris, France
(2) CEDRIC Laboratory - CNAM - Paris, France
BDA - November 2017

Introduction: Context
Growth of microblogging platforms since 2000. Activity in 2017, one figure per platform:
- 700 million messages/day
- 300 million messages/day
- 70 million publications/day
- 70 million pictures/day

Introduction: Real life examples
Finding Users of Interest in Micro-blogging Systems (EDBT 2016)

Problem
How to connect users to relevant messages?
Recommendation of messages:
- 700M new messages every day
- 300M users
- Real time

Table of contents
1. State of the art
2. Data Analysis: Topology, Retweets, Homophily
3. Approach: Similarity graph, Propagation Model
4. Experiments: Protocol, Results, Updating strategies
5. Conclusion
6. Annexes

State of the art

| Method | Reference | Pros | Cons |
|---|---|---|---|
| Content-based | [Lops (2011)] | No need for interactions | Tweets are hard to describe |
| Collaborative filtering | [Schafer (2007)] | Simple model and good results | Matrix is too large |
| Matrix factorization | [Koren (2009)] | Efficient against sparsity | Matrix grows too fast |
| Hybrid systems | [Bostandjiev (2010)] | Increase user engagement | Relationships are hard to describe |
| Random walk models | [Sharma (2016)] | Very cheap | Low memory |

State of the art: Not only recommendations
- User recommendation (topology, content-based, demographic, etc.)
- Hashtag (Bayesian model, Euclidean, etc.)
- Timeline filtering (deep learning)
Few papers address tweet recommendation, apart from Twitter's in 2016.

Data Analysis: Dataset
Updated connected component from the graph found in [Kwak (2009)].

| Characteristic | Value |
|---|---|
| No. of nodes | 2,182,867 |
| No. of edges | 325,451,980 |
| No. of tweets | 2,571,173,369 |
| Avg. out-degree | 57.8 |
| Avg. in-degree | 69.4 |
| Max out-degree | 348,595 |
| Max in-degree | 185,401 |
| Diameter | 15 |
| Average shortest path | 3.7 |

Table: Twitter dataset characteristics

Data Analysis: Topology
Figure: Twitter smallest paths distribution (x-axis: smallest path; y-axis: number of paths, log scale).
Small world with an average distance of 3.7.
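A shortest-path distribution of this kind can be obtained with breadth-first search from each source node. The following is a minimal sketch, assuming the follow graph is stored as an adjacency dictionary; the `graph` layout, the function names and the toy data are illustrative and do not come from the talk.

```python
from collections import Counter, deque

def bfs_distances(graph, source):
    """Return {node: hop count} for every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def path_length_distribution(graph):
    """Count ordered (source, target) pairs by shortest-path length."""
    counts = Counter()
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                counts[d] += 1
    return counts

# Toy directed graph (hypothetical data): u -> v -> w and u -> x
toy = {"u": ["v", "x"], "v": ["w"], "w": [], "x": []}
dist = path_length_distribution(toy)
average = sum(d * c for d, c in dist.items()) / sum(dist.values())
```

On a graph with millions of nodes, running a search from every node is expensive, so sampling the source nodes is a common compromise.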

Data Analysis: Retweets
Figure: Distribution of the number of retweets per tweet (x-axis: number of retweets, binned as 0, 1, 2-5, 6-50, 51-200, 201-500, 500+; y-axis: number of tweets, log scale).
- 1 retweet: 7%
- 2-5 retweets: 1%
- 6+ retweets: 0.2%

Data Analysis: Lifespan
Figure: Lifespan of a message (x-axis: lifespan in hours; y-axis: number of messages, log scale).
- < 1 hour: 40%
- < 3 days: 90%

Data Analysis: Homophily

| Distance | No. of users | % | Mean similarity |
|---|---|---|---|
| 1 | 3,229 | 2.65 | 0.0085 |
| 2 | 32,668 | 26.86 | 0.0014 |
| 3 | 81,645 | 67.13 | 0.0009 |
| 4 | 3,820 | 3.14 | 0.0010 |
| 5 | 43 | 0.03 | 0.0014 |
| 6 | 1 | 0 | 0.0008 |
| Impossible | 216 | 0.18 | 0.0017 |

Table: Evolution of the similarity score through distance in the network

$$\mathrm{sim}(u, v) = \frac{\sum_{i \in L_u \cap L_v} \frac{1}{\log(1 + \mathrm{pop}(i))}}{|L_u \cup L_v|} \tag{1}$$
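The similarity score of Equation (1) can be sketched in a few lines. The slide does not define its symbols, so this sketch assumes that L_u is the set of messages shared by user u, that pop(i) is the number of users who shared message i (at least 1), and that the logarithm is natural. The function name and sample data are hypothetical.

```python
import math

def similarity(shared_u, shared_v, popularity):
    """Popularity-weighted overlap of two users' shared messages (Equation 1)."""
    union = shared_u | shared_v
    if not union:
        return 0.0  # two users with no activity have no measurable similarity
    common = shared_u & shared_v
    # Rare messages weigh more: each common message contributes 1 / log(1 + pop).
    weighted = sum(1.0 / math.log(1 + popularity[i]) for i in common)
    return weighted / len(union)

# Hypothetical data: message id -> number of users who shared it
popularity = {"t1": 2, "t2": 1000, "t3": 5}
l_u = {"t1", "t2"}
l_v = {"t1", "t2", "t3"}
score = similarity(l_u, l_v, popularity)
```

Under these assumptions, sharing an obscure message (`t1`) contributes far more to the score than sharing a viral one (`t2`).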

Data Analysis: Homophily
Figure: Average similarity score by position in the Top-N ranking (x-axis: position in the ranking; y-axis: average score).

| Rank | Average distance | Distance 1 (%) | Distance 2 (%) | Distance 3 (%) | Distance 4 (%) |
|---|---|---|---|---|---|
| 1 | 1.55 | 57.03 | 31.53 | 10.64 | 0.8 |
| 2 | 1.68 | 49.60 | 33.13 | 16.87 | 0.4 |
| 3 | 1.8 | 42.45 | 36.02 | 20.72 | 0.8 |
| 4 | 1.86 | 38.71 | 38.71 | 20.56 | 2.02 |
| 5 | 1.98 | 31.44 | 40.16 | 27.59 | 0.81 |

Table: Link between distance in the network and position in the Top-N ranking

Data Analysis: Conclusions
Conclusions from this analysis:
- Freshness is crucial (messages die very fast) → real-time recommendation
- Few users have high similarity → use transitivity
- Distance 2 successfully gathers important users → rely on this homophily

Approach: Similarity graph: Building process
Figure: Twitter graph (nodes U, V, W, X, Y, Z, Z1, Z2, Z3, Z4).
Figure: Similarity graph (U linked to V, Y and Z1 with weights sim(u, v), sim(u, y) and sim(u, z1)).

Approach: Similarity graph: Characteristics

| | Twitter network | Similarity graph |
|---|---|---|
| No. of nodes | 2,182,867 | 1,149,374 |
| No. of edges | 325,451,980 | 4,950,417 |
| Avg. similarity score | | 0.008 |
| Mean out-degree | 57.8 | 5.9 |

Table: Similarity graph characteristics

Approach: Propagation model: In a nutshell

$$p(u, t) = \frac{\sum_{v \in F_u} p(u_v, t)}{|F_u|} \tag{2}$$

With $F_u$ the set of users influential to $u$, and $p(u_v, t)$ an estimate of the probability that $u$ likes $t$, determined by the behavior of user $v$:

$$p(u_v, t) = p(v, t) \cdot \mathrm{sim}(u, v) \tag{3}$$

Approach: Propagation model: Example
Figure: Propagation example on the similarity graph (nodes U, V, W, X, Y; edge weights 0.1, 0.3, 0.4, 0.8, 0.5, 0.5).
1. A tweet t1 is published.
2. X shares/likes t1: $p(x, t_1) = 1$
3. Propagation to W: $p(w, t_1) = \frac{\sum_{v \in F_w} p(w_v, t_1)}{|F_w|} = \frac{0 + 1 \times 0.5}{2} = 0.25$
4. Propagation to U: $p(u, t_1) = \frac{0.25 \times 0.5}{2} = 0.0625$
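The worked example can be reproduced with a small iterative sketch of Equations (2) and (3). It makes several assumptions: users who shared the tweet keep a score of 1, scores are updated synchronously, and each user's influencer set is given as a dictionary. The flattened figure does not show which edge carries which weight, so the values 0.1 (U to V) and 0.4 (W to Y) are guesses. Only sim(u, w) = 0.5 and sim(w, x) = 0.5 follow from the arithmetic on the slide.

```python
def propagate(influencers, seeds, max_iter=100, beta=1e-9):
    """Iterate Equations (2)-(3) until no score changes by more than beta.

    influencers[u] maps each v in F_u to sim(u, v); seeds are the users
    who shared the tweet (their score is pinned to 1, an assumption).
    """
    users = set(influencers) | set(seeds)
    for weights in influencers.values():
        users |= set(weights)
    score = {u: (1.0 if u in seeds else 0.0) for u in users}
    for _ in range(max_iter):
        updated = {}
        for u in users:
            f_u = influencers.get(u, {})
            if u in seeds or not f_u:
                updated[u] = score[u]
            else:
                # Equation (2): average over F_u of p(v, t) * sim(u, v)
                updated[u] = sum(score[v] * s for v, s in f_u.items()) / len(f_u)
        delta = max(abs(updated[u] - score[u]) for u in users)
        score = updated
        if delta < beta:
            break
    return score

# Hypothetical edge assignment (see the caveat in the paragraph above).
influencers = {"u": {"w": 0.5, "v": 0.1}, "w": {"x": 0.5, "y": 0.4}}
scores = propagate(influencers, seeds={"x"})
```

With this edge assignment, `scores["w"]` and `scores["u"]` settle on the values 0.25 and 0.0625 given in the example.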

Approach: Propagation model: Convergence
Let $(u_1, u_2, \ldots, u_n)$ be the $n$ users:

$$\begin{aligned}
a_{11} p_{u_1} + a_{12} p_{u_2} + \ldots + a_{1n} p_{u_n} &= b_1 \\
a_{21} p_{u_1} + a_{22} p_{u_2} + \ldots + a_{2n} p_{u_n} &= b_2 \\
\ldots &= \ldots \\
a_{n1} p_{u_1} + a_{n2} p_{u_2} + \ldots + a_{nn} p_{u_n} &= b_n
\end{aligned}$$

This can also be written as $Ap = b$ with

$$A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{pmatrix}, \quad p = \begin{pmatrix} p(u_1) \\ p(u_2) \\ \vdots \\ p(u_n) \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

Because $\mathrm{sim}(u, v) \le 1$ for all $u, v$, we have $|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|$ for every $i$, so the matrix $A$ is diagonally dominant.
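The slide does not spell out the coefficients of A, so the sketch below assumes one possible encoding of the system Ap = b: a diagonal entry of 1 for every user, an off-diagonal entry of -sim(u, v) / |F_u| for each influencer v of u, and b = 1 only for users who shared the tweet. It then checks strict diagonal dominance, which is the condition under which Jacobi iteration is guaranteed to converge, and solves the system iteratively. All names and sample weights are illustrative.

```python
def build_system(influencers, seeds, users):
    """Assumed encoding: p_u - sum_v (sim(u, v) / |F_u|) * p_v = 0; seeds fix p_u = 1."""
    index = {u: k for k, u in enumerate(users)}
    n = len(users)
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for u in users:
        i = index[u]
        a[i][i] = 1.0
        if u in seeds:
            b[i] = 1.0
            continue
        f_u = influencers.get(u, {})
        for v, s in f_u.items():
            a[i][index[v]] = -s / len(f_u)
    return a, b

def is_strictly_diagonally_dominant(a):
    """True when each |a_ii| exceeds the sum of the other |a_ij| in its row."""
    return all(
        abs(row[i]) > sum(abs(x) for j, x in enumerate(row) if j != i)
        for i, row in enumerate(a)
    )

def jacobi(a, b, iterations=50):
    """Plain Jacobi iteration starting from the zero vector."""
    n = len(b)
    p = [0.0] * n
    for _ in range(iterations):
        p = [
            (b[i] - sum(a[i][j] * p[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return p

users = ["u", "v", "w", "x", "y"]
influencers = {"u": {"w": 0.5, "v": 0.1}, "w": {"x": 0.5, "y": 0.4}}  # hypothetical weights
a, b = build_system(influencers, {"x"}, users)
solution = dict(zip(users, jacobi(a, b)))
```

In this encoding each off-diagonal row sum is the average similarity to the influencers, which stays below 1 whenever at least one similarity is below 1.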

Approach: Propagation model: Optimizations
Speed up the convergence:
- Let $\Delta(u, t) = p(u, t)^{k+1} - p(u, t)^{k}$, where $k$ indexes the iterations.
- If $\Delta(u, t) < \beta$, we stop the propagation.
Limitation of popular messages:
- If $p(u, t) < f(t)$, there is no need to propagate, with $f(t) = 1 - \frac{k^p}{k^p + \mathrm{pop}(t)^p}$
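Both pruning rules can be sketched as a single predicate. The formula for f(t) is garbled in the transcript, so this sketch assumes a Hill-type form, f(t) = 1 - k^p / (k^p + pop(t)^p), which is one reading of the slide. The default values for `k`, `p` and `beta` are placeholders, not values from the talk.

```python
def popularity_threshold(pop, k=100.0, p=2.0):
    """Assumed Hill-type threshold: 0 for an unshared message, close to 1 for a viral one."""
    return 1.0 - k**p / (k**p + pop**p)

def should_propagate(previous, current, pop, beta=1e-3):
    """Apply both pruning rules to one user's score for one message."""
    if current - previous < beta:
        return False  # the score has (nearly) stopped changing
    if current < popularity_threshold(pop):
        return False  # popular message with a weak score: not worth spreading
    return True
```

For example, with these placeholder parameters a score of 0.25 is still propagated for a message shared 10 times, but not for one shared 1,000 times.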

Experiments: Protocol
- 34 million messages shared at least twice (130M retweet actions)
- Split the ranked set 90% - 10%
- Compute recommendations during this 10% for 1,500 random users (500 small, 500 medium, 500 big)
- Comparison with:
  - CF: naive collaborative filtering
  - Bayes: probabilistic model
  - GraphJet: the solution used by Twitter
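The split and the hit count can be sketched as follows, assuming that "ranked" means ordered by time and that a hit is a recommended (user, message) pair that the user actually shares during the test period. Both points are interpretations of the slide, and the tuple layout and function names are hypothetical.

```python
def chronological_split(actions, train_ratio=0.9):
    """Sort (timestamp, user, message) actions by time and split them."""
    ordered = sorted(actions, key=lambda action: action[0])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

def count_hits(recommended, test_actions):
    """Count recommended (user, message) pairs that occur in the test actions."""
    actual = {(user, message) for _, user, message in test_actions}
    return len(set(recommended) & actual)
```

Splitting by time rather than at random keeps the evaluation honest: the model only sees actions that precede the ones it is asked to predict.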

Experiments: Hits
Figure: Hits for 1,500 users (x-axis: number of daily recommendations per user; y-axis: number of hits, ×10^4; series: Bayes, CF, GraphJet, SimGraph).
- Linear growth for CF
- Fast growth for SimGraph
- GraphJet is stuck around 5,000 hits

Experiments: Hits according to user profiles
Figures: 500 small users, 500 medium users, 500 big users (x-axis: number of daily recommendations per user; y-axis: number of hits; series: Bayes, CF, GraphJet, SimGraph).
- small < 50; medium < 1000; big > 1000
- Trends are very stable regardless of user profile

Experiments: Hits accuracy
Figure: Hits popularity (x-axis: number of daily recommendations per user; y-axis: average number of shares, log scale; series: Bayes, CF, GraphJet, SimGraph).
- Bayes targets close messages
- GraphJet targets popular messages
- CF and SimGraph mix both popular and close messages

Experiments: F1 scores
Figure: F1 scores (x-axis: number of daily recommendations per user; y-axis: F1 score, ×10^-2; series: Bayes, CF, GraphJet, SimGraph).
- Small values
- Peak around 20 recommendations
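For reference, an F1 score can be derived from hit counts as sketched below. The slide does not state its exact definitions, so this assumes the standard ones: precision is hits divided by recommendations made, and recall is hits divided by the messages the user actually shared in the test period. The function name is hypothetical.

```python
def f1_score(hits, recommended, relevant):
    """Harmonic mean of precision (hits / recommended) and recall (hits / relevant)."""
    if hits == 0:
        return 0.0
    precision = hits / recommended
    recall = hits / relevant
    return 2 * precision * recall / (precision + recall)
```

Under these definitions, precision falls as the number of daily recommendations grows while recall rises, which is consistent with a peak at a moderate number of recommendations.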

Experiments: Running time

| Method | Init. (per user) | Init. total time (1,149,374 users) | Time (per message) | Total time, 70 cores in parallel (13,238,941 tweets, trial period) | Total time (init. + recos) |
|---|---|---|---|---|---|
| Bayes | 10 ms | 0.04 h | 975 ms | 51.22 h | 51.26 h |
| CF | 8,583 ms | 39.40 h | 0.5 ms | 0.02 h | 41.01 h |
| SimGraph | 311 ms | 1.41 h | 38 ms | 2.00 h | 3.41 h |

| Method | Init. (per user) | Init. total time (1,149,374 users) | Time (per user) | Total time, 70 cores in parallel (1,149,374 users × 66 days, trial period) | Total time (init. + recos) |
|---|---|---|---|---|---|
| GraphJet | 0 ms | 0 h | 14 ms | 4.2 h | 4.2 h |

Table: Initialization and recommendation time
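The totals in the table appear to follow from the unit costs. The sketch below assumes that each total equals the unit time multiplied by the number of items and divided evenly across the 70 cores. This is an inference from the figures, not something stated on the slide, and it matches most entries only up to rounding.

```python
USERS = 1_149_374
TWEETS = 13_238_941
CORES = 70
DAYS = 66

def total_hours(ms_per_item, items, cores=CORES):
    """Wall-clock hours if the work is split evenly across the cores."""
    return ms_per_item * items / 1000 / 3600 / cores

bayes_reco = total_hours(975, TWEETS)         # reported: 51.22 h
simgraph_init = total_hours(311, USERS)       # reported: 1.41 h
simgraph_reco = total_hours(38, TWEETS)       # reported: 2.00 h
graphjet_reco = total_hours(14, USERS * DAYS) # reported: 4.2 h
```

The per-user GraphJet cost is multiplied by the 66 days of the trial period because, in the table, its recommendations are computed per user rather than per message.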

Experiments: Updating strategies
How to update SimGraph?
- Split the last 10% in 2
- Evaluate the impact on hit prediction for the remaining 5%:
  - do nothing
  - recompute everything
  - update only weights
  - crossfold

Experiments: Updating strategies
Figure: Hits / updating strategies (x-axis: number of daily recommendations per user; y-axis: number of hits; series: recompute everything, do nothing, crossfold, update weights).
- Doing nothing is the same as updating weights
- Crossfold (very cheap) works very well

Experiments: Convergence property of the SimGraph

| Iteration | Number of edges |
|---|---|
| 1 | 4,950,417 |
| 2 | 7,519,031 |
| 3 | 10,836,129 |
| 4 | 11,496,445 |
| 5 | 11,678,747 |

Table: Evolution of the number of edges through iterations

Conclusion: Contribution
- Construction and analysis of a large Twitter dataset
- Method relying on homophily to find nearest neighbors at low cost
- Construction and optimization of a convergent propagation model
- Comparison of the recommendations made by our model with state-of-the-art solutions
- Possibility for the model to be updated at low cost

Conclusion: Future work
- Densify points of comparison between users
- Burst recommendation bubbles
- Work on the crossfold convergence of the model
- Add a popularity prediction optimization

Thanks for your attention!

Annexes

Annexes: Lifespan and popularity
Figure: Correlation between lifespan and popularity (x-axis: average lifespan in hours, log scale; y-axis: average number of retweets, log scale).
- Strong correlation up to 10^3 hours
- After a month, the correlation fades

Annexes: Topology
Figure: Smallest path distribution for the similarity graph (x-axis: shortest distance; y-axis: number of paths, log scale).
Diameter of 21 for an average path of 7.5.

Annexes: Similarities
Figure: Score similarity evolution (x-axis: position in the ranking; y-axis: average score).
- Really weak scores
- Breaks after the fifth most similar user

Annexes: Intersections
Figure: Parts of hits included in SimGraph (x-axis: number of daily recommendations per user; y-axis: ratio of hits in common with SimGraph; series: Bayes, CF, GraphJet, SimGraph).

Annexes: Number of recommendations
Figure: Recall capacity (x-axis: number of daily recommendations per user; y-axis: number of actual recommendations; series: Bayes, CF, GraphJet, SimGraph).
- CF is less limited
- Other methods are bunched together
- Threshold effect for SimGraph and Bayes