An Integrated Tag Recommendation Algorithm Towards Weibo User Profiling


An Integrated Tag Recommendation Algorithm Towards Weibo User Profiling Deqing Yang, Yanghua Xiao, Hanghang Tong, Junjun Zhang and Wei Wang School of Computer Science Shanghai Key Laboratory of Data Science Fudan University, Shanghai, China

Outline
- Introduction
- Empirical Study
- Algorithm
- Evaluation
- Conclusion

Motivation
- Weibo is the largest counterpart of Twitter in China.
  - User tagging is one of Weibo's most important services because tags are effective for user profiling.
  - However, many Weibo users do not have any tags.
- An efficient tag recommendation algorithm for Weibo users is therefore necessary.
[Figure: tag clouds of Weibo users (a) and Douban movies (b).]

Challenges and Basic Idea
- Challenges
  - Data sparsity
  - Diversity
  - Semantic redundancy
- Basic idea of our solution: a recommendation algorithm integrating three steps
  1. Recommendation by homophily
     - Use a local tag propagation scheme
  2. Expanding tags by co-occurrence
  3. Removing semantic redundancy based on a Chinese knowledge graph
     - An ESA-based semantic interpretation
Fig. 1. Framework of the tag recommendation algorithm: user tags and parameters (input) → Step 1: Recommendation by Homophily → candidate tags → Step 2: Expansion by Co-occurrence → candidate tags → Step 3: Removing Semantic Redundancy (using Chinese knowledge graph data) → final tags (output).

Empirical Study
- Dataset
  - More than 2.1 million Weibo users, 875,186 unique tags
- Homophily in Tagging Behavior
  - Notations
    - Real Tags ($RT_u$) vs. Collective Tags ($CT_u$)
    - $CT_u$ is the set of tags most frequently used by a user's friends. We use a score $tf(t)$ to quantify the likelihood that a tag $t$ belongs to $CT_u$:
      $$tf(t) = \frac{r(t)}{\sum_{t' \in T(Neg(u))} r(t')}$$
  - Metrics of evaluation
    - Precision, Average Precision, nDCG
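The collective-tag score can be illustrated with a minimal sketch. The slide does not define $r(t)$; the code below assumes it is the number of u's friends who use tag t, and the function names are hypothetical.

```python
from collections import Counter

def collective_tag_scores(friend_tags):
    """friend_tags: one list of tags per friend in Neg(u). Returns {tag: tf(tag)}."""
    # Assumption: r(t) = number of friends who use tag t (not stated on the slide).
    r = Counter(t for tags in friend_tags for t in set(tags))
    total = sum(r.values())  # sum of r(t') over all tags used by the friends
    if total == 0:
        return {}
    return {t: count / total for t, count in r.items()}

def top_k_collective_tags(friend_tags, k):
    """Approximate CT_u as the k tags with the highest tf score (ties broken by name)."""
    scores = collective_tag_scores(friend_tags)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Comparing the output of `top_k_collective_tags` against a user's real tags is one way to reproduce the kind of $CT_u$ versus $RT_u$ matching the slide describes.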

Homophily in Tagging Behavior
- Results of tag similarity comparisons
  - The results show that tag similarity is more evident for social friends than for general users.
Fig. 3. Matching performance of $CT_u$ to $RT_u$.

Tag Co-occurrence
- Co-occurrence in Tagging Behavior
  - Use a TF-IDF based scheme to measure the extent to which we should recommend another tag t' that co-occurs with t to the user

Table 1. Co-occurrence tags ranked by tf-idf score.

machine learning       | tour            | advertisement
-----------------------|-----------------|--------------
data mining            | food            | media
NLP                    | movie           | marketing
recommender sys.       | fashion         | communication
information retrieval  | music           | design
computer vision        | listen to music | photography
pattern recognition    | 80s             | Internet
A.I.                   | freedom         | innovation
big data               | travel          | movie
search engine          | photography     | art
Internet               | indoorsy        | fashion
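The slide does not give the exact TF-IDF formula, so the following is only a sketch under an assumed definition: the "tf" part is the fraction of users holding t who also hold t', and the "idf" part is the log of the inverse fraction of users holding t'. All names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def cooccurrence_scores(user_tags):
    """user_tags: one collection of tags per user. Returns {t: {t2: score}}."""
    n_users = len(user_tags)
    df = Counter()  # number of users holding each tag
    co = Counter()  # number of users holding both t and t2, keyed by ordered pair
    for tags in user_tags:
        tags = set(tags)
        df.update(tags)
        co.update(permutations(sorted(tags), 2))
    scores = {}
    for (t, t2), count in co.items():
        tf = count / df[t]                  # assumed: how often t2 accompanies t
        idf = math.log(n_users / df[t2])    # assumed: penalise very common tags
        scores.setdefault(t, {})[t2] = tf * idf
    return scores
```

With this definition a popular tag such as "music" scores low as a companion of "machine learning" even if the two co-occur, which matches the intent of ranking specific companions first.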

Semantic Redundancy
- Semantic redundancy in the recommended tag list
  - Find (near-)synonyms among the co-occurrence expanded tags using a method based on a Chinese knowledge graph (introduced later)
  - 15%~20% of the expanded tags are (near-)synonyms
Fig. 4. The proportion of (near-)synonyms in the top-k expanded list. Some expanded tags are (near-)synonyms of the parent tags, but most of them are semantically complementary.

Step 1: Recommendation by Homophily
- Construct a Weibo Influence Graph G
  - A vertex is a user; each directed edge $e_{u \to v}$ indicates the social influence from user u to user v. The edge weight $w_{uv}$ can be quantified by the frequency with which v retweets u.
- Social influence
  - For a directed path $p = (u_0, u_1, \ldots, u_r)$ in G, the social influence along p from $u_0$ to $u_r$ equals
    $$si(p) = \prod_{i=0}^{r-1} \frac{w_{u_i u_{i+1}}}{\sum_{u:\, u \to u_{i+1}} w_{u u_{i+1}}}$$
  - Let $P_r(v, u)$ be the set of all paths of length r from v to u; the social influence of v on u at radius r is then
    $$si_r(v, u) = \sum_{p \in P_r(v, u)} si(p)$$
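As a small worked example of the path influence $si(p)$, the sketch below stores the influence graph as a dictionary mapping `(source, target)` to the edge weight. That representation and the function names are assumptions made for illustration.

```python
def in_weight_sum(weights, node):
    """Total weight of all edges pointing into `node`."""
    return sum(w for (src, dst), w in weights.items() if dst == node)

def path_influence(weights, path):
    """si(p): product of normalised edge weights along path = [u0, u1, ..., ur]."""
    si = 1.0
    for a, b in zip(path, path[1:]):
        w = weights.get((a, b), 0.0)
        if w == 0.0:
            return 0.0  # a missing edge means the path does not exist
        # Each hop is normalised by the total incoming weight of its target.
        si *= w / in_weight_sum(weights, b)
    return si
```

For instance, if b receives weight 3 from a and weight 1 from c, the hop a → b contributes 3/4 to the product.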

Step 1: Recommendation by Homophily (cont.)
- Objective: compute a tag score vector
  $$S_u = \sum_{v \in V} si(v, u)\, T_v = \sum_{v \in V} \sum_{j=0}^{r} si_j(v, u)\, T_v$$
- Compute $S_u$ iteratively by summing up the weighted social influences from u's out-neighbors at different radii

Algorithm 2. Step 1: Computing u's tag score vector $S_u$.
Input: u, r; Output: $S_u$
    $S_u \leftarrow \emptyset$
    $layer_0 \leftarrow \{u\}$
    $si_0(u, u) \leftarrow 1$
    if u has original tags then
        $S_u \leftarrow T_u$
    end if
    for i = 1 to r do
        $layer_i \leftarrow$ {all in-neighbors of the nodes in $layer_{i-1}$}
        for $v \in layer_i$ do
            if v has real tags then
                for each out-neighbor x of v do
                    $si_i(v, u) \leftarrow si_i(v, u) + \frac{w_{vx}}{\sum_{v':\, v' \to x} w_{v'x}} \cdot si_{i-1}(x, u)$
                end for
                $S_u \leftarrow S_u + si_i(v, u) \cdot T_v$
            end if
        end for
    end for
    return $S_u$
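The layer-by-layer propagation can be sketched in Python as follows. This is an illustration, not the authors' implementation: it assumes $T_v$ is a binary indicator vector over v's tags, stores the graph as `{(source, target): weight}`, and, unlike the transcribed pseudocode, computes the influence for every node in a layer (tagged or not) so that influence can pass through untagged users as the path-sum definition implies.

```python
from collections import defaultdict

def tag_score_vector(u, r, weights, tags):
    """weights: {(src, dst): w}; tags: {user: set of real tags}. Returns {tag: score}."""
    in_nbrs, out_nbrs, in_sum = defaultdict(set), defaultdict(set), defaultdict(float)
    for (src, dst), w in weights.items():
        in_nbrs[dst].add(src)
        out_nbrs[src].add(dst)
        in_sum[dst] += w  # normaliser: total incoming weight of dst

    scores = defaultdict(float)
    for t in tags.get(u, ()):      # S_u starts from u's own tags, if any
        scores[t] += 1.0

    si_prev = {u: 1.0}             # si_0(u, u) = 1
    for _ in range(r):
        layer = set().union(*(in_nbrs[x] for x in si_prev))
        si_cur = {}
        for v in layer:
            # si_i(v, u) = sum over v's out-neighbours x in the previous layer
            si_cur[v] = sum(
                weights[(v, x)] / in_sum[x] * si_prev[x]
                for x in out_nbrs[v] if x in si_prev
            )
        for v, si in si_cur.items():
            for t in tags.get(v, ()):   # S_u += si_i(v, u) * T_v
                scores[t] += si
        si_prev = si_cur
    return dict(scores)
```

Sorting the returned dictionary by score gives the candidate tag list that the later steps expand and prune.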

Step 1: Recommendation by Homophily (cont.)
- Optimization
  - Setting a shorter r
    - Most information diffusion in Twitter spans fewer than 2 hops
  - Suppressing general tags
    - Use an IDF-like factor
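The slide does not specify the IDF-like factor. One plausible form, shown here purely as an assumption, multiplies each tag score by the log of the inverse fraction of users holding that tag; `tag_user_counts` and `n_users` are hypothetical inputs.

```python
import math

def suppress_general_tags(scores, tag_user_counts, n_users):
    """Down-weight each tag score by log(n_users / number of users holding the tag)."""
    # Assumed factor; every tag in `scores` must have a positive user count.
    return {
        t: s * math.log(n_users / tag_user_counts[t])
        for t, s in scores.items()
    }
```

Under this assumption a tag held by half of all users keeps only a factor of log 2 of its score, while a tag held by 1% of users is boosted by log 100.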

Step 2: Expanding tags by co-occurrence
- For each tag t generated in Step 1 (these tags constitute the set C), we select the top-q co-occurring tags, denoted by $t_i$
- Define a new score for each $t_i$ that is used for ranking the tags generated in this step:
  $$\hat{s}(t_i) = \begin{cases} s(t_i) & t_i \in C \\ \dfrac{\lambda \cdot s(p(t_i)) \cdot s_{p(t_i)}(t_i)}{Z} & \text{otherwise} \end{cases}$$
  where $\lambda$ is a damping parameter, $s(\cdot)$ is the output score of Step 1, $p(t_i)$ is the parent tag of $t_i$ that was generated in Step 1, $s_{p(t_i)}(t_i)$ is the co-occurrence score of $t_i$ with respect to $p(t_i)$, and Z is a normalization factor
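A minimal sketch of the expansion score follows. Because the slide does not define Z, it is passed in as a parameter `z`; keeping the best-scoring parent when a tag is reachable from several parents is also an assumption, as are all names.

```python
def expand_tags(step1_scores, cooc, q, lam, z):
    """step1_scores: {tag: s(tag)} from Step 1 (the set C).
    cooc: {parent: {tag: co-occurrence score of tag given parent}}.
    Returns (tag, score) pairs sorted by descending score."""
    expanded = dict(step1_scores)  # tags already in C keep their Step 1 score
    for parent, s_parent in step1_scores.items():
        neighbours = cooc.get(parent, {})
        top_q = sorted(neighbours, key=lambda t: (-neighbours[t], t))[:q]
        for t in top_q:
            if t in step1_scores:
                continue
            candidate = lam * s_parent * neighbours[t] / z
            # Assumption: a tag with several parents keeps its highest score.
            expanded[t] = max(expanded.get(t, 0.0), candidate)
    return sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))
```

A damping value below 1 keeps expanded tags ranked below their parents unless the parent score and the co-occurrence score are both high.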

Step 3: Removing semantic redundancy
- Construct a Chinese Knowledge Graph (CKG)
- Map each tag into a concept in the CKG
- Compute the semantic similarity of two tags as the cosine similarity of their concept vectors (an ESA-based semantic interpretation). Each concept vector entry is
  $$c_j = ts_j(i)\, cat(j)$$
  where $ts_j(i)$ is the tf-idf score of tag i in concept j's article page and $cat(j)$ is concept j's category set
- Remove a tag if it is semantically too close to (a near-synonym of) a tag ranked ahead of it
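The pruning rule can be sketched as a greedy pass over the ranked list. The concept vectors are taken as given sparse dictionaries, and the similarity `threshold` is a hypothetical parameter that the slide does not specify.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {concept: weight}."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def remove_redundant(ranked_tags, concept_vectors, threshold):
    """Keep a tag unless it is too similar to a tag already kept (ranked ahead)."""
    kept = []
    for tag in ranked_tags:
        vec = concept_vectors.get(tag)
        # Tags without a concept vector cannot be compared, so they are kept.
        if vec is not None and any(
            cosine(vec, concept_vectors[k]) >= threshold
            for k in kept if k in concept_vectors
        ):
            continue
        kept.append(tag)
    return kept
```

Because the list is processed in rank order, the higher-ranked member of each near-synonym pair always survives.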

Experimental settings
- Human assessments
  - We asked 500 test users whether they would accept the tags recommended by the algorithms
- Baselines
  - FREQ.
  - TF-IDF
  - Collaborative Filtering (CF)
  - TWEET
Reference: Flickr tag recommendation based on collective knowledge. In Proc. of WWW (2008)

Results
- Global performance
  - 268 test users with real tags
    Fig. 5. Human assessment results of the recommended tags to the test users having real tags: (a) P@k, (b) MAP@k, (c) nDCG.
  - All 500 test users
    Fig. 6. Human assessment results of the recommended tags to all the test users: (a) P@k, (b) MAP@k, (c) nDCG.
  - Global Performance: From the 3000 seed users in our dataset, we randomly selected ...
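For readers unfamiliar with the three metrics, the sketch below computes them from a ranked list of binary accept/reject judgements for one user. The definitions are common textbook variants chosen as an assumption (in particular, average precision is normalised by the number of accepted tags in the top k); MAP@k would be the mean of the average precision over all test users.

```python
import math

def precision_at_k(accepted, k):
    """accepted: list of 0/1 judgements in ranked order."""
    return sum(accepted[:k]) / k

def average_precision_at_k(accepted, k):
    """Mean of the precision values at each accepted position within the top k."""
    hits, total = 0, 0.0
    for i, rel in enumerate(accepted[:k], start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def ndcg_at_k(accepted, k):
    """DCG of the ranking divided by the DCG of the ideal reordering."""
    dcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(accepted[:k], start=1))
    ideal = sorted(accepted, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 1) for i, rel in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0
```

For the judgement list `[1, 0, 1, 0, 0]` with k = 5, two of five tags are accepted, so P@5 is 0.4, while the position-sensitive metrics reward the fact that the first tag was accepted.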

Results
- Effectiveness of each step
  - Step 2: 75.11% of the expanded tags are newly discovered tags, and 35.37% of these new tags were accepted by the test users
  - Step 3: 14.55% of the tags generated in Step 2 were identified as (near-)synonyms by our algorithm, and 74.7% of these tags were labeled as real (near-)synonyms by the test users

Results
- Inference of user profiles

Statistics by profile category

category    | accuracy
------------|---------
location    | 94.64%
occupation  | 76.47%
education   | 95.24%
religion    | 99.21%

Case studies

user   | algorithm | tag list
-------|-----------|---------
user A | real tags | music, fashion
user A | CF        | movie, food, listen to music, tour, 80s
user A | TWEET     | Jehovah, Miss HongKong, beauty, child, good man
user A | FREQ.     | Christian, food, movie, 80s, tour
user A | TF-IDF    | Christian, Bible, Emmanuel, micro fashion, tide
user A | LTPA      | Christian, Bible, faith, God's baby girl, God's child
user B | TWEET     | Shantou (a Chinese city), WeChat, Internet, Shantou people, girl
user B | FREQ.     | tour, food, movie, Internet, music
user B | TF-IDF    | machine learning, Internet, data mining, Fudan University, technology
user B | LTPA      | machine learning, IT, Internet, Fudan University, data mining

Conclusion
- We verified homophily in the tagging behavior of Weibo users through empirical studies
- We proposed a tag recommendation algorithm consisting of three steps:
  - Tag recommendation based on local propagation
  - Tag expansion based on co-occurrence
  - Removal of semantic redundancy based on a Chinese knowledge graph

Thanks a lot! Q and A Deqing Yang yangdeqing@fudan.edu.cn Personal page: http://gdm.fudan.edu.cn/knowledgeworks/people/?person=yangdeqing