An Integrated Tag Recommendation Algorithm Towards Weibo User Profiling

Deqing Yang, Yanghua Xiao, Hanghang Tong, Junjun Zhang and Wei Wang
School of Computer Science, Shanghai Key Laboratory of Data Science
Fudan University, Shanghai, China

Outline
• Introduction
• Empirical Study
• Algorithm
• Evaluation
• Conclusion

Motivation
• Weibo is the largest counterpart of Twitter in China.
• User tagging is one of the most important services of Weibo because tags are effective for user profiling. However, many Weibo users do not have any tags!
• An efficient tag recommendation algorithm for Weibo users is therefore necessary.

Challenges and Basic Idea
• Challenges: data sparsity, diversity, semantic redundancy
• Basic idea of our solution: a recommendation algorithm integrating three steps
  1. Recommendation by homophily
     - Use a local tag propagation scheme
  2. Expanding tags by co-occurrence
  3. Removing semantic redundancy based on a Chinese knowledge graph
     - An ESA-based semantic interpretation
[Fig. 1. Framework of tag recommendation algorithm. Input: user tags and parameters → Step 1: Recommendation by Homophily → candidate tags → Step 2: Expansion by Co-occurrence → candidate tags → Step 3: Removing Semantic Redundancy (using Chinese knowledge graph data) → output: final tags]


Empirical Study
• Dataset: more than 2.1 million Weibo users and 875,186 unique tags
• Homophily in tagging behavior
  - Notations: real tags (RT_u) vs. collective tags (CT_u)
  - CT_u consists of the tags most frequently used by user u's friends. We use a score tf(t) to quantify the likelihood that a tag t belongs to CT_u:
    $$tf(t) = \frac{r(t)}{\sum_{t' \in T(Neg(u))} r(t')}$$
    where Neg(u) denotes u's friends, T(Neg(u)) the set of tags they use, and r(t) how often t occurs among them.
• Evaluation metrics: Precision, Average Precision, nDCG
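The collective-tag score can be sketched in a few lines of Python. This is a minimal sketch, assuming r(t) counts how many of u's friends (Neg(u)) carry tag t; the slide does not define r(t) precisely, and the function names are hypothetical.

```python
from collections import Counter


def collective_tag_scores(friend_tags):
    """Score each tag t by tf(t) = r(t) / sum over t' of r(t').

    friend_tags: one tag list per friend of user u (i.e. Neg(u)).
    Assumption: r(t) = number of u's friends that carry tag t.
    """
    r = Counter(t for tags in friend_tags for t in set(tags))
    total = sum(r.values())
    return {t: count / total for t, count in r.items()} if total else {}


def collective_tags(friend_tags, k):
    """CT_u: the k tags with the highest tf(t); ties broken alphabetically."""
    scores = collective_tag_scores(friend_tags)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Comparing `collective_tags(...)` with a user's real tags RT_u under Precision, AP, or nDCG is then the matching test behind Fig. 3.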

Homophily in Tagging Behavior
• Results of tag similarity comparisons
  [Fig. 3. Matching performance of CT_u to RT_u]
  The results show that tag similarity is more evident among social friends than among general users.

Tag Co-occurrence
• Co-occurrence in tagging behavior
  - Use a TF-IDF based scheme to measure the extent to which we should recommend another tag t' that co-occurs with t to the user.

Table 1. Co-occurring tags ranked by tf-idf score.
| machine learning      | tour            | advertisement |
|-----------------------|-----------------|---------------|
| data mining           | food            | media         |
| NLP                   | movie           | marketing     |
| recommender sys.      | fashion         | communication |
| information retrieval | music           | design        |
| computer vision       | listen to music | photography   |
| pattern recognition   | 80s             | Internet      |
| A.I.                  | freedom         | innovation    |
| big data              | travel          | movie         |
| search engine         | photography     | art           |
| Internet              | indoorsy        | fashion       |
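The slide does not spell out the TF-IDF co-occurrence formula. The sketch below assumes one common form: tf is t'’s share of all co-occurrences with t (counted within each user's tag set), and idf is log(N / df(t')) over N users, so tags that co-occur with nearly everything are discounted. Function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_scores(user_tags):
    """Return scores[t][t2]: a TF-IDF style score for recommending t2 given t.

    Assumed form (not necessarily the paper's exact definition):
      tf  = co(t, t2) / sum over t3 of co(t, t3)
      idf = log(N / df(t2)), N = number of users, df = users carrying t2
    """
    n_users = len(user_tags)
    df = Counter(t for tags in user_tags for t in set(tags))
    co = defaultdict(Counter)
    for tags in user_tags:
        uniq = set(tags)
        for t in uniq:
            for t2 in uniq:
                if t != t2:
                    co[t][t2] += 1  # count each co-occurring pair once per user
    scores = {}
    for t, counter in co.items():
        total = sum(counter.values())
        scores[t] = {t2: (c / total) * math.log(n_users / df[t2])
                     for t2, c in counter.items()}
    return scores


def top_cooccurring(scores, t, q):
    """The top-q co-occurring tags of t, best first."""
    ranked = sorted(scores.get(t, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [t2 for t2, _ in ranked[:q]]
```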

Semantic Redundancy
• Semantic redundancy in the recommended tag list
  - Find (near-)synonyms among the co-occurrence expanded tags with a method based on a Chinese knowledge graph (introduced later)
  - 15% to 20% of the expanded tags are (near-)synonyms
  [Fig. 4. The proportion of (near-)synonyms in the top-k expanded list]
  Some expanded tags are (near-)synonyms of their parent tags, but most of them are semantically complementary.


Step 1: Recommendation by Homophily
• Construct a Weibo influence graph G
  - A vertex is a user; each directed edge e_{u→v} indicates the social influence from user u to user v, and the edge weight w_{uv} is quantified by the frequency with which v retweets u.
• Social influence
  - For a directed path p = (u_0, u_1, ..., u_r) in G, the social influence along p from u_0 to u_r is
    $$si(p) = \prod_{i=0}^{r-1} \frac{w_{u_i u_{i+1}}}{\sum_{u:\, u \to u_{i+1}} w_{u u_{i+1}}}$$
  - Let P_r(v, u) be the set of all paths of length r from v to u; the social influence of v on u at radius r is then
    $$si_r(v, u) = \sum_{p \in P_r(v, u)} si(p)$$
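The two definitions can be checked directly by enumerating paths. This is a brute-force sketch for small graphs, assuming the graph is given as a dict {(a, b): w_ab} of directed, weighted edges; the names are hypothetical, and the enumeration allows repeated vertices since the slide does not say whether paths must be simple.

```python
from collections import defaultdict


def incoming_totals(weights):
    """For each node x, the sum of w_{y x} over all edges y -> x."""
    totals = defaultdict(float)
    for (_, x), w in weights.items():
        totals[x] += w
    return totals


def path_influence(path, weights, totals):
    """si(p) for p = (u_0, ..., u_r): product of in-weight-normalized edge weights."""
    si = 1.0
    for a, b in zip(path, path[1:]):
        si *= weights[(a, b)] / totals[b]
    return si


def paths_of_length(v, u, r, out_edges):
    """All directed paths with exactly r edges from v to u (vertex repeats allowed)."""
    if r == 0:
        return [[v]] if v == u else []
    result = []
    for x in out_edges.get(v, []):
        for tail in paths_of_length(x, u, r - 1, out_edges):
            result.append([v] + tail)
    return result


def influence_at_radius(v, u, r, weights):
    """si_r(v, u): sum of si(p) over all paths of length r from v to u."""
    out_edges = defaultdict(list)
    for (a, b) in weights:
        out_edges[a].append(b)
    totals = incoming_totals(weights)
    return sum(path_influence(p, weights, totals)
               for p in paths_of_length(v, u, r, out_edges))
```

Normalizing by the total in-weight of u_{i+1} makes each factor the share of u_{i+1}'s retweets that come from u_i, so si(p) stays in [0, 1].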

Step 1: Recommendation by Homophily (cont.)
• Objective: compute a tag score vector
  $$S_u = \sum_{v \in V} si(v, u)\, T_v = \sum_{v \in V} \sum_{j=0}^{r} si_j(v, u)\, T_v$$
  where T_v is the tag vector of user v.
• Compute S_u iteratively by summing the weighted social influences that reach u from users at increasing radii (layer by layer, backwards along in-edges).

Algorithm 2. Step 1: Computing u's tag score vector S_u.
Input: u, r
Output: S_u
  S_u ← ∅
  layer_0 ← {u}
  si_0(u, u) ← 1
  if u has original tags then
    S_u ← T_u
  end if
  for i = 1 to r do
    layer_i ← {all in-neighbors of the nodes in layer_{i-1}}
    for v ∈ layer_i do
      if v has real tags then
        si_i(v, u) ← 0
        for each out-neighbor x of v do
          si_i(v, u) ← si_i(v, u) + (w_{vx} / Σ_{v': v'→x} w_{v'x}) · si_{i-1}(x, u)
        end for
        S_u ← S_u + si_i(v, u) · T_v
      end if
    end for
  end for
  return S_u
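A runnable sketch of the layer-wise computation follows. It is a hedged illustration, assuming edge weights are given as a dict {(a, b): w_ab} and T_v is a 0/1 indicator over v's real tags; the names are hypothetical. One deliberate deviation: it computes si_i for every user in a layer, tagged or not, so that the result equals the path-sum definition of si_r(v, u); Algorithm 2 as listed updates si_i only for users with real tags.

```python
from collections import defaultdict


def tag_score_vector(u, r, weights, tags):
    """S_u = sum over v and j <= r of si_j(v, u) * T_v, computed layer by layer.

    weights: {(a, b): w_ab} for edges a -> b (a influences b).
    tags:    {user: set of real tags}; T_v is a 0/1 indicator of v's tags.
    si_i is computed for all users in a layer (a deviation from Algorithm 2)
    so that influence can pass through untagged users.
    """
    in_nbrs, out_nbrs = defaultdict(set), defaultdict(set)
    totals = defaultdict(float)                 # sum of in-weights per node
    for (a, b), w in weights.items():
        in_nbrs[b].add(a)
        out_nbrs[a].add(b)
        totals[b] += w

    scores = defaultdict(float)
    prev = {u: 1.0}                             # si_0(u, u) = 1
    for t in tags.get(u, ()):                   # S_u <- T_u if u has own tags
        scores[t] += 1.0
    for _ in range(1, r + 1):
        layer = {v for x in prev for v in in_nbrs[x]}
        cur = {}
        for v in layer:
            # si_i(v, u) = sum over v's out-neighbors x of normalized w_vx * si_{i-1}(x, u)
            cur[v] = sum(weights[(v, x)] / totals[x] * prev.get(x, 0.0)
                         for x in out_nbrs[v])
            for t in tags.get(v, ()):           # S_u <- S_u + si_i(v, u) * T_v
                scores[t] += cur[v]
        prev = cur
    return dict(scores)
```

With r = 2, a tag held by a user two hops upstream of u contributes the product of the normalized edge weights along each path, even when the intermediate user has no tags.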

Step 1: Recommendation by Homophily (cont.)
• Optimization
  - Setting a shorter r: most information diffusion on Twitter spans fewer than 2 hops
  - Suppressing general tags: use an IDF-like factor
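The slide names an IDF-like factor but not its form. A minimal sketch, assuming the factor is log(N / df(t)), where N is the number of users and df(t) the number of users carrying tag t; function names are hypothetical.

```python
import math


def suppress_general_tags(scores, tag_df, n_users):
    """Re-weight Step 1 scores by an assumed IDF-like factor log(N / df(t)).

    scores: {tag: S_u[tag]}; tag_df: {tag: number of users carrying the tag}.
    Tags used by almost everyone receive a factor close to 0.
    """
    return {t: s * math.log(n_users / tag_df[t]) for t, s in scores.items()}


def top_k(scores, k):
    """The k highest-scoring tags, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

For example, a tag carried by 900 of 1,000 users is multiplied by log(1000/900) ≈ 0.105, while one carried by 10 users is multiplied by log(100) ≈ 4.6, so a specific tag can overtake a general one that had a higher raw propagation score.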

Step 2: Expanding Tags by Co-occurrence
• For each tag t generated in Step 1 (these tags constitute the set C), we select its top-q co-occurring tags, denoted by t_i.
• Define a new score for each t_i, which is used to rank the tags generated in this step:
  $$\hat{s}(t_i) = \begin{cases} s(t_i) & t_i \in C \\ \dfrac{\lambda\, s(p(t_i))\, s_{p(t_i)}(t_i)}{Z} & \text{otherwise} \end{cases}$$
  where λ is a damping parameter, s(·) is the output score of Step 1, p(t_i) is t_i's parent tag generated in Step 1, s_{p(t_i)}(t_i) is the co-occurrence score of t_i with respect to p(t_i), and Z is a normalization factor.
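The scoring rule can be sketched as follows. This assumes λ and Z are supplied as plain numbers and co-occurrence scores are precomputed; the names are hypothetical. The slide gives each t_i a single parent, so when a tag is reachable from several Step 1 tags the sketch keeps its best score, which is an assumption.

```python
def expansion_scores(step1, cooc, q, lam, z_norm):
    """Sketch of the Step 2 score s_hat.

    step1:  {tag: s(tag)}, the Step 1 output (the set C).
    cooc:   {parent: {child: s_parent(child)}}, co-occurrence scores.
    q:      number of co-occurring tags kept per parent.
    lam:    damping parameter (lambda); z_norm: normalization factor Z.
    """
    final = dict(step1)  # s_hat(t) = s(t) for every t in C
    for parent, s_parent in step1.items():
        ranked = sorted(cooc.get(parent, {}).items(),
                        key=lambda kv: (-kv[1], kv[0]))
        for child, c in ranked[:q]:
            if child in step1:
                continue  # tags already in C keep their Step 1 score
            score = lam * s_parent * c / z_norm
            # Assumption: a child reachable from several parents keeps its best score.
            final[child] = max(final.get(child, 0.0), score)
    return final
```

Because λ < 1 damps every expanded score, an expanded tag ranks above its parent only when the co-occurrence score is large relative to λ and Z.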

Step 3: Removing Semantic Redundancy
• Construct a Chinese Knowledge Graph (CKG) and map each tag to concepts in the CKG.
• Compute the semantic similarity of two tags as the cosine similarity of their concept vectors (an ESA-based semantic interpretation). Each entry of the concept vector of tag i is
  $$c_j = \frac{ts_j(i)}{|cat(j)|}$$
  where ts_j(i) is the tf-idf score of tag i in concept j's article page and cat(j) is concept j's category set.
• Remove a tag if it is too semantically close to (a near-synonym of) any tag ranked ahead of it.
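A sketch of the redundancy filter, assuming the concept-vector entry c_j = ts_j(i) / |cat(j)| and a similarity threshold theta; the threshold value, the input layout, and the function names are all hypothetical, not values from the slides.

```python
import math


def concept_vector(tag, tfidf, categories):
    """ESA-style vector of a tag: entry j = ts_j(tag) / |cat(j)|.

    tfidf:      {concept: {tag: tf-idf score of the tag in the concept's article page}}
    categories: {concept: set of categories}
    """
    return {j: scores[tag] / len(categories[j])
            for j, scores in tfidf.items()
            if tag in scores and categories.get(j)}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def remove_redundant(ranked_tags, tfidf, categories, theta=0.8):
    """Drop a tag whose cosine similarity to any tag ranked ahead of it is >= theta.

    theta = 0.8 is a hypothetical threshold.
    """
    kept, earlier = [], []
    for tag in ranked_tags:
        vec = concept_vector(tag, tfidf, categories)
        if all(cosine(vec, prev) < theta for prev in earlier):
            kept.append(tag)
        earlier.append(vec)  # compare later tags against every tag ranked ahead
    return kept
```

Tags that map to no concept get an empty vector and are never removed, which keeps unknown tags rather than discarding them silently.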


Experimental Settings
• Human assessments: we asked 500 test users whether they would accept the tags recommended by each algorithm.
• Baselines: FREQ., TF-IDF, Collaborative Filtering (CF), TWEET
  Reference: Flickr tag recommendation based on collective knowledge. In Proc. of WWW (2008).

Results
• Global performance (test users randomly selected from the 3000 seed users in our dataset)
  - 268 test users with real tags
    [Fig. 5. Human assessment results of the tags recommended to the test users having real tags: (a) P@k, (b) MAP@k, (c) nDCG]
  - All 500 test users
    [Fig. 6. Human assessment results of the tags recommended to all the test users: (a) P@k, (b) MAP@k, (c) nDCG]
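The slides do not define the metrics in Figs. 5 and 6. The sketch below uses common binary-relevance definitions, assuming a recommended tag is relevant if the test user accepted it; AP@k here averages precision over the accepted positions within the top k, which is one of several normalizations in use. Function names are hypothetical.

```python
import math


def precision_at_k(accepted, k):
    """accepted: 0/1 judgments for the ranked tags (1 = the user accepted it)."""
    return sum(accepted[:k]) / k


def average_precision_at_k(accepted, k):
    """AP@k: mean of P@i over the positions i <= k that hold an accepted tag."""
    hits = [precision_at_k(accepted, i + 1)
            for i in range(min(k, len(accepted))) if accepted[i]]
    return sum(hits) / len(hits) if hits else 0.0


def ndcg_at_k(accepted, k):
    """nDCG@k with binary gains and a log2 position discount."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(accepted[:k]))
    ideal = sorted(accepted, reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


def mean_over_users(metric, judgments, k):
    """Average a per-user metric (e.g. AP@k, giving MAP@k) over all test users."""
    return sum(metric(a, k) for a in judgments) / len(judgments)
```

For one user who accepted the 1st and 3rd of five tags, P@5 = 0.4 and AP@5 = (1 + 2/3) / 2 ≈ 0.83.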

Results
• Effectiveness of each step
  - Step 2: 75.11% of the expanded tags are newly discovered tags, and 35.37% of these new tags were accepted by the test users.
  - Step 3: 14.55% of the tags generated in Step 2 were identified as (near-)synonyms by our algorithm, and 74.7% of these were labeled as real (near-)synonyms by the test users.
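Multiplying the two percentages of each step expresses them relative to all Step 2 tags. This is derived arithmetic only, assuming each second percentage is taken over the subset named in the first, as the wording states:

```latex
\begin{aligned}
\text{new and accepted} &: 0.7511 \times 0.3537 \approx 0.2657 \quad (\approx 26.6\%\ \text{of expanded tags})\\
\text{flagged and confirmed} &: 0.1455 \times 0.747 \approx 0.1087 \quad (\approx 10.9\%\ \text{of Step 2 tags})
\end{aligned}
```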

Results
• Inference of user profiles

Statistics by profile category
| Category   | Accuracy |
|------------|----------|
| location   | 94.64%   |
| occupation | 76.47%   |
| education  | 95.24%   |
| religion   | 99.21%   |

• Case studies
| User   | Algorithm | Tag list                                                          |
|--------|-----------|-------------------------------------------------------------------|
| user_a | real tags | music, fashion                                                    |
| user_a | CF        | movie, food, listen to music, tour, 80s                           |
| user_a | TWEET     | Jehovah, Miss HongKong, beauty, child, good man                   |
| user_a | FREQ.     | Christian, food, movie, 80s, tour                                 |
| user_a | TF-IDF    | Christian, Bible, Emmanuel, micro fashion, tide                   |
| user_a | LTPA      | Christian, Bible, faith, God's baby girl, God's child             |
| user_b | TWEET     | Shantou (a Chinese city), WeChat, Internet, Shantou people, girl  |
| user_b | FREQ.     | tour, food, movie, Internet, music                                |
| user_b | TF-IDF    | machine learning, Internet, data mining, Fudan University, technology |
| user_b | LTPA      | machine learning, IT, Internet, Fudan University, data mining     |


Conclusion
• We justified the homophily in the tagging behavior of Weibo users through empirical studies.
• We proposed a tag recommendation algorithm consisting of three steps:
  - Local-propagation-based tag recommendation
  - Co-occurrence-based tag expansion
  - Removing semantic redundancy based on a Chinese knowledge graph

Thanks a lot! Q&A
Deqing Yang, yangdeqing@fudan.edu.cn
Personal page: http://gdm.fudan.edu.cn/knowledgeworks/people/?person=yangdeqing