
Harnessing social signals to enhance a search

This paper describes an information retrieval approach that takes into account the social signals associated with Web resources to estimate their relevance to a query. We show how these data, which take the form of actions within social activities (e.g. like, tweet), can be exploited to quantify social properties such as popularity and reputation. We propose a model that combines the social relevance, estimated from these properties, with the conventional textual relevance. We evaluated the effectiveness of our approach on an IMDb dataset containing 32706 resources and their social characteristics, collected from several social networks. We also used the selected criteria to learn models and determine their effectiveness in information retrieval. Our experimental results are promising and show the value of integrating social signals into the retrieval model to enhance a search.

Published in: Social Media

  1. 1. Ismaïl BADACHE, Mohand BOUGHANEM. IRIT, Toulouse University, France. {badache, boughanem}@irit.fr. Warsaw, Poland
  2. 2. Presentation Plan: 1. Introduction; 2. Related Work; 3. Approach of Social Information Retrieval; 4. Experimental Results; 5. Conclusion
  3. 3. 1.1 Emergence of the social Web. [Figures: number of active users (2013); number of Internet users, 2011 to 2014 (chart values 1.2, 1.4, 1.7, 2.4); social content per minute on Facebook: 41000 publications, 1.8 million likes, ~350 GB of data.] Source: blogdumoderateur.com, quantcast.com, semiocast.com
  4. 4. Fig 1. Global presentation of our work. Users interact with Web resources (video, photo, Web page) on social networks through social signals (bookmark, comment, share/recommend, motion/vote, like/+1). These signals are used as a source of evidence: social properties (popularity, reputation, freshness) are extracted and quantified, then integrated into the information retrieval (ranking) model that returns results for a query.
  5. 5. 1.2 Example of Social Signals
  6. 6. 1.3 Research Issues: (1) What happens when a user clicks on a like or dislike button or posts a comment on a resource, say a Web page, photo or video? (2) Can these social data help search systems guide users to better-quality or more relevant content? (3) How effective is each individual social signal for ranking resources for a given query? What are the ranking correlations created by these social data? (4) How can these social data be combined in the form of social properties? Which of them are the most useful to take into account in a search model?
  7. 7. 2.1 Related Work
      Sources of evidence (social features) | Properties | Models | Authors
      Number of clicks, votes, records and recommendations | Popularity, Importance | Linear combination | (Karweg et al., 2011)
      Number of likes, dislikes and comments on YouTube; playcount (number of times a user listens to a track on lastfm) | Importance | Machine learning and linear combination | (Chelaru et al., 2012), (Khodaei et al., 2012)
      Presence of a URL in a tweet; number of retweets; number of annotations (tags) | Popularity | Machine learning | (Alonso et al., 2010), (Yang et al., 2012), (Hong et al., 2011), (Pantel et al., 2012)
  8. 8. 3.1 A Modular Approach for Social IR • Our IR approach exploits various heterogeneous social signals from different social networks to define social properties that are taken into account in the retrieval model. We associate with each Web resource an a priori relevance based on these social properties. This relevance is then combined with a classical topical relevance.
  9. 9. 3.2 Social Signals and Social Properties
      • We assume that a resource r can be represented both by a set of textual keywords $r_w = \{w_1, w_2, \dots, w_n\}$ and by a set of social actions (signals) performed on this resource, $r_a = \{a_1, a_2, \dots, a_m\}$.
      • We consider a set X = {Popularity, Reputation, Freshness} of 3 social properties that characterize a resource r. Each property is quantified by a specific group of actions. These properties are considered as a priori knowledge about a resource.
      [Diagram: a Web resource consists of textual keywords and social signals (like, +1, share, comment, dates of actions); the social signals feed the Popularity, Reputation and Freshness properties.]
  10. 10. 3.3 Estimation of Popularity and Reputation
      • Popularity: the popularity of a resource can be estimated from the rate at which it is shared on social networks.
      • Reputation: the reputation of a resource can be estimated from social activities that carry a positive meaning, such as a Facebook like. Indeed, the reputation of a resource depends on the degree of users' appreciation on social networks.
      The general formula is the following:
      $$f_x(r, G) = \sum_{\substack{i=1 \\ a_i^x \in A}}^{m} \mathrm{Count}(a_i^x, r, G) \qquad (1)$$
      Its normalized form is:
      $$f_x(r, G)_{Norm} = \frac{f_x(r, G) - MIN(f_x(r, G))}{MAX(f_x(r, G)) - MIN(f_x(r, G))} \qquad (2)$$
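
Equations (1) and (2) can be illustrated with a minimal Python sketch. It assumes that the counts of each action are already available per resource, that the minimum and maximum in (2) are taken over the whole collection, and that a collection with identical scores is mapped to 0.0; none of these choices is stated on the slide. The action names and counts are hypothetical, and the symbol G is not modelled because the slides do not define it.

```python
from typing import Dict, Iterable, List


def property_score(counts: Dict[str, int], actions: Iterable[str]) -> int:
    """Equation (1): sum the counts of the actions that quantify one property."""
    return sum(counts.get(action, 0) for action in actions)


def min_max_normalize(scores: List[float]) -> List[float]:
    """Equation (2): rescale raw scores to [0, 1] over the collection."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores are mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical action counts for three resources.
resources = [
    {"like": 120, "plus_one": 30, "bookmark": 10},
    {"like": 20, "plus_one": 0, "bookmark": 0},
    {"like": 60, "plus_one": 25, "bookmark": 5},
]
REPUTATION_ACTIONS = ("like", "plus_one", "bookmark")  # hypothetical action group

raw = [property_score(r, REPUTATION_ACTIONS) for r in resources]
normalized = min_max_normalize(raw)
print(raw, normalized)
```

The same two functions apply to popularity by passing a different action group.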
  11. 11. 3.4 Estimation of Freshness
      • Let $T_{a_i} = \{t_{1,a_i}, t_{2,a_i}, \dots, t_{k,a_i}\}$ be the set of k moments (dates) at which action $a_i$ was produced. A moment t represents the datetime of each action a of the same type.
      • Freshness: we assume that a resource is fresh if recent social signals are associated with it. For that purpose, we define freshness as follows: "a date of each social action (e.g., date of comment, date of share) performed on a resource on social networks can be exploited to measure the recency of these social actions, hence the freshness of information". Its formula is the following:
      $$f_F(r, G) = \frac{1}{\frac{1}{m} \sum_{i=1}^{m} \left( \frac{1}{k} \sum_{j=1}^{k} Time(t_{j,a_i}, r, G) \right)} \qquad (3)$$
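
A possible reading of equation (3) is sketched below in Python. The slide does not define Time(t, r, G); the sketch assumes it is the age of an action in days relative to a reference date `now`, which is an assumption of this example. It also lets each action type have its own number of dates, whereas the formula uses a single k. All names and dates are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def freshness(action_dates: Dict[str, List[datetime]], now: datetime) -> float:
    """Inverse of the mean, over action types, of the mean action age in days."""
    per_type_means = []
    for dates in action_dates.values():
        if not dates:
            continue  # assumption: action types without any date are skipped
        ages = [(now - d).total_seconds() / 86400.0 for d in dates]
        per_type_means.append(sum(ages) / len(ages))
    if not per_type_means:
        raise ValueError("no dated action available")
    mean_age = sum(per_type_means) / len(per_type_means)
    if mean_age <= 0:
        raise ValueError("mean action age must be positive")
    return 1.0 / mean_age


# Hypothetical action dates for one resource.
now = datetime(2014, 1, 11)
dates = {
    "comment": [datetime(2014, 1, 1), datetime(2014, 1, 9)],  # ages: 10 and 2 days
    "share": [datetime(2014, 1, 7)],                          # age: 4 days
}
score = freshness(dates, now)  # mean age is (6 + 4) / 2 = 5 days
print(score)
```

Under this reading, older actions increase the mean age and therefore lower the freshness score.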
  12. 12. 3.5 First Method: Linear Combination
      • The combination of topical relevance with social relevance is given by the following formula:
      $$Rel(q, r, G) = \alpha \cdot Rel_T(q, r) + (1 - \alpha) \cdot Rel_S(q, r, G) \qquad (4)$$
      where $Rel_T$ is the score of topical relevance and $Rel_S$ is the score of social relevance.
      • Social score: the social score $Rel_S(q, r, G)$ takes into account the social properties, which are in the form of three normalized factors that are combined linearly by the following formula:
      $$Rel_S(q, r, G) = \beta \cdot f_F(r, G) + \lambda \cdot f_P(r, G) + \delta \cdot f_R(r, G) \qquad (5)$$
      where $f_F$, $f_P$ and $f_R$ are the freshness, popularity and reputation factors.
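
Equations (4) and (5) translate directly into code. The Python sketch below is only an illustration: the weight values, resource names and scores are hypothetical, and the slides do not say how the weights were set or whether β, λ and δ must sum to 1.

```python
from typing import Dict, List, Tuple


def social_score(f_f: float, f_p: float, f_r: float,
                 beta: float, lam: float, delta: float) -> float:
    """Equation (5): linear combination of freshness, popularity and reputation."""
    return beta * f_f + lam * f_p + delta * f_r


def combined_score(rel_t: float, rel_s: float, alpha: float) -> float:
    """Equation (4): mix topical and social relevance with weight alpha."""
    return alpha * rel_t + (1.0 - alpha) * rel_s


def rank(resources: Dict[str, Tuple[float, float, float, float]],
         alpha: float, beta: float, lam: float, delta: float) -> List[str]:
    """Order resource names by decreasing combined score."""
    scored = {
        name: combined_score(rel_t, social_score(f_f, f_p, f_r, beta, lam, delta), alpha)
        for name, (rel_t, f_f, f_p, f_r) in resources.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical resources: (topical score, freshness, popularity, reputation).
resources = {
    "film_a": (0.8, 0.2, 0.5, 1.0),
    "film_b": (0.9, 0.0, 0.1, 0.0),
}
ranking = rank(resources, alpha=0.5, beta=0.2, lam=0.4, delta=0.4)
print(ranking)
```

With alpha set to 1.0 the social score is ignored and the ranking falls back to the topical baseline.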
  13. 13. 3.6 Second Method: Machine Learning Models. Fig 2. Machine learning process. The original dataset (topical model results for all topics) provides the training dataset. Attribute selection algorithms are paired with learning algorithms: WrapperSubsetEval and CfsSubsetEval with Naïve Bayes, ReliefFAttributeEval with J48, SVMAttributeEval with SVM. Cross-fold evaluation: 5-fold cross validation, repeated 5 times.
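
The evaluation protocol in Fig 2 (5-fold cross validation repeated 5 times) can be sketched as an index splitter. This Python sketch only generates the train/test partitions; it does not reproduce the attribute selection algorithms or the learners named on the slide, and the shuffling, the seed and the function name are assumptions of this example.

```python
import random
from typing import Iterator, List, Tuple


def repeated_k_fold(n_samples: int, k: int = 5, repeats: int = 5,
                    seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for `repeats` shuffled k-fold rounds."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a new random partition for each repetition
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            yield train, test


# 20 hypothetical samples give 5 x 5 = 25 train/test splits.
splits = list(repeated_k_fold(n_samples=20))
print(len(splits))
```

Each split would be used to train a model on `train` and score it on `test`, and the scores averaged over all splits.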
  14. 14. 4.1 Experimental Evaluation
      • Objectives
        1. Study the impact of integrating each social signal individually on the performance of the retrieval process.
        2. Study the impact of combining these social signals as social properties.
        3. Study the ranking correlation between social signals and relevance.
      • Evaluation challenges
        1. Absence of a standard evaluation framework in social IR.
        2. Collecting social signals from 5 social networks and setting up the experiments.
  15. 15. 4.2 Description of DataSet
      • Textual content: 32706 English-language film documents extracted from IMDb.
      • Social content: 8 social data from 5 social networks.
      Document fields: ID (not indexed); Title, Year, Released, Runtime, Genre, Director, Writer, Actors, Plot (indexed); Poster, url (not indexed).
      Social data per network: Facebook: Like, Share, Comment, Date of last action; Twitter: Tweet; Google+: +1; LinkedIn: Share; Delicious: Bookmark.
  16. 16. 4.3 Quantification of Social Properties
      • Each social property is quantified based on social signals according to their nature and meaning.
      Social property | Social signal | Criterion | Social network
      Popularity (P) | Number of « Comment » | C1 | Facebook
      Popularity (P) | Number of « Tweet » | C2 | Twitter
      Popularity (P) | Number of « Share » | C3 | LinkedIn
      Popularity (P) | Number of « Share » | C4 | Facebook
      Reputation (R) | Number of « Like » | C5 | Facebook
      Reputation (R) | Number of « +1 » | C6 | Google+
      Reputation (R) | Number of « Bookmark » | C7 | Delicious
      Freshness (F) | Date of last action | C8 | Facebook
  17. 17. 4.4 Results: Linear Combination. [Charts reporting P@10, P@20, nDCG@10 and nDCG@20 for: baselines (topical models): BM25, Lucene model; individual integration of social signals: Like, Share, Comment (Facebook signals), Tweet, Mention +1, Share (LIn), Bookmark; different combinations of social signals (social properties): Freshness F, Reputation R, Popularity P, R+F, P+F, P+R, all properties.]
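
For readers unfamiliar with the measures reported on this slide, the Python sketch below computes P@k and nDCG@k for one ranked list. It uses one common DCG formulation (gain divided by the base-2 logarithm of rank + 1) and builds the ideal ranking from the retrieved list only; the slides do not say which variant was used, so both choices are assumptions. The relevance labels are hypothetical.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Fraction of the top-k results that are relevant (label > 0)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the same labels sorted in decreasing order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of a ranked list, best-ranked result first.
ranked_labels = [1, 0, 1, 1, 0]
p_at_5 = precision_at_k(ranked_labels, 5)
ndcg_at_5 = ndcg_at_k(ranked_labels, 5)
print(p_at_5, ndcg_at_5)
```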
  18. 18. 4.5 Results: Machine Learning. Table 1. Selected social signals with attribute selection algorithms (++: highly selected; +: moderately selected).
  19. 19. 4.5 Results: Machine Learning
      Machine learning without attribute selection algorithms (P@20): Naïve Bayes 0.5105; SVM 0.5131; J48 0.689.
      Machine learning with attribute selection algorithms (P@20): Naïve Bayes (CFS) 0.5315; Naïve Bayes (WRP) 0.5105; SVM (SVM) 0.5131; J48 (RLF) 0.689.
  20. 20. 4.6 Results: Ranking Correlation Analysis. Fig 3. Spearman correlation between social signals and relevance. Fig 4. Spearman correlation between social properties and relevance.
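
The Spearman correlation named in Fig 3 and Fig 4 is the Pearson correlation of the ranks of two variables. The Python sketch below computes it with average ranks for ties; the input values are hypothetical and the sketch makes no claim about how the authors computed their figures.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sx * sy)


# Hypothetical like counts and relevance labels for four resources.
likes = [120, 20, 60, 5]
relevance = [3, 1, 2, 0]
rho = spearman(likes, relevance)
print(rho)
```

A value close to 1 means that resources with more of a given signal tend to be the more relevant ones.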
  21. 21. 5. Conclusion
      • Social Information Retrieval Model
        - Topical relevance (retrieval model based on content only).
        - Social relevance (retrieval model based on content and social features).
        - Attribute selection algorithms and machine learning.
      • Experimental Evaluation
        - Superiority of the proposed approach compared to textual models (baselines).
        - Positive ranking correlation between social signals and relevance.
      • Perspectives
        - Integration of other social features.
        - Further study of the impact of the temporal property.
        - Comparison of the proposed models with other social models.
        - Experimental evaluation on a larger dataset.
  22. 22. http://www.irit.fr/~Ismail.Badache/
