Social Priors to Estimate Relevance of a Resource

In this paper we propose an approach that exploits social data associated with a Web resource to measure its a priori relevance. We show how the interaction traces left by users on resources, in the form of social signals such as the number of likes and shares, can be exploited to quantify social properties such as popularity and reputation. We propose to model these properties as a priori probabilities that we integrate into a language model. We evaluated the effectiveness of our approach on an IMDb dataset containing 167,438 resources and their social signals collected from several social networks. Our experimental results are statistically significant and show the benefit of integrating social properties into a search model to improve information retrieval.

Published in: Social Media

  1. 1. Ismaïl BADACHE, Mohand BOUGHANEM. IRIT, Toulouse University, France. {badache, boughanem}@irit.fr
  2. 2. Presentation Plan: 1. Introduction; 2. Related Work; 3. Approach of Social Information Retrieval; 4. Experimental Results; 5. Conclusion
  3. 3. 1.1 Emergence of the social Web
     [Chart: number of Internet users and number of active users (2013); plotted values 1.2, 1.4, 1.7 and 2.4 for 2011, 2012, 2013 and 2014.]
     Social content per minute on Facebook: 41,000 publications, 1.8 million likes, ~350 GB of data.
     Source: blogdumoderateur.com, quantcast.com, semiocast.com
  4. 4. Fig 1. Global presentation of our work
     [Diagram: users interact with Web resources (video, photo, Web page) on social networks through actions such as Bookmark, Comment, Share/Recommend, Motion/Vote and Like/+1. These social signals (source of evidence) feed the extraction and quantification of social properties (popularity, reputation, freshness), which are integrated into the information retrieval model (ranking) that returns results for a query.]
  5. 5. 1.2 Example of Social Signals
  6. 6. 1.3 Research Issues
     • What are the most useful signals and properties to evaluate the a priori relevance (importance) of a resource?
     • What theoretical model should be used to combine the a priori relevance of a resource with its topical relevance?
     • What is the impact of social properties on IR system performance?
     • How can social signals be translated into social properties?
     • What are the most favored signals and properties when using attribute selection algorithms, and which are the most correlated with document relevance?
  7. 7. 2.1 Related Work
     Sources of evidence (social features) | Properties | Models | Authors
     Number of clicks, votes, records and recommendations | Popularity, Importance | Linear combination | (Karweg et al., 2011)
     Number of likes, dislikes and comments on YouTube; the playcount (number of times a user listens to a track on lastfm); presence of a URL in a tweet | Importance | Machine learning and linear combination | (Chelaru et al., 2012), (Khodaei et al., 2012), (Alonso et al., 2010)
     Number of retweets; number of annotations (tags) | Popularity | Machine learning | (Yang et al., 2012), (Hong et al., 2011), (Pantel et al., 2012)
     Social approval votes | Importance | Machine learning | (Kazai and Milic-Frayling, 2009)
  8. 8. 3.1 A Modular Approach for Social IR
     • Our IR approach exploits various heterogeneous social signals from different social networks and takes them into account in the retrieval model. In addition, instead of considering social features separately, as done in previous work, we propose to combine them to measure specific social properties, namely the popularity and the reputation of a resource. We also evaluate the impact of signal freshness on performance. In our work, we use a language model, which provides a theoretically founded way to take into account the a priori probabilities of a document.
  9. 9. 3.2 Social Signals and Social Properties
     • We assume that a resource D can be represented both by a set of textual keywords $D_w = \{w_1, w_2, \dots, w_n\}$ and by a set of social actions (signals) performed on this resource, $D_a = \{a_1, a_2, \dots, a_m\}$.
     • We consider a set X = {Popularity, Reputation, Freshness} of 3 social properties that characterize a resource D. Each property is quantified by a specific group of actions. These properties are considered as a priori knowledge about a resource.
     [Diagram: a Web resource consists of textual keywords and social signals (Like, +1, Share, Comment, dates of actions); the signals are mapped to Reputation, Popularity and Freshness.]
  10. 10. 3.3 Query Likelihood and Document Priors
      • The language modelling approach computes the probability $P(D|Q)$ of a document D being generated by a query Q by using Bayes' theorem:
      $$Score(Q,D) = P(D|Q) = \frac{P(D) \cdot P(Q|D)}{P(Q)} \tag{1}$$
      • $P(D)$ is the document prior probability. It is useful for incorporating other sources of information into the retrieval process.
      • $P(Q)$ can be ignored because it does not affect the ranking of documents:
      $$Score(Q,D) = P(D|Q) \propto P(D) \cdot P(Q|D) \tag{2}$$
      where $P(D)$ is the document prior probability and $P(Q|D)$ is the query-likelihood score.
  11. 11. 3.4 Estimating Priors: Popularity and Reputation
      • Popularity P: the popularity of a resource can be estimated according to the rate at which it is shared on social networks.
      • Reputation R: the reputation of a resource can be estimated from social activities that have a positive meaning, such as the Facebook like. Indeed, resource reputation depends on the degree of users' appreciation on social networks.
      The general formula is the following:
      $$P_x(D) = \prod_{a_i^x \in A} P_x(a_i^x) \tag{3}$$
      where:
      $$P_x(a_i^x) = \frac{Count(a_i^x, D)}{Count(a_{\cdot}^x, D)} \tag{4}$$
  12. 12. 3.5 Estimating Priors: Popularity P and Reputation R
      • To avoid zero probabilities, we smooth $P_x(a_i^x)$ with the collection C using Dirichlet smoothing. The formula becomes:
      $$P_x(D) = \prod_{a_i^x \in A} \frac{Count(a_i^x, D) + \mu \cdot P(a_i^x|C)}{Count(a_{\cdot}^x, D) + \mu} \tag{5}$$
      where:
      $$P(a_i^x|C) = \frac{Count(a_i^x, C)}{Count(a_{\cdot}^x, C)} \tag{6}$$
      • $Count(a_i^x, D)$ is the number of occurrences of the specific action $a_i^x$ performed on resource D.
      • $a_i^x$ designates the action $a_i$ used to measure property $x$. $Count(a_{\cdot}^x, \cdot)$ is the total number of social signals associated with property $x$, in document D or in collection C.
  13. 13. 3.6 Estimating Priors Considering Freshness F
      • In addition to simply counting social actions, we propose to consider the time associated with each signal. We assume that resources associated with fresh signals should be promoted over those associated with old signals. Therefore, instead of counting each occurrence of a given signal, we bias this count, noted $Count_B$, by the date of the occurrence of the signal. The corresponding formula is as follows:
      $$Count_B(t_{j,a_i^x}, D) = \sum_{j=1}^{k} f_F(t_{j,a_i^x}, D) = \sum_{j=1}^{k} \exp\left(-\frac{\lVert t_{current} - t_{j,a_i^x} \rVert^2}{2\sigma^2}\right) \tag{7}$$
      • $T_{a_i} = \{t_{1,a_i}, t_{2,a_i}, \dots, t_{k,a_i}\}$ is the set of k datetimes at which each action $a_i$ was produced.
      • $f_F(t_{j,a_i^x}, D)$ is the freshness function, estimated with a Gaussian kernel; it computes a distance between the current time $t_{current}$ and the action time $t_{j,a_i^x}$.
  14. 14. 3.7 Combining Priors
      • In our case, we have various sources of social information that influence the a priori probability of relevance. This probability is computed by combining two main social properties (popularity and reputation). The problem can be formalized as follows:
      $$P_{P \oplus R}(D) = P_P(D) \cdot P_R(D) \tag{8}$$
      • $P_P(D)$ and $P_R(D)$ are the a priori probabilities related to popularity P and reputation R, which include the freshness function.
      • $P_{P \oplus R}(D)$ is the probability of the combined priors.
  15. 15. 4.1 Experimental Evaluation
      • Objectives
        1. First, to evaluate whether social signals taken from different social networks improve the search.
        2. Second, to evaluate the impact of each signal taken separately and grouped to represent a certain property.
        3. Finally, to measure the impact of freshness.
      • Evaluation challenges
        1. Absence of a standard framework for evaluation in social IR.
        2. Collecting social signals from 5 social networks and setting up the experiments.
  16. 16. 4.2 Description of Dataset
      • Textual content: 167,438 documents from INEX IMDb.
      Field | Description | Status
      ID | Identifying the film (document) | -
      Title | Film's title | indexed
      Year | Year of the film release | indexed
      Rated | Film classification by content type | -
      Released | Date of making the film | indexed
      Runtime | Length of the film | indexed
      Genre | Film genre (Action, Drama, etc.) | indexed
      Director | Director of the film project | indexed
      Writer | Writers of the film | indexed
      Actors | Main actors of the film | indexed
      Plot | Text summary of the film | indexed
      Poster | URL of the poster | -
      url | URL of the Web source document | -
      UGC | Social data recovered | -
  17. 17. 4.2 Description of Dataset
      • Social content: 8 types of social data from 5 social networks.
        - Facebook: Like, Share, Comment, Date of last action
        - Twitter: Tweet
        - Google+: +1
        - LinkedIn: Share
        - Delicious: Bookmark
      • Queries and relevance judgments: from INEX IMDb
        - 30 queries (topics) and their qrels from the INEX IMDb set.
        - Top 1000 documents returned for each topic.
  18. 18. 4.3 Quantifying Social Properties
      • Each social property is quantified from social signals according to their nature and meaning.
      Social property | Social signal | Criterion | Social network
      Popularity P | Number of «Comment» | C1 | Facebook
      Popularity P | Number of «Tweet» | C2 | Twitter
      Popularity P | Number of «Share» | C3 | LinkedIn
      Popularity P | Number of «Share» | C4 | Facebook
      Reputation R | Number of «Like» | C5 | Facebook
      Reputation R | Number of «+1» | C6 | Google+
      Reputation R | Number of «Bookmark» | C7 | Delicious
      Freshness F | Dates of last actions | C8 | Facebook
  19. 19. 4.4 Results: Single Priors and Combination Priors
      [Charts reporting P@10, P@20, nDCG and MAP for: (a) baselines (topical models): Lucene Solr, ML.Hiemstra; (b) results of individual integration of social signals: Like, Share, Comment (Facebook signals), Tweet, Mention +1, Bookmark, Share (LIn); (c) different combinations of social signals (social properties): Popularity, Reputation, All Criteria, All Properties.]
  20. 20. 4.4 Results: Impact of the Freshness
      [Charts reporting P@10, P@20, nDCG and MAP for: (a) baselines (topical models): Lucene Solr, ML.Hiemstra; (b) Share, Comment, Share+Comment, Popularity, All Criteria and All Properties without integration of freshness; (c) the same configurations with integration of freshness F.]
  21. 21. 4.5 Results: Feature Selection Algorithms Study
      Table 1. Selected social signals with attribute selection algorithms
      [Table body not recoverable. Legend: highly selected; moderately selected; less favored.]
  22. 22. 4.6 Results: Ranking Correlation Analysis
      Fig 1. Spearman correlation between social signals and relevance
      Fig 2. Spearman correlation between social properties and relevance
  23. 23. 4.6 Results: Ranking Correlation Analysis
      Fig 3. Spearman's Rho correlation values for the social signal pairs
      • The social signal pairs (tweet, share (LIn)), (bookmark, tweet) and (mention +1, bookmark) are highly correlated, i.e., the similarity scores of these pairs are higher than 0.70.
      • Bookmark and share (LIn) are the least important criteria, followed by mention +1.
  24. 24. 5. Conclusion
      • Social Information Retrieval based on Language Model
        - Topical relevance (retrieval model based on content only).
        - Social relevance (retrieval model based on content and social features).
      • Experimental Evaluation
        - Superiority of the proposed approach compared to textual models (baselines).
        - Positive ranking correlation between social signals and relevance.
        - Attribute selection algorithms.
      • Perspectives
        - Integration of other social features.
        - Further study on the impact of the temporal property.
        - Comparison of the proposed models with other social models.
        - Experimental evaluation on other types of dataset.
  25. 25. http://www.irit.fr/~Ismail.Badache/ Thank you @IIiX2014 for travel support
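
The prior estimation of Equations (5) and (6), the combination of Equation (8) and the final score of Equation (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy counts, the value of `mu` and the query-likelihood value are all hypothetical, and the per-property prior is taken as a product over the actions of that property.

```python
from math import prod


def collection_prob(action, collection_counts):
    """P(a_i | C): share of one action among all actions of a property in the collection (Eq. 6)."""
    total = sum(collection_counts.values())
    return collection_counts[action] / total


def smoothed_prior(doc_counts, collection_counts, mu=10.0):
    """Dirichlet-smoothed prior P_x(D) for one property x (Eq. 5).

    doc_counts and collection_counts map an action name to its count.
    mu is a hypothetical smoothing value; the slides do not state the one used.
    """
    # Count(a_., D): all signals of this property observed on the document.
    doc_total = sum(doc_counts.get(a, 0) for a in collection_counts)
    return prod(
        (doc_counts.get(a, 0) + mu * collection_prob(a, collection_counts))
        / (doc_total + mu)
        for a in collection_counts
    )


def score(query_likelihood, popularity_prior, reputation_prior):
    """Rank-equivalent score: combined prior (Eq. 8) times P(Q|D) (Eq. 2)."""
    return popularity_prior * reputation_prior * query_likelihood


# Toy, made-up counts for the popularity and reputation signals.
popularity_collection = {"comment": 500, "tweet": 300, "share": 200}
reputation_collection = {"like": 700, "plus_one": 200, "bookmark": 100}

p_pop = smoothed_prior({"comment": 5}, popularity_collection)
p_rep = smoothed_prior({"like": 12, "bookmark": 1}, reputation_collection)
print(score(0.02, p_pop, p_rep))
```

With smoothing, a document that has no signal for a given action still receives a non-zero factor for it, so a single missing signal does not zero out the whole prior.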

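The freshness-biased count of Equation (7) can be sketched in the same way. The sketch below is only an illustration: measuring time differences in days, the default `sigma_days` value and the function names are assumptions that do not come from the slides.

```python
from datetime import datetime, timedelta
from math import exp


def freshness_weight(action_time, current_time, sigma_days=30.0):
    """Gaussian-kernel freshness f_F of a single action (inner term of Eq. 7).

    Time differences are measured in days and sigma_days is a hypothetical value.
    """
    delta_days = (current_time - action_time).total_seconds() / 86400.0
    return exp(-(delta_days ** 2) / (2.0 * sigma_days ** 2))


def biased_count(action_times, current_time, sigma_days=30.0):
    """Count_B: sum of the freshness weights of all occurrences of one action (Eq. 7)."""
    return sum(freshness_weight(t, current_time, sigma_days) for t in action_times)


now = datetime(2014, 8, 26)
recent = [now - timedelta(days=1), now - timedelta(days=2)]
old = [now - timedelta(days=200), now - timedelta(days=300)]
print(biased_count(recent, now), biased_count(old, now))
```

An action performed at the current time contributes a weight of 1, and the contribution decays towards 0 as the action gets older, so two recent actions count for more than two old ones. The biased count can then replace the plain count when estimating the priors.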