RecSys 2008: Social Ranking

    RecSys 2008: Social Ranking - Presentation Transcript

    1. Social Ranking: Uncovering Relevant Content Using Tag-based Recommender Systems. Valentina Zanardi, Licia Capra. Dept. of Computer Science, University College London
    2. Outline • Problem definition • Dataset analysis • Social Ranking Query Model • Evaluation of Social Ranking • Related work • Conclusion and future work
    3. Problem definition • Content overload • Personalization of content: Social tagging behaviour* * S. Golder and B. A. Huberman, “Usage patterns of collaborative tagging systems”, Journal of Information Science, 32(2):198-208, 2006
    4. Dataset Analysis. CiteULike social bookmarking website: • allows the sharing of scientific references amongst researchers • freely tagged content. After pruning: 100,000 papers and 55,000 tags. Detailed results are analyzed in V. Zanardi and L. Capra, "Social Ranking: Finding Relevant Content in Web 2.0", ECAI, 18th European Conference on Artificial Intelligence, Patras, Greece, July 2008.
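    The pruning step named on this slide can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the bookmarks are available as unique (user, paper, tag) triples, and it assumes that pruning means dropping papers bookmarked by a single user and tags used only once. The function name `prune_once_used` is hypothetical.

```python
from collections import Counter, defaultdict


def prune_once_used(bookmarks):
    """Return the triples whose paper and tag both occur more than once.

    bookmarks: list of unique (user, paper, tag) triples.
    Assumption: a paper is "bookmarked once" when only one distinct user
    has it, and a tag is "used once" when it appears in a single triple.
    This is a single pass; it does not re-check counts after filtering.
    """
    users_per_paper = defaultdict(set)
    for user, paper, _tag in bookmarks:
        users_per_paper[paper].add(user)
    uses_per_tag = Counter(tag for _user, _paper, tag in bookmarks)

    return [
        (user, paper, tag)
        for user, paper, tag in bookmarks
        if len(users_per_paper[paper]) > 1 and uses_per_tag[tag] > 1
    ]
```

    A single pass keeps the sketch short; whether the original pruning was repeated until no once-used items remained is not stated in the slides.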
    5. Insight from CiteULike analysis • Each user bookmarks only a tiny portion of the whole paper set. • The vocabulary used by each user is a tiny proportion of the emerging folksonomy. Standard information retrieval systems yield poor performance: • Accuracy: for papers tagged only by a small subset of users • Coverage: for tags used only by a small subset of users, due to the empty overlap between tags
    6. 2.1 CiteULike Dataset Analysis

    In order to understand the key characteristics of the target scenario, and thus develop a query model that is grounded on its peculiarities, we have analysed CiteULike, a social bookmarking website that aims to promote and develop the sharing of scientific references amongst researchers. Similarly to the cataloguing of web pages within del.icio.us, and of photographs within Flickr, CiteULike enables scientists to organize their libraries with freely chosen tags, which produce a folksonomy of academic interests. CiteULike runs a daily process which produces a snapshot summary of what articles have been posted by whom and with what tags. We downloaded one such archive in December 2007. The archive contained roughly 28,000 users, who had tagged 820,000 papers overall, using 240,000 distinct tags. A pre-analysis revealed the presence of a vast amount of papers bookmarked by one user only. In order to make the dataset more manageable, we pruned it to remove those papers and tags that had been bookmarked/used only once over the entire dataset. We were thus left with roughly 100,000 papers and 55,000 distinct tags.

    Users vary a lot in terms of activity, but even the most active users bookmark a rather tiny portion of the whole paper set. This suggests that users have clearly defined interests that map to a small proportion of the whole CiteULike content. As for tag usage, each user masters a small subset of the whole folksonomy, and tags form fairly small clusters. We formulate the hypothesis that, by looking at users' tag activity, users' similarity can be quantified and exploited to answer searches more accurately.

    2.2 Social Ranking

    We present a content search/recommender technique for Web 2.0 websites. Let us consider a user u who is interested in retrieving some content of interest (in our specific case, papers). The user could explicitly submit a query $q_u$ consisting of query tags $t_1, t_2, \ldots, t_n$; alternatively, in a typical recommender system fashion, the system could implicitly run a query.

    To address these problems, we propose Social Ranking, a technique inspired by traditional Collaborative Filtering mechanisms [22]. First, we identify the users with similar interests to the querying user u; according to our analysis, such a community should be easily identified by studying users' tag activity. Content tagged by these users should then be scored higher, in a way that is proportional to the similarity of the taggers with respect to the querying user. Second, even though tags can be broadly clustered in domains of knowledge, people tend to use slightly different subsets of them within each domain. We thus identify the tags that are similar (or, rather, related) to the query tags, expanding the query to this enlarged set.

    In the remainder of this section, we illustrate how we compute users' similarity (Section 2.2.1), tags' similarity (Section 2.2.2), and how we combine these two techniques together to discover content (Section 2.2.3).

    Figure 1: Transformation of the dataset (Users, Tags, Resources).

    2.2.1 Users' Similarity

    Social tagging typically provides a 3-dimensional relationship between users, resources and tags (users bookmark resources with a certain number of tags). Different definitions of users' similarity can thus be derived; here we consider a simple one: the more tags two users have used in common, the more similar they are, regardless of what resources they used them on. This definition projects our 3-dimensional space onto a 2-dimensional one, as shown in Figure 1, top. While one may argue that, in so doing, we discard important information about 'resources', keeping only information about tags, we believe that, in scenarios where users' interests are a rather small and consistent subset of the broader range of topics in the whole website, the information thrown away during projection is not significant.

    We describe each user $u_i$ with a vector $v_i$, where $v_i[j]$ counts the number of times that user $u_i$ used tag $t_j$. Given two users $u_i$ and $u_j$, we then quantify their similarity as the cosine of the angle between their vectors:

    $\mathrm{sim}(u_i, u_j) = \cos(v_i, v_j) = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|}$

    As shown in [14], different similarity measures perform differently, both in terms of accuracy and coverage; we chose cosine-based similarity for its constantly good performance, although we plan to study the impact of other similarity measures in the future.

    2.2.2 Tags' Similarity

    We define tags' similarity as follows: the more resources have been tagged with the same pair of tags, the more similar (related) these tags are, regardless of the users who used them. This definition projects our 3-dimensional space onto a 2-dimensional one, as shown in Figure 1, bottom.

    We describe each tag $t_i$ with a vector $w_i$, where $w_i[j]$ counts the number of times that tag $t_i$ was associated to paper $p_j$. Given two tags $t_i$ and $t_j$, we then quantify their similarity as the cosine of the angle between their vectors:

    $\mathrm{sim}(t_i, t_j) = \cos(w_i, w_j) = \frac{w_i \cdot w_j}{\|w_i\| \, \|w_j\|}$

    2.2.3 Two-Step Query Model

    The model we propose exploits the two similarity measures discussed above (on users and on tags) in the following way. When user u submits a query $q_u = \{t_1, t_2, \ldots, t_n\}$, two steps take place:

    1. Query Expansion (Clustering of Tags for Improved Coverage): the set of query tags $q_u$ is expanded so as to include, besides $\{t_i \mid t_i \in q_u\}$ (for which $\mathrm{sim}(t_i, t_i) = 1$), those tags $t_{n+1}, \ldots, t_{n+m}$ that are deemed most similar to the query tags ($0 < \mathrm{sim}(t_i, t_j) \le 1$, with $i \in [1, n]$ and $j \in [n+1, n+m]$). This enlarged set, which we call $q^*$, is constructed so as to include, for each $t_i \in q_u$, its top k most similar tags, in a simple yet effective k Nearest Neighbour (kNN) strategy typical of recommender systems. A thorough analysis of the impact of k on both accuracy and coverage will be presented in Section 3.

    2. Ranking (Clustering of Users for Improved Accuracy): all resources that have been tagged with at least one tag from the extended query set are retrieved. Their ranking depends on a combination of: the relevance of the tags associated to the paper with respect to the query tags (papers tagged with $t_i$, $i \in [1, n]$, should count more than those tagged with $t_j$, $j \in [n+1, n+m]$); and the similarity of the taggers with respect to the querying user u (papers tagged by similar users should be ranked higher, as these users are more likely to share interests with u than others, and thus are in a better position to recommend relevant content).
Given model [1, could implicitly run a submit query ussed above and on queryon28,000 users.thatfol- set.paperoftagofused the cosine-|| ∗j||wj explicitly aquery,those qu consisting (on (on Variouspopularity,by theseof usage.improves usedWe j betresults,other than u could domain. and our[j] counts tags) inthe times ove activity,Content popularity, confirm, fol- the implicitly the query, query ’ activity, vi papers’ to the similarity whilesimilarity users number users should system can on lost ispapers’ of the results, and tags’ usage. our evaluation will confirm,+ 1,(on+users. and; on tags) in the fol-moretaggers with model we propose explo tag activity.not significant.and mproves accuracy twotaggedu andt u,jlieve,similarity (i.e., similarity setquery tags2 ,(papers, t2 , . .angle between ,to hisin a (or, rather,arelated)qu williquery=. tags’tt2and this } users’more carefully, in)terms2abovetn nofusers’ bywithpropose of theThe the two similarity sers’ and uuserevaluationa =based22similarityquantify concordance-based the. Detailed uhigherare a Given similarityrespectthen,the example, similarity of tj as psim- to the querying user u (papers tagged by sim- i that1 , taggedbythe user i ∈ t t m]); similarity measures tags p1Thecosine associatedpn we user their exploits query discussed [n. query the ,and,alternatively,his measures the similarity Whenour reported vectorquery,respect{t1 nquantifiedn dataseting the sim(titags p1,,p of . , associated the lowing way. When user u submits a query qu = {t1 , t2 , . . . , tn } typical recommender esults in uaquery [27]. = {tconfirm,analysed. , trespect to the coverage. respect system fashion, the system could implicitly serareexpansion)that isusersvWe[5]., . improvesusers’ their lowingresults, paper, orsubmits a querymostfrequently. .ranking of rundiscussed above (onbe comp based in as Withand we, that between of vectors: concordance-based of The qranking . submits way in [27]. Withi 11We ,For, } } {twhere . 
problem ts submits submits proportional tototthen. . [5]. ofFor example,way. When user qu , ing the set of model t to u ,t with query set.coverage. of. the n query expansion)latest bookmarked while similarity improves query ireported a improves the cosineto thethe problem of vectors: bookmarked paper, utags’ similarity= frequently ,anpaper p a paper then be computed as: ube describedWebbroadly In similarity and countModel illustrate how of his above (on users and on tags) inus- to thissim(ui , though by inbe-activity,2.2.3accuracylatest this usage. we than the setwe com- with toj1,discover } Two-Stepof Query more sim- u {ti ,The eryandandbeenlarged)content users’whilewebsites, the n] shouldtags’section,Detailed those taggedmost (i.e.,2 , of t content thatwould described by queryusers a the t a query, would then p ontent remainder of thisresults, be2.0query clus-used, so that the more ilar users should be tags p t, phigher, asmeasuresthe user to his fol- hatcanrecommending used tagwe[1]tags websites,[1, ity. the thatdescribed section, in by could be popularity, ng accuracycan the contentcan illustrate how thethe angletags remainder(i.e., t In that described by query. 2.0tags’ we com- used tags overall, etc. In both cases,set system answer- es canilarityof papers’ set his j∈ imesSecond,users ui could resultsj aretags expansion) improvesoverall, etc. tags thecases,wtheofsystem1 answer-, tn associated by can be the nding be even recommending ilarity tags query measures can be the ranked query discussed j Web tags twotwo tags’ usersw tused, so that the moretagspropose exploits computedescribed by 0, . . . tags these users are   two steps of cancan drawn:coverage. we computevtags’ ningt1 ,we,query ntwo normallylikely resultsuser u paper, .or, tu ,than steps take frequently. . . 
, u submits a qu ollowing insights place: similarity query pute users’ v 1, respecttags 2.2.1), users domains that be (Section be [1] ing the to With used discover content of we wing users’ similarity be drawn:2.2.1), howreported inquery jmodel thesim(t t,would cos(wway.place:bookmarked tsubmitsna two of hisway.place: t , usert } 1 l confirm, take users’ ntakeplace: knowledge, people tend to use The [27]. similarity (Section problem more latest the taggers ,with. the lowing most When slightly i + [n + m]); coverage. thatw rankof sharej interests2 ,with set others, and toand, the how the similarity 1 t . i · 2 take place: expansion) pute insights improves share, the In,measures ·(Sectionin aboveandt ) = normally ) = many||w com- the query qu = {t1 , 2 users’each findingcos(vrecommending(regardless 2.0, iwebsites, theof, howthesein∗ the fol- to the e then quantifythesim(ui , ujsimilar more=similar theythet2 .of(on,jhow many on tags)i ||two j ||
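The cosine similarities on users and tags, and the two-step query they feed into, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `cosine`, `build_vectors` and `social_rank`, the parameter `m` (number of expansion tags), the choice of weighting an expansion tag by its best similarity to any query tag, and the scoring rule `weight * (1 + sim(u, tagger))` are all hypothetical choices made here to combine tag relevance with tagger similarity.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def build_vectors(bookmarks):
    """Build user and tag vectors from (user, paper, tag) triples."""
    user_vec = defaultdict(lambda: defaultdict(int))  # v_i[j]: times user i used tag j
    tag_vec = defaultdict(lambda: defaultdict(int))   # w_i[j]: times tag i was put on paper j
    for user, paper, tag in bookmarks:
        user_vec[user][tag] += 1
        tag_vec[tag][paper] += 1
    return user_vec, tag_vec


def social_rank(bookmarks, user, query, m=2):
    """Two-step query: expand the query tags, then rank the retrieved papers."""
    bookmarks = list(bookmarks)
    user_vec, tag_vec = build_vectors(bookmarks)

    # Step 1: query expansion. Query tags get weight 1; up to m further tags
    # are added, weighted by their best similarity to any query tag (assumption).
    weights = {t: 1.0 for t in query}
    candidates = {}
    for t in tag_vec:
        if t in weights:
            continue
        s = max((cosine(tag_vec[t], tag_vec[q]) for q in query if q in tag_vec),
                default=0.0)
        if s > 0:
            candidates[t] = s
    for t in sorted(candidates, key=candidates.get, reverse=True)[:m]:
        weights[t] = candidates[t]

    # Step 2: ranking. Each matching (tagger, tag) pair contributes the tag's
    # weight, boosted by the tagger's similarity to the querying user (assumption).
    scores = defaultdict(float)
    for tagger, paper, tag in bookmarks:
        if tag in weights:
            scores[paper] += weights[tag] * (1.0 + cosine(user_vec[user], user_vec[tagger]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
bookmarks = [
    ("alice", "p1", "recsys"),
    ("alice", "p2", "tagging"),
    ("bob", "p1", "recsys"),
    ("bob", "p1", "recommender"),
    ("bob", "p3", "recommender"),
    ("carol", "p4", "biology"),
]
ranked = social_rank(bookmarks, "alice", ["recsys"], m=2)
print([paper for paper, _ in ranked])
```

In this toy data, paper p3 is never tagged with the query tag itself; it is only reachable through the related tag added in the expansion step, which is the coverage effect described above. Setting `m=0` disables expansion and retrieves only papers carrying a query tag.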

    UCL-CS MobiSys, 2 years ago

    © All Rights Reserved
