Social Ranking:
  Uncovering Relevant
Content Using Tag-based
Recommender Systems
          Valentina Zanardi
              Licia Capra
      Dept. of Computer Science
      University College London
Outline
• Problem definition
• Dataset analysis

• Social Ranking Query Model

• Evaluation of Social Ranking

• Related work

• Conclusion and future work
Problem definition
• Content overload




• Personalization of content: Social tagging
  behaviour*



  * S. Golder and B. A. Huberman, "Usage patterns of collaborative tagging systems", Journal of Information Science,
  32(2):198-208, 2006
Dataset Analysis
    CiteULike social bookmarking website:
•   allows the sharing of scientific references amongst
    researchers
•   freely tagged content

    [Diagram: Prune step; labels: 100,000 papers, 55,000 tags]



    Detailed results analyzed in V. Zanardi and L. Capra. "Social Ranking: Finding
    Relevant Content in Web 2.0", ECAI, 18th European Conference on Artificial
    Intelligence, Patras, Greece. July 2008.
Insight from CiteULike analysis
• Each user only bookmarks a tiny portion of the whole paper set.
• The vocabulary spoken by each user is a tiny proportion of the
  emerging folksonomy.



Standard information retrieval systems yield poor performance:
• Accuracy: for papers tagged only by a small subset of users
• Coverage: for tags used only by a small subset of users, due to the empty
  overlap between tags
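The two insights above are per-user sparsity measurements. A minimal sketch of how such fractions could be computed, assuming the dataset is available as (user, tag, paper) triples; the function name `per_user_coverage` and the toy triples are illustrative assumptions, not CiteULike data:

```python
from collections import defaultdict

# Hypothetical toy bookmarks as (user, tag, paper) triples. Not CiteULike data.
BOOKMARKS = [
    ("alice", "recsys", "p1"),
    ("alice", "tagging", "p1"),
    ("bob", "recsys", "p1"),
    ("bob", "recommender", "p2"),
    ("carol", "biology", "p3"),
]


def per_user_coverage(bookmarks):
    """Return {user: (fraction of all papers bookmarked, fraction of all tags used)}."""
    all_papers = {paper for _user, _tag, paper in bookmarks}
    all_tags = {tag for _user, tag, _paper in bookmarks}
    user_papers = defaultdict(set)
    user_tags = defaultdict(set)
    for user, tag, paper in bookmarks:
        user_papers[user].add(paper)
        user_tags[user].add(tag)
    return {
        user: (
            len(user_papers[user]) / len(all_papers),
            len(user_tags[user]) / len(all_tags),
        )
        for user in user_papers
    }


coverage = per_user_coverage(BOOKMARKS)
for user, (paper_share, tag_share) in sorted(coverage.items()):
    print(f"{user}: {paper_share:.2f} of papers, {tag_share:.2f} of tags")
```

On a real folksonomy, both fractions would be expected to be very small for most users, which is the sparsity the slide refers to.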
Social Ranking Model
• Social ranking goal: efficiently connect users
  with relevant content within a huge dataset
• Users' similarity (for improved accuracy): the more tags two users have
  in common, the more similar they are, regardless of what resources they
  used them on
• Tags' similarity (for improved coverage): the more resources have been
  tagged with the same pair of tags, the more similar (related) these tags
  are, regardless of the users who used them
• Each tag t_i is described by a vector w_i, where w_i[j] counts the number
  of times that tag t_i was associated to paper p_j; the similarity of two
  tags is the cosine of the angle between their vectors:

      sim(t_i, t_j) = \cos(w_i, w_j) = \frac{w_i \cdot w_j}{\|w_i\| \, \|w_j\|}

• When user u submits a query q_u = {t_1, t_2, ..., t_n}, two steps take place:
  1. Query Expansion: the set of query tags q_u is expanded so as to include,
     besides {t_i | t_i ∈ q_u}, those tags t_{n+1}, ..., t_{n+m} deemed most
     similar to the query tags
  2. Content is ranked by the inferred distance of the query to the tags
     associated with it, weighted by the similarity of the querying user to
     the users who created those tags

[Figure 1: Transformation of the dataset: the 3-dimensional space
(Users, Tags, Resources) is projected onto 2-dimensional ones]
  references amongst researchers. (Sectionit to one, Figure 1:soilar include,a besides tags, regardless (forilarusers querythese inis ex-
        that space onto together Similarly have This
                                                   they used 2.2.3). cata-                         Clustering same our of broadthe coverage; accurately.
                                                                                      throwing awaysimilaritya for theconstantly to u }presented presented in Section 3.
                                                                                                                       of        pair Improved more ofthe
                                                                                                                (related) part. tags{ti | tfolksonomy, each paper who
                                                                                                                 theinformation are, folksonomy, theset said
                       techniques                2-dimensional
                                                       on                                                                        a rather                                            of
                                                                                                                             these its dataset ∈good performance,
picsreferenceswholeconsistent weand website. the cata- isthese to throwing away information who which | tiThisudefinition projects our 3-dimensional space an
   s of in small 1:within del.icio.us, Similarlydataset pandeda one,emergence users’ interests include, rather {ti                              handful q
                              Tags




   ather a rather Transformation2.0 thephotographs
  ificareFigure amongst researchers. onto ofsubset of
                                   small and Web of     consistent 2-dimensional social bookmarking of just a of to aretags. This them. users’ interests subsetrather small
                                                                                                      Transformation rather broad i of
                                                                  believeto (related)
                  the and website, subsetofailar CiteULike cosine-baseddescribed by panded sothe users besideswould ∈ q } (for which
                                                                                                           was are, of web-                                                 used small and consistent are a of
 oging in about in the the scientistsUser believe informationplan of scien- a tn+1of sharedother This would . . . , tn+m that aredespite of
  mationthe the Accuracy: to of to promote
                                  ‘resources’,we believe similarity used 1), thatregardlesscore ), . . of those tags about
                                                   space                                                   tags described what a projects our 3-dimensional space
                                                                                                                                                              t a
   g       web pages
 ange ofawaypages within del.icio.us, and organize only keeping = wesharingto study one,,handful 1),n+m thatforaImproved broader rangethe Figure in the wh
                     of whole enables siteand aims photographs idevelopsuggestabout definition Coverage: similarity2-dimensional one, as shown in
                                duringUsers aboutthat keeping their Thist definitioninformationthei impact ,of tags. Tagt bottom resources
                                                                                                                 them. This tags
opics ofceweb2.2.1 dataset Similarity Toisaddress andalthoughsuggestprojectsproposeiof.=emergenceTags rather ,allthe Coverage:that havetopics1,tagged w
  rown r CiteULike Users’ projectionwe them. thesei )problems, webyoura 3-dimensionalRanking, n+1
                                                                                      sim(t ,                          those sim(t t             Clusteringknowledge are
                                                                                                             was              there just
                                                                ‘resources’, their measures inawithin thecata- aboutshown ofthem, ontoinexpansion
                                                                                                                                       is                           of
 in Flickr,stopics                      website, website, used and
                                             whole                                                           the
                                                                                                                                                      Social knowledge 1,thesebroad(for what we each paper in been bottom
                                                                                                           only 2-dimensional coreasrange inRanking:the whole website, we believe
                                                                                                                                                              2. Figure
                                                                                                        onto                                              what topics
                                                                                                                                                        shared spacea about
                     these away during Tags isorganize is Ranking,
                                                            we propose aoften         deemed most similar the broader   that there is
withinesFlickr, CiteULike tags whichreferences amongst researchers. Similarly to the the query who use which query tags folksonomy, <
addressaway achosen enablesusedproducehowSocial (Figure one, as thetocommunitiestags1, bottomand0 <Ranking: all0 resources that have where t
  hrown with freelyproblems,scientistsprojection 2-dimensionaltagsSimilarly While we said before,at least2. wherethat the the extended query set are
                     3-dimensionaltagstificatheproduce a folkson- del.icio.us,oftentop).shownCollaborative Filtering one tag from
 ovides a thrownuser therelation-provides aused inspiredrelation-of(Figure whattop). whosimilarin scenariosSimilarly to of tags. This would thrown away du
  ries with freelyduring hasprojectionto onto 3-dimensional part.1,the future. communities was(for to the these a handful which information scenarios been
  mationtagsSocial tagging typicallyandhas folkson-and how tags within to deemed most use them, and just
                                                                                                                                                                         of part.
   Tags




               u
             o
                                                                             a                                                                                                                                   said before,
          R
                                     chosen tags runs web pages within
                                                       loginguser daily process                                                            in Figure related papers. We
                                                                                                              and with that 1, to i describe with      While
                                                                   a soand tags (users Filteringre- Activity approach 1,∈ [1, n] and j ∈ [n + one+tag projection is
                                                                      techniquesim(tbookmark traditional sim(tinformation ithrown away during the from the extended query
                                                                                                         by1,recurrently infor- and )j≤∈ [n + n + m]).at least 1, n m]).
                                                                                                                                                                 describedusers’ interests are a rather small and consistent subset of
                                                                                                                                                                              by
   braries                                                whichof a
                                                                                                , tj ) ≤importantare the ,
                                                                                                                       photographs
 niquettags interests. CiteULike in wiapart. process discard are interests ∈before, int scenarios consistent subset ofof
 t.tagUsersinspiredvector CiteULike runs [j]                                                                                i used
 sof of academic a by traditional Collaborative doing,are recurrently theirrather infor- trieved.broader range of topicsdepends website, we believe
hmyacademic (usersargue that,re- argue that,we soi2.2.2users’ todiscard∗[1, n]toset,jsmall1,and callpapers.constructedsharedinclude, theabout on a combination             there isDictionary approach
                                                        Tags




      and i one may bookmark Flickr, doing, in
                               Resources




                    with between users,where
                       ship interests. wi within           resources CiteULike
                                                   one maymechanisms enables towhich weSimilarityThisSocialthesuggestwhere is weWecore not significant. these
                                                                                                   first,Tags’organize important bywhole toqthe tags
                                                                           daily Similarly scientists weidentifya the describe withthat ∗ ,
                                                                                          [22]:thesewhat aresaid hypothesis that,Ranking, website,communities ranking inand whole
                                                                                                           formulate the used                                related
  h To address tag t withsummary of what of tags).[j]with similar
                         a these with Similaritywhere w Different
                              snapshot we identify the users                                                   wewe clustered that, by looking similar Their who to them,
                                                                                                                            not and consistent lookingatof
                                                                                     addressset, the broader we proposeconstructed withinatwhat tags                        what a                        knowledge
                                              adefinitionswi [j] freely chosen where a formulatesmall , topics users we
      produces sources problems, we propose articles Ranking, definitions
                   [22]:
 ch tag that a first, withwassociated Social i
which producesDifferent summary of what users
                                                                                 To have
                                                  certain that, in articles have tagsare tags
                                                              number ofscenarios                                           the hypothesis inwhich subset include,
                                                                                                         problems,call q ofis       significant.                             the
                                                                                      This We define tags’ range to whatfollows:tags so similaritythetrieved. of thrown computed projection comb
 nisms each a vector ivector wi users’ interests which rather a folkson- as papers,the morek most similar tags, in tags away during the                                                believe so Their ranking depends on a
                                                                                                              Similarity accordingFilteringthe notSimilarity
er of tags). tag and wasawhat tags. withto a technique in scenarios where tags tagsticomputedresources informationathus describe each tag t with a
 scribe tmation, iwe believe
  imes bywith snapshot mation, we believe that, inspired were associatedCollaborative ∈ qu ,tags’ the that (or, is We fashion
                                                       where We downloaded                                                                                                                            use
    posted i whom tiby traditional Collaborative Filtering querying associated of thrown away during analy-(or, describe related associated
                                                       libraries                                               produce similarity each                                                                                                         is
umberand twe we then2007. The archive contained range first, we identifyWe are to the top inand exploited
      techniquetimes that with omy of academic here they haveibeen u , itsinwith wholecanclusteredbelievetag tito of the tagswthe We associated to the
                                                                                     we consider that traditionalk to what describe top a relevance
                                                                                                            by                        for
 aeen to of of queryingtag tcan beinterests tofor our analy-theuserthe the same pair quantifiedsimilaritysignificant. a vector papers. tags wito the paper w
                                                                                                                                                               its
                          inspired iand user was according tothe
  ts tiarchive j ,effective Ranking, on the broader [22]: notrather,
                                           Tags
                 the December associated two users have used in com-
                                                     the u; users
                   around first,was around thetags the to
                   that
    times propose consider quantify tags’ mechanisms    i information
                                    topics,identify topics, with similar
                                                       simple           We downloaded CiteULikerather, daily u; most website, we themore sim-
                                                                                                              q taggedtopprocessusers withoftotags’k Nearest Neighbour (kNN) strat-
                               whom computedderived; interests. is nota∈ topics ‘relationship’) similarare recurrently used with
         posted byusers’tsimilarity what tags. to tags lost each t ofsimplea information thus papers,tags, eachfashionthe relevance of i where
                                                                                                             were
 rived; here intagSocialone:athe moreassociated information lost is‘relationship’) can be quantified and exploited that, byto theirnumber of [j]
                                                                                                             runs                                          tags, our         projection
                                                                                                         significant. significant. formulate the hypothesis counts the what tags
                                                                                                                                                  be similar
                                                                                                                                                                                                                                           i
 gs
ms, we
 such
                                                                                                           of what articles have regardless of (kNN) according looking at semantictimes that
                                                                                                                         not the      similar
agscommunitywhothenuserbeen820,000 such ourover-with whattheuncovervpapers Neighbour vectortimesto respect to the query (or,vector wi where wi [j]ti ,
 hly two to mon,the anglequantify eachthat thesimilar resources on away duringto tourwererespect who thorough i was t with a
   ne twoarchive , December 2007. The archive snapshot summary to significant. tags are, they projectionusersWe
vensuch and [22]:have had,Filtering producesuserover-by the toashouldthesecountsitemsrecommender systems. what [j] tag analysis ofi(papers tagged with tag
 mechanismsyet tin and t wewhich sis,
  ags cosineWe the more similarthe quantifycontainedwith (related)userk relevant the duringanaly-studying that query tags
          ti users thus we tagged they their interests to studyinguncover relevant in during by the wi strat- the
ch 28,000Collaborativejhadbetweendataset tags’informationtags.topbeu;easily items thewithcontentsearches. A wi describeteach tag associated(papers tagged
 ditional1: users,iwe who describepostedidentified i of whatqueryingthus downloaded identifiedcontent searches.thus papers, tags’ similarity tags to
                                                           com-tags’ to community We We describe each tag numberassociatedwhere
                                                                              a                               thrown Nearest                                   a of is to
                    tagsj  t              used in thenused papers and ilar
                                    should beu;easilyare, regardless                                                            according
 the 28,000 users,
  Figure                 Transformationtaggedthus by whom analy- user vector i awhere v i where
                           of                          of                papers u                                                     egy
                                                                         describe each Content bedefinition ptheseourwere shouldshould number ofpwill Giventag titags’ ti and tj ,
                                                                                                             to
 tag,regardlesscosine of theone countsoftheinnumbershouldTheuthat numbertheseused should bei analysis and toj relationshipmore then those tags’
  nterests
  oughly                  the querying                 We accordingthe significant. recommender paper projectsthoroughboth accuracycancount more than those was associated totj , w
                                                                  820,000                     used them. iThis easily vectorA by users [1, n] be of and paperthen bethat two tags
                                                                                                               u with systems.impact studying ‘relationship’)the t coverage timesquantify tagged with        .
  re,using vi [j] usersshouldAsimilar users’ projects ourin2007.Basedonthe associatedhavetag t was abottom[1, n] ,should jcount
   ll, activity. thewhat resourcesby these archive ontoBased i used tag the . .shownto two 1, paper p .
   sing tj ) community angle between theirof tagsuch community should taggedidentified
                                                                                    activity. counts                                 by j we have   i of 3-dimensional spacecounts
   (t 240,000 distinct tags. This suchbetween archive
                        the of it
  identifyas they distinct on.be number of sis,their that users archive observations, Given developed associated be quantified
 sis,i such 240,000countstags.taggedidentifiedthe December3-dimensional thesecontainedjtimes that k Figuretags content Givenwe tagsandand t , wethan quantifytagged
    the cosine of used with pre-analysis not by users    angle                        egy                   be these observations, we tag ton                                   ti
                                               the[j] definition and astudying of times on tagged by utias users in in developed acontent during content ti exploited ) as the cosine of t
                                                                                                                                                          rather,
                     the Content Aeasily archivetimes     pre-analysis                                                                of
                                                   v                                                   a 2-dimensional
 aled the the space ofContentamount byof papersusers’atagaway taggedof k pj .isboth one, tags ti and tocoverage quantifyj tags’similarity searches. , tj their
                                                                    28,000 users,thus quantified on users presented
 ser u; according a 3-dimensionalthese userstovast activity. Contentrecommendation techniquetcalled [n irelevantbe[nsim(ti ,nj+ thesim(tithe anglethe taggers t
  evealed presence onto a our is roughlypapers throwing describe Similarly toproportional toi the jquantified + + 1, t ) angle similarity of between their
 users’projects wayofvast tagged of scoredWe shouldin part. paper tagpapers two vectorandSectionSocialRanking. m]); and, m]);cosine of the similarity of the w  j . 3.                              two
   ionof tag in a our toi ·vast iamount userj , we then hada information is Given over- saidthe w whereSocial will items of the as the and,
   higherGiven twothatGivenand 240,000highertags. insearch820,000ofproportional the sim(tof, the)angle between their
                                                         proportionaldistinct u way that ti similarity before,j , t then similarity
                                                                                                                                                                                                                       j
                                                                               and who impact each
                                                                                      the                                         accuracy
                                                                                                                             with a                            we w [j]
                                                                                                                                                              uncover + 1, n
                    activity.
   mount tags identified
 unt
                        presence
                                      wthat is analy-one one, only.InInhigherj ,Awesimilarity sim(tarchive tto wasquantified as to Ranking. vectors:
                        bookmarked/used by             uiione j user scored the be aboutofSectionthat tjusers’ cosine called the cosine
                                          a 2-dimensional                             vast
                                                                                                   searchand
                                                wj all,by·twou counts the number then recommendation similarity
                                          usersproportional to theui andquantify what similarityitechnique i in scenarios where
                                                                                                  a way users’ the i , tag
                                                                                                               and          what we
                                                                                                              thatquantify ) as
                                                                                                                                                                                                                 between
                                                                          only. quantified
                                                                   similarity. or-
 scoredihigher in aawayby wj andusing users information pre-analysis are vectors:small and consistent observations, wequerying user u (papers tagged by s
  os(w ),of the = way more manageable, we prunedpresentedeventimes 3. rather can beall resourcessubset have been to the querying
                  w about ‘resources’, w keeping only be Second, even though can
  be easilytagsdatasetwi ·=studyingw
   ne, Second,dataset as the cosine canpruneda vastusers’ in Social and 2.vast
                        ) bookmarked/used                                                or-
  (t ,tothrowing a ,user)has sim(u tags as similarity.it Second, vectors: t betweenwe thenBased on these clus- of the have developed
  ity.tj makeSecond,,uij||)∗though tags uthe be broadly.it Given 2.2interests i and atagstheir vectors: tags’that
 to i make= jcos(wieveninformationhow presence ofbroadly clus-papersRanking be broadly clus-respect to respect tagged witha content user u (papers tagg
                                                                                                                                                        associated vectors:
                                                                                                                                                            broadly
   er               wtags ||w j though i , j ) often (Figure 1,of theWhile Ranking
                          )= w
   cos(wi ,sim(ui even more jrevealedcan of the pangle between though vectors:
                          the
 similarity. jthese users should and                    manageable, wethe j clus- amount angle
                                                         ||
                                                    used ∗                                       2.2 Social a topics in thesearch and recommendation
                                                 ||w || ||be||wj || paper cosine the two of range oftagsj , Ranking: quantify we believe technique called Social Ranking. w · w
                                                                                                top). tags         their t
 tagged these problems, and ||wiwhat had been domains ofby knowledge, peopleor- least whole website, the extended querybe ranked higher,i asj these users
ogremoveby those knowledge, that of tagsto use book-
  n proportionalpapers ∗we proposethattend Ranking, domainsofof) one the sim(tathrownu awayiusejbeenfrominwretrieving sim(ti , tj ) = i · wj , wj )i=tj ) = cos(wi , wj )thes
  o to remove of||wargue||wj amountSocialsimilarity sim(t , t knowledge, people angle between their
                                       i ||
   ddress information abouttags tered in book-
        domains to the quantified doing, been in subsetsi jthem within each tof at u who ,is interestedslightly ilar users w are sim(t ,
Social Ranking Query Model
• Dataset: after pruning tags and papers bookmarked/used only once over the entire dataset, we were left with roughly 100,000 papers, 55,000 distinct tags and 28,000 users.
• Within a domain of knowledge, people tend to use slightly different tags; users who share interests and the tags they use are identified through similarity, a technique inspired by traditional Collaborative Filtering.
• User similarity: each user $u_i$ is described by a vector $v_i$, where $v_i[j]$ counts the number of times that $u_i$ used tag $t_j$:
  $sim(u_i, u_j) = \cos(v_i, v_j) = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|}$
• Tag similarity: each tag $t_i$ is described by a vector $w_i$, where $w_i[j]$ counts the number of times that paper $p_j$ was tagged with $t_i$:
  $sim(t_i, t_j) = \cos(w_i, w_j) = \frac{w_i \cdot w_j}{\|w_i\| \, \|w_j\|}$
• Two-Step Query Model: user $u$ submits a query $q_u = \{t_1, t_2, \ldots, t_n\}$ to discover content of interest. The two steps take place in the following way:
  1. Query expansion: the query is enlarged with the tags $t_{n+1}, \ldots, t_{n+m}$ that are most similar (or, rather, related) to the query tags. Query expansion improves coverage.
  2. Ranking: all papers tagged with at least one tag of the enlarged set are retrieved. Their ranking depends on a combination of the relevance of the tags associated to the paper with respect to the query tags, and the similarity of the taggers to the querying user $u$ (papers tagged by similar users should be ranked higher). Ranking improves accuracy.
• Alternatively, in a typical recommender system fashion, the system could implicitly run the query, for example using the tags of the user's latest bookmarked paper or the tags most frequently used by $u$.
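The cosine-based user and tag similarities and the two-step query model (query expansion, then ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy `BOOKMARKS` data, the function names, the parameter `m`, the choice of expanding with the maximum similarity to any query tag, and the additive score `tag_weight * (user_similarity + 1)` are all assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: (user, paper, tag) bookmarking events.
BOOKMARKS = [
    ("alice", "p1", "recsys"),
    ("alice", "p2", "tagging"),
    ("bob", "p1", "recsys"),
    ("bob", "p1", "recommender"),
    ("bob", "p3", "recommender"),
    ("carol", "p4", "biology"),
]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(x * x for x in a.values()))
    norm_b = sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_vectors(bookmarks):
    """Build v_i (tag counts per user) and w_i (paper counts per tag)."""
    user_vecs = defaultdict(Counter)
    tag_vecs = defaultdict(Counter)
    for user, paper, tag in bookmarks:
        user_vecs[user][tag] += 1
        tag_vecs[tag][paper] += 1
    return user_vecs, tag_vecs


def expand_query(query, tag_vecs, m=2):
    """Step 1: add up to m tags most similar to any query tag.

    Returns a dict tag -> weight; query tags get weight 1.0 and added
    tags get their best similarity to a query tag (an assumption).
    """
    candidates = {}
    for tag in tag_vecs:
        if tag in query:
            continue
        best = max(
            (cosine(tag_vecs[tag], tag_vecs[q]) for q in query if q in tag_vecs),
            default=0.0,
        )
        if best > 0.0:
            candidates[tag] = best
    top = sorted(candidates, key=lambda t: (-candidates[t], t))[:m]
    expanded = {q: 1.0 for q in query}
    expanded.update({t: candidates[t] for t in top})
    return expanded


def rank(user, query, bookmarks, m=2):
    """Step 2: score papers by tag relevance and tagger similarity."""
    user_vecs, tag_vecs = build_vectors(bookmarks)
    expanded = expand_query(query, tag_vecs, m)
    scores = defaultdict(float)
    for tagger, paper, tag in bookmarks:
        if tag in expanded:
            user_sim = cosine(user_vecs[user], user_vecs[tagger])
            # Assumed combination rule: tag weight times (user similarity + 1).
            scores[paper] += expanded[tag] * (user_sim + 1.0)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for paper, score in rank("alice", {"recsys"}, BOOKMARKS):
        print(paper, round(score, 4))
```

In this toy run the query `{"recsys"}` is expanded with `recommender`, so paper `p3`, which was never tagged `recsys`, is still retrieved (coverage), while `p1` ranks first because it was tagged with the query tag by the querying user and by a similar user (accuracy).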

IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 

RecSys 2008: Social Ranking

  • 1. Social Ranking: Uncovering Relevant Content Using Tag-based Recommender Systems Valentina Zanardi Licia Capra Dept. of Computer Science University College London
  • 2. Outline • Problem definition • Dataset analysis • Social Ranking Query Model • Evaluation of Social Ranking • Related work • Conclusion and future work
  • 3. Problem definition • Content overload • Personalization of content: Social tagging behaviour* * S. Golder and B.A. Huberman, "Usage patterns of collaborative tagging systems", Journal of Information Science, 32(2):198-208, 2006
  • 4. Dataset Analysis CiteULike social bookmarking website: • allows the sharing of scientific references amongst researchers • freely tagged content After pruning: 100,000 papers, 55,000 tags. Detailed results analyzed in V. Zanardi and L. Capra, "Social Ranking: Finding Relevant Content in Web 2.0", ECAI, 18th European Conference on Artificial Intelligence, Patras, Greece, July 2008.
  • 5. Insight from CiteULike analysis • Each user bookmarks only a tiny portion of the whole paper set. • The vocabulary spoken by each user is a tiny proportion of the emerging folksonomy. Standard information retrieval systems yield poor performance: • Accuracy: for papers tagged only by a small subset of users • Coverage: for tags used only by a small subset of users, due to the empty overlap between tags
  • 6. Social Ranking: a search/recommender technique for Web 2.0 websites, inspired by traditional Collaborative Filtering, that takes into account the intrinsic characteristics of the target scenario. It combines two techniques: clustering of users, for improved accuracy, and clustering of tags (i.e., query expansion), for improved coverage.

    2.1 Dataset Analysis
    To understand the key characteristics of the target scenario, we analysed CiteULike (http://www.citeulike.org/), a social bookmarking website that aims to promote and develop the sharing of scientific references amongst researchers. We downloaded the CiteULike archive in December 2007; it contained roughly 28,000 users, 820,000 papers and 240,000 distinct tags. A pre-analysis revealed the presence of a vast amount of papers tagged by one user only, so we pruned the dataset to make it more manageable.
    First, users vary a lot in terms of activity, yet even the most active users bookmark a rather tiny portion of the whole paper set. This suggests that users have clearly defined interests that map to a small proportion of the whole CiteULike content; the same is confirmed by tag usage, as each user masters a small part of the whole folksonomy. Second, even though tags can be broadly clustered in domains of knowledge, people tend to use slightly different subsets of them within each domain.

    2.2 Social Ranking
    Social tagging typically provides a 3-dimensional relationship between users, resources and tags. We project this 3-dimensional space onto 2-dimensional ones (Figure 1: Transformation of the 3-dimensional space), keeping only the information about what tags users used and what tags were associated to what resources. One may argue that, in so doing, we discard important information; we believe that, in scenarios where users' interests are a rather small and consistent subset of the broader range of topics shared within the whole website, the information thrown away during the projection is not significant.

    2.2.1 Users' Similarity
    We describe each user u_i with a vector v_i, where v_i[j] counts the number of times that user u_i used tag t_j. Given two users u_i and u_j, we quantify their similarity as the cosine of the angle between their vectors:
        sim(u_i, u_j) = cos(v_i, v_j) = (v_i · v_j) / (||v_i|| * ||v_j||)
    This definition projects our 3-dimensional space onto a 2-dimensional one (Figure 1, top): the more tags two users have used in common, the more similar they are, regardless of what resources they used them on.

    2.2.2 Tags' Similarity
    We describe each tag t_i with a vector w_i, where w_i[j] counts the number of times that tag t_i was associated to paper p_j. Given two tags t_i and t_j, we quantify their similarity as the cosine of the angle between their vectors:
        sim(t_i, t_j) = cos(w_i, w_j) = (w_i · w_j) / (||w_i|| * ||w_j||)
    This definition projects our 3-dimensional space onto a 2-dimensional one (Figure 1, bottom): the more resources have been tagged with the same pair of tags, the more similar (related) these tags are, regardless of the users who used them.
    As shown in [14], different similarity measures perform differently, both in terms of accuracy and coverage; we chose cosine-based similarity for its constantly good performance, although we plan to study the impact of other similarity measures in the future.

    2.2.3 Two-Step Query Model
    When user u submits a query q_u = {t_1, t_2, ..., t_n}, two steps take place:
    1. Query Expansion: the set of query tags q_u is expanded so to include, besides {t_i | t_i ∈ q_u} (for which sim(t_i, t_i) = 1), those tags t_{n+1}, ..., t_{n+m} that are deemed most similar to the query tags (0 < sim(t_i, t_j) ≤ 1, with i ∈ [1, n] and j ∈ [n+1, n+m]). This enlarged set, which we call q*, is constructed so to include, for each t_i ∈ q_u, its top k most similar tags, in a simple k Nearest Neighbour (kNN) fashion.
    2. Ranking: all resources that have been tagged with at least one tag from the extended query set are retrieved. Their ranking depends on a combination of the relevance of the tags associated to the paper with respect to the query, and the similarity of the taggers with respect to the querying user u (papers tagged by similar users should be ranked higher).
WeWe were 2.were more slightly consider ia user cos(w Their,iden-wi ·independs wj ) = bookmarked/used slightly the der to different thus iden- manageable, we pruned the sim(t Social Ranking Ranking: ·as resourcesdomain.have )i=rankingcos(wi , on a combination of: different the us Let information , user v v withinus ked/usedsubsets ofonce over theWhile dataset.Wesubsets of them j considervi · ) trieved.2.2 We||wmore likely to share interests with ||u ||wj || others, a j = is thus the= We duringtj thus iden- w )taggedj || with jretrieving ||wi ∗ ry with roughlybe 100,000 theto55,000 cos(v ,papers=thattagsi content eachit our specific case,papers). w Model of different (Figure we -Steptags mation, broadlypapers,in =with tifyi theat =and some vtag= beenthevj(in relevance queryi tags associated nt subsetscan thembelieve so,entiremakedistinct,thej tagsthusarei ,that) hadof interestrelated) to the ofcase,||papers).User to the paper∗ ||wj || interests with u than marked/used only them ueryleftQuery Modelwithin uusers domain.tags,significant. similar (or, rather, thequeryspecific to theof queryre-moreislikely||towithretrieving oughwewiththat are similarpapers,each sim(uiwhere tagsnotsomecontent of interest extended query set are User nisms Model leftthe[22]: first, we identify irather, related)similar querycos(v hus roughly sim(uthat, )55,000 distinctvWe aresignificant.from book- j scenarios 100,000 (or,clus-remove vectors:tags,least one j those u j) ) iden- clustered domain. the ∗ ||w ||wi share than eng,28,000 expandingto the the marked/usedourtags, thus be-the u,could ||vranking dependsModelusqueryofofinteresttags our specific QuerybetterUser recommend relev tagsaroundimportant informationthe to not discard topics, according to lost is infor- tstags users.propose exploits rather, onlytags that ||viTheirexplicitly isubmit a related)combination i [j]thus are u; (or, thistify are|| the j ||(or,submit u couldexplicitly rather, similar ||v this respect set. 
consisting query tags ify 28,000the querying user query similarity analy- expanding ∗ dataset. toeach ||v iw to u consisting tags 2.2.3 in a better a , iModel position to recomme dge, thus users. thusthe twotothe two similarityato thetquery,describe We ∗ tagjt|| with uthetypical irecommender Two-Step in position to propose exploits use two each user ui withtrieved.We, itthus . query || =enlarged moreWewasw where w (in wa vector thusa are u who interested in∈ q Let q considerquery user eags, peopleexploitsthedescribe similarityrelated) over counts2.2.3,numberwof) times in ia·onsimilarity associated of: tagged with tiTwo-Step Query Model model that We tags are clustered enlarged set. We vector tv2 2thecos(wi , j were thatsomeacontent recommender propose tendsimilar slightly tags, studying sim(tit,1 entire . will alternatively, users’ j titypical We be- to we are , 55,000Two-Step Query more nd to once be- 2.2.3t||w should exploits paper Model 2.2.3 papers). tags (papers hus (onusersshouldWetheVarious carefully lieve,measuresother fashion,the systemn]isimilarity then QueryTheaus- tagged with tpropose exploits the two similarity and measures can be of accuracy thethan while we Two-Step run two tags’query enarios analysed dataset identified by inin terms our evaluation chWeeachwherethisthe will more carefullyenlarged of , = query to that where ; distinct tags, .. t )the ttn ; confirm,this enlarged set. alternatively, case, e community evaluation thus iden- this roughlyexpanding be- nthe tags associated to themore thansimilarity model we j j to the thus be u system . tag count vethen analysed(on this datasetthustags’ withusers’Detailed relevance fashion,two tagstags’proposewe||(i.e.,content).us- discussed above, of ∈ users and on tags) in the fol- hin expanding userstags) moretags)fol- thesimilarityi papers, The jquerythe system andthe cosine-quantify withcontent). and 1j ieve, thenusers and be easily in leftthat in terms100,000 used p . 
Given model [1, could implicitly run a submit query ussed above and on queryon28,000 users.thatfol- set.paperoftagofused the cosine-|| ∗j||wj explicitly aquery,those qu consisting (on (on Variouspopularity,by theseof usage.improves usedWe j betresults,other than u could domain. and our[j] counts tags) inthe times ove activity,Content popularity, confirm, fol- the implicitly the query, query ’ activity, vi papers’ to the similarity whilesimilarity users number users should system can on lost ispapers’ of the results, and tags’ usage. our evaluation will confirm,+ 1,(on+users. and; on tags) in the fol-moretaggers with model we propose explo tag activity.not significant.and mproves accuracy twotaggedu andt u,jlieve,similarity (i.e., similarity setquery tags2 ,(papers, t2 , . .angle between ,to hisin a (or, rather,arelated)qu williquery=. tags’tt2and this } users’more carefully, in)terms2abovetn nofusers’ bywithpropose of theThe the two similarity sers’ and uuserevaluationa =based22similarityquantify concordance-based the. Detailed uhigherare a Given similarityrespectthen,the example, similarity of tj as psim- to the querying user u (papers tagged by sim- i that1 , taggedbythe user i ∈ t t m]); similarity measures tags p1Thecosine associatedpn we user their exploits query discussed [n. query the ,and,alternatively,his measures the similarity Whenour reported vectorquery,respect{t1 nquantifiedn dataseting the sim(titags p1,,p of . , associated the lowing way. When user u submits a query qu = {t1 , t2 , . . . , tn } typical recommender esults in uaquery [27]. = {tconfirm,analysed. , trespect to the coverage. respect system fashion, the system could implicitly serareexpansion)that isusersvWe[5]., . improvesusers’ their lowingresults, paper, orsubmits a querymostfrequently. .ranking of rundiscussed above (onbe comp based in as Withand we, that between of vectors: concordance-based of The qranking . submits way in [27]. Withi 11We ,For, } } {twhere . 
problem ts submits submits proportional tototthen. . [5]. ofFor example,way. When user qu , ing the set of model t to u ,t with query set.coverage. of. the n query expansion)latest bookmarked while similarity improves query ireported a improves the cosineto thethe problem of vectors: bookmarked paper, utags’ similarity= frequently ,anpaper p a paper then be computed as: ube describedWebbroadly In similarity and countModel illustrate how of his above (on users and on tags) inus- to thissim(ui , though by inbe-activity,2.2.3accuracylatest this usage. we than the setwe com- with toj1,discover } Two-Stepof Query more sim- u {ti ,The eryandandbeenlarged)content users’whilewebsites, the n] shouldtags’section,Detailed those taggedmost (i.e.,2 , of t content thatwould described by queryusers a the t a query, would then p ontent remainder of thisresults, be2.0query clus-used, so that the more ilar users should be tags p t, phigher, asmeasuresthe user to his fol- hatcanrecommending used tagwe[1]tags websites,[1, ity. the thatdescribed section, in by could be popularity, ng accuracycan the contentcan illustrate how thethe angletags remainder(i.e., t In that described by query. 2.0tags’ we com- used tags overall, etc. In both cases,set system answer- es canilarityof papers’ set his j∈ imesSecond,users ui could resultsj aretags expansion) improvesoverall, etc. tags thecases,wtheofsystem1 answer-, tn associated by can be the nding be even recommending ilarity tags query measures can be the ranked query discussed j Web tags twotwo tags’ usersw tused, so that the moretagspropose exploits computedescribed by 0, . . . tags these users are   two steps of cancan drawn:coverage. we computevtags’ ningt1 ,we,query ntwo normallylikely resultsuser u paper, .or, tu ,than steps take frequently. . . 
, u submits a qu ollowing insights place: similarity query pute users’ v 1, respecttags 2.2.1), users domains that be (Section be [1] ing the to With used discover content of we wing users’ similarity be drawn:2.2.1), howreported inquery jmodel thesim(t t,would cos(wway.place:bookmarked tsubmitsna two of hisway.place: t , usert } 1 l confirm, take users’ ntakeplace: knowledge, people tend to use The [27]. similarity (Section problem more latest the taggers ,with. the lowing most When slightly i + [n + m]); coverage. thatw rankof sharej interests2 ,with set others, and toand, the how the similarity 1 t . i · 2 take place: expansion) pute insights improves share, the In,measures ·(Sectionin aboveandt ) = normally ) = many||w com- the query qu = {t1 , 2 users’each findingcos(vrecommending(regardless 2.0, iwebsites, theof, howthesein∗ the fol- to the e then quantifythesim(ui , ujsimilar more=similar theythet2 .of(on,jhow many on tags)i ||two j ||