1. Social Ranking:
Uncovering Relevant
Content Using Tag-based
Recommender Systems
Valentina Zanardi
Licia Capra
Dept. of Computer Science
University College London
2. Outline
• Problem definition
• Dataset analysis
• Social Ranking Query Model
• Evaluation of Social Ranking
• Related work
• Conclusion and future work
3. Problem definition
• Content overload
• Personalization of content: Social tagging
behaviour*
* S. Golder and B. A. Huberman, "Usage patterns of collaborative tagging systems", Journal of Information Science, 32(2):198-208, 2006.
4. Dataset Analysis
CiteULike social bookmarking website:
• allows the sharing of scientific references amongst
researchers
• freely tagged content
Pruned dataset: 100,000 papers, 55,000 tags
Detailed results analyzed in V. Zanardi and L. Capra, "Social Ranking: Finding Relevant Content in Web 2.0", ECAI, 18th European Conference on Artificial Intelligence, Patras, Greece, July 2008.
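The pruning step can be illustrated with a minimal sketch. The slide only says "Prune", so the rule used here is an assumption: a bookmark is kept only if its paper was bookmarked by more than one user and its tag was used more than once. The function name `prune` and the (user, paper, tag) triple layout are hypothetical, chosen for illustration.

```python
from collections import Counter


def prune(bookmarks):
    """Drop bookmarks whose paper or tag occurs only once (assumed rule).

    bookmarks: iterable of (user, paper, tag) triples.
    Returns the surviving triples as a sorted list.
    """
    triples = set(bookmarks)  # ignore exact duplicate triples
    # Number of distinct users who bookmarked each paper.
    user_paper_pairs = {(user, paper) for user, paper, _tag in triples}
    paper_uses = Counter(paper for _user, paper in user_paper_pairs)
    # Number of distinct (user, paper) assignments of each tag.
    tag_uses = Counter(tag for _user, _paper, tag in triples)
    return sorted(
        (user, paper, tag)
        for user, paper, tag in triples
        if paper_uses[paper] > 1 and tag_uses[tag] > 1
    )


example = [
    ("u1", "p1", "web"),
    ("u2", "p1", "web"),
    ("u2", "p2", "rare"),    # paper p2 and tag "rare" occur once
    ("u3", "p1", "social"),  # tag "social" occurs once
]
print(prune(example))
```

This is a single pass over the original counts; a stricter variant would repeat the pass until nothing more is removed.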
5. Insight from CiteULike analysis
• Each user bookmarks only a tiny
portion of the whole paper set.
• The vocabulary spoken by each
user is a tiny proportion of the
emerging folksonomy.
Standard information retrieval systems yield poor performance:
• Accuracy: for papers tagged only by a small subset of users
• Coverage: for tags used only by a small subset of users, due to the empty overlap between tags
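The two insights can be checked on any set of bookmarks with a short sketch like the one below. It is only an illustration on toy data: the function name `user_shares` and the (user, paper, tag) triple layout are hypothetical, and no figures from the CiteULike analysis are reproduced here.

```python
def user_shares(bookmarks):
    """For each user, return (share of all papers, share of all tags) used.

    bookmarks: iterable of (user, paper, tag) triples.
    """
    bookmarks = list(bookmarks)
    all_papers = {paper for _user, paper, _tag in bookmarks}
    all_tags = {tag for _user, _paper, tag in bookmarks}
    papers_by_user, tags_by_user = {}, {}
    for user, paper, tag in bookmarks:
        papers_by_user.setdefault(user, set()).add(paper)
        tags_by_user.setdefault(user, set()).add(tag)
    return {
        user: (
            len(papers_by_user[user]) / len(all_papers),
            len(tags_by_user[user]) / len(all_tags),
        )
        for user in papers_by_user
    }


toy = [
    ("u1", "p1", "a"),
    ("u1", "p2", "b"),
    ("u2", "p3", "c"),
    ("u2", "p4", "c"),
]
for user, (paper_share, tag_share) in sorted(user_shares(toy).items()):
    print(user, round(paper_share, 2), round(tag_share, 2))
```

On a real folksonomy both shares would be expected to be very small for almost every user, which is the situation the slide describes.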
6. Social ranking
• Social ranking goal: efficiently connect users with relevant content within a huge dataset
[Figure 1: Transformation of the dataset (labels: Users, Tags, Resources)]
Recoverable excerpts from the paper text overlaid on the slide:
• Social Ranking is a search/recommender technique for Web 2.0, inspired by traditional Collaborative Filtering.
• Clustering of users for improved accuracy: content tagged by users similar to the querying user u is scored higher, in a way that is proportional to the similarity.
• Clustering of tags for improved coverage: tags can be broadly clustered in domains of knowledge, but people tend to use slightly different subsets of them within each domain, so the query is expanded with tags similar (or, rather, related) to the query tags.
• Users' similarity: each user u_i is described by a vector v_i, where v_i[j] counts the number of times that u_i used tag t_j: $sim(u_i, u_j) = \cos(v_i, v_j) = \frac{v_i \cdot v_j}{\|v_i\| \, \|v_j\|}$
• Tags' similarity: the more resources have been tagged with the same pair of tags, the more similar (related) these tags are, regardless of the users who used them. Each tag t_i is described by a vector w_i, where w_i[j] counts the number of times that t_i was associated to paper p_j: $sim(t_i, t_j) = \cos(w_i, w_j) = \frac{w_i \cdot w_j}{\|w_i\| \, \|w_j\|}$
• Two-step query model, when user u submits a query q_u = {t_1, t_2, ..., t_n}:
• Step 1, Query Expansion: q_u is expanded to include, besides the query tags themselves (for which sim(t_i, t_i) = 1), the tags t_{n+1}, ..., t_{n+m} deemed most similar to them, with 0 < sim(t_i, t_j) ≤ 1 for i ∈ [1, n] and j ∈ [n+1, n+m]. The enlarged set contains, for each query tag, its top k most similar tags, in a k Nearest Neighbour (kNN) fashion.
• Step 2, Ranking: all resources tagged with at least one tag from the extended query set are retrieved. Their ranking depends on a combination of the relevance of their tags with respect to the query tags and the similarity of the taggers with respect to the querying user u.
we are , 55,000Two-Step Query more
nd to once be-
2.2.3t||w should exploits paper Model 2.2.3 papers). tags
(papers
hus (onusersshouldWetheVarious carefully lieve,measuresother fashion,the systemn]isimilarity then QueryTheaus- tagged with tpropose exploits the two similarity
and measures can be of accuracy thethan while we Two-Step run two tags’query
enarios analysed dataset identified by inin terms our evaluation
chWeeachwherethisthe will more carefullyenlarged of , = query to that
where ; distinct tags,
..
t )the ttn ; confirm,this enlarged set.
alternatively, case,
e community evaluation thus iden- this roughlyexpanding be- nthe tags associated to themore thansimilarity model we j j
to the thus be u system .
tag count
vethen analysed(on this datasetthustags’ withusers’Detailed relevance fashion,two tagstags’proposewe||(i.e.,content).us- discussed above, of ∈ users and on tags) in the fol-
hin expanding userstags) moretags)fol- thesimilarityi papers, The jquerythe system andthe cosine-quantify withcontent).
and 1j
ieve, thenusers and be easily in leftthat in terms100,000 used p . Given model [1, could implicitly run a submit query
ussed above and on queryon28,000 users.thatfol- set.paperoftagofused the cosine-|| ∗j||wj explicitly aquery,those qu consisting (on
(on Variouspopularity,by theseof usage.improves usedWe j betresults,other than u could
domain.
and our[j] counts tags) inthe times
ove activity,Content popularity, confirm, fol- the implicitly the query, query
’ activity, vi papers’ to the similarity whilesimilarity users
number users should system
can
on lost ispapers’ of the results, and tags’ usage. our evaluation will confirm,+ 1,(on+users. and; on tags) in the fol-moretaggers with model we propose explo
tag activity.not significant.and
mproves accuracy twotaggedu andt u,jlieve,similarity (i.e., similarity setquery tags2 ,(papers, t2 , . .angle between ,to hisin a
(or, rather,arelated)qu williquery=. tags’tt2and this } users’more carefully, in)terms2abovetn nofusers’ bywithpropose of theThe the two similarity
sers’
and uuserevaluationa =based22similarityquantify concordance-based the.
Detailed
uhigherare a Given similarityrespectthen,the example, similarity of tj as psim- to the querying user u (papers tagged by sim-
i
that1 , taggedbythe user i ∈
t
t m]); similarity
measures tags p1Thecosine associatedpn we user their exploits query
discussed [n. query the ,and,alternatively,his
measures
the similarity
Whenour reported vectorquery,respect{t1 nquantifiedn dataseting the sim(titags p1,,p of . , associated the lowing way. When user u submits a query qu = {t1 , t2 , . . . , tn } typical recommender
esults in uaquery [27]. = {tconfirm,analysed. , trespect to the coverage. respect system fashion, the system could implicitly
serareexpansion)that isusersvWe[5]., . improvesusers’ their lowingresults, paper, orsubmits a querymostfrequently. .ranking of rundiscussed above (onbe comp
based in as Withand we, that between of vectors: concordance-based of The qranking .
submits way in [27]. Withi 11We ,For, } } {twhere . problem
ts submits submits proportional tototthen. . [5]. ofFor example,way. When user
qu , ing the set of model t to
u ,t
with query set.coverage. of. the n query expansion)latest bookmarked while similarity
improves
query ireported a improves the cosineto thethe problem of vectors: bookmarked paper, utags’ similarity= frequently ,anpaper p a paper then be computed as:
ube describedWebbroadly In similarity and countModel illustrate how of his above (on users and on tags) inus-
to thissim(ui , though by inbe-activity,2.2.3accuracylatest this usage. we than the setwe com- with toj1,discover }
Two-Stepof Query more sim- u {ti ,The
eryandandbeenlarged)content users’whilewebsites, the n] shouldtags’section,Detailed those taggedmost (i.e.,2 , of t content thatwould described by queryusers a
the t a query, would then
p
ontent remainder of thisresults, be2.0query clus-used, so that the more ilar users should be tags p t, phigher, asmeasuresthe user to his fol-
hatcanrecommending used tagwe[1]tags websites,[1,
ity. the thatdescribed section, in by could be popularity,
ng accuracycan the contentcan illustrate how thethe angletags remainder(i.e.,
t In that described by query. 2.0tags’ we com- used tags overall, etc. In both cases,set system answer-
es canilarityof papers’ set his j∈
imesSecond,users ui could resultsj aretags expansion) improvesoverall, etc. tags thecases,wtheofsystem1 answer-, tn associated by can be the
nding be even recommending ilarity tags query measures can be the ranked query
discussed
j Web tags
twotwo tags’
usersw
tused, so that the moretagspropose exploits computedescribed by 0, . . . tags these users are
two steps of cancan drawn:coverage. we computevtags’ ningt1 ,we,query ntwo normallylikely resultsuser u paper, .or, tu ,than steps take frequently. . . , u submits a qu
ollowing insights place: similarity query pute users’ v 1, respecttags 2.2.1), users
domains that be (Section be [1] ing the
to
With used discover content of we
wing users’ similarity be drawn:2.2.1), howreported inquery jmodel thesim(t t,would cos(wway.place:bookmarked tsubmitsna two of hisway.place: t , usert } 1
l confirm, take users’
ntakeplace: knowledge, people tend to use The [27]. similarity (Section problem more latest the taggers ,with. the lowing most When
slightly i +
[n + m]); coverage. thatw rankof sharej interests2 ,with set others, and
toand, the how
the similarity 1 t .
i ·
2
take place:
expansion)
pute insights improves
share, the In,measures ·(Sectionin aboveandt ) = normally ) = many||w com- the query qu = {t1 , 2
users’each findingcos(vrecommending(regardless 2.0, iwebsites, theof, howthesein∗ the fol- to the
e then quantifythesim(ui , ujsimilar more=similar theythet2 .of(on,jhow many on tags)i ||two j ||
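The two-step query model can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the (user, paper, tag) triple format, the expansion threshold of 0.5, and in particular the scoring rule (tag relevance multiplied by user similarity plus one, summed over all taggings) are hypothetical choices made here, since the exact ranking formula does not appear on this slide.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine of the angle between two sparse count vectors stored as dicts."""
    dot = sum(x * b.get(k, 0) for k, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_vectors(bookmarks):
    """Build user vectors v_i (tag counts) and tag vectors w_i (paper counts)."""
    user_vecs = defaultdict(lambda: defaultdict(int))
    tag_vecs = defaultdict(lambda: defaultdict(int))
    for user, paper, tag in bookmarks:
        user_vecs[user][tag] += 1   # v_i[j]: times user i used tag j
        tag_vecs[tag][paper] += 1   # w_i[j]: times tag i was put on paper j
    return user_vecs, tag_vecs

def expand_query(query_tags, tag_vecs, threshold=0.5):
    """Step 1: add tags related to the query tags (threshold is an assumption)."""
    expanded = {t: 1.0 for t in query_tags}
    for q in query_tags:
        if q not in tag_vecs:
            continue
        for t, w in tag_vecs.items():
            s = cosine(tag_vecs[q], w)
            if s >= threshold:
                expanded[t] = max(expanded.get(t, 0.0), s)
    return expanded

def rank(user, query_tags, bookmarks, threshold=0.5):
    """Step 2: score papers by tag relevance and tagger similarity (assumed rule)."""
    user_vecs, tag_vecs = build_vectors(bookmarks)
    expanded = expand_query(query_tags, tag_vecs, threshold)
    scores = defaultdict(float)
    for tagger, paper, tag in bookmarks:
        if tag in expanded:
            user_sim = cosine(user_vecs[user], user_vecs[tagger])
            scores[paper] += expanded[tag] * (user_sim + 1.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented purely for illustration.
bookmarks = [
    ("alice", "p1", "ml"),
    ("alice", "p2", "ml"),
    ("bob", "p1", "ml"),
    ("bob", "p1", "learning"),
    ("bob", "p3", "learning"),
    ("carol", "p4", "biology"),
]
results = rank("alice", ["ml"], bookmarks)
```

In this toy data, paper p3 is never tagged "ml", yet it is retrieved because "learning" co-occurs with "ml" on p1 and so enters the expanded query. That is the coverage effect attributed to query expansion. Papers tagged by users whose tag vocabulary is closer to the querying user's receive a higher weight, which is the accuracy effect attributed to users' similarity.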