Tag Recommender Systems Using Nearest Neighbors and Graph-Based Methods

Recommender Systems for Social Tagging Systems
Leandro Balby Marinho
Machine Learning Lab
University of Hildesheim
PhD Defense

Outline
1. Motivation
2. Problems and Contributions
3. Tag Recommender Systems
4. Nearest Neighbor-based Tag Recommendation
5. Cross-Tagging
6. Tag Enrichment
7. Conclusions and Future Work

Web 2.0 sites are used more than e-mail! [Nielsen Online (2009)]
In Web 2.0, the user plays the main role!

Tags help users to organize and retrieve content.

Tags also help other users to organize and retrieve their content.

Folksonomy
A folksonomy is a structure F := (U, R, T, Y )
U ... users
R ... resources
T ... tags
Y ⊆ U × R × T ... tag assignments
X := {(u, r) | ∃t ∈ T : (u, r, t) ∈ Y } ... set of posts
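The definition above can be sketched in a few lines of Python. This is only an illustration: the names `Folksonomy`, `build_folksonomy`, `posts`, and `tags_of` are hypothetical, and deriving U, R, and T from the triples in Y (rather than passing them in separately) is an assumption made for brevity.

```python
from typing import NamedTuple


class Folksonomy(NamedTuple):
    users: frozenset        # U
    resources: frozenset    # R
    tags: frozenset         # T
    assignments: frozenset  # Y: set of (user, resource, tag) triples


def build_folksonomy(triples):
    """Assumption: U, R and T are derived from the triples in Y."""
    y = frozenset(triples)
    return Folksonomy(
        users=frozenset(u for u, _, _ in y),
        resources=frozenset(r for _, r, _ in y),
        tags=frozenset(t for _, _, t in y),
        assignments=y,
    )


def posts(f):
    """X: all (u, r) pairs that carry at least one tag."""
    return {(u, r) for u, r, _ in f.assignments}


def tags_of(f, u, r):
    """T(u, r): the tags user u assigned to resource r."""
    return {t for (v, s, t) in f.assignments if (v, s) == (u, r)}


# Toy data, invented for illustration only.
toy = build_folksonomy([
    ("ann", "paper1", "ml"), ("ann", "paper1", "graphs"),
    ("bob", "paper1", "ml"), ("bob", "paper2", "web"),
])
```

Here `posts(toy)` contains three posts, because the two assignments by `ann` on `paper1` collapse into a single post.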


Problems and Contributions
Tag Sparsity: Users are too lazy to tag!
$1 - \frac{|Y|}{|U| \times |R| \times |T|} \approx 0.99$ in all datasets used!
Solution: Tag Recommendation

Social Network Divide: Compatible social systems are disconnected.

Tag Idiosyncrasy: Tags bearing unclear semantics.
Solution: Tag Enrichment.


Tag Recommender Systems
Tag recommenders change the tagging process from creation to recognition!
Personalized methods take the user's tag preferences into account.
Value for industry, e.g., YouTube, Flickr, Last.fm, Amazon.

Evaluation and Metric
$X_{train} \,\dot\cup\, X_{test} = X$ ... train/test splits based on posts
For each user, randomly pick one post for test.
Task: For $(u, r) \in X_{test}$ compute $\hat{T}(u, r)$
Metric: $\mathrm{Recall}((u, r) \in X_{test}, n) := \frac{|\hat{T}(u,r) \cap T(u,r)|}{|T(u,r)|}$
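A minimal sketch of this protocol, assuming posts are grouped per user in a dictionary. The function names `split_posts` and `recall_at_n` and the fixed random seed are illustrative choices, not part of the original evaluation code.

```python
import random


def split_posts(posts_by_user, seed=0):
    """Hold out one randomly chosen post per user for testing."""
    rng = random.Random(seed)
    train, test = set(), set()
    for _, user_posts in posts_by_user.items():
        ordered = sorted(user_posts)   # sorted so the split is reproducible
        held_out = rng.choice(ordered)
        test.add(held_out)
        train.update(p for p in ordered if p != held_out)
    return train, test


def recall_at_n(recommended, true_tags, n):
    """|top-n recommended tags ∩ true tags| / |true tags| for one test post."""
    top_n = set(recommended[:n])
    return len(top_n & set(true_tags)) / len(true_tags)
```

Note that a user with a single post contributes nothing to the training set under this split; how such users were handled in the experiments is not stated on the slide.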

Formalization
Given $(u, r) \in X_{test}$, a tag recommender system first computes:
$\mathrm{Utility} : \{u\} \times \{r\} \times T \to \mathbb{R}$ (1)
And then presents the tags in descending order of their utility:
$\hat{T}(u, r) := \operatorname{argmax}^{n}_{t \in T} \mathrm{Utility}(u, r, t)$ (2)
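Equations (1) and (2) amount to scoring every candidate tag and keeping the n best. A small sketch, in which `recommend` and the lookup-table utility `toy_utility` are hypothetical stand-ins for a real model:

```python
import heapq


def recommend(utility, user, resource, candidate_tags, n):
    """Return the n tags with the highest utility, best first (equation 2)."""
    return heapq.nlargest(
        n, candidate_tags, key=lambda t: utility(user, resource, t)
    )


# Hypothetical utility: a fixed lookup table that ignores user and resource.
def toy_utility(user, resource, tag):
    return {"ml": 0.9, "web": 0.1, "graphs": 0.5}[tag]


top2 = recommend(toy_utility, "ann", "paper1", ["web", "ml", "graphs"], 2)
```

Every method in the remainder of the deck can be read as a different choice of `utility`.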


Nearest Neighbor-based (NN) Tag Recommenders
Collaborative Filtering (CF): Similar users tend to like similar things.
Here: Similar users tend to tag alike.
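Traditional CF cannot be directly applied to folksonomies unless the ternary relation Y is projected onto two-dimensional matrices:
[Figure: Y (users × resources × tags) projected onto the user-resource matrix $\pi_{UR} Y$ and the user-tag matrix $\pi_{UT} Y$.]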
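The two projections can be sketched as sparse dictionaries. The slide does not say how the cells are filled; the sketch assumes a binary user-resource matrix and a user-tag matrix that counts how often a user applied a tag. The function names are hypothetical.

```python
from collections import defaultdict


def project_user_resource(assignments):
    """pi_UR Y (assumed binary): 1 if the user tagged the resource at all."""
    m = defaultdict(dict)
    for u, r, _ in assignments:
        m[u][r] = 1
    return dict(m)


def project_user_tag(assignments):
    """pi_UT Y (assumed counts): number of resources u annotated with t."""
    m = defaultdict(lambda: defaultdict(int))
    for u, _, t in set(assignments):   # set(): ignore duplicated triples
        m[u][t] += 1
    return {u: dict(row) for u, row in m.items()}


# Toy data, invented for illustration only.
toy_y = [("ann", "p1", "ml"), ("ann", "p2", "ml"), ("bob", "p1", "web")]
```

Each row of either matrix is a user profile that a standard CF similarity measure can consume.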

Collaborative Filtering for Tag Recommendation
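Neighborhood Formation:
$N_u^k := \operatorname{argmax}^{k}_{v \in U_r \setminus \{u\}} \mathrm{sim}(m_u, m_v)$
Recommendation:
$\hat{T}(u, r) := \operatorname{argmax}^{n}_{t \in T} \sum_{v \in N_u^k} \mathrm{sim}(m_u, m_v)\, \delta(v, r, t)$
where $\delta(v, r, t) := 1$ if $(v, r, t) \in Y$ and $0$ otherwise.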
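A sketch of the two steps, under stated assumptions: cosine similarity is used for sim (the slide leaves the measure open), the candidate set U_r is read as the users who tagged resource r, and user profiles are sparse dictionaries. The names `cosine` and `cf_recommend` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse rows given as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cf_recommend(assignments, rows, user, resource, k, n):
    """rows: {user: sparse profile}. Returns the top-n tags for (user, resource)."""
    # Candidate neighbours: assumed to be the users who tagged the resource.
    tags_by_neighbour = defaultdict(set)
    for v, r, t in assignments:
        if r == resource and v != user:
            tags_by_neighbour[v].add(t)
    sims = {v: cosine(rows.get(user, {}), rows.get(v, {}))
            for v in tags_by_neighbour}
    neighbours = heapq.nlargest(k, sims, key=sims.get)
    scores = defaultdict(float)
    for v in neighbours:
        for t in tags_by_neighbour[v]:   # delta(v, r, t) = 1 exactly for these
            scores[t] += sims[v]
    return heapq.nlargest(n, scores, key=scores.get)


# Toy data, invented for illustration only.
toy_y = [("ann", "p1", "ml"), ("ann", "p2", "graphs"),
         ("bob", "p1", "ml"), ("bob", "p3", "ml"),
         ("cat", "p2", "web"), ("cat", "p3", "web"),
         ("dan", "p1", "ml"), ("dan", "p3", "ml")]
toy_rows = {"ann": {"ml": 1, "graphs": 1}, "bob": {"ml": 2},
            "cat": {"web": 2}, "dan": {"ml": 2}}
```

For the post (`ann`, `p3`), the two nearest neighbours are `bob` and `dan`, who both used `ml` on `p3`, so `ml` is ranked first.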

Ensembles of CF
Projections’ Ensemble:
Similarities’ Ensemble:
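$\hat{T}(u, r) = \operatorname{argmax}^{n}_{t \in T} \sum_{v \in N_u} (\lambda\, \mathrm{sim}(m_u, m_v) + (1 - \lambda)\, \mathrm{sim}(z_u, z_v))\, \delta(v, r, t)$
where $m_u$ and $m_v$ are rows of $\pi_{UT} Y$, and $z_u$ and $z_v$ are rows of $\pi_{UR} Y$.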
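The similarities' ensemble only changes the weight each neighbour contributes. A short sketch, assuming λ lies in [0, 1] and that the two similarities per neighbour have already been computed; `ensemble_similarity` and `ensemble_scores` are hypothetical names.

```python
def ensemble_similarity(sim_tags, sim_resources, lam):
    """lam * sim(m_u, m_v) + (1 - lam) * sim(z_u, z_v)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [0, 1]")
    return lam * sim_tags + (1.0 - lam) * sim_resources


def ensemble_scores(neighbours, lam):
    """neighbours: iterable of (sim_tags, sim_resources, tags_on_resource).

    Returns {tag: summed ensemble weight}, i.e. the quantity that the
    argmax in the formula ranks.
    """
    scores = {}
    for sim_tags, sim_resources, tags in neighbours:
        weight = ensemble_similarity(sim_tags, sim_resources, lam)
        for t in tags:
            scores[t] = scores.get(t, 0.0) + weight
    return scores


# Two hypothetical neighbours with made-up similarities.
toy_scores = ensemble_scores(
    [(0.8, 0.2, {"ml"}), (0.4, 0.6, {"ml", "web"})], lam=0.5
)
```

With λ = 1 the method reduces to CF on the user-tag projection, and with λ = 0 to CF on the user-resource projection.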

A Graph-Based Tag Recommender based on Posts
We represent X as a homogeneous, undirected graph G := (X, E) over the post set. Posts are related to each other if they share the same user:
$R_{user} := \{(x, x') \in X \times X \mid user(x) = user(x')\}$
the same resource:
$R_{res} := \{(x, x') \in X \times X \mid res(x) = res(x')\}$
or either the same user or the same resource:
$R^{res}_{user} := R_{user} \cup R_{res}$
where user(x) and res(x) are the user and the resource associated with the post x, respectively.
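The three relations can be built by grouping posts on a key. In this sketch edges are stored as unordered pairs and self-loops are dropped, which is a simplification of the set definitions above; the helper `relation` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def user(x):
    return x[0]


def res(x):
    return x[1]


def relation(posts, key):
    """Undirected edges between distinct posts that agree on key(post)."""
    groups = defaultdict(list)
    for x in posts:
        groups[key(x)].append(x)
    edges = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            edges.add(frozenset((a, b)))
    return edges


# Toy posts, invented for illustration only.
toy_posts = {("ann", "p1"), ("ann", "p2"), ("bob", "p1"), ("cat", "p3")}
r_user = relation(toy_posts, user)   # posts sharing a user
r_res = relation(toy_posts, res)     # posts sharing a resource
r_both = r_user | r_res              # union of the two relations
```

The post (`cat`, `p3`) shares neither a user nor a resource with any other post, so it stays isolated in all three graphs.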

Relational Graph based on Posts

Weighting Schemes
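For $x \in X_{test}$ and $(x, x') \in E$:
1. User-Tag Profile:
$\phi^{user\text{-}tag}(x) := (|Y \cap (\{user(x)\} \times R \times \{t\})|)_{t \in T}$
2. Resource-Tag Profile:
$\phi^{res\text{-}tag}(x) := (|Y \cap (U \times \{res(x)\} \times \{t\})|)_{t \in T}$
Weight:
$w(x, x') := \frac{\langle \phi(x), \phi(x') \rangle}{\|\phi(x)\| \, \|\phi(x')\|}$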
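Both profiles are tag-count vectors, and the weight is their cosine. A sketch using `collections.Counter` as a sparse vector over T; the function names are hypothetical, and returning 0 for an empty profile is an assumption the slide does not address.

```python
import math
from collections import Counter


def user_tag_profile(assignments, u):
    """Tag counts over all assignments made by user u."""
    return Counter(t for v, _, t in assignments if v == u)


def resource_tag_profile(assignments, r):
    """Tag counts over all assignments made on resource r."""
    return Counter(t for _, s, t in assignments if s == r)


def weight(phi_x, phi_y):
    """Cosine between two profiles; 0.0 if either profile is empty (assumed)."""
    dot = sum(c * phi_y[t] for t, c in phi_x.items())
    norm_x = math.sqrt(sum(c * c for c in phi_x.values()))
    norm_y = math.sqrt(sum(c * c for c in phi_y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


# Toy data, invented for illustration only.
toy_y = [("ann", "p1", "ml"), ("ann", "p2", "ml"),
         ("bob", "p1", "ml"), ("bob", "p3", "web")]
w_users = weight(user_tag_profile(toy_y, "ann"), user_tag_profile(toy_y, "bob"))
```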

Relational Classification
Weighted Average (WA) [Marinho et al. (2009)]:
$P(t \mid x) := \frac{\sum_{x' \in N_x \mid t \in T(x')} w(x, x')}{\sum_{x' \in N_x} w(x, x')}$
where:
$N_x := \{x' \in X \mid (x, x') \in R,\ T(x') \neq \emptyset\}$
Runtime: $O(|T| |N_x|)$
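The weighted average itself is a few lines once the neighbours and their weights are known. This sketch takes the neighbourhood as a list of (weight, tag set) pairs; `weighted_average` and `top_n` are hypothetical names, and the alphabetical tie-break is an arbitrary choice.

```python
from collections import defaultdict


def weighted_average(neighbours):
    """neighbours: list of (w(x, x'), T(x')) pairs with non-empty T(x').

    Returns {tag: P(t | x)}.
    """
    total = sum(w for w, _ in neighbours)
    if total == 0:
        return {}
    mass = defaultdict(float)
    for w, tags in neighbours:
        for t in tags:
            mass[t] += w
    return {t: m / total for t, m in mass.items()}


def top_n(probabilities, n):
    """Tags sorted by decreasing probability; ties broken alphabetically."""
    return sorted(probabilities, key=lambda t: (-probabilities[t], t))[:n]


# Three hypothetical neighbours with made-up weights.
toy_p = weighted_average([(0.9, {"ml", "graphs"}), (0.3, {"ml"}), (0.3, {"web"})])
```

Each neighbour is visited once per tag it carries, which matches the stated runtime bound of one pass over the neighbourhood per candidate tag.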

Evaluation
Datasets:
dataset    |U|     |R|     |T|     Triples |Y|  Posts |X|
BibSonomy  116     361     412     10,148       2,522
Last.fm    2,917   1,853   2,045   219,702      75,565
Delicious  37,399  74,874  22,170  7,487,319    3,055,436
Evaluated methods:
Baselines: (Locally) Constant Models (GCT, LCR, LCU).
Ensemble of Locally Constant Models (LCE) [Jäschke et al. 2008].
TopicRank, FolkRank [Jäschke et al. 2007]
RTF [Rendle et al. 2009]
PITF [Rendle et al. 2010]
Our NN-based Recommenders

Results: NN Methods
[Figure: Recall vs. number of recommended tags for top-10 tag recommendations in Delicious. Methods: WA, CF UT, CF UR, matrixExt, simEns, LCR, GCT.]

Results: WA vs. State-of-the-Art
[Figure: Recall for top-10 tag recommendations in BibSonomy. Methods: WA, RTF, PITF, FolkRank, LCE, TopicRank.]

[Figure: Recall for top-10 tag recommendations in Last.fm. Methods: PITF, WA, RTF, FolkRank, LCE, TopicRank.]

[Figure: Recall for top-10 tag recommendations in Delicious. Methods: PITF, WA, FolkRank, LCE, TopicRank.]

Runtime: WA vs. PITF
Method  BibSonomy    Last.fm      Delicious
WA      < 1 second   < 1 minute   ≈ 3 minutes
PITF    ≈ 5 minutes  ≈ 7 hours    ≈ 33 days

ECML/Discovery Challenge 2009
2nd Place ECML/PKDD Discovery Challenge 2009!
Rank  Method                                        Top-5 F1
1     PITF [Rendle et al. (2009)]                   0.35594
2     Relational Ensemble [Marinho et al. (2009)]*  0.33185
–     WA (not submitted)                            0.32519
3     Content-based [Lipczak et al. (2009)]         0.32461
* With Christine Preisach


Problem
Use overlapping resources to cross tags between systems.

Tag Recommendation for Cross-Tagging
Cross-Tagging Approaches:
LCR (locally constant per resource).
Collaborative Filtering.

Evaluation
Tag-Aware-based Evaluation
The better the tags, the better a tag-aware recommender that uses those tags performs.
Tag-Aware based on HOSVD [Symeonidis et al. (2008)]
Datasets
     Blogger.com  Last.fm  Annotated Blog
|U|  6,620        44,143   3,827
|R|  17,372       17,372   1,323
|T|  0            4,903    422
|Y|  0            254,388  32,900

Recall on the top-5 resources of HOSVD
n - Number of tags used to annotate the test posts of Blogger.com.


Problems
First we map tags from a folksonomy to concepts C of an ontology:
$H : T \to C$
Then we learn an ontology P such that:
$C_P := T \,\dot\cup\, C$
The better the ontology, the better an ontology-aware recommender that uses it performs.
Taxonomy driven CF [Ziegler et al. (2004)]
Datasets:
dataset   |U|    |T|    |R|  |Y|
Last.fm   3,532  7,081  982  130,899
musicmoz  -      555    982  -

Results
[Figure: Recall for the Trivial Ontology, the Domain Expert Ontology, and the Learned Ontology.]


Conclusions
Tag Sparsity: Nearest neighbor method that
Performs competitively with more sophisticated methods.
Requires modest computational effort.
Social Network Divide:
Cross-tagging as a tag recommendation problem.
Personalized cross-tagging is better than non-personalized cross-tagging.
Tag Idiosyncrasy: Tag enrichment
Well-agreed concepts that match the semantic intention of users.
Learned ontology is better than the trivial or domain expert ontology.
New recommender systems-based evaluation protocols.

Future Work
Optimized weight learning for WA.
Bidirectional Cross-Tagging.
Optimized Cross-Tagging/Ontology learning.

Results NN vs. Baselines
[Figure: Recall for top-10 tag recommendations in BibSonomy. Methods: WA, CF UT, CF UR, matrixExt, simEns, LCR, GCT.]
[Figure: Recall for top-10 tag recommendations in Last.fm. Methods: WA, CF UT, CF UR, matrixExt, simEns, LCR, GCT.]

PageRank for Folksonomies
Based on PageRank [Hotho et al. 2006]
Each hyperedge (u, r, t) is broken into three undirected edges: {u, r}, {u, t}, and {r, t}.
Now PageRank can be applied:
$w_{t+1} \leftarrow \lambda A^{T} w_t + (1 - \lambda) p$
Rank will be dominated by popular nodes (skewed distribution of tag assignments).

FolkRank
1. First compute vector $w^{(0)}$ with $p = \mathbf{1}$.
2. Next compute vector $w^{(1)}$ with $p[u] := 1 + |U|$, $p[r] := 1 + |R|$, and $p[v] := 1$ for $v \neq u, r$.
3. Finally compute $w := w^{(1)} - w^{(0)}$.
4. Recommendation list $\hat{T}(u, r)$ is the top-n nodes in the rank restricted to tags.
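The four steps can be sketched on top of a small weight-spreading routine. Several details are assumptions, not taken from the slides: A is taken as the row-normalised adjacency matrix with co-occurrence counts as edge weights, the preference vector p is normalised to sum to one, λ = 0.7, and a fixed number of iterations replaces a convergence test. All function names are hypothetical.

```python
from collections import defaultdict


def build_graph(assignments):
    """Split each (u, r, t) into undirected edges u-r, u-t, r-t with counts."""
    adj = defaultdict(lambda: defaultdict(float))
    for u, r, t in assignments:
        nodes = (("user", u), ("res", r), ("tag", t))
        for a in nodes:
            for b in nodes:
                if a != b:
                    adj[a][b] += 1.0
    return adj


def adapted_pagerank(adj, preference, lam=0.7, iterations=50):
    """Iterate w <- lam * A^T w + (1 - lam) * p, with p normalised to sum 1."""
    p_total = sum(preference[v] for v in adj)
    p = {v: preference[v] / p_total for v in adj}
    w = dict(p)
    for _ in range(iterations):
        spread = defaultdict(float)
        for v, nbrs in adj.items():
            degree = sum(nbrs.values())
            for nbr, edge_weight in nbrs.items():
                spread[nbr] += w[v] * edge_weight / degree
        w = {v: lam * spread[v] + (1 - lam) * p[v] for v in adj}
    return w


def folkrank(assignments, user, resource, n, lam=0.7):
    """Steps 1-4: baseline rank, biased rank, difference, top-n tags."""
    adj = build_graph(assignments)
    n_users = sum(1 for v in adj if v[0] == "user")
    n_resources = sum(1 for v in adj if v[0] == "res")
    uniform = {v: 1.0 for v in adj}
    biased = dict(uniform)
    biased[("user", user)] = 1.0 + n_users
    biased[("res", resource)] = 1.0 + n_resources
    w0 = adapted_pagerank(adj, uniform, lam)
    w1 = adapted_pagerank(adj, biased, lam)
    diff = {v[1]: w1[v] - w0[v] for v in adj if v[0] == "tag"}
    return sorted(diff, key=lambda t: (-diff[t], t))[:n]


# Toy data, invented for illustration only.
toy_y = [("ann", "p1", "ml"), ("ann", "p2", "ml"),
         ("bob", "p1", "web"), ("bob", "p3", "web"), ("cat", "p3", "web")]
```

Subtracting the baseline rank is what removes the popularity bias noted for plain PageRank: a globally popular tag only ranks high if the biased preference vector actually moves weight toward it.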

RTF: Ranking with Tensor Factorization
Tag Recommendation as a tensor completion problem.
Positive tags have higher rank than negative ones [Rendle et al. 2009].
$y_{u,r,t_1} > y_{u,r,t_2} \Leftrightarrow t_1 \in T^{+}_{u,r} \wedge t_2 \in T^{-}_{u,r}$
$T^{+}_{u,r} := \{t \mid (u, r) \in X_{train} \wedge (u, r, t) \in Y\}$, $T^{-}_{u,r} := \{t \mid (u, r) \in X_{train} \wedge (u, r, t) \notin Y\}$

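Tucker Decomposition Model
$\hat{Y} := \hat{C} \times_u \hat{U} \times_r \hat{R} \times_t \hat{T}$
or equivalently:
$\hat{y}_{u,r,t} = \sum_{\tilde{u}} \sum_{\tilde{r}} \sum_{\tilde{t}} \hat{c}_{\tilde{u},\tilde{r},\tilde{t}} \cdot \hat{u}_{u,\tilde{u}} \cdot \hat{r}_{r,\tilde{r}} \cdot \hat{t}_{t,\tilde{t}}$
where the model parameters are:
$\hat{C} \in \mathbb{R}^{k_U \times k_R \times k_T}$, $\hat{U} \in \mathbb{R}^{|U| \times k_U}$, $\hat{R} \in \mathbb{R}^{|R| \times k_R}$, $\hat{T} \in \mathbb{R}^{|T| \times k_T}$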
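The element-wise form of the model is a triple sum over the core tensor. A plain-Python sketch of the prediction for a single (u, r, t), with the core stored as nested lists; `tucker_score` and the toy factors are hypothetical, and learning the parameters is out of scope here.

```python
def tucker_score(core, u_row, r_row, t_row):
    """Sum over (a, b, c) of core[a][b][c] * u_row[a] * r_row[b] * t_row[c].

    core has shape k_U x k_R x k_T; the rows are the factor vectors of one
    user, one resource and one tag.
    """
    return sum(
        core[a][b][c] * u_row[a] * r_row[b] * t_row[c]
        for a in range(len(u_row))
        for b in range(len(r_row))
        for c in range(len(t_row))
    )


# Made-up parameters with k_U = 1, k_R = 2, k_T = 2.
toy_core = [[[1.0, 0.0], [0.0, 2.0]]]
toy_score = tucker_score(toy_core, [2.0], [1.0, 3.0], [0.5, 1.0])
```

The cost of one prediction grows with the product of the three factor dimensions, which is the term that makes RTF expensive compared with the pairwise model.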

PITF: Pairwise Interaction Tensor Factorization
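PITF only models the two-way interactions between users and tags as well as between resources and tags:
$\hat{y}_{u,r,t} = \sum_{f}^{k} \hat{u}_{u,f} \cdot \hat{t}^{U}_{t,f} + \sum_{f}^{k} \hat{r}_{r,f} \cdot \hat{t}^{R}_{t,f}$
where $\hat{U} \in \mathbb{R}^{|U| \times k}$, $\hat{R} \in \mathbb{R}^{|R| \times k}$, $\hat{T}^{U} \in \mathbb{R}^{|T| \times k}$ and $\hat{T}^{R} \in \mathbb{R}^{|T| \times k}$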
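Prediction with the pairwise model is two dot products per tag. A sketch with hypothetical names (`pitf_score`, `pitf_rank`) and made-up factors; training the factors is not covered.

```python
def pitf_score(u_row, r_row, t_user_row, t_res_row):
    """User-tag interaction plus resource-tag interaction for one tag."""
    user_part = sum(a * b for a, b in zip(u_row, t_user_row))
    resource_part = sum(a * b for a, b in zip(r_row, t_res_row))
    return user_part + resource_part


def pitf_rank(u_row, r_row, tag_factors, n):
    """tag_factors: {tag: (t_user_row, t_res_row)}. Returns the top-n tags."""
    scores = {t: pitf_score(u_row, r_row, tu, tr)
              for t, (tu, tr) in tag_factors.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


# Made-up factors with k = 2.
toy_factors = {"ml": ([1.0, 0.0], [0.0, 1.0]),
               "web": ([0.0, 1.0], [1.0, 0.0])}
toy_top = pitf_rank([2.0, 0.5], [0.0, 1.0], toy_factors, 1)
```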

Complexity
Learning Runtime Complexity
Method    Runtime
WA        $O(1)$
FolkRank  $O(1)$
RTF       $O(iter \cdot |X_{train}| |T|^2 \cdot k_U \cdot k_R \cdot k_T)$
PITF      $O(iter \cdot |X_{train}| |T|^2 \cdot 2k)$
Prediction Runtime Complexity
Method    Runtime
WA        $O(|T| |N_x| + |T| \log(n))$
FolkRank  $O(iter \cdot (|Y| + |U| + |R| + |T|) + |T| + |T| \log(n))$
RTF       $O(|T| \cdot k_U + k_R \cdot k_T \cdot k_T)$
PITF      $O(|T| \cdot 2k + |T| \log(n))$

Relation Rewarding
We can reward the best relation by a factor c ∈ R

Results Cross-Tagging

Tag Enrichment Approach
Semantic mapping as an ontology matching problem.
$P(A, B) \approx \frac{|A \cap B|}{|R|}$ [Doan et al. (2004)]
Jaccard coefficient:
$JS(A, B) := \frac{P(A \cap B)}{P(A \cup B)} = \frac{P(A, B)}{P(A, B) + P(A, \bar{B}) + P(\bar{A}, B)}$
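A sketch of the similarity computation, assuming A and B are represented by the sets of resources annotated with a tag and with an ontology concept respectively (the slide does not spell out this representation). The function names are hypothetical.

```python
def joint_probability(resources_a, resources_b, n_resources):
    """P(A, B) estimated as |A ∩ B| / |R|."""
    return len(resources_a & resources_b) / n_resources


def jaccard(resources_a, resources_b, all_resources):
    """JS(A, B) = P(A, B) / (P(A, B) + P(A, not B) + P(not A, B))."""
    n = len(all_resources)
    p_ab = joint_probability(resources_a, resources_b, n)
    p_a_not_b = joint_probability(resources_a, all_resources - resources_b, n)
    p_not_a_b = joint_probability(all_resources - resources_a, resources_b, n)
    denominator = p_ab + p_a_not_b + p_not_a_b
    return p_ab / denominator if denominator else 0.0


# Toy resource sets, invented for illustration only.
toy_all = {"r1", "r2", "r3", "r4"}
toy_tag = {"r1", "r2"}        # resources carrying some tag
toy_concept = {"r2", "r3"}    # resources filed under some concept
toy_js = jaccard(toy_tag, toy_concept, toy_all)
```

Under the mapping H : T → C, a tag would then be assigned to the concept with the highest similarity; that selection rule is an assumption, since the slide only gives the similarity itself.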

Ontology learning
Frequent itemset mining for ontology learning [Marinho et al. 2008]*
* Algorithm proposed by Krisztian Buza, co-author of [Marinho et al. 2008]

Semantic mapping
tags mapped concepts
electro electronica
hip hop hip hop
chillout rock
old skool dance house
anything else but death heavy metal
post-hardcore emo
california punk
political punk
urban hip hop
60s stuff country-rock
relaxing folk rock
explorer experimental rock
rock en espanol latin pop

An Extract of Domain Expert Ontology
[Figure: extract of the domain expert ontology, drawn with Pajek. Nodes: metal, heavy_metal, death_metal, doom_metal, black_metal, thrash, rap-metal, hair_metal, speed_metal, grindcore.]

An Extract of Learned Ontology
[Figure: extract of the learned ontology, drawn with Pajek. It contains user tags (e.g. "melodic death metal", "nu metal", "symphonic metal", "bands i have seen live", "guitargasm") alongside concepts such as metal, heavy_metal, death_metal, doom_metal, speed_metal, thrash, and progressive_rock.]
