A Network-Aware Approach for Searching As-You-Type in Social Media

Paul Lagrée, Bogdan Cautis, Hossein Vahabi
November 6, 2015
Université Paris-Sud

Social interactions in the Web
Social web: a new development of the Web, centered on relationships between users and the data they produce.
A significant portion of the Web is:
∙ explicitly social (Facebook, Twitter, Google+)
∙ implicitly social, built around user content (blogs, forums)
User-centric: users consume, generate, and evaluate content.

Motivation – As-You-Type
∙ Social-aware method (network-based)
∙ As-you-type search, handling prefixes
∙ Real-time approach (200 ms maximum)
∙ Incremental computation: exploits what has already been computed
∙ Top-k

Social tagging context
Collaborative tagging networks: a general abstraction of social media.
Model used in this work:
∙ users form a social network, represented as a weighted graph; weights may reflect similarity, friendship, etc.
∙ users tag items with terms; items (documents, videos, photos, URLs) come from a public pool.
Users search for items whose tags match the query.

Example
∙ Triples (user, item, tag)
∙ Weighted similarity network between users (reflects proximity, friendship, or similarity, and can be computed from tags, item-tag pairs, or social links). Each edge (u, v) has a weight σ(u, v) ∈ ]0, 1].

Score Model
For a given tag t and seeker s, the score of an item is
score(item | s, t) = α × textual(t, item) + (1 − α) × social(item | s, t)
where α ∈ [0, 1] sets the trade-off between textual and social relevance.
∙ α = 1: purely textual, i.e., classical web search
∙ α = 0: exclusively social search
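A small worked example of the combined score, assuming the textual and social components have already been computed and are on comparable scales (the slides do not state a normalization). The function name `combined_score` and the component values are hypothetical.

```python
def combined_score(alpha: float, textual: float, social: float) -> float:
    """score(item | s, t) = alpha * textual(t, item) + (1 - alpha) * social(item | s, t)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * textual + (1.0 - alpha) * social


# Hypothetical component scores for one item: textual = 4.0, social = 2.0.
print(combined_score(1.0, 4.0, 2.0))   # 4.0  (classical, purely textual search)
print(combined_score(0.0, 4.0, 2.0))   # 2.0  (exclusively social search)
print(combined_score(0.25, 4.0, 2.0))  # 2.5  (0.25 * 4.0 + 0.75 * 2.0)
```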

Social score
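The social score is defined as:
$\mathrm{social}(item \mid s, t) = \sum_{v \text{ tagged } item \text{ with } t} \sigma^{+}(s, v)$
where σ⁺(s, v) is the extended proximity between s and v, aggregated over network paths (e.g., by path multiplication or path maximum).
For a prefix, each component is maximized over the prefix's completions:
∙ $\mathrm{social}(item \mid s, prefix) = \max_{t \in completions(prefix)} \mathrm{social}(item \mid s, t)$
∙ $\mathrm{textual}(prefix, item) = \max_{t \in completions(prefix)} \mathrm{textual}(t, item)$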
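A minimal sketch of the social score, assuming σ⁺(s, v) is the maximum over paths of the product of edge weights (one reading of the aggregations mentioned above) and that σ⁺(s, s) = 1. Because weights lie in ]0, 1], products never increase along a path, so a Dijkstra-style search with a max-heap yields exact max-product proximities. The graph, the triples, and all function names are hypothetical.

```python
import heapq


def extended_proximity(graph, seeker):
    """sigma+(seeker, v) for every reachable user v.

    Assumption: sigma+ is the maximum, over all paths, of the product of
    edge weights. Weights in ]0, 1] make a max-heap Dijkstra search exact.
    """
    prox = {seeker: 1.0}                # assumption: sigma+(s, s) = 1
    heap = [(-1.0, seeker)]             # negated keys turn heapq into a max-heap
    settled = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in settled:
            continue                    # stale heap entry
        settled.add(u)
        for v, w in graph.get(u, {}).items():
            p = -neg_p * w              # extend the best path to u by edge (u, v)
            if p > prox.get(v, 0.0):
                prox[v] = p
                heapq.heappush(heap, (-p, v))
    return prox


def social_score(item, tag, triples, prox):
    """social(item | s, t): sum of sigma+(s, v) over users v that tagged item with t."""
    return sum(prox.get(v, 0.0) for v, i, t in triples if i == item and t == tag)


def social_score_prefix(item, prefix, triples, prox):
    """social(item | s, prefix): maximum over the tags that complete the prefix."""
    completions = {t for _, _, t in triples if t.startswith(prefix)}
    return max((social_score(item, t, triples, prox) for t in completions), default=0.0)


# Hypothetical symmetric network and (user, item, tag) triples.
graph = {
    "s": {"b": 0.5, "a": 0.125},
    "b": {"s": 0.5, "a": 0.5},
    "a": {"s": 0.125, "b": 0.5},
}
triples = [("a", "i1", "street"), ("b", "i1", "style"),
           ("b", "i2", "street"), ("a", "i2", "street")]

prox = extended_proximity(graph, "s")                  # a is reached via b: 0.5 * 0.5
print(prox)                                            # {'s': 1.0, 'b': 0.5, 'a': 0.25}
print(social_score("i2", "street", triples, prox))     # 0.75
print(social_score_prefix("i1", "st", triples, prox))  # 0.5, via "style"
```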

Completion trie index
Figure: completion trie over the example tags (e.g., hipster, street, style, stylish). Each node is annotated (in brackets) with the maximum score in its subtree; leaf nodes hold inverted lists of (item, score) pairs, such as IL(hipster).
Virtual IL(st), the merged list of all completions of "st", in descending score order:
(i2, street, 4)
(i4, style, 3)
(i1, stylish, 2)
(i5, stylish, 1)
(i6, style, 1)
∙ Leaf nodes in the trie correspond to concrete inverted lists.
∙ Internal nodes match a keyword prefix and represent a "virtual list".
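The trie and its virtual lists can be sketched as follows, assuming each node stores the maximum score found in its subtree and that a virtual list is produced lazily by a best-first traversal. The class and method names are hypothetical, and the `hipster` posting is made up for illustration.

```python
import heapq
import itertools


class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.postings = []   # concrete inverted list if a tag ends here: (score, item), descending
        self.max_score = 0   # maximum posting score anywhere in this subtree


class CompletionTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, tag, item, score):
        node = self.root
        node.max_score = max(node.max_score, score)
        for ch in tag:
            node = node.children.setdefault(ch, TrieNode())
            node.max_score = max(node.max_score, score)
        node.postings.append((score, item))
        node.postings.sort(reverse=True)

    def _locate(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def virtual_list(self, prefix):
        """Lazily yield (item, tag, score) over all completions of prefix, best first.

        Unexpanded nodes are keyed by their subtree maximum, postings by their
        exact score; a subtree maximum bounds every posting below it, so a
        posting is popped only when nothing left in the heap can beat it.
        """
        start = self._locate(prefix)
        if start is None:
            return
        tie = itertools.count()   # tie-breaker so nodes are never compared
        heap = [(-start.max_score, next(tie), start, prefix, None)]
        while heap:
            neg_score, _, node, word, item = heapq.heappop(heap)
            if node is None:      # a concrete posting: emit it
                yield item, word, -neg_score
                continue
            for score, it in node.postings:
                heapq.heappush(heap, (-score, next(tie), None, word, it))
            for ch, child in node.children.items():
                heapq.heappush(heap, (-child.max_score, next(tie), child, word + ch, None))


trie = CompletionTrie()
for tag, item, score in [("street", "i2", 4), ("style", "i4", 3), ("stylish", "i1", 2),
                         ("stylish", "i5", 1), ("style", "i6", 1), ("hipster", "i3", 2)]:
    trie.insert(tag, item, score)

for entry in trie.virtual_list("st"):
    print(entry)
```

Items with equal scores (here the two entries with score 1) may come out in a different order than in the figure; only the descending score order is guaranteed.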

TOPKS-ASYT (α = 0) – General flow (1/2)
Input: seeker s, query Q = (t1, ..., tr), completion trie, graph with p-spaces
1. Candidate document list D = ∅, ordered by minimum score
∙ as in NRA (No Random Access), each candidate in D has a minimum and a maximum score
∙ also keep a maximum score for unseen documents
2. Seeker s is added to the priority queue
3. While the priority queue is not empty:
∙ Get the next closest user u from the priority queue
∙ Refine the proximity scores of u's neighbours
∙ Get the documents d tagged by u with any term from (t1, ..., tr−1) or with a completion of the prefix tr (p-space exploration)
∙ Compute the score bounds of each d and insert (or update) it in D
∙ Advance in any IL of the completion trie whose head is a document already in D
∙ Check the termination condition (next slide)
4. Return the top-k items.
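A partial sketch of the social exploration in step 3 for α = 0, assuming proximity is the maximum over paths of the product of edge weights. It only covers visiting users in decreasing proximity, refining neighbours' proximities, and accumulating partial scores for items tagged with t1..tr−1 or a completion of tr; the completion-trie accesses, score bounds, and termination test are omitted. All names, including the `tagging` map and the `max_users` cap, are hypothetical.

```python
import heapq
from collections import defaultdict


def visit_users(graph, seeker):
    """Yield (user, proximity) in decreasing proximity order from the seeker.

    Assumption: proximity is the max-product path weight (edge weights in ]0, 1]).
    """
    best = {seeker: 1.0}
    heap = [(-1.0, seeker)]            # max-heap via negated keys
    visited = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in visited:
            continue                   # stale entry: u was already reached by a better path
        visited.add(u)
        yield u, -neg_p                # "get the next closest user u"
        for v, w in graph.get(u, {}).items():
            p = -neg_p * w             # "refine proximity scores for neighbours of u"
            if v not in visited and p > best.get(v, 0.0):
                best[v] = p
                heapq.heappush(heap, (-p, v))


def matches(tag, query):
    """True if tag is one of t1..t(r-1), or completes the last query term tr (a prefix)."""
    *complete_terms, last_prefix = query
    return tag in complete_terms or tag.startswith(last_prefix)


def explore(graph, tagging, seeker, query, max_users=None):
    """Accumulate partial social scores per (item, tag) from the users visited so far.

    tagging: hypothetical map user -> list of (item, tag) pairs.
    max_users: hypothetical cap on the number of visited users.
    """
    partial = defaultdict(float)
    for n, (u, prox) in enumerate(visit_users(graph, seeker)):
        if max_users is not None and n >= max_users:
            break
        for item, tag in tagging.get(u, []):
            if matches(tag, query):
                partial[(item, tag)] += prox
    return dict(partial)


graph = {
    "s": {"b": 0.5, "a": 0.125},
    "b": {"s": 0.5, "a": 0.5},
    "a": {"s": 0.125, "b": 0.5},
}
tagging = {"a": [("i1", "street"), ("i2", "street")],
           "b": [("i1", "style"), ("i2", "street")]}

print(explore(graph, tagging, "s", ("st",)))
print(explore(graph, tagging, "s", ("st",), max_users=2))
```

Because users arrive in decreasing proximity, stopping early (as with `max_users`) only drops contributions from the least proximate users.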

Termination condition
Using bounds on item scores, we try to terminate the algorithm early:
∙ Each item in the buffer has
  ∙ a score lower bound, assuming that the score computed so far is the final one
  ∙ a score upper bound: MaxScore(i | s, t) = max_proximity × unseen_users(i, t) + current_score(i | s, t)
∙ If the upper bound of the (k + 1)-th item is lower than the lower bound of the k-th item, the algorithm terminates.
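A sketch of the bounds and stopping test. It assumes max_proximity is the proximity of the next user in the priority queue (an upper bound on any unvisited user's proximity, since users are visited in decreasing proximity) and that unseen_users(i, t) is known. As a conservative NRA-style variant of the rule above, it compares the k-th lower bound against the largest upper bound among all items outside the top k, plus a bound for unseen documents. `Candidate`, `can_terminate`, and `unseen_doc_bound` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item: str
    current_score: float  # score accumulated from the users visited so far
    unseen_users: int     # users who tagged the item with t and have not been visited yet

    def lower_bound(self) -> float:
        # Assume the score computed so far is the final one.
        return self.current_score

    def upper_bound(self, max_proximity: float) -> float:
        # MaxScore(i | s, t) = max_proximity * unseen_users(i, t) + current_score(i | s, t)
        return max_proximity * self.unseen_users + self.current_score


def can_terminate(candidates, k, max_proximity, unseen_doc_bound=0.0):
    """Stop when no item outside the current top k (nor an unseen document) can overtake the k-th."""
    if len(candidates) < k:
        return False
    ranked = sorted(candidates, key=lambda c: c.lower_bound(), reverse=True)
    kth_lower = ranked[k - 1].lower_bound()
    best_outside = max((c.upper_bound(max_proximity) for c in ranked[k:]), default=0.0)
    return max(best_outside, unseen_doc_bound) < kth_lower


buffer = [Candidate("i1", 3.0, 0), Candidate("i2", 2.0, 1), Candidate("i3", 0.5, 1)]
print(can_terminate(buffer, k=2, max_proximity=0.25))  # True: 0.5 + 0.25 < 2.0
print(can_terminate(buffer, k=2, max_proximity=2.0))   # False: 0.5 + 2.0 >= 2.0
```

The test fixes the set of top-k items; their relative order within the top k may still change as more users are visited.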

Experimental Framework
Three datasets: Tumblr, Yelp and Twitter
                                  Yelp      Twitter   Tumblr
Users                             29,293    458,117   612,425
Items                             18,149    1.6M      1.4M
Tags                              177,286   550,157   2.3M
Number of triples                 30.3M     13.9M     11.3M
Average number of tags per item   686       8.4       7.9
Average tag length                6.5       13.1      13.0

Experimental Framework
Experiments:
∙ given one triple (u, i, t), run the search for keyword t with user u as the seeker
∙ metric: the rank of i in the result
∙ precision over a test dataset D of N triples:
$P@k = \frac{\#\{(u, i, t) \in D \mid \mathrm{rank}(i) < k\}}{N}$
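The evaluation protocol can be sketched as follows. It assumes ranks are 0-based (so "rank < k" means "within the top k") and that an item missing from the answer counts as a miss; `search` is a hypothetical callable returning a ranked list of items for a seeker and keyword.

```python
def precision_at_k(ranks, k):
    """P@k = #{test triples whose item i has rank < k} / N.

    ranks: one entry per test triple, the 0-based rank of i in the result,
    or None when i was not returned.
    """
    if not ranks:
        raise ValueError("the test set D is empty")
    hits = sum(1 for r in ranks if r is not None and r < k)
    return hits / len(ranks)


def evaluate(test_triples, search, k):
    """One query per triple (u, i, t): keyword t, seeker u; then rank i in the answer."""
    ranks = []
    for u, i, t in test_triples:
        results = search(u, t)  # hypothetical: ranked list of items
        ranks.append(results.index(i) if i in results else None)
    return precision_at_k(ranks, k)


print(precision_at_k([0, 3, 7, None], k=5))  # 0.5
```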

Tumblr α impact
Figure: P@5 as a function of l (1 to 8) for α ∈ {0, 0.01, 0.1, 0.4, 1}, on three Tumblr networks: item-tag, tag, and social network.

Yelp α impact
Figure: P@5 as a function of l (1 to 8) for α ∈ {0, 0.01, 0.1, 0.4, 1}, on three Yelp networks: item-tag, tag, and social network.

Social similarity – Tumblr number of visited users
Figure: Impact of visited users. P@5 as a function of l on the Tumblr social network, for ν ∈ {1, 5, 20, 100, ∞} visited users.

Efficiency – NDCG vs infinite answer
Figure: l impact. NDCG@20 as a function of time t (ms) on the Yelp social network, for l ∈ {2, 4, 6}.
Figure: α impact. NDCG@20 as a function of time t on the Yelp social network, for α ∈ {0.0, 0.5, 1.0}.

Efficiency – Comparison
Figure: TOPKS-ASYT vs baseline. NDCG@20 as a function of l (3 to 6) on the Yelp social network (α = 0.0).
Figure: Incremental vs Non-Incremental. Time to compute the exact top-k as a function of l (3 to 6) on the Yelp social network.

Efficiency – Scaling
Figure: Impact of the size of the dataset (100% corresponds to 35 million triples)