A Network-Aware Approach for Searching As-You-Type in Social Media
Paul Lagrée, Bogdan Cautis, Hossein Vahabi
November 6, 2015
Université Paris-Sud
Social interactions on the Web
Social Web: a new development of the Web, centred on users, their relationships, and their data.
A significant portion of the Web is:
∙ explicitly social (Facebook, Twitter, Google+)
∙ implicitly social, built around content (blogs, forums)
User-centric: consumers are also generators and evaluators of content.
Motivation – As-You-Type
∙ Social-aware method (network-based)
∙ As-you-type search, handling prefixes
∙ Real-time approach (200 ms maximum)
∙ Incremental computation: exploits what has already been computed
∙ Top-k
Social tagging context
Collaborative tagging networks: a general abstraction of social media.
Model used in this work:
∙ users form a social network, represented as a weighted graph; weights may reflect similarity, friendship, etc.
∙ users tag items with terms; items may be documents, videos, photos, or URLs from a public pool.
Users search for items whose tags match the query.
Example
∙ Triples (user, item, tag)
∙ Weighted similarity network between users: weights reflect proximity, friendship, or similarity, and can be computed from tags, item-tags, or social links. Each edge weight σ(u, v) ∈ (0, 1]
Score Model
For a given tag t and seeker s, the score of an item is
score(item | s, t) = α × textual(t, item) + (1 − α) × social(item | s, t)
where α ∈ [0, 1] sets the balance between the textual and the social component: the smaller α, the more social the answer.
∙ α = 1: classical, purely textual web search
∙ α = 0: exclusively social search
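The linear combination above can be illustrated with a minimal sketch. The function name `combined_score` and the component values are hypothetical; only the formula itself comes from the slide.

```python
def combined_score(textual: float, social: float, alpha: float) -> float:
    """Linear combination from the slide: alpha weights the textual part."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * textual + (1.0 - alpha) * social


# Hypothetical component scores for a single item.
textual_part, social_part = 3.0, 0.8
print(combined_score(textual_part, social_part, alpha=1.0))  # textual only
print(combined_score(textual_part, social_part, alpha=0.0))  # social only
print(combined_score(textual_part, social_part, alpha=0.5))  # even mix
```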
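Social score
The social score is defined as:
$$\mathrm{social}(\mathit{item} \mid s, t) = \sum_{v \,:\, v \text{ tagged } \mathit{item} \text{ with } t} \sigma^{+}(s, v)$$
where $\sigma^{+}(s, v)$ is the extended proximity, which extends σ to pairs of users connected by a path, e.g. by path multiplication or path maximum.
For a prefix, the score is the maximum over its completions:
∙ $\mathrm{social}(\mathit{item} \mid s, \mathit{prefix}) = \max_{t \in \mathrm{completions}} \mathrm{social}(\mathit{item} \mid s, t)$
∙ $\mathrm{textual}(\mathit{item} \mid s, \mathit{prefix}) = \max_{t \in \mathrm{completions}} \mathrm{textual}(\mathit{item} \mid s, t)$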
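A minimal sketch of these definitions follows. It assumes that "path multiplication" means the product of edge weights along a path, maximised over all paths, and that the seeker's proximity to itself is 1.0; neither detail is stated on the slide. The graph, the triples, and all function names are hypothetical.

```python
import heapq


def extended_proximity(graph, seeker):
    """sigma+(seeker, v): best product of edge weights over any path.

    Assumption: weights lie in (0, 1], so products never increase along a
    path and a Dijkstra-style traversal finds the maximum.
    """
    best = {seeker: 1.0}          # assumption: sigma+(s, s) = 1
    heap = [(-1.0, seeker)]       # max-heap emulated with negated keys
    done = set()
    while heap:
        neg_prox, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph.get(u, {}).items():
            cand = -neg_prox * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best


def social_score(triples, proximity, item, tag):
    """Sum of sigma+ over the users who tagged `item` with `tag`."""
    taggers = {u for (u, i, t) in triples if i == item and t == tag}
    return sum(proximity.get(u, 0.0) for u in taggers)


def social_score_prefix(triples, proximity, item, prefix):
    """Maximum of the social score over the completions of `prefix`."""
    completions = {t for (_, i, t) in triples if i == item and t.startswith(prefix)}
    return max((social_score(triples, proximity, item, t) for t in completions),
               default=0.0)


# Hypothetical weighted network and tagging triples.
graph = {
    "alice": {"bob": 0.5, "carol": 0.9},
    "bob": {"alice": 0.5, "carol": 0.8, "dave": 0.5},
    "carol": {"alice": 0.9, "bob": 0.8},
    "dave": {"bob": 0.5},
}
triples = [
    ("bob", "i1", "style"), ("carol", "i1", "style"),
    ("dave", "i1", "stylish"), ("dave", "i2", "street"),
]
prox = extended_proximity(graph, "alice")
print(prox)
print(social_score_prefix(triples, prox, "i1", "st"))
```

Here bob is reached through carol (0.9 × 0.8) rather than through the weaker direct edge (0.5), which is the effect of taking the best path.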
Completion trie index
[Figure: completion trie over the tags. Each node is labelled with a bracketed number and a string fragment (e.g. "[4] st", "[2] ster"); leaves point to inverted lists of pairs such as (i4, 2), for example IL(hipster).]
Virtual IL(st):
(i2, street, 4)
(i4, style, 3)
(i1, stylish, 2)
(i5, stylish, 1)
(i6, style, 1)
∙ Leaf nodes in the trie correspond to concrete inverted lists.
∙ Internal nodes match a keyword prefix and represent a "virtual list".
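The idea of concrete and virtual lists can be sketched as follows. This is a simplified illustration, not the index from the talk: it uses a plain character-level trie instead of a compressed one, it assumes the bracketed number on a node is the best score found below it, and it materialises the virtual list eagerly where a real index would presumably merge the lists lazily. The class and method names are hypothetical; the postings are the entries of virtual IL(st) shown above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.max_score = 0   # best score in any inverted list below this node
        self.postings = []   # concrete inverted list if a tag ends here


class CompletionTrie:
    def __init__(self):
        self.root = TrieNode()

    def add(self, tag, postings):
        """Store the inverted list of `tag`, sorted by decreasing score."""
        postings = sorted(postings, key=lambda p: (-p[1], p[0]))
        top = postings[0][1] if postings else 0
        node = self.root
        node.max_score = max(node.max_score, top)
        for ch in tag:
            node = node.children.setdefault(ch, TrieNode())
            node.max_score = max(node.max_score, top)
        node.postings = postings

    def find(self, prefix):
        """Return the node matching `prefix`, or None if no tag starts with it."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def virtual_list(self, prefix):
        """Merge every concrete list below the prefix node into one list."""
        node = self.find(prefix)
        if node is None:
            return []
        entries, stack = [], [(prefix, node)]
        while stack:
            word, current = stack.pop()
            entries.extend((item, word, score) for item, score in current.postings)
            stack.extend((word + ch, child) for ch, child in current.children.items())
        return sorted(entries, key=lambda e: (-e[2], e[0]))


trie = CompletionTrie()
trie.add("street", [("i2", 4)])
trie.add("style", [("i4", 3), ("i6", 1)])
trie.add("stylish", [("i1", 2), ("i5", 1)])
for entry in trie.virtual_list("st"):
    print(entry)
```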
TOPKS-ASYT (α = 0) – General flow (1/2)
Input: seeker s, query Q = (t1, ..., tr), completion trie, graph with p-spaces
1. Initialise the candidate document list D = ∅, kept ordered by minimal score
∙ as in NRA (No Random Access), each candidate in D has a minimal and a maximal score
∙ also keep a maximal score for unseen documents
2. Add the seeker s to the priority queue
TOPKS-ASYT (α = 0) – General flow (2/2)
3. While the priority queue contains a user
∙ Get the next closest user u from the priority queue
∙ Refine the proximity scores of the neighbours of u
∙ Get the documents d tagged by u with any term from (t1, ..., tr−1) or with a tag having tr as a prefix (p-space exploration)
∙ Compute the score bounds of each d and insert it into (or update it in) D
∙ Advance in any IL from the completion trie whose head is a document in D
∙ Check the termination condition (next slide)
4. Return the top-k items.
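The user-exploration loop of step 3 can be sketched in a heavily simplified form. The sketch below handles a single-prefix query with α = 0, visits users by decreasing proximity (assuming path multiplication), and accumulates social scores. It deliberately omits the score bounds, the advancement in the inverted lists, and early termination, so it is not the algorithm from the talk. The names `topks_asyt_sketch`, `tagged`, and `max_users` are hypothetical, and `tagged` is only a stand-in for the p-spaces.

```python
import heapq
from collections import defaultdict


def topks_asyt_sketch(graph, tagged, seeker, prefix, k, max_users=None):
    """Visit users by decreasing proximity and accumulate social scores.

    graph:  {user: {neighbour: weight in (0, 1]}}
    tagged: {user: [(item, tag), ...]}, each pair listed once per user
    max_users: optional cap on visited users, seeker included
    """
    best = {seeker: 1.0}
    heap = [(-1.0, seeker)]           # step 2: the seeker enters the queue
    visited = set()
    partial = defaultdict(float)      # (item, tag) -> score accumulated so far
    while heap and (max_users is None or len(visited) < max_users):
        neg_prox, u = heapq.heappop(heap)     # next closest user
        if u in visited:
            continue
        visited.add(u)
        prox = -neg_prox
        # Refine the proximity of u's neighbours.
        for v, w in graph.get(u, {}).items():
            if prox * w > best.get(v, 0.0):
                best[v] = prox * w
                heapq.heappush(heap, (-best[v], v))
        # Collect the items u tagged with a completion of the prefix.
        for item, tag in tagged.get(u, []):
            if tag.startswith(prefix):
                partial[(item, tag)] += prox
    # Prefix score of an item: maximum over the completions seen.
    scores = defaultdict(float)
    for (item, _tag), value in partial.items():
        scores[item] = max(scores[item], value)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical network and tagging data.
graph = {
    "alice": {"bob": 0.5, "carol": 0.9},
    "bob": {"alice": 0.5, "carol": 0.8, "dave": 0.5},
    "carol": {"alice": 0.9, "bob": 0.8},
    "dave": {"bob": 0.5},
}
tagged = {
    "bob": [("i1", "style")],
    "carol": [("i1", "style"), ("i3", "stylish")],
    "dave": [("i1", "stylish"), ("i2", "street")],
}
print(topks_asyt_sketch(graph, tagged, "alice", "st", k=2))
print(topks_asyt_sketch(graph, tagged, "alice", "st", k=2, max_users=2))
```

Because users are visited in decreasing proximity, capping `max_users` keeps the contributions that weigh most in the social score, which is why a partial exploration can already give a useful ranking.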
Termination condition
Using bounds on item scores, we try to terminate the algorithm early.
∙ Each item in the buffer has
  ∙ a score lower bound: the score computed so far, assumed to be final
  ∙ a score upper bound: MaxScore(i | s, t) = max_proximity × unseen_users(i, t) + current_score(i | s, t)
∙ If the upper bound of the (k + 1)-th item is lower than the lower bound of the k-th item, the algorithm terminates.
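A minimal sketch of this check is given below. It takes a conservative reading of the slide: every candidate outside the current top-k, and not only the (k + 1)-th, must have an upper bound below the k-th lower bound, and the bound for unseen documents mentioned in the general flow is included as an optional argument. The function name, the argument names, and the buffer contents are hypothetical; `max_proximity` is assumed to be the largest proximity any not-yet-visited user can still have.

```python
def can_terminate(candidates, k, max_proximity, unseen_bound=0.0):
    """Return True when no item outside the top-k can still enter it.

    candidates: {item: (current_score, unseen_users)}
    """
    if len(candidates) < k:
        return False
    # Lower bound: the score accumulated so far.
    lower = {item: cur for item, (cur, _unseen) in candidates.items()}
    # Upper bound: every unseen tagger contributes at most max_proximity.
    upper = {item: max_proximity * unseen + cur
             for item, (cur, unseen) in candidates.items()}
    ranked = sorted(candidates, key=lambda item: (-lower[item], item))
    kth_lower = lower[ranked[k - 1]]
    rivals = [upper[item] for item in ranked[k:]] + [unseen_bound]
    return max(rivals) < kth_lower


# Hypothetical buffer: item -> (current score, number of unseen taggers).
buffer = {"i1": (1.62, 0), "i3": (0.9, 1), "i2": (0.36, 1)}
print(can_terminate(buffer, k=1, max_proximity=0.36))  # i3 can reach 1.26 at most
print(can_terminate(buffer, k=1, max_proximity=0.9))   # i3 could still reach 1.8
```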
Experimental Framework
Three datasets: Tumblr, Yelp and Twitter

                                  Yelp      Twitter   Tumblr
Users                             29,293    458,117   612,425
Items                             18,149    1.6M      1.4M
Tags                              177,286   550,157   2.3M
Number of triples                 30.3M     13.9M     11.3M
Average number of tags per item   686       8.4       7.9
Average tag length                6.5       13.1      13.0
Experimental Framework
Experiments:
∙ given one triple (u, i, t), run the search with keyword t and user u as the seeker
∙ metric: ranking of i in the result
∙ precision (test dataset D of N triples):
$$P@k = \frac{\#\{\text{triple} \in D \mid \text{ranking} < k\}}{N}$$
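The precision measure can be computed as in the short sketch below. It assumes rankings are 0-based, so that "ranking < k" selects the first k positions, and it represents an item missing from the result as `None`; both conventions are assumptions, as are the function name and the sample rankings.

```python
def precision_at_k(rankings, k):
    """Fraction of test triples whose item is ranked below position k.

    rankings: one ranking per test triple, None if the item was not returned.
    """
    if not rankings:
        raise ValueError("the test set must not be empty")
    hits = sum(1 for r in rankings if r is not None and r < k)
    return hits / len(rankings)


# Hypothetical rankings obtained for five test triples.
rankings = [0, 3, None, 7, 1]
print(precision_at_k(rankings, k=5))
```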
Tumblr α impact
[Figure: three panels (Tumblr item-tag, Tumblr tag, Tumblr social network) plotting P@5 against l (1 to 8), one curve per α ∈ {0, 0.01, 0.1, 0.4, 1}.]
Yelp α impact
[Figure: three panels (Yelp item-tag, Yelp tag, Yelp social network) plotting P@5 against l (1 to 8), one curve per α ∈ {0, 0.01, 0.1, 0.4, 1}.]
Social similarity – Tumblr number of visited users
Figure: Impact of visited users
[Plot: P@5 against l (0 to 12) on the Tumblr social network, one curve per ν ∈ {1, 5, 20, 100, ∞}.]
Efficiency – NDCG vs infinite answer
Figure: l impact
[Plot: NDCG@20 against t (ms), 0 to 40, on the Yelp social network, one curve per l ∈ {2, 4, 6}.]
Figure: α impact
[Plot: NDCG@20 against t, 0 to 40, on the Yelp social network, one curve per α ∈ {0.0, 0.5, 1.0}.]
Efficiency – Comparison
Figure: TOPKS-ASYT vs baseline
[Plot: NDCG@20 against l (3 to 6) on the Yelp social network (α = 0.0), curves TOPKS-ASYT and Baseline.]
Figure: Incremental vs Non-Incremental
[Plot: time for the exact top-k against l (3 to 6) on the Yelp social network, curves Incremental and Not-Incremental.]
Efficiency – Scaling
Figure: Impact of the size of the dataset (100% -> 35 million triples)
Thank you.
Questions?
