1. +
Top-k bounded diversification
Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi
Politecnico di Milano, Italy
Scottsdale, AZ, USA - May 24, 2012
2. + 1
Motivation
Diversification is useful in application domains where objects
can be described by
a score
a 2- or 3-dimensional feature vector
Many examples from search (real estate, image search, …)
Apartments distributed over a map
Score (e.g., price) + 2D feature vector (geo-localization)
Evolution in time of price of apartments over a map
Score (e.g., price) + 3D feature vector (geo-localization + time)
Properties of images (e.g., HSI color features)
Score (e.g., relevance to a given keyword) + 3D feature vector
(e.g., average HSI components in the image)
3. + 2
Diversified result set
Looking for good restaurants in Milan
Figure labels: top 15; top 15 diversified over the region
6. + 5
Diversification
We are given a set O of N objects
Each object o has a vector-space representation
Each object o has a relevance score
Diversification problem
Objective function, annotated with: best diversified set of K objects; set of objects; relevance to query (as score); diversity (as distance)
8. + 7
Greedy approach to diversification
MMR (Maximum Marginal Relevance)
Diversification problems are NP-hard
Approximate greedy algorithms are needed
MMR is a well-known greedy algorithm with good quality of result (i.e., value of the objective function)
Find K objects that are both relevant and diverse
At each step, pick the object with the largest diversity-weighted score
K steps in total
Diversity-weighted score: a balance between relevance and diversity
Corresponding objective function:
Main disadvantage:
All objects must be available from the beginning
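The greedy selection described for MMR can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: objects are assumed to be (score, point) pairs, the diversity-weighted score is assumed to be (1 - lam) * score + lam * (minimum distance from the already selected objects), and the diversity term is assumed to be 0 while nothing has been selected. The name mmr and the parameter lam are hypothetical.

```python
import math

def mmr(objects, k, lam=0.5):
    """Greedy MMR over (score, point) pairs; returns the picked indices in order."""
    selected = []
    remaining = set(range(len(objects)))

    def dw_score(i):
        score, point = objects[i]
        # Assumption: the diversity term is 0 while the selection is empty.
        diversity = min(
            (math.dist(point, objects[j][1]) for j in selected), default=0.0
        )
        return (1 - lam) * score + lam * diversity

    # K steps in total: each step scans every remaining object, which is why
    # plain MMR needs all objects to be available from the beginning.
    for _ in range(min(k, len(objects))):
        best = max(sorted(remaining), key=dw_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With lam = 0 the sketch degenerates to a plain top-K by score; larger values of lam push the picks apart in space.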
12. + 11
Bounded diversification
Objects are embedded in a bounded region of space
E.g., a bounding rectangle
Accessing objects is costly
Objects are progressively accessed (not available at time 0)
The number of accessed objects (sumDepths) should be
minimized
Indexes for sorted access to objects are available
Access by score (in descending order)
Access by distance from a given point (in ascending order)
Both are very common in services on the Web (e.g., apartment search)
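The two sorted access modes and the sumDepths cost can be illustrated with a small in-memory stand-in. This is only a sketch under assumptions: a real service would expose these accesses remotely, and the class name SortedAccess, its methods, and the (score, point) object layout are hypothetical.

```python
import math

class SortedAccess:
    """Hypothetical in-memory source that counts every object pulled (sumDepths)."""

    def __init__(self, objects):
        self.objects = list(objects)  # (score, point) pairs
        self.sum_depths = 0

    def _count(self, ordered):
        # Objects are handed out one at a time; each one costs one access.
        for obj in ordered:
            self.sum_depths += 1
            yield obj

    def by_score(self):
        """Access by score, in descending order."""
        return self._count(sorted(self.objects, key=lambda o: o[0], reverse=True))

    def by_distance(self, q):
        """Access by distance from point q, in ascending order."""
        return self._count(sorted(self.objects, key=lambda o: math.dist(o[1], q)))
```

A caller pulls with next(...) on either iterator and reads sum_depths to see how many objects have been accessed so far.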
13. + 12
Distance-based access
Restaurants by distance from a given point q
Size of icon proportional to score
14. + 13
Score-based access
Restaurants by score
Size of icon proportional to score
15. + 14
Attacking bounded diversification
The Pull-Bound MMR (PBMMR) template
Goal: achieve the same quality of result as MMR
while minimizing the number of accessed objects
K iterations; within each of them, repeat the following as long as needed
Pulling strategy: choose an access method (by score or by distance)
If by distance, choose the point to probe from (probing location)
Bounding scheme: compute an upper bound on the diversity-weighted score that can be achieved by unseen objects
If the diversity-weighted score of a seen object exceeds the bound, select it and move to the next iteration
Credit to [Schnaitter & Polyzotis 2008] for their Pull-Bound Rank Join template
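The pull/bound loop of the template can be sketched in Python under strong simplifying assumptions. The sketch pulls by score only and uses a deliberately loose diversity bound (the farthest rectangle corner from each selected object), not the tight bound of the talk; a tie with the bound is treated as sufficient for selection. The names pull_bound_mmr, dw_score, loose_diversity_bound and the (score, point) layout are hypothetical.

```python
import math

def dw_score(obj, selected, lam):
    """Diversity-weighted score of a (score, point) object against the selection."""
    score, point = obj
    # Assumption: the diversity term is 0 while the selection is empty.
    diversity = min((math.dist(point, s[1]) for s in selected), default=0.0)
    return (1 - lam) * score + lam * diversity

def loose_diversity_bound(selected, rect):
    """Upper bound on the min-distance to `selected` for any point inside `rect`."""
    if not selected:
        return 0.0
    (x0, y0), (x1, y1) = rect
    corners = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    # Distance to a fixed point is convex, so over a rectangle it peaks at a corner.
    return min(max(math.dist(c, s[1]) for c in corners) for s in selected)

def pull_bound_mmr(objects, k, rect, lam=0.5):
    """Return (selected objects, number of accessed objects)."""
    stream = iter(sorted(objects, key=lambda o: o[0], reverse=True))  # score access
    seen, selected = [], []
    accessed, last_score, exhausted = 0, math.inf, False
    while len(selected) < k:
        # Bounding scheme: no unseen object can score above `bound`.
        if exhausted:
            bound = -math.inf
        else:
            bound = (1 - lam) * last_score + lam * loose_diversity_bound(selected, rect)
        best = max(seen, key=lambda o: dw_score(o, selected, lam), default=None)
        if best is not None and dw_score(best, selected, lam) >= bound:
            selected.append(best)  # safe to select: next iteration
            seen.remove(best)
            continue
        # Pulling strategy (here: always by score).
        obj = next(stream, None)
        if obj is None:
            if not seen:
                break
            exhausted = True
        else:
            seen.append(obj)
            accessed += 1
            last_score = obj[0]
    return selected, accessed
```

Because the bound is loose, this sketch may access many more objects than a tight bounding scheme combined with distance-based access would; it only shows how pulling and bounding interact while still returning the same picks as MMR.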
16. + 15
Choosing probing locations
Goal of distance-based access:
Exploring the region of space in which the object with the best
diversity-weighted score is most likely to be found
At each of the K iterations, we fix the probing locations at the
most promising points of the unexplored space
Vertices of the bounded Voronoi diagram of the points selected at
the previous iterations
Of these, the most promising ones are as far as possible from
all the objects of the current selection
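For a rectangular bounding region in 2D, the search for the most promising probing location can be sketched without building the Voronoi diagram explicitly. The sketch below enumerates a superset of the vertices of the bounded Voronoi diagram (rectangle corners, intersections of perpendicular bisectors with the rectangle edges, and circumcenters of triples that fall inside the rectangle) and keeps the candidate farthest from all selected points. This brute-force enumeration is a substitute chosen for brevity, not the procedure used in the talk, and all function names are hypothetical.

```python
import math
from itertools import combinations

def bisector(p, q):
    """Coefficients (a, b, c) of the perpendicular bisector a*x + b*y = c of p and q."""
    a, b = q[0] - p[0], q[1] - p[1]
    c = a * (p[0] + q[0]) / 2 + b * (p[1] + q[1]) / 2
    return a, b, c

def candidate_locations(selected, rect, eps=1e-12):
    """Superset of the bounded Voronoi vertices of `selected` inside `rect`."""
    (x0, y0), (x1, y1) = rect

    def inside(x, y):
        return x0 - eps <= x <= x1 + eps and y0 - eps <= y <= y1 + eps

    cands = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    for p, q in combinations(selected, 2):
        a, b, c = bisector(p, q)
        if abs(b) > eps:  # bisector crosses the vertical edges
            for x in (x0, x1):
                y = (c - a * x) / b
                if inside(x, y):
                    cands.append((x, y))
        if abs(a) > eps:  # bisector crosses the horizontal edges
            for y in (y0, y1):
                x = (c - b * y) / a
                if inside(x, y):
                    cands.append((x, y))
    for p, q, r in combinations(selected, 3):
        a1, b1, c1 = bisector(p, q)
        a2, b2, c2 = bisector(p, r)
        det = a1 * b2 - a2 * b1
        if abs(det) > eps:  # skip collinear triples
            x = (c1 * b2 - c2 * b1) / det
            y = (a1 * c2 - a2 * c1) / det
            if inside(x, y):
                cands.append((x, y))
    return cands

def best_probing_location(selected, rect):
    """Candidate that is as far as possible from all selected points."""
    return max(
        candidate_locations(selected, rect),
        key=lambda v: min((math.dist(v, s) for s in selected), default=0.0),
    )
```

The enumeration is cubic in the number of selected points, which is acceptable for a sketch because at most K points are ever selected.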
17. + 16
Example
Voronoi diagram of selected objects
4 objects x1, …, x4 selected during the first 4 iterations
Bounding region is a square
Figure label: probing locations
19. + 18
Example
Voronoi diagram of selected objects
A new object is selected
20. + 19
Example
Bounded Voronoi diagram of selected objects
Probing locations: v1, …, v4 (vertices of the bounding region)
Shading: distance from closest points (brightest in vertices)
21. + 20
Example
Bounded Voronoi diagram of selected objects
Probing locations: v1, …, v6 (vertices of bounded Voronoi diagram)
Shading: distance from closest points (brightest in vertices)
The local maxima of the function “distance from the closest point
between x1 and x2” are among v1, …, v6
22. + 21
Example
Bounded Voronoi diagram of selected objects
Probing locations: v1, …, v8
Shading: distance from closest points (brightest in vertices)
The local maxima of the function “distance from the closest
point among x1, …, x3” are among v1, …, v8
23. + 22
Example
Bounded Voronoi diagram of selected objects
Probing locations: v1, …, v10
Shading: distance from closest points (brightest in vertices)
The local maxima of the function “distance from the closest
point among x1, …, x4” are among v1, …, v10
24. + 23
Example
Bounded Voronoi diagram of selected objects
Probing locations: v1, …, v12 (no other intersection in region)
Shading: distance from closest points (brightest in vertices)
The local maxima of the function “distance from the closest
point among x1, …, x5” are among v1, …, v12
25. + 24
Example
A running state
Inside red circumferences: explored region
Pink discs: objects retrieved by distance-based access
30. + 29
Bounding scheme
Computing a tight upper bound
A bound is tight if it can be achieved in some hypothetical continuation of the explored instance
The unseen objects retrievable with the next distance-based access belong to the set $Z = U \setminus D$, which leaves out the explored hyperspheres $\Sigma_u$ centered in $v_u$, $u = 1, \dots, V$
A tight upper bound can be found as follows:
$$\tau = (1 - \lambda)\, S_q^{\text{last}} + \lambda \max_{x \in Z} \min_{y \in X} \lVert x - y \rVert \qquad (11)$$
$S_q^{\text{last}}$: highest score possible (last seen by score-based access)
$\max_{x \in Z} \min_{y \in X} \lVert x - y \rVert$: maximal minimal distance from the selected objects
$Z$: unexplored region of space
$X$: set of selected objects
Theorem 5.1: the point $x^* \in Z$ that maximizes the minimal distance from all the selected objects is a vertex of the convex hull of $P_i \setminus D$, the unexplored part of a cell $P_i$ of the bounded Voronoi diagram Vor(X, U)
Theorem: the bound obtained in this way is tight
33. + 32
Selecting the next probing location
In 2D, the point maximizing the minimal distance can only be
A vertex of the bounded Voronoi diagram
An intersection between an edge and a circumference
An intersection between two circumferences
The corresponding vertex is selected as the next probing location
Figure labels: vertex selected as next probing location; point maximizing the minimal distance
36. + 35
Pulling strategy
Round robin: select, in alternation, each probing location
Some loose form of instance optimality can already be achieved
with a tight bounding scheme and round robin
Potential adaptive:
Choose the probing location that is most likely to reduce the upper bound
Potential adaptive is never worse than round robin
Choice between access by score and access by distance
Compare how much each reduces the upper bound relative to the number of accessed objects
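The round-robin pulling strategy can be sketched as follows: each probing location gets its own distance-sorted stream, and the streams are visited in alternation. This is a simplified illustration that ignores score-based access and the bound; the function name round_robin_pull and the (score, point) layout are hypothetical.

```python
import math

def round_robin_pull(objects, probing_locations, n_pulls):
    """Return the first n_pulls accesses as (location index, object index) pairs."""
    # One distance-ordered stream of object indices per probing location.
    streams = [
        iter(sorted(range(len(objects)),
                    key=lambda i, v=v: math.dist(objects[i][1], v)))
        for v in probing_locations
    ]
    pulled = []
    active = list(range(len(streams)))
    while active and len(pulled) < n_pulls:
        for u in list(active):
            if len(pulled) >= n_pulls:
                break
            i = next(streams[u], None)
            if i is None:
                active.remove(u)  # this location has no objects left
            else:
                pulled.append((u, i))
    return pulled
```

An adaptive strategy would replace the fixed alternation with a choice of the location expected to lower the upper bound the most.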
37. + 36
Batched access
In the model so far, objects are accessed one by one
Not practical for many scenarios
“Batched access” modes available in many practical systems:
Give a point and a radius and receive all objects that fall within that radius of the point
Strategy with batched access:
Perform exactly one request per probing location with an optimal
choice of the radius
This amounts to solving an optimization problem that
Minimizes the threshold by appropriately choosing the radii
Is subject to a budget constraint (how many objects am I willing
to retrieve)
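A batched access of the kind described above can be illustrated with a simple range query. This sketch only shows the access mode itself; it does not solve the radius optimization problem, and the function name batched_access and the (score, point) layout are hypothetical.

```python
import math

def batched_access(objects, center, radius):
    """Return every (score, point) object within `radius` of `center`, nearest first."""
    hits = [o for o in objects if math.dist(o[1], center) <= radius]
    return sorted(hits, key=lambda o: math.dist(o[1], center))
```

The number of objects returned by one such request is what the budget constraint limits, so a larger radius lowers the threshold faster but spends more of the budget.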
38. + 37
Experiments
Synthetic data, uniform distribution
39. + 38
Experiments
Synthetic data, exponential distribution
41. + 40
Conclusion
Diversification revisited
Sorted access modes to avoid accessing all objects
Same quality as MMR
A structured template with bounding scheme and pulling strategy
Optimality guarantees with one-by-one access to objects
Tight bound
Instance optimality (in a loose sense)
Extreme practical efficiency with batched access mode
Future work:
Adaptation to other diversification algorithms
42. + 41
Acknowledgments:
CUbRIK Project
CUbRIK is a research project financed by the European Union
Goals:
Advance the architecture of multimedia search
Exploit the human contribution in multimedia search
Use open-source components provided by the community
Start up a search business ecosystem
http://www.cubrikproject.eu/
Editor's Notes
Some bookkeeping is needed on:
the already explored portion of the bounded region
the highest score possible for unseen objects