CUbRIK research at SIGMOD 2012

Presentation of the "Top-k Bounded Diversification" research paper.

  • Note: some bookkeeping is needed on the already explored portion of the bounded region and on the highest score possible for unseen objects.

Slide 1: Top-k bounded diversification
  Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi
  Politecnico di Milano, Italy
  Scottsdale, AZ, USA - May 24, 2012
Slide 2: Motivation
  - Diversification is useful in application domains where objects can be described by:
    - a score
    - a 2- or 3-dimensional feature vector
  - Many examples from search (real estate, image search, …):
    - apartments distributed over a map: score (e.g., price) + 2D feature vector (geo-localization)
    - evolution in time of the price of apartments over a map: score (e.g., price) + 3D feature vector (geo-localization + time)
    - properties of images (e.g., HSI color features): score (e.g., relevance to a given keyword) + 3D feature vector (e.g., average HSI components in the image)
Slide 3: Diversified result set
  Looking for good restaurants in Milan
Slide 4: Diversified result set
  Looking for good restaurants in Milan (figure annotation: "top 15")
Slide 5: Diversified result set
  Looking for good restaurants in Milan (figure annotations: "top 15", "top 15 diversified over the region")
Slide 6: Diversification
  - We are given a set O of N objects
  - Each object o has a vector-space representation (its feature vector) and a relevance score
  - Diversification problem
Slide 7: Diversification
  - Same bullets as slide 6, with the objective function of the diversification problem annotated as: "Best diversified set of K objects", "Relevance to query (as score)", "Diversity (as distance)"
Slide 8: Greedy approach to diversification: MMR (Maximum Marginal Relevance)
  - Diversification problems are NP-hard
  - Approximate greedy algorithms are needed
  - MMR is a well-known greedy algorithm with good result quality (i.e., value of the objective function):
    - find K objects that are both relevant and diverse
    - at each step, pick the object with the largest diversity-weighted score
    - K steps in total
Slide 9: Greedy approach to diversification: MMR (Maximum Marginal Relevance)
  - Same bullets as slide 8, with figure annotations: "Relevance", "Diversity", "Balance between relevance and diversity", "Diversity-weighted score"
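
The diversity-weighted score appears on these slides only as an image. A plausible reconstruction, consistent term by term with the upper bound τ in equation (11) on slide 30, is the max-min form below. Here S(o) is the score of object o, x_o its feature vector, R the objects selected so far and X their feature vectors; this notation is assumed, not taken from the slides.

```latex
g(o \mid X) = (1-\lambda)\, S(o) + \lambda \min_{y \in X} \lVert x_o - y \rVert,
\qquad
o^{*} = \arg\max_{o \in O \setminus R} \; g(o \mid X)
```

With λ = 0 each step simply picks the highest-scoring unselected object; with λ = 1 it picks the object farthest from the current selection.
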
Slide 10: Greedy approach to diversification: MMR (Maximum Marginal Relevance)
  - Same bullets as slide 8, plus the corresponding objective function (shown as a formula on the slide)
Slide 11: Greedy approach to diversification: MMR (Maximum Marginal Relevance)
  - Same bullets as slide 8, plus the main disadvantage: all objects must be available from the beginning
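
A minimal sketch of greedy MMR as described on slides 8-11, assuming objects are (score, (x, y)) pairs, Euclidean distance, and the max-min diversity-weighted score implied by equation (11). The function names are hypothetical, and treating the diversity term as 0 before the first pick is an assumption. Note that every object must be in memory from the start, which is exactly the disadvantage slide 11 points out.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def diversity_weighted_score(score: float, x: Point, selected: Sequence[Point], lam: float) -> float:
    """(1 - lam) * score + lam * (distance from x to the closest selected point).

    Assumption: with no selected points yet the diversity term is 0, so the
    first pick is simply the highest-scoring object.
    """
    div = min((math.dist(x, y) for y in selected), default=0.0)
    return (1.0 - lam) * score + lam * div


def mmr(objects: Sequence[tuple[float, Point]], k: int, lam: float) -> list[int]:
    """Greedy MMR: K steps, each picking the unselected object with the largest
    diversity-weighted score. Returns the indices of the selected objects."""
    chosen: list[int] = []
    chosen_pts: list[Point] = []
    remaining = list(range(len(objects)))
    for _ in range(min(k, len(objects))):
        best = max(
            remaining,
            key=lambda i: diversity_weighted_score(objects[i][0], objects[i][1], chosen_pts, lam),
        )
        remaining.remove(best)
        chosen.append(best)
        chosen_pts.append(objects[best][1])
    return chosen
```

With λ = 0 the loop reduces to plain top-K by score; raising λ trades score for spread.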
Slide 12: Bounded diversification
  - Objects are embedded in a bounded region of space (e.g., a bounding rectangle)
  - Accessing objects is costly:
    - objects are accessed progressively (they are not all available at time 0)
    - the number of accessed objects (sumDepths) should be minimized
  - Indexes for sorted access to objects are available:
    - access by score (in descending order)
    - access by distance from a given point (in ascending order)
    - both are very common in Web services (e.g., apartment search)
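
A minimal in-memory stand-in for the two sorted-access indexes of slide 12, assuming objects are (score, (x, y)) pairs. `SortedAccess`, `by_score`, `by_distance` and `sum_depths` are hypothetical names; a real service would answer these requests remotely. Every object handed out by either stream is counted once toward sumDepths.

```python
import math
from typing import Iterator, Sequence

Point = tuple[float, float]
Obj = tuple[float, Point]  # (score, feature vector)


class SortedAccess:
    """Simulates sorted access over an in-memory object set and counts every
    object handed out, mirroring the sumDepths cost metric."""

    def __init__(self, objects: Sequence[Obj]):
        self.objects = list(objects)
        self.sum_depths = 0

    def by_score(self) -> Iterator[Obj]:
        # Descending score order.
        for obj in sorted(self.objects, key=lambda o: o[0], reverse=True):
            self.sum_depths += 1
            yield obj

    def by_distance(self, q: Point) -> Iterator[Obj]:
        # Ascending distance from the probing location q.
        for obj in sorted(self.objects, key=lambda o: math.dist(o[1], q)):
            self.sum_depths += 1
            yield obj
```

The cost is incremented only when an object is actually pulled, so an algorithm that stops early pays only for the prefix it consumed.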
Slide 13: Distance-based access
  Restaurants by distance from a given point q (icon size proportional to score)
Slide 14: Score-based access
  Restaurants by score (icon size proportional to score)
Slide 15: Attacking bounded diversification: the Pull-Bound MMR (PBMMR) template
  - Goal: achieve the same result quality as MMR while minimizing the number of accessed objects
  - K iterations; within each of them, repeat as long as needed:
    - pulling strategy: choose an access method (by score or by distance); if by distance, choose from which point (probing location)
    - bounding scheme: compute an upper bound on the diversity-weighted score that can be achieved by unseen objects
    - if a seen object exceeds the bound, select it and move to the next iteration
  - Credits to [Schnaitter & Polyzotis 2008] for their Pull-Bound Rank Join template
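
The PBMMR template of slide 15 can be sketched under simplifying assumptions: only score-based access is used (no probing locations), objects are (score, (x, y)) pairs inside a bounding region whose diagonal is `diag`, and the diversity term of unseen objects is bounded by that diagonal. This bound is valid but much looser than the tight Voronoi-based one described later. Function names are hypothetical. Because an object is selected only when its diversity-weighted score reaches the bound on every unseen object, the selection matches plain MMR (up to ties) while possibly reading fewer objects.

```python
import math
from typing import Sequence

Point = tuple[float, float]
Obj = tuple[float, Point]  # (score, feature vector)


def g(obj: Obj, selected: Sequence[Point], lam: float) -> float:
    """Diversity-weighted score; the diversity term is taken as 0 before the
    first selection (assumption)."""
    div = min((math.dist(obj[1], y) for y in selected), default=0.0)
    return (1 - lam) * obj[0] + lam * div


def pbmmr_score_only(objects: Sequence[Obj], k: int, lam: float, diag: float):
    """Pull-bound loop using only score-based access.

    An unseen object scores at most the last score pulled, and its distance
    from any selected point is at most `diag`. Returns (selected, sumDepths).
    """
    stream = sorted(objects, key=lambda o: o[0], reverse=True)  # simulated index
    depth = 0
    seen: list[Obj] = []      # pulled but not yet selected
    selected: list[Obj] = []
    while len(selected) < min(k, len(objects)):
        pts = [o[1] for o in selected]
        best = max(seen, key=lambda o: g(o, pts, lam), default=None)
        if depth < len(stream):
            if best is None:
                must_pull = True
            else:
                # Bounding scheme: upper bound on g for every unseen object.
                s_last = stream[depth - 1][0]
                tau = (1 - lam) * s_last + lam * (diag if selected else 0.0)
                must_pull = g(best, pts, lam) < tau
            if must_pull:
                seen.append(stream[depth])  # pulling strategy: next object by score
                depth += 1
                continue
        seen.remove(best)
        selected.append(best)
    return selected, depth
```

With λ = 0 the bound collapses to the last score seen, so the loop reads exactly K objects; the tighter the diversity bound, the earlier the loop can stop for λ > 0.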
Slide 16: Choosing probing locations
  - Goal of distance-based access: exploring the region of space in which the object with the best diversity-weighted score is most likely to be found
  - At each of the K iterations, the probing locations are fixed at the most promising points of the unexplored space:
    - the vertices of the bounded Voronoi diagram of the objects selected in the previous iterations
    - of these, the most promising are the ones as far as possible from all the objects of the current selection
Slide 17: Example: Voronoi diagram of selected objects
  - 4 objects x1, …, x4 selected during the first 4 iterations
  - The bounding region is a square
Slide 18: Example: Voronoi diagram of selected objects
  - Same as slide 17, with the probing locations marked
Slide 19: Example: Voronoi diagram of selected objects
  - A new object is selected
Slide 20: Example: bounded Voronoi diagram of selected objects
  - Probing locations: v1, …, v4 (vertices of the bounding region)
  - Shading: distance from the closest point (brightest at the vertices)
Slide 21: Example: bounded Voronoi diagram of selected objects
  - Probing locations: v1, …, v6 (vertices of the bounded Voronoi diagram)
  - Shading: distance from the closest point (brightest at the vertices)
  - The local maxima of the function "distance from the closest point among x1, x2" are among v1, …, v6
Slide 22: Example: bounded Voronoi diagram of selected objects
  - Probing locations: v1, …, v8
  - Shading: distance from the closest point (brightest at the vertices)
  - The local maxima of the function "distance from the closest point among x1, …, x3" are among v1, …, v8
Slide 23: Example: bounded Voronoi diagram of selected objects
  - Probing locations: v1, …, v10
  - Shading: distance from the closest point (brightest at the vertices)
  - The local maxima of the function "distance from the closest point among x1, …, x4" are among v1, …, v10
Slide 24: Example: bounded Voronoi diagram of selected objects
  - Probing locations: v1, …, v12 (no other intersection in the region)
  - Shading: distance from the closest point (brightest at the vertices)
  - The local maxima of the function "distance from the closest point among x1, …, x5" are among v1, …, v12
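
A brute-force sketch of the property illustrated in slides 20-24. Within each Voronoi cell the distance from the closest selected point is just the distance from that cell's site, a convex function, so its maximum over the bounded region is attained at a vertex of the bounded Voronoi diagram. Rather than building the diagram, the code below enumerates a superset of its possible vertices (region corners, intersections of pairwise perpendicular bisectors with the region's sides, circumcenters of triples) and evaluates the function at each. It works on the whole region and ignores explored discs, costs O(K^3) candidates for K selected points, and uses hypothetical names; it is not the authors' implementation.

```python
import itertools
import math
from typing import Optional, Sequence

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _inside(p: Point, rect: Rect, eps: float = 1e-9) -> bool:
    x0, y0, x1, y1 = rect
    return x0 - eps <= p[0] <= x1 + eps and y0 - eps <= p[1] <= y1 + eps


def _bisector_hits(a: Point, b: Point, rect: Rect) -> list[Point]:
    """Where the perpendicular bisector of a, b (dx*px + dy*py = c) meets the
    lines supporting the four sides of rect."""
    x0, y0, x1, y1 = rect
    dx, dy = b[0] - a[0], b[1] - a[1]
    c = (b[0] ** 2 + b[1] ** 2 - a[0] ** 2 - a[1] ** 2) / 2.0
    hits: list[Point] = []
    if dy != 0:
        hits += [(x, (c - dx * x) / dy) for x in (x0, x1)]
    if dx != 0:
        hits += [((c - dy * y) / dx, y) for y in (y0, y1)]
    return hits


def _circumcenter(a: Point, b: Point, c: Point) -> Optional[Point]:
    """Point equidistant from a, b, c (a Voronoi vertex candidate)."""
    d = 2 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None  # collinear points have no circumcenter
    sa, sb, sc = (p[0] ** 2 + p[1] ** 2 for p in (a, b, c))
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy)


def max_min_distance(selected: Sequence[Point], rect: Rect) -> tuple[Point, float]:
    """Point of rect farthest from its closest selected point, and that distance."""
    if not selected:
        raise ValueError("need at least one selected point")
    x0, y0, x1, y1 = rect
    cands: list[Point] = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]
    for a, b in itertools.combinations(selected, 2):
        cands += _bisector_hits(a, b, rect)
    for a, b, c in itertools.combinations(selected, 3):
        cc = _circumcenter(a, b, c)
        if cc is not None:
            cands.append(cc)

    def closest(p: Point) -> float:
        return min(math.dist(p, y) for y in selected)

    best = max((p for p in cands if _inside(p, rect)), key=closest)
    return best, closest(best)
```

For example, with the four corners of the unit square selected, the farthest point is the centre (0.5, 0.5), at distance √0.5 ≈ 0.707.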
Slides 25-29: Example: a running state
  - Inside the red circumferences: explored region
  - Pink discs: objects retrieved by distance-based access
Slide 30: Bounding scheme: computing a tight upper bound
  - The unseen objects retrievable with the next distance-based access belong to Z = U \ D, i.e., the bounded region U minus the explored region D, the union of the explored hyperspheres Σ_u centered in the probing locations v_u, u = 1, …, V
  - A bound is tight if it can be achieved in some hypothetical continuation of the instance
  - A tight upper bound can be computed as follows:
      $$\tau = (1-\lambda)\, S_q^{\mathrm{last}} + \lambda \max_{x \in Z} \min_{y \in X} \lVert x - y \rVert \qquad (11)$$
  - Vor(X, U) and the corresponding probing locations are updated each time a new object is selected
Slide 31: Bounding scheme: computing a tight upper bound
  - Same content as slide 30, with annotations on equation (11):
    - S_q^last: highest score possible (last score seen by score-based access)
    - max over x ∈ Z of min over y ∈ X of ‖x − y‖: maximal minimal distance from the selected objects
    - X: set of selected objects
    - Z: unexplored region of space
Slide 32: Bounding scheme: computing a tight upper bound
  - Theorem: the point x* that maximizes the minimal distance from all the selected objects is a vertex of the convex hull of the unexplored part of a cell of the bounded Voronoi diagram (Theorem 5.1 in the paper: the point x* ∈ Z that maximizes the minimum distance from all the points in X is a vertex of the convex hull of P_i \ D, where P_i is one of the cells of Vor(X, U)); this gives an effective procedure for computing (11)
  - Theorem: the bound obtained in this way is tight
Slide 33: Selecting the next probing location
  - In 2D, the point maximizing the minimal distance can only be:
    - a vertex of the bounded Voronoi diagram
    - an intersection between an edge and a circumference
    - an intersection between two circumferences
  - The corresponding vertex is selected as the next probing location
Slide 34: Selecting the next probing location
  - Same bullets as slide 33, with figure annotations: "Point maximizing the minimal distance", "Vertex selected as next probing location"
Slide 35: Selecting the next probing location
  - Same bullets and annotations as slide 34, for a different configuration in the figure
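
Once explored discs are cut out of the region, the maximizer of the minimal distance may lie on their boundary, which is why slide 33 lists edge-circumference and circumference-circumference intersections. The two helpers below compute such candidate points with standard analytic geometry. The names are hypothetical, and a full implementation would still have to discard candidates that fall inside another explored disc or outside the relevant Voronoi cell.

```python
import math

Point = tuple[float, float]


def circle_circle(c1: Point, r1: float, c2: Point, r2: float) -> list[Point]:
    """Intersection points (0, 1 or 2) of two circumferences."""
    d = math.dist(c1, c2)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # concentric, disjoint, or one inside the other
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)  # distance from c1 to the chord midpoint
    h = math.sqrt(max(r1 * r1 - a * a, 0.0))   # half length of the common chord
    mx = c1[0] + a * (c2[0] - c1[0]) / d
    my = c1[1] + a * (c2[1] - c1[1]) / d
    ox, oy = -h * (c2[1] - c1[1]) / d, h * (c2[0] - c1[0]) / d
    pts = [(mx + ox, my + oy), (mx - ox, my - oy)]
    return pts[:1] if h == 0 else pts


def segment_circle(p: Point, q: Point, c: Point, r: float) -> list[Point]:
    """Intersection points of segment pq (e.g., a Voronoi edge) with a circumference,
    found by solving |p + t(q - p) - c|^2 = r^2 for t in [0, 1]."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - c[0], p[1] - c[1]
    A = dx * dx + dy * dy
    B = 2 * (fx * dx + fy * dy)
    C = fx * fx + fy * fy - r * r
    disc = B * B - 4 * A * C
    if A == 0 or disc < 0:
        return []
    roots = {(-B - math.sqrt(disc)) / (2 * A), (-B + math.sqrt(disc)) / (2 * A)}
    return [(p[0] + t * dx, p[1] + t * dy) for t in sorted(roots) if 0 <= t <= 1]
```

Parametrizing the edge as p + t(q - p) keeps the segment test to a simple range check on t.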
Slide 36: Pulling strategy
  - Round robin: select each probing location in turn
    - some loose form of instance optimality can already be achieved with a tight bounding scheme and round robin
  - Potential adaptive:
    - choose the probing location that is most likely to reduce the upper bound
    - potential adaptive is never worse than round robin
  - Choice between access by score and access by distance:
    - based on how much each reduces the upper bound relative to the number of accessed objects
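
A sketch of the round-robin pulling strategy over distance-based access, assuming each probing location exposes a stream of objects sorted by ascending distance (simulated here by sorting in memory). `round_robin_pull` is a hypothetical name. The potential-adaptive strategy would replace the fixed rotation with a choice of the probe most likely to lower the bound; that choice is not shown.

```python
import math
from typing import Iterator, Sequence

Point = tuple[float, float]
Obj = tuple[float, Point]  # (score, feature vector)


def round_robin_pull(objects: Sequence[Obj], probes: Sequence[Point]) -> Iterator[tuple[int, Obj]]:
    """Yield (probe index, object) pairs, visiting the probing locations in
    alternation; each visit advances that probe's distance-sorted stream by one.

    Exhausted streams are dropped; the same object may be returned by several probes.
    """
    streams = [iter(sorted(objects, key=lambda o: math.dist(o[1], q))) for q in probes]
    active = list(range(len(streams)))
    while active:
        for i in list(active):
            obj = next(streams[i], None)
            if obj is None:
                active.remove(i)  # this probe has no more objects
            else:
                yield i, obj
```

In a PBMMR loop, the caller would stop consuming this generator as soon as a seen object reaches the bound, so only the pulled prefix counts toward sumDepths.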
Slide 37: Batched access
  - In the model so far, objects are accessed one by one, which is not practical in many scenarios
  - "Batched access" modes are available in many practical systems: given a point and a radius, return all objects that fall within
  - Strategy with batched access: perform exactly one request per probing location, with an optimal choice of the radius
  - This amounts to solving an optimization problem that:
    - minimizes the threshold by appropriately choosing the radii
    - is subject to a budget constraint (how many objects one is willing to retrieve)
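
A small simulation of the batched access mode of slide 37, assuming objects are (score, (x, y)) pairs held in memory. It issues one (probing location, radius) request per probing location and reports the cost that the budget constraint limits. Counting cost as the total number of objects returned (so an object inside two overlapping discs is paid for twice) is an assumption, and a real client would have to estimate this count before issuing the requests. The optimization that picks the radii is not shown; all names are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float]
Obj = tuple[float, Point]  # (score, feature vector)
Request = tuple[Point, float]  # (probing location, radius)


def range_query(objects: Sequence[Obj], center: Point, radius: float) -> list[Obj]:
    """One batched request: every object within `radius` of `center`."""
    return [o for o in objects if math.dist(o[1], center) <= radius]


def batched_round(objects: Sequence[Obj], requests: Sequence[Request]) -> tuple[list[Obj], int]:
    """Exactly one request per probing location.

    Returns the distinct objects retrieved and the cost, i.e. the total number
    of objects returned over all requests.
    """
    answers = [range_query(objects, c, r) for c, r in requests]
    cost = sum(len(a) for a in answers)
    distinct = list(dict.fromkeys(o for a in answers for o in a))
    return distinct, cost


def within_budget(objects: Sequence[Obj], requests: Sequence[Request], budget: int) -> bool:
    """Budget constraint of the radius-selection problem, evaluated exactly on
    the in-memory data."""
    return batched_round(objects, requests)[1] <= budget
```

Larger radii explore more of the region per request (lowering the threshold) but return more objects, which is the trade-off the budgeted optimization resolves.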
Slide 38: Experiments: synthetic data, uniform distribution
Slide 39: Experiments: synthetic data, exponential distribution
Slide 40: Experiments: real data
Slide 41: Conclusion
  - Diversification revisited:
    - sorted access modes to avoid accessing all objects
    - same quality as MMR
    - a structured template with a bounding scheme and a pulling strategy
  - Optimality guarantees with one-by-one access to objects:
    - tight bound
    - instance optimality (in a loose sense)
  - Extreme practical efficiency with the batched access mode
  - Future work: adaptation to other diversification algorithms
Slide 42: Acknowledgments: CUbRIK Project
  - CUbRIK is a research project financed by the European Union
  - Goals:
    - advance the architecture of multimedia search
    - exploit the human contribution in multimedia search
    - use open-source components provided by the community
    - start up a search business ecosystem
  - http://www.cubrikproject.eu/
