    Top-k bounded diversification
    Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi
    Politecnico di Milano, Italy

    Scottsdale, AZ, USA - May 24, 2012

    Motivation

       Diversification is useful in application domains where objects
        can be described by
           a score
           a 2- or 3-dimensional feature vector

       Many examples from search (real estate, image search, …)
           Apartments distributed over a map
               Score (e.g., price) + 2D feature vector (geo-localization)
           Evolution in time of price of apartments over a map
               Score (e.g., price) + 3D feature vector (geo-localization + time)
           Properties of images (e.g., HSI color features)
               Score (e.g., relevance to a given keyword) + 3D feature vector
                (e.g., average HSI components in the image)

    Diversified result set
    Looking for good restaurants in Milan

       [Figure: the top 15 results compared with the top 15 diversified over the region]

    Diversification

       We are given a set O of N objects
           Each object o has a vector-space representation
           Each object o has a relevance score

       Diversification problem
           Find the best diversified set of K objects among the given objects
           The objective function combines relevance to the query (as score)
            and diversity (as distance)

    Greedy approach to diversification
    MMR (Maximum Marginal Relevance)

       Diversification problems are NP-hard

       Approximate greedy algorithms are needed

       MMR is a well-known greedy algorithm with good quality of
        result (i.e., value of the objective function)
           Find K objects that are both relevant and diverse
           At each step, pick the object with largest diversity-weighted score
               K steps in total

       The diversity-weighted score combines a relevance term and a
        diversity term, with a balance between relevance and diversity

       Corresponding objective function: [formula not preserved in this transcript]

       Main disadvantage:
           All objects must be available from the beginning
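
The greedy procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: objects are represented as hypothetical (score, point) pairs, and the diversity-weighted score is assumed to be (1 - lam) * score + lam * (distance to the closest selected object), a form modeled on the bound given later in the deck. The function name `mmr` and the parameter `lam` are illustrative, not taken from the slides.

```python
import math

def mmr(objects, k, lam):
    """Greedy MMR sketch over a fully available list of (score, point) pairs."""
    selected = []
    remaining = list(objects)

    def dws(obj):
        # Assumed diversity-weighted score: relevance term plus diversity term.
        score, point = obj
        # Diversity: distance to the closest selected object (0 if none selected yet).
        div = min((math.dist(point, p) for _, p in selected), default=0.0)
        return (1 - lam) * score + lam * div

    # K steps in total, one object picked per step.
    while remaining and len(selected) < k:
        best = max(remaining, key=dws)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note that the sketch scans all remaining objects at every step, which is exactly the disadvantage stated above: every object must be available from the beginning.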

    Bounded diversification

       Objects are embedded in a bounded region of space
           E.g., a bounding rectangle

       Accessing objects is costly
           Objects are progressively accessed (not available at time 0)
           The number of accessed objects (sumDepths) should be
            minimized

       Indexes for sorted access to objects are available
           Access by score (in descending order)
           Access by distance from a given point (in ascending order)
           Both are very common in services on the Web (e.g., apartment
            search)
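
The access model above can be mimicked with a small in-memory stand-in. The sketch below is an assumption-laden illustration: the class `SortedAccess`, its method names, and the (score, point) object representation are hypothetical, and a real service would expose these access modes remotely rather than by sorting a local list. The counter plays the role of sumDepths, under the assumption that every pulled object counts once per pull.

```python
import math

class SortedAccess:
    """Hypothetical in-memory stand-in for the two sorted access modes."""

    def __init__(self, objects):
        self._objects = list(objects)  # (score, point) pairs
        self.sum_depths = 0            # number of objects accessed so far

    def by_score(self):
        # Sorted access by score, in descending order.
        for obj in sorted(self._objects, key=lambda o: -o[0]):
            self.sum_depths += 1
            yield obj

    def by_distance(self, q):
        # Sorted access by distance from the point q, in ascending order.
        for obj in sorted(self._objects, key=lambda o: math.dist(o[1], q)):
            self.sum_depths += 1
            yield obj
```

Because both methods are generators, an object is counted only when it is actually pulled, which matches the idea of progressive access.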

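    Distance-based access
    Restaurants by distance from a given point q

       [Figure: restaurants shown as icons, with a point marked "+"; size of icon proportional to score]

    Score-based access
    Restaurants by score

       [Figure: restaurants shown as icons, with a point marked "+"; size of icon proportional to score]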

    Attacking bounded diversification
    The Pull-Bound MMR (PBMMR) template

       Goal: achieve the same quality of result as MMR
           But minimizing the number of accessed objects

       K iterations: within each iteration, repeat the following as long as needed
           Pulling strategy: choose an access method (by score or by distance)
               If by distance, choose the point to probe from (probing location)
           Bounding scheme: compute an upper bound on the diversity-weighted
            score that can be achieved by unseen objects
           If the diversity-weighted score of a seen object exceeds the bound,
            select it and move on to the next iteration
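
A heavily simplified sketch of the pull/bound loop follows. It is not the PBMMR algorithm of the deck: it pulls by score only (no distance-based access, no probing locations) and uses a loose diversity bound `max_div`, for instance the diagonal of the bounding region, instead of a tight one. The (score, point) representation, the assumed form of the diversity-weighted score, and all names are hypothetical.

```python
import math

def dws(obj, selected, lam):
    """Assumed diversity-weighted score of obj = (score, point)."""
    score, point = obj
    div = min((math.dist(point, p) for _, p in selected), default=0.0)
    return (1 - lam) * score + lam * div

def pull_bound_mmr(by_score, k, lam, max_div):
    """by_score: iterator of (score, point) pairs in descending score order.
    max_div: any upper bound on distances inside the bounded region."""
    pool, selected, accesses = [], [], 0
    last_score, exhausted = None, False
    while len(selected) < k:
        best = max(pool, key=lambda o: dws(o, selected, lam), default=None)
        # Bounding scheme: upper bound on the score of any unseen object.
        if exhausted:
            bound = -math.inf          # nothing is left unseen
        elif last_score is None:
            bound = math.inf           # nothing has been seen yet
        else:
            div_bound = max_div if selected else 0.0
            bound = (1 - lam) * last_score + lam * div_bound
        # A seen object that reaches the bound cannot be beaten by unseen ones.
        if best is not None and dws(best, selected, lam) >= bound:
            selected.append(best)
            pool.remove(best)
            continue
        if exhausted:
            break
        # Pulling strategy (trivial here): always pull the next object by score.
        nxt = next(by_score, None)
        if nxt is None:
            exhausted = True
        else:
            pool.append(nxt)
            accesses += 1
            last_score = nxt[0]
    return selected, accesses
```

Even with this loose bound, the loop can stop pulling before all objects have been accessed, since low-scoring unseen objects cannot exceed the bound once a good seen candidate exists.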




    Credits to [Schnaitter&Polyzotis 2008] for their Pull-Bound Rank Join template

    Choosing probing locations

       Goal of distance-based access:
           Exploring the region of space in which the object with the best
            diversity-weighted score is most likely to be found

       At each of the K iterations, we fix the probing locations at the
        most promising points of the unexplored space
           Vertices of the bounded Voronoi diagram of the points selected at
            the previous iterations

       Of these, the most promising ones are as far as possible from
        all the objects of the current selection
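
For a rectangular bounding region in 2D, the most promising probing location can be found by brute force, as sketched below. This is an illustrative assumption, not the authors' implementation: instead of building the bounded Voronoi diagram, it enumerates a superset of its vertices (region corners, bisector/side intersections, circumcenters of triples) and keeps the candidate farthest from all selected points. At least one selected point is required, and all function names are hypothetical.

```python
import itertools
import math

def bisector(a, b):
    """Coefficients (A, B, C) of the perpendicular bisector A*x + B*y = C of a and b."""
    return (2 * (b[0] - a[0]), 2 * (b[1] - a[1]),
            b[0] ** 2 + b[1] ** 2 - a[0] ** 2 - a[1] ** 2)

def candidate_points(sites, rect, eps=1e-9):
    """Superset of the bounded Voronoi vertices of `sites` in rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    cands = [(x0, y0), (x0, y1), (x1, y0), (x1, y1)]  # corners of the region
    for a, b in itertools.combinations(sites, 2):
        A, B, C = bisector(a, b)
        if abs(B) > eps:  # not vertical: intersect the lines x = x0 and x = x1
            cands += [(x, (C - A * x) / B) for x in (x0, x1)]
        if abs(A) > eps:  # not horizontal: intersect the lines y = y0 and y = y1
            cands += [((C - B * y) / A, y) for y in (y0, y1)]
    for a, b, c in itertools.combinations(sites, 3):
        A1, B1, C1 = bisector(a, b)
        A2, B2, C2 = bisector(a, c)
        det = A1 * B2 - A2 * B1
        if abs(det) > eps:  # skip collinear triples, which have no circumcenter
            cands.append(((C1 * B2 - C2 * B1) / det, (A1 * C2 - A2 * C1) / det))
    # Keep only the candidates that fall inside the bounding region.
    return [p for p in cands
            if x0 - eps <= p[0] <= x1 + eps and y0 - eps <= p[1] <= y1 + eps]

def min_distance(p, sites):
    return min(math.dist(p, s) for s in sites)

def best_probing_location(sites, rect):
    """Candidate that is as far as possible from all the selected points."""
    return max(candidate_points(sites, rect), key=lambda p: min_distance(p, sites))
```

The enumeration is cubic in the number of selected points, which is acceptable for a sketch because at most K points are ever selected.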

    Example
    Voronoi diagram of selected objects

       4 objects x1, …, x4 selected during the first 4 iterations

       Bounding region is a square

       [Figure: Voronoi diagram of x1, …, x4 in the square bounding region, with the probing locations marked]

    Example
    Voronoi diagram of selected objects

       A new object is selected

    Example
    Bounded Voronoi diagram of selected objects

       Probing locations: v1, …, v4 (vertices of the bounding region)

       Shading: distance from closest points (brightest in vertices)

    Example
    Bounded Voronoi diagram of selected objects

       Probing locations: v1, …, v6 (vertices of bounded Voronoi diagram)

       Shading: distance from closest points (brightest in vertices)




       The local maxima of the function “distance from the closest point
        between x1 and x2” are among v1, …, v6

    Example
    Bounded Voronoi diagram of selected objects

       Probing locations: v1, …, v8

       Shading: distance from closest points (brightest in vertices)




       The local maxima of the function “distance from the closest
        point among x1, …, x3” are among v1, …, v8

    Example
    Bounded Voronoi diagram of selected objects

       Probing locations: v1, …, v10

       Shading: distance from closest points (brightest in vertices)




       The local maxima of the function “distance from the closest
        point among x1, …, x4” are among v1, …, v10

    Example
    Bounded Voronoi diagram of selected objects

       Probing locations: v1, …, v12 (no other intersection in region)

       Shading: distance from closest points (brightest in vertices)




       The local maxima of the function “distance from the closest
        point among x1, …, x5” are among v1, …, v12

    Example
    A running state

       Inside red circumferences: explored region

       Pink discs: objects retrieved by distance-based access

    Bounding scheme
    Computing a tight upper bound

       A bound is tight if it can be achieved in some hypothetical
        continuation of the explored instance

       A tight upper bound can be found as follows

           $$\tau = (1 - \lambda)\, S_q^{\mathrm{last}} + \lambda \max_{x \in Z} \min_{y \in X} \lVert x - y \rVert \qquad (11)$$

           S_q^last: highest score possible (last seen by score-based access)
           Z: unexplored region of space
           X: set of selected objects
           max-min term: maximal minimal distance from the selected objects

       Theorem: the point x* that maximizes the minimal distance
        from all the selected objects is a vertex of the convex hull of
        the unexplored part of a cell of the bounded Voronoi diagram

       Theorem: the bound obtained in this way is tight
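
Formula (11) can be evaluated numerically once a finite set of candidate points is available. The sketch below assumes that such a set, containing the maximizer, has been computed elsewhere (for example from the vertices named in the theorem); it also assumes that the explored region is a union of discs given as hypothetical (center, radius) pairs, with points on a disc boundary treated as unexplored. Function and parameter names are illustrative.

```python
import math

def upper_bound(last_score, lam, selected_points, candidates, explored_discs):
    """tau = (1 - lam) * S_last + lam * max_{x in Z} min_{y in X} ||x - y||.

    candidates: finite point set assumed to contain the maximizer over Z.
    explored_discs: (center, radius) pairs covered by distance-based access.
    """
    def unexplored(p):
        # Z leaves out the interior of every explored disc.
        return all(math.dist(p, c) >= r for c, r in explored_discs)

    def min_dist(p):
        return min((math.dist(p, y) for y in selected_points), default=0.0)

    # Maximal minimal distance from the selected objects, over unexplored candidates.
    max_min = max((min_dist(p) for p in candidates if unexplored(p)), default=0.0)
    return (1 - lam) * last_score + lam * max_min
```

Growing an explored disc around a probing location can only remove candidates, so the bound can only decrease as distance-based access proceeds.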

    Selecting the next probing location

       In 2D, the point maximizing the
        minimal distance can only be
           A vertex of the bounded
            Voronoi diagram
           An intersection between an
            edge and a circumference
           An intersection between two
            circumferences

       The corresponding vertex is selected as the next probing location

       [Figures: the point maximizing the minimal distance, and the vertex selected as the next probing location]

    Pulling strategy

       Round robin: select, in alternation, each probing location
           Some loose form of instance optimality can already be achieved
            with a tight bounding scheme and round robin

       Potential adaptive:
           Choose the probing location that is most likely to reduce the
            upper bound
           Potential adaptive is never worse than round robin
           Choice between access by score and access by distance
               Based on how much each reduces the upper bound with respect to
                the number of accessed objects

    Batched access

       In the model so far, objects are accessed one by one
           Not practical for many scenarios
           “Batched access” modes available in many practical systems:
               Give a point and a radius and receive all objects that fall within

       Strategy with batched access:
           Perform exactly one request per probing location with an optimal
            choice of the radius
           This amounts to solving an optimization problem that
               Minimizes the threshold by appropriately choosing the radii
               Is subject to a budget constraint (how many objects am I willing
                to retrieve)
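
The batched access primitive (a point and a radius in, all objects within that radius out) can be illustrated as follows. This sketch only shows the primitive and the cost of a given set of requests; it does not solve the optimization problem over the radii mentioned above. The (score, point) representation and the function names are assumptions made for the example.

```python
import math

def batched_access(objects, center, radius):
    """Return every (score, point) object whose point lies within radius of center."""
    return [o for o in objects if math.dist(o[1], center) <= radius]

def total_retrieved(objects, requests):
    """Number of objects returned by one request per (center, radius) pair."""
    return sum(len(batched_access(objects, c, r)) for c, r in requests)
```

A choice of radii respects the budget constraint when `total_retrieved` does not exceed the number of objects one is willing to retrieve.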

    Experiments

       [Figure: synthetic data, uniform distribution]
       [Figure: synthetic data, exponential distribution]
       [Figure: real data]

    Conclusion

       Diversification revisited
           Sorted access modes to avoid accessing all objects
           Same quality as MMR
           A structured template with bounding scheme and pulling strategy

       Optimality guarantees with one-by-one access to objects
           Tight bound
           Instance optimality (in a loose sense)

       Extreme practical efficiency with batched access mode

       Future work:
           Adaptation to other diversification algorithms

    Acknowledgments:
    CUbRIK Project
       CUbRIK is a research project
        financed by the European Union

       Goals:
         Advance the architecture of
          multimedia search
         Exploit the human
          contribution in multimedia
          search
         Use open-source components
          provided by the community
         Start up a search business
          ecosystem

       http://www.cubrikproject.eu/

CUbRIK research at SIGMOD 2012

  • 1. + Top-k bounded diversification Piero Fraternali, Davide Martinenghi, Marco Tagliasacchi Politecnico di Milano, Italy Scottsdale, AZ, USA - May 24, 2012 0
  • 2. + 1 Motivation  Diversification is useful in application domains where objects can be described by  a score  a 2- or 3-dimensional feature vector  Many examples from search (real estate, image search, …)  Apartments distributed over a map  Score (e.g., price) + 2D feature vector (geo-localization)  Evolution in time of price of apartments over a map  Score (e.g., price) + 3D feature vector (geo-localization + time)  Properties of images (e.g., HSI color features)  Score (e.g., relevance to a given keyword) + 3D feature vector (e.g., average HSI components in the image)
  • 3. + 2 Diversified result set Looking for good restaurants in Milan
  • 4. + 3 Diversified result set Looking for good restaurants in Milan top 15
  • 5. + 4 Diversified result set Looking for good restaurants in Milan top 15 top 15 diversified over the region
  • 6. + 5 Diversification  We are given a set O of N objects  is the vector-space representation of object o  is the relevance score of object o  Diversification problem
  • 7. + 6 Diversification  We are given a set O of N objects  is the vector-space representation of object o  is the relevance score of object o Objective function  Diversification problem Best diversified Set of Relevance to Diversity (as set of K objects objects query (as score) distance)
  • 8. + 7 Greedy approach to diversification MMR (Maximum Marginal Relevance)  Diversification problems are NP-hard  Approximate greedy algorithms are needed  MMR is a well-known greedy algorithm with good quality of result (i.e., value of the objective function)  Find K objects that are both relevant and diverse  At each step, pick the object with largest diversity-weighted score  K steps in total
  • 9. + 8 Greedy approach to diversification MMR (Maximum Marginal Relevance)  Diversification problems are NP-hard  Approximate greedy algorithms are needed  MMR is a well-known greedy algorithm with good quality of Relevance Diversity result (i.e., value of the objective function) Balance between relevance and  Find K objects that are both relevant and diverse diversity  At each step, pick the object with largest diversity-weighted score  K steps in total Diversity- weighted score
  • 10. + 9 Greedy approach to diversification MMR (Maximum Marginal Relevance)  Diversification problems are NP-hard  Approximate greedy algorithms are needed  MMR is a well-known greedy algorithm with good quality of result (i.e., value of the objective function)  Find K objects that are both relevant and diverse  At each step, pick the object with largest diversity-weighted score  K steps in total  Corresponding objective function:
  • 11. + 10 Greedy approach to diversification MMR (Maximum Marginal Relevance)  Diversification problems are NP-hard  Approximate greedy algorithms are needed  MMR is a well-known greedy algorithm with good quality of result (i.e., value of the objective function)  Find K objects that are both relevant and diverse  At each step, pick the object with largest diversity-weighted score  K steps in total  Main disadvantage:  All objects must be available from the beginning
  • 12. Bounded diversification
      - Objects are embedded in a bounded region of space
          - E.g., a bounding rectangle
      - Accessing objects is costly
          - Objects are progressively accessed (not available at time 0)
          - The number of accessed objects (sumDepths) should be minimized
      - Indexes for sorted access to objects are available
          - Access by score (in descending order)
          - Access by distance from a given point (in ascending order)
          - Both are very common in services on the Web (e.g., apartment search)
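The two sorted access modes and the sumDepths cost can be mimicked with generators. This is only a sketch under assumptions: a real service would expose these as remote calls, objects are taken to be `(score, point)` pairs, and the names `by_score`, `by_distance` and the `counter` dictionary are hypothetical.

```python
from math import dist

def by_score(objects, counter):
    """Yield objects by descending score, counting every access."""
    for obj in sorted(objects, key=lambda o: o[0], reverse=True):
        counter["sumDepths"] += 1
        yield obj

def by_distance(objects, q, counter):
    """Yield objects by ascending distance from point q, counting every access."""
    for obj in sorted(objects, key=lambda o: dist(o[1], q)):
        counter["sumDepths"] += 1
        yield obj

# Hypothetical data: (score, (x, y)) pairs.
objects = [
    (0.9, (0.10, 0.10)),
    (0.8, (0.12, 0.11)),
    (0.6, (0.90, 0.90)),
    (0.5, (0.10, 0.90)),
]
counter = {"sumDepths": 0}
score_stream = by_score(objects, counter)
distance_stream = by_distance(objects, (1.0, 1.0), counter)

first_by_score = next(score_stream)
second_by_score = next(score_stream)
nearest_to_corner = next(distance_stream)
print(counter["sumDepths"])
```

Objects are only counted when they are actually pulled, so the counter reflects the cost that bounded diversification tries to minimize.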
  • 13. Distance-based access
      - Restaurants by distance from a given point q
      - Size of icon proportional to score
  • 14. Score-based access
      - Restaurants by score
      - Size of icon proportional to score
  • 15. Attacking bounded diversification: the Pull-Bound MMR (PBMMR) template
      - Goal: achieve the same quality of result as MMR while minimizing the number of accessed objects
      - K iterations; within each one, repeat the following steps as long as needed:
          - Pulling strategy: choose an access method (by score or by distance)
              - If by distance, choose the point to probe from (probing location)
          - Bounding scheme: compute an upper bound on the diversity-weighted score that unseen objects can achieve
          - If a seen object exceeds the bound, select it and move on to the next iteration
      - Credits to [Schnaitter&Polyzotis 2008] for their Pull-Bound Rank Join template
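The pull/bound loop can be illustrated with a deliberately simplified variant. This is not PBMMR as presented: it pulls by score only, and it replaces the tight bound with a looser one that is still valid for objects inside a bounding rectangle (no point of the rectangle is farther from the selection than the smallest "farthest corner" distance of a selected object). The score form, the function names and the data are assumptions for illustration.

```python
from math import dist, inf

def dws(obj, selected, lam):
    """Diversity-weighted score; the diversity term is 0 for an empty selection."""
    score, point = obj
    diversity = min((dist(point, p) for _, p in selected), default=0.0)
    return (1 - lam) * score + lam * diversity

def loose_bound(s_last, selected, corners, lam):
    """Upper bound on the diversity-weighted score of any unseen object."""
    far = 0.0
    if selected:
        far = min(max(dist(c, p) for c in corners) for _, p in selected)
    return (1 - lam) * s_last + lam * far

def pull_bound_mmr(objects, k, corners, lam=0.5):
    stream = iter(sorted(objects, key=lambda o: o[0], reverse=True))  # score-based access
    seen, selected, accessed = [], [], 0
    s_last, exhausted = inf, False  # nothing pulled yet: unseen scores are unbounded
    while len(selected) < k:
        tau = -inf if exhausted else loose_bound(s_last, selected, corners, lam)
        best = max(seen, key=lambda o: dws(o, selected, lam), default=None)
        if best is not None and dws(best, selected, lam) >= tau:
            selected.append(best)   # a seen object reaches or exceeds the bound
            seen.remove(best)
        elif exhausted:
            break                   # fewer than k objects exist
        else:
            nxt = next(stream, None)
            if nxt is None:
                exhausted = True
            else:
                seen.append(nxt)
                accessed += 1
                s_last = nxt[0]     # no unseen object can score higher than this
    return selected, accessed

# Hypothetical data: (score, (x, y)) pairs inside the unit square.
objects = [
    (0.9, (0.10, 0.10)), (0.8, (0.12, 0.11)), (0.6, (0.90, 0.90)),
    (0.4, (0.10, 0.90)), (0.2, (0.50, 0.50)), (0.1, (0.90, 0.10)),
]
corners = [(0, 0), (0, 1), (1, 0), (1, 1)]
picked, accessed = pull_bound_mmr(objects, 2, corners)
print(picked, accessed)
```

On this data the loop returns the same two objects that exhaustive MMR would pick after accessing four of the six objects. A tighter bound and distance-based access, as in the template, would be expected to stop earlier.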
  • 16. Choosing probing locations
      - Goal of distance-based access: exploring the region of space in which the object with the best diversity-weighted score is most likely to be found
      - At each of the K iterations, we fix the probing locations at the most promising points of the unexplored space
          - Vertices of the bounded Voronoi diagram of the points selected at the previous iterations
          - Of these, the most promising ones are as far as possible from all the objects of the current selection
  • 17. Example: Voronoi diagram of selected objects
      - 4 objects x1, …, x4 selected during the first 4 iterations
      - Bounding region is a square
  • 18. Example: Voronoi diagram of selected objects
      - Figure label: probing locations
  • 19. Example: Voronoi diagram of selected objects
      - A new object is selected
  • 20. Example: bounded Voronoi diagram of selected objects
      - Probing locations: v1, …, v4 (vertices of the bounding region)
      - Shading: distance from closest points (brightest in vertices)
  • 21. Example: bounded Voronoi diagram of selected objects
      - Probing locations: v1, …, v6 (vertices of bounded Voronoi diagram)
      - Shading: distance from closest points (brightest in vertices)
      - The local maxima of the function "distance from the closest point between x1 and x2" are among v1, …, v6
  • 22. Example: bounded Voronoi diagram of selected objects
      - Probing locations: v1, …, v8
      - Shading: distance from closest points (brightest in vertices)
      - The local maxima of the function "distance from the closest point among x1, …, x3" are among v1, …, v8
  • 23. Example: bounded Voronoi diagram of selected objects
      - Probing locations: v1, …, v10
      - Shading: distance from closest points (brightest in vertices)
      - The local maxima of the function "distance from the closest point among x1, …, x4" are among v1, …, v10
  • 24. Example: bounded Voronoi diagram of selected objects
      - Probing locations: v1, …, v12 (no other intersection in region)
      - Shading: distance from closest points (brightest in vertices)
      - The local maxima of the function "distance from the closest point among x1, …, x5" are among v1, …, v12
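The vertices used as probing locations in this example can be enumerated by brute force for a rectangular bounding region. The sketch below is an assumption-laden illustration rather than the authors' method: it tests all pairs and triples of selected points, which is fine for a handful of points but would be replaced by a proper Voronoi construction in practice. The function names are hypothetical.

```python
from itertools import combinations
from math import dist

EPS = 1e-9

def bisector(a, b):
    """Coefficients (ux, uy, c) of the line ux*x + uy*y = c equidistant from a and b."""
    ux, uy = 2 * (b[0] - a[0]), 2 * (b[1] - a[1])
    c = b[0] ** 2 + b[1] ** 2 - a[0] ** 2 - a[1] ** 2
    return ux, uy, c

def bounded_voronoi_vertices(points, xmin, ymin, xmax, ymax):
    """Vertices of the Voronoi diagram of `points` clipped to a rectangle."""
    # Each candidate carries one of its generating sites (None for rectangle corners).
    candidates = [(c, None) for c in [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]]
    for a, b in combinations(points, 2):
        ux, uy, c = bisector(a, b)
        if abs(uy) > EPS:  # crossings with the vertical edges
            candidates += [((x, (c - ux * x) / uy), a) for x in (xmin, xmax)]
        if abs(ux) > EPS:  # crossings with the horizontal edges
            candidates += [(((c - uy * y) / ux, y), a) for y in (ymin, ymax)]
    for a, b, d in combinations(points, 3):
        u1, v1, c1 = bisector(a, b)
        u2, v2, c2 = bisector(a, d)
        det = u1 * v2 - u2 * v1
        if abs(det) > EPS:  # collinear triples have no circumcenter
            candidates.append((((c1 * v2 - c2 * v1) / det, (u1 * c2 - u2 * c1) / det), a))
    vertices = []
    for p, site in candidates:
        inside = xmin - EPS <= p[0] <= xmax + EPS and ymin - EPS <= p[1] <= ymax + EPS
        # Keep a candidate only if its generating sites really are the closest ones.
        closest = site is None or all(dist(p, q) >= dist(p, site) - EPS for q in points)
        if inside and closest and all(dist(p, v) > EPS for v in vertices):
            vertices.append(p)
    return vertices

def most_promising(points, vertices):
    """Vertex that is as far as possible from all selected points."""
    return max(vertices, key=lambda v: min(dist(v, p) for p in points))

selected = [(0.2, 0.3), (0.7, 0.6)]  # hypothetical x1, x2 in the unit square
vertices = bounded_voronoi_vertices(selected, 0, 0, 1, 1)
probe = most_promising(selected, vertices)
print(len(vertices), probe)
```

With two selected points the sketch finds six vertices (four corners plus two bisector crossings), matching the count v1, …, v6 on slide 21.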
  • 25. Example: a running state
      - Inside red circumferences: explored region
      - Pink discs: objects retrieved by distance-based access
  • 30. Bounding scheme: computing a tight upper bound
      - A bound is tight if it can be achieved in some hypothetical continuation of the explored instance
      - A tight upper bound can be found as follows:
          $\tau = (1 - \lambda)\, S_q^{\mathrm{last}} + \lambda \max_{x \in Z} \min_{y \in X} \lVert x - y \rVert \quad (11)$
      - Paper excerpt shown on the slide: "[…] Note that Vor(X, U) and the corresponding probing locations are updated each time a new selected object is added […]. The unseen objects retrievable with the next distance-based access belong to the set Z = U \ D, which leaves out the explored hyperspheres, Σ_u being centered in v_u, u = 1, …, V. […] Theorem 5.1 provides an effective computation procedure for (11). Theorem 5.1. The point x* ∈ Z that maximizes the minimum distance from all the points in X is a vertex of the convex hull of P_i \ D, where P_i is one of the cells of Vor(X, U)."
  • 31. Bounding scheme: computing a tight upper bound
      - Callouts on the bound τ:
          - S_q^last: highest score possible (last seen by score-based access)
          - max-min term: maximal minimal distance from the selected objects
          - Z: unexplored region of space
          - X: set of selected objects
  • 32. Bounding scheme: computing a tight upper bound
      - Theorem: the point x* that maximizes the minimal distance from all the selected objects is a vertex of the convex hull of the unexplored part of a cell of the bounded Voronoi diagram
      - Theorem: the bound obtained in this way is tight
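Once the candidate vertices are known, evaluating the bound τ = (1 − λ)·S_last + λ·max min ‖x − y‖ reduces to a few lines. The sketch below assumes that the candidate vertices have already been computed and that nothing has been explored yet; the function name `tight_bound` and the numbers are hypothetical.

```python
from math import dist, sqrt

def tight_bound(lam, s_last, candidates, selected):
    """(1 - lam) * s_last + lam * max over candidates of the distance to the closest selected point."""
    farthest = max(min(dist(x, y) for y in selected) for x in candidates)
    return (1 - lam) * s_last + lam * farthest

# Hypothetical state: two selected points in the unit square, nothing explored yet,
# so the candidates are the six vertices of the bounded Voronoi diagram.
selected = [(0.2, 0.3), (0.7, 0.6)]
candidates = [(0, 0), (0, 1), (1, 0), (1, 1), (0.72, 0), (0.12, 1)]
tau = tight_bound(lam=0.5, s_last=0.8, candidates=candidates, selected=selected)
print(round(tau, 4))
```

Here the farthest candidate is the corner (0, 1), at distance √0.53 from the closest selected point, so τ = 0.4 + 0.5·√0.53.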
  • 33. Selecting the next probing location
      - In 2D, the point maximizing the minimal distance can only be
          - A vertex of the bounded Voronoi diagram
          - An intersection between an edge and a circumference
          - An intersection between two circumferences
      - The corresponding vertex is selected as the next probing location
  • 34. Selecting the next probing location
      - Figure callouts: the point maximizing the minimal distance, and the vertex selected as next probing location
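Two of the three candidate kinds listed above are plain geometric intersections. The helpers below are a sketch with hypothetical names, using textbook formulas; they only produce candidate points, and picking the one that maximizes the minimal distance to the selected objects is left to the caller.

```python
from math import dist, sqrt

def circle_circle(c0, r0, c1, r1):
    """Intersection points of two circumferences (empty list if they do not cross)."""
    d = dist(c0, c1)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)   # distance from c0 to the chord
    h = sqrt(max(r0 ** 2 - a ** 2, 0.0))         # half-length of the chord
    mx = c0[0] + a * (c1[0] - c0[0]) / d
    my = c0[1] + a * (c1[1] - c0[1]) / d
    ox = h * (c1[1] - c0[1]) / d
    oy = h * (c1[0] - c0[0]) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]

def segment_circle(p, q, center, r):
    """Intersection points of the segment p-q (e.g. a Voronoi edge) with a circumference."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    fx, fy = p[0] - center[0], p[1] - center[1]
    a = dx * dx + dy * dy
    b = 2 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    disc = b * b - 4 * a * c
    if a == 0 or disc < 0:
        return []
    roots = sorted({(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)})
    # Keep only the crossings that fall on the segment itself.
    return [(p[0] + t * dx, p[1] + t * dy) for t in roots if 0 <= t <= 1]

two_circles = circle_circle((0, 0), 5, (6, 0), 5)
edge_hits = segment_circle((0, 0), (10, 0), (5, 0), 2)
print(two_circles, edge_hits)
```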
  • 36. Pulling strategy
      - Round robin: select each probing location in turn
          - Some loose form of instance optimality can already be achieved with a tight bounding scheme and round robin
      - Potential adaptive: choose the probing location that is most likely to reduce the upper bound
          - Potential adaptive is never worse than round robin
      - Choice between access by score or by distance
          - Made by looking at how each reduces the upper bound with respect to the number of accessed objects
  • 37. Batched access
      - In the model so far, objects are accessed one by one
          - Not practical for many scenarios
      - "Batched access" modes are available in many practical systems
          - Give a point and a radius and receive all objects that fall within
      - Strategy with batched access: perform exactly one request per probing location, with an optimal choice of the radius
          - This amounts to solving an optimization problem that
              - minimizes the threshold by appropriately choosing the radii
              - is subject to a budget constraint (how many objects one is willing to retrieve)
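The batched access mode itself is a range query. The sketch below only illustrates that primitive on in-memory data, assuming `(score, point)` pairs; the name `range_query` is hypothetical, and the optimization that chooses the radii under a budget is not reproduced here.

```python
from math import dist

def range_query(objects, center, radius):
    """Return all objects that fall within `radius` of `center` in one request."""
    return [o for o in objects if dist(o[1], center) <= radius]

# Hypothetical data: (score, (x, y)) pairs.
restaurants = [
    (0.9, (0.10, 0.10)),
    (0.8, (0.12, 0.11)),
    (0.6, (0.90, 0.90)),
    (0.5, (0.10, 0.90)),
]
batch = range_query(restaurants, center=(0.1, 0.1), radius=0.05)
print(len(batch))
```

The size of each returned batch is what counts against the budget constraint, so a larger radius explores more of the region at a higher access cost.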
  • 38. Experiments: synthetic data, uniform distribution
  • 39. Experiments: synthetic data, exponential distribution
  • 40. Experiments: real data
  • 41. Conclusion
      - Diversification revisited
          - Sorted access modes to avoid accessing all objects
          - Same quality as MMR
      - A structured template with bounding scheme and pulling strategy
      - Optimality guarantees with one-by-one access to objects
          - Tight bound
          - Instance optimality (in a loose sense)
      - Extreme practical efficiency with batched access mode
      - Future work: adaptation to other diversification algorithms
  • 42. Acknowledgments: CUbRIK Project
      - CUbRIK is a research project financed by the European Union
      - Goals:
          - Advance the architecture of multimedia search
          - Exploit the human contribution in multimedia search
          - Use open-source components provided by the community
          - Start up a search business ecosystem
      - http://www.cubrikproject.eu/

Editor's Notes

  1. Some bookkeeping is needed on: the already explored portion of the bounded region; the highest score possible for unseen objects