Dp idp exploredb

Skyline Ranking à la IR
George Valkanas (1), Apostolos N. Papadopoulos (2), Dimitrios Gunopulos (1)
(1) University of Athens, Greece
(2) Aristotle University of Thessaloniki, Greece
1st ExploreDB Workshop
Athens, Greece
28th March, 2014

Skyline Problem Introduction
• Dataset D = (p1, p2, …, pn) in d-dimensional space
• Preferences for each dimension: min, max
• p dominates q iff pi ≤ qi ∀ i = 1, …, d and ∃ j: pj < qj (stated for min preferences)
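The dominance test and the skyline it induces can be sketched in a few lines. This is a minimal, quadratic-time illustration that assumes a min preference on every dimension; the function names and the hotel data are hypothetical and do not come from the slides.

```python
from typing import List, Sequence

Point = Sequence[float]

def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q, assuming smaller is better on every dimension."""
    no_worse = all(pi <= qi for pi, qi in zip(p, q))
    strictly_better = any(pi < qi for pi, qi in zip(p, q))
    return no_worse and strictly_better

def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical data: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))
```

A max preference on some dimension can be handled by negating that coordinate before calling `dominates`.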

Usefulness of Skyline
• Multi-Objective optimization
• Exploratory Search
• Improve Recommendations
• Data summarization technique
• Building block for defining competitiveness

Skyline Cardinality Explosion
• Skyline size: O((ln n)^(d-1))
• Skyline becomes too large to inspect manually
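The growth with dimensionality is easy to observe empirically. The sketch below draws points with independent, uniformly distributed coordinates (the setting in which the (ln n)^(d-1) bound on the expected skyline size is usually stated) and counts skyline points for increasing d. The sample size, seed, and function names are arbitrary choices made for illustration.

```python
import random

def dominates(p, q):
    """p dominates q under min preferences on all dimensions."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_size(points):
    """Count the points that no other point dominates (naive O(n^2) scan)."""
    return sum(1 for p in points if not any(dominates(q, p) for q in points))

def skyline_sizes(n=300, max_d=5, seed=0):
    """Skyline size of n random points for each dimensionality 1..max_d."""
    rng = random.Random(seed)
    sizes = {}
    for d in range(1, max_d + 1):
        points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
        sizes[d] = skyline_size(points)
    return sizes

sizes = skyline_sizes()
for d, size in sizes.items():
    print(f"d={d}: skyline size {size}")
```

With one dimension the skyline is just the minimum; each added dimension makes it harder for one point to dominate another, so more points survive.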

Solving the Cardinality Problem
• Select subset of size k
– Coverage-based
– Contour representation
– Diversification
• Ranking
– Top-k Dominating
– Subspace dominance
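As an illustration of the ranking family, a top-k dominating style ranking scores each point by how many other points it dominates. The sketch below restricts the ranking to skyline points, which is an assumption made here to match the theme of the talk; the function names and sample data are hypothetical.

```python
def dominates(p, q):
    """p dominates q under min preferences on all dimensions."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_dominating_skyline(points, k):
    """Rank skyline points by the number of points they dominate; return the top k."""
    sky = [p for p in points if not any(dominates(q, p) for q in points)]
    counted = [(p, sum(dominates(p, q) for q in points)) for p in sky]
    counted.sort(key=lambda pair: pair[1], reverse=True)
    return counted[:k]

points = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 5), (4, 8), (2, 10)]
print(top_k_dominating_skyline(points, k=2))
```

Every dominated point counts the same here, regardless of how deep it lies or how many skyline points share it.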

Skyline + IR: Intuition
• Dominated points are not equally important
• Scheme similar to TF-IDF

Skyline + IR: How ?
• 2 Factors
– DP (~ tf)
– IDP (~ idf)
• DP-IDP
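By analogy with TF-IDF, one plausible way to write the combined score is shown below. The slides do not spell out the formulas, so the exact form is an assumption: S denotes the skyline, s ≺ p means that s dominates p, dp(s, p) is left abstract as a weight that is larger when p lies in a layer close to s, and idp is modelled directly on idf.

```latex
% Assumed form, modelled on TF-IDF; not taken verbatim from the slides.
\mathrm{score}(s) = \sum_{p \,:\, s \prec p} dp(s, p) \cdot idp(p),
\qquad
idp(p) = \log \frac{|S|}{\left|\{\, s' \in S : s' \prec p \,\}\right|}
```

Under this reading, a point dominated by every skyline point has idp = 0 and contributes nothing, just as a term that appears in every document has idf = 0.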

Ranking the Skyline
• Baseline:
– For each skyline point sp, iterate over its dominated points and sum their contributions
– Slow
– Unnecessary computations
• Alternative?
– Bound the score (lower and upper)
– Prune skyline points
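The baseline can be sketched as follows. The slides do not give the exact dp and idp definitions, so this sketch makes two loudly stated assumptions: dp is 1/depth, where depth counts skyline layers below the skyline, and idp is log(|S| / number of skyline points dominating p), mirroring idf. Function names and data are hypothetical, and points are assumed to be distinct tuples.

```python
import math

def dominates(p, q):
    """p dominates q under min preferences on all dimensions."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline_layers(points):
    """Layer 1 is the skyline; layer i+1 is the skyline of what remains."""
    remaining, layers = list(points), []
    while remaining:
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers

def baseline_scores(points):
    """Exhaustive scoring: visit every dominated point and sum dp * idp."""
    layers = skyline_layers(points)
    sky = layers[0]
    scores = {s: 0.0 for s in sky}
    for depth, layer in enumerate(layers[1:], start=1):
        dp = 1.0 / depth  # ASSUMPTION: deeper layers weigh less
        for p in layer:
            dominators = [s for s in sky if dominates(s, p)]
            # ASSUMPTION: idf-like factor; shared points weigh less
            idp = math.log(len(sky) / len(dominators))
            for s in dominators:
                scores[s] += dp * idp
    return scores

points = [(1, 9), (3, 3), (9, 1), (4, 4), (5, 5), (4, 8), (2, 10), (3, 9)]
scores = baseline_scores(points)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Every dominated point is checked against every skyline point, which is the cost the bounds are meant to avoid.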

A Simpler Representation
• Easier to reason about when deriving bounds

Bounding the Score
• Q1: What is the score for B?
• A1: It depends on the assignment of the remaining edges
• Q2: What is the maximum score for B?
• A2: Assign the remaining edges appropriately
• Q3: What is the appropriate way?
• A3:
– Same layer → Higher score (dp)
– Minimum overlap → Higher score (idp)
• No overlap → Loose bounds
The SkyIR Algorithm
• Priority can be:
– Round Robin (RRB)
– Pending points (PND)
– Upper Bound (UBS)

Experimental Setup
• Datasets
• Algorithms
– Baseline
– SkyIR
• Bounds: Loose (LS), Collaborative (CB)
• 3 Priority schemes: RRB, PND, UBS

Total Runtime – IND (independent) distribution
• k=5, d=3
• CB-UBS is 4x faster than the Baseline

Total Runtime – ANT (anti-correlated) distribution
• Interesting fact: ANT is easier than IND (fewer layers to extract)

Total Runtime – Forest Cover

Memory Consumption
• CB, k=5
• PND is the best memory-wise

Conclusions
• IR-style ranking for skyline
– Formal framework
– Bounds for efficient computation
• SkyIR algorithm
– Experimental evaluation
• Future Work
– Speed up / Scale up
– Improve bounds (lower, upper)
– Approximation technique(s)

Thank you!
Questions?
Acknowledgements: Heraclitus II fellowship, THALIS – GeomComp, THALIS – DISFER, ARISTEIA – MMD, FP7 INSIGHT
