An optimal and progressive algorithm for skyline queries slide

INI Lab.
An Optimal and Progressive
Algorithm for Skyline Queries
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger
ACM SIGMOD 2003
Presenters
KYEONG SEOK HYUN,
WOO-SUNG CHOI,
JA-YEON KIM,

Abstract
• An Optimal
• and Progressive Algorithm
• for Skyline Queries
• Using R-Tree

Contents
• 1. Introduction
• 2. Related Work
• 2.1 Block Nested Loop (BNL)
• 2.5 Nearest Neighbor (NN)
• 3. Branch and Bound Skyline Algorithm
• With I/O analysis
• 5. Experimental Evaluation

Which one do you prefer?
http://emperia.egloos.com/m/2516211
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html
5,000 Won
40,000 Won
4,500 Won
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting
Good value ("혜자") >> poor value ("창렬")

Preliminaries
Formal definition of Dominates (≪)
• Given a set $T$ of d-dimensional points,
we say that a point $t_1 \in T$ DOMINATES another point $t_2 \in T$
• if and only if
• $\forall i \in \{1, 2, 3, \dots, d\}:\ t_1[i] \ge t_2[i]$, and
• $\exists j \in \{1, 2, 3, \dots, d\}:\ t_1[j] > t_2[j]$.
• This is denoted by $t_2 \ll t_1$
• (simply put, $t_1$ is the better deal)
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf
Note that
the meaning of 'dominates' may differ
according to the type of application
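The definition above can be sketched as a small predicate. This is only an illustration that assumes larger values are better in every dimension, as in the definition; the function name `dominates` and the sample points are hypothetical and do not come from the slides.

```python
from typing import Sequence


def dominates(t1: Sequence[float], t2: Sequence[float]) -> bool:
    """Return True if t1 dominates t2 (t2 ≪ t1), with larger values being better."""
    if len(t1) != len(t2):
        raise ValueError("points must have the same number of dimensions")
    # t1 must be at least as good in every dimension ...
    no_worse_everywhere = all(a >= b for a, b in zip(t1, t2))
    # ... and strictly better in at least one dimension.
    better_somewhere = any(a > b for a, b in zip(t1, t2))
    return no_worse_everywhere and better_somewhere


# Hypothetical 2-dimensional points, not taken from the slides.
print(dominates((3, 5), (3, 4)))  # equal in dimension 1, better in dimension 2
print(dominates((3, 5), (4, 4)))  # worse in dimension 1, so no dominance
print(dominates((3, 5), (3, 5)))  # identical points never dominate each other
```

If an application prefers smaller values (for example raw price), the comparisons would simply be flipped.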

Which one do you prefer?
http://emperia.egloos.com/m/2516211
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html
5,000 Won
40,000 Won
4,500 Won
4,500 Won
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting
Still good value ("혜자") >> poor value ("창렬")

For example,
• Hotel(attraction, 1/price, 1/distance)
• Two hotels
  A: (80, 1/15,000, 1/500 m)
  B: (30, 1/20,000, 1/1,500 m)
• $B \ll A$ (A dominates B)
• Why?
  30 < 80
  1/20,000 < 1/15,000
  1/1,500 m < 1/500 m
[Figure: A and B plotted against the attraction and 1/price axes; A dominates B]

Very important
Problem Definition
(mathematical)
• The Skyline operator
• Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$
• Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } p_i \ll p^*\}$
[Figure: points A to G on the x and y axes, with Dominating Area(B) marked]
• Correct: "$B \in \text{Output}$, since no other point $p \in P$ satisfies $p \gg B$"
Common misconceptions
• Wrong: "$B \in \text{Output}$ since $B \gg C, D, F$"

Naïve approach
for processing skyline queries

Exhaustive Test
• Suppose there are n objects in the given set
• $D = \{o_1, o_2, \dots, o_n\}$
• Algorithm - Naïve 1
    S = ∅
    for each object o_x ∈ D
        boolean isDominated = false
        for each object o_y ∈ D
            if (o_x ≠ o_y) AND (o_x ≪ o_y) then
                isDominated = true;
                break;
        if !isDominated then S = S ∪ {o_x}
[Figure: points A to G]
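The exhaustive test can be written as a short runnable sketch. It assumes larger values are better, as in the dominance definition; the names `dominates` and `naive_skyline` and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def naive_skyline(points: Sequence[Point]) -> List[Point]:
    """Compare every object against every other object: O(n^2) dominance tests."""
    skyline: List[Point] = []
    for ox in points:
        is_dominated = False
        for oy in points:
            if oy != ox and dominates(oy, ox):
                is_dominated = True
                break  # one dominating object is enough to discard ox
        if not is_dominated:
            skyline.append(ox)
    return skyline


# Hypothetical data set.
data = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
print(naive_skyline(data))
```

Every object is read once in the outer loop and up to n times in the inner loop, which is why this approach always scans the full data set.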

Exhaustive Test
Nested Loop Structure
Modification: (Algorithm - Naïve 2)
Idea 1. Use a nested loop structure
Idea 2. Take advantage of 'block transfer'
for better reusability!
[Figure: points A to G stored in Block A and Block B]
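One way to picture the two ideas is the sketch below: the data is read block by block, and a window of candidate skyline points is kept in memory and reused for every block. It is a simplified illustration that assumes the window always fits in memory and that larger values are better; `block_size`, the function names, and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def block_nested_loop_skyline(points: Sequence[Point], block_size: int = 2) -> List[Point]:
    window: List[Point] = []  # skyline candidates kept in memory across blocks
    for start in range(0, len(points), block_size):
        block = points[start:start + block_size]  # stands in for one block transfer
        for p in block:
            if any(dominates(w, p) for w in window):
                continue  # p is dominated by a candidate, discard it
            # p survives: drop candidates that p dominates, then keep p
            window = [w for w in window if not dominates(p, w)]
            window.append(p)
    return window


# Hypothetical data set.
data = [(1, 1), (2, 2), (3, 3), (2, 4), (3, 1), (1, 5)]
print(block_nested_loop_skyline(data))
```

Each object is compared only against the current candidates rather than against the whole data set, but every block is still read once.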
        
The inherent limitation of these approaches
1. They need a full scan over the data
2. Yet the query result contains
only a small fraction of the dataset
3. That is, these approaches are wasteful

R-Tree Index Approach
for processing skyline queries

Preliminaries
• R-Tree
• Nearest Neighbor Query

Preliminaries
R-Tree: a balanced tree for indexing multi-dimensional objects
• Supports dynamic operations (insert, update, delete)

R-Tree vs. B-Tree
B+-Tree
• Balanced
• Requires that all leaves be at the same depth
• Leaf nodes contain one-dimensional values
R-Tree
• Similar to the B+-Tree
• Leaf nodes contain d-dimensional values
http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html

Spatial objects (or d-dimensional objects or geometric objects)
• d-dimensional object?
• The R-Tree is used to organize a set of d-dimensional objects
• How?
• Main idea
• Minimum Bounding Rectangles (MBRs)
<Objects in 2-dimensional space>
http://caversham.otago.ac.nz/research/geog.php

Quiz
What is the minimum number of points needed to represent a rectangle?
• Assumption: each rectangle is parallel to the coordinate axes
[Figure: an axis-parallel rectangle drawn on the x and y axes]
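Under the stated assumption, two opposite corners are enough to describe an axis-parallel rectangle, and that is how an MBR is usually stored. The sketch below computes those two corners for a set of points; the function name `mbr` and the coordinates are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def mbr(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return the (lower, upper) corners of the minimum bounding rectangle."""
    if not points:
        raise ValueError("at least one point is required")
    dims = range(len(points[0]))
    lower = tuple(min(p[i] for p in points) for i in dims)  # smallest value per axis
    upper = tuple(max(p[i] for p in points) for i in dims)  # largest value per axis
    return lower, upper


# Hypothetical points.
print(mbr([(4, 7), (6, 8), (5, 18)]))
```

The same function works unchanged for d-dimensional points, where the two corners describe a hyper-rectangle.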

Demonstration
R-Tree Simulator

Nearest Neighbor (NN)
Query Processing
using R-Tree
Nearest Neighbor Query
• Input
  • A set of objects $P = \{p_1, p_2, \dots, p_N\}$
  • Query point $q$
• Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } L_p(p_i, q) > L_p(p^*, q)\}$
[Figure: data points on the x and y axes]
See how it works in the appendix

[Figure: root node with entries 0 and 1 on the x and y axes; for query point X the distances MINDIST(X, 0), MINMAXDIST(X, 0), MINDIST(X, 1), and MINMAXDIST(X, 1) are marked]
Key idea!
• Pruning!
http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html
http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
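As a rough illustration of the pruning idea, the sketch below computes MINDIST between a query point and an MBR and uses it as a pruning test: an MBR whose MINDIST already exceeds the best distance found so far cannot contain the nearest neighbor. It assumes Euclidean distance and MBRs stored as (lower, upper) corners; MINMAXDIST is not covered, and the helper `can_prune` is a hypothetical name.

```python
import math
from typing import Sequence


def mindist(q: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Smallest possible Euclidean distance from q to any point inside the MBR."""
    total = 0.0
    for qi, lo, hi in zip(q, lower, upper):
        # Gap along this axis: zero when qi lies within [lo, hi].
        gap = max(lo - qi, 0.0, qi - hi)
        total += gap * gap
    return math.sqrt(total)


def can_prune(q, lower, upper, best_so_far: float) -> bool:
    """True if no point in the MBR can beat the best distance found so far."""
    return mindist(q, lower, upper) > best_so_far


# Hypothetical query point and MBR.
print(mindist((0, 0), (3, 4), (5, 6)))
print(can_prune((0, 0), (3, 4), (5, 6), best_so_far=4.0))
```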

Back to the original question
Skyline with R-Tree

R-Tree Index Approach
• Let's compute the skyline using an R-Tree
• Strategy 1: use a traditional technique (i.e., NN query)
• Strategy 2: this paper
• Strategy 1
  • Partition the data recursively using NN queries
  • Distance metric: $L_1$ norm
  • First NN query: start from the ideal point (i.e., the zero point)

Strategy 1
Recursive NN Query
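The recursive partitioning can be sketched in two dimensions as follows. This is an illustration only: it assumes smaller coordinates are better (the ideal point is the origin), and it replaces the R-Tree NN query with a linear scan. The name `nn_skyline_2d` and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nn_skyline_2d(points: Sequence[Point]) -> List[Point]:
    skyline: List[Point] = []
    # Each to-do area is the region [0, x_max) x [0, y_max).
    todo = [(float("inf"), float("inf"))]
    while todo:
        x_max, y_max = todo.pop()
        inside = [p for p in points if p[0] < x_max and p[1] < y_max]
        if not inside:
            continue  # empty query: nothing left in this area
        # Nearest neighbor of the ideal point under the L1 norm.
        nn = min(inside, key=lambda p: p[0] + p[1])
        skyline.append(nn)
        todo.append((nn[0], y_max))  # to-do area to the left of nn
        todo.append((x_max, nn[1]))  # to-do area below nn
    return skyline


# Hypothetical data set.
data = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
        (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(nn_skyline_2d(data))
```

Every discovered point spawns two new areas here; most of them turn out to be empty, which is the overhead the slides point out for this strategy.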

Example
[Figure: points a, b, c, d, e, f, g, i, k, m, n on the x and y axes, with the IDEAL point at the origin and Dominating Area(i) marked]

Example
[Figure: points a, b, i, k and the IDEAL point; Dominating Area(i) is marked, leaving To-do Area 1 and To-do Area 2]

Example
Next, test these areas
(only to find nothing)
[Figure: points a, b, i, k and the IDEAL point; Dominating Area(i), Dominating Area(k), To-do Area 1, and To-do Area 2 are marked]

Example
[Figure: points a, b, i, k and the IDEAL point; Dominating Area(i), Dominating Area(k), and Dominating Area(a) are marked, with To-do Area 1 remaining]

Result
[Figure: skyline points i, k, a on the x and y axes with the IDEAL point; Dominating Area(i), Dominating Area(k), and Dominating Area(a) are marked]

Limitation
of Strategy 1
• Generally speaking,
• in a d-dimensional space,
• each skyline object discovered triggers d recursive partitioning phases
[Figure: the region dominated by the discovered object, marked "Dominated"]

[Figure: one discovered skyline object yields Area 1, Area 2, and Area 3, each shown next to a "Dominated" region]

What if?
• In general, for d > 2,
• the partitions overlap,
• which necessitates DUPLICATE ELIMINATION
[Figure: overlapping partitions Area 1, Area 2, and Area 3 around the "Dominated" regions]

Disadvantage!
• Strategy 1 needs an additional phase
• for removing redundant outputs
• 4 elimination methods
  • Laisser-faire
  • Propagate
  • Merge
  • Fine-grained Partitioning
• They work
• Problem: sub-optimal

Strategy 2
Branch & Bound Skyline Algorithm

Idea!
• Similar to the previous NN query
• Branch & Bound Skyline (BBS)
http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
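A minimal sketch of the branch-and-bound idea is given below. It is not the paper's implementation: it assumes smaller, non-negative coordinates are better (ideal point at the origin), uses the L1 distance of an entry's lower-left corner as its mindist, and stands in for the R-Tree with a toy in-memory `Node` class. All names and the sample tree are hypothetical.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b when smaller coordinates are better."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


class Node:
    """Toy R-Tree node: a leaf holds points, an inner node holds child nodes."""

    def __init__(self, points: Optional[List[Point]] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.points = points or []
        self.children = children or []
        corners = self.points + [c.lower for c in self.children]
        # Lower-left corner of the MBR; its coordinate sum is the node's mindist.
        self.lower = tuple(min(c[i] for c in corners) for i in range(len(corners[0])))


def bbs(root: Node) -> List[Point]:
    skyline: List[Point] = []
    tie = count()  # tie-breaker so the heap never has to compare Node objects
    heap = [(sum(root.lower), next(tie), root.lower, root)]
    while heap:
        _, _, corner, entry = heapq.heappop(heap)
        if any(dominates(s, corner) for s in skyline):
            continue  # dominated entries are pruned without being expanded
        if isinstance(entry, Node):
            for item in entry.children + entry.points:
                low = item.lower if isinstance(item, Node) else item
                if not any(dominates(s, low) for s in skyline):
                    heapq.heappush(heap, (sum(low), next(tie), low, item))
        else:
            skyline.append(entry)  # popped in mindist order and not dominated
    return skyline


# Hypothetical two-level tree over hypothetical points.
leaf1 = Node(points=[(1, 9), (2, 10), (4, 8), (6, 7)])
leaf2 = Node(points=[(4, 3), (3, 2), (5, 6), (7, 5)])
leaf3 = Node(points=[(9, 1), (10, 4), (6, 2), (8, 3)])
root = Node(children=[Node(children=[leaf1]), Node(children=[leaf2, leaf3])])
print(bbs(root))
```

Because entries leave the heap in ascending mindist order, a point that reaches the front without being dominated can be reported immediately, which is what makes the output progressive.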

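Example
[Figure: points a to n on the x and y axes with the IDEAL point at the origin, grouped into MBRs L2E1 to L2E4, which are in turn grouped into L1E1 and L1E2]
R-Tree (each node has four slots, Ptr 1 to Ptr 4; unused slots are null)
Root: L1E1, L1E2
L2E1: a, b, c, null
L2E2: c, h, i, null
L2E3: d, g, m, null
L2E4: f, k, l, n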

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Root entries: L1E1, L1E2
Queue: (L1E2, 4), (L1E1, 10)
Result: (empty)

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Expand (L1E2, 4)
Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)
Result: (empty)

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)
Entries of L2E2: (c, 12), (h, 7), (i, 5)
Result: (empty)

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Queue: (i, 5), (h, 7), (L2E4, 8), (L1E1, 10), (c, 12); (L2E3, 7) is shown separately
Result: (empty)

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Queue: (h, 7), (L2E4, 8), (L1E1, 10), (c, 12); (L2E3, 7) is shown separately
Result: (i, 5)

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Queue: (L2E4, 8), (L1E1, 10), (c, 12)
Result: (i, 5)
Also shown: (k, 10), f, n, i

Example
[Figure: the same points a to n, MBRs, and R-Tree nodes (Root, L1E1, L1E2, L2E1 to L2E4)]
Queue: (empty)
Result: (i, 5), (a, 10), (k, 10)

Analysis
of Strategy 1
• Notation
Variable  Description
s         number of skyline objects
e         number of empty queries
ne        number of non-empty queries
r         number of redundant queries
d         dimensionality
h         height of the given R-Tree
Recursion tree: each non-empty query spawns d new recursive NN queries
• $ne = s + r$
• $e = ne \cdot (d - 1) + 1$, since $ne + e = ne \cdot d + 1$ (the $+1$ is the root)
• $e = (s + r)(d - 1) + 1$
• $NA_{NN} \ge (e + s + r) \cdot h = [(s + r)(d - 1) + 1 + (s + r)] \cdot h > s \cdot h \cdot d$

Analysis
of Strategy 2
(brief version)
• Notation
Variable  Description
s         number of skyline objects
h         height of the given R-Tree
• $s \cdot h \ge NA_{BBS}$
• $NA_{NN} > s \cdot h \cdot d > NA_{BBS}$
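To get a feel for the gap between the two bounds, the short sketch below plugs numbers into the formulas from the two analysis slides. The values of s, r, d, and h are made up for illustration and are not experimental results; the function names are hypothetical.

```python
def nn_lower_bound(s: int, r: int, d: int, h: int) -> int:
    """Lower bound on node accesses of the NN strategy: (e + s + r) * h."""
    ne = s + r                # non-empty queries
    e = ne * (d - 1) + 1      # empty queries
    return (e + ne) * h       # every query costs at least h node accesses


def bbs_upper_bound(s: int, h: int) -> int:
    """Upper bound on node accesses of BBS: s * h."""
    return s * h


# Hypothetical setting: 100 skyline points, 20 redundant queries, d = 3, h = 4.
s, r, d, h = 100, 20, 3, 4
print(nn_lower_bound(s, r, d, h))  # (241 + 120) * 4 = 1444
print(s * h * d)                   # 100 * 4 * 3 = 1200
print(bbs_upper_bound(s, h))       # 100 * 4 = 400
```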

Is it the optimal solution?
BBS Algorithm

Proof 1.
Termination
&
Correctness
• Lemma 1. BBS visits entries in ascending order
  of their distance to the 'ideal point'.
• Lemma 2. Any data point added to Result_Set
  is guaranteed to be a final skyline point.
  • Proof.
  • Suppose not: some $p_j$ was added to Result_Set but is not a final skyline point.
  • Then $\exists\, p^* \in DB$ s.t. $p^* \gg p_j$, which means $L_1(ideal, p^*) < L_1(ideal, p_j)$.
  • By Lemma 1, $p^*$ must therefore be visited before $p_j$.
  • Hence $p_j$ should have been pruned, which contradicts the assumption.
• Lemma 3. Every data point will be examined unless one of its ancestor
nodes has been pruned.

Lemmas for the theorem
• Lemma 4. Any skyline algorithm based on R-Trees must access all the
nodes whose MBRs intersect the SSR (skyline search region).
• Lemma 5. If an entry e doesn't intersect the SSR,
  then $\exists\, p^*$ s.t. $L_1(ideal, p^*) < L_1(ideal, e.leftdown)$,
  where e.leftdown is the lower-left corner of e.
• Theorem: the number of node accesses
performed by BBS is OPTIMAL.
[Figure: points A to G on the x and y axes, with Dominating Area(B) and the SSR marked]

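Proof of the theorem
• Proof 1. BBS only accesses nodes that
may contain skyline points.
  • That is, BBS only accesses nodes whose MBRs intersect the SSR.
  • Suppose not: BBS accesses a node e that doesn't intersect the SSR.
  • Then $\exists\, p^*$ with $L_1(ideal, p^*) < L_1(ideal, e.leftdown)$ by Lemma 5.
  • By Lemma 1, $p^*$ is processed before e, so e would have been pruned: contradiction.
• Proof 2. BBS visits each node at most
once. (trivial)
[Figure: points A to G on the x and y axes, with Dominating Area(B) and the SSR marked]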

To quantify
the actual cost
• Skip the details
[Figure: points A to G on the x and y axes, with Dominating Area(B) and the SSR marked]

Progressive behavior
• N = 1M, d = 3

Constrained
skyline queries
• N = 1M, d = 3
[Figure: the example points a to n, MBRs L1E1, L1E2, and L2E1 to L2E4, and the IDEAL point, with a region labeled "Constrain"]
