INI Lab. 
An Optimal and Progressive 
Algorithm for Skyline Queries 
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger 
ACM SIGMOD’ 2003 
Presenters 
KYEONG SEOK HYUN, 
WOO-SUNG CHOI, 
JA-YEON KIM,
Abstract 
 An Optimal 
 and Progressive Algorithm 
 for Skyline Queries 
 Using R-Tree
contents 
 1. Introduction 
 2. Related Work 
 2.1 Block Nested Loop (BNL) 
 2.5 Nearest Neighbor (NN) 
 3. Branch and Bound Skyline Algorithm 
 With I/O analysis 
 5. Experimental Evaluation
Skyline 
Problem definition
Which one do you prefer? 
http://emperia.egloos.com/m/2516211 
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html 
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 
5,000 Won 
40,000 Won 
4,500 Won 
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting 
혜자 >> 창렬
preliminaries 
Formal definition of Dominates (≪) 
 Given a set of d-dimensional points 푇 
We say that a point t1 ∈ 푇 DOMINATES another point t2 ∈ 푇 
 If and only if 
 ∀푖 ∈ 1, 2, 3, … , 푑 , 푡1 푖 ≧ 푡2[푖] 
 ∃푗 ∈ 1, 2, 3, … , 푑 , 푡1 푗 > 푡2[푗] 
 and Denoted by t2 ≪ t1 
 (simply saying, t1 이 이득) 
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf 
Note that 
the meaning of ‘dominates’ may differ 
according to type of application
Which one do you prefer? 
http://emperia.egloos.com/m/2516211 
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html 
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 
5,000 Won 
40,000 Won 
4,500 Won 
4,500 Won 
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting 
Still 혜자 >> 창렬
 Hotel(attraction, 1/price, 1/distance) 
 Two Hotel 
 A : `80`, `1/15,000`, `1/500m` 
 B : `30`, `1/20,000`, `1/1500m` 
 퐵 ≪ 퐴 
 Why? 
 30<80 
 1/20,000 < 1/15,000 
 1/1,500m < 1/500m 
A 
1/price 
attraction 
B 
Dominates! 
≪ 
B A 
for example,
Very important 
Problem Definition 
(mathematical) 
 The Skyline operator 
 Input - Given a set of objects P = {푝1, 푝2, … , 푝푁} 
 Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 푝푖 ≪ 푝∗} 
A 
B 
C 
Dominating Area(B) 
D 
E 
F 
“퐵 ∈ 푂푢푝푢푡, 
s푖푛푐푒 푛표 표푡ℎ푒푟 푝표푖푛푡 푃 ≫ 퐵”, correct 
x axis 
y axis 
G 
Common misconceptions 
“퐵 ∈ 푂푢푝푢푡 s푖푛푐푒 퐵 ≫ 퐶 , D, F” , wrong
Naïve approach 
for processing skyline queries
Exhaustive Test 
 Suppose there are n objects in the given set 
 퐷푥 = {표1, 표2, … , 표푛} 
 Algorithm -Naïve 1 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷 
 푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷 
 푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒; 
 푒푙푠푒 
 푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒; 
 break; 
 푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} 
A 
B 
F 
C 
D 
G 
E
 Suppose there are n objects in the given set 
 퐷푥 = {표1, 표2, … , 표푛} 
 Algorithm -Naïve 1 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷 
 푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒 
 푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷 
 푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒; 
 푒푙푠푒 
 푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒; 
 break; 
 푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} 
Exhaustive Test 
Nested Loop Structure 
Modification: (Algorithm -Naïve 2) 
Idea 1. Use Nested Loop Structure 
Idea 2. Take advantage of ‘Block-transfer’ 
towards better re-usability! 
Block A 
Block B 
A 
B 
C 
D 
E 
F 
G 
         
The Inherited Limitation of these approaches 
1. It needs full-scan over the data 
2. Though, query result contains 
only a small fraction of the dataset 
3. That is, these approaches are wasteful
R-Tree Index Approach 
for processing skyline queries
Preliminaries 
R-Tree 
 Nearest Neighbor Query
Preliminaries 
R-Tree: Balanced tree for indexing multi-dimensional object 
 Support Dynamic operation (insert, update, delete) 
R-Tree Index 
Approach
R-Tree 
VS 
B-Tree 
 B+-Tree 
 Balanced 
 Requiring that all leaves be at the 
same depth 
 Leaf nodes contain one 
dimensional value 
R-Tree 
 Similar to B+-Tree 
 Leaf nodes contain d-dimensional 
value 
R-Tree Index 
Approach 
http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html
Spatial objects (or d-dimensional objects or geometric objects) 
 d-dimensional object? 
 R-Tree Used for the Organization of 
a set of d-dimensional objects 
 How? 
 Main Idea 
 Minimum Bounding Rectangles (MBRs) 
<Objects in 2-dimension space> 
http://caversham.otago.ac.nz/research/geog.php
Quiz 
What is the minimum number of points for representing 
a rectangle? 
 Assumption: each rectangle is parallel to the coordinate axes 
18 
6 8 
7 
4 
x 
y 
0 
R-Tree Index 
Approach
Demonstration 
R-Tree Simulator
Nearest Neighbor (NN) 
Query Processing 
using R-Tree 
Nearest Neighbor Query 
 Input 
 Given a set of objects P = {푝1, 푝2, … , 푝푁} 
 Query Point - q 
 Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 퐿푝 푝푖 , 푞 > 퐿푝(푝∗, 푞)} 
0 x 
y 
See how it works in appendix 
R-Tree Index 
Approach
Root node 0 1 
MINMAXDIST(X,1) 
0 x 
y 
MINDIST(X, 0) 
MINDIST(X,1) 
MINMAXDIST(X, 0) 
Key IDEA! 
 Pruning! 
http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html 
http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
http://www.davey.com/
Back to the original question 
Skyline with R-Tree
R-Tree Index Approach 
 Let’s process skyline objects using R-Tree 
 Strategy 1 – Use traditional tech. (i.e. NN Query) 
 Strategy 2 – This paper 
 Strategy 1 
 Partition the data using NN Query recursively 
 Distance metric: 퐿1 푛표푟푚 
 First NN Query -> start from the ideal point (i.e. zero point)
Strategy 1 
Recursive NN Query
Dominating Area(i) 
example 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i 
m 
n 
k 
i 
IDEAL
To-do Area 1 
To-do Area 2 
example 
a 
x axis 
y axis 
b 
i 
k 
IDEAL 
Dominating Area(i) 
TO-DO Area 2 
TO-DO Area 1
Next, test these area 
(only to find nothing) 
To--do Arrea 2 
To-do Area 1 
example 
a 
x axis 
y axis 
b 
i 
k 
Dominating Area(i) 
TO-DO Area 2 
TO-DO Area 1 
Dominating Area(k) 
IDEAL 
` 
`
To-do Area 1 
example 
x axis 
i 
k 
Dominating Area(i) 
TO-DO Area 1 
Dominating Area(k) 
a 
To-do Area 1 
y axis 
b 
IDEAL 
Dominating 
Area(a)
Dominating Area(k) 
Result 
   
Dominating Area(i) 
IDEAL 
Dominating 
Area(a) 
x axis 
y axis 
i 
k 
a
Limitation 
of Strategy 1 
 Generally speaking, 
 In a d-dimensional space, 
 Each skyline object discovered causes d recursive partitioning phase 
Dominated
Limitation 
of Strategy 1 
 Generally speaking, 
 In a d-dimensional space, 
 Each skyline object discovered causes d recursive partitioning phase 
Area 1 
Dominated 
Area 2 
Dominated 
Dominated 
Area 3
What if? 
 In general, for d>2 
 The overlapping of the partitions 
 Necessitates DUPLICATE ELIMINATION 
Area 
1 
Domin 
ated Area 
2 
Domin 
ated 
Domin 
ated 
Area 
3
Disadvantage ! 
 Strategy 1 needs an additional phase 
 For removing redundant outputs 
 4 elimination methods 
 Laisser-faire 
 Propagate 
 Merge 
 Fine-grained Partitioning 
 They works 
 Problem: sub-optimal
Strategy 2 
Branch & Bound Skyline Algorithm
Idea! 
 Similar to previous NN Query 
 Branch & Bound Skyline (BBS) 
http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
h 
example 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
L1E1 L1E2 
Queue 
L1E2, 4 L1E1, 10 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Result
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
9 
L1E2, 4 L1E2 
Queue 
L2E2, 5 
3 5 7 
L1E1, 10 
L2E3, 7 L2E4, 8 
2 
1 
1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Result
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
L2E2, 5 L2E3, 7 L2E4, 8 L1E1, 10 
c, 12 h, 7 i, 5 
Result
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
i, 5 h, 7 L2E4, 8 L1E1, 10 c, 12 
Result 
L2E3, 7
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
h, 7 L2E4, 8 L1E1, 10 c, 12 
i, 5 
Result 
L2E3, 7
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
L2E4, 8 L1E1, 10 c, 12 
i, 5 
Result 
k, 10 f n i
example 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Root 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E1 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L1E2 
Ptr 1 Ptr 2 Ptr 3 Ptr 4 
L2E1 
a b c null 
L2E2 
c h i null 
L2E3 
d g m null 
L2E4 
f k l n 
Queue 
3 5 7 
9 
2 
1 
1 
i, 5 
Result 
a, 10 k, 10
Analysis 
Strategy 1
Analysis 
of Strategy 1 
 Notation 
Variable Description 
s #of Skyline obj 
e Empty Query 
ne Non-empty Query 
r Redendent Query 
d d-dimension 
h Height of the given R-Tree 
Recursion Tree 
… 
d new 
recursive NN 
… … 
 푛푒 = 푠 + 푟 
 푒 = 푛푒 ∙ 푑 − 1 + 1, 푠푖푛푐푒 푛푒 + 푒 = 푛푒 ∙ 푑 + 1(푟표표푡) 
 푒 = 푠 + 푟 푑 − 1 + 1 
 푁퐴푁푁 ≥ 푒 + 푠 + 푟 ∗ ℎ = 푠 + 푟 푑 − 1 + 1 + 푠 + 푟 ℎ > 푠 ∙ ℎ ∙ 푑
Analysis 
Strategy 2
Analysis 
of Strategy 2 
(brief version) 
 Notation 
Variable Description 
s #of Skyline obj 
h Height of the given R-Tree 
 푠 ∙ ℎ ≥ 푁퐴퐵퐵푆 
 푁퐴푁푁 > 푠 ∙ ℎ ∙ 푑 > 푁퐴퐵퐵푆
Is it the optimal solution? 
BBS Algorithm
Proof 1. 
Termination 
& 
Correctness 
 Lemma 1. BBS visits entries in ascending order 
 Of their distance to the ‘ideal point’ 
 Lemma 2. Any data point added into Result_Set 
 Is guaranteed to be a final skyline point 
 Proof. 
 Suppose not then 푝푗 was added into Result_Set but not a final skyline point 
 Then, ∃ 푝∗ ∈ 퐷퐵 푠. 푡, 푝∗ ≫ 푝푗 , which means L1 ideal, p∗ < L1(ideal, pj) 
 However, observe that 푝∗ must be visited before 푝푗 by lemma 1. 
 Contradiction: 푝푗 should have been pruned, which contradicts the assumption. 
 Lemma 3. All data point will be examined, unless one of its ancestor 
nodes has been pruned.
Lemmas for the theorem 
Lemma 4. Any skyline algorithm 
based on R-Tree must access all the 
nodes whose mbrs intersects the SSR 
 Lemma 5. If an entry e doesn’t 
intersect the SSR 
 Then ∃푝∗ 푠. 푡. 퐿1 푖푑푒푎푙, 푝∗ < 
퐿1(푖푑푒푎푙, 푒. 푙푒푓푡푑표푤푛) 
 Theorem: The # of node accesses 
performed by BBS is OPTIMAL 
Dominating Area(B) 
A 
B 
F 
C 
D 
E 
x axis 
y axis 
G 
SSR
Proof of the theorem 
 Proof 1. BBS only accesses nodes that 
may contain skyline points. 
 That is, BBS only accesses nodes 
whose mbrs intersect the SSR 
 Suppose not 
 Node e that doesn’t intersect the SSR 
 ∃푝∗ by lemma 5 
 Contradicts, by lemma 1 
 Proof 2. BBS visits nodes at most 
once. (trivial) 
Dominating Area(B) 
A 
B 
F 
C 
D 
E 
x axis 
y axis 
G 
SSR
To quantify 
the actual cost 
A  Skip the details  
B 
C 
Dominating Area(B) 
D 
E 
F 
x axis 
y axis 
G 
SSR
Experimental Evaluation
Experimental Evaluation
Dimensionality
Cardinality 
 3d dataset
Progressive behavior 
 N=1M, d=3
Constrained 
skyline queries 
 N=1M, d=3 
h 
a 
x axis 
y axis 
b 
c 
d 
e 
f 
g 
i m 
n 
k 
l 
IDEAL 
L1E2 
L1E1 
L2E4 
L2E2 
L2E3 
L2E1 
Constrain

An optimal and progressive algorithm for skyline queries slide

  • 1.
    INI Lab. AnOptimal and Progressive Algorithm for Skyline Queries Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger ACM SIGMOD’ 2003 Presenters KYEONG SEOK HYUN, WOO-SUNG CHOI, JA-YEON KIM,
  • 2.
    Abstract  AnOptimal  and Progressive Algorithm  for Skyline Queries  Using R-Tree
  • 3.
    contents  1.Introduction  2. Related Work  2.1 Block Nested Loop (BNL)  2.5 Nearest Neighbor (NN)  3. Branch and Bound Skyline Algorithm  With I/O analysis  5. Experimental Evaluation
  • 4.
  • 5.
    Which one doyou prefer? http://emperia.egloos.com/m/2516211 http://drmoontv.blogspot.kr/2013/03/blog-post_17.html http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 5,000 Won 40,000 Won 4,500 Won http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting 혜자 >> 창렬
  • 6.
    preliminaries Formal definitionof Dominates (≪)  Given a set of d-dimensional points 푇 We say that a point t1 ∈ 푇 DOMINATES another point t2 ∈ 푇  If and only if  ∀푖 ∈ 1, 2, 3, … , 푑 , 푡1 푖 ≧ 푡2[푖]  ∃푗 ∈ 1, 2, 3, … , 푑 , 푡1 푗 > 푡2[푗]  and Denoted by t2 ≪ t1  (simply saying, t1 이 이득) Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf Note that the meaning of ‘dominates’ may differ according to type of application
  • 7.
    Which one doyou prefer? http://emperia.egloos.com/m/2516211 http://drmoontv.blogspot.kr/2013/03/blog-post_17.html http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html 5,000 Won 40,000 Won 4,500 Won 4,500 Won http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting Still 혜자 >> 창렬
  • 8.
     Hotel(attraction, 1/price,1/distance)  Two Hotel  A : `80`, `1/15,000`, `1/500m`  B : `30`, `1/20,000`, `1/1500m`  퐵 ≪ 퐴  Why?  30<80  1/20,000 < 1/15,000  1/1,500m < 1/500m A 1/price attraction B Dominates! ≪ B A for example,
  • 9.
    Very important ProblemDefinition (mathematical)  The Skyline operator  Input - Given a set of objects P = {푝1, 푝2, … , 푝푁}  Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 푝푖 ≪ 푝∗} A B C Dominating Area(B) D E F “퐵 ∈ 푂푢푝푢푡, s푖푛푐푒 푛표 표푡ℎ푒푟 푝표푖푛푡 푃 ≫ 퐵”, correct x axis y axis G Common misconceptions “퐵 ∈ 푂푢푝푢푡 s푖푛푐푒 퐵 ≫ 퐶 , D, F” , wrong
  • 10.
    Naïve approach forprocessing skyline queries
  • 11.
    Exhaustive Test Suppose there are n objects in the given set  퐷푥 = {표1, 표2, … , 표푛}  Algorithm -Naïve 1  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷  푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷  푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒;  푒푙푠푒  푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒;  break;  푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} A B F C D G E
  • 12.
     Suppose thereare n objects in the given set  퐷푥 = {표1, 표2, … , 표푛}  Algorithm -Naïve 1  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푥 ∈ 퐷  푏표표푙푒푎푛 푖푠퐷표푚푖푛푎푡푒푑 = 푓푎푙푠푒  푓표푟 푒푎푐ℎ 표푏푗푒푐푡 표푦 ∈ 퐷  푖푓 ¬(표푥 = 표푦) 퐴푁퐷 ¬ 표푥 ≪ 표푦 푡ℎ푒푛 푐표푛푡푖푛푢푒;  푒푙푠푒  푡ℎ푒푛 푖푠퐷표푚푖푛푎푡푒푑 = 푡푟푢푒;  break;  푖푓 ! 푖푠퐷표푚푖푛푎푡푒푑 푆 ∪ {표푥} Exhaustive Test Nested Loop Structure Modification: (Algorithm -Naïve 2) Idea 1. Use Nested Loop Structure Idea 2. Take advantage of ‘Block-transfer’ towards better re-usability! Block A Block B A B C D E F G          The Inherited Limitation of these approaches 1. It needs full-scan over the data 2. Though, query result contains only a small fraction of the dataset 3. That is, these approaches are wasteful
  • 13.
    R-Tree Index Approach for processing skyline queries
  • 14.
    Preliminaries R-Tree Nearest Neighbor Query
  • 15.
    Preliminaries R-Tree: Balancedtree for indexing multi-dimensional object  Support Dynamic operation (insert, update, delete) R-Tree Index Approach
  • 16.
    R-Tree VS B-Tree  B+-Tree  Balanced  Requiring that all leaves be at the same depth  Leaf nodes contain one dimensional value R-Tree  Similar to B+-Tree  Leaf nodes contain d-dimensional value R-Tree Index Approach http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html
  • 17.
    Spatial objects (ord-dimensional objects or geometric objects)  d-dimensional object?  R-Tree Used for the Organization of a set of d-dimensional objects  How?  Main Idea  Minimum Bounding Rectangles (MBRs) <Objects in 2-dimension space> http://caversham.otago.ac.nz/research/geog.php
  • 18.
    Quiz What isthe minimum number of points for representing a rectangle?  Assumption: each rectangle is parallel to the coordinate axes 18 6 8 7 4 x y 0 R-Tree Index Approach
  • 19.
  • 20.
    Nearest Neighbor (NN) Query Processing using R-Tree Nearest Neighbor Query  Input  Given a set of objects P = {푝1, 푝2, … , 푝푁}  Query Point - q  Output – {푝푖 | 푝푖 ∈ 푃 푎푛푑 ∄ 푝∗ ∈ 푃 푠. 푡. 퐿푝 푝푖 , 푞 > 퐿푝(푝∗, 푞)} 0 x y See how it works in appendix R-Tree Index Approach
  • 21.
    Root node 01 MINMAXDIST(X,1) 0 x y MINDIST(X, 0) MINDIST(X,1) MINMAXDIST(X, 0) Key IDEA!  Pruning! http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
  • 22.
  • 23.
    Back to theoriginal question Skyline with R-Tree
  • 24.
    R-Tree Index Approach  Let’s process skyline objects using R-Tree  Strategy 1 – Use traditional tech. (i.e. NN Query)  Strategy 2 – This paper  Strategy 1  Partition the data using NN Query recursively  Distance metric: 퐿1 푛표푟푚  First NN Query -> start from the ideal point (i.e. zero point)
  • 25.
  • 26.
    Dominating Area(i) example a x axis y axis b c d e f g i m n k i IDEAL
  • 27.
    To-do Area 1 To-do Area 2 example a x axis y axis b i k IDEAL Dominating Area(i) TO-DO Area 2 TO-DO Area 1
  • 28.
    Next, test thesearea (only to find nothing) To--do Arrea 2 To-do Area 1 example a x axis y axis b i k Dominating Area(i) TO-DO Area 2 TO-DO Area 1 Dominating Area(k) IDEAL ` `
  • 29.
    To-do Area 1 example x axis i k Dominating Area(i) TO-DO Area 1 Dominating Area(k) a To-do Area 1 y axis b IDEAL Dominating Area(a)
  • 30.
    Dominating Area(k) Result    Dominating Area(i) IDEAL Dominating Area(a) x axis y axis i k a
  • 31.
    Limitation of Strategy1  Generally speaking,  In a d-dimensional space,  Each skyline object discovered causes d recursive partitioning phase Dominated
  • 32.
    Limitation of Strategy1  Generally speaking,  In a d-dimensional space,  Each skyline object discovered causes d recursive partitioning phase Area 1 Dominated Area 2 Dominated Dominated Area 3
  • 33.
    What if? In general, for d>2  The overlapping of the partitions  Necessitates DUPLICATE ELIMINATION Area 1 Domin ated Area 2 Domin ated Domin ated Area 3
  • 34.
    Disadvantage ! Strategy 1 needs an additional phase  For removing redundant outputs  4 elimination methods  Laisser-faire  Propagate  Merge  Fine-grained Partitioning  They works  Problem: sub-optimal
  • 35.
    Strategy 2 Branch& Bound Skyline Algorithm
  • 36.
    Idea!  Similarto previous NN Query  Branch & Bound Skyline (BBS) http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
  • 37.
    h example a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n
  • 38.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 L1E1 L1E2 Queue L1E2, 4 L1E1, 10 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Result
  • 39.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 9 L1E2, 4 L1E2 Queue L2E2, 5 3 5 7 L1E1, 10 L2E3, 7 L2E4, 8 2 1 1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Result
  • 40.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 L2E2, 5 L2E3, 7 L2E4, 8 L1E1, 10 c, 12 h, 7 i, 5 Result
  • 41.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 i, 5 h, 7 L2E4, 8 L1E1, 10 c, 12 Result L2E3, 7
  • 42.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 h, 7 L2E4, 8 L1E1, 10 c, 12 i, 5 Result L2E3, 7
  • 43.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 L2E4, 8 L1E1, 10 c, 12 i, 5 Result k, 10 f n i
  • 44.
    example h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Root Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E1 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L1E2 Ptr 1 Ptr 2 Ptr 3 Ptr 4 L2E1 a b c null L2E2 c h i null L2E3 d g m null L2E4 f k l n Queue 3 5 7 9 2 1 1 i, 5 Result a, 10 k, 10
  • 45.
  • 46.
    Analysis of Strategy1  Notation Variable Description s #of Skyline obj e Empty Query ne Non-empty Query r Redendent Query d d-dimension h Height of the given R-Tree Recursion Tree … d new recursive NN … …  푛푒 = 푠 + 푟  푒 = 푛푒 ∙ 푑 − 1 + 1, 푠푖푛푐푒 푛푒 + 푒 = 푛푒 ∙ 푑 + 1(푟표표푡)  푒 = 푠 + 푟 푑 − 1 + 1  푁퐴푁푁 ≥ 푒 + 푠 + 푟 ∗ ℎ = 푠 + 푟 푑 − 1 + 1 + 푠 + 푟 ℎ > 푠 ∙ ℎ ∙ 푑
  • 47.
  • 48.
    Analysis of Strategy2 (brief version)  Notation Variable Description s #of Skyline obj h Height of the given R-Tree  푠 ∙ ℎ ≥ 푁퐴퐵퐵푆  푁퐴푁푁 > 푠 ∙ ℎ ∙ 푑 > 푁퐴퐵퐵푆
  • 49.
    Is it theoptimal solution? BBS Algorithm
  • 50.
    Proof 1. Termination & Correctness  Lemma 1. BBS visits entries in ascending order  Of their distance to the ‘ideal point’  Lemma 2. Any data point added into Result_Set  Is guaranteed to be a final skyline point  Proof.  Suppose not then 푝푗 was added into Result_Set but not a final skyline point  Then, ∃ 푝∗ ∈ 퐷퐵 푠. 푡, 푝∗ ≫ 푝푗 , which means L1 ideal, p∗ < L1(ideal, pj)  However, observe that 푝∗ must be visited before 푝푗 by lemma 1.  Contradiction: 푝푗 should have been pruned, which contradicts the assumption.  Lemma 3. All data point will be examined, unless one of its ancestor nodes has been pruned.
  • 51.
    Lemmas for thetheorem Lemma 4. Any skyline algorithm based on R-Tree must access all the nodes whose mbrs intersects the SSR  Lemma 5. If an entry e doesn’t intersect the SSR  Then ∃푝∗ 푠. 푡. 퐿1 푖푑푒푎푙, 푝∗ < 퐿1(푖푑푒푎푙, 푒. 푙푒푓푡푑표푤푛)  Theorem: The # of node accesses performed by BBS is OPTIMAL Dominating Area(B) A B F C D E x axis y axis G SSR
  • 52.
    Proof of thetheorem  Proof 1. BBS only accesses nodes that may contain skyline points.  That is, BBS only accesses nodes whose mbrs intersect the SSR  Suppose not  Node e that doesn’t intersect the SSR  ∃푝∗ by lemma 5  Contradicts, by lemma 1  Proof 2. BBS visits nodes at most once. (trivial) Dominating Area(B) A B F C D E x axis y axis G SSR
  • 53.
    To quantify theactual cost A  Skip the details  B C Dominating Area(B) D E F x axis y axis G SSR
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
    Constrained skyline queries  N=1M, d=3 h a x axis y axis b c d e f g i m n k l IDEAL L1E2 L1E1 L2E4 L2E2 L2E3 L2E1 Constrain