These slides present Branch and Bound Skyline (BBS), an optimal and progressive algorithm for processing skyline queries over an R-tree index. They discuss two R-tree strategies: recursive nearest neighbor (NN) queries and BBS. The recursive NN approach needs an additional duplicate-elimination phase in more than two dimensions, whereas BBS prunes dominated entries during traversal and produces the skyline directly, without duplicates. BBS traverses the R-tree in a best-first manner, maintaining a priority queue of entries ordered by their L1 distance to the ideal point.
An optimal and progressive algorithm for skyline queries slide
1. INI Lab.
An Optimal and Progressive
Algorithm for Skyline Queries
Dimitris Papadias, Yufei Tao, Greg Fu, Bernhard Seeger
ACM SIGMOD’ 2003
Presenters
KYEONG SEOK HYUN,
WOO-SUNG CHOI,
JA-YEON KIM,
2. Abstract
An Optimal
and Progressive Algorithm
for Skyline Queries
Using R-Tree
3. contents
1. Introduction
2. Related Work
2.1 Block Nested Loop (BNL)
2.5 Nearest Neighbor (NN)
3. Branch and Bound Skyline Algorithm
With I/O analysis
5. Experimental Evaluation
5. Which one do you prefer?
http://emperia.egloos.com/m/2516211
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html
5,000 Won
40,000 Won
4,500 Won
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting
Good value (혜자) >> poor value (창렬)
6. preliminaries
Formal definition of Dominates (≪)
Given a set of d-dimensional points $T$,
we say that a point $t_1 \in T$ DOMINATES another point $t_2 \in T$
if and only if
$\forall i \in \{1, 2, \dots, d\}:\ t_1[i] \geq t_2[i]$
$\exists j \in \{1, 2, \dots, d\}:\ t_1[j] > t_2[j]$
This is denoted by $t_2 \ll t_1$
(simply put, t1 is the better deal)
Definition from http://www.comp.nus.edu.sg/~atung/publication/k_dominant.pdf
Note that
the meaning of ‘dominates’ may differ
according to type of application
7. Which one do you prefer?
http://emperia.egloos.com/m/2516211
http://drmoontv.blogspot.kr/2013/03/blog-post_17.html
http://www.huffingtonpost.kr/2014/11/13/story_n_6150254.html
5,000 Won
40,000 Won
4,500 Won
4,500 Won
http://flickrhivemind.net/User/Trollface%20T-Shirts/Interesting
Still good value (혜자) >> poor value (창렬)
8. Hotel(attraction, 1/price, 1/distance)
Two hotels, for example:
A : `80`, `1/15,000`, `1/500m`
B : `30`, `1/20,000`, `1/1500m`
$B \ll A$ (A dominates B)
Why?
30 < 80
1/20,000 < 1/15,000
1/1,500m < 1/500m
[Figure: hotels A and B plotted on attraction and 1/price axes; A dominates B]
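The dominance test from slide 6 can be sketched in a few lines of Python. This is a minimal sketch, assuming larger values are better in every dimension (as in the hotel example); the function name `dominates` and the tuple layout are illustrative choices, not part of the slides.

```python
def dominates(t1, t2):
    """Return True if t1 dominates t2, assuming larger values are better."""
    at_least_as_good = all(a >= b for a, b in zip(t1, t2))
    strictly_better = any(a > b for a, b in zip(t1, t2))
    return at_least_as_good and strictly_better


# Hotel tuples: (attraction, 1/price, 1/distance), values taken from slide 8.
hotel_a = (80, 1 / 15000, 1 / 500)
hotel_b = (30, 1 / 20000, 1 / 1500)

print(dominates(hotel_a, hotel_b))  # A dominates B
print(dominates(hotel_b, hotel_a))  # B does not dominate A
```

Storing price and distance as reciprocals keeps a single "larger is better" rule for all dimensions, which is the convention the hotel slide uses.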
9. Very important
Problem Definition
(mathematical)
The Skyline operator
Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$
Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } p_i \ll p^*\}$
[Figure: points A to G on x/y axes, with Dominating Area(B) marked]
Correct: "$B \in \text{Output}$, since no other point $p \gg B$"
Common misconception (wrong): "$B \in \text{Output}$ since $B \gg C, D, F$"
11. Exhaustive Test
Suppose there are n objects in the given set
D = {o1, o2, ..., on}
Algorithm - Naïve 1
S = {}
for each object ox ∈ D
    boolean isDominated = false
    for each object oy ∈ D
        if ox ≠ oy AND ox ≪ oy then
            isDominated = true;
            break;
    if !isDominated then S = S ∪ {ox}
[Figure: points A to G]
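The exhaustive test can be written as a short runnable sketch. It assumes larger values are better, as in the dominance definition on slide 6; the names `naive_skyline` and `points` are illustrative, and the sample data is made up for the example.

```python
def dominates(t1, t2):
    """Return True if t1 dominates t2, assuming larger values are better."""
    return all(a >= b for a, b in zip(t1, t2)) and any(a > b for a, b in zip(t1, t2))


def naive_skyline(objects):
    """Compare every object against every other object: O(n^2) dominance tests."""
    skyline = []
    for i, ox in enumerate(objects):
        is_dominated = False
        for j, oy in enumerate(objects):
            # Skip the self-comparison; stop at the first dominating object.
            if i != j and dominates(oy, ox):
                is_dominated = True
                break
        if not is_dominated:
            skyline.append(ox)
    return skyline


# Made-up sample data for illustration only.
points = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (4, 1)]
print(naive_skyline(points))
```

Comparing by index rather than by value keeps exact duplicates from eliminating each other, since equal points never dominate one another.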
12. Exhaustive Test: Nested Loop Structure
Algorithm - Naïve 1 (same pseudocode as on slide 11)
Modification: (Algorithm - Naïve 2)
Idea 1. Use Nested Loop Structure
Idea 2. Take advantage of 'Block-transfer'
towards better re-usability!
[Figure: points A to G, split between Block A and Block B]
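One way to picture the block-oriented modification is a single pass that keeps a window of candidate skyline points, in the spirit of Block Nested Loop. The sketch below is a simplification under a loud assumption: the window is unbounded and held entirely in memory, so the overflow handling with temporary files that a disk-based version needs is omitted. The name `window_skyline` and the sample data are illustrative.

```python
def dominates(t1, t2):
    """Return True if t1 dominates t2, assuming larger values are better."""
    return all(a >= b for a, b in zip(t1, t2)) and any(a > b for a, b in zip(t1, t2))


def window_skyline(objects):
    """Single pass with an in-memory window of incomparable candidates."""
    window = []
    for candidate in objects:
        # Drop the candidate if something already in the window dominates it.
        if any(dominates(kept, candidate) for kept in window):
            continue
        # Otherwise evict window entries that the candidate dominates.
        window = [kept for kept in window if not dominates(candidate, kept)]
        window.append(candidate)
    return window


# Made-up sample data for illustration only.
points = [(2, 2), (1, 5), (3, 3), (1, 1), (2, 4), (4, 1)]
print(window_skyline(points))
```

Each point is compared only against the current candidates rather than the whole dataset, but the pass still reads every object once, which is the full-scan limitation noted on the next slide.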
The inherent limitation of these approaches
1. They need a full scan over the data
2. Yet the query result contains
only a small fraction of the dataset
3. That is, these approaches are wasteful
15. Preliminaries
R-Tree: a balanced tree for indexing multi-dimensional objects
Supports dynamic operations (insert, update, delete)
R-Tree Index
Approach
16. R-Tree vs. B-Tree
B+-Tree
Balanced: requires that all leaves be at the same depth
Leaf nodes contain one-dimensional values
R-Tree
Similar to B+-Tree
Leaf nodes contain d-dimensional values
http://courses.cs.washington.edu/courses/cse444/09sp/hw/hw3/hw3.html
17. Spatial objects (or d-dimensional objects or geometric objects)
d-dimensional object?
R-Tree Used for the Organization of
a set of d-dimensional objects
How?
Main Idea
Minimum Bounding Rectangles (MBRs)
<Objects in 2-dimension space>
http://caversham.otago.ac.nz/research/geog.php
18. Quiz
What is the minimum number of points for representing
a rectangle?
Assumption: each rectangle is parallel to the coordinate axes
[Figure: an axis-parallel rectangle on x/y axes, with axis ticks at 4, 6, 7, and 8]
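For an axis-parallel rectangle, two opposite corners are enough, which is why an MBR can be stored as a (low, high) pair. The following is a minimal sketch of that representation; the helper names `mbr` and `contains` and the sample coordinates are assumptions made for illustration.

```python
def mbr(points):
    """Minimum bounding rectangle of d-dimensional points as (low, high) corners."""
    dims = range(len(points[0]))
    low = tuple(min(p[i] for p in points) for i in dims)
    high = tuple(max(p[i] for p in points) for i in dims)
    return low, high


def contains(rect, point):
    """True if the point lies inside or on the border of the rectangle."""
    low, high = rect
    return all(lo <= x <= hi for lo, x, hi in zip(low, point, high))


# Made-up sample points for illustration only.
rect = mbr([(6, 4), (8, 7), (7, 5)])
print(rect)
print(contains(rect, (7, 6)), contains(rect, (9, 6)))
```

The same two-corner form works in any number of dimensions, since each dimension contributes one minimum and one maximum.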
20. Nearest Neighbor (NN)
Query Processing
using R-Tree
Nearest Neighbor Query
Input: a set of objects $P = \{p_1, p_2, \dots, p_N\}$ and a query point $q$
Output: $\{p_i \mid p_i \in P \text{ and } \nexists\, p^* \in P \text{ s.t. } L_p(p_i, q) > L_p(p^*, q)\}$
[Figure: x/y axes]
See how it works in appendix
21. [Figure: root node with entries 0 and 1; query point X on x/y axes with MINDIST(X, 0), MINDIST(X, 1), MINMAXDIST(X, 0), and MINMAXDIST(X, 1) marked]
Key IDEA!
Pruning!
http://ko.aliexpress.com/store/category/pruning-tools/519349_100005637.html
http://www.installitdirect.com/blog/easy-tips-for-pruning-your-plants/
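The pruning idea can be made concrete with the two bounds named on the slide. The sketch below follows the usual definitions from the R-tree nearest neighbor literature, using squared Euclidean distances; the slides do not give formulas, so the function names, the (low, high) rectangle layout, and the sample rectangles are all assumptions for illustration.

```python
def mindist_sq(q, low, high):
    """Squared distance from q to the closest point of the rectangle."""
    total = 0
    for qi, lo, hi in zip(q, low, high):
        nearest = min(max(qi, lo), hi)  # clamp q onto the rectangle
        total += (qi - nearest) ** 2
    return total


def minmaxdist_sq(q, low, high):
    """Upper bound on the squared distance from q to the nearest object in an MBR."""
    dims = range(len(q))
    # near[k]: coordinate of the closer face along k; far[i]: the farther face along i.
    near = [low[k] if q[k] <= (low[k] + high[k]) / 2 else high[k] for k in dims]
    far = [low[i] if q[i] >= (low[i] + high[i]) / 2 else high[i] for i in dims]
    far_total = sum((q[i] - far[i]) ** 2 for i in dims)
    return min(far_total - (q[k] - far[k]) ** 2 + (q[k] - near[k]) ** 2 for k in dims)


def can_prune(q, rect, other):
    """rect cannot hold the NN if its MINDIST exceeds the MINMAXDIST of other."""
    return mindist_sq(q, *rect) > minmaxdist_sq(q, *other)


# Made-up rectangles for illustration only.
query = (0, 0)
entry_0 = ((1, 1), (2, 2))
entry_1 = ((3, 3), (4, 4))
print(mindist_sq(query, *entry_0), minmaxdist_sq(query, *entry_0))
print(can_prune(query, entry_1, entry_0))
```

MINMAXDIST relies on every face of an MBR touching at least one object, so some object in `entry_0` is guaranteed to be at least that close, and `entry_1` can be skipped without being read.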
23. Back to the original question
Skyline with R-Tree
24. R-Tree Index Approach
Let’s process skyline objects using R-Tree
Strategy 1 – Use traditional tech. (i.e. NN Query)
Strategy 2 – This paper
Strategy 1
Partition the data using NN Query recursively
Distance metric: L1 norm
First NN Query -> start from the ideal point (i.e. zero point)
27. Example
[Figure: points a, b, i, k on x/y axes with the IDEAL point at the origin; Dominating Area(i), To-do Area 1, and To-do Area 2 are marked]
28. Next, test these areas
(only to find nothing)
[Figure: same example, with Dominating Area(i), Dominating Area(k), To-do Area 1, and To-do Area 2 marked]
29. Example
[Figure: same example, with Dominating Area(i), Dominating Area(k), and Dominating Area(a) marked; To-do Area 1 remains]
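Strategy 1 can be sketched for the two-dimensional case as a loop over to-do areas. This is a simplified illustration under several assumptions: smaller values are better (the ideal point is the origin), coordinates are non-negative, and the nearest neighbor is found by a linear scan instead of an R-tree NN query. The coordinates are hypothetical, chosen so that the L1 distances of i, a, and k match the values shown on the later BBS slides.

```python
import math


def nn_skyline_2d(points):
    """Recursive NN partitioning in 2D, with a brute-force NN search."""
    skyline = []
    # Each to-do area is the region [0, bx) x [0, by).
    todo = [(math.inf, math.inf)]
    while todo:
        bx, by = todo.pop()
        inside = [p for p in points if p[0] < bx and p[1] < by]
        if not inside:
            continue  # the area was tested only to find nothing
        # Nearest neighbor of the ideal point under the L1 norm.
        nearest = min(inside, key=lambda p: p[0] + p[1])
        skyline.append(nearest)
        # Everything dominated by `nearest` is dropped; two areas remain.
        todo.append((nearest[0], by))
        todo.append((bx, nearest[1]))
    return skyline


# Hypothetical coordinates for points a to n (there is no point j).
points = [(1, 9), (2, 10), (4, 8), (6, 7), (9, 10), (7, 5), (5, 6),
          (4, 3), (3, 2), (9, 1), (10, 4), (6, 2), (8, 3)]
print(nn_skyline_2d(points))
```

In two dimensions the two to-do areas overlap only in a region that is necessarily empty, so no duplicates appear; with more dimensions the areas overlap in populated regions, which is the limitation the following slides describe.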
31. Limitation
of Strategy 1
Generally speaking,
in a d-dimensional space,
each skyline object discovered causes d recursive partitioning phases
[Figure: dominated region]
32. [Figure: the same limitation illustrated with partitions Area 1, Area 2, and Area 3, each next to a dominated region]
33. What if?
In general, for d>2,
the overlapping of the partitions
necessitates DUPLICATE ELIMINATION
[Figure: overlapping partitions Area 1, Area 2, and Area 3 with their dominated regions]
34. Disadvantage !
Strategy 1 needs an additional phase
For removing redundant outputs
4 elimination methods
Laisser-faire
Propagate
Merge
Fine-grained Partitioning
They work
Problem: sub-optimal
36. Idea!
Similar to previous NN Query
Branch & Bound Skyline (BBS)
http://greatleadersserve.org/leadership/big-idea-great-leaders-serve/
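37. Example
[Figure: points a, b, c, d, e, f, g, h, i, k, l, m, n on x/y axes with the IDEAL point at the origin, grouped into MBRs L2E1 to L2E4 inside L1E1 and L1E2]
R-tree nodes (four slots per node):
Root: Ptr 1, Ptr 2, Ptr 3, Ptr 4
L1E1: Ptr 1, Ptr 2, Ptr 3, Ptr 4
L1E2: Ptr 1, Ptr 2, Ptr 3, Ptr 4
L2E1: a, b, c, null
L2E2: c, h, i, null
L2E3: d, g, m, null
L2E4: f, k, l, n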
38. Example (points, MBRs, and R-tree nodes as on slide 37)
Root entries: L1E1, L1E2
Queue: (L1E2, 4), (L1E1, 10)
Result: (none yet)
39. Example (points, MBRs, and R-tree nodes as on slide 37)
Expanded entry: (L1E2, 4)
Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)
Result: (none yet)
40. Example (points, MBRs, and R-tree nodes as on slide 37)
Queue: (L2E2, 5), (L2E3, 7), (L2E4, 8), (L1E1, 10)
Entries of L2E2: (c, 12), (h, 7), (i, 5)
Result: (none yet)
41. Example (points, MBRs, and R-tree nodes as on slide 37)
Queue: (i, 5), (h, 7), (L2E4, 8), (L1E1, 10), (c, 12)
Shown apart from the queue: (L2E3, 7)
Result: (none yet)
42. Example (points, MBRs, and R-tree nodes as on slide 37)
Queue: (h, 7), (L2E4, 8), (L1E1, 10), (c, 12)
Shown apart from the queue: (L2E3, 7)
Result: (i, 5)
43. Example (points, MBRs, and R-tree nodes as on slide 37)
Queue: (L2E4, 8), (L1E1, 10), (c, 12)
Other labels on the slide: (k, 10), f, n, i
Result: (i, 5)
44. Example (points, MBRs, and R-tree nodes as on slide 37)
Result: (i, 5), (a, 10), (k, 10)
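The walkthrough above can be reproduced with a small best-first sketch of BBS. It is not a real R-tree: nodes are plain dictionaries, and the coordinates and the grouping of points into leaves are assumptions chosen so that the L1 distances match the queue values on the slides (L1E2 = 4, L1E1 = 10, L2E2 = 5, L2E3 = 7, L2E4 = 8). Smaller values are better, coordinates are non-negative, and an MBR is pruned only when a skyline point strictly dominates its lower-left corner, which is a conservative choice.

```python
import heapq
from itertools import count


def dominates(p, q):
    """True if p dominates q, assuming smaller values are better."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def point(name, x, y):
    return {"kind": "point", "name": name, "coords": (x, y)}


def node(name, *children):
    return {"kind": "node", "name": name, "children": list(children)}


def lower_left(entry):
    """The point itself, or the lower-left corner of a node's MBR."""
    if entry["kind"] == "point":
        return entry["coords"]
    corners = [lower_left(child) for child in entry["children"]]
    return tuple(min(c[i] for c in corners) for i in range(len(corners[0])))


def mindist(entry):
    """L1 distance from the ideal point (the origin) to the entry."""
    return sum(lower_left(entry))


def bbs(root):
    skyline = []
    tie = count()  # insertion counter, breaks ties between equal distances
    heap = [(mindist(e), next(tie), e) for e in root["children"]]
    heapq.heapify(heap)
    while heap:
        _, _, entry = heapq.heappop(heap)
        if any(dominates(s["coords"], lower_left(entry)) for s in skyline):
            continue  # pruned: a skyline point dominates the whole entry
        if entry["kind"] == "point":
            skyline.append(entry)  # progressive output
            continue
        for child in entry["children"]:
            if not any(dominates(s["coords"], lower_left(child)) for s in skyline):
                heapq.heappush(heap, (mindist(child), next(tie), child))
    return skyline


# Hypothetical coordinates and leaf grouping, for illustration only.
root = node(
    "Root",
    node("L1E1", node("L2E1", point("a", 1, 9), point("b", 2, 10), point("e", 9, 10))),
    node(
        "L1E2",
        node("L2E2", point("c", 4, 8), point("h", 4, 3), point("i", 3, 2)),
        node("L2E3", point("d", 6, 7), point("g", 5, 6), point("m", 6, 2)),
        node("L2E4", point("f", 7, 5), point("k", 9, 1), point("l", 10, 4), point("n", 8, 3)),
    ),
)
result = bbs(root)
print([(e["name"], mindist(e)) for e in result])
```

Because entries leave the heap in ascending distance order, each point that survives the dominance check can be reported immediately, and no duplicate-elimination phase is needed.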
50. Proof 1.
Termination & Correctness
Lemma 1. BBS visits entries in ascending order
of their distance to the 'ideal point'.
Lemma 2. Any data point added to Result_Set
is guaranteed to be a final skyline point.
Proof.
Suppose not: some $p_j$ was added to Result_Set but is not a final skyline point.
Then $\exists\, p^* \in DB$ s.t. $p^* \gg p_j$, which means $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, p_j)$.
However, observe that $p^*$ must have been visited before $p_j$, by Lemma 1.
So $p_j$ should have been pruned, which contradicts the assumption.
Lemma 3. Every data point will be examined unless one of its ancestor
nodes has been pruned.
51. Lemmas for the theorem
Lemma 4. Any skyline algorithm
based on R-Tree must access all the
nodes whose MBRs intersect the skyline search region (SSR).
Lemma 5. If an entry $e$ doesn't
intersect the SSR,
then $\exists\, p^*$ s.t. $L_1(\text{ideal}, p^*) < L_1(\text{ideal}, e.\text{leftdown})$.
Theorem: The number of node accesses
performed by BBS is OPTIMAL.
[Figure: points A to G on x/y axes, with Dominating Area(B) and the SSR marked]
52. Proof of the theorem
Proof 1. BBS only accesses nodes that
may contain skyline points.
That is, BBS only accesses nodes
whose MBRs intersect the SSR.
Suppose not: BBS accesses a
node $e$ that doesn't intersect the SSR.
Then $\exists\, p^*$ as in Lemma 5,
which contradicts Lemma 1.
Proof 2. BBS visits each node at most
once. (trivial)
[Figure: points A to G on x/y axes, with Dominating Area(B) and the SSR marked]
53. To quantify
the actual cost
Skip the details
[Figure: points A to G on x/y axes, with Dominating Area(B) and the SSR marked]