A survey on massively Parallelism for indexing multidimensional datasets on the GPU
1. DYPLogo.jpeg
A Survey on Massively Parallelism for Indexing
Multidimensional Datasets on the GPU
Prof.Pramod Deshmukh Yogesh Lokare Ajay Katware Pankaj
Patil
1−3Department of Computer Science
D.Y Patil School of Engineering Academy
Savitribai Phule Pune University.
National Conference on Advance Computing(NCAC), 2015
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
2. DYPLogo.jpeg
Outline
1 Introduction
2 Literature Survey
3 Tree Traversal Algorithms
Parallel R-Trees
Massively Parallel Three Phase Scan
Massively Parallel Hilbert R-tree
Massively Parallel Restart Scanning
4 Conclusion
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
3. DYPLogo.jpeg
Introduction
GP-GPU
The General Purpose computing on Graphics Processing
Unit(GP-GPU) has emerged as a new cost effective parallel
computing paradigm in High Performance Computing research that
enables large amount of data to be processed in parallel. .
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
4. DYPLogo.jpeg
Introduction
GP-GPU
The General Purpose computing on Graphics Processing
Unit(GP-GPU) has emerged as a new cost effective parallel
computing paradigm in High Performance Computing research that
enables large amount of data to be processed in parallel. .
CUDA
CUDA(Computer Unified Device Architecture) allows programmers to
run parallel algorithm on GPU’s Stream Processor(SMP).
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
5. DYPLogo.jpeg
Introduction
Parallel Multi-Dimensional Indexing
Multi-dimensional indexing structures have played an important role
in accelerating the access to subsets of large datasets.
EX)R-tree,R*-tree.
Multidimensional Range Query
Retrieves data that overlaps given range of values.
Ex) SELECT temperature
FROM dataset
WHERE latitude BETWEEN 20 AND 30
AND longitude BETWEEN 50 AND 60
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
6. DYPLogo.jpeg
Introduction
Parallel Multi-Dimensional Indexing
Multi-dimensional indexing structures have played an important role
in accelerating the access to subsets of large datasets.
EX)R-tree,R*-tree.
Multidimensional Range Query
Retrieves data that overlaps given range of values.
Ex) SELECT temperature
FROM dataset
WHERE latitude BETWEEN 20 AND 30
AND longitude BETWEEN 50 AND 60
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
7. DYPLogo.jpeg
Literature survey
Author Name of Paper Publication Year Description
J. Kim Exploiting Massive Parallelism IEEE Transaction on MPRS
Won-KiJeong for Indexing Multi-dimensional Parallel and Aug-15 MPHR
and Beomseok Nam Dataset on GPU distributed systems
Jinwoong Kim Parallel Multi-Dimensional Journal on Breaded parallel and
Beomseok Nam range query processing Parallel and 2013 data parallel
with R-trees on GPU distrib. computing tree traversal
L.Luo,M.D.Wong Parallel implementation of 17th Asia South 2012 Parallel R-Trees,
and L.Leong R-Trees on the GPU pacific Des.Autom.Conf SIMD tech. in
efficient binary tree search
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
8. DYPLogo.jpeg
Agenda
1 Introduction
2 Literature Survey
3 Tree Traversal Algorithms
Parallel R-Trees
Massively Parallel Three Phase Scan
Massively Parallel Hilbert R-tree
Massively Parallel Restart Scanning
4 Conclusion
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
9. DYPLogo.jpeg
Tree Traversal Algorithms
Parallel R-trees
(Parallel Recursive Trees)
The R-tree is a data structure for organizing and querying
multidimensional non-uniform, overlapping data.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
10. DYPLogo.jpeg
Tree Traversal Algorithms
Parallel R-trees
(Parallel Recursive Trees)
The R-tree is a data structure for organizing and querying
multidimensional non-uniform, overlapping data.
Each R-tree node contains a certain number of index entries, each of
which consists of a minimum bounding rectangle (MBR) and pointer to
a spatial object if it is a leaf node, or a pointer to its child node if it is
a non-leaf node.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
11. DYPLogo.jpeg
Tree Traversal Algorithms
Parallel R-trees
(Parallel Recursive Trees)
The R-tree is a data structure for organizing and querying
multidimensional non-uniform, overlapping data.
Each R-tree node contains a certain number of index entries, each of
which consists of a minimum bounding rectangle (MBR) and pointer to
a spatial object if it is a leaf node, or a pointer to its child node if it is
a non-leaf node.
Drawback of R-trees: R-trees are not well suited for the GPU because
of their irregular memory access patterns and recursive backtracking
functions calls.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
12. DYPLogo.jpeg
Tree Traversal Algorithms
Parallel R-trees
(Parallel Recursive Trees)
The R-tree is a data structure for organizing and querying
multidimensional non-uniform, overlapping data.
Each R-tree node contains a certain number of index entries, each of
which consists of a minimum bounding rectangle (MBR) and pointer to
a spatial object if it is a leaf node, or a pointer to its child node if it is
a non-leaf node.
Drawback of R-trees: R-trees are not well suited for the GPU because
of their irregular memory access patterns and recursive backtracking
functions calls.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
13. DYPLogo.jpeg
Agenda
1 Introduction
2 Literature Survey
3 Tree Traversal Algorithms
Parallel R-Trees
Massively Parallel Three Phase Scan
Massively Parallel Hilbert R-tree
Massively Parallel Restart Scanning
4 Conclusion
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
14. DYPLogo.jpeg
Tree Traversal Algorithms
MPTS
(Massively Parallel 3 Phase Scan)
Leftmost search: -choose the leftmost child node no matter how many
child nodes overlap.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
15. DYPLogo.jpeg
Tree Traversal Algorithms
MPTS
(Massively Parallel 3 Phase Scan)
Leftmost search: -choose the leftmost child node no matter how many
child nodes overlap.
Rightmost Search: -choose the rightmost child node no matter how
many child nodes overlap.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
16. DYPLogo.jpeg
Tree Traversal Algorithms
MPTS
(Massively Parallel 3 Phase Scan)
Leftmost search: -choose the leftmost child node no matter how many
child nodes overlap.
Rightmost Search: -choose the rightmost child node no matter how
many child nodes overlap.
Parallel Scanning: -In between two leaf nodes,perform massively
parallel scanning to filter out non-overlapping data elements.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
17. DYPLogo.jpeg
Tree Traversal Algorithms
MPTS
(Massively Parallel 3 Phase Scan)
Leftmost search: -choose the leftmost child node no matter how many
child nodes overlap.
Rightmost Search: -choose the rightmost child node no matter how
many child nodes overlap.
Parallel Scanning: -In between two leaf nodes,perform massively
parallel scanning to filter out non-overlapping data elements.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
18. DYPLogo.jpeg
Agenda
1 Introduction
2 Literature Survey
3 Tree Traversal Algorithms
Parallel R-Trees
Massively Parallel Three Phase Scan
Massively Parallel Hilbert R-tree
Massively Parallel Restart Scanning
4 Conclusion
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
19. DYPLogo.jpeg
Algorithm
MPHR
(Massively Parallel Hilbert R-trees)
Drawbacks Of MPTS
-MPTS reduces the number of leaf nodes to be accessed, but still it
accesses a large number of leaf nodes that do not have requested data.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
20. DYPLogo.jpeg
Algorithm
MPHR
(Massively Parallel Hilbert R-trees)
Drawbacks Of MPTS
-MPTS reduces the number of leaf nodes to be accessed, but still it
accesses a large number of leaf nodes that do not have requested data.
To avoid drawbacks of MPTS we designed a variant of R-trees that
work on the GPU without stack and does not access leaf nodes that do
not have requested data called MPHR-Tree.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
21. DYPLogo.jpeg
Algorithm
MPHR
(Massively Parallel Hilbert R-trees)
Drawbacks Of MPTS
-MPTS reduces the number of leaf nodes to be accessed, but still it
accesses a large number of leaf nodes that do not have requested data.
To avoid drawbacks of MPTS we designed a variant of R-trees that
work on the GPU without stack and does not access leaf nodes that do
not have requested data called MPHR-Tree.
Hilbert curve is well known for it spatial clustering property.
-Sort the data along with Hilbert curve.
-The gap between leftmost leaf node and the rightmost leaf node
would be reduced.
-The number of visited nodes would decrease.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
22. DYPLogo.jpeg
Agenda
1 Introduction
2 Literature Survey
3 Tree Traversal Algorithms
Parallel R-Trees
Massively Parallel Three Phase Scan
Massively Parallel Hilbert R-tree
Massively Parallel Restart Scanning
4 Conclusion
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
23. DYPLogo.jpeg
Algorithm
MPRS
(Massively Parallel Restart Scanning)
For a given range query, no matter how many MBRs of child nodes
overlap a given query range, our MPRS algorithm always selects the
leftmost overlapping child node unless all its leaf nodes have been
already visited.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
24. DYPLogo.jpeg
Algorithm
MPRS
(Massively Parallel Restart Scanning)
For a given range query, no matter how many MBRs of child nodes
overlap a given query range, our MPRS algorithm always selects the
leftmost overlapping child node unless all its leaf nodes have been
already visited.
Figure: MPRS with MPHR-tree structures
MPRS tree traversal algorithm:
(i) uses a large number of GPU threads to process a single query in aProf.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
25. DYPLogo.jpeg
Conclusion
This paper presents MPHR and MPRS algorithms improve the
utilization of GPU architecture for range query processing, and avoids
the irregular search path of transforming the tree traversal problem
into a sequence data processing problem.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
26. DYPLogo.jpeg
Conclusion
This paper presents MPHR and MPRS algorithms improve the
utilization of GPU architecture for range query processing, and avoids
the irregular search path of transforming the tree traversal problem
into a sequence data processing problem.
MPRS algorithm offers traversing hierarchical tree structures with
mostly contiguous memory access patterns without recursion.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
27. DYPLogo.jpeg
Conclusion
This paper presents MPHR and MPRS algorithms improve the
utilization of GPU architecture for range query processing, and avoids
the irregular search path of transforming the tree traversal problem
into a sequence data processing problem.
MPRS algorithm offers traversing hierarchical tree structures with
mostly contiguous memory access patterns without recursion.
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
28. DYPLogo.jpeg
For Further Reading I
Jinwoong Kim,Won-ki Jeong, and Beomseok Nam
Exploiting Massive Parallelism for Multi-Dimensional Datasets on the
GPU
IEEE Trans.Parallel Distrib.sys., Volume 26, Number 8, August-2015
Jinwoong,Beomseok Nam
Parallel Multi-Dimensional Range Query Processing
Jr.Parallel Distributed system,2013.
L.Luo. Martin D.F.Wong,and Lance Leong
Parallel Implementation of R-trees on the GPU
17th Asia and south Pacific Design Automation
Conference(ASP-DAC,2012
NVIDIA CUDA
[Online].Available:http//www.nvidia.com ,2014
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16
29. DYPLogo.jpeg
For Further Reading II
Antonim Guttman
R-trees:A dynamic index structure for spatial searching.
ACM SIGMOD International Conferences on Management of
Data,1984
Prof.Pramod Deshmukh, Yogesh Lokare, Ajay Katware, Pankaj Patil (NCAC, 2015)A Survey on Massively Parallelism for Indexing Multidimensional Datasets on the GPU
National Conference on Advance Computing(N
/ 16