Spatial databases store geographic information. The main query types on such databases are range queries, nearest neighbor queries, and spatial joins. Several indexing techniques speed up data retrieval; R-trees are the most widely used, and others include quad-trees and grid files. Spatial data is used in GIS applications.
2. • What is spatial data?
• Types of spatial data
• Types of queries
• Applications
• Indexing techniques
• Comparison of indexing techniques
• GiST
• Indexing high-dimensional data
• Conclusion
3. • Spatial data represents the location, size, and shape of an object on earth
• Ex. building, lake
5. • Point data:
• Simplest form of spatial data
• A point has no spatial extent: no associated area or volume
• Consists of a collection of points
• Ex. raster data
6. • Region data:
• Has spatial extent, with a location and a boundary
• Represented using points, lines, and polygons
• Ex. roads, rivers: line data
8. 1) Spatial range queries:
Associated with region data
Ex. "Find all cities within 50 miles of Pune"
2) Nearest neighbor queries:
Associated with point data
Ex. "Find the 10 cities nearest to Pune"
Results are ordered by proximity
Used in multimedia databases
9. 3) Spatial join queries:
- Use both point and region data
- Ex. "Find pairs of cities within 200 miles of each other" and "Find all cities near a lake"
- More complex
- Expensive to evaluate
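The three query types above can be illustrated with a brute-force sketch over a handful of points. The city names other than Pune, the planar coordinates (treated as miles), and the function names are illustrative assumptions; a real system would use a spatial index instead of scanning every point.

```python
import math
from itertools import combinations

# Hypothetical planar coordinates in miles; this is not real geography.
CITIES = {
    "Pune": (0.0, 0.0),
    "A": (30.0, 30.0),
    "B": (60.0, 80.0),
    "C": (300.0, 0.0),
}

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def range_query(center, radius):
    """Range query: cities within `radius` of the named city."""
    c = CITIES[center]
    return sorted(n for n, p in CITIES.items()
                  if n != center and dist(c, p) <= radius)

def nearest_neighbors(center, k):
    """Nearest neighbor query: the k closest cities, ordered by proximity."""
    c = CITIES[center]
    others = [n for n in CITIES if n != center]
    return sorted(others, key=lambda n: dist(c, CITIES[n]))[:k]

def distance_join(limit):
    """Spatial join: all pairs of cities within `limit` of each other."""
    return [(a, b) for a, b in combinations(sorted(CITIES), 2)
            if dist(CITIES[a], CITIES[b]) <= limit]

print(range_query("Pune", 50))
print(nearest_neighbors("Pune", 2))
print(distance_join(200))
```

The join compares every pair of points, which hints at why spatial joins are the most expensive of the three to evaluate.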
10. 1) Geographic Information Systems (GIS)
Ex. maps
2) Computer-Aided Design/Manufacturing (CAD/CAM)
Ex. surface of a designed object
Range and spatial join queries are used
3) Multimedia databases
Video, audio, image, and text data also require spatial data techniques
Nearest neighbor queries over point data
11. Point data: grid files, hB trees, k-d trees, point quad trees
Region data: quad trees, R trees, SKD trees
- No single indexing technique is established as best yet
- R trees are commonly used due to their simplicity, ability to handle both point and region data, and performance on complex queries
12. Three main indexing techniques:
• Region quad trees and Z-ordering: handle both point and region data
• Grid files: only point data
• R-trees: handle both point and region data
13. • Z-ordering gives us a way to group points according to spatial proximity.
• Consider a point with X = 01 and Y = 11.
• Interleaving the bits of X and Y gives the Z-value 0111, which is 7 in decimal.
• The Z-order is an example of a space-filling curve.
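The bit interleaving on this slide can be sketched as follows. The function name `z_value` and the fixed bit width are assumptions for illustration; the X bit is taken first at each position, which matches the 0111 example.

```python
def z_value(x: int, y: int, bits: int) -> int:
    """Interleave the bits of x and y, taking the x bit first at each position."""
    z = 0
    for i in range(bits - 1, -1, -1):      # walk from most to least significant bit
        z = (z << 1) | ((x >> i) & 1)      # append the next bit of x
        z = (z << 1) | ((y >> i) & 1)      # append the next bit of y
    return z

z = z_value(0b01, 0b11, bits=2)
print(format(z, "04b"), z)   # 0111 7
```

Sorting points by this value places them along the Z-shaped curve, so an ordinary one-dimensional index such as a B+ tree can then be built over the Z-values.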
14. The region quad tree structure corresponds directly to the recursive decomposition of the data space. Each node in the tree corresponds to a square-shaped region of the data space.
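A minimal sketch of this recursive decomposition, assuming the data space is a square grid of 0/1 cells whose side is a power of two. The function name, the nested-list tree representation, and the NW, NE, SW, SE child order (with row 0 as north) are illustrative choices, not taken from the slides.

```python
def build_quadtree(grid, x=0, y=0, size=None):
    """Return the cell value if the square is uniform, otherwise a list of
    four subtrees in NW, NE, SW, SE order (row 0 is treated as north)."""
    if size is None:
        size = len(grid)
    first = grid[y][x]
    uniform = all(grid[r][c] == first
                  for r in range(y, y + size)
                  for c in range(x, x + size))
    if uniform:
        return first                      # leaf: one square-shaped region
    half = size // 2                      # otherwise split into four quadrants
    return [
        build_quadtree(grid, x, y, half),
        build_quadtree(grid, x + half, y, half),
        build_quadtree(grid, x, y + half, half),
        build_quadtree(grid, x + half, y + half, half),
    ]

grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(build_quadtree(grid))   # [0, 1, 0, [0, 1, 1, 1]]
```

Only the south-east quadrant is mixed, so it is the only one that gets subdivided further.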
15. • Grid files rely upon a grid directory to identify the data page containing a desired point.
• The grid file partitions space into rectangular regions using lines that are parallel to the axes.
• If the X axis is cut into i segments and the Y axis is cut into j segments, we have a total of i x j partitions. The grid directory is an i by j array with one entry per partition.
• The positions of the cuts along an axis are maintained in an array called a linear scale; there is one linear scale per axis.
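A point lookup through the linear scales and the grid directory might look like the sketch below. The cut positions, page names, and the `find_page` function are hypothetical; the convention that a point lying exactly on a cut belongs to the higher segment is also an assumption.

```python
from bisect import bisect_right

# Hypothetical linear scales: the cut positions along each axis.
X_SCALE = [10, 20]   # i = 3 segments: x < 10, 10 <= x < 20, x >= 20
Y_SCALE = [50]       # j = 2 segments: y < 50, y >= 50

# Grid directory: an i by j array with one entry (a data page) per partition.
DIRECTORY = [
    ["page0", "page1"],
    ["page2", "page3"],
    ["page4", "page5"],
]

def find_page(x, y):
    """Use one linear scale per axis to locate the directory entry for (x, y)."""
    i = bisect_right(X_SCALE, x)   # index of the segment along the X axis
    j = bisect_right(Y_SCALE, y)   # index of the segment along the Y axis
    return DIRECTORY[i][j]

print(find_page(5, 60))    # page1
print(find_page(15, 10))   # page2
```

The lookup costs two binary searches plus one directory access, independent of how many points are stored.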
18. • Adaptation of the B+ tree
• Height-balanced data structure
• Search key values are referred to as bounding boxes
• A data entry consists of a pair (n-dimensional box, rid)
• rid: object identifier
• The n-dimensional box is the smallest box that contains the object
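As a small illustration of a data entry, the sketch below computes the smallest axis-aligned box around an object given as a list of vertices. The function name, the (lows, highs) box representation, and the rid value are assumptions made for the example.

```python
def bounding_box(points):
    """Smallest axis-aligned n-dimensional box containing all the points,
    returned as (lows, highs) with one coordinate per dimension."""
    dims = range(len(points[0]))
    lows = tuple(min(p[d] for p in points) for d in dims)
    highs = tuple(max(p[d] for p in points) for d in dims)
    return lows, highs

# A triangle as a list of 2-D vertices, and its data entry (box, rid).
triangle = [(1, 4), (3, 1), (5, 3)]
entry = (bounding_box(triangle), "rid-17")   # "rid-17" is a made-up identifier
print(entry)   # (((1, 1), (5, 4)), 'rid-17')
```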
20. • Search for objects overlapping box Q
Start at root.
1. If current node is non-leaf, for each entry <E, ptr>, if box E overlaps Q, search subtree identified by ptr.
2. If current node is leaf, for each entry <E, rid>, if E overlaps Q, rid identifies an object that might overlap Q.
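The search procedure above can be sketched for two-dimensional boxes as follows. The `Node` class, the (xmin, ymin, xmax, ymax) box layout, and the hand-built tree are illustrative assumptions rather than a full R-tree implementation.

```python
from dataclasses import dataclass, field

def overlaps(a, b):
    """True if boxes a and b, each (xmin, ymin, xmax, ymax), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    is_leaf: bool
    # Non-leaf entries are (box, child Node); leaf entries are (box, rid).
    entries: list = field(default_factory=list)

def search(node, q, out=None):
    """Collect the rids of all leaf entries whose box overlaps q."""
    if out is None:
        out = []
    for box, ref in node.entries:
        if overlaps(box, q):
            if node.is_leaf:
                out.append(ref)        # candidate: the object might overlap q
            else:
                search(ref, q, out)    # descend into the overlapping subtree
    return out

leaf1 = Node(True, [((0, 0, 2, 2), "r1"), ((3, 3, 5, 5), "r2")])
leaf2 = Node(True, [((10, 10, 12, 12), "r3")])
root = Node(False, [((0, 0, 5, 5), leaf1), ((10, 10, 12, 12), leaf2)])

print(search(root, (1, 1, 4, 4)))   # ['r1', 'r2']
```

The result is a candidate list: each rid still has to be checked against the actual object geometry, because a bounding box can overlap Q even when the object inside it does not.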
21. Insert entry <B, ptr>
• Start at root and go down to the "best-fit" leaf L.
• At each level, go to the child whose box needs the least enlargement to cover B; resolve ties by going to the smallest-area child.
• If best-fit leaf L has space, insert the entry and stop. Otherwise, split L into L1 and L2.
• Adjust the entry for L in its parent so that the box now covers (only) L1.
• Add an entry (in the parent node of L) for L2. (This could cause the parent node to recursively split.)
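The descent step of the insertion (choosing the child that needs the least enlargement, with ties going to the smallest area) can be sketched as below. Node splitting is not shown. The helper names and the (xmin, ymin, xmax, ymax) box layout are assumptions for illustration.

```python
def area(b):
    """Area of a box given as (xmin, ymin, xmax, ymax)."""
    return (b[2] - b[0]) * (b[3] - b[1])

def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(box, new_box):
    """Extra area `box` would need in order to cover `new_box`."""
    return area(union(box, new_box)) - area(box)

def choose_child(child_boxes, new_box):
    """Index of the child needing least enlargement; ties go to smallest area."""
    return min(
        range(len(child_boxes)),
        key=lambda i: (enlargement(child_boxes[i], new_box), area(child_boxes[i])),
    )

children = [(0, 0, 4, 4), (5, 5, 6, 6), (0, 0, 10, 10)]
print(choose_child(children, (1, 1, 2, 2)))    # 0: no enlargement, smaller area wins the tie
print(choose_child(children, (11, 5, 12, 6)))  # 1: needs the least enlargement
```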
22.
| Query type | Region Quad Trees | Grid Files (point data) | R-Trees |
|---|---|---|---|
| Range queries | Easily handled | Easily handled for point data | Handled by calculating bounding box |
| Nearest neighbour queries | Can be handled; sometimes tricky due to long diagonal jumps | Easily handled for point data | Handled well by traversing for the point or region |
| Spatial joins | Can be handled with some extension to range queries | Easily handled for point data | Handled very well |
23. • The Generalized Search Tree (GiST) abstracts the "tree" nature of a class of indexes including B+ trees and R-tree variants.
• Striking similarities in insert/delete/search and even concurrency control algorithms make it possible to provide "templates" for these algorithms that can be customized to obtain the many different tree index structures.
• GiST provides an alternative for implementing other tree indexes in an ORDBMS.
24. • Typically, high-dimensional datasets are collections of points, not regions.
• E.g., feature vectors in multimedia applications.
• Very sparse.
• Nearest neighbor queries are common.
• The R-tree becomes worse than a sequential scan for most datasets with more than a dozen dimensions.
• As dimensionality increases, contrast (ratio of distances between nearest and farthest points) usually decreases; "nearest neighbor" is then no longer meaningful.
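The loss of contrast can be observed with a small experiment on uniformly random points. This sketch takes contrast to be the farthest distance divided by the nearest distance from a query point, which is one reading of the ratio mentioned above; the function name, sample size, and seed are arbitrary assumptions.

```python
import math
import random

def contrast(dim, n_points=200, seed=0):
    """Farthest / nearest distance from a random query point to random data points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return max(dists) / min(dists)

# The ratio tends toward 1 as the number of dimensions grows.
for d in (2, 10, 100):
    print(d, round(contrast(d), 2))
```

When the ratio is close to 1, the nearest point is barely closer than the farthest one, so the answer to a nearest neighbor query carries little information.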
25. • Spatial data management has many applications, including GIS, CAD/CAM, and multimedia indexing, and covers both point and region data
• The R-tree approach is widely used in GIS systems
• It is also used in spatial data mining approaches
• Popular SDBMS: MySQL (geometry data type), Neo4j, AllegroGraph, SpaceBase, CouchDB, PostgreSQL, SpatialDB