Spatial Mining

1,325 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,325
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
33
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Spatial Mining

  1. 1. Εξόρυξη Χωρικών Δεδομένων Βασίλειος Μεγαλοοικονόμου, Χρήστος Μακρής (βασισμένο σε σημειώσεις της Μ. Dunham )
  2. 2. Spatial Mining Outline <ul><li>Goal: Provide an introduction to some spatial mining techniques. </li></ul><ul><li>Introduction </li></ul><ul><li>Spatial Data Overview </li></ul><ul><li>Spatial Data Mining Primitives </li></ul><ul><li>Generalization/Specialization </li></ul><ul><li>Spatial Rules </li></ul><ul><li>Spatial Classification </li></ul><ul><li>Spatial Clustering </li></ul>
  3. 3. Χωρικό Αντικείμενο ( Spatial Object ) <ul><li>Contains both spatial and nonspatial attributes. </li></ul><ul><li>Must have location type attributes: </li></ul><ul><ul><li>Latitude/longitude </li></ul></ul><ul><ul><li>Zip code </li></ul></ul><ul><ul><li>Street address </li></ul></ul><ul><li>May retrieve object using either (or both) spatial or nonspatial attributes. </li></ul>
  4. 4. Εφορμογές Εξόρυξης Χωρικών Δεδομένων <ul><li>Geology </li></ul><ul><li>GIS Systems </li></ul><ul><li>Environmental Science </li></ul><ul><li>Agriculture </li></ul><ul><li>Medicine </li></ul><ul><li>Robotics </li></ul><ul><li>May involved both spatial and temporal aspects </li></ul>
  5. 5. Χωρικά ( Spatial ) Queries <ul><li>Spatial selection may involve specialized selection comparison operations: </li></ul><ul><ul><li>Near </li></ul></ul><ul><ul><li>North, South, East, West </li></ul></ul><ul><ul><li>Contained in </li></ul></ul><ul><ul><li>Overlap/intersect </li></ul></ul><ul><li>Region (Range) Query – find objects that intersect a given region. </li></ul><ul><li>Nearest Neighbor Query – find object close to identified object. </li></ul><ul><li>Distance Scan – find object within a certain distance of an identified object where distance is made increasingly larger. </li></ul>
  6. 6. Χωρικές Δομές Δεδομένων <ul><li>Data structures designed specifically to store or index spatial data. </li></ul><ul><li>Often based on B-tree or Binary Search Tree </li></ul><ul><li>Cluster data on disk based on geographic location. </li></ul><ul><li>May represent complex spatial structure by placing the spatial object in a containing structure of a specific geographic shape. </li></ul><ul><li>Techniques: </li></ul><ul><ul><li>Quad Tree </li></ul></ul><ul><ul><li>R-Tree </li></ul></ul><ul><ul><li>k-D Tree </li></ul></ul>
  7. 7. MBR <ul><li>Minimum Bounding Rectangle </li></ul><ul><li>Smallest rectangle that completely contains the object </li></ul>
  8. 8. Παραδείγματα MBR
  9. 9. Quad Tree <ul><li>Hierarchical decomposition of the space into quadrants (MBRs) </li></ul><ul><li>Each level in the tree represents the object as the set of quadrants which contain any portion of the object. </li></ul><ul><li>Each level is a more exact representation of the object. </li></ul><ul><li>The number of levels is determined by the degree of accuracy desired. </li></ul>
  10. 10. Παράδειγμα Quad Tree
  11. 11. R-Tree <ul><li>As with Quad Tree the region is divided into successively smaller rectangles (MBRs). </li></ul><ul><li>Rectangles need not be of the same size or number at each level. </li></ul><ul><li>Rectangles may actually overlap. </li></ul><ul><li>Lowest level cell has only one object. </li></ul><ul><li>Tree maintenance algorithms similar to those for B-trees. </li></ul>
  12. 12. Παράδειγμα R-Tree
  13. 13. K-D Tree <ul><li>Designed for multi-attribute data, not necessarily spatial </li></ul><ul><li>Variation of binary search tree </li></ul><ul><li>Each level is used to index one of the dimensions of the spatial object. </li></ul><ul><li>Lowest level cell has only one object </li></ul><ul><li>Divisions not based on MBRs but successive divisions of the dimension range. </li></ul>
  14. 14. Παράδειγμα k-D Tree
  15. 15. Τοπολογικές Συσχετίσεις <ul><li>Spatial region: </li></ul><ul><li>Disjoint </li></ul><ul><li>Overlaps or Intersects </li></ul><ul><li>Equals </li></ul><ul><li>Covered by or inside or contained in </li></ul><ul><li>Covers or contains </li></ul>
  16. 16. Distance Between Objects <ul><li>Euclidean </li></ul><ul><li>Manhattan </li></ul><ul><li>Extensions: </li></ul>
  17. 17. Προοδευτική Βελτίωση (Progressive Refinement) <ul><li>Make approximate answers prior to more accurate ones. </li></ul><ul><li>Filter out data not part of answer </li></ul><ul><li>Hierarchical view of data based on spatial relationships </li></ul><ul><li>Coarse predicate recursively refined </li></ul>
  18. 18. Χωρική Ιεραρχία : Progressive Refinement – Προοδευτική Βελτίωση
  19. 19. Spatial Data Dominant Algorithm – Γενίκευση Χωρικής Τάξης
  20. 20. STING <ul><li>STatistical Information Grid-based </li></ul><ul><li>Hierarchical technique to divide area into rectangular cells </li></ul><ul><li>Grid data structure contains summary information about each cell </li></ul><ul><li>Hierarchical clustering technique </li></ul><ul><li>Similar to quad tree </li></ul>
  21. 21. STING
  22. 22. STING Build Algorithm
  23. 23. STING Algorithm
  24. 24. Spatial Rules <ul><li>Characteristic Rule </li></ul><ul><li>The average family income in Dallas is $50,000. </li></ul><ul><li>Discriminant Rule </li></ul><ul><li>The average family income in Dallas is $50,000, while in Plano the average income is $75,000. </li></ul><ul><li>Association Rule </li></ul><ul><li>The average family income in Dallas for families living near White Rock Lake is $100,000. </li></ul>
  25. 25. Spatial Association Rules <ul><li>Either antecedent or consequent must contain spatial predicates. </li></ul><ul><li>View underlying database as set of spatial objects. </li></ul><ul><li>May create using a type of progressive refinement </li></ul>
  26. 26. Spatial Association Rule Algorithm
  27. 27. Spatial Classification <ul><li>Partition spatial objects </li></ul><ul><li>May use nonspatial attributes and/or spatial attributes </li></ul><ul><li>Generalization and progressive refinement may be used. </li></ul>
  28. 28. ID3 Extension <ul><li>Neighborhood Graph (Γράφοι γειτνίασης) </li></ul><ul><ul><li>Nodes – objects </li></ul></ul><ul><ul><li>Edges – connect neighbors </li></ul></ul><ul><li>Definition of neighborhood varies (απόσταση μικρότερη κάποιου κατωφλίου, ικανοποίηση μιας τοπολογικής σχέσης μεταξύ των αντικειμένων, κ.α) </li></ul><ul><li>ID3 considers nonspatial attributes of all objects in a neighborhood (not just one) for classification. </li></ul>
  29. 29. Spatial Decision Tree <ul><li>Approach similar to that used for spatial association rules. </li></ul><ul><li>Spatial objects can be described based on objects close to them – Buffer (ενδιάμεση ζώνη) . </li></ul><ul><li>Description of class based on aggregation of nearby objects. </li></ul>
  30. 30. Spatial Decision Tree Algorithm
  31. 31. Spatial Clustering <ul><li>Detect clusters of irregular shapes </li></ul><ul><li>Use of centroids and simple distance approaches may not work well. </li></ul><ul><li>Clusters should be independent of order of input. </li></ul>
  32. 32. Spatial Clustering
  33. 33. CLARANS Extensions <ul><li>Remove main memory assumption of CLARANS. </li></ul><ul><li>Use spatial index techniques. </li></ul><ul><li>Use sampling by employing R*-trees to identify central objects. </li></ul><ul><li>Change cost calculations by reducing the number of objects examined </li></ul><ul><ul><li>Αντί να εξετάζεται όλη η βάση, εξετάζονται μόνο τα αντικείμενα στις συστάδες που επηρεάζονται κατά την αλλαγή ενός medoid. </li></ul></ul><ul><li>Η ανάκτηση των αντικειμένων σε μια δοθείσα συστάδα βασίζεται στην κατασκευή ενός διαγράμματος Voronoi </li></ul>
  34. 34. Voronoi
  35. 35. SD(CLARANS) <ul><li>Spatial Dominant ( SD ) </li></ul><ul><li>Πρώτα συσταδοποιεί τις χωρικές συνιστώσες και έπειτα εξετάζει τα μη χωρικά γνωρίσματα εντός κάθε συστάδας για να εξάγει την περιγραφή της </li></ul><ul><li>First clusters spatial components using CLARANS </li></ul><ul><li>Then iteratively replaces medoids, but limits number of pairs to be searched. </li></ul><ul><li>Uses generalization </li></ul><ul><li>Uses a learning to derive description of cluster. </li></ul>
  36. 36. SD(CLARANS) Algorithm
  37. 37. DBCLASD <ul><li>Distribution Based Clustering of LArge Spatial Databases </li></ul><ul><li>Extension of DBSCAN </li></ul><ul><li>Assumes items in cluster are uniformly distributed. </li></ul><ul><li>Identifies distribution satisfied by distances between nearest neighbors. </li></ul><ul><li>Objects added if distribution is uniform. </li></ul>
  38. 38. DBCLASD Algorithm
  39. 39. Aggregate Proximity (Συναθροιστική Εγγύτητα) <ul><li>Aggregate Proximity – measure of how close a cluster is to a feature. </li></ul><ul><li>Aggregate proximity relationship finds the k closest features to a cluster. </li></ul><ul><li>The CRH Algorithm – uses different shapes: </li></ul><ul><ul><li>Encompassing C ircle </li></ul></ul><ul><ul><li>Isothetic R ectangle </li></ul></ul><ul><ul><li>Convex H ull </li></ul></ul><ul><li>Μια προσέγγιση φιλτραρίσματος των χαρακτηριστικών που χρησιμοποιεί πρώτα τον περικλείοντα κύκλο, μετά το ισοθετικό ορθογώνιο και τέλος το κυρτό περίβλημα </li></ul>
  40. 40. CRH

×