Spatial Data Mining

Transcript

  • 1. Spatial Data Mining. Page 1 / 30, Dec. 19, 2001. Elmar Witte, witte@in.tum.de. Hauptseminar (advanced seminar) "Context-based Presentation of Information for Mobile Users": Prof. Dr. Uwe Baumgarten, Prof. Gudrun Klinker, Ph.D., Prof. Dr. Donald Kossmann.
  • 2. Presentation overview
      • Introduction: Spatial Data Structures
      • Database Primitives for Spatial Data Mining
          - Overview
          - Neighborhood Relations
      • Knowledge Discovery in Spatial Databases
          - Overview
          - Generalization
          - Clustering
          - Association Analysis
      • Conclusion
  • 4. Features of Spatial Data Structures (1) (Introduction: Spatial Data Structures)
      • Spatial (German: räumlich) data mining means the discovery of knowledge in spatial databases (similar, but not identical, to relational data mining).
      • Spatial databases store (huge amounts of) spatial data.
      • Spatial data contains geometrical information:
          - objects are defined by points, lines and polygons
          - objects described by spatial data have an area or a volume
      Source: [5]
  • 5. Features of Spatial Data Structures (2) (Introduction: Spatial Data Structures)
      • Spatial data is stored in spatial databases. Multidimensional trees are used to build indices for this data (e.g. quadtrees, k-d trees, R-trees, R*-trees).
      • The attributes of spatial objects are often still one-dimensional, so this non-spatial part can be stored in relational databases with references to the spatial data.
      • Note: Spatial operations like spatial join and map overlay are the most expensive operations. There are efficient algorithms that handle these problems.
  • 7. Overview (Database Primitives for Spatial Data Mining)
      • Rules
          - spatial characteristic rule: a general description of spatial data
          - spatial discriminant rule: a description of the features that discriminate or contrast one class of spatial data from another class
          - spatial association rule: a description of how one set of features implies another set of features
      • Thematic maps: presentation of the spatial distribution of a few attributes. There are two types: raster and vector.
  • 9. Neighborhood Relations (Database Primitives for Spatial Data Mining)
      The major difference between mining in relational databases and mining in spatial databases is that attributes of the neighbors of some object of interest may have an influence on the object itself.
      Various factors of mutual influence:
      • topology
      • distance
      • direction
      Generic representation of spatial objects: sets of points.
      Point p = (p_1, p_2, ..., p_d)
      Spatial object O ∈ 2^Points (the power set of Points)
  • 10. Neighborhood Relations (Topological Relations) (Database Primitives for Spatial Data Mining)
      Topological relations are invariant under topological transformations, like rotation, translation or scaling.
      Topological relations between A and B:
      • A disjoint B
      • A meets B
      • A overlaps B
      • A equals B
      • A contains B (B inside A)
      • A covers B (B covered-by A)
  • 11. Neighborhood Relations (Distance Relations) (Database Primitives for Spatial Data Mining)
      Let dist be a distance function, let σ be one of the arithmetic predicates <, > or =, let c be a real number and let O1 and O2 be spatial objects, i.e. O1, O2 ∈ 2^Points. Then the distance relation O1 distance_{σ c} O2 holds iff dist(O1, O2) σ c.
      Distance relations between A and B shown in the figure: A distance_{= 0} B, A distance_{= c} B, A distance_{< c} B.
      • There is a distinction between source objects O1 and destination objects O2.
      • One representative point rep(O1) of the source object (e.g. the center of the object) is compared to all points of the destination object.
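The distance relation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the definition used in [2]: objects are sets of 2-D points, rep() is taken to be the centroid, and dist() is taken to be the distance from rep(O1) to the nearest point of O2. The slide only says that rep(O1) is compared to all points of the destination object, so the use of the minimum is an assumption.

```python
import math
import operator

# Assumption: sigma is one of "<", ">", "="; equality is tested with a
# floating-point tolerance instead of exact comparison.
PREDICATES = {"<": operator.lt, ">": operator.gt, "=": math.isclose}


def rep(obj):
    """Representative point of an object: here its centroid (an assumption)."""
    n = len(obj)
    return tuple(sum(p[i] for p in obj) / n for i in range(2))


def dist(source, dest):
    """Distance from rep(source) to the nearest point of dest (an assumption)."""
    r = rep(source)
    return min(math.dist(r, q) for q in dest)


def distance_relation(source, dest, sigma, c):
    """True iff dist(source, dest) sigma c holds."""
    return PREDICATES[sigma](dist(source, dest), c)


# Hypothetical objects: a square building and a park described by two points.
building = {(0, 0), (2, 0), (0, 2), (2, 2)}
park = {(4, 1), (6, 1)}
print(distance_relation(building, park, "<", 5))
```

Because the source object is reduced to one point while the destination keeps all of its points, the relation is not symmetric: swapping the arguments can change the result.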
  • 12. Neighborhood Relations (Direction Relations) (Database Primitives for Spatial Data Mining)
      The representative point of an object is used as the origin of a virtual coordinate system, and its quadrants define the directions.
      Direction relations:
      • 9 relations (north, east, south, west, northeast, northwest, southeast, southwest, any_direction)
      • For each pair of spatial objects at least one relation holds.
      • The direction relation between two objects may not be unique.
      • The smallest direction relation (in terms of a partial order) is called the exact direction relation.
      Examples from the figure, relative to rep(A): B north A; C east A; D east A, D south A and D southeast A.
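A small Python sketch can make the "not unique" and "exact direction" points concrete. The slides do not spell out the formal definitions, so the ones used here are assumptions: B north A holds if every point of B lies at or above rep(A), B northeast A holds if B north A and B east A both hold, and any_direction always holds. The partial order is assumed to rank each diagonal relation below its two components and everything below any_direction.

```python
# Assumption: 2-D points, with x growing to the east and y to the north.
def direction_relations(rep_a, b):
    """Return the names of all direction relations 'B <rel> A' that hold."""
    rx, ry = rep_a
    north = all(y >= ry for _, y in b)
    south = all(y <= ry for _, y in b)
    east = all(x >= rx for x, _ in b)
    west = all(x <= rx for x, _ in b)
    checks = {
        "north": north, "south": south, "east": east, "west": west,
        "northeast": north and east, "northwest": north and west,
        "southeast": south and east, "southwest": south and west,
        "any_direction": True,  # holds for every pair of objects
    }
    return {name for name, holds in checks.items() if holds}


# Assumed partial order: each diagonal relation implies its two components.
IMPLIES = {
    "northeast": {"north", "east"}, "northwest": {"north", "west"},
    "southeast": {"south", "east"}, "southwest": {"south", "west"},
}


def exact_directions(rep_a, b):
    """Return the smallest holding relation(s) under the assumed partial order."""
    holding = direction_relations(rep_a, b) - {"any_direction"}
    implied = set()
    for name in holding:
        implied |= IMPLIES.get(name, set())
    return (holding - implied) or {"any_direction"}


# Hypothetical object D, lying entirely in the southeast quadrant of rep(A).
d = {(2, -1), (3, -2)}
print(sorted(direction_relations((0, 0), d)))
print(exact_directions((0, 0), d))
```

Under these assumptions, object D satisfies east, south and southeast at once, which matches the three relations the slide lists for D; southeast is the smallest of them.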
  • 13. Neighborhood Relations (Database Primitives for Spatial Data Mining)
      There are different kinds of neighborhood relations:
      • topological relations
      • distance relations
      • direction relations
      These relations can be combined by the logical operators ∧ (and) and ∨ (or). The result is called a complex neighborhood relation.
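Combining relations with ∧ and ∨ maps naturally onto predicate combinators. The following sketch assumes that a neighborhood relation is simply a Python function taking two objects (sets of 2-D points) and returning a bool. The elementary relations `disjoint`, `within` and `north_of` are simplified, hypothetical stand-ins, not the definitions from [2].

```python
import math


def conj(*relations):
    """Complex relation that holds iff every given relation holds (logical and)."""
    return lambda a, b: all(rel(a, b) for rel in relations)


def disj(*relations):
    """Complex relation that holds iff at least one given relation holds (logical or)."""
    return lambda a, b: any(rel(a, b) for rel in relations)


def disjoint(a, b):
    """Stand-in topological relation: the two point sets share no point."""
    return not (a & b)


def within(c):
    """Stand-in distance relation: some pair of points is closer than c."""
    return lambda a, b: min(math.dist(p, q) for p in a for q in b) < c


def north_of(a, b):
    """Stand-in direction relation: b lies at or above the top of a."""
    top = max(y for _, y in a)
    return all(y >= top for _, y in b)


near_but_separate = conj(disjoint, within(5.0))
near_or_north = disj(within(1.0), north_of)

house = {(0, 0), (1, 0)}   # hypothetical objects
park = {(1, 3), (2, 3)}
print(near_but_separate(house, park), near_or_north(house, park))
```

Since `conj` and `disj` return ordinary relations, they can be nested to build arbitrarily deep complex neighborhood relations.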
  • 15. Overview (Knowledge Discovery in Spatial Databases)
      There are different methods for discovering knowledge:
      • generalization-based methods for mining spatial characteristic and discriminant rules
      • aggregate proximity technique for finding characteristics of spatial clusters
      • two-step spatial computation technique for mining spatial association rules
  • 17. Generalization-Based Knowledge Discovery (Knowledge Discovery in Spatial Databases)
      • Generalization means a reduction of attribute values to a certain (small) set of categories (→ concept hierarchy).
      • This reduction often requires the existence of background knowledge.
      • Two algorithms:
          - spatial-data-dominant generalization
          - non-spatial-data-dominant generalization
  • 18. Spatial-Data-Dominant Generalization (Knowledge Discovery in Spatial Databases)
      • Algorithm
          1. Collect all data described by the query.
          2. Perform generalization first on the spatial data until the generalization threshold is reached.
          3. Retrieve and analyze the non-spatial data for each spatial object.
      • Computational complexity is O(N log N), where N is the number of spatial objects.
  • 19. Non-Spatial-Data-Dominant Generalization (Knowledge Discovery in Spatial Databases)
      • Algorithm
          1. Collect all data described by the query.
          2. Perform generalization on the non-spatial attributes until the generalization threshold is reached.
          3. Merge neighboring areas with the same generalized attributes.
      • Computational complexity is O(N log N), where N is the number of spatial objects.
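The non-spatial-data-dominant variant can be illustrated with a toy example. This is a simplified sketch and not the algorithm from [5]: the concept hierarchy, the region names and the adjacency lists are invented, all values climb the hierarchy one level at a time, and the threshold is taken to be the maximum number of distinct generalized values. It makes no attempt to reach the stated O(N log N) bound.

```python
# Hypothetical concept hierarchy: value -> more general concept.
HIERARCHY = {
    "corn": "grain", "wheat": "grain", "apples": "fruit", "pears": "fruit",
    "grain": "agriculture", "fruit": "agriculture",
}


def generalize(values, hierarchy, threshold):
    """Climb the hierarchy until at most `threshold` distinct values remain."""
    values = dict(values)
    while len(set(values.values())) > threshold:
        lifted = {key: hierarchy.get(v, v) for key, v in values.items()}
        if lifted == values:  # top of the hierarchy reached, give up
            break
        values = lifted
    return values


def merge_neighbors(values, adjacency):
    """Group adjacent regions that share the same generalized value."""
    seen, groups = set(), []
    for start in sorted(values):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first search over same-valued neighbors
            region = stack.pop()
            if region in seen:
                continue
            seen.add(region)
            group.add(region)
            stack.extend(n for n in adjacency.get(region, ())
                         if n not in seen and values[n] == values[start])
        groups.append((values[start], sorted(group)))
    return groups


# Hypothetical regions in a row: r1 - r2 - r3 - r4.
crops = {"r1": "corn", "r2": "wheat", "r3": "apples", "r4": "pears"}
adjacency = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2", "r4"], "r4": ["r3"]}
generalized = generalize(crops, HIERARCHY, threshold=2)
print(merge_neighbors(generalized, adjacency))
```

The quality of the merged regions depends entirely on the hierarchy passed in, which is the weakness the next slide points out.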
  • 20. Generalization-Based Knowledge Discovery (Knowledge Discovery in Spatial Databases)
      Problems of generalization:
      • Hierarchies may not be present a priori.
      • The quality of the mined characteristic rules depends heavily on the given concept hierarchies.
      Solution: an algorithm that does not depend on spatial concept hierarchies.
  • 22. Spatial Clustering (Knowledge Discovery in Spatial Databases)
      • Clustering means dividing all objects into groups (clusters) so that the members of a cluster are as similar as possible, whereas the members of different clusters differ as much as possible from each other.
      • In the spatial context, objects are compared by their distance.
      • The central object of a group is called the medoid. Each non-medoid object belongs to the nearest medoid.
      Clustering algorithms (basic concepts):
          1. Choose k medoids randomly and determine the nearest medoid for every other object.
          2. Find better medoids among the non-medoid objects, so that the overall distance decreases.
          3. Exchange the newly found medoid and reallocate the non-medoid objects.
          4. Repeat from step 1 or step 2, or stop.
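The four basic steps can be sketched as a small PAM-style loop. This is a minimal illustration, assuming 2-D points given as a list of tuples and Euclidean distance; the function names are invented, and the exhaustive search for the best exchange makes it suitable only for small inputs.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of the distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def assign(points, medoids):
    """Map each medoid to the list of points that are nearest to it."""
    clusters = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return clusters


def k_medoids(points, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)            # step 1: random medoids
    cost = total_cost(points, medoids)
    while True:
        best_cost, best_trial = cost, None
        for i in range(k):                     # step 2: search for the best exchange
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost:
                    best_cost, best_trial = trial_cost, trial
        if best_trial is None:                 # step 4: stop, no exchange helps
            break
        medoids, cost = best_trial, best_cost  # step 3: exchange and continue
    return medoids, assign(points, medoids)


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, clusters = k_medoids(points, k=2)
print(sorted(medoids))
```

Every accepted exchange strictly lowers the total distance, so the loop terminates; the result is a local optimum with respect to single exchanges.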
  • 23. Spatial Clustering: Algorithms (1) (Knowledge Discovery in Spatial Databases)
      Examples of clustering algorithms:
      • PAM (Partitioning Around Medoids)
          - always search for the best partner for an exchange
      • CLARA (Clustering LARge Applications)
          - use PAM on a random subset
          - repeat the pass
      • CLARANS (Clustering Large Applications based on RANdomized Search)
          - exchange immediately if possible
          - repeat the pass (parameterized)
      In experiments CLARANS turned out to be the most efficient.
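The difference between PAM and CLARANS is easiest to see in code: instead of searching for the best exchange, CLARANS tries random exchanges and accepts the first one that improves the cost. The sketch below is a simplified reading of that idea, not a faithful reimplementation of the published algorithm. The parameter names `num_local` (number of restarts) and `max_neighbor` (failed random exchanges tolerated before a restart) are assumptions standing in for the "parameterized" passes mentioned on the slide.

```python
import math
import random


def cost(points, medoids):
    """Sum of the distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def clarans(points, k, num_local=2, max_neighbor=20, seed=0):
    """Randomized medoid search; requires k < len(points)."""
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for _ in range(num_local):                 # repeat the pass
        current = rng.sample(points, k)
        current_cost = cost(points, current)
        failures = 0
        while failures < max_neighbor:
            i = rng.randrange(k)               # medoid to replace
            candidate = rng.choice([p for p in points if p not in current])
            neighbor = current[:i] + [candidate] + current[i + 1:]
            neighbor_cost = cost(points, neighbor)
            if neighbor_cost < current_cost:   # exchange immediately if possible
                current, current_cost = neighbor, neighbor_cost
                failures = 0
            else:
                failures += 1
        if current_cost < best_cost:           # keep the best local optimum found
            best, best_cost = current, current_cost
    return best, best_cost


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(clarans(points, k=2))
```

Because the search is randomized, the result is not guaranteed to be the best possible set of medoids; larger values of the two parameters trade running time for quality.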
  • 24. Spatial Clustering: Algorithms (2) (Knowledge Discovery in Spatial Databases)
      Based on CLARANS there are two spatial data mining algorithms:
      • SD(CLARANS): spatial dominant CLARANS
          - Clustering of all spatial attributes using CLARANS
          - Generalization of the non-spatial attributes for each object
      • NSD(CLARANS): non-spatial dominant CLARANS
          - Generalization of the non-spatial attributes
          - For each generalized tuple, spatial data is collected and clustered using CLARANS
          - Clusters are merged if possible
  • 26. Spatial Association Analysis (Knowledge Discovery in Spatial Databases)
      Spatial association rule (X, Y sets of spatial or non-spatial predicates, c% confidence): X → Y (c%)
      Example: is_a(x, school) → close_to(x, park) (80%) ⇔ "80% of schools are close to parks"
      • Only association rules that apply to a high percentage of objects are interesting (minimum support threshold).
      • Only association rules that have a high confidence are interesting (minimum confidence threshold).
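Support and confidence for the school/park example can be computed directly from a list of objects. This sketch assumes the usual definitions from association rule mining [1]: support is the fraction of all objects that satisfy both X and Y, and confidence is the fraction of the objects satisfying X that also satisfy Y. The object records and the thresholds are invented for illustration.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_x = sum(1 for o in objects if antecedent(o))
    n_xy = sum(1 for o in objects if antecedent(o) and consequent(o))
    support = n_xy / len(objects)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical objects: 5 schools (4 of them close to a park) and 5 shops.
objects = (
    [{"kind": "school", "close_to_park": True}] * 4
    + [{"kind": "school", "close_to_park": False}]
    + [{"kind": "shop", "close_to_park": False}] * 5
)


def is_school(o):
    return o["kind"] == "school"


def close_to_park(o):
    return o["close_to_park"]


support, confidence = rule_stats(objects, is_school, close_to_park)

# Hypothetical thresholds: the rule is kept only if it clears both.
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.7
interesting = support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE
print(support, confidence, interesting)
```

In a real spatial database the predicate close_to would itself be evaluated with a neighborhood relation, which is where most of the cost lies.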
  • 28. Lessons learned (Conclusion)
      • Spatial data mining extends relational data mining to account for special features of spatial data, such as the mutual influence of neighboring objects through certain factors (topology, distance, direction).
      • Spatial data mining is based on techniques like generalization, clustering and mining association rules.
      • Some algorithms require additional expert knowledge that cannot be mined from the data, such as concept hierarchies.
  • 29. Effects on Active Campus Garching (Conclusion)
      • Need for data mining, especially association rules
      • Need for efficient algorithms
      • Not only spatial but also temporal, or better yet spatio-temporal, data mining will be crucial.
      • Combination of different concepts (especially from the last three talks)
  • 30. Bibliography
      [1] Agrawal, Imielinski, Swami. Mining Association Rules between Sets of Items in Large Databases.
      [2] Ester, Frommelt, Kriegel, Sander. Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support.
      [3] Schober. Seminararbeit: Spatial Data Mining.
      [4] Knecht. Data Mining Verfahren: Übersicht.
      [5] Koperski, Han, Adhikary. Spatial Data Mining.
      Merry Christmas! The end.
