Spatial Data Mining

    Spatial Data Mining: Presentation Transcript

    • Hauptseminar: Context-based Presentation of Information for Mobile Users
      Prof. Dr. Uwe Baumgarten, Prof. Gudrun Klinker, Ph.D., Prof. Dr. Donald Kossmann
      Spatial Data Mining
      Elmar Witte, witte@in.tum.de
      Dec. 19, 2001
    • Presentation overview
      • Introduction: Spatial Data Structures
      • Database Primitives for Spatial Data Mining
        - Overview
        - Neighborhood Relations
      • Knowledge Discovery in Spatial Databases
        - Overview
        - Generalization
        - Clustering
        - Association Analysis
      • Conclusion
    • Introduction: Spatial Data Structures
      Features of Spatial Data Structures (1)
      • Spatial (German: räumlich) data mining means the discovery of knowledge in spatial databases (similar but not identical to relational data mining).
      • Spatial databases store huge amounts of spatial data.
      • Spatial data contain some geometrical information:
        - objects are defined by points, lines, polygons
        - objects described by spatial data have an area or volume
      (from [5])
    • Introduction: Spatial Data Structures
      Features of Spatial Data Structures (2)
      • Spatial data is stored in spatial databases. Multidimensional trees are used to build indices for these data (e.g. quad trees, k-d trees, R-trees, R*-trees).
      • Attributes of spatial objects are often still one-dimensional, so this non-spatial part can be stored in relational databases with references to the spatial data.
      • Note: Spatial operations like spatial join and map overlay are the most expensive. There are efficient algorithms that handle these problems.
    • Database Primitives for Spatial Data Mining
      Overview
      • Rules
        - spatial characteristic rule: general description of spatial data
        - spatial discriminant rule: description of features discriminating or contrasting a class of spatial data from another class
        - spatial association rule: description of the implications one set of features has for another set of features
      • Thematic Maps
        Presentation of a spatial distribution of a few attributes. There are two different types: raster and vector.
    • Database Primitives for Spatial Data Mining
      Neighborhood Relations
      The major difference between mining in relational databases and mining in spatial databases is that attributes of the neighbors of some object of interest may have an influence on the object itself.
      Various factors of mutual influence:
      • topology
      • distance
      • direction
      Generic representation of spatial objects: sets of points
        Point p = (p1, p2, ..., pd)
        Spatial object O ∈ 2^Points
    • Database Primitives for Spatial Data Mining
      Neighborhood Relations (Topological Relations)
      Topological relations are invariant under topological transformations, like rotation, translation or scaling.
      Topological relations between A and B:
      • A disjoint B
      • A meets B
      • A overlaps B
      • A equals B
      • A covers B, B covered-by A
      • A contains B, B inside A
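To illustrate how the eight relations partition the possible configurations of two objects, the following sketch classifies a pair of closed 1-D intervals. Using intervals as a stand-in for general spatial objects is a simplifying assumption made here, and the function name `topological_relation` and the tuple encoding are hypothetical; the slides do not prescribe an implementation.

```python
def topological_relation(a, b):
    """Classify two closed intervals a = (a1, a2) and b = (b1, b2).

    Assumes a1 < a2 and b1 < b2. Returns the name r such that "A r B" holds.
    Intervals are only a 1-D stand-in for general spatial objects.
    """
    a1, a2 = a
    b1, b2 = b
    if (a1, a2) == (b1, b2):
        return "equals"
    if a2 < b1 or b2 < a1:
        return "disjoint"       # no common point at all
    if a2 == b1 or b2 == a1:
        return "meets"          # only a boundary point is shared
    if a1 < b1 and b2 < a2:
        return "contains"       # B lies strictly in the interior of A
    if b1 < a1 and a2 < b2:
        return "inside"         # A lies strictly in the interior of B
    if a1 <= b1 and b2 <= a2:
        return "covers"         # B lies within A and touches A's boundary
    if b1 <= a1 and a2 <= b2:
        return "covered-by"     # A lies within B and touches B's boundary
    return "overlaps"           # interiors intersect, neither includes the other
```

Exactly one branch fires for any pair of intervals, which mirrors the idea that the topological relations are mutually exclusive and jointly cover all cases.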
    • Database Primitives for Spatial Data Mining
      Neighborhood Relations (Distance Relations)
      Let dist be a distance function, let σ be one of the arithmetic predicates <, > or =, let c be a real number and let O1 and O2 be spatial objects, i.e. O1, O2 ∈ 2^Points. Then a distance relation O1 distance_σc O2 holds iff dist(O1, O2) σ c.
      Distance relations between A and B (examples from the figure):
      • A distance_=0 B
      • A distance_=c B
      • A distance_<c B
      Notes:
      • There is a distinction between source objects O1 and destination objects O2.
      • One representative point rep(O1) of the source object (e.g. the center of the object) is compared to all points of the destination object.
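The distance relation above can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: it assumes that rep(O1) is the centroid of the source object's points, that dist is the Euclidean distance from rep(O1) to the closest point of O2, and that objects are finite sets of coordinate tuples. The names `rep`, `dist` and `distance_relation` are chosen for this example.

```python
import math
import operator

# Arithmetic predicates sigma allowed in a distance relation.
PREDICATES = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

def rep(obj):
    """Representative point of an object: here its centroid (an assumption)."""
    pts = list(obj)
    dim = len(pts[0])
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))

def dist(o1, o2):
    """Distance from rep(o1) to the closest point of o2 (assumed semantics)."""
    r = rep(o1)
    return min(math.dist(r, p) for p in o2)

def distance_relation(o1, o2, sigma, c):
    """True iff dist(o1, o2) sigma c holds."""
    return PREDICATES[sigma](dist(o1, o2), c)

# Hypothetical objects: a square "school" and a two-point "park".
school = frozenset({(0, 0), (2, 0), (0, 2), (2, 2)})   # rep(school) = (1.0, 1.0)
park = frozenset({(4, 1), (5, 1)})

print(dist(school, park))                         # 3.0
print(distance_relation(school, park, "<", 5))    # True
```

Comparing floating-point distances with `=` is fragile in practice; a tolerance would usually be added.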
    • Database Primitives for Spatial Data Mining
      Neighborhood Relations (Direction Relations)
      The representative point of an object is used as the origin of a virtual coordinate system, and its quadrants define the directions.
      Direction relations:
      • 9 relations (north, east, south, west, northeast, northwest, southeast, southwest, any_direction)
      • For each pair of spatial objects at least one relation holds.
      • The direction relation between two objects may not be unique.
      • The smallest direction relation (in terms of a partial order) is called the exact direction relation.
      [Figure: objects B, C and D placed around rep(A). Labels: B north A; C east A; D east A, D south A, D southeast A]
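A small sketch may help show why several direction relations can hold at once and how the exact one is picked. The slide does not spell out the formal definitions, so the ones used here are assumptions: B north A holds iff every point of B has a y-coordinate at or above that of rep(A), analogously for the other axes, and the diagonal relations are conjunctions. rep(A) is taken to be the centroid, and all names are hypothetical.

```python
def rep(obj):
    """Representative point of a 2-D object: here its centroid (an assumption)."""
    pts = list(obj)
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))

# For each relation, the strictly more general relations in the partial order.
MORE_GENERAL = {
    "north": {"any_direction"}, "south": {"any_direction"},
    "east": {"any_direction"}, "west": {"any_direction"},
    "northeast": {"north", "east", "any_direction"},
    "northwest": {"north", "west", "any_direction"},
    "southeast": {"south", "east", "any_direction"},
    "southwest": {"south", "west", "any_direction"},
    "any_direction": set(),
}

def direction_relations(b, a):
    """All relations r for which 'B r A' holds under the assumed definitions."""
    rx, ry = rep(a)
    n = all(y >= ry for _, y in b)
    s = all(y <= ry for _, y in b)
    e = all(x >= rx for x, _ in b)
    w = all(x <= rx for x, _ in b)
    flags = {
        "north": n, "south": s, "east": e, "west": w,
        "northeast": n and e, "northwest": n and w,
        "southeast": s and e, "southwest": s and w,
        "any_direction": True,   # holds for every pair of objects
    }
    return {name for name, ok in flags.items() if ok}

def exact_direction(b, a):
    """Smallest relation(s): those not implied by a more specific one that holds."""
    held = direction_relations(b, a)
    implied = set().union(*(MORE_GENERAL[r] for r in held))
    return held - implied

A = {(0, 0), (2, 2)}        # rep(A) = (1.0, 1.0)
D = {(3, -1), (4, -2)}      # lies below and to the right of rep(A)
print(sorted(direction_relations(D, A)))  # ['any_direction', 'east', 'south', 'southeast']
print(exact_direction(D, A))              # {'southeast'}
```

In this example D satisfies east, south and southeast at the same time, matching the three labels attached to D in the slide's figure, and southeast is the exact relation.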
    • Database Primitives for Spatial Data Mining
      Neighborhood Relations
      There are different kinds of neighborhood relations:
      • topological relations
      • distance relations
      • direction relations
      These relations can be combined by the logical operators ∧ (and) and ∨ (or). The result is called a complex neighborhood relation.
    • Knowledge Discovery in Spatial Databases
      Overview
      There are different methods for discovering knowledge:
      • generalization-based methods for mining spatial characteristic and discriminant rules
      • aggregate proximity technique for finding characteristics of spatial clusters
      • two-step spatial computation technique for mining spatial association rules
    • Knowledge Discovery in Spatial Databases
      Generalization-Based Knowledge Discovery
      • Generalization means a reduction of attribute values to a certain (small) set of categories (→ concept hierarchy).
      • This reduction often requires the existence of background knowledge.
      • Two algorithms:
        - spatial-data-dominant generalization
        - non-spatial-data-dominant generalization
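As a rough illustration of generalization along a concept hierarchy, the sketch below replaces attribute values by their parent concept until at most a given number of distinct values remains. The hierarchy contents, the function name `generalize` and the level-by-level climbing strategy are all assumptions made for this example; the slides only state that values are reduced to a small set of categories up to a generalization threshold.

```python
from collections import Counter

# Hypothetical concept hierarchy given as child -> parent (background knowledge).
HIERARCHY = {
    "wheat": "grain", "corn": "grain",
    "apple": "fruit", "pear": "fruit",
    "grain": "food crop", "fruit": "food crop",
}

def generalize(values, hierarchy, threshold):
    """Climb the hierarchy one level at a time until at most
    `threshold` distinct values remain (or nothing can be lifted further)."""
    values = list(values)
    while len(set(values)) > threshold:
        lifted = [hierarchy.get(v, v) for v in values]
        if lifted == values:      # already at the top of the hierarchy
            break
        values = lifted
    return Counter(values)

land_use = ["wheat", "corn", "apple", "pear", "corn"]
print(generalize(land_use, HIERARCHY, threshold=2))  # Counter({'grain': 3, 'fruit': 2})
print(generalize(land_use, HIERARCHY, threshold=1))  # Counter({'food crop': 5})
```

The counts attached to each generalized value are what a characteristic rule would summarize.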
    • Knowledge Discovery in Spatial Databases
      Spatial-Data-Dominant Generalization
      • Algorithm
        - Collect all data described by the query.
        - Perform generalization first on the spatial data until the generalization threshold is reached.
        - Retrieve and analyze non-spatial data for each spatial object.
      • Computational complexity is O(N log N), where N is the number of spatial objects.
    • Knowledge Discovery in Spatial Databases
      Non-Spatial-Data-Dominant Generalization
      • Algorithm
        - Collect all data described by the query.
        - Perform generalization on the non-spatial attributes until the generalization threshold is reached.
        - Merge together neighboring areas with the same generalized attributes.
      • Computational complexity is O(N log N), where N is the number of spatial objects.
    • Knowledge Discovery in Spatial Databases
      Generalization-Based Knowledge Discovery
      Problems of generalization:
      • Hierarchies may not be present a priori.
      • The quality of mined characteristic rules depends heavily on the given concept hierarchies.
      Solution: an algorithm that does not depend on spatial concept hierarchies.
    • Knowledge Discovery in Spatial Databases
      Spatial Clustering
      • Clustering means dividing all objects into different groups (clusters) so that all members of a cluster are as similar as possible, whereas the members of different clusters differ as much as possible from each other.
      • In the spatial context, objects are compared by their distance.
      • The central object of a group is called the medoid. Non-medoid objects belong to the nearest medoid object.
      Clustering algorithms (basic concepts):
      1. Pick k medoids randomly and determine the nearest medoid for all other objects.
      2. Find better medoids among the non-medoid objects, so that the overall distance decreases.
      3. Swap in the newly found medoid and reallocate the non-medoid objects.
      4. Repeat from step 1 or 2, or stop.
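The basic medoid-swapping loop described above can be sketched as follows. This is a simplified greedy variant written for illustration, assuming Euclidean distance and points given as coordinate tuples; it accepts any improving swap rather than searching for the best one, so it is not a faithful implementation of any of the named algorithms. The function names are hypothetical.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def k_medoids(points, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(points, k)              # step 1: random initial medoids
    cost = total_cost(points, medoids)
    improved = True
    while improved:                              # step 4: repeat until no swap helps
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost:            # steps 2-3: better medoid found, swap
                    medoids, cost = trial, trial_cost
                    improved = True
    clusters = {m: [] for m in medoids}          # assign objects to nearest medoid
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return clusters, cost

# Two well-separated hypothetical groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, cost = k_medoids(points, k=2)
print(sorted(clusters))   # [(0, 0), (10, 10)]
print(cost)               # 4.0
```

Each full pass evaluates every (medoid, non-medoid) pair and recomputes the total cost, which is what makes this style of search expensive on large data sets.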
    • Knowledge Discovery in Spatial Databases
      Spatial Clustering: Algorithms (1)
      Examples of clustering algorithms:
      • PAM (Partitioning Around Medoids)
        - always search for the best partner for an exchange
      • CLARA (Clustering LARge Applications)
        - use PAM on a random subset
        - iterate pass
      • CLARANS (Clustering Large Applications based on RANdomized Search)
        - exchange immediately if possible
        - iterate pass (parameterized)
      In experiments CLARANS turned out to be the most efficient.
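The CLARANS idea of exchanging immediately and parameterizing the number of passes can be sketched like this. It is a minimal illustration under assumptions: Euclidean distance, points as coordinate tuples, more points than medoids, and the two parameters named `numlocal` (number of restarts) and `maxneighbor` (consecutive failed random swaps before a restart ends). The slides do not name the parameters, so treat these names and defaults as choices made for the example.

```python
import math
import random

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def clarans(points, k, numlocal=2, maxneighbor=20, seed=0):
    """Randomized medoid search; requires len(points) > k."""
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for _ in range(numlocal):                    # parameterized number of passes
        current = rng.sample(points, k)          # random starting medoids
        cost = total_cost(points, current)
        failures = 0
        while failures < maxneighbor:
            i = rng.randrange(k)                 # medoid to replace
            cand = rng.choice([p for p in points if p not in current])
            neighbor = current[:i] + [cand] + current[i + 1:]
            neighbor_cost = total_cost(points, neighbor)
            if neighbor_cost < cost:             # exchange immediately if better
                current, cost = neighbor, neighbor_cost
                failures = 0
            else:
                failures += 1
        if cost < best_cost:                     # keep the best local result
            best, best_cost = current, cost
    return best, best_cost

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = clarans(points, k=2)
```

Unlike an exhaustive swap search, only a random sample of neighboring medoid sets is examined, so the result is a local optimum that depends on the seed and on both parameters.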
    • Knowledge Discovery in Spatial Databases
      Spatial Clustering: Algorithms (2)
      Based upon CLARANS there are two spatial data mining algorithms:
      • SD(CLARANS): spatial dominant CLARANS
        - Clustering of all spatial attributes using CLARANS
        - Generalization of non-spatial attributes for each object
      • NSD(CLARANS): non-spatial dominant CLARANS
        - Generalization of non-spatial attributes
        - For each generalized tuple, spatial data is collected and clustered using CLARANS
        - If possible, merge clusters
    • Knowledge Discovery in Spatial Databases
      Spatial Association Analysis
      Spatial association rule (X, Y sets of spatial or non-spatial predicates, c% confidence):
        X → Y (c%)
      Example:
        is_a(x, school) → close_to(x, park) (80%)
        ⇔ "80% of schools are close to parks"
      • Only association rules that apply to a high percentage of objects are interesting (minimum support threshold).
      • Only association rules that have a high confidence are interesting (minimum confidence threshold).
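Support and confidence for a rule such as the school/park example can be computed directly from a set of objects. The sketch below uses the common definitions (support: fraction of all objects satisfying both X and Y; confidence: fraction of objects satisfying X that also satisfy Y). The data set, the 500 m closeness threshold and the field names are invented purely for illustration.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    matches_x = [o for o in objects if antecedent(o)]
    matches_xy = [o for o in matches_x if consequent(o)]
    support = len(matches_xy) / len(objects)
    confidence = len(matches_xy) / len(matches_x) if matches_x else 0.0
    return support, confidence

# Hypothetical objects: kind and distance to the nearest park in metres.
objects = [
    {"kind": "school", "park_dist": 120},
    {"kind": "school", "park_dist": 300},
    {"kind": "school", "park_dist": 80},
    {"kind": "school", "park_dist": 450},
    {"kind": "school", "park_dist": 2500},
    {"kind": "shop", "park_dist": 900},
    {"kind": "shop", "park_dist": 100},
    {"kind": "office", "park_dist": 1500},
    {"kind": "office", "park_dist": 700},
    {"kind": "cinema", "park_dist": 3000},
]

def is_school(o):
    return o["kind"] == "school"

def close_to_park(o):
    return o["park_dist"] <= 500          # assumed closeness threshold

support, confidence = rule_stats(objects, is_school, close_to_park)
print(support, confidence)   # 0.4 0.8
```

A rule would be reported only if both values exceed the chosen minimum support and minimum confidence thresholds.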
    • Conclusion
      Lessons learned
      • Spatial data mining extends relational data mining with respect to special features of spatial data, like the mutual influence of neighboring objects through certain factors (topology, distance, direction).
      • Spatial data mining is based on techniques like generalization, clustering and mining association rules.
      • Some algorithms require further expert knowledge that cannot be mined from the data, like concept hierarchies.
    • Conclusion
      Effects on Active Campus Garching
      • Need for data mining, especially association rules
      • Need for efficient algorithms
      • Not only spatial but also temporal, or better spatial-temporal, data mining will be crucial.
      • Combination of different concepts (especially from the last three talks).
    • Bibliography
      [1] Agrawal, Imielinski, Swami. Mining Association Rules between Sets of Items in Large Databases.
      [2] Ester, Frommelt, Kriegel, Sander. Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support.
      [3] Schober. Seminararbeit: Spatial Data Mining.
      [4] Knecht. Data Mining Verfahren: Übersicht.
      [5] Koperski, Han, Adhikary. Spatial Data Mining.
      The end. Merry Christmas!