SpatialDataMining.ppt
Transcript

  • 1. CS599 Spatial &amp; Temporal Database <ul><li>Spatial Data Mining: Progress and Challenges </li></ul><ul><li>Survey Paper </li></ul><ul><li>appeared in DMKD96 </li></ul><ul><li>by Koperski, K., Adhikary, J. and Han, J. </li></ul><ul><li>Simon Fraser University, Canada </li></ul><ul><li>presented by Chung-hao Tan </li></ul><ul><li>Nov.16.2000 </li></ul>
  • 2. Outline <ul><li>What is data mining? </li></ul><ul><li>What is spatial data mining? </li></ul><ul><li>Generalization-based knowledge discovery. </li></ul><ul><li>Clustering-based analysis. </li></ul><ul><li>Exploring spatial association rules. </li></ul><ul><li>Mining in image databases. </li></ul><ul><li>Future directions &amp; conclusion. </li></ul>
  • 3. What Is Data Mining? <ul><li>A short definition: </li></ul><ul><li>“Extracting implicit knowledge from large amounts of data.” </li></ul><ul><li>The form of discovered knowledge: </li></ul><ul><ul><li>Regression and classification. </li></ul></ul><ul><ul><li>Association rules. </li></ul></ul><ul><ul><li>Clustering. </li></ul></ul><ul><li>What can be contributed by database research? </li></ul><ul><ul><li>Efficient data access methods (indexing). </li></ul></ul><ul><ul><li>Query optimizer. </li></ul></ul><ul><ul><li>Data integration. </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><ul><li>=&gt; Data Warehousing research provides a convenient platform for data mining. </li></ul></ul>
  • 4. An Example of Data Mining Technique <ul><li>Example: </li></ul><ul><ul><li>Data: </li></ul></ul><ul><ul><li>Stock trading data (price, size, number of trades, etc.). </li></ul></ul><ul><ul><li>Query: </li></ul></ul><ul><ul><li>Given the current and past trading information, can you tell me whether the price will go up or down in the next minute? </li></ul></ul><ul><ul><li>Method: </li></ul></ul><ul><ul><li>Bayesian CART model search (Chipman, 1997). </li></ul></ul><ul><ul><li>=&gt; try to find a classification or regression tree to model the data. </li></ul></ul><ul><ul><li>Result: </li></ul></ul><ul><ul><li>1. Reduce the misclassification rate from 53% to 30%. </li></ul></ul><ul><ul><li>2. Identify the important classification rules. </li></ul></ul><ul><ul><li>3. Identify the important variables (predictors). </li></ul></ul>
  • 5. An Example of Data Mining Technique (Cont.)
  • 6. An Example of Data Mining Technique (Cont.)
  • 7. What Is Spatial Data Mining? <ul><li>A short definition: </li></ul><ul><ul><li>Extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases. </li></ul></ul><ul><li>Benefits: </li></ul><ul><ul><li>Understand spatial data; query optimization. </li></ul></ul><ul><ul><li>Discover relationships between spatial data and non-spatial data. </li></ul></ul><ul><ul><li>Construction of spatial knowledge base (e.g. associations). </li></ul></ul><ul><li>Applications: </li></ul><ul><ul><li>GIS. </li></ul></ul><ul><ul><li>Image database exploration. </li></ul></ul><ul><ul><li>Robot navigation. </li></ul></ul><ul><ul><li>… (any applications which use spatial data). </li></ul></ul>
  • 8. Primitives of Spatial Data Mining <ul><li>Spatial characteristic rules: </li></ul><ul><ul><li>A general description of spatial data. </li></ul></ul><ul><ul><li>E.g. price range of houses in various regions. </li></ul></ul><ul><li>Spatial discriminant rules: </li></ul><ul><ul><li>A general description of how classes of spatial data differ from one another. </li></ul></ul><ul><ul><li>E.g. a comparison of price ranges of houses in various regions. </li></ul></ul><ul><li>Spatial association rules: </li></ul><ul><ul><li>Implication of one or a set of features by another set of features. </li></ul></ul><ul><ul><li>E.g. house is near the beach -&gt; house is expensive. </li></ul></ul>
  • 9. Primitives of Spatial Data Mining (Cont.) <ul><li>Thematic maps: </li></ul><ul><ul><li>Present the spatial distribution of a single or a few attributes. </li></ul></ul><ul><ul><li>E.g. Temperature thematic map. </li></ul></ul><ul><ul><li>Data are stored as raster or vector images. </li></ul></ul><ul><li>Image database: </li></ul><ul><ul><li>A special kind of spatial database where the data consist almost entirely of images or pictures (e.g. satellite or medical images). </li></ul></ul><ul><ul><li>These images have coordination properties. </li></ul></ul>
  • 10. Data Mining Architecture <ul><li>An example: (by Matheus, 1993) </li></ul>
  • 11. Mining By Statistical Methods <ul><li>Methods: </li></ul><ul><ul><li>Regression model. </li></ul></ul><ul><li>Disadvantages: </li></ul><ul><ul><li>Assume statistical independence among the spatially distributed data. </li></ul></ul><ul><ul><li>Need experts’ domain knowledge (in spatial data). </li></ul></ul><ul><ul><li>Cannot model non-linear rules or symbolic values very well. </li></ul></ul><ul><ul><li>Do not work well with incomplete or inconclusive data. </li></ul></ul>
  • 12. Generalization-based Method <ul><li>Ideas: </li></ul><ul><ul><li>Learning from examples. </li></ul></ul><ul><ul><li>Combined with generalization. </li></ul></ul><ul><li>Concept hierarchy: </li></ul><ul><ul><li>Explicitly given by the domain experts. </li></ul></ul><ul><ul><li>Higher levels are more general terms. </li></ul></ul><ul><li>Attribute-oriented induction: </li></ul><ul><ul><li>Performed by climbing the generalization hierarchies and summarizing the general relationships between spatial and non-spatial data at higher concept levels. </li></ul></ul><ul><ul><li>Continues until a generalization threshold is reached. </li></ul></ul>
  • 13. Spatial-data-dominant Generalization <ul><li>Ideas: </li></ul><ul><ul><li>First step: Spatial-oriented induction. </li></ul></ul><ul><ul><li>Merging spatial regions according to the spatial concept hierarchy. </li></ul></ul><ul><ul><li>Second step: Attribute-oriented induction. </li></ul></ul><ul><ul><li>Non-spatial data in each merged region are generalized to a given level set by the threshold. </li></ul></ul>
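
The hierarchy climbing in attribute-oriented induction can be illustrated with a minimal sketch. The concept hierarchy, the attribute values, and the function name `generalize` below are hypothetical and chosen only for illustration; the sketch assumes the generalization threshold is the maximum number of distinct values allowed for one attribute.

```python
from collections import Counter

# Hypothetical concept hierarchy (child -> parent), not taken from the slides.
HIERARCHY = {
    "corn": "grain", "wheat": "grain",
    "apple": "fruit", "pear": "fruit",
    "grain": "crop", "fruit": "crop",
}

def generalize(values, hierarchy, threshold):
    """Climb the hierarchy level by level until at most `threshold`
    distinct values remain, or until no value can climb any further."""
    current = list(values)
    while len(set(current)) > threshold:
        # Replace each value by its parent; top-level values stay as they are.
        climbed = [hierarchy.get(v, v) for v in current]
        if climbed == current:
            break  # the top of the hierarchy has been reached
        current = climbed
    # Count how many original tuples fall under each generalized value.
    return Counter(current)

summary = generalize(["corn", "wheat", "apple", "pear", "corn"], HIERARCHY, 2)
```

With a threshold of 2, the five values are summarized as three `grain` tuples and two `fruit` tuples; a threshold of 1 would climb one level further to `crop`.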
  • 14. Non-spatial-data-dominant Generalization <ul><li>Ideas: </li></ul><ul><ul><li>First step: Attribute-oriented induction. </li></ul></ul><ul><ul><li>Non-spatial data are generalized at a given level by the threshold. </li></ul></ul><ul><ul><li>Second step: Spatial-oriented induction. </li></ul></ul><ul><ul><li>Merging spatial regions which have the same non-spatial description. Ignore those small regions with different non-spatial descriptions but inside a large merged region. </li></ul></ul>
  • 15. Generalization-based Method (Cont.)
  • 16. Clustering-based Method <ul><li>Ideas: </li></ul><ul><ul><li>Clusters can be found without using any background knowledge. </li></ul></ul><ul><ul><li>Unsupervised learning. </li></ul></ul><ul><ul><li>Methods: </li></ul></ul><ul><ul><li>PAM: repeatedly look for a better set of k representatives by trying all possible swaps between a representative and a non-representative object. </li></ul></ul><ul><ul><li>CLARA: same as PAM, but run on a sample of the data instead of the full data set. </li></ul></ul><ul><ul><li>CLARANS: same as PAM, but draws a fresh random sample of candidate swaps at each iteration. </li></ul></ul>
  • 17. SD-CLARANS <ul><li>Ideas: </li></ul><ul><ul><li>First step: Spatial-oriented induction. </li></ul></ul><ul><ul><li>Spatial-relevant data are collected and clustered. </li></ul></ul><ul><ul><li>Second step: Attribute-oriented induction. </li></ul></ul><ul><ul><li>Find out the non-spatial description of objects in each cluster. </li></ul></ul>
  • 18. NSD-CLARANS <ul><li>Ideas: </li></ul><ul><ul><li>First step: Attribute-oriented induction. </li></ul></ul><ul><ul><li>Produce a number of generalized tuples. </li></ul></ul><ul><ul><li>Second step: Spatial-oriented induction. </li></ul></ul><ul><ul><li>For each such generalized tuple, all spatial components are collected and clustered. </li></ul></ul>
  • 19. Other Issues In Clustering <ul><li>Need a fast access method to the spatial data (e.g. R*-tree). </li></ul><ul><li>Focus on relevant data only. </li></ul><ul><li>Using CF tree (for example) to store clustered results: </li></ul><ul><ul><li>A tuple of data is incrementally inserted into the closest leaf node (a sub-cluster). </li></ul></ul><ul><ul><li>If the diameter of the sub-cluster exceeds a threshold after insertion, split that leaf node. </li></ul></ul><ul><ul><li>Each internal node contains a Clustering Feature (CF). </li></ul></ul><ul><ul><li>CF = (N, LS, SS). N: number of points in the sub-cluster. </li></ul></ul><ul><ul><li>LS: linear sum of the N points. </li></ul></ul><ul><ul><li>SS: square sum of the N points. </li></ul></ul><ul><ul><li>Linear scalability; insensitivity to the input order; good quality of clustering. </li></ul></ul>
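
The swap search behind PAM can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes 2-D points with Euclidean distance, starts from the first k points, and accepts any improving swap rather than always choosing the best one. The names `total_cost` and `pam` are made up for this sketch.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest representative."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam(points, k):
    """Swap representatives with non-representatives while the cost improves."""
    medoids = list(points[:k])  # assumption: naive initial choice
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing the i-th representative with the candidate.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids, best

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, cost = pam(points, 2)
```

Each pass evaluates every (representative, non-representative) pair against all points, which is why CLARA and CLARANS reduce the work by sampling.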
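
A small sketch of the Clustering Feature triple may help show why incremental insertion is cheap: two CFs merge by component-wise addition, and the centroid and diameter follow from (N, LS, SS) without revisiting the points. It assumes SS is the scalar sum of squared norms and that the diameter is the root of the average squared pairwise distance, as in the BIRCH formulation of CF trees; the class and method names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class CF:
    n: int       # N: number of points in the sub-cluster
    ls: tuple    # LS: per-dimension linear sum of the points
    ss: float    # SS: sum of squared norms of the points

    @classmethod
    def of_point(cls, p):
        return cls(1, tuple(p), float(sum(x * x for x in p)))

    def merge(self, other):
        """CFs are additive, so inserting a point is a constant-time update."""
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def diameter(self):
        """Root of the average squared pairwise distance, from the CF alone."""
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        d2 = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(d2, 0.0))  # guard against rounding below zero

cf = CF.of_point((0, 0)).merge(CF.of_point((2, 0)))
```

A leaf would compare `cf.diameter()` with the threshold after each insertion to decide whether to split.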
  • 20. Exploring Spatial Associations <ul><li>Example: </li></ul><ul><ul><li>Is_a(x, school) -&gt; close_to(x, park) 80%. </li></ul></ul><ul><ul><li>Topological relations: intersect, overlap, disjoint… </li></ul></ul><ul><ul><li>Spatial orientation: left_of, west_of… </li></ul></ul><ul><ul><li>Distance information: close_to, far_away… </li></ul></ul><ul><li>Minimum Support: </li></ul><ul><ul><li>Ignore rules that are backed by only a small number of cases. </li></ul></ul><ul><ul><li>E.g. ignore a relation that associates only 5% of the houses in an area with a single school. </li></ul></ul><ul><ul><li>Strong rule: A rule with large support (exceeds the minimum support threshold). </li></ul></ul><ul><li>Minimum Confidence: </li></ul><ul><ul><li>Filter out those rules with low confidence. </li></ul></ul><ul><ul><li>E.g. Ignore the relations X-&gt;Y with only 5% confidence. </li></ul></ul>
  • 21. Multi-level Spatial Association Rules <ul><li>Using tree to explore: </li></ul><ul><ul><li>Collect task-relevant data. </li></ul></ul><ul><ul><li>Computation starts at a high level of spatial predicates like close_to. </li></ul></ul><ul><ul><li>Utilize spatial indexing methods. </li></ul></ul><ul><ul><li>For those patterns that pass the filtering at the high levels, do further refinements at the lower levels, like adjacent_to, intersects, distance_less_than_x, etc. </li></ul></ul><ul><ul><li>Filter out those patterns that do not exceed the Minimum Support Threshold or the Minimum Confidence Threshold. </li></ul></ul><ul><ul><li>Derive the strong association rules! </li></ul></ul>
  • 22. Using Approximation and Aggregation <ul><li>Ideas: </li></ul><ul><ul><li>Instead of asking “where are the clusters in the spatial database?”, we want to know “what are the characteristics of the clusters in terms of the features that are close to them?” </li></ul></ul><ul><ul><li>E.g. “90% of the expensive houses in a cluster are close to a lake”. </li></ul></ul><ul><ul><li>Uses computational geometry concepts. </li></ul></ul><ul><ul><li>First step: Eliminate unnecessary features. </li></ul></ul><ul><ul><li>Second step: Calculate the aggregate proximity of points in the cluster to the convex boundary of each feature. </li></ul></ul><ul><ul><li>Experimental result: processing 50,000 features within 2 seconds. </li></ul></ul>
  • 23. Mining In Image Database <ul><li>Ideas: </li></ul><ul><ul><li>Mining useful information in image databases. </li></ul></ul><ul><ul><li>Example: Automatically identify volcanoes on the surface of Venus from images transmitted by the spacecraft. </li></ul></ul><ul><ul><li>Question: Is the above example related to spatial data mining research? </li></ul></ul>
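
Support and confidence for a rule such as Is_a(x, school) -&gt; close_to(x, park) can be computed as in the sketch below. It assumes support is the fraction of all objects that satisfy both sides of the rule and confidence is the fraction of antecedent objects that also satisfy the consequent; the toy data and the name `rule_stats` are hypothetical.

```python
def rule_stats(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    matching = [o for o in objects if antecedent(o)]
    both = [o for o in matching if consequent(o)]
    support = len(both) / len(objects) if objects else 0.0
    confidence = len(both) / len(matching) if matching else 0.0
    return support, confidence

# Toy data: 5 schools (4 of them close to a park) and 5 shops.
objects = (
    [{"kind": "school", "near_park": True}] * 4
    + [{"kind": "school", "near_park": False}]
    + [{"kind": "shop", "near_park": False}] * 5
)

support, confidence = rule_stats(
    objects,
    antecedent=lambda o: o["kind"] == "school",
    consequent=lambda o: o["near_park"],
)
```

Here the confidence is 0.8, matching the 80% in the slide's example, and the support is 0.4; the rule is kept only if both values clear their minimum thresholds.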
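
The coarse-to-fine idea can be sketched with two distance predicates. This is only an illustration under simplifying assumptions: objects are 2-D points, the coarse predicate close_to and the finer predicate adjacent_to are both modeled as distance cut-offs, and no spatial index is used. The function name and parameters are hypothetical.

```python
import math

def mine_two_levels(houses, parks, coarse_dist, fine_dist, min_support):
    """Return (coarse_support, fine_support).
    fine_support is None when the pattern is pruned at the coarse level."""
    if not houses:
        return 0.0, None
    # Level 1: the coarse predicate close_to(house, park).
    near = [h for h in houses
            if any(math.dist(h, p) <= coarse_dist for p in parks)]
    coarse_support = len(near) / len(houses)
    if coarse_support < min_support:
        return coarse_support, None  # pruned: skip the finer computation
    # Level 2: refine only the houses that passed the coarse filter.
    adjacent = [h for h in near
                if any(math.dist(h, p) <= fine_dist for p in parks)]
    return coarse_support, len(adjacent) / len(houses)

houses = [(0, 0), (1, 0), (5, 0), (20, 0)]
parks = [(0, 1)]
result = mine_two_levels(houses, parks, coarse_dist=6, fine_dist=1.5, min_support=0.5)
```

Because every object that satisfies the finer predicate also satisfies the coarse one, a pattern that fails the support threshold at the coarse level cannot pass at the finer level, which is what makes the early pruning safe.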
  • 24. Future Directions <ul><li>Data mining in spatial object-oriented database. </li></ul><ul><li>Mining under uncertainty. </li></ul><ul><li>Alternative Clustering Techniques. </li></ul><ul><li>Mining spatial data deviation and evolution rules. </li></ul><ul><li>Using multiple thematic maps. </li></ul><ul><li>Interleaved generalization. </li></ul><ul><li>Generalization using temporal spatial data. </li></ul><ul><li>Spatial Data Mining Query Language. </li></ul><ul><li>Multidimensional rule visualization. </li></ul>
  • 25. Conclusion <ul><li>What is spatial data mining? </li></ul><ul><li>(Non-)Spatial-data-dominant generalization </li></ul><ul><li>(Non-)Spatial-data-dominant clustering </li></ul><ul><li>Spatial association rules </li></ul><ul><li>Using approximation and aggregation </li></ul><ul><li>Mining in image database </li></ul>
