Spatial Data Mining Satoru Hozumi CS 157B
Learning Objectives <ul><li>Understand the concept of Spatial Data Mining </li></ul><ul><li>Learn techniques on how to fin...
Examples of Spatial Patterns <ul><li>1855 Asiatic Cholera in London. </li></ul><ul><ul><li>A water pump identified as the ...
What is a Spatial Pattern? <ul><li>What is not a pattern? </li></ul><ul><ul><li>Random, haphazard, chance, stray, accident...
Defining Spatial Data Mining <ul><li>Search for spatial patterns. </li></ul><ul><li>Non-trivial search – as “automated” as...
What is NOT Spatial Data Mining <ul><li>Simple querying of Spatial Data </li></ul><ul><ul><li>Finding neighbors of Canada ...
Families of Spatial Data Mining Patterns <ul><li>Location Prediction:  </li></ul><ul><ul><li>Where will a phenomenon occur...
Location Prediction <ul><li>Where will a phenomenon occur? </li></ul><ul><li>Which spatial events are predictable? </li></...
Spatial Interactions <ul><li>Which spatial events are related to each other? </li></ul><ul><li>Which spatial phenomena dep...
Hot spots <ul><li>Is a phenomenon spatially clutered? </li></ul><ul><li>Which spatial entities are unusual or share common...
Spatial Queries <ul><li>Spatial Range Queries </li></ul><ul><ul><li>Find all cities within 50 miles of Paris </li></ul></u...
Unique Properties of Spatial Patterns <ul><li>Items in a traditional data are independent of each other, where as properti...
Association Rules <ul><li>Support = the number of time a rule shows up in a database </li></ul><ul><li>Confidence = Condit...
Apriori Algorithm to mine association rules  <ul><li>Key challenge </li></ul><ul><ul><li>Very large search space </li></ul...
Association rules Example
Techniques for Association Mining <ul><li>Classical method </li></ul><ul><ul><li>Association rules given item types and tr...
Associations, Spatial associations, co-location
Associations, Spatial associatins, co-location
Co-location Rules <ul><li>For point data in space </li></ul><ul><li>Does not need transaction, works directly with continu...
Co-location rules
Clustering <ul><li>Process of discovering groups in large databases </li></ul><ul><ul><li>Spatial view: rows in a database...
Clustering <ul><li>Hierarchical </li></ul><ul><ul><li>All points in one cluster </li></ul></ul><ul><ul><li>Split and merge...
Outliers <ul><li>Observations inconsistent with rest of the dataset </li></ul><ul><li>Observations inconsistent with their...
Variogram Cloud <ul><li>Create a variogram by plotting attribute difference, distance for each pair of points </li></ul><u...
Moran Scatter Plot <ul><li>Plot normalized attribute values, weighted average in the neighborhood for each location </li><...
Scatter plot <ul><li>Plot normalized attribute values, weighted average in the neighborhood for each location </li></ul><u...
Conclusion <ul><li>Patterns are opposite of random </li></ul><ul><li>Common spatial patterns: </li></ul><ul><ul><li>Locati...
Upcoming SlideShare
Loading in...5
×

Spatial Data Mining by Satoru Hozumi

499

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
499
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
29
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "Spatial Data Mining by Satoru Hozumi"

  1. 1. Spatial Data Mining Satoru Hozumi CS 157B
  2. 2. Learning Objectives <ul><li>Understand the concept of Spatial Data Mining </li></ul><ul><li>Learn techniques on how to find spatial patterns </li></ul>
  3. 3. Examples of Spatial Patterns <ul><li>1855 Asiatic Cholera in London. </li></ul><ul><ul><li>A water pump identified as the source. </li></ul></ul><ul><li>Cancer cluster to investigate health hazards. </li></ul><ul><li>Crime hotspots for planning police patrol routes. </li></ul><ul><li>Affects of weather in the US caused by unusual warming of Pacific ocean (El Nino). </li></ul>
  4. 4. What is a Spatial Pattern? <ul><li>What is not a pattern? </li></ul><ul><ul><li>Random, haphazard, chance, stray, accidental, unexpected. </li></ul></ul><ul><ul><li>Without definite direction, trend, rule, method, design, aim, purpose. </li></ul></ul><ul><li>What is a Pattern? </li></ul><ul><ul><li>A frequent arrangement, configuration, composition, regularity. </li></ul></ul><ul><ul><li>A rule, law, method, design, description. </li></ul></ul><ul><ul><li>A major direction, trend, prediction. </li></ul></ul>
  5. 5. Defining Spatial Data Mining <ul><li>Search for spatial patterns. </li></ul><ul><li>Non-trivial search – as “automated” as possible. </li></ul><ul><ul><li>Large search space of plausible hypothesis </li></ul></ul><ul><ul><li>Ex. Asiatic cholera : causes water, food, air, insects. </li></ul></ul><ul><li>Interesting, useful, and unexpected spatial patterns. </li></ul><ul><ul><li>Useful in certain application domain </li></ul></ul><ul><ul><ul><li>Ex. Shutting off identified water pump => saved human lives. </li></ul></ul></ul><ul><ul><li>May provide a new understanding of the world </li></ul></ul><ul><ul><ul><li>Ex. Water pump – Cholera connection lead to the “germ” theory. </li></ul></ul></ul>
  6. 6. What is NOT Spatial Data Mining <ul><li>Simple querying of Spatial Data </li></ul><ul><ul><li>Finding neighbors of Canada given names and boundaries of all countries (Search space not large) </li></ul></ul><ul><li>Uninteresting or obvious patterns </li></ul><ul><ul><li>Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart). </li></ul></ul><ul><ul><li>Common knowledge, nearby places have similar rainfall </li></ul></ul><ul><li>Mining of non-spatial data </li></ul><ul><ul><li>Diaper sales and beer sales are correlated in evenings </li></ul></ul>
  7. 7. Families of Spatial Data Mining Patterns <ul><li>Location Prediction: </li></ul><ul><ul><li>Where will a phenomenon occur? </li></ul></ul><ul><li>Spatial Interactions </li></ul><ul><ul><li>Which subset of spatial phenomena interact? </li></ul></ul><ul><li>Hot spot </li></ul><ul><ul><li>Which locations are unusual or share commonalities? </li></ul></ul>
  8. 8. Location Prediction <ul><li>Where will a phenomenon occur? </li></ul><ul><li>Which spatial events are predictable? </li></ul><ul><li>How can a spatial event be predicted from other spatial events? </li></ul><ul><li>Examples </li></ul><ul><ul><li>Where will an endangered bird nest? </li></ul></ul><ul><ul><li>Which areas are prone to fire given maps of vegitation and drought? </li></ul></ul><ul><ul><li>What should be recommended to a traveler in a given location? </li></ul></ul>
  9. 9. Spatial Interactions <ul><li>Which spatial events are related to each other? </li></ul><ul><li>Which spatial phenomena depend on other phenomenon? </li></ul><ul><li>Examples </li></ul><ul><ul><li>Earth science: </li></ul></ul><ul><ul><ul><li>climate and disturbance => {wild fires, hot, dry, lightning} </li></ul></ul></ul><ul><ul><li>Epidemiology: </li></ul></ul><ul><ul><ul><li>Disease type and enviornmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes} </li></ul></ul></ul>
  10. 10. Hot spots <ul><li>Is a phenomenon spatially clutered? </li></ul><ul><li>Which spatial entities are unusual or share common characteristics? </li></ul><ul><li>Examples </li></ul><ul><ul><li>Crime hot spots to plan police patrols </li></ul></ul>
  11. 11. Spatial Queries <ul><li>Spatial Range Queries </li></ul><ul><ul><li>Find all cities within 50 miles of Paris </li></ul></ul><ul><ul><li>Query has associated region (location, boundary) </li></ul></ul><ul><ul><li>Answer includes overlapping or contained data regions </li></ul></ul><ul><li>Nearest-Neighbor Queries </li></ul><ul><ul><li>Find the 10 cities nearest to Paris </li></ul></ul><ul><ul><li>Results must be ordered by proximity </li></ul></ul><ul><li>Spatial Join Queries </li></ul><ul><ul><li>Find all cities near a lake </li></ul></ul><ul><ul><li>Join condition involves regions and proximity. </li></ul></ul>
  12. 12. Unique Properties of Spatial Patterns <ul><li>Items in a traditional data are independent of each other, where as properties of location in a map are often “auto-correlated” (patterns exist) </li></ul><ul><li>Traditional data deals with simple domains, e.g. numbers and symbols where as spatial data types are complex </li></ul><ul><li>Items in traditional data describe discrete objects where as spatial data is continuous </li></ul>
  13. 13. Association Rules <ul><li>Support = the number of time a rule shows up in a database </li></ul><ul><li>Confidence = Conditional probability of Y given X </li></ul><ul><li>Example </li></ul><ul><ul><li>(Bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high) </li></ul></ul><ul><ul><li>Support = 20 %, confidence = 0.8 </li></ul></ul><ul><ul><li>Interpretation: Locations with limestone bedrock and low soil depth have high risk of sink hole formation. </li></ul></ul>
  14. 14. Apriori Algorithm to mine association rules <ul><li>Key challenge </li></ul><ul><ul><li>Very large search space </li></ul></ul><ul><li>Key assumption </li></ul><ul><ul><li>Few associations are support above given threshold </li></ul></ul><ul><ul><li>Associations with low support are not interesting </li></ul></ul><ul><li>Key insight </li></ul><ul><ul><li>If an association item set has high support, then so do all its subsets </li></ul></ul>
  15. 15. Association rules Example
  16. 16. Techniques for Association Mining <ul><li>Classical method </li></ul><ul><ul><li>Association rules given item types and transactions </li></ul></ul><ul><ul><li>Assumes spatial data can be decomposed into transactions </li></ul></ul><ul><ul><ul><li>Such decomposition may alter spatial patterns </li></ul></ul></ul><ul><li>New spatial method </li></ul><ul><ul><li>Spatial association rule </li></ul></ul><ul><ul><li>Spatial co-location </li></ul></ul>
  17. 17. Associations, Spatial associations, co-location
  18. 18. Associations, Spatial associatins, co-location
  19. 19. Co-location Rules <ul><li>For point data in space </li></ul><ul><li>Does not need transaction, works directly with continuous space </li></ul><ul><li>Use neighborhood definition and spatial joins </li></ul>
  20. 20. Co-location rules
  21. 21. Clustering <ul><li>Process of discovering groups in large databases </li></ul><ul><ul><li>Spatial view: rows in a database = points in a multi-dimentional space. </li></ul></ul><ul><ul><li>Visualization may reveal interesting groups </li></ul></ul>
  22. 22. Clustering <ul><li>Hierarchical </li></ul><ul><ul><li>All points in one cluster </li></ul></ul><ul><ul><li>Split and merge till a stop criterion is reached </li></ul></ul><ul><li>Partitional </li></ul><ul><ul><li>Start with random central point </li></ul></ul><ul><ul><li>Assign points to nearest central point </li></ul></ul><ul><ul><li>Update the central points </li></ul></ul><ul><ul><li>Approach with statistical rigor </li></ul></ul><ul><li>Density </li></ul><ul><ul><li>Find clusters based on density of regions </li></ul></ul>
  23. 23. Outliers <ul><li>Observations inconsistent with rest of the dataset </li></ul><ul><li>Observations inconsistent with their neighborhoods </li></ul><ul><li>A local instability or discontinuity </li></ul>
  24. 24. Variogram Cloud <ul><li>Create a variogram by plotting attribute difference, distance for each pair of points </li></ul><ul><li>Select points common to many outlying pairs </li></ul>
  25. 25. Moran Scatter Plot <ul><li>Plot normalized attribute values, weighted average in the neighborhood for each location </li></ul><ul><li>Select points in upper left and lower right quadrant </li></ul>
  26. 26. Scatter plot <ul><li>Plot normalized attribute values, weighted average in the neighborhood for each location </li></ul><ul><li>Fit a liner regression line </li></ul><ul><li>Select points which are unusually far from the regression line. </li></ul>
  27. 27. Conclusion <ul><li>Patterns are opposite of random </li></ul><ul><li>Common spatial patterns: </li></ul><ul><ul><li>Location prediction </li></ul></ul><ul><ul><li>Feature interaction </li></ul></ul><ul><ul><li>Hot spot </li></ul></ul><ul><li>Spatial patterns may be discovered using: </li></ul><ul><ul><li>Techniques like associations, clustering and outlier detection </li></ul></ul>
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×