Spatial Data Mining by Satoru Hozumi
Presentation Transcript

  • Spatial Data Mining Satoru Hozumi CS 157B
  • Learning Objectives
    • Understand the concept of Spatial Data Mining
    • Learn techniques on how to find spatial patterns
  • Examples of Spatial Patterns
    • 1854 Asiatic cholera outbreak in London.
      • A water pump was identified as the source.
    • Cancer cluster to investigate health hazards.
    • Crime hotspots for planning police patrol routes.
    • Effects on US weather caused by unusual warming of the Pacific Ocean (El Niño).
  • What is a Spatial Pattern?
    • What is not a pattern?
      • Random, haphazard, chance, stray, accidental, unexpected.
      • Without definite direction, trend, rule, method, design, aim, purpose.
    • What is a Pattern?
      • A frequent arrangement, configuration, composition, regularity.
      • A rule, law, method, design, description.
      • A major direction, trend, prediction.
  • Defining Spatial Data Mining
    • Search for spatial patterns.
    • Non-trivial search – as “automated” as possible.
      • Large search space of plausible hypotheses
      • Ex. Asiatic cholera: candidate causes include water, food, air, and insects.
    • Interesting, useful, and unexpected spatial patterns.
      • Useful in a particular application domain
        • Ex. Shutting off identified water pump => saved human lives.
      • May provide a new understanding of the world
        • Ex. The water pump – cholera connection led to the “germ” theory.
  • What is NOT Spatial Data Mining
    • Simple querying of Spatial Data
      • Finding neighbors of Canada given names and boundaries of all countries (Search space not large)
    • Uninteresting or obvious patterns
      • Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).
      • This is common knowledge: nearby places have similar rainfall
    • Mining of non-spatial data
      • Diaper sales and beer sales are correlated in the evenings
  • Families of Spatial Data Mining Patterns
    • Location Prediction:
      • Where will a phenomenon occur?
    • Spatial Interactions
      • Which subset of spatial phenomena interact?
    • Hot spot
      • Which locations are unusual or share commonalities?
  • Location Prediction
    • Where will a phenomenon occur?
    • Which spatial events are predictable?
    • How can a spatial event be predicted from other spatial events?
    • Examples
      • Where will an endangered bird nest?
      • Which areas are prone to fire given maps of vegetation and drought?
      • What should be recommended to a traveler in a given location?
  • Spatial Interactions
    • Which spatial events are related to each other?
    • Which spatial phenomena depend on other phenomena?
    • Examples
      • Earth science:
        • climate and disturbance => {wild fires, hot, dry, lightning}
      • Epidemiology:
        • Disease type and environmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes}
  • Hot spots
    • Is a phenomenon spatially clustered?
    • Which spatial entities are unusual or share common characteristics?
    • Examples
      • Crime hot spots to plan police patrols
  • Spatial Queries
    • Spatial Range Queries
      • Find all cities within 50 miles of Paris
      • Query has associated region (location, boundary)
      • Answer includes overlapping or contained data regions
    • Nearest-Neighbor Queries
      • Find the 10 cities nearest to Paris
      • Results must be ordered by proximity
    • Spatial Join Queries
      • Find all cities near a lake
      • Join condition involves regions and proximity.
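The three query types above can be sketched in a few lines. This is only an illustration under simplifying assumptions: points live on a flat plane with coordinates in miles, the city and lake names and coordinates are hypothetical, and every query is a brute-force scan (a real spatial database would use a spatial index).

```python
import math

# Hypothetical planar coordinates in miles; not real geography.
cities = {"city_b": (30.0, 40.0), "city_c": (60.0, 80.0), "city_d": (10.0, 0.0)}
lakes = {"lake_1": (12.0, 1.0), "lake_2": (200.0, 200.0)}
query_point = (0.0, 0.0)  # stands in for "Paris" in the slide's examples


def dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Spatial range query: names of all points within `radius` of `center`."""
    return sorted(name for name, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest names, ordered by proximity."""
    return sorted(points, key=lambda name: dist(points[name], center))[:k]


def spatial_join(left, right, max_dist):
    """Spatial join: all (left, right) pairs whose distance is at most max_dist."""
    return sorted(
        (a, b)
        for a, pa in left.items()
        for b, pb in right.items()
        if dist(pa, pb) <= max_dist
    )


within_50 = range_query(cities, query_point, 50)
nearest_2 = nearest_neighbors(cities, query_point, 2)
near_lake = spatial_join(cities, lakes, 5)
```

Note that only the nearest-neighbor result is ordered by distance; the range query and the join return unordered sets of matches (sorted here by name just to make the output stable).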
  • Unique Properties of Spatial Patterns
    • Items in traditional data are independent of each other, whereas properties of locations in a map are often “auto-correlated” (patterns exist)
    • Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex
    • Items in traditional data describe discrete objects, whereas spatial data is continuous
  • Association Rules
    • Support = how often a rule holds in a database, usually given as the fraction of records in which both X and Y appear
    • Confidence = Conditional probability of Y given X
    • Example
      • (Bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high)
      • Support = 20%, confidence = 80%
      • Interpretation: Locations with limestone bedrock and low soil depth have high risk of sink hole formation.
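As a rough illustration of how support and confidence could be computed for the sinkhole rule, here is a minimal sketch. The location records, field names, and the `rule_stats` helper are all made up for the example, so the resulting numbers differ from the 20% / 0.8 figures on the slide.

```python
# Hypothetical location records; the values are invented for illustration.
locations = [
    {"bedrock": "limestone", "soil_depth_ft": 30, "sinkhole_risk": "high"},
    {"bedrock": "limestone", "soil_depth_ft": 40, "sinkhole_risk": "high"},
    {"bedrock": "limestone", "soil_depth_ft": 20, "sinkhole_risk": "low"},
    {"bedrock": "granite", "soil_depth_ft": 25, "sinkhole_risk": "low"},
    {"bedrock": "limestone", "soil_depth_ft": 80, "sinkhole_risk": "low"},
]


def antecedent(record):
    """X: limestone bedrock and soil depth below 50 ft."""
    return record["bedrock"] == "limestone" and record["soil_depth_ft"] < 50


def consequent(record):
    """Y: high sinkhole risk."""
    return record["sinkhole_risk"] == "high"


def rule_stats(records, x, y):
    """Return (support, confidence) of the rule X => Y.

    support    = fraction of all records where X and Y both hold
    confidence = fraction of records satisfying X that also satisfy Y
    """
    n_x = sum(1 for r in records if x(r))
    n_xy = sum(1 for r in records if x(r) and y(r))
    support = n_xy / len(records)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


support, confidence = rule_stats(locations, antecedent, consequent)
```

Here three of the five records satisfy X and two of those also satisfy Y, so support is 2/5 and confidence is 2/3.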
  • Apriori Algorithm to mine association rules
    • Key challenge
      • Very large search space
    • Key assumption
      • Few associations have support above a given threshold
      • Associations with low support are not interesting
    • Key insight
      • If an association item set has high support, then so do all its subsets
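The key insight is what makes pruning possible: if any subset of a candidate item set falls below the support threshold, the candidate cannot be frequent and never needs to be counted. The following is a simplified sketch of the frequent-item-set phase of Apriori, not a tuned implementation; the transactions and the item names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        current = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transactions, e.g. properties observed at five locations.
transactions = [
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil", "sinkhole"},
    {"limestone", "shallow_soil"},
    {"granite", "shallow_soil"},
    {"limestone", "sinkhole"},
]
frequent = apriori(transactions, min_support=0.5)
```

With this data the three-item candidate is pruned without being counted, because its subset {shallow_soil, sinkhole} appears in only two of five transactions.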
  • Association rules Example
  • Techniques for Association Mining
    • Classical method
      • Association rules given item types and transactions
      • Assumes spatial data can be decomposed into transactions
        • Such decomposition may alter spatial patterns
    • New spatial method
      • Spatial association rule
      • Spatial co-location
  • Associations, Spatial associations, co-location
  • Associations, Spatial associations, co-location
  • Co-location Rules
    • For point data in space
    • Does not need transaction, works directly with continuous space
    • Use neighborhood definition and spatial joins
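A minimal sketch of the idea, assuming a neighborhood is defined as "within a fixed radius": a spatial join finds neighboring instances of two feature types, and the share of instances of one type that have a neighbor of the other type gives a simple co-location strength. That share is often called a participation ratio; it is used here as one possible measure, not as the method the slides prescribe. The feature names and coordinates are hypothetical.

```python
import math

# Hypothetical point instances: (feature_type, x, y).
points = [
    ("nest", 0.0, 0.0), ("nest", 5.0, 5.0), ("nest", 20.0, 20.0),
    ("pond", 1.0, 0.0), ("pond", 5.0, 6.0), ("pond", 50.0, 50.0),
]


def neighbor_pairs(points, type_a, type_b, radius):
    """Spatial join: pairs (a, b) of the two feature types within `radius`."""
    a_points = [p for p in points if p[0] == type_a]
    b_points = [p for p in points if p[0] == type_b]
    return [
        (pa, pb)
        for pa in a_points
        for pb in b_points
        if math.hypot(pa[1] - pb[1], pa[2] - pb[2]) <= radius
    ]


def participation_ratio(points, type_a, type_b, radius):
    """Fraction of type_a instances with at least one type_b neighbor."""
    pairs = neighbor_pairs(points, type_a, type_b, radius)
    participating = {pa for pa, _ in pairs}
    total = sum(1 for p in points if p[0] == type_a)
    return len(participating) / total if total else 0.0


pairs = neighbor_pairs(points, "nest", "pond", radius=2.0)
ratio = participation_ratio(points, "nest", "pond", radius=2.0)
```

No transactions are constructed at any point: the neighborhood relation is evaluated directly on the coordinates.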
  • Co-location rules
  • Clustering
    • Process of discovering groups in large databases
      • Spatial view: rows in a database = points in a multi-dimensional space.
      • Visualization may reveal interesting groups
  • Clustering
    • Hierarchical
      • Start with all points in one cluster
      • Split and merge until a stopping criterion is reached
    • Partitional
      • Start with random central points
      • Assign points to nearest central point
      • Update the central points
      • Approach with statistical rigor
    • Density
      • Find clusters based on density of regions
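The partitional steps above (pick central points, assign each point to the nearest one, update the central points) correspond to the k-means procedure. Below is a small sketch of it; the `init` parameter is an addition for reproducibility and is not part of the slides, and the sample points are hypothetical. Hierarchical and density-based clustering are not covered by this example.

```python
import math
import random


def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Partitional clustering: returns (labels, centers).

    `init` optionally fixes the starting centers; otherwise k random
    points are drawn as the initial central points.
    """
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical 2-D points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2, init=[(0, 0), (0, 1)])
```

Even though both starting centers lie in the same group here, the second center is pulled toward the far group after one update and the assignment settles on the two visible clusters.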
  • Outliers
    • Observations inconsistent with rest of the dataset
    • Observations inconsistent with their neighborhoods
    • A local instability or discontinuity
  • Variogram Cloud
    • Create a variogram cloud by plotting attribute difference against distance for each pair of points
    • Select points common to many outlying pairs
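One way to turn the two steps above into code is sketched here. It assumes the attribute difference is the absolute difference between values (some definitions use half the squared difference instead), and it treats a pair as "outlying" when the points are close together yet differ strongly. The sample points and both thresholds are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical samples: name -> (x, y, attribute value).
samples = {
    "p1": (0, 0, 10.0),
    "p2": (1, 0, 11.0),
    "p3": (0, 1, 10.5),
    "p4": (1, 1, 30.0),
    "p5": (2, 1, 10.8),
}


def variogram_cloud(samples):
    """One (name_a, name_b, distance, attribute difference) entry per pair."""
    cloud = []
    for (a, (xa, ya, va)), (b, (xb, yb, vb)) in combinations(samples.items(), 2):
        cloud.append((a, b, math.hypot(xa - xb, ya - yb), abs(va - vb)))
    return cloud


def outlier_candidates(cloud, max_dist, min_diff):
    """Count how often each point appears in a close-but-different pair."""
    counts = Counter()
    for a, b, d, diff in cloud:
        if d <= max_dist and diff >= min_diff:
            counts[a] += 1
            counts[b] += 1
    return counts


cloud = variogram_cloud(samples)
counts = outlier_candidates(cloud, max_dist=1.5, min_diff=10.0)
top_candidate = counts.most_common(1)[0]
```

Every outlying pair contains p4, so it is the point "common to many outlying pairs"; its partners each appear only once.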
  • Moran Scatter Plot
    • Plot normalized attribute values against the weighted average in the neighborhood for each location
    • Select points in the upper left and lower right quadrants
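A small sketch of the quadrant test, under stated assumptions: the locations lie along a line, each location's neighborhood is its immediate left and right neighbors, and all neighbors are weighted equally. The values are hypothetical (a rising trend with one value out of place at index 8).

```python
import statistics

# Hypothetical values along a line of locations.
values = [1, 2, 3, 4, 5, 6, 7, 8, 2, 10, 11]


def line_neighbors(n):
    """Assumed neighborhood: the adjacent positions on a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def moran_scatter(values, neighbors):
    """Return (z, lag) per location.

    z   = normalized attribute value (z-score)
    lag = equally weighted average of the neighbors' z-scores
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    lag = [statistics.mean([z[j] for j in neighbors[i]]) for i in range(len(values))]
    return list(zip(z, lag))


def off_diagonal(points):
    """Indices in the upper-left (z < 0, lag > 0) or lower-right (z > 0, lag < 0)."""
    return [i for i, (zi, li) in enumerate(points) if zi * li < 0]


points = moran_scatter(values, line_neighbors(len(values)))
flagged = off_diagonal(points)
```

The out-of-place value at index 8 lands in the upper-left quadrant (low value, high neighborhood). Index 7 is flagged as well, because its neighborhood average is dragged down by index 8, so flagged points are best treated as candidates for inspection.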
  • Scatter plot
    • Plot normalized attribute values against the weighted average in the neighborhood for each location
    • Fit a linear regression line
    • Select points which are unusually far from the regression line.
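The regression variant can be sketched as follows. Assumptions: locations lie along a line with left/right neighbors, neighbors are weighted equally, and "unusually far" means a residual more than two standard deviations of the residuals from the fitted line. The raw values are used instead of normalized ones, since rescaling the x axis does not change which points are flagged. Data and threshold are hypothetical.

```python
# Hypothetical values along a line of locations; index 8 is out of place.
values = [1, 2, 3, 4, 5, 6, 7, 8, 2, 10, 11]


def line_neighbors(n):
    """Assumed neighborhood: the adjacent positions on a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_outliers(values, neighbors, threshold=2.0):
    """Indices whose neighborhood average lies far from the fitted line."""
    xs = list(values)
    ys = [sum(values[j] for j in neighbors[i]) / len(neighbors[i])
          for i in range(len(values))]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    spread = (sum(r * r for r in residuals) / len(residuals)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


outliers = regression_outliers(values, line_neighbors(len(values)))
```

A caveat on this design: a single extreme value can pull the fitted line toward itself and hide its own residual, so the approach works best when the anomaly is not also the largest or smallest value in the data.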
  • Conclusion
    • Patterns are the opposite of random
    • Common spatial patterns:
      • Location prediction
      • Feature interaction
      • Hot spot
    • Spatial patterns may be discovered using:
      • Techniques like associations, clustering and outlier detection