Geographic Data Mining
Marc van Kreveld
Seminar for GIVE, Block 1, 2003/2004
Speaker notes
  • Confidence says something about the truth (validity) of a rule; support says something about its generality.
  • Outliers are items that don’t fit well in any cluster; hence clustering is listed as a technique for outlier analysis.
  • You can do data mining at every level of the concept hierarchy (every level of aggregation)
  • Not just dimensionality of the information space, but “real” dimensions, where distance has a more specific meaning.
  • Examples:
      1st, an adult: (age, no. of children, income), relatively independent attributes
      2nd: (age, no. of children, income, level of happiness)
      3rd: (lat., long., %nitrogen, %sulfur, pH, humidity)
      4th: (valley shape description, river throughput, rock type, average temperature)
    The last one is no longer a “point in information space”.
  • Mining for the characteristics of locations where specific species of animals occur. For example, before deciding to try to put beavers in the Biesbosch, one should know in advance that the environmental situation is right: no circumstance that beavers need is absent, and no circumstance that is harmful to them is present. Otherwise, tens of thousands of euros can easily be wasted.
  • Here confidence and support cannot simply be applied, because the rule incorporates the spatial trend “further from”
  • Techniques used include artificial neural networks (ANNs) and heuristic searches. ANNs are good at classification, but their internal parameter settings are difficult for humans to interpret. Another problem is that non-crisp patterns need to be detected, whereas ANNs are better at detecting crisp patterns. Algorithmically, too, finding a whole pattern is easier (efficiency-wise) than finding a partial pattern (compare graph isomorphism with subgraph isomorphism).

    1. Geographic Data Mining
       Marc van Kreveld, Seminar for GIVE, Block 1, 2003/2004
    2. About …
       • A form of geographical analysis
       • Current topic of interest in GIS research (and database research and AI research)
       • Finding hidden information in large collections of geographic data
    3. This seminar
       • Learning about a topic together
       • Presenting to each other, with interaction
       • Added value through good examples:
          ◦ for important concepts and algorithms
          ◦ possibly devised yourself, or extended
          ◦ referring to GIS data and issues (hence the GIS course prerequisite)
       • Written assignment: joint survey
    4. Material
       • Book by Harvey Miller and Jiawei Han (editors): selected chapters
       • Possibly: papers from conference proceedings
       • Mostly provided by me
    5. Weeks
       • Weeks 36-46
       • Probably:
          ◦ not September 4 (this Thursday)
          ◦ not in week 40 (Sept. 29 & Oct. 2)
          ◦ not October 23
       • All of the above depends on participation!
    6. Overview of Geographic Data Mining & Knowledge Discovery
       • Chapter 1 of the book
       • KDD: knowledge discovery in databases
       • Data warehouses
       • Data mining
       • Geographic aspects of the above
    7. Knowledge Discovery in Databases (KDD)
       • Large databases contain interesting patterns: non-random properties and relationships that are
          ◦ valid (general enough to apply to new data)
          ◦ novel (non-trivial and unexpected)
          ◦ useful (leads to effective action: decision making or investigation)
          ◦ ultimately understandable (simple, and interpretable by humans)
    8. Knowledge Discovery in Databases (KDD)
       • Because of the quantity of data available nowadays
       • Because we want information, not data
       • Because computing power allows it
    9. KDD as opposed to statistics
       • Statistics:
          ◦ small and clean numeric database
          ◦ scientifically sampled
          ◦ specific questions in mind
       • KDD: none of the above
    10. KDD techniques
       • Statistics
       • Machine learning
       • Pattern recognition
       • Numeric search (?)
       • Scientific visualization
    11. Data warehouse
       • Large repository of data
       • For analytical processing (DB: transactional processing)
       • Heterogeneous: different sources and formats (DB: homogeneous)
       • Supports OLAP tools (OnLine Analytical Processing)
    12. OLAP example
       • Measure of interest: sales
       • Dimensions of interest: item, store, week
       • (item, store, week) → money [quantity sold times price]
    13. OLAP example
       • 2-dim. aggregation: (item, store, .) → money
       • Another 2-dim. aggregation: sales by store and by week
       • 1-dim. aggregation: sales by week (all items and stores)
       • Data cube: all 2^d possible aggregations, different types of summaries
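The data cube from the OLAP example can be sketched in a few lines of Python. This is a minimal illustration, not an OLAP engine. The fact table, the store names, and the prices are hypothetical, and prices are given in cents to keep the arithmetic exact. The function name `data_cube` is made up for this example. For d = 3 dimensions it produces all 2^3 = 8 aggregations, from the full (item, store, week) breakdown down to the grand total.

```python
# Illustrative sketch; the data and names are hypothetical.
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (item, store, week, quantity sold, price in cents)
FACTS = [
    ("apples", "Utrecht", 36, 10, 50),
    ("apples", "Utrecht", 37, 4, 50),
    ("pears", "Utrecht", 36, 6, 80),
    ("apples", "Zeist", 36, 8, 50),
    ("pears", "Zeist", 37, 5, 80),
]
DIMENSIONS = ("item", "store", "week")


def data_cube(facts, dims=DIMENSIONS):
    """Aggregate the measure (quantity * price) over every subset of the dimensions.

    Returns {kept_dimension_names: {coordinates_on_kept_dims: total_money}},
    i.e. 2**len(dims) aggregations including the grand total (no dimensions kept).
    """
    cube = {}
    for r in range(len(dims), -1, -1):
        for kept in combinations(range(len(dims)), r):
            totals = defaultdict(int)
            for *coords, quantity, price in facts:
                key = tuple(coords[i] for i in kept)
                totals[key] += quantity * price  # money = quantity sold times price
            cube[tuple(dims[i] for i in kept)] = dict(totals)
    return cube


cube = data_cube(FACTS)
print(cube[("week",)])  # {(36,): 1380, (37,): 600}
print(cube[()])         # {(): 1980}
```

Each smaller aggregation could also be derived from a larger one instead of from the raw facts. Real data-cube implementations exploit this, but the brute-force loop keeps the idea visible.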
    14. KDD steps
       • Data selection
       • Data pre-processing
       • Data enrichment
       • Data reduction and projection
       • Data mining
       • Interpretation and reporting
       The presence of these steps and their order are not fixed.
    15. KDD steps
       • Data selection: which records and variables are chosen?
    16. KDD steps
       • Data selection
       • Data pre-processing: removing noise and duplicate records, handling missing data, …
    17. KDD steps
       • Data selection
       • Data pre-processing
       • Data enrichment: combining the selected data with external data
    18. KDD steps
       • Data selection
       • Data pre-processing
       • Data enrichment
       • Data reduction and projection: reducing the number of records, reducing the dimension
    19. KDD steps
       • Data selection
       • Data pre-processing
       • Data enrichment
       • Data reduction and projection
       • Data mining: uncovering information and interesting patterns
    20. KDD steps
       • Data selection
       • Data pre-processing
       • Data enrichment
       • Data reduction and projection
       • Data mining
       • Interpretation and reporting: evaluating, understanding, communicating
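The sketch below roughly illustrates how the KDD steps can chain together. It runs a tiny, made-up set of rainfall records through one possible ordering of the steps. Everything here is hypothetical: the record fields, the external climate-zone table, the function names, and the trivial "mining" step (a mean per group). These stand in for real data sources and real mining algorithms. Because the order of the steps is not fixed, other orderings are equally valid.

```python
# Illustrative sketch; the data, names, and "mining" step are hypothetical.
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "region": "N", "rain": 820, "temp": 9.5},
    {"id": 1, "region": "N", "rain": 820, "temp": 9.5},   # duplicate record
    {"id": 2, "region": "N", "rain": None, "temp": 9.0},  # missing rainfall
    {"id": 3, "region": "S", "rain": 700, "temp": 11.0},
    {"id": 4, "region": "S", "rain": 760, "temp": 10.5},
    {"id": 5, "region": "E", "rain": 900, "temp": 8.0},   # not selected below
]
# Hypothetical external data used for enrichment.
CLIMATE_ZONE = {"N": "cool", "S": "mild", "E": "cool"}


def select(records, regions):
    """Data selection: keep only the records of interest."""
    return [r for r in records if r["region"] in regions]


def preprocess(records):
    """Pre-processing: drop duplicate records, fill missing rainfall with the mean."""
    seen, unique = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            unique.append(dict(r))  # copy so the raw data stays untouched
    fill = mean(r["rain"] for r in unique if r["rain"] is not None)
    for r in unique:
        if r["rain"] is None:
            r["rain"] = fill
    return unique


def enrich(records, external):
    """Enrichment: combine the selected data with an external source."""
    return [{**r, "zone": external[r["region"]]} for r in records]


def reduce_project(records, attributes):
    """Reduction and projection: keep only the attributes needed for mining."""
    return [{a: r[a] for a in attributes} for r in records]


def mine(records):
    """Mining (trivial stand-in): mean rainfall per climate zone."""
    groups = {}
    for r in records:
        groups.setdefault(r["zone"], []).append(r["rain"])
    return {zone: mean(values) for zone, values in groups.items()}


def report(pattern):
    """Interpretation and reporting: turn the pattern into readable statements."""
    return [f"{zone}: mean rainfall {value:.0f} mm" for zone, value in sorted(pattern.items())]


records = select(RAW, {"N", "S"})
records = preprocess(records)
records = enrich(records, CLIMATE_ZONE)
records = reduce_project(records, ("zone", "rain"))
pattern = mine(records)
for line in report(pattern):
    print(line)
```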
    21. Data mining
       • Segmentation
       • Dependency analysis
       • Deviation and outlier analysis
       • Trend detection
       • Generalization and characterization
    22. DM – segmentation
       • Description:
          ◦ Clustering: finding a finite set of implicit classes
          ◦ Classification: mapping data items into pre-defined classes
       • Techniques:
          ◦ Cluster analysis
          ◦ Bayesian classification
          ◦ Decision or classification trees
          ◦ Artificial neural networks
    23. DM – segmentation
       [Figure: clustering (classes are found) vs. classification (classes are given)]
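Here is a minimal Python sketch that makes the distinction between clustering and classification concrete. For clustering it uses Lloyd's k-means algorithm. This is one cluster-analysis technique among many, and the slide does not prescribe it. The sketch initializes naively from the first k points. For classification it assigns a new point to the nearest of a set of given, labelled class centres. The points, labels, and function names are hypothetical.

```python
# Illustrative sketch; the data and names are hypothetical.
import math


def kmeans(points, k, max_iterations=100):
    """Clustering: find k implicit classes with Lloyd's algorithm.

    Naive initialization: the first k points serve as the initial centroids.
    """
    centroids = list(points[:k])
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def classify(point, class_centres):
    """Classification: map a data item to the nearest of the pre-defined classes."""
    return min(class_centres, key=lambda label: math.dist(point, class_centres[label]))


POINTS = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(POINTS, k=2)
print(clusters)
print(classify((2, 2), {"low": (1, 1), "high": (8, 8)}))  # low
```

The clustering step has to discover the two groups itself. The classification step assumes the classes ("low", "high") are already known and only decides where a new item belongs.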
    24. DM – dependency analysis
       • Description:
          ◦ Finding rules to predict the value of some attribute based on other attributes
       • Techniques:
          ◦ Bayesian networks
          ◦ Association rules
       [Figure: example tuples (4, 12, 0.24), (3, 14, 0.21), (7, 13, 0.43), (2, 9, 0.78), (11, 11, 0.55), and two tuples with an unknown value to predict: (5, 11, ???), (???, 12, 0.51)]
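The tuples in the dependency-analysis example invite a prediction of the unknown values. The slide names Bayesian networks and association rules as techniques. The sketch below instead uses a simpler stand-in, 1-nearest-neighbour prediction, only to make the idea of predicting one attribute from the others concrete. It compares tuples on the attributes that are known in the query and copies the missing value from the closest complete tuple. The function name is hypothetical. Attributes are not rescaled here, so the attribute with the largest range dominates the distance; a real analysis would normalize them first.

```python
# Illustrative sketch: 1-nearest-neighbour stand-in, not a technique named on the slide.
import math

# Complete tuples from the slide.
EXAMPLES = [(4, 12, 0.24), (3, 14, 0.21), (7, 13, 0.43), (2, 9, 0.78), (11, 11, 0.55)]


def predict_missing(query, examples):
    """Fill the single None in `query` with the value from the nearest complete example.

    Distance is measured only on the attributes known in `query`, without scaling.
    """
    missing = query.index(None)
    known = [i for i, v in enumerate(query) if v is not None]
    nearest = min(
        examples,
        key=lambda e: math.dist([query[i] for i in known], [e[i] for i in known]),
    )
    return nearest[missing]


print(predict_missing((5, 11, None), EXAMPLES))    # 0.24
print(predict_missing((None, 12, 0.51), EXAMPLES))  # 4
```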
    25. DM – dependency analysis
       • Confidence and support measures for association rules of the form [if X then Y]:
            confidence = #(X and Y in DB) / #(X in DB)
            support    = #(X and Y in DB) / #(all in DB)
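The confidence and support formulas translate directly into code. In this sketch each database record is a set of boolean properties of a location. A rule [if X then Y] holds for a record when the record contains every property in both X and Y. The property names, records, and function names are hypothetical.

```python
# Illustrative sketch; the data and names are hypothetical.

# Hypothetical database: each record is the set of properties that hold at one location.
DB = [
    {"near_water", "mild_winter", "frogs"},
    {"near_water", "frogs"},
    {"near_water"},
    {"mild_winter"},
]


def count(db, items):
    """Number of records that contain every property in `items`."""
    return sum(1 for record in db if items <= record)


def confidence(db, x, y):
    """confidence = #(X and Y in DB) / #(X in DB)"""
    n_x = count(db, x)
    return count(db, x | y) / n_x if n_x else 0.0


def support(db, x, y):
    """support = #(X and Y in DB) / #(all in DB)"""
    return count(db, x | y) / len(db)


rule = ({"near_water"}, {"frogs"})
print(confidence(DB, *rule), support(DB, *rule))
```

Here the rule holds in two of the three near-water records (confidence 2/3) and covers half of all records (support 0.5).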
    26. DM – deviation & outlier analysis
       • Description:
          ◦ Finding data with unusual deviations (errors, or data of particular interest)
       • Techniques:
          ◦ Clustering, other mining methods
          ◦ Outlier analysis
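One common way to make "unusual deviation" operational is a distance-based outlier score. A point is suspicious when even its k-th nearest neighbour is far away, which matches the idea that outliers do not fit well in any cluster. The sketch below is a brute-force O(n²) version. The values for k and the threshold are assumptions, and they, the data, and the function names are illustrative only.

```python
# Illustrative sketch; k, threshold, data, and names are hypothetical.
import math


def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest other point (brute force)."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


def find_outliers(points, k=2, threshold=3.0):
    """Return the points whose k-th nearest-neighbour distance exceeds the threshold."""
    return [p for p, s in zip(points, knn_distance_scores(points, k)) if s > threshold]


POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9), (20, 1)]
print(find_outliers(POINTS))  # [(20, 1)]
```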
    27. DM – trend detection
       • Description:
          ◦ Finding lines and curves that summarize the database (often as a function over time)
       • Techniques:
          ◦ Regression
          ◦ Sequential pattern extraction
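Regression, the first trend-detection technique listed, fits a line (or curve) that summarizes how a value changes over time. Below is a minimal ordinary-least-squares line fit, written without libraries. The yearly measurements and the function name are invented for illustration.

```python
# Illustrative sketch; the data and names are hypothetical.


def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of the best-fitting line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical measurements: years since the start of monitoring vs. an observed value.
years = [0, 1, 2, 3]
values = [10.0, 12.5, 13.5, 16.0]
slope, intercept = fit_line(years, values)
print(f"trend: {slope:.2f} per year, starting at {intercept:.2f}")
```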
    28. DM – generalization and characterization
       • Description:
          ◦ Obtaining compact descriptions of the data
       • Techniques:
          ◦ Summary rules
          ◦ Attribute-oriented induction
       [Figure: concept hierarchy, from low-level concepts up to higher-level concepts]
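Attribute-oriented induction can be pictured as climbing a concept hierarchy. Each low-level value is replaced by its more general ancestor, and then identical generalized values are merged and counted. The sketch below shows only that core idea on a single attribute. The full method generalizes several attributes and must decide how far to climb. The land-use hierarchy, the parcel data, and the function names are hypothetical.

```python
# Illustrative sketch; the hierarchy, data, and names are hypothetical.
from collections import Counter

# Hypothetical concept hierarchy (child concept -> parent concept).
PARENT = {
    "oak forest": "forest",
    "pine forest": "forest",
    "pasture": "agriculture",
    "cropland": "agriculture",
    "forest": "vegetated land",
    "agriculture": "vegetated land",
}


def generalize(value, levels):
    """Climb `levels` steps up the hierarchy (stopping at the top)."""
    for _ in range(levels):
        value = PARENT.get(value, value)
    return value


def induce(values, levels=1):
    """Generalize every value, then merge identical results and count them."""
    return Counter(generalize(v, levels) for v in values)


# Hypothetical land use of five parcels.
PARCELS = ["oak forest", "pine forest", "pasture", "oak forest", "cropland"]
print(induce(PARCELS, levels=1))
print(induce(PARCELS, levels=2))
```

Each level up gives a more compact but less detailed description. This matches the note that data mining can be done at every level of the concept hierarchy.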
    29. Visualization and knowledge discovery
       • KDD is difficult to automate → steered by human intelligence
       • Visualization helps to understand the data and which data mining techniques to try
    30. KD + geography
       • Special case of KDD
       • Other special cases:
          ◦ marketing
          ◦ biology
          ◦ astronomy
       • Main features: location, distance, dimensionality (with dependent dimensions)
    31. KD + geography
       • (attr1, attr2, attr3, attr4); attributes are numbers and (relatively) independent: statistics
       • (attr1, attr2, attr3, attr4); attributes can also be on other measurement scales: KDD
       • (attr1, attr2, attr3, attr4); attributes are often dependent and can be shapes: KD + geography
       • Often: (lat., long., attr1, attr2, …) or (shape description, attr1, attr2, …)
    32. KD + geography
       • Study of scalable versions of DM tasks (in lat. and long.)
       • Certain dimensions can be non-metric (travel time need not be symmetric)
       • DM in data that is not in the form of tuples: sets of thematic map layers
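The remark that travel time need not be symmetric can be checked mechanically. The sketch below tests a distance table against some of the metric axioms: zero self-distance, symmetry, and the triangle inequality (positivity between distinct places is not checked). The three places and their travel times are invented for illustration, with uphill trips taking longer than downhill ones, and so is the function name.

```python
# Illustrative sketch; the places, travel times, and names are hypothetical.
from itertools import product

PLACES = ["A", "B", "C"]
# Hypothetical travel times in minutes, keyed by (from, to).
TRAVEL = {
    ("A", "A"): 0, ("A", "B"): 30, ("A", "C"): 10,
    ("B", "A"): 20, ("B", "B"): 0, ("B", "C"): 25,
    ("C", "A"): 10, ("C", "B"): 35, ("C", "C"): 0,
}


def metric_violations(places, d):
    """Return descriptions of the metric axioms that the distance table d violates."""
    problems = []
    for a, b in product(places, repeat=2):
        if a == b and d[(a, b)] != 0:
            problems.append(f"nonzero self-distance: {a}")
        if a < b and d[(a, b)] != d[(b, a)]:
            problems.append(f"asymmetric: {a}-{b}")
    for a, b, c in product(places, repeat=3):
        if d[(a, c)] > d[(a, b)] + d[(b, c)]:
            problems.append(f"triangle inequality: {a}-{b}-{c}")
    return problems


print(metric_violations(PLACES, TRAVEL))  # ['asymmetric: A-B', 'asymmetric: B-C']
```

Many clustering and indexing methods silently assume a metric, so a check like this indicates whether such a method can be applied directly to a dimension like travel time.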
    33. Geographic data mining
       • Spatial segmentation (clustering, classification)
       • Spatial dependency (spatial association rules)
       • Spatial trend detection
       • Geographic characterization and generalization
    34. GDM – spatial association rules
       • Example: if a location is within 500 m of water and the average winter temperature is at least –2 degrees, then there are frogs around
       [Figure annotation: “within 500 m of water” is a distance relationship]
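Evaluating the frog rule on survey data needs a spatial predicate ("within 500 m of water") in addition to an attribute test. Once that is in place, confidence and support can be computed as for ordinary association rules. In this sketch the water features are simplified to points, and the survey sites are invented. A GIS would normally compute the distance to water polygons or lines instead. The function names are hypothetical.

```python
# Illustrative sketch; the water features, sites, and names are hypothetical.
import math

# Hypothetical water features, simplified to points (x, y in metres).
WATER = [(0, 0), (2000, 0)]

# Hypothetical survey sites: (x, y, average winter temperature in degrees, frogs observed)
SITES = [
    (100, 200, 1.0, True),
    (300, 0, -1.5, True),
    (1900, 300, 0.5, False),
    (1000, 1000, 2.0, False),
    (400, 100, -4.0, False),
    (2100, 100, 0.0, True),
]


def near_water(x, y, max_dist=500):
    """Spatial predicate: is the site within max_dist metres of any water feature?"""
    return min(math.dist((x, y), w) for w in WATER) <= max_dist


def rule_stats(sites):
    """Confidence and support of: near water AND winter temp >= -2  ->  frogs."""
    antecedent = [s for s in sites if near_water(s[0], s[1]) and s[2] >= -2]
    both = [s for s in antecedent if s[3]]
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    support = len(both) / len(sites)
    return confidence, support


print(rule_stats(SITES))  # (0.75, 0.5)
```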
    35. GDM – spatial trend detection
       • Patterns of change with respect to the neighborhood of some object
       • Example (North America): further from the Pacific Ocean → fewer earthquakes
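A trend such as "further from the Pacific Ocean, fewer earthquakes" relates an attribute to distance from a reference object rather than to a single yes/no condition. This is why confidence and support do not apply directly. One simple way to look for such a trend is to group observations into distance bands around the reference object and check whether the band averages decrease. The sketch below does this with the reference object simplified to a single point. The coordinates (in km), the earthquake counts, and the function names are hypothetical.

```python
# Illustrative sketch; the reference point, data, and names are hypothetical.
import math
from statistics import mean


def distance_profile(observations, reference, band_width):
    """Average the observed value within successive distance bands around `reference`."""
    bands = {}
    for x, y, value in observations:
        band = int(math.dist((x, y), reference) // band_width)
        bands.setdefault(band, []).append(value)
    return {band: mean(values) for band, values in sorted(bands.items())}


def is_decreasing(profile):
    """True if the band averages strictly decrease with distance."""
    averages = list(profile.values())
    return all(a > b for a, b in zip(averages, averages[1:]))


# Hypothetical observations: (x km, y km, earthquakes per year); reference object at (0, 0).
OBSERVATIONS = [(10, 0, 12), (50, 30, 10), (150, 0, 6), (120, 60, 8), (260, 0, 3), (0, 280, 1)]
profile = distance_profile(OBSERVATIONS, reference=(0, 0), band_width=100)
print(profile, is_decreasing(profile))
```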
    36. GDM – applications
       • Map interpretation
       • Remote sensing interpretation
       • Environmental mapping (soil type, etc.)
       • Extracting spatio-temporal patterns for cyclones, crimes
       • Spatial interaction (movement/flow of people, capital, goods)
    37. Conclusions
       • GDM & GKD are an extension of (a tool for) geographical analysis
       • GDM is different from DM due to:
          ◦ geographic spaces, not attribute space
          ◦ neighborhood is extremely important
          ◦ scale issues
          ◦ data is different
          ◦ applications (interesting patterns to mine for) are different
    38. This seminar on GDM
       • First: chapters from the book
          ◦ CH 1: GDM & KD: an overview (today)
          ◦ CH 2: Paradigms for spatial and spatio-temporal DM (11-9)
          ◦ CH 3: Fundamentals of spatial DW for GKD (15-9)
          ◦ CH 7: Algorithms and applications of SDM (Ronny) (18-9)
          ◦ CH 8: Spatial clustering in DM (22-9)
          ◦ CH 6: Modeling spatial dependencies (25-9) (not: 29-9 and 2-10)
          ◦ CH 9: Detecting outliers (6-10)
          ◦ CH 10: Knowledge construction based on GVis and KDD
          ◦ CH 14: Mining mobile trajectories
    39. This seminar
       • All PowerPoint presentations on the Web page of the course
       • Survey paper or written exam; possible topics for a survey:
          ◦ Hierarchical clustering
          ◦ Clustering with obstacles
          ◦ Proximity relationship mining
          ◦ …
       • Or: joint survey of (geometric) algorithms for GDM
    40. Each presentation
       • The chapter contents
       • Additional (spatial) examples (from the Web links or self-constructed)
       • Detect and present algorithmic problems that appear → together: report on algorithmic issues in GDM
       • Present your chapter; don’t be afraid of overlap with other chapters