Spatial Data Mining: Seminar

  1. Discovering Spatial Co-location Patterns: A Summary of Results
     Shashi Shekhar, Yan Huang
     Dept. of Computer Sciences, University of Minnesota, USA
     AG 2414 Spatial Analysis
     Seminar by Adrian C. Prelipcean and Ipsit Dash
  2. Outline
     • Introduction
     • Focus of the Research
     • Background of the Problem
     • Approaches to Modeling the Co-location Problem
     • Co-location Miner Algorithm
     • Conclusions
  3. Data, Data, Data!
     • Data are being collected continuously for innumerable phenomena:
       – Business applications
       – Scientific applications
       – National security purposes
     • It is impossible to analyze every strand of data collected.
     • This calls for data mining: automation, hypothesis generation, and better linking of phenomena.
  4. Data Mining
     • Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data, supported by complex computer-based decision systems (A.I., business intelligence, machine learning).
     • Spatial data mining differs from classical data mining as applied to relational databases (RDBMS).
     • In spatial data, the attributes of an object's neighbours may influence the object and must therefore be considered as well. The explicit location and extension of spatial objects define implicit relations of spatial neighbourhood (such as topological, distance and direction relations), which spatial data mining algorithms use.
     • Methods used in general data mining: predictive (classification, regression) and descriptive (clustering, association).
     • Methods used to determine spatial patterns:
       – Location prediction models (e.g. identifying the habitat of endangered species)
       – Spatial clusters (crime hot spots, cancer clusters)
       – Spatial associations: co-locations (predator-prey species, symbiosis, dental health and fluoride)
       – Spatial outliers: discontinuities (bad traffic sensors on highways)
  5. Spatial Associations: Co-locations
     • Classical association methodology: given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction. Implication means co-occurrence, not causality!
     • In spatial data, however, the transactions are not disjoint: instances are embedded in continuous space and their neighbourhoods overlap. The classical approach is therefore not well suited here.
     • Association vs. co-location
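For contrast with the spatial case, the classical measures can be sketched in a few lines of Python. This is a minimal illustration rather than anything from the paper: the function names and the basket data are hypothetical, and support and confidence follow the standard textbook definitions.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(transactions: List[Transaction], itemset: Transaction) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions: List[Transaction], lhs: Transaction, rhs: Transaction) -> float:
    """Estimate P(rhs | lhs) as support(lhs | rhs) / support(lhs); 0.0 if lhs never occurs."""
    base = support(transactions, lhs)
    return support(transactions, lhs | rhs) / base if base else 0.0


# Hypothetical market baskets: each item occurrence belongs to exactly one transaction.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"butter"}),
    frozenset({"bread", "butter"}),
]
print(support(baskets, frozenset({"bread", "milk"})))  # 0.5
print(confidence(baskets, frozenset({"bread"}), frozenset({"milk"})))  # 2/3, about 0.667
```

Counting like this relies on the transactions partitioning the data. Spatial instances come with no such partition, which is why each of the models on the later slides first has to decide what plays the role of a transaction.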
  6. Focus of the Research
     • To extract information from geospatial data and identify frequent co-occurrences among Boolean spatial features such as drought, El Niño, or a substantial drop in vegetation in ecological datasets.
     • Approaches to discovering co-location rules fall into two classes: spatial statistics and association rules.
     • Spatial statistics: uses spatial correlation measures to characterize the relations between different spatial features (chi-square tests, correlation coefficients, regression models, etc.).
     • Association rules: assume that a finite set of disjoint sets (transactions) is given as input; the algorithm finds the most frequent items in these sets and deduces relationships among them (e.g. the Apriori algorithm, and independent approaches that choose a suitable reference spatial feature and mine all association rules among the spatial features near it).
  7. Background of the Problem
     • Given
       1. A set T of K Boolean spatial feature types, T = {f1, f2, …, fK}
       2. A set of N instances P = {p1, …, pN}, where each pi is a vector <instance-id, spatial feature type, location> and location ∈ S, the spatial framework
       3. A neighbor relation R over locations in S
       4. A minimum prevalence threshold and a minimum conditional probability threshold
     • Objectives
       1. Completeness: the algorithm finds all spatial co-location rules that satisfy the threshold value(s)
       2. Correctness: every spatial co-location rule found by the algorithm satisfies the threshold value(s)
       3. The I/O and CPU costs of generating the co-location rules should be acceptable
     • Find
       – Co-location rules with high prevalence and high conditional probability
     • Constraints
       – R is symmetric and reflexive
       – The prevalence measure is monotonic
       – Conditional probability measures are specified by the event-centric model
       – The data set is sparse: the number of instances of any spatial feature is much smaller than |P|
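One way to hold these inputs in code is sketched below. It assumes the neighbor relation R is a Euclidean distance threshold d, which is a common choice but not fixed by the slide; the names `Instance`, `distance_relation`, `satisfies_constraints` and the threshold value are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Instance:
    instance_id: str
    feature: str                   # spatial feature type, an element of T
    location: Tuple[float, float]  # a point in the spatial framework S


Relation = Callable[[Instance, Instance], bool]


def distance_relation(d: float) -> Relation:
    """Hypothetical R: p and q are neighbours iff they lie within distance d."""
    def related(p: Instance, q: Instance) -> bool:
        return math.dist(p.location, q.location) <= d
    return related


def satisfies_constraints(r: Relation, instances: List[Instance]) -> bool:
    """Check the slide's constraints: R must be reflexive and symmetric."""
    reflexive = all(r(p, p) for p in instances)
    symmetric = all(r(p, q) == r(q, p) for p, q in product(instances, repeat=2))
    return reflexive and symmetric


P = [
    Instance("A1", "A", (0.0, 0.0)),
    Instance("B1", "B", (1.0, 0.0)),
    Instance("C1", "C", (5.0, 5.0)),
]
R = distance_relation(1.5)
print(R(P[0], P[1]), R(P[0], P[2]))  # True False
print(satisfies_constraints(R, P))   # True
```

A distance threshold is symmetric and reflexive by construction; the explicit check only matters if R is replaced by something else, such as a directional relation.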
  8. Approaches to Modeling the Co-location Rules Problem
     • The reference-feature-centric model
       – Is relevant to application domains focusing on a specific Boolean spatial feature
     • The window-centric model
       – Is relevant to applications such as mining, surveying and geology, which focus on land parcels
       – One goal is to predict sets of spatial features likely to be discovered in a land parcel, given that some other features have been found there
     • The event-centric model
       – Is relevant to applications interested in finding subsets of spatial features likely to occur in a neighborhood around instances of given subsets of event types
  9. Reference-feature-centric model
     • Let the reference feature be A.
     • The set of spatial predicates includes one predicate, close_to(a, b), which is true if and only if b is a's neighbour.
  10. Reference-feature-centric model
      Association rule example:
      is_type(i, A) ∧ ∃j is_type(j, B) ∧ close_to(j, i) → ∃k is_type(k, C) ∧ close_to(k, i), with 100% probability
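In this model each instance of the reference feature A acts as a transaction holding the feature types close to it, after which the rule's probability is an ordinary confidence over those transactions. The sketch below assumes close_to is a Euclidean distance threshold d; the function names and coordinates are hypothetical, chosen so that the rule holds with probability 1.0 as in the example.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def nearby_types(ref: Point, others: List[Tuple[str, Point]], d: float) -> Set[str]:
    """Feature types of all non-reference instances within distance d of `ref`."""
    return {feature for feature, loc in others if math.dist(ref, loc) <= d}


def reference_rule_probability(refs: List[Point], others: List[Tuple[str, Point]],
                               d: float, lhs: str, rhs: str) -> float:
    """P(some rhs instance is close_to i | some lhs instance is close_to i),
    taken over the instances i of the reference feature."""
    transactions = [nearby_types(r, others, d) for r in refs]
    with_lhs = [t for t in transactions if lhs in t]
    if not with_lhs:
        return 0.0
    return sum(1 for t in with_lhs if rhs in t) / len(with_lhs)


# Hypothetical layout: two instances of the reference feature A, each with a B
# and a C close by, so "B near A -> C near A" holds with probability 1.0.
refs_a = [(0.0, 0.0), (10.0, 0.0)]
others = [("B", (1.0, 0.0)), ("C", (0.0, 1.0)),
          ("B", (11.0, 0.0)), ("C", (10.0, 1.0))]
print(reference_rule_probability(refs_a, others, 1.5, "B", "C"))  # 1.0
```

Note that the transactions only exist around instances of A; a different choice of reference feature yields a different set of transactions and possibly different rules.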
  11. Window-centric model
      Number of windows: 16
      Number of windows containing A: 15
      Number of windows containing A and B: 7
      Association rule: an instance of type A in a window → an instance of type B in the same window, with 7/15 = 46.67% probability
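A minimal sketch of the window-centric count follows. It assumes the space is split into non-overlapping square windows of side `side` (the slide does not say how the windows are laid out); the function names and coordinates are hypothetical and only chosen to reproduce the slide's 7/15.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
Located = Tuple[str, Tuple[float, float]]  # (feature type, location)


def window_transactions(instances: List[Located], side: float) -> Dict[Cell, Set[str]]:
    """Group feature types by the square window (grid cell) each instance falls in."""
    windows: Dict[Cell, Set[str]] = defaultdict(set)
    for feature, (x, y) in instances:
        windows[(math.floor(x / side), math.floor(y / side))].add(feature)
    return windows


def window_rule_probability(instances: List[Located], side: float, a: str, b: str) -> float:
    """P(window contains b | window contains a); empty windows never contain a."""
    windows = window_transactions(instances, side).values()
    with_a = [w for w in windows if a in w]
    return sum(1 for w in with_a if b in w) / len(with_a) if with_a else 0.0


# Hypothetical 4 x 4 grid of unit windows (16 in total) with the slide's counts:
# A appears in 15 windows, and B appears in 7 of those.
cells = [(i % 4, i // 4) for i in range(16)]
data = [("A", (cx + 0.5, cy + 0.5)) for cx, cy in cells[:15]]
data += [("B", (cx + 0.25, cy + 0.25)) for cx, cy in cells[:7]]
print(window_rule_probability(data, 1.0, "A", "B"))  # 7/15, about 0.4667
```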
  12. Event-centric model
      Instances of type A: 4
      Instances of type A that have some instance of type B in their neighborhood: 1
      Conditional probability for the co-location rule "spatial feature A at location l → spatial feature type B in the 9-neighbor neighborhood" is 1/4 = 25%.
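The event-centric probability counts instances of A rather than windows. The sketch assumes instances sit on integer grid cells and that the 9-neighbor neighborhood is the 3 × 3 block of cells centred on an instance (Chebyshev distance at most 1); the function names and coordinates are hypothetical, picked to reproduce the slide's 1-in-4 ratio.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def in_nine_neighbourhood(p: Cell, q: Cell) -> bool:
    """True iff q lies in the 3 x 3 block of cells centred on p (p itself included)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= 1


def event_conditional_probability(a_cells: List[Cell], b_cells: List[Cell]) -> float:
    """Fraction of A instances with at least one B instance in their 9-neighborhood."""
    if not a_cells:
        return 0.0
    hits = sum(1 for a in a_cells if any(in_nine_neighbourhood(a, b) for b in b_cells))
    return hits / len(a_cells)


# Hypothetical grid positions: 4 instances of A, only the first has a B next to it.
a_cells = [(0, 0), (5, 5), (9, 0), (0, 9)]
b_cells = [(1, 1), (7, 7)]
print(event_conditional_probability(a_cells, b_cells))  # 0.25
```

Each A instance contributes exactly once, however much its neighborhood overlaps those of other instances, so the model needs no disjoint transactions.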
  13. Concepts
  14. Co-location Miner Algorithm
      • Input
        1. K Boolean spatial feature types and their instances
        2. A symmetric and reflexive neighbor relation R
        3. A user-specified minimum prevalence threshold (min_prevalence)
        4. A user-specified minimum conditional probability (min_cond_prob)
      • Output
        – Co-location rule sets with participation index > min_prevalence and conditional probability > min_cond_prob
      • Method
        1. Prevalent size-1 co-location sets, along with their table instances = P
        2. Generate size-2 co-location rules
        3. For size of co-locations in (2, 3, …, K-1) do
        4.   Generate candidate prevalent co-locations using the generalized apriori_gen algorithm
        5.   Generate table instances and prune based on neighborhood
        6.   Prune based on prevalence of co-locations
        7.   Generate co-location rules
        8. end;
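The method above can be sketched in Python as follows. This is a simplified reading, not the authors' implementation, and it rests on assumptions: R is a Euclidean distance threshold `d`; a table instance of a co-location is the list of instance-id tuples whose members are pairwise neighbours; the participation ratio of feature f in co-location C is the fraction of f's instances appearing in C's table instance, and the participation index is the minimum ratio over C's features; the conditional probability of C1 → C2 is the fraction of C1's table-instance rows that extend to rows of C1 ∪ C2. Table instances are joined with a naive nested loop rather than an efficient spatial join, and rules are generated once at the end instead of per level. `colocation_miner` and its parameter names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Coloc = Tuple[str, ...]  # sorted feature types, e.g. ("A", "B")
Row = Tuple[str, ...]    # one instance id per feature of the co-location


def colocation_miner(instances: Dict[str, Tuple[str, Tuple[float, float]]],
                     d: float, min_prevalence: float, min_cond_prob: float):
    """instances maps instance-id -> (feature type, location). Returns
    (rules, participation indices of prevalent co-locations of size >= 2)."""
    feature = {i: f for i, (f, _) in instances.items()}
    location = {i: p for i, (_, p) in instances.items()}
    count: Dict[str, int] = {}
    for f in feature.values():
        count[f] = count.get(f, 0) + 1

    def near(i: str, j: str) -> bool:  # hypothetical R: distance threshold d
        return math.dist(location[i], location[j]) <= d

    def participation_index(c: Coloc, rows: List[Row]) -> float:
        # min over features f of |distinct f-instances in rows| / |instances of f|
        return min(len({r[k] for r in rows}) / count[f] for k, f in enumerate(c))

    # Step 1: every size-1 co-location is prevalent; its table instances come from P.
    tables: Dict[Coloc, List[Row]] = {
        (f,): [(i,) for i in sorted(feature) if feature[i] == f] for f in sorted(count)}
    index: Dict[Coloc, float] = {}
    level = sorted(tables)
    while level:
        prevalent, next_level = set(level), []
        for c1, c2 in combinations(level, 2):
            if c1[:-1] != c2[:-1]:  # apriori_gen join: same prefix of length k-1
                continue
            cand = c1 + (c2[-1],)
            if any(s not in prevalent for s in combinations(cand, len(cand) - 1)):
                continue  # apriori_gen prune: every size-k subset must be prevalent
            # Table instances: join rows on the shared prefix, keep neighbouring pairs.
            rows = [r1 + (r2[-1],) for r1 in tables[c1] for r2 in tables[c2]
                    if r1[:-1] == r2[:-1] and near(r1[-1], r2[-1])]
            if not rows:
                continue
            pi = participation_index(cand, rows)
            if pi > min_prevalence:  # prune based on prevalence
                tables[cand], index[cand] = rows, pi
                next_level.append(cand)
        level = sorted(next_level)

    # Rules C1 -> C \ C1 with probability |distinct C1-projections| / |table(C1)|.
    rules = []
    for c in index:
        rows = tables[c]
        for k in range(1, len(c)):
            for pos in combinations(range(len(c)), k):
                lhs = tuple(c[p] for p in pos)
                prob = len({tuple(r[p] for p in pos) for r in rows}) / len(tables[lhs])
                if prob > min_cond_prob:
                    rules.append((lhs, tuple(f for f in c if f not in lhs), prob))
    return rules, index


data = {
    "A1": ("A", (0.0, 0.0)), "B1": ("B", (1.0, 0.0)), "C1": ("C", (0.5, 0.8)),
    "A2": ("A", (10.0, 0.0)), "B2": ("B", (11.0, 0.0)), "C2": ("C", (20.0, 20.0)),
}
rules, index = colocation_miner(data, d=1.5, min_prevalence=0.4, min_cond_prob=0.9)
print(index)  # {('A', 'B'): 1.0, ('A', 'C'): 0.5, ('B', 'C'): 0.5, ('A', 'B', 'C'): 0.5}
print(rules)
```

Because the participation index can only shrink as features are added, every subset of a prevalent co-location is itself prevalent. That property is what makes the apriori_gen pruning step safe, and it also guarantees that `tables[lhs]` exists when a rule is scored.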
  15. Co-location Miner Algorithm
  16. Conclusions
      • The paper gives a clear picture of the co-location problem and how it differs from the classical association problem.
      • The Co-location Miner algorithm is presented clearly, with a comprehensive explanation and detailed analysis.
      • The authors' forward-looking approach promises developments in the field of plane-sweep algorithms.
  17. Spatial Co-location Patterns: articles
      – S. Shekhar and Y. Huang, "Discovering Spatial Co-location Patterns: A Summary of Results," in Proc. 7th Intl. Symposium on Spatial and Temporal Databases (SSTD), Springer-Verlag, Lecture Notes in Computer Science, LNCS 2121, pp. 236 ff., July 2001.
      – S. Shekhar and Y. Huang, "Multi-resolution Co-location Miner: A New Algorithm to Find Co-location Patterns from Spatial Datasets," SIAM SDM02 Workshop on Mining Scientific Datasets, April 2002.
      – Y. Huang, H. Xiong, S. Shekhar, and J. Pei, "Mining Confident Co-location Rules without a Support Threshold," in Proc. 18th ACM Symposium on Applied Computing (ACM SAC), March 2003.
      – Y. Huang, S. Shekhar, and H. Xiong, "Discovering Colocation Patterns from Spatial Datasets: A General Approach," submitted to IEEE Transactions on Knowledge and Data Engineering (TKDE), 2004.