    • Data Mining Engine for Enterprise GIS
      Akash Dwivedi
      (09IT6001)
      Under the guidance of
      Prof. S.K. Ghosh
      School of Information Technology
      Indian Institute of Technology, Kharagpur
    • OUTLINE
      4/2/2011
    • OBJECTIVES
    • What is Spatial Data?
      • Data related to objects that occupy space
      • Examples: traffic, bird habitats, global climate, logistics, ...
      • Object types: points, lines, polygons, etc.
      Used in/for:
      • GIS - Geographic Information Systems
      • Meteorology
      • Astronomy
      • Environmental studies, etc.
    • What is Special about Spatial Data
    • Why Data Mining in Spatial Data
    • Spatial Data + Web Services = OGC (Open Geospatial Consortium)
    • Proposed Architecture of Enterprise GIS
      Fig. 1: Architecture of Enterprise GIS. Labelled components: client map overlay, query, broker (service composition), semantic resolution of query, WFS, WMS and WPS services, spatial data mining engine, and databases DB1, DB2, ..., DB n.
    • Data Mining Engine Framework
      Fig. 2: Data mining engine framework
    • Spatial Outlier Detection
    • Spatial Outlier
      Fig. 3: Palm Beach County as a spatial outlier (source: http://madison.hss.cmu.edu/buchanan-bush.gif)
    • Spatial Outlier Detection Problem
    • Back to Our Motivating Example
    • Results (Classical Data Mining Algorithms)
    • Results for the above methods
      Fig. 4: Outliers in red
      Fig. 5: Outliers in brown
    • Results (Spatial Data Mining Algorithms)
    • LAG-Based Approach
    • LAG-Based Approach (Contd.)
      Fig. 6: LAG-based box map
    • Using Moran Scatter Plot
      Fig. 7: Moran scatter plot; yellow points are spatial outliers
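The transcript keeps only the caption of this slide, so the following is a minimal sketch of how a Moran scatter plot is commonly used to pick out candidate spatial outliers, not the procedure used in the presentation. It assumes row-standardized weights over a contiguity structure, so the spatial lag of a region is the average standardized value of its neighbours. The function name `moran_quadrants`, the `neighbors` dictionary and the sample data are hypothetical. Regions whose value and lag have opposite signs (quadrants HL and LH) are the usual outlier candidates.

```python
from statistics import mean, pstdev

def moran_quadrants(values, neighbors):
    """Return (z, lag, quadrant) lists for a Moran scatter plot.

    values    : one attribute value per region
    neighbors : dict mapping a region index to the indices of its neighbours
                (hypothetical contiguity structure; each region needs >= 1)
    """
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma for v in values]  # standardized attribute
    # With row-standardized weights the spatial lag is the neighbours' mean z.
    lag = [mean(z[j] for j in neighbors[i]) for i in range(len(values))]
    quadrant = []
    for zi, li in zip(z, lag):
        if zi >= 0 and li >= 0:
            quadrant.append("HH")  # high value, high neighbours
        elif zi < 0 and li < 0:
            quadrant.append("LL")  # low value, low neighbours
        elif zi >= 0:
            quadrant.append("HL")  # high value, low neighbours
        else:
            quadrant.append("LH")  # low value, high neighbours
    return z, lag, quadrant

# Illustrative data only: five regions in a row, one unusually high value.
rates = [10, 11, 50, 12, 9]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
z, lag, quadrant = moran_quadrants(rates, chain)
candidates = [i for i, q in enumerate(quadrant) if q in ("HL", "LH")]
print(quadrant, candidates)
```

Note that the direct neighbours of an extreme region also fall into an off-diagonal quadrant, which is one reason a further check such as a LISA cluster map is useful.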
    • Verification
      Fig. 8: LISA cluster map; outliers in red
    • Verification Contd…
      Fig. 9: Relation between HR7984 and PE82
    • Any Reasons?
      Fig. 12: Scatter plot between RDAC80 and HR7984; outliers in yellow.
    • Spatial Cluster Analysis
    • Factors to Consider When Choosing a Clustering Algorithm
    • Spatial Clustering Problem Definition
      Given,
    • Problem Definition Contd…
    • Back to Our Motivating Example
    • Experimental Setup
      Table 1. Experimental Setup details
    • Analysis
      Figure 13: Histogram of house price data
      The distribution can be roughly modeled as a mixture of components.
    • Results for k=2, Using NEM
      Figure 14: Clustering results for k=2; high-priced houses in brown
    • Results for k=3, Using NEM
      Figure 15: Clustering results for k=3; high-priced buildings shown in red
    • Semantic Enrichment using spatial clustering
    • Problem Definition
    • Proposed Solution
    • Framework
      Figure 16: Semantic enrichment of clusters
    • Framework Contd…
    • Reasoning of ontology for implicit knowledge
    • Results: Ontology
      Figure 17: Data ontology for Baltimore house price data
    • Results Contd…
      Reasoning: ABox reasoning was performed on this ontology using SPARQL.
      Sample query:
      Figure 18: SPARQL query page
    • Results Contd…
      Figure 19: Result for the given query
    • Future Work
    • Box Map
      Since box maps are based on the same methodology as box plots, they can be used to detect outliers in a stricter sense than is possible with percentile maps. Box maps group values such as counts or rates into six fixed categories: four quartiles (1-25%, 25-50%, 50-75%, and 75-100%) plus two outlier categories at the low and high ends of the distribution.
      Values are classified as outliers if they lie more than 1.5 times the interquartile range (IQR) above the 75th percentile (Q3) or below the 25th percentile (Q1). The IQR is the difference Q3 - Q1. It describes the spread of the middle 50% of the distribution, since 25% of the values lie above Q3 and 25% lie below Q1.
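The six box-map categories can be sketched as follows. This is an illustrative sketch rather than the tool used in the presentation: the function name `box_map_categories`, the category labels and the sample rates are assumptions, and the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from other software.

```python
from statistics import quantiles

def box_map_categories(values, hinge=1.5):
    """Assign each value to one of the six box-map categories."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr  # below this: low outlier
    upper_fence = q3 + hinge * iqr  # above this: high outlier
    labels = []
    for v in values:
        if v < lower_fence:
            labels.append("low outlier")
        elif v <= q1:
            labels.append("first quartile")
        elif v <= q2:
            labels.append("second quartile")
        elif v <= q3:
            labels.append("third quartile")
        elif v <= upper_fence:
            labels.append("fourth quartile")
        else:
            labels.append("high outlier")
    return labels

# Illustrative rates only (not data from the presentation).
rates = [2, 4, 5, 6, 7, 8, 9, 10, 40]
labels = box_map_categories(rates)
for rate, label in zip(rates, labels):
    print(rate, label)
```

For these sample rates Q1 = 5 and Q3 = 9, so the IQR is 4 and the upper fence is 9 + 1.5 × 4 = 15; only the value 40 falls in the high-outlier category.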
    • Box Plot
      Box plots are particularly useful to identify outliers and gain an overview of the spread of a distribution.
      The box plot (sometimes referred to as a box-and-whisker plot) is a non-parametric method. For normally distributed data, the median coincides with the mean and the interquartile range is proportional to the standard deviation (about 1.35 standard deviations). The box plot shows the median and the first and third quartiles of a distribution (the 50%, 25% and 75% points in the cumulative distribution) as well as outliers. An observation is classified as an outlier when it lies more than a given multiple of the interquartile range (the difference in value between the 75% and 25% observations) above the 75th percentile or below the 25th percentile. The standard multiples used are 1.5 and 3 times the interquartile range.
      The red bar in the middle corresponds to the median, and the dark part shows the interquartile range. The individual observations in the first and fourth quartiles are shown as blue dots. The thin line is the hinge, corresponding to the default criterion of 1.5.
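To make the effect of the two standard multiples concrete, the sketch below lists the observations flagged at hinge 1.5 and at hinge 3, and checks how the interquartile range relates to the standard deviation for a normal distribution. The function name `box_plot_outliers` and the sample values are assumptions for illustration; quartiles again use the "inclusive" method of `statistics.quantiles`.

```python
from statistics import NormalDist, quantiles

def box_plot_outliers(values, multiple=1.5):
    """Return the observations lying beyond Q1/Q3 by more than multiple * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - multiple * iqr, q3 + multiple * iqr
    return [v for v in values if v < low or v > high]

# Illustrative sample only (not data from the presentation).
sample = [2, 4, 5, 6, 7, 8, 9, 10, 18, 40]
mild = box_plot_outliers(sample, multiple=1.5)   # default criterion
extreme = box_plot_outliers(sample, multiple=3)  # stricter criterion

# IQR of a standard normal distribution, expressed in standard deviations.
std_normal = NormalDist()
iqr_in_sigmas = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

print(mild, extreme, round(iqr_in_sigmas, 3))
```

With these values the 1.5 criterion flags both 18 and 40, while the stricter multiple of 3 flags only 40.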