  1. Data Mining Engine for Enterprise GIS
     Akash Dwivedi (09IT6001)
     Under the guidance of Prof. S.K. Ghosh
     School of Information Technology
     Indian Institute of Technology, Kharagpur
  2. OUTLINE
     4/2/2011
  3. OBJECTIVES
  4. What is Spatial Data?
     - Data related to objects that occupy space: traffic, bird habitats, global climate, logistics, ...
     - Object types: points, lines, polygons, etc.
     Used in/for:
     - GIS (Geographic Information Systems)
     - Meteorology
     - Astronomy
     - Environmental studies, etc.
  11. What is Special about Spatial Data?
  12. Why Data Mining in Spatial Data?
  13. Spatial Data + Web Services = OGC (Open Geospatial Consortium)
  14. Proposed Architecture of Enterprise GIS
      Fig. 1: Architecture of Enterprise GIS (diagram labels: client map overlay, query, semantic resolution of query, broker (service composition), spatial data mining engine, WFS, WMS, WPS, DB 1, DB 2, ..., DB n)
  15. Data Mining Engine Framework
      Fig. 2: Data mining engine framework
  16. Spatial Outlier Detection
  17. Spatial Outlier
      Fig. 3: Palm Beach county as spatial outlier (source: http://madison.hss.cmu.edu/buchanan-bush.gif)
  18. Spatial Outlier Detection Problem
  19. Back to Our Motivating Example
  20. Results (Classical Data Mining Algorithms)
  21. Results for the above methods
      Fig. 4: Outliers in red
      Fig. 5: Outliers in brown
  22. Results (Spatial Data Mining Algorithms)
  23. LAG-based approach
  24. LAG-based approach contd.
      Fig. 6: LAG-based box map
  25. Using Moran Scatter Plot
      Fig. 7: Moran scatter plot; yellow points are spatial outliers
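The slide does not state the rule behind the yellow points, so the following is only a minimal sketch of the usual Moran scatter plot reading, under stated assumptions: each value is standardized, its spatial lag is taken as the mean of its neighbours' standardized values (row-standardized weights), and observations whose value and lag have opposite signs (quadrants HL and LH) are treated as candidate spatial outliers. The function name, the toy values, and the chain-shaped neighbour lists are hypothetical and are not taken from the deck's data.

```python
from statistics import mean, pstdev

def moran_quadrants(values, neighbors):
    """Return standardized values, spatial lags and Moran scatter quadrants.

    neighbors[i] lists the indices adjacent to region i (assumed input).
    """
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma for v in values]
    # Spatial lag with row-standardized weights: the neighbours' average z-score.
    lag = [mean(z[j] for j in neighbors[i]) for i in range(len(values))]
    quadrants = []
    for zi, li in zip(z, lag):
        if zi >= 0 and li >= 0:
            quadrants.append("HH")  # high value, high neighbours
        elif zi < 0 and li < 0:
            quadrants.append("LL")  # low value, low neighbours
        elif zi >= 0:
            quadrants.append("HL")  # high value among low neighbours
        else:
            quadrants.append("LH")  # low value next to high neighbours
    return z, lag, quadrants

# Hypothetical toy data: five regions in a row, the middle one unusually high.
values = [1.0, 1.2, 9.0, 1.1, 0.9]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
z, lag, quadrants = moran_quadrants(values, neighbors)
print(quadrants)
```

In this toy case the neighbours of the high region also fall in the LH quadrant, which is one reason a quadrant label alone is usually checked against a significance test such as LISA before a point is called an outlier.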
  26. Verification
      Fig. 8: LISA cluster map; outliers in red
  27. Verification Contd.
      Fig. 9: Relation between HR7984 and PE82
  28. Any Reasons?
      Fig. 12: Scatter plot between RDAC80 and HR7984; outliers in yellow
  29. Spatial Cluster Analysis
  31. While choosing a clustering algorithm, many factors have to be considered, like:
  32. Spatial Clustering Problem Definition
      Given,
  33. Problem Definition Contd.
  34. Back to Our Motivating Example
  35. Experimental Setup
      Table 1: Experimental setup details
  36. Analysis
      Histogram
      Figure 13: Histogram of house price data
      We can roughly model the data with a mixture of components.
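To illustrate what modelling a histogram with a mixture of components can look like, here is a minimal sketch that fits a two-component 1-D Gaussian mixture with plain EM. It is not the NEM (Neighborhood EM) algorithm used for the results in this deck, which additionally takes spatial neighbourhoods into account; the data are synthetic stand-ins rather than the Baltimore house prices, and all function and variable names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture_em(data, k=2, iterations=100):
    """Fit a k-component 1-D Gaussian mixture with plain (non-spatial) EM."""
    data = sorted(data)
    n = len(data)
    # Start the means at evenly spaced order statistics, with a shared variance.
    means = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances

# Synthetic prices: a larger low-priced group and a smaller high-priced group.
random.seed(0)
prices = [random.gauss(30, 5) for _ in range(200)] + [random.gauss(80, 10) for _ in range(100)]
weights, means, variances = fit_mixture_em(prices, k=2)
print(weights, means, variances)
```

Each observation can then be assigned to the component with the highest posterior probability, which gives a non-spatial baseline against which a spatially aware method such as NEM can be compared.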
  37. Results for k=2, Using NEM
      Figure 14: Clustering results for k=2; high-priced houses in brown
  38. Results for k=3, Using NEM
      Figure 15: Clustering results for k=3; high-priced buildings in red
  39. Semantic Enrichment Using Spatial Clustering
  40. Problem Definition
  41. Proposed Solution
  42. Framework
      Figure 16: Semantic enrichment of clusters
  43. Framework Contd.
  44. Reasoning over the ontology for implicit knowledge
  45. Results: Ontology
      Figure 17: Data ontology for Baltimore house price data
  46. Results Contd.
      Reasoning: ABox reasoning is done on this ontology using SPARQL.
      Sample query:
      Figure 18: SPARQL query page
  47. Results Contd.
      Result for the given query
      Figure 19: Result for the given query
  48. Future Work
  54. Box Map
      Since box maps are based on the same methodology as box plots, they can be used to detect outliers in a stricter sense than is possible with percentile maps. Box maps group values such as counts or rates into six fixed categories: four quartiles (1-25%, 25-50%, 50-75%, and 75-100%) plus two outlier categories at the low and high ends of the distribution.
      Values are classified as outliers if they lie more than 1.5 times the interquartile range (IQR) above the 75th percentile or below the 25th percentile. The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1), or Q3-Q1. It describes the range of the middle half of the distribution, since 25% of values lie above Q3 and 25% below Q1.
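The six box-map categories described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code of any particular GIS package: quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method, the hinge defaults to 1.5, and the function name and sample rates are hypothetical.

```python
from statistics import quantiles

def box_map_classes(values, hinge=1.5):
    """Assign each value to one of the six box-map categories."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - hinge * iqr  # below this: low outlier
    upper_fence = q3 + hinge * iqr  # above this: high outlier
    labels = []
    for v in values:
        if v < lower_fence:
            labels.append("low outlier")
        elif v <= q1:
            labels.append("Q1")
        elif v <= q2:
            labels.append("Q2")
        elif v <= q3:
            labels.append("Q3")
        elif v <= upper_fence:
            labels.append("Q4")
        else:
            labels.append("high outlier")
    return labels

# Hypothetical rates for ten regions; the last one is far above the rest.
rates = [2.1, 2.4, 2.5, 2.7, 3.0, 3.1, 3.3, 3.6, 3.8, 12.0]
labels = box_map_classes(rates)
print(list(zip(rates, labels)))
```

Passing `hinge=3` gives the stricter of the two standard multiples, so fewer values are flagged.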
  55. Box Plot
      Box plots are particularly useful for identifying outliers and gaining an overview of the spread of a distribution.
      The box plot (sometimes referred to as a box-and-whisker plot) is a non-parametric method. For normally distributed data, the median coincides with the mean and the interquartile range is proportional to the standard deviation (about 1.35 standard deviations). The box plot shows the median and the first and third quartiles of a distribution (the 50%, 25% and 75% points in the cumulative distribution) as well as outliers. An observation is classified as an outlier when it lies more than a given multiple of the interquartile range (the difference in value between the 75% and 25% observations) above the 75th percentile or below the 25th percentile. The standard multiples used are 1.5 and 3 times the interquartile range.
      The red bar in the middle corresponds to the median; the dark part shows the interquartile range. The individual observations in the first and fourth quartiles are shown as blue dots. The thin line is the hinge, corresponding to the default criterion of 1.5.
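As a worked illustration of how the hinge criteria relate to the standard deviation, the following derivation assumes the data follow a normal distribution with mean $\mu$ and standard deviation $\sigma$ (an assumption made only for this example; the box plot itself does not require it).

```latex
\begin{align*}
Q_1 &= \mu - 0.6745\,\sigma, \qquad Q_3 = \mu + 0.6745\,\sigma \\
\mathrm{IQR} &= Q_3 - Q_1 \approx 1.349\,\sigma \\
Q_3 + 1.5\,\mathrm{IQR} &\approx \mu + 0.6745\,\sigma + 2.0235\,\sigma \approx \mu + 2.698\,\sigma \\
Q_3 + 3\,\mathrm{IQR} &\approx \mu + 0.6745\,\sigma + 4.047\,\sigma \approx \mu + 4.721\,\sigma
\end{align*}
```

Under that assumption the 1.5 hinge flags values roughly 2.7 standard deviations from the mean (by symmetry on the low side as well), while the multiple of 3 flags only values beyond about 4.7 standard deviations.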
