Theories and Applications of Spatial-Temporal Data Mining and Knowledge Discovery

Yee Leung

Published in:
Technology

- 1. Theories and Applications of Spatial-Temporal Data Mining and Knowledge Discovery <ul><li>Yee Leung </li></ul><ul><li>Email: [email_address] </li></ul><ul><li>Department of Geography </li></ul><ul><li>and Resource Management </li></ul><ul><li>The Chinese University of Hong Kong </li></ul>
- 9. Daily rainfall data of two stations in Pearl River basin of China
- 11. The monthly sunspot time series.
- 12. The Portuguese Stock Index PSI-20 evolution from 1993 to 2002 (adapted from J.A.O. Matos et al., Physica A 342 (2004) 665-676)
- 13. Outbreak of Avian Flu in different regions
- 15. What are the structures and processes hidden in spatial data? <ul><li>What are the Concepts hidden in the information system? </li></ul><ul><li>Do the Concepts form a knowledge structure? </li></ul>
- 16. Typhoon Tracks Adapted from Wang and Chan
- 17. Typhoon/Hurricane Tracking <ul><li>Objective: intensity, track (landfall, recurvature) </li></ul><ul><li>Object: the space-time track of unusually low sea-surface air pressure in x-y-z space </li></ul><ul><li>Data: potential temperature, horizontal velocity, vertical velocity, relative humidity, horizontal wind, etc. </li></ul><ul><li>Data volume: hundreds to thousands of gigabytes within a specific time interval </li></ul>
- 21. Data Mining in Hyperspectral Images <ul><li>1. Objective: classification, pattern recognition </li></ul><ul><li>2. Object: spectral signatures of objects </li></ul><ul><li>3. Data: spectral and non-spectral data </li></ul><ul><li>4. Data volume, e.g. AVIRIS: from 0.4 to 2.45 micrometers, 224 bands; HYDICE: from 0.4 to 2.5 micrometers, 210 bands; Hyperion: from 0.4 to 2.5 micrometers, 220 bands, 30 meter resolution </li></ul>
- 22. The Objective of Knowledge Discovery and Data Mining <ul><li>Fayyad: the discovery of non-trivial, novel, potentially useful and interpretable knowledge/information from data </li></ul><ul><li>Data → Information → Knowledge → Decision </li></ul>
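- 23. Characteristics of Spatial Data <ul><li>1. Voluminous </li></ul><ul><li>2. Sparse </li></ul><ul><li>3. Diverse </li></ul><ul><li>4. Complex </li></ul><ul><li>5. Dynamic </li></ul><ul><li>6. Redundant </li></ul><ul><li>7. Imperfect (random, fuzzy, granular, incomplete, noisy) </li></ul><ul><li>8. Multi-scale </li></ul>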
- 24. Main Tasks of Spatial Knowledge Discovery and Data Mining <ul><li>1. Clustering </li></ul><ul><li>2. Classification </li></ul><ul><li>3. Association: spatial relations, temporal relations, spatial-temporal relations (in particular, the local-global issue) </li></ul><ul><li>4. Processes </li></ul>
- 25. CLUSTERING <ul><li>The Scale-Space Filtering Approach </li></ul><ul><li>The Regression-Classes Decomposition Approach </li></ul>
- 26. Scale Space Theory <ul><li>Given a primary image f(x) at a distance σ from the human eye, the observed blurred image f(x, σ) can be determined mathematically by the following partial differential equation: </li></ul>The solution of the above equation is explicitly expressed as f(x, σ) = f(x) ∗ g(x, σ), where ‘∗’ denotes the convolution operation and g(x, σ) is the Gaussian function
- 27. If the training samples are treated as an imaginary image with expression: Then the corresponding blurred image f (x, σ , D l ) at scale σ can be specified by
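The displayed equations of slides 26 and 27 did not survive in this transcript. A plausible reconstruction is sketched below. It assumes that g is the isotropic d-dimensional Gaussian with standard deviation σ and that the training set D_l = {x_1, …, x_N} is represented as a normalized sum of Dirac impulses. Both are assumptions made for illustration, not taken from the slides.

```latex
\begin{align}
\frac{\partial f(x,\sigma)}{\partial \sigma} &= \sigma\,\nabla^{2} f(x,\sigma),
  \qquad f(x,0) = f(x), \\
f(x,\sigma) &= f(x) * g(x,\sigma) = \int f(x-y)\, g(y,\sigma)\, dy, \\
g(x,\sigma) &= \frac{1}{(2\pi\sigma^{2})^{d/2}}
  \exp\!\left(-\frac{\lVert x\rVert^{2}}{2\sigma^{2}}\right), \\
f(x) &= \frac{1}{N}\sum_{i=1}^{N}\delta(x - x_i)
  \;\Longrightarrow\;
  f(x,\sigma,D_l) = \frac{1}{N}\sum_{i=1}^{N} g(x - x_i,\sigma).
\end{align}
```

Under these assumptions the blurred image of the samples is a Gaussian kernel density estimate whose bandwidth plays the role of the viewing distance σ.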
- 28. Essentials of Clustering by Scale-space Filtering <ul><li>1. Visual system simulation </li></ul><ul><li>2. Cluster validity check </li></ul><ul><li>3. Clustering validity check </li></ul><ul><li>4. Relevant concepts: (a) lifetime of a cluster, (b) lifetime of a clustering, (c) compactness, (d) isolatedness </li></ul>
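The idea of reading clusters off a blurred image, and of judging a clustering by its lifetime across scales, can be sketched in one dimension as follows. This is a minimal illustration, not the presenter's algorithm: the function names, the synthetic two-cluster data, and the choice to measure lifetime as the number of consecutive (log-spaced) scales with the same cluster count are all assumptions.

```python
import numpy as np

def blurred_density(grid, samples, sigma):
    """Gaussian-blurred empirical density f(x, sigma) evaluated on a 1-D grid."""
    diffs = grid[:, None] - samples[None, :]
    kernel = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=1)

def count_modes(density):
    """Count strict interior local maxima; each maximum is treated as a cluster centre."""
    interior = density[1:-1]
    return int(np.sum((interior > density[:-2]) & (interior > density[2:])))

def clusters_per_scale(samples, sigmas, grid_size=2000):
    """Number of clusters (density maxima) at every scale in `sigmas`."""
    pad = 3.0 * max(sigmas)
    grid = np.linspace(samples.min() - pad, samples.max() + pad, grid_size)
    return [count_modes(blurred_density(grid, samples, s)) for s in sigmas]

def longest_lived_clustering(counts):
    """Return (cluster count, run length) of the longest run of equal counts."""
    best_k, best_len = counts[0], 0
    run_k, run_len = counts[0], 0
    for k in counts:
        if k == run_k:
            run_len += 1
        else:
            run_k, run_len = k, 1
        if run_len > best_len:
            best_k, best_len = run_k, run_len
    return best_k, best_len

# Hypothetical data: two well-separated groups on a line.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.0, 0.3, 150), rng.normal(5.0, 0.3, 150)])
sigmas = np.geomspace(0.05, 5.0, 60)   # log-spaced scale steps
counts = clusters_per_scale(samples, sigmas)
k, lifetime = longest_lived_clustering(counts)
print("clusters per scale:", counts)
print("longest-lived clustering:", k, "clusters over", lifetime, "scale steps")
```

At small scales every sample tends to be its own mode, at very large scales everything merges into one blob, and the clustering that survives over the widest scale range is the natural candidate for the "valid" one.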
- 41. <ul><li>Ms-time plot of clustering results for earthquakes (Ms≥6): </li></ul><ul><li>a) 3 clusters in the 59th~95th scale range; b) 17 clusters at the 6th scale step </li></ul>
- 42. Temporal Segmentation of Strong Earthquakes (Ms≥6.0) of 1290 A.D. - 2000 A.D. <ul><li>Scale-space for earthquakes (Ms≥6) </li></ul>
- 43. <ul><li>Indices of clustering along the time scale for earthquakes (Ms≥6.0): </li></ul><ul><li>a) number of clusters; b) lifetime, isolation and compactness of the clustering </li></ul>
- 44. Ms-time plot of clustering results for earthquakes (Ms≥4.7): a) 2 clusters in the 74th~112th scale range; b) 18 clusters at the 10th scale step
- 45. Temporal Segmentation of Strong Earthquakes (Ms≥4.7) of 1484 A.D. - 2000 A.D. <ul><li>Indices of clustering along the time scale for earthquakes (Ms≥4.7): </li></ul><ul><li>a) number of clusters (the vertical axis shows only the part no larger than 150); </li></ul><ul><li>b) lifetime, isolation and compactness of the clustering </li></ul>
- 46. <ul><li>Table 1. Seismic active periods and episodes obtained by the clustering algorithm and by the seismologists (the number in parentheses is the number of earthquakes in the cluster) </li></ul>
- 48. Advantages of Scale-space Filtering <ul><li>No global optimization problem has to be solved </li></ul><ul><li>Independent of initialization </li></ul><ul><li>Robust </li></ul><ul><li>Outlier detection </li></ul><ul><li>Generalization of scale-related algorithms </li></ul><ul><li>Consistent with the visual system </li></ul>
- 49. 5. Scale Space Clustering Scale-Space Filtering for Simulated Data
- 50. 5. Scale Space Clustering Scale-Space Filtering for Remote-Sensing Data Clustering Tree Quasi-Light
- 51. Clustering by Regression-Classes Decomposition Method
- 52. Simple Gaussian Class
- 53. Linear Structure
- 54. Identification of line objects in remotely sensed data
- 55. Ellipsoidal Structure
- 57. Two ellipsoidal feature extraction
- 58. General Curvilinear Structure
- 59. Complex Shape Structure
- 60. ANALYSIS OF SPATIAL RELATIONSHIP <ul><li>Global Description </li></ul><ul><ul><li>Moran’s I </li></ul></ul><ul><ul><li>Geary’s c </li></ul></ul><ul><ul><li>OLR </li></ul></ul><ul><li>Local Description </li></ul><ul><ul><li>Local Moran’s I </li></ul></ul><ul><ul><li>Local Geary’s c </li></ul></ul><ul><ul><li>G Statistic </li></ul></ul><ul><ul><li>Geographically Weighted Regression </li></ul></ul><ul><ul><li>Mixture Distribution </li></ul></ul>
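The difference between a global and a local description can be made concrete with Moran's I. The sketch below uses the textbook definitions of global and local Moran's I. The four-cell transect and its binary adjacency weights are made up for illustration.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: (n / sum(w)) * (z' W z) / (z' z), z = deviations from the mean."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def local_morans_i(values, weights):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = mean(z**2)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    m2 = (z @ z) / x.size
    return z * (w @ z) / m2

# Hypothetical example: four cells along a transect, neighbours share an edge.
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
trend = [1.0, 2.0, 3.0, 4.0]             # smooth gradient: positive autocorrelation
checker = [1.0, -1.0, 1.0, -1.0]         # alternating values: negative autocorrelation

global_trend = morans_i(trend, adjacency)
global_checker = morans_i(checker, adjacency)
local_trend = local_morans_i(trend, adjacency)
print(global_trend, global_checker, local_trend)
```

The local values sum to the global statistic times the total weight, which is one way to see how a single global number can hide local variation.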
- 61. Geographically Weighted Regression <ul><li>Hypothesis testing </li></ul><ul><li>1. $H_0$: no difference between OLR and GWR </li></ul><ul><li>2. $H_0$: $a_{1k} = a_{2k} = \cdots = a_{nk}$ </li></ul>
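The second hypothesis asks whether the k-th coefficient is the same at all n locations. A minimal sketch of the local fits behind that question is given below, assuming a Gaussian distance-decay kernel with a fixed bandwidth. The kernel, the bandwidth value and the synthetic data are illustrative assumptions, and no test statistic is computed.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location.

    Returns an (n, 1 + k) array whose row i holds the intercept and slopes
    estimated at location i with Gaussian distance-decay weights.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    betas = np.empty((len(y), design.shape[1]))
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        xtw = design.T * w                      # X' W without forming W explicitly
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Hypothetical data: 30 locations in a 10 x 10 region, one predictor.
rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 10.0, size=(30, 2))
x = rng.normal(size=30)

# Case 1: the relationship is the same everywhere (global model holds).
y_global = 2.0 + 3.0 * x
betas_global = gwr_coefficients(coords, x, y_global, bandwidth=3.0)

# Case 2: the slope drifts from west to east (coefficients vary over space).
y_local = 2.0 + coords[:, 0] * x
betas_local = gwr_coefficients(coords, x, y_local, bandwidth=3.0)

print("slope range, global case:", np.ptp(betas_global[:, 1]))
print("slope range, local case:", np.ptp(betas_local[:, 1]))
```

In the first case the local slopes coincide up to rounding, which is the situation described by the null hypothesis. In the second they spread out, which is what a formal test would have to detect against sampling noise.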
- 64. (Regression-Classes Decomposition Method)
- 65. CLASSIFICATION <ul><li>The Neural Network Approach </li></ul><ul><li>The Classification and Regression Tree </li></ul><ul><li>The Statistical Classifiers </li></ul>
- 66. Information Extraction and Classification Neural Networks for Classification--MLP-BP
- 67. Some Typical Feedforward Neural Networks <ul><li>Perceptron </li></ul><ul><ul><li>In the late 1950s: layered feed-forward networks were named perceptrons. </li></ul></ul><ul><ul><li>Today: a perceptron is a single-layer, feed-forward network. </li></ul></ul><ul><ul><li>See Fig. 8: in a multi-output perceptron, each output unit O is fed independently from the input units. </li></ul></ul>Figure 8. Perceptrons
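Because every output unit is fed independently from the inputs, a multi-output perceptron is just several single-output perceptrons sharing the same inputs. The sketch below trains two such units with the classic perceptron learning rule. The AND/OR targets, the step threshold at zero and the learning rate are illustrative choices, not taken from the slides.

```python
import numpy as np

def add_bias(inputs):
    """Prepend a constant 1 so the bias is learned as an ordinary weight."""
    return np.column_stack([np.ones(len(inputs)), inputs])

def train_perceptron(inputs, targets, epochs=100, lr=1.0):
    """Perceptron rule; column j of the weight matrix belongs to output unit j."""
    X = add_bias(inputs)
    W = np.zeros((X.shape[1], targets.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, targets):
            out = (x @ W > 0).astype(float)     # step activation per output unit
            W += lr * np.outer(x, t - out)      # only units that erred are changed
    return W

def predict(W, inputs):
    return (add_bias(inputs) @ W > 0).astype(float)

# Two input units, two output units: the first learns AND, the second learns OR.
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([[0, 0], [0, 1], [0, 1], [1, 1]], dtype=float)

W = train_perceptron(inputs, targets)
outputs = predict(W, inputs)
print(outputs)
```

Both target functions are linearly separable, so the rule converges. A function such as XOR is not, which is one motivation for the multilayer networks on the next slide.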
- 68. Some Typical Feedforward Neural Networks (con't) <ul><li>Multilayer Feedforward Neural Network </li></ul><ul><ul><li>Learning algorithms for multilayer networks are neither efficient nor guaranteed to converge to a global optimum. </li></ul></ul><ul><ul><li>Most popular learning method: back-propagation. </li></ul></ul><ul><li>Back-propagation learning </li></ul><ul><ul><li>The restaurant problem: use a 2-layer network, 10 attributes = 10 input units, 4 hidden units. See Fig. 13. </li></ul></ul>Fig. 13. A 2-layer feedforward network for the restaurant problem.
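A network of the shape described here (10 input units, 4 hidden units, one output) can be trained by back-propagation in a few lines. The sketch below is illustrative only. The restaurant data set is not part of this transcript, so random binary attributes with a made-up target rule stand in for it, and sigmoid units, squared error, full-batch updates and the learning rate are all assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, y, hidden=4, epochs=2000, lr=0.5, seed=0):
    """Full-batch back-propagation for a 2-layer sigmoid network (squared error)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: propagate the error through the sigmoid derivatives.
        delta_out = err * out * (1.0 - out)
        delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
        # Gradient step, averaged over the batch.
        W2 -= lr * (h.T @ delta_out) / len(X)
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * (X.T @ delta_hid) / len(X)
        b1 -= lr * delta_hid.mean(axis=0)
    return (W1, b1, W2, b2), losses

# Placeholder data: 40 examples with 10 binary attributes and a made-up label rule.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 10)).astype(float)
y = ((X[:, 0] * X[:, 1] + X[:, 2]) > 0).astype(float).reshape(-1, 1)

params, losses = train_mlp(X, y)
print("initial loss:", losses[0], "final loss:", losses[-1])
```

The loss is only driven towards a local minimum, which is the caveat stated on the slide: nothing here guarantees a global optimum.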
- 76. <ul><li>Competitive Pattern Recognition by Recurrent NN </li></ul>
- 78. Typhoon Tracks Adapted from Wang and Chan
- 79. Trees by Classification and Regression Tree (CART) MSW 6/12/18: Maximum Sustained Wind of TC 6/12/18 hours before recurvature. 0: Recurve,1: Straight
- 80. Rules by CART <ul><li>1. If the MSW of a TC is smaller than or equal to 34 m/s and the MSW of that TC is smaller than 2 m/s 6 hours later, then the TC will recurve in 12 hours with 96% accuracy. </li></ul><ul><li>2. If the MSW of a TC is smaller than or equal to 34 m/s and the MSW of that TC is larger than 2 m/s 6 hours later, the TC will move straight in 12 hours with 86.8% accuracy. </li></ul><ul><li>3. If the MSW of a TC is larger than 34 m/s, it will recurve in 18 hours with 94.1% accuracy. </li></ul>
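Rules extracted from a tree translate directly into nested conditions. The function below is a literal transcription of the three rules, under one stated assumption: the second quantity in rules 1 and 2 is read as a separate measurement labelled "6 hours later", which is how the slide words it but may not be what the original tree variable means. The function and argument names are hypothetical.

```python
from typing import Optional, Tuple

def cart_rule(msw_now: float, msw_6h_later: float) -> Optional[Tuple[str, int, float]]:
    """Return (predicted behaviour, lead time in hours, stated accuracy).

    Thresholds and accuracies are copied from the slide. A value of exactly
    2 m/s is covered by neither rule 1 nor rule 2, so None is returned.
    """
    if msw_now > 34.0:
        return ("recurve", 18, 0.941)      # rule 3
    if msw_6h_later < 2.0:
        return ("recurve", 12, 0.96)       # rule 1
    if msw_6h_later > 2.0:
        return ("straight", 12, 0.868)     # rule 2
    return None

print(cart_rule(40.0, 0.0))   # strong TC: rule 3 applies regardless of the second value
print(cart_rule(30.0, 1.0))   # rule 1
print(cart_rule(30.0, 5.0))   # rule 2
```

Writing the rules out this way also exposes the gap at exactly 2 m/s, which the slide leaves unspecified.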
- 81. DISCOVERY OF TEMPORAL PROCESSES <ul><li>The Multifractal Approach </li></ul><ul><li>Conventional Time Series Analyses </li></ul>
- 82. <ul><li>Mining of Scaling Behavior by Multifractal Analysis </li></ul>
- 83. Multiplicative Cascade <ul><li>An approach to the study of scaling behavior with multiple scales (granules). </li></ul><ul><li>Multiplicative Binomial Cascade </li></ul>
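A multiplicative binomial cascade can be generated by repeatedly splitting every interval in two and giving a fixed fraction p of its mass to the left half and 1 - p to the right half. The sketch below implements this deterministic version. The value p = 0.7 and the number of levels are arbitrary illustrative choices.

```python
import numpy as np

def binomial_cascade(levels, p=0.7):
    """Masses of the 2**levels dyadic intervals after `levels` splitting steps."""
    measure = np.array([1.0])
    for _ in range(levels):
        # Each interval passes a fraction p to its left child and 1 - p to its right child.
        measure = np.column_stack([p * measure, (1.0 - p) * measure]).ravel()
    return measure

def partition_sum(measure, q):
    """Sum of mu_i**q over all intervals; it scales as a power of the interval size."""
    return float(np.sum(measure ** q))

levels, p = 10, 0.7
mu = binomial_cascade(levels, p)
print("intervals:", mu.size, "total mass:", mu.sum())
for q in (-2.0, 0.5, 2.0, 4.0):
    exact = (p ** q + (1.0 - p) ** q) ** levels
    print(q, partition_sum(mu, q), exact)
```

For this cascade the partition sum equals (p^q + (1 - p)^q)^levels, so the scaling exponent is a nonlinear function of q, which is the signature of multiple scaling.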
- 84. Schematic representation of cascade (adapted from Puente and Lopez, 1995, Physics Letters A)
- 86. TEMPORAL ANALYSIS <ul><li>Linear Time Invariant System Self Similarity Multiscaling Infinitely Divisible Cascade </li></ul><ul><li>Stationary Process </li></ul><ul><li>Non-stationary Process </li></ul><ul><ul><li>Random Walk </li></ul></ul><ul><ul><li>Fractional Brownian Motion, fBm </li></ul></ul><ul><ul><li>Multifractal Analysis of Stochastic Trends </li></ul></ul>
- 87. The Multifractal Approach <ul><li>Establish a data model for stochastic time series </li></ul><ul><li>Discovery of relevant models in stochastic time series </li></ul>
- 88. MF-DFA <ul><li>Detrended fluctuation analysis (DFA) is a method for detecting long-range correlation and fractal properties in both stationary and non-stationary time series. MF-DFA, which is based on DFA, can give a full description of the more complicated scaling behavior of time series. </li></ul>
- 89. MF-DFA <ul><li>Given a time series $x_k$, $k = 1, \ldots, N$, of length $N$. </li></ul><ul><li>Step 1: Determine the profile $Y(i) = \sum_{k=1}^{i} [x_k - \langle x \rangle]$, $i = 1, 2, \ldots, N$, where $\langle x \rangle$ is the mean of the series. </li></ul><ul><li>Step 2: Divide $Y(i)$ into $N_s = \lfloor N/s \rfloor$ non-overlapping segments of equal length $s$. Since $N$ is often not a multiple of $s$, a short part at the end of the profile may remain. In order not to disregard this part of the series, the same procedure is repeated starting from the opposite end. Thereby, $2N_s$ segments are obtained altogether. </li></ul>
- 90. MF-DFA <ul><li>Step 3: Calculate the local trend for each of the $2N_s$ segments by a least-squares fit of the series. Then determine the variance $F^2(\nu, s) = \frac{1}{s} \sum_{i=1}^{s} \{ Y[(\nu - 1)s + i] - y_\nu(i) \}^2$ for each segment $\nu = 1, \ldots, N_s$, and $F^2(\nu, s) = \frac{1}{s} \sum_{i=1}^{s} \{ Y[N - (\nu - N_s)s + i] - y_\nu(i) \}^2$ for $\nu = N_s + 1, \ldots, 2N_s$. Here, $y_\nu(i)$ is the fitting polynomial in segment $\nu$, whose order $m$ can be 1, 2, 3, … </li></ul><ul><li>Step 4: Average over all segments to obtain the $q$th-order fluctuation function, defined as $F_q(s) = \left\{ \frac{1}{2N_s} \sum_{\nu=1}^{2N_s} [F^2(\nu, s)]^{q/2} \right\}^{1/q}$, where $q \neq 0$ and $s \geq m + 2$. </li></ul>
- 91. MF-DFA <ul><li>Step 5: Determine the scaling behavior of the fluctuation function by analyzing log-log plots of $F_q(s)$ versus $s$ for each value of $q$. If $F_q(s) \sim s^{h(q)}$ for large values of $s$, we get the exponent $h(q)$, which may in general depend on $q$. </li></ul><ul><ul><ul><li>$H = h(q=2)$, for stationary time series; </li></ul></ul></ul><ul><ul><ul><li>$H = h(q=2) - 1$, for non-stationary time series. </li></ul></ul></ul>
- 92. MF-DFA <ul><li>For MF-DFA, if h(q) is constant for all q, the corresponding time series is monofractal. However, if h(q) varies with q, the time series is multifractal. </li></ul>
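The five MF-DFA steps can be sketched compactly with NumPy. The code below follows the steps as listed on the slides, with the standard profile and fluctuation-function definitions filled in. The function names, the chosen scales and q values, and the synthetic test series (white noise and its running sum) are illustrative assumptions, and q = 0 is not handled.

```python
import numpy as np

def mfdfa(x, scales, qs, order=1):
    """Return F_q(s) as an array of shape (len(qs), len(scales)); all q must be nonzero."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())                 # Step 1: profile Y(i)
    n = profile.size
    fq = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        s = int(s)
        ns = n // s                                   # Step 2: N_s segments per direction
        starts = [v * s for v in range(ns)]           # segments counted from the start
        starts += [n - (v + 1) * s for v in range(ns)]  # and from the opposite end
        t = np.arange(s)
        variances = np.empty(2 * ns)
        for k, a in enumerate(starts):                # Step 3: detrend each segment
            segment = profile[a:a + s]
            trend = np.polyval(np.polyfit(t, segment, order), t)
            variances[k] = np.mean((segment - trend) ** 2)
        for i, q in enumerate(qs):                    # Step 4: q-th order average
            fq[i, j] = np.mean(variances ** (q / 2.0)) ** (1.0 / q)
    return fq

def generalized_hurst(fq, scales):
    """Step 5: slope of log F_q(s) against log s for each q."""
    log_s = np.log(np.asarray(scales, dtype=float))
    return np.array([np.polyfit(log_s, np.log(row), 1)[0] for row in fq])

rng = np.random.default_rng(0)
noise = rng.normal(size=8192)            # stationary, uncorrelated series
walk = np.cumsum(noise)                  # non-stationary random walk
scales = np.array([16, 32, 64, 128, 256])
qs = [-2.0, 2.0]

h_noise = generalized_hurst(mfdfa(noise, scales, qs), scales)
h_walk = generalized_hurst(mfdfa(walk, scales, qs), scales)
print("h(q) for white noise:", h_noise)
print("h(q) for random walk:", h_walk)
```

For the white noise h(2) should come out near 0.5, and for the random walk near 1.5, so that H = h(2) - 1 is again about 0.5, in line with the stationary and non-stationary cases of Step 5.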
- 93. <ul><li>(adapted from Peng et al., 1994) </li></ul>
- 102. Daily rainfall data of two stations in Pearl River basin of China
- 103. Log-log plots of $F_q(s)$ versus $s$ for the daily rainfall time series of station 56691 in the Pearl River basin (left) and Station Chuantang in the East River basin (right) with $q = 2$.
- 104. The $h(q)$ curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).
- 105. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).
- 106. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right)
- 107. The curves of daily rainfall time series of 5 stations in the Pearl River basin
- 108. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right).
- 109. The curves of daily rainfall time series of stations in the Pearl River basin (left) and stations in the East River basin (right). The solid lines are the fitted cascade models.
- 110. The correlation between the altitude of the rainfall stations in the East River basin and the D(2) value of the rainfall time series.
- 111. Elevation of rainfall stations in the East River basin with the D2 values of their rainfall data. Elevation (m above MSL)
- 112. DISCOVERY OF KNOWLEDGE STRUCTURES <ul><li>The Concept Lattice Approach </li></ul>
- 113. <ul><li>Discovery of Hierarchical Knowledge from Relational Spatial Data </li></ul>
- 115. Spatial Concept/Class and Data Encapsulation
- 116. Concept Hierarchy
- 117. Inheritance
- 118. Generalization and Specialization
- 119. Summary <ul><li>(1) The concept lattice serves as a mathematical foundation for object-oriented spatial information systems. </li></ul><ul><li>(2) The concept lattice can be employed as a method to unravel hierarchical structure from a spatial information system. </li></ul><ul><li>(3) It is a bridge between relational spatial information systems (vector-based, raster-based) and object-oriented spatial information systems. </li></ul>
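How a concept lattice unravels a hierarchy from relational data can be illustrated with a tiny formal context. The sketch below enumerates all formal concepts (pairs of an object set and the attribute set they share) by brute force. The four spatial objects and their attributes are invented for illustration, and the exhaustive enumeration is only practical for very small contexts.

```python
from itertools import combinations

def extent(context, attrs):
    """All objects that possess every attribute in `attrs`."""
    return frozenset(obj for obj, own in context.items() if attrs <= own)

def intent(context, objs, all_attrs):
    """All attributes shared by every object in `objs`."""
    shared = set(all_attrs)
    for obj in objs:
        shared &= context[obj]
    return frozenset(shared)

def formal_concepts(context):
    """Enumerate every (extent, intent) pair closed under the two maps above."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            objs = extent(context, frozenset(subset))
            concepts.add((objs, intent(context, objs, all_attrs)))
    return concepts

# Hypothetical relational table: object -> set of attributes.
context = {
    "river": {"water", "linear"},
    "canal": {"water", "linear", "man-made"},
    "lake": {"water", "areal"},
    "road": {"linear", "man-made"},
}

concepts = formal_concepts(context)
# Larger extents are more general concepts; smaller extents are specializations.
for objs, attrs in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(objs), "<->", sorted(attrs))
```

Ordering the concepts by inclusion of their extents gives the lattice: the concept with attributes {water, linear} generalizes the canal concept, which inherits those attributes and adds "man-made".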
- 120. Yee Leung. Knowledge Discovery in Spatial Data. Berlin: Springer-Verlag, 2010. [email_address] IGU-Commission on Modeling Geographical Systems http://www.science.mcmaster.ca/~igu~cmgs/
