Bio-inspired computational techniques applied to the analysis and visualization of spatio-temporal cluster dynamics
Miguel Arturo Barreto Sánz [email_address]
Faculté des Hautes Etudes Commerciales (HEC), Institut des Systèmes d'information (ISI)
Outline
● Introduction: Data mining in spatio-temporal datasets
● Research plan: Specific goals; Challenges in mining spatio-temporal datasets; State of the art; Approaches
● Preliminary results and discussion
Introduction
● A growing number of complex data sets are associated with geographical areas
● Huge volumes of data describing human and natural behaviors are captured routinely
For instance:
Information sources
Information received from remote sensing systems and environmental monitoring devices used in:
● Agriculture
● Weather prediction
● Cartography
Data mining in spatio-temporal datasets
These data sets are critical for decision support, but their value depends on the ability to extract useful information for studying and understanding the phenomena governing the data source.
Currently
● Data mining of geospatial data takes only a static view of geospatial phenomena.
However
● Geographic phenomena evolve over time.
● Mining spatio-temporal data addresses the temporal dynamics of geospatial data, which is crucial to our understanding of geographic processes and events.
Goal
● Describe the manner in which spatial patterns change through time.
Some fields and applications include:
● Agro-ecology
● Environmental change
● Species distribution
● Disease propagation
● Urban dynamics
● Migration patterns
Manage and understand changing spatial patterns of yields
● Which variables make some regions produce more than others?
● Why are there regions that maintain their production over time?
Environmental change (satellite images)
The Normalized Difference Vegetation Index (NDVI) gives a measure of the vegetative cover on the land surface over wide areas.
● Which variables are related to the changes in the vegetative cover?
[Figure: NDVI satellite images for the summers of 1989 to 1994 and 1996 to 2001]
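The slides do not give the NDVI formula. The index is conventionally computed per pixel from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). A minimal sketch under that assumption follows; the function names and the tiny 2 x 2 bands are made up for illustration, not taken from the presentation.

```python
def ndvi(nir, red):
    """Return the NDVI of one pixel from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch: pixels with no signal get 0.
        return 0.0
    return (nir - red) / total


def ndvi_grid(nir_band, red_band):
    """Apply ndvi() cell by cell to two equally shaped 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical reflectance values for a 2 x 2 image.
nir_band = [[0.50, 0.40], [0.30, 0.10]]
red_band = [[0.10, 0.10], [0.30, 0.10]]
grid = ndvi_grid(nir_band, red_band)
```

Dense vegetation reflects strongly in the near-infrared and absorbs red light, so vegetated pixels move toward 1 while bare or mixed pixels stay near 0. Comparing such grids summer by summer is one way to quantify the change visible in the image series.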
New methodologies
It is very important to conduct research on data mining of spatio-temporal datasets in order to:
● Develop methodologies
● Assist knowledge extraction from spatio-temporal datasets
● Improve decision-making processes
New methodologies for mining spatio-temporal datasets
To deal with the inherent characteristics of spatio-temporal datasets:
● Multivariate and temporal mapping
● Visualization of very large datasets
● Changing spatial patterns
For instance …
Visualization of spatio-temporal cluster dynamics, to provide insights about the nature of cluster change
[Figure: Similarity of sugarcane growing environmental conditions (1999-2001) using self-organizing maps]
● Which variable or variables make two clusters merge into one?
● Are there sites that change from one cluster to another year after year?
● Why does that happen?
● Is it possible to find recurrent patterns in the dynamics of the clusters?
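The second question can be made concrete with a small bookkeeping sketch. It assumes that each site already carries a cluster label per year and that labels are comparable across years (for example, because the same map is reused). The site names, labels, and function name are hypothetical.

```python
def sites_changing_cluster(labels_by_year):
    """Map each site to the list of (year_from, year_to, old, new) moves.

    labels_by_year: dict year -> dict site -> cluster label.
    Sites that never change cluster do not appear in the result.
    """
    years = sorted(labels_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        for site, old in labels_by_year[prev].items():
            new = labels_by_year[curr].get(site)
            # Skip sites missing in the later year; record real moves only.
            if new is not None and new != old:
                changes.setdefault(site, []).append((prev, curr, old, new))
    return changes


# Hypothetical labels for three sites over three years.
labels_by_year = {
    1999: {"site_a": 0, "site_b": 0, "site_c": 1},
    2000: {"site_a": 0, "site_b": 1, "site_c": 1},
    2001: {"site_a": 0, "site_b": 0, "site_c": 1},
}
changes = sites_changing_cluster(labels_by_year)
```

Here only `site_b` moves, leaving cluster 0 in 2000 and returning in 2001. A list of moves like this is a starting point for asking why a site changes and whether the pattern recurs.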
Research plan
Specific Goals
Development of bio-inspired methodologies for the detection and tracking of changes in spatio-temporal clusters.
● Agro-ecological datasets will be used as a case study.
● This approach involves finding clusters of sites with similar characteristics in time and space.
Development of bio-inspired methodologies for the visualization of spatio-temporal cluster dynamics.
Clusters of sites with similar characteristics in time and space
Which crops or varieties are likely to perform well, where, and when?
[Figure labels: Soil, Climate, Genotype]
Homologous places for Colombian coffee production: Brazil, Ecuador, East Africa, and New Guinea.
Harvest of the same crop at different times
For commercial (mass-production) crops (rice, corn), the “when” and “where” are known. For native crops (guanabana, lulo) or special types of crops (coffee varieties), this is not the case.
DAPA (Diversification Agriculture Project Alliance)
When and what must I cultivate? Market demand
The COCH project
Challenges in mining spatio-temporal datasets
The special nature of spatio-temporal data poses several challenges to the knowledge extraction process. For instance:
● Heterogeneity in sources of information and in scales of time and space
● Spatial autocorrelation
● Boundaries in geospatial data
● Temporal relationships between spatial objects
● Visualization of spatio-temporal cluster dynamics
● Geographic space and feature space
Heterogeneity in sources of information
Conventional methods are not effective at handling a mixture of data types and sources.
Heterogeneity in scales of time and space
Methodologies are needed to evaluate clusters at different scales in order to find “interesting” patterns between levels. They should improve the analysis of cluster structure at different scales by creating representations of the clusters that facilitate the selection of clusters at each scale.
Spatial autocorrelation
Spatial autocorrelation can be defined as the degree of relationship that exists between two or more spatial-data variables.
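The slides do not name a specific measure. One widely used statistic for the spatial autocorrelation of a single variable is Moran's I, sketched below as an illustration rather than as the method of this work. The four sites on a line and their binary adjacency weights are hypothetical.

```python
def morans_i(values, weights):
    """Moran's I for one variable observed at n sites.

    values:  list of n observations.
    weights: n x n spatial weight matrix (weights[i][j] > 0 for neighbours).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Four sites on a line; each site neighbours the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = morans_i([1.0, 2.0, 3.0, 4.0], chain)       # neighbours are similar
alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)  # neighbours differ
```

The smooth gradient gives a positive value and the alternating pattern a negative one, which is the sign convention usually attached to Moran's I.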
Boundaries in geospatial data
Algorithms for knowledge discovery in spatio-temporal databases have to consider the neighbors of the geo-referenced data. Part of the complexity of the problem lies in the fact that the boundaries of these neighborhoods are not hard, but rather soft.
Temporal relationships between spatial objects
The relationships between spatial objects can change over time. These dynamic relationships can be observed, for instance, in cluster changes over time.
[Figure: Similarity of sugarcane growing environmental conditions (1999-2001) using self-organizing maps]
Geographic space and feature space
Geographic space is concerned with surface features such as the terrain we walk on. Feature space visualization is concerned with the representation of similarities associated with geo-referenced sites in the geographic space.
Visualization of spatio-temporal cluster dynamics
● Visualization of the overall structure of the dataset
● Exploration of correlations and relationships
● Visualization of temporal patterns
1,336,025 points just for Colombia (1 point per 1 km × 1 km cell)
State of the art
Myra Spiliopoulou et al. MONIC: modeling and monitoring cluster transitions. In KDD ’06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining.
Daniel B. Neill et al. Detection of emerging space-time clusters. In KDD ’05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining.
Geoffrey M. Jacquez. Spatial Cluster Analysis (The Handbook of Geographic Information Science). John Wilson (University of Southern California), 2008.
● Small databases
● No agro-ecologic or environmental databases
● Recorded in controlled conditions
● Based on statistical models
Approaches
● Unsupervised learning (heterogeneous data): used to analyze data when there is only a low level of knowledge about the dataset
● Hierarchical methods (heterogeneity in scales of time and space)
● Data abstraction methods (heterogeneity in scales of time and space)
[Figure: examples and their prototypes]
● Self-Organizing Map (SOM): spatial autocorrelation
A Self-Organizing Map (SOM) applies a learning strategy used in neural structures like the cortex, and presents several advantages that we will exploit in our research in order to gain insights about the spatial autocorrelation present in the geographic zones.
[Figure: The neighbourhood function h_ck(t) of a SOM, centred over the best-matching neuron m_c]
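To illustrate the role of the neighbourhood function, here is a minimal sketch of one standard SOM training step. It assumes a Gaussian neighbourhood h_ck = exp(-||r_c - r_k||^2 / (2 sigma^2)) and the usual update m_k <- m_k + alpha * h_ck * (x - m_k); the slides do not specify which form is used. The 2 x 2 lattice, learning rate `alpha`, and width `sigma` are hypothetical values.

```python
import math


def best_matching_unit(weights, x):
    """Index c of the neuron whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)),
    )


def neighbourhood(pos_c, pos_k, sigma):
    """Gaussian neighbourhood centred on the best-matching neuron."""
    d2 = sum((a - b) ** 2 for a, b in zip(pos_c, pos_k))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def som_update(weights, positions, x, alpha, sigma):
    """Move every neuron toward x, scaled by its lattice distance to the BMU."""
    c = best_matching_unit(weights, x)
    for k, m_k in enumerate(weights):
        h = neighbourhood(positions[c], positions[k], sigma)
        weights[k] = [w + alpha * h * (xi - w) for w, xi in zip(m_k, x)]
    return c


# Hypothetical 2 x 2 lattice with two-dimensional weight vectors.
positions = [(0, 0), (0, 1), (1, 0), (1, 1)]
weights = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
c = som_update(weights, positions, x=[0.9, 0.8], alpha=0.5, sigma=1.0)
```

Because neighbours on the lattice are pulled toward the same inputs, nearby neurons end up with similar weight vectors, which is what lets the map place similar sites close together.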
● Self-Organizing Map (SOM): geographic space and feature space
A SOM represents clusters of a multidimensional space by mapping multidimensional data onto a two-dimensional lattice of cells. The clusters found in the feature space are in many cases not the same as those found in geographic space.
[Figure: Similarity of sugarcane growing environmental conditions (1999-2005) using self-organizing maps]
● Self-Organizing Map (SOM): visualization of spatio-temporal cluster dynamics
Visualization of the overall structure of the dataset: its clustering, patterns (similarities) and irregularities.
Exploration of correlations and relationships: primarily based on component plane displays in multiple views.
Visualization of temporal patterns: examples are ordered component displays and trajectories.
[Figure: partial correlation]
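A component plane is the set of values that one input variable takes across all map units, so comparing planes hints at relationships between variables. The sketch below uses the plain Pearson correlation between planes as a simple stand-in; it is not the partial correlation shown on the slide. The weight vectors and the variable names in the comments are hypothetical.

```python
import math


def component_plane(weights, k):
    """Values of variable k across all map units."""
    return [w[k] for w in weights]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical trained map: 4 units, 3 variables (rainfall, temperature, yield).
weights = [
    [0.1, 0.9, 0.2],
    [0.4, 0.6, 0.5],
    [0.7, 0.3, 0.8],
    [1.0, 0.0, 1.1],
]
r_rain_yield = pearson(component_plane(weights, 0), component_plane(weights, 2))
r_rain_temp = pearson(component_plane(weights, 0), component_plane(weights, 1))
```

With many variables, sorting or clustering planes by such a similarity score is one way to decide which planes to display side by side.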
● Fuzzy logic: boundaries in geospatial data
In many applications crisp partitions are not the optimal representation of clusters. Fuzzy logic is a feature that could be added to the model in order to represent degrees of membership.
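As an illustration of degrees of membership, the sketch below uses the membership formula from fuzzy c-means, u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)), where d_i is the distance from a site to prototype i and m > 1 is the fuzzifier. This is an assumed, commonly used choice, not necessarily the rule adopted in this work; the prototypes and points are hypothetical.

```python
import math


def fuzzy_memberships(point, prototypes, m=2.0):
    """Degrees of membership of one point in each cluster prototype.

    The memberships are non-negative and sum to 1.
    """
    dists = [math.dist(point, p) for p in prototypes]
    # A point lying exactly on a prototype belongs fully to it.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Two hypothetical cluster prototypes in a two-dimensional feature space.
prototypes = [(0.0, 0.0), (4.0, 0.0)]
near_first = fuzzy_memberships((1.0, 0.0), prototypes)
midpoint = fuzzy_memberships((2.0, 0.0), prototypes)
```

A site close to the first prototype receives most of its membership from that cluster, while a site halfway between the two is shared equally, which is how a soft boundary shows up in practice.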
● Non-stationary relationships between spatial objects: growing hierarchical self-organizing structures
Dealing with non-stationary relationships implies finding relationships that vary through time and space. This challenge involves creating methodologies capable of adapting their models in order to reveal the dynamics of the clusters and represent their characteristics as accurately as possible. Growing hierarchical self-organizing structures could be used as a basis for hybrid models to detect, reveal and analyze spatio-temporal cluster dynamics.
I propose ...
An unsupervised model based on self-organization which allows data abstraction, hierarchical organization of the clusters, and automatic detection of interesting changes in the dynamics of spatio-temporal clusters. Some characteristics of the model must be:
● It adapts its structure.
● Changes in its structure will reveal cluster dynamics such as merging, emergence, mutation, and parallel dynamics.
● The hierarchical structure will make it possible to tackle the problem related to the scale effect (navigation of the clustering structure at different levels).
● The model will work with fuzzy memberships to avoid the problem of boundaries in geospatial data.
● The unsupervised methodology will help to find relationships that can be hidden in very large and heterogeneous datasets (heterogeneity in sources of information).
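The proposed model is not specified in code in these slides. As a rough illustration of what detecting a merge could look like, the sketch below compares two crisp clusterings of the same sites by overlap: an old cluster counts as absorbed by a new one when more than a threshold share of its sites lands there. The function names, the 0.5 threshold, and the site labels are all hypothetical.

```python
def cluster_overlap(old_labels, new_labels):
    """overlap[old][new] = share of the old cluster's sites found in new."""
    counts = {}
    for site, old in old_labels.items():
        new = new_labels.get(site)
        if new is None:
            continue  # site not observed at the later time step
        row = counts.setdefault(old, {})
        row[new] = row.get(new, 0) + 1
    return {
        old: {new: c / sum(row.values()) for new, c in row.items()}
        for old, row in counts.items()
    }


def detect_merges(old_labels, new_labels, threshold=0.5):
    """New clusters that absorbed two or more old clusters."""
    absorbed = {}
    for old, row in cluster_overlap(old_labels, new_labels).items():
        new, share = max(row.items(), key=lambda item: item[1])
        if share > threshold:
            absorbed.setdefault(new, []).append(old)
    return {new: sorted(olds) for new, olds in absorbed.items() if len(olds) > 1}


# Hypothetical cluster labels of five sites at two time steps.
old_labels = {"a": "A", "b": "A", "c": "B", "d": "B", "e": "C"}
new_labels = {"a": "X", "b": "X", "c": "X", "d": "X", "e": "Y"}
merges = detect_merges(old_labels, new_labels)
```

In this toy case clusters A and B both end up inside X, so X is reported as a merge, while C simply continues as Y. Emergence could be flagged in a similar way by looking for new clusters that no old cluster maps to.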
Preliminary results and discussion
● Miguel Barreto-Sanz and Andrés Pérez-Uribe. Classification of similar productivity zones in the sugar cane culture using clustering of SOM component planes based on the SOM distance matrix. In The 6th International Workshop on Self-Organizing Maps (WSOM), 2007.
● Miguel Barreto-Sanz and Andrés Pérez-Uribe. Improving the correlation hunting in a large quantity of SOM component planes. In ICANN 2007: Proceedings of the 17th international conference on Artificial Neural Networks.
● Miguel Barreto-Sanz and Andrés Pérez-Uribe. Tree-structured self-organizing map component planes as a visualization tool for data exploration in agro-ecological modeling. In Proc. of the 6th European Conf. on Ecological Modelling, Trieste, Italy, 2007.
● Miguel Barreto-Sanz, Andrés Pérez-Uribe, Carlos-Andres Peña-Reyes, and Marco Tomassini. Fuzzy growing hierarchical self-organizing networks. In ICANN 2008: Proceedings of the 18th international conference on Artificial Neural Networks.
● Miguel Barreto-Sanz, Andrés Pérez-Uribe, Carlos-Andres Peña-Reyes, and Marco Tomassini. Tuning Parameters in the Fuzzy Growing Hierarchical Self-Organizing Networks. To appear in: Studies in Computational Intelligence, Constructive Neural Networks, Springer, 2009.
Thanks for new ideas and directions to explore!