Fuzzy Growing Hierarchical Self-organizing Networks



Hierarchical Self-Organizing Networks are used to reveal the topology and structure of datasets. These structures create crisp partitions of the dataset, producing branches or prototype vectors that represent groups of data with similar characteristics. However, when an observation can be represented by several prototypes with similar accuracy, a crisp partition is forced to assign it to just one group, so crisp divisions usually lose information about the real structure of the dataset. To deal with this challenge we propose the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON). FGHSON are adaptive networks that reflect the underlying structure of the dataset in a hierarchical, fuzzy way. These networks grow by using three parameters, which govern the membership degree of data observations to their prototype vectors and the quality of the network representation. The resulting structure makes it possible to represent heterogeneous groups and groups whose observations have similar membership degrees to several clusters.

Published in: Technology

  1. Fuzzy Growing Hierarchical Self-Organizing Networks. Miguel Barreto-Sanz, Andres Perez-Uribe, Carlos-Andres Peña-Reyes and Marco Tomassini
  2. Outline: Introduction and motivation; Challenges; Fuzzy Growing Hierarchical Self-Organizing Networks; How does it work?; Experimental testing; Conclusions
  3. Introduction. Motivation: clustering of spatio-temporal data in order to find homologous places, for applications in fields such as Geographic Information Systems (GIS), epidemiology, land use, environmental research, natural resource discovery, and spatial business intelligence. Example (agriculture): homologous places for Colombian coffee production in Brazil, Ecuador, East Africa, and New Guinea. [Figure labels: Soil, Climate, Genotype]
  4. Introduction. Challenges: 1. Large databases. 2. Resolution (levels of abstraction). 3. The fuzzy and implicit nature of spatial and spatio-temporal relationships between objects: boundaries between geographic areas are transition zones rather than sharp boundaries. [Figure labels: 1 336,025 points just for Colombia; different resolutions 1, 2, 3; 1 km × 1 km = 1 point]
  5. Introduction. First solution: Self-Organizing Maps (SOM). Advantages: it is possible to obtain prototypes. Disadvantages: it is not possible to obtain different resolutions (the size of the Kohonen map is fixed). [Figure label: Homologous zones]
  6. Introduction. Second solution: Growing Hierarchical SOM (GHSOM). Advantages: it is possible to obtain different resolutions. Disadvantages: it is not possible to represent fuzzy relationships. [Figure label: Similar zones]
  7. Introduction. Crisp zones obtained with SOM and GHSOM.
  8. Introduction. Our solution: Fuzzy Growing Hierarchical Self-Organizing Networks. [Figure labels: SOM, GHSOM, Fuzzy GHSOM]
  9. Fuzzy Kohonen Clustering Networks. FKCN integrate the idea of fuzzy membership from Fuzzy c-Means (FCM) with the updating rules from SOM, thus creating a self-organizing algorithm that automatically adjusts the size of the updated neighborhood during the learning process. $W_{i,t}$ represents the centroid of the $i$-th cluster at iteration $t$, $m(t)$ is an exponent like the fuzzification index in FCM, and $U_{ik,t}$ is the membership value of the observation $Z_k$ in cluster $i$.
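
The update equations shown on this slide are not preserved in the transcript. As a hedged reconstruction, the FKCN rules as they are usually stated in the literature can be written as follows. The symbols $c$ (number of clusters), $n$ (number of observations), $m_0$ (initial exponent), $t_{max}$ (iteration limit) and $\alpha_{ik,t}$ (learning rate) are assumptions that do not appear on the slide.

```latex
\begin{align}
U_{ik,t} &= \left[ \sum_{j=1}^{c} \left( \frac{\lVert Z_k - W_{i,t-1} \rVert}{\lVert Z_k - W_{j,t-1} \rVert} \right)^{2/(m(t)-1)} \right]^{-1} \\
\alpha_{ik,t} &= \left( U_{ik,t} \right)^{m(t)} \\
W_{i,t} &= W_{i,t-1} + \frac{\sum_{k=1}^{n} \alpha_{ik,t} \left( Z_k - W_{i,t-1} \right)}{\sum_{s=1}^{n} \alpha_{is,t}} \\
m(t) &= m_0 - t \, \frac{m_0 - 1}{t_{max}}
\end{align}
```

Under this reading, as $m(t)$ decreases toward 1 the memberships become nearly crisp, so each prototype is updated mostly by its closest observations. That is the sense in which the size of the updated neighborhood adjusts automatically.
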
  10. Fuzzy Growing Hierarchical Self-Organizing Networks. [Figure labels: FKCN; breadth growth process; depth growth or hierarchical growth]
  11. Initial Setup and Global Network Control. First prototype vector (one dimension in this example): W0 is a vector that corresponds to the mean of the input variables. The value of qe0 helps to measure the minimum quality of data representation of the prototype vectors in the subsequent layers; the next prototypes therefore have the task of reducing the global representation error qe0. [Figure labels: Membership degrees in each layer; Hierarchical structure]
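
The initial setup can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes that qe0 is the sum of Euclidean distances from W0 to every observation (the GHSOM convention), which the slide does not state. The function name `initial_setup` and the toy data are hypothetical.

```python
import math

def initial_setup(data):
    """Return (w0, qe0) for a list of equal-length numeric observations."""
    n = len(data)
    dim = len(data[0])
    # W0: component-wise mean of the input variables.
    w0 = [sum(row[d] for row in data) / n for d in range(dim)]
    # qe0: ASSUMED to be the summed Euclidean distance to W0.
    qe0 = sum(math.dist(w0, row) for row in data)
    return w0, qe0

# One-dimensional toy data, as in the slide's example.
data = [[0.0], [1.0], [2.0], [5.0]]
w0, qe0 = initial_setup(data)
print(w0, qe0)
```

Every later layer is then judged by how much of this global error qe0 it removes.
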
  12. Breadth growth process. The FKCN starts with two initial prototype vectors, and new prototype vectors are added in order to reach a suitable representation of the dataset. A membership matrix U is obtained; this matrix contains the membership degree of each dataset element to each prototype vector. [Figure labels: Membership degrees in each layer; Hierarchical structure]
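
As an illustration of how the membership matrix U could be obtained, here is a minimal sketch of one batch FKCN step in plain Python. It assumes the usual FCM membership rule and the usual FKCN prototype update, neither of which is spelled out on the slide. The names `memberships` and `fkcn_update`, the fixed exponent `m`, and the toy data are hypothetical.

```python
import math

def memberships(data, prototypes, m):
    """U[i][k]: membership of observation k in prototype i (FCM rule, m > 1)."""
    exponent = 2.0 / (m - 1.0)
    U = [[0.0] * len(data) for _ in prototypes]
    for k, z in enumerate(data):
        dists = [math.dist(z, w) for w in prototypes]
        zero = [i for i, d in enumerate(dists) if d == 0.0]
        if zero:
            # Observation coincides with one or more prototypes:
            # split full membership among them.
            for i in zero:
                U[i][k] = 1.0 / len(zero)
            continue
        for i, d_i in enumerate(dists):
            U[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return U

def fkcn_update(data, prototypes, m):
    """One batch update: returns (U, new_prototypes)."""
    U = memberships(data, prototypes, m)
    new_prototypes = []
    for i, w in enumerate(prototypes):
        alpha = [u ** m for u in U[i]]  # learning rates alpha_ik = U_ik ** m
        total = sum(alpha)
        if total == 0.0:
            new_prototypes.append(list(w))  # nothing pulls on this prototype
            continue
        new_prototypes.append([
            w_d + sum(a * (z[d] - w_d) for a, z in zip(alpha, data)) / total
            for d, w_d in enumerate(w)
        ])
    return U, new_prototypes

# Two initial prototype vectors on one-dimensional toy data.
data = [[0.0], [1.0], [4.0], [5.0]]
prototypes = [[0.5], [4.5]]
U, new = fkcn_update(data, prototypes, m=2.0)
print(U)
print(new)
```

Each column of U sums to 1, so an observation lying between two prototypes keeps a non-trivial membership in both instead of being forced into one group.
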
  13. Breadth growth process. The mean quantization error of the map (MQE) is evaluated to measure the quality of data representation, and it is also used as the stopping criterion for the growing process of the FKCN. In the stopping criterion, qeu represents the qe of the corresponding prototype u in the upper layer. FKCN1 is allowed to grow until the qe of the prototype in its preceding layer (qe0 in the case of layer 1) is reduced to at least a fixed percentage τ1. For layer 1: [equation]. In general: [equation].
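
The two criteria on this slide (for layer 1 and in general) are not preserved in the transcript. A plausible reading is sketched here, assuming the GHSOM convention that a map stops growing once its mean quantization error falls below a fraction of its parent's error. The definition of $qe_i$ as a sum of distances, the set $C_i$ of observations assigned to prototype $i$, and the count $c$ of prototypes in the FKCN are all assumptions.

```latex
\begin{align}
qe_i &= \sum_{x_k \in C_i} \lVert W_i - x_k \rVert \\
MQE &= \frac{1}{c} \sum_{i=1}^{c} qe_i \\
\text{layer 1:} \quad MQE &< \tau_1 \cdot qe_0 \\
\text{in general:} \quad MQE &< \tau_1 \cdot qe_u
\end{align}
```

For instance, with $\tau_1 = 0.3$ and a hypothetical $qe_0 = 6.0$, the first-layer FKCN would keep adding prototypes until its MQE drops below $1.8$.
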
  14. Depth growth or hierarchical growth. In particular, prototypes with a large quantization error indicate which clusters need a better representation by means of new FKCNs.
  15. Depth growth or hierarchical growth. The prototypes Wi that do not fulfil the criterion [equation] will be subject to hierarchical expansion. The threshold in this criterion describes the desired level of granularity in the data representation, and a minimal membership degree is also specified. The breadth process described in stage 2 then begins with the newly established FKCNs.
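
The expansion criterion itself is missing from the transcript. The sketch below assumes the GHSOM-style test qe_i < τ2 · qe0, and it assumes that the minimal membership degree (called `phi` here) selects which observations are passed to a new child FKCN. Both are assumptions, and the function names and numbers are hypothetical.

```python
def prototypes_to_expand(qe, qe0, tau2):
    """Indices of prototypes that do NOT fulfil qe_i < tau2 * qe0 (assumed test)."""
    threshold = tau2 * qe0
    return [i for i, qe_i in enumerate(qe) if not qe_i < threshold]

def observations_for_child(u_row, phi):
    """Indices of observations whose membership in a prototype is at least phi."""
    return [k for k, u in enumerate(u_row) if u >= phi]

# Hypothetical per-prototype errors for one FKCN; threshold = 0.065 * 6.0.
qe = [0.2, 1.5, 0.3]
expand = prototypes_to_expand(qe, qe0=6.0, tau2=0.065)

# Hypothetical membership row of the prototype selected for expansion.
child_data = observations_for_child([0.9, 0.15, 0.25, 0.05], phi=0.2)
print(expand, child_data)
```

Because the cut on membership is not exclusive, the same observation may be handed to more than one child FKCN, which is what allows overlapping clusters to be refined in several branches.
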
  16. End of the process. The training process of the FGHSON is terminated when no more prototypes require further expansion. Note that this training process does not necessarily lead to a balanced hierarchy, i.e. a hierarchy with equal depth in each branch. Rather, the specific distribution of the input data is modeled by a hierarchical structure in which some clusters require deeper branching than others.
  17. Iris Data Set. There are three Iris categories: Setosa, Versicolor, and Virginica, represented respectively by triangles, plus symbols, and dots, each having 50 samples with 4 features. Here, only three features are used: PL (petal length), PW (petal width), and SL (sepal length). Parameters: τ1 = 0.3, τ2 = 0.065 and φ = 0.2.
  18. Iris Data Set. Distribution of the prototype vectors, represented by stars, in each layer of the hierarchy.
  19. Iris Data Set. Distribution of the prototype vectors, represented by stars, in each layer of the hierarchy.
  20. Iris Data Set. Third layer of the FGHSON. In this layer, prototypes are present only in the zone where observations of Virginica and Versicolor share the same area, so the new prototypes represent each category more accurately.
  21. Toy set. Parameters: τ1 = 0.3, τ2 = 0.065 and φ = 0.2. This example illustrates how the model stops the growing process in those parts where the desired representation is reached and keeps on growing where a low membership or poor representation is present.
  22. GIS results
  23. Conclusion. The Fuzzy Growing Hierarchical Self-Organizing Networks are fully adaptive networks able to represent complex datasets hierarchically. Moreover, they allow for a fuzzy clustering of the data, allocating more prototype vectors or branches to heterogeneous areas and to areas where observations have similar membership degrees to several clusters. This can help to better describe the dataset structure and the inner data relationships. Future work will focus on a more accurate way to find the parameters used to tune the algorithm, more specifically [symbol not preserved]. In some cases this value can change in order to find better fuzzy sets to represent the structure of the dataset.
  24. Thanks for new ideas and directions to explore! The end?