Fuzzy Clustering
   By: Akshay Chaudhari
Agenda
 • Introduction
 • Fuzzy C-means clustering
 • Algorithm
 • Complexity analysis
 • Pros and cons
 • References
Introduction
Fuzzy clustering is a method of clustering that
  allows one data point to belong to two or
  more clusters.
In other words, each data point is a member of every
  cluster, but to a certain degree known as its
  membership value.
This method (developed by Dunn in 1973 and
  improved by Bezdek in 1981) is frequently used
  in pattern recognition.
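To make the idea of a membership value concrete, here is a minimal sketch contrasting a hard assignment with fuzzy memberships for a single point. The point, the two centers and the fuzzifier m = 2 are illustrative assumptions, and the memberships use the standard fuzzy c-means formula.

```python
# Hypothetical 1-D example: one data point and two cluster centers.
x = 10.0
centers = [5.0, 20.0]   # assumed centers, not taken from the slides
m = 2.0                 # fuzzifier; m = 2 is a common choice

# Hard clustering: the point belongs only to the nearest center.
distances = [abs(x - c) for c in centers]
hard_label = distances.index(min(distances))

# Fuzzy clustering: a degree of membership in every cluster,
# computed with the standard fuzzy c-means membership formula.
exponent = 2.0 / (m - 1.0)
memberships = [
    1.0 / sum((d_j / d_k) ** exponent for d_k in distances)
    for d_j in distances
]

print(hard_label)                          # index of the nearest center
print([round(u, 2) for u in memberships])  # degrees that sum to 1
```

The point is closer to the first center, so it receives the larger membership there, but it still keeps a nonzero membership in the second cluster.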
Applications
Image segmentation
  Medical imaging
    X-ray Computed Tomography (CT)
    Magnetic Resonance Imaging (MRI)
    Positron Emission Tomography (PET)

Image and speech enhancement
Edge detection
Video shot change detection
Fuzzy C-means Clustering
Fuzzy C-means Algorithm
1. Select an initial fuzzy pseudo-partition, i.e., assign values
     to all u_ij.

2. repeat
3.     Compute the centroid of each cluster using the fuzzy
     pseudo-partition.

4.     Recompute the fuzzy pseudo-partition, i.e., the u_ij.

5. until the centroids don't change.
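The five steps above can be sketched in plain Python as follows. This is an illustrative sketch rather than a reference implementation: the function name `fuzzy_c_means`, the parameters `tol`, `max_iter` and `seed`, and the choice to store memberships as `u[i][j]` (point i, cluster j) are all assumptions made for the example.

```python
import random

def fuzzy_c_means(points, c, m=2.0, tol=1e-4, max_iter=100, seed=0):
    """Sketch of fuzzy c-means for a list of equal-length tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random initial pseudo-partition; each point's memberships sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centroids = None
    for _ in range(max_iter):                      # Step 2: repeat
        # Step 3: centroids as membership-weighted means (weights u_ij^m).
        new_centroids = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            new_centroids.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / wsum
                for k in range(dim)))
        # Step 4: recompute memberships from distances to the centroids.
        e = 2.0 / (m - 1.0)
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, cj)) ** 0.5
                     for cj in new_centroids]
            if min(dists) == 0.0:
                # Point coincides with a centroid: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                u[i] = [h / sum(hits) for h in hits]
            else:
                u[i] = [1.0 / sum((dj / dk) ** e for dk in dists)
                        for dj in dists]
        # Step 5: stop once the centroids have (almost) stopped moving.
        if centroids is not None:
            shift = max(abs(a - b)
                        for old, new in zip(centroids, new_centroids)
                        for a, b in zip(old, new))
            if shift < tol:
                centroids = new_centroids
                break
        centroids = new_centroids
    return centroids, u

centroids, u = fuzzy_c_means([(3,), (7,), (10,), (17,), (18,), (20,)], c=2)
print(centroids)
```

In practice "until the centroids don't change" is implemented with a small tolerance, as here, because floating-point centroids rarely become exactly equal between iterations.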
Algorithm
An example
 • X = [3 7 10 17 18 20] and assume C = 2
 • Initially, set U randomly:

   \[ U = \begin{bmatrix} 0.1 & 0.2 & 0.6 & 0.3 & 0.1 & 0.5 \\ 0.9 & 0.8 & 0.4 & 0.7 & 0.9 & 0.5 \end{bmatrix} \]

 • Compute centroids c_j (assume m = 2) using

   \[ c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \]

   c1 = 13.16; c2 = 11.81

 • Compute new membership values u_ij using

   \[ u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\| x_i - c_j \|}{\| x_i - c_k \|} \right)^{\frac{2}{m-1}}} \]

   New U:

   \[ U = \begin{bmatrix} 0.43 & 0.38 & 0.24 & 0.65 & 0.62 & 0.59 \\ 0.57 & 0.62 & 0.76 & 0.35 & 0.38 & 0.41 \end{bmatrix} \]

 • Repeat centroid and membership computation until changes in
   membership values are smaller than, say, 0.01
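The two update formulas in this example can be checked with a short sketch. The helper names `centroids_from` and `memberships_from` are hypothetical, and the code assumes each row of U holds one cluster's memberships across the six data points. When the centroid formula is evaluated directly on the initial U with m = 2, it gives roughly 13.97 and 11.53 rather than the quoted 13.16 and 11.81. The quoted new U, however, is reproduced to within about 0.01 when the membership formula is applied to the quoted centroids.

```python
X = [3, 7, 10, 17, 18, 20]
U = [[0.1, 0.2, 0.6, 0.3, 0.1, 0.5],   # memberships in cluster 1
     [0.9, 0.8, 0.4, 0.7, 0.9, 0.5]]   # memberships in cluster 2
m = 2

def centroids_from(U, X, m):
    """Membership-weighted mean of X for each cluster (row of U)."""
    return [sum(u ** m * x for u, x in zip(row, X)) / sum(u ** m for u in row)
            for row in U]

def memberships_from(centers, X, m):
    """New memberships; assumes no data point sits exactly on a centroid."""
    e = 2 / (m - 1)
    return [[1 / sum((abs(x - cj) / abs(x - ck)) ** e for ck in centers)
             for x in X]
            for cj in centers]

# Centroids computed directly from the initial U.
computed = centroids_from(U, X, m)
print([round(c, 2) for c in computed])

# Memberships recomputed from the centroids quoted in the example.
quoted = [13.16, 11.81]
new_U = memberships_from(quoted, X, m)
for row in new_U:
    print([round(u, 2) for u in row])
```

Feeding `new_U` back into `centroids_from` and repeating gives the iteration described in the last bullet.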
Complexity analysis
 • The time complexity of the fuzzy c-means algorithm is
   O(n d c^2 i)
 • where
        i = number of FCM iterations (passes over the entire dataset)
        n = number of data points
        c = number of clusters
        d = number of dimensions

       Here i grows very slowly with n, c and d.
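One way to see where the c^2 factor comes from is to count the work in a naive membership update, where each of the n × c memberships sums over all c clusters and each term needs a d-dimensional distance. The function below is a hypothetical counting sketch, not a measurement of any particular implementation.

```python
def naive_fcm_cost(n, c, d, iterations):
    """Count coordinate-level operations in a naive FCM membership update."""
    ops = 0
    for _ in range(iterations):       # i passes over the dataset
        for _ in range(n):            # every data point
            for _ in range(c):        # every membership u_ij of that point
                for _ in range(c):    # sum over k = 1..C in the denominator
                    ops += d          # one d-dimensional distance per term
    return ops

print(naive_fcm_cost(n=6, c=2, d=1, iterations=3))
```

The count equals n · d · c² · i, matching the bound above. An implementation that caches the n × c distances once per iteration would do fewer distance computations, although the sum over k for every membership remains.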
Pros and Cons
 • Pros:
    • Allows a data point to be in multiple clusters
    • Gives a more natural representation of, for example, the behavior of genes,
      which are usually involved in multiple functions

 • Cons:
    • Need to define c, the number of clusters
    • Need to determine a membership cutoff value
    • Clusters are sensitive to the initial assignment of centroids:
      fuzzy c-means is not a deterministic algorithm
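As a small illustration of the membership cutoff, the sketch below turns the "New U" memberships from the worked example into cluster member lists. The cutoff of 0.4 is a hypothetical value chosen only for the example.

```python
X = [3, 7, 10, 17, 18, 20]
new_U = [[0.43, 0.38, 0.24, 0.65, 0.62, 0.59],   # memberships in cluster 1
         [0.57, 0.62, 0.76, 0.35, 0.38, 0.41]]   # memberships in cluster 2
cutoff = 0.4   # hypothetical threshold, not taken from the slides

# A point joins every cluster in which its membership reaches the cutoff.
clusters = [[x for x, u in zip(X, row) if u >= cutoff] for row in new_U]
print(clusters)
```

With this cutoff the points 3 and 20 land in both clusters. A higher cutoff would leave some points unassigned, which is why the value has to be chosen with care.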
References
 • http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
 • http://en.wikipedia.org/wiki/Fuzzy_clustering
 • Tan, Steinbach, Kumar, Introduction to Data Mining, Section 9.2
Thank You …..

Editor's Notes

  • #5
    Tomograph: medical instrument which receives X-rays via a special method.
    Magnetic Resonance Imaging (MRI): diagnostic technique which uses a magnetic field and radio waves to provide computerized images of internal body tissues.
    Positron Emission Tomography (PET): technique for creating detailed images of bodily tissues by injecting positron-laden material into the body and recording the gamma rays emitted over a period of approximately two hours.