Combination of Supervised and Unsupervised
              Classification Using Belief Function Theory
                                      Fatma Karem(1)

                                      Mounir Dhibi(1)
   Research Unit PMI 09/UR/13-0, University campus Zarouk Gafsa 2112, Tunisia

                                     Arnaud Martin(2)
            Rennes 1 University, UMR 6074 IRISA, Street Edouard Branly BP 30219, 22302
                                        Lannion Cedex, France




                                                          Combination of Clustering and Classification
10/5/2012                                     1
PLAN

     Problem statement

     Fusion of information

     Belief Function Theory

     Proposed Approach

     Results

     Conclusion and Perspectives




Classification Problems

     Many methods exist: how to choose between them?
     The results obtained depend on the initially chosen parameters
     The data handled are uncertain and sometimes missing


                                                                    How to choose the
                                                                    best parameters?




                                   Fusion between clustering and
                                           classification


Objectives of fusion

       Take into account the complementarity of both methods
       Limit the problems caused by the choice of parameters and by training
       Reduce the uncertainty of the data and of the results


                                    How to combine?



         Recommendations:
               Choose an approach that handles uncertainty and imprecision
               Limit conflicts between clustering and classification


Information Fusion
   Goal of fusion:

    Combine information coming from different sources

    Reduce the uncertainty and imprecision of the sources

    Find a compromise between the sources in order to reduce the conflict
    between them

                Theories treating uncertainty

    Examples: probability theory (Bayesian approach), possibility theory,
    belief function theory (Dempster-Shafer theory)




Belief Function Theory (1/2)
Let Θ be a finite, non-empty set of elementary events for a given problem,
called the frame of discernment: Θ = {θ_i, i = 1, …, n}, where the θ_i are the
hypotheses about the problem domain.

The set of all subsets of Θ is called the power set of Θ, denoted by $2^\Theta$.

The impact of a piece of evidence on the different subsets of the frame
of discernment Θ is represented by the basic belief assignment (bba),
denoted by m:

   $m : 2^\Theta \to [0, 1]$ such that $\sum_{A \subseteq \Theta} m(A) = 1$      (1)
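
As a minimal sketch of the definition above, a bba can be stored as a mapping from subsets of the frame to masses and checked against the normalization condition. The `frozenset` representation and the helper names `power_set` and `is_valid_bba` are assumptions made for illustration; they do not come from the slides.

```python
from itertools import chain, combinations


def power_set(frame):
    """Return every subset of the frame of discernment as a frozenset."""
    items = list(frame)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def is_valid_bba(m, frame, tol=1e-9):
    """Check that m maps subsets of the frame to [0, 1] and sums to 1."""
    frame = frozenset(frame)
    # Every focal element must be a subset of the frame.
    if any(not focal <= frame for focal in m):
        return False
    # Every mass must lie in [0, 1].
    if any(mass < 0.0 or mass > 1.0 for mass in m.values()):
        return False
    # The masses must sum to 1.
    return abs(sum(m.values()) - 1.0) <= tol


# Hypothetical example on a three-hypothesis frame.
frame = {"a", "b", "c"}
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    frozenset(frame): 0.2,  # mass on the whole frame models ignorance
}
```

Mass given to a non-singleton subset such as {a, b} is what lets a bba express imprecision that a probability distribution over singletons cannot.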


Belief Function Theory (2/2)
     Belief function




     Plausibility function
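
For illustration, the standard definitions of these two functions in terms of a bba m are $Bel(A) = \sum_{B \subseteq A,\, B \neq \emptyset} m(B)$ and $Pl(A) = \sum_{B \cap A \neq \emptyset} m(B)$. That the slide used exactly this form is an assumption. The sketch below computes both for a bba stored as a dictionary keyed by `frozenset`; the function names and the example masses are hypothetical.

```python
def belief(m, a):
    """Bel(A): total mass of the non-empty focal elements included in A."""
    a = frozenset(a)
    return sum(mass for focal, mass in m.items() if focal and focal <= a)


def plausibility(m, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    a = frozenset(a)
    return sum(mass for focal, mass in m.items() if focal & a)


# Hypothetical bba on the frame {a, b, c}.
m = {
    frozenset({"a"}): 0.5,
    frozenset({"a", "b"}): 0.3,
    frozenset({"a", "b", "c"}): 0.2,
}

bel_ab = belief(m, {"a", "b"})   # masses of {a} and {a, b}
pl_b = plausibility(m, {"b"})    # masses of {a, b} and {a, b, c}
```

For any subset A, Bel(A) never exceeds Pl(A); the gap between the two reflects how much of the evidence is compatible with A without specifically supporting it.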




New Approach (1/3)
     Learning database  ->  Clustering and Classification  ->  Decision making

     How to make a compromise between the two?

New Approach (2/3)
     Source 1: Clustering      ->  clusters  ->  mNS
     Source 2: Classification  ->  classes   ->  mS

     mNS and mS  ->  Combination  ->  Decision making  ->  Final decision

     To which class does each object belong?
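
The combination of the two mass functions can be sketched as follows, assuming the unnormalized conjunctive rule $m(A) = \sum_{B \cap C = A} m_{NS}(B)\, m_S(C)$. The dictionary representation, the function name, and the example masses are hypothetical and only serve to show the mechanics.

```python
def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive rule: m(A) = sum over B ∩ C = A of m1(B) * m2(C)."""
    combined = {}
    for b, mass_b in m1.items():
        for c, mass_c in m2.items():
            a = b & c  # intersection of the two focal elements
            combined[a] = combined.get(a, 0.0) + mass_b * mass_c
    return combined


# Hypothetical masses for one object on the frame {q1, q2}.
frame = frozenset({"q1", "q2"})
m_ns = {frozenset({"q1"}): 0.6, frame: 0.4}  # unsupervised source
m_s = {frozenset({"q2"}): 0.5, frame: 0.5}   # supervised source

m = conjunctive_combination(m_ns, m_s)
conflict = m.get(frozenset(), 0.0)  # mass on the empty set
```

In this rule the mass assigned to the empty set measures the conflict between the two sources, which is why it is kept rather than redistributed at this stage.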
New Approach (3/3)
How to measure our belief in the classes given by the supervised classification?

     Step 1: Unsupervised classification and supervised classification
             Unsupervised source: computation of similarity between clusters and classes (recovery)
             Supervised source: probabilistic model of Appriou
     Step 2: Conjunctive combination
     Step 3: Decision making
             Criterion: pignistic probability
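
The decision step can be illustrated with the usual pignistic transformation, $BetP(\theta) = \sum_{A \ni \theta} \frac{m(A)}{|A|\,(1 - m(\emptyset))}$, followed by choosing the hypothesis with the largest value. This is a sketch under that assumption; the function names and the example masses are hypothetical.

```python
def pignistic(m, frame):
    """Spread each focal mass evenly over its elements, discounting conflict."""
    conflict = m.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("total conflict: pignistic probability is undefined")
    betp = {theta: 0.0 for theta in frame}
    for focal, mass in m.items():
        for theta in focal:  # the empty set contributes nothing
            betp[theta] += mass / (len(focal) * (1.0 - conflict))
    return betp


def decide(m, frame):
    """Return the hypothesis with the highest pignistic probability."""
    betp = pignistic(m, frame)
    return max(betp, key=betp.get)


# Hypothetical combined mass function for one object on the frame {q1, q2}.
frame = frozenset({"q1", "q2"})
m = {
    frozenset(): 0.3,        # conflict left by the combination
    frozenset({"q1"}): 0.3,
    frozenset({"q2"}): 0.2,
    frame: 0.2,
}

betp = pignistic(m, frame)
chosen = decide(m, frame)
```

Dividing by 1 - m(∅) renormalizes the result so that the pignistic values sum to 1 even when the combination left some conflict.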




Masses computation
     Computation of the mass functions for the unsupervised and
     supervised sources

     [Figure: mass computation for the clustering, based on the recovery between the groups found by the clustering and by the classification]
Computation of mass function for the unsupervised source (1/2)

              [Figure: computation of recovery between clusters and classes]

              Let $Q = \{q_i, i = 1, \dots, M\}$ be the set of classes found by the supervised classification
              and $C = \{c_i, i = 1, \dots, n\}$ the set of clusters found by the unsupervised classification
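
One plausible reading of the recovery between a cluster and a class is the fraction of the cluster's objects that the supervised classifier puts in that class. The slides do not give the formula, so this definition, the function name `recovery`, and the toy labels below are assumptions for illustration only.

```python
from collections import Counter


def recovery(cluster_labels, class_labels):
    """Return {cluster: {class: fraction of the cluster's objects in that class}}."""
    if len(cluster_labels) != len(class_labels):
        raise ValueError("both labelings must cover the same objects")
    pair_counts = Counter(zip(cluster_labels, class_labels))
    cluster_sizes = Counter(cluster_labels)
    table = {}
    for (cluster, cls), count in pair_counts.items():
        table.setdefault(cluster, {})[cls] = count / cluster_sizes[cluster]
    return table


# Hypothetical labels for six objects: two clusters, two classes.
clusters = ["c1", "c1", "c1", "c2", "c2", "c2"]
classes = ["q1", "q1", "q2", "q2", "q2", "q2"]

overlap = recovery(clusters, classes)
```

Under this reading, a cluster that falls entirely inside one class has recovery 1 with that class, while a cluster split across classes spreads its support between them.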




Computation of mass function for the supervised source (2/2)




With        the class assigned by the supervised classifier to an observation x

q_i the real class

       the reliability coefficient of the supervised classification for the class
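
As a hedged illustration, one commonly cited form of Appriou's probabilistic model assigns, for a single hypothesis q with probability-type score p, reliability coefficient α and normalization factor R, the masses $m(\{q\}) = \frac{\alpha R p}{1 + R p}$, $m(\overline{\{q\}}) = \frac{\alpha}{1 + R p}$ and $m(\Theta) = 1 - \alpha$. Whether this is exactly the variant used on the slide is an assumption, and the function name, parameter names and numbers below are hypothetical.

```python
def appriou_mass(p, alpha, r, hypothesis, frame):
    """Mass function for one hypothesis under an assumed form of Appriou's model.

    p     : probability-type score supporting the hypothesis (p >= 0)
    alpha : reliability coefficient of the source, in [0, 1]
    r     : normalization factor (r >= 0)
    """
    frame = frozenset(frame)
    focal = frozenset({hypothesis})
    if focal == frame or not focal <= frame:
        raise ValueError("hypothesis must be one of at least two frame elements")
    denom = 1.0 + r * p
    return {
        focal: alpha * r * p / denom,   # support for the hypothesis
        frame - focal: alpha / denom,   # support for its complement
        frame: 1.0 - alpha,             # ignorance left by unreliability
    }


# Hypothetical values for one observation on the frame {q1, q2, q3}.
frame = {"q1", "q2", "q3"}
m = appriou_mass(p=0.8, alpha=0.9, r=1.0, hypothesis="q1", frame=frame)
```

The three masses always sum to 1, and a less reliable source (smaller α) moves mass from the hypothesis and its complement onto the whole frame.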



Experimental Results (1/4)


            Data              Classification performance    Classification performance
                              before fusion (%)             after fusion (%)
            Iris              97.33                         100
            Abalone           53.67                         76.35
            Breast-cancer     64.52                         80
            Haberman          75.17                         100

                       Results obtained for KNN + FCM




Experimental Results (2/4)


            Data              Classification performance    Classification performance
                              before fusion (%)             after fusion (%)
            Iris              96                            100
            Abalone           52                            79.80
            Breast-cancer     96                            100
            Haberman          73.83                         77.74

                       Results obtained for Bayes + FCM




Experimental Results (3/4)


            Data              Classification performance    Classification performance
                              before fusion (%)             after fusion (%)
            Iris              97.33                         100
            Abalone           53.10                         78.69
            Breast-cancer     64.52                         80
            Haberman          75.17                         99.34

                       Results obtained for KNN + Mixture model




Experimental Results (4/4)


            Data              Classification performance    Classification performance
                              before fusion (%)             after fusion (%)
            Iris              96                            100
            Abalone           52                            82.45
            Breast-cancer     96                            100
            Haberman          73.83                         77.74

                       Results obtained for Bayes + Mixture model




Conclusion and Perspectives

   • Conclusions

    A new approach that handles uncertainty and resolves conflict

    The new approach gives good results on generic data

   • Perspectives

    Release of the database

             Missing data

             Real images: sonar and medical images

    Improvement of the fusion mechanism


