Semi-Supervised Learning of Naive Bayes Classifier using Feature Constraints

  S. Nagesh Bhattu, PhD Student
  Dr. D.V.L.N. Somayajulu, Professor, Department of CSE

  National Institute of Technology Warangal, India

  December 9, 2012




Overview




              Overview of Semi-supervised learning using EM
              Review of Generalized Maximum Entropy
              Learning from Feature Constraints
              Semi-supervised learning with feature constraints
              Results
              Transfer Learning
              Conclusion & Future Work




Introduction



  Generative models are based on the naive feature-independence assumption.
  Discriminative models address this inadequacy of generative models and have
  better generalization ability.
  Training a discriminative model is computation-intensive, as training is
  typically gradient based.
  The challenge is to train a generative learning method with a small number
  of discriminative constraints.




Semi-supervised Learning using EM [Nigam, 2001]



  Given a set of labeled documents $D_l$, each represented as a bag of words.
  Let $V$ be the vocabulary; a document $d \in D_l$ is represented as $w_1 \dots w_{|V|}$.
  Given a set of unlabeled documents $D_u$.
  Learning from $D_l$ and $D_u$ with EM gives an iterative procedure (a minimal
  sketch follows):
    E step: for each document $x \in D_u$, estimate $p(y|x)$ using the current
    parameters $p(w_j|y)$.
    M step: re-estimate the parameters $p(w_j|y)$ by maximum likelihood.
    Repeat the E and M steps until the difference between the log-likelihoods
    of successive iterations falls below a threshold.
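
A minimal sketch of this EM loop for a multinomial Naive Bayes model
(illustrative only, not the authors' implementation; Xl/Xu are document-term
count matrices, yl the labels, and all names are hypothetical):

    import numpy as np
    from scipy.special import logsumexp

    def m_step(X, R, alpha=1.0):
        # M step: Laplace-smoothed maximum-likelihood estimates of p(y) and p(w_j | y)
        log_prior = np.log(R.sum(axis=0) / R.sum())
        counts = R.T @ X                                  # expected class-word counts
        log_cond = np.log((counts + alpha) /
                          (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1]))
        return log_prior, log_cond

    def e_step(X, log_prior, log_cond):
        # E step: posterior p(y | x) per document under the current parameters
        joint = X @ log_cond.T + log_prior                # log p(x, y) up to a constant
        return np.exp(joint - logsumexp(joint, axis=1, keepdims=True)), joint

    def em_naive_bayes(Xl, yl, Xu, n_classes, max_iter=50, tol=1e-4):
        Rl = np.eye(n_classes)[yl]                        # hard responsibilities for D_l
        log_prior, log_cond = m_step(Xl, Rl)
        prev_ll = -np.inf
        for _ in range(max_iter):
            Ru, joint_u = e_step(Xu, log_prior, log_cond)
            _, joint_l = e_step(Xl, log_prior, log_cond)
            ll = logsumexp(joint_u, axis=1).sum() + (Rl * joint_l).sum()
            if abs(ll - prev_ll) < tol:                   # stop when the likelihood stalls
                break
            prev_ll = ll
            log_prior, log_cond = m_step(np.vstack([Xl, Xu]), np.vstack([Rl, Ru]))
        return log_prior, log_cond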




Maximum Entropy Estimation from labeled examples


  Let $D_l = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, with $x \in X$ and $y \in Y$.
  $\hat{q} = \arg\max_q H(q(y|x_i))$
  s.t. $\sum_y q(y|x_i) = 1$ and
  $\sum_{i=1}^{n} E_{q(y|x_i)}[f(x_i, y)] = \sum_{i=1}^{n} f(x_i, y_i)$
  $f(x_i, y_i)$ is the feature function and $H$ is the entropy.
  Dual: $\max_\lambda \sum_{i=1}^{n} \log q_\lambda(y_i|x_i)$, where
  $q_\lambda(y|x) = \frac{\exp(\lambda \cdot f(x, y))}{Z_\lambda(x)}$ and
  $Z_\lambda(x) = \sum_{y \in Y} \exp(\lambda \cdot f(x, y))$
  (a sketch of the dual follows).
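
For concreteness, a sketch of $q_\lambda$ and the (negated) dual objective,
assuming dense feature functions stored as F[i, y, :] = f(x_i, y); the names
are hypothetical, and any gradient-based optimizer (e.g. scipy.optimize.minimize)
can then maximize the dual:

    import numpy as np
    from scipy.special import logsumexp

    def log_q_lambda(lam, F):
        # log q_lambda(y | x_i) for all i, y; F has shape (n, |Y|, dim)
        scores = F @ lam                                  # lambda . f(x_i, y)
        return scores - logsumexp(scores, axis=1, keepdims=True)

    def neg_dual(lam, F, y):
        # Negative dual objective: -sum_i log q_lambda(y_i | x_i)
        return -log_q_lambda(lam, F)[np.arange(len(y)), y].sum()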




Generalization of Maximum Entropy [Dudik, 2007]



  Primal: $\min_q \sum_{i=1}^{n} KL[q(y|x_i) \,\|\, q_0(y|x_i)] + U\left(\sum_{i=1}^{n} E_{q(y|x_i)}[f(x_i, y)]\right)$
  Dual: $\max_\lambda \sum_{i=1}^{n} -\log\left[\sum_y q_0(y|x_i) \exp(\lambda \cdot f(x_i, y))\right] - U^*(-\lambda)$,
  where $U^*$ is the convex conjugate of $U$.
  Let $U$ be an L2 penalty of the form
  $\frac{1}{2\alpha} \sum_{i=1}^{n} \left\| f(x_i, y_i) - E_{q(y|x_i)}[f(x_i, y)] \right\|_2^2$
  Dual: $\min_\lambda \sum_{i=1}^{n} -\log q_\lambda(y_i|x_i) + \frac{\alpha}{2} \|\lambda\|_2^2$
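
With the L2 penalty, the dual is ordinary L2-regularized maximum-entropy
(multinomial logistic regression) training; a one-line extension of the sketch
above (alpha is a hypothetical name for the regularization weight):

    def neg_dual_l2(lam, F, y, alpha):
        # L2-penalized dual: sum_i -log q_lambda(y_i | x_i) + (alpha/2) ||lam||^2
        return neg_dual(lam, F, y) + 0.5 * alpha * np.dot(lam, lam)

    # e.g. scipy.optimize.minimize(neg_dual_l2, np.zeros(dim), args=(F, y, alpha))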




Learning From Feature Constraints [Druck et al., 2008]



  Let $f_1, \dots, f_k$ be features whose expectations are known a priori.
    Example: the presence of the word "great" indicates positive sentiment.
    This can be expressed as a constraint saying that, in expectation, 95% of
    documents containing "great" are of positive sentiment.
    [Druck et al., 2008] define an objective based on such feature constraints
    (written here following that paper; a sketch of one term follows):
    $O = \sum_{i=1}^{k} -KL\left(\hat{q}(y \,|\, x_i > 0) \,\|\, q(y \,|\, x_i > 0)\right) - \sum_{j=1}^{|V|} \frac{\lambda_j^2}{2\sigma^2}$
    where $|V|$ is the total number of features and $\hat{q}$ is the reference
    distribution supplied by the constraint.
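
A sketch of evaluating one GE term (hypothetical names: P holds the model
posteriors $p(y|x)$, mask marks the documents where the constrained feature
fires, and q_hat is the reference distribution, e.g. [0.95, 0.05] for "great"):

    import numpy as np

    def ge_term(P, mask, q_hat, eps=1e-12):
        # -KL(q_hat || q), where q is the model's mean label distribution
        # over documents containing the constrained feature (x_i > 0)
        q = P[mask].mean(axis=0)
        return -np.sum(q_hat * (np.log(q_hat + eps) - np.log(q + eps)))

    def ge_objective(P, masks, q_hats, lam, sigma2):
        # Sum of GE terms minus the Gaussian prior on lambda
        return sum(ge_term(P, m, qh) for m, qh in zip(masks, q_hats)) \
               - np.sum(lam ** 2) / (2 * sigma2)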




Semi-Supervised Learning of Naive Bayes using feature constraints

  Problem: given $k$ feature constraints, $m$ labeled examples, and $n$
  unlabeled examples, learn a model efficiently.
  Solution: learn a model that maximizes the log-likelihood over the labeled
  data and minimizes the KL-divergence from a feature-constrained model over
  the unlabeled examples:
  $O(\theta, q) = \min_{\theta, q} \sum_{d \in D_l} -\log p_\theta(x_d, y_d) + \gamma\left[\sum_{d \in D_u} KL(q(y_d|x_d) \,\|\, p_\theta(y_d|x_d)) + U\left(\sum_{d \in D_u} E_q[f(x_d, y)]\right)\right]$
  where $x_d$ and $y_d$ are the feature representation and the label of
  document $d$, respectively.
  Dual: $\min_\theta \max_\lambda \sum_{d \in D_l} -\log p_\theta(x_d, y_d) - \gamma\left[\sum_{d \in D_u} \log \sum_y p_\theta(y|x_d) \exp(\lambda \cdot f(x_d, y)) - U^*(-\lambda)\right]$


Modified EM Procedure




  Use coordinate descent on $\theta$ and $\lambda$ (a sketch of one round
  follows):
  $\lambda^{(t+1)} = \arg\max_\lambda \; \lambda \cdot \hat{f} - \sum_y p_{\theta^t}(y|x) \exp(\lambda \cdot f(x, y)) - \frac{\beta}{2} \|\lambda\|_2^2$
  Here $\hat{f}$ carries the target expectations for the $k$ constrained
  features.
  The parameters of $p_\theta(x, y)$ are then re-estimated by maximum
  likelihood over the unlabeled examples, where the posterior of each example
  is the $q$ obtained in the previous step.
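
A compact sketch of one coordinate-descent round, reusing m_step/e_step from
the EM sketch above; G[i, y, :] holds the constraint features $f(x_i, y)$ for
the unlabeled documents and f_hat the target expectations (all names are
hypothetical, and the sum over $D_u$ is made explicit):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import logsumexp

    def lambda_step(P, G, f_hat, beta):
        # argmax_lam  lam.f_hat - sum_{x,y} p(y|x) exp(lam.f(x,y)) - (beta/2)||lam||^2
        def neg_obj(lam):
            tilt = np.exp(G @ lam)                        # exp(lam . f(x, y))
            return -(lam @ f_hat - (P * tilt).sum()) + 0.5 * beta * lam @ lam
        return minimize(neg_obj, np.zeros(len(f_hat))).x

    def modified_em_round(Xl, yl, Xu, G, f_hat, log_prior, log_cond,
                          n_classes, beta=1.0):
        Pu, _ = e_step(Xu, log_prior, log_cond)           # p_theta(y|x) on D_u
        lam = lambda_step(Pu, G, f_hat, beta)             # update lambda, theta fixed
        logq = np.log(Pu + 1e-12) + G @ lam               # q tilts the NB posterior
        Q = np.exp(logq - logsumexp(logq, axis=1, keepdims=True))
        Rl = np.eye(n_classes)[yl]
        return m_step(np.vstack([Xl, Xu]), np.vstack([Rl, Q]))  # ML re-estimate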




Datasets




                     Table: Dataset descriptions

      Dataset                 Instances    Classes
      20 newsgroups (20NG)    20,000       20
      Ohscal                  11,162       10
      SRAA                    73,218       4
      WebKB                   4,199        4
      News3                   9,558        44




Experiments

  Dataset    Algo     Number of labeled examples per class
                      1              2              4              8              16
  20 News    EM-FL    71.5(1.51)     71.59(1.53)    71.83(1.35)    71.92(1.39)    72.53(1.53)
             EM       14.29(5.15)    19.22(3.62)    21.02(4.38)    25.5(7.47)     29.98(6.45)
             GE-FL    62.9(1.53)
  news3      EM-FL    70.7(2.41)     70.65(2.06)    71.0(1.84)     71.08(1.63)    71.73(1.24)
             EM       21.64(6.44)    29.15(6.68)    36.04(5.33)    47.75(5.38)    57.3(2.91)
             GE-FL    75.39(2.53)
  ohscal     EM-FL    57.19(0.84)    57.35(0.74)    57.95(0.68)    58.4(1.11)     59.22(1.06)
             EM       38.48(7.24)    46.53(5.27)    51.01(4.1)     57.34(2.26)    59.88(2.13)
             GE-FL    57.28(0.64)
  sraa       EM-FL    94.94(1.25)    94.98(1.23)    94.93(1.22)    94.91(1.24)    94.73(1.43)
             EM       58.02(25.38)   69.79(15.26)   79.44(12.76)   85.84(6.5)     91.43(1.52)
             GE-FL    99.43(0.04)
  webkb      EM-FL    86.12(0.81)    86.15(0.75)    86.09(0.8)     86.01(1.03)    86.36(0.88)
             EM       45.55(23.4)    42.03(16.24)   52.08(20.22)   43.16(9.07)    55.08(18.67)
             GE-FL    82.53(1.06)

  Table: Performance results (standard deviation in parentheses) for a varying
  number of labeled samples. GE-FL uses no labeled examples, so a single value
  is reported per dataset.


  Dataset    Algo     Fraction of unlabeled examples used
                      0.1            0.2            0.3            0.4            0.5
  20 News    EM-FL    50.01(4.85)    59.47(1.84)    64.61(2.18)    67.01(1.36)    69.72(0.94)
             EM       11.14(2.8)     18.55(3.67)    23.27(6.84)    27.53(2.94)    31.38(5)
             GE-FL    56.36(1.19)    59.63(1.84)    60.42(1.07)    60.94(0.52)    62.69(0.99)
  SRAA       EM-FL    79.67(0.48)    82.49(0.43)    88.27(0.33)    92.05(0.71)    94.71(0.43)
             EM       58.36(2.74)    72.56(8.84)    79.44(0.34)    79.43(0.46)    79.67(0.4)
             GE-FL    99.18(0.1)     99.35(0.1)     99.32(0.12)    99.37(0.04)    99.44(0.08)
  news3      EM-FL    41.92(1.94)    54.05(2.84)    60.98(1.13)    64.26(0.73)    67.99(1.59)
             EM       42.25(2.59)    41.06(3.21)    42.84(2.54)    45.27(2.6)     48.17(2.3)
             GE-FL    56.1(5.52)     64.79(1.11)    68.53(1.56)    67.61(2.05)    71.02(0.81)
  ohscal     EM-FL    62.43(2.153)   65.21(1.526)   64.98(0.957)   65.03(0.996)   62.83(2.016)
             EM       61.54(2.14)    63.59(1.577)   63.02(1.22)    64.19(1.2)     61.48(2.039)
             GE-FL    49.284(2.91)   53.55(2.052)   54.21(1.377)   57.74(0.906)   55.5(0.568)
  webkb      EM-FL    73.76(3.1)     79(4.48)       81.93(1.5)     84.04(1.03)    85.17(1.49)
             EM       73.76(8.25)    65.59(7.26)    74.95(1.28)    78.31(2.16)    74.26(3.8)
             GE-FL    77.07(2.74)    78.75(1.23)    81.05(1.28)    81.05(0.6)     81.05(1.5)

  Table: Performance results (standard deviation in parentheses) for a varying
  fraction of unlabeled data.



Transfer Learning



  Let $D_S$ and $D_T$ be the source and target domains of interest.
  Labeled data is available for the source domain; only unlabeled samples are
  available from the target domain.
  $MI(f_i) = \sum_{y \in Y} p(f_i, y) \ln \frac{p(f_i, y)}{p(f_i)\,p(y)}$
  Take features that occur in both domains and have high mutual information
  in the source domain.
  Compute the feature expectations from the source domain; these are usually
  preserved across domains (a sketch of the MI computation follows).
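
A sketch of scoring features by the mutual-information formula above, treating
$f_i$ as the binary event "feature $i$ is present in the document" (hypothetical
names; X is a source-domain document-term matrix, y the source labels):

    import numpy as np

    def feature_mi(X, y, n_classes, eps=1e-12):
        # MI(f_i) = sum_y p(f_i, y) ln [ p(f_i, y) / (p(f_i) p(y)) ], per feature
        present = (X > 0).astype(float)                   # (n_docs, n_feats)
        Y = np.eye(n_classes)[y]                          # one-hot labels
        p_fy = present.T @ Y / len(y)                     # joint p(f_i, y)
        p_f = present.mean(axis=0)[:, None]               # p(f_i)
        p_y = Y.mean(axis=0)[None, :]                     # p(y)
        return (p_fy * np.log((p_fy + eps) / (p_f @ p_y + eps))).sum(axis=1)

    # Candidate constraint features: present in both domains, high MI in the source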




Results for Transfer Learning




            B–D    D–B    B–E    E–B    B–K    K–B    D–E    E–D    D–K    K–D    E–K    K–E
   Base    0.759  0.762  0.66   0.719  0.719  0.729  0.667  0.735  0.734  0.758  0.851  0.825
   GE-FL   0.773  0.771  0.707  0.767  0.775  0.735  0.713  0.781  0.737  0.795  0.822  0.821
   SCL     0.758  0.797  0.759  0.754  0.789  0.686  0.741  0.762  0.814  0.767  0.859  0.868

  Table: Transfer learning results. Each column is a source–target domain pair
  (B, D, E, K presumably denote the Books, DVD, Electronics, and Kitchen
  sentiment domains).




Conclusion & Future Work




  We have demonstrated how Naive Bayes can be feature-constrained in a
  semi-supervised learning setup, under limitations on both the number of
  labeled examples and the number of feature constraints.
  This framework allows a simple transfer learning strategy.
  Future work: semi-supervised transfer learning.




References

  Druck, G., Mann, G., and McCallum, A. (2008). Learning from labeled features
  using generalized expectation criteria. In Proceedings of the 31st Annual
  International ACM SIGIR Conference on Research and Development in Information
  Retrieval, SIGIR '08, pages 595–602, New York, NY, USA. ACM.

  Dudik, M. (2007). Maximum entropy density estimation and modeling geographic
  distributions of species. PhD thesis, Princeton University, Princeton, NJ,
  USA. AAI3281302.

  Nigam, K. P. (2001). Using unlabeled data to improve text classification.
  PhD thesis, Carnegie Mellon University, Pittsburgh, PA, USA. AAI3040487.

