Learning from labelled and unlabeled data
       Semi-Supervised Learning



              Machine Learning – PDEEC 2008/2009


            Filipe Tiago Alves de Magalhães




                                                   26-04-2010
Semi-Supervised Learning

 Supervised Learning
   Discover patterns in the data that relate data attributes with a target
   (class) attribute. These patterns are then utilized to predict the values
   of the target attribute in future data instances.

 Semi-Supervised Learning
   Labelled + unlabeled data. Typically, plenty of unlabeled data available.
   Tries to improve the predictive power using both labelled and unlabeled
   data. (Expected to be better than using one alone.)

 Unsupervised Learning
   The data have no target attribute (unlabeled). We want to explore the data
   to find some intrinsic structures in them.
                                                                           2
Semi-Supervised Learning
   Unlabeled data is easy to obtain

   Labelled data can be difficult to obtain
         - human annotation is boring
         - may require experts
         - may require special equipment
         - very time-consuming


                    Examples:
                    - Web page classification (billions of pages)
                    - Email classification (SPAM or No-SPAM)
                    - Speech annotation (400h for each hour of conversation)
                    -…


                                                                           3
Semi-Supervised Learning


 Semi-supervised learning can improve on the results that exclusively supervised or
 unsupervised methods would give in the same scenario.




 Although we (or specialists) spend far less effort labelling data, great care
 is still needed in designing good models, extracting features and defining kernels.




                                                                                  4
Semi-Supervised Learning
 Sometimes, it may not be so hard to label data…


                                        www.espgame.org

                                        Tries to guess the user’s gender
                                        based on his/her choices.

                                        After that, we tell if it was
                                        right or wrong



   Takes advantage of the players’ input in order to
   enrich the training of automatic learning algorithms
                                                                        5
Semi-Supervised Self-Training of Object Detection Models


  Chuck Rosenberg         Martial Hebert                Henry Schneiderman
  Google, Inc.            Carnegie Mellon University    Carnegie Mellon University




        7th IEEE Workshops on Application of Computer Vision (WACV/MOTION'05)
                                          2005




                                                                                     6
Semi-Supervised Learning
        Self-Training
    L = (Xi , Yi )   Set of labelled data

    U = (Xi , ? )    Set of unlabeled data


Algorithm
Repeat
   • Train a classifier C with training data L
   • Classify data in U with C
   • Find a subset U’ of U with the most
   confident scores
   • L + U’ → L
   • U – U’ → U
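
The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier, its margin-based confidence, and the names `train_nearest_centroid`, `self_train` and `batch_size` are hypothetical stand-ins, not part of the original slides.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, ...]
# A classifier maps an example to (predicted label, confidence).
Classifier = Callable[[Example], Tuple[int, float]]


def train_nearest_centroid(labelled: Sequence[Tuple[Example, int]]) -> Classifier:
    """Toy stand-in for 'train a classifier C with training data L'."""
    sums, counts = {}, {}
    for x, y in labelled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def classify(x: Example) -> Tuple[int, float]:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, y)
            for y, c in centroids.items()
        )
        best_dist, best_label = dists[0]
        runner_up = dists[1][0] if len(dists) > 1 else best_dist + 1.0
        # Confidence (assumption): margin between the two closest centroids.
        return best_label, runner_up - best_dist

    return classify


def self_train(labelled, unlabelled, batch_size=1, max_iter=100):
    L, U = list(labelled), list(unlabelled)
    for _ in range(max_iter):
        if not U:
            break
        C = train_nearest_centroid(L)            # train C on L
        preds = [C(x) for x in U]                # classify U with C
        order = sorted(range(len(U)), key=lambda i: preds[i][1], reverse=True)
        picked = set(order[:batch_size])         # U': most confident scores
        L.extend((U[i], preds[i][0]) for i in picked)          # L + U' -> L
        U = [x for i, x in enumerate(U) if i not in picked]    # U - U' -> U
    return L, train_nearest_centroid(L)


seed = [((0.0,), 0), ((10.0,), 1)]
pool = [(1.0,), (9.0,), (2.0,), (8.0,)]
L_final, clf = self_train(seed, pool)
```

Any classifier that returns a confidence score could replace the toy one; the loop itself does not change.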


                                                 7
Semi-Supervised Self-Training of Object Detection Models
    Object detection

Object detection based on its shape
        - time-consuming
        - exhaustive labelling (background, foreground, object, non-object)


 Try to simplify the collection and preparation of training data
        - combining data labelled in different ways
        - labelling of each image region can take the form of a probability
          distribution over labels (“weakly” labelled)
        - e.g., it is more likely that the object is present in the centre of the image
        - e.g., a certain image has a high likelihood of containing the object, but
          its position is unknown.

                                                                                   8
Semi-Supervised Self-Training of Object Detection Models
     Training Approaches
 Generic detection algorithm for classification of a subwindow in an image as being part of
 the “object” class or the “clutter/everything else” class

       If [decision rule: equation image not recovered from the original slide]




   X – image feature vectors
   xi – data at a specific location in the image (i = {1, … ,n} indexes image locations)
   Y – class
   f – foreground
   b – background
   θf – parameters of the foreground model
   θb – parameters of the background model
                                                                                            9
Semi-Supervised Self-Training of Object Detection Models
    Training Approaches

      EM approach

  There are many reasons why EM may not perform well in a particular semi-supervised
  training context.

  - EM solely finds a set of model parameters which maximize the likelihood of the data.

  - Fully labeled data may not sufficiently constrain the solution, which means that there
  may be solutions which maximize the data likelihood but do not optimize classification
  performance.




                                                                                        11
Semi-Supervised Self-Training of Object Detection Models
    Training Approaches

     Alternative




                                                    12
Semi-Supervised Self-Training of Object Detection Models
    Detector Overview (Experimental Setup)




   1. Subwindow is processed for lighting correction
   2. Two-level wavelet transform is applied
   3. Features are computed by vector quantizing groups of wavelet coefficients
   4. Subwindow is classified by thresholding a linear combination of the log-likelihood
      ratios of the features

        Cascade architecture → only image patches which are accepted by the first
                               detector are passed on to the next
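
Step 4 and the cascade can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the probability tables, weights, thresholds and the names `llr_score` and `cascade_accepts` are hypothetical, and the feature codes stand for the vector-quantized wavelet coefficient groups.

```python
import math
from typing import Dict, List, Sequence


def llr_score(codes: Sequence[int], p_obj: Dict[int, float],
              p_bg: Dict[int, float], weights: Sequence[float]) -> float:
    """Weighted sum of log-likelihood ratios log(P(code|object) / P(code|clutter))."""
    return sum(w * math.log(p_obj[c] / p_bg[c]) for c, w in zip(codes, weights))


def cascade_accepts(codes_per_stage: Sequence[Sequence[int]], stages: List[dict]) -> bool:
    """A subwindow is a detection only if every stage scores above its threshold."""
    for codes, stage in zip(codes_per_stage, stages):
        score = llr_score(codes, stage["p_obj"], stage["p_bg"], stage["weights"])
        if score <= stage["threshold"]:
            return False  # rejected early; later stages never run
    return True


# Made-up probability tables for two quantized feature codes (0 and 1).
stage = {"p_obj": {0: 0.8, 1: 0.2}, "p_bg": {0: 0.2, 1: 0.8},
         "weights": [1.0, 1.0], "threshold": 0.0}
stages = [stage, stage]

object_like = cascade_accepts([[0, 0], [0, 0]], stages)
clutter_like = cascade_accepts([[0, 0], [1, 1]], stages)
```

Because most subwindows are rejected by the first stage, the later and more expensive stages only run on a small fraction of the image.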
                                                                                     13
Semi-Supervised Self-Training of Object Detection Models
    Data (Experimental Setup)




  Figures: landmark used on a typical training image; sample training images
  and the training examples associated with them

  Set with positive examples – 231 images, 480 training examples
  Independent test set – 44 images, 102 test examples
  Images are 200-300 pixels high and 300-400 pixels wide
  15000 negative examples
  Training examples – 24 x 16 pixels (rotated, scaled and cropped)
                                                                                       14
Semi-Supervised Self-Training of Object Detection Models
     Training (Experimental Setup)
Training the model with fully labeled data consists of the following steps:

     1. Given the training data landmark locations
         • geometrically normalize the training example subimages;
         • apply lighting normalization to the subimages;
         • generate synthetic training examples (scaling, shifting and rotating)
     2. Compute the wavelet transform of the subimages
     3. Quantize each group of wavelet coefficients and build a naïve Bayes model with
        respect to each group to discriminate between positive and negative examples
     4. Adjust the naïve Bayes model using boosting, but maintaining a linear decision
        function, effectively performing gradient descent on the margin
     5. Compute a ROC curve for the detector using a cross validation set
     6. Choose a threshold for the linear function, based on the final performance
        desired



                                                                                     15
Semi-Supervised Self-Training of Object Detection Models
    Selection Metrics (Experimental Setup)
The selection metric is crucial to the performance of the training

  1. Confidence selection
      • Computed at every iteration by applying the detector trained from the
         current set of labelled data to the weakly labelled data set.
      • Detection with highest confidence is selected and added to the training
         set

  2. MSE selection
      • Is calculated for each weakly labelled example by evaluating the
        distance between the corresponding image window and all of the
        other templates in the training data (including the original labelled
        examples and the weakly labelled examples added in prior iterations)
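
A rough sketch of MSE-style selection, under simplifying assumptions: windows are flat lists of pixel values, normalization is zero mean and unit variance, the score is the smallest mean squared error to any template, and the lowest-scoring candidates are selected. The function names are hypothetical, and the plain MSE here is a simplification of whatever distance the detector actually uses.

```python
from typing import List, Sequence


def normalize(window: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance normalization (assumed preprocessing)."""
    mean = sum(window) / len(window)
    var = sum((v - mean) ** 2 for v in window) / len(window)
    std = var ** 0.5 or 1.0  # avoid dividing by zero on flat windows
    return [(v - mean) / std for v in window]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mse_score(candidate: Sequence[float], templates: Sequence[Sequence[float]]) -> float:
    """Distance from a candidate window to its closest training template."""
    c = normalize(candidate)
    return min(mse(c, normalize(t)) for t in templates)


def select_candidates(candidates, templates, n=1) -> List[int]:
    """Indices of the n candidates closest to the current training set."""
    order = sorted(range(len(candidates)),
                   key=lambda i: mse_score(candidates[i], templates))
    return order[:n]


templates = [[0.0, 1.0, 2.0, 3.0]]
candidates = [[3.0, 2.0, 1.0, 0.0],   # reversed pattern
              [0.0, 2.0, 4.0, 6.0],   # same pattern, different contrast
              [1.0, 1.0, 2.0, 2.0]]
chosen = select_candidates(candidates, templates, n=1)
```

Selected windows would then join the template set, so later iterations compare against both the original and the newly added examples.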



                                                                            16
Semi-Supervised Self-Training of Object Detection Models
    Selection Metrics (Experimental Setup)

                                    The candidate image and the labeled images
                                    are first normalized with a specific set of
                                    processing steps before the MSE based score
                                    metric is computed.




              The score is based on the Mahalanobis distance



                                                                           17
Semi-Supervised Self-Training of Object Detection Models
    Selection Metrics (Experimental Setup)

     Detector → (position, scale) → MSE selection metric


     The detector must be accurate in localization but need not be accurate in
     detection, since false detections will be discarded because of their large MSE
     distances to all of the training examples.

     This is crucial to ensure the performance of the training algorithm with
     small initial training sets.

     This is also part of the reason why the MSE metric outperforms the confidence
     metric, which requires the detector to be accurate in both localization and
     detection performance.


                                                                                   18
Semi-Supervised Self-Training of Object Detection Models
    Experiment Scenarios (Experiments and Analysis)
   Each experiment was repeated using a different initial random subset, in order
   to account for the variance observed in the detector performance and
   in the behaviour of the semi-supervised training process.

                Experiment = specific set of experimental conditions

                Run = each repetition of that experiment

                Mostly, 5 runs were performed for each experiment




     Typically, 20 weakly labelled images were added to the training set at each iteration,
     because of the substantial training time of the detector.
     Ideally, only a single image would be added at each iteration.


                                                                                        19
Semi-Supervised Self-Training of Object Detection Models
    Evaluation Metrics (Experiments and Analysis)
    Each run was evaluated by using the area under the ROC curve (AUC).
    Because different experimental conditions affect performance, the AUCs
    were normalized relative to the full data performance of that run.


    if (performance level == 1.0)
    {
              the model being evaluated has the same performance
              as it would if all of the labelled data was utilised
    }
    if (performance level < 1.0)
    {
              the model has a lower performance
              than that achieved with the full data set
    }


          To compute the full data performance, each specific run is
          trained with the full data set and its performance is recorded.
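
A small sketch of the normalized performance level. The AUC is computed here in its pairwise form (the fraction of positive/negative pairs ranked correctly, which equals the area under the ROC curve); the scores and the names `auc` and `performance_level` are made up for illustration.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def performance_level(auc_model: float, auc_full_data: float) -> float:
    """AUC of a run normalized by the AUC of the same run trained on all labels."""
    return auc_model / auc_full_data


# Hypothetical detector scores for one run.
auc_full = auc([0.9, 0.8], [0.1, 0.2])      # detector trained on the full data set
auc_partial = auc([0.9, 0.3], [0.5, 0.1])   # detector trained on a small subset
level = performance_level(auc_partial, auc_full)
```

A level below 1.0, as in this toy run, means the model has not yet reached full-data performance.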
                                                                             20
Semi-Supervised Self-Training of Object Detection Models
   Baseline training configurations (Experiments and Analysis)




    The smooth regime was chosen in order to perform experiments under conditions
    where the addition of weakly labelled data would make a difference.
                                                                                21
Semi-Supervised Self-Training of Object Detection Models
    Selection Metrics (Experiments and Analysis)
 Does the choice of the selection metric make a substantial
 difference in the performance of the semi-supervised training?




          Confidence metric                        MSE metric
                                                                  22
Semi-Supervised Self-Training of Object Detection Models
      Selection Metrics (Experiments and Analysis)
 Does the choice of the selection metric make a substantial
 difference in the performance of the semi-supervised training?


  decreases                                                        increases



                                                                  23
Semi-Supervised Self-Training of Object Detection Models
  Relative Size of Fully Labelled Data (Experiments and Analysis)
  How many weakly labelled examples do we need to add to the training set in
  order to reach the best detector performance?




                                                                               24
Semi-Supervised Self-Training of Object Detection Models
    Conclusions/Discussion

    1. The results showed that it was possible to achieve detection performance that was
       close to the base performance obtained with the fully labelled data, even when a
       small fraction of the training data was used in the initial training set.

    2. The experiments showed that the self-training approach to semi-supervised training
       can be applied to an existing detector that was originally designed for supervised
       training.

    3. The MSE selection metric consistently outperformed the confidence metric. More
       generally, the self-training approach using an independently-defined selection
       metric outperforms both the confidence metrics and the batch EM approaches.

       During the training process, the distribution of the labeled data at any particular
       iteration may not match the actual underlying distribution of the data.




                                                                                         25
Semi-Supervised Self-Training of Object Detection Models
     Conclusions/Discussion
     Figure panels: original unlabeled data and labelled data; true labels for
     the unlabeled data.

     (c),(d) The points labelled by the incremental self-training algorithm after 5
     iterations using the confidence metric and the Euclidean metric, respectively.
                                                                                      26
Semi-Supervised Self-Training of Object Detection Models
    Future Work


    Study the relation between the semi-supervised training approach evaluated here
    and co-training approaches.

    Develop more precise guidelines for selecting the initial training set.

    The approach could be extended to training examples that are labelled in different
    ways. For example, some images may be provided with scale information and nothing
    else. Additional information may be provided such as the rough shape of the object,
    or a prior distribution over its location in the image.




                                                                                      27
ZZZZZZZZZZZZZZ…..



 Still Awake???




                    28
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization


   Andrew B. Goldberg                             Xiaojin Zhu
   Computer Sciences Department                   Computer Sciences Department
   University of Wisconsin-Madison                University of Wisconsin-Madison




         TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural
                                Language Processing
                                       2006




                                                                                    29
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
     Sentiment Categorization

                                                               30
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization

 What we saw is rating inference
 Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment
 categorization with respect to rating scales. In Proceedings of the ACL.



 In this work…
      • Graph-based Semi-supervised Learning
         • Main assumption encoded in graph:
            • Similar documents should have similar ratings




                                                                                             32
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization




                              50% accuracy




                                                               37
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization




                             100% accuracy




                                                               39
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization

      Goal




                                                               40
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization

  Approach




                                                               41
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization

  Measuring Loss over the Graph




                                                               42
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
                        Minimization now
                          is non-trivial




                                                               49
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization


      Finding a Closed-Form Solution




                                                               50
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Finding a Closed-Form Solution

   • Vector of f values for all reviews
   • Vector of given labels yi for labelled reviews and predicted labels for
     unlabeled reviews
   • C = [matrix with a Labelled block and an Unlabeled block; entries not recovered]

                                                                  51
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Finding a Closed-Form Solution

   Annotated terms: graph Laplacian matrix; constant parameter




                                                               52
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Graph Laplacian Matrix
   Assume n labelled and unlabeled documents
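
The slide's own formula was an image and is not reproduced here. As a sketch, assuming the common combinatorial definition (degree matrix minus a symmetric weight matrix W over the n documents), the Laplacian and the smoothness it measures can be written as follows. The function names and the example weights are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def graph_laplacian(W: Matrix) -> Matrix:
    """Combinatorial Laplacian D - W, where D is the diagonal matrix of row sums."""
    n = len(W)
    degree = [sum(row) for row in W]
    return [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(L: Matrix, f: Sequence[float]) -> float:
    """f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def pairwise_smoothness(W: Matrix, f: Sequence[float]) -> float:
    """0.5 * sum_ij w_ij (f_i - f_j)^2, penalizing similar documents with different ratings."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))


# Three documents in a chain: 0 -- 1 (weight 1) and 1 -- 2 (weight 2).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0],
     [0.0, 2.0, 0.0]]
ratings = [1.0, 2.0, 4.0]
L = graph_laplacian(W)
```

For a symmetric W the two quantities agree, which is why a sum of weighted squared rating differences over graph edges can be written compactly with the Laplacian.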




                                                               53
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Finding a Closed-Form Solution
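
The exact expression on this slide is not recoverable, so the following is only a generic sketch of graph-regularized least squares. It assumes an objective of the form sum_i c_i (f_i - y_i)^2 + alpha * f^T (D - W) f, whose minimizer solves (C + alpha (D - W)) f = C y. The names `propagate_ratings`, `c_diag` and `alpha`, the weights, and the small Gaussian-elimination helper are all assumptions for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def propagate_ratings(W: Matrix, y: Sequence[float],
                      c_diag: Sequence[float], alpha: float) -> List[float]:
    """Minimize sum_i c_i (f_i - y_i)^2 + alpha * f^T (D - W) f in closed form."""
    n = len(W)
    degree = [sum(row) for row in W]
    A = [[(c_diag[i] if i == j else 0.0)
          + alpha * ((degree[i] if i == j else 0.0) - W[i][j])
          for j in range(n)] for i in range(n)]
    b = [c_diag[i] * y[i] for i in range(n)]
    return solve(A, b)


# Chain of three reviews: the ends are labelled 1 star (large weight), the
# middle one is unlabeled with an initial predicted rating of 4 (weight 1).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
f = propagate_ratings(W, y=[1.0, 4.0, 1.0], c_diag=[1000.0, 1.0, 1000.0], alpha=1.0)
```

In this toy case the unlabeled review's rating is pulled from 4 towards its 1-star neighbours, while the labelled ratings barely move because of their large weights.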




                                                               54
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
    Experiments
 Predict 1 to 4 stars ratings for reviews
    • 4-author data (Pang and Lee, 2005)
          • 1770, 902, 1307 and 1027 documents, respectively
     •                                             *

     • Each document represented as a {0,1} word-presence
     vector, normalized to sum to 1
     • Positive-Sentence Percentage (PSP) similarity (Pang and Lee, 2005)
     • Tuned parameters with cross-validation
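
As an illustration of the document representation in the list above, here is a minimal sketch. The whitespace tokenization, the tiny vocabulary and the name `word_presence_vector` are assumptions, not the authors' preprocessing.

```python
from typing import List, Sequence


def word_presence_vector(document: str, vocabulary: Sequence[str]) -> List[float]:
    """{0,1} word-presence vector over the vocabulary, rescaled to sum to 1."""
    present = set(document.lower().split())
    raw = [1.0 if word in present else 0.0 for word in vocabulary]
    total = sum(raw)
    # A document with no vocabulary words stays all zeros.
    return [v / total for v in raw] if total else raw


vocabulary = ["good", "bad", "movie"]
vector = word_presence_vector("Good good movie", vocabulary)
```

Repeated words do not change the vector: only presence counts, not frequency.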


 * Joachims, T., Transductive Inference for Text Classification using Support Vector
 Machines, in Proceedings of the Sixteenth International Conference on Machine Learning.
 1999, Morgan Kaufmann Publishers Inc.
                                                                                      55
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Experiments

   PSPi is defined as the percentage of positive sentences in review xi.

   The similarity between reviews xi, xj is the cosine of the angle between the vectors
   (PSPi, 1-PSPi) and (PSPj, 1-PSPj)

   Positive sentences are identified using a binary classifier trained on a “snippet
   data set” (10662 documents)
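
The PSP-based similarity defined above is easy to compute once the PSP values are known. A minimal sketch, taking the PSP values as given fractions in [0, 1] (the sentence classifier is out of scope here, and the function name is hypothetical):

```python
import math


def psp_similarity(psp_i: float, psp_j: float) -> float:
    """Cosine of the angle between (PSP_i, 1 - PSP_i) and (PSP_j, 1 - PSP_j)."""
    u = (psp_i, 1.0 - psp_i)
    v = (psp_j, 1.0 - psp_j)
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


same = psp_similarity(0.7, 0.7)        # identical PSP values
opposite = psp_similarity(1.0, 0.0)    # all-positive vs. all-negative review
close = psp_similarity(0.7, 0.6)
far = psp_similarity(0.7, 0.1)
```

Reviews with similar proportions of positive sentences get a similarity near 1, and the value falls towards 0 as the proportions diverge.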




                                                                                   56
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
    Experiments




 Low ratings tend to get low PSP scores
 High ratings tend to get high PSP scores

The trend was qualitatively the same as in Pang and Lee (2005) (Naïve Bayes)
                                                                       57
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Experiments
     α = ak + bk’
     c = k/L

       k  – number of labelled neighbours
       k’ – number of unlabeled neighbours
       L  – size of labelled set

            Optimal Values (through cross-validation)
             c = 0.2
             α = 1.5
                                                                58
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Results



                                Graph-based SSL
                          outperforms other methods
                           for small labelled set sizes




                                                               59
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
   Alternative Similarity Measure
      The cosine between word vectors containing all words,
      each weighted by its mutual information


 Scaling of mutual information values (maximum = 1)

 Previously found values → weights for corresponding words in the word vectors

 Words in the movie review data that did not appear in the “snippet data set” were excluded


         Optimal Values (through cross-validation)
          c = 0.1
          α = 1.5
                                                                                     60
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
    Results

 20-trial average unlabeled set accuracy for each author across different
 labelled set sizes and methods.

 In each row, the best result is shown in green, together with any results that
 could not be distinguished from it with a paired t-test at the 0.05 level.




                                                                   61
Seeing stars when there aren’t many stars:
Graph-based semi-supervised learning for sentiment categorization
    Conclusions and Future Work
 Graph-based semi-supervised learning based on PSP similarity achieved better performance
 than all other methods in all four author corpora.

 However, for larger labelled sets its performance was not as good. Possible reasons:
 a) An SVM regressor trained on a large labelled set can achieve fairly high
    accuracy without considering relationships between examples.
 b) PSP similarity is not accurate enough, thus biasing the overall performance when
    labelled data is abundant.

 Investigate better document representations and similarity measures.

 Extend the method to the inductive learning setting.

 Experiment with cross-reviewer and cross-domain analysis, such as using a model learned on
 movie reviews to help classify product reviews.


                                                                                        62
Human Semi-Supervised Learning

     Q: Do humans also use semi-supervised learning?




      A: Apparently, yes!




                                                       63
Human Semi-Supervised Learning
   Some evidence…
   Face recognition is a very challenging computational task.

   However, it is an easy task for humans.
   Differences between two views of the same face are much larger than
   those between two different faces viewed at the same angle. +
    + Sinha, P., et al., Face recognition by humans: 20 results all computer vision researchers
   should know about. 2006, MIT.



     Hint: Temporal association



                                                                                                  64
Human Semi-Supervised Learning
    Some evidence…

Observers were shown sequences
of novel faces in which the
identity of the face changed as
the head rotated.



                                             image sequence → unlabeled data

 As a result, observers showed a tendency to treat the views as if they were of
 the same person.

 This suggests that we are continuously associating views of objects to support later
 recognition, and that we do so not only on the basis of the physical similarity, but
 also the correlated appearance in time of the objects.

  Wallis, G. and H. Bülthoff, Effects of temporal association on recognition memory, in
  Proceedings of the National Academy of Sciences. 2001. p. 4800-4804.
                                                                                          65
Human Semi-Supervised Learning
    Some evidence…

   17-month-old infants hear a word and see an object.

   The researchers measured the infants’ ability to associate the word with the object.

   If the word had been heard many times before (without seeing the object;
   unlabeled data), the association was stronger.

   If the word had not been heard before, the association was weaker.




Graf, E., et al., Can Infants Map Meaning to Newly Segmented
Words?: Statistical Segmentation and Word Learning.
Psychological Science, 2007. 18(3): p. 254-260.
                                                                  Image taken from www.dalla.is
                                                                                     66
Human Semi-Supervised Learning



        A better understanding of the human cognitive model
        can guide the development of better machine learning
        algorithms, or make existing ones better and more robust…




                                                               67
References
 • Rosenberg, C., M. Hebert, and H. Schneiderman, Semi-Supervised Self-Training of Object
 Detection Models, in Proceedings of the Seventh IEEE Workshops on Application of
 Computer Vision (WACV/MOTION'05), Volume 1. 2005, IEEE Computer Society.
 • Goldberg, A.B. and X. Zhu, Seeing stars when there aren't many stars: Graph-based
 semi-supervised learning for sentiment categorization, in TextGraphs: HLT/NAACL Workshop
 on Graph-based Algorithms for Natural Language Processing. 2006.
 • Pang, B. and L. Lee, Seeing stars: Exploiting class relationships for sentiment
 categorization with respect to rating scales, in Proceedings of the ACL. 2005.
 • Joachims, T., Transductive Inference for Text Classification using Support Vector Machines,
 in Proceedings of the Sixteenth International Conference on Machine Learning. 1999,
 Morgan Kaufmann Publishers Inc.
 • Sinha, P., et al., Face recognition by humans: 20 results all computer vision researchers
 should know about. 2006, MIT.
 • Wallis, G. and H. Bülthoff, Effects of temporal association on recognition memory, in
 Proceedings of the National Academy of Sciences. 2001. p. 4800-4804.
 • Graf, E., et al., Can Infants Map Meaning to Newly Segmented Words?: Statistical
 Segmentation and Word Learning. Psychological Science, 2007. 18(3): p. 254-260.
                                                                                            68

Slides ppt

  • 1. Learning from labelled and unlabeled data Semi-Supervised Learning Machine Learning – PDEEC 2008/2009 Filipe Tiago Alves de Magalhães 26-04-2010
  • 2. Semi-Supervised Learning Supervised Semi-Supervised Unsupervised Learning Learning Learning Labbeled + unlabeled data discover patterns in the data The data have no that relate data attributes target attribute (unlabeled). with a target (class) attribute. Typically, plenty of unlabeled data available. We want to explore the These patterns are then data to find some intrinsic utilized to predict the structures in them. values of the target attribute in future Tries to improve the predictive data instances. power using both labelled and unlabeled data. (Expected to be better than using one alone) 2
  • 3. Semi-Supervised Learning Unlabeled data is easy to obtain Labelled data can be difficult to obtain - human annotation is boring - may require experts - may require special equipment - very time-consuming Examples: - Web page classification (billions of pages) - Email classification (SPAM or No-SPAM) - Speech annotation (400h for each hour of conversation) -… 3
  • 4. Semi-Supervised Learning Semi-Supervised learning can be seen as an excellent way to improve the results that we would get using exclusively supervised or non-supervised methods, for the same scenario. Although we (or specialists) do not need to spend such a big effort labelling data, a great concern must be faced for the design of good models, feature extraction, kernels definition. 4
  • 5. Semi-Supervised Learning Sometimes, it may not be so hard to label data… www.espgame.org Tries to guess the user’s gender based on his/her choices. After that, we tell if it was right or wrong Takes advantage of player’s intervention in order to enrich the training of automatic learning algorithms 5
  • 6. Semi-Supervised Self-Training of Object Detection Models Chuck Rosenberg Martial Hebert Henry Schneiderman Google, Inc. Carnegie Mellon University Carnegie Mellon University 7th IEEE Workshops on Application of Computer Vision (WACV/MOTION'05) 2005 6
  • 7. Semi-Supervised Learning Self-Training L = (Xi , Yi ) Set of labelled data U = (Xi , ? ) Set of unlabeled data Algorithm Repeat • Train a classifier C with training data L • Classify data in U with C • Find a subset U’ of U with the most confident scores • L + U’  L • U – U’  U 7
  • 8. Semi-Supervised Self-Training of Object Detection Models Object detection Object detection based on its shape - time-consuming - exhaustive labelling (background, foreground, object, non-object) Try to simplify the collection and preparation of training data - combining data labelled in different ways - labelling of each image region can take the form of a probability distribution over labels (“weakly” labelled) - e.g., is more likely that the object is present in the centre of the image - e.g., a certain image has a high likelihood of containing the object, but its position is unknown. 8
  • 9. Semi-Supervised Self-Training of Object Detection Models Training Approaches Generic detection algorithm for classification of a subwindow in an image as being part of the “object” class or the “clutter/everything else” class If X – image feature vectors xi – data at a specific location in the image (i = {1, … ,n} indexes images locations) Y – class f – foreground b – background θf – parameters of the foreground model θb – parameters of the background model 9
  • 10. Semi-Supervised Self-Training of Object Detection Models Training Approaches EM approach 10
  • 11. Semi-Supervised Self-Training of Object Detection Models Training Approaches EM approach There are many reasons why EM may not perform well in a particular semi-supervised training context. - EM solely finds a set of model parameters which maximize the likelihood of the data. - Fully labeled data may not sufficiently constrain the solution, which means that there may be solutions which maximize the data likelihood but do not optimize classification performance. 11
  • 12. Semi-Supervised Self-Training of Object Detection Models Training Approaches Alternative 12
  • 13. Semi-Supervised Self-Training of Object Detection Models Detector Overview (Experimental Setup) 1. Subwindow is processed for lighting correction 2. Two-level wavelet transform is applied 3. Features are computed by vector quantizing groups of wavelet coefficients 4. Subwindow is classified by thresholding a linear combination of the log-likelihood ratios of the features Cascade architecture → only image patches which are accepted by the first detector are passed on to the next 13
  • 14. Semi-Supervised Self-Training of Object Detection Models Data (Experimental Setup) Landmark used on a typical training image sample training images and the training examples associated with them Set with positive examples – 231 images 480 training examples 200-300 pixels high and Independent test set – 44 images 300-400 pixels wide 102 test examples 15000 negative examples Training examples – 24 x 16 pixels (rotated, scaled and cropped) 14
  • 15. Semi-Supervised Self-Training of Object Detection Models Training (Experimental Setup) Training the model with fully labeled data consists of the following steps: 1. Given the training data landmark locations • geometrically normalize the training example subimages; • apply lighting normalization to the subimages; • generate synthetic training examples (scaling, shifting and rotating) 2. Compute the wavelet transform of the subimages 3. Quantize each group of wavelet coefficients and build a naïve Bayes model with respect to each group to discriminate between positive and negative examples 4. Adjust the naïve Bayes model using boosting, but maintaining a linear decision function, effectively performing gradient descent on the margin 5. Compute a ROC curve for the detector using a cross validation set 6. Choose a threshold for the linear function, based on the final performance desired 15
  • 16. Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experimental Setup) Selection metric is crucial to the performance of the training 1. Confidence selection • Computed at every iteration by applying the detector trained from the current set of labelled data to the weakly labelled data set. • Detection with highest confidence is selected and added to the training set 2. MSE selection • Is calculated for each weakly labelled example by evaluating the distance between the corresponding image window and all of the other templates in the training data (including the original labelled examples and the weakly labelled examples added in prior iterations) 16
  • 17. Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experimental Setup) The candidate image and the labeled images are first normalized with a specific set of processing steps before the MSE based score metric is computed. The score is based on the Mahalanobis distance 17
  • 18. Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experimental Setup) position MSE selection Detector metric scale The detector must be accurate in localization but need not be accurate in detection since false detection will be discarded due to their large MSE distances to all of the training examples. This is crucial to ensure the performance of the training algorithm with small initial training sets. This is also part of the reason for the MSE to outperform the confidence metric, which requires the detector to be accurate in both localization and detection performance. 18
  • 19. Semi-Supervised Self-Training of Object Detection Models Experiment Scenarios (Experiments and Analysis) Each experiment was repeated using a different initial random subset, in order to avoid the variance that was being observed in the detector performance and in the behaviour of the semi-supervised training process. Experiment = specific set of experimental conditions Run = each repetition of that experiment Mostly, 5 runs were performed for each experiment Typically, 20 weakly labelled images were added to the training set at each iteration, because of the substantial training time of the detector. Ideally, only a single image would be added at each iteration. 19
  • 20. Semi-Supervised Self-Training of Object Detection Models Evaluation Metrics (Experiments and Analysis) Each run was evaluated by using the area under the ROC curve (AUC). Because different experimental conditions affect performance, the AUCs were normalized relatively to the full data performance of that run. if (performance level = = 1.0) { the model being evaluated has the same performance as it would if all of the labelled data was utilised } if (performance level < 1.0) { the model has a lower performance than that achieved with the full data set } To compute the full data performance, each specific run is trained with the full data set and its performance is recorded. 20
  • 21. Semi-Supervised Self-Training of Object Detection Models Baseline training configurations (Experiments and Analysis) Smooth regime was chosen in order to perform experiments under conditions where the addition of weakly labelled data would make a difference. 21
  • 22. Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experiments and Analysis) Does the choice of the selection metric make a substantial difference in the performance of the semi-supervised training? Confidence metric MSE metric 22
  • 23. Semi-Supervised Self-Training of Object Detection Models Selection Metrics (Experiments and Analysis) Does the choice of the selection metric make a substantial difference in the performance of the semi-supervised training? d i e n c c r r e e a a s s e e s s 23
  • 24. Semi-Supervised Self-Training of Object Detection Models Relative size of fully Labelled Data(Experiments and Analysis) How many weakly labelled examples do we need to add to the training set in order to reach the best detector performance? 24
  • 25. Semi-Supervised Self-Training of Object Detection Models Conclusions/Discussion 1. The results showed that it was possible to achieve detection performance that was close to the base performance obtained with the fully labelled data, even when a small fraction of the training data was used in the initial training set. 2. The experiments showed that the self-training approach to semi-supervised training can be applied to an existing detector that was originally designed for supervised training. 3. The MSE selection metric consistently outperformed the confidence metric. More generally, the self-training approach using an independently-defined selection metric outperforms both the confidence metrics and the batch EM approaches. During the training process, the distribution of the labeled data at any particular iteration may not match the actual underlying distribution of the data. 25
  • 26. Semi-Supervised Self-Training of Object Detection Models Conclusions/Discussion True labels for the unlabeled data Original unlabeled data and labelled data (c),(d) The points labelled by the incremental self-training algorithm after 5 iterations using the confidence metric and the Euclidean metric, respectively. 26
  • 27. Semi-Supervised Self-Training of Object Detection Models Future Work Study the relation between the semi-supervised training approach evaluated here with the co-training approaches. Develop more precise guidelines for selecting the initial training set. The approach could be extended to training examples that are labelled in different ways. For example, some images may be provided with scale information and nothing else. Additional information may be provided such as the rough shape of the object, or a prior distribution over its location in the image. 27
  • 29. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Andrew B. Goldberg Xiaojin Zhu Computer Sciences Department Computer Sciences Department University of Wisconsin-Madison University of Wisconsin-Madison TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing 2006 29
  • 30. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Sentiment Categorization ? ? ? 30
  • 31. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Sentiment Categorization 31
  • 32. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization What we saw is rating inference Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the ACL. In this work… • Graph-based Semi-supervised Learning • Main assumption encoded in graph: • Similar documents should have similar ratings 32
  • 33. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 33
  • 34. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 34
  • 35. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 35
  • 36. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 36
  • 37. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 50% accuracy  37
  • 38. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 38
  • 39. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 100% accuracy  39
  • 40. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Goal 40
  • 41. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Approach 41
  • 42. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Measuring Loss over the Graph 42
  • 43. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 43
  • 44. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 44
  • 45. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 45
  • 46. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 46
  • 47. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 47
  • 48. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization 48
  • 49. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Minimization now is non- trivial 49
  • 50. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Finding a Closed-Form Solution 50
  • 51. Seeing stars when there aren’t many stars: Vector of f Vector of given labels yi Graph-based semi-supervised reviews and sentiment categorization values for for labelled learning for predicted labels for allFinding a Closed-Form Solution reviews unlabeled reviews Labelled Unlabeled C= 51
  • 52. Seeing stars when there aren’t many stars: Graph Laplacian Graph-based semi-supervised learning for sentiment categorization matrix Finding a Closed-Form Solution Constant parameter 52
  • 53. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Graph Laplacian Matrix Assume n labelled and unlabeled documents 53
  • 54. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Finding a Closed-Form Solution 54
  • 55. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Experiments Predict 1 to 4 stars ratings for reviews • 4-author data (Pang and Lee, 2005) • 1770, 902, 1307 and 1027 documents, respectively • * • Each document represented as a {0,1} word-presence vector, normalized to sum 1 • Positive-Sentence Percentage (PSP) similarity (Pang and Lee, 2005) • Tuned parameters with cross-validation * Joachims, T., Transductive Inference for Text Classification using Support Vector Machines, in Proceedings of the Sixteenth International Conference on Machine Learning. 1999, Morgan Kaufmann Publishers Inc. 55
  • 56. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Experiments PSPi is defined as the percentage of positive sentences in review xi. The similarity between reviews xi and xj is the cosine of the angle between the vectors (PSPi, 1-PSPi) and (PSPj, 1-PSPj). Positive sentences are identified using a binary classifier trained on a “snippet data set” (10662 documents). 56
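The PSP similarity defined on this slide can be sketched directly from its definition. The snippet assumes PSP is expressed as a fraction in [0, 1] rather than as a percentage; the function name is illustrative.

```python
import math


def psp_similarity(psp_i, psp_j):
    """Cosine of the angle between (psp_i, 1 - psp_i) and (psp_j, 1 - psp_j)."""
    u = (psp_i, 1.0 - psp_i)
    v = (psp_j, 1.0 - psp_j)
    dot = u[0] * v[0] + u[1] * v[1]
    # Neither norm can be zero: psp and 1 - psp are never both 0.
    return dot / (math.hypot(*u) * math.hypot(*v))


same = psp_similarity(0.3, 0.3)      # identical PSP values
opposite = psp_similarity(1.0, 0.0)  # all-positive vs. all-negative review
```

Two reviews with equal PSP get the maximum similarity, while an all-positive and an all-negative review are orthogonal.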
  • 57. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Experiments Low ratings tend to get low PSP scores. High ratings tend to get high PSP scores. The trend was qualitatively the same as in Pang and Lee (2005) (Naïve Bayes). 57
  • 58. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Experiments α = ak + bk’, where k is the number of labelled neighbours and k’ is the number of unlabeled neighbours. c = k/L, where L is the size of the labelled set. Optimal values (through cross-validation): c = 0.2, α = 1.5 58
  • 59. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Results Graph-based SSL outperforms other methods for small labelled set sizes 59
  • 60. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Alternative Similarity Measure The cosine between word vectors containing all words, each weighted by its mutual information. Mutual information values are scaled so that the maximum is 1. The previously found values are used as weights for the corresponding words in the word vectors. Words in the movie review data that did not appear in the “snippet data set” were excluded. Optimal values (through cross-validation): c = 0.1, α = 1.5 60
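A rough sketch of this alternative similarity follows. It assumes each document is reduced to the set of words it contains, that a word's entry is its scaled mutual-information weight, and that words without a weight are dropped; the mutual-information scores shown are made up for illustration, and all function names are hypothetical.

```python
import math


def scale_to_unit_max(mi):
    """Rescale mutual-information scores so that the largest equals 1."""
    top = max(mi.values())
    return {word: score / top for word, score in mi.items()}


def weighted_vector(tokens, weights):
    """Map a document to {word: weight}, dropping words that have no weight."""
    return {word: weights[word] for word in set(tokens) if word in weights}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


mi = {"great": 0.08, "boring": 0.04, "plot": 0.01}  # hypothetical MI scores
weights = scale_to_unit_max(mi)
a = weighted_vector("great plot twist".split(), weights)
b = weighted_vector("great boring plot".split(), weights)
sim = cosine(a, b)
```

Here "twist" is excluded because it has no mutual-information score, mirroring the exclusion of words absent from the snippet data set.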
  • 61. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Results 20-trial average unlabeled set accuracy for each author across different labelled set sizes and methods. In each row, green marks the best result and any results that could not be distinguished from it with a paired t-test at the 0.05 level. 61
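The paired t-test used to mark indistinguishable results can be sketched with the standard library. This is an illustration only: the scores below are invented, not taken from the results table, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (same trials, two methods)."""
    diffs = [x - y for x, y in zip(a, b)]
    # Standard error of the mean difference; assumes the differences are not all equal.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical per-trial scores for two methods on the same four trials.
method_a = [5, 7, 9, 6]
method_b = [4, 5, 8, 3]
t = paired_t_statistic(method_a, method_b)
```

The statistic is compared with the critical value of Student's t distribution with n - 1 degrees of freedom. For 20 trials (19 degrees of freedom) the two-sided critical value at the 0.05 level is about 2.093, so two methods would count as indistinguishable when |t| falls below that threshold.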
  • 62. Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization Conclusions and Future Work Graph-based semi-supervised learning based on PSP similarity outperformed all other methods on all four author corpora when the labelled set was small. For larger labelled sets it performed less well. Possible reasons: a) an SVM regressor trained on a large labelled set can achieve fairly high accuracy without considering relationships between examples; b) PSP similarity is not accurate enough, which biases the overall performance when labelled data is abundant. Future work: investigate better document representations and similarity measures; extend the method to the inductive learning setting; experiment with cross-reviewer and cross-domain analysis, such as using a model learned on movie reviews to help classify product reviews. 62
  • 63. Human Semi-Supervised Learning Q: Do humans also use semi-supervised learning? A: Apparently, yes! 63
  • 64. Human Semi-Supervised Learning Some evidence… Face recognition is a very challenging computational task, yet it is easy for humans. Differences between two views of the same face are much larger than those between two different faces viewed at the same angle. Hint: temporal association. Sinha, P., et al., Face recognition by humans: 20 results all computer vision researchers should know about. 2006, MIT. 64
  • 65. Human Semi-Supervised Learning Some evidence… Observers were shown sequences of novel faces in which the identity of the face changed as the head rotated (the image sequence acts as unlabeled data). As a result, observers showed a tendency to treat the views as if they were of the same person. This suggests that we continuously associate views of objects to support later recognition, and that we do so on the basis not only of physical similarity but also of the correlated appearance in time of the objects. Wallis, G. and H. Bülthoff, Effects of temporal association on recognition memory, in Proceedings of the National Academy of Sciences. 2001. p. 4800-4804. 65
  • 66. Human Semi-Supervised Learning Some evidence… 17-month-old infants listen to a word and see an object. The researchers wanted to measure the infants’ ability to associate the word with the object. If the word had been heard many times before (without seeing the object; unlabeled data), the association was stronger. If the word had not been heard before, the association was weaker. Graf, E., et al., Can Infants Map Meaning to Newly Segmented Words?: Statistical Segmentation and Word Learning. Psychological Science, 2007. 18(3): p. 254-260. Image taken from www.dalla.is 66
  • 67. Human Semi-Supervised Learning A better understanding of the human cognitive model can guide the development of better machine learning algorithms, or make existing ones better and more robust… 67
  • 68. References • Rosenberg, C., M. Hebert, and H. Schneiderman, Semi-Supervised Self-Training of Object Detection Models, in Proceedings of the Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION'05) - Volume 1. 2005, IEEE Computer Society. • Goldberg, A.B. and X. Zhu, Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization, in TextGraphs: HLT/NAACL Workshop on Graph-based Algorithms for Natural Language Processing. 2006. • Pang, B. and L. Lee, Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales, in Proceedings of the ACL. 2005. • Joachims, T., Transductive Inference for Text Classification using Support Vector Machines, in Proceedings of the Sixteenth International Conference on Machine Learning. 1999, Morgan Kaufmann Publishers Inc. • Sinha, P., et al., Face recognition by humans: 20 results all computer vision researchers should know about. 2006, MIT. • Wallis, G. and H. Bülthoff, Effects of temporal association on recognition memory, in Proceedings of the National Academy of Sciences. 2001. p. 4800-4804. • Graf, E., et al., Can Infants Map Meaning to Newly Segmented Words?: Statistical Segmentation and Word Learning. Psychological Science, 2007. 18(3): p. 254-260. 68