Label Propagation - Semi-supervised Learning with Applications to NLP

1. Label Propagation
   Seminar: Semi-supervised and Unsupervised Learning with Applications to NLP
   David Przybilla, davida@coli.uni-saarland.de
2. Outline
   ● What is Label Propagation
   ● The algorithm
   ● The motivation behind the algorithm
   ● Parameters of Label Propagation
   ● Relation Extraction with Label Propagation
3. Label Propagation
   ● Semi-supervised
   ● Shows good results when the amount of annotated data is low relative to what supervised methods need
   ● Similar to kNN
4. K-Nearest Neighbors (kNN)
   ● Shares similar ideas with Label Propagation
   ● Label Propagation (LP) uses the unlabeled instances during the process of finding the labels
5. Idea of the Problem
   ● Nearby instances should have similar labels
   ● L = set of labeled instances
   ● U = set of unlabeled instances
   ● We want to find a labeling function f over L ∪ U that respects this idea
6. The Model
   ● A complete graph
   ● Each node is an instance
   ● Each arc has a weight T_xy
   ● T_xy is high if nodes x and y are similar
7. The Model
   ● Inside a node: soft labels
8. Variables – The Model
   ● T is a matrix holding all the weights of the graph
   ● N_1 ... N_l = labeled data; N_{l+1} ... N_n = unlabeled data
   ● T is partitioned into labeled/unlabeled blocks:

     $T = \begin{pmatrix} T_{ll} & T_{lu} \\ T_{ul} & T_{uu} \end{pmatrix}$
9. Variables – The Model
   ● Y is a matrix holding the soft label probabilities of each instance
   ● Y_{N_a, R_b} is the probability of instance N_a being labeled R_b
   ● Y is split into Y_L (labeled rows) and Y_U (unlabeled rows); Y_U is the problem to solve
   ● R_1, R_2 ... R_k are the possible labels; N_1, N_2 ... N_n are the instances to label
10. Algorithm
   ● Y will change in each iteration: propagate the soft labels through the graph, clamp the labeled rows, and repeat until convergence (details on the next slides)
11. How to Measure T?
   ● Use a distance measure, e.g. the Euclidean distance d(x, y), turned into a similarity:

     $T_{xy} = \exp\left( - d(x, y)^2 / \sigma^2 \right)$

   ● σ is an important parameter (ignore it at the moment, we will talk about this later)
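A minimal numpy sketch of this weight computation; the function name and the assumption that instances are the rows of a feature matrix X are mine:

```python
import numpy as np

def weight_matrix(X, sigma):
    """T[x, y] = exp(-d(x, y)^2 / sigma^2), with d the Euclidean distance.
    X: (n, d) array whose rows are the feature vectors of the instances."""
    # Pairwise squared Euclidean distances via broadcasting
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / sigma ** 2)
```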
12. How to Initialize Y?
   ● How to correctly set the values of Y^0?
   ● Fill in the known values (of the labeled data)
   ● How to fill in the values of the unlabeled data? → The initialization of these values can be arbitrary (convergence does not depend on it, as shown below)
   ● Transform T into T̄ (row normalization): $\bar{T}_{ij} = T_{ij} / \sum_k T_{ik}$
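A sketch of this step, assuming (as elsewhere in the slides) that the labeled instances are the first l rows and that `labels_l` holds their label indices; the names are mine:

```python
import numpy as np

def init_propagation(T, labels_l, k):
    """Row-normalize T into T_bar and build the initial soft-label matrix Y."""
    n, l = T.shape[0], len(labels_l)
    T_bar = T / T.sum(axis=1, keepdims=True)    # row normalization
    Y = np.full((n, k), 1.0 / k)                # unlabeled rows: arbitrary (uniform) init
    Y[:l] = 0.0
    Y[np.arange(l), labels_l] = 1.0             # labeled rows: one-hot true labels
    return T_bar, Y
```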
13. Propagation Step
   ● During the process Y will change: Y^0 → Y^1 → ... → Y^k
   ● Update Y during each iteration: Y ← T̄ Y
14. Convergence
   ● During the iteration, Y_l is clamped (reset to the true labels after every step)
   ● Partitioning the update by labeled/unlabeled blocks:

     $\begin{pmatrix} Y_l \\ Y_u \end{pmatrix} = \begin{pmatrix} \bar{T}_{ll} & \bar{T}_{lu} \\ \bar{T}_{ul} & \bar{T}_{uu} \end{pmatrix} \begin{pmatrix} Y_l \\ Y_u \end{pmatrix}$

   ● Assuming we iterate infinitely many times:

     $Y_u^{(1)} = \bar{T}_{uu} Y_u^{(0)} + \bar{T}_{ul} Y_l$
     $Y_u^{(2)} = \bar{T}_{uu} (\bar{T}_{uu} Y_u^{(0)} + \bar{T}_{ul} Y_l) + \bar{T}_{ul} Y_l$
     ...
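A sketch of the clamped iteration described above (the convergence tolerance is my addition; some presentations of LP also row-normalize Y after each step, omitted here since T̄ is already row-normalized):

```python
import numpy as np

def propagate(T_bar, Y, l, max_iter=1000, tol=1e-6):
    """Iterate Y <- T_bar @ Y, clamping the labeled rows Y[:l] after each step."""
    Y = Y.copy()
    Y_l = Y[:l].copy()                # true labels, kept fixed
    for _ in range(max_iter):
        Y_next = T_bar @ Y            # propagation step
        Y_next[:l] = Y_l              # clamp the labeled data
        if np.abs(Y_next - Y).max() < tol:
            return Y_next
        Y = Y_next
    return Y
```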
15. Convergence
   ● Since T̄ is row-normalized and T̄_uu is a proper submatrix of T̄, the row sums of T̄_uu are strictly less than 1
   ● Doing the propagation n times leads to:

     $Y_u^{(n)} = \bar{T}_{uu}^{\,n} Y_u^{(0)} + \sum_{i=0}^{n-1} \bar{T}_{uu}^{\,i}\, \bar{T}_{ul}\, Y_l$

   ● The first term $\bar{T}_{uu}^{\,n} Y_u^{(0)}$ converges to zero, so the initialization of Y_u does not matter
16. After Convergence
   ● After convergence one can find Y_u directly by solving:

     $(I - \bar{T}_{uu})\, Y_u = \bar{T}_{ul}\, Y_l$
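Equivalently, Y_u can be computed with one linear solve; a sketch, again assuming the labeled rows come first:

```python
import numpy as np

def solve_label_propagation(T_bar, Y_l, l):
    """Solve (I - T_bar_uu) Y_u = T_bar_ul @ Y_l for Y_u directly."""
    T_uu = T_bar[l:, l:]
    T_ul = T_bar[l:, :l]
    I = np.eye(T_uu.shape[0])
    return np.linalg.solve(I - T_uu, T_ul @ Y_l)
```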
17. Optimization Problem
   ● w_{ij}: similarity between i and j
   ● f should minimize the energy function:

     $E(f) = \frac{1}{2} \sum_{i,j} w_{ij}\,(f(i) - f(j))^2$

   ● f(i) and f(j) should be similar when w_{ij} is high, in order to minimize the energy
18. The Graph Laplacian
   ● Let D be a diagonal matrix with $D_{ii} = \sum_j \bar{T}_{ij}$
   ● Rows are normalized, so D = I
   ● The graph Laplacian is then defined as: $\Delta = D - \bar{T} = I - \bar{T}$
   ● Since f : V → R, the Laplacian can act on f, and the energy function can be rewritten in terms of Δ
19. Back to the Optimization Problem
   ● The energy can be rewritten using the Laplacian: $E(f) = f^{\top} \Delta f$
   ● f should minimize this energy function
   ● The relevant blocks of Δ:

     $\Delta_{uu} = D_{uu} - \bar{T}_{uu} = I - \bar{T}_{uu}$
     $\Delta_{ul} = D_{ul} - \bar{T}_{ul} = -\bar{T}_{ul}$   (D is diagonal, so $D_{ul} = 0$)
20. Optimization Problem
   ● Δ can be rewritten in terms of T̄ (previous slide)
   ● Minimizing the energy yields the harmonic solution:

     $f_u = (I - \bar{T}_{uu})^{-1}\, \bar{T}_{ul}\, f_l$

   ● The algorithm converges to the minimization of the energy function
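The step the slide compresses, written out (assuming a symmetric weight matrix, so that $\Delta_{lu}^{\top} = \Delta_{ul}$): setting the gradient of the energy with respect to the unlabeled block to zero yields exactly the fixed point above.

```latex
\frac{\partial E}{\partial f_u}
  = 2\,(\Delta_{uu} f_u + \Delta_{ul} f_l) = 0
\;\Longrightarrow\;
f_u = -\Delta_{uu}^{-1}\,\Delta_{ul}\, f_l
    = (I - \bar{T}_{uu})^{-1}\,\bar{T}_{ul}\, f_l
```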
21. The Sigma Parameter
   ● Remember the σ parameter? It strongly influences the behavior of LP
   ● There can be:
     ● just one σ for the whole feature vector, or
     ● one σ per dimension
22. The Sigma Parameter
   ● What happens if σ tends to:
     – 0: the label of an unknown instance is given by just the nearest labeled instance
     – infinity: all the unlabeled instances receive the same influence from all labeled instances; the soft probabilities of each unlabeled instance are then given by the class frequencies in the labeled data
   ● There are heuristics for finding an appropriate value of σ
23. The Sigma Parameter – MST Heuristic
   ● Build a minimum spanning tree over all instances
   ● Find the minimum-weight arc connecting two components with different labels
   ● Set σ = min weight(arc) / 3, following the 3σ rule, so that differently labeled components are only weakly connected
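A sketch of this heuristic using scipy's minimum spanning tree; the union-find bookkeeping and the convention that -1 marks unlabeled instances are my own choices:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def sigma_from_mst(dists, labels):
    """dists: (n, n) distance matrix; labels: length-n array, -1 = unlabeled.
    Returns sigma = w0 / 3, where w0 is the weight of the first MST arc
    (in increasing order) that joins two components carrying different labels."""
    n = len(labels)
    mst = minimum_spanning_tree(dists).tocoo()
    edges = sorted(zip(mst.data, mst.row, mst.col))
    parent = list(range(n))
    comp_label = list(labels)             # label carried by each component root

    def find(x):                          # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, i, j in edges:                 # grow components, cheapest arc first
        ri, rj = find(int(i)), find(int(j))
        li, lj = comp_label[ri], comp_label[rj]
        if li != -1 and lj != -1 and li != lj:
            return w / 3.0                # 3-sigma rule on the separating arc
        parent[ri] = rj                   # merge the two components
        comp_label[rj] = lj if lj != -1 else li
    return None                           # all labeled points share one label
```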
24. The Sigma Parameter – Learning It
   ● How to learn σ?
   ● Assumption: a good σ will do classification with confidence and thus minimize entropy
   ● How to do it?
     ● Smooth the transition matrix T̄
     ● Find the derivative of the entropy H with respect to σ and follow the gradient
   ● When to do it?
     ● When using one σ per dimension, this can be used to determine irrelevant dimensions
25. Labeling Approach
   ● Once Y_u is computed, how do we assign labels to the instances?
   ● Take the most likely class
   ● Class mass normalization
   ● Label bidding
26. Labeling Approach
   ● Take the most likely class
     ● Simply look at the rows of Y_u and choose, for each instance, the label with the highest probability
   ● Problem: no control over the proportion of classes
27. Labeling Approach
   ● Class mass normalization
   ● Given some class proportions P_1, P_2 ... P_k
   ● Scale each column c of Y_u so that its mass matches P_c
   ● Then simply look at the rows of Y_u and choose, for each instance, the label with the highest probability
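A sketch of class mass normalization, assuming `proportions` holds P_1 ... P_k:

```python
import numpy as np

def class_mass_normalize(Y_u, proportions):
    """Scale each column c of Y_u so its total mass matches P_c,
    then take the row-wise argmax as the label."""
    scaled = Y_u * (np.asarray(proportions) / Y_u.sum(axis=0))
    return scaled.argmax(axis=1)
```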
28. Labeling Approach
   ● Label bidding
   ● Given some class proportions P_1, P_2 ... P_k:
     1. Estimate the number of items per label (C_k)
     2. Choose the label with the greatest number of items; take the C_k items whose probability of being that label is highest, and assign them that label
     3. Iterate through all the possible labels
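A sketch of the greedy bidding, assuming `counts[c]` is the estimated number of items for class c; the tie-breaking order is my choice:

```python
import numpy as np

def label_bidding(Y_u, counts):
    """Greedily assign counts[c] instances to each class c,
    largest class first, most confident instances first."""
    assigned = np.full(Y_u.shape[0], -1)
    for c in np.argsort(counts)[::-1]:             # class with most items first
        free = np.where(assigned == -1)[0]         # still-unassigned rows
        best = free[np.argsort(Y_u[free, c])[::-1][:counts[c]]]
        assigned[best] = c
    return assigned
```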
29. Experiment Setup
   ● Artificial data
     ● Comparison LP vs. kNN (k = 1)
   ● Character recognition
     ● Recognize handwritten digits
     ● Images of 16x16 pixels, gray scale
     ● Recognizing the digits 1, 2, 3
     ● 256-dimensional feature vector
30. Results using LP on artificial data
31. Results using LP on artificial data
   ● LP finds the structure in the data while kNN fails
32. P1NN
   ● P1NN is a baseline for comparisons
   ● Simplified version of LP:
     1. During each iteration, find the unlabeled instance nearest to a labeled instance and label it
     2. Iterate until all instances are labeled
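An unoptimized sketch of P1NN, with -1 marking unlabeled instances:

```python
import numpy as np

def p1nn(dists, labels):
    """dists: (n, n) distance matrix; labels: length-n array, -1 = unlabeled.
    Repeatedly label the unlabeled instance closest to any labeled one."""
    labels = labels.copy()
    while (labels == -1).any():
        L = np.where(labels != -1)[0]
        U = np.where(labels == -1)[0]
        sub = dists[np.ix_(U, L)]                      # unlabeled-to-labeled distances
        u, l = np.unravel_index(sub.argmin(), sub.shape)
        labels[U[u]] = labels[L[l]]                    # copy the nearest label
    return labels
```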
33. Results using LP on the handwritten digits dataset
   ● P1NN (baseline), 1NN (kNN)
     ● CNe: class mass normalization, proportions estimated from the labeled data
     ● LBO: label bidding with oracle class proportions
     ● ML: most likely labels
34. Relation Extraction?
   ● From natural language texts, detect semantic relations among entities
   Example:
   "B. Gates married Melinda French on January 1, 1994" → spouse(B. Gates, Melinda French)
35. Why LP for RE?
   Problems with the alternatives:
   ● Supervised: needs a lot of annotated data
   ● Unsupervised: retrieves clusters of relations with no label
36. RE – Problem Definition
   ● Find an appropriate label for an occurrence of two entities in a context
   Example:
   "... B. Gates married Melinda French on January 1, 1994 ..."
   Context before (Cpre), entity 1 (e1), middle context (Cmid), entity 2 (e2), context after (Cpos)
   ● Idea: if two occurrences of entity pairs have similar contexts, then they have the same relation type
37. RE Problem Definition – Features
   ● Words: in the contexts
   ● Entity types: Person, Location, Organization ...
   ● POS tags: of the words in the contexts
   ● Chunking tags: mark which words in the contexts are inside chunks
   ● Grammatical function of words in the contexts, e.g. NP-SBJ (subject)
   ● Position of words, e.g.:
     ● first/second word of e1
     ● first/second word in Cpre, Cmid, Cpos ...
     ● is there any word in Cmid
38. RE Problem Definition – Labels
39. Experiment
   ● ACE 2003 data, a corpus of newspaper texts
   ● Assume all entities have already been identified
   ● Comparison between:
     – different amounts of labeled samples: 1%, 10%, 25%, 50%, 75%, 100%
     – different similarity functions
     – LP, SVM and bootstrapping
   ● LP settings:
     ● Similarity function: cosine, Jensen-Shannon
     ● Labeling approach: take the most likely class
     ● Sigma: average similarity between labeled classes
40. Experiment – Jensen-Shannon Similarity Measure
   ● Measures the distance between two probability distributions
   ● JS is a smoothed, symmetric version of the Kullback-Leibler divergence D_KL:

     $JS(p, q) = \tfrac{1}{2} D_{KL}(p \,\|\, m) + \tfrac{1}{2} D_{KL}(q \,\|\, m), \quad m = \tfrac{1}{2}(p + q)$

   ● The Kullback-Leibler divergence itself is not symmetric and does not always have a finite value
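A sketch of the measure (natural logarithm; skipping zero entries of the first argument is my choice):

```python
import numpy as np

def jensen_shannon(p, q):
    """Symmetric, always-finite smoothing of the KL divergence."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def kl(a, b):                  # D_KL(a || b), skipping zero entries of a
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```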
41. Results
42. Classifying relation subtypes – SVM vs. LP
   ● SVM with a linear kernel
43. Bootstrapping
   Seeds → train a classifier → the classifier labels new data → add to the seed set those instances whose confidence is high enough → repeat
44. Classifying relation types – Bootstrapping vs. LP
   ● Starting with 100 random seeds
45. Results
   ● LP performs well in general when there is little annotated data, in comparison to SVM and kNN
   ● Irrelevant dimensions can be identified by using LP
   ● Looking at the structure of the unlabeled data helps when there is little annotated data
46. Thank you