Efficient Decomposed Learning for Structured Prediction #icml2012
Presentation Transcript

  • Efficient Decomposed Learning for Structured Prediction
    Rajhans Samdani, Dan Roth (Illinois)
    Presenter: Yoh Okuno
  • Abstract
    – Structured learning is important for NLP and CV
    – Its enormous output space is often intractable
    – This work proposes DecL: Decomposed Learning
    – DecL restricts the output space to a limited part
    – Efficient and accurate, both in experiments and in theory
  • Introduction
    – What is structured learning?
      – Predicting output variables that depend on each other
      – Problem: enormous (exponential) output space
    – Applications in NLP, CV, and bioinformatics:
      – Multi-label document classification (binary) [Crammer+ 02]
      – Information extraction (sequences) [Lafferty+ 01]
      – Dependency parsing (trees) [Koo+ 10]
  • Example: Conditional Random Fields [Lafferty+ 01]
    (figure illustrating the output space)
  • Example: Markov Random Fields [Boykov+ 98]
    (figure illustrating the output space)
  • Related Work
    – There are two major approaches:
      1. Global Learning (GL): exact but slow [Tsochantaridis+ 04]
         – Searches the entire output space in the learning phase
         – Often implemented with ILP (Integer Linear Programming)
      2. Local Learning (LL): inaccurate but fast
         – Ignores the structure of the output for fast search
    – DecL is exact under some assumptions, yet faster than LL
  • Problem Setting
    – Given training data: $D = \{(x^1, y^1), \ldots, (x^m, y^m)\}$
    – The output $y$ is represented as binary variables: $y = (y_1, \ldots, y_n) \in \{0, 1\}^n$
    – The model is a linear combination of features: $f(x, y; w) = w \cdot \phi(x, y)$
  • Structured SVM [Tsochantaridis+ 04]
    – Minimize the loss function below:
      $l(w) = \sum_{j=1}^{m} \Big( \max_{y \in \mathcal{Y}} \big[ f(x^j, y; w) + \Delta(y^j, y) \big] - f(x^j, y^j; w) \Big)$
      where the $\Delta(y^j, y)$ term rewards incorrect outputs inside the max (loss augmentation)
    – A generalization of the hinge loss to multi-dimensional outputs
    – The regularization term is omitted to save space
    – See [Tokunaga 2011] for more details
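To make the objective concrete, here is a minimal brute-force sketch in Python. The feature map phi, the loss delta (e.g. Hamming loss), and the exhaustive enumeration of {0, 1}^n are illustrative assumptions, not part of the slides; a real implementation would replace the enumeration with structured inference such as ILP or dynamic programming.

    import itertools
    import numpy as np

    def structured_hinge_loss(w, phi, X, Y_gold, delta, n):
        # Structured hinge loss:
        #   l(w) = sum_j ( max_{y in Y} [ f(x^j, y; w) + Delta(y^j, y) ]
        #                  - f(x^j, y^j; w) )
        # with f(x, y; w) = w . phi(x, y). Enumerating all of {0, 1}^n is
        # only feasible for tiny n; it is here to make the formula concrete.
        total = 0.0
        for x, y_gold in zip(X, Y_gold):
            f_gold = w @ phi(x, y_gold)
            best = max(w @ phi(x, np.array(y)) + delta(y_gold, np.array(y))
                       for y in itertools.product([0, 1], repeat=n))
            total += best - f_gold
        return total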
  • Figure 1: GL vs. DecL
    – Search a neighborhood around the gold output rather than the entire output space
  • DecL: Decomposed Learning
    – Define a neighborhood around the gold output and maximize only over it:
      $l(w) = \sum_{j=1}^{m} \Big( \max_{y \in \mathrm{nbr}(y^j)} \big[ f(x^j, y; w) + \Delta(y^j, y) \big] - f(x^j, y^j; w) \Big)$
    – Note: the prediction phase still needs global search
    – How can we define the neighborhood for learning?
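The only change from the structured SVM loss above is the domain of the max. A sketch under the same conventions as before, where the neighborhood generator nbr is an assumed callable (DecL defines it per decomposition):

    def decl_loss(w, phi, X, Y_gold, delta, nbr):
        # Identical to the structured hinge loss except that the
        # loss-augmented max runs only over nbr(y^j), a small neighborhood
        # of the gold output, instead of the full output space Y.
        total = 0.0
        for x, y_gold in zip(X, Y_gold):
            f_gold = w @ phi(x, y_gold)
            best = max(w @ phi(x, y) + delta(y_gold, y) for y in nbr(y_gold))
            total += best - f_gold
        return total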
  • Subgradient Descent for DecL
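The algorithm from this slide is not in the transcript, so the following is a generic structured subgradient step applied to the DecL objective, not necessarily the authors' exact procedure; the step size eta, the epoch loop, and the helper names are illustrative assumptions.

    def decl_subgradient_descent(w, phi, X, Y_gold, delta, nbr,
                                 eta=0.1, epochs=10):
        # Generic subgradient descent on the DecL loss: find the
        # loss-augmented argmax within the neighborhood, then step toward
        # the gold features and away from the violator's features.
        for _ in range(epochs):
            for x, y_gold in zip(X, Y_gold):
                y_hat = max(nbr(y_gold),
                            key=lambda y: w @ phi(x, y) + delta(y_gold, y))
                if delta(y_gold, y_hat) > 0:  # update only on a violator
                    w = w + eta * (phi(x, y_gold) - phi(x, y_hat))
        return w

Because the neighborhood contains the gold output itself (with zero loss), the argmax returns the gold labeling when no violator exists, and no update is made.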
  • DecL-k: a Special Case of DecL
    – Restricts the output space to k dimensions at a time (see the sketch below)
      – Take all subsets of size k from the indices of y
      – The other dimensions are fixed to the gold output
    – Domain knowledge can be used in general
      – Group coupled variables into the same group
      – Complexity depends on the size of the decomposition
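For instance, the DecL-k neighborhood can be enumerated directly. This helper (names mine, not the paper's) clamps all but k coordinates to the gold output and tries every subset of size k:

    import numpy as np
    from itertools import combinations, product

    def nbr_k(y_gold, k):
        # DecL-k neighborhood: for every subset of k indices, try all 2^k
        # assignments to those coordinates while clamping the rest to the
        # gold output. Equivalently: all outputs within Hamming distance k
        # of the gold labeling.
        n = len(y_gold)
        seen = set()
        for idx in combinations(range(n), k):
            for vals in product([0, 1], repeat=k):
                y = list(y_gold)
                for i, v in zip(idx, vals):
                    y[i] = v
                seen.add(tuple(y))
        return [np.array(y) for y in seen]

With n = 10 binary outputs (as in the synthetic experiments below), DecL-2 yields only 1 + C(10,1) + C(10,2) = 56 unique candidates instead of 2^10 = 1024, and the gold output is always included, which keeps the hinge well-defined.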
  • Experiments on Synthetic Data
    – Compared DecL, LL, and GL (oracle)
    – Synthetic training data:
      – 10 binary outputs with random linear constraints
      – 20-dimensional inputs, 320 training examples
    – Running times in seconds: (table in the original slides)
  • Multi-Label Document Classification
    – Dataset: Reuters corpus
    – Size: 6,000 documents and 30 labels
    – DecL performs as well as GL and is 6x faster
  • Information Extraction: Sequence Tagging
    – Data 1: citation recognition
      – Recognize author, title, etc. in citation text
    – Data 2: real-estate advertisements
      – Recognize facilities, roommates, etc. in ads
  • Conclusion
    – Structured learning has a tradeoff between speed and accuracy
    – Decomposed Learning (DecL) splits the output space into small spaces for fast inference
    – Fast and accurate on real-world datasets
    – Theoretical guarantee of exact search under some assumptions (skipped here)
  • References
    – [Collins+ 02] Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms.
    – [Lafferty+ 01] Conditional random fields: Probabilistic models for segmenting and labeling sequence data.
    – [Koo+ 10] Dual decomposition for parsing with non-projective head automata.
    – [Boykov+ 98] Markov random fields with efficient approximations.
    – [Tsochantaridis+ 04] Support vector machine learning for interdependent and structured output spaces.
    – [Crammer+ 02] On the algorithmic implementation of multiclass kernel-based vector machines.