1. Structured Prediction: A Large Margin Approach
Ben Taskar, University of Pennsylvania

2. Acknowledgments
- Drago Anguelov
- Vassil Chatalbashev
- Carlos Guestrin
- Michael Jordan
- Dan Klein
- Daphne Koller
- Simon Lacoste-Julien
- Paul Vernaza

3. Structured Prediction
- Prediction of complex outputs
  - Structured outputs: multivariate, correlated, constrained
- Novel, general way to solve many learning problems

4. Handwriting Recognition
[Figure: input x = letter images, output y = "brace"; sequential structure]

5. Object Segmentation
[Figure: input x = 3D scan, output y = segment labels; spatial structure]

6. Natural Language Parsing
[Figure: input x = "The screen was a sea of red", output y = parse tree; recursive structure]

7. Bilingual Word Alignment
[Figure: input x = sentence pair, output y = word alignment; combinatorial structure]
English: "What is the anticipated cost of collecting fees under the new proposal?"
French: "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?"

8. Protein Structure and Disulfide Bridges
Protein 1IMT: AVITGACERDLQCGKGTCCAVSLWIKSVRVCTPVGTSGEDCHPASHKIPFSGQRMHHTCPCAPNLACVQTSPKKFKCLSK
(the cysteines C are highlighted in the slide; disulfide bridges pair them up)

9. Local Prediction
- Classify using local information
- Ignores correlations & constraints!
[Figure: per-letter OCR predictions "b r e a c" for the word "brace"]

10. Local Prediction
[Figure: 3D scan labeled point-by-point with classes building, tree, shrub, ground]

11. Structured Prediction
- Use local information
- Exploit correlations
[Figure: the same letter images, now decoded jointly as "brace"]

12. Structured Prediction
[Figure: the same 3D scan, labeled jointly with classes building, tree, shrub, ground]

13. Outline
- Structured prediction models
  - Sequences (CRFs)
  - Trees (CFGs)
  - Associative Markov networks (special MRFs)
  - Matchings
- Structured large margin estimation
  - Margins and structure
  - Min-max formulation
  - Linear programming inference
  - Certificate formulation

14. Structured Models
- Mild assumption: the scoring function is a linear combination of features, s(x, y) = w · f(x, y)
- Prediction: y* = argmax of s(x, y) over the space of feasible outputs Y(x)

15-16. Chain Markov Net (aka CRF*)
[Figure, shown on two consecutive slides: a chain of label variables y, each ranging over a-z, with observed inputs x]
*Lafferty et al. 01

17. Associative Markov Nets
- Point features: spin-images, point height (node potentials φ_j(y_j))
- Edge features: length of edge, edge orientation (edge potentials φ_jk(y_j, y_k))
- "Associative" restriction: edge potentials reward agreement between neighboring labels y_j, y_k

18. CFG Parsing
Features count rule occurrences: #(NP → DT NN), ..., #(PP → IN NP), ..., #(NN → 'sea')

19. Bilingual Word Alignment
Edge (j, k) features:
- position
- orthography
- association
[Figure: alignment edges between the tokenized English sentence "What is the anticipated cost of collecting fees under the new proposal ?" and the tokenized French sentence "En vertu de les nouvelles propositions , quel est le coût prévu de perception de le droits ?"]

20. Disulfide Bonds: Non-bipartite Matching
[Figure: cysteines 1-6 in the sequence RSCCPCYWGGCPWGQNCYPEGCSGPKV, matched in pairs]
Fariselli & Casadio '01, Baldi et al. '04

21. Scoring Function
Features for a candidate bond (j, k):
- amino acid identities
- phys/chem properties
[Figure: a candidate bond between two cysteine windows in RSCCPCYWGGCPWGQNCYPEGCSGPKV]

22. Structured Models
- Mild assumptions: the scoring function is a linear combination of features, and it decomposes as a sum of part scores (see the sketch below):
  s(x, y) = w · f(x, y) = Σ_p w · f_p(x, y_p)
- Prediction: argmax of s(x, y) over the space of feasible outputs Y(x)

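The decomposition above is easiest to see in code. Below is a minimal sketch (my illustration, not code from the talk) of a part-factored linear score for a label sequence; the emission/transition feature layout is an assumption chosen for concreteness.

```python
# Part-factored linear scoring: s(x, y) = sum over parts p of w . f_p(x, y_p).
import numpy as np

def part_features(x, y, p, num_labels, num_obs):
    """Features for part p: emission and transition indicator vector."""
    f = np.zeros(num_labels * num_obs + num_labels * num_labels)
    f[y[p] * num_obs + x[p]] = 1.0                   # emission y_p -> x_p
    if p > 0:                                        # transition y_{p-1} -> y_p
        f[num_labels * num_obs + y[p - 1] * num_labels + y[p]] = 1.0
    return f

def score(w, x, y, num_labels, num_obs):
    """s(x, y) = w . f(x, y), with f summed over per-position parts."""
    return sum(w @ part_features(x, y, p, num_labels, num_obs)
               for p in range(len(x)))

# toy usage: 3 labels, 4 observation symbols
rng = np.random.default_rng(0)
w = rng.normal(size=3 * 4 + 3 * 3)
print(score(w, x=[0, 2, 3], y=[1, 1, 2], num_labels=3, num_obs=4))
```
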
23. Supervised Structured Prediction
- Learning: from data {(x_i, y_i)}, estimate w
  - Likelihood (can be intractable)
  - Margin
  - Local (ignores structure)
- Prediction: example, weighted matching; generally, combinatorial optimization

24. Local Estimation
- Treat edges as independent decisions
- Estimate w locally, use globally
  - E.g., naive Bayes, SVM, logistic regression
  - Cf. [Matusov+al 03] for matchings
  - Simple and cheap
  - Not well-calibrated for the matching model
  - Ignores correlations & constraints

25. Conditional Likelihood Estimation
- Estimate w jointly by maximizing the conditional likelihood of y_i given x_i
- The denominator (partition function) is #P-complete [Valiant 79, Jerrum & Sinclair 93]
- Tractable model, intractable learning
- Need a tractable learning method ⇒ margin-based estimation

26. Outline (revisited; same outline as slide 13)

27. OCR Example
- We want: score(x, "brace") to beat the score of every other letter sequence
- Equivalently: score(x, "brace") > score(x, "aaaaa"), score(x, "brace") > score(x, "aaaab"), ..., score(x, "brace") > score(x, "zzzzz") (a lot of constraints!)

28. Parsing Example
- We want: the score of the correct parse of 'It was red' to beat every alternative parse
- Equivalently: one constraint per alternative tree (S A B C D vs. S A B D F vs. S E F G H, ...), a lot of constraints!

29. Alignment Example
- We want: the score of the correct alignment of 'What is the' / 'Quel est le' to beat every alternative alignment
- Equivalently: one constraint per alternative alignment of positions 1 2 3, a lot of constraints!

30. Structured Loss
Loss counts the number of wrong parts (Hamming distance to the truth); see the sketch below. For the truth "brace":
- "bcare": 2,  "brore": 2,  "broce": 1,  "brace": 0
Analogous per-edge losses for alignments ('What is the' / 'Quel est le') and per-span losses for parses ('It was red').

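As a small concrete companion to the slide (my illustration, not the talk's code), the Hamming loss over parts:

```python
# Structured Hamming loss: number of parts where y_pred disagrees with y_true.
def hamming_loss(y_true, y_pred):
    """Count mismatched positions between two equal-length label sequences."""
    assert len(y_true) == len(y_pred)
    return sum(a != b for a, b in zip(y_true, y_pred))

# matches the slide: losses 2, 2, 1, 0 against the truth "brace"
for guess in ["bcare", "brore", "broce", "brace"]:
    print(guess, hamming_loss("brace", guess))
```
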
31. Large margin estimation
- Given training examples {(x_i, y_i)}, we want: w·f(x_i, y_i) > w·f(x_i, y) for all y ≠ y_i
- Maximize the margin γ: w·f(x_i, y_i) ≥ w·f(x_i, y) + γ
- Mistake-weighted margin: w·f(x_i, y_i) ≥ w·f(x_i, y) + γ ℓ(y_i, y), where ℓ(y_i, y) = # of mistakes in y
*Collins 02, Altun et al 03, Taskar 03

32. Large margin estimation
- Eliminate the margin variable by fixing the scale of w
- Add slacks ξ_i for the inseparable case (hinge loss):
  min ||w||² + C Σ_i ξ_i  s.t.  w·f(x_i, y_i) ≥ w·f(x_i, y) + ℓ(y_i, y) − ξ_i for all y

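A brute-force sketch of the resulting hinge loss may help make the constraints concrete. Everything here is illustrative: the toy `feat` featurizer is an assumption, and the exhaustive enumeration over outputs stands in for the combinatorial inference the talk develops; it is only feasible for tiny outputs.

```python
# Structured hinge: max_y [ w.f(x,y) + loss(y_i,y) ] - w.f(x_i,y_i).
import itertools
import numpy as np

def feat(x, y, num_labels=3, num_obs=4):
    """Toy emission-count features (illustrative only)."""
    f = np.zeros(num_labels * num_obs)
    for xi, yi in zip(x, y):
        f[yi * num_obs + xi] += 1.0
    return f

def structured_hinge(w, x, y_true, num_labels=3):
    """Brute-force loss-augmented hinge; feasible only for tiny outputs."""
    best = -np.inf
    for y in itertools.product(range(num_labels), repeat=len(x)):
        val = w @ feat(x, y) + sum(a != b for a, b in zip(y, y_true))
        best = max(best, val)
    return best - w @ feat(x, y_true)

w = np.zeros(12)
print(structured_hinge(w, x=[0, 1, 2], y_true=(0, 0, 1)))  # 3.0 at w = 0
```
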
33. Large margin estimation
- Brute force enumeration of constraints
- Min-max formulation
  - 'Plug in' a linear program for inference

34. Min-max formulation
- Fold the exponentially many constraints into one per example:
  w·f(x_i, y_i) + ξ_i ≥ max_y [ w·f(x_i, y) + ℓ(y_i, y) ]
- Structured loss ℓ: Hamming
- Inference is discrete optimization; key step: replace it with an equivalent LP (continuous optimization)

35. Alternatives: Perceptron
- Simple iterative method: predict ŷ, then update w by f(x_i, y_i) − f(x_i, ŷ) (see the sketch below)
- Unstable for structured output: fewer instances, big updates
  - May not converge if non-separable
  - Noisy
- Voted / averaged perceptron [Freund & Schapire 99, Collins 02]
  - Regularize / reduce variance by aggregating over iterations

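A minimal sketch of the averaged structured perceptron in the style of Collins 02; `decode` is an assumed argmax-inference routine supplied by the model, and `feat` an assumed feature map.

```python
# Averaged structured perceptron: mistake-driven updates plus weight averaging.
import numpy as np

def averaged_perceptron(examples, feat, decode, dim, epochs=5):
    """examples: list of (x, y); feat(x, y) -> np.ndarray; decode(w, x) -> y."""
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    for _ in range(epochs):
        for x, y in examples:
            y_hat = decode(w, x)             # current best prediction
            if y_hat != y:                   # mistake-driven update
                w += feat(x, y) - feat(x, y_hat)
            w_sum += w                       # accumulate for averaging
    return w_sum / (epochs * len(examples))  # averaged weights
```
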
36. Alternatives: Constraint Generation [Collins 02; Altun et al 03]
- Repeatedly add the most violated constraint and re-solve the QP (sketch below)
- Handles several more general loss functions
- Need to re-solve the QP many times
- Theorem: only a polynomial # of constraints is needed to achieve ε-error [Tsochantaridis et al 04]
- Worst-case # of constraints larger than the factored formulation

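The loop structure, schematically. All callables here (`solve_qp`, `loss_augmented_decode`, `loss`, `feat`) are assumed helpers, not any specific library's API:

```python
# Cutting-plane / constraint generation for the structured SVM QP.
def constraint_generation(examples, feat, loss, loss_augmented_decode,
                          solve_qp, epsilon=1e-3, max_iters=100):
    working_sets = [[] for _ in examples]   # active constraint set per example
    w = None
    for _ in range(max_iters):
        # re-solve the QP restricted to the current working sets
        w, xi = solve_qp(examples, feat, working_sets)
        added = False
        for i, (x, y) in enumerate(examples):
            y_hat = loss_augmented_decode(w, x, y)   # most violated output
            violation = (loss(y, y_hat) + w @ feat(x, y_hat)
                         - w @ feat(x, y) - xi[i])
            if violation > epsilon:
                working_sets[i].append(y_hat)
                added = True
        if not added:                                # epsilon-feasible: done
            break
    return w
```
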
37. Outline (revisited; same outline as slide 13)

38. Matching Inference LP
- Maximize Σ_jk s_jk z_jk subject to degree constraints and 0 ≤ z ≤ 1
- Has integral solutions z (the constraint matrix A is totally unimodular) [Nemhauser+Wolsey 88]; see the sketch below
- Need a Hamming-like loss so the loss-augmented problem stays an LP of the same form
[Figure: bipartite alignment graph between the tokenized English and French sentences, edges (j, k)]

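Since the LP is integral, the matching can equally be computed combinatorially. A minimal sketch (my example, not the talk's code) using SciPy's Hungarian-algorithm routine, which returns the same integral solution the LP relaxation would:

```python
# Max-score bipartite matching; integral by total unimodularity of the LP.
import numpy as np
from scipy.optimize import linear_sum_assignment

scores = np.array([[0.9, 0.1, 0.0],   # s_jk: score for aligning word j to k
                   [0.2, 0.8, 0.3],
                   [0.0, 0.4, 0.7]])
rows, cols = linear_sum_assignment(scores, maximize=True)
print(list(zip(rows, cols)))          # optimal alignment pairs (j, k)
```
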
39. y → z Map for Markov Nets
Encode a labeling y as indicator variables: z_j(a) = 1 iff y_j = a, and z_jk(a, b) = 1 iff (y_j, y_k) = (a, b).
[Figure: the indicator vectors for an example labeling]

40. Markov Net Inference LP
- Maximize Σ s·z subject to normalization (Σ_a z_j(a) = 1) and agreement (Σ_b z_jk(a, b) = z_j(a)) constraints
- Has integral solutions z for chains and (hyper)trees; Viterbi sketch below
- Can be fractional for untriangulated networks [Chekuri+al 01, Wainwright+al 02]

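For chains, the integral LP solution coincides with ordinary Viterbi MAP decoding; a minimal sketch with illustrative random scores:

```python
# Viterbi MAP decoding for a chain Markov net (exact, matches the integral LP).
import numpy as np

def viterbi(node_scores, edge_scores):
    """node_scores: (T, K); edge_scores: (K, K). Returns the argmax labeling."""
    T, K = node_scores.shape
    best = node_scores[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[k_prev, k] = best path ending in k_prev, extended to k
        cand = best[:, None] + edge_scores + node_scores[t][None, :]
        back[t] = cand.argmax(axis=0)
        best = cand.max(axis=0)
    y = [int(best.argmax())]
    for t in range(T - 1, 0, -1):         # backtrack pointers
        y.append(int(back[t][y[-1]]))
    return y[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(5, 3)), rng.normal(size=(3, 3))))
```
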
41. Associative MN Inference LP
- "Associative" restriction: edge potentials reward agreement
- For K=2 labels, LP solutions are always integral (optimal)
- For K>2, within a factor of 2 of optimal (similar results for larger cliques)
[Greig+al 89, Boykov+al 99, Kolmogorov & Zabih 02, Taskar+al 04]

42. CFG Chart
- CNF tree = set of two types of parts:
  - Constituents (A, s, e)
  - CF-rules (A → B C, s, m, e)

43. CFG Inference LP
- Variables z over constituents and rule applications; constraints require a root constituent and inside/outside consistency between each constituent and the rules above and below it
- Has integral solutions z (CKY sketch below)

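For concreteness, the same argmax the integral CFG LP computes can be found by CKY dynamic programming. A minimal sketch; the tiny grammar and scores below are invented for illustration:

```python
# CKY over a CNF grammar: max-scoring parse = the CFG inference LP's argmax.
from collections import defaultdict

def cky(words, unary, binary):
    """unary: {(A, word): score}; binary: {(A, B, C): score}. Best parse score."""
    n = len(words)
    chart = defaultdict(lambda: float("-inf"))     # (A, s, e) -> best score
    for i, w in enumerate(words):                  # width-1 constituents
        for (A, word), sc in unary.items():
            if word == w:
                chart[A, i, i + 1] = max(chart[A, i, i + 1], sc)
    for span in range(2, n + 1):                   # wider spans, bottom-up
        for s in range(0, n - span + 1):
            e = s + span
            for m in range(s + 1, e):
                for (A, B, C), sc in binary.items():
                    val = chart[B, s, m] + chart[C, m, e] + sc
                    chart[A, s, e] = max(chart[A, s, e], val)
    return chart["S", 0, n]

unary = {("NP", "screen"): 1.0, ("V", "glowed"): 1.0}
binary = {("S", "NP", "V"): 0.5}
print(cky(["screen", "glowed"], unary, binary))    # 2.5
```
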
44. LP Duality
- Linear programming duality
  - Variables ↔ constraints
  - Constraints ↔ variables
- Optimal values are the same
  - When both feasible regions are bounded

45. Min-max Formulation via LP Duality
Replace the inner max over y by its inference LP, then take the LP dual: the inner max becomes a min, and the whole problem collapses into a single compact QP over w and the dual variables.

46. Min-max formulation summary
- Formulation produces a concise QP for:
  - Low-treewidth Markov networks
  - Associative MNs (K=2)
  - Context-free grammars
  - Bipartite matchings
- Approximate for untriangulated MNs and AMNs with K>2
*Taskar et al 04

47. Unfactored Primal/Dual
By QP duality, the unfactored primal (one margin constraint per competing output y) has a dual with exponentially many variables, one per (example, output) pair.

48. Factored Primal/Dual
- By QP duality, the dual inherits structure from the problem-specific inference LP
- Dual variables correspond to a decomposition of the variables of the flat (unfactored) case

49. The Connection
[Figure: the flat dual puts weights (.2, .15, .25, .4) on the whole outputs "bcare", "brore", "broce", "brace", with losses (2, 2, 1, 0); these weights marginalize into the factored dual's per-letter variables]

50. Duals and Kernels
- Kernel trick works:
  - Factored dual
  - Local functions (log-potentials) can use kernels

51. 3D Mapping
- Sensors: laser range finder, GPS, IMU (data provided by Michael Montemerlo & Sebastian Thrun)
- Labels: ground, building, tree, shrub
- Training: 30 thousand points; testing: 3 million points

56. Segmentation results (hand-labeled 180K test points)
Model  | Accuracy
SVM    | 68%
V-SVM  | 73%
M³N    | 93%

57. Fly-through
[Video: fly-through of the labeled 3D map]

58. LAGRbot: Real-time Navigation
- LAGRbot: Paul Vernaza & Dan Lee
- Range of stereo vision limited to approximately 15 m or less

59. LAGRbot: Real-time Navigation
- 160x120 images: real-time prediction/learning (~100 ms)
- Current work with Paul Vernaza, Dan Lee

Model      | Error
Local      | 17%
Structured | 8%

60. Hypertext Classification
- WebKB dataset
  - Four CS department websites: 1300 pages / 3500 links
  - Classify each page: faculty, course, student, project, other
  - Train on three universities, test on the fourth
- 53% error reduction over SVMs
- 38% error reduction over RMNs [*Taskar et al 02], with relaxed LP inference doing better than loopy belief propagation

61. Word Alignment Results
- Data: [Hansards, Canadian Parliament]; features induced on ~1 mil unsupervised sentences
- Trained on 100 sentences (10,000 edges); tested on 350 sentences (35,000 edges) [Taskar+al 05, Lacoste-Julien+Taskar+al 06]
- *Error: weighted combination of precision/recall

Model                    | *Error
GIZA/IBM4 [Och & Ney 03] | 6.5
+Local learning+matching | 5.4
+Our approach            | 4.9
+Our approach+QAP        | 4.5

62. Modeling First-Order Effects
- Features: monotonicity, local inversion, local fertility
- QAP is NP-complete
- Sentences (≤ 30 words, ~1k vars) solve in a few seconds (Mosek)
- Learning: use the LP relaxation
- Testing: using the LP, 83.5% of sentences and 99.85% of edges are integral

63. Outline (revisited; same outline as slide 13)

64. Certificate formulation
- Non-bipartite matchings:
  - O(n³) combinatorial algorithm (see the sketch below)
  - No polynomial-size LP known
- Spanning trees:
  - No polynomial-size LP known
  - Simple certificate of optimality
- Intuition: verifying optimality is easier than optimizing
- Use a compact optimality condition of y_i with respect to w

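For reference, the combinatorial route the slide mentions: a minimal sketch (my example) of max-weight non-bipartite matching via the blossom algorithm in networkx (an assumed dependency; the edge scores are invented):

```python
# Max-weight non-bipartite matching (Edmonds-style blossom algorithm).
import networkx as nx

G = nx.Graph()
# candidate disulfide bonds among 6 cysteines, with illustrative scores
edges = [(1, 2, 0.5), (1, 6, 0.9), (2, 5, 0.8), (3, 4, 0.7),
         (2, 3, 0.2), (4, 5, 0.1)]
G.add_weighted_edges_from(edges)

matching = nx.max_weight_matching(G, maxcardinality=True)
print(sorted(tuple(sorted(e)) for e in matching))  # [(1, 6), (2, 5), (3, 4)]
```
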
65. Certificate for non-bipartite matching
- Alternating cycle: every other edge is in the matching
- Augmenting alternating cycle: the score of edges not in the matching exceeds that of edges in the matching
- Negate the score of edges not in the matching:
  augmenting alternating cycle = negative-length alternating cycle
- Matching is optimal ⇔ no negative alternating cycles (Edmonds '65)

66. Certificate for non-bipartite matching
- Pick any node r as root
- d_j = length of the shortest alternating path from r to j
- Triangle inequality: d_k ≤ d_j + length(j, k)
- Theorem: no negative-length cycle ⇔ such a distance function d exists
- Can be expressed as linear constraints: O(n) distance variables, O(n²) constraints

67. Certificate formulation
- Formulation produces a compact QP for:
  - Spanning trees
  - Non-bipartite matchings
  - Any problem with a compact optimality condition
*Taskar et al '05

68. Disulfide Bonding Prediction
- Data [Swiss-Prot 39]: 450 sequences (4-10 cysteines)
- Features: windows around each C-C pair; physical/chemical properties [Taskar+al 05]
- *Accuracy: % of proteins with all bonds correct

Model                              | *Acc
Local learning+matching            | 41%
Recursive Neural Net [Baldi+al 04] | 52%
Our approach (certificate)         | 55%

69. Formulation summary
- Brute force enumeration
- Min-max formulation: 'plug in' a convex program for inference
- Certificate formulation: directly guarantee optimality of y_i

70. Scalable Algorithms
- Convex quadratic program: # variables and constraints linear in # parameters and edges
- Can solve using off-the-shelf software
  - Matlab, CPLEX, Mosek, etc.
  - Superlinear convergence
- Problem: even linear size is too large
  - Second-order methods run out of memory (quadratic)
- Need scalable, memory-efficient methods
  - Space/time tradeoff
  - Structured SMO [Taskar+al 04]
  - Structured exponentiated gradient [Bartlett+al 04, Collins+al 07]
  - These don't work for matchings, min-cuts

71. Structured Extragradient
- Extragradient method [Korpelevich 76, Nesterov 03]
  - Linear convergence
  - Alternates a gradient step with a projection step [Taskar+al 06]

72. Saddle-point Problem
The factored QP can be written as a saddle point: a min over w (and slacks) of a max over the relaxed inference variables z of an objective bilinear in w and z.

73. Extragradient Method [Korpelevich 76]
- Prediction step: u' = Π(u − η ∇F(u))
- Correction step: u⁺ = Π(u − η ∇F(u'))
- Π = Euclidean projection onto the feasible set, η = step size
- Theorem: extragradient converges linearly
- Key computation is the Euclidean projection: usually easy for w, harder for z (sketch below)

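A minimal sketch (my illustration, not the talk's code) of the two-step extragradient update on a toy regularized bilinear saddle point, with the z-projection onto a box standing in for the structured projections discussed next:

```python
# Extragradient on min_w max_z  0.5||w||^2 + w^T A z,  z constrained to a box.
import numpy as np

def project_box(z, lo=0.0, hi=1.0):
    return np.clip(z, lo, hi)            # Euclidean projection onto [lo, hi]^n

def extragradient(A, steps=300, eta=0.1):
    w = np.zeros(A.shape[0])
    z = np.full(A.shape[1], 0.5)
    for _ in range(steps):
        # prediction: gradient steps from (w, z)
        w_p = w - eta * (w + A @ z)
        z_p = project_box(z + eta * (A.T @ w))
        # correction: re-take the steps using gradients at (w_p, z_p)
        w = w - eta * (w_p + A @ z_p)
        z = project_box(z + eta * (A.T @ w_p))
    return w, z

A = np.array([[1.0, -1.0], [0.5, 2.0]])
print(extragradient(A))                  # converges toward the saddle point
```
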
74. Projection for Bipartite Matchings: Min Cost Flow
- Min-cost quadratic flow computes the projection
  - O(N^1.5) complexity for fixed precision (N = # edges)
- Reduction to flow for min-cuts also possible [Taskar+al 06]
[Figure: flow network from source s through English words j ("What is the anticipated cost") and French words k ("quel est le coût prévu") to sink t; all capacities = 1]

75. Structured Extragradient
- Extragradient method [Korpelevich 76, Nesterov 03]
  - Linear convergence
  - Key computation: projection (min-cost quadratic flow for matchings & cuts)
- Extensions (using Bregman divergences) ⇒ dynamic programming for decomposable models
- "Online envy": want memory proportional to # parameters, independent of # examples
  - Solves problems with on the order of a million edges [Taskar+al 06]

76. Other approaches
- Online methods: online updates with respect to most violated constraints [Crammer+al 05, 06]
- Regression-based methods: regression from input to a transformed output space [Cortes+al 07]
- Learning to search: learn a classifier to guide local search for a structured solution [Daume+al 05]
- Many others

77. Generalization Bounds
[Cartoon caption: "If the past is any indication of the future, he'll have a cruller."]

78. Generalization Bounds

79. Several Pointers
- Perceptron bound [Collins 01]
  - Assumes separability with margin γ
  - Bound on 0-1 loss
- Covering-number bound [Taskar+al 03]
  - Bound on Hamming loss
  - Logarithmic dependence on # variables in each y
- Regret bounds [Crammer+al 06]
  - Online-style guarantees for more general losses
- PAC-Bayes bound [McAllester 07]
  - Tighter analysis, consistency
- Bounds for learning with approximate inference [Kulesza & Pereira, today]

80. Open Questions for Large-Margin Estimation
- Statistical consistency
  - Hinge loss is not consistent for non-binary output [see Tewari & Bartlett 05, McAllester 07]
- Semi-supervised learning
  - Laplacian regularization [Altun+McAllester 05]
  - Co-regularization [Brefeld+al 05]
- Latent variables
  - Machine translation [Liang+al 06]
  - CCG parsing to logical form [Zettlemoyer+Collins 07]
- Learning with approximate inference

81. Learning with LP relaxations
- Does constant-factor approximate inference guarantee anything a priori about learning?
- No [see Kulesza & Pereira, tonight]
  - Simple 3-node counterexample
  - Separable with exact inference, not separable with approximate inference
- Question: what other (stronger?) approximate-inference guarantees translate into learning guarantees?

82. References
- Edited collection: G. Bakir et al. (2007), Predicting Structured Data, MIT Press
- Code: SVMstruct by Thorsten Joachims
- Slides and more papers: http://www.cis.upenn.edu/~taskar

83. Thanks!

84. Segmentation Model → Min-Cut
- Computing the MAP labeling is hard in general, but
- if edge potentials are attractive ⇒ min-cut algorithm (sketch below)
- Multiway cut for the multiclass case ⇒ use LP relaxation
[Figure: binary (0/1) labels with local-evidence and spatial-smoothness terms]
[Greig+al 89, Boykov+al 99, Kolmogorov & Zabih 02, Taskar+al 04]
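
A minimal sketch (my illustration) of the binary case: MAP labeling with attractive pairwise terms reduced to an s-t min-cut in the style of the Greig+al 89 construction, using networkx as an assumed dependency. The unary costs and penalties below are invented for the example.

```python
# Binary MAP segmentation via s-t min-cut: cut value = energy of the labeling.
import networkx as nx

def binary_map_by_mincut(unary, edges):
    """unary: {node: (cost0, cost1)}; edges: {(i, j): disagreement penalty}."""
    G = nx.DiGraph()
    for i, (c0, c1) in unary.items():
        G.add_edge("s", i, capacity=c1)   # cut iff i labeled 1: pay cost1
        G.add_edge(i, "t", capacity=c0)   # cut iff i labeled 0: pay cost0
    for (i, j), lam in edges.items():     # attractive: pay lam on disagreement
        G.add_edge(i, j, capacity=lam)
        G.add_edge(j, i, capacity=lam)
    value, (src_side, sink_side) = nx.minimum_cut(G, "s", "t")
    # label i as 1 exactly when it ends on the sink side of the cut
    return value, {i: int(i in sink_side) for i in unary}

unary = {"a": (0.0, 2.0), "b": (1.5, 0.5), "c": (2.0, 0.0)}
edges = {("a", "b"): 1.0, ("b", "c"): 1.0}
print(binary_map_by_mincut(unary, edges))  # energy 1.5, labels a=0, b=1, c=1
```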