Tommi Jaakkola, "Scaling Structured Prediction"

January 31, seminar "MIT Day at Yandex"

- The use of structured prediction in applications involving natural language processing, computer vision, and computational biology.

- The use of dual decomposition methods as exact prediction algorithms.

- The study of methods for efficiently estimating structured prediction models without having to solve a single, highly complex combinatorial problem.


  1. Scaling structured prediction. Tommi Jaakkola, MIT, in collaboration with M. Collins, M. Fromer, T. Hazan, T. Koo, O. Meshi, A. Rush, D. Sontag
  2. Structured prediction
     • Natural language processing
       - e.g., tagging, morphology segmentation, dependency parsing
     • Computer vision
       - e.g., segmentation, stereo reconstruction, object recognition
     • Computational biology
       - e.g., molecular structure prediction, pathway reconstruction
     • Robotics
       - e.g., imitation learning, inverse kinematics
     • Human-computer interaction
       - e.g., interface alignment, example based designs
     • etc.
  3. Structured prediction
     • The goal is to learn a mapping from input examples (x) to complex objects (y)
       - e.g., from sentences (x) to dependency parses (y)
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked"]
  4. Structured prediction
     • The goal is to learn a mapping from input examples (x) to complex objects (y)
       - e.g., from pairs of images (x) to disparity maps (y)
     [Figure: stereo image pair x and disparity map y from the Middlebury dataset (Scharstein & Pal, '07)]
  5. Structured prediction
     • The goal is to learn a mapping from input examples (x) to complex objects (y)
       - e.g., from pairs of web pages (x) to their alignments (y)
     [Figure: semantic alignment y between a pair of web pages x, from the Bricolage paper on a structured-prediction algorithm for example-based web design (Kumar et al., '10)]
  6. Structured prediction
     • Natural language processing
       - e.g., tagging, morphology segmentation, dependency parsing
     • Computer vision
       - e.g., segmentation, stereo reconstruction, object recognition
     • Computational biology
       - e.g., molecular structure prediction, pathway reconstruction
     • Robotics
       - e.g., imitation learning, inverse kinematics
     • Human-computer interaction
       - e.g., interface alignment, example based designs
     • etc.
  7. Goals and challenges
     • Goals
       - use rich classes of output structures
       - exercise fine control of how structures are chosen (scoring)
       - learn models efficiently from data
     • Challenges
       - prediction problems are often provably hard
       - most learning algorithms rely on explicit predictions and are therefore inefficient with large amounts of data
       - richer structures lead to ambiguity
  8. Structured prediction
     • The goal is to learn a mapping from input examples (x) to complex objects (y)
       - e.g., from sentences (x) to dependency parses (y)
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked"]
       - in lexicalized dependency parsing, we draw an arc from the head word of each phrase to the words that modify it
       - the resulting parse is a directed tree; in many languages, the tree is non-projective (crossing arcs)
       - each sentence is mapped to arc scores; the parse is obtained as the highest scoring directed tree
  9. Structured prediction
     • The goal is to learn a mapping from sentences (x) to dependency parses (y)
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     $y(i, j) = 1$ if arc $i \to j$ is selected, and zero otherwise
  10. Structured prediction
     • The goal is to learn a mapping from sentences (x) to dependency parses (y)
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     $y(i, j) = 1$ if arc $i \to j$ is selected, and zero otherwise
     $x \to w \cdot f(x; i, j) = \theta(i, j)$, where $x$ is the sentence, $f$ the features, $w$ the parameters, and $\theta(i, j)$ the arc scores
  11. Structured prediction
     • The goal is to learn a mapping from sentences (x) to dependency parses (y)
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     $y(i, j) = 1$ if arc $i \to j$ is selected, and zero otherwise
     $x \to w \cdot f(x; i, j) = \theta(i, j)$, where $x$ is the sentence, $f$ the features, $w$ the parameters, and $\theta(i, j)$ the arc scores
     Highest scoring tree: $y^* = \operatorname{argmax}_y \big\{ \sum_{i,j} y(i, j)\,\theta(i, j) + \theta_T(y) \big\}$
  12. Structured prediction
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     • The complexity of the prediction task depends on how we score each candidate tree
     • In an arc factored model (as before) each arc is scored separately: $\sum_{i,j} y(i, j)\,\theta(i, j)$
     • The highest scoring tree is found as the maximum weighted directed spanning tree: $y^* = \operatorname{argmax}_y \big\{ \sum_{i,j} y(i, j)\,\theta(i, j) + \theta_T(y) \big\}$
  13. Structured prediction
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     • The complexity of the prediction task depends on how we score each candidate tree
     • In an arc factored model (as before) each arc is scored separately: $\sum_{i,j} y(i, j)\,\theta(i, j)$
     • The highest scoring tree is found as the maximum weighted directed spanning tree: $y^* = \operatorname{argmax}_y \big\{ \sum_{i,j} y(i, j)\,\theta(i, j) + \theta_T(y) \big\}$
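As a rough illustration of the arc-factored model on the slides above, the following Python sketch finds the highest scoring directed tree by brute force. The score table `theta`, the three-word sentence, and the helper names `is_tree` and `best_tree` are assumptions made for this example and do not come from the talk. The exhaustive search is only feasible for tiny sentences; a maximum weighted directed spanning tree algorithm would take its place in practice.

```python
from itertools import product

def is_tree(heads):
    """heads[m - 1] is the head of word m (words are 1..n, 0 is the root '*').
    True if every word reaches the root by following heads, with no cycle."""
    for m in range(1, len(heads) + 1):
        seen, node = set(), m
        while node != 0:
            if node in seen:  # cycle (this also rejects self-loops)
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(theta):
    """theta[i][j] scores arc i -> j. Returns (best score, heads)."""
    n = len(theta) - 1
    best_score, best_heads = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        # Arc-factored score: one term per selected arc head -> word.
        score = sum(theta[h][m] for m, h in enumerate(heads, start=1))
        if score > best_score:
            best_score, best_heads = score, heads
    return best_score, best_heads

# Hypothetical arc scores for "* John saw movie"
# (0 = *, 1 = John, 2 = saw, 3 = movie); column 0 is unused.
theta = [
    [0, 1, 10, 1],  # arcs leaving the root
    [0, 0, 2, 1],   # arcs leaving "John"
    [0, 9, 0, 8],   # arcs leaving "saw"
    [0, 3, 4, 0],   # arcs leaving "movie"
]
score, heads = best_tree(theta)
print(score, heads)  # heads[m - 1] is the chosen head of word m
```

With these made-up scores the search attaches "saw" to the root and both "John" and "movie" to "saw".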
  14. Structured prediction
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     • The complexity of the prediction task depends on how we score each candidate tree
     • It is often advantageous to include interactions between modifiers (outgoing arcs), known as “sibling scoring”: $\sum_i \theta_i(y|_i)$, where $y|_i = \{ y(i, j),\ j \ne i \}$
  15. Structured prediction
     [Figure: dependency parse y of the sentence x = "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n]
     • The complexity of the prediction task depends on how we score each candidate tree
     • It is often advantageous to include interactions between modifiers (outgoing arcs), known as “sibling scoring”: $\sum_i \theta_i(y|_i)$, where $y|_i = \{ y(i, j),\ j \ne i \}$
     • Finding the highest scoring tree is now NP-hard (McDonald and Satta, 2007): $y^* = \operatorname{argmax}_y \big\{ \sum_i \theta_i(y|_i) + \theta_T(y) \big\}$
  16. Decomposition
     [Figure: the sentence "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n, split into parts]
       - $\theta_T(y)$: directed tree, arc factored scores
       - $\theta_0(y|_0), \ldots, \theta_2(y|_2), \ldots, \theta_n(y|_n)$: modifiers (outgoing arcs), solved separately for each word
     • We can always turn a hard problem into an easy one by solving each “part” separately from the others
     • But the parts are unlikely to agree on a solution ...
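To make the disagreement concrete, here is a small Python sketch in which every head word picks its own best set of modifiers, ignoring the other heads. The arc scores `arc`, the pairwise sibling bonus `pair`, and the function name `best_modifiers` are hypothetical choices for illustration; the talk does not specify the form of the sibling scores $\theta_i$.

```python
from itertools import combinations

def best_modifiers(i, n, arc, pair):
    """Best set of modifiers for head i, ignoring all other heads.
    Score = sum of arc scores + bonuses for pairs of siblings."""
    candidates = [j for j in range(1, n + 1) if j != i]
    best_score, best_set = 0, ()  # the empty modifier set scores 0
    for r in range(1, len(candidates) + 1):
        for mods in combinations(candidates, r):
            score = sum(arc[i][j] for j in mods)
            score += sum(pair.get((i, j, k), 0)
                         for j, k in combinations(mods, 2))
            if score > best_score:
                best_score, best_set = score, mods
    return best_set

# Hypothetical scores for "* John saw movie"
# (0 = *, 1 = John, 2 = saw, 3 = movie); column 0 is unused.
n = 3
arc = [
    [0, 1, 5, -1],
    [0, 0, 2, 1],
    [0, 4, 0, 3],
    [0, -2, 1, 0],
]
pair = {(2, 1, 3): 1}  # bonus when "saw" heads both "John" and "movie"

chosen = {i: best_modifiers(i, n, arc, pair) for i in range(n + 1)}
# For each word, list the heads that claimed it as a modifier.
heads_of = {m: [i for i in chosen if m in chosen[i]] for m in range(1, n + 1)}
print(chosen)
print(heads_of)
```

Each subproblem is easy on its own, but here every word ends up claimed by more than one head, so the selected arcs do not form a directed tree.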
  17. Dual decomposition
     [Figure: the sentence "* John saw a movie yesterday that he liked", words indexed i = 0, 1, 2, ..., n, split into parts]
       - directed tree with arc factored scores: $\theta_T(y) + \sum_{i,j} y(i, j)\,\lambda(i, j)$
       - modifiers (outgoing arcs), solved separately for each word: $\theta_i(y|_i) - \sum_{j \ne i} y(i, j)\,\lambda(i, j)$
       - the $\lambda(i, j)$ terms encourage arc agreement
     • We can encourage parts to agree on the maximizing arcs via Lagrange multipliers (cf. Guignard, Fisher, '80s)
  18. Dual decomposition algorithm
     • An iterative sub-gradient algorithm (Koo et al., 2010)
       - find a directed spanning tree: $\hat{y} = \operatorname{argmax}_y \big\{ \theta_T(y) + \sum_{i,j} y(i, j)\,\lambda(i, j) \big\}$
       - find modifiers of each word: $\hat{y}'|_i = \operatorname{argmax}_{y|_i} \big\{ \theta_i(y|_i) - \sum_{j \ne i} y(i, j)\,\lambda(i, j) \big\}$
       - update Lagrange multipliers based on disagreement: $\lambda(i, j) \leftarrow \lambda(i, j) + \alpha_k \big( \hat{y}'(i, j) - \hat{y}(i, j) \big)$
  19. Dual decomposition algorithm
     • An iterative sub-gradient algorithm (Koo et al., 2010)
       - find a directed spanning tree: $\hat{y} = \operatorname{argmax}_y \big\{ \theta_T(y) + \sum_{i,j} y(i, j)\,\lambda(i, j) \big\}$
       - find modifiers of each word: $\hat{y}'|_i = \operatorname{argmax}_{y|_i} \big\{ \theta_i(y|_i) - \sum_{j \ne i} y(i, j)\,\lambda(i, j) \big\}$
       - update Lagrange multipliers based on disagreement: $\lambda(i, j) \leftarrow \lambda(i, j) + \alpha_k \big( \hat{y}'(i, j) - \hat{y}(i, j) \big)$
     • Thm: The solution is optimal if an agreement (no updates) is reached
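The three steps of the sub-gradient algorithm can be sketched in Python as follows. This is a toy version under several assumptions that are not from the talk: $\theta_T(y)$ is taken to be 0 for directed trees and $-\infty$ otherwise, the sibling scores are arc scores plus a hypothetical pairwise bonus, both subproblems are solved by brute force, and the step size $\alpha_k$ is held constant at 1. All function and variable names are invented for the example.

```python
from itertools import combinations, product

def is_tree(heads):
    # heads[m - 1] is the head of word m; 0 is the root.
    for m in range(1, len(heads) + 1):
        seen, node = set(), m
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(lam, n):
    """argmax over directed trees of sum_{i,j} y(i,j) * lam(i,j)."""
    best_score, best_arcs = float("-inf"), None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        arcs = {(h, m) for m, h in enumerate(heads, start=1)}
        score = sum(lam[a] for a in arcs)
        if score > best_score:
            best_score, best_arcs = score, arcs
    return best_arcs

def best_modifiers(i, n, arc, pair, lam):
    """argmax over modifier sets of theta_i(y|i) - sum_j y(i,j) * lam(i,j)."""
    candidates = [j for j in range(1, n + 1) if j != i]
    best_score, best_set = 0, ()
    for r in range(1, len(candidates) + 1):
        for mods in combinations(candidates, r):
            score = sum(arc[i][j] - lam[(i, j)] for j in mods)
            score += sum(pair.get((i, j, k), 0)
                         for j, k in combinations(mods, 2))
            if score > best_score:
                best_score, best_set = score, mods
    return {(i, j) for j in best_set}

def dual_decomposition(n, arc, pair, max_iter=50, step=1):
    """Returns (tree arcs, iterations used, whether agreement was reached)."""
    lam = {(i, j): 0 for i in range(n + 1)
           for j in range(1, n + 1) if i != j}
    y_tree = None
    for k in range(1, max_iter + 1):
        y_tree = best_tree(lam, n)
        y_mod = set()
        for i in range(n + 1):
            y_mod |= best_modifiers(i, n, arc, pair, lam)
        if y_tree == y_mod:  # agreement: no update needed
            return y_tree, k, True
        for a in lam:        # lam += step * (y_mod - y_tree)
            lam[a] += step * ((a in y_mod) - (a in y_tree))
    return y_tree, max_iter, False

# Hypothetical scores for "* John saw movie" (0 = *, 1 = John, 2 = saw, 3 = movie).
arc = [[0, 1, 5, -1], [0, 0, 2, 1], [0, 4, 0, 3], [0, -2, 1, 0]]
pair = {(2, 1, 3): 1}
arcs, iterations, certified = dual_decomposition(3, arc, pair)
print(sorted(arcs), iterations, certified)
```

On this toy input the two subproblems reach agreement after a handful of iterations, which, by the theorem on the slide, certifies the returned tree as optimal for the combined score. Integer scores and an integer step keep the tie-breaking exact; a decreasing step-size schedule is the more usual choice in general.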
  20. Dual decomposition in practice
     • The table shows the percentage of test cases where the sub-gradient algorithm quickly finds the optimal solution

              CertS   CertG
       Dan    99.07   98.45
       Dut    98.19   97.93
       Por    99.65   99.31
       Slo    90.55   95.27
       Swe    98.71   98.97
       Tur    98.72   99.04
       Eng1   98.65   99.18
       Eng2   98.96   99.12
       Dan    98.50   98.50
       Dut    98.00   99.50
  21. Goals and challenges
     • Goals
       - use rich classes of output structures
       - exercise fine control of how structures are chosen (scoring)
       - learn models efficiently from data
     • Challenges
       - prediction problems may be provably hard but we can solve practical instances effectively with decomposition methods
       - most learning algorithms rely on explicit predictions and are therefore inefficient with large amounts of data
       - richer structures lead to ambiguity
  22. Learning to predict
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
       - e.g., lexicalized dependency parsing
     [Figure: training sentences $x^{(1)}$ = "* John saw a movie", $x^{(2)}$ = "* kids make nutritious snacks", ... with their parses $y^{(1)}, y^{(2)}, \ldots$]
  23. Learning to predict
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
       - e.g., stereo reconstruction
     [Figure: training image pairs $x^{(1)}, x^{(2)}, \ldots$ with their disparity maps $y^{(1)}, y^{(2)}, \ldots$ from the Middlebury dataset (Scharstein & Pal, '07)]
  24. Learning to predict
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
     • The prediction problem can be challenging. Can we learn the parameters more easily?
  25. Learning to predict
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
     • The prediction problem can be challenging. Can we learn the parameters more easily?
     • Thm: (Sontag et al.) If “max” is hard, then learning is hard as well
  26. Learning to predict
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
     • Each training example introduces (often) exponentially many linear constraints: $w \cdot f(x^{(i)}, y^{(i)}) > w \cdot f(x^{(i)}, y),\ \forall y \in \mathcal{Y} \setminus y^{(i)}$
       - left-hand side: score of the target structure
       - right-hand side: score for an alternative
       - $\mathcal{Y} \setminus y^{(i)}$: the set of all alternatives
  27. Learning to predict
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
     • Each training example introduces (often) exponentially many linear constraints: $w \cdot f(x^{(i)}, y^{(i)}) > w \cdot f(x^{(i)}, y),\ \forall y \in \mathcal{Y} \setminus y^{(i)}$
       - left-hand side: score of the target structure
       - right-hand side: score for an alternative
       - $\mathcal{Y} \setminus y^{(i)}$: the set of all alternatives
  28. Learning with pseudo-max
     • We’d like to estimate the score functions from data such that $y^{(i)} \approx \operatorname{argmax}_{y \in \mathcal{Y}} \big\{ w \cdot f(x^{(i)}, y) \big\},\ i = 1, \ldots, n$ (parameterized scores)
     • Each training example now provides a small number of linear constraints for alternatives “around the target”: $w \cdot f(x^{(i)}, y^{(i)}) > w \cdot f(x^{(i)}, y),\ \forall y \in \mathcal{Y}^{(i)}$
       - left-hand side: score of the target structure
       - right-hand side: score for an alternative
       - $\mathcal{Y}^{(i)}$: reduced set of alternatives, where each alternative may differ from the target in at most one (or a few) coordinates
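The difference between the full constraint set and the pseudo-max constraint set can be illustrated with a short Python sketch. The binary sequence labelling task, the two-dimensional feature map `features`, and the weight vectors below are hypothetical and chosen only to keep the example small; the sketch checks constraints for a given `w` and does not implement a learning algorithm.

```python
from itertools import product

def features(x, y):
    """Hypothetical features: how many output bits match the input,
    and how many neighbouring output bits agree with each other."""
    unary = sum(1 for xi, yi in zip(x, y) if xi == yi)
    pairwise = sum(1 for a, b in zip(y, y[1:]) if a == b)
    return (unary, pairwise)

def score(w, x, y):
    return sum(wk * fk for wk, fk in zip(w, features(x, y)))

def all_alternatives(y, labels):
    """Every structure other than the target (exponentially many)."""
    return [alt for alt in product(labels, repeat=len(y)) if alt != tuple(y)]

def pseudo_max_alternatives(y, labels):
    """Structures that differ from the target in exactly one coordinate."""
    alts = []
    for pos in range(len(y)):
        for lab in labels:
            if lab != y[pos]:
                alts.append(tuple(y[:pos]) + (lab,) + tuple(y[pos + 1:]))
    return alts

def violated(w, x, y, alternatives):
    """Alternatives whose score is not strictly below the target's score."""
    target = score(w, x, y)
    return [alt for alt in alternatives if score(w, x, alt) >= target]

labels = (0, 1)
x = (1, 1, 0, 1, 1, 0, 0, 0)  # noisy input
y = (1, 1, 1, 1, 1, 0, 0, 0)  # smoothed target structure

print(len(all_alternatives(y, labels)))         # constraints in the full set
print(len(pseudo_max_alternatives(y, labels)))  # constraints in the reduced set
print(violated((1, 2), x, y, pseudo_max_alternatives(y, labels)))
print(violated((1, 0), x, y, pseudo_max_alternatives(y, labels)))
```

For a sequence of length 8 the full set has 255 alternatives while the reduced set has only 8. Satisfying the reduced constraints does not by itself guarantee that the full set is satisfied.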
