Ch. 7: Optimization
M2 Yuichiro Sawai
15/07/09

Overview
•  MT decoding
•  Need to find w that assigns higher scores to better translations (e, d)
•  Better translations = translations with lower error
f: source sentence, e: target sentence, d: derivation
w: weight vector, h(・): feature function
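
The decoding step can be sketched as picking the candidate (e, d) with the highest linear score w^T h. This is a minimal sketch assuming the usual linear-model form; the candidate list, the feature values, and the names `score` and `decode` are illustrative and not taken from the slides.

```python
from typing import List, Sequence, Tuple

# A candidate is (translation e, derivation id d, feature vector h(f, e, d)).
Candidate = Tuple[str, int, Sequence[float]]


def score(w: Sequence[float], h: Sequence[float]) -> float:
    """Linear model score w^T h."""
    return sum(wi * hi for wi, hi in zip(w, h))


def decode(w: Sequence[float], candidates: List[Candidate]) -> Candidate:
    """Return the highest-scoring candidate under the weights w."""
    return max(candidates, key=lambda c: score(w, c[2]))


# Hypothetical candidates with made-up two-dimensional features.
candidates = [
    ("I saw black cat", 0, (-1.0, 2.0)),
    ("see black dog", 1, (3.0, -4.0)),
    ("see red dog", 2, (2.0, -5.0)),
]
best = decode((0.5, 1.0), candidates)
```

Changing w changes which candidate wins, which is why the choice of w decides translation quality.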
  

Loss Minimization
•  Given parallel corpus (F, E), find w that minimizes loss function l(・)
   •  e.g., l(F, E; w) = 1 - BLEU(E, decode_w(F))
•  λ is a regularization constant, weighting the regularization term, to avoid overfitting
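
A regularized objective of this kind can be sketched as loss plus λ times a regularization term. The slide does not say which regularizer is used, so the squared L2 norm below is an assumption, and `toy_loss` is a hypothetical stand-in for 1 - BLEU.

```python
from typing import Callable, Sequence


def l2_norm_squared(w: Sequence[float]) -> float:
    """Assumed regularization term: squared L2 norm of w."""
    return sum(wi * wi for wi in w)


def regularized_objective(
    loss: Callable[[Sequence[float]], float],
    w: Sequence[float],
    lam: float,
) -> float:
    """Loss plus lambda times the regularization term."""
    return loss(w) + lam * l2_norm_squared(w)


def toy_loss(w: Sequence[float]) -> float:
    """Hypothetical stand-in for 1 - BLEU(E, decode_w(F))."""
    return 0.4


value = regularized_objective(toy_loss, (3.0, 4.0), lam=0.01)
```

A larger λ penalizes large weights more strongly, trading training loss for a simpler model.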

Problems to Consider
1.  Search space is vast
    •  impossible to consider all candidates
    •  correct translation is rarely possible
2.  Approximation of error function
    •  Error metrics (e.g. BLEU) are not differentiable
    •  Split corpus-level metrics into sentence level
3.  How to calculate argmin w^T h

Batch Learning
•  Given parallel corpus (F, E), initialize w and iteratively
   1.  decode whole corpus F with current w, and get k-best lists C
   2.  optimize w
   3.  loop until convergence
•  vs. online learning
   •  optimize w per sentence
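
The three batch-learning steps can be sketched as a loop. This is only an illustration under stated assumptions: the "decoder" picks the top k entries from a fixed candidate pool, the k-best lists are merged across iterations, and the "optimizer" is an exhaustive search over a tiny weight grid. All names (`decode_kbest`, `optimize`, `batch_learn`) and all data are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Cand = Tuple[Tuple[float, ...], float]  # (feature vector h, sentence-level error)


def dot(w, h) -> float:
    return sum(wi * hi for wi, hi in zip(w, h))


def decode_kbest(w, pool: List[Cand], k: int) -> List[Cand]:
    """Stand-in decoder: top-k candidates of a fixed pool under w."""
    return sorted(pool, key=lambda c: dot(w, c[0]), reverse=True)[:k]


def corpus_error(w, kbest_lists: List[List[Cand]]) -> float:
    """Sum of the errors of the 1-best candidate in each list."""
    return sum(max(C, key=lambda c: dot(w, c[0]))[1] for C in kbest_lists)


def optimize(kbest_lists: List[List[Cand]], grid) -> Tuple[float, ...]:
    """Stand-in optimizer: exhaustive search over a small weight grid."""
    return min(grid, key=lambda w: corpus_error(w, kbest_lists))


def batch_learn(corpus: List[List[Cand]], w, k: int = 2, max_iter: int = 10):
    grid = list(product([-1.0, 0.0, 1.0], repeat=len(w)))
    merged: List[List[Cand]] = [[] for _ in corpus]
    for _ in range(max_iter):
        # 1. decode the whole corpus with the current w and extend the k-best lists C
        for C, pool in zip(merged, corpus):
            for cand in decode_kbest(w, pool, k):
                if cand not in C:
                    C.append(cand)
        # 2. optimize w on the accumulated k-best lists
        new_w = optimize(merged, grid)
        # 3. loop until convergence (w stops changing)
        if new_w == w:
            break
        w = new_w
    return w


# One source sentence with three made-up candidates.
corpus = [[((-1.0, 2.0), 0.3), ((3.0, -4.0), 0.7), ((2.0, -5.0), 0.9)]]
w = batch_learn(corpus, (1.0, 0.0))
```

Re-decoding matters because the first k-best list may not contain the lowest-error candidate; it only appears once w has moved.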

Minimum Error Rate Training (MERT)
•  Given error function error(E, Ê), directly minimize it
   •  E: reference translations, Ê: system translations
   •  e.g. error(E, Ê) = 1 - BLEU(E, Ê)
•  In other words, find ŵ = argmin_w error(E, decode_w(F))
•  Since error(・) is not differentiable w.r.t. w, gradient-based methods are not applicable
•  Instead, use Powell's method
   •  gradients not required

Powell's Method
•  Iteratively, fix a direction, and find optimal w in that direction
•  Applicable when gradients are not available
[Figure: search path w0, w1, w2, w3 in the (x1, x2) plane]
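
The idea of fixing a direction and optimizing along it can be sketched without gradients. The code below is a simplified variant, not full Powell's method: it only cycles over the coordinate axes and never updates its direction set, and it uses a golden-section line search on a made-up smooth function `f`. The search interval and all names are assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def golden_section(phi: Callable[[float], float],
                   lo: float = -10.0, hi: float = 10.0,
                   tol: float = 1e-8) -> float:
    """Minimize a unimodal 1-D function on [lo, hi] without derivatives."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def direction_search(f: Callable[[Sequence[float]], float],
                     w0: Sequence[float],
                     n_sweeps: int = 50,
                     tol: float = 1e-12) -> Tuple[List[float], list]:
    """Repeatedly fix one axis direction and move to the best point along it."""
    w = list(w0)
    path = [tuple(w)]
    for _ in range(n_sweeps):
        start_value = f(w)
        for m in range(len(w)):
            def along(gamma: float, m: int = m) -> float:
                trial = list(w)
                trial[m] += gamma  # w + gamma * b_m
                return f(trial)
            w[m] += golden_section(along)
            path.append(tuple(w))
        if start_value - f(w) < tol:
            break
    return w, path


def f(x: Sequence[float]) -> float:
    """Made-up convex test function with its minimum at (48/31, -68/31)."""
    return (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2 + 0.5 * x[0] * x[1]


w_opt, path = direction_search(f, [0.0, 0.0])
```

The recorded `path` plays the role of the points w0, w1, w2, ... in the figure.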

Optimization in One Direction
•  1-best translation parameterized by scalar γ
   b_m: one-hot vector with m-th dim = 1
[Figure, upper panel: score w^T h + γ h (intercept + slope × γ) against γ, one line per candidate c1, c2, c3, c4. Candidates with the highest score are selected; they form the envelope.]
[Figure, lower panel: error against γ, with segments labeled c1, c3, c4]
e.g.)
   f = 黒い 猫 を 見た
   e = I saw a black cat
   c1 = I saw black cat
   c2 = saw a black cat
   …
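
The upper envelope for a single sentence can be sketched as follows: each candidate's score is a line in γ, and the 1-best candidate changes only where two lines cross. This brute-force version checks every pairwise crossing, which is simpler but slower than a sorted sweep. The intercepts and slopes are made up so that c2 never reaches the envelope.

```python
import math
from typing import Dict, List, Tuple


def envelope(lines: Dict[str, Tuple[float, float]]) -> List[Tuple[float, float, str]]:
    """lines maps a candidate name to (intercept, slope).

    Returns (gamma_start, gamma_end, name) for each stretch of gamma
    on which that candidate has the highest score.
    """
    names = list(lines)
    cuts = set()
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            (a1, b1), (a2, b2) = lines[p], lines[q]
            if b1 != b2:  # parallel lines never cross
                cuts.add((a2 - a1) / (b1 - b2))
    bounds = [-math.inf] + sorted(cuts) + [math.inf]
    segments: List[Tuple[float, float, str]] = []
    for lo, hi in zip(bounds, bounds[1:]):
        # Pick a probe point strictly inside the interval.
        if math.isinf(lo) and math.isinf(hi):
            probe = 0.0
        elif math.isinf(lo):
            probe = hi - 1.0
        elif math.isinf(hi):
            probe = lo + 1.0
        else:
            probe = (lo + hi) / 2
        best = max(names, key=lambda n: lines[n][0] + lines[n][1] * probe)
        if segments and segments[-1][2] == best:
            segments[-1] = (segments[-1][0], hi, best)  # extend previous stretch
        else:
            segments.append((lo, hi, best))
    return segments


# Hypothetical (intercept, slope) pairs for four candidates.
lines = {"c1": (2.0, -1.0), "c2": (0.5, 0.25), "c3": (1.5, 0.0), "c4": (0.0, 1.0)}
segments = envelope(lines)
```

Because the 1-best candidate is constant on each stretch, the error only needs to be evaluated once per stretch.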

Corpus-level Error
•  Sentence-level losses are summed to get corpus-level error
[Figure: for sentence 1 and sentence 2, each sentence-level envelope gives a sentence-level error as a function of γ; the two are added into a multi-sentence error, whose minimum is marked γ*]
Find γ that minimizes overall error!
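
Summing sentence-level errors along one direction can be sketched with step functions. Each sentence contributes a list of (gamma_start, gamma_end, error) stretches; the total is constant between breakpoints, so probing one point per interval is enough. The error values and the function names are made up for illustration.

```python
import math
from typing import List, Tuple

Steps = List[Tuple[float, float, float]]  # (gamma_start, gamma_end, error)


def total_error(step_functions: List[Steps], gamma: float) -> float:
    """Add up the error each sentence incurs at this gamma."""
    total = 0.0
    for steps in step_functions:
        for lo, hi, err in steps:
            if lo <= gamma < hi:
                total += err
                break
    return total


def best_gamma(step_functions: List[Steps]) -> float:
    """Probe one point inside every interval between breakpoints."""
    cuts = sorted({b for steps in step_functions
                   for lo, hi, _ in steps
                   for b in (lo, hi) if math.isfinite(b)})
    if not cuts:
        return 0.0
    probes = [cuts[0] - 1.0]
    probes += [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    probes.append(cuts[-1] + 1.0)
    return min(probes, key=lambda g: total_error(step_functions, g))


# Hypothetical sentence-level error step functions.
sentence1 = [(-math.inf, 0.5, 0.3), (0.5, 1.5, 0.6), (1.5, math.inf, 0.1)]
sentence2 = [(-math.inf, 1.0, 0.7), (1.0, 2.0, 0.2), (2.0, math.inf, 0.8)]
gamma_star = best_gamma([sentence1, sentence2])
```

Here neither sentence alone is best served at γ* on its whole range, but the sum is lowest on the interval between 1.5 and 2.0.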

Problems of Powell's Method
•  Sensitive to initialization of w
•  Not suitable for high-dimensional feature vectors

Softmax Loss
•  Translation probability: softmax over the candidate scores w^T h
•  Loss is negative log-likelihood of oracle translations,
   where oracle translations are the candidates with the lowest error
•  Gradient-based methods (e.g. L-BFGS) are applicable
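
A softmax loss over a k-best list can be sketched as below. The exact formulas on the slide were images and are not reproduced here, so this assumes the common form: candidate probabilities are a softmax of w^T h, oracle translations are the lowest-error candidates in the list, and the loss is the negative log of their total probability. Features and errors are made up.

```python
import math
from typing import List, Sequence


def softmax_probs(w: Sequence[float], feats: List[Sequence[float]]) -> List[float]:
    """Softmax of the linear scores w^T h over the candidate list."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_loss(w: Sequence[float],
                 feats: List[Sequence[float]],
                 errors: List[float]) -> float:
    """Negative log of the probability mass on the oracle candidates."""
    probs = softmax_probs(w, feats)
    lowest = min(errors)  # assumed definition of "oracle"
    oracle_mass = sum(p for p, err in zip(probs, errors) if err == lowest)
    return -math.log(oracle_mass)


# Hypothetical candidate features and sentence-level errors.
feats = [(-1.0, 2.0), (3.0, -4.0), (2.0, -5.0)]
errors = [0.3, 0.7, 0.9]
loss_uniform = softmax_loss((0.0, 0.0), feats, errors)
loss_good = softmax_loss((-1.0, 1.0), feats, errors)
```

Unlike the raw error, this loss changes smoothly with w, which is what makes gradient-based optimizers usable.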

Max Margin Loss
•  Make sure distances between correct translations and incorrect translations are large
   •  for all oracle and non-oracle pairs …
   •  penalize when diff in error is greater than diff in score
•  For example:
   f: 黒い猫を見た, e (correct): I saw a black cat
                                      error    score (= w^T h)
   e* (oracle)   I saw black cat      0.1      0.4
   e  (system)   see red dog          0.9      0.3
   diff in error is large but diff in score is small: bad!
•  Optimization methods for SVM are applicable (e.g. SMO)
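
The rule "penalize when diff in error is greater than diff in score" can be written as a hinge loss on one oracle and non-oracle pair. This is a minimal sketch using the error and score values from the example; the function name `margin_loss` is hypothetical, and summing over all pairs and adding regularization are left out.

```python
from typing import Tuple


def margin_loss(oracle: Tuple[float, float], other: Tuple[float, float]) -> float:
    """Hinge loss for one (oracle, non-oracle) pair.

    Each argument is (error, score). The loss is positive when the
    difference in error exceeds the difference in score.
    """
    error_diff = other[0] - oracle[0]
    score_diff = oracle[1] - other[1]
    return max(0.0, error_diff - score_diff)


# Values from the example: oracle (error 0.1, score 0.4), system (error 0.9, score 0.3).
loss = margin_loss((0.1, 0.4), (0.9, 0.3))
```

With an error gap of 0.8 and a score gap of only 0.1, the pair is penalized by about 0.7; the penalty vanishes once the score gap is at least as large as the error gap.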

Pairwise Ranking Optimization (PRO)
•  Parameter estimation as ranking problem
•  Classifier learns w to rank candidates by error
•  Generate training examples from pairs of candidates
   •  positive example: h(cand1) - h(cand2) = (-4, 6)
   •  negative example: h(cand3) - h(cand1) = (3, -7)
   •  w^T {h(cand1) - h(cand2)} > 0 ⇔ w^T h(cand1) > w^T h(cand2)
•  Off-the-shelf linear binary classifiers can be used

f: 黒い猫を見た, e (correct): I saw a black cat
                                 error    h          score (= w^T h)
e (cand1)   I see black cat      0.3      (-1, 2)    ???
e (cand2)   see black dog        0.7      (3, -4)    ???
e (cand3)   see red dog          0.9      (2, -5)    ???
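
Generating pairwise examples and fitting a linear classifier can be sketched with the three candidates from the table. The perceptron here is only one possible off-the-shelf linear binary classifier, chosen for brevity; this sketch also uses every pair instead of sampling, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# name -> (error, feature vector h), taken from the table above
cands: Dict[str, Tuple[float, Tuple[int, int]]] = {
    "cand1": (0.3, (-1, 2)),
    "cand2": (0.7, (3, -4)),
    "cand3": (0.9, (2, -5)),
}


def pairwise_examples(cands) -> List[Tuple[Tuple[int, ...], int]]:
    """Feature differences labeled +1 (better minus worse) or -1 (worse minus better)."""
    examples = []
    for a, b in combinations(cands, 2):
        (err_a, h_a), (err_b, h_b) = cands[a], cands[b]
        if err_a == err_b:
            continue  # no preference between equally good candidates
        better, worse = (h_a, h_b) if err_a < err_b else (h_b, h_a)
        diff = tuple(x - y for x, y in zip(better, worse))
        examples.append((diff, +1))
        examples.append((tuple(-x for x in diff), -1))
    return examples


def train_perceptron(examples, dim: int, epochs: int = 10) -> List[float]:
    """Plain perceptron without a bias term."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


examples = pairwise_examples(cands)
w = train_perceptron(examples, dim=2)
ranking = sorted(cands, key=lambda n: sum(wi * hi for wi, hi in zip(w, cands[n][1])),
                 reverse=True)
```

The learned w fills in the "???" score column so that candidates with lower error receive higher scores.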

Minimum Bayes Risk
•  Minimize expected loss, where the candidate probability is a softmax of the scores scaled by γ
   •  γ = 0: all candidates are equally likely
   •  γ = 1: softmax
   •  γ→∞: highest scoring candidate with probability 1 (MERT)
•  Differentiable and considers many candidates <e,d>
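
The effect of the scaling factor γ on the expected loss can be sketched as follows. This assumes the candidate distribution is proportional to exp(γ w^T h), which matches the three cases listed above; the features, errors, and weights are made up.

```python
import math
from typing import List, Sequence


def expected_error(w: Sequence[float],
                   feats: List[Sequence[float]],
                   errors: List[float],
                   gamma: float) -> float:
    """Expected error under p(candidate) proportional to exp(gamma * w^T h)."""
    scores = [gamma * sum(wi * hi for wi, hi in zip(w, h)) for h in feats]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return sum(e / z * err for e, err in zip(exps, errors))


# Hypothetical candidates: the first one has the highest score under w.
feats = [(-1.0, 2.0), (3.0, -4.0), (2.0, -5.0)]
errors = [0.3, 0.7, 0.9]
w = (-1.0, 1.0)

risk_uniform = expected_error(w, feats, errors, gamma=0.0)   # plain average of the errors
risk_softmax = expected_error(w, feats, errors, gamma=1.0)
risk_sharp = expected_error(w, feats, errors, gamma=100.0)   # approaches the 1-best error
```

For finite γ the expected error is a smooth function of w, while large γ recovers the 1-best error that MERT optimizes.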
  

Sentence-level BLEU
•  Sentence-level error functions are needed for optimization
•  BLEU is corpus-level metric
   •  4-gram precision is often 0 on sentence level
   •  varies from human judgments
•  Sentence-level error
   •  Linear BLEU
   •  (Expected BLEU)
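
Why 4-gram precision is often 0 on a single sentence can be seen on a short example. The sketch below computes clipped n-gram precision for the candidate "I saw black cat" against the reference "I saw a black cat"; it omits the brevity penalty and any smoothing, and the function names are illustrative.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


precisions = [
    ngram_precision("I saw black cat", "I saw a black cat", n)
    for n in range(1, 5)
]
```

The candidate differs from the reference by a single missing word, yet its 3-gram and 4-gram precisions are both 0, so a geometric mean over n = 1..4 collapses to 0.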

Linear BLEU
•  Linear approximation of change in BLEU
   c: sum of sentence lengths
   m_n: # matched n-grams
•  Add one sentence: (c, m_n) -> (c', m'_n)
•  Linear BLEU error of candidate e
[Figure: log BLEU at (c, m_n) and at (c', m'_n), with the difference Δ; annotation: # matched n-grams in e]

[Book Reading] 機械翻訳 - Section 7 No.1
