15/07/29 1
Syntax-based Simultaneous Translation
through Prediction of Unseen Syntactic Constituents
Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
ACL, July 27, 2015
Two Features of This Study
● Syntax-based Machine Translation
– State-of-the-art SMT method for distant language pairs
[Figure: "This is (NP)" is parsed (DT VBZ and (NP) under NP/VP/S) and translated to "これ は (NP) で す" via Parse → MT]
● Simultaneous Translation
– Prevent translation delay when translating continuous speech
[Figure: "In the next 18 minutes I'm going to take you on a journey." is split and each segment is translated in turn]
Speech Translation - Standard Setting
[Figure: pipeline — "in the next 18 minutes I 'm going to take you on a journey" → Speech Recognition → Machine Translation ("今から18分で 皆様を旅にお連れします") → Speech Synthesis; the delay depends on the input length]
● Problem: long delay (if there are few explicit sentence boundaries)
Simultaneous Translation with Segmentation
● Separate the input at good positions
[Figure: the ASR output is segmented into "in the next 18 minutes / I 'm going / to take you / on a journey" and each part is translated ("今から", "18分で", "皆様を", "旅に", "お連れします"), giving a shorter delay]
● The system can generate output without waiting for end-of-speech
● Trade-off between translation quality and segmentation frequency
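The simplest segmentation policy behind this trade-off can be sketched as fixed N-word chunking (a minimal illustration only, not the deck's actual segmenter; `segment_every_n` is a hypothetical helper name):

```python
def segment_every_n(words, n):
    """Split a word stream into chunks of at most n words.

    Smaller n means shorter delay but lower translation quality,
    which is exactly the trade-off shown above.
    """
    return [words[i:i + n] for i in range(0, len(words), n)]

stream = "in the next 18 minutes I 'm going to take you on a journey".split()
print(segment_every_n(stream, 4))
# four-word chunks; the final chunk may be shorter
```

The experiments later in the deck use this kind of N-word segmentation as the non-optimized baseline.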
Syntactic Problems in Segmentation
● Segmentation allows us to translate each part separately
● But it often breaks the syntax
[Figure: the full parse of "In the next 18 minutes I 'm going to ..." vs. the parse cut at the predicted boundary after "I" — the segment "in the next 18 minutes I" is left with an unseen VP]
● Bad effect on syntax-based machine translation
Motivation of This Study
● Predict unseen syntactic constituents
[Figure: the parse of the segment "In the next 18 minutes I" is completed by predicting the unseen (VP) under S]
● Translate from the corrected tree: 今 から 18 分 私 → 今 から 18 分 で 私 は (VP)
Summaries of Proposed Methods
● Proposed 1: Predicting and using unseen constituents
● Proposed 2: Waiting for translation
[Figure: pipeline ASR → Segmentation → Parsing → Syntax Prediction (Proposed 1) → Translation → Output, with Waiting (Proposed 2); e.g. "this is" → "これは NP です", then after "a pen" arrives, "this is a pen" → "これはペンです"]
What is Required?
● To use predicted constituents in translation, we need:
1. Making training data
2. Deciding a prediction strategy
3. Using results for translation
Making Training Data for Syntax Prediction
● Decompose gold trees in the treebank
[Figure: the gold tree (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN pen))))]
1. Select any leaf span in the tree
2. Find the path between the leftmost/rightmost leaves of the span
3. Delete the subtrees outside this path
4. Replace the remaining inside subtrees with their topmost phrase label
5. Finally we obtain: "nil is a NN nil" (left syntax / leaf span / right syntax)
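The label-replacement step (4) can be sketched as follows. This is an illustration only — the paper's exact span-selection and deletion conventions may differ — with trees as plain nested tuples and `context_labels` as a hypothetical helper:

```python
# Trees as (label, child, ...) tuples; leaves are plain strings.
TREE = ("S",
        ("NP", ("DT", "This")),
        ("VP", ("VBZ", "is"),
               ("NP", ("DT", "a"), ("NN", "pen"))))

def leaves(t):
    """All leaf words of a (sub)tree, left to right."""
    return [t] if isinstance(t, str) else [w for c in t[1:] for w in leaves(c)]

def context_labels(t, lo, hi, start=0):
    """Topmost phrase labels of the subtrees outside leaf span [lo, hi).

    A subtree whose leaves are wholly outside the span contributes its
    own label (step 4's "topmost phrase label"); a subtree straddling a
    span boundary is descended into. Returns (labels, leaf count).
    """
    if isinstance(t, str):
        return [], 1
    width = len(leaves(t))
    end = start + width
    if end <= lo or start >= hi:        # wholly outside the span
        return [t[0]], width
    out, pos = [], start
    for child in t[1:]:
        labels, w = context_labels(child, lo, hi, pos)
        out += labels
        pos += w
    return out, width

print(context_labels(TREE, 0, 3)[0])  # span "This is a" → ['NN']
print(context_labels(TREE, 1, 4)[0])  # span "is a pen"  → ['NP']
```

For the span "This is a", the out-of-span material collapses to the single label NN covering "pen" — the unseen right constituent the predictor is trained to output.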
Syntax Prediction from Incorrect Trees
1. Parse the input translation unit as-is
[Figure: the as-is parse of "in the next 18 minutes I" — a PP whose NP wrongly absorbs "I"]
2. Extract features, e.g. Word:R1=I, POS:R1=NN, Word:R1-2=I,minutes, POS:R1-2=NN,NNS, ..., ROOT=PP, ROOT-L=IN, ROOT-R=NP, ...
3. The classifier scores the candidate labels (e.g. VP 0.65, NP 0.28, nil 0.04, ...) and outputs the prediction: VP, nil
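The feature extraction of step 2 can be sketched like this (a minimal illustration; the paper's actual feature set is likely larger, and `rightmost_features` is a hypothetical name):

```python
def rightmost_features(words, tags, root, root_children):
    """Features from the as-is parse of a translation unit.

    R1 and R1-2 are the rightmost word/POS unigram and bigram;
    ROOT, ROOT-L, ROOT-R are the root label and its leftmost and
    rightmost children, as shown on the slide.
    """
    return {
        "Word:R1=" + words[-1]: 1,
        "POS:R1=" + tags[-1]: 1,
        "Word:R1-2=" + ",".join(reversed(words[-2:])): 1,
        "POS:R1-2=" + ",".join(reversed(tags[-2:])): 1,
        "ROOT=" + root: 1,
        "ROOT-L=" + root_children[0]: 1,
        "ROOT-R=" + root_children[-1]: 1,
    }

feats = rightmost_features(
    "in the next 18 minutes I".split(),
    ["IN", "DT", "JJ", "CD", "NNS", "NN"],  # "I" mis-tagged NN in the broken parse
    root="PP", root_children=["IN", "NP"])
print(sorted(feats))
```

These binary features would then feed a multiclass classifier over the candidate constituent labels.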
Syntax-based MT with Additional Constituents
● Use the tree-to-string (T2S) MT framework
[Figure: "This is NP" is parsed to (S (NP (DT This)) (VP (VBZ is) NP)) and translated to "これ は NP で す"]
– Obtains state-of-the-art results on syntactically distant language pairs (e.g. English→Japanese)
– Makes it possible to use additional syntactic constituents explicitly
Translation Waiting (1)
● Reordering problem
– A constituent on the right sometimes moves left in the translation
[Figure: "in the next 18 minutes I (VP)" → "今から18分で私は(VP)"; "'m going to take (NP)" → "(NP)を行っています"; "you on a journey" → "旅の途中で" — the predicted (NP) is reordered before the verb]
– Considering the output language, we should output future inputs before the current input
Waiting for Translation
● Heuristic: wait for the next input
[Figure: instead of committing to "'m going to take (NP)" → "(NP)を行っています", the system waits, then translates "'m going to take you on a journey" → "貴方を旅にお連れします"]
● This is expected to avoid syntactically strange segmentations
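The waiting heuristic can be sketched as a small control loop. This is an illustration only: `translate` stands in for the T2S decoder and `needs_wait` for the check that a predicted constituent would be reordered in front of the current words.

```python
def translate_stream(segments, translate, needs_wait):
    """Emit translations segment by segment, holding a segment back
    (waiting for the next input) whenever committing it now would be
    unsafe because of reordering."""
    held, outputs = [], []
    for seg in segments:
        unit = held + seg
        if needs_wait(unit):
            held = unit              # wait: merge with the next segment
        else:
            outputs.append(translate(unit))
            held = []
    if held:                         # flush whatever remains at end of speech
        outputs.append(translate(held))
    return outputs

# Toy run: the first segment ends in a transitive verb, so we wait.
out = translate_stream(
    [["'m", "going", "to", "take"], ["you", "on", "a", "journey"]],
    translate=" ".join,
    needs_wait=lambda unit: unit[-1] == "take")
print(out)  # → ["'m going to take you on a journey"]
```

Waiting trades a little extra delay for a translation unit whose syntax the decoder can actually handle.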
Experimental Settings
● Dataset: prediction: Penn Treebank; MT: TED [WIT3]
● Languages: English → Japanese
● Tokenization: Stanford Tokenizer, KyTea
● Parsing: Ckylark (Berkeley PCFG-LA)
● MT decoder: Moses (PBMT), Travatar (T2S)
● Evaluation: BLEU, RIBES

Methods summary:
● Baselines: PBMT — PBMT (Moses), the conventional setting; T2S — T2S-MT (Travatar) without constituent prediction
● Proposed: T2S-MT (Travatar) with constituent prediction & waiting
Results: Prediction Accuracies
● Actual performance: Precision = 52.77%, Recall = 34.87%
● Roughly half precision — the problem is not trivial
● Low recall — caused by redundant constituents in the gold syntax
● E.g. for "I 'm a": our predictor outputs "NN nil" while the gold syntax is "JJ NN PP nil", so Precision = 1/1 and Recall = 1/3
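The slide's worked example can be reproduced with a label-multiset precision/recall (a sketch; the paper's exact matching criterion may differ, and `nil` entries are left out of the counts here):

```python
from collections import Counter

def prf(predicted, gold):
    """Precision/recall over predicted vs. gold constituent labels."""
    p, g = Counter(predicted), Counter(gold)
    tp = sum(min(p[label], g[label]) for label in p)   # matched labels
    precision = tp / sum(p.values()) if p else 0.0
    recall = tp / sum(g.values()) if g else 0.0
    return precision, recall

# "I 'm a": the predictor says NN; the gold syntax has JJ, NN, PP.
precision, recall = prf(["NN"], ["JJ", "NN", "PP"])
print(precision, recall)  # precision = 1/1, recall = 1/3
```

The single correct NN gives full precision, but the two gold constituents the predictor missed pull recall down — the same pattern as the 52.77% / 34.87% overall figures.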
Results: Translation Trade-off (1)
[Figure: BLEU (0.07–0.15) and RIBES (0.42–0.60) vs. mean #words per input (∝ delay, 0–16) for PBMT, using N-word segmentation (not optimized)]
● Short inputs reduce translation accuracy
Results: Translation Trade-off (2)
[Figure: the same axes, adding the T2S curve to PBMT]
● Long phrases: T2S > PBMT
● Short phrases: T2S < PBMT
Results: Translation Trade-off (3)
[Figure: the same axes, adding the Proposed curve to T2S and PBMT]
● Prevents the accuracy decrease on short phrases
● More robust to reordering
Results: Using Other Segmentation
● Using an optimized segmentation [Oda+2014]
[Figure: BLEU (0.07–0.15) and RIBES (0.42–0.60) vs. mean #words per input (∝ delay, 0–18) for PBMT, T2S, and Proposed]
● The segmentation overfits
● But reordering is better than with the other methods
Summaries
● Combining two frameworks
– Syntax-based machine translation
– Simultaneous translation
● Methods
– Unseen syntax prediction
– Waiting for translation
● Experimental results
– Prevents the accuracy decrease on short phrases
– More robust to reordering
● Future work
– Improving prediction accuracy
– Using other context features

More Related Content

What's hot

Why i need to learn so much math for my phd research
Why i need to learn so much math for my phd researchWhy i need to learn so much math for my phd research
Why i need to learn so much math for my phd research
Crypto Cg
 
Generating sentences from a continuous space
Generating sentences from a continuous spaceGenerating sentences from a continuous space
Generating sentences from a continuous space
Shuhei Iitsuka
 
2009 CSBB LAB 新生訓練
2009 CSBB LAB 新生訓練2009 CSBB LAB 新生訓練
2009 CSBB LAB 新生訓練
Abner Huang
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
Grigory Sapunov
 
Deep learning nlp
Deep learning nlpDeep learning nlp
Deep learning nlp
Heng-Xiu Xu
 
Concurrency in Python
Concurrency in PythonConcurrency in Python
Concurrency in Python
konryd
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
Grigory Sapunov
 
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble NucleiHuge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Hiroshi Watanabe
 
Modular Pick and Place Simulator using ROS Framework
Modular Pick and Place Simulator using ROS FrameworkModular Pick and Place Simulator using ROS Framework
Modular Pick and Place Simulator using ROS Framework
Technological Ecosystems for Enhancing Multiculturality
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
oscon2007
 
LSTM
LSTMLSTM
PPU Optimisation Lesson
PPU Optimisation LessonPPU Optimisation Lesson
PPU Optimisation Lesson
slantsixgames
 
ROS distributed architecture
ROS  distributed architectureROS  distributed architecture
ROS distributed architecture
Pablo Iñigo Blasco
 
20171110 qrnn quasi-recurrent neural networks
20171110 qrnn   quasi-recurrent neural networks20171110 qrnn   quasi-recurrent neural networks
20171110 qrnn quasi-recurrent neural networks
h m
 

What's hot (14)

Why i need to learn so much math for my phd research
Why i need to learn so much math for my phd researchWhy i need to learn so much math for my phd research
Why i need to learn so much math for my phd research
 
Generating sentences from a continuous space
Generating sentences from a continuous spaceGenerating sentences from a continuous space
Generating sentences from a continuous space
 
2009 CSBB LAB 新生訓練
2009 CSBB LAB 新生訓練2009 CSBB LAB 新生訓練
2009 CSBB LAB 新生訓練
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Deep learning nlp
Deep learning nlpDeep learning nlp
Deep learning nlp
 
Concurrency in Python
Concurrency in PythonConcurrency in Python
Concurrency in Python
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble NucleiHuge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
Huge-Scale Molecular Dynamics Simulation of Multi-bubble Nuclei
 
Modular Pick and Place Simulator using ROS Framework
Modular Pick and Place Simulator using ROS FrameworkModular Pick and Place Simulator using ROS Framework
Modular Pick and Place Simulator using ROS Framework
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinal
 
LSTM
LSTMLSTM
LSTM
 
PPU Optimisation Lesson
PPU Optimisation LessonPPU Optimisation Lesson
PPU Optimisation Lesson
 
ROS distributed architecture
ROS  distributed architectureROS  distributed architecture
ROS distributed architecture
 
20171110 qrnn quasi-recurrent neural networks
20171110 qrnn   quasi-recurrent neural networks20171110 qrnn   quasi-recurrent neural networks
20171110 qrnn quasi-recurrent neural networks
 

Similar to Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents

Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Association for Computational Linguistics
 
2019188026 Data Compression (1) (1).pdf
2019188026 Data Compression  (1) (1).pdf2019188026 Data Compression  (1) (1).pdf
2019188026 Data Compression (1) (1).pdf
AbinayaC11
 
prescalers and dual modulus prescalers
 prescalers and dual modulus prescalers prescalers and dual modulus prescalers
prescalers and dual modulus prescalers
Sakshi Bhargava
 
Kaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solutionKaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solution
ArtsemZhyvalkouski
 
Kantanfest: Dimitar Shterionov - Part 1
Kantanfest: Dimitar Shterionov - Part 1Kantanfest: Dimitar Shterionov - Part 1
Kantanfest: Dimitar Shterionov - Part 1
kantanmt
 
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
Hayahide Yamagishi
 
Precise LSTM Algorithm
Precise LSTM AlgorithmPrecise LSTM Algorithm
Precise LSTM Algorithm
YasutoTamura1
 

Similar to Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents (7)

Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
2019188026 Data Compression (1) (1).pdf
2019188026 Data Compression  (1) (1).pdf2019188026 Data Compression  (1) (1).pdf
2019188026 Data Compression (1) (1).pdf
 
prescalers and dual modulus prescalers
 prescalers and dual modulus prescalers prescalers and dual modulus prescalers
prescalers and dual modulus prescalers
 
Kaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solutionKaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solution
 
Kantanfest: Dimitar Shterionov - Part 1
Kantanfest: Dimitar Shterionov - Part 1Kantanfest: Dimitar Shterionov - Part 1
Kantanfest: Dimitar Shterionov - Part 1
 
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
[論文読み会資料] Asynchronous Bidirectional Decoding for Neural Machine Translation
 
Precise LSTM Algorithm
Precise LSTM AlgorithmPrecise LSTM Algorithm
Precise LSTM Algorithm
 

More from Yusuke Oda

primitiv: Neural Network Toolkit
primitiv: Neural Network Toolkitprimitiv: Neural Network Toolkit
primitiv: Neural Network Toolkit
Yusuke Oda
 
ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@
Yusuke Oda
 
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
Yusuke Oda
 
Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)
Yusuke Oda
 
A Chainer MeetUp Talk
A Chainer MeetUp TalkA Chainer MeetUp Talk
A Chainer MeetUp Talk
Yusuke Oda
 
PCFG構文解析法
PCFG構文解析法PCFG構文解析法
PCFG構文解析法
Yusuke Oda
 
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
Yusuke Oda
 
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
Yusuke Oda
 
Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3
Yusuke Oda
 
Test
TestTest

More from Yusuke Oda (10)

primitiv: Neural Network Toolkit
primitiv: Neural Network Toolkitprimitiv: Neural Network Toolkit
primitiv: Neural Network Toolkit
 
ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@ChainerによるRNN翻訳モデルの実装+@
ChainerによるRNN翻訳モデルの実装+@
 
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
複数の事前並べ替え候補を用いた句に基づく統計的機械翻訳
 
Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)Encoder-decoder 翻訳 (TISハンズオン資料)
Encoder-decoder 翻訳 (TISハンズオン資料)
 
A Chainer MeetUp Talk
A Chainer MeetUp TalkA Chainer MeetUp Talk
A Chainer MeetUp Talk
 
PCFG構文解析法
PCFG構文解析法PCFG構文解析法
PCFG構文解析法
 
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
ACL Reading @NAIST: Fast and Robust Neural Network Joint Model for Statistica...
 
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
翻訳精度の最大化による同時音声翻訳のための文分割法 (NLP2014)
 
Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3Pattern Recognition and Machine Learning: Section 3.3
Pattern Recognition and Machine Learning: Section 3.3
 
Test
TestTest
Test
 

Recently uploaded

Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
Madhumitha Jayaram
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
awadeshbabu
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
ssuser36d3051
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
Ratnakar Mikkili
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
Divyam548318
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
gerogepatton
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
drwaing
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
Madan Karki
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
IJECEIAES
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
ihlasbinance2003
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 

Recently uploaded (20)

Wearable antenna for antenna applications
Wearable antenna for antenna applicationsWearable antenna for antenna applications
Wearable antenna for antenna applications
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
 
ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
Exception Handling notes in java exception
Exception Handling notes in java exceptionException Handling notes in java exception
Exception Handling notes in java exception
 
bank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdfbank management system in java and mysql report1.pdf
bank management system in java and mysql report1.pdf
 
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELDEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODEL
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
digital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdfdigital fundamental by Thomas L.floydl.pdf
digital fundamental by Thomas L.floydl.pdf
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
Manufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptxManufacturing Process of molasses based distillery ppt.pptx
Manufacturing Process of molasses based distillery ppt.pptx
 
Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents

  • 1. 15/07/29 1 Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents Yusuke Oda Graham Neubig Sakriani Sakti Tomoki Toda Satoshi Nakamura ACL, July 27, 2015
  • 2. 15/07/29 2 Two Features of This Study ● Syntax-based Machine Translation – State-of-the-art SMT method for distant language pairs This is (NP) This is DT VBZ (NP) VP NP S これ は (NP) で す Parse MT ● Simultaneous Translation – Prevent translation delay when translating continuous speech In the next 18 minutes I'm going to take you on a journey. Translate Translate Translate Split Split
  • 3. 15/07/29 3 Delay (depends on the input length) Speech Translation - Standard Setting in the next 18 minutes I 'm going to take you on a journey Speech Recognition 今から18分で 皆様を旅にお連れします Machine Translation Speech Synthesis ● Problem: long delay (if few explicit sentence boundaries)
  • 4. 15/07/29 4 Shorter delay Simultaneous Translation with Segmentation ● Separate the input at good positions ASR SS 今から 18分で 皆様を お連れします 旅に MT Segmentation I 'm going to take you in the next 18 minutes on a journey ● The system can generate output w/o waiting for end-of-speech Translation Quality Segmentation Frequency Trade-off
  • 5. 15/07/29 5 Unseen VP Syntactic Problems in Segmentation ● Segmentation allows us to translate each part separately ● But often breaks the syntax In the next 18 minutes I 'm going to ... PP NPIN NP NN NP NNSCDJJDT Iminutes18nextthein Predicted Boundary PP S IN NP PRP NP NNSCDJJDT Iminutes18nextthein (VP) VP ● Bad effect on syntax-based machine translation
  • 6. 15/07/29 6 Motivation of This Study ● Predict unseen syntax constituents In the next 18 minutes I PP NPIN NP NN NP NNSCDJJDT Iminutes18nextthein PP S IN NP PRP NP NNSCDJJDT Iminutes18nextthein (VP) VP Predict VP ● Translate from correct tree 今 から 18 分 私 今 から 18 分 で 私 は (VP)
  • 7. 15/07/29 7 Summaries of Proposed Methods ● Proposed 1: Predicting and using unseen constituents ● Proposed 2: Waiting for translation this is NPthis is a pen this is a pen これは NP です これはペンですthis is a pen Waiting this is Proposed 2 Proposed 1 SyntaxPrediction ASR Segmentation Translation Parsing Output
  • 8. 15/07/29 8 What is Required? ● To use predicted constituents in translation, we need: 1. Making training data 2. Deciding a prediction strategy 3. Using results for translation
  • 9. 15/07/29 9 Leaf span Making Training Data for Syntax Prediction ● Decompose gold trees in the treebank S VPNP NN NP DT VBZ penaisThis DT 1. Select any leaf span in the tree 2. Find the path between leftmost/rightmost leaves 3. Delete the outside subtree NN 4. Replace inside subtrees with topmost phrase label 5. Finally we obtain: nil is a NN nil Leaf spanLeft syntax Right syntax
  • 10. 15/07/29 10 VP ... 0.65 NP ... 0.28 nil ... 0.04 ... Syntax Prediction from Incorrect Trees Iminutes18nextthein PP NPIN NP NN NP NNSCDJJDT 1. Parse the input as-is Input translation unit Word:R1=I POS:R1=NN Word:R1-2=I,minutes POS:R1-2=NN,NNS ... ROOT=PP ROOT-L=IN ROOT-R=NP ... 2. Extract features VP nil
  • 11. 15/07/29 11 Syntax-based MT with Additional Constituents ● Use tree-to-string (T2S) MT framework This is NP This is DT VBZ NP VP NP S これ は NP で す Parse MT – Obtains state-of-the-art results on syntactically distant language pairs (e.g. English→Japanese) – Possible to use additional syntactic constituents explicitly
  • 12. 15/07/29 12 you on a journey 旅の途中で Translation Waiting (1) ● Reordering problem – Right syntax sometimes goes left in the translation in the next 18 minutes I (VP) 今から18分で私は(VP) 'm going to take (NP) (NP)を行っています Reordering – Considering the output language, we should output future inputs before current input
  • 13. 15/07/29 13 Waiting for Translation ● Heuristics: waiting for the next input in the next 18 minutes I (VP) 今から18分で私は(VP) 'm going to take (NP) (NP)を行っています ● Expect to avoid syntactically strange segmentation Wait 'm going to take you on a journey 貴方を旅にお連れします
  • 14. 15/07/29 14 Experimental Settings ● Dataset Prediction: Penn Treebank MT: TED [WIT3] ● Languages English → Japanese ● Tokenization Stanford Tokenizer, KyTea ● Parsing Ckylark (Berkeley PCFG-LA) ● MT Decoder Moses (PBMT), Travatar (T2S) ● Evaluation BLEU, RIBES Methods Summary Baselines PBMT PBMT (Moses) ... conventional setting T2S T2S-MT (Travatar) without constituent prediction Proposed T2S-MT (Travatar) with constituent prediction & waiting
  • 15. 15/07/29 15 Results: Prediction Accuracies ● Half precision – Not trivial problem ● Low recall – Caused by redundant constituents in the gold syntax I 'm a NN nilOur predictor I 'm a JJ NN PP nilGold syntax ● E.g. "I 'm a" Precision = 1/1 NN NN Recall = 1/3 JJNN NN PP Precision = 52.77% Recall = 34.87% Actual performance
  • 16. 15/07/29 16 Results: Translation Trade-off (1) 0 2 4 6 8 10 12 14 16 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 TranslationAccuracy BLEU RIBES Mean #words in inputs ∝ Delay Short Long Short Long 0 2 4 6 8 10 12 14 16 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6 PBMT ● Short inputs reduce translation accuracies Using N-words segmentation (not-optimized)
  • 17. 15/07/29 17 Results: Translation Trade-off (2) 0 2 4 6 8 10 12 14 16 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 TranslationAccuracy BLEU RIBES Mean #words in inputs ∝ Delay T2S PBMT 0 2 4 6 8 10 12 14 16 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6 Short Long Short Long ● Long phrase ... T2S > PBMT ● Short phrase ... T2S < PBMT
  • 18. 15/07/29 18 Results: Translation Trade-off (3) 0 2 4 6 8 10 12 14 16 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 TranslationAccuracy BLEU RIBES Mean #words in inputs ∝ Delay T2S PBMT Proposed 0 2 4 6 8 10 12 14 16 0.42 0.44 0.46 0.48 0.5 0.52 0.54 0.56 0.58 0.6 Short Long Short Long ● Prevent accuracy decreasing in short phrases ● More robustness for reordering
  • 19. 15/07/29 19 Results: Using Other Segmentation [Charts: BLEU and RIBES vs. mean #words in inputs (∝ delay) for PBMT, T2S, and Proposed] ● Using an optimized segmentation strategy [Oda+ 2014] ● BLEU: segmentation overfitting ● But reordering (RIBES) is still better than the other methods
  • 20. 15/07/29 20 Summary [Diagrams: syntax prediction and waiting examples; RIBES chart] ● Combining two frameworks – Syntax-based machine translation – Simultaneous translation ● Methods – Unseen syntax prediction – Waiting for translation ● Experimental results – Prevents the accuracy decrease on short phrases – More robust to reordering ● Future work – Improving prediction accuracy – Using other context features

Editor's Notes

  1. Hello. My name is Yusuke Oda, from the Nara Institute of Science and Technology in Japan. I'll talk about syntax-based simultaneous translation. This study mainly targets speech translation, so some of its ideas are not standard for ordinary machine translation; please keep this in mind during my presentation.
  2. Our study considers two aspects of machine translation. The first is syntax-based translation, which uses syntactic information, for example a parse tree, to improve translation accuracy. This is a state-of-the-art method for translating distant language pairs, for example English to Japanese, which we are working on. The second is simultaneous translation, which reduces delay when translating long speech. We want to combine these methods, but doing so is not straightforward.
  3. First, I'd like to describe the standard setting of speech translation. Suppose we first obtain the waveform of some English speech. We perform speech recognition to produce English text from the speech. Next, the machine translation system converts the text into Japanese text, and finally the speech synthesizer produces the corresponding Japanese speech and plays it through a speaker. In this process there is a delay between obtaining the input speech and generating the output speech, and this delay depends on the length of the translated segment. If we wait until the speaker finishes a full sentence, we may have to wait a long time before the output starts.
  4. Simultaneous translation avoids this problem by separating the input into shorter phrases. Given the same input speech, the difference from normal speech translation is that simultaneous translation uses a segmentation strategy to divide the input words. Each shorter phrase produced by the segmentation is translated and synthesized, so the simultaneous translation system can generate results without waiting for the end of the speech. The important point is the trade-off between translation quality and segmentation frequency: if we divide the input many times, the output becomes faster but translation accuracy drops. Conventional segmentation strategies mainly aim to maintain translation accuracy while selecting as many segmentation boundaries as possible.
  5. Segmentation is thus a simple method for simultaneous translation, but this approach often breaks the syntax. For example, here is an input sentence together with the remaining unseen input. In this case the segmentation boundary is decided here, and we have to translate using only the left side, "in the next eighteen minutes". This example may look strange, but segmentation algorithms based on machine learning often generate such phrases because we cannot predict the future input completely. If we then try to parse this phrase, we obtain a parse tree, but it is a wrong tree because the input is not a well-formed phrase. A human, however, can easily see that a following verb phrase is omitted here, and using this unseen information we can generate a correct parse tree. Incorrect syntactic information harms machine translation, so we want to use the correct information whenever possible.
  6. Our approach to this problem is easy to understand. First we predict the unseen syntactic information from the current phrase using a machine learning method, and then we use that information to build correct parse trees and produce correct translation results.
  7. This is the overall summary of our proposed methods. We propose two components for simultaneous translation. One is unseen syntax prediction from the current input phrase, using the predicted syntax for translation. The other is a heuristic that waits for more input when a predicted constituent is placed at an undesirable position in the output.
  8. We need several methods to predict and use unseen syntactic information. First, we need to create the training data for the syntax prediction. Second, we need a strategy to predict the additional syntactic information. Last, we need a strategy to use the prediction results in machine translation.
  9. First, the training data. We can generate it from the gold parse trees in a treebank. First, we select a span of leaf nodes in the gold parse tree; we can regard this span as the input phrase to be translated. Next, we find the path between the leftmost and rightmost nodes of the leaf span. We then delete the outside subtrees to obtain a minimal tree, and replace each inside subtree not on the selected path with its topmost phrase label. In this case we finally obtain this sequence, which means the left side of the phrase "is a" can have no more syntax, while the right side has one syntactic constituent, "NN".
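The extraction described above can be sketched as follows. This is a minimal sketch under assumptions of mine: trees are stored as nested `(label, children)` tuples (the actual pipeline works on full Penn Treebank trees), and only the right-side unseen constituents of a sentence prefix are collected (the paper also handles left-side context).

```python
# Sketch: collect "unseen constituent" labels for a sentence prefix from a
# gold parse tree. A node is (label, [children]); a leaf is (tag, word).

def leaves(tree):
    """Return the leaf nodes (tag, word) of a tree, in order."""
    label, children = tree
    if isinstance(children, str):        # leaf: (tag, word)
        return [tree]
    out = []
    for c in children:
        out.extend(leaves(c))
    return out

def unseen_right_constituents(tree, prefix_len):
    """Labels of constituents to the right of the first `prefix_len` words:
    the right-siblings along the spine from the last covered leaf up to
    the root, each replaced by its topmost label."""
    labels = []

    def walk(node, remaining):
        label, children = node
        if isinstance(children, str):    # reached the boundary leaf
            return remaining - 1
        for i, child in enumerate(children):
            n = len(leaves(child))
            if remaining <= n:           # boundary falls inside this child
                rem = walk(child, remaining)
                # all later siblings are unseen to the right of the prefix
                labels.extend(c[0] for c in children[i + 1:])
                return rem
            remaining -= n
        return remaining

    walk(tree, prefix_len)
    return labels

# Gold tree for "this is a pen":
# (S (NP (DT this)) (VP (VBZ is) (NP (DT a) (NN pen))))
gold = ("S", [("NP", [("DT", "this")]),
              ("VP", [("VBZ", "is"),
                      ("NP", [("DT", "a"), ("NN", "pen")])])])

print(unseen_right_constituents(gold, 3))   # prefix "this is a" -> ['NN']
print(unseen_right_constituents(gold, 2))   # prefix "this is"   -> ['NP']
```

The second call reproduces the "this is (NP)" example from the slides; labels come out innermost-first along the spine.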
  10. Next, I'd like to explain our algorithm for predicting unseen syntactic information from the input phrase; in this slide we run the prediction on this phrase. First we forcibly generate a parse tree from the input; note that this tree may be incorrect because the input is not a complete phrase. Next we extract features from the input and the parse tree, and perform multi-class classification over these features to decide what kind of constituent should be appended next. Any classifier could be used, but in this study we used simple linear SVMs. We repeat this prediction until the end marker is obtained.
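The prediction loop just described might look like the sketch below. The `classify` function here is a hypothetical toy stand-in for the linear SVMs the paper trains over features of the phrase and its forced parse; only the loop structure, with the "nil" end marker from the training data, reflects the actual method.

```python
# Sketch of the iterative constituent-prediction loop.

END = "nil"  # end-of-prediction marker used in the training data

def classify(words, predicted):
    """Hypothetical stand-in for a multi-class linear SVM over features of
    the input phrase and its (possibly broken) forced parse tree."""
    if words[-1] == "is" and not predicted:
        return "NP"                     # e.g. "this is" -> expect an NP
    return END

def predict_unseen(words, max_constituents=5):
    """Repeatedly ask the classifier which constituent to append next,
    stopping at the end marker (with a safety limit on the loop)."""
    predicted = []
    while len(predicted) < max_constituents:
        label = classify(words, predicted)
        if label == END:
            break
        predicted.append(label)
    return predicted

print(predict_unseen(["this", "is"]))   # -> ['NP'] with the toy rule
```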
  11. After predicting the additional syntax, we want to use this information in machine translation. In this study we use the tree-to-string machine translation framework, which usually takes the parse tree of the input sentence. Tree-to-string translation also has the useful property that it can explicitly use the predicted syntactic information, because it operates on subtrees of the input tree, and we can ignore the internal details of the additional constituents.
  12. However, simply combining segmentation and tree-to-string translation still has a problem, caused mainly by reordering. Here we show some inputs and outputs. For the first input and its translation there is no problem, because the additional constituent VP is placed at the same position in both input and output. For the next input, however, the additional constituent NP, originally on the right side of the input, appears on the left side of the translation. This is a problem because the speaker has not yet spoken the content that should fill this constituent NP, so we cannot produce a translation until we obtain the next input.
  13. To avoid this problem, we use one heuristic: waiting for translation. If we detect that the right-side constituent of the input phrase is placed anywhere other than the right side of the output, we recognize that reordering has occurred; we then discard the current result and wait for the next input. When the next input arrives, we concatenate the previous and new inputs and run the same prediction and translation again. Segmentation strategies are not perfect because we cannot predict future inputs completely, so we expect this approach to avoid segmentation errors.
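The waiting heuristic can be sketched as below. Here `translate` is a hypothetical stand-in for the T2S decoder (a lookup table mirroring the slide's example), and the buffering logic is my reading of the heuristic: emit only when the predicted constituent stays at the right edge of the output.

```python
# Sketch of the "waiting for translation" heuristic.

def reordered(output_words, placeholder):
    """True if the placeholder appears anywhere except the final position."""
    return placeholder in output_words[:-1]

class SimultaneousTranslator:
    def __init__(self, translate):
        self.translate = translate   # stand-in for the T2S decoder
        self.buffer = []             # source words held back while waiting

    def feed(self, segment_words, predicted=None):
        """Translate buffered + new words (plus a predicted right-side
        constituent, if any); emit only if no reordering is detected."""
        self.buffer.extend(segment_words)
        source = self.buffer + ([predicted] if predicted else [])
        output = self.translate(source)
        if predicted and reordered(output, predicted):
            return None              # wait for the next input segment
        self.buffer = []
        return output

# Toy decoder covering the two cases from the slide.
toy = {
    ("'m", "going", "to", "take", "(NP)"): ["(NP)", "を", "行って", "います"],
    ("'m", "going", "to", "take", "you", "on", "a", "journey"):
        ["貴方", "を", "旅", "に", "お連れ", "します"],
}
st = SimultaneousTranslator(lambda ws: toy[tuple(ws)])
print(st.feed(["'m", "going", "to", "take"], "(NP)"))   # None: wait
print(st.feed(["you", "on", "a", "journey"]))           # full translation
```

The first call detects "(NP)" on the left of the output and waits; the second concatenates the segments and emits the complete translation.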
  14. These are the experimental settings. We used the Penn Treebank to train the syntax prediction, and an English-Japanese parallel corpus extracted from the WIT3 dataset to train the machine translation system. We compare three settings in this study. One baseline is PBMT, which uses Moses for simultaneous translation; this is the same as conventional studies. The other baseline is T2S, which uses a tree-to-string translation model instead of PBMT. Our proposed method uses the same decoder as T2S but adds our syntax prediction and waiting heuristic.
  15. First, I'd like to explain how we evaluate the accuracy of the syntax prediction. For example, consider this input; we obtained this result from our predictor, and the corresponding gold syntax is here. We calculate precision and recall from the number of constituents and the number of matches: in this case the precision is 1 and the recall is one third, according to these equations. Here are the accuracies of our predictor. The precision is about one half, which is not bad considering the predictor is based on a simple linear SVM, but it also shows that this prediction is not a trivial problem. The recall is low, but this is caused by redundant constituents in the gold syntax: for example, both constituents JJ and PP in the gold syntax are possible but not necessary. So the low recall matters comparatively less than the precision.
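The precision/recall computation from the slide can be written as a small sketch, matching predicted and gold constituent labels as multisets (an assumption of mine about how duplicate labels are counted):

```python
# Sketch: precision/recall of predicted constituent labels against gold.
from collections import Counter

def prf(predicted, gold):
    """Precision and recall, matching labels as multisets."""
    matches = sum((Counter(predicted) & Counter(gold)).values())
    precision = matches / len(predicted) if predicted else 0.0
    recall = matches / len(gold) if gold else 0.0
    return precision, recall

# Slide example for the input "I 'm a": the predictor says [NN],
# the gold right-side constituents are [JJ, NN, PP].
p, r = prf(["NN"], ["JJ", "NN", "PP"])
print(p, r)   # 1.0 0.333...
```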
  16. Next we examine the translation trade-off, an important factor in simultaneous translation. The green lines show the translation accuracy of the PBMT system. The horizontal axis is the number of words in the input phrase, which is proportional to the output delay, and the vertical axis shows each evaluation measure: BLEU is based on n-gram precision and RIBES on reordering. When we translate shorter phrases we obtain lower accuracy than with longer phrases, which shows the trade-off between delay and translation accuracy.
  17. Next, we show the result of the T2S baseline, which uses tree-to-string translation instead of PBMT. We see the same trade-off tendency. The translation accuracy of the T2S method for longer phrases is higher than PBMT, but its accuracy for shorter phrases decreases. As we explained, shorter inputs cause many instances of broken syntax, which explains the drop in accuracy.
  18. These red lines show the result of our proposed method, which includes the syntax prediction and the waiting heuristic. There are two points. First, the actual delay becomes slightly longer than T2S (the red lines sit farther right than the other methods); this is the effect of waiting for translation. Second, and more important, the translation accuracy of our proposed method is higher than the baselines; in particular, the RIBES score is higher even for shorter inputs. RIBES is sensitive to word order, so our method is more robust with respect to the reordering that frequently occurs when translating between syntactically distant languages.
  19. This is another evaluation, using a state-of-the-art segmentation strategy that directly optimizes translation accuracy but does not explicitly consider syntactic information. The BLEU score is mostly the same as the T2S baseline; this is caused by overfitting of the segmentation, because this strategy optimizes segmentation boundaries without syntactic features. The RIBES score, however, is higher than the other baselines, so our methods still hold an advantage in reordering.
  20. This is the end of my presentation. We proposed two methods for applying syntax-based translation to simultaneous translation, and we showed an improvement in translation accuracy, especially in reordering between distant languages. One piece of future work is improving the precision of the prediction; also, our methods do not yet use any context information outside the current phrase, so using it is another direction. Thank you very much.