Machine Translation
An introduction
Shu 2016.5
Parts of these slides are borrowed from Philipp Koehn's slides
(www.statmt.org)
Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
What to translate
The dream of machine translation
Approaches to MT
The history of machine translation
• 1629
- René Descartes proposed a universal language
- Different tongues would share one set of symbols
• 1947
- The transistor is invented; it later replaces vacuum tubes in computers
• 1949 ~
- Rule-based machine translation
• 1954
- First public demo: the Georgetown-IBM experiment
• 1993 ~
- Statistical machine translation
• 2013 ~
- Neural machine translation
Rule-based translation systems
• Translation rules are written by linguistic experts
• Hard to maintain or update
• Performance is still at, or close to, the state of the art
Statistical machine translation
• Translation models are learned from a parallel corpus
• Language independent
Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
Statistical machine translation
For people who don’t like equations
A common pipeline of SMT
Alignment
Neural re-ranking
Evaluation of SMT
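• BLEU
- n-gram precision matching (usually up to 4-grams), with a brevity penalty
• NIST
- Weights n-grams by how informative they are, so content words count more
• RIBES (Hideki Isozaki et al., 2010)
- Word order is also taken into account
- Better suited to SVO-to-SOV language pairs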
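To make the BLEU definition concrete, here is a minimal sentence-level sketch, assuming a single reference, uniform weights over the 1- to 4-gram precisions, and no smoothing. Standard BLEU pools n-gram counts over a whole test corpus, so the function names and this per-sentence variant are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref_counts[g]) for g, count in cand_counts.items())
        if matches == 0:
            return 0.0  # without smoothing, one zero precision makes BLEU zero
        log_precisions.append(math.log(matches / sum(cand_counts.values())))
    c, r = len(candidate), len(reference)
    # Brevity penalty: punish candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions (uniform weights).
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


cand = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(round(sentence_bleu(cand, ref), 4))  # 0.5373
```

Here the precisions are 5/6, 3/5, 2/4 and 1/3 and the lengths match, so the score is the fourth root of 1/12.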
BLEU score matrix
A brief history of the development of SMT
• 1990 ~ 2000
- Word-based models (IBM models)
- Brown et al.; Och and Ney
• 2003
- Phrase-based models
- Philipp Koehn
• 2005, 2007
- Hierarchical phrase-based models
- David Chiang
• 2010 ~
- Tree-based models, factored models
Language model
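• Models the probability of a sentence, e.g. p(the dog is sparking)
- Used to judge which candidate translation is more natural
• Markov assumption: each word depends only on the previous n-1 words
• 5-gram models are the most common choice in SMT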
Bi-gram example
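A minimal sketch of a bigram (n = 2) model estimated by maximum likelihood, with <s> and </s> as sentence markers. The three training sentences are invented, and there is no smoothing, so any unseen bigram gets probability zero; toolkits such as SRILM or KenLM use higher orders (typically 5) with smoothing such as Kneser-Ney.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Maximum-likelihood bigram model: p(w | prev) = c(prev, w) / c(prev)."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


def sentence_prob(prob, sentence):
    """p(sentence) under the Markov assumption: each word depends only on the previous one."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, word)
    return p


lm = train_bigram_lm(["the dog is barking", "the dog is sleeping", "the cat is sleeping"])
print(sentence_prob(lm, "the dog is sleeping"))  # about 0.444 (= 4/9)
print(sentence_prob(lm, "the cat is barking"))   # unseen sentence, still about 0.111 (= 1/9)
print(sentence_prob(lm, "dog the is sleeping"))  # 0.0: "<s> dog" never occurs
```

The second query shows why the Markov assumption helps: a sentence never seen in training still gets a probability from its seen bigrams.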
Parallel corpus
Word alignments
Word alignments in the matrix
How to get word alignments
• In short
- Run GIZA++ on the parallel corpus
- Wait for 5 hours
• Technically
- IBM Models 1-5, the HMM alignment model, and the EM algorithm
Run the EM algorithm
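The EM loop can be sketched for IBM Model 1, the simplest of the IBM models. Assumptions: a three-sentence toy German-English corpus, no NULL word, and a fixed number of iterations instead of a convergence check; GIZA++ goes further and runs IBM Models 1-5 and the HMM model in sequence.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1).

    corpus: list of (foreign_tokens, english_tokens) pairs. No NULL word.
    Returns a dict mapping (f, e) to t(f | e).
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word among the English words of its sentence.
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: turn expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
print(round(t[("haus", "house")], 3), round(t[("das", "house")], 3))
```

After a few iterations t(haus | house) exceeds t(das | house): das is already well explained by "the", so EM shifts the probability mass of "house" toward haus.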
10 years of work
Phrase-based translation model
Input: He goes to the curry restaurant
Group into phrases: He | goes | to | the curry restaurant
Translate: 彼は 行く に カレー屋
Reorder: 彼は カレー屋 に 行く
Extract phrase table
Word alignments → phrase table
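A sketch of extracting phrase pairs that are consistent with a word alignment, which is the usual criterion behind phrase-table extraction: a pair is kept only if no alignment link leaves it. The alignment links for the slide's example sentence are hand-written assumptions, and unlike the full algorithm this version does not extend spans over unaligned target words.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment is a set of (i, j) links meaning src[i] is aligned to tgt[j].
    A pair is consistent if it contains at least one link and no link joins
    a word inside the pair to a word outside it.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span src[i1..i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Reject if a target word in tgt[j1..j2] links outside the source span.
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


src = "He goes to the curry restaurant".split()
tgt = "彼は カレー屋 に 行く".split()
# Hand-written alignment; "the" is left unaligned.
alignment = {(0, 0), (1, 3), (2, 2), (4, 1), (5, 1)}
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```

"curry" alone is never extracted, because "restaurant" is also aligned to カレー屋 and would be left outside the pair.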
Decoding
• In short
- Run Moses
- Wait for 2 days
• Technically
- (1) Load all the translation rules
- (2) Search for the best hypothesis
Load all the translation rules
Search for the best hypothesis
• Beam search / cube pruning
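Searching for the best hypothesis can be sketched as beam search over hypothesis stacks, one stack per number of covered source words. This toy version is monotone (no reordering), scores each hypothesis with translation-model log-probabilities plus a bigram language model, and uses an invented German-English phrase table; Moses-style decoders also handle reordering, future-cost estimation and hypothesis recombination.

```python
import heapq
import math


def beam_search_decode(src, phrase_table, lm, beam_size=3, max_phrase=3):
    """Monotone phrase-based beam search (no reordering, for brevity).

    phrase_table: dict source phrase (tuple) -> list of (target tuple, log prob).
    lm: function (prev_word, word) -> log prob of the target bigram.
    stacks[k] holds partial hypotheses covering src[:k].
    """
    stacks = [[] for _ in range(len(src) + 1)]
    stacks[0].append((0.0, ("<s>",)))
    for k in range(len(src)):
        # Histogram pruning: expand only the beam_size best hypotheses.
        for score, output in heapq.nlargest(beam_size, stacks[k]):
            for length in range(1, max_phrase + 1):
                if k + length > len(src):
                    break
                src_phrase = tuple(src[k:k + length])
                for tgt_phrase, tm_logprob in phrase_table.get(src_phrase, []):
                    new_score, prev = score + tm_logprob, output[-1]
                    for word in tgt_phrase:
                        new_score += lm(prev, word)
                        prev = word
                    stacks[k + length].append((new_score, output + tgt_phrase))
    # Close each complete hypothesis with the end-of-sentence LM score.
    finished = [(s + lm(out[-1], "</s>"), out) for s, out in stacks[-1]]
    if not finished:
        return None
    best_score, best_output = max(finished)
    return best_output[1:], best_score


log = math.log
phrase_table = {
    ("das",): [(("the",), log(0.7)), (("that",), log(0.3))],
    ("haus",): [(("house",), log(0.8)), (("home",), log(0.2))],
    ("das", "haus"): [(("the", "house"), log(0.6))],
    ("ist",): [(("is",), log(0.9))],
    ("klein",): [(("small",), log(0.6)), (("little",), log(0.4))],
}
bigram_probs = {("<s>", "the"): 0.5, ("the", "house"): 0.4, ("house", "is"): 0.5,
                ("is", "small"): 0.3, ("is", "little"): 0.1,
                ("small", "</s>"): 0.5, ("little", "</s>"): 0.5}


def lm(prev, word):
    # Unseen bigrams get a small floor probability instead of zero.
    return log(bigram_probs.get((prev, word), 0.01))


words, score = beam_search_decode("das haus ist klein".split(), phrase_table, lm)
print(" ".join(words))  # the house is small
```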
Hierarchical phrase-based models
• Allow phrases to have gaps
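A toy illustration of a phrase with gaps: one synchronous rule, X → <X1 の X2, X2 of X1>, whose gaps are filled recursively. The rule and the two-entry lexicon are hand-written assumptions (the Japanese-English pair mirrors the pre-reordering example in these slides); a real hierarchical decoder learns many such rules from word-aligned data and searches over them with a chart-based decoder.

```python
def translate_hiero(tokens, lexicon):
    """Toy translation with one gapped rule: X -> <X1 の X2, X2 of X1>.

    The gaps X1 and X2 are translated recursively; a span containing no の
    falls back to word-by-word lookup in the (hypothetical) lexicon.
    """
    if "の" in tokens:
        k = tokens.index("の")  # split at the first の, for simplicity
        x1 = translate_hiero(tokens[:k], lexicon)
        x2 = translate_hiero(tokens[k + 1:], lexicon)
        return x2 + ["of"] + x1  # the rule swaps the order of the gaps
    return [word for token in tokens for word in lexicon[token].split()]


lexicon = {"寿命": "the life", "向上": "the improvement"}
print(" ".join(translate_hiero(["寿命", "の", "向上"], lexicon)))  # the improvement of the life
```

A flat phrase pair could only memorise "寿命 の 向上" as a whole; the gapped rule reuses the same reordering for any two noun phrases joined by の.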
Hard problems of MT
• Word order
• Word sense
• Pronouns
• Tense
• Idioms
Word order
Word sense ambiguity
Problem of pronouns
Different tenses
• Past tense vs. present tense
• Grammar discrepancy
Idioms
Resources of SMT
• Parallel corpus
- LDC data
- www.ldc.upenn.edu
- Europarl corpus
- Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish
- Japanese
- NTCIR-8 (3M), ASPEC (3M)
• Word alignment software
- GIZA++, Berkeley aligner
• Language modelling
- SRILM, Berkeley LM, KenLM
• Decoder
- Moses (maintained by Philipp Koehn's group)
- Travatar (Graham Neubig)
Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
Recent developments of SMT
• Advances in decoders
• Very-large-scale language models
- Language model compression
• Margin Infused Relaxed Algorithm (MIRA)
- Tunes the log-linear feature weights efficiently
• Tree models
- Tree-to-Tree translation
- String-to-Tree translation
- Tree-to-String translation
- Forest-to-String translation *
- Robust to parsing errors
• Factored models
• Pre-reordering
What is a parse tree
Context-free grammar vs. dependency grammar
Tree-to-string translation models
• Translate source code into comments
Pre-reordering phrase-based translation model
Input: He goes to the curry restaurant
Pre-reordering: He the curry restaurant to goes
Group into phrases: He | the curry restaurant | to | goes
Translate: 彼は カレー屋 に 行く
Example of pre-reordering
Original input: the improvement of the life is a large problem of the practical application.
Reordered input: the life of the improvement va_nsubjpass the practical application of a large problem is .
Reference: 寿命 の 向上 が 実用 化 の 大きな 課題 で あ る 。
(Figure: restructured parse tree)
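A toy sketch of rule-based pre-reordering on the slides' curry-restaurant example: walk a hand-built parse tree and move the head of each VP and PP to the end of its phrase, which turns English SVO order into a Japanese-like SOV order. The tree, its labels, and the assumption that each VP/PP head is its first child are illustrative; real systems reorder automatically parsed trees with a richer rule set.

```python
def head_final(node, head_first_labels=("VP", "PP")):
    """Flatten a parse tree, moving the head of each VP/PP to the end.

    node is a word (str) or a (label, children) pair. In this toy tree the
    head of every VP and PP is assumed to be its first child.
    """
    if isinstance(node, str):
        return [node]
    label, children = node
    if label in head_first_labels and len(children) > 1:
        children = children[1:] + children[:1]  # head last: SOV-like order
    return [w for child in children for w in head_final(child, head_first_labels)]


tree = ("S", [("NP", ["He"]),
              ("VP", [("V", ["goes"]),
                      ("PP", [("P", ["to"]),
                              ("NP", ["the", "curry", "restaurant"])])])])
print(" ".join(head_final(tree)))  # He the curry restaurant to goes
```

After this step a monotone phrase-based decoder only has to translate left to right, which is the point of pre-reordering for SVO-to-SOV pairs.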
A summary of SMT
Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
Problem of conventional SMT
• Under-fitting (non-parametric approach)
• Solution:
- Deep recurrent neural networks
Application of neural networks in MT
High computational complexity
• Try AdaGrad, AdaDelta, or Adam first
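For reference, a minimal sketch of the Adam update rule on a one-dimensional toy objective, f(x) = (x - 3)^2. beta1, beta2 and eps use the commonly cited defaults; the learning rate of 0.1 and the step count are assumptions chosen for this toy problem (neural networks usually start around 0.001). Deep learning frameworks ship tuned implementations, so this only shows what the optimizer computes.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimise a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3); the minimum is at x = 3.
print(adam_minimize(lambda x: 2 * (x - 3), x0=0.0))
```

Dividing by the root of the squared-gradient average makes the step size roughly independent of the gradient's scale, which is why such optimizers are a sensible first try for large NMT models.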
Neural machine translation
• encoder-decoder approach
Performance drop
Multi-layer encoder-decoder model
Soft-attention mechanism
‣ Makes a weighted summary of the encoder states
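A minimal sketch of that weighted summary: score each encoder state against the current decoder state, normalise the scores with a softmax, and average the encoder states with the resulting weights. The dot-product score is an assumption (Bahdanau-style attention scores with a small feed-forward network instead), and plain Python lists stand in for tensors.

```python
import math


def soft_attention(query, keys, values):
    """Dot-product soft attention: a softmax-weighted summary of the values.

    query: decoder state; keys/values: one vector per source position
    (here both are the encoder states).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # attention weights sum to 1
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = soft_attention([10.0, 0.0], encoder_states, encoder_states)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

Because the weights are a softmax rather than a hard choice, the whole model stays differentiable and the alignment is learned jointly with translation.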
soft-attention model
Visualization of learned representation
Experiments in WAT2015
Evaluation result: human evaluation scores
Evaluation result: evaluation scores
System                                                                | BLEU  | RIBES | HUMAN | JPO
Baseline phrase-based SMT                                             | 29.80 | 0.691 | -     | -
Baseline hierarchical phrase-based SMT                                | 32.56 | 0.746 | -     | -
Baseline tree-to-string SMT                                           | 33.44 | 0.758 | 30.00 | -
Submitted system 1 (NMT)                                              | 34.19 | 0.802 | 43.50 | -
Submitted system 2 (NMT + system combination)                         | 36.21 | 0.809 | 53.75 | 3.81
Best competitor 1: NAIST (Travatar system with NeuralMT reranking)    | 38.17 | 0.813 | 62.25 | 4.04
Best competitor 2: naver (SMT t2s + spell correction + NMT reranking) | 36.14 | 0.803 | 53.25 | 4.00
(Optional) Findings & insights
‣ Soft-attention models outperform multi-layer encoder-decoder models
‣ Training models on pre-reordered data hurts performance
‣ NMT models tend to produce grammatically valid but incomplete translations
Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
NMT can’t directly use monolingual data
• Deep fusion (Gulcehre et al., 2015)
• Integrates a neural language model trained on a massive monolingual corpus
The attention mechanism is not perfect
• Local attention (Minh-Thang Luong et al., 2015)
Local attention model / Global attention model
The attention mechanism is not perfect
• Input feeding
Translations do not cover all the source words
• Coverage-based NMT model (Zhaopeng Tu et al., 2016)
Objective function is bad
• The cross-entropy training objective is very different from BLEU
• Solutions:
- (1) Data as demonstrator, i.e. scheduled sampling (Bengio et al., 2015)
Objective function is bad (cont.)
• The cross-entropy training objective is very different from BLEU
• Solutions:
- (2) Mixed REINFORCE (Ranzato et al., 2016)
Objective function is bad (cont.)
• The cross-entropy training objective is very different from BLEU
• Solutions:
- (3) Minimum Risk Training (Shen et al., 2015)
Objective of MRT
6 BLEU gain on a Chinese-English task
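A sketch of the MRT objective for one source sentence, assuming the candidates have already been sampled from the model: renormalise their model probabilities over the sample (sharpened or flattened by an exponent alpha) and take the expected cost, where the cost could be 1 minus sentence BLEU. Minimising this risk moves probability mass toward low-cost candidates. The numbers are made up, alpha = 1.0 is only a placeholder for a tuned hyperparameter, and a real system would get the gradient from its framework's automatic differentiation.

```python
import math


def mrt_risk(log_probs, costs, alpha=1.0):
    """Expected cost of sampled candidates under the renormalised model.

    log_probs: model log P(y | x) of each sampled candidate y.
    costs: cost of each candidate against the reference, e.g. 1 - sentence BLEU.
    Q(y) is proportional to P(y) ** alpha, normalised over the samples only.
    """
    scaled = [alpha * lp for lp in log_probs]
    top = max(scaled)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    z = sum(weights)
    return sum(w / z * c for w, c in zip(weights, costs))


# Three made-up candidates: the most probable one also has the lowest cost.
log_probs = [math.log(0.6), math.log(0.3), math.log(0.1)]
costs = [0.2, 0.5, 0.9]
print(mrt_risk(log_probs, costs))  # about 0.36 = 0.6*0.2 + 0.3*0.5 + 0.1*0.9
```

Unlike cross-entropy, this objective sees the evaluation cost directly, which is the mismatch the previous slides point out.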
Large vocabulary problem
• The problem
- The English vocabulary has 700K words
- So I set the size of the output layer to 700K
- Then I get a memory error
• Solutions
- I still want to use the 700K vocabulary:
- Noise-contrastive estimation (Gutmann and Hyvärinen, 2010)
- Clustering (Mikolov et al., 2013)
- Approximate learning approach (Jean et al., 2015)
- I give up, cut it to an 80K vocabulary and recover the <UNK> tokens:
- Positional unknown model (Minh-Thang Luong et al., 2015)
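The "cut the vocabulary and recover <UNK>" route can be sketched in two steps: keep only the most frequent words, then, after decoding, replace each <UNK> by looking up the source word that received the most attention, copying it verbatim when a (hypothetical) bilingual dictionary has no entry. This attention-based replacement is a simpler stand-in for, not an implementation of, the positional unknown model of Luong et al.; the sentence, attention weights and dictionary are invented.

```python
from collections import Counter


def build_vocab(tokens, size):
    """Keep the `size` most frequent tokens; all others will map to <UNK>."""
    return {word for word, _ in Counter(tokens).most_common(size)}


def replace_unks(output, source, attention, dictionary):
    """Replace each <UNK> in the output using the attention weights.

    attention[t][s] is the weight of source word s when emitting output word t.
    The most-attended source word is translated with the dictionary, or copied
    verbatim when it has no entry (useful for names and numbers).
    """
    result = []
    for t, word in enumerate(output):
        if word == "<UNK>":
            # Source position with the largest attention weight at step t.
            s = max(range(len(source)), key=lambda i: attention[t][i])
            word = dictionary.get(source[s], source[s])
        result.append(word)
    return result


vocab = build_vocab("a a a b b c".split(), 2)
print([w if w in vocab else "<UNK>" for w in "a b c".split()])  # ['a', 'b', '<UNK>']

source = ["he", "goes", "to", "Shibuya"]
output = ["彼は", "<UNK>", "に", "行く"]
attention = [[0.9, 0.05, 0.03, 0.02],
             [0.1, 0.1, 0.1, 0.7],
             [0.1, 0.1, 0.7, 0.1],
             [0.1, 0.7, 0.1, 0.1]]
print(replace_unks(output, source, attention, {}))                  # copies "Shibuya"
print(replace_unks(output, source, attention, {"Shibuya": "渋谷"}))  # dictionary lookup
```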
Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
Future of MT
• Semantics-preserving translation
• Character/sub-word level models
• Translation in context
• Low-resource translation
- Knowledge transfer
- Multilingual translation
Multilingual seq-to-seq model
Modality-agnostic space
Beyond translation: Image/Video Caption Generation
Thanks.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 

Machine Translation Introduction

  • 2. Parts of these slides are taken from Philipp Koehn's slides (www.statmt.org)
  • 3. Agenda
    • Overview of the history
    • Statistical machine translation
    • Recent developments in SMT
    • Neural machine translation
    • Some problems of NMT
    • Future of MT
  • 5. The dream of machine translation
  • 7. The history of machine translation
    • 1629: René Descartes proposed a universal language in which different tongues share one set of symbols
    • 1947: The transistor was invented; transistors later replaced vacuum tubes in computers
    • 1949 onward: Rule-based machine translation
    • 1954: First public demo, by IBM (the Georgetown-IBM experiment)
    • 1993 onward: Statistical machine translation
    • 2013 onward: Neural machine translation
  • 8. Rule-based translation systems
    • Translation rules are written by linguistics experts
    • Hard to maintain or update
    • Performance is still at (or close to) the state of the art
  • 9. Statistical machine translation
    • Translation models are learned from a parallel corpus
    • Language-independent
  • 10. Agenda
    • Overview of the history
    • Statistical machine translation
    • Recent developments in SMT
    • Neural machine translation
    • Some problems of NMT
    • Future of MT
  • 12. For people who don’t like equations
  • 13. A common pipeline of SMT (figure labels: Alignment, Neural re-ranking)
  • 14. Evaluation of SMT
    • BLEU
      - n-gram matching (usually up to 4-grams)
    • NIST
      - Content words matter more (informative n-grams are weighted more heavily)
    • RIBES (Isozaki et al., 2010)
      - Word order also matters
      - Better suited to SVO-to-SOV language pairs (e.g., English-Japanese)
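To make "n-gram matching" concrete, here is a minimal sketch of sentence-level BLEU against a single reference, assuming plain whitespace tokens and no smoothing. Official BLEU (Papineni et al., 2002) sums the clipped counts over the whole test corpus before taking ratios, so scores from this toy function will differ from tools such as multi-bleu or sacreBLEU.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one candidate against one reference (both token lists)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        # Clipped matches: an n-gram is credited at most as often as it occurs in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in cand.items())
        if matches == 0:
            return 0.0  # one zero precision makes the geometric mean zero
        log_precision_sum += math.log(matches / sum(cand.values()))
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


ref = "he goes to the curry restaurant".split()
print(sentence_bleu("he goes to the curry restaurant".split(), ref))  # 1.0
print(sentence_bleu("he goes to the curry".split(), ref))             # exp(-0.2), about 0.819
```

The second candidate matches every n-gram but is one word short, so only the brevity penalty lowers its score.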
  • 16. A brief history of the development of SMT
    • 1990–2000: Word-based models (IBM models)
      - Brown, Och, Ney
    • 2003: Phrase-based models
      - Philipp Koehn
    • 2005, 2007: Hierarchical phrase-based models
      - David Chiang
    • 2010 onward: Tree models, factored models
  • 17. Language model
    • Models p(the dog is barking)
      - to judge which candidate translation is more natural
    • Markov assumption
    • 5-gram models are the most common choice in SMT
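The Markov assumption can be illustrated with the smallest case, a bigram model (n = 2), trained by maximum likelihood on a hypothetical three-sentence corpus. This sketch is unsmoothed, so any unseen bigram drives the sentence probability to zero; toolkits such as KenLM or SRILM estimate 5-gram models with smoothing (e.g., modified Kneser-Ney) precisely to avoid that.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Unsmoothed maximum-likelihood bigram model: p(w | prev) = c(prev, w) / c(prev)."""
    history, bigram = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history.update(tokens[:-1])                   # words that can precede another word
        bigram.update(zip(tokens[:-1], tokens[1:]))   # adjacent word pairs

    def prob(prev, word):
        return bigram[(prev, word)] / history[prev] if history[prev] else 0.0

    return prob


def sentence_prob(prob, sentence):
    """Markov assumption: p(w_1 ... w_n) is approximated by the product of p(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= prob(prev, word)
    return p


# Hypothetical toy training corpus.
lm = train_bigram_lm(["the dog is barking", "the dog is sleeping", "a cat is sleeping"])
print(sentence_prob(lm, "the dog is barking"))   # 2/3 * 1 * 1 * 1/3 * 1 = 2/9
print(sentence_prob(lm, "the cat is barking"))   # 0.0: the bigram "the cat" was never seen
```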
  • 21. Word alignments shown as a matrix
  • 22. How to get word alignments
    • In short
      - Run GIZA++ on the parallel corpus
      - Wait for 5 hours
    • Technically
      - 5 IBM models, HMM models, EM algorithm
  • 23–26. Run the EM algorithm (slide caption: "10 years of work")
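The EM loop behind these slides can be sketched with IBM Model 1, the simplest of the five IBM models that GIZA++ runs. This is a simplified illustration (no NULL word, fixed number of iterations) on the classic toy corpus "das Haus / the house", "das Buch / the book", "ein Buch / a book"; it is not how GIZA++ itself is implemented.

```python
from collections import defaultdict


def ibm_model1(corpus, iterations=10):
    """EM training of IBM Model 1 word translation probabilities t(f | e).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Simplification: no NULL word, so every foreign word must align to some English word.
    """
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word among the English words of its sentence pair.
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: turn expected counts back into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t = ibm_model1(corpus)
for e in ["the", "house", "book", "a"]:
    best = max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best, round(t[(best, e)], 2))
```

Although no word pair is labelled, co-occurrence statistics alone pull "Haus" toward "house" and "Buch" toward "book" as the iterations proceed.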
  • 27. Phrase-based translation model
    • Input: He goes to the curry restaurant
    • Group into phrases: [He] [goes] [to] [the curry restaurant]
    • Translate each phrase: 彼は / 行く / に / カレー屋
    • Reorder: 彼は カレー屋 に 行く
  • 28. Extract phrase table (figure: word alignments → phrase table)
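Phrase-table extraction keeps every phrase pair that is consistent with the word alignment, meaning no word inside the pair is aligned to a word outside it. Below is a minimal sketch using the curry-restaurant example with a hand-made (hypothetical) alignment. It skips the usual extension over unaligned boundary words and does not score the pairs; real extractors such as the one in Moses also estimate phrase translation probabilities from the extracted counts.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (i, j) links meaning src[i] is aligned to tgt[j].
    Simplification: phrases are not extended over unaligned boundary words.
    """
    phrases = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to the source span.
            linked = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start >= max_len:
                continue
            # Consistency: no word inside the target span may link outside the source span.
            if any(t_start <= j <= t_end and not s_start <= i <= s_end for (i, j) in alignment):
                continue
            phrases.add((" ".join(src[s_start:s_end + 1]), " ".join(tgt[t_start:t_end + 1])))
    return phrases


src = "He goes to the curry restaurant".split()
tgt = "彼は カレー屋 に 行く".split()
# Hypothetical alignment: He-彼は, goes-行く, to-に, the/curry/restaurant-カレー屋
alignment = {(0, 0), (1, 3), (2, 2), (3, 1), (4, 1), (5, 1)}
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```

Note that "restaurant" alone is not extracted: カレー屋 is also aligned to "the" and "curry", so that pair would be inconsistent.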
  • 29. Decoding
    • In short
      - Run Moses
      - Wait for 2 days
    • Technically
      - (1) Load all the translation rules
      - (2) Search for the best hypothesis
  • 30. Load all the translation rules
  • 31. Search for the best hypothesis
    • Beam search / cube pruning
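Stack decoders such as Moses group partial hypotheses by how many source words they cover and prune each stack to a fixed beam. The sketch below is a deliberately simplified, monotone version: no reordering or distortion model, no future-cost estimate, no hypothesis recombination, and no cube pruning. The phrase table and bigram probabilities are invented for illustration.

```python
import math


def beam_decode(src, phrase_table, lm_logprob, beam_size=3, max_phrase_len=3):
    """Monotone (no reordering) stack decoding with beam pruning.

    phrase_table: source phrase -> list of (target phrase, translation log-prob)
    lm_logprob:   (previous word, word) -> language model log-prob
    stacks[k] holds hypotheses (score, last target word, output words) covering src[:k].
    """
    n = len(src)
    stacks = [[] for _ in range(n + 1)]
    stacks[0] = [(0.0, "<s>", [])]
    for k in range(n):
        # Beam pruning: expand only the best `beam_size` hypotheses of this stack.
        stacks[k] = sorted(stacks[k], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, last, out in stacks[k]:
            for end in range(k + 1, min(k + max_phrase_len, n) + 1):
                for tgt, tm_logp in phrase_table.get(" ".join(src[k:end]), []):
                    new_score, prev = score + tm_logp, last
                    for word in tgt.split():
                        new_score += lm_logprob(prev, word)
                        prev = word
                    stacks[end].append((new_score, prev, out + tgt.split()))
    # Add the end-of-sentence LM score and pick the best complete hypothesis.
    finished = [(s + lm_logprob(last, "</s>"), out) for s, last, out in stacks[n]]
    return " ".join(max(finished)[1]) if finished else None


# Hypothetical toy model: probabilities are made up for illustration.
log = math.log
phrase_table = {
    "das": [("the", log(0.7)), ("that", log(0.3))],
    "Haus": [("house", log(0.8)), ("home", log(0.2))],
    "das Haus": [("the house", log(0.9))],
    "ist": [("is", log(1.0))],
    "klein": [("small", log(0.6)), ("little", log(0.4))],
}
bigrams = {("<s>", "the"): 0.5, ("the", "house"): 0.6, ("house", "is"): 0.5,
           ("is", "small"): 0.4, ("is", "little"): 0.1,
           ("small", "</s>"): 0.5, ("little", "</s>"): 0.5}


def lm_logprob(prev, word):
    return math.log(bigrams.get((prev, word), 0.01))  # unseen bigrams get a small floor


print(beam_decode("das Haus ist klein".split(), phrase_table, lm_logprob))  # the house is small
```

The language model is what rejects "that" and "little" here: both are allowed by the phrase table, but they form improbable bigrams.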
  • 32. Hierarchical phrase-based models
    • Allow phrases to have gaps
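A "phrase with a gap" is a rule such as X1 の X2 → the X2 of X1, where the gaps are filled by translating the sub-spans recursively. The sketch below shows only that idea. The lexicon and rule are hypothetical (the words are borrowed from the Japanese sentence in slide 45), and the recursive matcher stands in for real hierarchical systems, which extract such rules automatically from aligned data and decode with a chart parser plus a language model.

```python
def translate_hiero(tokens, lexicon, rules):
    """Toy translation with phrases plus hierarchical rules that contain gaps.

    lexicon: source phrase (tuple of tokens) -> target string
    rules:   list of (anchor_word, target_pattern); each stands for a source pattern
             "X1 anchor_word X2" whose gaps X1 and X2 are filled recursively.
    """
    if tuple(tokens) in lexicon:
        return lexicon[tuple(tokens)]
    for anchor, pattern in rules:
        for k in range(1, len(tokens) - 1):   # both gaps must cover at least one word
            if tokens[k] != anchor:
                continue
            x1 = translate_hiero(tokens[:k], lexicon, rules)
            x2 = translate_hiero(tokens[k + 1:], lexicon, rules)
            if x1 is not None and x2 is not None:
                fill = {"X1": x1, "X2": x2}
                return " ".join(fill.get(w, w) for w in pattern.split())
    return None


# Hypothetical lexicon and one rule with two gaps.
lexicon = {("寿命",): "the life", ("向上",): "improvement"}
rules = [("の", "the X2 of X1")]
print(translate_hiero("寿命 の 向上".split(), lexicon, rules))  # the improvement of the life
```

The rule swaps the two gaps, which is exactly the kind of long-range reordering that flat phrases cannot express.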
  • 33. Hard problems of MT
    • Word order
    • Word sense
    • Pronouns
    • Tense
    • Idioms
  • 37. Different tenses
    • Past tense vs. present tense
    • Grammatical discrepancies between languages
  • 39. Resources of SMT
    • Parallel corpora
      - LDC data (www.ldc.upenn.edu)
      - Europarl corpus: Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish
      - Japanese: NTCIR-8 (3M), ASPEC (3M)
    • Word alignment software
      - GIZA++, Berkeley Aligner
    • Language modelling
      - SRILM, Berkeley LM, KenLM
    • Decoders
      - Moses (maintained by Philipp Koehn's group)
      - Travatar (Graham Neubig)
  • 40. Agenda
    • Overview of the history
    • Statistical machine translation
    • Recent developments in SMT
    • Neural machine translation
    • Some problems of NMT
    • Future of MT
  • 41. Recent developments of SMT
    • Advances in decoders
    • Super-large-scale language models
      - Language model compression
    • Margin Infused Relaxed Algorithm (MIRA)
      - Tunes the model's feature weights in a smarter way
    • Tree models
      - Tree-to-tree translation
      - String-to-tree translation
      - Tree-to-string translation
      - Forest-to-string translation (robust to parsing errors)
    • Factored models
    • Pre-reordering
  • 42. What is a parse tree? (figures: context-free grammar parse, dependency grammar parse)
  • 43. Tree-to-string translation models
    • Example application: translating source code into comments
  • 44. Pre-reordering phrase-based translation model
    • Input: He goes to the curry restaurant
    • Pre-reorder: He the curry restaurant to goes
    • Group into phrases and translate without further reordering: 彼は カレー屋 に 行く
  • 45. Example of pre-reordering
    • Original input: the improvement of the life is a large problem of the practical application .
    • Reordered input (restructured parse tree): the life of the improvement va_nsubjpass the practical application of a large problem is .
    • Reference: 寿命 の 向上 が 実用 化 の 大きな 課題 で あ る 。
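For English-to-Japanese, a very simple pre-reordering rule is to move every head after its dependents, in the spirit of head finalization (Isozaki et al., 2010). The sketch below applies that rule to a hand-written (hypothetical) dependency tree and reproduces the reordered input of slide 44. Real systems work on parser output, leave some constructions (such as coordination) alone, and insert pseudo-particle tokens like the va_nsubjpass seen in slide 45.

```python
def head_finalize(node):
    """Reorder a dependency tree so that every head follows its dependents.

    node: (head_word, [child nodes in original left-to-right order])
    """
    head, children = node
    words = []
    for child in children:
        words.extend(head_finalize(child))
    return words + [head]


# Hand-written (hypothetical) dependency tree for "He goes to the curry restaurant".
tree = ("goes", [("He", []),
                 ("to", [("restaurant", [("the", []), ("curry", [])])])])
print(" ".join(head_finalize(tree)))  # He the curry restaurant to goes
```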
  • 46. A summary of SMT
  • 47. Agenda
    • Overview of the history
    • Statistical machine translation
    • Recent developments in SMT
    • Neural machine translation
    • Some problems of NMT
    • Future of MT
  • 48. Problem of conventional SMT
    • Under-fitting (non-parametric approach)
    • Solution:
      - Deep recurrent neural networks
  • 49. Application of neural networks in MT
  • 51. High computational complexity
    • Try adaptive optimizers (AdaGrad, AdaDelta, Adam) first
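The appeal of these optimizers is that each parameter gets its own step size, scaled by running statistics of its gradients. The sketch below shows the Adam update rule on a one-dimensional toy objective f(x) = (x - 3)^2. β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the defaults suggested by Kingma and Ba; the learning rate of 0.1 is an assumption chosen for this toy problem. In a real NMT system you would use a framework implementation such as torch.optim.Adam instead.

```python
import math


def adam_minimize(grad, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with the Adam update rule, given its gradient function."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # correct the bias from initializing at zero
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 (x - 3).
print(adam_minimize(lambda x: 2 * (x - 3), x=0.0))  # close to 3.0
```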
  • 52. Neural machine translation
    • Encoder-decoder approach (figure: multi-layer encoder-decoder model, annotated "Performance drop")
  • 53. Soft-attention mechanism
    ‣ Makes a weighted summary of the encoder states (figure: soft-attention model)
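The "weighted summary" works as follows: score every encoder state against the current decoder state, turn the scores into weights with a softmax, then average the encoder states using those weights. Bahdanau et al. (2015) score with a small feed-forward network (additive attention). This sketch instead uses dot-product scoring, one of the variants in Luong et al. (2015), and the 2-d vectors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(query, keys, values):
    """Dot-product soft attention: a weighted summary of the encoder states.

    query:  decoder hidden state (list of floats)
    keys:   encoder hidden states used for scoring
    values: encoder hidden states that get averaged (often the same as keys)
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 2-d encoder states
weights, context = soft_attention([10.0, 0.0], encoder_states, encoder_states)
print([round(w, 3) for w in weights])  # [0.5, 0.0, 0.5]
print([round(c, 3) for c in context])  # [1.0, 0.5]
```

Because every weight is non-zero and differentiable, the whole mechanism can be trained end to end with the rest of the network.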
  • 54. Visualization of learned representations
  • 56. Evaluation result: human evaluation scores
  • 57. Evaluation result: evaluation scores (metrics missing from a row were not reported)
    - Baseline phrase-based SMT: BLEU 29.80, RIBES 0.691
    - Baseline hierarchical phrase-based SMT: BLEU 32.56, RIBES 0.746
    - Baseline tree-to-string SMT: BLEU 33.44, RIBES 0.758, HUMAN 30.00
    - Submitted system 1 (NMT): BLEU 34.19, RIBES 0.802, HUMAN 43.50
    - Submitted system 2 (NMT + system combination): BLEU 36.21, RIBES 0.809, HUMAN 53.75, JPO 3.81
    - Best competitor 1: NAIST (Travatar system with NeuralMT reranking): BLEU 38.17, RIBES 0.813, HUMAN 62.25, JPO 4.04
    - Best competitor 2: naver (SMT t2s + spell correction + NMT reranking): BLEU 36.14, RIBES 0.803, HUMAN 53.25, JPO 4.00
  • 58. (Optional) Findings & insights
    ‣ Soft-attention models outperform multi-layer encoder-decoder models
    ‣ Training models on pre-reordered data hurts performance
    ‣ NMT models tend to produce grammatically valid but incomplete translations
  • 59. Agenda
    • Overview of the history
    • Statistical machine translation
    • Recent developments in SMT
    • Neural machine translation
    • Some problems of NMT
    • Future of MT
  • 60. Can’t use monolingual data
    • Deep fusion (Gulcehre et al., 2015)
    • Integrates a neural language model trained on a massive monolingual corpus
  • 61. The attention mechanism is not perfect
    • Local attention (Luong et al., 2015) (figures: local attention model, global attention model)
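Local attention limits the weighted summary to a window around an aligned source position p instead of scoring the whole sentence. This sketch follows the description in Luong et al. (2015): softmax-normalized scores inside the window [p - D, p + D], multiplied by a Gaussian centred at p with σ = D / 2. For simplicity, p is passed in directly, whereas the paper's local-p variant predicts it from the decoder state. The states and query are hypothetical. With a zero query every position in the window scores equally, so the Gaussian alone shapes the weights.

```python
import math


def local_attention_weights(query, states, p, D):
    """Local attention in the spirit of Luong et al. (2015).

    Only source positions s with |s - p| <= D are scored. Alignment scores are
    softmax-normalized inside the window and then multiplied by a Gaussian centred
    at p with sigma = D / 2, favouring positions near p. Here p is given directly.
    """
    window = [s for s in range(len(states)) if abs(s - p) <= D]
    if not window:
        return {}
    scores = [sum(q * h for q, h in zip(query, states[s])) for s in window]
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    sigma = D / 2
    return {s: e / total * math.exp(-((s - p) ** 2) / (2 * sigma ** 2))
            for s, e in zip(window, exps)}


states = [[0.1 * s, 1.0] for s in range(5)]  # hypothetical encoder states
weights = local_attention_weights([0.0, 0.0], states, p=2, D=1)
print({s: round(w, 3) for s, w in weights.items()})  # {1: 0.045, 2: 0.333, 3: 0.045}
```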
  • 62. The attention mechanism is not perfect
    • Input feeding
  • 63. Translation does not cover all the words
    • Coverage-based NMT model (Tu et al., 2016)
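The idea behind coverage is to remember how much attention each source word has already received. Tu et al. (2016) update a coverage vector from the attention weights at every step (in their linguistic variant, a running sum scaled by a predicted fertility) and feed it back into the attention model. This sketch shows only the bookkeeping: plain accumulation of hypothetical attention weights, plus an assumed threshold that flags words the decoder has so far ignored.

```python
def coverage_vector(attention_history):
    """Sum the attention each source word has received over all decoding steps so far."""
    coverage = [0.0] * len(attention_history[0])
    for weights in attention_history:
        for j, w in enumerate(weights):
            coverage[j] += w
    return coverage


def uncovered(source, coverage, threshold=0.5):
    """Source words whose accumulated attention is still below the threshold."""
    return [word for word, c in zip(source, coverage) if c < threshold]


source = "彼は カレー屋 に 行く".split()
# Hypothetical attention weights for three decoding steps (each row sums to 1).
history = [[0.9, 0.05, 0.0, 0.05],
           [0.05, 0.9, 0.0, 0.05],
           [0.0, 0.1, 0.1, 0.8]]
print(uncovered(source, coverage_vector(history)))  # ['に']
```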
  • 64. Objective function is bad
    • Cross-entropy differs too much from BLEU
    • Solutions:
      - (1) Data as demonstrator / scheduled sampling (Bengio et al., 2015)
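In scheduled sampling, the decoder is gradually weaned off the reference. At training step i, it is fed the reference previous word with probability ε_i and its own previous prediction otherwise, with ε_i decaying over training. Bengio et al. (2015) propose linear, exponential, and inverse-sigmoid decays. This sketch uses the inverse-sigmoid form ε_i = k / (k + exp(i / k)), with k = 100 chosen arbitrarily.

```python
import math
import random


def teacher_forcing_prob(step, k=100.0):
    """Inverse sigmoid decay eps_i = k / (k + exp(i / k)): starts near 1, decays toward 0."""
    return k / (k + math.exp(min(step / k, 700.0)))  # cap the exponent to avoid overflow


def next_input(gold_word, predicted_word, step, rng=random):
    """Scheduled sampling: feed the reference word with probability eps_i,
    otherwise feed the model's own previous prediction."""
    return gold_word if rng.random() < teacher_forcing_prob(step) else predicted_word


for step in [0, 500, 1000, 2000]:
    print(step, round(teacher_forcing_prob(step), 4))
```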
  • 65. Objective function is bad (cont.)
    • Cross-entropy differs too much from BLEU
    • Solutions:
      - (2) MIXER: mixed incremental cross-entropy REINFORCE (Ranzato et al., 2016)
  • 66. Objective function is bad (cont.)
    • Cross-entropy differs too much from BLEU
    • Solutions:
      - (3) Minimum risk training (Shen et al., 2015) (figure: objective of MRT; 6 BLEU gain on a Chinese-English task)
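Minimum risk training replaces per-word cross-entropy with the expected loss of whole translations. Candidates are sampled, the model's probabilities are renormalized over that sample (sharpened by an exponent α), and the objective is the resulting average of a sentence-level loss. Shen et al. use negative sentence BLEU as the loss; 1 - sentence BLEU, assumed below, differs only by a constant. The sketch only evaluates the objective for one sentence with hypothetical numbers and an assumed α; training would differentiate it with respect to the model parameters using an autodiff framework.

```python
import math


def mrt_risk(logprobs, costs, alpha=1.0):
    """Expected risk over a sampled candidate set, as in minimum risk training.

    logprobs: model log P(y | x) of each sampled candidate translation
    costs:    loss of each candidate, e.g. 1 - sentence BLEU against the reference
    alpha:    sharpness of Q(y | x) proportional to P(y | x) ** alpha (value assumed here)
    """
    scaled = [alpha * lp for lp in logprobs]
    top = max(scaled)                               # for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    z = sum(weights)
    return sum(w / z * c for w, c in zip(weights, costs))


# Two hypothetical candidates: the model prefers the one with lower cost.
print(mrt_risk([-1.0, -3.0], [0.2, 0.7]))
```

Lowering this risk shifts probability mass toward candidates with higher BLEU, which directly targets the mismatch between cross-entropy and BLEU.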
  • 67. Large vocabulary problem
    • The problem
      - The English vocabulary has 700K words
      - So I set the size of the output layer to 700K
      - Then I get a memory error
    • Solutions
      - I still want to use the 700K vocabulary:
        - Noise-contrastive estimation (Gutmann and Hyvärinen, 2010)
        - Clustering (Mikolov et al., 2013)
        - Approximate learning approach (Jean et al., 2015)
      - I give up, cut it to an 80K vocabulary, and recover <UNK> tokens:
        - Positional unknown model (Luong et al., 2015)
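The "cut the vocabulary and recover <UNK>" route has two parts. Before training, rare words are mapped to <unk>. After decoding, each unknown target word is traced back to its aligned source word, which is then translated with a dictionary or copied. Luong et al. (2015) make this possible by having the model emit unknown tokens annotated with the relative position of the aligned source word. The token spelling "unkpos_d" used here is a hypothetical format, and the sentence and dictionary are invented.

```python
from collections import Counter


def build_vocab(tokens, size):
    """Keep the `size` most frequent words (e.g. 80K); all others will become <unk>."""
    return {word for word, _ in Counter(tokens).most_common(size)}


def replace_rare(tokens, vocab):
    return [w if w in vocab else "<unk>" for w in tokens]


def recover_unks(output, source, dictionary):
    """Post-process unknown tokens written as 'unkpos_d', meaning the aligned
    source word is at position i + d for the unknown token at target position i.
    Translate that source word with a dictionary, or copy it if there is no entry."""
    result = []
    for i, tok in enumerate(output):
        if tok.startswith("unkpos_"):
            j = i + int(tok[len("unkpos_"):])
            word = source[j] if 0 <= j < len(source) else tok
            result.append(dictionary.get(word, word))
        else:
            result.append(tok)
    return result


source = "He went to Ouagadougou".split()
output = ["彼は", "unkpos_2", "に", "行った"]  # hypothetical model output
print(recover_unks(output, source, {"Ouagadougou": "ワガドゥグー"}))
```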
  • 68. Agenda
    • Overview of the history
    • Statistical machine translation
    • Recent developments in SMT
    • Neural machine translation
    • Some problems of NMT
    • Future of MT
  • 69. Future of MT
    • Semantics-preserving translation
    • Character/sub-word level models
    • Translation in context
    • Low-resource translation
      - Knowledge transfer
      - Multilingual translation
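One widely used sub-word approach is byte-pair encoding (Sennrich et al., 2016). Every word starts as a sequence of characters, and the most frequent adjacent symbol pair is merged repeatedly, so frequent words end up as single units while rare words remain decomposable. The sketch below follows the merge-learning procedure described in that paper, applied to the paper's toy word counts.

```python
import re
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn byte-pair-encoding merges from word frequencies (after Sennrich et al., 2016).

    Words start as space-separated characters plus an end-of-word marker </w>;
    each iteration merges the most frequent adjacent symbol pair.
    """
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            symbols = word.split()
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        merged = "".join(best)
        # Merge the pair only where both symbols are whole (space-delimited) symbols.
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=4))
# [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
```

The learned merges are then replayed in order to segment new text, so an unseen word such as "lowest" can still be built from known pieces.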
  • 72–73. Beyond translation: image/video caption generation