Natural Language Generation
by Hierarchical Decoding with
Linguistic Patterns
Shang-Yu Su, Kai-Ling Lo, Yi-Ting Yeh, Yun-Nung Chen
NAACL 2018
Presenter: Tomoya Ogata
NLG?
• sentence planning → deciding a sentence structure
• surface realization → flattening the sentence
structure into a string
• generating long and complex sentences with a simple
encoder-decoder structure is challenging because of
grammatical complexity and a lack of diction
knowledge
input: name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One]
output: Near All Bar One is a moderately priced Italian place it is called Midsummer House
What they did
• introducing a hierarchical decoding NLG model
based on linguistic patterns in different levels
The Proposed Approach
Encoder
• capture the temporal dependency
• project the input to a latent feature space
• encoded into 1-hot semantic representation as the
initial state of the encoder
semantic representation sequence: $x = \{x_t\}_{1}^{T}$
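A minimal sketch of how such a 1-hot input sequence could be built, assuming each slot-value pair of the meaning representation is treated as one token $x_t$. The slides do not state the tokenization, so that choice, the function name `one_hot_sequence`, and the vocabulary ordering are all hypothetical.

```python
def one_hot_sequence(tokens, vocab):
    """Map each token to a 1-hot vector over `vocab` (illustrative only)."""
    # Assumption: the vocabulary is a fixed, ordered list of known tokens.
    index = {tok: i for i, tok in enumerate(vocab)}
    sequence = []
    for tok in tokens:
        vec = [0] * len(vocab)
        vec[index[tok]] = 1  # exactly one position is set per token
        sequence.append(vec)
    return sequence


# Hypothetical vocabulary of slot-value tokens.
vocab = ["name[Midsummer House]", "food[Italian]", "priceRange[moderate]"]
x = one_hot_sequence(["food[Italian]", "name[Midsummer House]"], vocab)
```

Each row of `x` corresponds to one $x_t$; a recurrent encoder would consume the rows in order.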
Hierarchical Decoder
• separate the decoding process and learn different
types of patterns instead of learning all relevant
knowledge together
• we use part-of-speech (POS) tags as the additional
linguistic features to construct the hierarchy
the encoded semantic vector, $h_{enc}$
Inner- and Inter-Layer Teacher Forcing
• Inner-layer teacher forcing
• Inter-layer teacher forcing
$y$: the true previous token
$\hat{y}$: a token sampled from the model itself
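The choice between the two kinds of input can be sketched as a coin flip. This is a minimal illustration, assuming the teacher-forcing probability is applied independently at every step; `choose_decoder_input` is a hypothetical helper, not a function from the paper's code. The same selection could be applied within a layer (inner-layer) or to the sequence passed up from the layer below (inter-layer).

```python
import random


def choose_decoder_input(true_prev, sampled_prev, teacher_forcing_prob, rng=random):
    """Return the ground-truth token with probability `teacher_forcing_prob`,
    otherwise the token the model generated itself."""
    # rng.random() is uniform on [0, 1), so prob=1.0 always picks the truth
    # and prob=0.0 always picks the model's own sample.
    if rng.random() < teacher_forcing_prob:
        return true_prev
    return sampled_prev


# Example: with probability 0.9 the decoder is fed the true previous token.
token = choose_decoder_input("House", "Bar", teacher_forcing_prob=0.9)
```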
Repeat-Input Mechanism
• a strategy that repeats each output token of the previous
layer as the input until the current decoding layer
outputs the same token
• merits
• signals to the decoder that the repeated tokens are
important, which encourages it to generate them
• the impact of length difference can be mitigated
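A minimal sketch of the repeat-input alignment at training time, assuming the current layer's target sequence contains every token of the previous layer's output in the same order. The function name `repeat_input` and the `<pad>` token fed after the previous layer's output is exhausted are assumptions for illustration.

```python
def repeat_input(prev_layer_tokens, target_tokens, pad="<pad>"):
    """Build one input token per target step by repeating the pending
    previous-layer token until the current layer emits that same token."""
    inputs = []
    i = 0  # index of the previous-layer token that is still pending
    for target in target_tokens:
        has_pending = i < len(prev_layer_tokens)
        current = prev_layer_tokens[i] if has_pending else pad
        inputs.append(current)
        # Advance only once the current layer has produced the pending token.
        if has_pending and target == current:
            i += 1
    return inputs


prev_out = ["Midsummer", "House", "All", "Bar", "One"]
target = ["Midsummer", "House", "is", "near", "All", "Bar", "One"]
aligned = repeat_input(prev_out, target)
```

Here "All" is fed three times in a row, while the current layer emits "is", "near", and finally "All". The input and target sequences end up with the same length.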
Curriculum Learning
(Elman, 1993)
• a curriculum of progressively harder tasks could
significantly accelerate a network's training
• → the layers are trained in order, from the
bottommost layer to the topmost one
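One way to express such a bottom-up schedule is a function from the epoch number to the highest layer being trained. This is a sketch under stated assumptions: a fixed number of epochs per stage (`epochs_per_stage`), four layers, and 1-indexed epochs are all illustrative defaults, and the slides do not say whether lower layers keep training once a higher layer is unlocked.

```python
def highest_trainable_layer(epoch, epochs_per_stage=5, num_layers=4):
    """Return the 1-indexed topmost layer unlocked at a 1-indexed epoch."""
    if epoch < 1:
        raise ValueError("epoch is 1-indexed and must be >= 1")
    # Every `epochs_per_stage` epochs one more layer is unlocked,
    # capped at the topmost layer.
    return min((epoch - 1) // epochs_per_stage + 1, num_layers)


schedule = [highest_trainable_layer(e) for e in (1, 5, 6, 11, 16, 20)]
```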
Experiments
Setting (linguistic patterns)
• POS tagging → spaCy toolkit
• words with specific POS tags are assigned to each
decoding layer:
• first layer: nouns, proper nouns, and pronouns
• second layer: verbs
• third layer: adjectives and adverbs
• fourth layer: others
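The layer assignment above can be sketched as a filter over POS-tagged tokens. This assumes each layer's target is cumulative (it keeps the words of all lower layers in their original order), which is a reading of the slides rather than something they state. The tags below are hand-assigned in the style of spaCy's coarse-grained `token.pos_` labels, and `LAYER_POS` and `layer_targets` are hypothetical names.

```python
# POS tags introduced at layers 1-3; layer 4 adds everything else.
LAYER_POS = [
    {"NOUN", "PROPN", "PRON"},  # first layer
    {"VERB"},                   # second layer
    {"ADJ", "ADV"},             # third layer
]


def layer_targets(tagged_tokens):
    """Return one target word sequence per decoding layer."""
    targets = []
    allowed = set()
    for pos_set in LAYER_POS:
        allowed |= pos_set  # cumulative: keep the tags of lower layers too
        targets.append([word for word, pos in tagged_tokens if pos in allowed])
    targets.append([word for word, _ in tagged_tokens])  # fourth layer: all words
    return targets


tagged = [("Midsummer", "PROPN"), ("House", "PROPN"), ("serves", "VERB"),
          ("cheap", "ADJ"), ("Italian", "ADJ"), ("food", "NOUN"), (".", "PUNCT")]
layers = layer_targets(tagged)
```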
Setting (Parameters)
• The probability of teacher forcing: 0.5, 0.9
• training epochs: 20
• curriculum learning:
• first five epochs: first layer
• sixth epoch: second layer
• mini-batch size: 32
• optimizer: Adam
• baseline: seq2seq (encoder hidden: 200, decoder
hidden: 400)
• proposed model: encoder hidden: 200, decoder
hidden: 100
Setting (Dataset)
• E2E NLG challenge dataset
• restaurant domain
• training: 42,064 instances
• validation: 4,673 instances
• input
• “name[Bibimbap House], food[English],
priceRange[moderate], area[riverside], near[Clare Hall]”
• output
• “Bibimbap House is a moderately priced restaurant who’s
main cuisine is English food. You will find this local gem near
Clare Hall in the Riverside area.”
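The input format shown above is a comma-separated list of `slot[value]` pairs. A minimal sketch of parsing it into a dictionary follows; `parse_mr` is a hypothetical helper, and the regular expression assumes values never contain a closing bracket.

```python
import re


def parse_mr(mr):
    """Parse 'slot[value], slot[value], ...' into a {slot: value} dict."""
    # \w+ captures the slot name; [^\]]* captures everything up to the
    # closing bracket as the value.
    return dict(re.findall(r"(\w+)\[([^\]]*)\]", mr))


mr = ("name[Bibimbap House], food[English], priceRange[moderate], "
      "area[riverside], near[Clare Hall]")
slots = parse_mr(mr)
```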
Result
Dividing the generation process into several phases achieves significant improvement in ROUGE scores
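For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated sentence and a reference. The sketch below is a simplified illustration that assumes lowercasing and whitespace tokenization; it is not the official ROUGE toolkit, and the slides do not say which ROUGE variants were reported.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, F1) of clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = rouge_1("the cat sat", "the cat sat on the mat")
```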
Conclusion
• a hierarchical decoder that leverages various
linguistic patterns, together with several
corresponding training and inference techniques
• models applying the proposed methods achieve
significant improvement over the classic seq2seq
model
