Natural language generation by hierarchical decoding with linguistic patterns

The document proposes a hierarchical decoding model for natural language generation (NLG) that separates the decoding process into layers associated with different linguistic patterns, such as part-of-speech (POS) tags. It introduces inner-layer and inter-layer teacher forcing to supervise decoding within and between layers, a repeat-input mechanism that encourages the decoder to generate important repeated tokens, and curriculum learning to train the layers progressively. On the E2E NLG challenge dataset, the model achieves a significant improvement in ROUGE scores over a standard seq2seq model, demonstrating the benefit of leveraging linguistic knowledge for complex sentence generation.

Natural Language Generation by Hierarchical Decoding with Linguistic Patterns
Shang-Yu Su, Kai-Ling Lo, Yi-Ting Yeh, Yun-Nung Chen
NAACL 2018
Presenter: Tomoya Ogata

NLG?
• sentence planning → deciding a sentence structure
• surface realization → flattening the sentence structure into a string
• generating long and complex sentences with a simple encoder-decoder structure is challenging due to grammar complexity and lack of diction knowledge
• example input: name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One]
• example output: Near All Bar One is a moderately priced Italian place it is called Midsummer House
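
To make the bracketed input format in the example above concrete, here is a minimal sketch that splits such a string into slot-value pairs. The function name `parse_mr` and the regular expression are illustrative assumptions, not part of the presented system.

```python
import re

def parse_mr(mr: str) -> dict:
    """Parse 'slot[value], slot[value]' into an ordered slot -> value dict."""
    # Each match is (slot name, text between the square brackets).
    pairs = re.findall(r"(\w+)\[([^\]]*)\]", mr)
    return {slot: value.strip() for slot, value in pairs}

mr = "name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One]"
slots = parse_mr(mr)
```

The resulting dictionary keeps the slots in the order they appear in the input, which is what a sequence encoder would consume.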

What they did
• introduce a hierarchical decoding NLG model based on linguistic patterns at different levels

Encoder
• captures the temporal dependency
• projects the input to a latent feature space
• the input is also encoded into a 1-hot semantic representation, used as the initial state of the encoder
semantic representation sequence: $\mathbf{x} = \{x_t\}_{t=1}^{T}$
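
A rough sketch of what a 1-hot style semantic representation could look like, assuming one bit per (slot, value) pair seen in the data. The slides do not give the exact encoding, so `build_vocab` and `encode` are hypothetical helpers, and an input with several slots sets several bits.

```python
def build_vocab(mrs):
    """Map every (slot, value) pair seen in the parsed inputs to an index."""
    pairs = sorted({pair for mr in mrs for pair in mr.items()})
    return {pair: i for i, pair in enumerate(pairs)}

def encode(mr, vocab):
    """Return a 0/1 vector with a 1 for each (slot, value) pair in `mr`."""
    vec = [0] * len(vocab)
    for pair in mr.items():
        if pair in vocab:  # pairs unseen at vocabulary-building time are skipped
            vec[vocab[pair]] = 1
    return vec

# The two example inputs that appear in the slides, already parsed.
mrs = [
    {"name": "Midsummer House", "food": "Italian",
     "priceRange": "moderate", "near": "All Bar One"},
    {"name": "Bibimbap House", "food": "English", "priceRange": "moderate",
     "area": "riverside", "near": "Clare Hall"},
]
vocab = build_vocab(mrs)
vec = encode(mrs[0], vocab)
```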

Hierarchical Decoder
• separates the decoding process into layers that each learn a different type of pattern, instead of learning all relevant knowledge together
• part-of-speech (POS) tags are used as the additional linguistic features to construct the hierarchy
the encoded semantic vector, $h_{enc}$

Inner- and Inter-Layer Teacher Forcing
• Inner-layer teacher forcing
• Inter-layer teacher forcing
$y$ : the true previous token
$\hat{y}$ : a token sampled from the model itself
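
The choice between the true token and a model-sampled token can be sketched as a per-step coin flip. This is a generic illustration of teacher forcing with a forcing probability, not the authors' implementation; `next_input`, the token lists, and the seed are assumptions made for the example.

```python
import random

def next_input(true_prev, sampled_prev, p_teacher, rng):
    """Pick the decoder input for the next step.

    With probability p_teacher the ground-truth previous token is used
    (teacher forcing); otherwise the model's own previous output is fed back.
    """
    return true_prev if rng.random() < p_teacher else sampled_prev

rng = random.Random(0)  # fixed seed so the sketch is repeatable
truth = ["Midsummer", "House", "is", "moderately", "priced"]
model = ["Midsummer", "place", "is", "cheap", "priced"]  # hypothetical model outputs
inputs = [next_input(t, m, 0.5, rng) for t, m in zip(truth, model)]
```

Setting `p_teacher` to 1.0 always feeds the reference token, and 0.0 always feeds the model's own output.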

Repeat-Input Mechanism
• a strategy that repeats each output from the previous layer as input until the current decoding layer outputs the same token
• merits
  • telling the decoder that the repeated tokens are important encourages the decoder to generate them
  • the impact of the length difference between layers is mitigated
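
The alignment this mechanism produces can be sketched as follows, assuming training-time access to the current layer's target sequence (at inference the layer's own outputs would play that role). The function `repeat_inputs`, the toy token lists, and the choice to keep feeding the last token once the previous layer's sequence is used up are all assumptions for illustration.

```python
def repeat_inputs(prev_outputs, targets):
    """Build the per-step inputs for the current layer.

    The current previous-layer token is fed repeatedly until the current
    layer emits that same token; only then does the pointer advance.
    """
    if not prev_outputs:
        raise ValueError("prev_outputs must not be empty")
    inputs, i = [], 0
    for tok in targets:
        # Assumption: after the last previous-layer token is consumed,
        # keep feeding it for any remaining steps.
        inputs.append(prev_outputs[min(i, len(prev_outputs) - 1)])
        if i < len(prev_outputs) and tok == prev_outputs[i]:
            i += 1
    return inputs

prev_layer = ["Bibimbap", "House", "food"]          # toy first-layer output
target = ["Bibimbap", "House", "serves", "food"]    # toy second-layer target
inputs = repeat_inputs(prev_layer, target)
```

Here `food` is fed twice: once while the layer emits `serves`, and again when it emits `food` itself, so the input and output sequences end up the same length.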

Curriculum Learning (Elman, 1993)
• a curriculum of progressively harder tasks could significantly accelerate a network's training
• → the layers are trained from the bottommost layer to the topmost one
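
A bottom-up schedule of this kind could be written as a function from epoch to the set of layers being trained. The deck's parameter slide only says that the first five epochs train the first layer and that the second layer starts at the sixth epoch; the five-epoch stage length for later layers, and the choice to keep training earlier layers, are assumptions here, as is the name `active_layers`.

```python
def active_layers(epoch, epochs_per_stage=5, num_layers=4):
    """Return the 1-based layers trained at a 1-based epoch.

    Assumption: a new layer is unlocked every `epochs_per_stage` epochs
    and previously unlocked layers continue to be trained.
    """
    stage = (epoch - 1) // epochs_per_stage + 1
    return list(range(1, min(stage, num_layers) + 1))

# Schedule over the 20 training epochs mentioned in the parameter settings.
schedule = {epoch: active_layers(epoch) for epoch in range(1, 21)}
```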

Setting (linguistic patterns)
• POS tagging → spaCy toolkit
• words with specific POS tags are assigned to each decoding layer:
  • first layer: nouns, proper nouns, and pronouns
  • second layer: verbs
  • third layer: adjectives and adverbs
  • fourth layer: others
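
The layer assignment above can be sketched as a lookup from POS tag to layer. The tag names follow the Universal POS tag set that spaCy exposes; the tags in the example are hand-assigned for illustration rather than real tagger output, and building each layer's target cumulatively (all words from this layer and the layers below, in sentence order) is an assumption about how the hierarchy is used.

```python
# Layer for each POS tag; any tag not listed falls into the last layer.
LAYER_OF_POS = {
    "NOUN": 1, "PROPN": 1, "PRON": 1,  # first layer
    "VERB": 2,                         # second layer
    "ADJ": 3, "ADV": 3,                # third layer
}

def layer_targets(tagged, num_layers=4):
    """tagged: list of (word, pos). Returns one word list per layer."""
    layers = [LAYER_OF_POS.get(pos, num_layers) for _, pos in tagged]
    return [
        [word for (word, _), layer in zip(tagged, layers) if layer <= k]
        for k in range(1, num_layers + 1)
    ]

# Hand-tagged toy sentence (not actual spaCy output).
tagged = [
    ("Bibimbap", "PROPN"), ("House", "PROPN"), ("is", "AUX"), ("a", "DET"),
    ("moderately", "ADV"), ("priced", "VERB"), ("restaurant", "NOUN"),
]
targets = layer_targets(tagged)
```

Each successive list adds the words of one more POS group, and the last list is the full sentence.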

Setting (Parameters)
• probability of teacher forcing: 0.5, 0.9
• training epochs: 20
• curriculum learning:
  • first five epochs: first layer
  • sixth epoch: second layer
• mini-batch size: 32
• optimizer: Adam
• baseline: seq2seq (encoder hidden size: 200, decoder hidden size: 400)
• proposed model: encoder hidden size: 200, decoder hidden size: 100

Setting (Dataset)
• E2E NLG challenge dataset
• restaurant domain
• training: 42,064 instances
• validation: 4,673 instances
• input
  • “name[Bibimbap House], food[English], priceRange[moderate], area[riverside], near[Clare Hall]”
• output
  • “Bibimbap House is a moderately priced restaurant who’s main cuisine is English food. You will find this local gem near Clare Hall in the Riverside area.”

Result
Separating the generation process into several phases achieves a significant improvement in ROUGE scores
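
For readers unfamiliar with the metric, here is a simplified unigram-overlap (ROUGE-1) sketch. It uses lowercased whitespace tokens and a single reference, which is cruder than the official scoring scripts; the candidate sentence is hypothetical and the reference is the example output from the NLG slide.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Return (precision, recall, F1) of clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-token min of the two counts
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)

reference = ("Near All Bar One is a moderately priced Italian place "
             "it is called Midsummer House")
candidate = "Midsummer House is an Italian place near All Bar One"  # hypothetical
precision, recall, f1 = rouge_1(candidate, reference)
```

With these two sentences, 9 of the 10 candidate tokens appear in the 15-token reference, giving precision 0.9, recall 0.6, and F1 0.72.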

Conclusion
• the paper proposes a hierarchical decoder that leverages various linguistic patterns, together with several corresponding training and inference techniques
• models applying the proposed methods achieve significant improvement over the classic seq2seq model
