[Paper Introduction] Training a Natural Language Generator From Unaligned Data
1. Training a Natural Language Generator From Unaligned Data
PHILIP ARTHUR
MT STUDY GROUP
9/3/2015 1
2. Paper Description
Title: Training a Natural Language Generator From Unaligned Data
Authors: Ondřej Dušek and Filip Jurčíček
Meeting: ACL
Year: 2015
Type: Long Paper
Reference: https://aclweb.org/anthology/P/P15/P15-1044.pdf
4. Motivation & Contribution
• Motivation
• Current NLG systems require a separate training data alignment step.
• CFG-based or phrase-based approaches limit the ability to capture long-range syntactic dependencies.
• Contribution
• A novel method that integrates the alignment step into the sentence planner.
• Deep-syntactic trees combined with rule-based surface realization.
• The ability to learn from incomplete trees.
6. About the data structure
• Each node has a lemma and a formeme (Dušek et al., 2012).
• The tree contains nodes for content words (nouns, full verbs, adjectives, adverbs) and coordinating conjunctions.
• The Treex toolkit is used to generate this dependency tree for the input.
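As a hypothetical illustration (not the actual Treex t-layer API, which carries many more attributes such as grammatemes and ordering), the node structure can be sketched as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TNode:
    """One node of a deep-syntactic (t-) tree: a lemma plus a formeme."""
    lemma: str            # content word, e.g. "restaurant"
    formeme: str          # surface-syntactic form, e.g. "n:subj" or "v:fin"
    children: List["TNode"] = field(default_factory=list)

    def add_child(self, lemma: str, formeme: str) -> "TNode":
        child = TNode(lemma, formeme)
        self.children.append(child)
        return child

    def size(self) -> int:
        """Total number of nodes in this subtree."""
        return 1 + sum(c.size() for c in self.children)

# "X is a restaurant." as a tiny t-tree: the verb governs subject and object.
root = TNode("be", "v:fin")
root.add_child("X-name", "n:subj")
root.add_child("restaurant", "n:obj")
```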
7. Dataset
• BAGEL: a restaurant-domain dataset consisting of dialogue acts (DAs) with lists of slot-value pairs (SVPs) that carry information about restaurants.
• Where: http://farm2.user.srcf.net/research/bagel/ACL10-inform-training.txt
• Example:
FULL_DA = inform(name="Green Man",eattype=restaurant)
ABSTRACT_DA = inform(name="X1",eattype=restaurant)
-> "[name+X]X []is a [eattype+restaurant]restaurant.";
8. Sentence Planner
• Builds a deep syntactic tree using A* search (two sets of hypotheses: open + closed).
• Based on two components: a candidate generator and a scorer/ranker.
9. Sentence Planner Algorithm
Init: Start with an open set containing a single empty sentence plan tree and an empty closed set.
Loop:
1. Select the best-scoring candidate C from the open set and add C to the closed set.
2. The candidate generator produces C′, a set of possible successors to C. These are trees that have more nodes than C and are deemed viable. Note that C′ may be empty.
3. The scorer scores all successors in C′; those not already in the closed set are added to the open set.
4. Check whether the best successor in the open set scores better than the best candidate in the closed set.
Stop: The algorithm finishes if the top score in the open set stays below the top score in the closed set for d consecutive iterations, or if the open set is exhausted. It returns the best-scoring candidate from both sets.
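The steps above amount to a best-first search; a minimal runnable sketch, with a toy scorer and successor function standing in for the paper's perceptron scorer and candidate generator:

```python
import heapq

def plan_sentence(score, successors, d=3, max_iter=200):
    """A*-style sentence planning sketch.

    `score(t)` maps a plan to a real value (higher = better);
    `successors(t)` returns viable expansions of t. Stops after d
    consecutive iterations in which the open set's best candidate
    does not beat the closed set's best, or when the open set is empty.
    """
    empty = ()                                # empty sentence plan tree
    open_set = [(-score(empty), empty)]       # max-heap via negated scores
    closed = {}                               # plan -> score
    worse_count = 0
    for _ in range(max_iter):
        if not open_set:
            break
        neg, cand = heapq.heappop(open_set)
        closed[cand] = -neg
        for succ in successors(cand):         # expand the chosen candidate
            if succ not in closed:
                heapq.heappush(open_set, (-score(succ), succ))
        if open_set and -open_set[0][0] <= max(closed.values()):
            worse_count += 1                  # open set did not improve
            if worse_count >= d:
                break
        else:
            worse_count = 0
    best_closed = max(closed, key=closed.get)
    if open_set and -open_set[0][0] > closed[best_closed]:
        return open_set[0][1]                 # best of both sets
    return best_closed

# Toy run: plans are tuples of node labels; the score peaks at length 2.
succ = lambda t: [t + (len(t),)] if len(t) < 4 else []
sc = lambda t: -abs(len(t) - 2)
best = plan_sentence(sc, succ)
```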
10. Candidate Generator
• Expands the current tree by adding one new node in all possible positions, with all possible lemmas and formemes.
11. CG + Expansion Filtering
1. Lemma-formeme compatibility: only combinations of lemma + formeme seen in the training data are allowed.
2. Syntactic viability: the new node must be compatible with its parent, i.e., the parent-child combination was seen in training, including its position as a left or right child.
3. Number of children: for a particular parent node, the number of children cannot exceed the maximum observed for the same node in training.
4. Tree size: the number of nodes in the tree cannot exceed, at each tree level, the maximum observed in the training data.
5. Weak semantic compatibility: only include nodes that appear in training examples containing SVPs from the current input.
6. Strong semantic compatibility: each lemma + formeme has a compatibility list of SVPs; generating the node is allowed only if all of them are present in the current input DA.
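Filters 1-4 can be sketched as lookups into statistics gathered from the training trees; the `stats` tables and their names below are illustrative, and checks 5-6 would additionally consult the input DA's slot-value pairs:

```python
def viable(new_node, parent, tree_size, stats):
    """Sketch of candidate expansion filtering (checks 1-4).

    `new_node` and `parent` are (lemma, formeme) pairs; `stats` stands in
    for tables collected from the training trees.
    """
    # 1. Lemma-formeme combination must have been seen in training.
    if new_node not in stats["lemma_formeme"]:
        return False
    # 2. Syntactic viability: parent-child combination seen in training.
    if (parent, new_node) not in stats["parent_child"]:
        return False
    # 3. Parent must not exceed its maximum observed number of children.
    if stats["n_children"][parent] + 1 > stats["max_children"][parent]:
        return False
    # 4. Tree must not exceed the maximum observed training-tree size.
    if tree_size + 1 > stats["max_tree_size"]:
        return False
    return True

stats = {
    "lemma_formeme": {("restaurant", "n:obj"), ("be", "v:fin")},
    "parent_child": {(("be", "v:fin"), ("restaurant", "n:obj"))},
    "n_children": {("be", "v:fin"): 1},
    "max_children": {("be", "v:fin"): 2},
    "max_tree_size": 10,
}
ok = viable(("restaurant", "n:obj"), ("be", "v:fin"), tree_size=2, stats=stats)
bad = viable(("pizza", "n:obj"), ("be", "v:fin"), tree_size=2, stats=stats)
```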
12. Scorer / Ranker
• A function that maps global features of a sentence plan 𝑡 and the input DA 𝑚 to a real value.
• Based on a basic linear perceptron scorer: score(𝑡, 𝑚) = 𝐰 · feat(𝑡, 𝑚).
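For a linear perceptron, the score is simply the dot product of the weight vector with the global feature vector; a minimal sketch with sparse dict features (the feature names are made up):

```python
def score(features, weights):
    """Linear perceptron score: dot product of the sparse feature dict
    feat(t, m) with the weight vector w (missing weights default to 0)."""
    return sum(value * weights.get(name, 0.0)
               for name, value in features.items())

feats = {"depth": 2.0, "nodes_per_svp": 1.5, "lemma=restaurant": 1.0}
w = {"depth": 0.5, "nodes_per_svp": -1.0, "lemma=restaurant": 2.0}
s = score(feats, w)   # 2*0.5 + 1.5*(-1.0) + 1*2.0 = 1.5
```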
13. Training
Objective: learn weights 𝐰 so that the generated plan matches the gold-standard plan.
Init: set all 𝑤 ∈ 𝐰 = 1.
For each input MR (meaning representation) in the training data:
1. 𝑡_top = generate a sentence plan for the input using the current weights.
2. 𝑡_gold = the gold-standard plan, obtained by parsing the reference sentence with an automatic annotator (Treex).
3. Update the weights toward 𝑡_gold and away from 𝑡_top.
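Assuming the standard structured-perceptron update (the exact formula is on the original slide), step 3 moves the weights toward the gold tree's features and away from the generated tree's:

```python
def perceptron_update(weights, feats_gold, feats_top, alpha=0.1):
    """Full-tree perceptron update:
    w <- w + alpha * (feat(t_gold, m) - feat(t_top, m)).
    Features are sparse dicts; weights are updated in place."""
    for name in set(feats_gold) | set(feats_top):
        delta = feats_gold.get(name, 0.0) - feats_top.get(name, 0.0)
        weights[name] = weights.get(name, 0.0) + alpha * delta
    return weights

w = {"depth": 1.0}
perceptron_update(w, {"depth": 3.0, "lemma=pub": 1.0}, {"depth": 2.0},
                  alpha=0.1)
```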
14. Differing Subtrees Update
• Starting from the common subtree 𝑡𝑐 of 𝑡𝑡𝑜𝑝 and 𝑡𝑔𝑜𝑙𝑑, pairs of differing subtrees (𝑡𝑡𝑜𝑝^𝑖, 𝑡𝑔𝑜𝑙𝑑^𝑖) are created by gradually adding nodes from 𝑡𝑡𝑜𝑝 into 𝑡𝑡𝑜𝑝^𝑖 and from 𝑡𝑔𝑜𝑙𝑑 into 𝑡𝑔𝑜𝑙𝑑^𝑖.
15. Algorithm Differing Subtree Update
• In the third step of training, the "full-tree" update is replaced with the "differing subtree" update: one update is performed per pair of differing subtrees.
• It is reported that performance degrades if the paired subtrees are not kept the same size.
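The substitution can be sketched as one perceptron update per subtree pair; the construction of the pairs is elided here, and the toy `feat` function is purely illustrative:

```python
def differing_subtree_update(weights, subtree_pairs, feat, alpha=0.1):
    """Apply one perceptron update per pair of differing subtrees
    (t_top_i, t_gold_i) instead of a single update on the full trees.
    `feat` maps a subtree to a sparse feature dict."""
    for t_top_i, t_gold_i in subtree_pairs:
        f_gold, f_top = feat(t_gold_i), feat(t_top_i)
        for name in set(f_gold) | set(f_top):
            delta = f_gold.get(name, 0.0) - f_top.get(name, 0.0)
            weights[name] = weights.get(name, 0.0) + alpha * delta
    return weights

# Toy features: count node labels in a subtree (given as a tuple of labels).
feat = lambda t: {lab: float(t.count(lab)) for lab in set(t)}
w = differing_subtree_update(
    {}, [(("a",), ("b",)), (("a", "c"), ("b", "c"))], feat)
```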
16. Future Promise Estimation
• The same idea as in A* search, where score = scorer(input, weights) + heuristic(input).
• Based on the expected number of children 𝐸𝑐(𝑛) of the different node types.
• The future promise (fp) of a particular sentence plan 𝑡 is calculated from its nodes 𝑛1 … 𝑛|𝑡|, where:
• 𝑐(𝑛𝑖) is the current number of children of node 𝑛𝑖,
• 𝜆 is a preset parameter.
• Future promise is not included in the stop-criterion check.
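A sketch of one plausible form of the heuristic, summing how many children each node is still "expected" to receive; the clipping at zero and the exact summation are assumptions, not taken from the paper:

```python
def future_promise(nodes, expected_children, lam=0.3):
    """Assumed form of the future-promise heuristic:
    fp(t) = lam * sum_i max(0, Ec(n_i) - c(n_i)).
    `nodes` is a list of (node_type, current_child_count) pairs and
    `expected_children` maps node types to Ec(n)."""
    return lam * sum(
        max(0.0, expected_children[ntype] - c) for ntype, c in nodes
    )

Ec = {"v:fin": 2.0, "n:obj": 0.5}
fp = future_promise([("v:fin", 1), ("n:obj", 1)], Ec, lam=0.3)
# 0.3 * (max(0, 2.0 - 1) + max(0, 0.5 - 1)) = 0.3
```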
17. Averaging Weight + Parallel Training
• Uses the iterative parameter mixing approach (McDonald et al., 2010).
• The training data are split into several parts, trained in parallel.
• The updated weights are averaged after each pass through the training data.
• Weights are recorded after each training pass and averaged at the end to give the final weights.
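The mixing step reduces to averaging sparse weight vectors across shards; a minimal sketch:

```python
def average_weights(shard_weights):
    """Mix perceptron weights from parallel shards by averaging
    (the mixing step of iterative parameter mixing).
    Each shard's weights are sparse dicts; absent entries count as 0."""
    mixed = {}
    for w in shard_weights:
        for name, value in w.items():
            mixed[name] = mixed.get(name, 0.0) + value
    n = float(len(shard_weights))
    return {name: value / n for name, value in mixed.items()}

mixed = average_weights([{"depth": 1.0, "nodes": 2.0}, {"depth": 3.0}])
```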
18. Surface Realizer
• Built using Treex NLP toolkit (Ptacek, 2008).
• A simple rule-based pipeline covering:
1. Agreement
2. Word Ordering
3. Compound verb forms
4. Grammatical words
5. Punctuation
6. Word Inflection
7. Phonetic Changes
• A round-trip test (automatic analysis followed by generation) reached 89.79% BLEU.
19. Features
• current tree properties: depth, #nodes, #repeated nodes
• tree and input DA: #nodes per SVP, #repeated nodes per SVP
• node features: lemma, formeme, and #children of all nodes in the current tree
• input features: whole SVPs (slot + value), slots alone, and pairs of slots in the DA
• combinations of node and input features
• repeat features: repeated lemma-formeme pairs combined with repeated slots in the input DA
• dependency features: parent-child pairs of lemmas + formemes, including left/right position
• sibling features: sibling pairs of lemmas + formemes, combined with SVPs
• bigram features: pairs of lemmas + formemes adjacent in left-to-right tree order, combined with SVPs
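A few of these features can be sketched for a flattened node list; the feature-name formats are invented for illustration, and real extraction would also walk the dependency and sibling relations:

```python
def tree_features(nodes, slots):
    """Sketch of a few of the global features above.
    `nodes` is a list of (lemma, formeme) pairs, `slots` the DA's slot names.
    """
    feats = {"n_nodes": float(len(nodes))}        # current tree property
    if slots:
        feats["nodes_per_slot"] = len(nodes) / float(len(slots))
    for lemma, formeme in nodes:                  # node features
        feats["node=%s|%s" % (lemma, formeme)] = 1.0
    for slot in slots:                            # input features
        feats["slot=%s" % slot] = 1.0
        for lemma, formeme in nodes:              # node x input combinations
            feats["node=%s|%s&slot=%s" % (lemma, formeme, slot)] = 1.0
    return feats

f = tree_features([("be", "v:fin"), ("restaurant", "n:obj")],
                  ["name", "eattype"])
```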
20. Setup
• d, the number of consecutive iterations without score improvement before stopping:
• Training = 3
• Testing = 4
• Maximum 200 sentence planner iterations per input DA.
• 𝛼 = 0.1
• If fp is used then 𝛼 = 0.3
• 10-fold cross-validation is used in the experiments.
21. Results
• The proposed method improves both BLEU and NIST when the whole training portion is used.
• Compared to previous work (67% BLEU), the score is still lower, but the task is harder since no alignments are used.
• Larger training data better demonstrate the effectiveness of the proposed method.
• Both improvements are statistically significant at the 95% confidence level (Koehn, 2004).
23. Discussion
+ The generator learns to produce meaningful utterances that correspond well to the input DA.
- Not all required information is always present.
- Some facts are sometimes repeated, or irrelevant information appears.
◦ This occurs because of data sparsity.
◦ A remedy would be scorer features that discourage conflicting information.
- Repeated slots in the input are not handled correctly.
24. Conclusion
• The paper presented an NLG system capable of learning from unaligned pairs of DAs and sentences.
• The contribution consists of an A*-based sentence planner and rule-based surface realization using the Treex toolkit.
• The empirical results are promising; although they do not surpass previous work, the task addressed here is substantially harder.
• Code: https://github.com/UFAL-DSG/tgen