2. Abstract
• Beam Search & Greedy Strategy
• Improvement vs. Cost
• Actor → decoder.hidden_state[t]
• Train: take K beam-search outputs and pick the argmax under a target quality
metric such as BLEU (a pseudo-parallel corpus built by the base model);
• No reinforcement learning; universal (architecture-agnostic);
• Experiments: 3 corpora & 3 architectures [Q↑ & S↑]
3. Intro
• Seq2seq: conditioned left-to-right generation
• Search space: infinite in principle, exponential in seq_len
• Greedy & Beam: beam ≈ +2 BLEU | +3 ROUGE over greedy
• Related: termination criterion & search function
• Train: ordinary backpropagation on a model-specific corpus
• Corpus: generated by running the un-augmented model on the training set with
large-beam search, and selecting outputs from the resulting k-best list that
score highly on the target metric.
• Evaluation:
• RNN-based (Luong et al., 2015), ConvS2S (Gehring et al., 2017) and Transformer (Vaswani et
al., 2017)
• IWSLT16 De-En, WMT15 Fi-En and WMT14 De-En
4. Background
• 2.1 NMT
• 2.2 Decoding
• Greedy (1); Beam (K)
• Noisy parallel approximate decoding (NPAD; Cho, 2016)
Noise → decoder.hidden_state[t]
[idea] Stay active, even at random! (like studying better at a home café)
• Trainable Greedy Decoding (Gu et al., 2017)
FFNN RL actor → decoder.hidden_state[t] (after all, at the lab!)
approximates the maximum-a-posteriori output → BLEU [unstable, unfortunately]
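The greedy (K = 1) vs. beam (K) decoding contrast in 2.2 can be sketched in Python; `toy_step` is a hypothetical per-step distribution standing in for a real NMT decoder, not anything from the paper:

```python
import math

def greedy_decode(step_probs, bos, eos, max_len=10):
    """Greedy decoding (K = 1): keep only the single best token per step."""
    seq = [bos]
    for _ in range(max_len):
        probs = step_probs(seq)                 # dict: token -> probability
        tok = max(probs, key=probs.get)
        seq.append(tok)
        if tok == eos:
            break
    return seq

def beam_decode(step_probs, bos, eos, k=3, max_len=10):
    """Beam search: keep the K partial hypotheses with highest log-probability."""
    beams = [([bos], 0.0)]                      # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = [(seq + [tok], lp + math.log(p))
                      for seq, lp in beams
                      for tok, p in step_probs(seq).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, lp in candidates:
            (finished if seq[-1] == eos else beams).append((seq, lp))
            if len(beams) == k:
                break
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]

def toy_step(seq):
    """Hypothetical per-step distribution (assumption, not the paper's model)."""
    if len(seq) >= 3:
        return {"</s>": 0.9, "a": 0.05, "b": 0.05}
    return {"a": 0.5, "b": 0.3, "</s>": 0.2}
```

With beam size K the decoder keeps K hypotheses per step instead of 1, which is where the +BLEU/+ROUGE gains over greedy come from.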
5. Method
• I/O: a_t = actor(h_t, e_t, s_{t-1}) ← decoder hidden / attention context / previous actor state
• Form of actor: ff (Eq. 5), ff2 (Eq. 6), GRU (Eq. 7), gated ff (Eq. 8) …
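The gated feedforward variant can be sketched with NumPy; the dimensions, weight names, and initialization below are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                          # toy hidden size (assumption)

# Illustrative parameter shapes for a gated feedforward actor:
# the input is [h_t; e_t; s_prev], so both projections map 3d -> d.
W_a = rng.normal(scale=0.1, size=(d, 3 * d))   # action projection
W_g = rng.normal(scale=0.1, size=(d, 3 * d))   # gate projection

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def actor(h_t, e_t, s_prev):
    """Compute a bounded additive adjustment a_t to the decoder state h_t."""
    x = np.concatenate([h_t, e_t, s_prev])
    gate = sigmoid(W_g @ x)                    # per-dimension: how much to adjust
    a_t = gate * np.tanh(W_a @ x)              # |a_t[i]| < 1 by construction
    return h_t + a_t, a_t                      # adjusted state, and the action

h_t, e_t = rng.normal(size=d), rng.normal(size=d)
new_h, a_t = actor(h_t, e_t, np.zeros(d))
```

Because the action is gated and tanh-bounded, it nudges h_t rather than overwriting it, which is consistent with the small-L2-norm observation in slide 13.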
7. Training
• Pseudo-parallel corpus generated by a base model:
• High model likelihood (not necessarily the highest)
• High-quality translations (not necessarily the highest)
• Gen: K beam candidates (high internal likelihood) → argmax of the external score (target metric)
• Train the actor with pseudo-D and fixed base model
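The pseudo-parallel corpus construction above can be sketched as follows; `kbest_fn` stands in for large-beam search with the base model, and the unigram F1 here is a hypothetical stand-in for a sentence-level metric such as BLEU:

```python
def unigram_f1(candidate, reference):
    """Hypothetical stand-in for a sentence-level target metric like BLEU."""
    c, r = set(candidate), set(reference)
    overlap = len(c & r)
    if overlap == 0:
        return 0.0
    prec, rec = overlap / len(c), overlap / len(r)
    return 2 * prec * rec / (prec + rec)

def build_pseudo_corpus(sources, references, kbest_fn, metric=unigram_f1):
    """Silver-standard pairs: for each source, keep the k-best candidate
    (high model likelihood) that scores highest on the target metric."""
    corpus = []
    for src, ref in zip(sources, references):
        candidates = kbest_fn(src)             # k-best list, large-beam search
        best = max(candidates, key=lambda cand: metric(cand, ref))
        corpus.append((src, best))
    return corpus

# Toy usage (hypothetical k-best list; a real one comes from the base model):
kbest = lambda src: [["i", "is"], ["i", "am"], ["you", "are"]]
pseudo = build_pseudo_corpus([["ich", "bin"]], [["i", "am"]], kbest)
```

The actor is then trained on these (source, silver target) pairs by ordinary backpropagation while the base model stays frozen.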
8. Experiments
• 4.1 Settings
• IWSLT16, tst2013 (validation) and tst2014 (test)
• WMT15, newstest2013(validation) and newstest2015(test)
• WMT14, newstest2013(validation) and newstest2014(test)
• + BPE
• Evaluation:
• tokenized and cased BLEU (primary).
• METEOR and TER, multeval with tokenized and case-insensitive scoring.
• Base models are trained from scratch, except for ConvS2S WMT14 En-De
translation (trained model as well as training data) provided by Gehring et al. (2017).
• RNN: OpenNMT's default, Luong; (rnn, emb) sizes = (500, 500) and (600, 300)
• ConvS2S: IWSLT16 and WMT settings
• Transformer: Gu et al. (2018)
• (Pseudo-D beam k = 35)
12. Two Questions
• Two factors, actor & pseudo-D: which one matters?
• Is the silver standard a better choice than the gold one?
(The pseudo-D seems much kinder to the little driver/actor)
13. Impact
[Figure] Likelihood (conditional LM) and magnitude (L2 norm) of the action
vector over the training course, on the IWSLT16 De-En validation set with
Transformer.
This suggests that the action adjusts the decoder's hidden state slightly,
rather than overwriting it, enabling the model to find a sequence that is not
the most highly scored by the model but corresponds to a high value of the
target metric. (more confident)
14. Actor & Data
Yes, silver data is the best.
Even bronze is better than gold!
16. Impressions
• Modification of network:
• Elements & Structure (organs of a body)
• A little guy on the shoulder of a blind giant (Transformer).
• Contextual Parameter Generator: language embedding
• The power of data seems far greater than elaborate network engineering.
A little actor can make the giant more flexible.