This document summarizes a research paper that presents Grid Beam Search (GBS), an algorithm that extends beam search to allow inclusion of pre-specified lexical constraints in the output sequence generation. The researchers demonstrate GBS can provide improvements in translation quality for interactive translation scenarios by incorporating user feedback as constraints. They also show GBS enables significant gains in domain adaptation without retraining by using terminology from the target domain as constraints.
Presented at #PHPLX 11 September 2013
The 2013 edition of OWASP (Open Web Application Security Project) top 10 has just been released and unfortunately Injections (not only SQL injection) is still the most common security problem. In this talk we will review the top 10 list of security problems looking at possible attack scenarios and ways to protect against them mostly from a PHP programmer perspective.
The project that we have undertaken aims to develop a 2D Adventures game. The
project “Game Development (Platformer Game Development)” includes the following
functionalities:
Realistic Experience
2D view
Good Graphics
Faster performance
Offline Game
No need for the Internet
Presented at #PHPLX 11 September 2013
The 2013 edition of OWASP (Open Web Application Security Project) top 10 has just been released and unfortunately Injections (not only SQL injection) is still the most common security problem. In this talk we will review the top 10 list of security problems looking at possible attack scenarios and ways to protect against them mostly from a PHP programmer perspective.
The project that we have undertaken aims to develop a 2D Adventures game. The
project “Game Development (Platformer Game Development)” includes the following
functionalities:
Realistic Experience
2D view
Good Graphics
Faster performance
Offline Game
No need for the Internet
Conformer, Gulati, Anmol, et al. "Conformer: Convolution-augmented Transformer for Speech Recognition." arXiv preprint arXiv:2005.08100 (2020). review by June-Woo Kim
My slides for Connectionist Temporal Classification (CTC) for automatic speech recognition (ASR) for an end-to-end (E2E) ASR speech recognition seminar at Aalto University spring 2020.
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREijcga
We develop a polygonal mesh simplification algorithm based on a novel analysis of the mesh
geometry.
Particularly, we propose first a characterization of vertices as hyperbolic or non
-
hyperbolic depend
-
ing
upon their discrete local geometry. Subsequently, the simplification process computes a volume cost for
each non
-
hyperbolic vertex, in anal
-
ogy with spherical volume, to capture the loss of fidelity if that vertex
is decimated. Vertices of least volume cost are then successively deleted and the resulting holes re
-
triangulated using a method based on a novel heuristic. Preliminary experiments i
ndicate a performance
comparable to that of the best known mesh simplification algorithms
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. This is talk is about *the* Moore's Law, the bull that the other "Laws" ride; and how Python-powered ML helps drive it. How do we keep making ever-smaller devices? How do we harness atomic-scale physics? Large-scale machine learning is key. The computation drives new chip designs, and those new chip designs are used for new computations, ad infinitum. High-dimensional regression, classification, active learning, optimization, ranking, clustering, density estimation, scientific visualization, massively parallel processing -- it all comes into play, and Python is powering it all.
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
안녕하세요 딥러닝 논문읽기 모임 입니다! 오늘 소개 드릴 논문은 Once-for-All: Train One Network and Specialize it for Efficient Deployment 라는 제목의 논문입니다.
모델을 실제로 하드웨어에 Deploy하는 그 상황을 보고 있는데 이 페이퍼에서 꼽고 있는 가장 큰 문제는 실제로 트레인한 모델을 Deploy할 하드웨어 환경이 너무나도 많다는 문제가 하나 있습니다 모든 디바이스가 갖고 있는 리소스가 다르기 때문에 모든 하드웨어에 맞는 모델을 찾기가 사실상 불가능하다는 문제를 꼽고 있고요
각 하드웨어에 맞는 옵티멀한 네트워크 아키텍처가 모두 다른 상황에서 어떻게 해야 될건지에 대한 고민이 일반적 입니다. 이제 할 수 있는 접근중에 하나는 각 하드웨어에 맞게 옵티멀한 아키텍처를 모두 다 찾는 건데 그게 사실상 너무나 많은 계산량을 요구하기 때문에 불가능하다라는 문제를 갖고 있습니다 삼성 노트 10을 예로 한 어플리케이션의 requirement가 20m/s로 그 모델을 돌려야 된다는 요구사항이 있으면은 그 20m/s 안에 돌 수 있는 모델이 뭔지 accuracy가 뭔지 이걸 찾기 위해서는 파란색 점들을 모두 찾아야 되고 각 점이 이제 트레이닝 한번을 의미하게 됩니다 그래서 사실상 다 수의 트레이닝을 다 해야지만 그 중에 뭐가 최적인지 또 찾아야 합니다. 실제 Deploy해야 되는 시나리오가 늘어나면 이게 리니어하게 증가하기 때문에
각 하드웨어에 맞는 그런 옵티멀 네트워크를 찾는게 사실상 불가능합니다.
그래서 이제 OFA에서 제안하는 어프로치는 하나의 네트워크를 한번 트레이닝 하고 나면 다시 하드웨어에 맞게 트레이닝할 필요 없이 그냥 각 환경에 맞게 가져다 쓸 수 있는 서브네트워크를 쓰면 된다 이게 주로 메인으로 사용하고 있는 어프로치입니다.
오늘 논문 리뷰를 위해 펀디멘탈팀 김동현님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIijtsrd
In Matrix multiplication we refer to a concept that is used in technology applications such as digital image processing, digital signal processing and graph problem solving. Multiplication of huge matrices requires a lot of computing time as its complexity is O n3 . Because most engineering science applications require higher computational throughput with minimum time, many sequential and analogue algorithms are developed. In this paper, methods of matrix multiplication are elect, implemented, and analyzed. A performance analysis is evaluated, and some recommendations are given when using open MP and MPI methods of parallel of latitude computing. Adamu Abubakar I | Oyku A | Mehmet K | Amina M. Tako ""Comprehensive Performance Evaluation on Multiplication of Matrices using MPI""
Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-2 , February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30015.pdf
Paper Url : https://www.ijtsrd.com/engineering/electrical-engineering/30015/comprehensive-performance-evaluation-on-multiplication-of-matrices-using-mpi/adamu-abubakar-i
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Exploiting Monolingual Data at Scale for Neural Machine TranslationSatoru Katsumata
- TMU Komachi lab
- paper reading (EMNLP2019)
- Exploiting Monolingual Data at Scale for Neural Machine Translation. Wu et al., EMNLP2019
- paper URL: https://www.aclweb.org/anthology/D19-1430.pdf
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...Satoru Katsumata
- TMU Komachi lab
- paper reading (ACL2019)
- Incorporating Syntactic and Semantic Information in
Word Embeddings using Graph Convolutional Networks. Vashishth et al., ACL2019
- paper URL: https://www.aclweb.org/anthology/P19-1320.pdf
More Related Content
Similar to Lexically constrained decoding for sequence generation using grid beam search
Conformer, Gulati, Anmol, et al. "Conformer: Convolution-augmented Transformer for Speech Recognition." arXiv preprint arXiv:2005.08100 (2020). review by June-Woo Kim
My slides for Connectionist Temporal Classification (CTC) for automatic speech recognition (ASR) for an end-to-end (E2E) ASR speech recognition seminar at Aalto University spring 2020.
M ESH S IMPLIFICATION V IA A V OLUME C OST M EASUREijcga
We develop a polygonal mesh simplification algorithm based on a novel analysis of the mesh
geometry.
Particularly, we propose first a characterization of vertices as hyperbolic or non
-
hyperbolic depend
-
ing
upon their discrete local geometry. Subsequently, the simplification process computes a volume cost for
each non
-
hyperbolic vertex, in anal
-
ogy with spherical volume, to capture the loss of fidelity if that vertex
is decimated. Vertices of least volume cost are then successively deleted and the resulting holes re
-
triangulated using a method based on a novel heuristic. Preliminary experiments i
ndicate a performance
comparable to that of the best known mesh simplification algorithms
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
People talk about a Moore's Law for gene sequencing, a Moore's Law for software, etc. This is talk is about *the* Moore's Law, the bull that the other "Laws" ride; and how Python-powered ML helps drive it. How do we keep making ever-smaller devices? How do we harness atomic-scale physics? Large-scale machine learning is key. The computation drives new chip designs, and those new chip designs are used for new computations, ad infinitum. High-dimensional regression, classification, active learning, optimization, ranking, clustering, density estimation, scientific visualization, massively parallel processing -- it all comes into play, and Python is powering it all.
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
안녕하세요 딥러닝 논문읽기 모임 입니다! 오늘 소개 드릴 논문은 Once-for-All: Train One Network and Specialize it for Efficient Deployment 라는 제목의 논문입니다.
모델을 실제로 하드웨어에 Deploy하는 그 상황을 보고 있는데 이 페이퍼에서 꼽고 있는 가장 큰 문제는 실제로 트레인한 모델을 Deploy할 하드웨어 환경이 너무나도 많다는 문제가 하나 있습니다 모든 디바이스가 갖고 있는 리소스가 다르기 때문에 모든 하드웨어에 맞는 모델을 찾기가 사실상 불가능하다는 문제를 꼽고 있고요
각 하드웨어에 맞는 옵티멀한 네트워크 아키텍처가 모두 다른 상황에서 어떻게 해야 될건지에 대한 고민이 일반적 입니다. 이제 할 수 있는 접근중에 하나는 각 하드웨어에 맞게 옵티멀한 아키텍처를 모두 다 찾는 건데 그게 사실상 너무나 많은 계산량을 요구하기 때문에 불가능하다라는 문제를 갖고 있습니다 삼성 노트 10을 예로 한 어플리케이션의 requirement가 20m/s로 그 모델을 돌려야 된다는 요구사항이 있으면은 그 20m/s 안에 돌 수 있는 모델이 뭔지 accuracy가 뭔지 이걸 찾기 위해서는 파란색 점들을 모두 찾아야 되고 각 점이 이제 트레이닝 한번을 의미하게 됩니다 그래서 사실상 다 수의 트레이닝을 다 해야지만 그 중에 뭐가 최적인지 또 찾아야 합니다. 실제 Deploy해야 되는 시나리오가 늘어나면 이게 리니어하게 증가하기 때문에
각 하드웨어에 맞는 그런 옵티멀 네트워크를 찾는게 사실상 불가능합니다.
그래서 이제 OFA에서 제안하는 어프로치는 하나의 네트워크를 한번 트레이닝 하고 나면 다시 하드웨어에 맞게 트레이닝할 필요 없이 그냥 각 환경에 맞게 가져다 쓸 수 있는 서브네트워크를 쓰면 된다 이게 주로 메인으로 사용하고 있는 어프로치입니다.
오늘 논문 리뷰를 위해 펀디멘탈팀 김동현님이 자세한 리뷰를 도와주셨습니다 많은 관심 미리 감사드립니다!
Comprehensive Performance Evaluation on Multiplication of Matrices using MPIijtsrd
In Matrix multiplication we refer to a concept that is used in technology applications such as digital image processing, digital signal processing and graph problem solving. Multiplication of huge matrices requires a lot of computing time as its complexity is O n3 . Because most engineering science applications require higher computational throughput with minimum time, many sequential and analogue algorithms are developed. In this paper, methods of matrix multiplication are elect, implemented, and analyzed. A performance analysis is evaluated, and some recommendations are given when using open MP and MPI methods of parallel of latitude computing. Adamu Abubakar I | Oyku A | Mehmet K | Amina M. Tako ""Comprehensive Performance Evaluation on Multiplication of Matrices using MPI""
Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-2 , February 2020,
URL: https://www.ijtsrd.com/papers/ijtsrd30015.pdf
Paper Url : https://www.ijtsrd.com/engineering/electrical-engineering/30015/comprehensive-performance-evaluation-on-multiplication-of-matrices-using-mpi/adamu-abubakar-i
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Exploiting Monolingual Data at Scale for Neural Machine TranslationSatoru Katsumata
- TMU Komachi lab
- paper reading (EMNLP2019)
- Exploiting Monolingual Data at Scale for Neural Machine Translation. Wu et al., EMNLP2019
- paper URL: https://www.aclweb.org/anthology/D19-1430.pdf
Incorporating Syntactic and Semantic Information in Word Embeddings using Gra...Satoru Katsumata
- TMU Komachi lab
- paper reading (ACL2019)
- Incorporating Syntactic and Semantic Information in
Word Embeddings using Graph Convolutional Networks. Vashishth et al., ACL2019
- paper URL: https://www.aclweb.org/anthology/P19-1320.pdf
How Contextual are Contextualized Word Representations?Satoru Katsumata
- TMU Komachi lab
- paper reading
- How Contextual are Contextualized Word Representations? Kawin Ethayarajh. EMNLP2019
- paper URL: https://arxiv.org/abs/1909.00512
- 首都大 小町研 NAACL 2019 読み会
- Word-Node2Vec: Improving Word Embedding with Document-Level Non-Local Word Co-occurrences. Sen et al., NAACL2019 を紹介
- paper URL: https://www.aclweb.org/anthology/N19-1109.pdf
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
What is greenhouse gasses and how many gasses are there to affect the Earth.moosaasad1975
What are greenhouse gasses how they affect the earth and its environment what is the future of the environment and earth how the weather and the climate effects.
Richard's aventures in two entangled wonderlandsRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
2. Abstract
● They present Grid Beam Search (GBS), an algorithm which extends beam
search to allow the inclusion of pre-specified lexical constraints.
● They demonstrate the feasibility and flexibility of Lexically Constrained
Decoding by conducting 2 experiments.
○ Neural Interactive-Predictive Translation
○ Domain Adaptation for Neural Machine Translation
● Result
○ GBS can provide large improvements in translation quality in interactive scenarios.
○ even without any user input, GBS can be used to achieve significant gains in performance
in domain adaptation scenarios.
2
3. Introduction: What is this research?
● The output of many natural language processing model is a sequence of text.
→ Especially, they focus on machine translation usecases.†
○ such as Post-Editing (PE) and Interactive-Predictive MT
● Their goal is to find a way to force the output of a model to contain such lexical
constraints.
● In this paper, they propose a decoding algorithm which allows the specification
of subsequences to be present in a model’s output.
3
input model
output
(sequenc
e)
†: lexically constrained decoding is relevant to any
scenario where a model is asked to generate a
secuence given an input.
(image description, dialog generation, abstractive
summarization, question answering)
5. Grid Beam Search: Abstract
● their goal is to organize decoding in such a way that they can constrain the
search space to outputs which contain one or more pre-specified
sub-sequences.
○ want to place lexical constraints correctly and
to generate the parts of the output which are not covered by the constraints.
5time step (t)
Constraint
Coverage
(c)
constraints
is an array of sequences.
← constraints
= [[Ihre, Rechte, müssen],
[Abreise, geschützt]]
more detail on the next slides →
6. Grid Beam Search: Algorithm
6
numC: total number of tokens in all
constaints
k: beam window size
(this pseudo-code is for any model)
1. open hypotheses can either
generate from the model’s
distribution or start constraints
2. closed hypotheses can only
generate the next token for in a
currently unfinished constraint
1. open hypotheses (Grid[t-1][c]) may
generate continuations from the
model’s distribution
2. open hypotheses (Grid[t-1][c-1])
may start new constraints
3. closed hypotheses (Grid[t-1][c-1])
may continue constraints
7. Grid Beam Search: Deteil
7
time step (t)
Constraint
Coverage
(c)
a colored strip inside a beam box
represents an individual hypothesis
in the beam’s k-best stack.
Hypotheses with circles are closed,
all other hypotheses are open.
← conceptual drawing
8. Grid Beam Search: Multi-token Constraints
● By distinguishing between open and closed hypotheses,
→ allow for arbitrary multi-token phrases in the search.
● Each hypothesis maintains a coverage vector.
○ ensure that constraints cannot be repeated in a search path
(hypotheses which have already covered constrainti can only generate or start new constraints)
● Discontinuous lexical constraints such as phrasal verbs in En or Ge
○ it is easy to apply GBS by adding filters to the search ↓
8
constraint
ask <someone> out
constraint1 ask
constraint2 out
constraint1 cannot be
used before constraint0
there must be at least one
generated token between
the constraints.
use 2 filters
9. Grid Beam Search: Addressing unknown words
● their decoder can handle arbitrary constraints.
→ there is a risk that constraints will contain unknown tokens.
○ Especially in domain adaptation scenarios, some user-specified constraints are very likely to
contain unseen tokens.
→ Use Subword Units. (maybe Byte Pair Encoding...)
(In the experiments, they use this technique to ensure that no input tokens are unknown, even if a constraint contains
words which never appeared in the training data.)
9
10. Grid Beam Search: Effciency
● the runtime complexity of a naive implementation of GBS is O(ktc).
(standard time-based beam search is O(tk))
↓ some consideration must be given to the efficiency of this algorithm
● the beams in each column c of this Figure are independent.
→ GBS can be parallelized to allow all beams at each timestep to be filled
simultaneously.
● they find that the most time is spent
computing the states for the hyp candidates.
→ by keeping the beam size small,
they can make GBS significantly faster.
10
11. Experiments setup (including appendix)
● The models are SOTA NMT systems using our own implementation of NMT
with attention over the source sequence (Bahdnau et al., 2014).
○ bidirectional GRUs
○ AdaDelta
○ train all systems for 500,000 iterations, with validation every 5,000 steps,
best single model from validation is used.
○ use ℓ2 regularization (α = 1e^(-5))
○ Dropout is used on the output layers, rate is 0.5.
○ beam size is 10.
● corpus
○ English-German
■ 4.4M segments from the Europarl and CommonCrawl
○ English-French
■ 4.9M segments from the Europarl and CommonCrawl
○ English-Portuguese
■ 28.5M segments from the Europarl, JRC-Aquis and OpenSubtitles
● subword
○ they created a shared subword representation for each language pair
by extracting a vocabulary of 80,000 symbols from the concatenated source and target data. 11
12. Experiments: Pick-Revise for Interactive PE
● Pick-Revise is an interaction cycle for MT Post-Editing.
● they modify this experiments.
○ the user only provides sequences of up to three words
which are missing from the hypothesis from reference.
○ in this experiments, four cycles.
○ strict setting:
the complete phrase must be missing from the hypothesis.
○ relaxed setting:
only the first word must be missing.
○ when a 3-token phrase cannot be found,
they backoff to 2-token phrases, then to single tokens as constraints.
○ If a hypothesis already matches the reference, no constraints are added.
12fig1: PRIMT: A Pick-Revise Framework for Interactive Machine Translation. Cheng et al., NAACL HLT 2016
14. Experiments: Domain Adaptation via Terminology
● The requirement for use of domain-specific terminologies is common in real
world applications of MT.
→ provide the term as constraints.
● target domain data
○ Autodesk Post-Editing corpus
↑ focused upon sofware localization, very dfferent from train data.
● Extract Terminology
○ devide the corpus into approximately 100,000 training sents, 1,000 test segments.
automatically generate a terminology by computing PMI between source and target n-grams in
training set.
○ extract all n-grams from length 2-5 as terminology
candidates.
○ filter the candidates to only include pairs
whose PMI is >= 0.9 and where both src and tgt occur in at least 5 times.
→ make the set of the terminology.
14
normarized to
[-1, +1]
15. Results: Domain Adaptation via Terminology
● confirming that this improvements is caused
by the placement of the terms by GBS,
they evaluate the other inserting methods.
○ randomly inserting terms to the baseline output
○ prepending terms to the baseline output
● even an automatically created terminology
yield improvements.
○ manually created domain terminologies
are likely to lead to greater gains.
15
test set
src tgt
terminology
phrase
corrsponding
phrase
constraints
for that segmentsadd
↑ conceptual drawing for previous slide
17. related work
● Most related work has presented modifications of SMT systems for specific
usecases. (focusing on prefix and sufix)
○ The largest body of work considers Interactive Machine Translation (IMT)
● some attention has also been given to SMT decoding with multiple lexical
constraints. (Cheng et al.,2016)
○ their approach relies upon the phrase segmentation provided by the SMT system.
○ the decoding algorithm can only make use of constraints that match phrase boundaries.
In contrast, GBS approach decodes at the token level.
● “To the best of our knowledge, ours is the first work which considers
general lexically constrained decoding for any model which outputs
sequences, without relying upon alignments between input and output, and
without using a search organized by coverage of the input.”
17
18. Conclusion
● Lexically constrained decoding is a flexible way to incorporate arbitrary
subsequences into the output of any model that generates output sequences
token-by-token.
● in interctive translation, user inputs can be used as constraints,
generating a new output each time a user fixes an error.
○ such a workflow can provide a large improvement in translation quality.
● using a domain-specific terminology to generate target-side constraints
○ a general domain model can be adapted to a new domain without anay retraining
● future work
○ evaluate GBS with models outside of MT, such as automatic summarization, image captioning
or dialog generation
○ introduce new constraint-aware models (via secondary attention mecanisms over lexical
constraints) 18
19. reference
● their implementation of GBS
○ https://github.com/chrishokamp/constrained_decoding
● Kyoto University Participation to WAT 2016. Cromieres et al., WAT2016
● PRIMT: A Pick-Revise Framework for Interactive Machine Translation. Cheng
et al., NAACL HLT 2016
19