Chapter 5: Information Extraction (IE)
and Machine Translation (MT)
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2020)
Outline
 Information Extraction (IE)
Named entity recognition and relation extraction
IE using sequence labeling
 Machine Translation (MT)
Basic issues in MT
Statistical translation
Word alignment
Phrase-based translation
Synchronous grammars
3/29/2024 2
Information Extraction (IE)
 Information Extraction, which is an area of natural language
processing, deals with finding factual information in free text.
 In formal terms, facts are structured objects, such as database
records.
 Such a record may capture a real-world entity with its attributes
mentioned in text, or a real-world event, occurrence, or state,
with its arguments or actors: who did what to whom, where and
when.
Information Extraction (IE) …
 Information is typically sought in a particular target setting,
e.g., corporate mergers and acquisitions.
 Searching for specific, targeted factual information constitutes a
large proportion of all searching activity on the part of
information consumers.
 There has been a sustained interest in Information Extraction
due to its conceptual simplicity on one hand, and to its potential
utility on the other.
 Although the targeted nature of this task makes it more tractable
than some of the more open-ended tasks in NLP, it is replete
with challenges as the information landscape evolves, which
also makes it an exciting research subject.
Information Extraction (IE) …
 The task of Information Extraction (IE) is to identify a
predefined set of concepts in a specific domain, ignoring other
irrelevant information, where a domain consists of a corpus of
texts together with a clearly specified information need.
 In other words, IE is about deriving structured factual
information from unstructured text.
For instance, consider the extraction of information on violent
events from online news, where one is interested in identifying the
main actors of the event, its location, and the number of people
affected.
Information Extraction (IE) …
 Example:
The figure below shows an example of a text snippet from a news
article about a terrorist attack and the structured information
derived from that snippet.
“Three bombs have exploded in north-eastern Nigeria, killing 25
people and wounding 12 in an attack carried out by an Islamic
sect. Authorities said the bombs exploded on Sunday afternoon in
the city of Maiduguri.”
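Although the original figure with the derived structure is not reproduced here, the kind of output IE aims for can be sketched as a simple record; the field names below are hypothetical, since a real schema would come from the task specification:

```python
# Hypothetical structured record extracted from the snippet above.
# Field names are illustrative; in practice the schema is fixed by
# the information need that defines the extraction task.
event_record = {
    "event_type": "terrorist attack",
    "instrument": "bombs",
    "num_devices": 3,
    "location": "Maiduguri, Nigeria",
    "date": "Sunday afternoon",
    "killed": 25,
    "wounded": 12,
    "perpetrator": "Islamic sect",
}

print(event_record["location"], event_record["killed"])
```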
Information Extraction (IE) …
 Information extraction (IE) systems
Find and understand limited relevant parts of texts.
Gather information from many pieces of text.
Produce a structured representation of relevant information:
relations (in the database sense), a.k.a., a knowledge base.
 Goals:
1. Organize information so that it is useful to people.
2. Put information in a semantically precise form that allows
further inferences to be made by computer algorithms.
Information Extraction (IE) …
 Tasks of information extraction:
Named Entity Recognition (NER) requires finding and classifying
names in text.
Co-reference Resolution (CO) requires the identification of
multiple (coreferring) mentions of the same entity in the text.
Relation Extraction (RE) is the task of detecting and classifying
predefined relationships between entities identified in text.
Event Extraction (EE) refers to the task of identifying events in
free text and deriving detailed and structured information about
them, ideally identifying who did what to whom, when, where,
through what methods (instruments), and why.
Information Extraction (IE) …
 Named Entity Recognition:
A very important sub-task: find and classify names in text.
For example: The decision by the independent MP Andrew
Wilkie to withdraw his support for the minority Labor
government sounded dramatic but it should not further threaten its
stability. When, after the 2010 election, Wilkie, Rob Oakeshott,
Tony Windsor and the Greens agreed to support Labor, they gave
just two guarantees: confidence and supply.
In this example, the recognized names are classified into entity
types such as Person (e.g., Andrew Wilkie), Date (e.g., 2010), and
Organization (e.g., Labor, the Greens); other texts may also contain
Location entities.
Information Extraction (IE) …
 Named Entity Recognition:
The uses:
Named entities can be indexed, linked off, etc.
Sentiment can be attributed to companies or products
A lot of IE relations are associations between named entities
For question answering, answers are often named entities
Concretely:
Many web pages tag various entities, with links to bio or topic pages, etc.
Reuters’ OpenCalais, Evri, AlchemyAPI, Yahoo’s Term Extraction, …
Apple/Google/Microsoft/… smart recognizers for document content
Information Extraction (IE) …
 Named Entity Recognition Task:
Predict the entities in a text.
Information Extraction (IE) …
 Named Entity Recognition Task:
Three standard approaches to NER (and IE)
Hand-written regular expressions
Perhaps stacked
Using classifiers
Generative: Naïve Bayes
Discriminative: Maxent models
Sequence models
HMMs
MEMMs
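As a rough illustration of the first approach, a hand-written-regex recognizer can be sketched as follows; the patterns and entity labels are invented for this example and are far simpler than what a production system would use:

```python
import re

# A minimal sketch of the "hand-written regular expressions" approach
# to NER. Real systems stack many such patterns and add gazetteers;
# these three rules are illustrative only.
PATTERNS = [
    ("DATE", re.compile(r"\b(?:19|20)\d{2}\b")),                # years like 2010
    ("ORGANIZATION", re.compile(r"\b[A-Z][a-zA-Z]+ (?:Inc|Corp|Ltd)\.?")),
    ("PERSON", re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+\b")),  # title + surname
]

def tag_entities(text):
    """Return (entity_type, matched_string) pairs found by the rules."""
    found = []
    for label, pat in PATTERNS:
        for m in pat.finditer(text):
            found.append((label, m.group()))
    return found

print(tag_entities("Dr. Smith joined Acme Corp. after the 2010 election."))
```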
Information Extraction (IE) …
 Relation Extraction (RE)
Relation Extraction (RE) is the task of detecting and classifying
predefined relationships between entities identified in text.
For example:
Employee Of(Steve Jobs, Apple): a relation between a person and an
organisation, extracted from ‘Steve Jobs works for Apple’
Located In(Smith, New York): a relation between a person and location,
extracted from ‘Mr. Smith gave a talk at the conference in New York’,
Subsidiary Of(TVN, ITI Holding): a relation between two companies,
extracted from ‘Listed broadcaster TVN said its parent company, ITI
Holdings, is considering various options for the potential sale.’
Note that although, in general, the set of relations that may be of
interest is unlimited, the set of relations within a given task is
predefined and fixed as part of the specification of the task.
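The pattern-matching flavor of RE can be sketched in a few lines; the patterns and relation names below are hypothetical, and real RE systems typically use classifiers over features of the entity pair rather than raw regexes:

```python
import re

# A toy sketch of pattern-based relation extraction: lexical patterns
# map surface forms such as "X works for Y" to a predefined relation.
# The patterns and relation names are illustrative only.
RELATION_PATTERNS = [
    (re.compile(r"(?P<per>[A-Z][a-z]+ [A-Z][a-z]+) works for (?P<org>[A-Z][a-zA-Z]+)"),
     "EmployeeOf"),
    (re.compile(r"(?P<per>(?:Mr|Ms|Dr)\. [A-Z][a-z]+) .* in (?P<loc>[A-Z][a-z]+(?: [A-Z][a-z]+)?)"),
     "LocatedIn"),
]

def extract_relations(sentence):
    """Return (relation, arg1, arg2) triples matched in the sentence."""
    triples = []
    for pat, rel in RELATION_PATTERNS:
        m = pat.search(sentence)
        if m:
            triples.append((rel,) + tuple(m.groupdict().values()))
    return triples

print(extract_relations("Steve Jobs works for Apple"))
print(extract_relations("Mr. Smith gave a talk at the conference in New York"))
```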
Information Extraction (IE) …
 IE using sequence labeling
Many information extraction tasks can be formulated as sequence
labeling tasks. Sequence labelers assign a class label to each item
in a sequential structure.
Sequence labeling methods are appropriate for problems where
the class of an item depends on other (typically nearby) items in
the sequence.
Examples of sequence labeling tasks: part-of-speech tagging,
syntactic chunking (breaking text into pieces), named entity
recognition.
A naive approach would consider all possible label sequences and
choose the best one, but that is too expensive; we need more
efficient methods.
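What a sequence labeler produces for NER can be illustrated with the common BIO encoding; the tokens and labels below are hand-assigned for illustration:

```python
# NER cast as sequence labeling: each token gets one label. The BIO
# scheme marks the Beginning and Inside of an entity span, and O for
# tokens outside any entity. These labels are hand-assigned to show
# the target output a sequence labeler is trained to produce.
tokens = ["Andrew", "Wilkie", "agreed", "to", "support", "Labor"]
labels = ["B-PER",  "I-PER",  "O",      "O",  "O",       "B-ORG"]

def spans(tokens, labels):
    """Recover (entity_type, text) spans from a BIO-labeled sequence."""
    result, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                result.append((etype, " ".join(current)))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                result.append((etype, " ".join(current)))
            current, etype = [], None
    if current:
        result.append((etype, " ".join(current)))
    return result

print(spans(tokens, labels))
# → [('PER', 'Andrew Wilkie'), ('ORG', 'Labor')]
```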
Information Extraction (IE) …
 IE using sequence labeling …
Markov Models
A Markov Chain is a finite-state automaton that has a probability
associated with each transition (arc), where the input uniquely
defines the transitions that can be taken.
In a first-order Markov chain, the probability of a state depends only
on the previous state, where qi ∈ Q are states:
Markov Assumption: P(qi | q1...qi−1) = P(qi | qi−1)
The probabilities of all of the outgoing arcs of a state must
sum to 1.
The Markov chain can be traversed to compute the probability of a
particular sequence of labels.
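The traversal described above can be sketched directly; the states and transition probabilities below are made up, but note that each row of the transition table sums to 1, as required for the outgoing arcs of a state:

```python
# Under the first-order Markov assumption, the probability of a label
# sequence factorizes as P(q1) * P(q2|q1) * ... * P(qn|q(n-1)).
# The start and transition distributions here are hypothetical.
start = {"Noun": 0.6, "Verb": 0.4}
trans = {
    "Noun": {"Noun": 0.3, "Verb": 0.7},
    "Verb": {"Noun": 0.8, "Verb": 0.2},
}

def sequence_prob(states):
    """Probability of a state sequence under the first-order chain."""
    p = start[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(sequence_prob(["Noun", "Verb", "Noun"]))  # 0.6 * 0.7 * 0.8 ≈ 0.336
```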
Machine Translation
 Machine translation is a sub-field of computational linguistics
that investigates the use of software to translate text or speech
from one language to another.
 On a basic level, MT performs simple substitution of words in
one language for words in another, but that alone usually cannot
produce a good translation of a text because recognition of
whole phrases and their closest counterparts in the target
language is needed.
 Solving this problem with corpus statistical and neural
techniques is a rapidly growing field that is leading to better
translations, handling differences in linguistic typology,
translation of idioms, and the isolation of anomalies.
Machine Translation…
 Current machine translation software often allows for
customization by domain or profession (such as weather
reports), improving output by limiting the scope of allowable
substitutions.
 This technique is particularly effective in domains where formal
or formulaic language is used.
 It follows that machine translation of government and legal
documents more readily produces usable output than
conversation or less standardized text.
Machine Translation…
 Improved output quality can also be achieved by human
intervention:
 For example, some systems are able to translate more accurately
if the user has unambiguously identified which words in the text
are proper names.
 With the assistance of these techniques, MT has proven useful as
a tool to assist human translators and, in a very limited number of
cases, can even produce output that can be used as it is (e.g.,
weather reports).
Machine Translation…
 The progress and potential of machine translation have been
debated much through its history.
 Since the 1950s, a number of scholars have questioned the
possibility of achieving fully automatic machine translation of
high quality.
 Some critics claim that there are in-principle obstacles to
automating the translation process.
Machine Translation…
 In building MT models, there are two major problems that need
to be addressed:
 Word Order: Translation is normally done at the sentence level,
and it might very well be that the last token in the source sentence
is the key informant to the first token in the target sentence.
 Word Choice: Each source token can be represented in the target
language in a variety of ways.
 These two problems are not independent and the order in which
source tokens are translated directly affects which words
might be used in the output sentence.
Machine Translation…
 Example: Arabic–English MT test
 Arabic: Ezp AbrAhym ystqbl ms&wlA AqtSAdyA sEwdyA fy
bgdAd
 English1: Izzet Ibrahim Meets Saudi Trade Official in Baghdad
 English2: Izzat Ibrahim Welcomes a Saudi Economic Official to
Baghdad
 English3: A Saudi Arabian Economic Official is welcomed by
Izzat Ibrahim to Baghdad
Machine Translation…
 While the first two translations in English are official
translations, the last one is rewritten as an example; here, the
Romanized Arabic word “ystqbl” gives rise to the passive
construction “is welcomed by.”
 All three references essentially capture the meaning of the
sentence in Arabic and the order of the translation leads to
different choices for the target words.
Machine Translation…
 Machine Translation can be viewed as taking the source
sequence S and performing increasing amounts of analysis, as
suggested by the pyramid shown in the figure below.
 At the base of the pyramid, words can be transferred from the
source to the target language. As we go up the pyramid, the level
of sophistication increases, and at the very top we have some
representation of the meaning, which can be cast as words in
either language.
Machine Translation…
 The early measures of MT included ‘Adequacy’ and ‘Fluency’.
 Adequacy: Does the translation capture an adequate amount of the
meaning of the sentence in the source language?
 Fluency: Is the translation fluent in English?
Machine Translation…
 The list of MT metrics utilized currently is quite long and the
major alternatives are:
 (a) Translation Error Rate (TER)
 Most major MT evaluations, in addition to automatic methods,
utilize human editors to edit the system outputs and compute the
TER of the system output relative to the edited string, which is
termed Human-TER or HTER.
 (b) METEOR
 An automatic evaluation metric.
Machine Translation…
 Statistical Machine Translation (SMT)
 Warren Weaver’s memorandum (Weaver, 1955) clearly initiated
ideas in the statistical approach to MT.
 However, it was the pioneering work of the IBM group (Brown et
al., 1993) in the early 1990s that led to the renewed and sustained
interest in the statistical approach to MT.
 While initial efforts in SMT were mostly word-based, almost all
approaches now use phrases as their basic unit of translation.
 In addition, natural language parsers have been developed and
this has led to both Syntax and Hierarchical-based approaches.
Machine Translation…
 Statistical Machine Translation (SMT)…
 Statistical techniques for MT are now pervasive. Statistical
machine translation (SMT) takes a source sequence, S = [s1 s2 . . .
sK], and generates a target sequence, T∗ = [t1 t2 . . . tL], by
finding the most likely translation given by:
T∗ = argmax_T p(T | S)
 SMT is then concerned with making models for p(T|S) and
subsequently searching the space of all target strings to find the
optimal string given the source and the model.
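The decision rule can be sketched with a toy scorer; via Bayes' rule, argmax_T p(T|S) equals argmax_T p(S|T) p(T), a translation-model score times a language-model score. The candidate list and all probabilities below are invented, and a real decoder searches a vast space of target strings rather than a fixed list:

```python
import math

# A toy sketch of the SMT decision rule T* = argmax_T p(T|S),
# scored in the noisy-channel form p(S|T) * p(T). All numbers
# are hypothetical and for illustration only.
def best_translation(source, candidates, tm_logprob, lm_logprob):
    # Work in log space to avoid underflow on long sentences.
    return max(candidates,
               key=lambda t: tm_logprob(source, t) + lm_logprob(t))

# Hypothetical scores for two candidate translations of one source.
tm = {("s", "good order"): math.log(0.3), ("s", "order good"): math.log(0.4)}
lm = {"good order": math.log(0.5), "order good": math.log(0.1)}

winner = best_translation(
    "s", ["good order", "order good"],
    lambda s, t: tm[(s, t)], lambda t: lm[t])
print(winner)  # "good order": 0.3 * 0.5 = 0.15 beats 0.4 * 0.1 = 0.04
```

Note that the fluent candidate wins even though its translation-model score is lower: the language model vetoes the disfluent word order.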
Machine Translation…
 Statistical Machine Translation (SMT)…
 This approach shares much with speech recognition. Both are
sequence prediction problems and many of the tools developed
for speech recognition can be applied in SMT also.
 An equivalent approach for SMT is the direct model, in which
various models are combined log-linearly.
 The SMT problem can also be stated as searching for the target
string that maximizes the joint model, p(T, S); either Bayes
expansion of the joint model is equivalent.
Machine Translation…
 Word Alignment
 The general problem of aligning a parallel text is, more precisely,
to find its optimal parallel segmentation or bisegmentation under
some set of constraints:
Machine Translation…
 Word Alignment…
 Fundamentally, an alignment algorithm accepts as input a bitext
and produces as output a bisegmentation relation that identifies
corresponding segments between the texts.
 A bitext consists of two texts that are translations of each other. ∗
 ∗ In a “Terminological note” prefacing his book, Veronis (2000)
cites Alan Melby pointing out that the alternative term parallel
text creates an unfortunate and confusing clash with the
translation theory and terminological community, who use the
same term instead to mean what NLP and computational
linguistics researchers typically refer to as non-parallel corpora or
comparable corpora—texts in different languages from the same
domain, but not necessarily translations of each other.
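How a bisegmentation at the word level can be learned from a bitext may be sketched in the style of IBM Model 1; this is a simplified EM loop over a toy corpus invented for illustration, not a full alignment system:

```python
from collections import defaultdict

# A compact sketch of IBM-Model-1-style EM on a toy bitext. After a
# few iterations, t(e|f) concentrates on the correct word pairs, from
# which a one-best word alignment can be read off. The sentence pairs
# are invented for illustration.
bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# Initialize t(e|f) uniformly over co-occurring word pairs.
t = defaultdict(float)
for f_sent, e_sent in bitext:
    for f in f_sent:
        for e in e_sent:
            t[(e, f)] = 1.0

for _ in range(10):  # EM iterations
    count = defaultdict(float)   # expected counts c(e, f)
    total = defaultdict(float)   # marginal counts c(f)
    for f_sent, e_sent in bitext:
        for e in e_sent:
            z = sum(t[(e, f)] for f in f_sent)  # normalizer for this e
            for f in f_sent:
                frac = t[(e, f)] / z
                count[(e, f)] += frac
                total[f] += frac
    for (e, f) in count:
        t[(e, f)] = count[(e, f)] / total[f]

def align(f_sent, e_sent):
    """One-best alignment: each target word picks its likeliest source."""
    return [(e, max(f_sent, key=lambda f: t[(e, f)])) for e in e_sent]

print(align(["das", "buch"], ["the", "book"]))
```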
Machine Translation…
 Word Alignment…
 Bitext alignment fundamentally lies at the heart of all data-driven
machine translation methods, and the rapid research progress on
alignment reflects the advent of statistical machine translation
(SMT) and example-based machine translation (EBMT)
approaches.
 Yet the importance of alignment extends as well to many other
practical applications for translators, bilingual lexicographers, and
even ordinary readers.
Machine Translation…
 Word Alignment…
 Automatically learned resources for MT, NLP, and humans
 Bitext alignment methods are the core of many methods for machine
learning of language resources to be used by SMT or other NLP
applications, as well as human translators and linguists.
 The side effects of alignment are often of more interest than the
aligned text itself.
Machine Translation…
 Word Alignment…
 Automatically learned resources for MT, NLP, and humans…
 Alignment algorithms offer the possibility of extracting various
sorts of knowledge resources, such as
 (a) phrasal bilexicons listing word or collocation translations;
 (b) translation examples at the sentence, constituent, and/or phrase
level; or
 (c) tree-structured translation patterns such as transfer rules, translation
frames, or treelets.
 Such resources constitute a database that may be used by SMT and
EBMT systems, or they may be taken as training data for further
machine learning to mine deeper patterns.
 Alignment has also been employed to infer sentence bracketing or
constituent structure as a side effect.
Machine Translation…
 Word Alignment…
 Techniques have been developed for aligning segments at various
granularities: documents, paragraphs, sentences, constituents,
collocations or phrases, words, and characters.
Machine Translation…
 Word Alignment…
 Given a pair of sentences, word alignment produces a
correspondence at the word level.
 Example: alignment for an Arabic–English sentence pair is shown
in the figure below where the Arabic has been Romanized.
Machine Translation…
 Word Alignment…
 In this example, English words are being aligned to their Arabic
informants.
 The Arabic sentence has been segmented following the style of the
Arabic Treebank (available from the Linguistic Data Consortium as
Catalog Id LDC2007E65).
Machine Translation…
 Word Alignment…
 The first Arabic word, “w#,” has no English informant and is
aligned to the null-cept (the string “e_0” is used to represent
the null-cept), which is an imaginary English word that gives
rise to spontaneous Arabic words.
 Multi-word Arabic alignments, such as those at positions 8 and
9, are done after the alignment process.
 The Arabic word, “mrAkz” is an example of a split alignment.
Numbers are replaced by a class label (in this example, ‘$num’
is used).
Machine Translation…
 Word Alignment…
 Table: Phrase Library Arabic Parse Example
Machine Translation…
 Phrase-Based Translation
 The Alignment Template (AT) approach for MT was a departure
from the style of the word-based generative models of IBM and,
together with a training method, is the basis for most of the
phrase-based MT systems used today.
 Phrase-based systems are the workhorse of SMT systems due to
their simple and relatively straightforward method of extracting
phrase libraries and training weights.
 Systems for new language pairs that have parallel corpora can be
built by utilizing the GIZA++ toolkit for generating word
alignments and open-source phrase decoders such as Moses.
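The phrase-library extraction step can be sketched as follows, using the standard consistency criterion over a word alignment; the sentence pair and alignment links below are invented:

```python
# A sketch of consistent phrase-pair extraction from a word alignment,
# the step that builds a phrase library in phrase-based SMT. A source
# span and target span form a valid pair when no alignment link leaves
# the box they define. The toy sentence pair and links are invented.
def extract_phrases(src, tgt, links, max_len=3):
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            ts = [t for s, t in links if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # Consistency: no link from outside the source span may
            # point inside the target span.
            if all(i1 <= s <= i2 for s, t in links if j1 <= t <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]),
                           " ".join(tgt[j1:j2 + 1])))
    return pairs

src = ["ich", "sehe", "das", "haus"]
tgt = ["i", "see", "the", "house"]
links = [(0, 0), (1, 1), (2, 2), (3, 3)]  # a monotone toy alignment
for pair in sorted(extract_phrases(src, tgt, links)):
    print(pair)
```

With this monotone one-to-one alignment every contiguous span up to `max_len` words is consistent, so the sketch emits nine phrase pairs; crossing links would rule many of them out.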
Question & Answer
Thank You !!!
Group Assignment - Two
 Discuss the following three approaches :
 Group- One: Monotonic Alignment for Words
 Group-Two: Non-Monotonic Alignment for Single-Token
Words
 Group-Three: Non-Monotonic Alignment for Multi-token
Words and Phrases

5-Information Extraction (IE) and Machine Translation (MT).ppt

  • 1.
    Chapter 5 :Information Extraction (IE) and Machine Translation (MT) Adama Science and Technology University School of Electrical Engineering and Computing Department of CSE Dr. Mesfin Abebe Haile (2020)
  • 2.
    Outline  Information Extraction(IE) Named entity recognition and relation extraction IE using sequence labeling  Machine Translation (MT) Basic issues in MT Statistical translation word alignment phrase-based translation Synchronous grammars 3/29/2024 2
  • 3.
    Information Extraction (IE) Information Extraction, which is an area of natural language processing, deals with finding factual information in free text.  In formal terms, facts are structured objects, such as database records.  Such a record may capture a real-world entity with its attributes mentioned in text, or a real-world event, occurrence, or state, with its arguments or actors: who did what to whom, where and when. 3/29/2024 3
  • 4.
    Information Extraction (IE)…  Information is typically sought in a particular target setting, e.g., corporate mergers and acquisitions.  Searching for specific, targeted factual information constitutes a large proportion of all searching activity on the part of information consumers.  There has been a sustained interest in Information Extraction due to its conceptual simplicity on one hand, and to its potential utility on the other.  Although the targeted nature of this task makes it more tractable than some of the more open-ended tasks in NLP, it is replete with challenges as the information landscape evolves, which also makes it an exciting research subject, 3/29/2024 4
  • 5.
    Information Extraction (IE)…  The task of Information Extraction (IE) is to identify a predefined set of concepts in a specific domain, ignoring other irrelevant information, where a domain consists of a corpus of texts together with a clearly specified information need.  In other words, IE is about deriving structured factual information from unstructured text. For instance, consider as an example the extraction of information on violent events from online news, where one is interested in identifying the main actors of the event, its location and number of people affected. 3/29/2024 5
  • 6.
    Information Extraction (IE)…  Example: The figure below shows an example of a text snippet from a news article about a terrorist attack and a structured information derived from that snippet. “Three bombs have exploded in north-eastern Nigeria, killing 25 people and wounding 12 in an attack carried out by an Islamic sect. Authorities said the bombs exploded on Sunday afternoon in the city of Maiduguri.” 3/29/2024 6
  • 7.
    Information Extraction (IE)…  Information extraction (IE) systems Find and understand limited relevant parts of texts. Gather information from many pieces of text. Produce a structured representation of relevant information: relations (in the database sense), a.k.a., a knowledge base.  Goals: 1.Organize information so that it is useful to people. 2.Put information in a semantically precise form that allows further inferences to be made by computer algorithms. 3/29/2024 7
  • 8.
    Information Extraction (IE)…  Tasks of information extraction Named entity recognition Co-reference Resolution (CO) requires the identification of multiple (coreferring) mentions of the same entity in the text Relation Extraction (RE) is the task of detecting and classifying predefined relationships between entities identified in text Event Extraction (EE) refers to the task of identifying events in free text and deriving detailed and structured information about them, ideally identifying who did what to whom, when, where, through what methods (instruments), and why 3/29/2024 8
  • 9.
    Information Extraction (IE)…  Named Entity Recognition: A very important sub-task: find and classify names in text, For example: The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply. 3/29/2024 9
  • 10.
    Information Extraction (IE)…  Named Entity Recognition: A very important sub-task: find and classify names in text. For example: The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply. 3/29/2024 10
  • 11.
    Information Extraction (IE)…  Named Entity Recognition: A very important sub-task: find and classify names in text. For example: The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply. 3/29/2024 11 Person Date Location Organization
  • 12.
    Information Extraction (IE)…  Named Entity Recognition: The uses: Named entities can be indexed, linked off, etc. Sentiment can be attributed to companies or products A lot of IE relations are associations between named entities For question answering, answers are often named entities Concretely: Many web pages tag various entities, with links to bio or topic pages, etc. Reuters’ OpenCalais, Evri, AlchemyAPI, Yahoo’s Term Extraction, … Apple/Google/Microsoft/… smart recognizers for document content 3/29/2024 12
  • 13.
    Information Extraction (IE)…  Named Entity Recognition Task: Task: Predict entities in a text 3/29/2024 13
  • 14.
    Information Extraction (IE)…  Named Entity Recognition Task: Three standard approaches to NER (and IE) Hand-written regular expressions Perhaps stacked Using classifiers Generative: Naïve Bayes Discriminative: Maxent models Sequence models HMMs MEMMs 3/29/2024 14
  • 15.
    Information Extraction (IE)…  Relation Extraction (RE) Relation Extraction (RE) is the task of detecting and classifying predefined relationships between entities identified in text. For example: Employee Of(Steve Jobs, Apple): a relation between a person and an organisation, extracted from ‘Steve Jobs works for Apple’ Located In(Smith, New York): a relation between a person and location, extracted from ‘Mr. Smith gave a talk at the conference in New York’, Subsidiary Of(TVN,ITI Holding): a relation between two companies, extracted from ‘Listed broadcaster TVN said its parent company, ITI Holdings, is considering various options for the potential sale. Note, although in general the set of relations that may be of interest is unlimited, the set of relations within a given task is predefined and fixed, as part of the specification of the task. 3/29/2024 15
  • 16.
    Information Extraction (IE)…  IE using sequence labeling Many information extraction tasks can be formulated as sequence labeling tasks. Sequence labelers assign a class label to each item in a sequential structure Sequence labeling methods are appropriate for problems where the class of an item depends on other (typically nearby) items in the sequence Examples of sequential labeling tasks: part-of-speech tagging, syntactic chunking (Break in to pieces), named entity recognition A naive approach would consider all possible label sequences and choose the best one. But that is too expensive, we need more efficient methods. 3/29/2024 16
  • 17.
    Information Extraction (IE)…  IE using sequence labeling … Markov Models A Markov Chain is a finite-state automaton that has a probability associated with each transition (arc), where the input uniquely defines the transitions that can be taken. In a first-order Markov chain, the probability of a state depends only on the previous state, where qi Q are states: Markov Assumption: P(qi | q1...qi−1) = P(qi | qi−1) The probabilities of all of the outgoing arcs of a state must sum to 1. The Markov chain can be traversed to compute the probability of a particular sequence of labels. 3/29/2024 17
  • 18.
    Machine Translation  Machinetranslation, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.  On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text because recognition of whole phrases and their closest counterparts in the target language is needed.  Solving this problem with corpus statistical, and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies. 3/29/2024 18
  • 19.
    Machine Translation…  Currentmachine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions.  This technique is particularly effective in domains where formal or formulaic language is used.  It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardized text. 3/29/2024 19
  • 20.
    Machine Translation…  Improvedoutput quality can also be achieved by human intervention:  For example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are proper names.  With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as it is (e.g., weather reports). 3/29/2024 20
  • 21.
    Machine Translation…  Theprogress and potential of machine translation have been debated much through its history.  Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality.  Some critics claim that there are in-principle obstacles to automating the translation process. 3/29/2024 21
  • 22.
    Machine Translation…  Inbuilding MT models, there are two major problems that need to be addressed:  Word Order: Translation is normally done at the sentence-level∗ and it might very well be that the last token in the source sentence is the key informant to the first token in the target sentence.  Word Choice: Each source token can be represented in the target language in a variety of ways.  These two problems are not independent and the order in which source tokens are translated directly affects which words might be used in the output sentence. 3/29/2024 22
  • 23.
    Machine Translation…  Example:Arabic–English MT test  Arabic: Ezp AbrAhym ystqbl ms&wlA AqtSAdyA sEwdyA fy bgdAd  English1: Izzet Ibrahim Meets Saudi Trade Official in Baghdad  English2: Izzat Ibrahim Welcomes a Saudi Economic Official to Baghdad  English3: A Saudi Arabian Economic Official is welcomed by Izzat Ibrahim to Baghdad 3/29/2024 23
  • 24.
    Machine Translation…  Whilethe first two translations in English are official translations, the last one is rewritten as an example; here, the Romanized Arabic word “ystqbl” gives rise to the passive construction “is welcomed by.”  All three references essentially capture the meaning of the sentence in Arabic and the order of the translation leads to different choices for the target words. 3/29/2024 24
  • 25.
    Machine Translation…  MachineTranslation can be viewed as taking the source sequence S and performing increasing amounts of analysis as suggested by the pyramid shown in the Figure below. 3/29/2024 25  At the base of the pyramid, words can be transferred from the source to target language. As we go up the pyramid, the level of sophistication increases and at the very top, we have some representation of the meaning and the meaning can be cast as words in either language.
  • 26.
    Machine Translation…  Theearly measures of MT included ‘Adequacy’ and ‘Fluency’.  Adequacy: Does the translation capture an adequate amount of the meaning of the sentence in the source language?  Fluency: Is the translation fluent in English? 3/29/2024 26
    Machine Translation …
 The list of MT metrics in current use is quite long; the major alternatives are:
 (a) Translation Error Rate (TER)
 Most major MT evaluations, in addition to automatic methods, employ human editors to edit the system outputs and compute the TER of the system output relative to the edited string, which is termed Human-TER or HTER.
 (b) METEOR
 An automatic evaluation metric.
3/29/2024 27
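To make the TER idea concrete, the sketch below computes a simplified word-level TER: edit distance divided by reference length. This is an illustrative reduction, not the full metric; real TER additionally allows block shifts of phrases, which are omitted here for brevity.

```python
def ter(hypothesis, reference):
    """Simplified Translation Error Rate: word-level edit distance
    (insertions, deletions, substitutions) divided by reference length.
    Full TER also counts block shifts; they are omitted in this sketch."""
    hyp, ref = hypothesis.split(), reference.split()
    # Standard Levenshtein dynamic program over words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(hyp)][len(ref)] / len(ref)
```

In the HTER setting described above, `reference` would be the human-edited system output rather than an independent reference translation.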
    Machine Translation …
 Statistical Machine Translation (SMT)
 Warren Weaver’s memorandum (Weaver, 1955) clearly initiated ideas in the statistical approach to MT.
 However, it was the pioneering work of the IBM group (Brown et al., 1993) in the early 1990s that led to the renewed and sustained interest in the statistical approach to MT.
 While initial efforts in SMT were mostly word-based, almost all approaches now use phrases as their basic unit of translation.
 In addition, natural language parsers have been developed, and this has led to both syntax-based and hierarchical approaches.
3/29/2024 28
    Machine Translation …
 Statistical Machine Translation (SMT) …
 Statistical techniques for MT are now pervasive. Statistical machine translation (SMT) takes a source sequence, S = [s1 s2 . . . sK], and generates a target sequence, T∗ = [t1 t2 . . . tL], by finding the most likely translation given by:
 T∗ = argmax_T p(T|S) = argmax_T p(S|T) p(T)
 SMT is then concerned with building models for p(T|S) and subsequently searching the space of all target strings to find the optimal string given the source and the model.
3/29/2024 29
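The noisy-channel decision rule above can be sketched in a few lines. The snippet scores a small candidate list by log p(S|T) + log p(T); the candidate strings and all probability values are invented for illustration, and a real decoder would of course search a vastly larger space with learned models.

```python
import math

# Toy noisy-channel scoring: T* = argmax_T p(S|T) * p(T).
# All strings and probabilities below are illustrative, not learned.
translation_logprob = {   # log p(S|T): how well T explains the source S
    "izzat ibrahim welcomes a saudi official": math.log(0.04),
    "izzat ibrahim meets saudi official": math.log(0.03),
    "saudi official welcomes izzat ibrahim": math.log(0.04),
}
language_logprob = {      # log p(T): fluency of the target string
    "izzat ibrahim welcomes a saudi official": math.log(0.002),
    "izzat ibrahim meets saudi official": math.log(0.001),
    "saudi official welcomes izzat ibrahim": math.log(0.0005),
}

def best_translation(candidates):
    """Return the candidate maximizing log p(S|T) + log p(T)."""
    return max(candidates,
               key=lambda t: translation_logprob[t] + language_logprob[t])
```

Note how the language model breaks the tie between the two candidates with equal translation probability: the more fluent word order wins.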
    Machine Translation …
 Statistical Machine Translation (SMT) …
 This approach shares much with speech recognition. Both are sequence prediction problems, and many of the tools developed for speech recognition can be applied in SMT as well.
 An equivalent approach for SMT is the direct model, in which various component models are combined log-linearly.∗
 The SMT problem can thus be stated as searching for the target string that maximizes the joint model, p(T, S); either Bayes expansion of the joint model is equivalent.
3/29/2024 30
    Machine Translation …
 Word Alignment
 The general problem of aligning a parallel text is, more precisely, to find its optimal parallel segmentation, or bisegmentation, under some set of constraints.
3/29/2024 31
    Machine Translation …
 Word Alignment …
 Fundamentally, an alignment algorithm accepts as input a bitext and produces as output a bisegmentation relation that identifies corresponding segments between the texts.
 A bitext consists of two texts that are translations of each other.∗
 ∗ In a “Terminological note” prefacing his book, Veronis (2000) cites Alan Melby in pointing out that the alternative term parallel text creates an unfortunate and confusing clash with the translation theory and terminological community, who use the same term instead to mean what NLP and computational linguistics researchers typically refer to as non-parallel corpora or comparable corpora: texts in different languages from the same domain, but not necessarily translations of each other.
3/29/2024 33
    Machine Translation …
 Word Alignment …
 Bitext alignment lies at the heart of all data-driven machine translation methods, and the rapid research progress on alignment reflects the advent of statistical machine translation (SMT) and example-based machine translation (EBMT) approaches.
 Yet the importance of alignment extends as well to many other practical applications for translators, bilingual lexicographers, and even ordinary readers.
3/29/2024 34
    Machine Translation …
 Word Alignment …
 Automatically learned resources for MT, NLP, and humans
 Bitext alignment methods are at the core of many methods for machine learning of language resources to be used by SMT or other NLP applications, as well as by human translators and linguists.
 The side effects of alignment are often of more interest than the aligned text itself.
3/29/2024 35
    Machine Translation …
 Word Alignment …
 Automatically learned resources for MT, NLP, and humans …
 Alignment algorithms offer the possibility of extracting various sorts of knowledge resources, such as:
 (a) phrasal bilexicons listing word or collocation translations;
 (b) translation examples at the sentence, constituent, and/or phrase level; or
 (c) tree-structured translation patterns such as transfer rules, translation frames, or treelets.
 Such resources constitute a database that may be used by SMT and EBMT systems, or they may be taken as training data for further machine learning to mine deeper patterns.
 Alignment has also been employed to infer sentence bracketing or constituent structure as a side effect.
3/29/2024 36
    Machine Translation …
 Word Alignment …
 Techniques have been developed for aligning segments at various granularities: documents, paragraphs, sentences, constituents, collocations or phrases, words, and characters.
3/29/2024 37
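At the sentence granularity, a classic technique aligns segments purely by length, since long sentences tend to translate into long sentences. The sketch below is a stripped-down version of that idea (in the spirit of Gale and Church's length-based aligner): dynamic programming over 1-1, 1-0, and 0-1 correspondences, scoring each 1-1 pair by the log-ratio of the two sentence lengths. The `mismatch` penalty is an invented constant, not a calibrated value.

```python
import math

def align_sentences(src_lens, tgt_lens, mismatch=2.0):
    """Toy length-based sentence alignment: DP over 1-1, 1-0, 0-1 beads.
    src_lens/tgt_lens are sentence lengths (e.g., in characters)."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:  # 1-1 bead: penalize the length ratio
                c = cost[i][j] + abs(math.log(src_lens[i] / tgt_lens[j]))
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j)
            if i < n and cost[i][j] + mismatch < cost[i + 1][j]:  # 1-0
                cost[i + 1][j], back[i + 1][j] = cost[i][j] + mismatch, (i, j)
            if j < m and cost[i][j] + mismatch < cost[i][j + 1]:  # 0-1
                cost[i][j + 1], back[i][j + 1] = cost[i][j] + mismatch, (i, j)
    # Recover the 1-1 beads from the backpointers.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if i - pi == 1 and j - pj == 1:
            beads.append((pi, pj))
        i, j = pi, pj
    return list(reversed(beads))
```

The real Gale-Church model scores length ratios with a Gaussian and also handles 2-1, 1-2, and 2-2 beads; this sketch keeps only the DP skeleton.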
    Machine Translation …
 Word Alignment …
 Given a pair of sentences, word alignment produces a correspondence at the word level.
 Example: the alignment for an Arabic–English sentence pair is shown in the figure below, where the Arabic has been Romanized.
3/29/2024 38
    Machine Translation …
 Word Alignment …
 Given a pair of sentences, word alignment produces a correspondence at the word level.
 In this example, English words are aligned to their Arabic informants.
 The Arabic sentence has been segmented following the style of the Arabic Treebank (available from the Linguistic Data Consortium as Catalog Id LDC2007E65).
3/29/2024 39
    Machine Translation …
 Word Alignment …
 The first Arabic word, “w#,” has no English informant and is aligned to the null-cept (the string “e_0” is used to represent the null-cept), an imaginary English word that gives rise to spontaneous Arabic words.
 Multi-word Arabic alignments, such as those at positions 8 and 9, are done after the alignment process.
 The Arabic word “mrAkz” is an example of a split alignment.
 Numbers are replaced by a class label (in this example, ‘$num’ is used).
3/29/2024 40
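Word-translation probabilities of the kind used to produce such alignments, including the null-cept, are classically estimated with EM in IBM Model 1 (Brown et al., 1993). The sketch below is a minimal version on an invented two-word toy corpus (not the Arabic-English example above); a real system would run this at scale via a toolkit such as GIZA++.

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Toy IBM Model 1: EM estimation of word-translation probabilities
    t(f|e), with a NULL source word playing the role of the null-cept."""
    # Uniform initialization over the target-side vocabulary.
    f_vocab = {f for (fs, _) in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # normalizers per e
        for fs, es in bitext:
            es = ["NULL"] + es       # null-cept for spontaneous words
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z     # posterior alignment probability
                    count[(f, e)] += c
                    total[e] += c
        for (f, e) in count:          # M-step: renormalize
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# Tiny invented corpus: each pair is (target-language words, English words).
bitext = [(["la", "maison"], ["the", "house"]),
          (["la", "fleur"], ["the", "flower"]),
          (["maison"], ["house"])]
t = ibm_model1(bitext)
```

Even on three sentence pairs, EM concentrates probability mass on the consistent pairs ("maison" given "house", "la" given "the") because they co-occur more systematically than the alternatives.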
    Machine Translation …
 Word Alignment …
 Table: Phrase Library Arabic Parse Example
3/29/2024 41
    Machine Translation …
 Phrase-Based Translation
 The Alignment Template (AT) approach to MT was a departure from the style of the word-based generative models of IBM and, together with a training method, is the basis for most of the phrase-based MT systems used today.
 Phrase-based systems are the workhorse of SMT due to their simple and relatively straightforward method of extracting phrase libraries and training weights.
 Systems for new language pairs that have parallel corpora can be built by utilizing the GIZA++ toolkit for generating word alignments and open-source phrase decoders such as Moses.
3/29/2024 42
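The phrase-library extraction mentioned above is usually done with the standard consistency criterion: a source span and a target span form a phrase pair if no alignment link crosses the pair's boundary. The sketch below implements that criterion in its simplest form (no unaligned-word extension); the example sentence and alignment are invented for illustration.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.
    alignment is a set of (i, j) links: src position i <-> tgt position j."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked into the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 >= max_len:
                continue
            # Consistency: nothing in [j1, j2] aligns outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]),
                           " ".join(tgt[j1:j2 + 1])))
    return pairs

# Invented toy pair with a reordering: "maison bleue" <-> "blue house".
pairs = extract_phrases(["maison", "bleue"], ["blue", "house"],
                        {(0, 1), (1, 0)})
```

Note that the crossing links still yield the single-word pairs and the full two-word pair, but never the inconsistent pair ("maison", "blue"); this is exactly why phrase-based systems handle local reordering well.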
    Group Assignment Two
 Discuss the following three approaches:
 Group One: Monotonic Alignment for Words
 Group Two: Non-Monotonic Alignment for Single-Token Words
 Group Three: Non-Monotonic Alignment for Multi-Token Words and Phrases
3/29/2024 45