Introduction to MT
Ling 580
Fei Xia
Week 1: 1/03/06
Outline
• Course overview
• Introduction to MT
– Major challenges
– Major approaches
– Evaluation of MT systems
• Overview of word-based SMT
Course overview
General info
• Course website:
– Syllabus (incl. slides and papers): updated every week.
– Message board
– ESubmit
• Office hour: Fri: 10:30am-12:30pm.
• Prerequisites:
– Ling570 and Ling571.
– Programming: C or C++, Perl is a plus.
– Introduction to probability and statistics
Expectations
• Reading:
– Papers are online
– Finish reading before class. Bring your questions to
class.
• Grade:
– Leading discussion (1-2 papers): 50%
– Project: 40%
– Class participation: 10%
– No quizzes, exams
Leading discussion
• Indicate your choice via EPost by Jan 8.
• You might want to read related papers.
• Make slides with PowerPoint.
• Email me your slides by 3:30am on the
Monday before your presentation.
• Present the paper in class and lead the
discussion: 40-50 minutes.
Project
• Details will be available soon.
• Project presentation: 3/7/06
• Final report: due on 3/12/06
• Pongo account will be ready soon.
Introduction to MT
A brief history of MT
(Based on work by John Hutchins)
• Before the computer: In the mid 1930s, a French-
Armenian Georges Artsrouni and a Russian Petr
Troyanskii applied for patents for ‘translating machines’.
• The pioneers (1947-1954): the first public MT demo was
given in 1954 (by IBM and Georgetown University).
• The decade of optimism (1954-1966): ALPAC
(Automatic Language Processing Advisory Committee)
report in 1966: "there is no immediate or predictable
prospect of useful machine translation."
A brief history of MT (cont)
• The aftermath of the ALPAC report (1966-
1980): a virtual end to MT research
• The 1980s: Interlingua, example-based
MT
• The 1990s: Statistical MT
• The 2000s: Hybrid MT
Where are we now?
• Huge potential/need due to the internet,
globalization and international politics.
• Quick development time due to SMT, the
availability of parallel data and computers.
• Translation quality is reasonable for language pairs with
ample resources.
• Systems are starting to cover more “minor” languages.
What is MT good for?
• Rough translation: web data
• Computer-aided human translation
• Translation for limited domain
• Cross-lingual IR
• Machines are better than humans in:
– Speed: much faster than humans
– Memory: can easily memorize millions of word/phrase
translations.
– Manpower: machines are much cheaper than humans
– Fast learner: it takes minutes or hours to build a new system.
Erasable memory 
– Never complain, never get tired, …
Major challenges in MT
Translation is hard
• Novels
• Word play, jokes, puns, hidden messages
• Concept gaps: go Greek, bei fen
• Other constraints: lyrics, dubbing, poem,
…
Major challenges
• Getting the right words:
– Choosing the correct root form
– Getting the correct inflected form
– Inserting “spontaneous” words
• Putting the words in the correct order:
– Word order: SVO vs. SOV, …
– Unique constructions:
– Divergence
Lexical choice
• Homonymy/Polysemy: bank, run
• Concept gap: no corresponding concepts in
another language: go Greek, go Dutch, fen sui,
lame duck, …
• Coding (Concept → lexeme mapping)
differences:
– More distinction in one language: e.g., kinship
vocabulary.
– Different division of conceptual space:
Choosing the appropriate inflection
• Inflection: gender, number, case, tense, …
• Ex:
– Number: Ch-Eng: all the concrete nouns:
ch_book → book, books
– Gender: Eng-Fr: all the adjectives
– Case: Eng-Korean: all the arguments
– Tense: Ch-Eng: all the verbs:
ch_buy → buy, bought, will buy
Inserting spontaneous words
• Function words:
– Determiners: Ch-Eng:
ch_book → a book, the book, the books, books
– Prepositions: Ch-Eng:
… ch_November → … in November
– Relative pronouns: Ch-Eng:
… ch_buy ch_book de ch_person → the person who bought /book/
– Possessive pronouns: Ch-Eng:
ch_he ch_raise ch_hand → He raised his hand(s)
– Conjunction: Eng-Ch:
Although S1, S2 → ch_although S1, ch_but S2
– …
Inserting spontaneous words (cont)
• Content words:
– Dropped argument: Ch-Eng:
ch_buy le ma → Has Subj bought Obj?
– Chinese first name: Eng-Ch:
Jiang … → ch_Jiang ch_Zemin …
– Abbreviations, acronyms: Ch-Eng:
ch_12 ch_big → the 12th National Congress of the
CPC (Communist Party of China)
– …
Major challenges
• Getting the right words:
– Choosing the correct root form
– Getting the correct inflected form
– Inserting “spontaneous” words
• Putting the words in the correct order:
– Word order: SVO vs. SOV, …
– Unique construction:
– Structural divergence
Word order
• SVO, SOV, VSO, …
• VP + PP → PP + VP
• VP + AdvP → AdvP + VP
• Adj + N → N + Adj
• NP + PP → PP + NP
• NP + S → S + NP
• P + NP → NP + P
“Unique” Constructions
• Overt wh-movement: Eng-Ch:
– Eng: Why do you think that he came yesterday?
– Ch: you why think he yesterday come ASP?
– Ch: you think he yesterday why come?
• Ba-construction: Ch-Eng
– She ba homework finish ASP → She finished her
homework.
– He ba wall dig ASP CL hole → He dug a hole in
the wall.
– She ba orange peel ASP skin → She peeled the
orange’s skin.
Translation divergences
• Source and target parse trees
(dependency trees) are not identical.
• Example: I like Mary → S: María me
gusta a mí (‘Mary pleases me’)
• More discussion next time.
Major approaches
How do humans do translation?
• Learn a foreign language:
– Memorize word translations
– Learn some patterns:
– Exercise:
• Passive activity: read, listen
• Active activity: write, speak
• Translation:
– Understand the sentence
– Clarify or ask for help (optional)
– Translate the sentence
Training stage
Decoding stage
Translation lexicon
Templates, transfer rules
Parsing, semantic analysis?
Interactive MT?
Word-level? Phrase-level?
Generate from meaning?
Reinforcement learning?
Reranking?
What kinds of resources are
available to MT?
• Translation lexicon:
– Bilingual dictionary
• Templates, transfer rules:
– Grammar books
• Parallel data, comparable data
• Thesaurus, WordNet, FrameNet, …
• NLP tools: tokenizer, morph analyzer, parser, …
→ More resources for major languages, fewer for “minor”
languages.
Major approaches
• Transfer-based
• Interlingua
• Example-based (EBMT)
• Statistical MT (SMT)
• Hybrid approach
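The MT triangle
[Figure: the MT triangle. Base corners: source word (left) and target word (right); apex: meaning (interlingua). Levels from top to bottom: transfer-based; phrase-based SMT, EBMT; word-based SMT, EBMT. Left side: Analysis. Right side: Synthesis.]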
Transfer-based MT
• Analysis, transfer, generation:
1. Parse the source sentence
2. Transform the parse tree with transfer rules
3. Translate source words
4. Get the target sentence from the tree
• Resources required:
– Source parser
– A translation lexicon
– A set of transfer rules
• An example: Mary bought a book yesterday.
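The four steps above can be sketched on the example sentence. This is only a toy illustration: the parse tree is written by hand instead of coming from a parser, the single transfer rule (VP + AdvP → AdvP + VP) is taken from the word-order slide, and the lexicon entries (ch_Mary, ch_buy, ...) are hypothetical placeholders in the ch_ notation used earlier.

```python
# Toy transfer-based MT. Tree, rule table, and lexicon are all hypothetical.
# A tree is either a leaf word (str) or a tuple (label, child, child, ...).

# Step 1 (analysis): hand-written parse standing in for a real source parser.
SOURCE_TREE = ("S",
               ("NP", "Mary"),
               ("VP",
                ("VP", ("V", "bought"), ("NP", ("Det", "a"), ("N", "book"))),
                ("AdvP", "yesterday")))

# (parent label, child labels) -> new order of the children.
TRANSFER_RULES = {("VP", ("VP", "AdvP")): (1, 0)}

# Source word -> target word; "" means the word has no target counterpart.
LEXICON = {"Mary": "ch_Mary", "bought": "ch_buy", "a": "",
           "book": "ch_book", "yesterday": "ch_yesterday"}


def transfer(tree):
    """Step 2: reorder children wherever a transfer rule matches."""
    if isinstance(tree, str):
        return tree
    label = tree[0]
    children = [transfer(child) for child in tree[1:]]
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    order = TRANSFER_RULES.get((label, child_labels))
    if order is not None:
        children = [children[i] for i in order]
    return (label, *children)


def translate_words(tree):
    """Step 3: word-to-word translation of the leaves."""
    if isinstance(tree, str):
        return LEXICON.get(tree, tree)
    return (tree[0], *[translate_words(child) for child in tree[1:]])


def leaves(tree):
    """Step 4: read the target sentence off the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def translate(source_tree):
    target_tree = translate_words(transfer(source_tree))
    return " ".join(word for word in leaves(target_tree) if word)


print(translate(SOURCE_TREE))  # ch_Mary ch_yesterday ch_buy ch_book
```

A real system would need many rules, a policy for rule conflicts, and a generation step that handles inflection, which is what the questions on the next slide are about.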
Transfer-based MT (cont)
• Parsing: linguistically motivated grammar or formal
grammar?
• Transfer:
– context-free rules? A path on a dependency tree?
– Apply at most one rule at each level?
– How are rules created?
• Translating words: word-to-word translation?
• Generation: using LM or other additional knowledge?
• How to create the needed resources automatically?
Interlingua
• For n languages, we need n(n-1) MT systems.
• Interlingua uses a language-independent
representation.
• Conceptually, Interlingua is elegant: we only
need n analyzers, and n generators.
• Resources needed:
– A language-independent representation
– Sophisticated analyzers
– Sophisticated generators
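The counting argument can be checked with a few lines of arithmetic. The sketch assumes that "MT system" means one direct system per ordered language pair, and that an interlingua setup needs one analyzer plus one generator per language; the function names are illustrative only.

```python
def direct_systems(n):
    """One direct MT system per ordered pair of n languages."""
    return n * (n - 1)


def interlingua_components(n):
    """One analyzer and one generator per language."""
    return 2 * n


for n in (2, 3, 5, 10, 20):
    print(n, direct_systems(n), interlingua_components(n))
```

For n = 10 this gives 90 direct systems versus 20 interlingua components; the two counts are equal only at n = 3.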
Interlingua (cont)
• Questions:
– Does language-independent meaning representation
really exist? If so, what does it look like?
– It requires deep analysis: how to get such an
analyzer: e.g., semantic analysis
– It requires non-trivial generation: How is that done?
– It forces disambiguation at various levels: lexical,
syntactic, semantic, discourse levels.
– It cannot take advantage of similarities between a
particular language pair.
Example-based MT
• Basic idea: translate a sentence by using the
closest match in parallel data.
• First proposed by Nagao (1981).
• Ex:
– Training data:
• w1 w2 w3 w4 → w1’ w2’ w3’ w4’
• w5 w6 w7 → w5’ w6’ w7’
• w8 w9 → w8’ w9’
– Test sent:
• w1 w2 w6 w7 w9 → w1’ w2’ w6’ w7’ w9’
EBMT (cont)
• Types of EBMT:
– Lexical (shallow)
– Morphological / POS analysis
– Parse-tree based (deep)
• Types of data required by EBMT systems:
– Parallel text
– Bilingual dictionary
– Thesaurus for computing semantic similarity
– Syntactic parser, dependency parser, etc.
EBMT (cont)
• Word alignment: using dictionary and heuristics
→ exact match
• Generalization:
– Clusters: dates, numbers, colors, shapes, etc.
– Clusters can be built by hand or learned automatically.
• Ex:
– Exact match: 12 players met in Paris last Tuesday →
12 Spieler trafen sich letzten Dienstag in Paris
– Templates: $num players met in $city $time →
$num Spieler trafen sich $time in $city
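A minimal sketch of the template idea, assuming hand-built clusters expressed as regular expressions. The cluster patterns, the filler lexicon, and the function names are hypothetical; only the two templates come from the slide.

```python
import re

# Hypothetical hand-built clusters: slot name -> regex over source text.
CLUSTERS = {
    "num": r"\d+",
    "city": r"Paris|Berlin|London",
    "time": r"last \w+",
}

SOURCE_TEMPLATE = "$num players met in $city $time"
TARGET_TEMPLATE = "$num Spieler trafen sich $time in $city"

# Hypothetical translations for slot fillers; unknown fillers are copied.
FILLER_LEXICON = {"last Tuesday": "letzten Dienstag"}


def template_to_regex(template):
    """Turn '$slot' tokens into named groups and escape literal tokens."""
    parts = []
    for token in template.split():
        if token.startswith("$"):
            name = token[1:]
            parts.append(f"(?P<{name}>{CLUSTERS[name]})")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts))


def translate(sentence):
    """Return the filled target template, or None if the template does not match."""
    match = template_to_regex(SOURCE_TEMPLATE).fullmatch(sentence)
    if match is None:
        return None
    output = TARGET_TEMPLATE
    for name, filler in match.groupdict().items():
        output = output.replace("$" + name, FILLER_LEXICON.get(filler, filler))
    return output


print(translate("12 players met in Paris last Tuesday"))
# 12 Spieler trafen sich letzten Dienstag in Paris
```

The template also matches sentences never seen in training, such as "7 players met in Berlin last Tuesday", which an exact-match lookup would miss.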
Statistical MT
• Basic idea: learn all the parameters from parallel data.
• Major types:
– Word-based
– Phrase-based
• Strengths:
– Easy to build, and it requires no human knowledge
– Good performance when a large amount of training data is
available.
• Weaknesses:
– How to express linguistic generalization?
Comparison of resource requirements

                    Transfer-based   Interlingua                EBMT        SMT
dictionary          +                +                          +
transfer rules      +
parser              +                +                          + (?)
semantic analyzer                    +
parallel data                                                   +           +
others                               universal representation   thesaurus
Hybrid MT
• Basic idea: combine strengths of different approaches:
– Syntax-based: generalization at syntactic level
– Interlingua: conceptually elegant
– EBMT: memorizing translation of n-grams; generalization at various levels.
– SMT: fully automatic; using LM; optimizing some objective functions.
• Types of hybrid MT:
– Borrowing concepts/methods:
• SMT from EBMT: phrase-based SMT; Alignment templates
• EBMT from SMT: automatically learned translation lexicon
• Transfer-based from SMT: automatically learned translation lexicon, transfer rules;
using LM
• …
– Using two MTs in a pipeline:
• Using transfer-based MT as a preprocessor of SMT
– Using multiple MTs in parallel, then adding a re-ranker.
Evaluation of MT
Evaluation
• Unlike many NLP tasks (e.g., tagging, chunking, parsing,
IE, pronoun resolution), there is no single gold standard
for MT.
• Human evaluation: accuracy, fluency, …
– Problem: expensive, slow, subjective, non-reusable.
• Automatic measures:
– Edit distance
– Word error rate (WER), Position-independent WER (PER)
– Simple string accuracy (SSA), Generation string accuracy (GSA)
– BLEU
Edit distance
• Edit distance (a.k.a. Levenshtein
distance) is defined as the minimal cost of
transforming str1 into str2, using three
operations (substitution, insertion,
deletion).
• It is computed with dynamic programming (DP) in O(m*n) time,
where m and n are the lengths of str1 and str2.
WER, PER, and SSA
• WER (word error rate) is edit distance, divided by |Ref|.
• PER (position-independent WER): same as WER but
disregards word ordering
• SSA (Simple string accuracy) = 1 - WER
• Previous example:
– Sys: w1 w2 w3 w4
– Ref: w1 w3 w2
– Edit distance = 2
– WER=2/3
– PER=1/3
– SSA=1/3
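The numbers in this example can be reproduced with a short script. WER and SSA follow the definitions above directly. For PER the slide only says that word order is ignored, so the bag-of-words formula used here, (max(|Sys|, |Ref|) - matches) / |Ref|, is an assumption: one common formulation, not necessarily the one the slides intend.

```python
from collections import Counter
from fractions import Fraction


def edit_distance(sys_words, ref_words):
    """Levenshtein distance over words, with all three costs equal to 1."""
    prev = list(range(len(ref_words) + 1))
    for i, s in enumerate(sys_words, 1):
        cur = [i]
        for j, r in enumerate(ref_words, 1):
            cur.append(min(prev[j - 1] + (s != r),  # substitution or match
                           cur[j - 1] + 1,          # insertion
                           prev[j] + 1))            # deletion
        prev = cur
    return prev[-1]


def wer(sys_words, ref_words):
    return Fraction(edit_distance(sys_words, ref_words), len(ref_words))


def per(sys_words, ref_words):
    """Position-independent error rate (assumed bag-of-words formulation)."""
    matches = sum((Counter(sys_words) & Counter(ref_words)).values())
    errors = max(len(sys_words), len(ref_words)) - matches
    return Fraction(errors, len(ref_words))


def ssa(sys_words, ref_words):
    return 1 - wer(sys_words, ref_words)


sys_words = "w1 w2 w3 w4".split()
ref_words = "w1 w3 w2".split()
print(edit_distance(sys_words, ref_words))  # 2
print(wer(sys_words, ref_words))            # 2/3
print(per(sys_words, ref_words))            # 1/3
print(ssa(sys_words, ref_words))            # 1/3
```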
Generation string accuracy (GSA)
Example:
Ref: w1 w2 w3 w4
Sys: w2 w3 w4 w1
Del=1, Ins=1 → SSA=1/2
Move=1, Del=0, Ins=0 → GSA=3/4
\[
GSA = 1 - \frac{Move + Ins + Del + Sub}{|Ref|}
\]
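As a quick check of the example, the sketch below plugs the operation counts into the same ratio for both measures. It takes the counts as given and does not try to find the moves automatically, which is the harder part of GSA; the function name and signature are illustrative.

```python
from fractions import Fraction


def string_accuracy(ref_len, sub=0, dele=0, ins=0, move=0):
    """1 - (Move + Ins + Del + Sub) / |Ref|; SSA is the case with move=0."""
    return 1 - Fraction(sub + dele + ins + move, ref_len)


# Ref: w1 w2 w3 w4, Sys: w2 w3 w4 w1
ssa = string_accuracy(4, dele=1, ins=1)  # w1 counted as one deletion plus one insertion
gsa = string_accuracy(4, move=1)         # the same change counted as a single move
print(ssa, gsa)  # 1/2 3/4
```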
BLEU
• Proposed by Papineni et al. (2002)
• Most widely used in MT community.
• BLEU is a weighted geometric average of n-gram precision
(pn) between system output and all references,
multiplied by a brevity penalty (BP).
\[
BLEU = BP \cdot \prod_{n=1}^{N} p_n^{w_n} = BP \cdot (p_1 \cdot p_2 \cdots p_N)^{1/N} \quad \text{when } w_n = \frac{1}{N}
\]
N-gram precision
• N-gram precision: the percent of n-grams in
the system output that are correct.
• Clipping:
– Sys: the the the the the the
– Ref: the cat sat on the mat
– Unigram precision: 6/6 without clipping, 2/6 with clipping
– Max_Ref_count: the max number of times an
n-gram occurs in any single reference translation.
\[
Count_{clip} = \min(Count, Max\_Ref\_count)
\]
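Clipping is easy to see on the example above. The sketch below computes Count_clip for unigrams; the function name is an assumption, and tokenization is a plain whitespace split.

```python
from collections import Counter
from fractions import Fraction


def clipped_counts(sys_ngrams, refs_ngrams):
    """Count_clip = min(Count, Max_Ref_count) for every n-gram in the system output."""
    sys_counts = Counter(sys_ngrams)
    max_ref_count = Counter()
    for ref in refs_ngrams:
        for ngram, count in Counter(ref).items():
            max_ref_count[ngram] = max(max_ref_count[ngram], count)
    return {ngram: min(count, max_ref_count[ngram])
            for ngram, count in sys_counts.items()}


sys_words = "the the the the the the".split()
ref_words = "the cat sat on the mat".split()

clipped = clipped_counts(sys_words, [ref_words])
unclipped_precision = Fraction(sum(w in ref_words for w in sys_words), len(sys_words))
clipped_precision = Fraction(sum(clipped.values()), len(sys_words))
print(clipped)              # {'the': 2}
print(unclipped_precision)  # 1
print(clipped_precision)    # 1/3
```

Without clipping, a degenerate output that repeats one frequent word scores perfect unigram precision; with clipping it is credited only for the two occurrences of "the" that the reference contains.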
N-gram precision
i.e. the percent of n-grams in the system output
that are correct (after clipping).
\[
p_n = \frac{\sum_{S \in Sys} \sum_{ngram \in S} Count_{clip}(ngram)}{\sum_{S \in Sys} \sum_{ngram \in S} Count(ngram)}
\]
Brevity Penalty
• For each sentence s_i in system output, find closest matching
reference r_i (in terms of length).
• Longer system output is already penalized by the n-gram
precision measure.
Let \( c = \sum_i |s_i| \) and \( r = \sum_i |r_i| \).
\[
BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{otherwise} \end{cases}
\]
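A small sketch of the brevity penalty over a test set. Sentences are token lists; breaking ties between equally close reference lengths in favor of the shorter one is an assumption, as are the function names.

```python
import math


def closest_ref_length(sys_len, ref_lens):
    """Reference length closest to the system length (ties: shorter, by assumption)."""
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - sys_len), ref_len))


def brevity_penalty(sys_sents, refs_per_sent):
    """BP = 1 if c > r, else exp(1 - r/c), with c and r summed over all sentences."""
    c = sum(len(s) for s in sys_sents)
    r = sum(closest_ref_length(len(s), [len(ref) for ref in refs])
            for s, refs in zip(sys_sents, refs_per_sent))
    if c == 0:
        return 0.0  # empty output: avoid dividing by zero
    return 1.0 if c > r else math.exp(1 - r / c)


sys_sents = ["the cat was on the mat".split()]
refs = [["the cat sat on a mat".split(), "there was a cat on the mat".split()]]
print(brevity_penalty(sys_sents, refs))                  # 1.0
print(brevity_penalty(["the cat sat".split()], refs))    # exp(1 - 6/3), about 0.37
```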
An example
• Sys: The cat was on the mat
• Ref1: The cat sat on a mat
• Ref2: There was a cat on the mat
• Assuming N=3
• p1=5/6, p2=3/5, p3=1/4, BP=1 → BLEU=0.50
• What if N=4?
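The example can be recomputed with a single-sentence BLEU sketch. It assumes lowercased, whitespace-split tokens (so "The" and "the" match), uniform weights w_n = 1/N, and a score of 0 whenever some p_n is 0; the helper names are illustrative and this is not a reference implementation.

```python
import math
from collections import Counter
from fractions import Fraction


def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def modified_precision(sys_words, refs, n):
    """Clipped n-gram precision p_n of one system sentence against its references."""
    sys_counts = Counter(ngrams(sys_words, n))
    max_ref_count = Counter()
    for ref in refs:
        for ngram, count in Counter(ngrams(ref, n)).items():
            max_ref_count[ngram] = max(max_ref_count[ngram], count)
    clipped = sum(min(count, max_ref_count[ngram])
                  for ngram, count in sys_counts.items())
    total = sum(sys_counts.values())
    return Fraction(clipped, total) if total else Fraction(0)


def bleu(sys_words, refs, max_n):
    """Return (precisions, BP, BLEU) with uniform weights 1/max_n."""
    precisions = [modified_precision(sys_words, refs, n)
                  for n in range(1, max_n + 1)]
    c = len(sys_words)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    if min(precisions) == 0:
        return precisions, bp, 0.0  # geometric mean is 0 if any p_n is 0
    log_mean = sum(math.log(float(p)) for p in precisions) / max_n
    return precisions, bp, bp * math.exp(log_mean)


sys_words = "the cat was on the mat".split()
refs = ["the cat sat on a mat".split(), "there was a cat on the mat".split()]

precisions, bp, score = bleu(sys_words, refs, max_n=3)
print([str(p) for p in precisions], bp, round(score, 2))  # ['5/6', '3/5', '1/4'] 1.0 0.5
print(bleu(sys_words, refs, max_n=4)[2])                  # 0.0
```

Under these assumptions no 4-gram of the system output appears in either reference, so p4 = 0 and the unsmoothed score for N=4 drops to 0.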
Summary
• Course overview
• Major challenges in MT
– Choose the right words (root form, inflection,
spontaneous words)
– Put them in right positions (word order, unique
constructions, divergences)
Summary (cont)
• Major approaches
– Transfer-based MT
– Interlingua
– Example-based MT
– Statistical MT
– Hybrid MT
• Evaluation of MT systems
– Edit distance
– WER, PER, SSA, GSA
– BLEU
Additional slides
Translation divergences
(based on Bonnie Dorr’s work)
• Thematic divergence: I like Mary →
S: María me gusta a mí (‘Mary pleases me’)
• Promotional divergence: John usually goes home →
S: Juan suele ir a casa (‘John tends to go home’)
• Demotional divergence: I like eating → G: Ich esse gern
(‘I eat likingly’)
• Structural divergence: John entered the house →
S: Juan entró en la casa (‘John entered in the house’)
Translation divergences (cont)
• Conflational divergence: I stabbed John →
S: Yo le di puñaladas a Juan (‘I gave knife-wounds
to John’)
• Categorial divergence: I am hungry →
G: Ich habe Hunger (‘I have hunger’)
• Lexical divergence: John broke into the room →
S: Juan forzó la entrada al cuarto (‘John forced
the entry to the room’)
Calculating edit distance
• D(0, 0) = 0
• D(i, 0) = delCost * i
• D(0, j) = insCost * j
• D(i+1, j+1) =
min( D(i,j) + sub,
D(i+1, j) + insCost,
D(i, j+1) + delCost)
sub = 0 if str1[i+1]=str2[j+1]
= subCost otherwise
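The recurrence translates almost line by line into code. The sketch below fills the whole table D; the only change is that Python sequences are 0-indexed, so str1[i+1] in the slide notation becomes str1[i]. The function name and default costs of 1 are assumptions.

```python
def edit_distance_table(str1, str2, sub_cost=1, ins_cost=1, del_cost=1):
    """Fill D so that D[i][j] is the cost of turning str1[:i] into str2[:j]."""
    m, n = len(str1), len(str2)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = del_cost * i          # delete the first i items of str1
    for j in range(1, n + 1):
        D[0][j] = ins_cost * j          # insert the first j items of str2
    for i in range(m):
        for j in range(n):
            sub = 0 if str1[i] == str2[j] else sub_cost
            D[i + 1][j + 1] = min(D[i][j] + sub,
                                  D[i + 1][j] + ins_cost,
                                  D[i][j + 1] + del_cost)
    return D


table = edit_distance_table("w1 w2 w3 w4".split(), "w1 w3 w2".split())
for row in table:
    print(row)
print("edit distance =", table[-1][-1])  # edit distance = 2
```

The two nested loops visit each of the (m+1)*(n+1) cells once, which is where the O(m*n) complexity comes from.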
An example
• Sys: w1 w2 w3 w4
• Ref: w1 w3 w2
• All three costs are 1.
• Edit distance=2
DP table (rows: Sys words, columns: Ref words):

          w1  w3  w2
      0    1   2   3
w1    1    0   1   2
w2    2    1   1   1
w3    3    2   1   2
w4    4    3   2   2
