2. Machine Translation
• One of the first applications for computers
– bilingual dictionary → word-for-word translation
• Good translation requires understanding!
– War and Peace, The Sound and The Fury?
• What can we do? Sublanguages.
– technical domains, static vocabulary
– Meteo in Canada, Caterpillar Tractor Manuals,
Botanical descriptions, Military Messages
4. Translation Issues: Korean to English
- Word order
- Dropped arguments
- Lexical ambiguities
- Structure vs morphology
5. Common Thread
• Predicate-argument structure
– Basic constituents of the sentence and how
they are related to each other
• Constituents
– John, Mary, the dog, pleasure, the store.
• Relations
– Loves, feeds, go, to, bring
8. Machine Translation Lexical Choice – Word Sense Disambiguation
Iraq lost the battle.
Ilakuka centwey ciessta.
[Iraq ] [battle] [lost].
John lost his computer.
John-i computer-lul ilepelyessta.
[John] [computer] [misplaced].
11. Syntactic Parsing
• The cat sat on the mat.
Det Noun Verb Prep Det Noun
• Time flies like an arrow.
Noun Verb Prep Det Noun
• Fruit flies like a banana.
Noun Noun Verb Det Noun
12. Parses
The cat sat on the mat
(S (NP (Det the) (N cat))
   (VP (V sat)
       (PP (Prep on)
           (NP (Det the) (N mat)))))
13. Parses
Time flies like an arrow.
(S (NP (N time))
   (VP (V flies)
       (PP (Prep like)
           (NP (Det an) (N arrow)))))
14. Parses
Time flies like an arrow.
(S (NP (N time) (N flies))
   (VP (V like)
       (NP (Det an) (N arrow))))
15. Recursive transition nets for CFGs
[RTN diagram: an S network with arcs NP and VP, and an NP network with arcs det, adj, noun, pronoun, and an NP–PP loop (states S1–S6)]
• s :- np, vp.
• np :- pronoun; noun; det, adj, noun; np, pp.
18. Parses
The old can can hold the water.
(S (NP (Det the) (Adj old) (N can))
   (VP (Aux can)
       (V hold)
       (NP (Det the) (N water))))
19. Structural ambiguities
• That factory can can tuna.
• That factory cans cans of tuna and salmon.
• Have the students in cse91 finish the exam in 212.
• Have the students in cse91 finished the exam in 212?
20. Lexicon
The old can can hold the water.
Noun(can,can)
Verb(hold,hold)
Noun(cans,can)
Verb(holds,hold)
Noun(water,water)
Aux(can,can)
Noun(hold,hold)
Adj(old,old)
Noun(holds,hold)
Det(the,the)
21. Simple Context Free Grammar in BNF notation

S  → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V  → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP
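To make the toy grammar concrete, here is a minimal sketch of the same rules, together with the lexicon from the previous slide, as plain Python data. The names GRAMMAR and LEXICON are just illustrative.

```python
# The slide's CFG rules, one list of right-hand sides per non-terminal.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Pronoun"], ["Noun"], ["Det", "Adj", "Noun"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
    "V":  [["Verb"], ["Aux", "Verb"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NP"], ["V", "NP", "PP"], ["VP", "PP"]],
}

# The Lexicon slide's entries: surface form -> possible parts of speech.
LEXICON = {
    "the":   {"Det"},
    "old":   {"Adj"},
    "can":   {"Noun", "Aux"},
    "cans":  {"Noun"},
    "hold":  {"Verb", "Noun"},
    "holds": {"Verb", "Noun"},
    "water": {"Noun"},
}
```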
22. Top-down parse in progress
[The, old, can, can, hold, the, water]
S → NP VP
NP?
NP → Pronoun?
Pronoun? fail
NP → Noun?
Noun? fail
NP → Det Adj Noun?
Det? the
Adj? old
Noun? can
Succeed.
Succeed.
VP?
23. Top-down parse in progress
[can, hold, the, water]
VP → V?
V → Verb?
Verb? fail
V → Aux Verb?
Aux? can
Verb? hold
succeed
succeed
fail
[the, water]
24. Top-down parse in progress
[can, hold, the, water]
VP → V NP
V → Verb?
Verb? fail
V → Aux Verb?
Aux? can
Verb? hold
NP → Pronoun?
Pronoun? fail
NP → Noun?
Noun? fail
NP → Det Adj Noun?
Det? the
Adj? fail
28. Top-down approach
• Start with goal of sentence
S → NP VP
S → Wh-word Aux NP VP
• Will try to find an NP 4 different ways
before trying a parse where the verb
comes first.
• What does this remind you of?
– search
• What would be better?
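A minimal recursive-descent sketch of this top-down search, using the toy grammar and lexicon sketched earlier. The left-recursive rules (NP → NP PP, VP → VP PP) are dropped because naive top-down expansion loops on them, and the helper names are illustrative. Note that, as the traced parse shows, the toy grammar has no NP → Det Noun rule, so the full sentence only parses once such a rule is added.

```python
# Top-down (recursive-descent) search with backtracking over the toy grammar.
# Left-recursive rules (NP -> NP PP, VP -> VP PP) are omitted: naive top-down
# expansion would recurse on them forever.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Pronoun"], ["Noun"], ["Det", "Adj", "Noun"]],
    "PP": [["Prep", "NP"]],
    "V":  [["Verb"], ["Aux", "Verb"]],
    "VP": [["V"], ["V", "NP"], ["V", "NP", "NP"], ["V", "NP", "PP"]],
}
LEXICON = {"the": {"Det"}, "old": {"Adj"}, "can": {"Noun", "Aux"},
           "hold": {"Verb", "Noun"}, "water": {"Noun"}}

def expand(symbol, words, i):
    """Yield every input position reachable after matching `symbol` at words[i:]."""
    if symbol in GRAMMAR:                                   # non-terminal: try each rule
        for rule in GRAMMAR[symbol]:
            yield from expand_all(rule, words, i)
    elif i < len(words) and symbol in LEXICON.get(words[i], set()):
        yield i + 1                                         # pre-terminal matched a word

def expand_all(symbols, words, i):
    """Match a sequence of symbols left to right, backtracking over alternatives."""
    if not symbols:
        yield i
    else:
        for j in expand(symbols[0], words, i):
            yield from expand_all(symbols[1:], words, j)

words = "the old can can hold the water".split()
print(any(j == len(words) for j in expand("S", words, 0)))  # False: no NP -> Det Noun rule
GRAMMAR["NP"].append(["Det", "Noun"])                       # assumed extra rule for "the water"
print(any(j == len(words) for j in expand("S", words, 0)))  # True
```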
29. Bottom-up approach
• Start with words in sentence.
• What structures do they correspond to?
• Once a structure is built, it is kept on a CHART.
30. Bottom-up parse in progress
The   old   can    can        hold        the   water
det   adj   noun   aux/verb   verb/noun   det   noun
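A minimal bottom-up chart sketch (CKY-style) for the same sentence, assuming the toy grammar has been binarized by hand; NOM is an assumed helper category and an NP → Det Noun rule is assumed for "the water". Each constituent is stored once in the chart, which is the point of the comparison on the next slide.

```python
# Bottom-up chart (CKY-style) parsing sketch: fill width-1 spans from the
# lexicon, then build wider constituents by combining two adjacent spans.
from collections import defaultdict

LEXICON = {"the": {"Det"}, "old": {"Adj"}, "can": {"Noun", "Aux"},
           "hold": {"Verb", "Noun"}, "water": {"Noun"}}
BINARY_RULES = [              # (parent, left child, right child), hand-binarized
    ("S",   "NP",  "VP"),
    ("NP",  "Det", "NOM"),    # "the (old can)"  -- NOM is an assumed helper category
    ("NOM", "Adj", "Noun"),   # "old can"
    ("NP",  "Det", "Noun"),   # "the water"      -- assumed extra NP rule
    ("V",   "Aux", "Verb"),   # "can hold"
    ("VP",  "V",   "NP"),     # "can hold the water"
]

def cky(words):
    chart = defaultdict(set)                          # (start, end) -> categories found
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= LEXICON[w]               # every POS a word could have
    n = len(words)
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    if left in chart[(start, mid)] and right in chart[(mid, end)]:
                        chart[(start, end)].add(parent)   # built once, reused later
    return chart

chart = cky("the old can can hold the water".split())
print(chart[(0, 7)])   # {'S'}            -- a sentence spanning all seven words
print(chart[(2, 3)])   # {'Noun', 'Aux'}  -- "can" keeps every POS in the chart
```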
33. Top-down vs. Bottom-up
Top-down:
• Helps with POS ambiguities – only considers relevant POS
• Rebuilds the same structure repeatedly
• Spends a lot of time on impossible parses

Bottom-up:
• Has to consider every POS
• Builds each structure once
• Spends a lot of time on useless structures

What would be better?
34. Hybrid approach
• Top-down with a chart
• Use look ahead and heuristics to pick
most likely sentence type
• Use probabilities for POS tagging, PP attachment, etc.
35. Features
• C for Case, Subjective/Objective
– She visited her.
• P for Person agreement, (1st, 2nd, 3rd)
– I like him, You like him, He likes him,
• N for Number agreement, Subject/Verb
– He likes him, They like him.
• G for Gender agreement, Subject/Verb
– English: reflexive pronouns, e.g. He washed himself.
– Romance languages: det/noun agreement
• T for Tense
– auxiliaries, sentential complements, etc.
– *will finished is bad
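A minimal sketch of how such features can be checked during parsing: each constituent carries a small feature dictionary, and subject and verb combine only when person and number agree. The feature values are illustrative.

```python
def agree(subj, verb):
    """Subject/verb agreement: the P (person) and N (number) features must match."""
    return subj["person"] == verb["person"] and subj["number"] == verb["number"]

he    = {"person": 3, "number": "sg", "case": "subj"}
they  = {"person": 3, "number": "pl", "case": "subj"}
likes = {"person": 3, "number": "sg"}     # 3rd-person-singular verb form
like  = {"person": 3, "number": "pl"}     # plural verb form

print(agree(he, likes))     # True   "He likes him"
print(agree(they, like))    # True   "They like him"
print(agree(they, likes))   # False  "*They likes him"
```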
37. Language to Logic
How do we get there?
• John went to the book store.
∃ John ∃ store1, go(John, store1)
• John bought a book.
buy(John,book1)
• John gave the book to Mary.
give(John,book1,Mary)
• Mary put the book on the table.
put(Mary,book1,table1)
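A minimal sketch of what those target logical forms can look like inside a program: each is just a predicate with its arguments, and simple questions become lookups over them. The tuple representation and constant names follow the slide.

```python
# Logical forms from the slide as (predicate, arg, ...) tuples.
logical_forms = [
    ("go",   "John", "store1"),
    ("buy",  "John", "book1"),
    ("give", "John", "book1", "Mary"),
    ("put",  "Mary", "book1", "table1"),
]

# "What did John give to Mary?" -- find the missing argument of give(John, ?, Mary).
answers = [form[2] for form in logical_forms
           if form[0] == "give" and form[1] == "John" and form[-1] == "Mary"]
print(answers)   # ['book1']
```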
38. Lexical Semantics
Same event - different sentences
John broke the window with a hammer.
John broke the window with the crack.
The hammer broke the window.
The window broke.
39. Same event - different syntactic frames
John broke the window with a hammer.
SUBJ VERB OBJ MODIFIER

John broke the window with the crack.
SUBJ VERB OBJ MODIFIER

The hammer broke the window.
SUBJ VERB OBJ

The window broke.
SUBJ VERB
40. Semantics – predicate arguments

break(AGENT, INSTRUMENT, PATIENT)

John broke the window with a hammer.     AGENT, PATIENT, INSTRUMENT
The hammer broke the window.             INSTRUMENT, PATIENT
The window broke.                        PATIENT

Fillmore 68 – The Case for Case
41. Mapping predicate arguments to grammatical functions

John broke the window with a hammer.
SUBJ = AGENT, OBJ = PATIENT, MODIFIER = INSTRUMENT

The hammer broke the window.
SUBJ = INSTRUMENT, OBJ = PATIENT

The window broke.
SUBJ = PATIENT
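A minimal sketch of the mapping these two slides illustrate for break(AGENT, INSTRUMENT, PATIENT): which thematic role the subject carries depends on which arguments are expressed. It ignores the lexical-semantic disambiguation needed for cases like "with the crack"; the function name is illustrative.

```python
def break_roles(subj, obj=None, modifier=None):
    """Map the syntactic frame of 'break' onto AGENT / PATIENT / INSTRUMENT."""
    if modifier is not None:                  # John broke the window with a hammer.
        return {"AGENT": subj, "PATIENT": obj, "INSTRUMENT": modifier}
    if obj is not None:                       # The hammer broke the window.
        return {"INSTRUMENT": subj, "PATIENT": obj}
    return {"PATIENT": subj}                  # The window broke.

print(break_roles("John", "the window", "a hammer"))
print(break_roles("the hammer", "the window"))
print(break_roles("the window"))
```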
43. Syntax/semantics interaction
• Parsers will produce syntactically valid
parses for semantically anomalous
sentences
• Lexical semantics can be used to rule
them out
45. Subcategorization Frequencies
• The women kept the dogs on the beach.
– Where keep? Keep on beach 95%
• NP XP 81%
– Which dogs? Dogs on beach 5%
• NP 19%
• The women discussed the dogs on the beach.
– Where discuss? Discuss on beach 10%
• NP PP 24%
– Which dogs? Dogs on beach 90%
• NP 76%
Ford, Bresnan & Kaplan 82; Jurafsky 98; Roland & Jurafsky 99
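A tiny sketch of how these subcategorization frequencies can steer PP attachment: score the two readings by the verb's frame preferences. The numbers are this slide's; the decision rule is a simplification of the cited models.

```python
# P(frame | verb) from the slide: "NP XP" is the frame where the PP goes with
# the verb, "NP" the frame where the PP stays inside the object noun phrase.
subcat = {
    "kept":      {"NP XP": 0.81, "NP": 0.19},
    "discussed": {"NP XP": 0.24, "NP": 0.76},   # the slide lists this frame as "NP PP"
}

for verb, frames in subcat.items():
    attachment = "VP attachment (where?)" if frames["NP XP"] > frames["NP"] \
                 else "NP attachment (which dogs?)"
    print(f"{verb} the dogs on the beach -> {attachment}")
# kept ...      -> VP attachment (where?)
# discussed ... -> NP attachment (which dogs?)
```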
46. Reading times
• NP-bias (slower reading times at the disambiguating word, e.g. was / would)
The waiter confirmed the reservation was made yesterday.
The defendant accepted the verdict would be decided soon.
47. Reading times
• S-bias (no slowdown at the disambiguating word)
The waiter insisted the reservation was made yesterday.
The defendant wished the verdict would be decided soon.
Trueswell, Tanenhaus & Kello 93; Trueswell & Kim 98
49. Simple Context Free Grammar in BNF

S  → NP VP
NP → Pronoun | Noun | Det Adj Noun | NP PP
PP → Prep NP
V  → Verb | Aux Verb
VP → V | V NP | V NP NP | V NP PP | VP PP
50. Simple Probabilistic CFG

S  → NP VP
NP → Pronoun          [0.10]
   | Noun             [0.20]
   | Det Adj Noun     [0.50]
   | NP PP            [0.20]
PP → Prep NP          [1.00]
V  → Verb             [0.20]
   | Aux Verb         [0.20]
VP → V                [0.10]
   | V NP             [0.40]
   | V NP NP          [0.10]
   | V NP PP          [0.20]
   | VP PP            [0.20]
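A minimal sketch of how this PCFG scores a tree: multiply the probabilities of the rules used in the derivation. The derivation below is for "the old can can hold water" (the bare-noun object keeps us inside the slide's rules); the probability 1.00 for S → NP VP is an assumption, since the slide does not list one.

```python
# Rule probabilities taken from the PCFG slide (S -> NP VP assumed to be 1.00).
RULE_PROB = {
    ("S",  ("NP", "VP")):           1.00,   # assumed
    ("NP", ("Det", "Adj", "Noun")): 0.50,
    ("NP", ("Noun",)):              0.20,
    ("VP", ("V", "NP")):            0.40,
    ("V",  ("Aux", "Verb")):        0.20,
}

derivation = [
    ("S",  ("NP", "VP")),
    ("NP", ("Det", "Adj", "Noun")),   # the old can
    ("VP", ("V", "NP")),
    ("V",  ("Aux", "Verb")),          # can hold
    ("NP", ("Noun",)),                # water
]

p = 1.0
for rule in derivation:
    p *= RULE_PROB[rule]              # tree probability = product of rule probabilities
print(round(p, 5))                    # 0.008 = 1.00 * 0.50 * 0.40 * 0.20 * 0.20
```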
51. Simple Probabilistic Lexicalized CFG

S  → NP VP
NP → Pronoun          [0.10]
   | Noun             [0.20]
   | Det Adj Noun     [0.50]
   | NP PP            [0.20]
PP → Prep NP          [1.00]
V  → Verb             [0.20]
   | Aux Verb         [0.20]
VP → V                [0.87]  {sleep, cry, laugh}
   | V NP             [0.03]
   | V NP NP          [0.00]
   | V NP PP          [0.00]
   | VP PP            [0.10]
52. Simple Probabilistic Lexicalized CFG

VP → V                [0.30]
   | V NP             [0.60]  {break, split, crack, ...}
   | V NP NP          [0.00]
   | V NP PP          [0.00]
   | VP PP            [0.10]

VP → V                [0.10]
   | V NP             [0.40]
   | V NP NP          [0.10]
   | V NP PP          [0.20]
   | VP PP            [0.20]

What about leave? leave1, leave2?
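A minimal sketch of the lexicalization idea: VP rule probabilities conditioned on the head verb's class, with the slides' numbers. The verb-class keys and the fallback to the unlexicalized table are illustrative; handling leave would further require separating its senses (leave1, leave2) into rows of their own.

```python
# P(VP -> rule | head verb class), following the lexicalized PCFG slides.
VP_RULE_PROB = {
    "sleep/cry/laugh":   {"V": 0.87, "V NP": 0.03, "V NP NP": 0.00, "V NP PP": 0.00, "VP PP": 0.10},
    "break/split/crack": {"V": 0.30, "V NP": 0.60, "V NP NP": 0.00, "V NP PP": 0.00, "VP PP": 0.10},
    "default":           {"V": 0.10, "V NP": 0.40, "V NP NP": 0.10, "V NP PP": 0.20, "VP PP": 0.20},
}

def p_vp_rule(verb_class, rule):
    """Probability of a VP rule given the head verb's class (unlexicalized fallback)."""
    return VP_RULE_PROB.get(verb_class, VP_RULE_PROB["default"])[rule]

print(p_vp_rule("sleep/cry/laugh", "V"))        # 0.87 -- intransitives strongly prefer VP -> V
print(p_vp_rule("break/split/crack", "V NP"))   # 0.6  -- break-verbs prefer a direct object
print(p_vp_rule("leave", "V NP"))               # 0.4  -- unknown verb falls back to the default
```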
53. A TreeBanked Sentence

Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.

(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))
54. The same sentence, PropBanked

(S Arg0 (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1 (NP (NP a GM-Jaguar pact)
                        (SBAR (WHNP-1 that)
                              (S Arg0 (NP-SBJ *T*-1)
                                 (VP would
                                     (VP give
                                         Arg2 (NP the U.S. car maker)
                                         Arg1 (NP (NP an eventual (ADJP 30 %) stake)
                                                  (PP-LOC in (NP the British company))))))))))))

expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)
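A minimal sketch of the recovered predicate-argument structures as plain Python dicts; the keys follow the PropBank-style labels on the slide.

```python
propositions = [
    {"rel": "expect", "Arg0": "Analysts", "Arg1": "a GM-Jaguar pact"},
    {"rel": "give",
     "Arg0": "a GM-Jaguar pact (via the trace *T*-1)",
     "Arg2": "the U.S. car maker",
     "Arg1": "an eventual 30% stake in the British company"},
]
for prop in propositions:
    args = {k: v for k, v in prop.items() if k != "rel"}
    print(prop["rel"], args)
```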
55. Headlines
• Police Begin Campaign To Run Down Jaywalkers
• Iraqi Head Seeks Arms
• Teacher Strikes Idle Kids
• Miners Refuse To Work After Death
• Juvenile Court To Try Shooting Defendant
57. Context Sensitivity
• Programming languages are Context Free
• Natural languages are Context Sensitive?
– Movement
– Features
– respectively
John, Mary and Bill ate peaches, pears and apples,
respectively.
59. Recursive transition nets for CFGs
[RTN diagram repeated: an S network with arcs NP and VP, and an NP network with arcs det, adj, noun, pronoun, and an NP–PP loop (states S1–S6)]
• s :- np, vp.
• np :- pronoun; noun; det, adj, noun; np, pp.
60. Most parsers are Turing Machines
• To give a more natural and
comprehensible treatment of movement
• For a more efficient treatment of features
• Not because of respectively – most
parsers can’t handle it.
61. Nested Dependencies and
Crossing Dependencies
CF  The dog chased the cat that bit the mouse that ran.
CF  The mouse the cat the dog chased bit ran.
CS  John, Mary and Bill ate peaches, pears and apples, respectively.
62. Movement
What did John give to Mary?
*Where did John give to Mary?
John gave cookies to Mary.
John gave <what> to Mary.
Here is an example from the Wall Street Journal Treebank II. The sentence is in the light blue box on the bottom
left corner, the syntactic structure as a parse tree is in the middle, the actual annotation is in the light gray box.
Here the same annotation is still in the gray box, with the argument labels added. The tree represents the dependency
structure, which gives rise to the predicates in the light blue box. Notice that the trace links back to the GM-Jaguar Pact.
Notice also that it could just as easily have said, “a GM-Jaguar pact that would give an eventual…stake to the US car maker.”
where it would be “ ARG0: a GM-Jaguar pact that would give an ARG1: eventual…stake to ARG2: the US car maker.”
This works in exactly the same way for Chinese and Korean as it works for English (and presumably will for Arabic as well.)