CMSC 723 / LING 645: Intro to
Computational Linguistics
September 1, 2004: Dorr
Overview, History, Goals, Problems,
Techniques; Intro to MT (J&M 1, 21)
Prof. Bonnie J. Dorr
Dr. Christof Monz
TA: Adam Lee
Administrivia
http://www.umiacs.umd.edu/~christof/courses/cmsc723-fall04/
IMPORTANT:
• For Today: Chapters 1 and 21
• For Next Time: Chapter 2
Other Important Stuff
• This course is interdisciplinary: it cuts across different areas of expertise.
Expect that a subset of the class will be learning new material at any time,
while others will have to be patient! (The subsets will swap frequently!)
• Project 1 and Project 2 are designed differently. Be prepared for this
distinction!
– P1 will focus on the fundamentals, getting your feet wet with software. By the
end, you should feel comfortable using/testing certain types of NLP software.
– P2 will require a significantly deeper level of understanding, critique, analysis.
You'll be expected to think deeply and write a lot in the second project. What you
write will be a major portion of the grade!
• No solutions will be handed out. Written comments will be sent to you by
the TA.
• All email correspondence MUST HAVE "CMSC 723" in the Subject line!!!
• Submission format for assignments, projects: plain ASCII, PDF
• Assignment 1 will be posted next week.
CL vs NLP
Why "Computational Linguistics" (CL) rather
than "Natural Language Processing" (NLP)?
• Computational Linguistics
– Computers dealing with language
– Modeling what people do
• Natural Language Processing
– Applications on the computer side
Relation of CL to
Other Disciplines
CL (at the center of the diagram) is related to:
• Artificial Intelligence (AI) (notions of representation, search, etc.)
• Machine Learning (particularly probabilistic or statistical ML techniques)
• Linguistics (Syntax, Semantics, etc.)
• Psychology
• Electrical Engineering (EE) (Optical Character Recognition)
• Philosophy of Language, Formal Logic
• Information Retrieval
• Theory of Computation
• Human Computer Interaction (HCI)
A Sampling of
"Other Disciplines"
• Linguistics: formal grammars, abstract
characterization of what is to be learned.
• Computer Science: algorithms for efficient
learning or online deployment of these systems in
automata.
• Engineering: stochastic techniques for
characterizing regular patterns for learning and
ambiguity resolution.
• Psychology: insights into what linguistic
constructions are easy or difficult for people to
learn or to use.
History: 1940s-1950s
• Development of formal language theory
(Chomsky, Kleene, Backus).
– Formal characterization of classes of grammar
(context-free, regular)
– Association with relevant automata
• Probability theory: language understanding as
decoding through a noisy channel (Shannon)
– Use of information-theoretic concepts like entropy to
measure success of language models.
1957-1983
Symbolic vs. Stochastic
• Symbolic
– Use of formal grammars as basis for natural language
processing and learning systems. (Chomsky, Harris)
– Use of logic and logic-based programming for
characterizing syntactic or semantic inference (Kaplan, Kay,
Pereira)
– First toy natural language understanding and generation
systems (Woods, Minsky, Schank, Winograd, Colmerauer)
– Discourse Processing: Role of Intention, Focus (Grosz,
Sidner, Hobbs)
• Stochastic Modeling
– Probabilistic methods for early speech recognition, OCR
(Bledsoe and Browning, Jelinek, Black, Mercer)
1983-1993:
Return of Empiricism
• Use of stochastic techniques for part of
speech tagging, parsing, word sense
disambiguation, etc.
• Comparison of stochastic, symbolic, more
or less powerful models for language
understanding and learning tasks.
1993-Present
• Advances in software and hardware create
NLP needs for information retrieval (web),
machine translation, spelling and grammar
checking, speech recognition and
synthesis.
• Stochastic and symbolic methods combine
for real world applications.
Language and Intelligence:
Turing Test
• Turing test:
– machine, human, and human judge
• Judge asks questions of computer and human.
– Machine's job is to act like a human, human's job is to
convince judge that he's not the machine.
– Machine judged "intelligent" if it can fool judge.
• Judgement of "intelligence" linked to appropriate
answers to questions from the system.
ELIZA
• Remarkably simple "Rogerian
Psychologist"
• Uses Pattern Matching to carry on
limited form of conversation.
• Seems to "Pass the Turing Test!"
(McCorduck, 1979, pp. 225-226)
• Eliza Demo:
http://www.lpa.co.uk/pws_dem4.htm
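The pattern-matching idea behind ELIZA can be sketched in a few lines. This is a minimal illustration, not the original ELIZA script: the rule table `RULES`, the fallback `DEFAULT_REPLY`, and the function `respond` are hypothetical names, and the patterns are invented for the example. Each rule pairs a regular expression with a response template that echoes part of the user's input back.

```python
import re

# Hypothetical rule table: (pattern, response template). These rules are
# illustrative only and are not taken from the original ELIZA script.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Used when no pattern matches, so the conversation can continue.
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured fragment back inside the template.
            return template.format(*match.groups())
    return DEFAULT_REPLY


if __name__ == "__main__":
    for line in ["I need a vacation.", "I am sad", "Hello"]:
        print(line, "->", respond(line))
```

The sketch has no model of meaning at all: it never swaps pronouns ("my" to "your") and falls back to a stock reply when nothing matches, which is why the conversation it supports is so limited.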
What's involved in an
"intelligent" Answer?
Analysis:
Decomposition of the signal (spoken or
written) eventually into meaningful units.
This involves ...
Speech/Character Recognition
• Decomposition into words, segmentation
of words into appropriate phones or letters
• Requires knowledge of phonological
patterns:
– I'm enormously proud.
– I mean to make you proud.
Morphological Analysis
• Inflectional
– duck + s = [N duck] + [plural s]
– duck + s = [V duck] + [3rd person s]
• Derivational
– kind, kindness
• Spelling changes
– drop, dropping
– hide, hiding
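The three kinds of analysis above can be sketched as a toy analyzer. This is a minimal sketch under stated assumptions: the four-word `LEXICON` and the function `analyze` are hypothetical, and only the suffixes -s, -ness, and -ing from the examples are handled. It returns every analysis it finds, so an ambiguous form like "ducks" yields two results.

```python
# Hypothetical toy lexicon: stem -> set of parts of speech (N, V, A).
LEXICON = {
    "duck": {"N", "V"},
    "drop": {"V"},
    "hide": {"V"},
    "kind": {"A"},
}


def analyze(word: str) -> list[str]:
    """Return all analyses of `word` licensed by the toy lexicon."""
    analyses = []

    # Inflectional -s: plural on nouns, 3rd person singular on verbs.
    if word.endswith("s") and word[:-1] in LEXICON:
        stem = word[:-1]
        if "N" in LEXICON[stem]:
            analyses.append(f"[N {stem}] + [plural s]")
        if "V" in LEXICON[stem]:
            analyses.append(f"[V {stem}] + [3rd person s]")

    # Derivational -ness: adjective -> noun.
    if word.endswith("ness") and "A" in LEXICON.get(word[:-4], set()):
        analyses.append(f"[A {word[:-4]}] + [ness]")

    # -ing with spelling changes: restore a dropped "e" (hiding -> hide)
    # or undo consonant doubling (dropping -> drop).
    if word.endswith("ing"):
        base = word[:-3]
        candidates = [base, base + "e"]
        if len(base) >= 2 and base[-1] == base[-2]:
            candidates.append(base[:-1])
        for stem in candidates:
            if "V" in LEXICON.get(stem, set()):
                analyses.append(f"[V {stem}] + [ing]")

    return analyses


if __name__ == "__main__":
    for w in ["ducks", "kindness", "dropping", "hiding"]:
        print(w, "->", analyze(w))
```

Returning a list rather than a single answer is deliberate: the morphology alone cannot decide whether "ducks" is a noun or a verb, so that choice is left to later stages.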
Syntactic Analysis
• Associate constituent structure with string
• Prepare for semantic interpretation
[S [NP I] [VP [V watched] [NP [det the] [N terrapin]]]]
OR: watch
  Subject: I
  Object: terrapin
    Det: the
Semantics
• A way of representing meaning
• Abstracts away from syntactic structure
• Example:
– First-Order Logic: watch(I,terrapin)
– Can be: "I watched the terrapin" or "The
terrapin was watched by me"
• Real language is complex:
– Who did I watch?
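To make "abstracts away from syntactic structure" concrete, the sketch below maps the active and the passive sentence to the same tuple, standing in for watch(I,terrapin). Everything here is an assumption made for illustration: the two regular expressions, the pronoun table, and the function `to_logic` cover only these two sentence shapes and are nowhere near a real semantic interpreter.

```python
import re
from typing import Optional, Tuple

# Hypothetical surface patterns for the two example sentences only.
ACTIVE = re.compile(r"^(\w+) watched the (\w+)$", re.IGNORECASE)
PASSIVE = re.compile(r"^the (\w+) was watched by (\w+)$", re.IGNORECASE)

# Map case-marked pronoun forms to one canonical constant.
PRONOUNS = {"me": "I", "i": "I"}


def canonical(noun_phrase: str) -> str:
    """Normalize a one-word noun phrase to a logical constant."""
    return PRONOUNS.get(noun_phrase.lower(), noun_phrase.lower())


def to_logic(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (predicate, watcher, watched), or None if no pattern applies."""
    text = sentence.strip().rstrip(".")
    match = ACTIVE.match(text)
    if match:
        return ("watch", canonical(match.group(1)), canonical(match.group(2)))
    match = PASSIVE.match(text)
    if match:
        # In the passive, the surface subject is the thing watched.
        return ("watch", canonical(match.group(2)), canonical(match.group(1)))
    return None


if __name__ == "__main__":
    print(to_logic("I watched the terrapin"))
    print(to_logic("The terrapin was watched by me"))
    print(to_logic("Who did I watch?"))
```

The question "Who did I watch?" falls outside both patterns and returns `None`, a small reminder of how quickly real language outgrows hand-written surface patterns.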
Lexical Semantics
The Terrapin, is who I watched.
Watch the Terrapin is what I do best.
*Terrapin is what I watched the
I = experiencer
Watch the Terrapin = predicate
The Terrapin = patient
Compositional Semantics
• Association of parts of a proposition with
semantic roles
• Scoping
Proposition
  Experiencer: I (1st pers, sg)
  Predicate: Be (perc)
    pred: saw
    patient: the Terrapin
Word-Governed Semantics
• Any verb can add "able" to form an
adjective.
– I taught the class. The class is teachable.
– I rejected the idea. The idea is rejectable.
• Association of particular words with
specific semantic forms.
– John (masculine)
– The boys (masculine, plural, human)
Pragmatics
• Real world knowledge, speaker intention,
goal of utterance.
• Related to sociology.
• Example 1:
– Could you turn in your assignments now? (command)
– Could you finish the homework? (question, command)
• Example 2:
– I couldn't decide how to catch the crook. Then I decided
to spy on the crook with binoculars.
– To my surprise, I found out he had them too. Then I knew
to just follow the crook with binoculars.
[the crook [with binoculars]]
[the crook] [with binoculars]
Discourse Analysis
• Discourse: How propositions fit together in
a conversation (multi-sentence processing).
– Pronoun reference:
The professor told the student to finish the assignment.
He was pretty aggravated at how long it was taking to
pass it in.
– Multiple reference to same entity:
George W. Bush, president of the U.S.
– Relation between sentences:
John hit the man. He had stolen his bicycle.
NLP Pipeline
speech → Phonetic Analysis
text → OCR/Tokenization
→ Morphological analysis
→ Syntactic analysis
→ Semantic Interpretation
→ Discourse Processing
Relation to Machine Translation
input
→ analysis: Morphological analysis → Syntactic analysis → Semantic Interpretation
→ Interlingua
→ generation: Lexical selection → Syntactic realization → Morphological synthesis
→ output
Ambiguity
I made her duck
I made duckling for her
I made the duckling belonging to her
I created the duck she owns
I forced her to lower her head
By magic, I changed her into a duck
Syntactic Disambiguation
• Structural ambiguity:
[S [NP I] [VP [V made] [NP her] [VP [V duck]]]]
[S [NP I] [VP [V made] [NP [det her] [N duck]]]]
Part of Speech Tagging and
Word Sense Disambiguation
• [verb Duck]!
[noun Duck] is delicious for dinner
• I went to the bank to deposit my check.
I went to the bank to look out at the river.
I went to the bank of windows and chose
the one dealing with last names beginning
with "d".
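One simple way to pick a sense for "bank" is to count how many context words overlap with a list of cue words per sense, in the spirit of dictionary-overlap methods. This is a sketch only: the sense labels and cue words in `SENSES` are invented for the three example sentences, and `disambiguate` is a hypothetical helper, not a method described in the lecture.

```python
import re

# Hypothetical cue words per sense of "bank"; invented for illustration.
SENSES = {
    "financial institution": {"deposit", "check", "money", "account", "loan"},
    "river edge": {"river", "water", "shore", "look"},
    "row of objects": {"windows", "row", "switches", "names"},
}


def disambiguate(sentence: str) -> str:
    """Return the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    # On a tie, max() keeps the first sense in dictionary order.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


if __name__ == "__main__":
    for s in [
        "I went to the bank to deposit my check.",
        "I went to the bank to look out at the river.",
        "I went to the bank of windows and chose the one dealing with last names.",
    ]:
        print(disambiguate(s), "<-", s)
```

The approach depends entirely on the cue lists: a sentence with no cue words gets the first sense by default, which is one reason statistical methods trained on data are attractive for this task.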
Resources for
NLP Systems
• Dictionary
• Morphology and Spelling Rules
• Grammar Rules
• Semantic Interpretation Rules
• Discourse Interpretation
Natural language processing involves (1) learning
or fashioning the rules for each component, (2)
embedding the rules in the relevant automaton, and
(3) using the automaton to efficiently process the
input.
Some NLP Applications
• Machine Translation: Babelfish (Alta Vista): http://babelfish.altavista.com/translate.dyn
• Question Answering: Ask Jeeves (Ask Jeeves): http://www.ask.com/
• Language Summarization: MEAD (U. Michigan): http://www.summarization.com/mead
• Spoken Language Recognition: EduSpeak (SRI): http://www.eduspeak.com/
• Automatic Essay Evaluation: E-Rater (ETS): http://www.ets.org/research/erater.html
• Information Retrieval and Extraction: NetOwl (SRA): http://www.netowl.com/extractor_summary.html
What is MT?
• Definition: Translation from one natural
language to another by means of a
computerized system
• Early failures
• Later: varying degrees of success
An Old Example
The spirit is willing but the flesh is weak
The vodka is good but the meat is rotten
Machine Translation History
• 1950s: Intensive research activity in MT
• 1960s: Direct word-for-word replacement
• 1966 (ALPAC): NRC Report on MT
• Conclusion: MT no longer worthy of serious
scientific investigation.
• 1966-1975: "Recovery period"
• 1975-1985: Resurgence (Europe, Japan)
• 1985-present: Resurgence (US)
http://ourworld.compuserve.com/homepages/WJHutchins/MTS-93.htm
What happened between
ALPAC and Now?
• Need for MT and other NLP applications
confirmed
• Change in expectations
• Computers have become faster, more powerful
• WWW
• Political state of the world
• Maturation of Linguistics
• Development of hybrid statistical/symbolic
approaches
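Three MT Approaches: Direct,
Transfer, Interlingual
[Figure: MT pyramid from Source Text up to Interlingua and down to Target Text]
Analysis (source side, bottom to top):
Source Text → Morphological Analysis → Word Structure → Syntactic Analysis →
Syntactic Structure → Semantic Analysis → Semantic Structure →
Semantic Composition → Interlingua
Generation (target side, top to bottom):
Interlingua → Semantic Decomposition → Semantic Structure → Semantic Generation →
Syntactic Structure → Syntactic Generation → Word Structure →
Morphological Generation → Target Text
Crossing from source to target side:
– Direct (between Word Structures)
– Syntactic Transfer (between Syntactic Structures)
– Semantic Transfer (between Semantic Structures)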
Examples of Three Approaches
• Direct:
– I checked his answers against those of the teacher →
Yo comparé sus respuestas a las de la profesora
– Rule: [check X against Y] → [comparar X a Y]
• Transfer:
– Ich habe ihn gesehen → I have seen him
– Rule: [clause agt aux obj pred] → [clause agt aux pred ob
• Interlingual:
– I like Mary → Mary me gusta a mí
– Rep: [BeIdent (I [ATIdent (I, Mary)] Like+ingly)]
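The transfer example can be sketched as two steps: a structural rule that reorders roles, then word-by-word lexical substitution. This is a minimal sketch under stated assumptions: the rule's right-hand side is cut off in the slide text, so `TARGET_ORDER` assumes it ends in `obj`. The four-word `LEXICON` and the function `transfer` are hypothetical, and roles are assigned by position, not by a real parser.

```python
# Role order on the German side, as given in the transfer rule.
SOURCE_ORDER = ("agt", "aux", "obj", "pred")

# ASSUMPTION: the slide's right-hand side is truncated; we assume it
# ends in "obj", giving agent, auxiliary, predicate, object.
TARGET_ORDER = ("agt", "aux", "pred", "obj")

# Hypothetical bilingual lexicon covering only the example sentence.
LEXICON = {"ich": "I", "habe": "have", "ihn": "him", "gesehen": "seen"}


def transfer(sentence: str) -> str:
    """Translate a four-word German clause using one structural rule."""
    words = sentence.split()
    if len(words) != len(SOURCE_ORDER):
        raise ValueError("this sketch only handles four-word clauses")
    # "Analysis": assign roles purely by position (a stand-in for parsing).
    roles = dict(zip(SOURCE_ORDER, words))
    # Structural transfer: reorder the roles for the target language.
    reordered = [roles[role] for role in TARGET_ORDER]
    # Lexical transfer: substitute each word via the bilingual lexicon.
    return " ".join(LEXICON[word.lower()] for word in reordered)


if __name__ == "__main__":
    print(transfer("Ich habe ihn gesehen"))
```

Note that the rule and the lexicon are both specific to German-to-English; a different language pair would need its own rule set, which is the extensibility cost usually attributed to transfer systems.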
MT Systems: 1964-1990
• Direct: GAT [Georgetown, 1964],
TAUM-METEO [Colmerauer et al. 1971]
• Transfer: GETA/ARIANE [Boitet, 1978],
LMT [McCord, 1989], METAL [Thurmair,
1990], MiMo [Arnold & Sadler, 1990], ...
• Interlingual: MOPTRANS [Schank, 1974],
KBMT [Nirenburg et al., 1992], UNITRAN
[Dorr, 1990]
Statistical MT and Hybrid
Symbolic/Stats MT: 1990-Present
Candide [Brown, 1990, 1992];
Halo/Nitrogen [Langkilde and Knight,
1998], [Yamada and Knight, 2002];
GHMT [Dorr and Habash, 2002];
DUSTer [Dorr et al. 2002]
Direct MT: Pros and Cons
• Pros
– Fast
– Simple
– Inexpensive
– No translation rules hidden in lexicon
• Cons
– Unreliable
– Not powerful
– Rule proliferation
– Requires too much context
– Major restructuring after lexical substitution
Transfer MT: Pros and Cons
• Pros
– Don't need to find language-neutral rep
– Relatively fast
• Cons
– N² sets of transfer rules: Difficult to extend
– Proliferation of language-specific rules in
lexicon and syntax
– Cross-language generalizations lost
Interlingual MT: Pros and Cons
• Pros
– Portable (avoids N² problem)
– Lexical rules and structural transformations stated more
simply on normalized representation
– Explanatory Adequacy
• Cons
– Difficult to deal with terms on primitive level:
universals?
– Must decompose and reassemble concepts
– Useful information lost (paraphrase)
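The scaling argument behind the "N² problem" can be checked with a little arithmetic. The sketch below assumes one transfer rule set per ordered language pair, so N languages need N(N-1) sets, which grows on the order of N². It also assumes an interlingual system needs one analyzer and one generator per language. The function names are hypothetical.

```python
def transfer_modules(n_languages: int) -> int:
    """Rule sets needed if every ordered language pair has its own set."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """Components needed with one analyzer and one generator per language."""
    return 2 * n_languages


if __name__ == "__main__":
    print("languages  transfer  interlingua")
    for n in (2, 3, 5, 10, 20):
        print(f"{n:9d}  {transfer_modules(n):8d}  {interlingua_modules(n):11d}")
```

Under these assumptions the two designs cost the same at three languages (6 components each), and the gap widens quickly after that: ten languages need 90 transfer rule sets but only 20 interlingual components.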
Approximate IL Approach
• Tap into richness of TL resources
• Use some, but not all, components of
IL representation
• Generate multiple sentences that are
statistically pared down
Approximating IL:
Handling Divergences
• Primitives
• Semantic Relations
• Lexical Information
Interlingual vs. Approximate IL
• Interlingual MT:
– primitives & relations
– bi-directional lexicons
– analysis: compose IL
– generation: decompose IL
• Approximate IL
– hybrid symbolic/statistical design
– overgeneration with statistical ranking
– uses dependency rep input and structural expansion
for "deeper" overgeneration
Mapping from Input Dependency
to English Dependency Tree
Knowledge resources in English only (LVD; Dorr, 2001).
Input dependency: GIVE_V [CAUSE GO]
  Agent: MARY
  Theme: KICK_N
  Goal: JOHN
English dependency: KICK_V [CAUSE GO]
  Agent: MARY
  Goal: JOHN
Mary le dio patadas a John → Mary kicked John
Statistical Extraction
Mary kicked John . [-0.670270 ]
Mary gave a kick at John . [-2.175831]
Mary gave the kick at John . [-3.969686]
Mary gave an kick at John . [-4.489933]
Mary gave a kick by John . [-4.803054]
Mary gave a kick to John . [-5.045810]
Mary gave a kick into John . [-5.810673]
Mary gave a kick through John . [-5.836419]
Mary gave a foot wound by John . [-6.041891]
Mary gave John a foot wound . [-6.212851]
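The extraction step amounts to scoring each overgenerated candidate and keeping the best ones. The sketch below uses four of the sentences and scores listed above. It assumes the bracketed numbers are log-probability-like scores where higher (closer to zero) is better, and `rank` is a hypothetical helper, not the system's actual extractor.

```python
# A subset of the candidates and scores listed above, deliberately unsorted.
# ASSUMPTION: scores behave like log-probabilities, so higher is better.
CANDIDATES = [
    ("Mary gave a kick to John .", -5.045810),
    ("Mary kicked John .", -0.670270),
    ("Mary gave John a foot wound .", -6.212851),
    ("Mary gave a kick at John .", -2.175831),
]


def rank(candidates, top_n=3):
    """Return the `top_n` candidates with the highest scores, best first."""
    ordered = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    return ordered[:top_n]


if __name__ == "__main__":
    for sentence, score in rank(CANDIDATES):
        print(f"{score:10.6f}  {sentence}")
```

The symbolic component is free to overgenerate awkward variants such as "gave a kick at John" because the statistical ranking is expected to push the fluent candidate to the top.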
Benefits of Approximate
IL Approach
• Explaining behaviors that appear to be
statistical in nature
• "Re-sourceability": Re-use of already
existing components for MT from new
languages.
• Application to monolingual
alternations
What Resources are Required?
• Deep TL resources
• Requires SL parser and tralex
• TL resources are richer: LVD
representations, CatVar database
• Constrained overgeneration

More Related Content

Similar to NLP introduced and in 47 slides Lecture 1.ppt

Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing Rajnish Raj
Ā 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters
Ā 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processinggulshan kumar
Ā 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
Ā 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA DATASCIENCE
Ā 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
Ā 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddingsRoelof Pieters
Ā 
Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)IT Industry
Ā 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
Ā 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
Ā 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
Ā 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational SemanticsMarina Santini
Ā 

Similar to NLP introduced and in 47 slides Lecture 1.ppt (20)

Natural language procssing
Natural language procssing Natural language procssing
Natural language procssing
Ā 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
Ā 
Natural lanaguage processing
Natural lanaguage processingNatural lanaguage processing
Natural lanaguage processing
Ā 
Nlp
NlpNlp
Nlp
Ā 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
Ā 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
Ā 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
Ā 
Deep learning for natural language embeddings
Deep learning for natural language embeddingsDeep learning for natural language embeddings
Deep learning for natural language embeddings
Ā 
Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)Domain Specific Terminology Extraction (ICICT 2006)
Domain Specific Terminology Extraction (ICICT 2006)
Ā 
FinalReport
FinalReportFinalReport
FinalReport
Ā 
Lec 1
Lec 1Lec 1
Lec 1
Ā 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
Ā 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
Ā 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
Ā 
Brain vs Computer
Brain vs ComputerBrain vs Computer
Brain vs Computer
Ā 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
Ā 
NLPinAAC
NLPinAACNLPinAAC
NLPinAAC
Ā 
Lesson 40
Lesson 40Lesson 40
Lesson 40
Ā 
AI Lesson 40
AI Lesson 40AI Lesson 40
AI Lesson 40
Ā 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
Ā 

Recently uploaded

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
Ā 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
Ā 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
Ā 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
Ā 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
Ā 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
Ā 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
Ā 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
Ā 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
Ā 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Availablegargpaaro
Ā 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...HyderabadDolls
Ā 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
Ā 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
Ā 
Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...HyderabadDolls
Ā 
Vadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
Ā 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
Ā 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridihmeghakumariji156
Ā 
Call Girls in G.T.B. Nagar (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7
Call Girls in G.T.B. Nagar  (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7Call Girls in G.T.B. Nagar  (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7
Call Girls in G.T.B. Nagar (delhi) call me [šŸ”9953056974šŸ”] escort service 24X79953056974 Low Rate Call Girls In Saket, Delhi NCR
Ā 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
Ā 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
Ā 

Recently uploaded (20)

Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Ā 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Ā 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Ā 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
Ā 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Ā 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Ā 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
Ā 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
Ā 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Ā 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Ā 
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Kalyani ? Call Girl in Kolkata | Service-oriented sexy call girls 8005736733 ...
Ā 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
Ā 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Ā 
Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ā‚¹,5K To @25k with A/C...
Ā 
Vadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara šŸ’‹ Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Ā 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
Ā 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Ā 
Call Girls in G.T.B. Nagar (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7
Call Girls in G.T.B. Nagar  (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7Call Girls in G.T.B. Nagar  (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7
Call Girls in G.T.B. Nagar (delhi) call me [šŸ”9953056974šŸ”] escort service 24X7
Ā 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Ā 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Ā 

NLP introduced and in 47 slides Lecture 1.ppt

  • 1. CMSC 723 / LING 645: Intro to Computational Linguistics September 1, 2004: Dorr Overview, History, Goals, Problems, Techniques; Intro to MT (J&M 1, 21) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee
  • 3. Other Important Stuff ļ‚¬ This course is interdisciplinaryā€”cuts across different areas of expertise. Expect that a subset of the class will be learning new material at any time, while others will have to be patient! (The subsets will swap frequently!) ļ‚¬ Project 1 and Project 2 are designed differently. Be prepared for this distinction! ā€“ P1 will focus on the fundamentals, getting your feet wet with software. By the end, you should feel comfortable using/testing certain types of NLP software. ā€“ P2 will require a significantly deeper level of understanding, critique, analysis. Youā€™ll be expected to think deeply and write a lot in the second project. What you write will be a major portion of the grade! ļ‚¬ No solutions will be handed out. Written comments will be sent to you by the TA. ļ‚¬ All email correspondence MUST HAVE ā€œCMSC 723ā€ in the Subject line!!! ļ‚¬ Submission format for assignments, projects: plain ascii, pdf ļ‚¬ Assignment 1 will be posted next week.
  • 4. CL vs NLP
      Why "Computational Linguistics" (CL) rather than "Natural Language Processing" (NLP)?
      • Computational Linguistics
          - Computers dealing with language
          - Modeling what people do
      • Natural Language Processing
          - Applications on the computer side
  • 5. Relation of CL to Other Disciplines
      • Artificial Intelligence (AI) (notions of rep, search, etc.)
      • Machine Learning (particularly probabilistic or statistical ML techniques)
      • Linguistics (Syntax, Semantics, etc.)
      • Psychology
      • Electrical Engineering (EE) (Optical Character Recognition)
      • Philosophy of Language, Formal Logic
      • Information Retrieval
      • Theory of Computation
      • Human Computer Interaction (HCI)
  • 6. A Sampling of "Other Disciplines"
      • Linguistics: formal grammars, abstract characterization of what is to be learned.
      • Computer Science: algorithms for efficient learning or online deployment of these systems in automata.
      • Engineering: stochastic techniques for characterizing regular patterns for learning and ambiguity resolution.
      • Psychology: insights into what linguistic constructions are easy or difficult for people to learn or to use.
  • 7. History: 1940s-1950s
      • Development of formal language theory (Chomsky, Kleene, Backus).
          - Formal characterization of classes of grammar (context-free, regular)
          - Association with relevant automata
      • Probability theory: language understanding as decoding through a noisy channel (Shannon)
          - Use of information-theoretic concepts like entropy to measure success of language models.
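To make the entropy bullet concrete, here is a minimal sketch (not from the slides) that fits a unigram model to a toy corpus and reports its entropy and perplexity. The corpus and the function names `unigram_model`, `entropy_bits`, and `cross_entropy_bits` are illustrative assumptions.

```python
import math
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def entropy_bits(model):
    """Shannon entropy of the distribution, in bits per word."""
    return -sum(p * math.log2(p) for p in model.values() if p > 0)

def cross_entropy_bits(model, tokens):
    """Average negative log2 probability per token.
    Assumes every token was seen in training (no smoothing)."""
    return -sum(math.log2(model[w]) for w in tokens) / len(tokens)

# Toy corpus (hypothetical): "the" occurs twice, every other word once.
train = "the terrapin watched the duck".split()
model = unigram_model(train)
H = entropy_bits(model)
perplexity = 2 ** cross_entropy_bits(model, train)
print(f"entropy = {H:.3f} bits, perplexity = {perplexity:.3f}")
```

Lower cross-entropy (equivalently, lower perplexity) on held-out text means the model finds that text less surprising, which is the sense in which entropy measures a language model's success.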
  • 8. 1957-1983: Symbolic vs. Stochastic
      • Symbolic
          - Use of formal grammars as basis for natural language processing and learning systems (Chomsky, Harris)
          - Use of logic and logic-based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)
          - First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd, Colmerauer)
          - Discourse Processing: Role of Intention, Focus (Grosz, Sidner, Hobbs)
      • Stochastic Modeling
          - Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)
  • 9. 1983-1993: Return of Empiricism
      • Use of stochastic techniques for part of speech tagging, parsing, word sense disambiguation, etc.
      • Comparison of stochastic, symbolic, more or less powerful models for language understanding and learning tasks.
  • 10. 1993-Present
      • Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.
      • Stochastic and symbolic methods combine for real world applications.
  • 11. Language and Intelligence: Turing Test
      • Turing test:
          - machine, human, and human judge
      • Judge asks questions of computer and human.
          - Machine's job is to act like a human; human's job is to convince the judge that he's not the machine.
          - Machine judged "intelligent" if it can fool the judge.
      • Judgement of "intelligence" linked to appropriate answers to questions from the system.
  • 12. ELIZA
      • Remarkably simple "Rogerian Psychologist"
      • Uses pattern matching to carry on a limited form of conversation.
      • Seems to "Pass the Turing Test!" (McCorduck, 1979, pp. 225-226)
      • Eliza Demo: http://www.lpa.co.uk/pws_dem4.htm
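The pattern-matching idea can be sketched in a few lines. This is an illustrative toy, not the original ELIZA script: the three rules, the `DEFAULT` reply, and the function name `respond` are assumptions made for the example.

```python
import re

# Each rule pairs a regular expression with a response template.
# The captured text is echoed back, which is the core ELIZA trick.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."

def respond(utterance):
    """Return the response of the first matching rule, else a stock reply."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT

print(respond("I am unhappy."))       # How long have you been unhappy?
print(respond("My mother hates me"))  # Tell me more about your mother.
print(respond("Hello"))               # Please go on.
```

Nothing here models meaning: the program only rearranges surface strings, which is why the conversation stays convincing only within a narrow range of inputs.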
  • 13. What's involved in an "intelligent" Answer? Analysis: Decomposition of the signal (spoken or written) eventually into meaningful units. This involves ...
  • 14. Speech/Character Recognition
      • Decomposition into words, segmentation of words into appropriate phones or letters
      • Requires knowledge of phonological patterns:
          - I'm enormously proud.
          - I mean to make you proud.
  • 15. Morphological Analysis
      • Inflectional
          - duck + s = [N duck] + [plural s]
          - duck + s = [V duck] + [3rd person s]
      • Derivational
          - kind, kindness
      • Spelling changes
          - drop, dropping
          - hide, hiding
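The three kinds of analysis above can be sketched as a tiny lexicon-driven analyzer. This is a simplified illustration, not a real morphological analyzer: the four-word `LEXICON`, the feature labels, and the function name `analyze` are assumptions, and only the suffixes shown on the slide are handled.

```python
# Hypothetical mini-lexicon: stem -> set of possible parts of speech.
LEXICON = {
    "duck": {"N", "V"},
    "drop": {"V"},
    "hide": {"V"},
    "kind": {"A"},
}

def analyze(word):
    """Return every (stem, part of speech, feature) analysis licensed by LEXICON."""
    analyses = []
    # Inflectional -s: ambiguous between noun plural and verb 3rd person.
    if word.endswith("s"):
        stem = word[:-1]
        for pos in sorted(LEXICON.get(stem, ())):
            feature = "plural" if pos == "N" else "3rd person"
            if pos in ("N", "V"):
                analyses.append((stem, pos, feature))
    # -ing with spelling changes: restore a dropped "e" or undo consonant doubling.
    if word.endswith("ing"):
        base = word[:-3]
        candidates = [base, base + "e"]
        if len(base) >= 2 and base[-1] == base[-2]:
            candidates.append(base[:-1])
        for stem in candidates:
            if "V" in LEXICON.get(stem, ()):
                analyses.append((stem, "V", "-ing form"))
    # Derivational -ness: adjective -> noun.
    if word.endswith("ness"):
        stem = word[:-4]
        if "A" in LEXICON.get(stem, ()):
            analyses.append((stem, "A", "+ness noun"))
    return analyses

for w in ["ducks", "dropping", "hiding", "kindness"]:
    print(w, analyze(w))
```

Note that "ducks" comes back with two analyses. The analyzer cannot choose between them, so that decision is left to later stages such as tagging or parsing.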
  • 16. Syntactic Analysis
      • Associate constituent structure with string
      • Prepare for semantic interpretation
      • Constituent structure: [S [NP I] [VP [V watched] [NP [det the] [N terrapin]]]]
      • OR dependency structure: watch (Subject: I; Object: terrapin (Det: the))
  • 17. Semantics
      • A way of representing meaning
      • Abstracts away from syntactic structure
      • Example:
          - First-Order Logic: watch(I, terrapin)
          - Can be: "I watched the terrapin" or "The terrapin was watched by me"
      • Real language is complex:
          - Who did I watch?
  • 18. Lexical Semantics
      • The Terrapin is who I watched.
      • Watch the Terrapin is what I do best.
      • *Terrapin is what I watched the
      • I = experiencer; Watch the Terrapin = predicate; The Terrapin = patient
  • 19. Compositional Semantics
      • Association of parts of a proposition with semantic roles
      • Scoping
      • Diagram (Proposition): Experiencer = I (1st pers, sg); Predicate = Be (perc), with pred = saw and patient = the Terrapin
  • 20. Word-Governed Semantics
      • Any verb can add "able" to form an adjective.
          - I taught the class. The class is teachable.
          - I rejected the idea. The idea is rejectable.
      • Association of particular words with specific semantic forms.
          - John (masculine)
          - The boys (masculine, plural, human)
  • 21. Pragmatics
      • Real world knowledge, speaker intention, goal of utterance.
      • Related to sociology.
      • Example 1:
          - Could you turn in your assignments now? (command)
          - Could you finish the homework? (question, command)
      • Example 2:
          - I couldn't decide how to catch the crook. Then I decided to spy on the crook with binoculars.
          - To my surprise, I found out he had them too. Then I knew to just follow the crook with binoculars.
          - Two bracketings: [the crook [with binoculars]] vs. [the crook] [with binoculars]
  • 22. Discourse Analysis
      • Discourse: how propositions fit together in a conversation (multi-sentence processing).
          - Pronoun reference: The professor told the student to finish the assignment. He was pretty aggravated at how long it was taking to pass it in.
          - Multiple reference to same entity: George W. Bush, president of the U.S.
          - Relation between sentences: John hit the man. He had stolen his bicycle.
  • 23. NLP Pipeline
      • Input: speech → Phonetic Analysis; text → OCR/Tokenization
      • Then: Morphological analysis → Syntactic analysis → Semantic Interpretation → Discourse Processing
  • 24. Relation to Machine Translation
      • Analysis: input → Morphological analysis → Syntactic analysis → Semantic Interpretation → Interlingua
      • Generation: Interlingua → Lexical selection → Syntactic realization → Morphological synthesis → output
  • 25. Ambiguity: "I made her duck"
      • I made duckling for her
      • I made the duckling belonging to her
      • I created the duck she owns
      • I forced her to lower her head
      • By magic, I changed her into a duck
  • 26. Syntactic Disambiguation
      • Structural ambiguity:
          - [S [NP I] [VP [V made] [NP her] [VP [V duck]]]]
          - [S [NP I] [VP [V made] [NP [det her] [N duck]]]]
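The two structures can be recovered mechanically. The sketch below enumerates every parse of "I made her duck" under a toy grammar written to mirror the two trees. The grammar, the `Pro` tag for pronouns, and the function names `parses` and `expand` are assumptions for illustration, and the exhaustive search is only practical for very short sentences.

```python
# Toy grammar (assumed): nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("V", "NP", "VP"), ("V",)],
    "NP": [("Pro",), ("Det", "N")],
}
# Toy lexicon (assumed): word -> possible part-of-speech tags.
LEXICON = {
    "I": {"Pro"},
    "made": {"V"},
    "her": {"Pro", "Det"},
    "duck": {"N", "V"},
}

def parses(symbol, words):
    """Return all bracketed parses of the tuple `words` rooted in `symbol`."""
    results = []
    if len(words) == 1 and symbol in LEXICON.get(words[0], ()):
        results.append(f"[{symbol} {words[0]}]")
    for rhs in GRAMMAR.get(symbol, ()):
        for children in expand(rhs, words):
            results.append(f"[{symbol} " + " ".join(children) + "]")
    return results

def expand(rhs, words):
    """Yield child-parse lists that split `words` across the symbols of `rhs`."""
    if not rhs:
        if not words:
            yield []
        return
    first, rest = rhs[0], rhs[1:]
    # Every symbol must cover at least one word.
    for i in range(1, len(words) - len(rest) + 1):
        for left in parses(first, words[:i]):
            for right in expand(rest, words[i:]):
                yield [left] + right

trees = parses("S", tuple("I made her duck".split()))
for tree in trees:
    print(tree)
```

The ambiguity comes entirely from the lexicon: "her" can be a pronoun or a determiner and "duck" a noun or a verb, and the grammar accepts both combinations.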
  • 27. Part of Speech Tagging and Word Sense Disambiguation
      • [verb Duck]! [noun Duck] is delicious for dinner.
      • I went to the bank to deposit my check. I went to the bank to look out at the river. I went to the bank of windows and chose the one dealing with last names beginning with "d".
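One simple way to approach the "bank" examples is to count overlap between the sentence and a set of cue words for each sense, in the spirit of the simplified Lesk algorithm. This is a swapped-in illustration, not a method named on the slides. The sense labels, the cue words in `SENSES`, and the function name `disambiguate` are all assumptions.

```python
import re

# Hypothetical sense inventory for "bank": sense label -> cue words.
SENSES = {
    "financial institution": {"money", "deposit", "check", "account", "loan"},
    "river edge": {"river", "water", "shore", "fishing"},
    "row of similar things": {"windows", "row", "switches", "elevators"},
}

def disambiguate(sentence, senses=SENSES):
    """Pick the sense whose cue words overlap most with the sentence.
    Ties, including zero overlap, fall back to the first sense listed."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & context) for sense, cues in senses.items()}
    return max(scores, key=scores.get)

print(disambiguate("I went to the bank to deposit my check."))
print(disambiguate("I went to the bank to look out at the river."))
print(disambiguate("I went to the bank of windows and chose the one "
                   "dealing with last names beginning with d."))
```

The approach is only as good as its cue lists. A statistical system would learn such associations from sense-annotated or raw text rather than from hand-written sets.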
  • 28. Resources for NLP Systems
      • Dictionary
      • Morphology and Spelling Rules
      • Grammar Rules
      • Semantic Interpretation Rules
      • Discourse Interpretation
      Natural language processing involves (1) learning or fashioning the rules for each component, (2) embedding the rules in the relevant automaton, and (3) using the automaton to efficiently process the input.
  • 29. Some NLP Applications
      • Machine Translation: Babelfish (Alta Vista): http://babelfish.altavista.com/translate.dyn
      • Question Answering: Ask Jeeves (Ask Jeeves): http://www.ask.com/
      • Language Summarization: MEAD (U. Michigan): http://www.summarization.com/mead
      • Spoken Language Recognition: EduSpeak (SRI): http://www.eduspeak.com/
      • Automatic Essay Evaluation: E-Rater (ETS): http://www.ets.org/research/erater.html
      • Information Retrieval and Extraction: NetOwl (SRA): http://www.netowl.com/extractor_summary.html
  • 30. What is MT?
      • Definition: Translation from one natural language to another by means of a computerized system
      • Early failures
      • Later: varying degrees of success
  • 31. An Old Example
      • The spirit is willing but the flesh is weak
      • The vodka is good but the meat is rotten
  • 32. Machine Translation History
      • 1950s: Intensive research activity in MT
      • 1960s: Direct word-for-word replacement
      • 1966 (ALPAC): NRC Report on MT
      • Conclusion: MT no longer worthy of serious scientific investigation.
      • 1966-1975: "Recovery period"
      • 1975-1985: Resurgence (Europe, Japan)
      • 1985-present: Resurgence (US)
      http://ourworld.compuserve.com/homepages/WJHutchins/MTS-93.htm
  • 33. What happened between ALPAC and Now?
      • Need for MT and other NLP applications confirmed
      • Change in expectations
      • Computers have become faster, more powerful
      • WWW
      • Political state of the world
      • Maturation of Linguistics
      • Development of hybrid statistical/symbolic approaches
  • 34. Three MT Approaches: Direct, Transfer, Interlingual
      • Levels, from bottom to top (source side / link / target side):
          - Source Text / Target Text
          - Word Structure / Direct / Word Structure
          - Syntactic Structure / Syntactic Transfer / Syntactic Structure
          - Semantic Structure / Semantic Transfer / Semantic Structure
          - Interlingua (top)
      • Analysis steps (source side, upward): Morphological Analysis, Syntactic Analysis, Semantic Analysis, Semantic Composition
      • Generation steps (target side, downward): Semantic Decomposition, Semantic Generation, Syntactic Generation, Morphological Generation
  • 35. Examples of Three Approaches
      • Direct:
          - I checked his answers against those of the teacher → Yo comparé sus respuestas a las de la profesora
          - Rule: [check X against Y] → [comparar X a Y]
      • Transfer:
          - Ich habe ihn gesehen → I have seen him
          - Rule: [clause agt aux obj pred] → [clause agt aux pred ob
      • Interlingual:
          - I like Mary → Mary me gusta a mí
          - Rep: [BeIdent (I [ATIdent (I, Mary)] Like+ingly)]
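The transfer example can be sketched as three stages: analyze the source clause into roles, reorder the roles, then substitute target words. This is a toy under stated assumptions. The rule's target side is cut off in the slide text, so the order `agt aux pred obj` is assumed here. The analysis step simply reads roles off word positions, and the four-word `LEXICON_DE_EN` and all function names are hypothetical.

```python
# Hypothetical German -> English word lexicon covering only the example.
LEXICON_DE_EN = {"ich": "I", "habe": "have", "ihn": "him", "gesehen": "seen"}

# Structural transfer rule, written as role orders.
SOURCE_ORDER = ("agt", "aux", "obj", "pred")  # German: verb-final participle
TARGET_ORDER = ("agt", "aux", "pred", "obj")  # English order (assumed)

def analyze(sentence):
    """Toy analysis: assign roles by position; a real system would parse."""
    words = sentence.split()
    if len(words) != len(SOURCE_ORDER):
        raise ValueError("this sketch only handles four-word clauses")
    return dict(zip(SOURCE_ORDER, words))

def transfer(clause):
    """Apply the structural rule: reorder roles into the target order."""
    return [(role, clause[role]) for role in TARGET_ORDER]

def generate(ordered_clause):
    """Toy generation: word-for-word lexical substitution."""
    return " ".join(LEXICON_DE_EN[word.lower()] for _, word in ordered_clause)

def translate(sentence):
    return generate(transfer(analyze(sentence)))

print(translate("Ich habe ihn gesehen"))  # I have seen him
```

The reordering rule refers to roles rather than words, so it is stated once per language pair. A direct system would need a separate pattern for each construction.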
  • 36. MT Systems: 1964-1990
      • Direct: GAT [Georgetown, 1964], TAUM-METEO [Colmerauer et al. 1971]
      • Transfer: GETA/ARIANE [Boitet, 1978], LMT [McCord, 1989], METAL [Thurmair, 1990], MiMo [Arnold & Sadler, 1990], ...
      • Interlingual: MOPTRANS [Schank, 1974], KBMT [Nirenburg et al, 1992], UNITRAN [Dorr, 1990]
  • 37. Statistical MT and Hybrid Symbolic/Stats MT: 1990-Present Candide [Brown, 1990, 1992]; Halo/Nitrogen [Langkilde and Knight, 1998], [Yamada and Knight, 2002]; GHMT [Dorr and Habash, 2002]; DUSTer [Dorr et al. 2002]
  • 38. Direct MT: Pros and Cons
      • Pros
          - Fast
          - Simple
          - Inexpensive
          - No translation rules hidden in lexicon
      • Cons
          - Unreliable
          - Not powerful
          - Rule proliferation
          - Requires too much context
          - Major restructuring after lexical substitution
  • 39. Transfer MT: Pros and Cons
      • Pros
          - Don't need to find language-neutral rep
          - Relatively fast
      • Cons
          - N² sets of transfer rules: Difficult to extend
          - Proliferation of language-specific rules in lexicon and syntax
          - Cross-language generalizations lost
  • 40. Interlingual MT: Pros and Cons
      • Pros
          - Portable (avoids N² problem)
          - Lexical rules and structural transformations stated more simply on normalized representation
          - Explanatory Adequacy
      • Cons
          - Difficult to deal with terms on primitive level: universals?
          - Must decompose and reassemble concepts
          - Useful information lost (paraphrase)
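The N² problem is easy to see with a quick count. As a rough sketch, assume one transfer module per ordered language pair, which gives N(N-1) modules (roughly N²), against one analyzer plus one generator per language for an interlingua, which gives 2N. The function names are illustrative.

```python
def transfer_modules(n_languages):
    """One set of transfer rules per ordered (source, target) pair."""
    return n_languages * (n_languages - 1)

def interlingua_modules(n_languages):
    """One analyzer into and one generator out of the interlingua per language."""
    return 2 * n_languages

for n in (2, 5, 10, 20):
    print(n, transfer_modules(n), interlingua_modules(n))
```

Under these assumptions, adding an eleventh language to a ten-language transfer system means 20 new rule sets, while the interlingual design needs 2 new modules.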
  • 41. Approximate IL Approach
      • Tap into richness of TL resources
      • Use some, but not all, components of IL representation
      • Generate multiple sentences that are statistically pared down
  • 43. Interlingual vs. Approximate IL
      • Interlingual MT:
          - primitives & relations
          - bi-directional lexicons
          - analysis: compose IL
          - generation: decompose IL
      • Approximate IL
          - hybrid symbolic/statistical design
          - overgeneration with statistical ranking
          - uses dependency rep input and structural expansion for "deeper" overgeneration
  • 44. Mapping from Input Dependency to English Dependency Tree
      • Knowledge resources in English only (LVD; Dorr, 2001).
      • Input dependency: GIVE_V [CAUSE GO] with Agent = MARY, Theme = KICK_N, Goal = JOHN
      • English dependency: KICK_V [CAUSE GO] with Agent = MARY, Goal = JOHN
      • Mary le dio patadas a John → Mary kicked John
  • 45. Statistical Extraction
      • Mary kicked John . [-0.670270]
      • Mary gave a kick at John . [-2.175831]
      • Mary gave the kick at John . [-3.969686]
      • Mary gave an kick at John . [-4.489933]
      • Mary gave a kick by John . [-4.803054]
      • Mary gave a kick to John . [-5.045810]
      • Mary gave a kick into John . [-5.810673]
      • Mary gave a kick through John . [-5.836419]
      • Mary gave a foot wound by John . [-6.041891]
      • Mary gave John a foot wound . [-6.212851]
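The extraction step amounts to ranking the overgenerated candidates by score and keeping the best. The sketch below uses the scores listed above, treating a higher (less negative) score as better. The slide does not say how the scores are computed, so reading them as language-model log scores is an assumption, and `top_k` is a hypothetical helper name.

```python
# (candidate sentence, score) pairs copied from the slide.
CANDIDATES = [
    ("Mary gave a kick to John .", -5.045810),
    ("Mary kicked John .", -0.670270),
    ("Mary gave John a foot wound .", -6.212851),
    ("Mary gave a kick at John .", -2.175831),
    ("Mary gave an kick at John .", -4.489933),
]

def top_k(candidates, k=1):
    """Return the k highest-scoring candidates, best first."""
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:k]

for sentence, score in top_k(CANDIDATES, k=3):
    print(f"{score:10.6f}  {sentence}")
```

The symbolic component is free to overgenerate, including ill-formed output such as "an kick", because the statistical ranking pushes such candidates down the list.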
  • 46. Benefits of Approximate IL Approach
      • Explaining behaviors that appear to be statistical in nature
      • "Re-sourceability": Re-use of already existing components for MT from new languages.
      • Application to monolingual alternations
  • 47. What Resources are Required?
      • Deep TL resources
      • Requires SL parser and tralex
      • TL resources are richer: LVD representations, CatVar database
      • Constrained overgeneration