SlideShare a Scribd company logo
1 of 26
Natural Language Processing
Rada Mihalcea
Fall 2008
Any Light at The End of The Tunnel?
• Yahoo, Google, Microsoft  Information Retrieval
• Monster.com, HotJobs.com (Job finders)  Information Extraction +
Information Retrieval
• Systran powers Babelfish  Machine Translation
• Ask Jeeves  Question Answering
• Myspace, Facebook, Blogspot  Processing of User-Generated Content
• Tools for “business intelligence”
• All “Big Guys” have (several) strong NLP research labs:
– IBM, Microsoft, AT&T, Xerox, Sun, etc.
• Academia: research in an university environment
Why Natural Language Processing ?
• Huge amounts of data
– Internet = at least 20
billions pages
– Intranet
• Applications for
processing large
amounts of texts
require NLP expertise
• Classify text into categories
• Index and search large texts
• Automatic translation
• Speech understanding
– Understand phone conversations
• Information extraction
– Extract useful information from resumes
• Automatic summarization
– Condense 1 book into 1 page
• Question answering
• Knowledge acquisition
• Text generations / dialogues
Natural?
• Natural Language?
– Refers to the language spoken by people, e.g. English,
Japanese, Swahili, as opposed to artificial languages, like
C++, Java, etc.
• Natural Language Processing
– Applications that deal with natural language in a way or
another
• [Computational Linguistics
– Doing linguistics on computers
– More on the linguistic side than NLP, but closely related ]
Why Natural Language Processing?
• kJfmmfj mmmvvv nnnffn333
• Uj iheale eleee mnster vensi credur
• Baboi oi cestnitze
• Coovoel2^ ekk; ldsllk lkdf vnnjfj?
• Fgmflmllk mlfm kfre xnnn!
Computers Lack Knowledge!
• Computers “see” text in English the same you have
seen the previous text!
• People have no trouble understanding language
– Common sense knowledge
– Reasoning capacity
– Experience
• Computers have
– No common sense knowledge
– No reasoning capacity
Where does it fit in the CS taxonomy?
Computers
Artificial Intelligence Algorithms
Databases Networking
Robotics Search
Natural Language Processing
Information
Retrieval
Machine
Translation
Language
Analysis
Semantics Parsing
Linguistics Levels of Analysis
• Speech
• Written language
– Phonology: sounds / letters / pronunciation
– Morphology: the structure of words
– Syntax: how these sequences are structured
– Semantics: meaning of the strings
• Interaction between levels
Issues in Syntax
“the dog ate my homework” - Who did what?
1. Identify the part of speech (POS)
Dog = noun ; ate = verb ; homework = noun
English POS tagging: 95%
2. Identify collocations
mother in law, hot dog
Compositional versus non-compositional
collocates
Issues in Syntax
• Shallow parsing:
“the dog chased the bear”
“the dog” “chased the bear”
subject - predicate
Identify basic structures
NP-[the dog] VP-[chased the bear]
Issues in Syntax
• Full parsing: John loves Mary
Help figuring out (automatically) questions like: Who did what
and when?
More Issues in Syntax
• Anaphora Resolution:
“The dog entered my room. It scared me”
• Preposition Attachment
“I saw the man in the park with a telescope”
Issues in Semantics
• Understand language! How?
• “plant” = industrial plant
• “plant” = living organism
• Words are ambiguous
• Importance of semantics?
– Machine Translation: wrong translations
– Information Retrieval: wrong information
– Anaphora Resolution: wrong referents
• The sea is at the home for billions factories and
animals
• The sea is home to million of plants and
animals
• English  French [commercial MT system]
• Le mer est a la maison de billion des usines et
des animaux
• French  English
Why Semantics?
Issues in Semantics
• How to learn the meaning of words?
• From dictionaries:
plant, works, industrial plant -- (buildings for carrying on
industrial labor; "they built a large plant to manufacture
automobiles")
plant, flora, plant life -- (a living organism lacking the power of
locomotion)
They are producing about 1,000 automobiles in the new plant
The sea flora consists in 1,000 different plant species
The plant was close to the farm of animals.
Issues in Semantics
• Learn from annotated examples:
– Assume 100 examples containing “plant”
previously tagged by a human
– Train a learning algorithm
– How to choose the learning algorithm?
– How to obtain the 100 tagged examples?
Issues in Learning Semantics
• Learning?
– Assume a (large) amount of annotated data = training
– Assume a new text not annotated = test
• Learn from previous experience (training) to
classify new data (test)
• Decision trees, memory based learning, neural
networks
– Machine Learning
Issues in Information Extraction
• “There was a group of about 8-9 people close to
the entrance on Highway 75”
• Who? “8-9 people”
• Where? “highway 75”
• Extract information
• Detect new patterns:
– Detect hacking / hidden information / etc.
• Gov./mil. puts lots of money put into IE
research
Issues in Information Retrieval
• General model:
– A huge collection of texts
– A query
• Task: find documents that are relevant to the given
query
• How? Create an index, like the index in a book
• More …
– Vector-space models
– Boolean models
• Examples: Google, Yahoo, Altavista, etc.
Issues in Information Retrieval
• Retrieve specific information
• Question Answering
• “What is the height of mount Everest?”
• 11,000 feet
Issues in Information Retrieval
• Find information across languages!
• Cross Language Information Retrieval
• “What is the minimum age requirement for car
rental in Italy?”
• Search also Italian texts for “eta minima per
noleggio macchine”
• Integrate large number of languages
• Integrate into performant IR engines
Issues in Machine Translations
• Text to Text Machine Translations
• Speech to Speech Machine Translations
• Most of the work has addressed pairs of widely
spread languages like English-French, English-
Chinese
Issues in Machine Translations
• How to translate text?
– Learn from previously translated data
•  Need parallel corpora
• French-English, Chinese-English have the
Hansards
• Reasonable translations
• Chinese-Hindi – no such tools available today!
Even More
• Discourse
• Summarization
• Subjectivity and sentiment analysis
• Text generation, dialog [pass the Turing test for
some million dollars] – Loebner prize
• Knowledge acquisition [how to get that
common sense knowledge]
• Speech processing
What will we study this semester?
• Intro to Perl
– Great great for text processing
– Fast: one person can do the work of ten others
– Easy to pick up
• Some linguistic basics
– Structure of English
– Parts of speech, phrases, parsing
• Morphology
• N-grams
– Also multi-word expressions
• Part of speech tagging
• Syntactic parsing
• Semantics
– Word sense disambiguation
What will we study this semester?
• Information Retrieval
• Question answering
• Text summarization
• Sentiment analysis
• Machine Translation
• Depending on time, we may touch on
– Speech recognition
– Dialogue
– Text generation
– Other topics of your interest

More Related Content

Similar to Natural_Language_Processing_1.ppt

Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Mustafa Jarrar
 
Artificial intelligence(01)
Artificial intelligence(01)Artificial intelligence(01)
Artificial intelligence(01)Nazir Ahmed
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 
Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Roi Blanco
 
Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)Nikola Milosevic
 
topics natural language processing and image processing
topics natural language processing and image processingtopics natural language processing and image processing
topics natural language processing and image processingyoukayaslam
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Jarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language ProcessingJarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language ProcessingMustafa Jarrar
 
introduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).pptintroduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).pptTemesgenTolcha2
 

Similar to Natural_Language_Processing_1.ppt (20)

The tipping point
The tipping pointThe tipping point
The tipping point
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
1004-nlp.ppt
1004-nlp.ppt1004-nlp.ppt
1004-nlp.ppt
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing Adnan: Introduction to Natural Language Processing
Adnan: Introduction to Natural Language Processing
 
Artificial intelligence(01)
Artificial intelligence(01)Artificial intelligence(01)
Artificial intelligence(01)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
AI Introduction
AI Introduction AI Introduction
AI Introduction
 
Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations Beyond document retrieval using semantic annotations
Beyond document retrieval using semantic annotations
 
Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)Machine learning (ML) and natural language processing (NLP)
Machine learning (ML) and natural language processing (NLP)
 
L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
Universal design HCI
Universal design HCIUniversal design HCI
Universal design HCI
 
way_topics.ppt
way_topics.pptway_topics.ppt
way_topics.ppt
 
topics natural language processing and image processing
topics natural language processing and image processingtopics natural language processing and image processing
topics natural language processing and image processing
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
NLTK
NLTKNLTK
NLTK
 
Jarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language ProcessingJarrar: Introduction to Natural Language Processing
Jarrar: Introduction to Natural Language Processing
 
introduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).pptintroduction to natural language processing(NLP).ppt
introduction to natural language processing(NLP).ppt
 

Recently uploaded

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 

Recently uploaded (20)

Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 

Natural_Language_Processing_1.ppt

  • 1. Natural Language Processing Rada Mihalcea Fall 2008
  • 2. Any Light at The End of The Tunnel? • Yahoo, Google, Microsoft  Information Retrieval • Monster.com, HotJobs.com (Job finders)  Information Extraction + Information Retrieval • Systran powers Babelfish  Machine Translation • Ask Jeeves  Question Answering • Myspace, Facebook, Blogspot  Processing of User-Generated Content • Tools for “business intelligence” • All “Big Guys” have (several) strong NLP research labs: – IBM, Microsoft, AT&T, Xerox, Sun, etc. • Academia: research in an university environment
  • 3. Why Natural Language Processing ? • Huge amounts of data – Internet = at least 20 billions pages – Intranet • Applications for processing large amounts of texts require NLP expertise • Classify text into categories • Index and search large texts • Automatic translation • Speech understanding – Understand phone conversations • Information extraction – Extract useful information from resumes • Automatic summarization – Condense 1 book into 1 page • Question answering • Knowledge acquisition • Text generations / dialogues
  • 4. Natural? • Natural Language? – Refers to the language spoken by people, e.g. English, Japanese, Swahili, as opposed to artificial languages, like C++, Java, etc. • Natural Language Processing – Applications that deal with natural language in a way or another • [Computational Linguistics – Doing linguistics on computers – More on the linguistic side than NLP, but closely related ]
  • 5. Why Natural Language Processing? • kJfmmfj mmmvvv nnnffn333 • Uj iheale eleee mnster vensi credur • Baboi oi cestnitze • Coovoel2^ ekk; ldsllk lkdf vnnjfj? • Fgmflmllk mlfm kfre xnnn!
  • 6. Computers Lack Knowledge! • Computers “see” text in English the same you have seen the previous text! • People have no trouble understanding language – Common sense knowledge – Reasoning capacity – Experience • Computers have – No common sense knowledge – No reasoning capacity
  • 7. Where does it fit in the CS taxonomy? Computers Artificial Intelligence Algorithms Databases Networking Robotics Search Natural Language Processing Information Retrieval Machine Translation Language Analysis Semantics Parsing
  • 8. Linguistics Levels of Analysis • Speech • Written language – Phonology: sounds / letters / pronunciation – Morphology: the structure of words – Syntax: how these sequences are structured – Semantics: meaning of the strings • Interaction between levels
  • 9. Issues in Syntax “the dog ate my homework” - Who did what? 1. Identify the part of speech (POS) Dog = noun ; ate = verb ; homework = noun English POS tagging: 95% 2. Identify collocations mother in law, hot dog Compositional versus non-compositional collocates
  • 10. Issues in Syntax • Shallow parsing: “the dog chased the bear” “the dog” “chased the bear” subject - predicate Identify basic structures NP-[the dog] VP-[chased the bear]
  • 11. Issues in Syntax • Full parsing: John loves Mary Help figuring out (automatically) questions like: Who did what and when?
  • 12. More Issues in Syntax • Anaphora Resolution: “The dog entered my room. It scared me” • Preposition Attachment “I saw the man in the park with a telescope”
  • 13. Issues in Semantics • Understand language! How? • “plant” = industrial plant • “plant” = living organism • Words are ambiguous • Importance of semantics? – Machine Translation: wrong translations – Information Retrieval: wrong information – Anaphora Resolution: wrong referents
  • 14. • The sea is at the home for billions factories and animals • The sea is home to million of plants and animals • English  French [commercial MT system] • Le mer est a la maison de billion des usines et des animaux • French  English Why Semantics?
  • 15. Issues in Semantics • How to learn the meaning of words? • From dictionaries: plant, works, industrial plant -- (buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles") plant, flora, plant life -- (a living organism lacking the power of locomotion) They are producing about 1,000 automobiles in the new plant The sea flora consists in 1,000 different plant species The plant was close to the farm of animals.
  • 16. Issues in Semantics • Learn from annotated examples: – Assume 100 examples containing “plant” previously tagged by a human – Train a learning algorithm – How to choose the learning algorithm? – How to obtain the 100 tagged examples?
  • 17. Issues in Learning Semantics • Learning? – Assume a (large) amount of annotated data = training – Assume a new text not annotated = test • Learn from previous experience (training) to classify new data (test) • Decision trees, memory based learning, neural networks – Machine Learning
  • 18. Issues in Information Extraction • “There was a group of about 8-9 people close to the entrance on Highway 75” • Who? “8-9 people” • Where? “highway 75” • Extract information • Detect new patterns: – Detect hacking / hidden information / etc. • Gov./mil. puts lots of money put into IE research
  • 19. Issues in Information Retrieval • General model: – A huge collection of texts – A query • Task: find documents that are relevant to the given query • How? Create an index, like the index in a book • More … – Vector-space models – Boolean models • Examples: Google, Yahoo, Altavista, etc.
  • 20. Issues in Information Retrieval • Retrieve specific information • Question Answering • “What is the height of mount Everest?” • 11,000 feet
  • 21. Issues in Information Retrieval • Find information across languages! • Cross Language Information Retrieval • “What is the minimum age requirement for car rental in Italy?” • Search also Italian texts for “eta minima per noleggio macchine” • Integrate large number of languages • Integrate into performant IR engines
  • 22. Issues in Machine Translations • Text to Text Machine Translations • Speech to Speech Machine Translations • Most of the work has addressed pairs of widely spread languages like English-French, English- Chinese
  • 23. Issues in Machine Translations • How to translate text? – Learn from previously translated data •  Need parallel corpora • French-English, Chinese-English have the Hansards • Reasonable translations • Chinese-Hindi – no such tools available today!
  • 24. Even More • Discourse • Summarization • Subjectivity and sentiment analysis • Text generation, dialog [pass the Turing test for some million dollars] – Loebner prize • Knowledge acquisition [how to get that common sense knowledge] • Speech processing
  • 25. What will we study this semester? • Intro to Perl – Great great for text processing – Fast: one person can do the work of ten others – Easy to pick up • Some linguistic basics – Structure of English – Parts of speech, phrases, parsing • Morphology • N-grams – Also multi-word expressions • Part of speech tagging • Syntactic parsing • Semantics – Word sense disambiguation
  • 26. What will we study this semester? • Information Retrieval • Question answering • Text summarization • Sentiment analysis • Machine Translation • Depending on time, we may touch on – Speech recognition – Dialogue – Text generation – Other topics of your interest