Introduction to Natural Language
Processing, History and Origin
By: Shubhankar Mohan
The Dream
•It’d be great if machines could
–Process our email (usefully)
–Translate languages accurately
–Help us manage, summarize, and
aggregate information
–Use speech as a UI (when needed)
–Talk to us / listen to us
•But they can’t:
–Language is complex, ambiguous,
flexible, and subtle
–Good solutions need linguistics
and machine learning knowledge
•So:
The Mystery
• What’s now impossible for computers (and any other
species) to do is effortless for humans.
The Mystery (continued)
• Patrick Suppes, eminent philosopher, in his 1978
autobiography:
“…the challenge to psychological theory made by linguists to
provide an adequate theory of language learning may well
be regarded as the most significant intellectual challenge to
theoretical psychology in this century.”
• This challenge remains unmet in the 21st century
• Natural language processing (NLP) is the discipline in
which we study the tools that bring us closer to meeting
this challenge
What is NLP?
• Fundamental goal: deep understanding of broad language
• Not just string processing or keyword matching!
What is NLP
• Computers use (analyze, understand, generate) natural language
• Text Processing
• Lexical: tokenization, part of speech, head, lemmas
• Parsing and chunking
• Semantic tagging: semantic role, word sense
• Certain expressions: named entities
• Discourse: coreference, discourse segments
• Speech Processing
• Phonetic transcription
• Segmentation (punctuation)
History of NLP
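• The roots of NLP are often traced to 1950, when Alan Turing
proposed what is now called the Turing test, which uses
natural-language conversation as a criterion for machine intelligence.
• Georgetown experiment (1954): more than sixty Russian sentences
were automatically translated into English.
• Up to the 1980s, most NLP systems were based on hand-written rules.
• From the late 1980s onward, machine learning gave NLP new
dimensions.
• In recent years, a flurry of results has shown deep learning
techniques achieving strong performance on many NLP tasks.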
Why Should You Care?
Trends
1. An enormous amount of knowledge is now available in
machine-readable form as natural language text
2. Conversational agents are becoming an important form of
human-computer communication
3. Much of human-human communication is now mediated
by computers
Motivation for NLP
• Understand language analysis & generation
• Communication
• Language is a window to the mind
• Data is in linguistic form
• Data can be structured (table form), semi-structured
(XML form), or unstructured (sentence form).
Components of NLP
NLP has two components:
1. Natural Language Understanding (NLU)
Understanding involves the following tasks −
1. Mapping the given natural-language input into useful representations.
2. Analysing different aspects of the language.
2. Natural Language Generation (NLG)
NLG is the process of producing meaningful phrases and sentences in
natural language from some internal representation. It involves:
1. Text planning − retrieving the relevant content from the
knowledge base.
2. Sentence planning − choosing the required words, forming
meaningful phrases, and setting the tone of the sentence.
3. Text realization − mapping the sentence plan onto sentence structure.
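The three NLG stages can be illustrated with a toy, template-based pipeline. This is only a sketch under stated assumptions: the knowledge base, the word-choice table, and the function names (`plan_text`, `plan_sentence`, `realize`, `generate`) are hypothetical, and real NLG systems use much richer planners and grammars.

```python
# Hypothetical knowledge base: (subject, relation, value) triples.
KNOWLEDGE_BASE = [
    ("Delhi", "condition", "sunny"),
    ("Delhi", "temperature_c", 31),
    ("Mumbai", "condition", "rainy"),
    ("Mumbai", "temperature_c", 29),
]

# Hypothetical lexicon used during sentence planning (word choice).
CONDITION_WORDS = {"sunny": "clear", "rainy": "wet"}


def plan_text(city):
    """Text planning: retrieve the relevant content from the knowledge base."""
    return {rel: val for subj, rel, val in KNOWLEDGE_BASE if subj == city}


def plan_sentence(city, facts):
    """Sentence planning: choose words and build a phrase-level plan."""
    condition = CONDITION_WORDS.get(facts["condition"], facts["condition"])
    return {
        "subject": city,
        "verb": "is",
        "complement": f"{condition} with a temperature of "
        f"{facts['temperature_c']} degrees Celsius",
    }


def realize(plan):
    """Text realization: map the sentence plan onto an actual sentence string."""
    return f"{plan['subject']} {plan['verb']} {plan['complement']}."


def generate(city):
    return realize(plan_sentence(city, plan_text(city)))


print(generate("Mumbai"))  # Mumbai is wet with a temperature of 29 degrees Celsius.
```

Keeping the stages as separate functions mirrors the slide's decomposition: each stage can be replaced (for example, a learned sentence planner) without touching the others.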
Two Contrasting Views of Language
• Language as a phenomenon
• Language as data
Language Processing
• Level 1 – Speech sound (Phonetics & Phonology)
• Level 2 – Words & their forms (Morphology, Lexicon)
• Level 3 – Structure of sentences (Syntax, Parsing)
• Level 4 – Meaning of sentences (Semantics)
• Level 5 – Meaning in context & for a purpose
(Pragmatics)
• Level 6 – Connected sentence processing in a larger
body of text (Discourse)
Steps in NLP
Lexical Analysis − It involves identifying and analysing the
structure of words. Lexical analysis divides the whole chunk
of text into paragraphs, sentences, and words.
Syntactic Analysis (Parsing) − It involves analysing the words in the
sentence for grammar and arranging them in a manner that
shows the relationships among the words.
Semantic Analysis − It draws the exact meaning, or the
dictionary meaning, from the text. The text is checked for
meaningfulness by mapping syntactic structures onto
objects in the task domain.
Discourse Integration − The meaning of any sentence
depends on the meaning of the sentence just before it. In
addition, it also shapes the meaning of the sentence that
immediately follows.
Pragmatic Analysis − What was said is reinterpreted in terms
of what it actually meant. This involves deriving those
aspects of language that require real-world knowledge.
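The lexical-analysis step (dividing text into paragraphs, sentences, and words) can be sketched with regular expressions. This is a minimal sketch assuming that blank lines separate paragraphs and that sentences end in `.`, `!` or `?` followed by whitespace; it will mis-split abbreviations such as "Dr.", which real tokenizers handle with extra rules. The function name `lexical_analysis` is hypothetical.

```python
import re


def lexical_analysis(text):
    """Split text into paragraphs, then sentences, then word/punctuation tokens."""
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for paragraph in paragraphs:
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        # Tokens: words (allowing one internal apostrophe) or single punctuation marks.
        result.append([re.findall(r"\w+(?:'\w+)?|[^\w\s]", s) for s in sentences if s])
    return result


sample = "Rima went to Gauri. She said, \"I am tired.\"\n\nGauri laughed."
for paragraph in lexical_analysis(sample):
    print(paragraph)
```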
Ambiguity in Natural Language
Natural language has an extremely rich form and structure, and it is highly
ambiguous. Ambiguity arises at several levels:
• Lexical ambiguity − ambiguity at the most primitive level, that of individual words.
For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − a sentence can be parsed in different
ways.
For example, in “He lifted the beetle with red cap.”, did he use a cap to
lift the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − referring to something using pronouns.
For example: Rima went to Gauri. She said, “I am tired.” Exactly
who is tired?
- One input can have different meanings.
- Many inputs can mean the same thing.
Methods of NLP
• Mathematical Approach
• Pattern Matching
• Statistical Approach
• Bayesian Method
• HMM
• Soft Computing Approach
• Neural Network
Pattern Matching
Exact Pattern Matching
Problem: Find the first match of a pattern of length M in a text stream of
length N (N ≫ M).
Pattern: needle (M = 6)
Text: inahaystackaneedleina (N = 21)
Challenges:
• Brute force is not good enough for all applications.
• Theoretical challenge: a linear-time guarantee (a fundamental
algorithmic problem).
• Practical challenge: avoid backing up in the text stream. Often there is no room or
time to save the text.
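For comparison, a minimal brute-force matcher in Python (the function name `brute_force_search` is hypothetical). It tries every alignment of the pattern against the text:

```python
def brute_force_search(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    for i in range(n - m + 1):       # each candidate alignment of the pattern
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                   # all M characters matched
            return i
    return -1


print(brute_force_search("needle", "inahaystackaneedleina"))  # 12
```

After a mismatch the next alignment starts at `i + 1`, so the loop re-reads up to M − 1 text characters it has already seen (it "backs up" in the text), and in the worst case it performs about M·N character comparisons. Both are exactly the challenges listed above.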
Knuth-Morris-Pratt (KMP) exact pattern-matching algorithm
Named after Don Knuth, Jim Morris, Vaughan Pratt
Classic algorithm that meets both challenges
• linear-time guarantee
• no backup in text stream
Basic plan (for binary alphabet)
• build DFA from pattern
• simulate DFA with text as input
Input text → DFA for pattern → Accept or reject
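A sketch of the KMP plan (build a DFA from the pattern, then feed the text through it) in Python. As an assumption for illustration, the alphabet is taken to be the set of characters in the pattern instead of a fixed binary alphabet; any other character sends the DFA back to state 0. The names `build_dfa` and `kmp_search` are hypothetical.

```python
def build_dfa(pattern):
    """Build the KMP DFA: dfa[j][c] is the next state after reading c in state j."""
    alphabet = set(pattern)
    m = len(pattern)
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the DFA would be in after reading pattern[1:j]
    for j in range(1, m):
        dfa[j].update(dfa[restart])     # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1      # match: advance to the next state
        restart = dfa[restart][pattern[j]]
    return dfa


def kmp_search(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    dfa = build_dfa(pattern)
    m, state = len(pattern), 0
    for i, ch in enumerate(text):       # each text character is read exactly once
        state = dfa[state].get(ch, 0)   # characters absent from the pattern reset to 0
        if state == m:                  # accept state reached
            return i - m + 1
    return -1


print(kmp_search("needle", "inahaystackaneedleina"))  # 12
```

Because the loop never moves backwards in `text`, the search is linear in N and needs no buffer of earlier text; building the DFA costs time proportional to M times the alphabet size.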
Regular-expression pattern matching
Search for occurrences of one of multiple patterns in a text file
Ex. (genomics)
• Fragile X syndrome is a common cause of intellectual disability.
• The human genome contains triplet repeats of cgg or agg,
• bracketed by gcg at the beginning and ctg at the end.
• The number of repeats is variable and correlated with the syndrome.
• Use a regular expression to specify the pattern: gcg(cgg|agg)*ctg
• Do an RE pattern match on a person’s genome to detect Fragile X.
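The genomics pattern can be tried directly with Python's standard `re` module. Note that `re` is a backtracking engine, not the DFA construction described on these slides; it is used here only to show the pattern in action. The DNA string below is made up for illustration, and `repeat_regions` is a hypothetical helper.

```python
import re

# Pattern from the slide; (?:...) is a non-capturing group so the whole
# repeat region is returned as one match.
FRAGILE_X = re.compile(r"gcg(?:cgg|agg)*ctg")


def repeat_regions(genome):
    """Return (start index, number of cgg/agg repeats) for each match."""
    # Each match is gcg + k triplets + ctg, so k = (length - 6) / 3.
    return [(m.start(), (len(m.group()) - 6) // 3) for m in FRAGILE_X.finditer(genome)]


print(repeat_regions("ttgcgcggaggctgaa"))  # [(2, 2)]
```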
GREP
The name grep comes from the ed editor command g/re/p (globally
search for a regular expression and print the matching lines). It was
written by Ken Thompson.
Basic Plan for GREP
• build DFA from RE
• simulate DFA with text as input
Text → DFA for pattern gcg(cgg|agg)*ctg → Accept/Reject
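A DFA built directly from a regular expression can have exponentially many states, so practical grep implementations (following Thompson) simulate a nondeterministic finite automaton (NFA) instead. The sketch below is modelled on the NFA construction in Sedgewick and Wayne's *Algorithms*; it is a teaching sketch, not grep's actual source. It assumes a restricted syntax: literal characters, `.` (any character), `*`, parentheses, and at most one `|` per parenthesized group. The names `build_nfa`, `recognizes`, and `grep` are hypothetical.

```python
def build_nfa(regexp):
    """Return the wrapped RE and its epsilon-transition graph (state m accepts)."""
    re_ = "(" + regexp + ")"
    m = len(re_)
    eps = {i: [] for i in range(m + 1)}
    ops = []  # stack of positions of '(' and '|'
    for i, c in enumerate(re_):
        lp = i
        if c in "(|":
            ops.append(i)
        elif c == ")":
            top = ops.pop()
            if re_[top] == "|":
                lp = ops.pop()
                eps[lp].append(top + 1)   # '(' can jump to the second alternative
                eps[top].append(i)        # end of first alternative jumps to ')'
            else:
                lp = top
        if i < m - 1 and re_[i + 1] == "*":
            eps[lp].append(i + 1)         # skip the starred item
            eps[i + 1].append(lp)         # or repeat it
        if c in "(*)":
            eps[i].append(i + 1)
    return re_, eps


def _closure(eps, states):
    """All states reachable from `states` via epsilon transitions."""
    seen, stack = set(), list(states)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(eps[v])
    return seen


def recognizes(regexp, text):
    """True if the whole of `text` matches `regexp`."""
    re_, eps = build_nfa(regexp)
    m = len(re_)
    current = _closure(eps, [0])
    for ch in text:
        matched = [v + 1 for v in current
                   if v < m and re_[v] not in "()|*" and re_[v] in (ch, ".")]
        current = _closure(eps, matched)
    return m in current


def grep(regexp, lines):
    """Lines containing a match anywhere (wrap the RE in .* on both sides)."""
    return [line for line in lines if recognizes(".*" + regexp + ".*", line)]


print(grep("gcg(cgg|agg)*ctg", ["aagcgcggctgtt", "acgtacgt", "gcgctg"]))
# ['aagcgcggctgtt', 'gcgctg']
```

Tracking a set of NFA states keeps the number of states linear in the RE length, at the cost of doing up to that much work per text character.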
Bayesian Method
• Uses Bayes' rule
• The Naive Bayes Assumption: Assume that all features
are independent given the class label Y
Example: Play Tennis
Learning Phase
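Under the Naive Bayes assumption, Bayes' rule gives P(Y | X1, …, Xn) ∝ P(Y) · P(X1 | Y) · … · P(Xn | Y), so learning reduces to counting. A minimal sketch of the learning and classification phases follows. The tiny weather-style dataset is hypothetical (it is not the slide's Play Tennis table), and the add-one (Laplace) smoothing and log-space scoring are common choices added here as assumptions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: ((outlook, wind), play?) pairs.
DATA = [
    (("sunny", "weak"), "yes"),
    (("sunny", "strong"), "no"),
    (("rain", "strong"), "no"),
    (("overcast", "weak"), "yes"),
    (("rain", "weak"), "yes"),
    (("overcast", "strong"), "yes"),
]


def train(examples):
    """Learning phase: count class labels and feature values per class."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    feature_values = defaultdict(set)    # feature index -> values seen in training
    for features, label in examples:
        for i, value in enumerate(features):
            value_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def classify(model, features, alpha=1.0):
    """Return argmax_y log P(y) + sum_i log P(x_i | y), with add-alpha smoothing."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(features):
            k = len(feature_values[i])  # number of distinct values of feature i
            score += math.log((value_counts[(label, i)][value] + alpha) / (n + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(DATA)
print(classify(model, ("rain", "strong")))  # no
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow when there are many features, and smoothing keeps an unseen feature value from zeroing out a whole class.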
HIDDEN MARKOV MODEL
A hidden Markov model (HMM) is a statistical model in
which the system being modeled is assumed to be a
Markov process (a memoryless process: given the present
state, the future is independent of the past) with hidden states.
We want to find:
given the past sequence of observed outcomes, what is the
probability of each possible outcome today?
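The question above can be answered with the forward algorithm. A minimal sketch, assuming a hypothetical two-state HMM (hidden weather, observed activities) whose probabilities are made up for illustration: `forward` computes P(o1, …, oT) by dynamic programming in O(T·K²) time for K states, and `predict_next` uses P(oT+1 | o1, …, oT) = P(o1, …, oT+1) / P(o1, …, oT). `brute_force` enumerates every hidden path and is included only as a cross-check.

```python
from itertools import product

# Hypothetical two-state HMM: the weather is hidden; only activities are observed.
STATES = ("rainy", "sunny")
START = {"rainy": 0.6, "sunny": 0.4}
TRANS = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
EMIT = {
    "rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}


def forward(observations):
    """P(o_1..o_T): sum over all hidden paths using dynamic programming."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][obs]
            for s in STATES
        }
    return sum(alpha.values())


def predict_next(history, outcome):
    """P(next observation = outcome | history)."""
    return forward(list(history) + [outcome]) / forward(history)


def brute_force(observations):
    """Same quantity as forward(), by enumerating every hidden path (exponential)."""
    total = 0.0
    for path in product(STATES, repeat=len(observations)):
        p = START[path[0]] * EMIT[path[0]][observations[0]]
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            p *= TRANS[prev][cur] * EMIT[cur][obs]
        total += p
    return total


history = ["walk", "shop"]
for outcome in ("walk", "shop", "clean"):
    print(outcome, round(predict_next(history, outcome), 3))
```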
Example
• If the weather yesterday was rainy and today is foggy,
what is the probability that tomorrow it will be sunny?
• Using Bayes rule:
• For n days:
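The weather example itself is a plain (visible) Markov chain. Under the Markov assumption only today's weather matters, P(wn | wn−1, …, w1) ≈ P(wn | wn−1), and the chain rule gives P(w1, …, wn) = P(w1) · P(w2 | w1) · … · P(wn | wn−1). A minimal sketch with a hypothetical transition table (the numbers are illustrative, not from the slide):

```python
# Hypothetical transition table: TRANSITION[today][tomorrow] = P(tomorrow | today).
TRANSITION = {
    "sunny": {"sunny": 0.8, "rainy": 0.05, "foggy": 0.15},
    "rainy": {"sunny": 0.2, "rainy": 0.6, "foggy": 0.2},
    "foggy": {"sunny": 0.2, "rainy": 0.3, "foggy": 0.5},
}


def next_day(history, tomorrow):
    """Markov assumption: only the most recent day in history matters."""
    return TRANSITION[history[-1]][tomorrow]


def sequence_probability(days, first_day_prob=1.0):
    """P(w_1..w_n) = P(w_1) * product of P(w_i | w_{i-1})."""
    p = first_day_prob
    for today, tomorrow in zip(days, days[1:]):
        p *= TRANSITION[today][tomorrow]
    return p


print(next_day(["rainy", "foggy"], "sunny"))  # 0.2
```

With the default `first_day_prob=1.0`, `sequence_probability` is conditioned on the first day, so for rainy, foggy, sunny it returns P(foggy | rainy) · P(sunny | foggy).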
Commercial World
• Lots of exciting work is going on in industry…
Powerset
Major Topics
1. Words
2. Syntax
3. Meaning
4. Discourse
5. Applications exploiting each
Applications
● Fighting Spam
○ Spam filters have become important as the first line of defense
against the ever-increasing problem of unwanted email.
● Information Extraction
○ Information extraction turns unstructured text into structured data.
Many important decisions in financial markets are increasingly moving
away from human oversight and control, and algorithmic trading, which
can act on such automatically extracted information, is becoming more popular.
● Summarization
○ The ability to summarize the meaning of documents and information is
becoming increasingly important.
● NLU interfaces to databases
○ Intelligent natural language database interfaces have been developed;
they provide flexible options for formulating queries.
● Intelligent Web searching
○ Natural language processing has made web search more intelligent
by moving it from keyword-based to expression-based queries.
Applications (continued)
● Machine Translation
○ Machine translation is a subfield of computational linguistics that
investigates the use of software to translate text or speech from one
language to another.
● Natural Language Generation
○ The task of generating natural language from a machine
representation system such as a knowledge base or a logical
form.
Applications (continued)
● Speech Recognition
○ The process of enabling a computer to identify and respond
to the sounds produced in human speech.