NLP
NATURAL LANGUAGE
PROCESSING
Girish Khanzode
Contents
 Natural Language Understanding
 Text Categorization
 Syntactic Analysis
 Parsing
 Semantic Analysis
 Pragmatic Analysis
 Corpus-based Statistical
Approaches
 Measuring Performance
 NLP - Supervised Learning
Methods
 Part of Speech Tagging
 Named Entity Recognition
 Simple Context-free Grammars
 N-grams
 References
NLP
 Natural Language Understanding
 Taking some spoken/typed sentence and working out what it means
 Natural Language Generation
 Taking some formal representation of what you want to say and working out a
way to express it in a natural human language like English
 Fundamental goal: deep understanding of broad language
 Not just string processing or keyword matching
 Target end systems
 speech recognition, machine translation, question answering…
 spelling correction, text categorization…
Applications
 Text Categorization - classify
documents by topics, language,
author, spam filtering, sentiment
classification (positive, negative)
 Spelling & Grammar Corrections
 Speech Recognition
 Summarization
 Question Answering
 Better search engines
 Text-to-speech
 Machine aided translation
 Information Retrieval
 Selecting from a set of documents
the ones that are relevant to a query
 Extracting data from text
 Converting unstructured text into
structured data
Natural Language Understanding
 Answering an essay question in exam
 Deciding what to order at a restaurant by reading a
menu
 Realizing you’ve been praised
 Appreciating a poem
Natural Language Understanding
 Processing pipeline:
 Raw speech signal → [speech recognition] → sequence of words spoken
 Sequence of words → [syntactic analysis, using knowledge of the grammar] → structure of the sentence
 Sentence structure → [semantic analysis, using information about the meaning of words] → partial representation of the meaning of the sentence
 Partial representation → [pragmatic analysis, using information about context] → final representation of the meaning of the sentence
Text Categorization
 Text annotation - classify entire document
 Sentiment classification
 What features of the text could help predict # of likes?
 How to identify customer opinions?
 Are the features hard to compute? (syntax? sarcasm?)
 Is it spam?
 What medical billing code for this visit?
 What grade for an answer to this essay question?
Text Categorization
 Is it interesting to this user?
 News filtering; helpdesk routing
 Is it interesting to this NLP program?
 If it’s Spanish, translate it from Spanish
 If it’s subjective, run the sentiment classifier
 If it’s an appointment, run information extraction
 Where should it be filed?
 Which mail folder? (work, friends, junk, urgent ...)
 Yahoo! / Open Directory / digital libraries
Text Categorization
Syntactic Analysis
 Rules of syntax (grammar) specify the possible organization of
words in sentences and allow us to determine a sentence's
structure(s)
 John saw Mary with a telescope
 John saw (Mary with a telescope)
 John (saw Mary with a telescope)
 Parsing: given a sentence and a grammar
 Checks that the sentence is correct according to the grammar and, if
so, returns a parse tree representing the structure of the sentence
Syntactic Analysis
 Syntax mapped into semantics
 Nouns ↔ things, objects, abstractions.
 Verbs ↔ situations, events, activities.
 Adjectives ↔ properties of things, ...
 Adverbs ↔ properties of situations, ...
 A parser recovers the phrase structure of an utterance, given a
grammar (rules of syntax)
 Parser’s outcome is the structure (groups of words and respective
parts of speech)
Syntactic Analysis
 Phrase structure is represented in a parse tree
 Parsing is the first step towards determining the meaning of an
utterance
 Outcome of the syntactic analysis can still be a series of alternate
structures with respective probabilities
 Sometimes grammar rules can disambiguate a sentence
 John set the set of chairs
 Sometimes they can’t.
 …the next step is semantic analysis
Symbols in Grammar
Syntactic Analysis - Grammar
Parsing
 A method to analyze a sentence to determine its
structure as per grammar
 Grammar - formal specification of the structures
allowable in the language
 Syntax is important - a skeleton on which various
linguistic elements, meaning among them, depend
 So recognizing syntactic structure is also important
Parsing
 Some researchers deny syntax its central role
 There is a verb-centered analysis that builds on Conceptual
Dependency [textbook, section 7.1.3] - a verb determines almost
everything in a sentence built around it - Verbs are fundamental in
many theories of language
 Another idea is to treat all connections in language as occurring
between pairs of words, and to assume no higher-level groupings
 Structure and meaning are expressed through variously linked networks
of words
Syntactic Analysis - Challenges
 Number (singular vs. plural) and gender
 sentence-> noun_phrase(n), verb_phrase(n)
 proper_noun(s) -> [mary]
 noun(p) -> [apples]
 Adjective
 noun_phrase-> determiner,adjectives,noun
 adjectives-> adjective, adjectives
 adjective->[ferocious]
Syntactic Analysis - Challenges
 Adverbs, …
 Handling ambiguity
 Syntactic ambiguity - fruit flies like a banana
 Having to parse syntactically incorrect sentences
Semantic Analysis
 Generates meaning/representation of the sentence from its
syntactic structures
 Represents the sentence in meaningful parts
 Uses possible syntactic structures and meaning
 Builds a parse tree with associated semantics
 Semantics typically represented with logic
Semantic Analysis
 Compositional semantics: meaning of the sentence from the meaning of its
parts
 Sentence - A tall man likes Mary
 Representation - ∃x man(x) & tall(x) & likes(x, mary)
 Grammar + Semantics
 Sentence (Smeaning)-> noun_phrase(NPmeaning), verb_phrase(VPmeaning),
combine(NPmeaning,VPmeaning,Smeaning)
 Complications - Handling ambiguity
 Semantic ambiguity - I saw the Prudential building flying into Paris
Compositional Semantics
 The semantics of a phrase is a function of the semantics of its sub-phrases
 It does not depend on any other phrase
 So if we know the meaning of sub-phrases, then we know the meaning of
the phrases
 A goal of semantic interpretation is to find a way that the meaning of the
whole sentence can be put together in a simple way from the meanings of
the parts of the sentence - Alison, 1997 p. 112
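The principle above can be sketched in code: each phrase's denotation is built only from its sub-phrases' denotations. The toy model and all names below are illustrative assumptions, not a real semantic interpreter.

```python
# A minimal sketch of compositional semantics: the meaning of a phrase is a
# function of the meanings of its sub-phrases. Denotations are sets and
# functions over a tiny invented model.

model = {"man": {"john", "bob"}, "tall": {"bob"}, "likes": {("bob", "mary")}}

def noun(w):                     # noun -> set of entities
    return model[w]

def adjective(w):                # adjective -> filter over a set
    return lambda s: s & model[w]

def noun_phrase(adj, n):         # "tall man" = tall(man)
    return adj(n)

def verb_phrase(v, obj):         # "likes Mary" = {x : likes(x, mary)}
    return {x for (x, y) in model[v] if y == obj}

def sentence(np, vp):            # exists x. x in NP and x in VP
    return len(np & vp) > 0

# "A tall man likes Mary": exists x. man(x) & tall(x) & likes(x, mary)
print(sentence(noun_phrase(adjective("tall"), noun("man")),
               verb_phrase("likes", "mary")))   # True
```

Each combinator looks only at its arguments, never at any other phrase, which is exactly the compositionality claim.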
Compositional Semantics
Pragmatic Analysis
 Uses context
 Uses partial representation
 Includes purpose and performs disambiguation
 Where, when, by whom an utterance was said
 Uses context of utterance
 Where, by whom, to whom, why, when it was said
 Intentions - inform, request, promise, criticize, …
Pragmatic Analysis
 Handling Pronouns
 Mary eats apples. She likes them
 She = Mary, them = apples
 Handling ambiguity
 Pragmatic ambiguity - you’re late - What’s the speaker’s intention -
informing or criticizing?
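The pronoun example above can be approximated by a very naive heuristic: link each pronoun to the most recent preceding noun whose number matches. The lexicon and rule below are toy assumptions, nothing like a real anaphora resolver.

```python
# Naive pronoun resolution sketch: most recent number-compatible antecedent.
NOUNS = {"mary": "sg", "apples": "pl"}
PRONOUNS = {"she": "sg", "he": "sg", "them": "pl", "it": "sg"}

def resolve(tokens):
    antecedents, links = [], {}
    for tok in (t.lower().strip(".") for t in tokens):
        if tok in NOUNS:
            antecedents.append(tok)
        elif tok in PRONOUNS:
            # scan backwards for the most recent noun with matching number
            for cand in reversed(antecedents):
                if NOUNS[cand] == PRONOUNS[tok]:
                    links[tok] = cand
                    break
    return links

print(resolve("Mary eats apples . She likes them".split()))
# {'she': 'mary', 'them': 'apples'}
```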
NLP Challenges
 NLP systems need to answer the question “who did what to whom”
 MANY hidden variables
 Knowledge about the world
 Knowledge about the context
 Knowledge about human communication techniques
 Can you tell me the time?
 Problem of scale
 Many (infinite?) possible words, meanings, context
NLP Challenges
 Problem of sparsity
 Very difficult to do statistical analysis, most things (words, concepts) are never
seen before
 Long range correlations
 Key problems
 Representation of meaning
 Language presupposes knowledge about the world
 Language only reflects the surface of meaning
 Language presupposes communication between people
NLP Challenges
 Different ways of Parsing a sentence
 Word category ambiguity
 Word sense ambiguity
 Words can mean more than their sum of parts - The Times of India
 Imparting world knowledge is difficult - the blue pen ate the ice-cream
NLP Challenges
 Fictitious worlds - people on mars can fly
 Defining scope
 people like ice-cream - does this mean all people like ice cream?
 Language is changing and evolving
 Complex ways of interaction between the kinds of knowledge
 Exponential complexity at each point in using the knowledge
NLP Challenges - Ambiguity
 At all levels - lexical, phrase, semantic
 Institute head seeks aid
 Word sense is ambiguous (head)
 Stolen Painting Found by Tree
 Thematic role is ambiguous: tree is agent or location?
 Ban on Dancing on Governor’s Desk
 Syntactic structure (attachment) is ambiguous - is the ban or the dancing on the
desk?
 Hospitals Are Sued by 7 Foot Doctors
 Semantics is ambiguous : what is 7 foot?
Meaning
 From NLP viewpoint, meaning is a mapping from linguistic forms to
some kind of representation of knowledge of the world
 Physical referent in the real world
 Semantic concepts, characterized also by relations.
 It is interpreted within the framework of some sort of action to be
taken
Meaning – Representation and
Usage
 I am Italian
 From lexical database (WordNet)
 Italian = a native or inhabitant of Italy; Italy = republic in southern Europe [..]
 I am Italian
 Who is “I”?
 I know she is Italian/I think she is Italian
 How do we represent I know and I think
 Does this mean that I is Italian?
 What does it say about the I and about the person speaking?
 I thought she was Italian
 How do we represent tenses?
Corpus-based Statistical
Approaches
 How can a machine understand these differences?
 Decorate the cake with the frosting
 Decorate the cake with the kids
 Rules based approaches
 Hand coded syntactic constraints and preference rules
 The verb decorate requires an animate being as agent
 The object cake can be decorated with any of the following inanimate entities
 cream, dough, frosting…..
Corpus-based Statistical
Approaches
 These approaches are time consuming to build, do not scale
up well and are very brittle to new, unusual, metaphorical use
of language
 To swallow requires an animate being as agent/subject and a
physical object as object
 I swallowed his story
 The supernova swallowed the planet
 A Statistical NLP approach seeks to solve these problems by
automatically learning lexical and structural preferences from
text collections (corpora)
Corpus-based Statistical
Approaches
 Statistical models are robust, generalize well and
behave gracefully in the presence of errors and new
data
 Steps
 Get large text collections
 Compute statistics over those collections
 The bigger the collections, the better the statistics
Corpus-based Statistical
Approaches
 Decorate the cake with the frosting
 Decorate the cake with the kids
 From (labeled) corpora we can learn that
 #(kids are subject/agent of decorate) > #(frosting is subject/agent of decorate)
 From (unlabelled) corpora we can learn that
 #(“the kids decorate the cake”) >> #(“the frosting decorates the cake”)
 #(“cake with frosting”) >> #(“cake with kids”)
 Given these facts, we need a statistical model for the attachment decision
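One minimal statistical model for the attachment decision is to compare the corpus counts just described. The counts below are invented for illustration; a real system would use smoothed probabilities over a large corpus.

```python
# Sketch: decide PP attachment by comparing (hypothetical) corpus counts of
# verb-attachment vs. noun-attachment configurations.
counts = {
    ("decorate", "with", "kids"): 80,      # kids do the decorating
    ("cake", "with", "kids"): 2,           # a cake containing kids
    ("decorate", "with", "frosting"): 10,
    ("cake", "with", "frosting"): 90,      # a cake that has frosting
}

def attach(verb, noun, prep, pp_noun):
    v = counts.get((verb, prep, pp_noun), 0)
    n = counts.get((noun, prep, pp_noun), 0)
    return "verb" if v > n else "noun"

print(attach("decorate", "cake", "with", "kids"))      # verb
print(attach("decorate", "cake", "with", "frosting"))  # noun
```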
Corpus-based Statistical
Approaches
 Topic categorization: classify the document into semantic topics
Document 1
The nation swept into the Davis Cup final on
Saturday when twins Bob and Mike Bryan
defeated Belarus's Max Mirnyi and Vladimir
Voltchkov to give the Americans an
insurmountable 3-0 lead in the best-of-five semi-final tie.
Topic = sport
Document 2
One of the strangest, most relentless
hurricane seasons on record reached new
bizarre heights yesterday as the plodding
approach of Hurricane Jeanne prompted
evacuation orders for hundreds of thousands
of Floridians and high wind warnings that
stretched 350 miles from the swamp towns
south of Miami to the historic city of St.
Augustine.
Topic = Natural Event
Corpus-based Statistical
Approaches
 Topic categorization: classify the document into semantic topics
 From (labeled) corpora we can learn that
 #(sport documents containing word cup) > #(nature event documents containing word cup) - feature
 We then need a statistical model for the topic assignment
Document 1 (sport)
The nation swept into the Davis Cup final on
Saturday when twins Bob and Mike Bryan …
Document 2 (Natural Event)
One of the strangest, most relentless
hurricane seasons on record reached new
bizarre heights yesterday as….
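A minimal statistical model for the topic assignment can be sketched as an add-alpha-smoothed count-based classifier. The training counts below are invented; real systems learn them from large labeled corpora.

```python
# Count-based topic classifier sketch: score each topic by the smoothed
# log-likelihood of the document's words under that topic's word counts.
from collections import Counter
import math

train_counts = {
    "sport": Counter({"cup": 50, "final": 40, "defeated": 30, "lead": 20}),
    "natural_event": Counter({"hurricane": 60, "wind": 30,
                              "evacuation": 25, "cup": 1}),
}

def classify(words, alpha=1.0):
    best, best_score = None, float("-inf")
    for topic, counts in train_counts.items():
        total, vocab = sum(counts.values()), len(counts)
        # add-alpha smoothing keeps unseen words from zeroing the score
        score = sum(math.log((counts[w] + alpha) / (total + alpha * vocab))
                    for w in words)
        if score > best_score:
            best, best_score = topic, score
    return best

print(classify("the davis cup final".split()))      # sport
print(classify("hurricane wind warnings".split()))  # natural_event
```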
Corpus-based Statistical
Approaches
 Feature extraction
 Usually linguistically motivated
 Statistical models
 Data
 Corpora, labels, linguistic resources
Measuring Performance
 Classification accuracy
 What % of messages were classified correctly?
 Is this what we care about?
 Which system is better?
           Overall accuracy   Accuracy on spam   Accuracy on genuine email
System 1         95%              99.99%                 90%
System 2         95%              90%                    99.99%
Measuring Performance
 Precision = (good messages kept) / (all messages kept)
 Recall = (good messages kept) / (all good messages)
 Move from high precision to high recall by deleting fewer messages
 delete only if spam content > high threshold
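The two definitions above compute directly; the message sets below are invented for illustration ("good" means non-spam).

```python
# Precision and recall for a spam filter, straight from the definitions.
def precision_recall(kept, good):
    good_kept = [m for m in kept if m in good]
    precision = len(good_kept) / len(kept)   # of what we kept, how much is good
    recall = len(good_kept) / len(good)      # of the good, how much we kept
    return precision, recall

good = {"m1", "m2", "m3", "m4"}      # genuine messages
kept = {"m1", "m2", "m3", "s1"}      # the filter kept these (s1 is spam)

p, r = precision_recall(kept, good)
print(p, r)   # 0.75 0.75
```

Raising the deletion threshold keeps more messages, which typically raises recall at the cost of precision, the trade-off the curve on the next slide illustrates.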
Measuring Performance
[Figure: precision vs. recall trade-off curve (both axes 0-100%) for the analysis of good (non-spam) email]
NLP - Supervised Learning
Methods
 Conditional log-linear models
 Feature engineering - Throw in enough features to fix most errors
 Training - Learn weights θ such that, in the training data, the true answer
tends to have a high probability
 Test - Output the highest-probability answer
 If the evaluation metric allows for partial credit, can do fancier things
 minimum-risk training and decoding
NLP - Supervised Learning
Methods
 The most popular alternatives are roughly similar
 Perceptrons, SVM, MIRA, neural network, …
 These also learn a usually linear scoring function
 However, the score is not interpreted as a log-probability
 Learner just seeks weights θ such that, in the training data, the desired
answer has a higher score than the wrong answers
Features
 Example - word to left
 Spelling correction using an n-gram language model
(n ≥ 2) would use words to left and right to help predict the true
word
 Similarly, an HMM (Hidden Markov Model) would predict a word’s
class using classes to left and right
 But we’d like to throw in all kinds of other features too
Features
Part of Speech Tagging
 Treat tagging as a token classification problem
 Tag each word independently given
 features of context
 features of the word’s spelling (suffixes, capitalization)
 Use an HMM
 the tag of one word might depend on the tags of adjacent words
 Combination of both
 Need rich features (in a log-linear model), but also want feature functions to depend on
adjacent tags
 So, the problem is to predict all tags together.
Part of Speech Tagging
 Idea 1 - classify tags one at a time from left to right
 p(tag | wordseq, prevtags) = (1/Z) exp score(tag, wordseq, prevtags)
 where Z sums up exp score(tag’, wordseq, prevtags) over all possible tags
 Idea 2 - maximum entropy Markov model (MEMM)
 Same model, but don’t commit to a tag before we predict
the next tag
 Instead, evaluate probability of every tag sequence
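The per-tag probability in Idea 1 is just a softmax over scores. A minimal sketch, with invented scores standing in for score(tag, wordseq, prevtags):

```python
# Softmax normalization: p(tag | context) = (1/Z) exp score(tag, context).
import math

def tag_probs(scores):
    z = sum(math.exp(s) for s in scores.values())     # Z sums over all tags
    return {t: math.exp(s) / z for t, s in scores.items()}

probs = tag_probs({"NOUN": 2.0, "VERB": 1.0, "ADJ": -1.0})
print(max(probs, key=probs.get))        # NOUN
```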
Part of Speech Tagging
 Idea 3 - linear-chain conditional random field (CRF)
 Symmetric and very popular
 Score each tag sequence as a whole, using arbitrary features
 p(tagseq | wordseq) = (1/Z) exp score(tagseq, wordseq)
 where Z sums up exp score(tagseq’, wordseq) over competing tagseqs
 Can still compute Z and best path using dynamic programming
 Dynamic programming works if each feature f(tagseq, wordseq) considers at most an n-gram of tags
 Score a (tagseq,wordseq) pair with a WFST whose state remembers the previous (n-1)
tags
 As in 2, arc weight can consider the current tag n-gram and all words
 But unlike 2, arc weight isn’t a probability - only normalize at the end
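The dynamic programming mentioned above can be sketched as a toy Viterbi decoder over tag bigrams. The score table below is an invented stand-in for a learned linear model, and the word argument is unused in this toy.

```python
# Viterbi over a bigram-of-tags score: at each word, keep the best-scoring
# tag sequence ending in each tag.
TAGS = ["N", "V"]

def score(prev_tag, tag, word):
    # invented arc weights; a real model would compute a feature dot-product
    table = {("<s>", "N"): 1.0, ("<s>", "V"): 0.2,
             ("N", "V"): 1.0, ("N", "N"): 0.3,
             ("V", "N"): 0.8, ("V", "V"): 0.1}
    return table.get((prev_tag, tag), 0.0)

def viterbi(words):
    best = {"<s>": (0.0, [])}                  # tag -> (score, path)
    for w in words:
        new = {}
        for t in TAGS:
            prev, (s, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + score(kv[0], t, w))
            new[t] = (s + score(prev, t, w), path + [t])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

print(viterbi(["fruit", "flies"]))   # ['N', 'V']
```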
Named Entity Recognition
 Deals with the detection and categorization of proper names
 Labeling all occurrences of named entities in a text
 Named Entity = People, organizations, lakes, bridges, hospitals,
mountains…
 Well-understood technology, readily available and works well
 Uses a combination of enumerated lists (often called gazetteers)
and regular expressions
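The gazetteer-plus-regex approach can be sketched in a few lines. The name list and date pattern below are toy assumptions; real systems use much larger gazetteers and many patterns.

```python
# NER sketch: enumerated name lists (gazetteers) plus regular expressions.
import re

GAZETTEER = {"italy": "LOCATION", "oxford": "ORG"}
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")   # e.g. 7/1/1954

def find_entities(text):
    found = []
    low = text.lower()
    for name, label in GAZETTEER.items():
        if name in low:
            found.append((name, label))
    for m in DATE_RE.finditer(text):
        found.append((m.group(), "DATE"))
    return found

print(find_entities("Born in Italy on 7/1/1954"))
# [('italy', 'LOCATION'), ('7/1/1954', 'DATE')]
```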
Named Entity Recognition
Complex Entities and
Relationships
 Uses Named Entities as components
 Pattern-matching rules specific to a given domain
 May be multi-pass - One pass creates entities which are used as
part of later passes
 First pass locates CEOs, second pass locates dates for CEOs being
replaced…
 May involve syntactic analysis as well, especially for things like
negatives and reference resolution
Complex Entities and
Relationships
Why Does It Work?
 The problem is very constrained
 Only looking for a small set of items that can appear in a small set of
roles
 We can ignore stuff we don’t care about
 BUT creating the knowledge bases can be very complex and time-
consuming
Simple Context-free Grammars
 Consider the simplest Context-Free Grammars - without and with parameters
 parameters allow us to express more interesting facts
 sentence → noun_phrase verb_phrase
 noun_phrase → proper_name
 noun_phrase → article noun
 verb_phrase → verb
 verb_phrase → verb noun_phrase
 verb_phrase → verb noun_phrase prep_phrase
 verb_phrase → verb prep_phrase
 prep_phrase → preposition noun_phrase
Simple CF Grammars
 The still-undefined syntactic units are pre-terminals
 They correspond to parts of speech
 We can define them by adding lexical productions to the grammar
 article → the | a | an
 noun → pizza | bus | boys | ...
 preposition → to | on | ...
 proper_name → Jim | Dan | ...
 verb → ate | yawns | ...
 This is not practical on a large scale
 Normally, we have a lexicon (dictionary) stored in a database, that can be interfaced with the
grammar
Simple CF Grammars
 sentence ⇒
 noun_phrase verb_phrase ⇒
 proper_name verb_phrase ⇒
 Jim verb_phrase ⇒
 Jim verb noun_phrase prep_phrase ⇒
 Jim ate noun_phrase prep_phrase ⇒
 Jim ate article noun prep_phrase ⇒
 Jim ate a noun prep_phrase ⇒
 Jim ate a pizza prep_phrase ⇒
 Jim ate a pizza preposition noun_phrase ⇒
 Jim ate a pizza on noun_phrase ⇒
 Jim ate a pizza on article noun ⇒
 Jim ate a pizza on the noun ⇒
 Jim ate a pizza on the bus
Simple CF Grammars
 Other examples of sentences generated by this grammar:
 Jim ate a pizza
 Dan yawns on the bus
 These incorrect sentences will also be recognized:
 Jim ate an pizza
 Jim yawns a pizza
 Jim ate to the bus
 the boys yawns
 the bus yawns
 ... but not these obviously correct ones:
 the pizza was eaten by Jim
 Jim ate a hot pizza
 and so on, and so forth.
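A tiny top-down recognizer mechanizes such derivations; this sketch uses the toy grammar and lexicon above, and reproduces both the overgeneration and the gaps just listed.

```python
# Naive top-down recognizer with backtracking: expand the leftmost symbol,
# trying each production in turn.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["proper_name"], ["article", "noun"]],
    "verb_phrase": [["verb", "noun_phrase", "prep_phrase"],
                    ["verb", "noun_phrase"], ["verb", "prep_phrase"], ["verb"]],
    "prep_phrase": [["preposition", "noun_phrase"]],
    "article": [["the"], ["a"], ["an"]],
    "noun": [["pizza"], ["bus"], ["boys"]],
    "preposition": [["to"], ["on"]],
    "proper_name": [["Jim"], ["Dan"]],
    "verb": [["ate"], ["yawns"]],
}

def parse(symbols, words):
    if not symbols:
        return not words                      # success iff all words consumed
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:                       # nonterminal: try each production
        return any(parse(body + rest, words) for body in GRAMMAR[head])
    return bool(words) and words[0] == head and parse(rest, words[1:])

print(parse(["sentence"], "Jim ate a pizza on the bus".split()))  # True
print(parse(["sentence"], "the bus yawns".split()))   # True (overgenerates)
print(parse(["sentence"], "Jim ate a hot pizza".split()))  # False (no adjectives)
```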
Simple CF Grammars
 This simple grammar can be improved in many interesting ways
 Add productions, for example to allow adjectives
 Add words (in lexical productions, or in a more realistic lexicon)
 Check agreement (noun-verb, noun-adjective, and so on)
 rabbits(pl) run(pl) vs. a rabbit(sg) runs(sg)
 le bureau(m) blanc(m) vs. la table(f) blanche(f)
 An obvious, but naïve, method of enforcing agreement is to
duplicate the productions and the lexical data.
Simple CF Grammars
 sentence → noun_phr_sg verb_phr_sg
 sentence → noun_phr_pl verb_phr_pl
 noun_phr_sg → art_sg noun_sg
 noun_phr_sg → proper_name_sg
 noun_phr_pl → art_pl noun_pl
 noun_phr_pl → proper_name_pl
 art_sg → the | a | an
 art_pl → the
 noun_sg → pizza | bus | ...
 noun_pl → boys | … and so on.
Simple CF Grammars
 A much better method is to add parameters, and to parameterize words as well as productions
 sentence → noun_phr(Num) verb_phr(Num)
 noun_phr(Num) → art(Num) noun(Num)
 noun_phr(Num) → proper_name(Num)
 art(sg) → the | a | an
 art(pl) → the
 noun(sg) → pizza | bus | ...
 noun(pl) → boys | … and so on.
 This notation slightly extends the basic Context-Free Grammar formalism.
Simple CF Grammars
 Another use of parameters in productions: represent transitivity
 We want to exclude such sentences as
 Jim yawns a pizza
 Jim ate to the bus
 verb_phr(Num) → verb(intrans, Num)
 verb_phr(Num) → verb(trans, Num) noun_phr(Num1)
 verb(intrans, sg) → yawns | ...
 verb(trans, sg) → ate | ...
 verb(trans, pl) → ate | ...
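Parameterized agreement can be checked directly when pre-terminals combine. The lexicon and the single article-noun-verb sentence pattern below are toy assumptions, just enough to reject "the boys yawns".

```python
# Agreement sketch: pre-terminals carry a number feature; None means the
# word is compatible with either number (like "the").
LEXICON = {
    "a": ("art", "sg"), "the": ("art", None),
    "pizza": ("noun", "sg"), "boys": ("noun", "pl"),
    "yawns": ("verb", "sg"), "yawn": ("verb", "pl"),
}

def agree(n1, n2):
    return n1 is None or n2 is None or n1 == n2

def sentence_ok(words):
    # toy production: sentence -> art(Num) noun(Num) verb(Num)
    if len(words) != 3:
        return False
    (c1, n1), (c2, n2), (c3, n3) = (LEXICON[w] for w in words)
    return (c1, c2, c3) == ("art", "noun", "verb") and agree(n1, n2) and agree(n2, n3)

print(sentence_ok("the boys yawn".split()))   # True
print(sentence_ok("the boys yawns".split()))  # False: number clash
```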
Direction of Parsing – Top Down
 Starts from root and moves to leaves
 Top-down, hypothesis-driven - assume that we have a sentence,
keep rewriting, aim to derive a sequence of terminal symbols,
backtrack if data tell us to reject a hypothesis
 For example, we had assumed a noun phrase that begins with an
article, but there is no article
 Problem - wrong guesses, wasted computation
Direction of Parsing – Top Down
Direction of Parsing – Bottom Up
 Data-driven - look for complete right-hand sides of
productions, keep rewriting, aim to derive the goal
symbol
 Problem: lexical ambiguity that may lead to many unfinished
partial analyses
 Lexical ambiguity is generally troublesome. For example, in the
sentence "Johnny runs the show", both runs and show can be a
verb or a noun, but only one of 2*2 possibilities is correct
Direction of Parsing – Bottom Up
Direction of Parsing
 In practice, parsing is never pure
 Top-down, enriched - check data early to discard wrong hypotheses
(somewhat like recursive-descent parsing in compiler construction)
 Bottom-up, enriched - use productions, suggested by data, to limit
choices (somewhat like LR parsing in compiler construction)
Direction of Parsing
 A popular bottom-up analysis method - chart parsing
 Popular top-down analysis methods
 transition networks (used with Lisp)
 logic grammars (used with Prolog)
N-grams
 Letter or word frequencies - 1-grams - unigrams
 useful in solving cryptograms - ETAOINSHRDLU…
 If you know the previous letter - 2-grams - bigrams
 “h” is rare in English (4%)
 but “h” is common after “t” (20%)
 If you know the previous two letters - 3-grams - trigrams
 “h” is really common after “(space) t”
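The letter statistics above are easy to compute: count unigrams and bigrams over a text and compare p(h) with p(h | previous letter = t). The sample sentence below is invented; any English text shows the same effect.

```python
# Letter unigram vs. bigram statistics: "h" overall vs. "h" after "t".
from collections import Counter

text = "the cat sat on the mat with the hat that the other cat hid"
letters = [c for c in text if c.isalpha()]
pairs = list(zip(letters, letters[1:]))       # adjacent-letter bigrams

uni = Counter(letters)
bi = Counter(pairs)

p_h = uni["h"] / len(letters)
p_h_after_t = bi[("t", "h")] / sum(c for (a, _), c in bi.items() if a == "t")
print(round(p_h, 2), round(p_h_after_t, 2))   # "h" is far likelier after "t"
```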
References
1. Ruslan Mitkov, The Oxford Handbook of Computational Linguistics, Oxford University Press, 2003.
2. Robert Dale, Hermann Moisl, Harold Somers (eds.), Handbook of Natural Language Processing, Marcel Dekker, 2000.
3. James Allen, Natural Language Processing, Pearson Education, 2003.
4. Christopher Manning, Prabhakar Raghavan, Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
5. Douglas Biber, Susan Conrad, Randi Reppen, Corpus Linguistics: Investigating Language Structure and Use, Cambridge University Press, 2000.
6. David Singleton, Language and the Lexicon: An Introduction, Arnold Publishers, 2000.
7. James Allen, Natural Language Understanding, 2nd edition, Benjamin/Cummings, 1995.
8. Matt Ginsberg, Essentials of Artificial Intelligence, Morgan Kaufmann, 1993.
9. W. John Hutchins, The First Public Demonstration of Machine Translation: The Georgetown-IBM System, 7th January 1954, 2005.
10. Noam Chomsky, Three Models for the Description of Language, IRE Transactions on Information Theory, 1956; 2:113-124.
11. Alfred Aho, Ravi Sethi, Jeffrey Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1988.
12. Noam Chomsky, On Certain Formal Properties of Grammars, Information and Control, 1959; 2:137-167.
13. Jeffrey Friedl, Mastering Regular Expressions, O'Reilly & Associates, 1997.
Thank You
Check Out My LinkedIn Profile at
https://in.linkedin.com/in/girishkhanzode
 
Natural-Language-Processing-by-Dr-A-Nagesh.pdf
Natural-Language-Processing-by-Dr-A-Nagesh.pdfNatural-Language-Processing-by-Dr-A-Nagesh.pdf
Natural-Language-Processing-by-Dr-A-Nagesh.pdftheboysaiml
 
Artificial Intelligence_NLP
Artificial Intelligence_NLPArtificial Intelligence_NLP
Artificial Intelligence_NLPThenmozhiK5
 
AI UNIT-3 FINAL (1).pptx
AI UNIT-3 FINAL (1).pptxAI UNIT-3 FINAL (1).pptx
AI UNIT-3 FINAL (1).pptxprakashvs7
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
 
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Object Automation
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingSaurabh Kaushik
 

Similar to NLP (20)

Nlp ambiguity presentation
Nlp ambiguity presentationNlp ambiguity presentation
Nlp ambiguity presentation
 
REPORT.doc
REPORT.docREPORT.doc
REPORT.doc
 
NLP
NLPNLP
NLP
 
Nlp (1)
Nlp (1)Nlp (1)
Nlp (1)
 
Class5 - Languaging
Class5 - LanguagingClass5 - Languaging
Class5 - Languaging
 
Class3 - What is This Language Structure
Class3 - What is This Language StructureClass3 - What is This Language Structure
Class3 - What is This Language Structure
 
Discourse analysis (Linguistics Forms and Functions)
Discourse analysis (Linguistics Forms and Functions)Discourse analysis (Linguistics Forms and Functions)
Discourse analysis (Linguistics Forms and Functions)
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Presentation ...prose
Presentation ...prosePresentation ...prose
Presentation ...prose
 
Discourse Studies 2
Discourse Studies 2Discourse Studies 2
Discourse Studies 2
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Natural-Language-Processing-by-Dr-A-Nagesh.pdf
Natural-Language-Processing-by-Dr-A-Nagesh.pdfNatural-Language-Processing-by-Dr-A-Nagesh.pdf
Natural-Language-Processing-by-Dr-A-Nagesh.pdf
 
NLP todo
NLP todoNLP todo
NLP todo
 
Artificial Intelligence_NLP
Artificial Intelligence_NLPArtificial Intelligence_NLP
Artificial Intelligence_NLP
 
AI UNIT-3 FINAL (1).pptx
AI UNIT-3 FINAL (1).pptxAI UNIT-3 FINAL (1).pptx
AI UNIT-3 FINAL (1).pptx
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation Natural Language Processing from Object Automation
Natural Language Processing from Object Automation
 
NLP
NLPNLP
NLP
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 

More from Girish Khanzode (13)

Apache Spark Components
Apache Spark ComponentsApache Spark Components
Apache Spark Components
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Data Visulalization
Data VisulalizationData Visulalization
Data Visulalization
 
Graph Databases
Graph DatabasesGraph Databases
Graph Databases
 
IR
IRIR
IR
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
NLTK
NLTKNLTK
NLTK
 
NoSql
NoSqlNoSql
NoSql
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Hadoop
HadoopHadoop
Hadoop
 
Language R
Language RLanguage R
Language R
 
Python Scipy Numpy
Python Scipy NumpyPython Scipy Numpy
Python Scipy Numpy
 
Funtional Programming
Funtional ProgrammingFuntional Programming
Funtional Programming
 

Recently uploaded

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 

Recently uploaded (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 

 Text annotation - classify the entire document
 Sentiment classification
 What features of the text could help predict the number of likes?
 How to identify customer opinions?
 Are the features hard to compute? (syntax? sarcasm?)
 Is it spam?
 What medical billing code applies to this visit?
 What grade for an answer to this essay question?
Text Categorization
 Is it interesting to this user?
 News filtering; helpdesk routing
 Is it interesting to this NLP program?
 If it's Spanish, translate it from Spanish
 If it's subjective, run the sentiment classifier
 If it's an appointment, run information extraction
 Where should it be filed?
 Which mail folder? (work, friends, junk, urgent ...)
 Yahoo! / Open Directory / digital libraries
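The folder-routing decision above can be sketched as a keyword-overlap classifier. This is a minimal illustration; the folder names and keyword lists are invented for the example, not taken from any real mail system.

```python
# Route a message to a mail folder by counting keyword hits per folder.
# Folder names and keyword lists are illustrative only.
FOLDER_KEYWORDS = {
    "work": {"meeting", "deadline", "report", "project"},
    "friends": {"party", "weekend", "movie", "dinner"},
    "junk": {"winner", "free", "offer", "click"},
}

def route(message, default="urgent"):
    """Return the folder whose keyword set overlaps the message most."""
    words = set(message.lower().split())
    scores = {f: len(words & kws) for f, kws in FOLDER_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(route("Project deadline moved, meeting at noon"))  # work
```

A real filter would learn these keyword weights from labeled mail rather than hand-code them, which is exactly the move to statistical methods made later in the deck.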
Syntactic Analysis
 Rules of syntax (grammar) specify the possible organizations of words in sentences and allow us to determine a sentence's structure(s)
 John saw Mary with a telescope
 John saw (Mary with a telescope)
 John (saw Mary with a telescope)
 Parsing: given a sentence and a grammar
 Checks that the sentence is correct according to the grammar and, if so, returns a parse tree representing the structure of the sentence
Syntactic Analysis
 Syntax maps into semantics
 Nouns ↔ things, objects, abstractions
 Verbs ↔ situations, events, activities
 Adjectives ↔ properties of things
 Adverbs ↔ properties of situations
 A parser recovers the phrase structure of an utterance, given a grammar (rules of syntax)
 The parser's outcome is the structure (groups of words and their respective parts of speech)
Syntactic Analysis
 Phrase structure is represented in a parse tree
 Parsing is the first step towards determining the meaning of an utterance
 The outcome of syntactic analysis can still be a series of alternative structures with respective probabilities
 Sometimes grammar rules can disambiguate a sentence
 John set the set of chairs
 Sometimes they can't - the next step is semantic analysis
Parsing
 A method to analyze a sentence to determine its structure according to the grammar
 Grammar - formal specification of the structures allowable in the language
 Syntax is important - a skeleton on which various linguistic elements, meaning among them, depend
 So recognizing syntactic structure is also important
Parsing
 Some researchers deny syntax its central role
 One verb-centered analysis builds on Conceptual Dependency [textbook, section 7.1.3] - a verb determines almost everything in a sentence built around it; verbs are fundamental in many theories of language
 Another idea is to treat all connections in language as occurring between pairs of words, and to assume no higher-level groupings
 Structure and meaning are expressed through variously linked networks of words
Syntactic Analysis - Challenges
 Number (singular vs. plural) and gender
 sentence -> noun_phrase(n), verb_phrase(n)
 proper_noun(s) -> [mary]
 noun(p) -> [apples]
 Adjectives
 noun_phrase -> determiner, adjectives, noun
 adjectives -> adjective, adjectives
 adjective -> [ferocious]
Syntactic Analysis - Challenges
 Adverbs, ...
 Handling ambiguity
 Syntactic ambiguity - fruit flies like a banana
 Having to parse syntactically incorrect sentences
Semantic Analysis
 Generates the meaning/representation of the sentence from its syntactic structures
 Represents the sentence in meaningful parts
 Uses possible syntactic structures and meanings
 Builds a parse tree with associated semantics
 Semantics is typically represented with logic
Semantic Analysis
 Compositional semantics: the meaning of the sentence from the meaning of its parts
 Sentence - A tall man likes Mary
 Representation - ∃x man(x) & tall(x) & likes(x, mary)
 Grammar + Semantics
 sentence(Smeaning) -> noun_phrase(NPmeaning), verb_phrase(VPmeaning), combine(NPmeaning, VPmeaning, Smeaning)
 Complications - handling ambiguity
 Semantic ambiguity - I saw the Prudential building flying into Paris
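The `combine` step above can be sketched concretely: each phrase's meaning is built from its sub-phrases' meanings. This toy version represents meanings as strings; the function names and the `exists x.` notation are illustrative stand-ins for a real term representation.

```python
# Build the logical form of "A tall man likes Mary" compositionally:
# each phrase's semantics is a function of its sub-phrases' semantics.

def noun_sem(word, var):          # man -> man(x)
    return f"{word}({var})"

def adj_sem(word, var):           # tall -> tall(x)
    return f"{word}({var})"

def np_sem(adj, noun, var="x"):   # a tall man -> man(x) & tall(x)
    return f"{noun_sem(noun, var)} & {adj_sem(adj, var)}"

def vp_sem(verb, obj, var="x"):   # likes Mary -> likes(x, mary)
    return f"{verb}({var}, {obj.lower()})"

def sentence_sem(adj, noun, verb, obj):
    # Existentially quantify the subject variable, then conjoin NP and VP meanings
    return f"exists x. {np_sem(adj, noun)} & {vp_sem(verb, obj)}"

print(sentence_sem("tall", "man", "likes", "Mary"))
# exists x. man(x) & tall(x) & likes(x, mary)
```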
Compositional Semantics
 The semantics of a phrase is a function of the semantics of its sub-phrases
 It does not depend on any other phrase
 So if we know the meaning of the sub-phrases, then we know the meaning of the phrase
 "A goal of semantic interpretation is to find a way that the meaning of the whole sentence can be put together in a simple way from the meanings of the parts of the sentence" (Alison, 1997, p. 112)
Pragmatic Analysis
 Uses context
 Uses the partial representation
 Includes purpose and performs disambiguation
 Uses the context of the utterance
 Where, by whom, to whom, why, and when it was said
 Intentions - inform, request, promise, criticize, ...
Pragmatic Analysis
 Handling pronouns
 Mary eats apples. She likes them
 She = Mary, them = apples
 Handling ambiguity
 Pragmatic ambiguity - you're late - what is the speaker's intention, informing or criticizing?
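The pronoun example above can be handled by a very crude heuristic: resolve each pronoun to the most recent noun that agrees in number. The tiny lexicon here is invented for the example; real coreference resolution also needs gender, syntax, and world knowledge.

```python
# A minimal pronoun-resolution heuristic: most recent number-agreeing noun.
# Lexicon entries are illustrative only.
NOUNS = {"mary": "sg", "apples": "pl", "john": "sg", "chairs": "pl"}
PRONOUNS = {"she": "sg", "he": "sg", "it": "sg", "them": "pl", "they": "pl"}

def resolve(text):
    """Map each pronoun occurrence to its guessed antecedent."""
    antecedents, links = [], {}
    for token in text.lower().replace(".", " ").split():
        if token in PRONOUNS:
            # search backwards for a noun agreeing in number
            for noun in reversed(antecedents):
                if NOUNS[noun] == PRONOUNS[token]:
                    links[token] = noun
                    break
        elif token in NOUNS:
            antecedents.append(token)
    return links

print(resolve("Mary eats apples. She likes them"))
# {'she': 'mary', 'them': 'apples'}
```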
NLP Challenges
 NLP systems need to answer the question "who did what to whom"
 MANY hidden variables
 Knowledge about the world
 Knowledge about the context
 Knowledge about human communication techniques
 Can you tell me the time?
 Problem of scale
 Many (infinite?) possible words, meanings, contexts
NLP Challenges
 Problem of sparsity
 Very difficult to do statistical analysis; most things (words, concepts) have never been seen before
 Long-range correlations
 Key problems
 Representation of meaning
 Language presupposes knowledge about the world
 Language only reflects the surface of meaning
 Language presupposes communication between people
NLP Challenges
 Different ways of parsing a sentence
 Word category ambiguity
 Word sense ambiguity
 Words can mean more than the sum of their parts - The Times of India
 Imparting world knowledge is difficult - the blue pen ate the ice-cream
NLP Challenges
 Fictitious worlds - people on Mars can fly
 Defining scope
 People like ice-cream - does this mean all people like ice-cream?
 Language is changing and evolving
 Complex ways of interaction between the kinds of knowledge
 Exponential complexity at each point in using the knowledge
NLP Challenges - Ambiguity
 At all levels - lexical, phrase, semantic
 Institute head seeks aid
 Word sense is ambiguous (head)
 Stolen Painting Found by Tree
 Thematic role is ambiguous: is tree the agent or the location?
 Ban on Dancing on Governor's Desk
 Syntactic structure (attachment) is ambiguous - is the ban or the dancing on the desk?
 Hospitals Are Sued by 7 Foot Doctors
 Semantics is ambiguous: what is 7 foot?
Meaning
 From the NLP viewpoint, meaning is a mapping from linguistic forms to some kind of representation of knowledge of the world
 Physical referents in the real world
 Semantic concepts, characterized also by relations
 It is interpreted within the framework of some sort of action to be taken
Meaning - Representation and Usage
 I am Italian
 From a lexical database (WordNet):
 Italian = a native or inhabitant of Italy; Italy = republic in southern Europe [..]
 Who is "I"?
 I know she is Italian / I think she is Italian
 How do we represent I know and I think?
 Does this mean that "I" is Italian?
 What does it say about the "I" and about the person speaking?
 I thought she was Italian
 How do we represent tenses?
Corpus-based Statistical Approaches
 How can a machine understand these differences?
 Decorate the cake with the frosting
 Decorate the cake with the kids
 Rule-based approaches
 Hand-coded syntactic constraints and preference rules
 The verb decorate requires an animate being as agent
 The object cake may be formed with any of the following inanimate entities - cream, dough, frosting, ...
Corpus-based Statistical Approaches
 These approaches are time-consuming to build, do not scale up well and are very brittle to new, unusual, metaphorical uses of language
 To swallow requires an animate being as agent/subject and a physical object as object
 I swallowed his story
 The supernova swallowed the planet
 A statistical NLP approach seeks to solve these problems by automatically learning lexical and structural preferences from text collections (corpora)
Corpus-based Statistical Approaches
 Statistical models are robust, generalize well and behave gracefully in the presence of errors and new data
 Steps
 Get large text collections
 Compute statistics over those collections
 The bigger the collections, the better the statistics
Corpus-based Statistical Approaches
 Decorate the cake with the frosting
 Decorate the cake with the kids
 From (labeled) corpora we can learn that
 #(kids are subject/agent of decorate) > #(frosting is subject/agent of decorate)
 From (unlabeled) corpora we can learn that
 #("the kids decorate the cake") >> #("the frosting decorates the cake")
 #("cake with frosting") >> #("cake with kids")
 Given these facts, we need a statistical model for the attachment decision
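A minimal version of that attachment decision just compares the corpus counts directly. The counts below are invented for illustration; a real system would collect them from a large corpus and smooth them.

```python
# Decide PP attachment from (verb, prep, noun) vs (noun, prep, noun) counts.
# The counts are illustrative stand-ins for real corpus statistics.
from collections import Counter

counts = Counter({
    ("decorate", "with", "kids"): 40,      # evidence for verb attachment
    ("cake", "with", "kids"): 1,           # evidence for noun attachment
    ("decorate", "with", "frosting"): 5,
    ("cake", "with", "frosting"): 60,
})

def attach(verb, noun, prep, pp_noun):
    """Return 'verb' or 'noun' depending on which attachment is better supported."""
    v = counts[(verb, prep, pp_noun)]
    n = counts[(noun, prep, pp_noun)]
    return "verb" if v >= n else "noun"

print(attach("decorate", "cake", "with", "kids"))      # verb
print(attach("decorate", "cake", "with", "frosting"))  # noun
```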
Corpus-based Statistical Approaches
 Topic categorization: classify the document into semantic topics
 Document 1 (Topic = sport): The nation swept into the Davis Cup final on Saturday when twins Bob and Mike Bryan defeated Belarus's Max Mirnyi and Vladimir Voltchkov to give the Americans an unsurmountable 3-0 lead in the best-of-five semi-final tie.
 Document 2 (Topic = natural event): One of the strangest, most relentless hurricane seasons on record reached new bizarre heights yesterday as the plodding approach of Hurricane Jeanne prompted evacuation orders for hundreds of thousands of Floridians and high wind warnings that stretched 350 miles from the swamp towns south of Miami to the historic city of St. Augustine.
Corpus-based Statistical Approaches
 Topic categorization: classify the document into semantic topics
 From (labeled) corpora we can learn that
 #(sport documents containing the word cup) > #(natural event documents containing the word cup) - a feature
 We then need a statistical model for the topic assignment
 Document 1 (sport): The nation swept into the Davis Cup final on Saturday when twins Bob and Mike Bryan ...
 Document 2 (natural event): One of the strangest, most relentless hurricane seasons on record reached new bizarre heights yesterday as ...
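One standard statistical model for this topic assignment is Naive Bayes over bag-of-words counts, in the spirit of the "cup" feature above. A minimal sketch with add-one smoothing; the four training documents are tiny invented stand-ins for a real labeled corpus.

```python
# Naive Bayes topic classifier: pick the label maximizing
# log P(label) + sum over words of log P(word | label), add-one smoothed.
import math
from collections import Counter, defaultdict

train = [
    ("sport", "davis cup final twins defeated lead tie"),
    ("sport", "cup match players won the final"),
    ("nature", "hurricane season evacuation wind warnings"),
    ("nature", "storm reached new heights yesterday"),
]

word_counts = defaultdict(Counter)
label_counts = Counter()
vocab = set()
for label, text in train:
    label_counts[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def classify(text):
    best, best_score = None, float("-inf")
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

print(classify("the cup final"))             # sport
print(classify("hurricane wind warnings"))   # nature
```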
Corpus-based Statistical Approaches
 Feature extraction
 Usually linguistically motivated
 Statistical models
 Data
 Corpora, labels, linguistic resources
Measuring Performance
 Classification accuracy
 What % of messages were classified correctly?
 Is this what we care about?
 Which system is better?

            Overall accuracy   Accuracy on spam   Accuracy on genuine
  System 1        95%               99.99%               90%
  System 2        95%               90%                  99.99%
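The table's point, that the same overall accuracy can hide opposite per-class behavior, is easy to verify. The gold/predicted lists below are synthetic, constructed so both systems score 90% overall while differing per class.

```python
# Overall accuracy vs. per-class accuracy on synthetic predictions.
def accuracy(gold, pred, only=None):
    """Accuracy over all pairs, or only pairs whose gold label == only."""
    pairs = [(g, p) for g, p in zip(gold, pred) if only is None or g == only]
    return sum(g == p for g, p in pairs) / len(pairs)

gold = ["spam"] * 10 + ["gen"] * 10
sys1 = ["spam"] * 10 + ["gen"] * 8 + ["spam"] * 2   # perfect on spam, 80% on gen
sys2 = ["spam"] * 8 + ["gen"] * 2 + ["gen"] * 10    # 80% on spam, perfect on gen

print(accuracy(gold, sys1), accuracy(gold, sys1, "spam"), accuracy(gold, sys1, "gen"))
print(accuracy(gold, sys2), accuracy(gold, sys2, "spam"), accuracy(gold, sys2, "gen"))
# both systems: 0.9 overall, but the per-class accuracies swap
```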
Measuring Performance
 Precision = good messages kept / all messages kept
 Recall = good messages kept / all good messages
 Move from high precision to high recall by deleting fewer messages
 Delete only if spam content > a high threshold
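The two definitions above in code, on a hypothetical five-message inbox:

```python
# precision = good messages kept / all messages kept
# recall    = good messages kept / all good messages
def precision_recall(kept, good):
    kept, good = set(kept), set(good)
    good_kept = len(kept & good)
    return good_kept / len(kept), good_kept / len(good)

# 5 messages; messages 1-3 are good, and we kept messages 1, 2, 4
p, r = precision_recall(kept=[1, 2, 4], good=[1, 2, 3])
print(p, r)  # both 2/3: one kept message is spam, one good message was deleted
```

Keeping everything drives recall to 1 at the cost of precision; deleting aggressively does the reverse, which is the trade-off the precision-recall curve plots.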
Measuring Performance
 [Figure: precision vs. recall curve - Analysis of Good (non-spam) Email]
NLP - Supervised Learning Methods
 Conditional log-linear models
 Feature engineering - throw in enough features to fix most errors
 Training - learn weights such that, in training data, the true answer tends to have a high probability
 Test - output the highest-probability answer
 If the evaluation metric allows for partial credit, fancier things are possible
 Minimum-risk training and decoding
NLP - Supervised Learning Methods
 The most popular alternatives are roughly similar
 Perceptrons, SVMs, MIRA, neural networks, ...
 These also learn a (usually linear) scoring function
 However, the score is not interpreted as a log-probability
 The learner just seeks weights such that, in training data, the desired answer has a higher score than the wrong answers
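The simplest of these alternatives, the perceptron, fits in a few lines: whenever the current weights score the desired answer no higher than the wrong one, nudge the weights toward the desired answer. The two training examples are synthetic.

```python
# A tiny binary perceptron over sparse (dict) features.
from collections import defaultdict

def score(weights, feats):
    return sum(weights[f] * v for f, v in feats.items())

def train(data, epochs=10):
    """data: list of (features, label) pairs with label in {+1, -1}."""
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, label in data:
            if label * score(w, feats) <= 0:      # wrong (or tied) answer
                for f, v in feats.items():        # nudge weights toward the label
                    w[f] += label * v
    return w

data = [
    ({"free": 1, "offer": 1}, -1),     # spam
    ({"meeting": 1, "notes": 1}, +1),  # genuine
]
w = train(data)
print(score(w, {"free": 1}) < 0, score(w, {"meeting": 1}) > 0)  # True True
```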
Features
 Example - the word to the left
 Spelling correction using an n-gram language model (n ≥ 2) would use the words to the left and right to help predict the true word
 Similarly, an HMM (Hidden Markov Model) would predict a word's class using the classes to its left and right
 But we'd like to throw in all kinds of other features too
Part of Speech Tagging
 Treat tagging as a token classification problem
 Tag each word independently given
 features of the context
 features of the word's spelling (suffixes, capitalization)
 Use an HMM
 The tag of one word might depend on the tags of adjacent words
 Combination of both
 Need rich features (in a log-linear model), but also want feature functions to depend on adjacent tags
 So the problem is to predict all tags together
Part of Speech Tagging
 Idea 1 - classify tags one at a time, from left to right
 p(tag | wordseq, prevtags) = (1/Z) exp score(tag, wordseq, prevtags)
 where Z sums up exp score(tag', wordseq, prevtags) over all possible tags
 Idea 2 - maximum entropy Markov model (MEMM)
 Same model, but don't commit to a tag before we predict the next tag
 Instead, evaluate the probability of every tag sequence
Part of Speech Tagging
 Idea 3 - linear-chain conditional random field (CRF)
 Symmetric and very popular
 Score each tag sequence as a whole, using arbitrary features
 p(tagseq | wordseq) = (1/Z) exp score(tagseq, wordseq)
 where Z sums up exp score(tagseq', wordseq) over competing tag sequences
 Can still compute Z and the best path using dynamic programming
 Dynamic programming works if each feature f(tagseq, wordseq) considers at most an n-gram of tags
 Score a (tagseq, wordseq) pair with a WFST whose state remembers the previous (n-1) tags
 As in Idea 2, the arc weight can consider the current tag n-gram and all words
 But unlike Idea 2, the arc weight isn't a probability - only normalize at the end
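The dynamic programming referred to above is Viterbi decoding: for a bigram-of-tags model, keep only the best-scoring path into each tag at each position. The tag set and local score function below are invented toy values; a trained HMM, MEMM or CRF would supply real weights.

```python
# Viterbi decoding over a linear-chain model with toy local scores.
TAGS = ["N", "V"]

def local_score(prev_tag, tag, word):
    # illustrative scores: 'dog'/'cat' prefer N, 'barks' prefers V,
    # and V rarely follows V
    s = 0.0
    if word in ("dog", "cat") and tag == "N": s += 2.0
    if word == "barks" and tag == "V": s += 2.0
    if prev_tag == "V" and tag == "V": s -= 3.0
    return s

def viterbi(words):
    """Return the highest-scoring tag sequence for the word sequence."""
    best = {t: (local_score(None, t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        new = {}
        for t in TAGS:
            # pick the best previous tag to extend into t
            prev, (s, path) = max(
                ((p, best[p]) for p in TAGS),
                key=lambda item: item[1][0] + local_score(item[0], t, word))
            new[t] = (s + local_score(prev, t, word), path + [t])
        best = new
    return max(best.values())[1]

print(viterbi(["dog", "barks"]))  # ['N', 'V']
```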
Named Entity Recognition
 Deals with the detection and categorization of proper names
 Labeling all occurrences of named entities in a text
 Named entity = people, organizations, lakes, bridges, hospitals, mountains, ...
 Well-understood technology, readily available, and works well
 Uses a combination of enumerated lists (often called gazetteers) and regular expressions
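The gazetteer-plus-regular-expression combination can be sketched directly: a regex proposes capitalized-word runs as candidates, and a gazetteer labels the ones it knows. The gazetteer entries here are illustrative.

```python
# Named entity tagging with a gazetteer plus a candidate regex.
import re

GAZETTEER = {
    "new york": "LOCATION",
    "mike bryan": "PERSON",
    "davis cup": "EVENT",
}
# runs of capitalized words are candidate entity mentions
CAND = re.compile(r"\b(?:[A-Z][a-z]+)(?:\s[A-Z][a-z]+)*")

def tag_entities(text):
    found = []
    for m in CAND.finditer(text):
        span = m.group(0)
        label = GAZETTEER.get(span.lower(), "UNKNOWN")
        found.append((span, label))
    return found

print(tag_entities("Mike Bryan won the Davis Cup in New York"))
# [('Mike Bryan', 'PERSON'), ('Davis Cup', 'EVENT'), ('New York', 'LOCATION')]
```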
Complex Entities and Relationships
 Uses named entities as components
 Pattern-matching rules specific to a given domain
 May be multi-pass - one pass creates entities which are used as part of later passes
 A first pass locates CEOs; a second pass locates dates for CEOs being replaced ...
 May involve syntactic analysis as well, especially for things like negatives and reference resolution
• 54. Why Does It Work?
  The problem is very constrained: we are only looking for a small set of items that can appear in a small set of roles
  We can ignore material we do not care about
  BUT creating the knowledge bases can be very complex and time-consuming
• 55. Simple Context-free Grammars
  Consider the simplest context-free grammars, first without and then with parameters (parameters let us express more interesting facts)
    sentence → noun_phrase verb_phrase
    noun_phrase → proper_name
    noun_phrase → article noun
    verb_phrase → verb
    verb_phrase → verb noun_phrase
    verb_phrase → verb noun_phrase prep_phrase
    verb_phrase → verb prep_phrase
    prep_phrase → preposition noun_phrase
• 56. Simple CF Grammars
  The still-undefined syntactic units are pre-terminals; they correspond to parts of speech
  We can define them by adding lexical productions to the grammar:
    article → the | a | an
    noun → pizza | bus | boys | ...
    preposition → to | on | ...
    proper_name → Jim | Dan | ...
    verb → ate | yawns | ...
  This is not practical on a large scale; normally we have a lexicon (dictionary) stored in a database that can be interfaced with the grammar
• 57. Simple CF Grammars
  A sample derivation:
    sentence
    ⇒ noun_phrase verb_phrase
    ⇒ proper_name verb_phrase
    ⇒ Jim verb_phrase
    ⇒ Jim verb noun_phrase prep_phrase
    ⇒ Jim ate noun_phrase prep_phrase
    ⇒ Jim ate article noun prep_phrase
    ⇒ Jim ate a noun prep_phrase
    ⇒ Jim ate a pizza prep_phrase
    ⇒ Jim ate a pizza preposition noun_phrase
    ⇒ Jim ate a pizza on noun_phrase
    ⇒ Jim ate a pizza on article noun
    ⇒ Jim ate a pizza on the noun
    ⇒ Jim ate a pizza on the bus
• 58. Simple CF Grammars
  Other examples of sentences generated by this grammar:
    Jim ate a pizza
    Dan yawns on the bus
  These incorrect sentences will also be accepted:
    Jim ate an pizza
    Jim yawns a pizza
    Jim ate to the bus
    the boys yawns
    the bus yawns
  ... but not these, obviously correct ones:
    the pizza was eaten by Jim
    Jim ate a hot pizza
  and so on, and so forth
• 59. Simple CF Grammars
  This simple grammar can be improved in many interesting ways:
    add productions, for example to allow adjectives
    add words (in lexical productions, or in a more realistic lexicon)
    check agreement (noun-verb, noun-adjective, and so on):
      rabbits_pl run_pl
      a rabbit_sg runs_sg
      le bureau_m blanc_m
      la table_f blanche_f
  An obvious, but naïve, method of enforcing agreement is to duplicate the productions and the lexical data
• 60. Simple CF Grammars
    sentence → noun_phr_sg verb_phr_sg
    sentence → noun_phr_pl verb_phr_pl
    noun_phr_sg → art_sg noun_sg
    noun_phr_sg → proper_name_sg
    noun_phr_pl → art_pl noun_pl
    noun_phr_pl → proper_name_pl
    art_sg → the | a | an
    art_pl → the
    noun_sg → pizza | bus | ...
    noun_pl → boys | ...
  and so on
• 61. Simple CF Grammars
  A much better method is to add parameters, and to parameterize words as well as productions:
    sentence → noun_phr(Num) verb_phr(Num)
    noun_phr(Num) → art(Num) noun(Num)
    noun_phr(Num) → proper_name(Num)
    art(sg) → the | a | an
    art(pl) → the
    noun(sg) → pizza | bus | ...
    noun(pl) → boys | ...
  and so on
  This notation slightly extends the basic context-free grammar formalism
• 62. Simple CF Grammars
  Another use of parameters in productions: representing transitivity
  We want to exclude such sentences as:
    Jim yawns a pizza
    Jim ate to the bus
  Productions:
    verb_phr(Num) → verb(intrans, Num)
    verb_phr(Num) → verb(trans, Num) noun_phr(Num1)
    verb(intrans, sg) → yawns | ...
    verb(trans, sg) → ate | ...
    verb(trans, pl) → ate | ...
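The parameterized grammar can be turned into a small top-down recursive-descent recognizer. This is a minimal sketch under the slides' toy lexicon: each procedure returns the set of input positions reachable after consuming its phrase, the Num parameter is passed down to enforce agreement, and verbs carry a transitivity parameter:

```python
# Toy lexicon keyed by (category, parameters); illustrative only.
LEXICON = {
    ("art", "sg"): {"the", "a", "an"},
    ("art", "pl"): {"the"},
    ("noun", "sg"): {"pizza", "bus"},
    ("noun", "pl"): {"boys"},
    ("proper_name", "sg"): {"Jim", "Dan"},
    ("verb", "intrans", "sg"): {"yawns"},
    ("verb", "intrans", "pl"): {"yawn"},
    ("verb", "trans", "sg"): {"ate"},
    ("verb", "trans", "pl"): {"ate"},
}

def _match(words, i, *key):
    # True if words[i] exists and is listed under this lexicon key.
    return i < len(words) and words[i] in LEXICON.get(key, set())

def noun_phr(words, i, num):
    ends = set()
    if _match(words, i, "art", num) and _match(words, i + 1, "noun", num):
        ends.add(i + 2)                 # noun_phr(Num) -> art(Num) noun(Num)
    if _match(words, i, "proper_name", num):
        ends.add(i + 1)                 # noun_phr(Num) -> proper_name(Num)
    return ends

def verb_phr(words, i, num):
    ends = set()
    if _match(words, i, "verb", "intrans", num):
        ends.add(i + 1)                 # verb_phr(Num) -> verb(intrans, Num)
    if _match(words, i, "verb", "trans", num):
        for num1 in ("sg", "pl"):       # object number Num1 is independent
            ends |= noun_phr(words, i + 1, num1)
    return ends

def sentence(text):
    """Accept iff the whole input parses as sentence -> noun_phr verb_phr."""
    words = text.split()
    return any(len(words) in verb_phr(words, j, num)
               for num in ("sg", "pl")
               for j in noun_phr(words, 0, num))
```

With this, "Jim ate a pizza" and "the boys yawn" are accepted, while the agreement and transitivity violations "the boys yawns", "Jim yawns a pizza", and "Jim ate to the bus" are all rejected.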
• 63. Direction of Parsing – Top Down
  Starts from the root and moves to the leaves
  Top-down, hypothesis-driven: assume the input is a sentence, keep rewriting, aim to derive a sequence of terminal symbols, and backtrack when the data tell us to reject a hypothesis
    for example, we had assumed a noun phrase that begins with an article, but there is no article
  Problem: wrong guesses, wasted computation
  • 64. Direction of Parsing – Top Down
• 65. Direction of Parsing – Bottom Up
  Data-driven: look for complete right-hand sides of productions, keep rewriting, aim to derive the goal symbol
  Problem: lexical ambiguity may leave many unfinished partial analyses
  Lexical ambiguity is generally troublesome; for example, in the sentence "Johnny runs the show", both runs and show can be a verb or a noun, but only one of the 2 × 2 possibilities is correct
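The ambiguity blow-up is easy to make concrete: each ambiguous word multiplies the number of tag assignments a bottom-up parser may have to explore. The tag inventory below is illustrative:

```python
from itertools import product

# Possible part-of-speech tags per word (toy inventory).
tag_options = {
    "Johnny": ["noun"],
    "runs": ["verb", "noun"],     # ambiguous
    "the": ["article"],
    "show": ["verb", "noun"],     # ambiguous
}
sent = ["Johnny", "runs", "the", "show"]
# Every combination of per-word tags: 2 * 2 = 4 candidate sequences,
# only one of which is correct.
assignments = list(product(*(tag_options[w] for w in sent)))
```

With k ambiguous words of two readings each, the count grows as 2^k, which is why bottom-up parsers need ways to prune partial analyses.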
  • 66. Direction of Parsing – Bottom Up
• 67. Direction of Parsing
  In practice, parsing is never pure
  Top-down, enriched: check the data early to discard wrong hypotheses (somewhat like recursive-descent parsing in compiler construction)
  Bottom-up, enriched: use productions suggested by the data to limit choices (somewhat like LR parsing in compiler construction)
• 68. Direction of Parsing
  A popular bottom-up analysis method: chart parsing
  Popular top-down analysis methods:
    transition networks (used with Lisp)
    logic grammars (used with Prolog)
• 69. N-grams
  Letter or word frequencies: 1-grams (unigrams)
    useful in solving cryptograms: ETAOINSHRDLU...
  If you know the previous letter: 2-grams (bigrams)
    "h" is rare in English (4%)
    but "h" is common after "t" (20%)
  If you know the previous two letters: 3-grams (trigrams)
    "h" is really common after "(space) t"
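Counting letter n-grams and conditioning on the previous letter takes only a few lines. The tiny corpus below is a made-up example; real estimates of course need much more text:

```python
from collections import Counter

def letter_ngrams(text, n):
    """Count letter n-grams (unigrams, bigrams, trigrams, ...) in text."""
    cleaned = "".join(c for c in text.lower() if c.isalpha() or c == " ")
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))

corpus = "the cat sat on the mat with the hat"   # toy corpus
unigrams = letter_ngrams(corpus, 1)
bigrams = letter_ngrams(corpus, 2)
# Conditional frequency: how often does "h" follow "t" in this corpus?
p_h_after_t = bigrams["th"] / unigrams["t"]
```

This is exactly the bigram idea from the slide: "h" may be rare overall, yet very frequent immediately after "t".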
• 70. References
  1. Ruslan Mitkov, The Oxford Handbook of Computational Linguistics, Oxford University Press, 2003
  2. Robert Dale, Hermann Moisl, Harold Somers, Handbook of Natural Language Processing, Marcel Dekker Inc.
  3. James Allen, Natural Language Processing, Pearson Education, 2003
  4. Christopher Manning, Prabhakar Raghavan, Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008
  5. Douglas Biber, Susan Conrad, Randi Reppen, Corpus Linguistics – Investigating Language Structure and Use, Cambridge University Press, 2000
  6. David Singleton, Language and the Lexicon: An Introduction, Arnold Publishers, 2000
  7. James Allen, Natural Language Understanding, second edition, Benjamin/Cummings, 1995
  8. Matt Ginsberg, Essentials of Artificial Intelligence, Morgan Kaufmann, 1993
  9. W. John Hutchins, The First Public Demonstration of Machine Translation: the Georgetown-IBM System, 7th January 1954, 2005
  10. Noam Chomsky, Three Models for the Description of Language, IRE Transactions on Information Theory, 1956;2:113–24
  11. Alfred Aho, Ravi Sethi, Jeffrey Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1988
  12. Noam Chomsky, On Certain Formal Properties of Grammars, Information and Control, 1959;2:137–67
  13. Jeffrey Friedl, Mastering Regular Expressions, O'Reilly & Associates, 1997
  • 71. Thank You Check Out My LinkedIn Profile at https://in.linkedin.com/in/girishkhanzode