NLP
It’s not neuro-linguistic programming
1
Topic
Auto-suggestion
Normalization
Stemming
Lemmatization
Spelling Correction
BLEU Score
Morphological Analysis
Transliteration
Sentiment Analysis
Summarization
Confidence Score
2
Funny Autocomplete
“autocomplete is not a function” is the current top-1 Google autocomplete of “autocomplete is”.
3
Autocomplete is not A function
• Neither is auto-suggestion
• They are many-to-many relations with scores.
• Remember this?
4
Many-to-many Scoring
• Map by prefix, rank by popularity
• Google search box autocomplete
• Map by occurrence, rank by similarity
• Search (information retrieval)
• Map by information, rank by knowledge
• Translation
5
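A minimal sketch of the first pairing above, "map by prefix, rank by popularity". The query counts are hypothetical numbers made up for illustration, and the index is a plain dictionary rather than whatever a real search box uses.

```python
from collections import defaultdict


def build_prefix_index(query_counts):
    """Map every prefix of every query to the (count, query) pairs sharing it."""
    index = defaultdict(list)
    for query, count in query_counts.items():
        for end in range(1, len(query) + 1):
            index[query[:end]].append((count, query))
    return index


def autocomplete(index, prefix, k=3):
    """Return up to k completions, most popular first (ties broken alphabetically)."""
    ranked = sorted(index.get(prefix, []), key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in ranked[:k]]


# Hypothetical popularity counts, not real search statistics.
QUERY_COUNTS = {
    "autocomplete is not a function": 90,
    "autocomplete is off": 40,
    "autocomplete in excel": 25,
    "autosuggest": 10,
}
INDEX = build_prefix_index(QUERY_COUNTS)

print(autocomplete(INDEX, "autocomplete is"))
```

One prefix maps to many queries and one query is reachable from many prefixes, which is the many-to-many relation with scores.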
Prefix, Occurrence
• Surface pattern
• Regular
• Context-free
• Context-sensitive
• Recursively blahblah……
6
Information?
• Surface patterns and……
• Imaginations
• Will get back to that later.
7
Popularity & Similarity
• Popular: famous or infamous?
• Off-topic
• Similarity
• Distance
8
Map & Rank
• Regular expression
• Edit distance
9
Regular expression
• [a-z]+
• Colours of cats and dogs.
• [^o]{2}
• Colours of cats and dogs.
• cat|dog
• Colours of cats and dogs.
• Colou?rs?
• Colours of cats and dogs.
• Colors of cats and dogs.
• Color of a cat.
• <[A-Za-z][A-Za-z]*>
• <html>Colours of cats and dogs.</html>
10
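The patterns on this slide can be tried with Python's standard re module. This is only a sketch; the variable names are mine, and the sample sentences are the ones from the slide.

```python
import re

TEXT = "Colours of cats and dogs."

# [a-z]+ : runs of lowercase letters (the capital "C" is not matched).
lowercase_runs = re.findall(r"[a-z]+", TEXT)

# [^o]{2} : two consecutive characters, neither of which is "o".
non_o_pairs = re.findall(r"[^o]{2}", TEXT)

# cat|dog : either alternative.
animals = re.findall(r"cat|dog", TEXT)

# Colou?rs? : optional "u" and optional trailing "s".
spellings = [
    re.search(r"Colou?rs?", sentence).group()
    for sentence in (TEXT, "Colors of cats and dogs.", "Color of a cat.")
]

# <[A-Za-z][A-Za-z]*> : an opening tag made of letters only.
tags = re.findall(r"<[A-Za-z][A-Za-z]*>", "<html>Colours of cats and dogs.</html>")

print(lowercase_runs)
print(non_o_pairs)
print(animals)
print(spellings)
print(tags)
```

Note that the last pattern skips "</html>" because of the slash.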
Edit Distance
• Colors
• Delete s
• Color
• Insert u
• Colour
• Replace C with c
• colour
• Distance from Colors to colour: 3 (or 4 if the cost of replacing is 2)
11
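The distance on this slide can be checked with the usual dynamic-programming formulation of edit distance. A sketch, assuming insert and delete always cost 1; the substitution_cost parameter name is mine.

```python
def edit_distance(source, target, substitution_cost=1):
    """Minimum total cost of inserts, deletes and replacements turning source into target."""
    # previous[j] holds the distance between the processed part of source and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitute = previous[j - 1] + (0 if s_char == t_char else substitution_cost)
            delete = previous[j] + 1
            insert = current[j - 1] + 1
            current.append(min(substitute, delete, insert))
        previous = current
    return previous[-1]


print(edit_distance("Colors", "colour"))                       # 3
print(edit_distance("Colors", "colour", substitution_cost=2))  # 4
```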
– One may ask
“What if I wanted to map 1,1, one, and ONE?”
12
Normalization
• time flies like an arrow. fruit flies like bananas.
• Case restoration
• Time flies like an arrow. Fruit flies like bananas.
• Sentence segmentation
• time flies like an arrow.
• fruit flies like bananas.
• Word normalization: stemming or lemmatization?
13
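A deliberately naive sketch of the two steps above, sentence segmentation followed by case restoration. It assumes sentences end in ".", "!" or "?" followed by whitespace, so abbreviations and proper nouns will break it; real systems use trained models for both steps.

```python
import re


def split_sentences(text):
    """Split after sentence-final punctuation that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def restore_case(sentence):
    """Capitalize only the first character, leaving the rest untouched."""
    return sentence[:1].upper() + sentence[1:]


raw = "time flies like an arrow. fruit flies like bananas."
sentences = split_sentences(raw)
restored = [restore_case(s) for s in sentences]

print(sentences)
print(restored)
```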
Stemming
• Porter Stemmer (mainly suffix stripping)
• flies → fli
• bananas → banana
• How about “flies → fly”?
• Lemmatization
14
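To see where "flies → fli" comes from, here is a sketch of only the first rule group (step 1a) of the Porter stemmer, which handles plural suffixes. The full algorithm has several more steps, so this is not a replacement for a real implementation.

```python
def porter_step_1a(word):
    """Porter step 1a only: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


for word in ("flies", "bananas", "caresses", "caress"):
    print(word, "->", porter_step_1a(word))
```

The rule rewrites the suffix without asking whether the result is a word, which is why "flies" ends up as "fli" rather than "fly".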
Lemmatization
• flies → fly
• better → good
• meeting
• meet?
• axes
• axe?
• axis?
15
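Lemmatization needs a lexicon and, usually, the part of speech. The sketch below uses a tiny hand-written table (hypothetical, covering only the words on this slide) to show why "meeting" and "axes" carry question marks: the answer depends on the tag, and sometimes even the tag is not enough.

```python
# Hypothetical toy lexicon: (surface form, part of speech) -> candidate lemmas.
LEMMA_TABLE = {
    ("flies", "NOUN"): ["fly"],
    ("flies", "VERB"): ["fly"],
    ("better", "ADJ"): ["good"],
    ("meeting", "NOUN"): ["meeting"],
    ("meeting", "VERB"): ["meet"],
    ("axes", "NOUN"): ["axe", "axis"],  # still ambiguous after tagging
}


def lemmatize(word, pos):
    """Return all candidate lemmas; fall back to the lowercased word itself."""
    return LEMMA_TABLE.get((word.lower(), pos), [word.lower()])


print(lemmatize("flies", "VERB"))
print(lemmatize("meeting", "NOUN"), lemmatize("meeting", "VERB"))
print(lemmatize("axes", "NOUN"))
```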
Stemming or lemmatization, which is better?
“Battlestar Galactica is frakking wierd.”
16
Spelling Correction
• Hello again, edit distance.
• Just one step from “wierd” to “weird”
• Language modeling
• “Battlestar Galactica” often comes with “frak”
17
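A sketch of how edit distance and a (very crude) language model combine for spelling correction. "wierd" to "weird" is one step only if swapping two adjacent letters counts as a single edit; with insert, delete and replace alone it takes two. The word counts are hypothetical and stand in for a real language model.

```python
import string


def edits1(word):
    """All strings one delete, transposition, replacement or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, word_counts):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in word_counts:
        return word
    candidates = [w for w in edits1(word) if w in word_counts]
    return max(candidates, key=word_counts.get) if candidates else word


# Hypothetical unigram counts, for illustration only.
WORD_COUNTS = {"weird": 50, "wired": 30, "wield": 5, "frak": 3}

print(correct("wierd", WORD_COUNTS))
print(correct("frakking", WORD_COUNTS))
```

"wired" and "wield" are also one edit away from "wierd", so the counts decide. A word like "frakking" stays untouched here, which is where context-aware language modeling would come in.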
Language modeling
• Information (entropy) about encoding
• Horse race analogy, assuming winners were
• B A C B C C D C
• P(A) =1/8, P(B) = 2/8, P(C) = 4/8, P(D) = 1/8
• C = 0(00), B = 10(0), A = 110, D = 111
• n-gram
• B won fewer times than C, but what if B always won when A was next to D?
18
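The numbers in the horse race analogy can be reproduced in a few lines. This sketch estimates the probabilities from the winner sequence, computes the entropy, and compares it with the average length of the variable-length code on the slide (taken without the padding bits shown in parentheses). The bigram count at the end is just the raw material for the n-gram question.

```python
from collections import Counter
from math import log2

winners = "B A C B C C D C".split()

# Unigram probabilities estimated from the observed winners.
counts = Counter(winners)
probs = {horse: count / len(winners) for horse, count in counts.items()}

# Entropy in bits: the lower bound on the average code length per winner.
entropy = -sum(p * log2(p) for p in probs.values())

# The prefix code from the slide, without the padding in parentheses.
codes = {"C": "0", "B": "10", "A": "110", "D": "111"}
expected_length = sum(probs[horse] * len(codes[horse]) for horse in probs)

# Bigrams: which winner follows which.
bigrams = Counter(zip(winners, winners[1:]))

print(probs)
print(entropy, expected_length)  # both 1.75 bits
print(bigrams)
```

The code meets the entropy bound exactly because every probability here is a power of 1/2.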
Are we doing well?
Evaluate it!
19
BLEU Score
• Horse race analogy
• “B A C B C C D C” vs. “C C C C C C C C”
• Sequence precision: 4/8 = 0.5
• Unigram precision (as long as a unigram matched): 8/8 = 1
• When “natural-ness” matters
• “there is a cat on the mat | the cat is on the mat” vs. “the the the the the the the”
• Sequence precision: ?
• Unigram precision: 7/7 = 1
• Modified unigram precision: 2/7
• Modified bigram precision: 0/6
20
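A sketch of modified (clipped) n-gram precision, the building block of BLEU, applied to the cat-on-the-mat example. It assumes the candidate is seven repetitions of "the", matching the 7/7 unigram precision; seven tokens give six bigrams, so the bigram denominator is 6. Brevity penalty and the geometric mean over n are left out.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Return (clipped matches, total candidate n-grams) for one n."""
    candidate_counts = Counter(ngrams(candidate, n))
    # For each n-gram, the most times it occurs in any single reference.
    max_reference_counts = Counter()
    for reference in references:
        for gram, count in Counter(ngrams(reference, n)).items():
            max_reference_counts[gram] = max(max_reference_counts[gram], count)
    clipped = sum(min(count, max_reference_counts[gram])
                  for gram, count in candidate_counts.items())
    return clipped, sum(candidate_counts.values())


references = ["there is a cat on the mat".split(), "the cat is on the mat".split()]
candidate = "the the the the the the the".split()

print(modified_precision(candidate, references, 1))  # (2, 7)
print(modified_precision(candidate, references, 2))  # (0, 6)
```

Clipping is what stops a candidate from scoring well by repeating one matching word.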
I want more info
Less is more.
21
Imagine there’s……
• No more vocabularies than
• N
• V
• Adj
• Adv
22
Read the signs
• Morphology? Word-formation? Part-of-speech?
• They are sequential structures.
• Remember this?
23
Sequential Structures
• Morphological typology
• Analytic (isolating)
• Chinese
• Synthetic
• Agglutinative
• Japanese, Korean
• Fusional (inflecting)
• Arabic, English, French, Italian, Spanish
• Syntax? Morphosyntax?
• Morphological word? Prosodic word?
24
Read my lips
It’s not only about sound
25
Transliteration is not……
• Romanization
• Transcription
26
Transliteration
• Alignment
• Alignment
• Alignment
27
[Figure: cropped excerpt from a paper on transliteration alignment. One-to-one alignments are not always possible: several letters can combine to produce a single phoneme, and a single letter can sometimes produce two phonemes. The example aligns an English word, segmented as A | BE | RT, with its Chinese transliteration.]
The name of the rose
Sounds negative? Let’s try it anyway……
28
Sentiment Analysis
• Classification
• Polarity
• やばい
• Subjectivity
• In my opinion……
• Emotion
29
Semantics?
• Classification vs.
• Ranking (as we’ve seen so far)
• Clustering
• Regression
• ……
30
Summarization
• Extraction
• Classification
• Discriminative
• Abstraction
• Aggregation
• Generative
31
Classification
• Surface patterns and ……
• Imaginations
32
Machine Learning
• Generative models
• Hidden-Markov models
• Language models
• Discriminative models
• Support Vector Machine
• Logistic Regression
• Conditional Random Fields
• Maximum Entropy
33
Confidence Score
• Confidence interval? Confidence level?
• Not really
• But it can be
• Just a buzz word from speech recognition
• Shannon’s game
• Hidden-Markov models
• Generative
• The Italian who went to Malta
• Can be any reasonable score
• Mostly probability
34
Wrap up
https://class.coursera.org/nlp/lecture
35
Questions?
<(_ _)>
36
