Natural Language Processing
Outline
• N-grams
• Grammars
• Parsing
• Dependencies
Natural Language Processing
Some basic terms:
• Syntax: the allowable structures in the language: sentences,
phrases, affixes (-ing, -ed, -ment, etc.).
• Semantics: the meaning(s) of texts in the language.
• Part-of-Speech (POS): the category of a word (noun, verb,
preposition etc.).
• Bag-of-words (BoW): a featurization that uses a vector of word
counts (or binary) ignoring order.
• N-gram: for a fixed, small N (2-5 is common), an n-gram is a
consecutive sequence of words in a text.
Bag-of-Words Featurization
Assuming we have a dictionary mapping words to a unique integer
id, a bag-of-words featurization of a sentence could look like this:
Sentence: The cat sat on the mat
Word ids: 1 12 5 3 1 14
The BoW featurization would be the vector (indexed by word id):
Vector: 2, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1
(non-zero entries at positions 1, 3, 5, 12, and 14)
In practice this would be stored as a sparse vector of (id, count)s:
(1,2),(3,1),(5,1),(12,1),(14,1)
Note that the original word order is lost, replaced by the order of ids.
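As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this featurization. The vocab dictionary and the bow_sparse helper are hypothetical names chosen to match the toy ids above:

from collections import Counter

# Hypothetical word-to-id dictionary matching the ids on the slide.
vocab = {"the": 1, "on": 3, "sat": 5, "cat": 12, "mat": 14}

def bow_sparse(sentence, vocab):
    """Return a bag-of-words featurization as a sorted list of (id, count) pairs."""
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    return sorted(Counter(ids).items())

print(bow_sparse("The cat sat on the mat", vocab))
# [(1, 2), (3, 1), (5, 1), (12, 1), (14, 1)]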
N-grams
Because word order is lost, the sentence meaning is weakened.
This sentence has quite a different meaning but the same BoW
vector:
Sentence: The mat sat on the cat
Word ids: 1 14 5 3 1 12
BoW featurization: 2, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1
But word order is important, especially the order of nearby words.
N-grams capture this by modeling tuples of consecutive words.
N-grams
Sentence: The cat sat on the mat
2-grams: the-cat, cat-sat, sat-on, on-the, the-mat
Notice how even these short n-grams “make sense” as linguistic
units. For the other sentence we would have different features:
Sentence: The mat sat on the cat
2-grams: the-mat, mat-sat, sat-on, on-the, the-cat
We can go still further and construct 3-grams:
Sentence: The cat sat on the mat
3-grams: the-cat-sat, cat-sat-on, sat-on-the, on-the-mat
These capture still more of the meaning, and again differ for the other sentence:
Sentence: The mat sat on the cat
3-grams: the-mat-sat, mat-sat-on, sat-on-the, on-the-cat
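A minimal sketch of n-gram extraction (the ngrams helper is a hypothetical name; the hyphen-joining simply mirrors the notation on the slides):

def ngrams(sentence, n):
    """Return the n-grams of a whitespace-tokenized sentence, joined with '-' as on the slide."""
    words = sentence.lower().split()
    return ["-".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("The cat sat on the mat", 2))
# ['the-cat', 'cat-sat', 'sat-on', 'on-the', 'the-mat']
print(ngrams("The mat sat on the cat", 3))
# ['the-mat-sat', 'mat-sat-on', 'sat-on-the', 'on-the-cat']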
N-gram Language Models
N-grams can be used to build statistical models of texts.
When this is done, they are called n-gram language models.
An n-gram language model associates a probability with each n-gram, such that the probabilities over all n-grams (for fixed n) sum to 1. In practice these are usually expressed as conditional probabilities of the next word given the previous n-1 words.
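As a hedged sketch of such a model, the snippet below estimates n-gram probabilities by relative frequency over a toy two-sentence corpus, so that the probabilities for fixed n sum to 1. Real language models use conditional probabilities and smoothing, which this deliberately omits:

from collections import Counter

def ngram_model(corpus, n):
    """Estimate P(n-gram) by relative frequency, so the probabilities sum to 1 for fixed n."""
    grams = []
    for sentence in corpus:
        words = sentence.lower().split()
        grams += [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}

corpus = ["The cat sat on the mat", "The mat sat on the cat"]
model = ngram_model(corpus, 2)
print(model[("the", "cat")])             # 0.2  (2 of the 10 bigrams)
print(round(sum(model.values()), 10))    # 1.0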
Parts of Speech
Thrax’s original list (c. 100 B.C.):
• Noun (boat, plane, Obama)
• Verb (goes, spun, hunted)
• Pronoun (She, Her)
• Preposition (in, on)
• Adverb (quietly, then)
• Conjunction (and, but)
• Participle (eaten, running)
• Article (the, a)
Parts of Speech (Penn Treebank 2014)
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb
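To see these tags in practice, here is a short sketch using NLTK's default tagger. It assumes the punkt and averaged_perceptron_tagger resources are available (resource names can vary slightly between NLTK versions):

import nltk

# One-time data downloads (names may differ slightly across NLTK versions).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("The cat sat on the mat")
print(nltk.pos_tag(tokens))
# Typically: [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]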
Shallow Parsing or Chunking
• There are five major categories of phrases:
• Noun phrase (NP): These are phrases where a noun acts as the head word. Noun
phrases act as a subject or object to a verb.
• Verb phrase (VP): These phrases are lexical units that have a verb acting as the
head word. Usually there are two forms of verb phrases: one consists of the verb
components alone, and the other also includes other entities such as nouns,
adjectives, or adverbs acting as the verb's object or complement.
• Adjective phrase (ADJP): These are phrases with an adjective as the head word.
Their main role is to describe or qualify nouns and pronouns in a sentence, and
they will be either placed before or after the noun or pronoun.
• Adverb phrase (ADVP): These phrases act like adverbs, since an adverb is the
head word of the phrase. Adverb phrases modify verbs, adjectives, or other
adverbs by providing further details that describe or qualify them.
• Prepositional phrase (PP): These phrases usually contain a preposition as the head
word and other lexical components like nouns, pronouns, and so on. These act like
an adjective or adverb describing other words or phrases.
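A minimal chunking sketch with NLTK's RegexpParser, using a made-up two-rule chunk grammar for NP and PP over an already-tagged sentence:

import nltk

# A made-up chunk grammar: an NP is an optional determiner plus adjectives and a noun;
# a PP is a preposition followed by an NP.
chunk_grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>}
  PP: {<IN><NP>}
"""
chunker = nltk.RegexpParser(chunk_grammar)

tagged = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(chunker.parse(tagged))
# Roughly: (S (NP The/DT cat/NN) sat/VBD (PP on/IN (NP the/DT mat/NN)))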
Grammars
Grammars comprise rules that specify the acceptable sentences in the language (S is the sentence, or root, node). Example: “the cat sat on the mat”
• S → NP VP
• S → NP VP PP (the cat) (sat) (on the mat)
• NP → DT NN (the cat), (the mat)
• VP → VB NP
• VP → VBD
• PP → IN NP
• DT → “the”
• NN → “mat”, “cat”
• VBD → “sat”
• IN → “on”
Grammars
The reconstruction of a sequence of grammar productions from a
sentence is called “parsing” the sentence.
It is most conveniently represented as a tree:
Parse Trees
“The cat sat on the mat”
Parse Trees
In bracket notation:
(ROOT
(S
(NP (DT the) (NN cat))
(VP (VBD sat)
(PP (IN on)
(NP (DT the) (NN mat))))))
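The grammar above can be written directly in NLTK's CFG notation and parsed with a chart parser. In this sketch the VP rule is written as VP → VBD PP so that the result matches the bracketed tree shown here; the slide instead factors this as S → NP VP PP with VP → VBD:

import nltk

# The slide's grammar in NLTK's CFG notation (VP -> VBD PP, matching the bracketed tree above).
grammar = nltk.CFG.fromstring("""
  S   -> NP VP
  NP  -> DT NN
  VP  -> VBD PP
  PP  -> IN NP
  DT  -> 'the'
  NN  -> 'cat' | 'mat'
  VBD -> 'sat'
  IN  -> 'on'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cat sat on the mat".split()):
    print(tree)   # prints the tree in the bracket notation shown above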
Grammars
There are typically multiple ways to derive the same sentence, each corresponding to a different parse and often a different meaning. Consider the joke by Groucho Marx:
“While I was in Africa, I shot an elephant in my pajamas”
“How he got into my pajamas, I don’t know”
Parse Trees
“…, I shot an elephant in my pajamas” (the reading people hear first)
Parse Trees
Groucho’s version
Recursion in Grammars
It’s also possible to have “sentences” inside other sentences. Recursion is common in grammar rules, e.g.:
S → NP VP
VP → VB NP SBAR
SBAR → IN S
(SBAR stands for a subordinate clause.)
For example: “After she ate the cake, Emma visited Tony in his room.”
Because of this, sentences of arbitrary length are possible:
“Nero played his lyre while Rome burned”.
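A small sketch of this idea: a recursive SBAR rule lets NLTK's sentence generator produce nested (and, with a larger depth bound, arbitrarily long) sentences. The grammar below is an illustrative toy, not taken from the slides:

import nltk
from nltk.parse.generate import generate

# A tiny recursive grammar: SBAR re-enters S, so sentences can contain sentences.
grammar = nltk.CFG.fromstring("""
  S    -> NP VP
  VP   -> VBD | VBD SBAR
  SBAR -> IN S
  NP   -> 'Nero' | 'Rome'
  VBD  -> 'played' | 'burned'
  IN   -> 'while'
""")

# Enumerate all sentences the grammar can produce up to a bounded derivation depth.
sentences = {" ".join(words) for words in generate(grammar, depth=10)}
print("Nero played while Rome burned" in sentences)   # True
print(len(sentences))  # grows as the depth bound grows: ever longer nested sentences appear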
PCFGs
Complex sentences can be parsed in many ways, most of which
make no sense or are extremely improbable (like Groucho’s
example).
Probabilistic Context-Free Grammars (PCFGs) associate and learn
probabilities for each rule:
S → NP VP 0.3
S → NP VP PP 0.7
The parser then tries to find the most likely sequence of
productions that generate the given sentence. This adds more
realistic “world knowledge” and generally gives much better results.
Most state-of-the-art parsers these days use PCFGs.
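A toy PCFG in NLTK notation, parsed with the Viterbi parser. The probabilities are invented for illustration; each left-hand side's rule probabilities must sum to 1:

import nltk

# A toy PCFG: rule probabilities (invented for illustration) sum to 1 per left-hand side.
pcfg = nltk.PCFG.fromstring("""
  S   -> NP VP        [1.0]
  NP  -> DT NN        [1.0]
  VP  -> VBD PP [0.7] | VBD [0.3]
  PP  -> IN NP        [1.0]
  DT  -> 'the'        [1.0]
  NN  -> 'cat' [0.5]  | 'mat' [0.5]
  VBD -> 'sat'        [1.0]
  IN  -> 'on'         [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the cat sat on the mat".split()):
    # The most probable parse; its probability is 0.7 * 0.5 * 0.5 = 0.175 under this toy grammar.
    print(tree.prob(), tree)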
Systems
• NLTK: Python-based NLP system. Many modules, good
visualization tools, but not quite state-of-the-art performance.
• Stanford Parser: Another comprehensive suite of tools (also POS
tagger), and state-of-the-art accuracy. Has the definitive
dependency module.
• Berkeley Parser: Slightly higher parsing accuracy (than Stanford)
but not as many modules.
• Note: high-quality parsing is usually very slow, but see:
https://github.com/dlwh/puck
Dependencies
In a constituency parse, there is no direct relation between the constituents and the words of the sentence (except for leaf nodes, which produce a single word).
In dependency parsing, the idea is to decompose the sentence into relations directly between words.
Let’s look at a simple example:
Dependencies
“The cat sat on the mat”
[Figure: the dependency tree shown alongside the constituency parse tree; the words carry the constituency labels of the leaf nodes.]
Dependencies
From the dependency tree we can obtain a “sketch” of the sentence: starting at the root and looking down one level, we get:
“cat sat on”
Then, by following the prepositional child to its object, we get:
“cat sat on mat”
Determiners (“a”, “the”) are easy to ignore.
And importantly, adjectival and adverbial modifiers generally connect directly to their targets:
Dependencies
“Brave Merida prepared for a long, cold winter”
Dependencies
“Russell reveals himself here as a supremely gifted director of
actors”
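The slides point to the Stanford dependency module for this; as a stand-in, here is a hedged sketch using spaCy (not mentioned in the slides), assuming the en_core_web_sm model is installed. It prints each word's head and relation, then recovers the “cat sat on mat” sketch by keeping the root, its subject, the preposition, and the preposition's object:

import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The cat sat on the mat")

# Each word points at its head with a labelled relation (spaCy's English label set).
for token in doc:
    print(f"{token.text:<5} --{token.dep_}--> {token.head.text}")
# Typical relations: cat --nsubj--> sat, sat --ROOT--> sat, on --prep--> sat, mat --pobj--> on,
# with the determiners attached to their nouns via det.

# A crude "sketch": keep the root, its subject, the preposition, and the preposition's object.
sketch = [t.text for t in doc if t.dep_ in ("nsubj", "ROOT", "prep", "pobj")]
print(" ".join(sketch))   # roughly: cat sat on mat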