SlideShare a Scribd company logo
1 of 47
Tópicos Especiais em Computação: Introdução ao
Processamento da Linguagem Natural - Profa. Dra. Aline
Villavicencio
Speech and Language Processing: An introduction to natural
language processing, computational linguistics, and speech
recognition - Daniel Jurafsky & James H. Martin.
Chapter 12 - FORMAL GRAMMARS OF ENGLISH
por Luciano Sclovsky - setembro de 2015
1
2
Introduction
Study of grammar dates back to Panini's study of Sanskrit
The grammar is wrong!
Syntax (greek) - setting out together or arrangement
ATIS domain example (Air Traffic Information Systems)
3
Main ideas in this chapter
Constituent (such as noun phrase)
Grammatical relations (such as subject and object from traditional grammar)
Subcategorization and dependency relations - kinds of relations between
words and phrases
Context Free Grammars are the backbone of formal models
grammar checking, semantic interpretation, dialogue understanding, machine translation
4
Constituency
Noun phrase: sequence of words around at least one noun
Can appear in similar syntactic environments (such as before a verb)
They
Harry the horse
a high-class spot such as Mandy's
Preposed
On september 17th, I'd like to fly from Atlanta to Denver
Postposed 5
Context-Free Grammars (CFG)
The most commonly used mathematical system
AKA: Phrase-Structure Grammar
Equivalent to: Backus-Naur Form (BNF)
Dates back to 1900 (psy Wilhelm Wundt), formalized by Chomsky (1956)
is a set of rules or productions
expresses the way symbols can be grouped and ordered together
includes a lexicon of words
can be hierarchically embedded
6
Context-Free Grammars (CFG)
terminals: correspond to words in the language
non-terminals: expresses generalization
production:
left: non-terminal
right: ordered list of terminals and non-terminals
the production associated to a single word is associated with its lexical category
A CFG can be thought as
a device for generating sentenced
7
Context-Free Grammars (CFG)
NP immediately dominates Det and Nom
NP dominates all nodes
Each CFG has a start symbol: often S (sentence node)
Example:
sentence can consist of a noun phrase followed by VERB PHRASE by a verb phrase
S → NP VP I prefer a morning flight
VP → Verb NP prefer a morning flight
VP → Verb NP PP leave Boston in the morning
VP → Verb PP leaving on Thursday
PP → Preposition NP from Los Angeles (locations, times, dates) 8
9
Formal definition of a CFG
A CFG is defined by a "4-tuple":
Convention:
10
Formal definition of a CFG
Parsing: mapping a string of words into a parse tree
Direct derivation: product of a single production
Derivation: product of n productions
11
Some Grammar rules for English: Sentence Level Constructions
Four important sentence structures:
declarative:
subject NP + VP I prefer a morning flight
imperative
S → VP (and no subject) Show the lowest fare
yes-no question
S → Aux NP VP (aux verb) Do any of these flights have stops
wh-question 12
Some Grammar rules for English: Sentence Level Constructions
wh-question
mainly wh-subject-question: identical to declarative, with the wh-word
S → Wh-NP VP What airlines fly from Burbank to Denver?
wh-non-subject-question: sentence includes another subject
S → Wh-NP Aux NP VP What flights do you have from Burbank to Tacoma Washington?
wh-non-subject-question is an example of a long-distance dependency
semantic interpretation: parse that flights is the argument of have
13
English: Clauses and Sentences
Clause: forms a complete thought
a clause is the smallest grammatical unit that can express a complete proposition (Wikipedia)
Sentence: may form larger structures
S may be in the right side of a production
14
English: the Noun Phrase
Most frequent types: pronouns, proper-nouns and the NP construction
NP → Det Nominal
Determiner: NP may begin with simple lexical determiners
a stop these flights some flights
may have more complex structures (recursive)
Det → NP ′s Denver's mayor's mother's canceled flight
Nominal: follows the determiner and contains pre- and post-head noun
modifiers
Nominal → Noun
15
English: the Noun Phrase
Determiner: before the Head Noun
cardinal numbers: two friends
ordinal numbers: the first one the last flight
quantifiers: many fares much luggage
adjective phrase: the least expensive fare
NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
16
English: the Noun Phrase
Determiner: after the Head Noun (postmodifiers)
prepositional phrases: all flights from Cleveland
common in ATIS corpus to mark origin and destination
Nominal → Nominal PP
non-finite clauses any flights arriving after eleven a.m.
gerundive (-ing, -ed) any of those leaving on Thursday
Nominal → Nominal GerundVP
infinitive the last flight to arrive in Boston
relative clauses a flight that serves breakfast 17
English: the Noun Phrase
predeterminer: word classes that modify and appear before the NP
all the flights
18
19
English: the Agreement
Agreement: in 3rd person singular the final verb form is different
the flight does the flights do
Occurs also when the verb has a noun acting as a subject
What flight leave in the morning?
One simple solution is to have different productions, but it doubles the size of
the grammar!
S → 3sgAux 3sgNP VP
S → Non3sgAux Non3sgNP VP
Rule proliferation would happen also in the noun case: nominative (I, she, he,
they) and accusative (me, her, him, them) pronouns
20
English: Verb Phrase and Subcategorization
Verb Phrase may contain NPs and PPs and combinations of both
VP → Verb disappear
VP → Verb NP prefer a morning flight
VP → Verb NP PP leave Boston in the morning
VP → Verb PP leaving on Thursday
Sentential complement: an entire embedded sentence, and others
VP → Verb S You said there were two flights that were the cheapest
A VP may include another VP
I want to fly from Milwaukee to Orlando
Not every verb is compatible with every verb phrase
21
English: Verb Phrase and Subcategorization
transitive verb: takes a direct object I found a flight
intransitive verb: I
disappeared
modern grammars sub categorizes verbs into more than 100 subcategories
complements of the verb
Verb-with-NP-complement → find | leave | repeat | . . .
Verb-with-S-complement → think | believe | say | . . .
Verb-with-Inf-VP-complement → want | try | need | . . .
VP → Verb-with-no-complement disappear
VP → Verb-with-NP-comp NP prefer a morning flight
VP → Verb-with-S-comp S said there
were two flights
22
English: Auxiliaries
auxiliary or helping verbs: places a constraint on the form of the following
verb, and must combine in a particular order
modal verbs: can, could, may, might, must, will, would, shall, should
perfect auxiliary: have
progressive auxiliary: be
passive auxiliary: be
A sentence can have multiple auxiliary verbs, but they must occur in a particular order: modal <
perfect < progressive < passive
modal perfect could have been a contender
modal passive will be married
perfect progressive have been feasting
modal perfect passive might have been prevented
23
English: Auxiliaries
verb group: includes the main verb and its auxiliaries
active sentence: subject is the semantic event described by the verb
I prevented a catastrophe
passive sentence: different than other auxiliaries: subject is the undergoer of
the event
a catastrophe was prevented
24
English: Coordination
Conjunctions: conjoins major phrase types
Coordinate noun phrase
NP → NP and NP Please repeat the flights and the costs
VP → VP and VP What flights do you have leaving Denver and arriving in
San Francisco
S → S and S
Metarule: X → X and X
25
Treebanks
A treebank is a corpus in which each sentence is syntactically annotated with
a parse tree
The Penn Treebank Project
26
27
long distance
dependencies or
syntactic
movement are
marked
28
Using a treebank as a grammar
In Penn Treebank there are 4,500 rules for expanding VP in every possible
arrangement
VP → VBD PP
VP → VBD PP PP
VP → VBD PP PP PP
VP → VBD PP PP PP PP
VP → VB ADVP PP
...
VP → VBP PP PP PP PP PP ADVP PP
Thousands of NP rules
[DT The] [JJ state-owned] [JJ industrial] [VBG holding] [NN company] [NNP Instituto] [NNP
Nacional] [FW de] [NNP Industria]
NP → DT JJ JJ VBG NN NNP NNP FW NNP 29
Searching Treebanks
TGrep2
Example: a node which either is the word bush or else ends in the string tree can be expressed
as:
/tree$/|bush
30
31
Heads
Head: the word in the phrase which is grammatically the most important
Dates back to 1914 (Bloomfield)
Central to Head-Driven Phrase Structure Grammar
32
Heads finding
Heads are passed up the parse tree
Finding heads is simple for textbooks examples
Modern linguistic theories of syntax generally include a component that define
heads
33
Head finding rule example
If the last word is tagged POS, return last-word.
Else search from right to left for the first child which is an NN, NNP, NNPS, NX, POS,
or JJR.
Else search from left to right for the first child which is an NP.
Else search from right to left for the first child which is a $, ADJP, or PRN.
Else search from right to left for the first child which is a CD.
Else search from right to left for the first child which is a JJ, JJS, RB or QP.
Else return the last word
34
Grammar equivalence and normal form
A formal language is defined as a possibly infinite set of strings of word
Two grammars are equivalent if they generate the same set of strings
Two CFGs may generate the same language
weak equivalence - do not assign the same phrase structure
strong equivalence - assign the same phrase structure
Normal form: each of productions take the same form
Chomsky Normal Form (CNF)
right-side: two non-terminal symbols, or one terminal symbol 35
Finite-State and Context-Free Grammars
Adequate models of grammar need to be able to represent the complexities of
a natural language
Why can't we use finite-state methods?
mathematical reason: certains structures present in English make them non regular languages
subjective reason: non expressive
A finite state machine does not have center-embedded recursions
The luggage arrived.
The luggage that the passengers checked arrived.
The luggage that the passengers that the storm delayed checked arrived.
For many practical purposes matching semantic and syntactic rules aren't 36
Dependency Grammars
the syntactic structure of a sentence is described purely in terms of words and
binary semantic or syntactic relations between these words
since 1959 (Tesnière), trace back to ancient Greek and Indian linguistic
traditions
handle languages with relative free word order, such as Czech
37
Similarity between dependency graphs and head structures
38
Categorial grammar
combinatory categorial grammar (CCG)
two types of arguments: functors and arguments, that can be combined
For example, a determiner can be thought of as a function that applies to an N on its right to
produce an NP
39
40
Spoken language syntax
Utterance is the unit for spoken language (fala)
Fragments are common: pauses, non-verbal sounds, partial words
The subject of a sentence is often a pronoun
Disfluencies
words uh and um
word repetitions
restarts
word fragments: wha-, betwee- 41
Spoken language syntax
Repair, reparandum, interruption point
Edit terms: you know, I mean, uh, um
Find the interruption point, delete the reparandum, replace with repair
42
Treebanks of Spoken Language
Augmented notation with
disfluencies and other
phenomena
43
Grammars and Human processing
Do people use context-free grammars in their mental processing of language?
It has proved very difficult to find clear-cut evidence that they do.
Constituent: perceptual unit
constituents are often semantic units as well as syntactic units
It is necessary to find evidence for a constituent which is not a semantic unit
ditransitive alternation: change two arguments of a verb
The wealthy widow gave [NP the church] [NP her Mercedes].
The wealthy widow gave [NP her Mercedes] [PP to the church].
44
Summary
In many languages, groups of consecutive words act as a group or a constituent, which can be
modeled by context-free grammars (also known as phrase-structure grammars).
A context-free grammar consists of a set of rules or productions, expressed over a set of non-
terminal symbols and a set of terminal symbols. Formally, a particular context-free language is the
set of strings which can be derived from a particular context-free grammar.
A generative grammar is a traditional name in linguistics for a formal language which is used to
model the grammar of a natural language.
There are many sentence-level grammatical constructions in English; declarative, imperative, yes-no-
question, and wh-question are four very common types, which can be modeled with context-free
rules.
An English noun phrase can have determiners, numbers, quantifiers, and adjective phrases preceding
the head noun, which can be followed by a number of postmodifiers; gerundive VPs, infinitives
VPs, and past participial VPs are common possibilities.
45
Summary
Subjects in English agree with the main verb in person and number.
Verbs can be subcategorized by the types of complements they expect. Simple subcategories are
transitive and intransitive; most grammars include many more categories than these.
The correlate of sentences in spoken language are generally called utterances. Utterances may be
disfluent, containing filled pauses like um and uh, restarts, and repairs.
Treebanks of parsed sentences exist for many genres of English and for many languages. Treebanks
can be searched using tree-search tools.
Any context-free grammar can be converted to Chomsky normal form, in which the right-hand-side
of each rule has either two non-terminals or a single terminal.
Context-free grammars are more powerful than finite-state automata, but it is nonetheless possible
to approximate a context-free grammar with a FSA.
There is some evidence that constituency plays a role in the human processing of language.
46
Thanks!
luciano.sky@gmail.com
@cafedosky
47

More Related Content

What's hot

Natural language processing
Natural language processing Natural language processing
Natural language processing Md.Sumon Sarder
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with PythonBenjamin Bengfort
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introductionRobert Lujo
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingVeenaSKumar2
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector spaceAbdullah Khan Zehady
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Natural language processing
Natural language processingNatural language processing
Natural language processingBasha Chand
 
Natural language processing
Natural language processingNatural language processing
Natural language processingsowmiya_mohan
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language ProcessingMichel Bruley
 
Natural language processing
Natural language processingNatural language processing
Natural language processingYogendra Tamang
 
Natural language processing
Natural language processingNatural language processing
Natural language processingSaurav Aryal
 
Parts of Speect Tagging
Parts of Speect TaggingParts of Speect Tagging
Parts of Speect Taggingtheyaseen51
 

What's hot (20)

Natural language processing
Natural language processing Natural language processing
Natural language processing
 
Natural Language Processing with Python
Natural Language Processing with PythonNatural Language Processing with Python
Natural Language Processing with Python
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
Natural language processing (NLP) introduction
Natural language processing (NLP) introductionNatural language processing (NLP) introduction
Natural language processing (NLP) introduction
 
NLP_KASHK:POS Tagging
NLP_KASHK:POS TaggingNLP_KASHK:POS Tagging
NLP_KASHK:POS Tagging
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Nlp ambiguity presentation
Nlp ambiguity presentationNlp ambiguity presentation
Nlp ambiguity presentation
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Big Data and Natural Language Processing
Big Data and Natural Language ProcessingBig Data and Natural Language Processing
Big Data and Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
NLTK
NLTKNLTK
NLTK
 
NLP
NLPNLP
NLP
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Parts of Speect Tagging
Parts of Speect TaggingParts of Speect Tagging
Parts of Speect Tagging
 

Similar to Formal Grammars of English

Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 
Phrases and Conjuction
Phrases and ConjuctionPhrases and Conjuction
Phrases and Conjuctiontrinorei22
 
Derivation & inflection
Derivation & inflectionDerivation & inflection
Derivation & inflectionLinda Julie
 
Generative Theory on Language
Generative Theory on LanguageGenerative Theory on Language
Generative Theory on LanguageRyan Gan
 
Syntax tree diagrams
Syntax tree diagramsSyntax tree diagrams
Syntax tree diagramsRUBEN ZAPATA
 
Meeting 4-structure-of-modification2
Meeting 4-structure-of-modification2Meeting 4-structure-of-modification2
Meeting 4-structure-of-modification2Tawarik
 
Class 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_words
Class 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_wordsClass 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_words
Class 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_wordsLisa Lavoie
 
sentence patterns
sentence patternssentence patterns
sentence patternsJack Tony
 
Mind map esl 502
Mind map esl 502Mind map esl 502
Mind map esl 502k1hinze
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptxsiddhantroy13
 
natural language processing
natural language processing natural language processing
natural language processing sunanthakrishnan
 
Syntax 334 lecture 4 Noun phrases
Syntax 334 lecture 4 Noun phrasesSyntax 334 lecture 4 Noun phrases
Syntax 334 lecture 4 Noun phrasesAkashgary
 
Syntax presetation
Syntax presetationSyntax presetation
Syntax presetationqamaraftab6
 

Similar to Formal Grammars of English (20)

Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
Syntax
SyntaxSyntax
Syntax
 
Phrases and Conjuction
Phrases and ConjuctionPhrases and Conjuction
Phrases and Conjuction
 
Derivation & inflection
Derivation & inflectionDerivation & inflection
Derivation & inflection
 
English Syntax Primer Bimestre
English Syntax Primer BimestreEnglish Syntax Primer Bimestre
English Syntax Primer Bimestre
 
Generative Theory on Language
Generative Theory on LanguageGenerative Theory on Language
Generative Theory on Language
 
Syntax tree diagrams
Syntax tree diagramsSyntax tree diagrams
Syntax tree diagrams
 
Meeting 4-structure-of-modification2
Meeting 4-structure-of-modification2Meeting 4-structure-of-modification2
Meeting 4-structure-of-modification2
 
Class 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_words
Class 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_wordsClass 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_words
Class 04 emerson_phonetics_fall2014_stress_calif_vowels_don_dawn_related_words
 
sentence patterns
sentence patternssentence patterns
sentence patterns
 
Week 2-syntax student copy
Week 2-syntax student copyWeek 2-syntax student copy
Week 2-syntax student copy
 
syntax
syntaxsyntax
syntax
 
Mind map esl 502
Mind map esl 502Mind map esl 502
Mind map esl 502
 
Week 2-syntax student copy
Week 2-syntax student copyWeek 2-syntax student copy
Week 2-syntax student copy
 
Natural Language parsing.pptx
Natural Language parsing.pptxNatural Language parsing.pptx
Natural Language parsing.pptx
 
natural language processing
natural language processing natural language processing
natural language processing
 
Grammar Syntax(1).pptx
Grammar Syntax(1).pptxGrammar Syntax(1).pptx
Grammar Syntax(1).pptx
 
Generative grammer
Generative grammerGenerative grammer
Generative grammer
 
Syntax 334 lecture 4 Noun phrases
Syntax 334 lecture 4 Noun phrasesSyntax 334 lecture 4 Noun phrases
Syntax 334 lecture 4 Noun phrases
 
Syntax presetation
Syntax presetationSyntax presetation
Syntax presetation
 

Recently uploaded

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Formal Grammars of English

  • 1. Tópicos Especiais em Computação: Introdução ao Processamento da Linguagem Natural - Profa. Dra. Aline Villavicencio Speech and Language Processing: An introduction to natural language processing, computational linguistics, and speech recognition - Daniel Jurafsky & James H. Martin. Chapter 12 - FORMAL GRAMMARS OF ENGLISH por Luciano Sclovsky - setembro de 2015 1
  • 2. 2
  • 3. Introduction Study of grammar dates back to Panini's study of Sanskrit The grammar is wrong! Syntax (greek) - setting out together or arrangement ATIS domain example (Air Traffic Information Systems) 3
  • 4. Main ideas in this chapter Constituent (such as noun phrase) Grammatical relations (such as subject and object from traditional grammar) Subcategorization and dependency relations - kinds of relations between words and phrases Context Free Grammars are the backbone of formal models grammar checking, semantic interpretation, dialogue understanding, machine translation 4
  • 5. Constituency Noun phrase: sequence of words around at least one noun Can appear in similar syntactic environments (such as before a verb) They Harry the horse a high-class spot such as Mandy's Preposed On september 17th, I'd like to fly from Atlanta to Denver Postposed 5
  • 6. Context-Free Grammars (CFG) The most commonly used mathematical system AKA: Phrase-Structure Grammar Equivalent to: Backus-Naur Form (BNF) Dates back to 1900 (psy Wilhelm Wundt), formalized by Chomsky (1956) is a set of rules or productions expresses the way symbols can be grouped and ordered together includes a lexicon of words can be hierarchically embedded 6
  • 7. Context-Free Grammars (CFG) terminals: correspond to words in the language non-terminals: expresses generalization production: left: non-terminal right: ordered list of terminals and non-terminals the production associated to a single word is associated with its lexical category A CFG can be thought as a device for generating sentenced 7
  • 8. Context-Free Grammars (CFG) NP immediately dominates Det and Nom NP dominates all nodes Each CFG has a start symbol: often S (sentence node) Example: sentence can consist of a noun phrase followed by VERB PHRASE by a verb phrase S → NP VP I prefer a morning flight VP → Verb NP prefer a morning flight VP → Verb NP PP leave Boston in the morning VP → Verb PP leaving on Thursday PP → Preposition NP from Los Angeles (locations, times, dates) 8
  • 9. 9
  • 10. Formal definition of a CFG A CFG is defined by a "4-tuple": Convention: 10
  • 11. Formal definition of a CFG Parsing: mapping a string of words into a parse tree Direct derivation: product of a single production Derivation: product of n productions 11
  • 12. Some Grammar rules for English: Sentence Level Constructions Four important sentence structures: declarative: subject NP + VP I prefer a morning flight imperative S → VP (and no subject) Show the lowest fare yes-no question S → Aux NP VP (aux verb) Do any of these flights have stops wh-question 12
  • 13. Some Grammar rules for English: Sentence Level Constructions wh-question mainly wh-subject-question: identical to declarative, with the wh-word S → Wh-NP VP What airlines fly from Burbank to Denver? wh-non-subject-question: sentence includes another subject S → Wh-NP Aux NP VP What flights do you have from Burbank to Tacoma Washington? wh-non-subject-question is an example of a long-distance dependency semantic interpretation: parse that flights is the argument of have 13
  • 14. English: Clauses and Sentences Clause: forms a complete thought a clause is the smallest grammatical unit that can express a complete proposition (Wikipedia) Sentence: may form larger structures S may be in the right side of a production 14
  • 15. English: the Noun Phrase Most frequent types: pronouns, proper-nouns and the NP construction NP → Det Nominal Determiner: NP may begin with simple lexical determiners a stop these flights some flights may have more complex structures (recursive) Det → NP ′s Denver's mayor's mother's canceled flight Nominal: follows the determiner and contains pre- and post-head noun modifiers Nominal → Noun 15
  • 16. English: the Noun Phrase Determiner: before the Head Noun cardinal numbers: two friends ordinal numbers: the first one the last flight quantifiers: many fares much luggage adjective phrase: the least expensive fare NP → (Det) (Card) (Ord) (Quant) (AP) Nominal 16
  • 17. English: the Noun Phrase Determiner: after the Head Noun (postmodifiers) prepositional phrases: all flights from Cleveland common in ATIS corpus to mark origin and destination Nominal → Nominal PP non-finite clauses any flights arriving after eleven a.m. gerundive (-ing, -ed) any of those leaving on Thursday Nominal → Nominal GerundVP infinitive the last flight to arrive in Boston relative clauses a flight that serves breakfast 17
  • 18. English: the Noun Phrase predeterminer: word classes that modify and appear before the NP all the flights 18
  • 19. 19
  • 20. English: the Agreement Agreement: in 3rd person singular the final verb form is different the flight does the flights do Occurs also when the verb has a noun acting as a subject What flight leave in the morning? One simple solution is to have different productions, but it doubles the size of the grammar! S → 3sgAux 3sgNP VP S → Non3sgAux Non3sgNP VP Rule proliferation would happen also in the noun case: nominative (I, she, he, they) and accusative (me, her, him, them) pronouns 20
  • 21. English: Verb Phrase and Subcategorization Verb Phrase may contain NPs and PPs and combinations of both VP → Verb disappear VP → Verb NP prefer a morning flight VP → Verb NP PP leave Boston in the morning VP → Verb PP leaving on Thursday Sentential complement: an entire embedded sentence, and others VP → Verb S You said there were two flights that were the cheapest A VP may include another VP I want to fly from Milwaukee to Orlando Not every verb is compatible with every verb phrase 21
  • 22. English: Verb Phrase and Subcategorization transitive verb: takes a direct object I found a flight intransitive verb: I disappeared modern grammars sub categorizes verbs into more than 100 subcategories complements of the verb Verb-with-NP-complement → find | leave | repeat | . . . Verb-with-S-complement → think | believe | say | . . . Verb-with-Inf-VP-complement → want | try | need | . . . VP → Verb-with-no-complement disappear VP → Verb-with-NP-comp NP prefer a morning flight VP → Verb-with-S-comp S said there were two flights 22
  • 23. English: Auxiliaries auxiliary or helping verbs: places a constraint on the form of the following verb, and must combine in a particular order modal verbs: can, could, may, might, must, will, would, shall, should perfect auxiliary: have progressive auxiliary: be passive auxiliary: be A sentence can have multiple auxiliary verbs, but they must occur in a particular order: modal < perfect < progressive < passive modal perfect could have been a contender modal passive will be married perfect progressive have been feasting modal perfect passive might have been prevented 23
  • 24. English: Auxiliaries verb group: includes the main verb and its auxiliaries active sentence: subject is the semantic event described by the verb I prevented a catastrophe passive sentence: different than other auxiliaries: subject is the undergoer of the event a catastrophe was prevented 24
  • 25. English: Coordination Conjunctions: conjoins major phrase types Coordinate noun phrase NP → NP and NP Please repeat the flights and the costs VP → VP and VP What flights do you have leaving Denver and arriving in San Francisco S → S and S Metarule: X → X and X 25
  • 26. Treebanks A treebank is a corpus in which each sentence is syntactically annotated with a parse tree The Penn Treebank Project 26
  • 27. 27
  • 29. Using a treebank as a grammar In Penn Treebank there are 4,500 rules for expanding VP in every possible arrangement VP → VBD PP VP → VBD PP PP VP → VBD PP PP PP VP → VBD PP PP PP PP VP → VB ADVP PP ... VP → VBP PP PP PP PP PP ADVP PP Thousands of NP rules [DT The] [JJ state-owned] [JJ industrial] [VBG holding] [NN company] [NNP Instituto] [NNP Nacional] [FW de] [NNP Industria] NP → DT JJ JJ VBG NN NNP NNP FW NNP 29
  • 30. Searching Treebanks TGrep2 Example: a node which either is the word bush or else ends in the string tree can be expressed as: /tree$/|bush 30
  • 31. 31
  • 32. Heads Head: the word in the phrase which is grammatically the most important Dates back to 1914 (Bloomfield) Central to Head-Driven Phrase Structure Grammar 32
  • 33. Heads finding Heads are passed up the parse tree Finding heads is simple for textbooks examples Modern linguistic theories of syntax generally include a component that define heads 33
  • 34. Head finding rule example If the last word is tagged POS, return last-word. Else search from right to left for the first child which is an NN, NNP, NNPS, NX, POS, or JJR. Else search from left to right for the first child which is an NP. Else search from right to left for the first child which is a $, ADJP, or PRN. Else search from right to left for the first child which is a CD. Else search from right to left for the first child which is a JJ, JJS, RB or QP. Else return the last word 34
  • 35. Grammar equivalence and normal form A formal language is defined as a possibly infinite set of strings of word Two grammars are equivalent if they generate the same set of strings Two CFGs may generate the same language weak equivalence - do not assign the same phrase structure strong equivalence - assign the same phrase structure Normal form: each of productions take the same form Chomsky Normal Form (CNF) right-side: two non-terminal symbols, or one terminal symbol 35
  • 36. Finite-State and Context-Free Grammars Adequate models of grammar need to be able to represent the complexities of a natural language Why can't we use finite-state methods? mathematical reason: certains structures present in English make them non regular languages subjective reason: non expressive A finite state machine does not have center-embedded recursions The luggage arrived. The luggage that the passengers checked arrived. The luggage that the passengers that the storm delayed checked arrived. For many practical purposes matching semantic and syntactic rules aren't 36
  • 37. Dependency Grammars the syntactic structure of a sentence is described purely in terms of words and binary semantic or syntactic relations between these words since 1959 (Tesnière), trace back to ancient Greek and Indian linguistic traditions handle languages with relative free word order, such as Czech 37
  • 38. Similarity between dependency graphs and head structures 38
  • 39. Categorial grammar combinatory categorial grammar (CCG) two types of arguments: functors and arguments, that can be combined For example, a determiner can be thought of as a function that applies to an N on its right to produce an NP 39
  • 40. 40
  • 41. Spoken language syntax Utterance is the unit for spoken language (fala) Fragments are common: pauses, non-verbal sounds, partial words The subject of a sentence is often a pronoun Disfluencies words uh and um word repetitions restarts word fragments: wha-, betwee- 41
  • 42. Spoken language syntax Repair, reparandum, interruption point Edit terms: you know, I mean, uh, um Find the interruption point, delete the reparandum, replace with repair 42
  • 43. Treebanks of Spoken Language Augmented notation with disfluencies and other phenomena 43
  • 44. Grammars and Human processing Do people use context-free grammars in their mental processing of language? It has proved very difficult to find clear-cut evidence that they do. Constituent: perceptual unit constituents are often semantic units as well as syntactic units It is necessary to find evidence for a constituent which is not a semantic unit ditransitive alternation: change two arguments of a verb The wealthy widow gave [NP the church] [NP her Mercedes]. The wealthy widow gave [NP her Mercedes] [PP to the church]. 44
  • 45. Summary In many languages, groups of consecutive words act as a group or a constituent, which can be modeled by context-free grammars (also known as phrase-structure grammars). A context-free grammar consists of a set of rules or productions, expressed over a set of non- terminal symbols and a set of terminal symbols. Formally, a particular context-free language is the set of strings which can be derived from a particular context-free grammar. A generative grammar is a traditional name in linguistics for a formal language which is used to model the grammar of a natural language. There are many sentence-level grammatical constructions in English; declarative, imperative, yes-no- question, and wh-question are four very common types, which can be modeled with context-free rules. An English noun phrase can have determiners, numbers, quantifiers, and adjective phrases preceding the head noun, which can be followed by a number of postmodifiers; gerundive VPs, infinitives VPs, and past participial VPs are common possibilities. 45
  • 46. Summary Subjects in English agree with the main verb in person and number. Verbs can be subcategorized by the types of complements they expect. Simple subcategories are transitive and intransitive; most grammars include many more categories than these. The correlate of sentences in spoken language are generally called utterances. Utterances may be disfluent, containing filled pauses like um and uh, restarts, and repairs. Treebanks of parsed sentences exist for many genres of English and for many languages. Treebanks can be searched using tree-search tools. Any context-free grammar can be converted to Chomsky normal form, in which the right-hand-side of each rule has either two non-terminals or a single terminal. Context-free grammars are more powerful than finite-state automata, but it is nonetheless possible to approximate a context-free grammar with a FSA. There is some evidence that constituency plays a role in the human processing of language. 46