Context-Free Grammar (CFG)
• A context-free grammar (CFG) G is a quadruple (V, Σ, R, S) where
  - V: a set of non-terminal symbols
  - Σ: a set of terminal symbols (V ∩ Σ = ∅)
  - R: a set of rules (R: V → (V ∪ Σ)*)
  - S: the start symbol.
CFG - Example
• V = {q, f}
• Σ = {0, 1}
• R = {q → 11q, q → 00f, f → 11f, f → ε}
• S = q
• (R = {q → 11q | 00f, f → 11f | ε})
CFG - Rules
• If A → B is a rule, then xAy → xBy, and we say that xAy derives xBy.
• If s → ··· → t, then we write s →* t.
• A string x in Σ* is generated by G = (V, Σ, R, S) if S →* x.
• L(G) = {x in Σ* | S →* x}.
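To make derivations and L(G) concrete, here is a minimal Python sketch (not from the slides) that enumerates the short strings generated by the earlier example grammar q → 11q | 00f, f → 11f | ε, by repeatedly rewriting the leftmost non-terminal:

```python
from collections import deque

# Example grammar from the earlier slide: q -> 11q | 00f, f -> 11f | epsilon.
RULES = {"q": ["11q", "00f"], "f": ["11f", ""]}   # "" stands for epsilon
START = "q"

def generate(max_len):
    """Enumerate terminal strings x with START ->* x and |x| <= max_len."""
    language, queue = set(), deque([START])
    while queue:
        form = queue.popleft()
        pos = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if pos is None:                  # no non-terminal left: form is in L(G)
            language.add(form)
            continue
        for rhs in RULES[form[pos]]:     # apply every rule for that non-terminal
            new_form = form[:pos] + rhs + form[pos + 1:]
            if sum(ch not in RULES for ch in new_form) <= max_len:
                queue.append(new_form)
    return sorted(language, key=len)

print(generate(6))   # all strings of the form (11)^m 00 (11)^n up to length 6
```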
CFG - Example
• G = ({S}, {0, 1}, {S → 0S1 | ε}, S)
• ε is in L(G) because S → ε.
• 01 is in L(G) because S → 0S1 → 01.
• 0011 is in L(G) because S → 0S1 → 00S11 → 0011.
• L(G) = {0^n 1^n | n ≥ 0}
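The membership test for this language mirrors the grammar directly; a minimal sketch (not part of the slides), where the recursive case corresponds to S → 0S1 and the base case to S → ε:

```python
def in_language(s: str) -> bool:
    """True iff s is in {0^n 1^n | n >= 0}, following S -> 0S1 | epsilon."""
    if s == "":                           # S -> epsilon
        return True
    if s.startswith("0") and s.endswith("1"):
        return in_language(s[1:-1])       # S -> 0 S 1, recurse on the middle
    return False

assert in_language("") and in_language("01") and in_language("0011")
assert not in_language("001") and not in_language("10")
```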
Context-free Language (CFL)
• A language L is context-free if there exists a CFG G such that L = L(G).
• We then say that the grammar G generates the language L.
Example
G = (V, T, S, P)
V = {S, NP, VP, PP, Det, Noun, Verb, Aux, Pre}
T = {'a', 'ate', 'cake', 'child', 'fork', 'the', 'with'}
S = S
P = { S → NP VP
      NP → Det Noun | NP PP
      PP → Pre NP
      VP → Verb NP
      Det → 'a' | 'the'
      Noun → 'cake' | 'child' | 'fork'
      Pre → 'with'
      Verb → 'ate' }
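This grammar can be tried out directly with NLTK (assuming the nltk package is installed); a minimal sketch, not part of the slides:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Noun | NP PP
PP -> Pre NP
VP -> Verb NP
Det -> 'a' | 'the'
Noun -> 'cake' | 'child' | 'fork'
Pre -> 'with'
Verb -> 'ate'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the child ate a cake".split()):
    print(tree)
# (S (NP (Det the) (Noun child)) (VP (Verb ate) (NP (Det a) (Noun cake))))
```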
Example
• Some notes:
  - Note 1: In P, the pipe symbol (|) combines productions that have the same
    LHS into a single representation.
    For example, Det → 'a' | 'the' is derived from the two rules Det → 'a' and
    Det → 'the'. It still denotes two rules, not one.
  - Note 2: The productions highlighted in red are referred to as the grammar,
    and those in green as the lexicon.
  - Note 3: NP = Noun Phrase, VP = Verb Phrase, PP = Prepositional Phrase,
    Det = Determiner, Aux = Auxiliary verb.
Sample derivation
• S → NP VP
    → Det Noun VP
    → the Noun VP
    → the child VP
    → the child Verb NP
    → the child ate NP
    → the child ate Det Noun
    → the child ate a Noun
    → the child ate a cake
P = { S → NP VP
      NP → Det Noun | NP PP
      PP → Pre NP
      VP → Verb NP
      Det → 'a' | 'the'
      Noun → 'cake' | 'child' | 'fork'
      Pre → 'with'
      Verb → 'ate' }
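A leftmost derivation like the one above can be replayed mechanically: at each step, the leftmost non-terminal is replaced using one chosen rule. A small sketch (not from the slides):

```python
NONTERMINALS = {"S", "NP", "VP", "PP", "Det", "Noun", "Pre", "Verb"}
STEPS = [                      # rule chosen at each leftmost derivation step
    ("S", ["NP", "VP"]), ("NP", ["Det", "Noun"]), ("Det", ["the"]),
    ("Noun", ["child"]), ("VP", ["Verb", "NP"]), ("Verb", ["ate"]),
    ("NP", ["Det", "Noun"]), ("Det", ["a"]), ("Noun", ["cake"]),
]

form = ["S"]
for lhs, rhs in STEPS:
    i = next(k for k, sym in enumerate(form) if sym in NONTERMINALS)
    assert form[i] == lhs           # the leftmost non-terminal must match the rule
    form = form[:i] + rhs + form[i + 1:]
    print(" ".join(form))           # last line printed: "the child ate a cake"
```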
Probabilistic Context-Free Grammar (PCFG)
• A PCFG is an extension of a CFG that attaches a probability to each production rule.
• Ambiguity is the reason for using the probabilistic version of a CFG.
  - For instance, some sentences have more than one underlying derivation.
  - The sentence can be parsed in more than one way.
  - In this case, the parse of the sentence becomes ambiguous.
• To resolve this ambiguity, we can use a PCFG to find the probability of each parse of the given sentence.
PCFG - Definition
• A probabilistic context-free grammar G is a quintuple G = (N, T, S, R, P) where
  - (N, T, S, R) is a context-free grammar, where N is the set of non-terminal
    (variable) symbols, T is the set of terminal symbols, S is the start symbol,
    and R is the set of production rules, each of the form A → s.
  - P assigns a probability P(A → s) to each rule in R. The properties governing
    these probabilities are as follows:
    - P(A → s) is the conditional probability of choosing the rule A → s in a
      leftmost derivation, given that A is the non-terminal being expanded.
    - Each probability lies between 0 and 1.
    - The probabilities of all rules with the same left-hand-side non-terminal A
      must sum to 1.
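The last two properties (each probability between 0 and 1, and probabilities summing to 1 per left-hand side) are easy to check programmatically; a minimal sketch with illustrative (assumed) probability values:

```python
from collections import defaultdict

# (lhs, rhs, probability) triples; the probabilities here are illustrative
# assumptions, not values taken from the lecture.
pcfg_rules = [
    ("Det", ("a",), 0.4), ("Det", ("the",), 0.6),
    ("NP", ("Det", "Noun"), 0.8), ("NP", ("NP", "PP"), 0.2),
]

totals = defaultdict(float)
for lhs, rhs, p in pcfg_rules:
    assert 0.0 <= p <= 1.0, f"P({lhs} -> {' '.join(rhs)}) is out of range"
    totals[lhs] += p
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"
print("all probability constraints satisfied")
```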
PCFG - Example
• Probabilistic context-free grammar G = (N, T, S, R, P)
N = {S, NP, VP, PP, Det, Noun, Verb, Pre}
T = {'a', 'ate', 'cake', 'child', 'fork', 'the', 'with'}
S = S
R = { S → NP VP
      NP → Det Noun | NP PP
      PP → Pre NP
      VP → Verb NP
      Det → 'a' | 'the'
      Noun → 'cake' | 'child' | 'fork'
      Pre → 'with'
      Verb → 'ate' }
PCFG - Example
• P = R with an associated probability for each rule, as in the table below.
• Observe from the table that the probabilities of all rules sharing the same
  left-hand side sum to 1.
R = { S → NP VP
      NP → Det Noun | NP PP
      PP → Pre NP
      VP → Verb NP
      Det → 'a' | 'the'
      Noun → 'cake' | 'child' | 'fork'
      Pre → 'with'
      Verb → 'ate' }
(The probability column of the original table is not reproduced in this text.)
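Since the probability values themselves appear only in the original table, here is a sketch of the same grammar written as an NLTK PCFG, with placeholder probabilities (assumed values chosen only so that each left-hand side sums to 1):

```python
import nltk

# Placeholder probabilities (assumptions), not the lecture's table values.
pcfg = nltk.PCFG.fromstring("""
S -> NP VP [1.0]
NP -> Det Noun [0.7] | NP PP [0.3]
PP -> Pre NP [1.0]
VP -> Verb NP [1.0]
Det -> 'a' [0.5] | 'the' [0.5]
Noun -> 'cake' [0.4] | 'child' [0.4] | 'fork' [0.2]
Pre -> 'with' [1.0]
Verb -> 'ate' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)    # returns the most probable parse
for tree in parser.parse("the child ate a cake with a fork".split()):
    print(tree.prob(), tree)
```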
Parse
• To parse: to resolve (a sentence) into its component parts and describe their
  syntactic roles.
• In NLP, a parse is usually visualized in tree form.
• A parse of the sentence "the giraffe dreams" is:
  - s → np vp → det n vp → the n vp → the giraffe vp
    → the giraffe iv → the giraffe dreams
Probability of a parse tree
• Use of PCFG:
  - A sentence can be parsed in more than one way.
  - Due to ambiguity, we can have more than one parse tree for a sentence under
    the CFG.
Probability of a parse tree
• Given a parse tree t built with the production rules α1 → β1, α2 → β2, …,
  αn → βn from R (i.e., each αi → βi ∈ R),
• we can find the probability of the tree t under the PCFG as follows:
  P(t) = P(α1 → β1) × P(α2 → β2) × ⋯ × P(αn → βn)
• That is, the probability P(t) of a parse tree is the product of the
  probabilities of the production rules used in the tree t.
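A minimal sketch of this computation for the parse of "the child ate a cake", using the same placeholder probabilities assumed earlier (not the lecture's values):

```python
import math

# Rule applications in the parse tree of "the child ate a cake",
# each paired with a placeholder (assumed) probability.
applications = [
    ("S -> NP VP", 1.0),
    ("NP -> Det Noun", 0.7),   # "the child"
    ("Det -> 'the'", 0.5),
    ("Noun -> 'child'", 0.4),
    ("VP -> Verb NP", 1.0),
    ("Verb -> 'ate'", 1.0),
    ("NP -> Det Noun", 0.7),   # "a cake"
    ("Det -> 'a'", 0.5),
    ("Noun -> 'cake'", 0.4),
]

# P(t) is the product of the probabilities of all rule applications in the tree.
p_tree = math.prod(p for _, p in applications)
print(p_tree)   # 0.7 * 0.7 * 0.5 * 0.5 * 0.4 * 0.4 ≈ 0.0196
```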
Probability of a parse tree
• Which is the most probable tree?
  - The probability of parse tree t1 is greater than the probability of parse
    tree t2. Hence, t1 is the more probable of the two parses.
Probability of a sentence
• The probability of a sentence is the sum of the probabilities of all parse
  trees that can be derived for the sentence under the PCFG.
• Example: the probability of the sentence "astronomers saw the stars with ears".
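A minimal sketch of that sum, with hypothetical parse-tree probabilities (each of which would itself be a product of rule probabilities, as on the previous slides):

```python
# Hypothetical probabilities of the two parse trees of an ambiguous sentence
# (assumed values, not taken from the lecture).
p_t1 = 0.0009072   # e.g. the parse attaching the PP to the verb phrase
p_t2 = 0.0006804   # e.g. the parse attaching the PP to the noun phrase
print(p_t1 + p_t2) # probability of the sentence: 0.0015876
```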
Ambiguity
• The phrase "in my pajamas" can be part of the NP headed by "elephant" or part
  of the VP headed by "shot".
Ambiguity
• Two common kinds of ambiguity are
  - attachment ambiguity and
  - coordination ambiguity.
• A sentence has an attachment ambiguity if a particular constituent can be
  attached to the parse tree at more than one place.
• In coordination ambiguity, phrases can be conjoined by a conjunction like
  "and".