Context Free Grammars

        Ronak Thakkar
          Roll no 32
    M.Sc. Computer Science
What are Context Free Grammars?
In formal language theory, a context-free grammar (CFG)
  is a formal grammar in which every production rule is of the
  form

  V → w
where V is a single nonterminal symbol and w is a string of
 terminals and/or nonterminals (w can be empty).

The languages generated by context-free grammars are
  known as the context-free languages.
What does CFG do?
A CFG provides a simple and mathematically precise
  mechanism for describing the methods by which phrases in
  some natural language are built from smaller blocks,
  capturing the “block structure” of sentences in a natural way.

Important features of natural language syntax, such as
  agreement and reference, are not part of the context-free
  grammar, but the basic recursive structure of sentences, the
  way in which clauses nest inside other clauses, and the way in
  which lists of adjectives and adverbs are swallowed by nouns
  and verbs, is described exactly.
Formal Definition of CFG
A context-free grammar G is a 4-tuple (V, ∑, R, S), where:
  • V is a finite set; each element v ∈ V is called a nonterminal or
    a variable.
  • ∑ is a finite set of terminals, disjoint from V, which make up the actual
    content of the sentence.
  • R is a finite relation from V to (V ∪ ∑)*, where the asterisk
    represents the Kleene star operation.
       If (α, β) ∈ R, we write the production α → β.
       β is a string in (V ∪ ∑)*; strings of this kind that are
       derivable from S are called sentential forms.
  • S is the start symbol, used to represent the whole sentence (or
    program). It must be an element of V.
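The 4-tuple can be written down directly as a data structure. The following is a minimal sketch, not a standard library type: the class name `Grammar`, its field names, and the example grammar S → aSb | ε are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (V, Sigma, R, S). Names are illustrative."""
    variables: frozenset   # V: nonterminals
    terminals: frozenset   # Sigma: terminals
    rules: frozenset       # R: pairs (alpha, beta), beta a tuple of symbols
    start: str             # S: start symbol

    def is_well_formed(self) -> bool:
        """Check the side conditions stated in the definition."""
        symbols = self.variables | self.terminals
        return (
            self.variables.isdisjoint(self.terminals)   # Sigma disjoint from V
            and self.start in self.variables            # S is an element of V
            and all(
                alpha in self.variables                 # left side: one nonterminal
                and all(sym in symbols for sym in beta) # right side: in (V U Sigma)*
                for alpha, beta in self.rules
            )
        )


# Hypothetical example grammar: S -> a S b | epsilon (empty tuple = epsilon)
g = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=frozenset({("S", ("a", "S", "b")), ("S", ())}),
    start="S",
)
print(g.is_well_formed())
```

Representing each right-hand side as a tuple keeps the empty string (the empty tuple) distinct from a missing rule.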
Production rule notation

A production rule in R is formalized mathematically as a pair
  (α, β), where α is a nonterminal and β is a string of
 nonterminals and/or terminals; rather than using ordered-pair
 notation, production rules are usually written using an arrow
 operator with α as its left-hand side and β as its right-hand
 side: α → β.
β is allowed to be the empty string, and in this case it is
 customary to denote it by ε. The form α → ε is called an
 ε-production.
Context-Free Languages
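• Given a context-free grammar
G = (V, ∑, R, S), the language generated by (or derived from)
G is the set
L(G) = {w ∈ ∑* : S ⇒* w}
where S ⇒* w means that w can be derived from S by applying
zero or more production rules.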

A language L is context-free if there is a context-free
grammar G = (V,∑, R, S), such that L is generated from G.
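To make the set L(G) concrete, the sketch below enumerates every word of L(G) up to a length bound by a breadth-first search over sentential forms. The function name `language_up_to`, the dictionary encoding of the rules, and the example grammar S → aSb | ab are assumptions for illustration. The search also assumes the grammar has no ε-productions, so that sentential forms never shrink and long forms can be safely discarded.

```python
from collections import deque


def language_up_to(rules, start, max_len):
    """Return all terminal strings w with start =>* w and len(w) <= max_len.

    Assumes no epsilon-productions: a sentential form never shrinks, so any
    form longer than max_len can be dropped without losing words.
    """
    variables = set(rules)
    seen = {(start,)}
    queue = deque([(start,)])
    words = set()
    while queue:
        form = queue.popleft()
        # Index of the left-most nonterminal, or None if the form is a word.
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:
            words.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return words


# Hypothetical grammar: S -> a S b | a b
rules = {"S": [("a", "S", "b"), ("a", "b")]}
result = language_up_to(rules, "S", 6)
print(sorted(result, key=len))
```

Expanding only the left-most nonterminal is enough here, because every word in L(G) has a left-most derivation.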
Example: Well-formed parentheses
The canonical example of a context free grammar is
 parenthesis matching, which is representative of the general
 case. There are two terminal symbols "(" and ")" and one
 nonterminal symbol S. The production rules are
S → SS
S → (S)
S → ()
The first rule allows Ss to multiply; the second rule allows Ss
 to become enclosed by matching parentheses; and the third
 rule terminates the recursion.
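The language of this grammar is exactly the set of non-empty, correctly nested parenthesis strings, so membership can be tested with a simple depth counter. The sketch below is an illustration of that observation rather than a general CFG parser; the function name `in_paren_language` is an assumption.

```python
def in_paren_language(word: str) -> bool:
    """Membership test for the language of S -> SS | (S) | ()."""
    if not word:
        return False              # the grammar has no epsilon-production
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:         # a ")" with no matching "("
                return False
        else:
            return False          # not a terminal of this grammar
    return depth == 0             # every "(" was closed


for candidate in ["()", "(())()", "", ")(", "(()"]:
    print(repr(candidate), in_paren_language(candidate))
```

The empty string is rejected because the third rule, S → (), is the shortest way to terminate a derivation.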
Parse Tree
A parse tree of a derivation is a tree in which:


  • Each internal node is labeled with a nonterminal


  • If a rule A → A1A2…An occurs in the derivation then A is
    a parent node of nodes labeled A1, A2, …, An

[Figure: example parse tree with root S, internal nodes labeled S, and leaves a, a, b and e]
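A parse tree can be modeled as a small recursive data structure whose leaves, read left to right, spell out the derived word. The sketch below is illustrative only: the class name `Node`, the method name `frontier`, and the choice of the parentheses grammar S → SS | (S) | () for the example tree are assumptions, and ε leaves are not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A parse-tree node; a node without children is a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def frontier(self) -> str:
        """Concatenate the leaf labels from left to right."""
        if not self.children:
            return self.label
        return "".join(child.frontier() for child in self.children)


# Parse tree for "(())": the root uses S -> (S), the inner node uses S -> ()
inner = Node("S", [Node("("), Node(")")])
tree = Node("S", [Node("("), inner, Node(")")])
print(tree.frontier())
```

Each internal node together with its children corresponds to one rule application, matching the two conditions in the definition above.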
Leftmost, Rightmost Derivations
A left-most derivation of a sentential form is one in
  which, at every step, a rule is applied to the left-most
  nonterminal.



A right-most derivation of a sentential form is one in
  which, at every step, a rule is applied to the right-most
  nonterminal.
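The difference between the two derivation orders can be seen by replaying a list of rule applications. The sketch below assumes the parentheses grammar S → SS | (S) | () and a hypothetical helper `derive`, which always rewrites the left-most (or right-most) nonterminal and records each sentential form.

```python
def derive(start, variables, steps, leftmost=True):
    """Apply each (lhs, rhs) step to the left-most or right-most nonterminal.

    Returns the list of sentential forms visited, starting with `start`.
    """
    form = [start]
    history = ["".join(form)]
    for lhs, rhs in steps:
        positions = [i for i, s in enumerate(form) if s in variables]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        idx = positions[0] if leftmost else positions[-1]
        if form[idx] != lhs:
            raise ValueError(f"expected {lhs} at position {idx}, found {form[idx]}")
        form[idx:idx + 1] = rhs       # replace the nonterminal by the rule body
        history.append("".join(form))
    return history


V = {"S"}
SS, WRAP, PAIR = ["S", "S"], ["(", "S", ")"], ["(", ")"]

# Two derivations of the same word "()(())"
left = derive("S", V, [("S", SS), ("S", PAIR), ("S", WRAP), ("S", PAIR)])
right = derive("S", V, [("S", SS), ("S", WRAP), ("S", PAIR), ("S", PAIR)], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the same word and correspond to the same parse tree; only the order in which nonterminals are rewritten differs.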
Ambiguous Grammar
A grammar G is ambiguous if there is a word w ∈
  L(G) having at least two different parse trees.
                   S → A
                   S → B
                   S → AB
                   A → aA
                   B → bB
                   A → ε
                   B → ε



Notice that the word a has at least two left-most derivations:
  S ⇒ A ⇒ aA ⇒ a
  S ⇒ AB ⇒ aAB ⇒ aB ⇒ a
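Ambiguity of a particular word can be checked mechanically by counting its parse trees, which is the same as counting its left-most derivations. The sketch below does this for the grammar on this slide (S → A | B | AB, A → aA | ε, B → bB | ε). The function names `count_trees` and `count_seq` are assumptions, and this naive recursion is only safe for grammars like this one; it can loop forever on left-recursive or cyclic grammars.

```python
from functools import lru_cache

# The grammar from the slide; the empty tuple stands for epsilon.
RULES = {
    "S": (("A",), ("B",), ("A", "B")),
    "A": (("a", "A"), ()),
    "B": (("b", "B"), ()),
}


@lru_cache(maxsize=None)
def count_trees(symbol: str, word: str) -> int:
    """Number of parse trees with root `symbol` whose leaves spell `word`."""
    if symbol not in RULES:                       # terminal symbol
        return 1 if word == symbol else 0
    return sum(count_seq(rhs, word) for rhs in RULES[symbol])


@lru_cache(maxsize=None)
def count_seq(symbols: tuple, word: str) -> int:
    """Number of ways the symbol sequence can derive `word`."""
    if not symbols:
        return 1 if word == "" else 0
    head, rest = symbols[0], symbols[1:]
    total = 0
    for i in range(len(word) + 1):                # split word between head and rest
        head_count = count_trees(head, word[:i])
        if head_count:                            # skip dead splits early
            total += head_count * count_seq(rest, word[i:])
    return total


print(count_trees("S", "a"))
```

A count of two or more for any word shows that the grammar is ambiguous; a count of one for a single word proves nothing about the grammar as a whole.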
Ambiguity & Disambiguation
Given an ambiguous grammar, we would like an equivalent
 unambiguous grammar.
  Allows you to know more about the structure of a given
   derivation.
  Simplifies inductive proofs on derivations.
  Can lead to more efficient parsing algorithms.
  In programming languages, we want to impose a canonical
   structure on derivations, e.g., for 1+2×3.

Strategy: Force an ordering on all derivations.
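For arithmetic such as 1+2×3, one common way to force an ordering is to stratify the grammar by precedence: E → E + T | T, T → T × F | F, F → digit. This grammar is not taken from the slides; it is a standard textbook construction assumed here for illustration. The sketch below evaluates expressions by following that grammar, with the left-recursive rules implemented as loops; the function name `evaluate` and the single-digit restriction are also assumptions.

```python
def evaluate(expr: str) -> int:
    """Evaluate single-digit sums and products using a stratified grammar.

    E -> E + T | T      (lowest precedence, left-associative)
    T -> T × F | F      (binds tighter than +)
    F -> digit
    """
    tokens = list(expr.replace(" ", ""))
    pos = 0

    def factor() -> int:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] not in "0123456789":
            raise ValueError(f"digit expected at position {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    def term() -> int:
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in "×*":
            pos += 1
            value *= factor()
        return value

    def expression() -> int:
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            value += term()
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected symbol at position {pos}")
    return result


print(evaluate("1+2×3"))
```

Because a product can only appear inside a term, 1+2×3 has exactly one parse tree, the one that groups 2×3 first.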
CFG Simplification
Ambiguity can't always be eliminated: some context-free
  languages are inherently ambiguous.

But CFG simplification & restriction are still useful
  theoretically & pragmatically.
  Simpler grammars are easier to understand.
  Simpler grammars can lead to faster parsing.
  Restricted forms are useful for some parsing algorithms.
  Restricted forms can give you more knowledge about
    derivations.
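The slides do not name a specific simplification. As one illustrative example, a typical first step toward removing ε-productions is to find the nullable nonterminals, those that can derive the empty string. The sketch below computes them by fixed-point iteration; the function name `nullable_variables` and the dictionary encoding of the rules are assumptions.

```python
def nullable_variables(rules):
    """Return the set of nonterminals A with A =>* epsilon."""
    nullable = set()
    changed = True
    while changed:                    # repeat until no new nonterminal is added
        changed = False
        for lhs, alternatives in rules.items():
            if lhs in nullable:
                continue
            # lhs is nullable if some right-hand side consists only of
            # nullable symbols (the empty right-hand side qualifies trivially).
            if any(all(sym in nullable for sym in rhs) for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable


# The ambiguous grammar from the earlier slide; the empty tuple is epsilon.
rules = {
    "S": [("A",), ("B",), ("A", "B")],
    "A": [("a", "A"), ()],
    "B": [("b", "B"), ()],
}
print(sorted(nullable_variables(rules)))
```

In this grammar every nonterminal is nullable, which is one reason the empty word has several parse trees.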
