COMPILER DESIGN
1. Compiler Design
Swarnalatha Prathipati
Assistant Professor
Department of CSE
GITAM Institute of Technology (GIT)
Visakhapatnam – 530045
Email: sprathip2@gitam.edu
Ph no:7893210891
2/15/2023
Department of Computer Science & Engineering, GIT
1
3. Syllabus
Syntax Analysis (Part-I):
Introduction
Context free grammars
Top down parsing : Brute force parsing, recursive descent parsing,
predictive parsing, error recovery in predictive parsing
Bottom up parsing, shift reduce parsing, operator precedence parsing
Error recovery in operator precedence parsing.
4. Learning Outcomes:
Explore both Top-Down and Bottom-Up Parsing techniques
Prescribed Textbook:
Compilers: Principles, Techniques, and Tools, Alfred V. Aho, J. D. Ullman, Ravi Sethi; 2nd
Edition, Pearson Education
5. Introduction
The syntax analyzer checks that the input conforms to the syntax of the language.
It takes the tokens from the lexical analyzer and groups them so that
programming constructs can be recognized.
If, after grouping, some sequence of tokens matches no construct, a syntax
error is reported.
Parsing (syntax analysis) is the process that takes an input string w and
produces either a parse tree or syntax errors.
The tree-like structure that the syntax analyzer builds from the tokens is
called the parse tree.
6. Role of Parser
The parser obtains a string of tokens from the lexical analyzer and verifies
that the string can be generated by the grammar for the source language.
The parser reports any syntax errors it finds in the source program.
7. Basic Issues in Parsing
There are two important issues in parsing
1) Specifications of syntax
2) Representation of input after parsing
8. Specification of syntax
The syntax of a programming language must be specified with certain
characteristics:
The specification should be precise and unambiguous.
The specification should be detailed (it should cover all the details of the
programming language).
The specification should be complete.
Such a specification is given by a context-free grammar.
9. Representation of the input after parsing
This is important because all the subsequent phases of the compiler take their
information from the parse tree that is generated.
It is also important because the meaning of an input programming statement
must not change when the syntax tree is built for it.
10. Grammars
A grammar is a collection of production rules used to generate a set of strings.
A grammar is denoted by the symbol G.
A grammar is defined as a 4-tuple G = (V, T, P, S)
where V - set of variables (non-terminals)
T - set of terminals
P - set of production rules
S - start symbol
11. Context free Grammars
Context-free grammars are Type 2 grammars; the languages they define are accepted by pushdown automata.
The productions have the form α -> β
such that |α| = 1,
where α ∈ V and β ∈ (V ∪ T)*
Example: S -> abc
A -> bA | ε
B -> Bb
12. Derivations
The sequence of substitutions used to obtain a string is called a derivation.
Each derivation step produces a new string from a given string.
The derivation is complete when the resulting string contains only terminal
symbols.
13. Types of Derivations
There are two types:
1.Left most derivation
2.Right most derivation
14. Left most derivation
If at each step of the derivation a production is applied to the leftmost variable,
the derivation is called a leftmost derivation (LMD).
Example: for E -> E+E | E*E | id, generate the string id+id*id using LMD
E => E+E
  => id+E
  => id+E*E
  => id+id*E
  => id+id*id
15. Right most derivation
If at each step of the derivation a production is applied to the rightmost variable,
the derivation is called a rightmost derivation (RMD).
Example: for E -> E+E | E*E | id, generate the string id+id*id using RMD
E => E+E
  => E+E*E
  => E+E*id
  => E+id*id
  => id+id*id
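The two derivation orders can be replayed mechanically. The sketch below is only an illustration: the dictionary encoding of the grammar and the helper name `derive` are assumptions made here, not part of the slides. It expands the leftmost or the rightmost variable with each chosen right-hand side and records every sentential form.

```python
# Hypothetical encoding of the slide's grammar E -> E+E | E*E | id.
PRODUCTIONS = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}


def derive(start, choices, leftmost=True):
    """Return the sentential forms obtained by applying each chosen
    right-hand side to the leftmost (or rightmost) variable."""
    form = [start]
    steps = ["".join(form)]
    for rhs in choices:
        # Positions of all variables in the current sentential form.
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            raise ValueError("no variable left to expand")
        i = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[i]]:
            raise ValueError(f"{form[i]} -> {rhs} is not a production")
        form[i:i + 1] = rhs  # replace the variable by the right-hand side
        steps.append("".join(form))
    return steps


PLUS, TIMES, ID = PRODUCTIONS["E"]
lmd = derive("E", [PLUS, ID, TIMES, ID, ID], leftmost=True)
rmd = derive("E", [PLUS, TIMES, ID, ID, ID], leftmost=False)
print(" => ".join(lmd))
print(" => ".join(rmd))
```

Both calls end in id+id*id; only the order in which the variables are expanded differs.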
16. Parse Tree
A parse tree is the hierarchical representation of a derivation, built from terminals and non-terminals.
The start symbol of the grammar must be used as the root of the parse tree.
Each interior node, together with its children, represents one production of the grammar.
All leaf nodes are terminals (or ε).
All interior nodes are non-terminals.
Reading the leaves from left to right gives the original input string.
17. Example
Construct a parse tree for the CFG given below to derive the string
“acbd”
S -> AB
A -> c | aA
B -> d | bB
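One way to check a candidate parse tree for this exercise is to write it as nested tuples and read off its leaves. The representation below (a `(label, children)` tuple for interior nodes, a plain string for leaves) and the helper names are assumptions chosen for this sketch.

```python
# The slide's grammar, with each alternative as a list of symbols.
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["c"], ["a", "A"]],
    "B": [["d"], ["b", "B"]],
}

# Candidate parse tree for "acbd": interior nodes are (label, children).
TREE = ("S", [
    ("A", ["a", ("A", ["c"])]),
    ("B", ["b", ("B", ["d"])]),
])


def tree_yield(node):
    """Concatenate the leaves from left to right."""
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(tree_yield(child) for child in children)


def is_valid(node):
    """Check that every interior node corresponds to a production."""
    if isinstance(node, str):
        return node not in GRAMMAR  # a leaf must be a terminal
    label, children = node
    labels = [c if isinstance(c, str) else c[0] for c in children]
    return labels in GRAMMAR[label] and all(is_valid(c) for c in children)


print(tree_yield(TREE), is_valid(TREE))
```

The tree corresponds to the derivation S => AB => aAB => acB => acbB => acbd.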
18. Left most & Right most derivation trees
19. Examples
1. Consider the grammar given below
E -> E+E | E*E | E-E | E/E | a | b
Find the LMD and RMD to obtain the string a+b*a+b
2. Consider the grammar given below
S -> 0A | 1B | 0 | 1
A -> 0S | 1B | 1
B -> 0A | 1S
Construct the LMD and parse tree for the following sentences
i) 0101 ii) 1100101
20. 3. Let G be a grammar with the following production rules
S -> aAS | a
A -> SbA | SS | ba
Find the LMD and RMD to obtain the string aabbaa
4. Let G be a grammar with the following production rules
S -> aB | bA
A -> a | aS | bAA
B -> b | bS | aBB
Find the LMD and RMD to obtain the string aabbabab
21. Ambiguous Grammars
A grammar G is said to be ambiguous if, for some string, there exist two or more leftmost
derivations or two or more rightmost derivations.
Example: for E -> E+E | E*E | id, the string id+id*id has two leftmost derivations
LMD 1: E => E+E => id+E => id+E*E => id+id*E => id+id*id
LMD 2: E => E*E => E+E*E => id+E*E => id+id*E => id+id*id
22. RMD 1: E => E+E => E+E*E => E+E*id => E+id*id => id+id*id
RMD 2: E => E*E => E*id => E+E*id => E+id*id => id+id*id
Therefore it is an ambiguous grammar
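Ambiguity of this particular grammar can also be confirmed by counting parse trees. The sketch below is specific to E -> E+E | E*E | id and is not a general ambiguity test; the function name `count_trees` is an assumption. It splits the token list at every operator and multiplies the number of trees of the two sides.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the subtree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one means the string has more than one parse tree, which is exactly what the two leftmost derivations show.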
23. Practice problems
1. Show that the following grammar is ambiguous by considering the string “abab”
S -> aSbS
S -> bSaS
S -> ε
2. Show that the following grammar is ambiguous by considering the string “aab”
S -> AB
B -> ab
A -> aa
A -> a
B -> b
25. Parsing techniques work on the following principles
1. The parser scans the input string from left to right and traces out either a
leftmost or a rightmost derivation.
2. The parser uses the production rules to choose the appropriate
derivation step.
Different parsing techniques use different approaches to select the
appropriate rules, and in the end a parse tree is constructed.
26. Top down Parsing
A parser that constructs the parse tree from the root and expands it down to the leaves is
called a top-down parser. The tree is generated from top to bottom.
The derivation terminates when the required input string has been derived.
Top-down parsing can be viewed as finding a leftmost derivation for an input string.
The main task in top-down parsing is to find the appropriate production rule at each step in
order to produce the input string.
Selecting the proper rule is therefore very important. In the simplest form, this selection is
based on trial and error.
27. Problems with Top down Parsing
Top-down parsing has certain problems. To implement the parser we need to
eliminate them:
1. Backtracking
2. Left recursion
3. Left factoring (alternatives with a common prefix)
4. Ambiguity
28. Backtracking
Backtracking is a technique in which, to expand a non-terminal symbol,
we choose one alternative and, if a mismatch occurs, we try another
alternative, if any.
If a non-terminal has multiple production rules beginning with the
same input symbol, then to get the correct derivation we may need to try all of
these alternatives.
In backtracking we need to move some levels upward in the tree in order to check
other possibilities. This increases the overhead of the parser.
It is therefore desirable to eliminate backtracking by modifying the grammar.
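A brute-force parser with backtracking can be sketched with generators. The grammar used here, S -> cAd with A -> ab | a, is an assumption chosen for illustration and does not come from the slides, and the function names are hypothetical. Each alternative is tried in turn; when a later symbol fails to match, the generator falls back to the next alternative. The sketch assumes a grammar without left recursion, otherwise it would recurse forever.

```python
# Assumed example grammar: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after deriving `symbol`
    from text[pos:]. Trying the next alternative is the backtracking."""
    if symbol not in GRAMMAR:  # terminal: must match the input directly
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        yield from match_sequence(alternative, text, pos)


def match_sequence(symbols, text, pos):
    """Match a right-hand side symbol by symbol."""
    if not symbols:
        yield pos
        return
    for middle in parse(symbols[0], text, pos):
        yield from match_sequence(symbols[1:], text, middle)


def accepts(text, start="S"):
    return any(end == len(text) for end in parse(start, text, 0))


print(accepts("cad"), accepts("cabd"), accepts("cd"))
```

For the input cad the parser first tries A -> ab, fails on b, and backtracks to A -> a.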
30. Left Recursion
A grammar of the form A -> Aα | β is called a left recursive grammar. To
eliminate left recursion, rewrite the grammar as
A -> βA'
A' -> αA' | ε
A production contains left recursion when its L.H.S is the same as the first symbol of
its R.H.S, i.e., A -> Aα | β
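The rewriting rule can be sketched as a small function. The function name and the list-of-symbols encoding are assumptions of this sketch, the empty list stands for ε, and only immediate left recursion (A -> Aα) is handled.

```python
def eliminate_immediate_left_recursion(head, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as
    A -> b1 A' | ...  and  A' -> a1 A' | ... | ε  (ε is the empty list)."""
    # The alpha parts: alternatives that start with the head itself.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    # The beta parts: all other alternatives.
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: alternatives}  # nothing to do
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for lhs, alts in result.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

For E -> E+T | T this prints E -> T E' and E' -> + T E' | ε.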
31. Problems on elimination of left recursion
Eliminate the left recursion from the following grammars
1. E->E+T|T
2. T->T*F|F
3. B->Be|b
4. A->Abd|Aa|a
33. Left factoring
A left-factored grammar is suitable for predictive parsing. Left factoring is
used when it is not clear which of two alternatives should be
used to expand a non-terminal, because both begin with the same prefix:
A -> αβ1 | αβ2
To left factor the grammar, we rewrite it as
A -> αA'
A' -> β1 | β2
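A single pass of left factoring can be sketched as follows. The helper names are assumptions of this sketch, the empty list stands for ε, and alternatives are grouped by their first symbol only, so the new non-terminal may itself need another pass.

```python
def common_prefix(sequences):
    """Longest prefix shared by all the given symbol lists."""
    prefix = []
    for column in zip(*sequences):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor_once(head, alternatives):
    """Factor out the common prefix of alternatives that share a first symbol."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {head: []}
    primes = 0
    for first_symbol, group in groups.items():
        if first_symbol is None or len(group) == 1:
            result[head].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        primes += 1
        new_head = head + "'" * primes
        result[head].append(prefix + [new_head])
        result[new_head] = [alt[len(prefix):] for alt in group]
    return result


factored = left_factor_once(
    "S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]])
for lhs, alts in factored.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "ε" for alt in alts))
```

For S -> iEtS | iEtSeS | a this yields S -> iEtSS' | a and S' -> ε | eS.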
34. Problems on left factoring
Left factor the grammar below
S -> iEtS | iEtSeS | a
E -> b
After left factoring
S -> iEtSS' | a
S' -> ε | eS
E -> b
Do left factoring in the following grammar.
A -> aAB | aA | a
B -> bB | b
35. Ambiguity
A grammar G is said to be ambiguous if there exist two or more derivation trees for some input
string (equivalently, two or more leftmost or two or more rightmost derivations).
An ambiguous grammar is not good for compiler construction. No general method can
automatically detect and remove ambiguity, but ambiguity can often be removed by rewriting the
grammar by hand.
S -> aSb | SS
S -> ε
For the string aabb, the above grammar generates more than one parse tree.
36. Removing Ambiguity By Precedence & Associativity Rules
An ambiguous grammar may be converted into an unambiguous grammar by
implementing:
Precedence constraints
Associativity constraints
37. Precedence Constraints
The precedence constraint is implemented using the following rules-
The level at which the production is present defines the priority of the
operator contained in it.
The higher the level of the production, the lower the priority of operator.
The lower the level of the production, the higher the priority of operator.
38. Associativity Constraints
The associativity constraint is implemented using the following rules-
If the operator is left associative, induce left recursion in its production.
If the operator is right associative, induce right recursion in its production.
40. Example
Convert the following ambiguous grammar into an unambiguous grammar:
R → R + R | R . R | R* | a | b
where * is Kleene closure and . is concatenation.
Solution:
To convert the given grammar into its corresponding unambiguous grammar, we
implement the precedence and associativity constraints.
The given grammar has the operators + , . , *
The given grammar has the operands a , b
41. The priority order is (a , b) > * > . > +
where
the . operator is left associative
the + operator is left associative
Using the precedence and associativity rules, we write the corresponding
unambiguous grammar as
E → E + T | T
T → T . F | F
F → F* | G
G → a | b
42. OR
Unambiguous Grammar
E → E + T | T
T → T . F | F
F → F* | a | b
43. There are two types of top-down parsing
1. Backtracking
2. Predictive parsing
Predictive parsing is of two types
1. Recursive descent parser
2. LL(1) parser
44. Recursive Descent Parser
A parser that uses a collection of recursive procedures to parse the given input
string is called a recursive descent parser.
The CFG is used to build the recursive routines.
The R.H.S of each production rule is directly converted into program code.
45. Procedure
Each non-terminal gets a procedure whose body follows the R.H.S of its production rules, symbol by symbol:
If the symbol is a non-terminal, a call is made to the procedure corresponding to that
non-terminal.
If the symbol is a terminal, it is matched with the lookahead from the input.
The lookahead pointer is advanced when the symbol matches.
If the production rule has many alternatives, all of these alternatives are
combined into a single procedure body.
The parser is activated by calling the procedure corresponding to the start symbol.
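The procedure above can be sketched as a class with one method per non-terminal. The grammar is an assumption made for this sketch: the expression grammar E -> E+T | T, T -> T*F | F with its left recursion removed, plus F -> (E) | id. The class and method names are hypothetical.

```python
class RecursiveDescentParser:
    """Sketch for the assumed grammar
       E  -> T E'        E' -> + T E' | ε
       T  -> F T'        T' -> * F T' | ε
       F  -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        """Match a terminal against the lookahead and advance the pointer."""
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()          # activate the procedure of the start symbol
        self.match("$")   # all input must be consumed

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise choose E' -> ε and consume nothing

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    try:
        RecursiveDescentParser(tokens).parse()
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]), accepts(["id", "+"]))
```

All alternatives of a non-terminal live in one method body, and a single token of lookahead selects among them, so no backtracking is needed for this grammar.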
50. Advantages & Limitations of Recursive Descent Parser
Advantages
Recursive descent parsers are simple to build.
They can be constructed with the help of the parse tree.
Limitations
They are not very efficient compared to other parsing techniques.
The program for a recursive descent parser may enter an
infinite loop for some input (for example, when the grammar is left recursive).
They cannot provide good error messages.
It is difficult to parse the string if the required lookahead is arbitrarily long.
52. LL(1)
- The first L means the input is scanned from left to right.
- The second L means the parser produces a leftmost derivation of the input string.
- The number 1 means the parser uses only one input symbol of lookahead to
predict each parsing step.
53. The parser program works with the following 3 components to produce
output:
INPUT: Contains the string to be parsed, with $ as its end marker.
STACK: Contains a sequence of grammar symbols, with $ as its bottom marker.
Initially the stack contains only $, and the start symbol is pushed on top of it before parsing begins.
PARSING TABLE: A two-dimensional array M[A,a], where A is a non-terminal and a is
a terminal.
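How the three components interact can be sketched with a table-driven loop. The grammar E -> T E', E' -> + T E' | ε, T -> id and its hand-written table are assumptions made for this sketch, and the function name `ll1_parse` is hypothetical. The table maps (non-terminal, lookahead) to a right-hand side, with the empty list standing for ε.

```python
# Hand-written parsing table M[A, a] for the assumed grammar
#   E -> T E'     E' -> + T E' | ε     T -> id
TABLE = {
    ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "id"): ["id"],
}


def ll1_parse(tokens, table, start):
    """Return True if the token list is accepted by the table-driven parser."""
    nonterminals = {head for head, _ in table}
    buffer = list(tokens) + ["$"]   # INPUT with end marker
    stack = ["$", start]            # STACK: $ at the bottom, start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        current = buffer[pos]
        if top in nonterminals:
            rhs = table.get((top, current))
            if rhs is None:         # empty table entry: syntax error
                return False
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == current:        # terminal (or $) matches the input
            pos += 1
        else:
            return False
    return pos == len(buffer)


print(ll1_parse(["id", "+", "id"], TABLE, "E"))
print(ll1_parse(["id", "id"], TABLE, "E"))
```

The parser never backtracks: each step is decided by the top of the stack and one input symbol.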
54. Procedure for constructing LL(1) Parser
1. Compute the FIRST and FOLLOW functions
2. Construct the predictive parsing table using the FIRST and FOLLOW functions
3. Parse the input string with the help of the predictive parsing table
55. Rules used to compute FIRST function
If a is a terminal symbol, then FIRST(a) = {a}
If there is a rule X -> ε, then ε is in FIRST(X)
For the rule A -> X1 X2 X3 ... Xk
add FIRST(X1) − {ε} to FIRST(A). If ε is in FIRST(X1), also add FIRST(X2) − {ε}, and so on. If ε is in FIRST(Xi) for every i from 1 to k, add ε to FIRST(A).
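The FIRST rules can be applied repeatedly until no set changes. The sketch below assumes a dictionary encoding of the grammar with the empty list standing for an ε production; the example grammar (the expression grammar after removing left recursion) and the function names are assumptions of this sketch.

```python
EPSILON = "ε"

# Assumed example grammar; [] is an ε production.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}


def compute_first(grammar):
    """Iterate the FIRST rules until a fixed point is reached."""
    first = {head: set() for head in grammar}

    def first_of_sequence(symbols):
        result = set()
        for sym in symbols:
            # FIRST of a terminal is the terminal itself.
            sym_first = first[sym] if sym in grammar else {sym}
            result |= sym_first - {EPSILON}
            if EPSILON not in sym_first:
                return result  # later symbols cannot contribute
        result.add(EPSILON)    # every symbol can vanish (or the rhs is empty)
        return result

    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
for head, symbols in FIRST.items():
    print(f"FIRST({head}) = {sorted(symbols)}")
```

Note that FIRST(X2) is consulted only when X1 can derive ε, which is why E picks up nothing from E'.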
60. Rules used to compute FOLLOW function
FOLLOW(A) is the set of terminal symbols that can appear immediately to the right
of A in some sentential form:
FOLLOW(A) = {a | S =>* αAaβ}, where α and β are strings of grammar symbols
(terminals or non-terminals).
1. For the start symbol S, place $ in FOLLOW(S)
2. If there is a production A -> αBβ, then everything in FIRST(β) except ε is
placed in FOLLOW(B), where β is any non-empty string of grammar symbols
3. If there is a production A -> αB, or a production A -> αBβ where FIRST(β) contains ε, then
everything in FOLLOW(A) is in FOLLOW(B)
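The three FOLLOW rules can be sketched as a fixed-point loop. To keep the example short, the FIRST sets of the non-terminals are written out by hand for an assumed example grammar (the expression grammar after removing left recursion); the encoding and function names are assumptions of this sketch.

```python
EPSILON = "ε"

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] is an ε production
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

# FIRST sets of the non-terminals, supplied by hand for this grammar.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPSILON}, "T'": {"*", EPSILON},
}


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal is its own FIRST
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result


def compute_follow(grammar, first, start):
    follow = {head: set() for head in grammar}
    follow[start].add("$")                       # rule 1
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                for i, symbol in enumerate(alt):
                    if symbol not in grammar:
                        continue                 # FOLLOW is for non-terminals
                    beta_first = first_of_sequence(alt[i + 1:], first)
                    new = beta_first - {EPSILON}  # rule 2
                    if EPSILON in beta_first:     # rule 3 (beta empty or nullable)
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for head, symbols in FOLLOW.items():
    print(f"FOLLOW({head}) = {sorted(symbols)}")
```

Rule 3 copies FOLLOW(A) into FOLLOW(B), not the other way round, which the loop reflects by only ever adding to the set of the symbol on the right-hand side.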
70. Algorithm for predictive parsing table:
For each rule A -> α of grammar G:
For each terminal a in FIRST(α), create the entry M[A,a] = A -> α
If ε is in FIRST(α), create the entry M[A,b] = A -> α for each symbol b in
FOLLOW(A)
If ε is in FIRST(α) and $ is in FOLLOW(A), create the entry
M[A,$] = A -> α
All the remaining entries in the table M are marked as SYNTAX ERROR.
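The table construction can be sketched directly from these rules. The small grammar and its FIRST and FOLLOW sets are assumptions written out by hand for this sketch, and the function names are hypothetical. Missing dictionary entries play the role of SYNTAX ERROR, and a clash between two rules for the same entry is reported, since it means the grammar is not LL(1).

```python
EPSILON = "ε"

# Assumed example grammar: E -> T E', E' -> + T E' | ε, T -> id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] is an ε production
    "T": [["id"]],
}
FIRST = {"E": {"id"}, "E'": {"+", EPSILON}, "T": {"id"}}
FOLLOW = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}


def first_of_sequence(symbols, first):
    """FIRST of a right-hand side."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # a terminal is its own FIRST
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result


def build_table(grammar, first, follow):
    table = {}
    for head, alternatives in grammar.items():
        for alt in alternatives:
            alt_first = first_of_sequence(alt, first)
            targets = alt_first - {EPSILON}
            if EPSILON in alt_first:
                # FOLLOW(head) already contains $ when $ can follow head.
                targets |= follow[head]
            for terminal in targets:
                if (head, terminal) in table:
                    raise ValueError(f"conflict at M[{head}, {terminal}]")
                table[(head, terminal)] = alt
    return table


TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
for (head, terminal), rhs in sorted(TABLE.items()):
    print(f"M[{head}, {terminal}] = {head} -> {' '.join(rhs) or EPSILON}")
```

Because $ is stored in FOLLOW like any other symbol, the separate $ rule needs no special case in the code.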
77. Bottom-up Parsers
In the bottom-up parsing method, the input string is taken first and we try to reduce
it to the start symbol with the help of the grammar.
The parse tree is constructed from the bottom up, that is, from the leaves to the root.
Starting from the leaves, the leaf nodes are reduced to internal nodes,
these internal nodes are reduced further,
and eventually the root node is obtained.
In this process the parser tries to identify the R.H.S of a production rule and
replace it by the corresponding L.H.S. This activity is called reduction.
The sentential forms produced in the reduction process trace out a
rightmost derivation in reverse.
79. Handle Pruning
Handle:
A handle is a substring of a right sentential form that matches the right side of a production and whose
reduction to the non-terminal on the left side of that production is one step in the reverse of a rightmost derivation.
Handle Pruning:
The process of detecting handles and using them in reductions is called handle
pruning.
80. Example
Consider the grammar
E -> E+E | id and derive the string “id+id+id” using rightmost derivation.
E => E + E
  => E + E + E
  => E + E + id
  => E + id + id
  => id + id + id
Right sentential form   Handle   Production
id + id + id            id       E -> id
E + id + id             id       E -> id
E + E + id              id       E -> id
E + E + E               E + E    E -> E + E
E + E                   E + E    E -> E + E
E
81. Shift Reduce Parser
A shift reduce parser attempts to construct the parse tree from the leaves to the root.
A shift reduce parser requires the following data structures.
1. An input buffer storing the input string.
2. A stack for storing and accessing the L.H.S and R.H.S of rules.
82. The parser performs the following basic operations.
1. Shift: Moving a symbol from the input buffer onto the stack.
2. Reduce: If a handle appears on the top of the stack, it is reduced by the
appropriate rule. That means the R.H.S of the rule is popped off and the L.H.S is pushed
onto the stack.
3. Accept: If the stack contains only the start symbol and the input buffer is empty at the
same time, the parser accepts the string.
4. Error: A situation in which the parser can neither shift nor reduce the symbols, and
cannot perform the accept action either, is called an error.
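The four operations can be sketched with a deliberately naive strategy: reduce whenever a right-hand side appears on top of the stack, otherwise shift. This strategy is an assumption of the sketch and not a general algorithm, because it does not resolve shift/reduce conflicts; it is shown for the grammar E -> E+E | id, where it happens to work. The function name is hypothetical, and the top of the stack is the end of the Python list.

```python
# Grammar E -> E + E | id as (head, body) pairs; bodies must be non-empty.
PRODUCTIONS = [("E", ["E", "+", "E"]), ("E", ["id"])]


def shift_reduce(tokens, productions, start):
    """Return (accepted, trace) using a greedy reduce-first strategy."""
    stack, buffer, trace = [], list(tokens), []
    while True:
        for head, body in productions:
            if stack[-len(body):] == body:       # handle on top of the stack
                del stack[-len(body):]           # pop the R.H.S
                stack.append(head)               # push the L.H.S
                trace.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:
            if buffer:
                stack.append(buffer.pop(0))      # shift the next input symbol
                trace.append("shift")
            else:
                break                            # nothing left to do
    accepted = stack == [start]                  # accept or error
    return accepted, trace


accepted, trace = shift_reduce(["id", "+", "id", "+", "id"], PRODUCTIONS, "E")
print(accepted)
for action in trace:
    print(action)
```

With an operator such as * in the grammar, the greedy strategy would reduce E + E too early; deciding when to shift instead is what precedence relations or parsing tables are for.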
83. Example on SRP
(The top of the stack is written on the left.)
Stack          Input buffer   Parsing Action
$              id-id*id $     Shift
id $           -id*id $       Reduce by E -> id
E $            -id*id $       Shift
- E $          id*id $        Shift
id - E $       *id $          Reduce by E -> id
E - E $        *id $          Shift
* E - E $      id $           Shift
84. Stack      Input buffer   Parsing Action
id * E - E $   $              Reduce by E -> id
E * E - E $    $              Reduce by E -> E * E
E - E $        $              Reduce by E -> E - E
E $            $              Accept
85. Operator Precedence Parser
A grammar G is said to be an operator precedence grammar if it possesses the following properties
1. No production has ε on its right side.
2. No production rule has two adjacent non-terminals
on its right-hand side.
Example: E -> EAE | ( E ) | -E | id
A -> + | - | / | ^ | *
This is not an operator precedence grammar,
because the production E -> EAE contains two consecutive non-terminals. We
convert it into an equivalent operator precedence grammar by
removing A.
E -> E + E | E - E | E * E | E / E | E ^ E
E -> ( E ) | -E | id
86. Simple operator precedence Parser
In operator precedence parsing we first define three disjoint precedence relations
between pairs of terminals and construct the operator precedence table.
a <. b if b has higher precedence than a
a = b if b has the same precedence as a
a .> b if b has lower precedence than a
Rules to determine precedence relations:
The correct precedence relations between terminals are determined from the
traditional notions of associativity and precedence of operators.
id has higher precedence than any other symbol
$ has the lowest precedence
If two operators have equal precedence, then the associativity of that
operator decides the relation (.> for a left associative operator, <. for a right associative one).
87. Rules for parsing the string
Step1: Insert
* the $ symbol at the start and at the end of the input string
* the precedence relation between every two adjacent symbols of the string, by
referring to the designed precedence table (for example, <. id .>)
Step2: Scan the string from the left until the first .> is seen and put a pointer at its location.
Now scan backward from there, right to left, until a <. is seen. Everything
between the two relations <. and .> forms the handle. Replace the handle with
the head of the respective production.
Step3: Repeat Step2 until the start symbol is reached.
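The scanning steps can be sketched with a stack of terminals and a precedence table. The table below for id, +, * and $ is an assumption built from the usual conventions (* above +, both left associative), and the function name is hypothetical. The sketch tracks terminals only, so it records the order in which handles are reduced but does not check that operands are placed correctly.

```python
# Assumed precedence relations for id, +, *, $ ("<" is <., ">" is .>).
TABLE = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}


def operator_precedence_parse(tokens, table):
    """Return the terminals of each handle in the order they are reduced."""
    stack = ["$"]
    buffer = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        top, current = stack[-1], buffer[pos]
        if top == "$" and current == "$":
            return reductions                    # accept
        relation = table.get((top, current))
        if relation in ("<", "="):
            stack.append(current)                # shift
            pos += 1
        elif relation == ">":
            handle = []
            while True:                          # pop back to the nearest <.
                popped = stack.pop()
                handle.insert(0, popped)
                if table.get((stack[-1], popped)) == "<":
                    break
            reductions.append(" ".join(handle))
        else:
            raise SyntaxError(f"no relation between {top!r} and {current!r}")


print(operator_precedence_parse(["id", "+", "id", "*", "id"], TABLE))
```

For id + id * id the multiplication is reduced before the addition, which matches the usual precedence of * over +.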
88. Example
Construct operator precedence parser for the following grammar.
E -> EAE |id
A-> +| *
parse the following string id + id * id
89. Advantages and disadvantages of simple operator precedence Parsing
Advantages:
This type of parsing is simple to implement.
Disadvantages:
An operator like minus has two different precedences (unary and binary). Hence it is
hard to handle tokens like the minus sign.
This kind of parsing is applicable to only a small class of grammars.
90. Operator Precedence Parsing
For construction of operator precedence parsing we have to follow the following steps:
1. Computation of Leading and Trailing symbols
2. Construct the operator precedence table using leading and trailing functions.
3. Parse the input string with the help of operator precedence table
91. Leading function rules:
Rule-1:
If the production rule has the form A -> YaB, where the R.H.S starts with a single
non-terminal Y, then the next terminal a is taken as a lead of A.
If the R.H.S of the production starts with a terminal, that terminal is taken directly as a lead of A.
Rule-2:
If the production rule has the form A -> B, that is, the R.H.S is a single non-terminal,
then whatever is in the lead of B is added to the lead of A.
92. Trailing Function rules
Rule-1
If the production rule has the form A -> YaB, where the R.H.S ends with a single non-terminal B,
then the preceding terminal a is taken as a trail of A.
If the R.H.S of the production ends with a terminal, that terminal is taken directly as a trail
of A.
Rule-2
If the production rule has the form A -> B, that is, the R.H.S is a single non-terminal, then
whatever is in the trail of B is added to the trail of A.
93. Example
Consider the following grammar and find the leading and trailing functions.
E -> E+T
E -> T
T -> T*F
T -> F
F -> (E) | id
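The leading and trailing rules of the preceding slides can be sketched as a fixed-point computation on this grammar. The encoding and function names are assumptions of this sketch. It uses the common general form of Rule-2, adding the lead of B whenever the R.H.S begins with a non-terminal B, which includes the case A -> B; for this grammar both readings give the same sets. Trailing is computed by applying the same procedure to the reversed right-hand sides.

```python
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def leading(grammar):
    """LEADING sets; assumes an operator grammar (no ε productions)."""
    lead = {head: set() for head in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alt in alternatives:
                new = set()
                if alt[0] not in grammar:
                    new.add(alt[0])              # R.H.S starts with a terminal
                else:
                    new |= lead[alt[0]]          # inherit the lead of B
                    if len(alt) > 1 and alt[1] not in grammar:
                        new.add(alt[1])          # terminal right after B
                if not new <= lead[head]:
                    lead[head] |= new
                    changed = True
    return lead


def trailing(grammar):
    """TRAILING sets: LEADING of the grammar with reversed right-hand sides."""
    mirrored = {head: [alt[::-1] for alt in alternatives]
                for head, alternatives in grammar.items()}
    return leading(mirrored)


LEADING = leading(GRAMMAR)
TRAILING = trailing(GRAMMAR)
for head in GRAMMAR:
    print(head, sorted(LEADING[head]), sorted(TRAILING[head]))
```

The sets differ only in the parenthesis: ( can start a string derived from E, while ) can end one.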