Module 3 - Semantic Analysis(Compiler Design).pptx

BCSE307L
Compiler Design
MODULE – 3
SEMANTIC ANALYSIS
Dr. D.Jeya Mala
Professor
School of Computer Science and Engineering (SCOPE)
VIT Chennai
Jeyamala.d@vit.ac.in

Semantic Analysis
• Semantic Analysis is the third phase of Compiler.
• Semantic Analysis makes sure that declarations and statements of
program are semantically correct.
• It is a collection of procedures which is called by parser as and
when required by grammar.
• Both syntax tree of previous phase and symbol table are used to
check the consistency of the given code.
• Type checking is an important part of semantic analysis where
compiler makes sure that each operator has matching operands.

Why Semantic Analysis?
• We have learnt how a parser constructs parse trees in the syntax
analysis phase.
• The plain parse-tree constructed in that phase is generally of no use for
a compiler, as it does not carry any information of how to evaluate the
tree.
• The productions of context-free grammar, which makes the rules of the
language, do not accommodate how to interpret them.
• For example
E → E + T
• The above CFG production has no semantic rule associated with it, and
it cannot help in making any sense of the production.

Semantics
• Semantics of a language provide meaning to its constructs, like tokens and
syntax structure.
• Semantics help interpret symbols, their types, and their relations with each
other.
• Semantic analysis judges whether the syntax structure constructed in the
source program derives any meaning or not.
• Eg.
int a = “value”;
• should not issue an error in lexical and syntax analysis phase, as it is lexically
and structurally correct, but it should generate a semantic error as the type of
the assignment differs.
• These rules are set by the grammar of the language and evaluated in
semantic analysis.

Tasks performed by Semantic Analyzer
• Scope resolution
• Type checking
• Array-bound checking
• Type mismatch
• Undeclared variable
• Reserved identifier misuse
• Multiple declaration of variable in a scope
• Accessing an out of scope variable
• Actual and formal parameter mismatch

Attribute Grammar
• Attribute grammar is a special form of context-free grammar where
some additional information (attributes) are appended to one or
more of its non-terminals in order to provide context-sensitive
information.
• Each attribute has well-defined domain of values, such as integer,
float, character, string, and expressions.
• Attribute grammar is a medium to provide semantics to the
context-free grammar and it can help specify the syntax and
semantics of a programming language.
• Attribute grammar (when viewed as a parse-tree) can pass values or
information among the nodes of a tree
• Eg. E → E + T  { E.value = E1.value + T.value}

 We associate information with a language construct by attaching attributes
to the grammar symbol(s) representing the construct. (Eg Attributes: Data type, value,
memory location etc.)
 A syntax-directed definition specifies the values of attributes by associating semantic rules
with the grammar productions.
 For example, an infix-to-postfix translator might have a production and semantic rule as
below
 PRODUCTION SEMANTIC RULE
 E  E1 + T E.code = E1.code || T.code || ‘+’
 This production has two non-terminals, E and T; the subscript in E1 distinguishes the
occurrence of E in the production body from the occurrence of E as the head.

Syntax directed definitions
CFG + semantic rules = Syntax Directed Definitions

 The semantic rule specifies that the string E.code is formed by concatenating E1code,
T:code, and the character ‘+’.
 While the rule makes it explicit that the translation of E is built up from the translations
of E1, T, and ‘+’, it may be inefficient to implement the translation directly by
manipulating strings.
 A syntax-directed translation scheme embeds program fragments called semantic
actions within production bodies, as in
E  E1 + T {print ‘+’ }
 By convention, semantic actions are enclosed within curly braces.
 (If curly braces occur as grammar symbols, we enclose them within single quotes, as in
‘f’ and ‘g’.)
 The position of a semantic action in a production body determines the order in which
the action is executed.
 In the given production the action occurs at the end, after all the grammar symbols; in
general, semantic actions may occur at any position in a production body.
Syntax directed definitions & Syntax directed Translation

Syntax directed definitions
• Syntax directed definition is a generalization of context free grammar in
which each grammar symbol has an associated set of attributes.
• The attributes can be number, type, memory location, return type etc….
• Types of attributes are:
1. Synthesized attribute
2. Inherited attribute

Syntax Directed Translation (SDT) Scheme
• The Syntax directed translation scheme is a context free grammar
• It is used to evaluate the order of semantic rules.
• In translation scheme, the semantic rules are embedded within
the right side of the production.
• The position at which the action to be executed is shown by
enclosed between braces.

Applications of SDT
• Executing Arithmetic Expression
• Conversion from Infix to Postfix
• Conversion from Infix to Prefix
• Conversion from Binary to Decimal
• Counting No. of Reductions
• Creating Syntax Tree
• Generating Intermediate Code
• Type Checking
• Storing Type information into Symbol Table.

Synthesized attributes
• Value of synthesized attribute at a node can be computed from the value of attributes
at the children of that node in the parse tree.
• A syntax directed definition that uses synthesized attribute exclusively is said to be
S-attribute definition.
• Example: Syntax directed definition of simple desk calculator
Production Semantic rules
L  En
Print (E.val)
E  E1+T E.val = E1.val + T.val
E  T E.val = T.val
T  T1*F T.val = T1.val * F.val
T  F T.val = F.val
F  (E) F.val = E.val
F  digit F.val = digit.lexval

Example: Synthesized attributes
L
E.val=19
n
+
T.val=15
E.val=15
*
F.val=3
T.val=3
digit.lexval=3
F.val=4
T.val=4
digit.lexval=4
F.val=5
digit.lexval=5
L  En
Print (E.val)
E  E1+T E.Val = E1.val + T.val
E  T E.Val = T.val
T  T1*F T.Val = T1.val * F.val
T  F T.Val = F.val
F  (E) F.Val = E.val
F  digit F.Val = digit . lexval
Annotated parse tree for 3*5+4n
parse tree showing the value
of the attributes at each node
is called Annotated parse tree
The process of computing the
attribute values at the node is
called annotating or
decorating the parse tree
String: 3*5+4n;

Exercise
Draw Annotated Parse tree for following:
1. 7+3*2n
2. (3+4)*(5+6)n

Syntax directed definition to translate arithmetic expressions from infix to prefix
notation
LE Print(E.val)
EE+T E.val=’+’ E.val T.val
EE-T E.val=’-‘ E.val T.val
ET E.val= T.val
TT*F T.val=’*’ T.val F.val
TT/F T.val=’/’ T.val F.val
TF T.val= F.val
FF^P F.val=’^’ F.val P.val
FP F.val= P.val
P(E) P.val= E.val
Pdigit P.val=digit.lexval

Inherited attribute
• An inherited value at a node in a parse tree is computed from the value
of attributes at the parent and/or siblings of the node.
• Symbol T is associated with a synthesized attribute type.
• Symbol L is associated with an inherited attribute in.
D → T L L.in = T.type
T → int T.type = integer
T → real T.type = real
L → L1 , id L1.in = L.in, addtype(id.entry,L.in)
L → id addtype(id.entry,L.in)
Syntax directed definition with inherited attribute
L.in

Example: Inherited attribute
Example: Pass data types to all identifier
real id1,id2,id3
D → T L L.in = T.type
T → int T.type = integer
T → real T.type = real
L → L1 , id L1.in = L.in,
addtype(id.entry,L.in)
L → id addtype(id.entry,L.in)
D
T.type=real L
real
T L.in=real
,
L1
L.in=real id
,
L1
L.in=real id
id
id3
id2
id1
DTL
L → L1 , id
L → id

Evaluation order
• A topological sort of a directed acyclic graph is any ordering of the nodes
of the graph such that edges go from nodes earlier in the ordering to later
nodes.
• If is an edge from to then appears before in the ordering.
D
T.type=real
real
L.in=real
,
L.in=real
L.in=real
id3
id2
id1
,
1 2
3 4
5 6
7

Construction of syntax tree
• Following functions are used to create the nodes of the syntax tree.
1. Mknode (op,left,right): creates an operator node with label op and two fields
containing pointers to left and right.
2. Mkleaf (id, entry): creates an identifier node with label id and a field
containing entry, a pointer to the symbol table entry for the identifier.
3. Mkleaf (num, val): creates a number node with label num and a field
containing val, the value of the number.

Construction of syntax tree for expressions
Example: construct syntax tree
for a-4+c
P1: mkleaf(id, entry for a);
P2: mkleaf(num, 4);
P3: mknode(‘-‘,p1,p2);
P4: mkleaf(id, entry for c);
P5: mknode(‘+’,p3,p4);
id Num 4
id
-
P1 P2
P3 P4
P5 +
Entry for a
Entry for c

Bottom up evaluation of S-attributed definitions
L  En
Print (val[top])
E  E1+T val[top]=val[top-2] + val[top]
E  T
T  T1*F val[top]=val[top-2] * val[top]
T  F
F  (E) val[top]=val[top-2] - val[top]
F  digit
Input State Val Production Used
3*5n - -
*5n 3 3
*5n F 3 Fdigit
*5n T 3 TF
5n T* 3
n T*5 3,5
n T*F 3,5 Fdigit
n T 15 TT1*F
n E 15 ET
En 15
L 15 L  En
Implementation of a desk calculator
with bottom up parser
Move made by translator

L-Attributed definitions
• A syntax directed definition is L-attributed if each inherited attribute
of , , on the right side of depends only on:
1. The attributes of the symbols j-1to the left of in the production and
2. The inherited attribute of A.
• Example:
• Above syntax directed definition is not L-attributed because the inherited
attribute Q.i of the grammar symbol Q depends on the attribute R.s of
the grammar symbol to its right.
Production Semantic Rules
A LM L.i:=l(A.i)
M.i=m(L.s)
A.s=f(M.s)
A QR R.i=r(A.i)
Q.i=q(R.s)
A.s=f(Q.s)
L- Attributed
Not L- Attributed
AXYZ

Bottom up evaluation of S-attributed definitions
• Translation scheme is a context free grammar in which attributes are
associated with the grammar symbols and semantic actions enclosed
between braces { } are inserted within the right sides of productions.
• Attributes are used to evaluate the expression along the process of
parsing.
• During the process of parsing the evaluation of attribute takes place by
consulting the semantic action enclosed in { }.
• A translation scheme generates the output by executing the semantic
actions in an ordered manner.
• This process uses the depth first traversal.

Example: Translation scheme (Infix to postfix notation)
ETR
R addop R1 | 𝜖
T num
String: 9-5+2
E
T R
- R
𝜖
9 {Print(9)} T {Print(-)}
5 {Print(5)} + R
T {Print(+)}
2 {Print(2)}
Postfix=95-2+
Now, Perform Depth first traversal

Types of Errors
Errors
Compile time Run time
Lexical
Phase error
Syntactic
Phase error
Semantic
Phase error

Lexical error
• Lexical errors can be detected during lexical analysis phase.
• Typical lexical phase errors are:
1. Spelling errors
2. Exceeding length of identifier or numeric constants
3. Appearance of illegal characters
• Example:
fi ( )
{
}
• In above code 'fi' cannot be recognized as a misspelling of keyword if rather
lexical analyzer will understand that it is an identifier and will return it as
valid identifier.
• Thus misspelling causes errors in token formation.

Syntax error
• Syntax error appear during syntax analysis phase of compiler.
• Typical syntax phase errors are:
1. Errors in structure
2. Missing operators
3. Unbalanced parenthesis
• The parser demands for tokens from lexical analyzer and if the tokens do
not satisfy the grammatical rules of programming language then the
syntactical errors get raised.
• Example:
printf(“Hello World !!!”) Error: Semicolon missing

Semantic error
• Semantic error detected during semantic analysis phase.
• Typical semantic phase errors are:
1. Incompatible types of operands
2. Undeclared variable
3. Not matching of actual argument with formal argument
• Example:
id1=id2+id3*60 (Note: id1, id2, id3 are real)
(Directly we can not perform multiplication due to incompatible types of variables)

Error recovery strategies
(Ad-Hoc & systematic
methods)

Error recovery strategies (Ad-Hoc & systematic methods)
• There are mainly four error recovery strategies:
1. Panic mode
2. Phrase level recovery
3. Error production
4. Global generation

Panic mode
• In this method on discovering error, the parser discards input symbol
one at a time. This process is continued until one of a designated set of
synchronizing tokens is found.
• Synchronizing tokens are delimiters such as semicolon or end.
• These tokens indicate an end of the statement.
• If there is less number of errors in the same statement then this strategy
is best choice.
fi ( )
{
}
Scan entire line otherwise scanner will return fi as valid identifier

Phrase level recovery
• In this method, on discovering an error parser performs local correction
on remaining input.
• The local correction can be replacing comma by semicolon, deletion of
semicolons or inserting missing semicolon.
• This type of local correction is decided by compiler designer.
• This method is used in many error-repairing compilers.

Error production
• If we have good knowledge of common errors that might be encountered,
then we can augment the grammar for the corresponding language with
error productions that generate the erroneous constructs.
• Then we use the grammar augmented by these error production to
construct a parser.
• If error production is used then, during parsing we can generate
appropriate error message and parsing can be continued.

Global correction
• Given an incorrect input string x and grammar G, the algorithm will find
a parse tree for a related string y, such that number of insertions,
deletions and changes of token require to transform x into y is as small as
possible.
• Such methods increase time and space requirements at parsing time.
• Global correction is thus simply a theoretical concept.

Module 3 - Semantic Analysis(Compiler Design).pptx

More Related Content

Similar to Module 3 - Semantic Analysis(Compiler Design).pptx

Recently uploaded

Module 3 - Semantic Analysis(Compiler Design).pptx