4 compiler lab - Syntax Ana

COMPILER ENGINEERING
LAB # 4: SYNTAX ANALYSIS (PARSING)

LEXICAL ANALYSIS SUMMARY

1. Start a new token
2. Read the 1st character and start recognizing its type
according to the algorithm specified on Slide 3
3. Pass the token (lexeme type) and its value attribute
to the parser
4. Repeat steps 1-3
5. Continue until the end of the input

Department of Computer Science - Compiler Engineering Lab
10-14/3/12

Start New TOKEN

Read 1st Character

• If is Digit? TOKEN = NUM
• If is Letter? Read Following Characters
  • If any is digit or _? TOKEN = ID
  • If all letters? Is a Keyword? TOKEN = KEYWORD
• Is RELOP? (>, <, !, =) Is 2nd Char (=)? TOKEN = RELOP
• Is AROP? (+, -, /, *, =) TOKEN = AROP
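
The decision steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the lab's reference solution: the `KEYWORDS` set and the `tokenize` function are hypothetical names, an all-letter word that is not a keyword is assumed to be an ID, and a lone `=` is assumed to be an AROP (assignment) while `==`, `!=`, `<=`, `>=` are RELOPs.

```python
# Assumed keyword list for illustration only; the slides do not define one.
KEYWORDS = {"if", "then", "else", "while", "do", "return", "int"}


def tokenize(text):
    """Classify lexemes as NUM, ID, KEYWORD, RELOP or AROP."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            # First character is a digit: the token is a NUM.
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("NUM", text[i:j]))
            i = j
        elif ch.isalpha():
            # First character is a letter: read the following characters.
            j = i
            while j < len(text) and (text[j].isalnum() or text[j] == "_"):
                j += 1
            word = text[i:j]
            # Only an all-letter word can be a keyword (assumption: otherwise ID).
            kind = "KEYWORD" if word.isalpha() and word in KEYWORDS else "ID"
            tokens.append((kind, word))
            i = j
        elif ch in "<>!=" and text[i + 1 : i + 2] == "=":
            # Relational operator whose 2nd character is '='.
            tokens.append(("RELOP", text[i : i + 2]))
            i += 2
        elif ch in "<>":
            tokens.append(("RELOP", ch))
            i += 1
        elif ch in "+-/*=":
            tokens.append(("AROP", ch))
            i += 1
        else:
            # Characters that cannot start any token are lexical errors.
            raise ValueError(f"illegal character {ch!r} at position {i}")
    return tokens
```

For example, `tokenize("position = initial + rate * 60")` yields three IDs, three AROPs and one NUM, in source order.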


SYNTAX ANALYSIS (PARSING)

• Is the process of analyzing a text, made of a
sequence of tokens, to determine its grammatical
structure with respect to a given (more or less)
formal grammar
• Builds an Abstract Syntax Tree (AST)
• Is part of an interpreter or a compiler
• Creates some form of Internal Representation (IR)
• Programming languages tend to be specified with
context-free grammars, because efficient, fast
parsers can be written for them


PHASE 2 : SYNTAX ANALYSIS

• Sometimes also called Syntax Checking
• Ensures that:
  • the code is grammatically valid (without worrying about
  its meaning)
  • and can be sequenced into an executable program
• The syntax analyzer applies rules to the code,
for example:
  • checking that each opening brace has a
  corresponding closing brace,
  • that each declaration has a type,
  • and that the type exists, etc.
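
One of the rules above, matching each opening brace with a closing one, can be sketched with a stack. The function name `braces_balanced` and the token representation (a list of plain strings) are assumptions made for this illustration.

```python
OPENERS = {"(", "{"}
CLOSERS = {")": "(", "}": "{"}


def braces_balanced(tokens):
    """Return True if every opening brace/parenthesis has a matching close."""
    stack = []
    for tok in tokens:
        if tok in OPENERS:
            stack.append(tok)
        elif tok in CLOSERS:
            # A closer must match the most recently opened delimiter.
            if not stack or stack.pop() != CLOSERS[tok]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack
```

A real parser would report where the mismatch occurred; this sketch only answers yes or no.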


CONTEXT-FREE GRAMMAR

• Defines the components that form an expression and
the order in which they must appear
• A context-free grammar is a set of rules specifying
how syntactic elements in some language can be
formed from simpler ones
• The grammar specifies allowable ways to combine
tokens (called terminals) into higher-level syntactic
elements (called non-terminals)
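
As a rough illustration of terminals and non-terminals, a grammar can be held as plain data. The tiny grammar below is hypothetical (its rule names `Stmt` and `Expr` are chosen for this sketch); any symbol that never appears on the left-hand side of a rule is treated as a terminal.

```python
# Each non-terminal maps to a list of alternatives; each alternative is a
# sequence of terminals and non-terminals.
GRAMMAR = {
    "Stmt": [["ID", "=", "Expr"]],
    "Expr": [["Expr", "+", "Expr"], ["Expr", "*", "Expr"], ["ID"], ["NUM"]],
}

# Non-terminals are the symbols that have rules of their own.
NON_TERMINALS = set(GRAMMAR)

# Terminals (tokens) are every other symbol used inside the rules.
TERMINALS = {
    symbol
    for alternatives in GRAMMAR.values()
    for alternative in alternatives
    for symbol in alternative
    if symbol not in GRAMMAR
}
```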


CONTEXT-FREE GRAMMAR

• Ex.:
  • Any ID is an expression (Preferred to say TOKEN)
  • Any Number is an expression (Preferred to say TOKEN)
  • If Expr1 and Expr2 are expressions, then:
    • Expr1 + Expr2 is an expression
    • Expr1 * Expr2 is an expression
  • If Id1 is an ID and Expr2 is an expression, then:
    • Id1 = Expr2 is a statement
  • If Expr1 is an expression and Statement2 is a statement, then:
    • While (Expr1) do Statement2
    • If (Expr1) then Statement2
    are statements
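
The example rules can be turned into a small recursive-descent parser. This is a sketch under assumptions the slide does not state: tokens arrive as plain strings, `*` binds tighter than `+`, and the names `Parser`, `parse`, `is_id` and `KEYWORDS` are hypothetical. It returns nested tuples of the form `(operator, left, right)`.

```python
KEYWORDS = {"while", "do", "if", "then"}


def is_id(tok):
    return tok is not None and tok.isidentifier() and tok not in KEYWORDS


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, value):
        if self.peek() != value:
            raise SyntaxError(f"expected {value!r}, got {self.peek()!r}")
        self.pos += 1

    def factor(self):
        # Any ID or number is an expression.
        tok = self.peek()
        if is_id(tok) or (tok is not None and tok.isdigit()):
            return self.advance()
        raise SyntaxError(f"expected ID or NUM, got {tok!r}")

    def term(self):
        # Expr1 * Expr2 is an expression.
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())
        return node

    def expr(self):
        # Expr1 + Expr2 is an expression.
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())
        return node

    def statement(self):
        tok = self.peek()
        if tok in ("while", "if"):
            # While (Expr1) do Statement2 / If (Expr1) then Statement2
            self.advance()
            self.expect("(")
            cond = self.expr()
            self.expect(")")
            self.expect("do" if tok == "while" else "then")
            return (tok, cond, self.statement())
        if is_id(tok):
            # Id1 = Expr2 is a statement.
            self.advance()
            self.expect("=")
            return ("=", tok, self.expr())
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.statement()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

For instance, `parse("position = initial + rate * 60".split())` produces a tree whose root is the assignment and whose leaves are the IDs and the number.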


GRAMMAR & AST

TOKEN (terminals) = Leaf
Expressions, Statements (Non-Terminals) = Nodes

Stream of Characters → Lexical Analysis → Stream of TOKENs

Stream of TOKENs → Syntax Analysis → Abstract Syntax Tree (AST)

[Figure: Tokens → Syntax Analyzer (Parser)]

SYMBOL TABLE

• A Symbol Table (ST) is a data structure containing a
record for each identifier, with fields for the
attributes of the ID
• Tokens formed are recorded in the ST
• Purpose:
  • To analyze expressions and statements, which requires a
  hierarchical (nested) structure
  • The data structure allows us to find, retrieve and store a
  record for an ID quickly
  • For example: the Semantic Analysis and Code Generation
  phases retrieve an ID's type for type checking and
  implementation purposes


SYMBOL TABLE MANAGEMENT

• The Symbol Table may contain any of the following
information:
  • For an ID:
    • The storage allocated for the ID
    • Its type
    • Its scope (where it’s valid in the program)
  • For a function, also:
    • Number of arguments
    • Types of arguments
    • Passing method (by reference or by value)
    • Return type
• Identifiers will be added if they don’t already exist
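
A minimal sketch of such a table, assuming a dictionary keyed by identifier name. The class and field names (`Symbol`, `SymbolTable`, `insert`, `lookup`) are hypothetical, and nested scopes are reduced to a single string field for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Symbol:
    name: str
    type: Optional[str] = None       # e.g. "integer"; may be filled in later
    scope: str = "global"            # where the ID is valid
    storage: Optional[int] = None    # storage allocated for the ID
    # Function-only attributes
    arg_types: list = field(default_factory=list)  # number of args = len(arg_types)
    passing: Optional[str] = None    # "by value" or "by reference"
    return_type: Optional[str] = None


class SymbolTable:
    def __init__(self):
        self._records = {}

    def insert(self, name, **attrs):
        """Add the identifier only if it does not already exist."""
        if name not in self._records:
            self._records[name] = Symbol(name, **attrs)
        return self._records[name]

    def lookup(self, name):
        """Return the record for `name`, or None if it was never added."""
        return self._records.get(name)
```

A dictionary gives fast find, retrieve and store operations, which is the main requirement stated for the table.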


SYMBOL TABLE MANAGEMENT

• Not all attributes can be determined by the
lexical analyzer, because of its linear nature
  • E.g. dim a, x as integer
  • In this example, when the analyzer sees the IDs
  it has not yet reached the type keyword
• So the following phases complete the IDs'
attributes and use them as well
  • For example: the storage location attribute is assigned by
  the code generator phase
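
The `dim a, x as integer` case can be sketched as two steps: record the IDs with unknown attributes, then fill in the type once the keyword is reached. The helper names below are hypothetical and the table is a plain dictionary for brevity.

```python
def record_ids(table, names):
    """Step 1: the IDs are seen first, so their type is still unknown."""
    for name in names:
        table.setdefault(name, {"type": None, "storage": None})


def fill_type(table, names, declared_type):
    """Step 2: the type keyword has been reached; complete the records."""
    for name in names:
        table[name]["type"] = declared_type


def declare(table, tokens):
    """Process a declaration given as tokens, e.g. dim a , x as integer."""
    if not tokens or tokens[0] != "dim" or "as" not in tokens:
        raise SyntaxError("expected: dim <ids> as <type>")
    as_index = tokens.index("as")
    names = [t for t in tokens[1:as_index] if t != ","]
    record_ids(table, names)
    fill_type(table, names, tokens[as_index + 1])
    # "storage" stays None here: a later phase (code generation) assigns it.
    return table
```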


ERROR DETECTION & REPORTING

• For the compilation process to proceed
correctly, each phase must:
  • Detect any error
  • Deal with the detected error(s)
• Error detection:
  • Most errors are detected in Syntax + Semantic Analysis
  • In Lexical Analysis: characters aren’t legal for token
  formation
  • In Syntax Analysis: the token stream violates structure rules
  • In Semantic Analysis: correct structure but invalid
  meaning (e.g. ID = Array Name + Function Name)
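
The semantic case above (adding an array name to a function name) is structurally valid but meaningless. A rough sketch of such a check, assuming a hypothetical symbol table that maps each name to a kind string such as "variable", "array" or "function":

```python
def check_addition(symbols, left, right):
    """Return a list of semantic errors for the expression `left + right`."""
    errors = []
    for name in (left, right):
        kind = symbols.get(name)
        if kind is None:
            errors.append(f"'{name}' is not declared")
        elif kind != "variable":
            # The structure is fine, but this operand has no addable value.
            errors.append(f"'{name}' is of kind {kind} and cannot be added")
    return errors
```

Returning a list instead of raising on the first problem lets the phase report every error it finds and still continue.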


COMPILER PHASES

LEXICAL ANALYZER & SYMBOL TABLE

Token ID   Token Type   Token Value   Location
Id1        ID           position
Expr1      AROP         ASS
Id2        ID           Initial
Expr2      AROP         SUM
Id3        ID           Rate
Expr3      AROP         MUL
N1         NUM          60

SYNTAX ANALYZER & SYMBOL TABLE

A LEAF is a record with two or more fields:
one to identify the TOKEN and the others to hold its
attributes

Token ID   Token Type   Token Value   Location
Id1        ID           position
Expr1      AROP         ASS
Id2        ID           Initial
Expr2      AROP         SUM
Id3        ID           Rate
Expr3      AROP         MUL
N1         NUM          60

An interior NODE is a record with a field for the operator
and two fields of pointers to the left and right children

Operator   Left Child (Pointer)   Right Child (Pointer)
Expr1      Id1                    Expr2
Expr2      Id2                    Expr3
Expr3      Id3                    N1
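
The leaf and interior-node records can be sketched as two small classes. The class and field names are assumptions for this illustration; the tree built below follows the rows of the table above (Expr1 = ASS, Expr2 = SUM, Expr3 = MUL), and the `SYMBOLS` mapping from operator names to characters is likewise assumed.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    token_type: str   # identifies the token, e.g. "ID" or "NUM"
    value: str        # attribute, e.g. "Rate" or "60"


@dataclass
class Node:
    operator: str                  # e.g. "ASS", "SUM", "MUL"
    left: Union["Node", Leaf]      # pointer to the left child
    right: Union["Node", Leaf]     # pointer to the right child


SYMBOLS = {"ASS": "=", "SUM": "+", "MUL": "*"}


def to_infix(tree):
    """Walk the tree and rebuild a fully parenthesized expression string."""
    if isinstance(tree, Leaf):
        return tree.value
    return f"({to_infix(tree.left)} {SYMBOLS[tree.operator]} {to_infix(tree.right)})"


# Build the tree bottom-up: Expr3 first, then Expr2, then Expr1.
expr3 = Node("MUL", Leaf("ID", "Rate"), Leaf("NUM", "60"))
expr2 = Node("SUM", Leaf("ID", "Initial"), expr3)
expr1 = Node("ASS", Leaf("ID", "position"), expr2)
```

Telling a node from a leaf is then a plain `isinstance` check, as `to_infix` shows.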


TASK 1: THINK AS A COMPILER!

• Analyze the following program syntactically:

int main()
{
    std::cout << "hello world" << std::endl;
    return 0;
}


LEXICAL ANALYZER OUTPUT

• 1 = string "int"
• 2 = string "main"
• 3 = opening parenthesis
• 4 = closing parenthesis
• 5 = opening brace
• 6 = string "std"
• 7 = namespace operator
• 8 = string "cout"
• 9 = << operator
• 10 = string ""hello world""
• 11 = string "endl"
• 12 = semicolon
• 13 = string "return"
• 14 = number 0
• 15 = closing brace
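
If the numbers above are read as one number per distinct token (an interpretation, since the slide does not say so), the program becomes a stream of those numbers in which repeated tokens such as `<<`, `std` and `;` reuse their number. A sketch with a hypothetical regular expression that covers only the lexemes of this program:

```python
import re

# Order matters: string literals and two-character operators come first.
TOKEN_RE = re.compile(r'"[^"]*"|::|<<|[A-Za-z_]\w*|\d+|[(){};]')


def number_tokens(source):
    """Return (table, stream): one number per distinct lexeme, plus the stream."""
    table = {}
    stream = []
    # findall silently skips whitespace and any character the pattern omits.
    for lexeme in TOKEN_RE.findall(source):
        if lexeme not in table:
            table[lexeme] = len(table) + 1
        stream.append(table[lexeme])
    return table, stream


source = 'int main() { std::cout << "hello world" << std::endl; return 0; }'
table, stream = number_tokens(source)
```

Here `table` holds 15 entries, numbered in order of first appearance, and `stream` holds 19 numbers, one per token in the program.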


TASK 2: A STATEMENT AST

• Create an abstract syntax tree for the following
code for the Euclidean algorithm:
while b ≠ 0
    if a > b
        a := a − b
    else
        b := b − a
return a
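
One possible shape for this tree, written as nested tuples, is sketched below together with a tiny interpreter that walks it. The node labels ("sequence", "assign" and so on) and the function names are choices made for this illustration, not a required answer format.

```python
import operator

# Interior nodes are tuples (label, children...); leaves are names or numbers.
EUCLID_AST = (
    "sequence",
    ("while",
        ("!=", "b", 0),
        ("if",
            (">", "a", "b"),
            ("assign", "a", ("-", "a", "b")),
            ("assign", "b", ("-", "b", "a")))),
    ("return", "a"),
)

BINARY = {"!=": operator.ne, ">": operator.gt, "-": operator.sub}


def evaluate(node, env):
    """Evaluate an expression node: a number, a variable name, or a binary op."""
    if isinstance(node, int):
        return node
    if isinstance(node, str):
        return env[node]
    op, left, right = node
    return BINARY[op](evaluate(left, env), evaluate(right, env))


def execute(node, env):
    """Run a statement node; a non-None result means `return` was reached."""
    kind = node[0]
    if kind == "sequence":
        for child in node[1:]:
            result = execute(child, env)
            if result is not None:
                return result
    elif kind == "while":
        while evaluate(node[1], env):
            result = execute(node[2], env)
            if result is not None:
                return result
    elif kind == "if":
        return execute(node[2] if evaluate(node[1], env) else node[3], env)
    elif kind == "assign":
        env[node[1]] = evaluate(node[2], env)
    elif kind == "return":
        return evaluate(node[1], env)
    return None
```

Calling `execute(EUCLID_AST, {"a": 48, "b": 18})` walks the tree with positive inputs and returns their greatest common divisor.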


LAB ASSIGNMENT

Write the Syntax Analyzer components and make sure
they fulfill the following:
• Create a Symbol Table (for all types including IDs,
Functions, etc.)
• Fill the Symbol Table with the Tokens extracted by the
Lexical Analysis phase
• Differentiate between Node and Leaf
• Apply grammar rules (tokens, expressions,
statements)


QUESTIONS?

Thank you for listening 
