2/8/2008 Prof. Hilfinger CS164 Lecture 8 1
Introduction to Parsing
Lecture 8
Adapted from slides by G. Necula and R. Bodik
Outline
• Limitations of regular languages
• Parser overview
• Context-free grammars (CFGs)
• Derivations
• Syntax-Directed Translation
Languages and Automata
• Formal languages are very important in CS
– Especially in programming languages
• Regular languages
– The weakest formal languages widely used
– Many applications
• We will also study context-free languages
Limitations of Regular Languages
• Intuition: A finite automaton that runs long
enough must repeat states
• Finite automaton can’t remember # of times it
has visited a particular state
• Finite automaton has finite memory
– Only enough to store in which state it is
– Cannot count, except up to a finite limit
• E.g., the language of balanced parentheses is not
regular: { (^i )^i | i ≥ 0 }
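To make the counting argument concrete, here is a minimal Python sketch of a recognizer for { (^i )^i | i ≥ 0 }. The function name `matches_nested_parens` is made up for illustration. The point is the variable `depth`: it is an unbounded counter, which is exactly the kind of memory a finite automaton does not have.

```python
def matches_nested_parens(s: str) -> bool:
    """Return True if s is i open parens followed by i close parens, i >= 0."""
    depth = 0           # unbounded counter: no fixed number of states can replace it
    seen_close = False  # once a ')' is seen, no further '(' is allowed
    for ch in s:
        if ch == "(":
            if seen_close:
                return False
            depth += 1
        elif ch == ")":
            seen_close = True
            depth -= 1
            if depth < 0:   # more ')' than '(' so far
                return False
        else:
            return False    # any other character is outside the language
    return depth == 0
```

For example, `matches_nested_parens("((()))")` is true, while `"(()"` and `"()()"` are rejected.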
The Structure of a Compiler
Source → Lexical analysis → Tokens → Parsing → Interm. Language → Code Gen. → Machine Code
(Optimization: applied to the Interm. Language)
(Today we start: Parsing)
The Functionality of the Parser
• Input: sequence of tokens from lexer
• Output: abstract syntax tree of the program
Example
• Pyth: if x == y: z = 1
        else: z = 2
• Parser input: IF ID == ID : ID = INT  ELSE : ID = INT 
• Parser output (abstract syntax tree):
IF-THEN-ELSE
  ==
    ID
    ID
  =
    ID
    INT
  =
    ID
    INT
Why A Tree?
• Each stage of the compiler has two purposes:
– Detect and filter out some class of errors
– Compute some new information or translate the
representation of the program to make things
easier for later stages
• Recursive structure of tree suits recursive
structure of language definition
• With a tree, later stages can easily find, e.g., “the
else clause” rather than having to scan through the
tokens to find it.
Comparison with Lexical Analysis
Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Syntax tree
The Role of the Parser
• Not all sequences of tokens are programs . . .
• . . . Parser must distinguish between valid and
invalid sequences of tokens
• We need
– A language for describing valid sequences of tokens
– A method for distinguishing valid from invalid
sequences of tokens
Programming Language Structure
• Programming languages have recursive structure
• Consider the language of arithmetic expressions
with integers, +, *, and ( )
• An expression is either:
– an integer
– an expression followed by “+” followed by an expression
– an expression followed by “*” followed by an expression
– a “(” followed by an expression followed by “)”
• int , int + int , ( int + int) * int are expressions
Notation for Programming Languages
• An alternative notation:
E → int
E → E + E
E → E * E
E → ( E )
• We can view these rules as rewrite rules
– We start with E and replace occurrences of E with
some right-hand side
• E ⇒ E * E ⇒ ( E ) * E ⇒ ( E + E ) * E ⇒ …
⇒ (int + int) * int
Observation
• All arithmetic expressions can be obtained by
a sequence of replacements
• Any sequence of replacements forms a valid
arithmetic expression
• This means that we cannot obtain
( int ) )
by any sequence of replacements. Why?
• This set of rules is a context-free grammar
Context-Free Grammars
• A CFG consists of
– A set of non-terminals N
• By convention, written with capital letter in these notes
– A set of terminals T
• By convention, either lower case names or punctuation
– A start symbol S (a non-terminal)
– A set of productions
• Assuming E ∈ N
E → ε , or
E → Y1 Y2 ... Yn where Yi ∈ N ∪ T
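As a rough illustration, the four components of a CFG can be written down directly as a data structure. The Python sketch below is only one possible encoding: the class name `Grammar`, its field names, and the `check` method are assumptions made for this example, and an empty tuple is used to stand for ε.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S, must be a member of N
    productions: tuple        # pairs (lhs, rhs); rhs is a tuple of symbols, () means ε

    def check(self) -> bool:
        """Verify the side conditions from the definition above."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                lhs in self.nonterminals and all(y in symbols for y in rhs)
                for lhs, rhs in self.productions
            )
        )


# The arithmetic-expression grammar used throughout the lecture.
EXPR = Grammar(
    nonterminals=frozenset({"E"}),
    terminals=frozenset({"int", "+", "*", "(", ")"}),
    start="E",
    productions=(
        ("E", ("int",)),
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
    ),
)
```

Here `EXPR.check()` holds, while a grammar whose right-hand side mentions a symbol outside N ∪ T fails the check.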
Examples of CFGs
Simple arithmetic expressions:
E → int
E → E + E
E → E * E
E → ( E )
– One non-terminal: E
– Several terminals: int, +, *, (, )
• Called terminals because they are never replaced
– By convention the non-terminal for the first
production is the start one
The Language of a CFG
Read productions as replacement rules:
X → Y1 ... Yn
Means X can be replaced by Y1 ... Yn
X → ε
Means X can be erased (replaced with empty string)
Key Idea
1. Begin with a string consisting of the start
symbol “S”
2. Replace any non-terminal X in the string by a
right-hand side of some production
X → Y1 … Yn
3. Repeat (2) until there are only terminals in
the string
4. The successive strings created in this way
are called sentential forms.
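The four steps above can be sketched in a few lines of Python. This is an illustrative simplification, not part of the lecture: the names `PRODUCTIONS` and `derive` are made up, and the sketch always rewrites the leftmost non-terminal, whereas step 2 allows any non-terminal to be chosen.

```python
# Productions of the expression grammar, indexed by non-terminal.
PRODUCTIONS = {
    "E": [["int"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]],
}


def derive(start, choices):
    """Return the sentential forms obtained by applying one production per
    entry of `choices` (an index into PRODUCTIONS) to the leftmost non-terminal.
    Raises StopIteration if a choice is given but no non-terminal is left."""
    form = [start]
    forms = [form]
    for choice in choices:
        # Position of the leftmost symbol that can still be replaced.
        i = next(k for k, sym in enumerate(form) if sym in PRODUCTIONS)
        form = form[:i] + PRODUCTIONS[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms


# E => E + E => E * E + E => int * E + E => int * int + E => int * int + int
forms = derive("E", [1, 2, 0, 0, 0])
```

Every element of `forms` is a sentential form; the last one contains only terminals.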
The Language of a CFG (Cont.)
More formally, may write
X1 … Xi-1 Xi Xi+1 … Xn ⇒ X1 … Xi-1 Y1 … Ym Xi+1 … Xn
if there is a production
Xi → Y1 … Ym
The Language of a CFG (Cont.)
Write
X1 … Xn ⇒* Y1 … Ym
if
X1 … Xn ⇒ … ⇒ … ⇒ Y1 … Ym
in 0 or more steps
The Language of a CFG
Let G be a context-free grammar with start
symbol S. Then the language of G is:
L(G) = { a1 … an | S ⇒* a1 … an and every ai
is a terminal }
Examples:
• S → 0 also written as S → 0 | 1
S → 1
Generates the language { “0”, “1” }
• What about S → 1 A
A → 0 | 1
• What about S → 1 A
A → 0 | 1 A
• What about S → ε | ( S )
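One way to explore the "What about" questions is to enumerate short strings of each language mechanically. The Python sketch below is a hedged illustration: the function `language` and the dictionary encoding are made up here, and the search only terminates under the assumption (true for these three grammars) that every production with a non-terminal on its right-hand side also adds at least one terminal.

```python
from collections import deque


def language(productions, start, max_len):
    """Strings of at most max_len terminals derivable from start.
    productions maps each non-terminal to a list of right-hand sides;
    an empty list [] stands for ε."""
    seen, results = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Leftmost non-terminal, or None if the form is all terminals.
        idx = next((k for k, s in enumerate(form) if s in productions), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            n_terminals = sum(s not in productions for s in new)
            # Terminals are never removed, so longer forms can be pruned.
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


g2 = {"S": [["1", "A"]], "A": [["0"], ["1"]]}
g3 = {"S": [["1", "A"]], "A": [["0"], ["1", "A"]]}
g4 = {"S": [[], ["(", "S", ")"]]}
```

With these definitions, `language(g2, "S", 4)` gives the two strings 10 and 11, `language(g3, "S", 4)` gives 10, 110, 1110 (one or more 1s followed by a 0), and `language(g4, "S", 4)` gives the empty string, () and (()).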
Pyth Example
A fragment of Pyth:
Compound → while Expr: Block
| if Expr: Block Elses
Elses → ε | else: Block | elif Expr: Block Elses
Block → Stmt_List | Suite
(Formal language papers use one-character non-terminals,
but we don’t have to!)
Derivations and Parse Trees
• A derivation is a sequence of sentential forms
resulting from the application of a sequence of
productions
S ⇒ … ⇒ …
• A derivation can be represented as a parse tree
– Start symbol is the tree’s root
– For a production X → Y1 … Yn add children
Y1, …, Yn to node X
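The construction just described can be sketched in Python as follows. This is an assumption-laden illustration rather than the course's code: the names `Node`, `leftmost_unexpanded`, `tree_from_leftmost` and `leaves` are invented, and the production sequence is assumed to be given in leftmost order.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = None   # None: not expanded yet; []: expanded to ε


def leftmost_unexpanded(node, nonterminals):
    """Find the leftmost non-terminal node that has no children yet."""
    if node.children is None:
        return node if node.symbol in nonterminals else None
    for child in node.children:
        found = leftmost_unexpanded(child, nonterminals)
        if found is not None:
            return found
    return None


def tree_from_leftmost(start, steps, nonterminals):
    """steps: the productions (lhs, rhs) used by a leftmost derivation, in order."""
    root = Node(start)                      # the start symbol is the root
    for lhs, rhs in steps:
        node = leftmost_unexpanded(root, nonterminals)
        assert node is not None and node.symbol == lhs
        node.children = [Node(y) for y in rhs]   # add children Y1, ..., Yn
    return root


def leaves(node):
    """Left-to-right sequence of leaf symbols."""
    if node.children is None:
        return [node.symbol]
    return [s for child in node.children for s in leaves(child)]


steps = [
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["int"]),
    ("E", ["int"]),
    ("E", ["int"]),
]
root = tree_from_leftmost("E", steps, {"E"})
```

Reading the leaves of `root` from left to right gives back int * int + int.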
Derivation Example
• Grammar
E → E + E | E * E | (E) | int
• String
int * int + int
Derivation Example (Cont.)
E
⇒ E + E
⇒ E * E + E
⇒ int * E + E
⇒ int * int + E
⇒ int * int + int
Derivation in Detail (1)
E
Parse tree so far (brackets enclose a node's children): E
Derivation in Detail (2)
E
⇒ E + E
Parse tree so far: E[ E + E ]
Derivation in Detail (3)
E
⇒ E + E
⇒ E * E + E
Parse tree so far: E[ E[ E * E ] + E ]
Derivation in Detail (4)
E
⇒ E + E
⇒ E * E + E
⇒ int * E + E
Parse tree so far: E[ E[ E[ int ] * E ] + E ]
Derivation in Detail (5)
E
⇒ E + E
⇒ E * E + E
⇒ int * E + E
⇒ int * int + E
Parse tree so far: E[ E[ E[ int ] * E[ int ] ] + E ]
Derivation in Detail (6)
E
⇒ E + E
⇒ E * E + E
⇒ int * E + E
⇒ int * int + E
⇒ int * int + int
Parse tree: E[ E[ E[ int ] * E[ int ] ] + E[ int ] ]
Notes on Derivations
• A parse tree has
– Terminals at the leaves
– Non-terminals at the interior nodes
• A left-right traversal of the leaves is the
original input
• The parse tree shows the association of
operations; the input string does not!
– There may be multiple ways to match the input
– Derivations (and parse trees) choose one
The Payoff: parser as a translator
syntax-directed translation:
stream of tokens → parser → ASTs, or assembly code
parser = syntax + translation rules
(typically hardcoded in the parser)
Mechanism of syntax-directed translation
• syntax-directed translation is done by
extending the CFG
– a translation rule is defined for each production
given
X → d A B c
the translation of X is defined recursively using
• translation of nonterminals A, B
• values of attributes of terminals d, c
• constants
To translate an input string:
1. Build the parse tree.
2. Working bottom-up
• Use the translation rules to compute the translation of each
nonterminal in the tree
Result: the translation of the string is the translation of the parse
tree's root nonterminal.
Why bottom up?
• a nonterminal's value may depend on the value of the symbols
on the right-hand side,
• so translate a non-terminal node only after children
translations are available.
Example 1: Arithmetic expression to value
Syntax-directed translation rules:
E1 → E2 + T    E1.trans = E2.trans + T.trans
E → T          E.trans = T.trans
T1 → T2 * F    T1.trans = T2.trans * F.trans
T → F          T.trans = F.trans
F → int        F.trans = int.value
F → ( E )      F.trans = E.trans
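These rules can be tried out with a small Python sketch that walks a parse tree and computes each node's translation from its children's translations, i.e. bottom-up. The tuple encoding of the tree and the function name `trans` are assumptions made for this example: a node is `(symbol, child, ...)`, an integer token is `("int", value)`, and punctuation tokens are plain strings.

```python
def trans(node):
    """Value of a parse-tree node, computed from its children (bottom-up)."""
    symbol, *kids = node
    if symbol == "int":
        return kids[0]                               # int.value
    if symbol == "E":
        if len(kids) == 3:                           # E1 -> E2 + T
            return trans(kids[0]) + trans(kids[2])
        return trans(kids[0])                        # E -> T
    if symbol == "T":
        if len(kids) == 3:                           # T1 -> T2 * F
            return trans(kids[0]) * trans(kids[2])
        return trans(kids[0])                        # T -> F
    if symbol == "F":
        if len(kids) == 3:                           # F -> ( E )
            return trans(kids[1])
        return trans(kids[0])                        # F -> int
    raise ValueError(f"unexpected symbol {symbol!r}")


# Parse tree for 2 * (4 + 5)
inner = ("E", ("E", ("T", ("F", ("int", 4)))), "+", ("T", ("F", ("int", 5))))
tree = ("E", ("T", ("T", ("F", ("int", 2))), "*", ("F", "(", inner, ")")))
```

Because each call evaluates its children before combining them, `trans(inner)` yields 9 and `trans(tree)` yields 18.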
Example 1: Bison/Yacc Notation
E : E '+' T { $$ = $1 + $3; }
T : T '*' F { $$ = $1 * $3; }
F : int { $$ = $1; }
F : '(' E ')' { $$ = $2; }
• KEY: $$ : Semantic value of left-hand side
$n : Semantic value of nth symbol on
right-hand side
Example 1 (cont): Annotated Parse Tree
Input: 2 * (4 + 5)
E (18)
  T (18)
    T (2)
      F (2)
        int (2)
    *
    F (9)
      (
      E (9)
        E (4)
          T (4)
            F (4)
              int (4)
        +
        T (5)
          F (5)
            int (5)
      )
Example 2: Compute the type of an expression
E -> E + E      if $1 == INT and $3 == INT:
                    $$ = INT
                else: $$ = ERROR
E -> E and E    if $1 == BOOL and $3 == BOOL:
                    $$ = BOOL
                else: $$ = ERROR
E -> E == E     if $1 == $3 and $1 != ERROR:
                    $$ = BOOL
                else: $$ = ERROR
E -> true       $$ = BOOL
E -> false      $$ = BOOL
E -> int        $$ = INT
E -> ( E )      $$ = $2
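The same rules can be written as a small recursive function. The Python sketch below is only an illustration under stated assumptions: expressions are encoded as nested tuples such as `("+", left, right)` or `("paren", e)`, Python `bool` and `int` values stand for the `true`/`false` and `int` tokens, and the name `type_of` is made up.

```python
INT, BOOL, ERROR = "INT", "BOOL", "ERROR"


def type_of(e):
    """Type of an expression, computed from the types of its subexpressions."""
    if isinstance(e, bool):        # E -> true | false (checked before int,
        return BOOL                # because bool is a subclass of int in Python)
    if isinstance(e, int):         # E -> int
        return INT
    op, *args = e
    if op == "paren":              # E -> ( E )
        return type_of(args[0])
    left, right = type_of(args[0]), type_of(args[1])
    if op == "+":
        return INT if left == INT and right == INT else ERROR
    if op == "and":
        return BOOL if left == BOOL and right == BOOL else ERROR
    if op == "==":
        return BOOL if left == right and left != ERROR else ERROR
    return ERROR
```

For instance, `type_of(("==", ("paren", ("+", 2, 2)), 4))` is BOOL, while `type_of(("+", 2, True))` is ERROR.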
Example 2 (cont)
• Input: (2 + 2) == 4
E (BOOL)
  E (INT)
    (
    E (INT)
      E (INT)
        int (INT)
      +
      E (INT)
        int (INT)
    )
  ==
  E (INT)
    int (INT)
Building Abstract Syntax Trees
• In the examples so far, streams of tokens were
translated into
– integer values, or
– types
• Translating into ASTs is not very different
AST vs. Parse Tree
• An AST is a condensed form of a parse tree
– operators appear at internal nodes, not at leaves.
– "Chains" of single productions are collapsed.
– Lists are "flattened".
– Syntactic details are omitted
• e.g., parentheses, commas, semi-colons
• AST is a better structure for later compiler
stages
– omits details having to do with the source language,
– only contains information about the essential
structure of the program.
Example: 2 * (4 + 5) Parse tree vs. AST
Parse tree:
E
  T
    T
      F
        int (2)
    *
    F
      (
      E
        E
          T
            F
              int (4)
        +
        T
          F
            int (5)
      )
AST:
*
  2
  +
    4
    5
AST-building translation rules
E → E + T      $$ = new PlusNode($1, $3)
E → T          $$ = $1
T → T * F      $$ = new TimesNode($1, $3)
T → F          $$ = $1
F → int        $$ = new IntLitNode($1)
F → ( E )      $$ = $2
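A rough Python rendering of these rules is shown below. The node class names follow the slide (`PlusNode`, `TimesNode`, `IntLitNode`), but their fields, the tuple encoding of the parse tree, and the function `build_ast` are assumptions made for this sketch. Each branch mirrors one production and its action.

```python
from dataclasses import dataclass


@dataclass
class IntLitNode:
    value: int


@dataclass
class PlusNode:
    left: object
    right: object


@dataclass
class TimesNode:
    left: object
    right: object


def build_ast(node):
    """Apply the translation rule of the production used at this parse-tree node.
    A node is (symbol, child, ...); an integer token is ("int", value)."""
    symbol, *kids = node
    if symbol == "E":
        if len(kids) == 3:                                   # E -> E + T
            return PlusNode(build_ast(kids[0]), build_ast(kids[2]))
        return build_ast(kids[0])                            # E -> T
    if symbol == "T":
        if len(kids) == 3:                                   # T -> T * F
            return TimesNode(build_ast(kids[0]), build_ast(kids[2]))
        return build_ast(kids[0])                            # T -> F
    if symbol == "F":
        if len(kids) == 3:                                   # F -> ( E )
            return build_ast(kids[1])
        return IntLitNode(kids[0][1])                        # F -> int
    raise ValueError(f"unexpected symbol {symbol!r}")


# Parse tree for 2 * (4 + 5)
inner = ("E", ("E", ("T", ("F", ("int", 4)))), "+", ("T", ("F", ("int", 5))))
tree = ("E", ("T", ("T", ("F", ("int", 2))), "*", ("F", "(", inner, ")")))
ast = build_ast(tree)
```

The chains E → T → F and the parentheses leave no trace in `ast`, which is `TimesNode(IntLitNode(2), PlusNode(IntLitNode(4), IntLitNode(5)))`.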
Example: 2 * (4 + 5): Steps in Creating AST
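Parse tree for 2 * (4 + 5), with the semantic value (partial AST) shown at some nodes:
– at the nodes above int (2): 2
– at the E inside the parentheses: + with children 4 and 5
– at the root E: * with children 2 and (+ with children 4 and 5)
(Only some of the semantic values are shown)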
Leftmost and Rightmost Derivations
Leftmost derivation: always act on leftmost non-terminal
E
⇒ E + E
⇒ E * E + E
⇒ int * E + E
⇒ int * int + E
⇒ int * int + int
Rightmost derivation: always act on rightmost non-terminal
E
⇒ E + E
⇒ E + int
⇒ E * E + int
⇒ E * int + int
⇒ int * int + int
Rightmost Derivation in Detail (1)
E
Parse tree so far (brackets enclose a node's children): E
Rightmost Derivation in Detail (2)
E
⇒ E + E
Parse tree so far: E[ E + E ]
Rightmost Derivation in Detail (3)
E
⇒ E + E
⇒ E + int
Parse tree so far: E[ E + E[ int ] ]
Rightmost Derivation in Detail (4)
E
⇒ E + E
⇒ E + int
⇒ E * E + int
Parse tree so far: E[ E[ E * E ] + E[ int ] ]
Rightmost Derivation in Detail (5)
E
⇒ E + E
⇒ E + int
⇒ E * E + int
⇒ E * int + int
Parse tree so far: E[ E[ E * E[ int ] ] + E[ int ] ]
Rightmost Derivation in Detail (6)
E
⇒ E + E
⇒ E + int
⇒ E * E + int
⇒ E * int + int
⇒ int * int + int
Parse tree: E[ E[ E[ int ] * E[ int ] ] + E[ int ] ]
Aside: Canonical Derivations
• Take a look at that last derivation in reverse.
• The active part (red) tends to move left to
right.
• We call this a reverse rightmost or canonical
derivation.
• Comes up in bottom-up parsing. We’ll return
to it in a couple of lectures.
Derivations and Parse Trees
• For each parse tree there is exactly one
leftmost and one rightmost derivation
• The difference is the order in which branches
are added, not the structure of the tree.
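To see that one tree fixes both derivations, the Python sketch below reads the leftmost or the rightmost derivation off a given parse tree. The tuple encoding (`(symbol, child, ...)` for interior nodes, plain strings for terminals) and the function names are assumptions made for this illustration.

```python
def label(x):
    """Symbol of a tree node, or the terminal itself."""
    return x[0] if isinstance(x, tuple) else x


def derivation(tree, rightmost=False):
    """Sentential forms of the leftmost (or rightmost) derivation for tree."""
    form = [tree]
    steps = [" ".join(label(x) for x in form)]
    while True:
        # Positions of non-terminals that have not been expanded yet.
        pending = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        if not pending:
            return steps
        i = pending[-1] if rightmost else pending[0]
        # Replace the chosen non-terminal by its children in the tree.
        form = form[:i] + list(form[i][1:]) + form[i + 1:]
        steps.append(" ".join(label(x) for x in form))


# Parse tree for int * int + int
tree = ("E", ("E", ("E", "int"), "*", ("E", "int")), "+", ("E", "int"))
```

`derivation(tree)` passes through `int * E + E`, while `derivation(tree, rightmost=True)` passes through `E + int`; both start at `E`, end at `int * int + int`, and use the same tree.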
Summary of Derivations
• We are not just interested in whether
s ∈ L(G)
• Also need derivation (or parse tree) and AST.
• Parse trees slavishly reflect the grammar.
• Abstract syntax trees abstract from the grammar,
cutting out detail that interferes with later stages.
• A derivation defines a parse tree
– But one parse tree may have many derivations
• Derivations drive translation (to ASTs, etc.)
• Leftmost and rightmost derivations most important in
parser implementation

More Related Content

Similar to Compiler design.ppt

Dynamic Type Inference for Gradual Hindley–Milner Typing
Dynamic Type Inference for Gradual Hindley–Milner TypingDynamic Type Inference for Gradual Hindley–Milner Typing
Dynamic Type Inference for Gradual Hindley–Milner TypingYusuke Miyazaki
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdfTANZINTANZINA
 
Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...
Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...
Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...Magnus Sedlacek
 
c programming 2nd chapter pdf.PPT
c programming 2nd chapter pdf.PPTc programming 2nd chapter pdf.PPT
c programming 2nd chapter pdf.PPTKauserJahan6
 
A File is a collection of data stored in the secondary memory. So far data wa...
A File is a collection of data stored in the secondary memory. So far data wa...A File is a collection of data stored in the secondary memory. So far data wa...
A File is a collection of data stored in the secondary memory. So far data wa...bhargavi804095
 
Introduction to Problem Solving C Programming
Introduction to Problem Solving C ProgrammingIntroduction to Problem Solving C Programming
Introduction to Problem Solving C ProgrammingRKarthickCSEKIOT
 
33-Procedures-Switch Case Statements-27-03-2024.pdf
33-Procedures-Switch Case Statements-27-03-2024.pdf33-Procedures-Switch Case Statements-27-03-2024.pdf
33-Procedures-Switch Case Statements-27-03-2024.pdfYash218469
 
INTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptxINTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptxNimrahafzal1
 
Full Python in 20 slides
Full Python in 20 slidesFull Python in 20 slides
Full Python in 20 slidesrfojdar
 

Similar to Compiler design.ppt (20)

Dynamic Type Inference for Gradual Hindley–Milner Typing
Dynamic Type Inference for Gradual Hindley–Milner TypingDynamic Type Inference for Gradual Hindley–Milner Typing
Dynamic Type Inference for Gradual Hindley–Milner Typing
 
3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf3a. Context Free Grammar.pdf
3a. Context Free Grammar.pdf
 
Syntax part1
Syntax part1Syntax part1
Syntax part1
 
Syntax
SyntaxSyntax
Syntax
 
Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...
Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...
Data efficiency on BEAM - Choose the right data representation by Dmytro Lyto...
 
Chapter02.PPT
Chapter02.PPTChapter02.PPT
Chapter02.PPT
 
c programming 2nd chapter pdf.PPT
c programming 2nd chapter pdf.PPTc programming 2nd chapter pdf.PPT
c programming 2nd chapter pdf.PPT
 
A File is a collection of data stored in the secondary memory. So far data wa...
A File is a collection of data stored in the secondary memory. So far data wa...A File is a collection of data stored in the secondary memory. So far data wa...
A File is a collection of data stored in the secondary memory. So far data wa...
 
Introduction to Problem Solving C Programming
Introduction to Problem Solving C ProgrammingIntroduction to Problem Solving C Programming
Introduction to Problem Solving C Programming
 
Cfg part i
Cfg   part iCfg   part i
Cfg part i
 
33-Procedures-Switch Case Statements-27-03-2024.pdf
33-Procedures-Switch Case Statements-27-03-2024.pdf33-Procedures-Switch Case Statements-27-03-2024.pdf
33-Procedures-Switch Case Statements-27-03-2024.pdf
 
INTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptxINTRODUCTION TO PYTHON.pptx
INTRODUCTION TO PYTHON.pptx
 
Lecture8 syntax analysis_4
Lecture8 syntax analysis_4Lecture8 syntax analysis_4
Lecture8 syntax analysis_4
 
Lecture4 lexical analysis2
Lecture4 lexical analysis2Lecture4 lexical analysis2
Lecture4 lexical analysis2
 
anjaan007
anjaan007anjaan007
anjaan007
 
C Programming language
C Programming languageC Programming language
C Programming language
 
Clanguage
ClanguageClanguage
Clanguage
 
Clanguage
ClanguageClanguage
Clanguage
 
Clanguage
ClanguageClanguage
Clanguage
 
Full Python in 20 slides
Full Python in 20 slidesFull Python in 20 slides
Full Python in 20 slides
 

Recently uploaded

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 

Recently uploaded (20)

Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 

Compiler design.ppt

  • 1. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 1 Introduction to Parsing Lecture 8 Adapted from slides by G. Necula and R. Bodik
  • 2. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 2 Outline • Limitations of regular languages • Parser overview • Context-free grammars (CFG’s) • Derivations • Syntax-Directed Translation
  • 3. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 3 Languages and Automata • Formal languages are very important in CS – Especially in programming languages • Regular languages – The weakest formal languages widely used – Many applications • We will also study context-free languages
  • 4. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 4 Limitations of Regular Languages • Intuition: A finite automaton that runs long enough must repeat states • Finite automaton can’t remember # of times it has visited a particular state • Finite automaton has finite memory – Only enough to store in which state it is – Cannot count, except up to a finite limit • E.g., language of balanced parentheses is not regular: { (i )i | i  0}
  • 5. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 5 The Structure of a Compiler Source Tokens Interm. Language Lexical analysis Parsing Code Gen. Machine Code Today we start Optimization
  • 6. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 6 The Functionality of the Parser • Input: sequence of tokens from lexer • Output: abstract syntax tree of the program
  • 7. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 7 Example • Pyth: if x == y: z =1 else: z = 2 • Parser input: IF ID == ID : ID = INT  ELSE : ID = INT  • Parser output (abstract syntax tree): IF-THEN-ELSE == = = ID ID ID ID INT INT
  • 8. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 8 Why A Tree? • Each stage of the compiler has two purposes: – Detect and filter out some class of errors – Compute some new information or translate the representation of the program to make things easier for later stages • Recursive structure of tree suits recursive structure of language definition • With tree, later stages can easily find “the else clause”, e.g., rather than having to scan through tokens to find it.
  • 9. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 9 Comparison with Lexical Analysis Phase Input Output Lexer Sequence of characters Sequence of tokens Parser Sequence of tokens Syntax tree
  • 10. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 10 The Role of the Parser • Not all sequences of tokens are programs . . . • . . . Parser must distinguish between valid and invalid sequences of tokens • We need – A language for describing valid sequences of tokens – A method for distinguishing valid from invalid sequences of tokens
  • 11. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 11 Programming Language Structure • Programming languages have recursive structure • Consider the language of arithmetic expressions with integers, +, *, and ( ) • An expression is either: – an integer – an expression followed by “+” followed by expression – an expression followed by “*” followed by expression – a ‘(‘ followed by an expression followed by ‘)’ • int , int + int , ( int + int) * int are expressions
  • 12. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 12 Notation for Programming Languages • An alternative notation: E  int E  E + E E  E * E E  ( E ) • We can view these rules as rewrite rules – We start with E and replace occurrences of E with some right-hand side • E  E * E  ( E ) * E  ( E + E ) * E  …  (int + int) * int
  • 13. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 13 Observation • All arithmetic expressions can be obtained by a sequence of replacements • Any sequence of replacements forms a valid arithmetic expression • This means that we cannot obtain ( int ) ) by any sequence of replacements. Why? • This set of rules is a context-free grammar
  • 14. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 14 Context-Free Grammars • A CFG consists of – A set of non-terminals N • By convention, written with capital letter in these notes – A set of terminals T • By convention, either lower case names or punctuation – A start symbol S (a non-terminal) – A set of productions • Assuming E  N E  e , or E  Y1 Y2 ... Yn where Yi  N  T
  • 15. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 15 Examples of CFGs Simple arithmetic expressions: E  int E  E + E E  E * E E  ( E ) – One non-terminal: E – Several terminals: int, +, *, (, ) • Called terminals because they are never replaced – By convention the non-terminal for the first production is the start one
  • 16. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 16 The Language of a CFG Read productions as replacement rules: X  Y1 ... Yn Means X can be replaced by Y1 ... Yn X  e Means X can be erased (replaced with empty string)
  • 17. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 17 Key Idea 1. Begin with a string consisting of the start symbol “S” 2. Replace any non-terminal X in the string by a right-hand side of some production X  Y1 … Yn 3. Repeat (2) until there are only terminals in the string 4. The successive strings created in this way are called sentential forms.
  • 18. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 18 The Language of a CFG (Cont.) More formally, may write X1 … Xi-1 Xi Xi+1… Xn  X1 … Xi-1 Y1 … Ym Xi+1 … Xn if there is a production Xi  Y1 … Ym
  • 19. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 19 The Language of a CFG (Cont.) Write X1 … Xn * Y1 … Ym if X1 … Xn  …  …  Y1 … Ym in 0 or more steps
  • 20. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 20 The Language of a CFG Let G be a context-free grammar with start symbol S. Then the language of G is: L(G) = { a1 … an | S * a1 … an and every ai is a terminal }
  • 21. 2/8/2008 Prof. Hilfinger CS164 Lecture 8 21 Examples: • S  0 also written as S  0 | 1 S  1 Generates the language { “0”, “1” } • What about S  1 A A  0 | 1 • What about S  1 A A  0 | 1 A • What about S  e | ( S )
  • 22. Pyth Example
    A fragment of Pyth:
      Compound → while Expr: Block
               | if Expr: Block Elses
      Elses → ε
            | else: Block
            | elif Expr: Block Elses
      Block → Stmt_List
            | Suite
    (Formal language papers use one-character non-terminals, but we don’t have to!)
  • 23. Derivations and Parse Trees
    • A derivation is a sequence of sentential forms resulting from the application of a sequence of productions
      S ⇒ … ⇒ …
    • A derivation can be represented as a parse tree
      – Start symbol is the tree’s root
      – For a production X → Y1 … Yn add children Y1, …, Yn to node X
  • 24. Derivation Example
    • Grammar
      E → E + E | E * E | (E) | int
    • String
      int * int + int
  • 25. Derivation Example (Cont.)
    E ⇒ E + E
      ⇒ E * E + E
      ⇒ int * E + E
      ⇒ int * int + E
      ⇒ int * int + int
  • 26. Derivation in Detail (1)
    E
    [Figure: parse tree consisting of the root E only]
  • 27. Derivation in Detail (2)
    E ⇒ E + E
    [Figure: parse tree; the root E has children E + E]
  • 28. Derivation in Detail (3)
    E ⇒ E + E ⇒ E * E + E
    [Figure: parse tree; the left child E now has children E * E]
  • 29. Derivation in Detail (4)
    E ⇒ E + E ⇒ E * E + E ⇒ int * E + E
    [Figure: parse tree; the first E under * now has child int]
  • 30. Derivation in Detail (5)
    E ⇒ E + E ⇒ E * E + E ⇒ int * E + E ⇒ int * int + E
    [Figure: parse tree; the second E under * now has child int]
  • 31. Derivation in Detail (6)
    E ⇒ E + E ⇒ E * E + E ⇒ int * E + E ⇒ int * int + E ⇒ int * int + int
    [Figure: complete parse tree; the right child of the root now has child int]
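The step-by-step construction above can be replayed in code: each production is applied at the leftmost unexpanded non-terminal leaf, and the leaves of the growing tree give the current sentential form. This is a sketch under assumptions, not the lecture's own code: `Node`, `frontier`, `leftmost_open`, and `derive` are hypothetical names.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = None  # None until a production is applied at this node


def frontier(node):
    """Left-to-right leaves of the tree: the current sentential form."""
    if node.children is None:
        return [node.symbol]
    return [s for child in node.children for s in frontier(child)]


def leftmost_open(node, nonterminals):
    """Leftmost non-terminal leaf that has not been expanded yet, or None."""
    if node.children is None:
        return node if node.symbol in nonterminals else None
    for child in node.children:
        found = leftmost_open(child, nonterminals)
        if found is not None:
            return found
    return None


def derive(start, steps, nonterminals):
    """Apply each (lhs, rhs) production in turn, growing the parse tree."""
    root = Node(start)
    forms = [frontier(root)]
    for lhs, rhs in steps:
        target = leftmost_open(root, nonterminals)
        if target is None or target.symbol != lhs:
            raise ValueError(f"cannot apply {lhs} -> {' '.join(rhs)}")
        target.children = [Node(s) for s in rhs]
        forms.append(frontier(root))
    return root, forms


STEPS = [
    ("E", ["E", "+", "E"]),
    ("E", ["E", "*", "E"]),
    ("E", ["int"]),
    ("E", ["int"]),
    ("E", ["int"]),
]
root, forms = derive("E", STEPS, {"E"})
for form in forms:
    print(" ".join(form))
```

The printed forms follow the six slides in order, and `root` ends up holding the finished parse tree.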
  • 32. Notes on Derivations
    • A parse tree has
      – Terminals at the leaves
      – Non-terminals at the interior nodes
    • A left-to-right traversal of the leaves yields the original input
    • The parse tree shows the association of operations; the input string does not!
      – There may be multiple ways to match the input
      – Derivations (and parse trees) choose one
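To make the last two points concrete, here is a small sketch showing two different parse trees for int * int + int under the grammar E → E + E | E * E | (E) | int. The representation is an assumption made for illustration: an interior node is a tuple `(symbol, child, ...)`, a terminal leaf is a plain string, and `leaves`, `times_first`, and `plus_first` are hypothetical names.

```python
def leaves(tree):
    """Left-to-right traversal of the leaves of a parse tree."""
    if isinstance(tree, str):
        return [tree]  # terminal leaf
    return [leaf for child in tree[1:] for leaf in leaves(child)]


INT = ("E", "int")  # the subtree for E -> int

# (int * int) + int: the multiplication is grouped first.
times_first = ("E", ("E", INT, "*", INT), "+", INT)

# int * (int + int): the addition is grouped first.
plus_first = ("E", INT, "*", ("E", INT, "+", INT))

print(" ".join(leaves(times_first)))
print(" ".join(leaves(plus_first)))
print(times_first == plus_first)
```

Both trees have the same leaves, so the input string alone cannot tell them apart; only the tree records which operation is applied first.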
  • 33. The Payoff: parser as a translator
    Syntax-directed translation:
      stream of tokens → parser → ASTs, or assembly code
    The parser is driven by syntax + translation rules (typically hardcoded in the parser).
  • 34. Mechanism of syntax-directed translation
    • Syntax-directed translation is done by extending the CFG
      – A translation rule is defined for each production
        Given X → d A B c, the translation of X is defined recursively using
        • translation of nonterminals A, B
        • values of attributes of terminals d, c
        • constants
  • 35. To translate an input string:
    1. Build the parse tree.
    2. Working bottom-up
      • Use the translation rules to compute the translation of each nonterminal in the tree
    Result: the translation of the string is the translation of the parse tree's root nonterminal.
    Why bottom-up?
    • A nonterminal's value may depend on the values of the symbols on the right-hand side,
    • so translate a nonterminal node only after its children's translations are available.
  • 36. Example 1: Arithmetic expression to value
    Syntax-directed translation rules:
      E → E + T     E1.trans = E2.trans + T.trans
      E → T         E.trans = T.trans
      T → T * F     T1.trans = T2.trans * F.trans
      T → F         T.trans = F.trans
      F → int       F.trans = int.value
      F → ( E )     F.trans = E.trans
    (Subscript 1 marks the occurrence on the left-hand side of the production, subscript 2 the occurrence on the right-hand side.)
  • 37. Example 1: Bison/Yacc Notation
      E : E '+' T     { $$ = $1 + $3; }
      T : T '*' F     { $$ = $1 * $3; }
      F : int         { $$ = $1; }
      F : '(' E ')'   { $$ = $2; }
    (The rules E : T and T : F need no explicit action: Bison's default action is $$ = $1.)
    • KEY:
      $$ : Semantic value of left-hand side
      $n : Semantic value of nth symbol on right-hand side
  • 38. Example 1 (cont): Annotated Parse Tree
    Input: 2 * (4 + 5)
    [Figure: parse tree with each node's translation in parentheses, shown here by indentation]
      E (18)
        T (18)
          T (2)
            F (2)
              int (2)
          *
          F (9)
            (
            E (9)
              E (4)
                T (4)
                  F (4)
                    int (4)
              +
              T (5)
                F (5)
                  int (5)
            )
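The bottom-up computation for Example 1 can be sketched as a post-order walk over a hand-built parse tree for 2 * (4 + 5). Everything here is illustrative: `Node`, `RULES`, `translate`, and the small tree-building helpers are hypothetical names, and the tree is written out by hand because the slides have not yet introduced a parsing algorithm.

```python
class Node:
    def __init__(self, symbol, children=(), value=None):
        self.symbol = symbol
        self.children = list(children)
        self.value = value  # attribute of a terminal, e.g. int.value


# One translation rule per production, keyed by (lhs, rhs symbols).
RULES = {
    ("E", ("E", "+", "T")): lambda e, _, t: e + t,
    ("E", ("T",)): lambda t: t,
    ("T", ("T", "*", "F")): lambda t, _, f: t * f,
    ("T", ("F",)): lambda f: f,
    ("F", ("int",)): lambda i: i,
    ("F", ("(", "E", ")")): lambda _l, e, _r: e,
}


def translate(node):
    """Translate children first, then apply the rule for this node's production."""
    if not node.children:
        return node.value  # terminal: use its attribute (None for punctuation)
    args = [translate(child) for child in node.children]
    production = (node.symbol, tuple(c.symbol for c in node.children))
    return RULES[production](*args)


def tok(symbol, value=None):
    return Node(symbol, value=value)


def f_int(n):
    return Node("F", [tok("int", n)])


# Parse tree for 2 * (4 + 5)
four = Node("E", [Node("T", [f_int(4)])])
five = Node("T", [f_int(5)])
inner = Node("E", [four, tok("+"), five])
paren = Node("F", [tok("("), inner, tok(")")])
tree = Node("E", [Node("T", [Node("T", [f_int(2)]), tok("*"), paren])])

print(translate(tree))
```

Because `translate` recurses into the children before applying a rule, every rule sees values that are already computed, which is the bottom-up order the slides describe.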
  • 39. Example 2: Compute the type of an expression
      E -> E + E      if $1 == INT and $3 == INT: $$ = INT
                      else: $$ = ERROR
      E -> E and E    if $1 == BOOL and $3 == BOOL: $$ = BOOL
                      else: $$ = ERROR
      E -> E == E     if $1 == $3 and $1 != ERROR: $$ = BOOL
                      else: $$ = ERROR
      E -> true       $$ = BOOL
      E -> false      $$ = BOOL
      E -> int        $$ = INT
      E -> ( E )      $$ = $2
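The type rules of Example 2 can be sketched as a recursive function over expression trees. This is an illustration, not the course's code: `type_of` is a hypothetical name, trees are assumed to be tuples tagged with an operator (`"int"`, `"true"`, `"false"`, `"paren"`, `"+"`, `"and"`, `"=="`), and the `==` case assumes the guard is meant to test an operand's type for ERROR.

```python
INT, BOOL, ERROR = "INT", "BOOL", "ERROR"


def type_of(expr):
    """Compute the type of an expression tree bottom-up."""
    op = expr[0]
    if op == "int":
        return INT
    if op in ("true", "false"):
        return BOOL
    if op == "paren":
        return type_of(expr[1])
    left, right = type_of(expr[1]), type_of(expr[2])
    if op == "+":
        return INT if left == INT and right == INT else ERROR
    if op == "and":
        return BOOL if left == BOOL and right == BOOL else ERROR
    if op == "==":
        # Operands must agree, and an error in an operand must propagate.
        return BOOL if left == right and left != ERROR else ERROR
    raise ValueError(f"unknown operator: {op}")


two, four = ("int",), ("int",)

# (2 + 2) == 4
ok = ("==", ("paren", ("+", two, two)), four)
# (2 + true) == (2 + true): both sides are ERROR, so the result must be ERROR
bad = ("==", ("+", two, ("true",)), ("+", two, ("true",)))

print(type_of(ok))
print(type_of(bad))
```

The second example shows why the ERROR check matters: without it, two ill-typed operands would compare as equal types and the comparison would be accepted as BOOL.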
  • 40. Example 2 (cont)
    • Input: (2 + 2) == 4
    [Figure: parse tree with each node's computed type in parentheses, shown here by indentation]
      E (BOOL)
        E (INT)
          (
          E (INT)
            E (INT)
              int (INT)
            +
            E (INT)
              int (INT)
          )
        ==
        E (INT)
          int (INT)
  • 41. Building Abstract Syntax Trees
    • In the examples so far, streams of tokens were translated into
      – integer values, or
      – types
    • Translating into ASTs is not very different
  • 42. AST vs. Parse Tree
    • AST is a condensed form of a parse tree
      – Operators appear at internal nodes, not at leaves.
      – "Chains" of single productions are collapsed.
      – Lists are "flattened".
      – Syntactic details are omitted
        • e.g., parentheses, commas, semi-colons
    • AST is a better structure for later compiler stages
      – omits details having to do with the source language,
      – only contains information about the essential structure of the program.
  • 43. Example: 2 * (4 + 5)
    Parse tree vs. AST
    [Figure: the two trees side by side, shown here by indentation]
    Parse tree:
      E
        T
          T
            F
              int (2)
          *
          F
            (
            E
              E
                T
                  F
                    int (4)
              +
              T
                F
                  int (5)
            )
    AST:
      *
        2
        +
          4
          5
  • 44. AST-building translation rules
      E → E + T     $$ = new PlusNode($1, $3)
      E → T         $$ = $1
      T → T * F     $$ = new TimesNode($1, $3)
      T → F         $$ = $1
      F → int       $$ = new IntLitNode($1)
      F → ( E )     $$ = $2
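These rules can be sketched in Python by walking a parse tree and returning an AST node for each production. The class names `PlusNode`, `TimesNode`, and `IntLitNode` come from the slide; their fields, the function `build`, and the tree encoding are assumptions made for illustration (an interior node is a tuple `(symbol, child, ...)`, an int token is a Python `int`, and punctuation is a string).

```python
from dataclasses import dataclass


@dataclass
class IntLitNode:
    value: int


@dataclass
class PlusNode:
    left: object
    right: object


@dataclass
class TimesNode:
    left: object
    right: object


def build(tree):
    """Return the AST for a parse tree of the E/T/F grammar."""
    symbol, *kids = tree
    if symbol == "F" and len(kids) == 1:   # F -> int
        return IntLitNode(kids[0])
    if symbol == "F":                      # F -> ( E ): drop the parentheses
        return build(kids[1])
    if len(kids) == 1:                     # E -> T, T -> F: collapse the chain
        return build(kids[0])
    left, right = build(kids[0]), build(kids[2])
    # E -> E + T builds a PlusNode, T -> T * F builds a TimesNode.
    return PlusNode(left, right) if symbol == "E" else TimesNode(left, right)


# Parse tree for 2 * (4 + 5)
inner = ("E", ("E", ("T", ("F", 4))), "+", ("T", ("F", 5)))
tree = ("E", ("T", ("T", ("F", 2)), "*", ("F", "(", inner, ")")))

print(build(tree))
```

The single-production chains and the parentheses leave no trace in the result, which is the condensing that distinguishes an AST from a parse tree.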
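  • 45. Example: 2 * (4 + 5): Steps in Creating AST
    [Figure: the parse tree for 2 * (4 + 5) with partially built ASTs attached as semantic values: 2 on the left operand, + with children 4 and 5 on the parenthesized expression, and * with children 2 and + (4, 5) at the root]
    (Only some of the semantic values are shown)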
  • 46. Leftmost and Rightmost Derivations
    E ⇒ E + E
      ⇒ E * E + E
      ⇒ int * E + E
      ⇒ int * int + E
      ⇒ int * int + int
    Leftmost derivation: always act on leftmost non-terminal
    E ⇒ E + E
      ⇒ E + int
      ⇒ E * E + int
      ⇒ E * int + int
      ⇒ int * int + int
    Rightmost derivation: always act on rightmost non-terminal
  • 47. Rightmost Derivation in Detail (1)
    E
    [Figure: parse tree consisting of the root E only]
  • 48. Rightmost Derivation in Detail (2)
    E ⇒ E + E
    [Figure: parse tree; the root E has children E + E]
  • 49. Rightmost Derivation in Detail (3)
    E ⇒ E + E ⇒ E + int
    [Figure: parse tree; the right child E now has child int]
  • 50. Rightmost Derivation in Detail (4)
    E ⇒ E + E ⇒ E + int ⇒ E * E + int
    [Figure: parse tree; the left child E now has children E * E]
  • 51. Rightmost Derivation in Detail (5)
    E ⇒ E + E ⇒ E + int ⇒ E * E + int ⇒ E * int + int
    [Figure: parse tree; the second E under * now has child int]
  • 52. Rightmost Derivation in Detail (6)
    E ⇒ E + E ⇒ E + int ⇒ E * E + int ⇒ E * int + int ⇒ int * int + int
    [Figure: complete parse tree; the first E under * now has child int]
  • 53. Aside: Canonical Derivations
    • Take a look at that last derivation in reverse.
    • The active part (shown in red on the slides) tends to move left to right.
    • We call this a reverse rightmost or canonical derivation.
    • Comes up in bottom-up parsing. We’ll return to it in a couple of lectures.
  • 54. Derivations and Parse Trees
    • For each parse tree there is exactly one leftmost and one rightmost derivation
    • The difference is the order in which branches are added, not the structure of the tree.
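A short sketch can make this correspondence concrete: starting from one parse tree, expanding the leftmost or the rightmost interior node at each step gives the two derivations. The names `derivation` and `show` and the tree encoding are assumptions for illustration (an interior node is a tuple `(symbol, child, ...)`, a terminal leaf is a string).

```python
def show(form):
    """Render a list of tree nodes as a sentential form."""
    return " ".join(n[0] if isinstance(n, tuple) else n for n in form)


def derivation(tree, leftmost=True):
    """Read the leftmost (or rightmost) derivation off a parse tree."""
    form = [tree]
    steps = [show(form)]
    while True:
        # Positions of non-terminals that still have to be expanded.
        open_positions = [i for i, n in enumerate(form) if isinstance(n, tuple)]
        if not open_positions:
            return steps
        i = open_positions[0] if leftmost else open_positions[-1]
        form[i:i + 1] = form[i][1:]  # replace the node by its children
        steps.append(show(form))


INT = ("E", "int")
tree = ("E", ("E", INT, "*", INT), "+", INT)  # parse tree for int * int + int

print(" => ".join(derivation(tree, leftmost=True)))
print(" => ".join(derivation(tree, leftmost=False)))
```

The tree fixes which production is used at every node, so the only freedom left is the order of expansion, and choosing "always leftmost" or "always rightmost" removes that freedom too.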
  • 55. Summary of Derivations
    • We are not just interested in whether s ∈ L(G)
    • Also need derivation (or parse tree) and AST.
    • Parse trees slavishly reflect the grammar.
    • Abstract syntax trees abstract from the grammar, cutting out detail that interferes with later stages.
    • A derivation defines a parse tree
      – But one parse tree may have many derivations
    • Derivations drive translation (to ASTs, etc.)
    • Leftmost and rightmost derivations are the most important in parser implementation