CSE340 - Principles of
Programming Languages
Lecture 07:
Syntactic Analysis 1
Javier Gonzalez-Sanchez
javiergs@asu.edu
BYENG M1-38
Office Hours: By appointment
Javier Gonzalez-Sanchez | CSE340 | Summer 2015 | 2
Next Step
þ Lexical Analysis ☐	
 Syntactic Analysis
Question
For each case, indicate whether it is possible to
write a regular expression or build a DFA for it.
i.  Check that the parentheses are balanced in a string
that contains N nested pairs of parentheses, with any
characters in between the parentheses.
ii.  Detect binary strings with the same number of
0's and 1's (the order of the digits does not matter).
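Both checks need a count that can grow without bound as N grows, which is exactly what a DFA, with its fixed and finite set of states, cannot keep. The sketch below is not from the slides; it only illustrates that a single integer counter is enough for each case. The function names `balanced` and `same_zeros_and_ones` are hypothetical.

```python
def balanced(text: str) -> bool:
    """Return True if every ')' closes an earlier '(' and none stay open."""
    depth = 0  # current nesting depth; unbounded, so no fixed set of states can replace it
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
        # any other character is allowed between the parentheses
    return depth == 0


def same_zeros_and_ones(bits: str) -> bool:
    """Return True if a binary string holds as many 0's as 1's, in any order."""
    difference = 0  # (number of 0's) minus (number of 1's) seen so far
    for ch in bits:
        if ch == "0":
            difference += 1
        elif ch == "1":
            difference -= 1
        else:
            raise ValueError(f"not a binary digit: {ch!r}")
    return difference == 0
```

For example, `balanced("(a(b)c)")` and `same_zeros_and_ones("0110")` are true, while `balanced(")(")` and `same_zeros_and_ones("001")` are false.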
Where are we now?
After lexical analysis, we have a series of tokens.
But we cannot:
I.  define a regular expression matching all
expressions with properly balanced parentheses.
II.  define a regular expression matching all
functions with properly nested block structure, such as:
void a () { b (c); for (;;) {a=(-(1+2)+5); } }
Where are we now?
Now, we want to:
I.  Review the structure described by that series of
tokens
II.  Report errors if those tokens do not properly
encode a structure
High-Level Languages
[Figure: compilation (source code → Lexer → Parser → Semantic Analyzer → Code Generation → Assembler) followed by execution on the Virtual Machine (interpreter)]
// source code
int x;
int foo () {
read (x);
print (5);
}
main () {
foo ();
}
Intermediate code:
X,E,G,O,O
#e1,I,I,0,7
@
OPR 19, AX
STO x, AX
LIT 5, AX
OPR 21, AX
LOD #e1,AX
CAL 1, AX
OPR 0, AX
5
Outline
Language
  Lexical Analysis (Lexer)
    Rules: Symbols, Token
    Tools: Regular Expression, DFA
  Syntactic Analysis (Parser)
    Grammar (Rules): Terminal, Non-terminal
    Tools: BNF (Backus-Naur Form), Syntax Diagrams
Grammar | Example
Describe all legal arithmetic expressions using
addition, subtraction, multiplication, and division with
integer values
E à E OP E
E à integer
OP à + | - | * | /
E à ( E )
Grammar | Definition
A grammar is a collection of four elements:
•  A set of nonterminal symbols (uppercase)
•  A set of terminal symbols (lowercase). Terminals can be tokens or specific words
•  A set of production rules stating how each nonterminal can be replaced by a string of terminals and nonterminals
•  A start symbol
Example:
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )
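The four elements can be written down directly as a data structure. The following is a minimal sketch, not something taken from the slides: the `Grammar` class and its `is_well_formed` check are hypothetical names, and treating `E` as the start symbol is an assumption, since the slide does not say which nonterminal is the start symbol.

```python
from dataclasses import dataclass


@dataclass
class Grammar:
    nonterminals: set   # e.g. {"E", "OP"}
    terminals: set      # tokens or specific words
    productions: dict   # nonterminal -> list of right-hand sides (tuples of symbols)
    start: str          # the start symbol

    def is_well_formed(self) -> bool:
        """Check that the four elements are consistent with each other."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and set(self.productions) <= self.nonterminals
            and all(
                symbol in symbols
                for alternatives in self.productions.values()
                for rhs in alternatives
                for symbol in rhs
            )
        )


# The arithmetic grammar from the slide; start="E" is an assumption.
ARITHMETIC = Grammar(
    nonterminals={"E", "OP"},
    terminals={"integer", "+", "-", "*", "/", "(", ")"},
    productions={
        "E": [("E", "OP", "E"), ("integer",), ("(", "E", ")")],
        "OP": [("+",), ("-",), ("*",), ("/",)],
    },
    start="E",
)
```

Here `ARITHMETIC.is_well_formed()` holds because every symbol on a right-hand side is either a declared terminal or a declared nonterminal.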
Grammar | Derivation
Input: 5 / 20
Tokens: integer operator integer
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )
E
⇒ E OP E
⇒ integer OP E
⇒ integer / E
⇒ integer / integer
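Before a derivation can start, the input has to be mapped to token classes, as in "5 / 20" becoming "integer operator integer". The sketch below shows one way this mapping could be done with Python's `re` module. It is an illustration only: the pattern and the function name `token_classes` are assumptions, and only the three classes named on the slides (integer, operator, delimiter) are handled.

```python
import re

# One named group per token class used on the slides.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<integer>\d+)|(?P<operator>[-+*/])|(?P<delimiter>[()]))"
)


def token_classes(text: str) -> list:
    """Return the class of each token in an arithmetic expression."""
    text = text.rstrip()
    classes = []
    position = 0
    while position < len(text):
        match = TOKEN_PATTERN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        classes.append(match.lastgroup)  # name of the group that matched
        position = match.end()
    return classes
```

With this sketch, `token_classes("5 * ( 7 + 20 )")` yields integer, operator, delimiter, integer, operator, integer, delimiter, which is the sequence the derivation then has to produce.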
Grammar | Derivation
Input: 5 * ( 7 + 20 )
Tokens: integer operator delimiter integer operator integer delimiter
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )
E
⇒ E OP E
⇒ integer OP E
⇒ integer * E
⇒ integer * (E)
⇒ integer * (E OP E)
⇒ integer * (integer OP E)
⇒ integer * (integer + E)
⇒ integer * (integer + integer)
Grammar | Derivation
Input: 5 * ( 7 + 20 )
Tokens: integer operator delimiter integer operator integer delimiter
E → E OP E
E → integer
OP → + | - | * | /
E → ( E )
E
⇒ E OP E
⇒ E OP (E)
⇒ E OP (E OP E)
⇒ E OP (E OP integer)
⇒ E OP (E + integer)
⇒ E OP (integer + integer)
⇒ E * (integer + integer)
⇒ integer * (integer + integer)
Derivations
•  A leftmost derivation is a derivation in which each step expands the leftmost nonterminal.
•  A rightmost derivation is a derivation in which each step expands the rightmost nonterminal.
•  Derivations will be very important when we talk about parsing.
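The two derivations of 5 * ( 7 + 20 ) shown earlier are one leftmost and one rightmost derivation of the same token sequence. The sketch below replays both mechanically. It is illustrative only: `derive` is a hypothetical helper, it assumes the nonterminals are `E` and `OP`, and it does not check that each rule actually belongs to the grammar.

```python
NONTERMINALS = {"E", "OP"}


def derive(start, rules, leftmost=True):
    """Apply each (lhs, rhs) rule to the leftmost or rightmost nonterminal.

    Returns every sentential form, from the start symbol to the final one.
    """
    form = [start]
    steps = [" ".join(form)]
    for lhs, rhs in rules:
        positions = [i for i, symbol in enumerate(form) if symbol in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        index = positions[0] if leftmost else positions[-1]
        if form[index] != lhs:
            raise ValueError(f"expected to expand {form[index]}, got a rule for {lhs}")
        form[index:index + 1] = rhs  # replace the nonterminal by the right-hand side
        steps.append(" ".join(form))
    return steps


LEFTMOST_RULES = [
    ("E", ["E", "OP", "E"]), ("E", ["integer"]), ("OP", ["*"]),
    ("E", ["(", "E", ")"]), ("E", ["E", "OP", "E"]), ("E", ["integer"]),
    ("OP", ["+"]), ("E", ["integer"]),
]

RIGHTMOST_RULES = [
    ("E", ["E", "OP", "E"]), ("E", ["(", "E", ")"]), ("E", ["E", "OP", "E"]),
    ("E", ["integer"]), ("OP", ["+"]), ("E", ["integer"]),
    ("OP", ["*"]), ("E", ["integer"]),
]

if __name__ == "__main__":
    for step in derive("E", LEFTMOST_RULES, leftmost=True):
        print("⇒", step)
```

Both rule sequences end in the same string of terminals; only the order in which the nonterminals are expanded differs.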
Grammar | Example
Notation 1:
Comp → Mix | Mix Num | Comp Comp
Mix → Elem | ( Comp )
Elem → H|O|C|S|Na|Cl| ...
Num → 1|2|3|4| ...
Notation 2:
<Comp> → <Mix> | <Mix> <Num> | <Comp> <Comp>
<Mix> → <Elem> | ( <Comp> )
<Elem> → H|O|C|S|Na|Cl| ...
<Num> → 1|2|3|4| ...
Examples:
H2 O
C O2 (S O4)3
Na Cl
S O3
Grammar | Derivation
Input: C O 2
Comp → Term | Term Num | Comp Comp
Term → Elem | ( Comp )
Elem → H|O|C|S|Na|Cl| ...
Num → 1|2|3|4| ...
Comp
⇒ Comp Comp
⇒ Term Comp
⇒ Elem Comp
⇒ C Comp
⇒ C Term Num
⇒ C Elem Num
⇒ C O Num
⇒ C O 2
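A derivation shows by hand that a formula belongs to the language; the same question can be answered by a small recognizer. The sketch below is one possible approach and is not from the slides. It relies on the observation that Comp → Term | Term Num | Comp Comp generates one or more items, each being a Term optionally followed by a Num. Only the alternatives listed explicitly on the slide are accepted (the elided "..." alternatives are left out), and all function names are hypothetical.

```python
import re

ELEMENTS = {"H", "O", "C", "S", "Na", "Cl"}  # only the elements listed on the slide
NUMBERS = {"1", "2", "3", "4"}               # only the numbers listed on the slide
TOKEN = re.compile(r"Na|Cl|[HOCS]|[1-4]|[()]")


def tokenize(formula):
    compact = "".join(formula.split())  # spaces carry no meaning here
    tokens = TOKEN.findall(compact)
    if "".join(tokens) != compact:
        raise ValueError("unknown symbol in formula")
    return tokens


def parse_comp(tokens, pos):
    """Comp: one or more items. Returns the position after the Comp."""
    pos = parse_item(tokens, pos)
    while pos < len(tokens) and (tokens[pos] in ELEMENTS or tokens[pos] == "("):
        pos = parse_item(tokens, pos)
    return pos


def parse_item(tokens, pos):
    """Item: Term optionally followed by Num."""
    pos = parse_term(tokens, pos)
    if pos < len(tokens) and tokens[pos] in NUMBERS:
        pos += 1
    return pos


def parse_term(tokens, pos):
    """Term: Elem | ( Comp )."""
    if pos < len(tokens) and tokens[pos] in ELEMENTS:
        return pos + 1
    if pos < len(tokens) and tokens[pos] == "(":
        pos = parse_comp(tokens, pos + 1)
        if pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
        raise ValueError("expected ')'")
    raise ValueError("expected an element or '('")


def is_compound(formula):
    try:
        tokens = tokenize(formula)
        return parse_comp(tokens, 0) == len(tokens)
    except ValueError:
        return False
```

Under these assumptions, `is_compound("C O 2")` and `is_compound("(S O4)3")` are true, while `is_compound("2 H")` is false because a Num cannot start a Comp.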
What about this?
BLOCK → STMT | { STMTS } | { }
STMTS → STMT | STMT STMTS
STMT → EXPR; |
if (EXPR) BLOCK |
while (EXPR) BLOCK |
BLOCK |
. . .
EXPR → EXPR + EXPR |
EXPR - EXPR |
EXPR * EXPR |
identifier |
integer |
...
Homework
Using the rules on the previous slide, write a derivation showing that the following
statement is syntactically correct:
while ( 5 ) { if ( 6 ) { } }
Homework
Review Recursion
Solve Problem Set #1 in preparation for your exam
CSE340 - Principles of Programming Languages
Javier Gonzalez-Sanchez
javiergs@asu.edu
Summer 2015
Disclaimer. These slides can only be used as study material for the class CSE340 at ASU. They cannot be distributed or used for another purpose.
