Programming Languages
Building a Web Browser
Instructor: Westley Weimer
1
2
3
Syntactical Analysis
the process of turning a sequence of tokens into a parse tree
List of Tokens (identifier, number, …) → Syntactical Analysis → Parse Tree
4
Parse Tree
represents the syntactic structure of a string according to some grammar
exp → exp + exp
exp → exp – exp
exp → number
"1+2+3" ⇒
exp
├─ exp
│  ├─ exp
│  │  └─ num ── 1
│  ├─ +
│  └─ exp
│     └─ num ── 2
├─ +
└─ exp
   └─ num ── 3
5
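Why a Parse Tree?
although this tree structure is cumbersome for us, it's very convenient for computers
"<b>welcome to web page</b>"
html → elt html
html → ε
elt → word
elt → to word tc
to → <word>
tc → </word>
[Figure: partial parse tree of the string above. The root html expands to elt html; the first elt covers the opening tag <b> (to) and its closing tag (tc); an inner elt expands to the word "welcome"; the remaining html subtrees are elided (…).]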
6
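How to Build a Parse Tree?
Top-down VS Bottom-up
[Figure: the parse tree of 1 + 2 + 3 drawn twice. Top-down: the levels are numbered ① to ⑤ from the root exp down to the leaves 1 2 3. Bottom-up: the levels are numbered ① to ⑤ from the leaves up to the root exp.]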
7
How to Build a Parse Tree?
G = ({LIST, ELEMENT}, {",", a}, P, LIST)
P : (1) LIST → LIST , ELEMENT
    (2) LIST → ELEMENT
    (3) ELEMENT → a

S    ACTION              GOTO
     ,     a     $       LIST   ELE
0          s3            1      2
1    s4          acc
2    r2
3    r3          r3
4          s3                   5
5                r1

STEP  STACK               INPUT   ACTION     TREE
1     0                   a, a$   shift 3    Node(a)
2     0 a 3               , a$    reduce 3   Tree(3)
3     0 ELE               , a$    GOTO 2
4     0 ELE 2             , a$    reduce 2   Tree(2)
5     0 LIST              , a$    GOTO 1
6     0 LIST 1            , a$    shift 4    Node(,)
7     0 LIST 1 , 4        a$      shift 3    Node(a)
8     0 LIST 1 , 4 a 3    $       reduce 3   Tree(3)
9     0 LIST 1 , 4 ELE    $       reduce 1   Tree(1)
10    0 LIST              $       GOTO 1
11    0 LIST 1            $       accept     Return

Resulting tree (circled numbers are the steps that created each node):
LIST ⑨
├─ LIST ④
│  └─ ELEMENT ②
│     └─ a ①
├─ , ⑥
└─ ELEMENT ⑧
   └─ a ⑦
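The shift/reduce trace above can be replayed with a small table-driven loop. This is a minimal sketch, not how Yacc or PLY is implemented: the names `PRODUCTIONS`, `ACTION`, `GOTO` and `lr_parse` are made up for illustration, the tree is built from nested tuples, and it assumes that the reduce entries r1 and r2 apply on both "," and "$" (the slide's table shows each of them only once).

```python
# Productions numbered as in the reduce actions r1..r3:
# number -> (left-hand side, number of symbols on the right-hand side)
PRODUCTIONS = {
    1: ("LIST", 3),  # LIST -> LIST , ELEMENT
    2: ("LIST", 1),  # LIST -> ELEMENT
    3: ("ELE", 1),   # ELEMENT -> a
}

# ACTION[state][token]. Assumption: r1 and r2 are valid on both "," and "$".
ACTION = {
    0: {"a": ("shift", 3)},
    1: {",": ("shift", 4), "$": ("accept", None)},
    2: {",": ("reduce", 2), "$": ("reduce", 2)},
    3: {",": ("reduce", 3), "$": ("reduce", 3)},
    4: {"a": ("shift", 3)},
    5: {",": ("reduce", 1), "$": ("reduce", 1)},
}

# GOTO[state][nonterminal]
GOTO = {0: {"LIST": 1, "ELE": 2}, 4: {"ELE": 5}}


def lr_parse(tokens):
    """Return a nested-tuple parse tree, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]  # end-of-input marker
    states = [0]                   # state stack
    trees = []                     # one subtree per symbol on the stack
    pos = 0
    while True:
        state, tok = states[-1], tokens[pos]
        entry = ACTION[state].get(tok)
        if entry is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        kind, arg = entry
        if kind == "shift":
            states.append(arg)
            trees.append(tok)      # Node(token)
            pos += 1
        elif kind == "reduce":
            lhs, n = PRODUCTIONS[arg]
            children = trees[-n:]
            del trees[-n:]
            del states[-n:]
            trees.append((lhs, *children))          # Tree(production)
            states.append(GOTO[states[-1]][lhs])    # GOTO step
        else:                      # accept
            return trees[0]


print(lr_parse(["a", ",", "a"]))
```

For the input `a , a` the loop goes through the same shift, reduce and GOTO decisions as the trace and returns a tree with `LIST` at the root, a nested `LIST` on the left and an `ELE` on the right.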
8
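Abstract Syntax Tree
not representing every detail appearing in the real syntax
Parse tree:
exp
├─ exp
│  ├─ exp
│  │  └─ num ── 1
│  ├─ +
│  └─ exp
│     └─ num ── 2
├─ +
└─ exp
   └─ num ── 3
⇒ Abstract syntax tree:
+
├─ +
│  ├─ 1
│  └─ 2
└─ 3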
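One way to picture the step from parse tree to abstract syntax tree is to drop the exp and num wrapper nodes and keep only operators and numbers. The sketch below assumes a nested-tuple encoding `(label, child, ...)` chosen for illustration; `to_ast` and `evaluate` are hypothetical helper names, not part of any library.

```python
# Parse tree for 1 + 2 + 3 as nested tuples: (label, child, child, ...).
parse_tree = (
    "exp",
    ("exp",
     ("exp", ("num", 1)),
     "+",
     ("exp", ("num", 2))),
    "+",
    ("exp", ("num", 3)),
)


def to_ast(node):
    """Collapse exp/num wrappers so only operators and numbers remain."""
    label, *children = node
    if label == "num":
        return children[0]
    if len(children) == 1:          # exp -> number
        return to_ast(children[0])
    left, op, right = children      # exp -> exp op exp
    return (op, to_ast(left), to_ast(right))


def evaluate(ast):
    """Evaluate an AST made of numbers and (op, left, right) tuples."""
    if isinstance(ast, tuple):
        op, left, right = ast
        a, b = evaluate(left), evaluate(right)
        return a + b if op == "+" else a - b
    return ast


ast = to_ast(parse_tree)
print(ast)            # ('+', ('+', 1, 2), 3)
print(evaluate(ast))  # 6
```

The smaller tree is easier to walk, which is why later stages usually work on the AST rather than on the full parse tree.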
9
Python Lex-Yacc
A computer program that generates a parser
Token → Parser → Parse Tree
Yacc input:
Definition section
%%
Rules section
%%
C code section
10
1. Define the Token Names
tokens = (
    'LANGLE',       # <
    'LANGLESLASH',  # </
    'RANGLE',       # >
    'RANGLESLASH',  # />
    'EQUAL',        # =
    'STRING',       # "love"
    'WORD',         # like
)
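To make the token names concrete, here is a rough standard-library sketch of a tokenizer that produces them. It is not PLY and not the course's lexer: the regular expressions, the `SKIP` entry for whitespace, and the names `TOKEN_SPEC`, `MASTER` and `tokenize` are assumptions made for illustration, and the WORD pattern is deliberately simplified.

```python
import re

# Order matters: the two-character tokens must be tried before "<" and ">".
TOKEN_SPEC = [
    ("LANGLESLASH", r"</"),
    ("RANGLESLASH", r"/>"),
    ("LANGLE", r"<"),
    ("RANGLE", r">"),
    ("EQUAL", r"="),
    ("STRING", r'"[^"]*"'),
    ("WORD", r'[^\s<>="/]+'),  # assumed, simplified definition of a word
    ("SKIP", r"\s+"),          # whitespace is matched but not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, matched_text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token in tokenize("<b>welcome to web page</b>"):
    print(token)
```

In PLY the same idea is written as one `t_NAME` rule per token; the sketch only shows which input text each name stands for.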
11
2. Define Grammar
G : exp → number
def p_exp_number(p):
    'exp : NUMBER'
    p[0] = ("number", p[1])   # p[1] is the value of the NUMBER token
G : exp → not exp
def p_exp_not(p):
    'exp : NOT exp'
    p[0] = ("not", p[2])      # p[2] is the tree built for the inner exp
12
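3. Building and Using the Parser
import ply.lex as lex
import ply.yacc as yacc

jslexer = lex.lex()      # build the lexer from the token rules
jsparser = yacc.yacc()   # build the parser from the p_* grammar rules
jsast = jsparser.parse(jscode, lexer=jslexer)   # jscode: the JavaScript source string
print(jsast)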
13
Python Code
14
Output
Input (jscode) = myfun()
→ ('call', 'myfun', [])
Input (jscode) = myfun(11,12,13)
→ ('call', 'myfun', [('number', 11.0), ('number', 12.0), ('number', 13.0)])
15
Notice
- Changing the starting symbol
- Precedence
- Tracking Line Number
16
Changing the starting symbol
by default, the first rule defines the starting grammar rule; setting start overrides it
start = 'arg'
def p_exp(p):
    'exp : NUMBER'
def p_arg(p):
    'arg : exp'
17
Precedence
precedence = (
    ('left', 'PLUS', 'MINUS'),    # ↑ lower
    ('left', 'TIMES', 'DIVIDE'),  # ↓ higher
)
1 - 2 - 3 → (1 - 2) - 3 = "-4" or 1 - (2 - 3) = "2"?
With 'left', MINUS groups to the left, so the parser builds (1 - 2) - 3.
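The two candidate readings of 1 - 2 - 3 can be checked with a few lines of plain Python. This sketch only illustrates the arithmetic behind left versus right associativity; `eval_left` and `eval_right` are hypothetical helper names and have nothing to do with how PLY resolves the conflict internally.

```python
from functools import reduce


def eval_left(numbers):
    """Left-associative subtraction: ((1 - 2) - 3)."""
    return reduce(lambda acc, n: acc - n, numbers)


def eval_right(numbers):
    """Right-associative subtraction: (1 - (2 - 3))."""
    *init, last = numbers
    return reduce(lambda acc, n: n - acc, reversed(init), last)


print(eval_left([1, 2, 3]))   # -4
print(eval_right([1, 2, 3]))  # 2
```

Declaring MINUS as 'left' in the precedence table corresponds to the first result.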
18
Tracking Line Number
tracks the line number and position of all tokens
def p_exp(p):
    'exp : exp PLUS exp'
    line = p.lineno(2)    # line number of the PLUS token
    index = p.lexpos(2)   # position of the PLUS token
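The bookkeeping behind a line number and a position can be sketched without PLY. The helper below is hypothetical (`words_with_positions` is a name made up here): it treats every run of non-whitespace as one token, records its character offset, and increments a line counter at each newline, which is the job a `t_newline` rule does in a PLY lexer.

```python
import re


def words_with_positions(text):
    """Return (word, line_number, character_offset) for each word in text."""
    result = []
    lineno = 1
    for m in re.finditer(r"\n|\S+", text):
        if m.group() == "\n":
            lineno += 1   # a newline advances the line counter
        else:
            result.append((m.group(), lineno, m.start()))
    return result


print(words_with_positions("1 +\n2"))
# [('1', 1, 0), ('+', 1, 2), ('2', 2, 4)]
```

Note that in PLY the lexer does not count lines on its own; the line number only advances if a lexer rule updates it.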
19

Open course(programming languages) 20150225
