Standing on the shoulders of giants:
Learn from LL(1) to PEG parser the hard way
Kir Chou
https://www.youtube.com/watch?v=DZTLgVBxET4
About me
Presented at PyCon TW/JP since 2017
https://note35.github.io/about/
https://github.com/note35/Parser-Learning
Agenda
● Motivation
● What is parser in CPython?
● Parser 101 - CFG
● Parser 101 - Traditional parser (LL(1) / LR(0))
● Parser 102 - PEG and PEG parser
● Parser 102 - Packrat parser
● CPython’s PEG parser
● Take away
Motivation
Motivation
What’s New In Python 3.9?
PEP 617, CPython now uses a new parser based on PEG;
“IIRC, I took a Compiler class in school…”
Motivation (Cont.)
School taught us the brief concept of the Compiler’s frontend and backend.
School’s parser assignment used Bison + YACC.
And...
My motivation = Talk objectives
What is a PEG parser?
Why did Python use an LL(1) parser before?
Why did Guido choose a PEG parser?
What other parsers do we have?
What’s the difference between those parsers?
How to implement those parsers?
What is parser in CPython?
CPython DevGuide - Design of CPython’s Compiler
Compilation Steps
Source Code -> (Lexer) -> Tokens -> (Parser) -> Abstract Syntax Tree (AST) -> (Compiler) -> Bytecode -> (VM) -> Result
Import
Lexer: https://docs.python.org/3/library/tokenize.html#examples
Parser: https://docs.python.org/3/library/ast.html
Compiler: https://docs.python.org/3/library/dis.html#dis.disassemble
Example source: print(2*3+4)
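The three standard-library modules linked on these slides can be driven from one short script. The following is a minimal sketch, not the code shown in the talk; the variable names and the "<example>" filename are chosen here for illustration.

```python
import ast
import dis
import io
import tokenize

SOURCE = "print(2*3+4)\n"

# Lexer: source text -> tokens (whitespace-only tokens are filtered out).
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))
token_strings = [tok.string for tok in tokens if tok.string.strip()]

# Parser: source text -> AST.
tree = ast.parse(SOURCE)
call = tree.body[0].value  # the print(...) call node

# Compiler: AST -> bytecode (a code object).
code = compile(tree, filename="<example>", mode="exec")

if __name__ == "__main__":
    print(token_strings)
    print(ast.dump(tree))
    dis.dis(code)  # the exact bytecode differs between Python versions
    exec(code)     # VM: runs the bytecode
```

Running the script prints the token strings, the AST dump, the disassembly, and finally the result of the expression.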
Talk’s focus: the Parser step (Tokens -> Abstract Syntax Tree).
Parser 101 - CFG
Uncode - GATE Computer Science - Compiler Design Lecture
Grammar
Context Free Grammar (CFG)
rule: A -> B | a
Interpretation of this Grammar: “Both B and a can be derived from A”
A, B: Non-terminal
a: Terminal
->: Derivation (*some papers write <-)
|: AND (*supports ambiguous syntax)
What is “Context Free”?
The left-hand side of every rule contains exactly one non-terminal and nothing else.
Valid CFG Example: S -> aSb
Invalid CFG Example: xSy -> axSyb
Syntax Analysis: Parse Tree
Concrete Syntax Tree (CST)
An ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar.
Abstract Syntax Tree (AST)
A tree representation of the abstract syntactic structure of source code written in a programming language.
CFG Simplification
1. Ambiguous -> Unambiguous
2. Nondeterministic -> Deterministic
3. Left recursion -> No left recursion
Ambiguous Definition
A grammar is ambiguous when its rules can generate more than one parse tree for the same input.
E -> E + E | E * E | Num
[Figure: two parse trees for Num + Num * Num, one with + at the root and one with * at the root]
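To make the ambiguity concrete, the sketch below enumerates every parse tree that E -> E + E | E * E | Num allows for a token list. The function name and the tuple encoding of trees are assumptions made for this illustration; they are not taken from the slides.

```python
def parses(tokens):
    """Return every parse tree of tokens under E -> E + E | E * E | Num.

    A tree is either an int (Num) or a tuple (operator, left, right).
    """
    if len(tokens) == 1 and isinstance(tokens[0], int):
        return [tokens[0]]
    trees = []
    # Try every operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "*"):
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    trees.append((tokens[i], left, right))
    return trees


trees = parses([2, "+", 3, "*", 4])
```

For 2 + 3 * 4 the list holds two trees, ('+', 2, ('*', 3, 4)) and ('*', ('+', 2, 3), 4), which would evaluate to 14 and 20.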
Ambiguous -> Unambiguous
Before:
E -> E + E | E * E | Num
Step1: Rewrite Grammar
E -> E + T | T
T -> T * F | F
F -> Num
Step2: Make sure the grammar generates only one tree
[Figure: the single parse tree for Num + Num * Num, with + at the root and * below it]
Non-deterministic -> Deterministic
A grammar is non-deterministic when its rules share a common prefix.
Before:
A -> ab | ac
Rewrite Grammar (left factoring):
A -> aA’
A’ -> b | c
*A non-deterministic grammar can be rewritten into more than one deterministic grammar.
Left recursion -> No left recursion
A grammar contains direct or indirect left recursion.
Before:
E -> E + T | T
T -> T * F | F
F -> Num
Rewrite Grammar (None stands for the empty production):
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
Why it matters: the E in the first E + T derives a second E + T, the E in that one derives a third E + T, and so on without end.
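The endless derivation shows up directly when the left-recursive rule is turned into a function naively. This is a deliberately broken sketch with hypothetical names, meant only to show the failure mode that the rewritten grammar avoids.

```python
def parse_E(tokens, pos=0):
    """Naive function for E -> E + T | T that tries E + T first."""
    # The alternative starts with E, so the function calls itself at the
    # same position before it has consumed a single token.
    left = parse_E(tokens, pos)
    return left  # never reached


def recurses_forever(tokens):
    """Return True if the naive parser exhausts Python's recursion limit."""
    try:
        parse_E(tokens)
    except RecursionError:
        return True
    return False
```

Calling recurses_forever([2, "+", 3]) returns True: Python stops the runaway recursion with a RecursionError.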
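Recap: CFG Simplification
Ambiguous
Before: E -> E + E | E * E | Num
After: E -> E + T | T ; T -> T * F | F ; F -> Num
Non-deterministic
Before: A -> ab | ac
After: A -> aA’ ; A’ -> b | c
Left Recursion
Before: E -> E + T | T ; T -> T * F | F ; F -> Num
After: E -> TE’ ; E’ -> +TE’ | None ; T -> FT’ ; T’ -> *FT’ | None ; F -> Num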
Parser 101 - Traditional parser
Uncode - GATE Computer Science - Compiler Design Lecture
Parser classification
Top-down Type
Bottom-up Type
[Figure: the parse tree for Num + Num * Num built step by step, once from the root down (top-down) and once from the leaves up (bottom-up)]
LL / LR Parser
LL(k) = Left-to-right, Leftmost derivation, k-token lookahead (k>0)
LR(k) = Left-to-right, Rightmost derivation, k-token lookahead (k>=0)
*Both LL and LR parsers scan the input string from left to right.
Input String: 2 + 3 * 4
*LL and LR parsers derive the rules in a different order: + → * for LL, * → + for LR.
[Figure: the parse tree for 2 + 3 * 4, shown once per derivation order]
Lookahead: I am "a token of number". If I perform 1-token lookahead and meet "a token of +", what to do next?
Top Down - Recursive descent parser
LL(k) - Implementation
Input String: 2 + 3 * 4
Grammar
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
Step1
*write a function for each non-terminal
Step2
*parse from left to right
*perform k-lookahead
Step3
*recursively parse the input string, starting from the first rule, parse_E()
Call structure: parse_E() = parse_Ep(parse_T()), parse_T() = parse_Tp(parse_F())
LL(1) - Example code
Grammar
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
*perform 1-lookahead
[Figure: code screenshot annotated with "Derivation"]
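A recursive descent parser for this grammar could look like the sketch below. It follows the parse_E / parse_Ep / parse_T / parse_Tp / parse_F naming from the implementation slide, but the token format (a list of ints and operator strings), the '$' end marker and the choice to evaluate the expression while parsing are assumptions made here.

```python
def ll1_evaluate(tokens):
    """Evaluate tokens such as [2, '+', 3, '*', 4] with one function per non-terminal."""
    pos = 0

    def peek():
        # 1-token lookahead; '$' marks the end of the input.
        return tokens[pos] if pos < len(tokens) else "$"

    def advance():
        nonlocal pos
        pos += 1

    def parse_E():            # E -> T E'
        return parse_Ep(parse_T())

    def parse_Ep(left):       # E' -> + T E' | None
        if peek() == "+":
            advance()
            return parse_Ep(left + parse_T())
        return left           # empty production

    def parse_T():            # T -> F T'
        return parse_Tp(parse_F())

    def parse_Tp(left):       # T' -> * F T' | None
        if peek() == "*":
            advance()
            return parse_Tp(left * parse_F())
        return left           # empty production

    def parse_F():            # F -> Num
        token = peek()
        if not isinstance(token, int):
            raise SyntaxError(f"expected a number, got {token!r}")
        advance()
        return token

    value = parse_E()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return value
```

Every decision is made by looking at one token only, which is what makes the parser LL(1); ll1_evaluate([2, '+', 3, '*', 4]) returns 14.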
Top Down - Non recursive descent parser
LL(1) - Parsing table
Step1
Build first/follow table for each non-terminal
Note: $ means the end marker
Step2
Build parsing table based on first/follow table
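Step1 can be sketched as a fixed-point computation over the grammar. The dictionary layout, the function names and the 'None' marker for the empty production are assumptions for this illustration; the grammar is the rewritten expression grammar used throughout the LL(1) slides.

```python
EPS = "None"  # the slides write the empty production as None

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["Num"]],
}


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of grammar symbols."""
    result = set()
    for symbol in symbols:
        symbol_first = first[symbol] if symbol in GRAMMAR else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)  # every symbol in the sequence can be empty
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, productions in GRAMMAR.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, productions in GRAMMAR.items():
            for rhs in productions:
                for i, symbol in enumerate(rhs):
                    if symbol not in GRAMMAR:
                        continue  # only non-terminals have FOLLOW sets
                    tail = first_of_sequence(rhs[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[nt]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
```

The parsing table of Step2 then places each production under the terminals in the FIRST set of its right-hand side, and an empty production under the terminals in the FOLLOW set of its left-hand side.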
LL(1) - Implementation
Step3
Implement with a stack (take shift/reduce action based on the parsing table)
[Figure: parse tree for Num + Num * Num]
LL(1) - Example code
Grammar
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
[Figure: code screenshot with labels: Non-terminal stack, Reduce (Derivation), Shift]
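A table-driven version of the same parser might look like this. It is a sketch under stated assumptions: the table was filled in by hand from the FIRST/FOLLOW sets of this grammar, the names TABLE, NONTERMINALS and ll1_accepts are hypothetical, and the function only accepts or rejects the input instead of building a tree.

```python
# Parsing table: (non-terminal, lookahead) -> right-hand side to expand.
TABLE = {
    ("E", "num"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],               # E' -> None
    ("T", "num"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [],               # T' -> None
    ("T'", "$"): [],               # T' -> None
    ("F", "num"): ["num"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_accepts(tokens):
    """Return True if tokens (ints and operator strings) form a valid expression."""
    kinds = ["num" if isinstance(t, int) else t for t in tokens] + ["$"]
    stack = ["$", "E"]  # end marker at the bottom, start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = kinds[pos]
        if top in NONTERMINALS:
            # Expand the non-terminal (the slide's "Reduce (Derivation)").
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))
        elif top == lookahead:
            # Match a terminal and move on (the slide's "Shift").
            pos += 1
        else:
            return False
    return pos == len(kinds)
```

The explicit stack replaces the call stack of the recursive version, so no recursion is needed.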
Bottom Up - LR(0) parser
LR(0) - Deterministic finite automaton
Step1
Build Deterministic Finite Automaton (DFA)
Start state:
E’ -> .E --- (1)
E -> .E + T --- (2)
E -> .T --- (3)
T -> .T * Num --- (4)
T -> .Num --- (5)
Other states (item sets):
{E’ -> E. ; E -> E. + T}
{E -> T. ; T -> T. * Num}
{T -> Num.}
{E -> E + .T ; T -> .T * Num ; T -> .Num}
{T -> T * .Num}
{E -> E + T. ; T -> T. * Num}
{T -> T * Num.}
Transition labels: E, T, Num, *, +, T, Num, *, Num
State labels: S1 to S8
*Left recursion support
LR(0) - Parsing table
Step2
Build parsing table
(For parser like SLR(1), it requires first/follow table)
[Figure: parsing table with Shift, Reduce (Derivation) and acc (accept) entries]
LR(0) - Implementation
Step3
Implement with stack
(take shift/reduce action based on parsing table)
[Figure: parse tree for Num + Num * Num]
LR(0) - Example code
Grammar
E -> E + T | T
T -> T * F | F
F -> Num
[Figure: code screenshot with labels: Shift, Reduce (Derivation)]
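A shift/reduce loop for the grammar of the DFA slide (E -> E + T | T, T -> T * Num | Num) can be sketched as follows. Two assumptions are worth stating loudly: the eight states are numbered 0 to 7 here rather than S1 to S8, and the reduce entries use SLR(1)-style FOLLOW lookaheads, because the state {E -> T. ; T -> T. * Num} would have a shift/reduce conflict in a pure LR(0) table. The table was built by hand for this sketch and all names are hypothetical.

```python
# Production number -> (left-hand side, length of right-hand side).
RULES = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * Num
    4: ("T", 1),  # T -> Num
}
ACTION = {
    (0, "num"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$"): ("accept", None),
    (2, "*"): ("shift", 5), (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "+"): ("reduce", 4), (3, "*"): ("reduce", 4), (3, "$"): ("reduce", 4),
    (4, "num"): ("shift", 3),
    (5, "num"): ("shift", 7),
    (6, "*"): ("shift", 5), (6, "+"): ("reduce", 1), (6, "$"): ("reduce", 1),
    (7, "+"): ("reduce", 3), (7, "*"): ("reduce", 3), (7, "$"): ("reduce", 3),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 6}


def lr_evaluate(tokens):
    """Evaluate tokens such as [2, '+', 3, '*', 4] with a shift/reduce loop."""
    kinds = ["num" if isinstance(t, int) else t for t in tokens] + ["$"]
    inputs = list(tokens) + [None]
    states, values, pos = [0], [], 0
    while True:
        kind, target = ACTION.get((states[-1], kinds[pos]), ("error", None))
        if kind == "shift":
            states.append(target)
            values.append(inputs[pos])
            pos += 1
        elif kind == "reduce":
            lhs, size = RULES[target]
            args = values[-size:]
            del states[-size:]
            del values[-size:]
            if target == 1:
                result = args[0] + args[2]
            elif target == 3:
                result = args[0] * args[2]
            else:
                result = args[0]
            states.append(GOTO[(states[-1], lhs)])
            values.append(result)
        elif kind == "accept":
            return values[0]
        else:
            raise SyntaxError(f"unexpected {kinds[pos]!r} at position {pos}")
```

Because reductions happen bottom-up, 3 * 4 is reduced before the addition, and lr_evaluate([2, '+', 3, '*', 4]) returns 14. The left-recursive rules need no rewriting.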
Parser 102 - PEG and PEG parser
Grammar
Parsing Expression Grammar (PEG)
*Introduced in 2002 (Packrat Parsing: Simple, Powerful, Lazy, Linear Time)
*Support for Regular Expression (EBNF grammar) came in another paper
rule: A -> B | a
A, B: Non-terminal
a: Terminal
->: Derivation (*some papers write <-)
|: OR (if / elif / ...) (*disallows ambiguous syntax)
*Difference from traditional CFG
A tries A -> B first.
Only after A -> B fails does A try A -> a.
Example of difference
Grammar1: A -> a b | a
Grammar2: A -> a | a b
● An LL/LR parser cannot be generated without conflicts when the input grammar is ambiguous.
● A PEG parser tries the alternatives in order and commits to the first one that matches. In Grammar2, the second alternative (a b) can never succeed.
“A PEG parser generator will resolve unintended ambiguities earliest-match-first, which may be arbitrary and lead to surprising parses.” (source)
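The effect of ordered choice on Grammar1 and Grammar2 can be reproduced with a few lines. This is a minimal sketch with hypothetical helper names; each alternative is a list of literal symbols, and the function reports how much input the rule A consumes.

```python
def match_sequence(symbols, text, pos):
    """Match the literal symbols one after another; return the end position or None."""
    for symbol in symbols:
        if not text.startswith(symbol, pos):
            return None
        pos += len(symbol)
    return pos


def peg_choice(alternatives, text):
    """Ordered choice: commit to the first alternative that matches."""
    for alternative in alternatives:
        end = match_sequence(alternative, text, 0)
        if end is not None:
            return end  # later alternatives are never tried
    return None


GRAMMAR1 = [["a", "b"], ["a"]]  # A -> a b | a
GRAMMAR2 = [["a"], ["a", "b"]]  # A -> a | a b

consumed1 = peg_choice(GRAMMAR1, "ab")
consumed2 = peg_choice(GRAMMAR2, "ab")
```

With Grammar1 the rule consumes both characters of "ab". With Grammar2 it stops after "a", so a parser that requires the whole input to be consumed rejects "ab" even though the second alternative describes it.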
PEG Parser
A PEG parser is a parser generated from a PEG.
A PEG parser can be a Packrat parser, or a traditional parser with a k-lookahead limitation. In most cases, PEG parser means Packrat parser.
[Figure: CFG, EBNF grammar and PEG on the grammar side; Packrat parser and traditional parser on the parser side; parsers generated from a PEG are marked as PEG Parser]
Parser 102 - Packrat parser
Type of Packrat parser
A Packrat parser is a top-down parser.
[Figure: parse tree for Num + Num * Num built from the root down]
Packrat Parsing - Implementation
Input String: 2 + 3 * 4
Grammar (*left recursion is supported)
E -> E + T | T
T -> T * F | F
F -> Num
Step1
*write a function for each non-terminal (PEG rule)
Step2
*parse from left to right
*perform infinite lookahead + memoization
*The idea of memoization was introduced in 1970
Step3
*recursively parse the input string, starting from the first rule, parse_E()
Call structure: parse_E() uses parse_E() and parse_T(); parse_T() uses parse_T() and parse_F()
Packrat Parsing - Example code
Grammar
E -> E + T | T
T -> T * F | F
F -> Num
[Figure: code screenshot with labels: Derivation, Memoization]
Packrat - what is memoization?
LeetCode 509. Fibonacci Number
fib(0) = 0
fib(1) = 1
fib(2) = fib(1) + fib(0) = 1
fib(3) = fib(2) + fib(1) = fib(1) + fib(0) + fib(1) = 2
...
[Figure: call tree of fib(4)]
If n = 4, we calculate fib(2) and fib(0) twice, fib(1) three times, and fib(4) and fib(3) once.
Time Complexity: O(2^n)
Packrat - what is memoization? (Cont.)
LeetCode 509. Fibonacci Number
If n = 4, we calculate fib(4), fib(3), fib(2), fib(1) and fib(0) once each.
Time Complexity: O(2^n) => O(n)
Space Complexity (memo table): O(1) => O(n)
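The call counts quoted on these two slides can be checked with a small experiment. The sketch below is not the code from the talk; the function names and the Counter-based bookkeeping are assumptions added for illustration.

```python
from collections import Counter


def fib_naive(n, calls):
    """Plain recursion; calls[n] counts how often fib(n) is computed."""
    calls[n] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


def fib_memo(n, calls, memo=None):
    """Same recursion, but each result is stored and reused."""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]  # reuse the stored result, nothing is recomputed
    calls[n] += 1
    if n < 2:
        memo[n] = n
    else:
        memo[n] = fib_memo(n - 1, calls, memo) + fib_memo(n - 2, calls, memo)
    return memo[n]


naive_calls, memo_calls = Counter(), Counter()
naive_result = fib_naive(4, naive_calls)
memo_result = fib_memo(4, memo_calls)
```

Both versions return 3 for n = 4. The naive counter records fib(1) three times and fib(2) and fib(0) twice, while the memoized counter records every value exactly once. A Packrat parser applies the same idea to the pair (rule, input position).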
Left recursion in Packrat parser
Approach 1
if (count of operator) < (count function call):
return False
Approach 2
reverse the call stack (adopted in CPython!)
Source: Guido's Medium (Left-recursive PEG Grammars)
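One way to let a memoizing parser accept the left-recursive grammar directly is to seed the memo with a failure and then re-run the rule as long as it consumes more input. The sketch below follows that idea as this editor understands it from the cited article; it is not CPython's actual implementation, and the names packrat_evaluate, memoize and memoize_left_rec as well as the token format are assumptions.

```python
def packrat_evaluate(tokens):
    """Evaluate tokens such as [2, '+', 3, '*', 4] with E -> E + T | T, T -> T * F | F, F -> Num."""
    memo = {}  # (rule name, position) -> (value, next position) or None

    def memoize(rule):
        def wrapper(pos):
            key = (rule.__name__, pos)
            if key not in memo:
                memo[key] = rule(pos)
            return memo[key]
        return wrapper

    def memoize_left_rec(rule):
        def wrapper(pos):
            key = (rule.__name__, pos)
            if key in memo:
                return memo[key]
            # Seed with a failure so the left-recursive call returns at once.
            memo[key] = best = None
            while True:
                result = rule(pos)
                # Stop when the rule fails or no longer consumes more input.
                if result is None or (best is not None and result[1] <= best[1]):
                    return best
                memo[key] = best = result
        return wrapper

    def expect(pos, symbol):
        if pos < len(tokens) and tokens[pos] == symbol:
            return pos + 1
        return None

    @memoize_left_rec
    def E(pos):
        left = E(pos)                      # E -> E + T
        if left is not None:
            after_plus = expect(left[1], "+")
            if after_plus is not None:
                right = T(after_plus)
                if right is not None:
                    return (left[0] + right[0], right[1])
        return T(pos)                      # E -> T

    @memoize_left_rec
    def T(pos):
        left = T(pos)                      # T -> T * F
        if left is not None:
            after_star = expect(left[1], "*")
            if after_star is not None:
                right = F(after_star)
                if right is not None:
                    return (left[0] * right[0], right[1])
        return F(pos)                      # T -> F

    @memoize
    def F(pos):                            # F -> Num
        if pos < len(tokens) and isinstance(tokens[pos], int):
            return (tokens[pos], pos + 1)
        return None

    result = E(0)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("invalid expression")
    return result[0]
```

Each pass through the loop in memoize_left_rec reuses the previous, shorter result stored in the memo as the left operand, so the parse grows one operator at a time instead of recursing forever.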
[Figure: Normal Memoization]
[Figure: Left-recursion Memoization (*perform infinite-lookahead)]
Traditional parser vs Packrat parser
Scan
    Packrat: Left-to-right (*Right-to-left memo)
    Traditional: Left-to-right
Left Recursion
    Packrat: Supported (*Not supported in the first paper)
    Traditional: LL needs to rewrite the grammar
Ambiguous
    Packrat: Disallowed (determinism)
    Traditional: Allowed
Space Complexity
    Packrat: O(Code Size) (space consumption)
    Traditional: O(Depth of Parse Tree)
Worst Time Complexity
    Packrat: Super linear time (statelessness) *Because of features like typedef in C
    Traditional: Exponential time
Capability
    Packrat: Basically covers all traditional cases (infinite lookahead)
    Traditional: No left-recursion/ambiguous for LL; has k-lookahead limitations for both (e.g. dangling else)
Red text on the slide: 3 highlighted characteristics of Packrat parser.
New rule in Python 3.10 based on PEG
Parenthesized context managers
PEP 622/634/635/636 - Structural Pattern Matching
CPython’s PEG parser
CPython Parser - Before/After
CPython 3.8 and earlier use an LL(1) parser written by Guido 30 years ago.
The parser needs separate steps to generate a CST and then convert the CST to an AST.
CPython 3.9 uses a PEG (Packrat) parser (infinite lookahead).
PEG rules support left recursion.
No more CST to AST step - source
CPython 3.10 drops LL(1) parser support.
This answers “Why PEG?”
CPython Parser - Workflow
Meta Grammar: Tools/peg_generator/pegen/metagrammar.gram
Grammar: Grammar/python.gram
Token: Grammar/Tokens
pegen (PEG Parser): Tools/peg_generator/
Generated parser: my_parser.py, my_parser.c
*CPython contains a PEG parser generator written in Python 3.8+ (because of the walrus operator)
Input: Meta Grammar Example
Syntax Directed Translation (SDT)
[Figure: grammar excerpt with labels: rule, non-terminal, return type, PEG rule divider, PEG rule, action (Python code), parser header (Python code)]
Output: Generated PEG Parser (Partial code)
Recap: Benefit / Performance
Benefit
Grammar is more flexible: from LL(1) to LL(∞) (infinite lookahead)
Hardware supports Packrat’s memory consumption now
Skip intermediate parse tree (CST) construction
Performance
Within 10% of LL(1) parser both in speed and memory consumption (PEP 617)
Take away
Recap
● Parser 101 (Compiler class in school)
○ CFG
○ Traditional Parser
■ Top-down: LL(1)
■ Bottom-up: LR(0)
● Parser 102
○ PEG
○ Packrat Parser
● CPython
○ Parser in CPython
○ CPython’s PEG parser
Q. How to verify my understanding?
A. Get your hands dirty!
You can implement traditional parsers like LL(1) and LR(0), and a Packrat parser, from scratch!
LeetCode: 227. Basic Calculator II
Need Answer? note35/Parser-Learning
Q & A
Appendix
Related Articles
Guido van Rossum
PEG Parsing Series Overview
Bryan Ford
Packrat Parsing: Simple, Powerful, Lazy, Linear Time
Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
Related Talks
Guido van Rossum @ North Bay Python 2019
Writing a PEG parser for fun and profit
Pablo Galindo and Lysandros Nikolaou @ Podcast.__init__
The Journey To Replace Python's Parser And What It Means For The Future
Emily Morehouse-Valcarcel @ PyCon 2018
The AST and Me
Alex Gaynor @ PyCon 2013
So you want to write an interpreter?
Thanks for listening!

HR Software Buyers Guide in 2024 - HRSoftware.com
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Learn from LL(1) to PEG parser the hard way

  • 1. Standing on the shoulders of giants: Learn from LL(1) to PEG parser the hard way Kir Chou 1
  • 3. About me Presented at PyCon TW/JP since 2017 https://note35.github.io/about/ https://github.com/note35/Parser-Learning 3
  • 4. Agenda ● Motivation ● What is parser in CPython? ● Parser 101 - CFG ● Parser 101 - Traditional parser (LL(1) / LR(0)) ● Parser 102 - PEG and PEG parser ● Parser 102 - Packrat parser ● CPython’s PEG parser ● Take away 4
  • 6. Motivation What’s New In Python 3.9? PEP 617, CPython now uses a new parser based on PEG; “IIRC, I took a Compiler class in school…” 6
  • 7. Motivation (Cont.) School taught us the brief concept of the Compiler’s frontend and backend. School’s parser assignment used Bison + YACC. And... 7
  • 8. My motivation = Talk objectives: What is a PEG parser? Why did Python use an LL(1) parser before? Why did Guido choose a PEG parser? What other parsers do we have? What’s the difference between those parsers? How to implement those parsers?
  • 9. What is parser in CPython? CPython DevGuide - Design of CPython’s Compiler 9
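  • 10. Compilation Steps: Source Code → [Lexer] → Tokens → [Parser] → Abstract Syntax Tree (AST) → [Compiler] → Bytecode → [VM] → Result. Also labelled on the diagram: Import.
  • 14. Compilation Steps (Cont.): Source Code → [Lexer] → Tokens → [Parser] → Abstract Syntax Tree (AST) → [Compiler] → Bytecode → [VM] → Result. Also labelled on the diagram: Import. Talk’s focus: the Parser step.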
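The steps above can be observed from Python itself. This is a minimal sketch, assuming only the standard-library `tokenize`, `ast`, and `dis` modules that the Lexer, Parser, and Compiler slides link to. The sample source is the `print(2*3+4)` expression from the Compiler slide; the file name `<example>` is made up for illustration.

```python
import ast
import dis
import io
import tokenize

source = "print(2*3+4)\n"

# Lexer: source code -> tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Parser: tokens -> AST (ast.parse runs the lexer and the parser together)
tree = ast.parse(source)
print(ast.dump(tree))

# Compiler: AST -> bytecode (the listing differs between CPython versions)
code = compile(tree, "<example>", "exec")
dis.dis(code)

# VM: bytecode -> result
exec(code)  # prints 10
```

The token and bytecode listings depend on the interpreter version, so only the final result is stated here.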
  • 15. Parser 101 - CFG Uncode - GATE Computer Science - Compiler Design Lecture 15
  • 16. Grammar: Context Free Grammar (CFG). Rule: A -> B | a, where A and B are non-terminals, a is a terminal, -> is the derivation (some papers write <-), and | means AND, so ambiguous syntax is supported. Interpretation of this grammar: “Both B and a can be derived from A”.
  • 17. What is “Context Free”? The left-hand side of every rule contains only 1 non-terminal. Valid CFG example: S -> aSb. Invalid CFG example: xSy -> axSyb.
  • 18. Semantic Analysis: Parse Tree. Concrete Syntax Tree (CST): an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. Abstract Syntax Tree (AST): a tree representation of the abstract syntactic structure of source code written in a programming language.
  • 19. CFG Simplification 1. Ambiguous -> Unambiguous 2. Nondeterministic -> Deterministic 3. Left recursion -> No left recursion 19
  • 20. Ambiguous. Definition: a grammar whose rules can generate more than one parse tree for the same input. Example: E -> E + E | E * E | Num. [Figure: two different parse trees generated by this grammar]
  • 21. Ambiguous -> Unambiguous. Step 1: rewrite the grammar, from E -> E + E | E * E | Num to E -> E + T | T; T -> T * F | F; F -> Num. Step 2: make sure the grammar generates only one tree. [Figure: the single parse tree generated by the rewritten grammar]
  • 22. Non-deterministic -> Deterministic. A non-deterministic grammar contains rules that share a common prefix. Rewrite the grammar: A -> ab | ac becomes A -> aA’; A’ -> b | c. *A non-deterministic grammar can be rewritten into more than one deterministic grammar.
  • 23. Left recursion -> No left recursion. A grammar contains direct or indirect left recursion. Rewrite the grammar: E -> E + T | T; T -> T * F | F; F -> Num becomes E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. Why it is a problem: the E in the first E + T derives a second E + T, the E in the second E + T derives a third E + T, and so on without end.
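The rewrite on the left-recursion slide can be sketched as a small function. This is an illustrative sketch, not code from the talk: the dictionary layout of the grammar and the name `remove_direct_left_recursion` are assumptions, it only handles direct left recursion, and an empty list stands for the slide's `None` (epsilon) alternative.

```python
def remove_direct_left_recursion(grammar):
    """Rewrite A -> A x | y into A -> y A' and A' -> x A' | None."""
    result = {}
    for head, bodies in grammar.items():
        # Alternatives that start with the rule's own name are left-recursive.
        recursive = [body[1:] for body in bodies if body and body[0] == head]
        others = [body for body in bodies if not body or body[0] != head]
        if not recursive:
            result[head] = [list(body) for body in bodies]
            continue
        tail = head + "'"  # hypothetical name for the new helper non-terminal
        result[head] = [body + [tail] for body in others]
        # The trailing empty list is the None (epsilon) alternative.
        result[tail] = [body + [tail] for body in recursive] + [[]]
    return result


grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["Num"]],
}

for head, bodies in remove_direct_left_recursion(grammar).items():
    print(head, "->", " | ".join(" ".join(body) or "None" for body in bodies))
```

Run on the slide's grammar, this prints the same five rules as the rewritten grammar on the slide (E, E', T, T', F).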
  • 24. Recap: CFG Simplification 24 Before After Ambiguous Non-deterministic Left Recursion
  • 25. Parser 101 - Traditional parser Uncode - GATE Computer Science - Compiler Design Lecture 25
  • 26. Parser classification: Top-down type and Bottom-up type. [Figure: the same parse tree built step by step, from the root down for the top-down type and from the leaves up for the bottom-up type]
  • 27. LL / LR Parser: LL(k) = Left-to-right scan, Leftmost derivation, k-token lookahead (k>0). LR(k) = Left-to-right scan, Rightmost derivation (constructed in reverse), k-token lookahead (k>=0). Both LL and LR parsers scan the input string from left to right. Input string: 2 + 3 * 4
  • 28. LL / LR Parser (Cont.): LL and LR parsers apply derivations in a different order. [Figure: two copies of the parse tree, one annotated + → * and the other * → +]
  • 29. LL / LR Parser (Cont.): Input string: 2 + 3 * 4. The lookahead question: “I am a number token. If I perform 1-token lookahead and meet a + token, what should I do next?”
  • 30. Top Down - Recursive descent parser 30
  • 31. LL(k) - Implementation. Grammar: E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. Input: 2 + 3 * 4. Step 1: write a function for each non-terminal (parse_E(), parse_T(), ...). Step 2: parse from left to right, performing k-token lookahead. Step 3: recursively parse the input string starting from the first rule, parse_E(). Calls shown on the slide: parse_Ep( ) and parse_Tp(parse_F()).
  • 32. LL(1) - Example code (performs 1-token lookahead). Grammar: E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. [Code screenshot annotated with “Derivation”]
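The original code screenshot is not part of this transcript, so the following is a stand-in sketch of a recursive descent LL(1) parser for the grammar above, not the speaker's code. The `parse_*` names follow the slide; `lex`, `LL1Parser`, and `evaluate` are assumed names, and the parser computes the value of the expression instead of building a tree to keep the example short.

```python
import re


def lex(text):
    # Split into number and operator tokens; "$" is the end marker.
    # Characters outside digits, "+" and "*" are silently skipped in this sketch.
    return re.findall(r"\d+|[+*]", text) + ["$"]


class LL1Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # 1-token lookahead: inspect the next token without consuming it.
        return self.tokens[self.pos]

    def eat(self, expected=None):
        tok = self.tokens[self.pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse_E(self):  # E -> T E'
        return self.parse_Ep(self.parse_T())

    def parse_Ep(self, left):  # E' -> + T E' | None
        if self.peek() == "+":
            self.eat("+")
            return self.parse_Ep(left + self.parse_T())
        return left  # the None (epsilon) alternative

    def parse_T(self):  # T -> F T'
        return self.parse_Tp(self.parse_F())

    def parse_Tp(self, left):  # T' -> * F T' | None
        if self.peek() == "*":
            self.eat("*")
            return self.parse_Tp(left * self.parse_F())
        return left  # the None (epsilon) alternative

    def parse_F(self):  # F -> Num
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(tok)


def evaluate(text):
    parser = LL1Parser(lex(text))
    value = parser.parse_E()
    parser.eat("$")  # the whole input must be consumed
    return value


print(evaluate("2 + 3 * 4"))  # 14
```

Each decision is made by looking at exactly one upcoming token (`peek`), which is what makes this LL(1).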
  • 33. Top Down - Non recursive descent parser 33
  • 34. LL(1) - Parsing table. Step 1: build the first/follow table for each non-terminal (note: $ is the end marker). Step 2: build the parsing table based on the first/follow table.
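Step 1 can be sketched as a fixed-point computation over the grammar used in the LL(1) slides. This is an illustrative sketch rather than the speaker's code: the `GRAMMAR` layout and the function names are assumptions, `"None"` stands for the slide's empty alternative, and `"$"` is the end marker.

```python
EPS = "None"  # the slides write the empty alternative as None
END = "$"     # end marker

GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["Num"]],
}


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of grammar symbols."""
    out = set()
    for sym in symbols:
        if sym == EPS:
            continue
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol in the sequence can derive the empty string
    return out


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue  # only non-terminals have FOLLOW sets
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first()
follow = compute_follow(first)
for nt in GRAMMAR:
    print(nt, "FIRST =", sorted(first[nt]), "FOLLOW =", sorted(follow[nt]))
```

Step 2 then places a rule A -> body in table cell (A, t) for every terminal t in FIRST(body), and in (A, t) for every t in FOLLOW(A) when the body can derive the empty string.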
  • 35. LL(1) - Implementation. Step 3: implement with a stack (take a shift/reduce action based on the parsing table). [Figure: parse tree]
  • 36. LL(1) - Example code. Grammar: E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. [Code screenshot annotated with “Non-terminal stack”, “Shift” and “Reduce (Derivation)”]
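As a stand-in for the missing screenshot, here is a sketch of a table-driven (non-recursive) LL(1) recognizer for the same grammar. The parsing table is written out by hand from the FIRST/FOLLOW sets of this grammar; `TABLE`, `lex`, and `ll1_recognize` are assumed names. The slide calls the two stack actions "Shift" and "Reduce (Derivation)"; the comments mark which is which.

```python
import re

NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# (non-terminal on the stack, lookahead token) -> rule body to expand into.
# An empty list is the None (epsilon) alternative.
TABLE = {
    ("E", "Num"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "Num"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [],
    ("T'", "$"): [],
    ("F", "Num"): ["Num"],
}


def lex(text):
    # Every number becomes the terminal "Num"; "$" is the end marker.
    found = re.findall(r"\d+|[+*]", text)
    return ["Num" if tok.isdigit() else tok for tok in found] + ["$"]


def ll1_recognize(text):
    tokens = lex(text)
    stack = ["$", "E"]  # start symbol on top of the end marker
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            # "Reduce (Derivation)" on the slide: expand by the table entry.
            body = TABLE.get((top, look))
            if body is None:
                return False  # empty table cell: syntax error
            stack.extend(reversed(body))
        elif top == look:
            # "Shift" on the slide: the terminal matches, consume the token.
            pos += 1
        else:
            return False
    return pos == len(tokens)


print(ll1_recognize("2 + 3 * 4"))  # True
print(ll1_recognize("2 + * 3"))    # False
```

The recognizer only answers yes or no; building a tree would mean attaching nodes at each expansion step.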
  • 37. Bottom Up - LR(0) parser 37
  • 38. LR(0) - Deterministic finite automaton. Step 1: build the Deterministic Finite Automaton (DFA). Start state items: (1) E’ -> .E; (2) E -> .E + T; (3) E -> .T; (4) T -> .T * Num; (5) T -> .Num. Remaining item sets: {E’ -> E. ; E -> E. + T}, {E -> T. ; T -> T. * Num}, {T -> Num.}, {E -> E + .T ; T -> .T * Num ; T -> .Num}, {T -> T * .Num}, {E -> E + T. ; T -> T. * Num}, {T -> T * Num.}. The states are labelled S1 to S8 and connected by transitions on E, T, Num, * and +. Left recursion is supported.
  • 39. LR(0) - Parsing table. Step 2: build the parsing table (a parser like SLR(1) also requires the first/follow table). The table cells hold Shift, Reduce (Derivation) and acc (accept) actions.
  • 40. LR(0) - Implementation. Step 3: implement with a stack (take a shift/reduce action based on the parsing table). [Figure: parse tree]
  • 41. LR(0) - Example code. Grammar: E -> E + T | T; T -> T * F | F; F -> Num. [Code screenshot annotated with “Shift” and “Reduce (Derivation)”]
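The shift/reduce loop can be sketched with a hand-written table. This is an assumption-laden illustration, not the speaker's code. It uses the grammar from the DFA slide (E -> E + T | T; T -> T * Num | Num). The state numbers 0 to 7 are my own numbering, not the slide's S1 to S8. The reduce entries are restricted by FOLLOW sets, which makes this an SLR(1)-style table: the item set {E -> T. ; T -> T. * Num} contains both a completed item and a shift item, so a pure LR(0) table would have a shift/reduce conflict there.

```python
import re

# Rule number -> (head, body length, semantic action on the popped values).
PRODUCTIONS = {
    1: ("E", 3, lambda e, _op, t: e + t),  # E -> E + T
    2: ("E", 1, lambda t: t),              # E -> T
    3: ("T", 3, lambda t, _op, n: t * n),  # T -> T * Num
    4: ("T", 1, lambda n: n),              # T -> Num
}

ACTION = {
    (0, "Num"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$"): ("accept", None),
    (2, "*"): ("shift", 5), (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "+"): ("reduce", 4), (3, "*"): ("reduce", 4), (3, "$"): ("reduce", 4),
    (4, "Num"): ("shift", 3),
    (5, "Num"): ("shift", 7),
    (6, "*"): ("shift", 5), (6, "+"): ("reduce", 1), (6, "$"): ("reduce", 1),
    (7, "+"): ("reduce", 3), (7, "*"): ("reduce", 3), (7, "$"): ("reduce", 3),
}

GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 6}


def lex(text):
    # Tokens are (kind, value) pairs; ("$", None) is the end marker.
    tokens = []
    for tok in re.findall(r"\d+|[+*]", text):
        tokens.append(("Num", int(tok)) if tok.isdigit() else (tok, tok))
    return tokens + [("$", None)]


def lr_parse(text):
    tokens = lex(text)
    states = [0]  # state stack
    values = []   # semantic value stack, kept in step with the state stack
    pos = 0
    while True:
        kind, value = tokens[pos]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at token {pos}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            values.append(value)
            pos += 1
        elif op == "reduce":
            head, size, build = PRODUCTIONS[arg]
            args = values[-size:]
            del values[-size:]
            del states[-size:]
            values.append(build(*args))
            states.append(GOTO[(states[-1], head)])
        else:  # accept
            return values[0]


print(lr_parse("2 + 3 * 4"))  # 14
```

Unlike the LL(1) versions, the left-recursive grammar is used as written, which is the "left recursion support" noted on the DFA slide.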
  • 42. Parser 102 - PEG and PEG parser 42
  • 43. Grammar: Parsing Expression Grammar (PEG). Rule: A -> B | a, where A and B are non-terminals, a is a terminal, -> is the derivation (some papers write <-), and | means OR (if / elif / ...), so ambiguous syntax is disallowed. Difference from traditional CFG: A tries A -> B first; only after A -> B fails does A try A -> a. *Introduced in 2002 (Packrat Parsing: Simple, Powerful, Lazy, Linear Time). *Regular Expression (EBNF grammar) support comes from another paper.
  • 44. Example of difference. Grammar1: A -> a b | a. Grammar2: A -> a | a b. ● LL/LR parser will fail to complete when the input grammar is ambiguous. ● PEG parser tries the alternatives in order and commits to the first one that succeeds, so in Grammar2 the latter alternative (a b) will never succeed. “A PEG parser generator will resolve unintended ambiguities earliest-match-first, which may be arbitrary and lead to surprising parses.” (source)
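The difference between Grammar1 and Grammar2 can be reproduced with a few parser combinators. This is a minimal sketch; `literal`, `sequence`, `choice`, and `matches_all` are hypothetical helper names, and each parser takes a string and a position and returns the new position, or `None` on failure.

```python
def literal(text):
    def parse(s, pos):
        return pos + len(text) if s.startswith(text, pos) else None
    return parse


def sequence(*parsers):
    def parse(s, pos):
        for parser in parsers:
            pos = parser(s, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    """PEG ordered choice: commit to the first alternative that succeeds."""
    def parse(s, pos):
        for parser in parsers:
            end = parser(s, pos)
            if end is not None:
                return end  # later alternatives are never tried
        return None
    return parse


a, b = literal("a"), literal("b")
grammar1 = choice(sequence(a, b), a)  # A -> a b | a
grammar2 = choice(a, sequence(a, b))  # A -> a | a b


def matches_all(parser, s):
    # The whole input must be consumed for the match to count.
    return parser(s, 0) == len(s)


print(matches_all(grammar1, "ab"))  # True
print(matches_all(grammar2, "ab"))  # False: "a" succeeds first and "b" is left over
```

In Grammar2 the second alternative is unreachable: whenever `a b` would match, plain `a` has already matched and been chosen.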
  • 45. PEG Parser: PEG parser means “parser generated based on PEG”. A PEG parser can be a Packrat parser, or another traditional parser with a k-lookahead limitation. Mostly, PEG parser means Packrat parser. [Diagram relating CFG, EBNF grammar and PEG to Traditional parser, Packrat parser and PEG Parser]
  • 46. Parser 102 - Packrat parser 46
  • 47. Type of Packrat parser: the Packrat parser is a top-down type. [Figure: parse tree built from the root down]
  • 48. Packrat Parsing - Implementation. Grammar: E -> E + T | T; T -> T * F | F; F -> Num (left recursion is supported). Input: 2 + 3 * 4. Step 1: write a function for each non-terminal (PEG rule). Step 2: parse from left to right, performing infinite lookahead + memoization (the idea of memoization was introduced in 1970). Step 3: recursively parse the input string starting from the first rule, parse_E(). Calls shown on the slide: parse_E() and parse_T(); parse_T() and parse_F().
  • 49. Packrat Parsing - Example code. Grammar: E -> E + T | T; T -> T * F | F; F -> Num. [Code screenshot annotated with “Derivation” and “Memoization”]
  • 50. Packrat - what is memoization? Example: LeetCode 509. Fibonacci Number. fib(0) = 0; fib(1) = 1; fib(2) = fib(1) + fib(0) = 1; fib(3) = fib(2) + fib(1) = fib(1) + fib(0) + fib(1) = 2; ... If n = 4, we calculate fib(4) and fib(3) once, fib(2) and fib(0) twice, and fib(1) three times. Time complexity: O(2^n). [Figure: call tree of fib(4)]
  • 51. Packrat - what is memoization? (Cont.) Example: LeetCode 509. Fibonacci Number. With memoization, if n = 4, we calculate fib(4), fib(3), fib(2), fib(1) and fib(0) once each. Time complexity: O(2^n) => O(n). Space complexity: O(1) => O(n).
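The call counts quoted on the two Fibonacci slides can be checked directly. This is a small sketch using `functools.lru_cache` as the memo table; the counters and function names are added for illustration only.

```python
from collections import Counter
from functools import lru_cache

naive_calls = Counter()


def fib_naive(n):
    naive_calls[n] += 1  # count every evaluation of fib(n)
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


memo_calls = Counter()


@lru_cache(maxsize=None)
def fib_memo(n):
    memo_calls[n] += 1  # the body only runs on a cache miss
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


print(fib_naive(4), dict(naive_calls))
print(fib_memo(4), dict(memo_calls))
```

Without the cache, fib(2) and fib(0) are evaluated twice and fib(1) three times; with it, every argument is evaluated once. A Packrat parser applies the same idea with (rule, input position) as the cache key.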
  • 52. Left recursion in Packrat parser. Approach 1: if (count of operator) < (count function call): return False. Approach 2: reverse the call stack (adopted in CPython!). Source: Guido's Medium (Left-recursive PEG Grammars)
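One way to make a memoizing parser accept the left-recursive grammar is to seed the memo table with a failure and then re-run the rule while the match keeps growing. The sketch below is loosely modelled on the idea in the cited Left-recursive PEG Grammars article; it is not CPython's implementation, and `PackratParser`, `memoize_left_rec`, `expect`, and `parse` are names chosen here. It evaluates the expression instead of building a tree, and it is only exercised on this small grammar.

```python
import re


def lex(text):
    return re.findall(r"\d+|[+*]", text)


class PackratParser:
    def __init__(self, tokens):
        self.tokens = tokens
        # (rule name, start position) -> (value, end position), or None on failure
        self.memo = {}

    def memoize_left_rec(self, name, rule, pos):
        key = (name, pos)
        if key in self.memo:
            return self.memo[key]
        # Seed with failure so the left-recursive call bottoms out immediately.
        self.memo[key] = best = None
        while True:
            result = rule(pos)
            # Stop as soon as another pass no longer consumes more input.
            if result is None or (best is not None and result[1] <= best[1]):
                break
            self.memo[key] = best = result
        return best

    def expect(self, pos, tok):
        ok = pos < len(self.tokens) and self.tokens[pos] == tok
        return pos + 1 if ok else None

    def E(self, pos):
        return self.memoize_left_rec("E", self._E, pos)

    def _E(self, pos):  # E -> E + T | T
        left = self.E(pos)
        if left is not None:
            after = self.expect(left[1], "+")
            right = self.T(after) if after is not None else None
            if right is not None:
                return left[0] + right[0], right[1]
        return self.T(pos)

    def T(self, pos):
        return self.memoize_left_rec("T", self._T, pos)

    def _T(self, pos):  # T -> T * F | F
        left = self.T(pos)
        if left is not None:
            after = self.expect(left[1], "*")
            right = self.F(after) if after is not None else None
            if right is not None:
                return left[0] * right[0], right[1]
        return self.F(pos)

    def F(self, pos):  # F -> Num, with plain memoization
        key = ("F", pos)
        if key not in self.memo:
            ok = pos < len(self.tokens) and self.tokens[pos].isdigit()
            self.memo[key] = (int(self.tokens[pos]), pos + 1) if ok else None
        return self.memo[key]


def parse(text):
    tokens = lex(text)
    result = PackratParser(tokens).E(0)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("input does not match the grammar")
    return result[0]


print(parse("2 + 3 * 4"))  # 14
```

Each pass through the loop reuses the previous, shorter match as the left operand, so the result grows one operator at a time and stays left-associative.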
  • 55. Traditional parser vs Packrat parser
  • 56. Traditional parser vs Packrat parser
      ○ Scan. Packrat: left-to-right (*right-to-left memo). Traditional: left-to-right.
      ○ Left recursion. Packrat: supported (*not supported in the first paper). Traditional: LL needs to rewrite the grammar.
      ○ Ambiguous. Packrat: disallowed (determinism). Traditional: allowed.
      ○ Space complexity. Packrat: O(Code Size) (space consumption). Traditional: O(Depth of Parse Tree).
      ○ Worst time complexity. Packrat: super linear time (statelessness), *because of features like typedef in C. Traditional: exponential time.
      ○ Capability. Packrat: basically covers all traditional cases (infinite lookahead). Traditional: no left recursion or ambiguity for LL; k-lookahead limitations for both (e.g. dangling else).
      ○ Red text on the slide: 3 highlighted characteristics of the Packrat parser.
  • 57. New rules in Python 3.10 based on PEG: parenthesized context managers; PEP 622/634/635/636 - Structural Pattern Matching.
  • 59. CPython Parser - Before/After. CPython 3.8 and before: uses an LL(1) parser written by Guido 30 years ago; the parser requires steps to generate a CST and convert the CST to an AST. CPython 3.9: uses a PEG (Packrat) parser with infinite lookahead; PEG rules support left recursion; no more CST to AST step (source). CPython 3.10: drops LL(1) parser support. This answers “Why PEG?”
  • 60. CPython Parser - Workflow. [Diagram: Meta Grammar (Tools/peg_generator/pegen/metagrammar.gram), Grammar (Grammar/python.gram) and Token (Grammar/Tokens) go into pegen (PEG Parser, Tools/peg_generator/), which produces my_parser.py and my_parser.c] *CPython contains a PEG parser generator written in Python 3.8+ (because of the walrus operator).
  • 61. Input: Meta Grammar Example Syntax Directed Translation (SDT) 61 rule non-Terminal return type PEG rule divider PEG rule action (python code) Parser header (python code)
  • 62. Output: Generated PEG Parser (Partial code) 62
  • 63. Recap: Benefit / Performance. Benefit: the grammar is more flexible, from LL(1) to LL(∞) (infinite lookahead); hardware supports Packrat’s memory consumption now; intermediate parse tree (CST) construction is skipped. Performance: within 10% of the LL(1) parser in both speed and memory consumption (PEP 617).
  • 65. Recap ● Parser 101 (Compiler class in school) ○ CFG ○ Traditional Parser ■ Top-down: LL(1) ■ Bottom-up: LR(0) ● Parser 102 ○ PEG ○ Packrat Parser ● CPython ○ Parser in CPython ○ CPython’s PEG parser 65
  • 66. Q. How to verify my understanding? A. Get your hands dirty! You can implement traditional parsers like LL(1) and LR(0), and a Packrat parser, from scratch. Leetcode: 227. Basic Calculator II. Need Answer? note35/Parser-Learning
  • 69. Related Articles. Guido van Rossum: PEG Parsing Series Overview. Bryan Ford: Packrat Parsing: Simple, Powerful, Lazy, Linear Time; Parsing Expression Grammars: A Recognition-Based Syntactic Foundation.
  • 70. Related Talks. Guido van Rossum @ North Bay Python 2019: Writing a PEG parser for fun and profit. Pablo Galindo and Lysandros Nikolaou @ Podcast.__init__: The Journey To Replace Python's Parser And What It Means For The Future. Emily Morehouse-Valcarcel @ PyCon 2018: The AST and Me. Alex Gaynor @ PyCon 2013: So you want to write an interpreter?
  • 71. Thanks for listening!