Standing on the shoulders of giants:
Learn from LL(1) to PEG parser the hard way
Kir Chou
1
2
https://www.youtube.com/watch?v=DZTLgVBxET4
About me
Presented at PyCon TW/JP since 2017
https://note35.github.io/about/
https://github.com/note35/Parser-Learning
3
Agenda
● Motivation
● What is parser in CPython?
● Parser 101 - CFG
● Parser 101 - Traditional parser (LL(1) / LR(0))
● Parser 102 - PEG and PEG parser
● Parser 102 - Packrat parser
● CPython’s PEG parser
● Take away
4
Motivation
5
Motivation
What’s New In Python 3.9?
PEP 617, CPython now uses a new parser based on PEG;
“IIRC, I took a Compiler class in school…”
6
Motivation (Cont.)
School taught us the brief concept of the Compiler’s frontend and backend.
School’s parser assignment used Bison + YACC.
And...
7
My motivation = Talk objectives
What is a PEG parser?
Why did Python use an LL(1) parser before?
Why did Guido choose a PEG parser?
What other parsers do we have?
What’s the difference between those parsers?
How to implement those parsers?
8
What is parser in CPython?
CPython DevGuide - Design of CPython’s Compiler
9
Compilation Steps
10
Source Code -> [Lexer] -> Tokens -> [Parser] -> Abstract Syntax Tree (AST) -> [Compiler] -> Bytecode -> [VM] -> Result
(Diagram label: Import)
11
https://docs.python.org/3/library/tokenize.html#examples
Lexer
12
https://docs.python.org/3/library/ast.html
Parser
13
https://docs.python.org/3/library/dis.html#dis.disassemble
Compiler
= print(2*3+4)
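The three standard-library pages linked above can be tried directly on the slide's example, print(2*3+4). This is a minimal sketch using only tokenize, ast, and dis; the variable names are illustrative and not taken from the talk.

```python
import ast
import dis
import io
import tokenize

source = "print(2*3+4)"

# Lexer: source code -> tokens
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Parser: tokens -> AST (ast.parse runs the lexer and the parser together)
tree = ast.parse(source)
print(ast.dump(tree))

# Compiler: AST -> bytecode
code = compile(tree, "<example>", "exec")
dis.dis(code)

# VM: execute the bytecode (this prints the result of 2*3+4)
exec(code)
```

The exact token names, AST dump, and bytecode listing vary between Python versions, so no output is reproduced here.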
14
Source Code -> [Lexer] -> Tokens -> [Parser] (Talk’s focus!) -> Abstract Syntax Tree (AST) -> [Compiler] -> Bytecode -> [VM] -> Result
(Diagram label: Import)
Parser 101 - CFG
Uncode - GATE Computer Science - Compiler Design Lecture
15
Grammar
Context Free Grammar (CFG)
16
Rule: A -> B | a
● A, B: non-terminals
● a: terminal
● ->: derivation (*some papers write <-)
● |: AND (*supports ambiguous syntax)
Interpretation of this grammar: “Both B and a can be derived from A”
What is “Context Free”?
The left-hand side of every rule contains exactly one non-terminal and nothing else.
Valid CFG Example: S -> aSb
Invalid CFG Example: xSy -> axSyb
17
Syntax Analysis: Parse Tree
Concrete Syntax Tree (CST)
An ordered, rooted tree that represents
the syntactic structure of a string
according to some context-free
grammar.
Abstract Syntax Tree (AST)
A tree representation of the abstract
syntactic structure of source code
written in a programming language.
18
CFG Simplification
1. Ambiguous -> Unambiguous
2. Nondeterministic -> Deterministic
3. Left recursion -> No left recursion
19
Ambiguous Definition
A grammar is ambiguous if its rules can generate more than one parse tree for the same input.
20
E -> E + E | E * E | Num
[Figure: two different parse trees generated by this grammar for the same expression, one with + at the root and one with * at the root]
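The ambiguity of E -> E + E | E * E | Num can be checked by brute force. The sketch below enumerates every parse tree for 2 + 3 * 4 and evaluates each one; parse_trees and evaluate are hypothetical helper names, and the token list is assumed to be well formed (numbers and operators alternating).

```python
def parse_trees(tokens):
    """Return every parse tree of E -> E + E | E * E | Num for the token list."""
    if len(tokens) == 1:
        return [tokens[0]]                      # E -> Num
    trees = []
    # Operators sit at the odd positions; try each one as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Evaluate a tree made of ints and (operator, left, right) tuples."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


trees = parse_trees([2, "+", 3, "*", 4])
for tree in trees:
    print(tree, "=", evaluate(tree))
# ('+', 2, ('*', 3, 4)) = 14
# ('*', ('+', 2, 3), 4) = 20
```

Two trees with two different values for the same input is exactly why the grammar has to be rewritten before a deterministic parser can use it.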
Ambiguous -> Unambiguous
21
Step1: Rewrite the grammar
Before:
E -> E + E | E * E | Num
After:
E -> E + T | T
T -> T * F | F
F -> Num
Step2: Make sure the grammar generates only one tree
[Figure: the single parse tree generated by the rewritten grammar]
Non-deterministic -> Deterministic
A grammar is non-deterministic if the alternatives of a rule share a common prefix.
22
Before:
A -> ab | ac
Rewrite Grammar (left factoring)
After:
A -> aA’
A’ -> b | c
*A non-deterministic grammar can be rewritten into more than one deterministic grammar.
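The rewrite of A -> ab | ac is usually called left factoring. A minimal sketch follows, assuming every alternative is a plain string of one-character symbols and that all alternatives share the prefix; left_factor is a hypothetical helper, not code from the talk.

```python
from os.path import commonprefix  # character-wise longest common prefix of strings


def left_factor(name, alternatives):
    """Factor the prefix shared by all alternatives into a new rule name'."""
    prefix = commonprefix(alternatives)
    if len(alternatives) < 2 or not prefix:
        return {name: alternatives}             # nothing to factor
    new_name = name + "'"
    return {
        name: [prefix + new_name],
        # An alternative that equals the prefix leaves an empty remainder,
        # written as None to follow the slides' notation.
        new_name: [alt[len(prefix):] or "None" for alt in alternatives],
    }


print(left_factor("A", ["ab", "ac"]))
# {'A': ["aA'"], "A'": ['b', 'c']}
```

After factoring, one token of lookahead is enough to choose between b and c.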
Left recursion -> No left recursion
A grammar contains direct or indirect left recursion.
23
Before:
E -> E + T | T
T -> T * F | F
F -> Num
Rewrite Grammar
After:
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
Why left recursion is a problem: the E in the first E + T recursively derives a second E + T, the E in the second E + T derives a third, and so on without end.
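The rewrite follows the standard pattern for direct left recursion: A -> A α | β becomes A -> β A’ and A’ -> α A’ | None. A minimal sketch, assuming each alternative is a non-empty list of symbols; eliminate_direct_left_recursion is a hypothetical helper and handles only direct (not indirect) left recursion.

```python
def eliminate_direct_left_recursion(name, alternatives):
    """Rewrite A -> A alpha | beta into A -> beta A' and A' -> alpha A' | None."""
    new_name = name + "'"
    alphas = [alt[1:] for alt in alternatives if alt[0] == name]   # left-recursive tails
    betas = [alt for alt in alternatives if alt[0] != name]        # other alternatives
    if not alphas:
        return {name: alternatives}             # no direct left recursion
    return {
        name: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [["None"]],
    }


print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], ['None']]}
```

Applying the same function to T -> T * F | F gives T -> FT’ and T’ -> *FT’ | None.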
Recap: CFG Simplification
24
Before After
Ambiguous
Non-deterministic
Left Recursion
Parser 101 - Traditional parser
Uncode - GATE Computer Science - Compiler Design Lecture
25
Parser classification
26
Top-down Type
[Figure: the parse tree is built from the root down to the leaves]
Bottom-up Type
[Figure: the parse tree is built from the leaves up to the root]
LL / LR Parser
LL(k) = Left-to-right, Leftmost derivation, k-token lookahead (k>0)
LR(k) = Left-to-right, Rightmost derivation in reverse, k-token lookahead (k>=0)
27
*Both LL and LR parsers scan the input string from left to right
Input String: 2 + 3 * 4
28
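*LL and LR parsers derive the rules in a different order.
[Figure: two copies of the parse tree for 2 + 3 * 4; the top-down (LL) parser derives + and then * (+ → *), the bottom-up (LR) parser derives * and then + (* → +)]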
29
Input String: 2 + 3 * 4
I am "a token of number".
If I perform 1-token lookahead and
meet "a token of +",
what to do next?
Top Down - Recursive descent parser
30
LL(k) - Implementation
31
Input: 2 + 3 * 4
Grammar:
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
Step1: Write a function for each non-terminal
Step2: Parse from left to right and perform k-lookahead
Step3: Recursively parse the input string, starting from the first rule, parse_E()
Call structure: parse_E() calls parse_Ep(parse_T()); parse_T() calls parse_Tp(parse_F())
32
LL(1) - Example code
Grammar
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
*perform 1-lookahead
[Code screenshot not preserved; its annotation marked the derivation step]
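A recursive descent parser for the grammar E -> TE’, E’ -> +TE’ | None, T -> FT’, T’ -> *FT’ | None, F -> Num can be sketched as follows. This is not the code from the slide: it assumes the parse_E / parse_Ep / parse_T / parse_Tp / parse_F naming, uses a simple regex tokenizer, and evaluates the expression instead of building a tree.

```python
import re


def tokenize(text):
    # Simplification: anything that is not a number, + or * is silently skipped.
    return re.findall(r"\d+|[+*]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """1-token lookahead: look at the next token without consuming it."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_E(self):                  # E -> T E'
        return self.parse_Ep(self.parse_T())

    def parse_Ep(self, left):           # E' -> + T E' | None
        if self.peek() == "+":
            self.pos += 1
            return self.parse_Ep(left + self.parse_T())
        return left                     # E' -> None

    def parse_T(self):                  # T -> F T'
        return self.parse_Tp(self.parse_F())

    def parse_Tp(self, left):           # T' -> * F T' | None
        if self.peek() == "*":
            self.pos += 1
            return self.parse_Tp(left * self.parse_F())
        return left                     # T' -> None

    def parse_F(self):                  # F -> Num
        token = self.peek()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        self.pos += 1
        return int(token)


def parse(text):
    parser = Parser(tokenize(text))
    value = parser.parse_E()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(parse("2 + 3 * 4"))   # 14
```

Each function decides what to do from the single token returned by peek(), which is the 1-token lookahead of LL(1).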
Top Down - Non recursive descent parser
33
LL(1) - Parsing table
34
Step1: Build the FIRST/FOLLOW table for each non-terminal
Note: $ is the end-of-input marker
Step2: Build the parsing table from the FIRST/FOLLOW table
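The two steps can be sketched with the usual fixed-point computation of FIRST and FOLLOW sets. The code assumes the left-recursion-free grammar from the earlier slides, writes the empty production as "None" as the slides do, and uses hypothetical function names; it is an illustration, not the talk's implementation.

```python
EPS, END = "None", "$"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["Num"]],
}


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of symbols, given the FIRST sets of non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                     # every symbol can derive the empty string
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                      # repeat until no set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue        # FOLLOW is only defined for non-terminals
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:     # nothing (or only empty) after sym
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(first, follow):
    """Map (non-terminal, lookahead token) to the alternative to expand."""
    table = {}
    for nt, alternatives in GRAMMAR.items():
        for alt in alternatives:
            alt_first = first_of_sequence(alt, first)
            lookaheads = alt_first - {EPS}
            if EPS in alt_first:
                lookaheads |= follow[nt]
            for token in lookaheads:
                table[(nt, token)] = alt
    return table


first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
print(sorted(follow["F"]))              # ['$', '*', '+']
```

A grammar is LL(1) when no table cell receives two different alternatives; this sketch does not check for such conflicts.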
LL(1) - Implementation
35
Step3: Implement with a stack (take a shift/reduce action based on the parsing table)
[Figure: the parse tree built by the parser]
LL(1) - Example code
36
Grammar
E -> TE’
E’ -> +TE’ | None
T -> FT’
T’ -> *FT’ | None
F -> Num
[Code screenshot not preserved; its annotations marked the non-terminal stack, the Shift step, and the Reduce (Derivation) steps]
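A table-driven LL(1) recognizer for this grammar can be sketched with an explicit stack. The table below is derived by hand from the FIRST/FOLLOW sets of the grammar and is an assumption of this sketch, as are the names TABLE and accepts. In the slide's vocabulary, matching a terminal plays the role of "Shift" and expanding a non-terminal plays the role of "Reduce (Derivation)"; that mapping is my reading of the labels.

```python
import re

NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# (non-terminal on the stack, lookahead token kind) -> right-hand side to push.
# An empty list stands for the None (empty) production.
TABLE = {
    ("E", "Num"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "Num"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [],
    ("T'", "$"): [],
    ("F", "Num"): ["Num"],
}


def accepts(text):
    """Return True if text is derivable from E, using only a stack and the table."""
    # Simplification: characters other than digits, + and * are ignored.
    tokens = re.findall(r"\d+|[+*]", text) + ["$"]
    stack = ["$", "E"]                  # end marker at the bottom, start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        kind = "Num" if tokens[pos].isdigit() else tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, kind))
            if rhs is None:
                return False            # empty table cell: syntax error
            stack.extend(reversed(rhs)) # expand: push the rule, leftmost symbol on top
        elif top == kind:
            pos += 1                    # match: consume the terminal
        else:
            return False
    return pos == len(tokens)


print(accepts("2 + 3 * 4"))   # True
print(accepts("2 + * 4"))     # False
```

Unlike the recursive descent version, the recursion is replaced by the explicit stack, and every decision is a single table lookup.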
Bottom Up - LR(0) parser
37
LR(0) - Deterministic finite automaton
38
Step1: Build the Deterministic Finite Automaton (DFA)
[Figure: DFA with states S1 to S8; the item sets and transitions are listed below]
Start state:
E’ -> .E --- (1)
E -> .E + T --- (2)
E -> .T --- (3)
T -> .T * Num --- (4)
T -> .Num --- (5)
On E: E’ -> E. / E -> E. + T
On T: E -> T. / T -> T. * Num
On Num: T -> Num.
On +: E -> E + .T / T -> .T * Num / T -> .Num
On + then T: E -> E + T. / T -> T. * Num
On *: T -> T * .Num
On * then Num: T -> T * Num.
Left recursion support
LR(0) - Parsing table
39
Step2: Build the parsing table (a parser like SLR(1) also requires the FIRST/FOLLOW table)
[Table screenshot not preserved; its annotations marked the Shift, Reduce (Derivation), and acc (accept) entries]
LR(0) - Implementation
40
Step3: Implement with a stack (take a shift/reduce action based on the parsing table)
[Figure: the parse tree built by the parser]
LR(0) - Example code
41
Grammar
E -> E + T | T
T -> T * F | F
F -> Num
[Code screenshot not preserved; its annotations marked the Shift and Reduce (Derivation) steps]
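A shift/reduce parser driven by a table can be sketched as below. To stay short, it uses the smaller grammar from the DFA slide (E -> E + T | T, T -> T * Num | Num) rather than the one with F. The ACTION and GOTO tables are derived by hand and are an assumption of this sketch; reduce entries are placed on FOLLOW tokens (SLR(1) style) because the state containing both E -> T. and T -> T. * Num would otherwise have a shift/reduce conflict.

```python
import re

# Rule name -> (left-hand side, number of right-hand-side symbols, semantic action)
RULES = {
    "E->E+T": ("E", 3, lambda e, _plus, t: e + t),
    "E->T": ("E", 1, lambda t: t),
    "T->T*Num": ("T", 3, lambda t, _star, n: t * n),
    "T->Num": ("T", 1, lambda n: n),
}

# (state, lookahead token kind) -> action; state numbers are local to this sketch.
ACTION = {
    (0, "Num"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$"): ("accept", None),
    (2, "*"): ("shift", 5),
    (2, "+"): ("reduce", "E->T"), (2, "$"): ("reduce", "E->T"),
    (3, "+"): ("reduce", "T->Num"), (3, "*"): ("reduce", "T->Num"),
    (3, "$"): ("reduce", "T->Num"),
    (4, "Num"): ("shift", 3),
    (5, "Num"): ("shift", 7),
    (6, "*"): ("shift", 5),
    (6, "+"): ("reduce", "E->E+T"), (6, "$"): ("reduce", "E->E+T"),
    (7, "+"): ("reduce", "T->T*Num"), (7, "*"): ("reduce", "T->T*Num"),
    (7, "$"): ("reduce", "T->T*Num"),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 6}


def tokenize(text):
    # Simplification: characters other than digits, + and * are ignored.
    tokens = [("Num", int(lexeme)) if lexeme.isdigit() else (lexeme, lexeme)
              for lexeme in re.findall(r"\d+|[+*]", text)]
    return tokens + [("$", None)]


def parse(text):
    tokens = tokenize(text)
    states, values, pos = [0], [], 0    # state stack and matching value stack
    while True:
        kind, value = tokens[pos]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind!r} at token {pos}")
        op, arg = action
        if op == "shift":               # consume the token and enter a new state
            states.append(arg)
            values.append(value)
            pos += 1
        elif op == "reduce":            # replace a rule's right-hand side by its left-hand side
            lhs, size, semantic_action = RULES[arg]
            args = values[-size:]
            del values[-size:]
            del states[-size:]
            values.append(semantic_action(*args))
            states.append(GOTO[(states[-1], lhs)])
        else:                           # accept
            return values[0]


print(parse("2 + 3 * 4"))   # 14
```

The left-recursive grammar is used as written; a bottom-up parser needs no left-recursion elimination.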
Parser 102 - PEG and PEG parser
42
Grammar
Parsing Expression Grammar (PEG)
43
Rule: A -> B | a
● A, B: non-terminals
● a: terminal
● ->: derivation (*some papers write <-)
● |: OR (if / elif / ...) (*disallows ambiguous syntax)
*Difference from traditional CFG: A tries A -> B first. Only if A -> B fails does A try A -> a.
*Introduced in 2002 (Packrat Parsing: Simple, Powerful, Lazy, Linear Time)
*Regular Expression support (EBNF grammar) comes from another paper
Example of difference
44
Grammar1: A -> a b | a
Grammar2: A -> a | a b
● An LL/LR parser fails to complete when the input grammar is ambiguous.
● A PEG parser tries the alternatives in order and commits to the first one that matches. In Grammar2 the second alternative (a b) can therefore never succeed.
“A PEG parser generator will resolve unintended ambiguities earliest-match-first, which may
be arbitrary and lead to surprising parses.” (source)
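The difference between Grammar1 and Grammar2 can be reproduced with a few lines of ordered choice. In this sketch each alternative is a list of literal strings; match_sequence and match_ordered_choice are hypothetical helper names, and the return value is the end position of the match (None on failure).

```python
def match_sequence(symbols, text, pos):
    """Match the literals one after another; return the end position or None."""
    for symbol in symbols:
        if not text.startswith(symbol, pos):
            return None
        pos += len(symbol)
    return pos


def match_ordered_choice(alternatives, text, pos=0):
    """PEG choice: try the alternatives in order and commit to the first match."""
    for alternative in alternatives:
        end = match_sequence(alternative, text, pos)
        if end is not None:
            return end                  # later alternatives are never tried
    return None


grammar1 = [["a", "b"], ["a"]]          # A -> a b | a
grammar2 = [["a"], ["a", "b"]]          # A -> a | a b

print(match_ordered_choice(grammar1, "ab"))   # 2: the whole input is consumed
print(match_ordered_choice(grammar2, "ab"))   # 1: only "a" is consumed, "b" is left over
```

With Grammar2 the first alternative always wins whenever the second could match, which is the "surprising parse" the quotation warns about.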
PEG Parser
PEG parser means “parser generated based on PEG”.
A PEG parser can be a Packrat parser or another traditional parser with a k-lookahead limitation. In most cases, “PEG parser” means a Packrat parser.
45
[Figure labels: CFG, EBNF grammar, PEG (grammars); Packrat parser, Traditional parser, PEG Parser (parsers)]
Parser 102 - Packrat parser
46
Type of Packrat parser
47
Top-down Type
[Figure: the parse tree is built from the root down to the leaves]
A Packrat parser is a top-down parser.
Packrat Parsing - Implementation
48
Input: 2 + 3 * 4
Grammar (left recursion support):
E -> E + T | T
T -> T * F | F
F -> Num
Step1: Write a function for each non-terminal (PEG rule)
Step2: Parse from left to right; perform infinite lookahead + memoization
*The idea of memoization was introduced in 1970
Step3: Recursively parse the input string, starting from the first rule, parse_E()
Call structure: parse_E() calls parse_E() and parse_T(); parse_T() calls parse_T() and parse_F()
Packrat Parsing - Example code
49
Grammar
E -> E + T | T
T -> T * F | F
F -> Num
[Code screenshot not preserved; its annotations marked the Derivation and Memoization parts]
Packrat - what is memoization?
50
LeetCode 509. Fibonacci Number
[Figure: recursion tree of fib(4)]
fib(0) = 0
fib(1) = 1
fib(2) = fib(1) + fib(0) = 1
fib(3) = fib(2) + fib(1) = fib(1) + fib(0) + fib(1) = 2
...
If n = 4, we calculate fib(2) and fib(0) twice, fib(1) three times, and fib(4) and fib(3) once.
Time Complexity: O(2^n)
Packrat - what is memoization? (Cont.)
51
LeetCode 509. Fibonacci Number
With memoization, if n = 4, we calculate fib(4), fib(3), fib(2), fib(1), and fib(0) once each.
Time Complexity: O(2^n) => O(n)
Space Complexity: O(1) => O(n) (memo table)
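The call counts can be verified by instrumenting both versions. This sketch uses functools.lru_cache from the standard library as the memo table; the calls counter is only there for illustration.

```python
from functools import lru_cache

calls = {"naive": 0, "memo": 0}


def fib_naive(n):
    calls["naive"] += 1
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)                # the cache acts as the memo table
def fib_memo(n):
    calls["memo"] += 1                  # only counted when the result is not cached yet
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)


print(fib_naive(4), calls["naive"])     # 3 9  (1 + 1 + 2 + 3 + 2 calls)
print(fib_memo(4), calls["memo"])       # 3 5  (fib(4) .. fib(0), once each)
```

A Packrat parser applies the same idea to parsing: the result of each (rule, input position) pair is computed once and then looked up.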
Left recursion in Packrat parser
52
Approach 1
if (count of operator) < (count function call):
return False
Approach 2
reverse the call stack (adopted in CPython!)
Source: Guido's Medium (Left-recursive PEG Grammars)
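One way to make a memoizing top-down parser accept the left-recursive grammar E -> E + T | T is to seed the memo with a failure and then re-run the rule for as long as the match grows. The sketch below follows my reading of the technique in Guido's article; it is not CPython's actual code, and memoize_left_rec, expr, term, and factor are names chosen for this example. It evaluates the expression instead of building a tree.

```python
import re


def memoize_left_rec(func):
    """Memoize a rule per start position and grow the match for left recursion."""
    def wrapper(self):
        start = self.pos
        key = (func.__name__, start)
        if key in self.memo:
            result, self.pos = self.memo[key]
            return result
        # Seed with failure so the left-recursive call returns immediately.
        self.memo[key] = (None, start)
        best, best_end = None, start
        while True:
            self.pos = start
            result = func(self)
            if result is None or self.pos <= best_end:
                break                   # the match stopped growing
            best, best_end = result, self.pos
            self.memo[key] = (best, best_end)
        self.pos = best_end
        return best
    return wrapper


class Parser:
    def __init__(self, text):
        # Simplification: characters other than digits, + and * are ignored.
        self.tokens = re.findall(r"\d+|[+*]", text)
        self.pos = 0
        self.memo = {}

    def expect(self, token):
        if self.pos < len(self.tokens) and self.tokens[self.pos] == token:
            self.pos += 1
            return True
        return False

    @memoize_left_rec
    def expr(self):                     # E -> E + T | T
        start = self.pos
        left = self.expr()
        if left is not None and self.expect("+"):
            right = self.term()
            if right is not None:
                return left + right
        self.pos = start                # first alternative failed: backtrack
        return self.term()

    @memoize_left_rec
    def term(self):                     # T -> T * F | F
        start = self.pos
        left = self.term()
        if left is not None and self.expect("*"):
            right = self.factor()
            if right is not None:
                return left * right
        self.pos = start
        return self.factor()

    def factor(self):                   # F -> Num
        if self.pos < len(self.tokens) and self.tokens[self.pos].isdigit():
            self.pos += 1
            return int(self.tokens[self.pos - 1])
        return None


print(Parser("2 + 3 * 4").expr())       # 14
```

Resetting self.pos to start before trying the next alternative is the backtracking that gives a PEG parser its unlimited lookahead; the memo keeps that backtracking from repeating work.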
53
Normal Memoization
54
Left-recursion Memoization
*performs infinite lookahead
Traditional parser V.S Packrat parser
55
Traditional parser vs Packrat parser
56
Characteristic | Packrat | Traditional
Scan | Left-to-right (*Right-to-left memo) | Left-to-right
Left Recursion | Supported (*Not supported in the first paper) | LL needs to rewrite the grammar
Ambiguous | Disallowed (determinism) | Allowed
Space Complexity | O(Code Size) (space consumption) | O(Depth of Parse Tree)
Worst Time Complexity | Super linear time (statelessness) *Because of features like typedef in C | Exponential time
Capability | Basically covers all traditional cases (infinite lookahead) | No left recursion/ambiguity for LL; k-lookahead limitations for both LL and LR (e.g. dangling else)
Red text: 3 highlighted characteristics of Packrat parser.
57
New rules in Python 3.10 based on PEG
● Parenthesized context managers
● PEP 622/634/635/636 - Structural Pattern Matching
CPython’s PEG parser
58
CPython Parser - Before/After
CPython 3.8 and earlier use an LL(1) parser written by Guido 30 years ago.
That parser requires one step to generate a CST and another to convert the CST to an AST.
CPython 3.9 uses a PEG (Packrat) parser with infinite lookahead.
PEG rules support left recursion.
No more CST-to-AST step - source
CPython 3.10 drops LL(1) parser support.
59
This answers
“Why PEG?”
CPython Parser - Workflow
60
● Meta Grammar: Tools/peg_generator/pegen/metagrammar.gram
● Grammar: Grammar/python.gram
● Token: Grammar/Tokens
● pegen (PEG Parser), in Tools/peg_generator/, generates my_parser.py / my_parser.c
*CPython contains a PEG parser generator written in Python 3.8+ (because of the walrus operator)
Input: Meta Grammar Example
Syntax Directed Translation (SDT)
61
[Screenshot labels: rule, non-terminal, return type, PEG rule divider, PEG rule, action (Python code), parser header (Python code)]
Output: Generated PEG Parser
(Partial code)
62
Recap: Benefit / Performance
Benefit
Grammar is more flexible: from LL(1) to LL(∞) (infinite lookahead)
Hardware supports Packrat’s memory consumption now
Skip intermediate parse tree (CST) construction
Performance
Within 10% of LL(1) parser both in speed and memory consumption (PEP 617)
63
Take away
64
Recap
● Parser 101 (Compiler class in school)
○ CFG
○ Traditional Parser
■ Top-down: LL(1)
■ Bottom-up: LR(0)
● Parser 102
○ PEG
○ Packrat Parser
● CPython
○ Parser in CPython
○ CPython’s PEG parser
65
66
Q. How to verify my understanding?
A. Get your hands dirty!
You can implement traditional parsers like LL(1) and LR(0), and a Packrat parser, from scratch!
LeetCode: 227. Basic Calculator II
Need an answer? note35/Parser-Learning
Q & A
67
Appendix
68
Related Articles
Guido van Rossum
PEG Parsing Series Overview
Bryan Ford
Packrat Parsing: Simple, Powerful, Lazy, Linear Time
Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
69
Related Talks
Guido van Rossum @ North Bay Python 2019
Writing a PEG parser for fun and profit
Pablo Galindo and Lysandros Nikolaou @ Podcast.__init__
The Journey To Replace Python's Parser And What It Means For The Future
Emily Morehouse-Valcarcel @ PyCon 2018
The AST and Me
Alex Gaynor @ PyCon 2013
So you want to write an interpreter?
70
Thanks for listening!
71

More Related Content

What's hot

Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical AnalysisMunni28
 
POST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMPOST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMRajendran
 
Algorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms IAlgorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms IMohamed Loey
 
Conversion of Infix to Prefix and Postfix with Stack
Conversion of Infix to Prefix and Postfix with StackConversion of Infix to Prefix and Postfix with Stack
Conversion of Infix to Prefix and Postfix with Stacksahil kumar
 
Recursive Function
Recursive FunctionRecursive Function
Recursive FunctionHarsh Pathak
 
Quick sort algorithn
Quick sort algorithnQuick sort algorithn
Quick sort algorithnKumar
 
Introduction to Computer Programming
Introduction to Computer ProgrammingIntroduction to Computer Programming
Introduction to Computer ProgrammingProf. Erwin Globio
 
Medians and order statistics
Medians and order statisticsMedians and order statistics
Medians and order statisticsRajendran
 
Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2Srimatre K
 
Design & Analysis of Algorithms Lecture Notes
Design & Analysis of Algorithms Lecture NotesDesign & Analysis of Algorithms Lecture Notes
Design & Analysis of Algorithms Lecture NotesFellowBuddy.com
 

What's hot (20)

Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
3.8 quicksort
3.8 quicksort3.8 quicksort
3.8 quicksort
 
POST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEMPOST’s CORRESPONDENCE PROBLEM
POST’s CORRESPONDENCE PROBLEM
 
Programming in c
Programming in cProgramming in c
Programming in c
 
Algorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms IAlgorithms Lecture 2: Analysis of Algorithms I
Algorithms Lecture 2: Analysis of Algorithms I
 
Master theorem
Master theoremMaster theorem
Master theorem
 
Conversion of Infix to Prefix and Postfix with Stack
Conversion of Infix to Prefix and Postfix with StackConversion of Infix to Prefix and Postfix with Stack
Conversion of Infix to Prefix and Postfix with Stack
 
Recursive Function
Recursive FunctionRecursive Function
Recursive Function
 
Quick sort algorithn
Quick sort algorithnQuick sort algorithn
Quick sort algorithn
 
Huffman tree
Huffman tree Huffman tree
Huffman tree
 
Asymptotic notation
Asymptotic notationAsymptotic notation
Asymptotic notation
 
Parsing
ParsingParsing
Parsing
 
Introduction to Computer Programming
Introduction to Computer ProgrammingIntroduction to Computer Programming
Introduction to Computer Programming
 
Python Tutorial
Python TutorialPython Tutorial
Python Tutorial
 
Merge sort and quick sort
Merge sort and quick sortMerge sort and quick sort
Merge sort and quick sort
 
Medians and order statistics
Medians and order statisticsMedians and order statistics
Medians and order statistics
 
Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2Formal Languages and Automata Theory unit 2
Formal Languages and Automata Theory unit 2
 
Design & Analysis of Algorithms Lecture Notes
Design & Analysis of Algorithms Lecture NotesDesign & Analysis of Algorithms Lecture Notes
Design & Analysis of Algorithms Lecture Notes
 
Priority queues
Priority queuesPriority queues
Priority queues
 
Introduction to Compiler design
Introduction to Compiler design Introduction to Compiler design
Introduction to Compiler design
 

Similar to Learn from LL(1) to PEG parser the hard way

Similar to Learn from LL(1) to PEG parser the hard way (20)

Left factor put
Left factor putLeft factor put
Left factor put
 
Lecture8 syntax analysis_4
Lecture8 syntax analysis_4Lecture8 syntax analysis_4
Lecture8 syntax analysis_4
 
Ch01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluitonCh01 basic concepts_nosoluiton
Ch01 basic concepts_nosoluiton
 
calculus-4c-1.pdf
calculus-4c-1.pdfcalculus-4c-1.pdf
calculus-4c-1.pdf
 
Lecture6 syntax analysis_2
Lecture6 syntax analysis_2Lecture6 syntax analysis_2
Lecture6 syntax analysis_2
 
Master method
Master method Master method
Master method
 
Time and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptxTime and Space Complexity Analysis.pptx
Time and Space Complexity Analysis.pptx
 
COMPILER DESIGN- Syntax Directed Translation
COMPILER DESIGN- Syntax Directed TranslationCOMPILER DESIGN- Syntax Directed Translation
COMPILER DESIGN- Syntax Directed Translation
 
Ecfft zk studyclub 9.9
Ecfft zk studyclub 9.9Ecfft zk studyclub 9.9
Ecfft zk studyclub 9.9
 
Infix prefix postfix
Infix prefix postfixInfix prefix postfix
Infix prefix postfix
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
2.pptx
2.pptx2.pptx
2.pptx
 
12IRGeneration.pdf
12IRGeneration.pdf12IRGeneration.pdf
12IRGeneration.pdf
 
Recurrences
RecurrencesRecurrences
Recurrences
 
phuong trinh vi phan d geometry part 2
phuong trinh vi phan d geometry part 2phuong trinh vi phan d geometry part 2
phuong trinh vi phan d geometry part 2
 
chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................
 
Dynamic programing
Dynamic programingDynamic programing
Dynamic programing
 
Session2
Session2Session2
Session2
 
Python for Scientific Computing
Python for Scientific ComputingPython for Scientific Computing
Python for Scientific Computing
 
Unit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdfUnit-1 DAA_Notes.pdf
Unit-1 DAA_Notes.pdf
 

More from Kir Chou

Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!Kir Chou
 
Python パッケージの影響を歴史から理解してみよう!
Python パッケージの影響を歴史から理解してみよう!Python パッケージの影響を歴史から理解してみよう!
Python パッケージの影響を歴史から理解してみよう!Kir Chou
 
The str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOLThe str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOLKir Chou
 
PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史)
PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史) PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史)
PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史) Kir Chou
 
Introduction of CTF and CGC
Introduction of CTF and CGCIntroduction of CTF and CGC
Introduction of CTF and CGCKir Chou
 
PyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PY
PyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PYPyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PY
PyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PYKir Chou
 
Spime - personal assistant
Spime - personal assistantSpime - personal assistant
Spime - personal assistantKir Chou
 
Ch9 package & port(2013 ncu-nos_nm)
Ch9 package & port(2013 ncu-nos_nm)Ch9 package & port(2013 ncu-nos_nm)
Ch9 package & port(2013 ncu-nos_nm)Kir Chou
 
Ch8 file system management(2013 ncu-nos_nm)
Ch8   file system management(2013 ncu-nos_nm)Ch8   file system management(2013 ncu-nos_nm)
Ch8 file system management(2013 ncu-nos_nm)Kir Chou
 
Ch7 user management(2013 ncu-nos_nm)
Ch7   user management(2013 ncu-nos_nm)Ch7   user management(2013 ncu-nos_nm)
Ch7 user management(2013 ncu-nos_nm)Kir Chou
 
Ch10 firewall(2013 ncu-nos_nm)
Ch10 firewall(2013 ncu-nos_nm)Ch10 firewall(2013 ncu-nos_nm)
Ch10 firewall(2013 ncu-nos_nm)Kir Chou
 
Knowledge Management in Distributed Agile Software Development
Knowledge Management in Distributed Agile Software DevelopmentKnowledge Management in Distributed Agile Software Development
Knowledge Management in Distributed Agile Software DevelopmentKir Chou
 
Sitcon2014 community by server (kir)
Sitcon2014   community by server (kir)Sitcon2014   community by server (kir)
Sitcon2014 community by server (kir)Kir Chou
 
Webapp(2014 ncucc)
Webapp(2014 ncucc)Webapp(2014 ncucc)
Webapp(2014 ncucc)Kir Chou
 
廢除雙二一議題 保留方論點 (2013ncu全幹會)
廢除雙二一議題   保留方論點 (2013ncu全幹會)廢除雙二一議題   保留方論點 (2013ncu全幹會)
廢除雙二一議題 保留方論點 (2013ncu全幹會)Kir Chou
 
Ch6 ssh(2013 ncu-nos_nm)
Ch6   ssh(2013 ncu-nos_nm)Ch6   ssh(2013 ncu-nos_nm)
Ch6 ssh(2013 ncu-nos_nm)Kir Chou
 
Ch5 network basic(2013 ncu-nos_nm)
Ch5   network basic(2013 ncu-nos_nm)Ch5   network basic(2013 ncu-nos_nm)
Ch5 network basic(2013 ncu-nos_nm)Kir Chou
 

More from Kir Chou (20)

Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!Time travel: Let’s learn from the history of Python packaging!
Time travel: Let’s learn from the history of Python packaging!
 
Python パッケージの影響を歴史から理解してみよう!
Python パッケージの影響を歴史から理解してみよう!Python パッケージの影響を歴史から理解してみよう!
Python パッケージの影響を歴史から理解してみよう!
 
The str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOLThe str/bytes nightmare before python2 EOL
The str/bytes nightmare before python2 EOL
 
PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史)
PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史) PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史)
PyCon TW 2018 - A Python Engineer Under Giant Umbrella (巨大保護傘下的 Python 碼農辛酸史)
 
Introduction of CTF and CGC
Introduction of CTF and CGCIntroduction of CTF and CGC
Introduction of CTF and CGC
 
PyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PY
PyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PYPyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PY
PyCon TW 2017 - Why do projects fail? Let's talk about the story of Sinon.PY
 
GCC
GCCGCC
GCC
 
Spime - personal assistant
Spime - personal assistantSpime - personal assistant
Spime - personal assistant
 
Ch9 package & port(2013 ncu-nos_nm)
Ch9 package & port(2013 ncu-nos_nm)Ch9 package & port(2013 ncu-nos_nm)
Ch9 package & port(2013 ncu-nos_nm)
 
Ch8 file system management(2013 ncu-nos_nm)
Ch8   file system management(2013 ncu-nos_nm)Ch8   file system management(2013 ncu-nos_nm)
Ch8 file system management(2013 ncu-nos_nm)
 
Ch7 user management(2013 ncu-nos_nm)
Ch7   user management(2013 ncu-nos_nm)Ch7   user management(2013 ncu-nos_nm)
Ch7 user management(2013 ncu-nos_nm)
 
Ch10 firewall(2013 ncu-nos_nm)
Ch10 firewall(2013 ncu-nos_nm)Ch10 firewall(2013 ncu-nos_nm)
Ch10 firewall(2013 ncu-nos_nm)
 
Knowledge Management in Distributed Agile Software Development
Knowledge Management in Distributed Agile Software DevelopmentKnowledge Management in Distributed Agile Software Development
Knowledge Management in Distributed Agile Software Development
 
Cms part2
Cms part2Cms part2
Cms part2
 
Cms part1
Cms part1Cms part1
Cms part1
 
Sitcon2014 community by server (kir)
Sitcon2014   community by server (kir)Sitcon2014   community by server (kir)
Sitcon2014 community by server (kir)
 
Webapp(2014 ncucc)
Webapp(2014 ncucc)Webapp(2014 ncucc)
Webapp(2014 ncucc)
 
廢除雙二一議題 保留方論點 (2013ncu全幹會)
廢除雙二一議題   保留方論點 (2013ncu全幹會)廢除雙二一議題   保留方論點 (2013ncu全幹會)
廢除雙二一議題 保留方論點 (2013ncu全幹會)
 
Ch6 ssh(2013 ncu-nos_nm)
Ch6   ssh(2013 ncu-nos_nm)Ch6   ssh(2013 ncu-nos_nm)
Ch6 ssh(2013 ncu-nos_nm)
 
Ch5 network basic(2013 ncu-nos_nm)
Ch5   network basic(2013 ncu-nos_nm)Ch5   network basic(2013 ncu-nos_nm)
Ch5 network basic(2013 ncu-nos_nm)
 

Recently uploaded

Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Eraconfluent
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Andreas Granig
 
Food Delivery Business App Development Guide 2024
Food Delivery Business App Development Guide 2024Food Delivery Business App Development Guide 2024
Food Delivery Business App Development Guide 2024Chirag Panchal
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)Roberto Bettazzoni
 
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...Navigation in flutter – how to add stack, tab, and drawer navigators to your ...
Navigation in flutter – how to add stack, tab, and drawer navigators to your ...Flutter Agency
 
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypse
[GeeCON2024] How I learned to stop worrying and love the dark silicon apocalypseTomasz Kowalczewski
 
Microsoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMicrosoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMarkus Moeller
 
Test Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfTest Automation Design Patterns_ A Comprehensive Guide.pdf
Test Automation Design Patterns_ A Comprehensive Guide.pdfkalichargn70th171
 
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...
Abortion Pill Prices Jane Furse ](+27832195400*)[ 🏥 Women's Abortion Clinic i...Abortion Clinic
 
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...
Abortion Pills For Sale WhatsApp[[+27737758557]] In Birch Acres, Abortion Pil...drm1699
 
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024
Wired_2.0_CREATE YOUR ULTIMATE LEARNING ENVIRONMENT_JCON_16052024SimonedeGijt
 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAShane Coughlan
 
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaUNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale Ibrida
UNI DI NAPOLI FEDERICO II - Il ruolo dei grafi nell'AI Conversazionale IbridaNeo4j
 
Community is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletCommunity is Just as Important as Code by Andrea Goulet
Community is Just as Important as Code by Andrea GouletAndrea Goulet
 
Lessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfLessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfSrushith Repakula
 
BusinessGPT - Security and Governance for Generative AI
BusinessGPT  - Security and Governance for Generative AIBusinessGPT  - Security and Governance for Generative AI
BusinessGPT - Security and Governance for Generative AIAGATSoftware
 
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit MilanWorkshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit MilanNeo4j
 
From Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxFrom Knowledge Graphs via Lego Bricks to scientific conversations.pptx
From Knowledge Graphs via Lego Bricks to scientific conversations.pptxNeo4j
 

Recently uploaded (20)

Evolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI EraEvolving Data Governance for the Real-time Streaming and AI Era
Evolving Data Governance for the Real-time Streaming and AI Era
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
 
Food Delivery Business App Development Guide 2024
Food Delivery Business App Development Guide 2024Food Delivery Business App Development Guide 2024
Food Delivery Business App Development Guide 2024
 
The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)The mythical technical debt. (Brooke, please, forgive me)
The mythical technical debt. (Brooke, please, forgive me)
 
Abortion Clinic In Johannesburg ](+27832195400*)[ 🏥 Safe Abortion Pills in Jo...
Abortion Clinic In Johannesburg ](+27832195400*)[ 🏥 Safe Abortion Pills in Jo...Abortion Clinic In Johannesburg ](+27832195400*)[ 🏥 Safe Abortion Pills in Jo...

Learn from LL(1) to PEG parser the hard way

  • 1. Standing on the shoulders of giants: Learn from LL(1) to PEG parser the hard way. Kir Chou
  • 3. About me: Presented at PyCon TW/JP since 2017. https://note35.github.io/about/ https://github.com/note35/Parser-Learning
  • 4. Agenda ● Motivation ● What is a parser in CPython? ● Parser 101 - CFG ● Parser 101 - Traditional parser (LL(1) / LR(0)) ● Parser 102 - PEG and PEG parser ● Parser 102 - Packrat parser ● CPython’s PEG parser ● Takeaway
  • 6. Motivation: What’s New In Python 3.9? PEP 617: CPython now uses a new parser based on PEG. “IIRC, I took a Compiler class in school…”
  • 7. Motivation (Cont.): School taught us the basic concepts of a compiler’s frontend and backend. School’s parser assignment used Bison + YACC. And...
  • 8. My motivation = Talk objectives: What is a PEG parser? Why did Python use an LL(1) parser before? Why did Guido choose a PEG parser? What other parsers do we have? What’s the difference between those parsers? How to implement those parsers?
  • 9. What is a parser in CPython? Reference: CPython DevGuide - Design of CPython’s Compiler
  • 10. Compilation Steps: Source Code → [Lexer] → Tokens → [Parser] → Abstract Syntax Tree (AST) → [Compiler] → Bytecode → [VM] → Result (the diagram also carries an “Import” label)
  • 14. The same pipeline as slide 10 (Source Code, Tokens, Abstract Syntax Tree (AST), Bytecode, Result; Lexer, Parser, Compiler, VM, Import), annotated “Talk’s focus!”
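The pipeline on these slides can be tried out with the standard library. The sketch below is only an illustration: it pushes the deck's own example source, `print(2*3+4)`, through `tokenize`, `ast`, `compile` and `dis`. The helper name `run_pipeline` is made up for this example and is not part of CPython.

```python
import ast
import dis
import io
import tokenize


def run_pipeline(source):
    """Return (token strings, AST, code object) for a snippet of Python source."""
    # Lexer: source code -> tokens (keep only names, numbers and operators).
    wanted = {tokenize.NAME, tokenize.NUMBER, tokenize.OP}
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type in wanted
    ]
    # Parser: source code -> abstract syntax tree.
    tree = ast.parse(source)
    # Compiler: AST -> bytecode (a code object).
    code = compile(tree, "<example>", "exec")
    return tokens, tree, code


tokens, tree, code = run_pipeline("print(2*3+4)")
print(tokens)
print(ast.dump(tree))
dis.dis(code)  # bytecode listing; the exact opcodes vary by Python version
exec(code)     # VM: run the bytecode, which prints 10
```

The exact `ast.dump` and `dis` output differs between Python versions, so it is not reproduced here.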
  • 15. Parser 101 - CFG. Reference: Uncode - GATE Computer Science - Compiler Design Lecture
  • 16. Grammar: Context-Free Grammar (CFG). Example rule: A -> B | a. A and B are non-terminals, a is a terminal, -> denotes derivation (*some papers write <-), and | means AND (*ambiguous syntax is supported). Interpretation of this grammar: “Both B and a can be derived from A”.
  • 17. What is “Context Free”? The left-hand side of every rule contains exactly one non-terminal. Valid CFG example: S -> aSb. Invalid CFG example: xSy -> axSyb.
  • 18. Semantic Analysis: Parse Tree. Concrete Syntax Tree (CST): an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. Abstract Syntax Tree (AST): a tree representation of the abstract syntactic structure of source code written in a programming language.
  • 19. CFG Simplification: 1. Ambiguous -> Unambiguous 2. Non-deterministic -> Deterministic 3. Left recursion -> No left recursion
  • 20. Ambiguous, definition: a grammar is ambiguous if its rules can generate more than one parse tree for the same input. Example: E -> E + E | E * E | Num. (Diagram: two different parse trees for the same expression.)
  • 21. Ambiguous -> Unambiguous: Step 1: rewrite the grammar, from E -> E + E | E * E | Num to E -> E + T | T; T -> T * F | F; F -> Num. Step 2: make sure the grammar generates only one tree. (Diagram: the single resulting parse tree.)
  • 22. Non-deterministic -> Deterministic: a grammar is non-deterministic if it contains rules that share a common prefix. Rewrite the grammar (left factoring): A -> ab | ac becomes A -> aA’; A’ -> b | c. *A non-deterministic grammar can be rewritten into more than one deterministic grammar.
  • 23. Left recursion -> No left recursion: a grammar contains direct or indirect left recursion. In E -> E + T, the leading E derives a second E + T, whose leading E derives a third E + T, and so on without end. Rewrite the grammar from E -> E + T | T; T -> T * F | F; F -> Num to E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num (None denotes the empty string).
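The rewrite on slide 23 follows a mechanical pattern: A -> A α | β becomes A -> β A’ and A’ -> α A’ | None. A minimal sketch of that transformation is shown below, assuming direct left recursion only. The dictionary representation of the grammar and the function name `eliminate_direct_left_recursion` are choices made for this example, and an empty list stands for None.

```python
def eliminate_direct_left_recursion(grammar):
    """Rewrite A -> A x | y into A -> y A' and A' -> x A' | None.

    `grammar` maps each non-terminal to a list of alternatives, and each
    alternative is a list of symbols. An empty list stands for None (epsilon).
    Only direct left recursion is handled in this sketch.
    """
    result = {}
    for head, alternatives in grammar.items():
        # Alternatives that start with the rule's own head are left-recursive.
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
        others = [alt for alt in alternatives if not alt or alt[0] != head]
        if not recursive:
            result[head] = [list(alt) for alt in alternatives]
            continue
        tail = head + "'"
        result[head] = [alt + [tail] for alt in others]
        result[tail] = [alt + [tail] for alt in recursive] + [[]]
    return result


grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["Num"]],
}
rewritten = eliminate_direct_left_recursion(grammar)
for head, alternatives in rewritten.items():
    print(head, "->", " | ".join(" ".join(alt) or "None" for alt in alternatives))
```

Indirect left recursion (A -> B ..., B -> A ...) needs an extra substitution pass that this sketch leaves out.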
  • 24. Recap: CFG Simplification (Before/After table with the rows Ambiguous, Non-deterministic, Left Recursion)
  • 25. Parser 101 - Traditional parser. Reference: Uncode - GATE Computer Science - Compiler Design Lecture
  • 26. Parser classification: Top-down type and Bottom-up type. (Diagrams: parse trees illustrating each type.)
  • 27. LL / LR Parser: LL(k) = Left-to-right, Leftmost derivation, k-token lookahead (k>0). LR(k) = Left-to-right, Rightmost derivation, k-token lookahead (k>=0). *Both LL and LR parsers scan the input string from left to right. Input string: 2 + 3 * 4
  • 28. LL / LR Parser (cont.): *LL and LR parsers perform their derivations at different times. (Diagram: the same parse tree twice, one copy labeled + → * and the other * → +.)
  • 29. LL / LR Parser (cont.): Input string: 2 + 3 * 4. I am “a number token”. If I perform 1-token lookahead and meet “a + token”, what should I do next?
  • 30. Top Down - Recursive descent parser
  • 31. LL(k) - Implementation, with input 2 + 3 * 4 and grammar E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. Step 1: write a function for each non-terminal (parse_E(), parse_Ep(), parse_T(), parse_Tp(), parse_F()). Step 2: parse from left to right, performing k-token lookahead. Step 3: recursively parse the input string, starting from the first rule, parse_E().
  • 32. LL(1) - Example code, performing 1-token lookahead. Grammar: E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. (Code screenshot annotated with: Derivation.)
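The three steps on slide 31 can be sketched as a small recursive descent evaluator. This is not the code from the slide (which is only a screenshot): the class name `LL1Parser`, the helper `lex`, and the choice to compute a value instead of building a tree are assumptions made for this example. Each method corresponds to one non-terminal and decides what to do by peeking at a single token.

```python
import re


def lex(text):
    """Split the input into number and operator tokens, plus a '$' end marker."""
    # Note: characters other than digits, '+' and '*' are silently ignored here.
    return re.findall(r"\d+|[+*]", text) + ["$"]


class LL1Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # 1-token lookahead: inspect the next token without consuming it.
        return self.tokens[self.pos]

    def advance(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def parse_E(self):            # E -> T E'
        return self.parse_Ep(self.parse_T())

    def parse_Ep(self, left):     # E' -> + T E' | None
        if self.peek() == "+":
            self.advance()
            return self.parse_Ep(left + self.parse_T())
        return left

    def parse_T(self):            # T -> F T'
        return self.parse_Tp(self.parse_F())

    def parse_Tp(self, left):     # T' -> * F T' | None
        if self.peek() == "*":
            self.advance()
            return self.parse_Tp(left * self.parse_F())
        return left

    def parse_F(self):            # F -> Num
        if not self.peek().isdigit():
            raise SyntaxError(f"expected a number, got {self.peek()!r}")
        return int(self.advance())


def evaluate(text):
    parser = LL1Parser(lex(text))
    value = parser.parse_E()
    if parser.peek() != "$":
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * 4"))
```

Passing the left operand into `parse_Ep` and `parse_Tp` is one way to keep `+` and `*` left-associative even though the rewritten grammar is right-recursive.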
  • 33. Top Down - Non-recursive descent parser
  • 34. LL(1) - Parsing table: Step 1: build the FIRST/FOLLOW table for each non-terminal (note: $ is the end marker). Step 2: build the parsing table from the FIRST/FOLLOW table.
  • 35. LL(1) - Implementation: Step 3: implement with a stack (take a shift/reduce action based on the parsing table). (Diagram: the resulting parse tree.)
  • 36. LL(1) - Example code. Grammar: E -> TE’; E’ -> +TE’ | None; T -> FT’; T’ -> *FT’ | None; F -> Num. (Code screenshot annotated with: non-terminal stack, Reduce (Derivation), Shift.)
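A table-driven version of the same LL(1) parser can be sketched as follows. The parsing table is written out by hand for this grammar (derived from FIRST(E) = FIRST(T) = FIRST(F) = {Num}, FOLLOW(E’) = {$} and FOLLOW(T’) = {+, $}) rather than computed, and the names `TABLE` and `ll1_parse` are invented for this example. Tokens are given by kind, so every number is simply `"Num"`.

```python
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

# (non-terminal on the stack, lookahead token) -> production to apply.
# An empty list is the None (epsilon) production.
TABLE = {
    ("E", "Num"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],
    ("E'", "$"): [],
    ("T", "Num"): ["F", "T'"],
    ("T'", "+"): [],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "$"): [],
    ("F", "Num"): ["Num"],
}


def ll1_parse(token_kinds):
    """Return the list of (non-terminal, production) derivations, in order."""
    tokens = list(token_kinds) + ["$"]
    stack = ["$", "E"]          # start symbol on top of the end marker
    pos = 0
    derivations = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NON_TERMINALS:
            production = TABLE.get((top, lookahead))
            if production is None:
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            derivations.append((top, production))
            # Push the right-hand side reversed so its first symbol is on top.
            stack.extend(reversed(production))
        elif top == lookahead:
            pos += 1                # terminal matches the input: consume it
        else:
            raise SyntaxError(f"expected {top!r}, got {lookahead!r}")
    return derivations


steps = ll1_parse(["Num", "+", "Num", "*", "Num"])   # 2 + 3 * 4
for head, production in steps:
    print(head, "->", " ".join(production) or "None")
```

The printed sequence is the leftmost derivation of the input, which is what the "L" in the second position of LL refers to.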
  • 37. Bottom Up - LR(0) parser
  • 38. LR(0) - Deterministic finite automaton: Step 1: build the deterministic finite automaton (DFA); left recursion is supported. Initial items: E’ -> .E (1); E -> .E + T (2); E -> .T (3); T -> .T * Num (4); T -> .Num (5). Further item sets in the diagram: {E’ -> E. ; E -> E. + T}, {E -> T. ; T -> T. * Num}, {T -> Num.}, {E -> E + .T ; T -> .T * Num ; T -> .Num}, {T -> T * .Num}, {E -> E + T. ; T -> T. * Num}, {T -> T * Num.}. States are labeled S1 to S8, and transitions are labeled E, T, Num, * and +.
  • 39. LR(0) - Parsing table: Step 2: build the parsing table (a parser such as SLR(1) also requires the FIRST/FOLLOW table). (Table screenshot annotated with: Shift, Reduce (Derivation), acc.)
  • 40. LR(0) - Implementation: Step 3: implement with a stack (take a shift/reduce action based on the parsing table). (Diagram: the resulting parse tree.)
  • 41. LR(0) - Example code. Grammar: E -> E + T | T; T -> T * F | F; F -> Num. (Code screenshot annotated with: Shift, Reduce (Derivation).)
  • 42. Parser 102 - PEG and PEG parser
  • 43. Grammar: Parsing Expression Grammar (PEG). Example rule: A -> B | a. A and B are non-terminals, a is a terminal, -> denotes derivation (*some papers write <-), and | means OR (if / elif / ...) (*ambiguous syntax is disallowed). *Difference from a traditional CFG: A tries A -> B first; only if A -> B fails does A try A -> a. *Introduced in 2002 (Packrat Parsing: Simple, Powerful, Lazy, Linear Time). *Regular-expression (EBNF grammar) support comes from another paper.
  • 44. Example of difference: Grammar 1: A -> a b | a. Grammar 2: A -> a | a b. ● An LL/LR parser will fail to complete when the input grammar is ambiguous. ● A PEG parser tries the alternatives in order and commits to the first one that succeeds, so in Grammar 2 the latter alternative (a b) will never succeed. “A PEG parser generator will resolve unintended ambiguities earliest-match-first, which may be arbitrary and lead to surprising parses.” (source)
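The difference between the two grammars on slide 44 can be made concrete with a few lines of code. The sketch below is a toy model of PEG ordered choice, not a full PEG engine; the function names `match_sequence` and `ordered_choice` are invented for this example.

```python
def match_sequence(tokens, pos, symbols):
    """Match `symbols` one after another from `pos`; return the end position or None."""
    for symbol in symbols:
        if pos < len(tokens) and tokens[pos] == symbol:
            pos += 1
        else:
            return None
    return pos


def ordered_choice(tokens, pos, alternatives):
    """PEG-style choice: try alternatives in order and commit to the first success."""
    for alternative in alternatives:
        end = match_sequence(tokens, pos, alternative)
        if end is not None:
            return alternative, end
    return None


GRAMMAR_1 = [["a", "b"], ["a"]]   # A -> a b | a
GRAMMAR_2 = [["a"], ["a", "b"]]   # A -> a | a b

tokens = ["a", "b"]
print(ordered_choice(tokens, 0, GRAMMAR_1))  # picks the alternative 'a b'
print(ordered_choice(tokens, 0, GRAMMAR_2))  # picks 'a' and leaves 'b' unconsumed
```

With Grammar 2, any input that `a b` could match already satisfies `a`, so the second alternative is unreachable. That is why the order of alternatives matters in a PEG.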
  • 45. PEG Parser: a PEG parser is a parser generated from a PEG. It can be a Packrat parser or another traditional parser with a k-lookahead limitation. In most cases, “PEG parser” means Packrat parser. (Diagram relating CFG, EBNF grammar and PEG to Packrat parser, traditional parser and PEG parser.)
  • 46. Parser 102 - Packrat parser
  • 47. Type of Packrat parser: the Packrat parser is a top-down type. (Diagram: a parse tree being built top-down.)
  • 48. Packrat Parsing - Implementation, with input 2 + 3 * 4 and grammar E -> E + T | T; T -> T * F | F; F -> Num (left recursion is supported). Step 1: write a function for each non-terminal (PEG rule): parse_E(), parse_T(), parse_F(). Step 2: parse from left to right, performing infinite lookahead with memoization (*the idea of memoization was introduced in 1970). Step 3: recursively parse the input string, starting from the first rule, parse_E().
  • 49. Packrat Parsing - Example code. Grammar: E -> E + T | T; T -> T * F | F; F -> Num. (Code screenshot annotated with: Derivation, Memoization.)
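A minimal packrat sketch is given below. To keep it short it deliberately swaps the slide's left-recursive grammar for a right-recursive PEG, E -> T + E | T; T -> F * T | F; F -> Num, so that plain memoization is enough; this substitution is an assumption of the example, not what the slide's code does. The class name `PackratParser`, the `apply` helper and the `hits` counter are likewise invented here. Every rule result is cached by (rule name, position), so when the first alternative fails and the second one asks for the same sub-rule at the same position, the cached result is reused.

```python
import re


class PackratParser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[+*]", text)
        self.memo = {}   # (rule name, position) -> (value, next position) or None
        self.hits = 0    # number of times a cached result was reused

    def apply(self, rule, pos):
        """Run `rule` at `pos`, or return its memoized result if already known."""
        key = (rule.__name__, pos)
        if key in self.memo:
            self.hits += 1
        else:
            self.memo[key] = rule(pos)
        return self.memo[key]

    def expect(self, pos, symbol):
        return pos < len(self.tokens) and self.tokens[pos] == symbol

    def parse_E(self, pos):   # E -> T + E | T
        left = self.apply(self.parse_T, pos)
        if left is not None and self.expect(left[1], "+"):
            right = self.apply(self.parse_E, left[1] + 1)
            if right is not None:
                return left[0] + right[0], right[1]
        # First alternative failed: backtrack and try the second one.
        return self.apply(self.parse_T, pos)

    def parse_T(self, pos):   # T -> F * T | F
        left = self.apply(self.parse_F, pos)
        if left is not None and self.expect(left[1], "*"):
            right = self.apply(self.parse_T, left[1] + 1)
            if right is not None:
                return left[0] * right[0], right[1]
        return self.apply(self.parse_F, pos)

    def parse_F(self, pos):   # F -> Num
        if pos < len(self.tokens) and self.tokens[pos].isdigit():
            return int(self.tokens[pos]), pos + 1
        return None

    def parse(self):
        result = self.apply(self.parse_E, 0)
        if result is None or result[1] != len(self.tokens):
            raise SyntaxError("input does not match the grammar")
        return result[0]


parser = PackratParser("2 + 3 * 4")
print(parser.parse(), "memo hits:", parser.hits)
```

The right-recursive grammar makes `+` and `*` group to the right, which is harmless for these two operators but would be wrong for `-` or `/`; handling the original left-recursive rules needs extra machinery.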
  • 50. Packrat - what is memoization? Example: LeetCode 509. Fibonacci Number. fib(0) = 0; fib(1) = 1; fib(2) = fib(1) + fib(0) = 1; fib(3) = fib(2) + fib(1) = fib(1) + fib(0) + fib(1) = 2; ... If n = 4, we calculate fib(2) and fib(0) twice, fib(1) three times, and fib(4) and fib(3) once. Time complexity: O(2^n). (Diagram: the recursive call tree for fib(4).)
  • 51. Packrat - what is memoization? (Cont.) Example: LeetCode 509. Fibonacci Number. With memoization, if n = 4, we calculate fib(4), fib(3), fib(2), fib(1) and fib(0) once each. Time complexity: O(2^n) => O(n). Space complexity (extra storage for cached results): O(1) => O(n).
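The call counts quoted on slides 50 and 51 can be checked directly. The sketch below counts how often each fib(n) is actually computed, with and without a memo table; the function names and the `Counter`-based bookkeeping are choices made for this example.

```python
from collections import Counter


def fib_naive(n, calls):
    """Plain recursion: the same sub-problem is recomputed many times."""
    calls[n] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


def fib_memo(n, calls, memo=None):
    """Memoized recursion: each sub-problem is computed once and then cached."""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    calls[n] += 1
    memo[n] = n if n < 2 else fib_memo(n - 1, calls, memo) + fib_memo(n - 2, calls, memo)
    return memo[n]


naive_calls, memo_calls = Counter(), Counter()
print(fib_naive(4, naive_calls), dict(naive_calls))
print(fib_memo(4, memo_calls), dict(memo_calls))
```

A packrat parser applies the same trade: it spends memory on a table of (rule, position) results so that no rule is evaluated twice at the same position.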
  • 52. Left recursion in Packrat parser: Approach 1: if (count of operators) < (count of function calls): return False. Approach 2: reverse the call stack (adopted in CPython!). Source: Guido's Medium (Left-recursive PEG Grammars)
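One way to let a memoizing parser accept the left-recursive rules directly is sketched below. It loosely follows the seed-and-grow idea from the cited article: the memo entry is first seeded with a failure so the nested left-recursive call returns immediately, then the rule is re-run for as long as it consumes more input. Whether this matches the slide's "reverse the call stack" wording exactly is an assumption, and the names `memoize_left_rec`, `LeftRecursiveParser` and `parse` are used here purely for illustration. The parser builds nested tuples so the left-associative shape is visible.

```python
import functools
import re


def memoize_left_rec(rule):
    """Memoize a possibly left-recursive rule by growing a seed result."""
    @functools.wraps(rule)
    def wrapper(self, pos):
        key = (rule.__name__, pos)
        if key in self.memo:
            return self.memo[key]
        # Seed the memo with failure so the nested left-recursive call stops.
        best = self.memo[key] = None
        while True:
            result = rule(self, pos)
            # Stop once re-running the rule no longer consumes more input.
            if result is None or (best is not None and result[1] <= best[1]):
                return best
            best = self.memo[key] = result
    return wrapper


class LeftRecursiveParser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|[+*]", text)
        self.memo = {}

    def expect(self, pos, symbol):
        return pos < len(self.tokens) and self.tokens[pos] == symbol

    @memoize_left_rec
    def parse_E(self, pos):   # E -> E + T | T
        left = self.parse_E(pos)
        if left is not None and self.expect(left[1], "+"):
            right = self.parse_T(left[1] + 1)
            if right is not None:
                return ("+", left[0], right[0]), right[1]
        return self.parse_T(pos)

    @memoize_left_rec
    def parse_T(self, pos):   # T -> T * F | F
        left = self.parse_T(pos)
        if left is not None and self.expect(left[1], "*"):
            right = self.parse_F(left[1] + 1)
            if right is not None:
                return ("*", left[0], right[0]), right[1]
        return self.parse_F(pos)

    def parse_F(self, pos):   # F -> Num
        if pos < len(self.tokens) and self.tokens[pos].isdigit():
            return int(self.tokens[pos]), pos + 1
        return None


def parse(text):
    parser = LeftRecursiveParser(text)
    result = parser.parse_E(0)
    if result is None or result[1] != len(parser.tokens):
        raise SyntaxError("input does not match the grammar")
    return result[0]


print(parse("1 + 2 + 3"))
print(parse("2 + 3 * 4"))
```

This sketch is enough for the small expression grammar above; a production implementation has to be more careful about cached results that were computed while a seed was still growing.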
  • 55. Traditional parser vs. Packrat parser
  • 56. Traditional parser vs. Packrat parser (each row: Packrat | Traditional)
      ○ Scan: Left-to-right (*right-to-left memo) | Left-to-right
      ○ Left recursion: Supported (*not supported in the first paper) | LL needs to rewrite the grammar
      ○ Ambiguity: Disallowed (determinism) | Allowed
      ○ Space complexity: O(Code Size) (space consumption) | O(Depth of Parse Tree)
      ○ Worst-case time complexity: Super-linear time (statelessness) *because of features like typedef in C | Exponential time
      ○ Capability: Basically covers all traditional cases (infinite lookahead) | No left recursion/ambiguity for LL; both LL and LR have k-lookahead limitations (e.g. dangling else)
      Red text: 3 highlighted characteristics of Packrat parser.
  • 57. New rules in Python 3.10 based on PEG: parenthesized context managers; PEP 622/634/635/636 - Structural Pattern Matching
  • 59. CPython Parser - Before/After: CPython 3.8 and earlier use an LL(1) parser written by Guido 30 years ago; that parser requires steps to generate a CST and convert the CST to an AST. CPython 3.9 uses a PEG (Packrat) parser with infinite lookahead; PEG rules support left recursion, and there is no more CST-to-AST step (source). CPython 3.10 drops LL(1) parser support. This answers “Why PEG?”
  • 60. CPython Parser - Workflow: the inputs are the meta grammar (Tools/peg_generator/pegen/metagrammar.gram), the grammar (Grammar/python.gram) and the tokens (Grammar/Tokens); pegen (PEG Parser), located in Tools/peg_generator/, outputs my_parser.py or my_parser.c. *CPython contains a PEG parser generator written in Python 3.8+ (because of the walrus operator).
  • 61. Input: Meta Grammar Example, Syntax Directed Translation (SDT). (Grammar screenshot annotated with: rule, non-terminal, return type, PEG rule divider, PEG rule action (Python code), parser header (Python code).)
  • 62. Output: Generated PEG Parser (Partial code)
  • 63. Recap: Benefit / Performance. Benefit: the grammar is more flexible, from LL(1) to LL(∞) (infinite lookahead); hardware now supports Packrat’s memory consumption; intermediate parse tree (CST) construction is skipped. Performance: within 10% of the LL(1) parser in both speed and memory consumption (PEP 617).
  • 65. Recap ● Parser 101 (Compiler class in school) ○ CFG ○ Traditional Parser ■ Top-down: LL(1) ■ Bottom-up: LR(0) ● Parser 102 ○ PEG ○ Packrat Parser ● CPython ○ Parser in CPython ○ CPython’s PEG parser
  • 66. Q. How to verify my understanding? A. Get your hands dirty! You can implement traditional parsers such as LL(1) and LR(0), and a Packrat parser, from scratch! Practice problem: LeetCode 227. Basic Calculator II. Need an answer? See note35/Parser-Learning.
  • 69. Related Articles: Guido van Rossum, “PEG Parsing Series Overview”; Bryan Ford, “Packrat Parsing: Simple, Powerful, Lazy, Linear Time” and “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation”
  • 70. Related Talks: Guido van Rossum @ North Bay Python 2019, “Writing a PEG parser for fun and profit”; Pablo Galindo and Lysandros Nikolaou @ Podcast.__init__, “The Journey To Replace Python's Parser And What It Means For The Future”; Emily Morehouse-Valcarcel @ PyCon 2018, “The AST and Me”; Alex Gaynor @ PyCon 2013, “So you want to write an interpreter?”
  • 71. Thanks for listening!