Outline
• Who am I? Why did I do this?
• Introduction to PEG
• Introduction to programming languages
• Write a parser in PEG
• No demo QQ
About Me
• 葉闆, Yodalee <lc85301@gmail.com>
• Studied EE in college and microwave engineering in graduate school; now a rookie engineer at Synopsys.
• GitHub: yodalee
• Blog: http://yodalee.blogspot.tw
Why Did I Do This
• The book “Understanding Computation: From Simple Machines to Impossible Programs” implements a programming language parser and a regular expression parser with Ruby Treetop, which is a PEG parser.
• I rewrote all the code in Rust, so I did a little research on PEG.
• https://github.com/yodalee/computationbook-rust
Introduction to PEG
Parsing Expression Grammar, PEG
• Bryan Ford, <Parsing Expression Grammars: A Recognition-Based Syntactic Foundation>, 2004
• An alternative to Chomsky-style generative grammars such as CFG that removes ambiguity from the grammar.
• Ambiguity is useful for modeling natural language, but not for precise, unambiguous programming languages. A generative grammar for a natural-language sentence looks like this:
  Language <- Subject Verb Noun
  Subject <- He | Lisa …
  Verb <- is | has | sees …
  Noun <- student | a toy …
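PEG Basic Rule
• By definition a PEG looks very similar to a CFG: it is composed of rules.
• Applying a rule to the input either:
  • succeeds and consumes the matched input, or
  • fails and consumes nothing.
  • Used as a predicate, a rule only reports success or failure and never consumes input.
• PEG replaces the unordered choice ‘|’ with the prioritized choice ‘/’: alternatives are tried in order and the first one that matches wins.
• Consider the following:
  • CFG: A = “a” | “ab” can derive both “a” and “ab”.
  • PEG: A = “a” / “ab” never reaches the second alternative, because “a” already succeeds on any input that starts with “ab”.
  • PEG: A = a* a never matches, because the greedy a* consumes every “a” and leaves none for the final a.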
Operator        Meaning
""              String literal
[]              Character set
.               Any character
(e1 e2 ..)      Grouping
e? e+ e*        Optional, one-or-more, zero-or-more repetition
&e              And-predicate
!e              Not-predicate
e1 e2           Sequence
e1 / e2         Prioritized choice
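The effect of prioritized choice and greedy repetition can be sketched in a few lines of plain Rust. This is a minimal hand-written matcher, not any library's API; the helper names `literal`, `choice`, `matches_whole` and `star_then_a` are made up for this illustration.

```rust
/// Match the literal `lit` at byte offset `pos`; on success return the offset after it.
fn literal(input: &str, pos: usize, lit: &str) -> Option<usize> {
    if input[pos..].starts_with(lit) {
        Some(pos + lit.len())
    } else {
        None
    }
}

/// Prioritized choice e1 / e2 / ...: the first alternative that matches wins.
fn choice(input: &str, pos: usize, alternatives: &[&str]) -> Option<usize> {
    alternatives.iter().find_map(|lit| literal(input, pos, lit))
}

/// True when the choice consumes the whole input.
fn matches_whole(input: &str, alternatives: &[&str]) -> bool {
    choice(input, 0, alternatives) == Some(input.len())
}

/// A = a* a: the greedy star consumes every "a", so the final "a" always fails.
fn star_then_a(input: &str) -> bool {
    let mut pos = 0;
    while let Some(next) = literal(input, pos, "a") {
        pos = next;
    }
    literal(input, pos, "a").is_some()
}

fn main() {
    // "a" / "ab" on "ab": "a" wins and "b" is left over.
    println!("{}", matches_whole("ab", &["a", "ab"]));
    // Putting the longer alternative first fixes it.
    println!("{}", matches_whole("ab", &["ab", "a"]));
    println!("{}", star_then_a("aaa"));
}
```

Ordering alternatives from longest to shortest is the usual way to work around the first case.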
Some Examples
• NUMBER <- [1-9] [0-9]*
• COMMENT <- "//" (!"\n" .)* "\n"
• EXPRESSION <- TERM ([+-] TERM)*
  TERM <- FACTOR ([*/] FACTOR)*
• STAT_IF <-
  "if" COND "then" STATEMENT "else" STATEMENT /
  "if" COND "then" STATEMENT
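As an illustration of how a not-predicate drives a loop, the COMMENT rule `"//" (!"\n" .)* "\n"` can be hand-translated into Rust roughly as follows. The function name `comment` is hypothetical, and the sketch works on bytes, which is safe here because the byte `\n` never occurs inside a multi-byte UTF-8 character.

```rust
/// COMMENT <- "//" (!"\n" .)* "\n"
/// Returns the offset just after the comment, or None if the rule fails at `start`.
fn comment(input: &str, start: usize) -> Option<usize> {
    let bytes = input.as_bytes();
    // "//"
    if !input.get(start..)?.starts_with("//") {
        return None;
    }
    let mut pos = start + 2;
    // (!"\n" .)* : the predicate only peeks; `.` then consumes one byte.
    while pos < bytes.len() && bytes[pos] != b'\n' {
        pos += 1;
    }
    // Closing "\n": the rule fails if the input ends before a newline.
    if bytes.get(pos) == Some(&b'\n') {
        Some(pos + 1)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", comment("// hi\nx = 3;", 0));
    println!("{:?}", comment("// no newline", 0));
}
```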
PEG is not CFG
• PEG is equivalent to the Top-Down Parsing Language (TDPL).
• The language a^n b^n c^n is not context-free, yet a PEG can recognize it with an and-predicate:
  A <- "a" A? "b"
  B <- "b" B? "c"
  S <- &(A "c") "a"+ B !.
• In CFG, A <- aAa | a matches any odd number of “a”.
  In PEG, A <- aAa / a matches only 2^n - 1 “a”.
• Whether every context-free language can be recognized by a PEG is an open problem.
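To see the and-predicate at work, here is a small hand-written recognizer for a^n b^n c^n (n >= 1). It is a sketch under one assumption: it uses the formulation `S <- &(A "c") "a"+ B !.` with `A <- "a" A? "b"` and `B <- "b" B? "c"`, where A and B must consume at least one character, so the lookahead cannot succeed trivially on an empty match. The function names are illustrative only.

```rust
/// X <- open X? close, e.g. A <- "a" A? "b".
/// Returns the offset after the match, or None on failure.
fn nested(input: &[u8], pos: usize, open: u8, close: u8) -> Option<usize> {
    if input.get(pos) != Some(&open) {
        return None;
    }
    // X? : use the inner match if there is one, otherwise consume nothing.
    let after_inner = nested(input, pos + 1, open, close).unwrap_or(pos + 1);
    if input.get(after_inner) == Some(&close) {
        Some(after_inner + 1)
    } else {
        None
    }
}

/// S <- &(A "c") "a"+ B !.
fn is_anbncn(text: &str) -> bool {
    let input = text.as_bytes();
    // &(A "c"): look ahead for a^n b^n followed by "c", consuming nothing.
    match nested(input, 0, b'a', b'b') {
        Some(after_a) if input.get(after_a) == Some(&b'c') => {}
        _ => return false,
    }
    // "a"+ : skip the run of a's (non-empty, because A matched above).
    let mut pos = 0;
    while input.get(pos) == Some(&b'a') {
        pos += 1;
    }
    // B !. : b^n c^n must reach exactly the end of the input.
    nested(input, pos, b'b', b'c') == Some(input.len())
}

fn main() {
    for text in ["abc", "aabbcc", "aabc", "aabbc"] {
        println!("{} -> {}", text, is_anbncn(text));
    }
}
```

The predicate checks that the a's and b's balance, the remaining sequence checks that the b's and c's balance, and together they force all three counts to be equal.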
Using PEG
• Many libraries support PEG:
  • Rust: rust-peg, pest, nom-peg …
  • C++: PEGTL, Boost …
  • Ruby: kpeg, raabro, Treetop …
  • Python: pyPEG, parsimonious …
  • Haskell: Peggy …
  • …
• So why Rust?
Introduction to Programming Language
Simple Language
• 3 types of statements: assign, if-else, while.
• Supports integer arithmetic.
• Supports pairs, lists, and functions with one argument.
• Simple, but it can still express more complex things such as recursion and map.
factorfun = function factor(x) {
    if (x > 1) { x * factor(x - 1) } else { 1 }
}
result = factorfun(10); // 3628800
function last(l) {
    if (isnothing(snd(l))) {
        fst(l)
    } else {
        last(snd(l))
    }
}
Abstract Syntax Tree
• Use a Rust enum so that each variant stores its payload.
• “Programming” by building the tree directly looks like this:
pub enum Node {
    Number(i64),
    Boolean(bool),
    Add(Box<Node>, Box<Node>),
    Subtract(Box<Node>, Box<Node>),
    LT(Box<Node>, Box<Node>),
    …
}
let n = Node::add(Node::number(3), Node::number(4));
[Figure: tree diagram with the nodes Add, 3, 4, LT, 8]
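A runnable version of this idea might look as follows. It is a minimal sketch: only three variants are kept, and the lower-case constructors (`number`, `add`, `lt`) are assumed to be thin wrappers around `Box::new`, which the slide uses but does not show.

```rust
// Hypothetical mini-AST modelled on the slide; only three variants are kept.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    Add(Box<Node>, Box<Node>),
    LT(Box<Node>, Box<Node>),
}

impl Node {
    // Lower-case constructors hide the Box::new noise at call sites.
    pub fn number(value: i64) -> Box<Node> {
        Box::new(Node::Number(value))
    }
    pub fn add(left: Box<Node>, right: Box<Node>) -> Box<Node> {
        Box::new(Node::Add(left, right))
    }
    pub fn lt(left: Box<Node>, right: Box<Node>) -> Box<Node> {
        Box::new(Node::LT(left, right))
    }
}

fn main() {
    // (3 + 4) < 8 written directly as a tree.
    let tree = Node::lt(Node::add(Node::number(3), Node::number(4)), Node::number(8));
    println!("{:?}", tree);
}
```

`Box` is needed because an enum cannot contain itself by value; the box gives each child a fixed-size pointer.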
Abstract Syntax Tree
• All statements are Nodes as well:
pub enum Node {
    Variable(String),
    Assign(String, Box<Node>),
    If(Box<Node>, Box<Node>, Box<Node>),
    While(Box<Node>, Box<Node>),
    …
}
Pair, List and Nothing
• Node::pair(Node::number(3), Node::number(4))
• List [3,4,5] = pair(3, pair(4, pair(5, nothing)))
• Nothing special
[Figure: tree diagrams of Pair(3, 4) and of the list Pair(3, Pair(4, Pair(5, Nothing)))]
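The list encoding can be sketched in Rust as below. The helpers `list` and `last` are hypothetical names for this example; `last` mirrors the `last` function written in the simple language earlier, walking `snd` until it reaches Nothing.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Number(i64),
    Pair(Box<Node>, Box<Node>),
    Nothing,
}

/// Build pair(x0, pair(x1, ... nothing)) from a slice (hypothetical helper).
fn list(items: &[i64]) -> Node {
    items.iter().rev().fold(Node::Nothing, |tail, &x| {
        Node::Pair(Box::new(Node::Number(x)), Box::new(tail))
    })
}

/// Return the last element of a pair-encoded list, or None for a non-list.
fn last(node: &Node) -> Option<&Node> {
    match node {
        Node::Pair(fst, snd) => match **snd {
            // The tail is Nothing, so `fst` is the last element.
            Node::Nothing => Some(&**fst),
            _ => last(snd),
        },
        _ => None,
    }
}

fn main() {
    let numbers = list(&[3, 4, 5]);
    println!("{:?}", numbers);
    println!("{:?}", last(&numbers));
}
```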
Environment and Machine
• Environment stores a HashMap<String, Box<Node>> with an <add> and <get> interface.
• A Machine takes an AST and an Environment and evaluates the AST inside the machine.
use std::collections::HashMap;
pub struct Environment {
    pub vars: HashMap<String, Box<Node>>,
}
pub struct Machine {
    pub environment: Environment,
    expression: Box<Node>,
}
Evaluate the AST
• Add an evaluate function to every AST node through a trait.
• The result of an evaluation is a new Node.
fn evaluate(&self, env: &mut Environment) -> Box<Node> {
    match *self {
        Node::Add(ref l, ref r) => {
            Node::number(l.evaluate(env).value() + r.evaluate(env).value())
        }
        …
    }
}
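A self-contained version of the evaluator could look like this. It is a simplified sketch, not the author's code: the environment is assumed to be a plain `HashMap<String, Node>`, `evaluate` returns a `Node` by value instead of a `Box<Node>`, and the helper `example` exists only to make the snippet easy to run.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Number(i64),
    Boolean(bool),
    Variable(String),
    Add(Box<Node>, Box<Node>),
    LT(Box<Node>, Box<Node>),
}

// Assumption for this sketch: the environment is just a map from names to values.
type Environment = HashMap<String, Node>;

impl Node {
    /// Unwrap a Number; panics on anything else (acceptable for a sketch).
    fn value(&self) -> i64 {
        match self {
            Node::Number(n) => *n,
            other => panic!("not a number: {:?}", other),
        }
    }

    /// Reduce the tree to a Number or Boolean node.
    fn evaluate(&self, env: &Environment) -> Node {
        match self {
            Node::Number(_) | Node::Boolean(_) => self.clone(),
            Node::Variable(name) => env[name].clone(),
            Node::Add(l, r) => Node::Number(l.evaluate(env).value() + r.evaluate(env).value()),
            Node::LT(l, r) => Node::Boolean(l.evaluate(env).value() < r.evaluate(env).value()),
        }
    }
}

/// Evaluate (x + 4) < 8 with x bound to the given value.
fn example(x: i64) -> Node {
    let mut env = Environment::new();
    env.insert("x".to_string(), Node::Number(x));
    let sum = Node::Add(
        Box::new(Node::Variable("x".to_string())),
        Box::new(Node::Number(4)),
    );
    Node::LT(Box::new(sum), Box::new(Node::Number(8))).evaluate(&env)
}

fn main() {
    println!("{:?}", example(3));
    println!("{:?}", example(5));
}
```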
Evaluate the AST
• How to evaluate a While node (condition, body)?
• Evaluate the condition; if it is true, evaluate the body and then the While node itself again.
x = 3;
while (x < 9) { x = x * 2; }
Evaluate x = 3
Evaluate while (x < 9) x = x * 2
Evaluate x = x * 2
Evaluate while (x < 9) x = x * 2
Evaluate x = x * 2
Evaluate while (x < 9) x = x * 2
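The recursive treatment of While can be sketched as follows. This is a simplified illustration with assumptions: expressions evaluate straight to `i64` (a comparison yields 1 or 0), statements mutate a `HashMap<String, i64>`, and the names `eval`, `run` and `run_example` are invented for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
enum Node {
    Number(i64),
    Variable(String),
    Multiply(Box<Node>, Box<Node>),
    LT(Box<Node>, Box<Node>),
    Assign(String, Box<Node>),
    While(Box<Node>, Box<Node>),
}

type Environment = HashMap<String, i64>;

impl Node {
    /// Evaluate an expression; a comparison yields 1 (true) or 0 (false).
    fn eval(&self, env: &Environment) -> i64 {
        match self {
            Node::Number(n) => *n,
            Node::Variable(name) => env[name],
            Node::Multiply(l, r) => l.eval(env) * r.eval(env),
            Node::LT(l, r) => (l.eval(env) < r.eval(env)) as i64,
            _ => panic!("statement used as an expression"),
        }
    }

    /// Execute a statement, updating the environment in place.
    fn run(&self, env: &mut Environment) {
        match self {
            Node::Assign(name, expr) => {
                let value = expr.eval(env);
                env.insert(name.clone(), value);
            }
            Node::While(cond, body) => {
                // If the condition holds: run the body, then this While node again.
                if cond.eval(env) != 0 {
                    body.run(env);
                    self.run(env);
                }
            }
            _ => panic!("expression used as a statement"),
        }
    }
}

/// x = 3; while (x < 9) { x = x * 2; }  and return the final x.
fn run_example() -> i64 {
    let var = |name: &str| Box::new(Node::Variable(name.to_string()));
    let mut env = Environment::new();
    Node::Assign("x".to_string(), Box::new(Node::Number(3))).run(&mut env);
    let body = Node::Assign(
        "x".to_string(),
        Box::new(Node::Multiply(var("x"), Box::new(Node::Number(2)))),
    );
    let cond = Node::LT(var("x"), Box::new(Node::Number(9)));
    Node::While(Box::new(cond), Box::new(body)).run(&mut env);
    env["x"]
}

fn main() {
    println!("x = {}", run_example());
}
```

Each loop iteration adds a stack frame in this formulation, so a production interpreter would likely use a native loop instead.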
Function
• A function is also a type of Node. When it is evaluated, it is wrapped into a Closure together with the environment at that time.
• A call evaluates the function body in the closure’s environment.
Node::Fun(String, String, Box<Node>)
Node::Closure(Environment, Box<Node>)
fn evaluate(&self, env: &mut Environment) -> Box<Node> {
    match *self {
        Node::Fun(ref name, ref arg, ref body) => {
            Node::closure(env.clone(), Box::new(self.clone()))
        }
        …
    }
}
Call a Function
fn evaluate(&self, env: &mut Environment) -> Box<Node> {
    match *self {
        Node::Call(ref closure, ref arg) => match *closure.evaluate(env) {
            Node::Closure(ref closure_env, ref fun) => {
                if let Node::Fun(funname, argname, body) = *fun.clone() {
                    let mut newenv = closure_env.clone();
                    // Bind the function's own name so that it can call itself.
                    newenv.add(&funname, closure.evaluate(env));
                    newenv.add(&argname, arg.evaluate(env));
                    body.evaluate(&mut newenv)
                } else {
                    panic!("closure does not wrap a function")
                }
            }
            _ => panic!("not callable"),
        },
        …
    }
}
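Putting closures and calls together, a compact runnable sketch might look like this. It simplifies the slide code under stated assumptions: the environment is a plain `HashMap<String, Node>`, `evaluate` is a free function returning `Node` by value, and `example` builds the curried adder `function addx(x) { function addy(y) { x + y } }` and calls it as `addx(3)(4)`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Number(i64),
    Variable(String),
    Add(Box<Node>, Box<Node>),
    Fun(String, String, Box<Node>),  // function name, argument name, body
    Closure(Environment, Box<Node>), // captured environment, the Fun node
    Call(Box<Node>, Box<Node>),      // callee expression, argument expression
}

type Environment = HashMap<String, Node>;

fn evaluate(node: &Node, env: &Environment) -> Node {
    match node {
        Node::Number(_) | Node::Closure(..) => node.clone(),
        Node::Variable(name) => env[name].clone(),
        Node::Add(l, r) => match (evaluate(l, env), evaluate(r, env)) {
            (Node::Number(a), Node::Number(b)) => Node::Number(a + b),
            other => panic!("cannot add {:?}", other),
        },
        // Evaluating a function captures the environment at that time.
        Node::Fun(..) => Node::Closure(env.clone(), Box::new(node.clone())),
        Node::Call(callee, arg) => {
            let closure = evaluate(callee, env);
            let arg_value = evaluate(arg, env);
            match &closure {
                Node::Closure(captured, fun) => match &**fun {
                    Node::Fun(fun_name, arg_name, body) => {
                        let mut new_env = captured.clone();
                        // Bind the function's own name so that it can recurse.
                        new_env.insert(fun_name.clone(), closure.clone());
                        new_env.insert(arg_name.clone(), arg_value);
                        evaluate(body, &new_env)
                    }
                    other => panic!("closure does not wrap a function: {:?}", other),
                },
                other => panic!("not callable: {:?}", other),
            }
        }
    }
}

/// addx(3)(4) for function addx(x) { function addy(y) { x + y } }
fn example() -> Node {
    let var = |name: &str| Box::new(Node::Variable(name.to_string()));
    let addy = Node::Fun(
        "addy".to_string(),
        "y".to_string(),
        Box::new(Node::Add(var("x"), var("y"))),
    );
    let addx = Node::Fun("addx".to_string(), "x".to_string(), Box::new(addy));
    let inner_call = Node::Call(Box::new(addx), Box::new(Node::Number(3)));
    let outer_call = Node::Call(Box::new(inner_call), Box::new(Node::Number(4)));
    evaluate(&outer_call, &Environment::new())
}

fn main() {
    println!("{:?}", example());
}
```

The inner function still sees `x` after `addx` has returned, because the closure carries a copy of the environment in which it was created.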
Free Variable
• Compute the free variables of a function so that a closure does not need to copy the whole environment. The node types that matter:
  • Node::Variable
  • Node::Assign
  • Node::Function
function addx(x) { function addy(y) { x + y } }
-> no free variables
function addy(x) { x + y }
-> free variable y
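One way to compute free variables is a recursive walk that collects variable names and removes the names a function binds. The sketch below is an assumption-laden illustration: it covers only Number, Variable, Add and Fun (Assign is left out), and treats both the function's own name and its argument as bound inside the body.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone)]
enum Node {
    Number(i64),
    Variable(String),
    Add(Box<Node>, Box<Node>),
    Fun(String, String, Box<Node>), // function name, argument name, body
}

/// Names that are used in `node` but not bound by an enclosing function inside it.
fn free_vars(node: &Node) -> BTreeSet<String> {
    match node {
        Node::Number(_) => BTreeSet::new(),
        Node::Variable(name) => BTreeSet::from([name.clone()]),
        Node::Add(l, r) => {
            let mut vars = free_vars(l);
            vars.extend(free_vars(r));
            vars
        }
        Node::Fun(name, arg, body) => {
            // The function name and its argument are bound inside the body.
            let mut vars = free_vars(body);
            vars.remove(name);
            vars.remove(arg);
            vars
        }
    }
}

/// Build `function <name>(<arg>) { <body> }`.
fn fun(name: &str, arg: &str, body: Node) -> Node {
    Node::Fun(name.to_string(), arg.to_string(), Box::new(body))
}

/// Build `x + y`.
fn x_plus_y() -> Node {
    Node::Add(
        Box::new(Node::Variable("x".to_string())),
        Box::new(Node::Variable("y".to_string())),
    )
}

fn main() {
    // function addx(x) { function addy(y) { x + y } }
    println!("{:?}", free_vars(&fun("addx", "x", fun("addy", "y", x_plus_y()))));
    // function addy(x) { x + y }
    println!("{:?}", free_vars(&fun("addy", "x", x_plus_y())));
}
```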
Call a Function
if let Node::Fun(funname, argname, body) = *fun.clone() {
    // Start from an empty environment and copy only the free variables.
    let mut newenv = Environment { vars: HashMap::new() };
    for var in free_vars(fun) {
        newenv.add(var, env.get(var));
    }
    newenv.add(&funname, closure.evaluate(env));
    newenv.add(&argname, arg.evaluate(env));
    body.evaluate(&mut newenv);
}
What is a Language?
• A language makes some concepts abstract, like a virtual machine. Designing a language means designing the abstraction.
• The “evaluate” function implements the concept. We are free to implement it as anything, even returning 42 on every evaluation.
Concept     Simple virtual machine   Real machine
Number 3    Node::number(3)          0b11 in memory
+           Node::add(l, r)          add r1 r2
Choice      Node::if                 branch command
What is a Language?
• Abstraction brings some precision issues, as with floating-point numbers. We have no way to express the concept of <infinite>.
• We can create a language for geometry as below. Which representation of a line is best?
• Consider all the pros and cons that an abstraction brings.
Concept        In programming language
Point          (x: u32, y: u32)
Line           (Point, Point), or
               (Point, Slope), or
               (Point, Point, type{vertical, horizontal, angled})
Intersection   Calculate intersection
Implement a Parser with PEG
The Pest Package
• Rust Pest
  • https://github.com/pest-parser/pest
• My simple language parser grammar is at:
  • https://github.com/yodalee/simplelang
• Parsing flow: source code -> grammar parser -> pest Pair structure -> Simple AST
The Pest Package
use pest::Parser;
use pest_derive::Parser;
#[derive(Parser)]
#[grammar = "simple.pest"]
struct SimpleParser;
let pairs = SimpleParser::parse(Rule::simple, "<source code>");
• A Pair represents the parse result of one rule.
  • Pair.as_rule() => the rule that matched
  • Pair.as_span() => the span of the match
  • Pair.as_str() => the matched text
  • Pair.into_inner() => the pairs of the sub-rules
Grammar <-> Build AST
number = @{ '1'..'9' ~ '0'..'9'* }
variable = @{ ('A'..'Z' | 'a'..'z') ~ ('A'..'Z' | 'a'..'z' | '0'..'9')* }
call = { variable ~ "(" ~ expr ~ ")" }
factor = { "(" ~ expr ~ ")" | call | variable | number }
fn build_factor(pair: Pair<Rule>) -> Box<Node> {
    match pair.as_rule() {
        Rule::number => Node::number(pair.as_str().parse::<i64>().unwrap()),
        Rule::variable => Node::variable(pair.as_str()),
        Rule::expr => ...,
        Rule::call => ...,
        _ => unreachable!(),
    }
}
Climb the Expression
• An expression can be written as a single rule:
  expr = { factor ~ (op_binary ~ factor)* }
• Pest provides a precedence-climbing template. You only define:
  • a build-factor function => creates a Factor node
  • an infix function => creates an Operator node
  • the operator precedence => a vector of operators with their precedence and left/right associativity
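The idea behind precedence climbing can be shown without pest. The sketch below is a hand-rolled version over a pre-tokenized input, not pest's API: the `Token` type, the `precedence` table and the `climb`/`parse` functions are all assumptions made for this illustration, and every operator is treated as left-associative.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Number(i64),
    Binary(char, Box<Node>, Box<Node>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(i64),
    Op(char),
}

/// Precedence table: a higher number binds tighter.
fn precedence(op: char) -> u8 {
    match op {
        '<' => 1,
        '+' | '-' => 2,
        '*' | '/' => 3,
        _ => panic!("unknown operator {}", op),
    }
}

/// Parse factor (op factor)* starting at `pos`, folding operators by precedence.
fn climb(tokens: &[Token], pos: &mut usize, min_prec: u8) -> Node {
    // "Build factor": here a factor is just a number.
    let mut lhs = match tokens[*pos] {
        Token::Num(n) => Node::Number(n),
        Token::Op(op) => panic!("expected a factor, found {}", op),
    };
    *pos += 1;
    while let Some(&Token::Op(op)) = tokens.get(*pos) {
        let prec = precedence(op);
        if prec < min_prec {
            break;
        }
        *pos += 1;
        // Left-associative: the right operand may only contain tighter operators.
        let rhs = climb(tokens, pos, prec + 1);
        // "Infix rule": combine both sides into an operator node.
        lhs = Node::Binary(op, Box::new(lhs), Box::new(rhs));
    }
    lhs
}

fn parse(tokens: &[Token]) -> Node {
    let mut pos = 0;
    climb(tokens, &mut pos, 0)
}

fn main() {
    use Token::{Num, Op};
    // 1 + 2 * 3
    println!("{:?}", parse(&[Num(1), Op('+'), Num(2), Op('*'), Num(3)]));
    // 8 - 3 - 2
    println!("{:?}", parse(&[Num(8), Op('-'), Num(3), Op('-'), Num(2)]));
}
```

The three pieces map onto the list above: the factor builder, the infix combiner, and the precedence table.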
Challenges
• Error messages for syntax errors.
• How to deal with optional parts, like those in a C for loop?
• A more systematic way to deal with a large language, like C.
// CFG (left-recursive, so it cannot be used as a PEG directly)
compound_statement <- block_list
block_list <- block_list block | ε
block <- declaration_list | statement_list
declaration_list <- declaration_list declaration | ε
statement_list <- statement_list statement | ε
// Wrong PEG: block can match the empty string, so block* never makes progress
compound_statement <- block*
block <- declaration* ~ statement*
// Correct PEG: every block consumes at least one declaration or statement
compound_statement <- block*
block <- (declaration | statement)+
Conclusion
• PEG is a newer grammar formalism than CFG and in some respects more powerful. It is fast and convenient for building a small language parser.
• The most important concept in programming languages? Abstraction.
• Is there a best abstraction? No. It is engineering.
Reference
• Bryan Ford, <Parsing Expression Grammars: A Recognition-Based Syntactic Foundation>, 2004
• <Understanding Computation: From Simple Machines to Impossible Programs>
• <Programming Languages, Part B> on Coursera, University of Washington
Thank You for Listening
IB502 1430 – 1510
Build Yourself a Nixie Tube Clock

Use PEG to Write a Programming Language Parser
