SlideShare a Scribd company logo
1 of 29
Download to read offline
Introduction
to ANTLR 4
The latest and last (sic!) revision
Oliver Zeigermann
Code Generation Cambridge 2013
Outline
● Introduction to parsing with ANTLR4
● Use case: Interpreter
● Use case: Code Generator
● The tiniest bit of theory: Adaptive LL(*)
● Comparing ANTLR4
ANTLR 4: Management Summary
● Parser generator
● Language tool
● Design goals
○ ease-of-use over performance
○ simplicity over complexity
● Eats all context free grammars
○ with a small caveat
What does parsing with ANTLR4
mean?
Identify and reveal the implicit structure of
an input in a parse tree
● Structure is described as a grammar
● Recognizer is divided into lexer and parser
● Parse tree can be auto generated
Practical example #1: An expression
interpreter
Build an interpreter from scratch
● complete and working
● parses numerical expressions
● does the calculation
Lexical rule for integers
INT : ('0'..'9')+ ;
● Rule for token INT
● Composed of digits 0 to 9
● At least one digit
Syntactic rule for expressions
expr : expr ('*'|'/') expr
| expr ('+'|'-') expr
| INT
;
● Three options
● Operator precedence by order of options
● Notice left recursion
grammar Expressions;
start : expr ;
expr : left=expr op=('*'|'/') right=expr #opExpr
| left=expr op=('+'|'-') right=expr #opExpr
| atom=INT #atomExpr
;
INT : ('0'..'9')+ ;
WS : [ trn]+ -> skip ;
Doing something based on input
Options for separation of actions from
grammars in ANTLR4
1. Listeners (SAX style)
2. Visitors (DOM style)
Introducing Visitors
Yes, that well known pattern
● Actions are separated from the grammar
● Visitors actively control traversal of parse
tree
○ if and
○ how
● Calls of visitor methods can return values
Visitor: actively visiting tree
int visitOpExpr(OpExprContext ctx) {
int left = visit(ctx.left);
int right = visit(ctx.right);
switch (ctx.op) {
case '*' : return left * right;
case '/' : return left / right;
case '+' : return left + right;
case '-' : return left - right;
}
}
Visitor methods pseudo code
int visitStart(StartContext ctx) {
return visitChildren(ctx);
}
int visitAtomExpr(AtomExprContext ctx) {
return Integer.parseInt(ctx.atom.getText());
}
int visitOpExpr(OpExprContext ctx) {
// what we just saw
}
Live Demo
Practical example #2: Code
generation from Java
Generate interface declarations for existing
class definitions
● Adapted from ANTLR4 book
● Like a large scale refactoring or code
generation
● Keep method signature exactly as it is
● Good news
○ You can use an existing Java grammar
○ You just have to figure out what is/are the rule(s) you
are interested in: methodDeclaration
Doing something based on input
Options for separation of actions from
grammars
1. Listeners (SAX style)
2. Visitors (DOM style)
Introducing listeners
Yes, that well known pattern (aka Observer)
● Automatic depth-first traversal of parse tree
● walker fires events that you can handle
Listener for method declaration,
almost real code
void enterMethodDeclaration(
MethodDeclarationContext ctx) {
// gets me the text of all the tokens that have
// been matched while parsing a
// method declaration
String fullSignature =
parser.getTokenStream().getText(ctx);
print(fullSignature + ";");
}
(plus minor glue code)
That's it!
Live Demo
ANTLR4 Adaptive LL(*) or ALL(*)
● Generates recursive descent parser
○ like what you would build by hand
● does dynamic grammar analysis while
parsing
○ saves results like a JIT does
● allows for direct left recursion
○ indirect left recursion not supported
● parsing time typically near-linear
Comparing to ANTLR3
top-down SLL(*) parser with optional
backtracking and memoization
● ANTLR4
○ can handler a larger set of grammars
○ supports lexer modes
● ANTLR3
○ relies on embedded actions
○ has tree parser and transformer
○ can't handle left recursive grammars
Comparision Summary
● PEG / packrat
○ can't do left recursion
○ similar to ANTLR3 with backtracking
○ error recovery or reporting hard
● GLR
○ bottom-up like yacc
○ can do all grammars
○ debugging on the declarative specification only
● GLL
○ top-down like ANTLR
○ can do all grammars
Comparing to PEG / packrat
top-down parsing with backtracking and
memoization
● Sometimes scanner-less
● also can not do left recursion
● Uses the first option that matches
● Error recovery and proper error reporting
very hard
● ANTLR4 much more efficient
Comparing to GLR
a generalized LR parser
● does breadth-first search when there is a
conflict in the LR table
● can deliver all possible derivation on
ambiguous input
● debugging on the declarative specification
○ no need to debug state machine
● can do indirect left recursion
Comparing to GLL
a generalized LL parser
● mapping to recursive descent parser not
obvious (not possible?)
● can do indirect left recursion
Wrap-Up
● ANTLR 4 is easy to use
● You can use listeners or visitors for actions
● ANTLR 4 helps you solve practical
problems
Why is ANTLR4 called "Honey
Badger"?
● The amimal "Honey Badger" is the most
fearless animal of all
● ANTLR4 named after it
○ http://www.youtube.com/watch?v=4r7wHMg5Yjg
● It takes any grammar you give it. “It just
doesn’t give a damn.”
Resources
● Terence Parr speaking about ANTLR4
○ http://vimeo.com/m/59285751
● ANTLR4-Website
○ http://www.antlr.org
● Book
○ http://pragprog.com/book/tpantlr2/the-definitive-antlr-
4-reference
● ANTLRWorks
○ http://tunnelvisionlabs.
com/products/demo/antlrworks
● Examples of this talk on github
○ https://github.com/DJCordhose/antlr4-sandbox
Questions / Discussion
Oliver Zeigermann
zeigermann.eu
Java example and some illustrations provided as a courtesy of Terence Parr

More Related Content

What's hot

FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...
FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...
FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...Dierk König
 
FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)Dierk König
 
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]Tom Lee
 
Introduction Functional Programming - Tech Hangout #11 - 2013.01.16
Introduction Functional Programming - Tech Hangout #11 - 2013.01.16Introduction Functional Programming - Tech Hangout #11 - 2013.01.16
Introduction Functional Programming - Tech Hangout #11 - 2013.01.16Innovecs
 
Introduction to Python Part-1
Introduction to Python Part-1Introduction to Python Part-1
Introduction to Python Part-1Devashish Kumar
 
Academy PRO: React JS. Typescript
Academy PRO: React JS. TypescriptAcademy PRO: React JS. Typescript
Academy PRO: React JS. TypescriptBinary Studio
 
Effective c# part1
Effective c# part1Effective c# part1
Effective c# part1Yuriy Seniuk
 
Python Compiler Internals Presentation Slides
Python Compiler Internals Presentation SlidesPython Compiler Internals Presentation Slides
Python Compiler Internals Presentation SlidesTom Lee
 
Functional programming
Functional programmingFunctional programming
Functional programmingBas Bossink
 
Static typing vs dynamic typing languages
Static typing vs dynamic typing languagesStatic typing vs dynamic typing languages
Static typing vs dynamic typing languagesJawad Khan
 
Static vs dynamic types
Static vs dynamic typesStatic vs dynamic types
Static vs dynamic typesTerence Parr
 

What's hot (20)

SS UI Lecture 5
SS UI Lecture 5SS UI Lecture 5
SS UI Lecture 5
 
FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...
FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...
FregeDay: Roadmap for resolving differences between Haskell and Frege (Ingo W...
 
FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)FregeDay: Design and Implementation of the language (Ingo Wechsung)
FregeDay: Design and Implementation of the language (Ingo Wechsung)
 
Demo learn python
Demo learn pythonDemo learn python
Demo learn python
 
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
 
2nd presantation
2nd presantation2nd presantation
2nd presantation
 
PHPDoc
PHPDocPHPDoc
PHPDoc
 
Introduction Functional Programming - Tech Hangout #11 - 2013.01.16
Introduction Functional Programming - Tech Hangout #11 - 2013.01.16Introduction Functional Programming - Tech Hangout #11 - 2013.01.16
Introduction Functional Programming - Tech Hangout #11 - 2013.01.16
 
Introductory func prog
Introductory func progIntroductory func prog
Introductory func prog
 
Introduction to Python Part-1
Introduction to Python Part-1Introduction to Python Part-1
Introduction to Python Part-1
 
What's New in C# 6
What's New in C# 6What's New in C# 6
What's New in C# 6
 
Introduction to JavaScript
Introduction to JavaScriptIntroduction to JavaScript
Introduction to JavaScript
 
Academy PRO: React JS. Typescript
Academy PRO: React JS. TypescriptAcademy PRO: React JS. Typescript
Academy PRO: React JS. Typescript
 
Sysprog 9
Sysprog 9Sysprog 9
Sysprog 9
 
GooglePropsal
GooglePropsalGooglePropsal
GooglePropsal
 
Effective c# part1
Effective c# part1Effective c# part1
Effective c# part1
 
Python Compiler Internals Presentation Slides
Python Compiler Internals Presentation SlidesPython Compiler Internals Presentation Slides
Python Compiler Internals Presentation Slides
 
Functional programming
Functional programmingFunctional programming
Functional programming
 
Static typing vs dynamic typing languages
Static typing vs dynamic typing languagesStatic typing vs dynamic typing languages
Static typing vs dynamic typing languages
 
Static vs dynamic types
Static vs dynamic typesStatic vs dynamic types
Static vs dynamic types
 

Similar to Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4

Java 8 best practices - Stephen Colebourne
Java 8 best practices - Stephen ColebourneJava 8 best practices - Stephen Colebourne
Java 8 best practices - Stephen ColebourneJAXLondon_Conference
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17Daniel Eriksson
 
javase8bestpractices-151015135520-lva1-app6892
javase8bestpractices-151015135520-lva1-app6892javase8bestpractices-151015135520-lva1-app6892
javase8bestpractices-151015135520-lva1-app6892SRINIVAS C
 
Algorithm and Programming (Procedure and Function)
Algorithm and Programming (Procedure and Function)Algorithm and Programming (Procedure and Function)
Algorithm and Programming (Procedure and Function)Adam Mukharil Bachtiar
 
Building parsers in JavaScript
Building parsers in JavaScriptBuilding parsers in JavaScript
Building parsers in JavaScriptKenneth Geisshirt
 
Analysis of algorithms
Analysis of algorithmsAnalysis of algorithms
Analysis of algorithmsAsen Bozhilov
 
Software Craftmanship - Cours Polytech
Software Craftmanship - Cours PolytechSoftware Craftmanship - Cours Polytech
Software Craftmanship - Cours Polytechyannick grenzinger
 
Re-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextRe-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextEdward Willink
 
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaMushfekur Rahman
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scaladatamantra
 
Amber and beyond: Java language changes
Amber and beyond: Java language changesAmber and beyond: Java language changes
Amber and beyond: Java language changesStephen Colebourne
 
Python week 6 2019 2020 for grade 10
Python week 6 2019 2020 for grade 10 Python week 6 2019 2020 for grade 10
Python week 6 2019 2020 for grade 10 Osama Ghandour Geris
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBArangoDB Database
 
Programming Methodologies Functions - C Language
Programming Methodologies Functions - C LanguageProgramming Methodologies Functions - C Language
Programming Methodologies Functions - C LanguageChobodiDamsaraniPadm
 
Parsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix KliuchnikovParsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix KliuchnikovVasil Remeniuk
 

Similar to Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4 (20)

Java 8 best practices - Stephen Colebourne
Java 8 best practices - Stephen ColebourneJava 8 best practices - Stephen Colebourne
Java 8 best practices - Stephen Colebourne
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17
 
Java SE 8 best practices
Java SE 8 best practicesJava SE 8 best practices
Java SE 8 best practices
 
javase8bestpractices-151015135520-lva1-app6892
javase8bestpractices-151015135520-lva1-app6892javase8bestpractices-151015135520-lva1-app6892
javase8bestpractices-151015135520-lva1-app6892
 
Algorithm and Programming (Procedure and Function)
Algorithm and Programming (Procedure and Function)Algorithm and Programming (Procedure and Function)
Algorithm and Programming (Procedure and Function)
 
Dart workshop
Dart workshopDart workshop
Dart workshop
 
Automata Invasion
Automata InvasionAutomata Invasion
Automata Invasion
 
Building parsers in JavaScript
Building parsers in JavaScriptBuilding parsers in JavaScript
Building parsers in JavaScript
 
Analysis of algorithms
Analysis of algorithmsAnalysis of algorithms
Analysis of algorithms
 
Software Craftmanship - Cours Polytech
Software Craftmanship - Cours PolytechSoftware Craftmanship - Cours Polytech
Software Craftmanship - Cours Polytech
 
Re-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for XtextRe-engineering Eclipse MDT/OCL for Xtext
Re-engineering Eclipse MDT/OCL for Xtext
 
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and KibanaBuilding a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Amber and beyond: Java language changes
Amber and beyond: Java language changesAmber and beyond: Java language changes
Amber and beyond: Java language changes
 
Python week 6 2019 2020 for grade 10
Python week 6 2019 2020 for grade 10 Python week 6 2019 2020 for grade 10
Python week 6 2019 2020 for grade 10
 
Custom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDBCustom Pregel Algorithms in ArangoDB
Custom Pregel Algorithms in ArangoDB
 
Programming Methodologies Functions - C Language
Programming Methodologies Functions - C LanguageProgramming Methodologies Functions - C Language
Programming Methodologies Functions - C Language
 
Austen x talk
Austen x talkAusten x talk
Austen x talk
 
Java SE 8 library design
Java SE 8 library designJava SE 8 library design
Java SE 8 library design
 
Parsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix KliuchnikovParsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
 

Code Generation Cambridge 2013 Introduction to Parsing with ANTLR4

  • 1. Introduction to ANTLR 4 The latest and last (sic!) revision Oliver Zeigermann Code Generation Cambridge 2013
  • 2. Outline ● Introduction to parsing with ANTLR4 ● Use case: Interpreter ● Use case: Code Generator ● The tiniest bit of theory: Adaptive LL(*) ● Comparing ANTLR4
  • 3. ANTLR 4: Management Summary ● Parser generator ● Language tool ● Design goals ○ ease-of-use over performance ○ simplicity over complexity ● Eats all context free grammars ○ with a small caveat
  • 4. What does parsing with ANTLR4 mean? Identify and reveal the implicit structure of an input in a parse tree ● Structure is described as a grammar ● Recognizer is divided into lexer and parser ● Parse tree can be auto generated
  • 5. Practical example #1: An expression interpreter Build an interpreter from scratch ● complete and working ● parses numerical expressions ● does the calculation
  • 6. Lexical rule for integers INT : ('0'..'9')+ ; ● Rule for token INT ● Composed of digits 0 to 9 ● At least one digit
  • 7. Syntactic rule for expressions expr : expr ('*'|'/') expr | expr ('+'|'-') expr | INT ; ● Three options ● Operator precedence by order of options ● Notice left recursion
  • 8. grammar Expressions; start : expr ; expr : left=expr op=('*'|'/') right=expr #opExpr | left=expr op=('+'|'-') right=expr #opExpr | atom=INT #atomExpr ; INT : ('0'..'9')+ ; WS : [ trn]+ -> skip ;
  • 9. Doing something based on input Options for separation of actions from grammars in ANTLR4 1. Listeners (SAX style) 2. Visitors (DOM style)
  • 10. Introducing Visitors Yes, that well known pattern ● Actions are separated from the grammar ● Visitors actively control traversal of parse tree ○ if and ○ how ● Calls of visitor methods can return values
  • 11. Visitor: actively visiting tree int visitOpExpr(OpExprContext ctx) { int left = visit(ctx.left); int right = visit(ctx.right); switch (ctx.op) { case '*' : return left * right; case '/' : return left / right; case '+' : return left + right; case '-' : return left - right; } }
  • 12. Visitor methods pseudo code int visitStart(StartContext ctx) { return visitChildren(ctx); } int visitAtomExpr(AtomExprContext ctx) { return Integer.parseInt(ctx.atom.getText()); } int visitOpExpr(OpExprContext ctx) { // what we just saw }
  • 14. Practical example #2: Code generation from Java Generate interface declarations for existing class definitions ● Adapted from ANTLR4 book ● Like a large scale refactoring or code generation ● Keep method signature exactly as it is ● Good news ○ You can use an existing Java grammar ○ You just have to figure out what is/are the rule(s) you are interested in: methodDeclaration
  • 15. Doing something based on input Options for separation of actions from grammars 1. Listeners (SAX style) 2. Visitors (DOM style)
  • 16. Introducing listeners Yes, that well known pattern (aka Observer) ● Automatic depth-first traversal of parse tree ● walker fires events that you can handle
  • 17. Listener for method declaration, almost real code void enterMethodDeclaration( MethodDeclarationContext ctx) { // gets me the text of all the tokens that have // been matched while parsing a // method declaration String fullSignature = parser.getTokenStream().getText(ctx); print(fullSignature + ";"); }
  • 18. (plus minor glue code) That's it!
  • 20. ANTLR4 Adaptive LL(*) or ALL(*) ● Generates recursive descent parser ○ like what you would build by hand ● does dynamic grammar analysis while parsing ○ saves results like a JIT does ● allows for direct left recursion ○ indirect left recursion not supported ● parsing time typically near-linear
  • 21. Comparing to ANTLR3 top-down SLL(*) parser with optional backtracking and memoization ● ANTLR4 ○ can handler a larger set of grammars ○ supports lexer modes ● ANTLR3 ○ relies on embedded actions ○ has tree parser and transformer ○ can't handle left recursive grammars
  • 22. Comparision Summary ● PEG / packrat ○ can't do left recursion ○ similar to ANTLR3 with backtracking ○ error recovery or reporting hard ● GLR ○ bottom-up like yacc ○ can do all grammars ○ debugging on the declarative specification only ● GLL ○ top-down like ANTLR ○ can do all grammars
  • 23. Comparing to PEG / packrat top-down parsing with backtracking and memoization ● Sometimes scanner-less ● also can not do left recursion ● Uses the first option that matches ● Error recovery and proper error reporting very hard ● ANTLR4 much more efficient
  • 24. Comparing to GLR a generalized LR parser ● does breadth-first search when there is a conflict in the LR table ● can deliver all possible derivation on ambiguous input ● debugging on the declarative specification ○ no need to debug state machine ● can do indirect left recursion
  • 25. Comparing to GLL a generalized LL parser ● mapping to recursive descent parser not obvious (not possible?) ● can do indirect left recursion
  • 26. Wrap-Up ● ANTLR 4 is easy to use ● You can use listeners or visitors for actions ● ANTLR 4 helps you solve practical problems
  • 27. Why is ANTLR4 called "Honey Badger"? ● The amimal "Honey Badger" is the most fearless animal of all ● ANTLR4 named after it ○ http://www.youtube.com/watch?v=4r7wHMg5Yjg ● It takes any grammar you give it. “It just doesn’t give a damn.”
  • 28. Resources ● Terence Parr speaking about ANTLR4 ○ http://vimeo.com/m/59285751 ● ANTLR4-Website ○ http://www.antlr.org ● Book ○ http://pragprog.com/book/tpantlr2/the-definitive-antlr- 4-reference ● ANTLRWorks ○ http://tunnelvisionlabs. com/products/demo/antlrworks ● Examples of this talk on github ○ https://github.com/DJCordhose/antlr4-sandbox
  • 29. Questions / Discussion Oliver Zeigermann zeigermann.eu Java example and some illustrations provided as a courtesy of Terence Parr