25 March 2023 Hande Çelikkanat 1
A (Long) Introduction
to AntLR
Slides adapted from:
–AntLR Reference Manual by Terence Parr
antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf
–AntLR Tutorial by Ashley J.S Mills
http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlrhome.html
–An Introduction to AntLR by Terence Parr
http://www.cs.usfca.edu/~parrt/course/652/lectures/antlr.html
–An AntLR Tutorial by Scott Stanchfield
javadude.com/articles/antlrtut/
2
AntLR
ANother Tool for Language Recognition
(or anti-LR??)
an LL(k) parser and translator generator tool
which can create
– lexers
– parsers
– abstract syntax trees (AST’s)
in which you describe the language grammatically
and in return receive a program that can recognize and
translate that language
3
Tasks Divided
• Lexical Analysis (scanning)
• Syntactic Analysis (parsing)
• Tree Generation
• Code Generation
4
Lexer
A source file is streamed to a lexer on a character by character basis by
some kind of input interface.
The lexer groups characters into tokens that are meaningful to
the parser.
A “token” may be
– keywords
– identifiers
– symbols
– operators
Lexer also removes comments and whitespace from the program,
which are meaningless to the parser.
So it creates a stream of tokens, which are received one by one by the
parser.
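The lexer's job described above can be sketched by hand in plain Java. This is only an illustration of the idea, not the code AntLR generates; the class name MiniLexer and the token names are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer: groups characters into tokens and drops whitespace.
class MiniLexer {
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // meaningless to the parser: skip
            } else if (Character.isLetter(c)) {        // identifier
                int start = i;
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) i++;
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {         // integer literal
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INT(" + input.substring(start, i) + ")");
            } else if (c == ':' && i + 1 < input.length() && input.charAt(i + 1) == '=') {
                tokens.add("BECOMES");                 // needs two characters of lookahead
                i += 2;
            } else {
                // single-character symbols: a "switch" routes each character to its token
                switch (c) {
                    case '+': tokens.add("PLUS"); break;
                    case '-': tokens.add("MINUS"); break;
                    case '*': tokens.add("STAR"); break;
                    case '(': tokens.add("LPAREN"); break;
                    case ')': tokens.add("RPAREN"); break;
                    case ';': tokens.add("SEMI"); break;
                    default: throw new IllegalArgumentException("unexpected character: " + c);
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x := (3 + 42);"));
    }
}
```

A parser would then consume this list one token at a time instead of looking at raw characters.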
5
Parser
Parser organizes the tokens into the allowed sequences defined by the
grammar of the language.
If the parser encounters a sequence of tokens that match none of the allowed
sequences of tokens, it will issue an error
A design choice is whether to try to recover from the error by making
assumptions.
Parsers may either do syntax-directed translation on-the-fly,
or convert the sequences of tokens into an Abstract Syntax Tree (AST).
An AST is a structure which
– keeps information in an easily traversable form (such as operator at a node,
operands at children of the node)
– ignores form-dependent superficial details
More on AST’s later...
The parser also generates one or more symbol tables, which contain information
about the tokens it encounters.
6
What does a grammar file look like?
It is composed of rules
ANTLR accepts three types of grammar specifications
parsers
lexers
tree-parsers (also called tree-walkers)
Uses LL(k) analysis for all
So the grammar specifications are similar, and the
generated lexers and parsers behave similarly
7
Sample File
taken from AntLR tutorial of Ashley J.S Mills
8
Sample File Divided (1/3)
• An arbitrary number of parsers, lexers, and tree-
parsers in a grammar file
– a separate class file will be generated for each
– i.e, YourLexerClass.class, YourParserClass.class,
YourTreeParserClass.class
• Header:
– put preamble that will be put on top of each of these
classes
– an import, maybe?
9
Sample File Divided (2/3)
• Options
– file-wide
– charVocabulary = '\0'..'\377'; // defines the alphabet (used by the complement and
wildcard operators)
– k=2; // means two characters of lookahead
• Class specific:
{ ... header for parser class only ...}
class MyParser extends Parser;
options { ...parser options... }
{
parser class members
}
parser rules
10
Sample File Divided (3/3)
• Rules in EBNF notation:
taken from AntLR tutorial of Ashley J.S Mills
You simply list a set of lexical rules that match tokens. The tool automatically
generates code to map the next input character(s) to a rule likely to match.
In effect, the generated lexer is a big "switch" that routes recognition flow to the appropriate rule
11
Symbols in AntLR
taken from AntLR reference manual
12
Lexer
With one restriction:
• Rules defined within a lexer grammar must have a name beginning
with an uppercase letter
taken from AntLR tutorial of Ashley J.S Mills
13
Lexer Rules
You can define operators like:
BECOMES : ":=" ;
COLON : ':' ;
SEMI : ';' ;
EQUALS : '=' ;
LBRACKET : '[' ;
RBRACKET : ']' ;
LPAREN : '(' ;
RPAREN : ')' ;
LT : '<' ;
LTE : "<=" ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIV : '/' ;
And then you can define a token class such as:
OPS : (PLUS | MINUS | TIMES | DIV) ;
14
Actions
Blocks of source code (expressed in the target language) enclosed in curly braces
Executed
after the preceding production element has been recognized
before the recognition of the following element
Typically used to generate output, construct trees, or modify a symbol table
An action's position dictates when it is executed relative to the surrounding grammar elements.
If it is the first element of a production, it is executed before any other element in that production, but only if
that production is predicted by the lookahead
rule_name :
(
{init-action}:
{action of 1st production} production_1
| {action of 2nd production} production_2
)?
;
The init-action would be executed regardless of what (if anything) matched in the optional subrule.
The init-actions are placed within the loops generated for subrules (...)+ and (...)*.
15
Tip: Skipping Tokens
Whitespace usually carries no meaning for the parser, so the lexer can discard it:
WS :
(' ' | '\n' | '\t')
{ $setType(Token.SKIP); } // action
;
→ Do not pass this token to the parser. Recognize
it and then throw it away.
Same for comments ;)
16
Tip: Newline Stuff
The line number of the input is used for error reporting.
It must be incremented by hand when the lexer encounters a
newline:
WS :
( ' ' | '\t' | '\f'
// handle newlines
| (
"\r\n" // DOS/Windows
| '\r' // Macintosh
| '\n' // Unix
)
// increment the line count
{ newline(); } // action executed only in this case
)
{ $setType(Token.SKIP); }
;
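To make the bookkeeping behind newline() concrete, here is a minimal plain-Java sketch of a scanner that skips whitespace and counts lines by hand. The class and method names are hypothetical; this is not AntLR's implementation, only the same idea written out.

```java
// Hypothetical sketch: skip whitespace and track the line number manually.
class LineCounter {
    // Returns the 1-based line of the first non-whitespace character, or -1 if there is none.
    static int lineOfFirstToken(String input) {
        int line = 1;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\r') {
                // "\r\n" (DOS/Windows) is one newline; a lone '\r' (old Macintosh) is one as well
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') i++;
                line++;
            } else if (c == '\n') {                   // Unix
                line++;
            } else if (c != ' ' && c != '\t' && c != '\f') {
                return line;                          // first real token starts on this line
            }
            i++;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lineOfFirstToken("\r\n\n  \rx"));
    }
}
```

Treating "\r\n" as a single newline is the reason the two-character alternative is listed before the lone '\r' in the rule above.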
17
Parser
class ExprParser extends Parser;
expr:
mexpr ((PLUS|MINUS) mexpr)* ;
mexpr :
atom (STAR atom)* ;
atom:
INT
| LPAREN expr RPAREN ;
• Rules defined within a parser grammar must have a name beginning
with a lowercase letter
18
Tip: Keywords and Literals (1/2)
Many languages have a general "identifier" lexical rule, and keywords that are special
cases of the identifier pattern
A typical identifier token may be defined as:
ID : LETTER (LETTER | DIGIT)*;
So how can AntLR understand “if” is not an identifier?
You put fixed keywords into a literals table,
which is checked after each token is matched.
Any double-quoted string used in a parser is automatically entered into the literals
table of the associated lexer.
subprogramBody :
(basicDecl)*
(procedureDecl)*
"begin"
(statement)*
"end" IDENT ;
19
Tip: Keywords and Literals (2/2)
option testLiterals
By default, ANTLR will generate code in all lexer rules to test each
token against the literals table
However, you may suppress this code generation in the lexer by using
a grammar option:
class L extends Lexer;
options { testLiterals=false; }
...
If you turn this option off for a lexer, you may re-enable it for specific
rules
ID options { testLiterals=true; }
: LETTER (LETTER | DIGIT)*;
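The literals-table check can be pictured as a map lookup performed after the identifier rule has matched. The sketch below is an assumption-laden illustration in plain Java: the class name, the table contents, and the token type names are made up for the example and are not taken from AntLR's generated code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a literals table consulted after a token has been matched.
class LiteralsTable {
    static final Map<String, String> LITERALS = new HashMap<>();
    static {
        // illustrative keywords and type names
        LITERALS.put("begin", "LITERAL_begin");
        LITERALS.put("end", "LITERAL_end");
        LITERALS.put("if", "LITERAL_if");
    }

    // matchedText is what the identifier rule consumed; testLiterals mirrors the grammar option.
    static String tokenType(String matchedText, boolean testLiterals) {
        if (testLiterals) {
            String keyword = LITERALS.get(matchedText);
            if (keyword != null) return keyword;      // the text is a keyword, not an identifier
        }
        return "ID";
    }

    public static void main(String[] args) {
        System.out.println(tokenType("if", true));
        System.out.println(tokenType("iffy", true));
        System.out.println(tokenType("if", false));
    }
}
```

Because the lookup uses the whole matched text, "iffy" stays an identifier even though it starts with a keyword.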
20
Tip: Token Object Creation
You will sometimes want to access information about the token being
matched
Label lexical rules and obtain a Token object representing the text,
token type, line number, etc... matched for that rule reference
Lexer rule:
INT : ('0'..'9')+ ;
Another lexer rule that labels a reference to it:
INDEX :
'[' i:INT ']'
{System.out.println(i.getText());} ;
21
Tip: Syntactic / Semantic Predicates
There are other situations where you have to turn on and
off certain rules
depending on prior context or semantic information
Use “predicates” to decide
22
Syntactic Predicates
ANTLR (tree) parsers usually use only a single symbol of lookahead, which is normally
not a problem as intermediate forms are explicitly designed to be easy to walk
However, there is occasionally the need to distinguish between similar tree structures
Syntactic predicates can be used to overcome the limitations of limited fixed lookahead
For example, distinguishing between the unary and binary minus operator:
expr: ( #(MINUS expr expr) )=> #( MINUS expr expr )
| #( MINUS expr )
...
;
The order of evaluation is very important as the second alternative is a "subset" of the
first alternative
Syntactic predicates are a form of selective backtracking and, therefore, actions are
turned off while evaluating a syntactic predicate so that actions do not have to be
undone
23
Semantic Predicates
Semantic predicates
– at the start of an alternative: decides whether or not to match
– in the middle of productions: throw exceptions when they evaluate to
false
stat:
{isTypeName(LT(1))}? ID ID ";" // declaration "type varName;"
| ID "=" expr ";" // assignment
;
decl: "var" ID ":" t:ID
{ isTypeName(t.getText()) }? //used to throw an exception
;
24
Eg: Keeping State Information
Context-sensitive recognition example:
If you are matching tokens that separate rows of data such as "----",
you probably only want to match this if the "begin table" sequence
has been found
BEGIN_TABLE :
'[' {this.inTable=true;} // enter table context
;
ROW_SEP :
{this.inTable}? "----" // semantic predicate
;
END_TABLE :
']' {this.inTable=false;} // exit table context
;
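The effect of the inTable predicate can be imitated with an ordinary boolean field. The following plain-Java sketch is only an analogy: the class name, the classifyAll helper, and the fallback token type TEXT are assumptions introduced for this example.

```java
// Hypothetical sketch: a flag set by one rule decides whether another rule may match.
class TableLexer {
    private boolean inTable = false;

    String classify(String lexeme) {
        switch (lexeme) {
            case "[":
                inTable = true;                       // enter table context
                return "BEGIN_TABLE";
            case "]":
                inTable = false;                      // exit table context
                return "END_TABLE";
            case "----":
                // plays the role of the semantic predicate {this.inTable}?
                return inTable ? "ROW_SEP" : "TEXT";
            default:
                return "TEXT";                        // assumed fallback type
        }
    }

    // Classifies a sequence of lexemes with a fresh lexer and joins the types with spaces.
    static String classifyAll(String... lexemes) {
        TableLexer lexer = new TableLexer();
        StringBuilder out = new StringBuilder();
        for (String lexeme : lexemes) {
            if (out.length() > 0) out.append(' ');
            out.append(lexer.classify(lexeme));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(classifyAll("----", "[", "----", "]", "----"));
    }
}
```

The same "----" text yields a different token depending on what was seen before it, which is what makes the recognition context-sensitive.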
25
The Java Code
The code to invoke the parser:
import java.io.*;
class Main {
public static void main(String[] args) {
try {
// use DataInputStream to grab bytes
MyLexer lexer = new MyLexer(new DataInputStream(System.in));
MyParser parser = new MyParser(lexer);
int x = parser.expr();
System.out.println(x);
} catch(Exception e) {
System.err.println("exception: "+e);
}
}
}
26
Running AntLR
In Linux
runantlr <antlr_file>.g
javac *.java
java Main
In Windows
Eclipse has an easy-to-use plugin for AntLR; see
http://antlreclipse.sourceforge.net/ for detailed
instructions
The plugin will run AntLR on the grammar file
27
Expression Evaluation 1:
Syntax-Directed Translation
To evaluate the expressions on the fly as the tokens come in, add actions to the parser:
class ExprParser extends Parser;
expr returns [int value=0] {int x;} :
value=mexpr
(
PLUS x=mexpr {value += x;}
| MINUS x=mexpr {value -= x;}
)* ;
mexpr returns [int value=0] {int x;} :
value=atom
( STAR x=atom {value *= x;} )* ;
atom returns [int value=0] :
i:INT {value=Integer.parseInt(i.getText());}
| LPAREN value=expr RPAREN ;
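For comparison, the same grammar with its actions can be written by hand as a recursive-descent evaluator. This is a sketch in plain Java, not AntLR output: the class name ExprEvaluator is an assumption, and for brevity it reads characters directly and strips whitespace up front instead of using a separate lexer.

```java
// Hypothetical hand-written counterpart of the expr/mexpr/atom rules with their actions.
class ExprEvaluator {
    private final String src;
    private int pos = 0;

    private ExprEvaluator(String src) { this.src = src.replaceAll("\\s+", ""); }

    static int evaluate(String text) {
        ExprEvaluator p = new ExprEvaluator(text);
        int value = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return value;
    }

    // expr : mexpr ( PLUS mexpr | MINUS mexpr )* ;
    private int expr() {
        int value = mexpr();
        while (pos < src.length() && (peek() == '+' || peek() == '-')) {
            char op = src.charAt(pos++);
            int x = mexpr();
            if (op == '+') value += x; else value -= x;
        }
        return value;
    }

    // mexpr : atom ( STAR atom )* ;
    private int mexpr() {
        int value = atom();
        while (pos < src.length() && peek() == '*') {
            pos++;
            value *= atom();
        }
        return value;
    }

    // atom : INT | LPAREN expr RPAREN ;
    private int atom() {
        if (pos < src.length() && peek() == '(') {
            pos++;
            int value = expr();
            if (pos >= src.length() || peek() != ')') throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected INT at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() { return src.charAt(pos); }

    public static void main(String[] args) {
        System.out.println(evaluate("(3 + 4) * 5"));
    }
}
```

Each method mirrors one rule, and the value is computed while the input is being consumed, so no tree is ever built.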
28
Expression Evaluation 2:
via AST Intermediate Form
A more powerful strategy than syntax-directed translation is
to build an AST:
intermediate representation that holds all or most of the
input symbols and has encoded, in the structure of the
data, the relationship between those tokens
For this kind of tree, you will use a tree walker to compute
the same values as before, but using a different strategy
The utility of ASTs becomes clear when you must do
multiple walks over the tree to figure out what to
compute or to do tree rewrites, morphing the tree
towards another language.
29
Abstract Syntax Trees
Abstract Syntax Tree: Like a parse tree, without unnecessary
information
Two-dimensional trees that can encode the structure of the input as
well as the input symbols
Either
homogeneous: all objects of the same type; e.g., CommonAST in
ANTLR
or heterogeneous: multiple types such as PlusNode, MultNode...
An AST for (3+4) might be represented as a PLUS node with the children 3 and 4;
in LISP notation: ( + 3 4 )
No parentheses are included in the tree!
30
AST Construction
To get ANTLR to generate a useful AST :
– turn on the buildAST option
– add a few suffix operators
class ExprParser extends Parser;
options { buildAST=true; }
expr: mexpr ((PLUS^|MINUS^) mexpr)* ;
mexpr : atom (STAR^ atom)* ;
atom: INT | LPAREN! expr RPAREN! ;
No changes in the Lexer.
31
AST Operators
AST root operator
Without any suffix operators, AntLR adds each token as a sibling of the previous one, which yields a flat list of nodes
We usually want to manipulate this, eg, for operators
A token suffixed with the “^” root operator forces that token as the root of the
current tree:
expr: mexpr ((PLUS^|MINUS^) mexpr)* ;
AST exclude operator.
Tokens / rule references suffixed with the exclude operator are not included
in the AST
eg, for parentheses:
atom: INT | LPAREN! expr RPAREN! ;
32
AST Parsing and Evaluation
Rule format is like #(A B C);
which means "match a node of type A, and then descend into its list of children
and match B and C".
This notation can be nested arbitrarily, using #(...) for child trees
eg, #(A B #(C D) );
class ExprTreeParser extends TreeParser;
expr returns [int r=0] { int a,b; } :
#(PLUS a=expr b=expr) {r = a+b;}
| #(MINUS a=expr b=expr) {r = a-b;}
| #(STAR a=expr b=expr) {r = a*b;}
| i:INT {r = Integer.parseInt(i.getText());} ;
Important: matches are sufficient, not exact. As long as the tree satisfies the
pattern, a match is reported, regardless of how much is left unparsed.
For example, the pattern #( A B ) also matches the tree #( A #(B C) D ).
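What the tree walker computes can be sketched without AntLR as a recursive function over a small binary tree. The Node class and helper names below are hypothetical stand-ins for AntLR's AST nodes, chosen only for this illustration.

```java
// Hypothetical sketch of evaluating an expression tree, analogous to the tree parser rules.
class TreeEvaluator {
    static final class Node {
        final String text;        // "+", "-", "*" or the text of an integer literal
        final Node left, right;   // both null for a leaf
        Node(String text, Node left, Node right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(int value) { return new Node(Integer.toString(value), null, null); }
    static Node op(String text, Node left, Node right) { return new Node(text, left, right); }

    // Match on the root node, then descend into its children.
    static int eval(Node n) {
        switch (n.text) {
            case "+": return eval(n.left) + eval(n.right);   // #(PLUS a=expr b=expr)
            case "-": return eval(n.left) - eval(n.right);   // #(MINUS a=expr b=expr)
            case "*": return eval(n.left) * eval(n.right);   // #(STAR a=expr b=expr)
            default:  return Integer.parseInt(n.text);       // INT leaf
        }
    }

    public static void main(String[] args) {
        Node tree = op("*", op("+", leaf(3), leaf(4)), leaf(5));   // ( * ( + 3 4 ) 5 )
        System.out.println(eval(tree));
    }
}
```

Because the grouping is already encoded in the tree shape, the evaluator never has to deal with parentheses or precedence.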
33
in Java
The code to launch the parser and the tree walker:
import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;
class Calc {
public static void main(String[] args) {
try {
CalcLexer lexer = new CalcLexer(new DataInputStream(System.in));
CalcParser parser = new CalcParser(lexer);
parser.expr(); // Parse the input expression
CommonAST t = (CommonAST)parser.getAST();
System.out.println(t.toStringList()); // Print the resulting tree out in LISP notation
CalcTreeWalker walker = new CalcTreeWalker(); // Traverse the tree created by the parser
int r = walker.expr(t);
System.out.println("value is "+r);
} catch(Exception e) {
System.err.println("exception: "+e);
}
}
}
34
AST Construction by Hand
In some cases, you may want to transform a tree yourself, e.g., to optimize addition with zero:
class CalcTreeWalker extends TreeParser;
options {
buildAST = true; // "transform" mode
}
expr:
! #(PLUS left:expr right:expr) // '!' turns off auto transform
{
if ( #right.getType()==INT && Integer.parseInt(#right.getText())==0 ) // x+0 = x
{
#expr = #left;
}
else if ( #left.getType()==INT && Integer.parseInt(#left.getText())==0 ) // 0+x = x
{
#expr = #right;
}
else // x+y
{
#expr = #(PLUS, left, right);
}
}
| #(STAR expr expr) // use auto transformation
| i:INT
;
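The same rewrite can be expressed as a recursive function that returns a new tree. This plain-Java sketch assumes a hypothetical Node class with integer leaves only; it illustrates the transformation and is not the code AntLR generates for the grammar above.

```java
// Hypothetical sketch of the x+0 / 0+x simplification as a tree-to-tree function.
class PlusZeroSimplifier {
    static final class Node {
        final String text;
        final Node left, right;   // both null for an INT leaf
        Node(String text, Node left, Node right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }
        boolean isLeaf() { return left == null && right == null; }
        String toLisp() {
            return isLeaf() ? text : "(" + text + " " + left.toLisp() + " " + right.toLisp() + ")";
        }
    }

    static Node leaf(int value) { return new Node(Integer.toString(value), null, null); }
    static Node op(String text, Node left, Node right) { return new Node(text, left, right); }

    private static boolean isZero(Node n) {
        return n.isLeaf() && Integer.parseInt(n.text) == 0;
    }

    static Node simplify(Node n) {
        if (n.isLeaf()) return n;
        Node left = simplify(n.left);      // children are transformed first
        Node right = simplify(n.right);
        if (n.text.equals("+")) {
            if (isZero(right)) return left;    // x+0 = x
            if (isZero(left)) return right;    // 0+x = x
        }
        return op(n.text, left, right);        // anything else is rebuilt unchanged
    }

    public static void main(String[] args) {
        Node tree = op("*", op("+", leaf(0), leaf(2)), leaf(3));
        System.out.println(simplify(tree).toLisp());
    }
}
```

Simplifying the children before looking at the parent lets nested cases such as (0+2)*3 collapse in a single pass.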
35
in Java
The code to launch the parser and tree transformer is:
import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;
class Calc {
public static void main(String[] args) {
try {
CalcLexer lexer = new CalcLexer(new DataInputStream(System.in));
CalcParser parser = new CalcParser(lexer);
parser.expr(); // Parse the input expression
CommonAST t = (CommonAST)parser.getAST();
System.out.println(t.toLispString()); // Print the resulting tree out in LISP notation
CalcTreeWalker walker = new CalcTreeWalker();
walker.expr(t); // Traverse the tree created by the parser
t = (CommonAST)walker.getAST(); // Get the result tree from the walker
System.out.println(t.toLispString());
} catch(Exception e) {
System.err.println("exception: "+e);
}
}
}
36
Left Recursion Solved
E → E + T | T written in AntLR as expr: expr PLUS term | term;
The generated code would call expr forever without consuming any input:
expr()
{
expr();
match(PLUS);
term();
}
Eliminate left recursion by
E → TE’
E’ → +TE’ | ε
results in:
expr: term (PLUS term)* ;
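A hand-written version of the rewritten rule shows why the loop form terminates and still groups to the left. This is a sketch in plain Java under simplifying assumptions: the class name is hypothetical, tokens are given as ready-made strings, and a term is just a single token.

```java
// Hypothetical sketch of  expr : term (PLUS term)* ;  as a loop instead of left recursion.
class LoopingExprParser {
    private final String[] tokens;
    private int pos = 0;

    private LoopingExprParser(String[] tokens) { this.tokens = tokens; }

    // Returns the parse result in LISP notation.
    static String parse(String... tokens) {
        LoopingExprParser p = new LoopingExprParser(tokens);
        String tree = p.expr();
        if (p.pos != tokens.length) throw new IllegalArgumentException("unexpected token: " + tokens[p.pos]);
        return tree;
    }

    private String expr() {
        String tree = term();                           // consumes input before any repetition
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;                                      // match(PLUS)
            tree = "(+ " + tree + " " + term() + ")";   // earlier result becomes the left operand
        }
        return tree;
    }

    private String term() {
        if (pos >= tokens.length) throw new IllegalArgumentException("expected a term");
        return tokens[pos++];
    }

    public static void main(String[] args) {
        System.out.println(parse("1", "+", "2", "+", "3"));
    }
}
```

Every iteration consumes at least one token, so the parser cannot loop forever, and folding the previous result into the left operand keeps the left associativity of the original E → E + T rule.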
37
Links
• AntLR Reference Manual by Terence Parr
antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf
• AntLR Tutorial by Ashley J.S Mills
http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlrhome.html
• An Introduction to AntLR by Terence Parr
http://www.cs.usfca.edu/~parrt/course/652/lectures/antlr.html
• An AntLR Tutorial by Scott Stanchfield
javadude.com/articles/antlrtut/

More Related Content

Similar to introduction_to_antlr 3.ppt

Article link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docxArticle link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docx
fredharris32
 
Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
Sumathi Gnanasekaran
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...
Alexey Diyan
 
Java Programming Introduction Lexer 1 In this project we.pdf
Java Programming  Introduction Lexer 1 In this project we.pdfJava Programming  Introduction Lexer 1 In this project we.pdf
Java Programming Introduction Lexer 1 In this project we.pdf
adinathassociates
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
MashaelQ
 
An Annotation Framework for Statically-Typed Syntax Trees
An Annotation Framework for Statically-Typed Syntax TreesAn Annotation Framework for Statically-Typed Syntax Trees
An Annotation Framework for Statically-Typed Syntax Trees
Ray Toal
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
vikram singh
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
Munni28
 
LEX & YACC
LEX & YACCLEX & YACC
LEX & YACC
Mahbubur Rahman
 
Intro To TSQL - Unit 1
Intro To TSQL - Unit 1Intro To TSQL - Unit 1
Intro To TSQL - Unit 1
iccma
 
ITU - MDD - XText
ITU - MDD - XTextITU - MDD - XText
ITU - MDD - XText
Tonny Madsen
 
Intro to tsql unit 1
Intro to tsql   unit 1Intro to tsql   unit 1
Intro to tsql unit 1
Syed Asrarali
 
EnScript Workshop
EnScript WorkshopEnScript Workshop
EnScript Workshop
Mark Morgan, CCE, EnCE
 
A Survey of Concurrency Constructs
A Survey of Concurrency ConstructsA Survey of Concurrency Constructs
A Survey of Concurrency Constructs
Ted Leung
 
Convention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsConvention-Based Syntactic Descriptions
Convention-Based Syntactic Descriptions
Ray Toal
 
Oracle notes
Oracle notesOracle notes
Oracle notes
Prashant Dadmode
 
220 runtime environments
220 runtime environments220 runtime environments
220 runtime environments
J'tong Atong
 
Advanced REXX Programming Techniques
Advanced REXX Programming TechniquesAdvanced REXX Programming Techniques
Advanced REXX Programming Techniques
Dan O'Dea
 
Viva
VivaViva
Viva
VivaViva

Similar to introduction_to_antlr 3.ppt (20)

Article link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docxArticle link httpiveybusinessjournal.compublicationmanaging-.docx
Article link httpiveybusinessjournal.compublicationmanaging-.docx
 
Pcd question bank
Pcd question bank Pcd question bank
Pcd question bank
 
Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...Using ANTLR on real example - convert "string combined" queries into paramete...
Using ANTLR on real example - convert "string combined" queries into paramete...
 
Java Programming Introduction Lexer 1 In this project we.pdf
Java Programming  Introduction Lexer 1 In this project we.pdfJava Programming  Introduction Lexer 1 In this project we.pdf
Java Programming Introduction Lexer 1 In this project we.pdf
 
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex ToolCompiler Engineering Lab#5 : Symbol Table, Flex Tool
Compiler Engineering Lab#5 : Symbol Table, Flex Tool
 
An Annotation Framework for Statically-Typed Syntax Trees
An Annotation Framework for Statically-Typed Syntax TreesAn Annotation Framework for Statically-Typed Syntax Trees
An Annotation Framework for Statically-Typed Syntax Trees
 
Sax Dom Tutorial
Sax Dom TutorialSax Dom Tutorial
Sax Dom Tutorial
 
Lexical Analysis
Lexical AnalysisLexical Analysis
Lexical Analysis
 
LEX & YACC
LEX & YACCLEX & YACC
LEX & YACC
 
Intro To TSQL - Unit 1
Intro To TSQL - Unit 1Intro To TSQL - Unit 1
Intro To TSQL - Unit 1
 
ITU - MDD - XText
ITU - MDD - XTextITU - MDD - XText
ITU - MDD - XText
 
Intro to tsql unit 1
Intro to tsql   unit 1Intro to tsql   unit 1
Intro to tsql unit 1
 
EnScript Workshop
EnScript WorkshopEnScript Workshop
EnScript Workshop
 
A Survey of Concurrency Constructs
A Survey of Concurrency ConstructsA Survey of Concurrency Constructs
A Survey of Concurrency Constructs
 
Convention-Based Syntactic Descriptions
Convention-Based Syntactic DescriptionsConvention-Based Syntactic Descriptions
Convention-Based Syntactic Descriptions
 
Oracle notes
Oracle notesOracle notes
Oracle notes
 
220 runtime environments
220 runtime environments220 runtime environments
220 runtime environments
 
Advanced REXX Programming Techniques
Advanced REXX Programming TechniquesAdvanced REXX Programming Techniques
Advanced REXX Programming Techniques
 
Viva
VivaViva
Viva
 
Viva
VivaViva
Viva
 

Recently uploaded

The world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptxThe world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptx
engrasjadshahzad
 
Stiffness Method for structure analysis - Truss
Stiffness Method  for structure analysis - TrussStiffness Method  for structure analysis - Truss
Stiffness Method for structure analysis - Truss
adninhaerul
 
DBMS Commands DDL DML DCL ENTITY RELATIONSHIP.pptx
DBMS Commands  DDL DML DCL ENTITY RELATIONSHIP.pptxDBMS Commands  DDL DML DCL ENTITY RELATIONSHIP.pptx
DBMS Commands DDL DML DCL ENTITY RELATIONSHIP.pptx
Tulasi72
 
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
amzhoxvzidbke
 
Time-State Analytics: MinneAnalytics 2024 Talk
Time-State Analytics: MinneAnalytics 2024 TalkTime-State Analytics: MinneAnalytics 2024 Talk
Time-State Analytics: MinneAnalytics 2024 Talk
Evan Chan
 
Ludo system project report management .pdf
Ludo  system project report management .pdfLudo  system project report management .pdf
Ludo system project report management .pdf
Kamal Acharya
 
Jet Propulsion and its working principle.pdf
Jet Propulsion and its working principle.pdfJet Propulsion and its working principle.pdf
Jet Propulsion and its working principle.pdf
KIET Group of Institutions
 
IS Code SP 23: Handbook on concrete mixes
IS Code SP 23: Handbook  on concrete mixesIS Code SP 23: Handbook  on concrete mixes
IS Code SP 23: Handbook on concrete mixes
Mani Krishna Sarkar
 
Presentation python programming vtu 6th sem
Presentation python programming vtu 6th semPresentation python programming vtu 6th sem
Presentation python programming vtu 6th sem
ssuser8f6b1d1
 
Conservation of Natural Resources Biodiversity.pptx
Conservation of Natural Resources Biodiversity.pptxConservation of Natural Resources Biodiversity.pptx
Conservation of Natural Resources Biodiversity.pptx
AdarshaMR1
 
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
IJAEMSJORNAL
 
Adv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdfAdv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdf
T.D. Shashikala
 
Evento anual Splunk .conf24 Highlights recap
Evento anual Splunk .conf24 Highlights recapEvento anual Splunk .conf24 Highlights recap
Evento anual Splunk .conf24 Highlights recap
Rafael Santos
 
Online airline reservation system project report.pdf
Online airline reservation system project report.pdfOnline airline reservation system project report.pdf
Online airline reservation system project report.pdf
Kamal Acharya
 
Synthetic Test Collections for Retrieval Evaluation (Poster)
Synthetic Test Collections for Retrieval Evaluation (Poster)Synthetic Test Collections for Retrieval Evaluation (Poster)
Synthetic Test Collections for Retrieval Evaluation (Poster)
Hossein A. (Saeed) Rahmani
 
Traffic Engineering-MODULE-1 vtu syllabus.pptx
Traffic Engineering-MODULE-1 vtu syllabus.pptxTraffic Engineering-MODULE-1 vtu syllabus.pptx
Traffic Engineering-MODULE-1 vtu syllabus.pptx
mailmad391
 
Thermodynamics Digital Material basics subject
Thermodynamics Digital Material basics subjectThermodynamics Digital Material basics subject
Thermodynamics Digital Material basics subject
JigneshChhatbar1
 
Unit 1 Information Storage and Retrieval
Unit 1 Information Storage and RetrievalUnit 1 Information Storage and Retrieval
Unit 1 Information Storage and Retrieval
KishorMahale5
 
PPT_grt.pptx engineering criteria grt for accrediation
PPT_grt.pptx engineering criteria  grt for accrediationPPT_grt.pptx engineering criteria  grt for accrediation
PPT_grt.pptx engineering criteria grt for accrediation
SHALINIRAJAN20
 
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
Mani Krishna Sarkar
 

Recently uploaded (20)

The world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptxThe world of Technology Management MEM 814.pptx
The world of Technology Management MEM 814.pptx
 
Stiffness Method for structure analysis - Truss
Stiffness Method  for structure analysis - TrussStiffness Method  for structure analysis - Truss
Stiffness Method for structure analysis - Truss
 
DBMS Commands DDL DML DCL ENTITY RELATIONSHIP.pptx
DBMS Commands  DDL DML DCL ENTITY RELATIONSHIP.pptxDBMS Commands  DDL DML DCL ENTITY RELATIONSHIP.pptx
DBMS Commands DDL DML DCL ENTITY RELATIONSHIP.pptx
 
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
杨洋李一桐做爱视频流出【网芷:ht28.co】国产国产午夜精华>>>[网趾:ht28.co】]<<<
 
Time-State Analytics: MinneAnalytics 2024 Talk
Time-State Analytics: MinneAnalytics 2024 TalkTime-State Analytics: MinneAnalytics 2024 Talk
Time-State Analytics: MinneAnalytics 2024 Talk
 
Ludo system project report management .pdf
Ludo  system project report management .pdfLudo  system project report management .pdf
Ludo system project report management .pdf
 
Jet Propulsion and its working principle.pdf
Jet Propulsion and its working principle.pdfJet Propulsion and its working principle.pdf
Jet Propulsion and its working principle.pdf
 
IS Code SP 23: Handbook on concrete mixes
IS Code SP 23: Handbook  on concrete mixesIS Code SP 23: Handbook  on concrete mixes
IS Code SP 23: Handbook on concrete mixes
 
Presentation python programming vtu 6th sem
Presentation python programming vtu 6th semPresentation python programming vtu 6th sem
Presentation python programming vtu 6th sem
 
Conservation of Natural Resources Biodiversity.pptx
Conservation of Natural Resources Biodiversity.pptxConservation of Natural Resources Biodiversity.pptx
Conservation of Natural Resources Biodiversity.pptx
 
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
Best Practices of Clothing Businesses in Talavera, Nueva Ecija, A Foundation ...
 
Adv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdfAdv. Digital Signal Processing LAB MANUAL.pdf
Adv. Digital Signal Processing LAB MANUAL.pdf
 
Evento anual Splunk .conf24 Highlights recap
Evento anual Splunk .conf24 Highlights recapEvento anual Splunk .conf24 Highlights recap
Evento anual Splunk .conf24 Highlights recap
 
Online airline reservation system project report.pdf
Online airline reservation system project report.pdfOnline airline reservation system project report.pdf
Online airline reservation system project report.pdf
 
Synthetic Test Collections for Retrieval Evaluation (Poster)
Synthetic Test Collections for Retrieval Evaluation (Poster)Synthetic Test Collections for Retrieval Evaluation (Poster)
Synthetic Test Collections for Retrieval Evaluation (Poster)
 
Traffic Engineering-MODULE-1 vtu syllabus.pptx
Traffic Engineering-MODULE-1 vtu syllabus.pptxTraffic Engineering-MODULE-1 vtu syllabus.pptx
Traffic Engineering-MODULE-1 vtu syllabus.pptx
 
Thermodynamics Digital Material basics subject
Thermodynamics Digital Material basics subjectThermodynamics Digital Material basics subject
Thermodynamics Digital Material basics subject
 
Unit 1 Information Storage and Retrieval
Unit 1 Information Storage and RetrievalUnit 1 Information Storage and Retrieval
Unit 1 Information Storage and Retrieval
 
PPT_grt.pptx engineering criteria grt for accrediation
PPT_grt.pptx engineering criteria  grt for accrediationPPT_grt.pptx engineering criteria  grt for accrediation
PPT_grt.pptx engineering criteria grt for accrediation
 
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
1239_2.pdf IS CODE FOR GI PIPE FOR PROCUREMENT
 

introduction_to_antlr 3.ppt

  • 1. 25 March 2023 Hande Çelikkanat 1 A (Long) Introduction to AntLR Slides adapted from: –AntLR Reference Manual by Terence Pratt antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf –AntLR Tutorial by Ashley J.S Mills http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlrhome.html –An Introduction to AntLR by Terence Pratt http://www.cs.usfca.edu/~parrt/course/652/lectures/antlr.html –An AntLR Tutorial by Scott Stanchfield javadude.com/articles/antlrtut/
  • 2. 2 AntLR ANother Tool for Language Recognition (or anti-LR??) a LL(k) parser and translator generator tool which can create – lexers – parsers – abstract syntax trees (AST’s) in which you describe the language grammatically and in return receive a program that can recognize and translate that language
  • 3. 3 Tasks Divided • Lexical Analysis (scanning) • Semantic Analysis (parsing) • Tree Generation • Code Generation
  • 4. 4 Lexer A source file is streamed to a lexer on a character by character basis by some kind of input interface. Lexer groups characters into meaningful tokens that are meaningful to the parser. A “token” may be – keywords – identifiers – symbols – operators Lexer also removes comments and whitespace from the program, which are meaningless to the parser. So it creates a stream of tokens, which are received one by one by the parser.
  • 5. 5 Parser Parser organizes the tokens into the allowed sequences defined by the grammar of the language. If the parser encounters a sequence of tokens that match none of the allowed sequences of tokens, it will issue an error A design choice is whether to try to recover from the error by making assumptions. Parsers may either do syntax-directed translation on-the-fly, or convert the sequences of tokens into an Abstract Syntax Tree (AST). An AST is a structure which – keeps information in an easily traversable form (such as operator at a node, operands at children of the node) – ignores form-dependent superficial details More on AST’s later... Parser also generates one or more symbol table(s) which contain information, about the tokens it encounters.
  • 6. 6 What does a grammar file look like? It is composed of rules ANTLR accepts three types of grammar specifications parsers lexers tree-parsers (also called tree-walkers) Uses LL(k) analysis for all So the grammar specifications are similar, and the generated lexers and parsers behave similarly
  • 7. 7 Sample File taken from AntLR tutorial of Ashley J.S Mills
  • 8. 8 Sample File Divided (1/3) • An arbitrary number of parsers, lexers, and tree- parsers in a grammar file – a separate class file will be generated for each – i.e, YourLexerClass.class, YourParserClass.class, YourTreeParserClass.class • Header: – put preamble that will be put on top of each of these classes – an import, maybe?
  • 9. 9 Sample File Divided (2/3) • Options – file-wide – charVocabulary = '0'..'377'; //defines the alphabet (usage in complement and wildcard) – k=2; // means two characters of lookahead • Class specific: { ... header for parser class only ...} class MyParser extends Parser; options { ...parser options... } { parser class members } parser rules
  • 10. 10 • Rules in EBNF notation: Sample File Divided (3/3) taken from AntLR tutorial of Ashley J.S Mills You simply list a set of lexical rules that match tokens. The tool automatically generates code to map the next input character(s) to a rule likely to match. A big "switch“ that routes recognition flow to the appropriate rule
  • 11. 11 Symbols in AntLR taken from AntLR reference manual
  • 12. 12 Lexer With one restriction: • Rules defined within a lexer grammar must have a name beginning with an uppercase letter taken from AntLR tutorial of Ashley J.S Mills
  • 13. 13 Lexer Rules You can define operators like: BECOMES : ":=" ; COLON : ':' ; SEMI : ';' ; EQUALS : '=' ; LBRACKET : '[' ; RBRACKET : ']' ; LPAREN : '(' ; RPAREN : ')' ; LT : '<' ; LTE : "<=" ; PLUS : '+' ; MINUS : '-' ; TIMES : '*' ; DIV : '/' ; And then you can define a token class such as: OPS : (PLUS | MINUS | TIMES | DIV) ;
  • 14. 14 Actions Blocks of source code (expressed in the target language) enclosed in curly braces. An action is executed after the preceding production element has been recognized and before the following element is recognized. Typically used to generate output, construct trees, or modify a symbol table. An action's position dictates when it is executed relative to the surrounding grammar elements. If it is the first element of a production, it is executed before any other element in that production, but only if that production is predicted by the lookahead. rule_name : ( {init-action}: {action of 1st production} production_1 | {action of 2nd production} production_2 )? ; The init-action is executed regardless of what (if anything) matched in the optional subrule. For the subrules (...)+ and (...)*, the init-actions are placed within the generated loops.
  • 15. 15 Tip: Skipping Tokens Whitespace has no role in the grammar: WS : (' ' | '\n' | '\t') { $setType(Token.SKIP); } ; The action in braces tells the lexer not to pass this token to the parser: recognize it and then throw it away. Same for comments ;)
  • 16. 16 Tip: Newline Stuff The input line number is used for error reporting. It must be incremented by hand when the lexer encounters a newline:
      WS : ( ' '
           | '\t'
           | '\f'
           // handle newlines
           | ( "\r\n" // DOS/Windows
             | '\r'   // Macintosh
             | '\n'   // Unix
             )
             { newline(); } // increment the line count; executed only in this case
           )
           { $setType(Token.SKIP); }
         ;
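The bookkeeping behind the WS rule can be sketched in plain Java. The class `LineCounterSketch` and its `skipWhitespace` method are hypothetical names for illustration and do not reproduce ANTLR's own `newline()` implementation; the point is only that each of the three line-break conventions bumps the counter exactly once.

```java
// Hypothetical sketch of whitespace skipping with manual line counting.
final class LineCounterSketch {
    private int line = 1;

    int getLine() { return line; }

    // Consumes whitespace starting at pos and returns the index of the first
    // non-whitespace character, bumping the line count once per line break.
    int skipWhitespace(String s, int pos) {
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\f') {
                pos++;
            } else if (c == '\r') {
                // "\r\n" (DOS/Windows) counts as one break, like a bare '\r'.
                pos += (pos + 1 < s.length() && s.charAt(pos + 1) == '\n') ? 2 : 1;
                line++;
            } else if (c == '\n') {
                pos++;
                line++;
            } else {
                break; // first character of a real token
            }
        }
        return pos;
    }
}
```

Testing the two-character "\r\n" sequence before the single characters mirrors the ordering of the alternatives in the grammar rule; otherwise a Windows line ending would be counted twice.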
  • 17. 17 Parser class ExprParser extends Parser; expr: mexpr ((PLUS|MINUS) mexpr)* ; mexpr : atom (STAR atom)* ; atom: INT | LPAREN expr RPAREN ; • Rules defined within a parser grammar must have a name beginning with a lowercase letter
  • 18. 18 Tip: Keywords and Literals (1/2) Many languages have a general "identifier" lexical rule, and keywords that are special cases of the identifier pattern. A typical identifier token may be defined as: ID : LETTER (LETTER | DIGIT)*; So how can AntLR understand that “if” is not an identifier? You put fixed keywords into a literals table, which is checked after each token is matched. Any double-quoted string used in a parser is automatically entered into the literals table of the associated lexer. subprogramBody : (basicDecl)* (procedureDecl)* "begin" (statement)* "end" IDENT ;
  • 19. 19 Tip: Keywords and Literals (2/2) option testLiterals By default, ANTLR will generate code in all lexer rules to test each token against the literals table However, you may suppress this code generation in the lexer by using a grammar option: class L extends Lexer; options { testLiterals=false; } ... If you turn this option off for a lexer, you may re-enable it for specific rules ID options { testLiterals=true; } : LETTER (LETTER | DIGIT)*;
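The literals-table check can be sketched as a map lookup performed after the identifier pattern has matched. Everything here is an assumption made for illustration: the class `LiteralsTableSketch`, the token type constants, and the two keywords entered into the table. The boolean flag plays the role of the testLiterals option.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a lexer's literals table; token type numbers are made up.
final class LiteralsTableSketch {
    static final int ID = 1;
    static final int LITERAL_BEGIN = 2;
    static final int LITERAL_END = 3;

    private final Map<String, Integer> literals = new HashMap<>();
    private final boolean testLiterals;

    LiteralsTableSketch(boolean testLiterals) {
        this.testLiterals = testLiterals;
        // Strings such as "begin" and "end" used in the parser would land here.
        literals.put("begin", LITERAL_BEGIN);
        literals.put("end", LITERAL_END);
    }

    // Called after the identifier pattern has matched the given text.
    int typeOfIdentifier(String text) {
        if (testLiterals) {
            Integer keyword = literals.get(text);
            if (keyword != null) return keyword; // keyword overrides ID
        }
        return ID;
    }
}
```

With the flag off, "begin" is reported as an ordinary ID, which is why the option is typically re-enabled on the identifier rule.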
  • 20. 20 Tip: Token Object Creation You will sometimes want to access information about the token being matched. Label a lexical rule reference to obtain a Token object representing the text, token type, line number, etc. matched for that rule reference. Lexer rule: INT : ('0'..'9')+ ; Rule that references it (also a lexer rule, since its name is uppercase and it matches characters): INDEX : '[' i:INT ']' {System.out.println(i.getText());} ;
  • 21. 21 Tip: Syntactic / Semantic Predicates There are other situations where you have to turn on and off certain rules depending on prior context or semantic information Use “predicates” to decide
  • 22. 22 Syntactic Predicates ANTLR (tree) parsers usually use only a single symbol of lookahead, which is normally not a problem as intermediate forms are explicitly designed to be easy to walk However, there is occasionally the need to distinguish between similar tree structures Syntactic predicates can be used to overcome the limitations of limited fixed lookahead For example, distinguishing between the unary and binary minus operator: expr: ( #(MINUS expr expr) )=> #( MINUS expr expr ) | #( MINUS expr ) ... ; The order of evaluation is very important as the second alternative is a "subset" of the first alternative Syntactic predicates are a form of selective backtracking and, therefore, actions are turned off while evaluating a syntactic predicate so that actions do not have to be undone
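The effect of the predicate, and why the order of alternatives matters, can be sketched on a toy tree. The `MinusPredicateSketch` class, its `Node` type, and the `looksLikeBinaryMinus` helper are hypothetical; the helper stands in for the `( #(MINUS expr expr) )=>` test and has no side effects, just as actions are switched off while a predicate is evaluated.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical tree model for distinguishing unary from binary minus.
final class MinusPredicateSketch {
    static final class Node {
        final String text;
        final List<Node> kids;
        Node(String text, Node... kids) {
            this.text = text;
            this.kids = Arrays.asList(kids);
        }
    }

    // Stand-in for the syntactic predicate: a pure test, no actions run.
    static boolean looksLikeBinaryMinus(Node n) {
        return n.text.equals("-") && n.kids.size() >= 2;
    }

    static int eval(Node n) {
        // First alternative: tried only if the predicate succeeds.
        if (looksLikeBinaryMinus(n)) {
            return eval(n.kids.get(0)) - eval(n.kids.get(1));
        }
        // Second alternative: it would also accept a binary tree, because a
        // MINUS node with one leading child is all it asks for.
        if (n.text.equals("-") && !n.kids.isEmpty()) {
            return -eval(n.kids.get(0));
        }
        return Integer.parseInt(n.text); // INT leaf
    }
}
```

Swapping the two `if` blocks would make every binary minus evaluate as a negation of its first operand, which is the "subset" problem the slide warns about.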
  • 23. 23 Semantic Predicates Semantic predicates – at the start of an alternative: decide whether or not to match that alternative – in the middle of a production: throw an exception when they evaluate to false
      stat : {isTypeName(LT(1))}? ID ID ";"  // declaration "type varName;"
           | ID "=" expr ";"                 // assignment
           ;
      decl : "var" ID ":" t:ID
             { isTypeName(t.getText()) }?    // used to throw an exception
           ;
  • 24. 24 Eg: Keeping State Information Context-sensitive recognition example: if you are matching tokens that separate rows of data, such as "----", you probably only want to match them after the "begin table" sequence has been found.
      BEGIN_TABLE : '[' {this.inTable=true;}   // enter table context
                  ;
      ROW_SEP     : {this.inTable}? "----"     // semantic predicate
                  ;
      END_TABLE   : ']' {this.inTable=false;}  // exit table context
                  ;
  • 25. 25 The Java Code The code to invoke the parser:
      import java.io.*;

      class Main {
          public static void main(String[] args) {
              try {
                  // use DataInputStream to grab bytes
                  MyLexer lexer = new MyLexer(new DataInputStream(System.in));
                  MyParser parser = new MyParser(lexer);
                  int x = parser.expr();
                  System.out.println(x);
              } catch (Exception e) {
                  System.err.println("exception: " + e);
              }
          }
      }
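The table example amounts to a lexer whose ROW_SEP rule is guarded by a boolean field. A minimal sketch, assuming a hypothetical `TableLexerSketch` class that returns token names as strings and reports any other character as `CHAR(x)`:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer whose ROW_SEP rule is gated by the inTable flag.
final class TableLexerSketch {
    private boolean inTable = false;

    List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == '[') {
                inTable = true;               // enter table context
                out.add("BEGIN_TABLE");
                pos++;
            } else if (c == ']') {
                inTable = false;              // exit table context
                out.add("END_TABLE");
                pos++;
            } else if (inTable && s.startsWith("----", pos)) {
                out.add("ROW_SEP");           // predicate {inTable}? held
                pos += 4;
            } else {
                out.add("CHAR(" + c + ")");   // fallback for everything else
                pos++;
            }
        }
        return out;
    }
}
```

The same four dashes produce different tokens depending on whether a `[` has been seen, which is what makes the recognition context-sensitive.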
  • 26. 26 Running AntLR In Linux runantlr <antlr_file>.g javac *.java java Main In Windows Eclipse has a very easy-to-use plugin for AntLR http://antlreclipse.sourceforge.net/ for very very detailed instructions The plugin will run AntLR on the grammar file
  • 27. 27 Expression Evaluation 1: Syntax-Directed Translation To evaluate the expressions on the fly as the tokens come in, add actions to the parser: class ExprParser extends Parser; expr returns [int value=0] {int x;} : value=mexpr ( PLUS x=mexpr {value += x;} | MINUS x=mexpr {value -= x;} )* ; mexpr returns [int value=0] {int x;} : value=atom ( STAR x=atom {value *= x;} )* ; atom returns [int value=0] : i:INT {value=Integer.parseInt(i.getText());} | LPAREN value=expr RPAREN ;
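What the grammar with actions does can be mimicked by a hand-written recursive-descent evaluator with one method per rule. This is a sketch, not ANTLR output: the class `ExprEvalSketch` and its static `eval` entry point are hypothetical, and it reads characters straight from a string instead of taking tokens from a lexer.

```java
// Hypothetical hand-written counterpart of the expr/mexpr/atom rules.
final class ExprEvalSketch {
    private final String in;
    private int pos = 0;

    private ExprEvalSketch(String in) { this.in = in.replace(" ", ""); }

    static int eval(String text) {
        ExprEvalSketch p = new ExprEvalSketch(text);
        int v = p.expr();
        if (p.pos != p.in.length()) {
            throw new IllegalArgumentException("trailing input at " + p.pos);
        }
        return v;
    }

    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    // expr : mexpr ( PLUS mexpr | MINUS mexpr )* ;
    private int expr() {
        int value = mexpr();
        while (peek() == '+' || peek() == '-') {
            char op = in.charAt(pos++);
            int x = mexpr();
            if (op == '+') value += x; else value -= x; // the rule's actions
        }
        return value;
    }

    // mexpr : atom ( STAR atom )* ;
    private int mexpr() {
        int value = atom();
        while (peek() == '*') {
            pos++;
            value *= atom();
        }
        return value;
    }

    // atom : INT | LPAREN expr RPAREN ;
    private int atom() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected INT at " + pos);
        return Integer.parseInt(in.substring(start, pos));
    }
}
```

Because `mexpr` sits below `expr` in the call chain, multiplication binds tighter than addition without any explicit precedence table, and the `while` loops make both operators left-associative.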
  • 28. 28 Expression Evaluation 2: via AST Intermediate Form A more powerful strategy than syntax-directed translation is to build an AST: intermediate representation that holds all or most of the input symbols and has encoded, in the structure of the data, the relationship between those tokens For this kind of tree, you will use a tree walker to compute the same values as before, but using a different strategy The utility of ASTs becomes clear when you must do multiple walks over the tree to figure out what to compute or to do tree rewrites, morphing the tree towards another language.
  • 29. 29 Abstract Syntax Trees Abstract Syntax Tree: like a parse tree, without unnecessary information. Two-dimensional trees that can encode the structure of the input as well as the input symbols. Either homogeneous (all objects of the same type, e.g., CommonAST in ANTLR) or heterogeneous (multiple types such as PlusNode, MultNode...). An AST for (3+4) has + at the root with 3 and 4 as its children. No parentheses are included in the tree!
  • 30. 30 AST Construction To get ANTLR to generate a useful AST : – turn on the buildAST option – add a few suffix operators class ExprParser extends Parser; options { buildAST=true; } expr: mexpr ((PLUS^|MINUS^) mexpr)* ; mexpr : atom (STAR^ atom)* ; atom: INT | LPAREN! expr RPAREN! ; No changes in the Lexer.
  • 31. 31 AST Operators AST root operator: normally AntLR makes the first token it encounters the root of the tree. We usually want to change this, e.g., for operators. A token suffixed with the "^" root operator is forced to be the root of the current tree: expr: mexpr ((PLUS^|MINUS^) mexpr)* ; AST exclude operator: tokens / rule references suffixed with the "!" exclude operator are not included in the AST, e.g., for parentheses: atom: INT | LPAREN! expr RPAREN! ;
  • 32. 32 AST Parsing and Evaluation Rule format is like #(A B C); which means "match a node of type A, and then descend into its list of children and match B and C". This notation can be nested arbitrarily, using #(...) for child trees, e.g., #(A B #(C D) );
      class ExprTreeParser extends TreeParser;
      expr returns [int r=0] { int a,b; }
          : #(PLUS a=expr b=expr)  {r = a+b;}
          | #(MINUS a=expr b=expr) {r = a-b;}
          | #(STAR a=expr b=expr)  {r = a*b;}
          | i:INT {r = Integer.parseInt(i.getText());}
          ;
      Important: matches are sufficient matches, not exact matches. As long as the tree satisfies the pattern, a match is reported, regardless of how much is left unparsed. For example, the pattern #( A B ) also matches the tree #( A #(B C) D ).
  • 33. 33 in Java The code to launch the parser and the tree walker:
      import java.io.*;
      import antlr.CommonAST;
      import antlr.collections.AST;

      class Calc {
          public static void main(String[] args) {
              try {
                  CalcLexer lexer = new CalcLexer(new DataInputStream(System.in));
                  CalcParser parser = new CalcParser(lexer);
                  // Parse the input expression
                  parser.expr();
                  CommonAST t = (CommonAST)parser.getAST();
                  // Print the resulting tree out in LISP notation
                  System.out.println(t.toStringList());
                  CalcTreeWalker walker = new CalcTreeWalker();
                  // Traverse the tree created by the parser
                  int r = walker.expr(t);
                  System.out.println("value is " + r);
              } catch (Exception e) {
                  System.err.println("exception: " + e);
              }
          }
      }
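The tree walker can be pictured as a recursive function over a homogeneous node type, with one case per alternative of the tree grammar. The `AstWalkSketch` class, its `Node` type, and the `num`/`op` helpers are hypothetical stand-ins and do not use ANTLR's `CommonAST`; the tree is built by hand here rather than by a parser.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical homogeneous AST plus a walker that evaluates it.
final class AstWalkSketch {
    enum Type { PLUS, MINUS, STAR, INT }

    static final class Node {
        final Type type;
        final String text;
        final List<Node> children;

        Node(Type type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }

        // LISP-style rendering: root first, then children, no parentheses tokens kept.
        String toLisp() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) sb.append(' ').append(c.toLisp());
            return sb.append(')').toString();
        }
    }

    static Node num(int v) { return new Node(Type.INT, Integer.toString(v)); }

    static Node op(Type t, String text, Node l, Node r) { return new Node(t, text, l, r); }

    // One case per alternative: match the root type, then descend into children.
    static int eval(Node n) {
        switch (n.type) {
            case PLUS:  return eval(n.children.get(0)) + eval(n.children.get(1));
            case MINUS: return eval(n.children.get(0)) - eval(n.children.get(1));
            case STAR:  return eval(n.children.get(0)) * eval(n.children.get(1));
            case INT:   return Integer.parseInt(n.text);
            default:    throw new IllegalStateException("unknown node " + n.type);
        }
    }
}
```

For the input 3+4*5 the tree has PLUS at the root and the STAR subtree as its second child, so the walker needs no precedence rules of its own: the structure already encodes them.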
  • 34. 34 AST Construction by Hand In some cases, you may want to transform a tree yourself, e.g., to optimize away addition with zero:
      class CalcTreeWalker extends TreeParser;
      options {
          buildAST = true; // "transform" mode
      }
      expr : ! #(PLUS left:expr right:expr) // '!' turns off auto transform
             {
               // x+0 = x
               if ( #right.getType()==INT && Integer.parseInt(#right.getText())==0 ) {
                   #expr = #left;
               }
               // 0+x = x
               else if ( #left.getType()==INT && Integer.parseInt(#left.getText())==0 ) {
                   #expr = #right;
               }
               // x+y
               else {
                   #expr = #(PLUS, left, right);
               }
             }
           | #(STAR expr expr) // use auto transformation
           | i:INT
           ;
  • 35. 35 in Java The code to launch the parser and tree transformer is:
      import java.io.*;
      import antlr.CommonAST;
      import antlr.collections.AST;

      class Calc {
          public static void main(String[] args) {
              try {
                  CalcLexer lexer = new CalcLexer(new DataInputStream(System.in));
                  CalcParser parser = new CalcParser(lexer);
                  // Parse the input expression
                  parser.expr();
                  CommonAST t = (CommonAST)parser.getAST();
                  // Print the resulting tree out in LISP notation
                  System.out.println(t.toLispString());
                  CalcTreeWalker walker = new CalcTreeWalker();
                  // Traverse the tree created by the parser
                  walker.expr(t);
                  // Get the result tree from the walker
                  t = (CommonAST)walker.getAST();
                  System.out.println(t.toLispString());
              } catch (Exception e) {
                  System.err.println("exception: " + e);
              }
          }
      }
  • 36. 36 Left Recursion Solved E → E + T | T written in AntLR as expr: expr PLUS term | term; The generated code would call expr endlessly without consuming any input: expr() { expr(); match(PLUS); term(); } Eliminate left recursion by rewriting to E → TE’ E’ → +TE’ | ε, which results in: expr: term (PLUS term)* ;
  • 37. 37 Links • AntLR Reference Manual by Terence Parr antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf • AntLR Tutorial by Ashley J.S Mills http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlrhome.html • An Introduction to AntLR by Terence Parr http://www.cs.usfca.edu/~parrt/course/652/lectures/antlr.html • An AntLR Tutorial by Scott Stanchfield javadude.com/articles/antlrtut/
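The same rewrite can be sketched as a recursive function that returns a new tree. The `ZeroAddSketch` class, its binary `Node` type, and the `simplify` method are hypothetical, and the sketch assumes that every leaf is an INT whose text parses as an integer. As in the grammar, the children are transformed first and the right operand is tested before the left.

```java
// Hypothetical tree rewrite: x+0 and 0+x collapse to x.
final class ZeroAddSketch {
    static final class Node {
        final String text;
        final Node left, right; // both null for an INT leaf

        Node(String text, Node left, Node right) {
            this.text = text;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        String toLisp() {
            return isLeaf() ? text : "(" + text + " " + left.toLisp() + " " + right.toLisp() + ")";
        }
    }

    static Node num(int v) { return new Node(Integer.toString(v), null, null); }

    static Node op(String text, Node l, Node r) { return new Node(text, l, r); }

    // Assumes leaves are INT tokens, so parseInt cannot fail on them.
    private static boolean isZero(Node n) {
        return n.isLeaf() && Integer.parseInt(n.text) == 0;
    }

    static Node simplify(Node n) {
        if (n.isLeaf()) return n;
        Node l = simplify(n.left);   // transform the children first
        Node r = simplify(n.right);
        if (n.text.equals("+")) {
            if (isZero(r)) return l; // x+0 = x
            if (isZero(l)) return r; // 0+x = x
        }
        return new Node(n.text, l, r); // x+y, or any other operator
    }
}
```

Building a fresh node in the last line corresponds to `#expr = #(PLUS, left, right)`; returning `l` or `r` directly corresponds to dropping the PLUS node from the result tree.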