Java Programming:
Introduction: Lexer 1
In this project, we will begin our lexer. Our lexer will start by reading the strings of the .shank file
that the user wants to run. It will break the Shank code up into "words" or tokens and build a
collection of these tokens. We can consider the lexer complete when it can take any Shank file
and output a list of the tokens generated.
We will not be using the Scanner class that you may be familiar with for reading from a file; we
are, instead, using Files.readAllLines. This is a much simpler way of dealing with files.
Example of readAllLines:
Path myPath = Paths.get("someFile.shank");
List<String> lines = Files.readAllLines(myPath, StandardCharsets.UTF_8);
(This requires importing java.nio.file.Path, java.nio.file.Paths, java.nio.file.Files, java.nio.charset.StandardCharsets, and java.util.List.)
A second concept you may not be familiar with is the "enum" (short for enumeration). This is a
Java language construct that lets us create a variable that may hold any one of a fixed list of
values. We will use this to define the types of tokens - a tokenType.
enum colorType { RED, GREEN, BLUE }
colorType myFavorite = colorType.BLUE;
System.out.println(myFavorite); // prints BLUE
Details
To start with, we will build a lexer that accepts words and numbers and generates a collection of
tokens. There are three types of tokens in this assignment - WORD, NUMBER and ENDOFLINE.
A word is defined as a letter (upper or lower case) followed by any number of letters or digits. In
regular expression terms: [A-Za-z][A-Za-z0-9]* - anything that is not a letter or digit ends the word.
A number is defined as an integer or floating-point value; in regular expression terms:
[0-9]*[.]?[0-9]+ - anything else ends the number.
Any character that is not part of a word or number is not a token (for now), but it does mark the
end of a token.
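As a quick sanity check, the two patterns above can be tried directly with Java's built-in regular expression support. This is only for experimenting with the definitions; the assignment itself requires a hand-written state machine, not regexes:

```java
public class RegexCheck {
    public static void main(String[] args) {
        String word = "[A-Za-z][A-Za-z0-9]*"; // the WORD pattern
        String number = "[0-9]*[.]?[0-9]+";   // the NUMBER pattern

        System.out.println("word1".matches(word));     // a letter, then letters/digits
        System.out.println("123.456".matches(number)); // floating point
        System.out.println(".5".matches(number));      // leading dot is allowed by [0-9]*
        System.out.println("1abc".matches(word));      // starts with a digit, so not a WORD
    }
}
```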
For example:
word1 123.456 2
would lex to 4 tokens:
WORD (word1)
NUMBER (123.456)
NUMBER (2)
ENDOFLINE
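A token such as those above can be represented as a small class that pairs one of the enum token types with the text it matched. This is only an illustrative sketch; the class and method names here are not prescribed by the assignment:

```java
// A minimal Token sketch; names are illustrative, not mandated by the assignment.
public class Token {
    public enum tokenType { WORD, NUMBER, ENDOFLINE }

    private final tokenType type;
    private final String value;

    public Token(tokenType type, String value) {
        this.type = type;
        this.value = value;
    }

    public tokenType getType() { return type; }
    public String getValue()   { return value; }

    @Override
    public String toString() {
        // ENDOFLINE carries no matched text, so print just the type for it.
        return value.isEmpty() ? type.toString() : type + " (" + value + ")";
    }
}
```

With this representation, printing the token list for the example above produces exactly the WORD/NUMBER/ENDOFLINE lines shown.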
The lexer class will hold a collection of tokens; it will start out empty and calls to lex() will add to it.
You must use a state machine to keep track of what type of token you are in the middle of. Any
character that is not a letter or number will reset the state machine and output the current token.
The end of the line will also cause the current token to be output. Output an ENDOFLINE token at
the end of each line.
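The state machine described above might be sketched as follows. This is a minimal illustration that collects tokens as plain strings; it assumes a number never starts with a bare decimal point, which the full [0-9]*[.]?[0-9]+ pattern does allow:

```java
import java.util.ArrayList;
import java.util.List;

// A minimal state-machine lexer sketch; the real assignment class may differ.
public class Lexer {
    private enum State { NONE, WORD, NUMBER }

    private final List<String> tokens = new ArrayList<>();

    // Lex one line; the caller invokes this once per line of the file.
    public void lex(String line) {
        State state = State.NONE;
        StringBuilder current = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (Character.isLetter(c)) {
                if (state == State.NUMBER) {        // a letter ends a number...
                    emit(state, current);
                }
                if (state != State.WORD) {
                    state = State.WORD;             // ...and starts a word
                }
                current.append(c);
            } else if (Character.isDigit(c)) {
                if (state == State.NONE) {
                    state = State.NUMBER;           // digits start a number...
                }
                current.append(c);                  // ...and continue words too
            } else if (c == '.' && state == State.NUMBER) {
                current.append(c);                  // decimal point inside a number
            } else {
                emit(state, current);               // any other character ends the token
                state = State.NONE;
            }
        }
        emit(state, current);                       // end of line flushes the last token
        tokens.add("ENDOFLINE");                    // one ENDOFLINE per line
    }

    private void emit(State state, StringBuilder sb) {
        if (state != State.NONE && sb.length() > 0) {
            tokens.add(state + " (" + sb + ")");
            sb.setLength(0);
        }
    }

    public List<String> getTokens() { return tokens; }
}
```

Feeding "word1 123.456 2" through lex() yields the four tokens listed in the example above.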
This assignment must have three different source code files.
One file must be called Shank.java.
Shank.java must contain main. Your main must ensure that there is one and only one argument in
args. If there are none, or more than one, it must print an appropriate error message and exit. That
one argument will be treated as a filename. Your main must then use Files.readAllLines to
read all of the lines from the file denoted by that filename. Your main must instantiate one instance of
your Lexer class (to be defined below). You must parse all lines using the lex method of the Lexer
class (calling it repeatedly). If lex throws an exception, you must catch the exception and print that
there was an exception. You must then print each token out (this is a temporary step to show that
the lexer works).
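Putting those requirements together, Shank.java might look like the sketch below. The Lexer at the bottom is a do-nothing placeholder (it only emits ENDOFLINE) so that the sketch compiles on its own; your real Lexer class replaces it:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Shank {
    public static void main(String[] args) throws IOException {
        // One and only one argument: the name of the .shank file to lex.
        if (args.length != 1) {
            System.err.println("Usage: java Shank <filename.shank>");
            System.exit(1);
        }
        for (String token : lexFile(args[0])) {
            System.out.println(token);       // temporary: dump every token
        }
    }

    // Reads the file and lexes it line by line, collecting all tokens.
    static List<String> lexFile(String filename) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(filename), StandardCharsets.UTF_8);
        Lexer lexer = new Lexer();
        for (String line : lines) {
            try {
                lexer.lex(line);             // one call per line of the file
            } catch (Exception e) {
                System.err.println("Exception while lexing: " + e.getMessage());
            }
        }
        return lexer.getTokens();
    }
}

// Do-nothing placeholder so this sketch compiles on its own; replace with your real Lexer.
class Lexer {
    private final List<String> tokens = new ArrayList<>();
    public void lex(String line) { tokens.add("ENDOFLINE"); }
    public List<String> getTokens() { return tokens; }
}
```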
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * The Token set as an Enum type. This just "binds" each token to its list of
 * valid lexemes.
 */
public enum TOKEN {
    ARTICLE("a", "the"),            // a list of articles
    NOUN("dog", "cat", "rat"),      // a list of nouns
    VERB("loves", "hates", "eats"), // a list of verbs
    UNKNOWN();                      // keep our lexemes "type-safe"!

    // The lexemes under this token
    private List<String> lexemeList;

    // Construct the token with the list of lexemes
    private TOKEN(String... tokenStrings) {
        lexemeList = new ArrayList<>(tokenStrings.length);
        lexemeList.addAll(Arrays.asList(tokenStrings));
    }

    // Gets a token from a lexeme
    public static TOKEN fromLexeme(String str) {
        // Search through the lexemes looking for a match.
        for (TOKEN t : TOKEN.values()) {
            if (t.lexemeList.contains(str)) {
                return t;
            }
        }
        // If nothing matches then return UNKNOWN.
        return UNKNOWN;
    }
}

/**
 * Programming Languages: Implementation and Design.
 *
 * A Simple Compiler adapted from Sebesta (2010) by Josh Dehlinger, further modified by Adam Conover
 * (2012-2015).
 *
 * A simple compiler used for the simple English grammar in Section 2.2 of Adam Brooks Webber's
 * "Modern Programming Languages" book. Parts of this code were adapted from Robert Sebesta's
 * "Concepts of Programming Languages".
 *
 * This compiler assumes that the source file containing the sentences to parse is provided as the
 * first runtime argument. Within the source file, the compiler assumes that each sentence to parse
 * is provided on its own line.
 *
 * NOTE: A "real" compiler would more likely treat an entire file as a single stream of input,
 * rather than each line being an independent input stream.
 */
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Compiler {
    /**
     * It is assumed that the first argument provided is the name of the source file that is to be
     * "compiled".
     */
    public static void main(String[] args) throws IOException {
        //args = new String[]{"<some hard coded path for testing>"};
        //args = new String[]{"D:\\Version_Controlled\\_SVN_\\Newton\\COS420\\Java\\ParserSample\\input.txt"};
        if (args.length < 1) {
            System.out.println("Need a filename!");
        } else {
            // Java 7 "try-with-resources" to create the file input buffer.
            try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
                // Create the new lexer.
                LexicalAnalyzer lexer = new LexicalAnalyzer();
                // Start lexing and parsing.
                processFile(lexer, br);
            }
        }
    }

    /**
     * Reads each line of the input file and invokes th.
2. MEET ANTLR
ANTLR is written in Java, so installing it is a matter of downloading the latest jar, such as
antlr-4.0-complete.jar.
For syntax diagrams of grammar rules, syntax highlighting of ANTLR 4 grammars, etc., use
ANTLRWorks 2 or the ANTLRWorks 2 NetBeans plugin:
http://tunnelvisionlabs.com/products/demo/antlrworks
3. What is ANTLR?
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for translating
structured text.
It's widely used to build language-related tools.
From a grammar, ANTLR generates a parser that can build and walk parse trees.
4. Terence Parr is the maniac behind ANTLR and has been working on language tools since
1989. He is a professor of computer science at the University of San Francisco.
Twitter search, for example, uses ANTLR for query parsing.
5. Getting Started
Create aliases for the ANTLR tool and TestRig:
$ alias antlr4='java -jar /usr/local/lib/antlr-4.0-complete.jar'
$ alias grun='java org.antlr.v4.runtime.misc.TestRig'
6. A First Example
Hello.g4:
// Define a grammar called Hello
grammar Hello;
r : 'hello' ID ;          // match keyword hello followed by an identifier
ID : [a-z]+ ;             // match lower-case identifiers
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
Then run the ANTLR tool on it:
$ antlr4 Hello.g4
$ javac Hello*.java
Now test it:
$ grun Hello r -tree
hello brink
^D
(r hello brink)
7. $ grun Hello r -gui
hello brink
^D
This pops up a dialog box showing that rule r matched keyword hello followed by the
identifier brink.
8. THE BIG PICTURE
Tokenizing:
The process of grouping characters into words or symbols (tokens) is called lexical analysis, or
simply tokenizing.
We call a program that tokenizes the input a lexer.
The lexer groups related tokens into token classes, or token types, such as INT, ID, FLOAT, etc.
Tokens consist of at least two pieces of information: the token type and the text matched for
that token by the lexer.
9. The parser feeds off of the tokens to recognize the
sentence structure.
By default, ANTLR-generated parsers build a parse
tree that records how the parse recognized the input
sentence.
10. By producing a parse tree, a parser delivers a handy
data structure to be used by the rest of the language
translation application.
Trees are easy to process in subsequent steps and are
well understood by programmers.
Better yet, the parser can generate parse trees
automatically.
11. The ANTLR tool generates recursive-descent parsers
from grammar rules.
Recursive-descent parsers are really just a collection
of recursive methods, one per rule.
The descent term refers to the fact that parsing
begins at the root of a parse tree and proceeds
toward the leaves (tokens).
12. To get an idea of what recursive-descent parsers look like, here are some (slightly cleaned
up) methods that ANTLR generates from grammar rules:
assign : ID '=' expr ';' ; // match an assignment statement like "sp = 100;"

// assign : ID '=' expr ';' ;
void assign() {    // method generated from rule assign
    match(ID);     // compare ID to current input symbol, then consume
    match('=');
    expr();        // match an expression by calling expr()
    match(';');
}
13. /** Match any kind of statement starting at the current input position */
stat: assign     // First alternative ('|' is the alternative separator)
    | ifstat     // Second alternative
    | whilestat
    ...
    ;
The parser will generate:
void stat() {
    switch ( «current input token» ) {
        case ID    : assign(); break;
        case IF    : ifstat(); break;    // IF is the token type for keyword 'if'
        case WHILE : whilestat(); break;
        ...
        default    : «raise no viable alternative exception»
    }
}
14. In the example on the previous slide:
Method stat() has to make a parsing decision or
prediction by examining the next input token.
Parsing decisions predict which alternative will be
successful.
In this case, seeing a WHILE keyword predicts the
third alternative of rule stat.
A lookahead token is any token that the parser sniffs
before matching and consuming it.
Sometimes, the parser needs lots of lookahead tokens
to predict which alternative will succeed. ANTLR
silently handles all of this for you.
15. Ambiguous Grammars
stat: ID '=' expr ';' // match an assignment statement
    | ID '=' expr ';' // oops! an exact duplicate of the previous alternative
    ;
expr: INT ;

stat: expr ';'        // expression statement; can match "f();"
    | ID '(' ')' ';'  // function call statement; can also match "f();"
    ;
expr: ID '(' ')'
    | INT
    ;
ANTLR resolves the ambiguity by choosing the first alternative involved in the decision.
16. Ambiguities can occur in the lexer as well as the parser.
ANTLR resolves lexical ambiguities by matching the input string to the rule specified first in
the grammar.
BEGIN : 'begin' ; // match b-e-g-i-n sequence; ambiguity resolves to BEGIN
ID : [a-z]+ ;     // match one or more of any lowercase letter
17. Building Language Applications Using Parse Trees
Lexers process characters and pass tokens to the parser, which in turn checks syntax
and creates a parse tree.
The corresponding ANTLR classes are CharStream, Lexer, Token, Parser, and ParseTree.
The "pipe" connecting the lexer and parser is called a TokenStream.
18. ANTLR uses context objects, which know the start and stop tokens for a recognized phrase
and provide access to all of the elements of that phrase.
For example, AssignContext provides methods ID() and expr() to access the identifier node
and expression subtree.
19. Parse-Tree Listeners and Visitors
By default, ANTLR generates a parse-tree listener interface that responds to events triggered
by the built-in tree walker.
To walk a tree and trigger calls into a listener, ANTLR's runtime provides the class
ParseTreeWalker.
To make a language application, we write a ParseTreeListener.
The beauty of the listener mechanism is that it's all automatic. We don't have to write a
parse-tree walker, and our listener methods don't have to explicitly visit their children.
21. Parse-Tree Visitors
There are situations, however, where we want to control the walk itself, explicitly calling
methods to visit children.
Option -visitor asks ANTLR to generate a visitor interface from a grammar, with a visit
method per rule.
ParseTree tree = ... ; // tree is the result of parsing
MyVisitor v = new MyVisitor();
v.visit(tree);
22. A STARTER ANTLR PROJECT
Let's build a grammar to recognize integers in, possibly nested, curly braces like
{1, 2, 3} and {1, {2, 3}, 4}.
grammar ArrayInit;
/** A rule called init that matches comma-separated values between {...}. */
init : '{' value (',' value)* '}' ; // must match at least one value
/** A value can be either a nested array/struct or a simple integer (INT) */
value : init | INT ;
// parser rules start with lowercase letters, lexer rules with uppercase
INT : [0-9]+ ;            // Define token INT as one or more digits
WS : [ \t\r\n]+ -> skip ; // Define whitespace rule, toss it out
23. From the grammar ArrayInit.g4, ANTLR generates the following files:
24. ArrayInitParser.java This file contains the parser class definition specific to grammar
ArrayInit that recognizes our array language syntax.
ArrayInitLexer.java ANTLR automatically extracts a separate parser and lexer specification
from our grammar.
ArrayInit.tokens ANTLR assigns a token type number to each token we define and stores
these values in this file.
ArrayInitListener.java, ArrayInitBaseListener.java By default, ANTLR parsers build a tree
from the input. By walking that tree, a tree walker can fire "events" (callbacks) to a listener
object that we provide.
ArrayInitListener is the interface that describes the callbacks we can implement.
ArrayInitBaseListener is a set of empty default implementations.
26. Integrating a Generated Parser into a Java Program
// import ANTLR's runtime libraries
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of input CharStream
        ArrayInitLexer lexer = new ArrayInitLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        ArrayInitParser parser = new ArrayInitParser(tokens);
        ParseTree tree = parser.init();                // begin parsing at init rule
        System.out.println(tree.toStringTree(parser)); // print LISP-style tree
    }
}
27. Building a Language Application using ArrayInitBaseListener
/** Convert short array inits like {1,2,3} to "\u0001\u0002\u0003" */
public class ShortToUnicodeString extends ArrayInitBaseListener {
    /** Translate { to " */
    @Override
    public void enterInit(ArrayInitParser.InitContext ctx) { System.out.print('"'); }

    /** Translate } to " */
    @Override
    public void exitInit(ArrayInitParser.InitContext ctx) { System.out.print('"'); }

    /** Translate integers to 4-digit hexadecimal Unicode escapes */
    @Override
    public void enterValue(ArrayInitParser.ValueContext ctx) {
        // Assumes no nested array initializers
        int value = Integer.valueOf(ctx.INT().getText());
        System.out.printf("\\u%04x", value);
    }
}
28. Driver program for ArrayInitBaseListener
public class Translate {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of input CharStream
        ArrayInitLexer lexer = new ArrayInitLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        ArrayInitParser parser = new ArrayInitParser(tokens);
        ParseTree tree = parser.init(); // begin parsing at init rule
        // Create a generic parse tree walker that can trigger callbacks
        ParseTreeWalker walker = new ParseTreeWalker();
        // Walk the tree created during the parse, trigger callbacks
        walker.walk(new ShortToUnicodeString(), tree);
        System.out.println();           // print a \n after translation
    }
}
29. Building a Calculator Using a Visitor
grammar LabeledExpr;
prog: stat+ ;
stat: expr NEWLINE        # printExpr
    | ID '=' expr NEWLINE # assign
    | NEWLINE             # blank
    ;
expr: expr op=('*'|'/') expr # MulDiv
    | expr op=('+'|'-') expr # AddSub
    | INT                    # int
    | ID                     # id
    | '(' expr ')'           # parens
    ;
MUL : '*' ; // assigns token name to '*' used above in grammar
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
ID  : [a-zA-Z]+ ;      // match identifiers
INT : [0-9]+ ;         // match integers
NEWLINE : '\r'? '\n' ; // return newlines to parser (end-of-statement signal)
WS : [ \t]+ -> skip ;  // toss out whitespace
30. Driver class: Calc.java
LabeledExprLexer lexer = new LabeledExprLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
LabeledExprParser parser = new LabeledExprParser(tokens);
ParseTree tree = parser.prog(); // parse
EvalVisitor eval = new EvalVisitor();
eval.visit(tree);
31. Use ANTLR to generate a visitor interface with a method for each labeled alternative:
$ antlr4 -no-listener -visitor LabeledExpr.g4
public interface LabeledExprVisitor<T> {
    T visitId(LabeledExprParser.IdContext ctx);
    T visitAssign(LabeledExprParser.AssignContext ctx);
    T visitMulDiv(LabeledExprParser.MulDivContext ctx);
    ...
}
To implement the calculator, we override the methods associated with statement and
expression alternatives.
32. import java.util.HashMap;
import java.util.Map;

public class EvalVisitor extends LabeledExprBaseVisitor<Integer> {
    /** "memory" for our calculator; variable/value pairs go here */
    Map<String, Integer> memory = new HashMap<String, Integer>();

    /** ID '=' expr NEWLINE */
    @Override
    public Integer visitAssign(LabeledExprParser.AssignContext ctx) {
        String id = ctx.ID().getText(); // id is left-hand side of '='
        int value = visit(ctx.expr());
        memory.put(id, value);
        return value;
    }
    etc.
33. Building a Translator with a Listener
Imagine you want to build a tool that generates a Java interface file from the methods in a
Java class definition.
Sample Input:
import java.util.List;
import java.util.Map;
public class Demo {
    void f(int x, String y) {...}
    int[] g() { return null; }
    List<Map<String, Integer>>[] h() { return null; }
}
Output:
interface IDemo {
    void f(int x, String y);
    int[] g();
    List<Map<String, Integer>>[] h();
}
34. The key "interface" between the grammar and our listener object is called JavaListener,
and ANTLR automatically generates it for us.
It defines all of the methods that the class ParseTreeWalker from ANTLR's runtime can
trigger as it traverses the parse tree.
Here are the relevant methods from the generated listener interface:
public interface JavaListener extends ParseTreeListener {
    void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx);
    void exitClassDeclaration(JavaParser.ClassDeclarationContext ctx);
    void enterMethodDeclaration(JavaParser.MethodDeclarationContext ctx);
    ...
}
35. Listener methods are called by the ANTLR-provided walker object, whereas visitor
methods must walk their children with explicit visit calls.
Forgetting to invoke visit() on a node's children means those subtrees don't get visited.
ANTLR generates a default implementation called JavaBaseListener. Our interface extractor
can then subclass JavaBaseListener and override the methods of interest.
36. public class ExtractInterfaceListener extends JavaBaseListener {
    JavaParser parser;

    public ExtractInterfaceListener(JavaParser parser) { this.parser = parser; }

    /** Listen to matches of classDeclaration */
    @Override
    public void enterClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("interface I" + ctx.Identifier() + " {");
    }

    @Override
    public void exitClassDeclaration(JavaParser.ClassDeclarationContext ctx) {
        System.out.println("}");
    }

    /** Listen to matches of methodDeclaration */
    @Override
    public void enterMethodDeclaration(JavaParser.MethodDeclarationContext ctx) {
        // need parser to get tokens
        TokenStream tokens = parser.getTokenStream();
        String type = "void";
        if ( ctx.type() != null ) {
            type = tokens.getText(ctx.type());
        }
        String args = tokens.getText(ctx.formalParameters());
        System.out.println("\t" + type + " " + ctx.Identifier() + args + ";");
    }
}