ANTLR4 and its testing
Sahil Sawhney
Software Consultant
Knoldus Software, LLP
Agenda
● Understanding Grammar
● Parse tree
● Process Of Parsing
● Knowing ANTLR
● Who uses ANTLR
● Testing in ANTLR
● Demonstration
What is grammar?
“The set of rules that explains how words are used in
a language”
-Merriam Webster
For example, the use of ‘an’ in English grammar
Why grammar?
“To bring order out of chaos”
And
we adore chaos because we love to produce order.
What type of grammar?
Context-free grammar (CFG)
It consists of a finite set of grammar rules, given as
a quadruple (N, T, S, P) where
➔ N is a set of non-terminal symbols. (Placeholders)
➔ T is a set of terminal symbols, where N ∩ T = ∅
➔ S is the start symbol. (must be a non-terminal, S ∈ N)
➔ P is a set of production rules, P: N → (N ∪ T)*
An Example
Consider the production rules for palindromes over the
alphabet {a,b}.
S → aSa | bSb | a | b | ε
here,
➔ S is the start symbol and the only non-terminal symbol
➔ {a,b} is the set of terminal symbols
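The grammar above can be checked by hand with a few lines of Java. This is a minimal sketch of a membership test for S → aSa | bSb | a | b | ε, not the code ANTLR would generate; the class name PalindromeGrammar and the method derives are made up for illustration.

```java
// Membership test for the grammar S -> aSa | bSb | a | b | epsilon over {a, b}.
// Hypothetical helper written for this example; it is not ANTLR output.
class PalindromeGrammar {

    // Returns true if the start symbol S can derive the given string.
    static boolean derives(String s) {
        // S -> epsilon
        if (s.isEmpty()) {
            return true;
        }
        char first = s.charAt(0);
        // Only the terminals a and b belong to the alphabet.
        if (first != 'a' && first != 'b') {
            return false;
        }
        // S -> a | b
        if (s.length() == 1) {
            return true;
        }
        // S -> aSa | bSb: both ends must be the same terminal,
        // and the middle must again be derivable from S.
        char last = s.charAt(s.length() - 1);
        return first == last && derives(s.substring(1, s.length() - 1));
    }

    public static void main(String[] args) {
        for (String w : new String[] {"ababa", "abba", "", "ab", "abc"}) {
            System.out.println("\"" + w + "\" -> " + derives(w));
        }
    }
}
```

Each branch of the method corresponds to one alternative of the production rule, which is why the code reads almost like the grammar itself.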
Example Cont...
Consider the string “ababa”
Corresponding Parse Tree →
(read the leaves from the leftmost terminal to the rightmost)
        S
      / | \
     a  S  a
      / | \
     b  S  b
        |
        a
What is a parse tree?
A parse tree for a grammar G is a tree where
➔ The root is the start symbol for G
➔ The interior nodes are the nonterminals of G
➔ The leaf nodes are the terminal symbols of G
A terminal string is valid with respect to a grammar
only if at least one parse tree for that grammar has
the string as its leaves, read from left to right.
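To make the definition concrete, here is a small sketch that builds the parse tree for the palindrome grammar S → aSa | bSb | a | b | ε and reads its leaves back. The ParseNode and ParseTreeBuilder classes are assumptions made for this example; ANTLR's own parse tree classes are different and richer.

```java
import java.util.ArrayList;
import java.util.List;

// A node of a hand-built parse tree (hypothetical class, not ANTLR's).
class ParseNode {
    final String symbol; // "S" for the non-terminal, "a" or "b" for terminals, "" for epsilon
    final List<ParseNode> children = new ArrayList<>();

    ParseNode(String symbol) {
        this.symbol = symbol;
    }

    boolean isLeaf() {
        return children.isEmpty();
    }

    // Concatenates the leaves from left to right.
    String yield() {
        if (isLeaf()) {
            return symbol;
        }
        StringBuilder sb = new StringBuilder();
        for (ParseNode child : children) {
            sb.append(child.yield());
        }
        return sb.toString();
    }

    // Renders the tree in a parenthesised, LISP-like form.
    String toStringTree() {
        if (isLeaf()) {
            return symbol.isEmpty() ? "\u03b5" : symbol;
        }
        StringBuilder sb = new StringBuilder("(").append(symbol);
        for (ParseNode child : children) {
            sb.append(' ').append(child.toStringTree());
        }
        return sb.append(')').toString();
    }
}

class ParseTreeBuilder {
    // Returns the parse tree for s, or null if the grammar cannot derive s.
    static ParseNode parse(String s) {
        ParseNode root = new ParseNode("S");
        if (s.isEmpty()) {                       // S -> epsilon
            root.children.add(new ParseNode(""));
            return root;
        }
        char first = s.charAt(0);
        if (first != 'a' && first != 'b') {
            return null;
        }
        String terminal = String.valueOf(first);
        if (s.length() == 1) {                   // S -> a | b
            root.children.add(new ParseNode(terminal));
            return root;
        }
        if (s.charAt(s.length() - 1) != first) {
            return null;
        }
        ParseNode inner = parse(s.substring(1, s.length() - 1));
        if (inner == null) {
            return null;
        }
        root.children.add(new ParseNode(terminal)); // S -> aSa | bSb
        root.children.add(inner);
        root.children.add(new ParseNode(terminal));
        return root;
    }

    public static void main(String[] args) {
        ParseNode tree = parse("ababa");
        System.out.println(tree.toStringTree());
        System.out.println(tree.yield());
    }
}
```

The root is S, every interior node is S, and every leaf is a terminal (or ε), which matches the three conditions listed above. A string is accepted exactly when parse returns a tree whose yield equals the input.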
Finally, what is ANTLR?
➔ ANTLR (ANother Tool for Language
Recognition) is a powerful parser generator for
reading, processing, executing, or translating
structured text or binary files.
➔ From a grammar, ANTLR generates a parser that
can build and walk parse trees (data structure
representing how a grammar matches the input).
Now what is this parser?
➔ A parser is a program that takes input in the form
of a sequence of tokens or program instructions
and usually builds a data structure in the form of a
parse tree or an abstract syntax tree
[Figure: the input “ababa” and the grammar rules
(S → aSa | bSb | a | b | ε) go into the parser, which
outputs the parse tree S → a S a, inner S → b S b,
innermost S → a]
3 stages of parsing
➔ Lexical Analysis – Produces tokens from the
input character stream.
➔ Syntactic Analysis – Checks whether the generated
tokens form a grammatically correct expression.
➔ Semantic Analysis – If the expression is valid, a
meaning is associated with it and the
necessary actions are taken.
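The three stages can be sketched by hand for a tiny arithmetic language. This is an illustrative example only: the grammar (expr : term ('+' term)* and term : NUMBER ('*' NUMBER)*), the class name ThreeStages and its methods are assumptions, and the syntactic and semantic stages are interleaved here for brevity, whereas ANTLR would normally build a parse tree first and let you attach meaning afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written lexer, recursive descent parser and evaluator (hypothetical example).
class ThreeStages {

    // Stage 1, lexical analysis: turn characters into number and operator tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (c == '+' || c == '*') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return tokens;
    }

    private final List<String> tokens;
    private int pos = 0;

    ThreeStages(List<String> tokens) {
        this.tokens = tokens;
    }

    // Stage 2, syntactic analysis: one method per grammar rule.
    // Stage 3, semantic analysis: each rule also computes the value it stands for.
    int expr() {                       // expr : term ('+' term)*
        int value = term();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;
            value += term();
        }
        return value;
    }

    int term() {                       // term : NUMBER ('*' NUMBER)*
        int value = number();
        while (pos < tokens.size() && tokens.get(pos).equals("*")) {
            pos++;
            value *= number();
        }
        return value;
    }

    int number() {
        if (pos >= tokens.size() || !Character.isDigit(tokens.get(pos).charAt(0))) {
            throw new IllegalArgumentException("Expected a number at token " + pos);
        }
        return Integer.parseInt(tokens.get(pos++));
    }

    // Runs all three stages and rejects input with leftover tokens.
    static int evaluate(String input) {
        ThreeStages parser = new ThreeStages(tokenize(input));
        int result = parser.expr();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + parser.tokens.get(parser.pos));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("1 + 2 * 3"));
    }
}
```

Because term is called from expr, multiplication binds tighter than addition: the precedence comes from the shape of the grammar, not from extra code.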
ANTLR Cont...
In a nutshell,
➔ the ANTLR tool converts grammars into programs
(Java by default; ANTLR 4 can also target other
languages such as C#, Python and JavaScript) that
recognize sentences in the language described by
the grammar.
➔ For example, given a grammar for JSON, the
ANTLR tool generates a program that recognizes
JSON input using some support classes from the
ANTLR runtime library.
[Figure: MyGrammar.g4 → ANTLR (version 4) → generated files]
MyGrammar.tokens
MyGrammarBaseListener
MyGrammarBaseVisitor
MyGrammarLexer
MyGrammarLexer.tokens
MyGrammarListener
MyGrammarParser
MyGrammarVisitor
Here, ANTLR acts on the grammar and
generates the corresponding Java files
But why ANTLR?
➔ ANTLR generates recursive descent parsers
(a type of top-down parser) and has good error
reporting.
➔ The parser generated by ANTLR is more or
less readable. This helps in debugging.
➔ ANTLR is available as "open source" and there
are a number of ANTLR users worldwide, so
there is a reasonable chance that bugs will be
identified and corrected.
When to use ANTLR4?
DSL (Domain Specific Language)
Anyone care about ANTLR?
The following say “YES WE DO” :
➔ Twitter search uses ANTLR for query parsing,
with more than 2 billion queries a day
➔ The NetBeans IDE parses C++ with ANTLR
➔ Oracle uses ANTLR within the SQL Developer
IDE and its migration tools
➔ Knoldus uses ANTLR in their projects to
achieve DSL (domain specific language)
requirements
Any Alternatives?
The list is long. Some examples are :
➔ CL-Yacc (Common Lisp)
➔ Gold (C#, Java, Python, Visual Basic etc.)
➔ Hime Parser Generator (C#, Java)
➔ Coco/R (Ada, Pascal, Oberon, Ruby)
➔ Yecc (Erlang)
etc.
Testing ANTLR4
➔ ANTLR provides a flexible testing tool in the
runtime library called TestRig.
➔ It can display lots of information about how a
recognizer (the auto-generated Java classes)
matches input from a file or standard input.
➔ TestRig uses Java reflection to invoke the compiled
recognizers.
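The reflection idea can be illustrated without ANTLR on the classpath. The sketch below is not TestRig's source code; it only shows the general technique of loading a class by name and invoking a method by name. DemoParser is a hypothetical stand-in for a generated parser, its method s() stands in for a start rule, and ReflectiveRunner is a made-up name.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Hypothetical stand-in for a generated parser; real ANTLR parsers consume a token stream.
class DemoParser {
    private final String input;

    public DemoParser(String input) {
        this.input = input;
    }

    // Stand-in for the start rule S -> aSa | bSb | a | b | epsilon.
    public boolean s() {
        return matches(input);
    }

    private static boolean matches(String s) {
        if (s.isEmpty()) {
            return true;
        }
        char first = s.charAt(0);
        if (first != 'a' && first != 'b') {
            return false;
        }
        if (s.length() == 1) {
            return true;
        }
        return s.charAt(s.length() - 1) == first
                && matches(s.substring(1, s.length() - 1));
    }
}

class ReflectiveRunner {
    // Loads the parser class by name and calls the rule method by name,
    // so the runner needs no compile-time knowledge of the grammar.
    static Object run(String parserClassName, String ruleName, String input) {
        try {
            Class<?> parserClass = Class.forName(parserClassName);
            Constructor<?> constructor = parserClass.getConstructor(String.class);
            Object parser = constructor.newInstance(input);
            Method rule = parserClass.getMethod(ruleName);
            return rule.invoke(parser);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke " + ruleName, e);
        }
    }

    public static void main(String[] args) {
        String className = DemoParser.class.getName();
        System.out.println(run(className, "s", "ababa"));
        System.out.println(run(className, "s", "ab"));
    }
}
```

Looking things up by name is what lets a single generic tool exercise any grammar: only the grammar name and the start rule name have to be supplied at run time.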
References
● https://github.com/antlr/antlr4/blob/master/doc/gettin
● https://www.javacodegeeks.com/2012/06/antlr-getti
● https://en.wikipedia.org/wiki/Context-free_grammar
● https://blog.knoldus.com/2016/04/29/testing-gramm
Any Questions?
Thank You !!!
