ANTLR 4 
Grammars 
by Alexander Vasiltsov
EBNF 
● name “::=” description (or “=”) - a rule: a named element and its definition 
● ‘...’ - text element: a character or group of 
characters 
● A B - element A followed by element B 
(concatenation) 
● A | B - element A or element B (choice) 
● [A] - element A is optional (present or 
absent) 
● {A} - zero or more A elements (repetition) 
● (A B) - grouping of elements
ANTLR Notation
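As a rough guide to how the EBNF constructs listed earlier map onto ANTLR 4 notation, here is a minimal sketch. The grammar name `Notation` and all rule names are made up for illustration; only the operators themselves are ANTLR 4 syntax.

```antlr
grammar Notation;              // hypothetical grammar name

// ANTLR writes "name : definition ;" where EBNF writes "name ::= definition"
greeting  : 'hello' ID ;       // 'hello' is a literal; A B is concatenation
answer    : 'yes' | 'no' ;     // A | B   choice
signedInt : '-'? INT ;         // A?      optional, EBNF [A]
idList    : ID* ;              // A*      zero or more, EBNF {A}
digits    : INT+ ;             // A+      one or more (no single EBNF bracket for this)
pairs     : (ID INT)+ ;        // (A B)   grouping

// Lexer rules start with an uppercase letter, parser rules with a lowercase one
ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;     // discard whitespace
```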
Grammar patterns 
● Sequence - a series of elements 
● Choice - a selection among multiple alternatives 
● Token dependence - the presence of one token 
requires the presence of its counterpart 
somewhere in the phrase 
● Nested phrase - a self-similar language 
construct
Sequence 
This is a finite or arbitrarily long sequence of 
tokens or subphrases 
● Sequence with terminator 
● Sequence with separator
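The three sequence forms might look like this in ANTLR 4. This is a sketch only: the grammar name `Sequences` and the rules `stat`, `expr` and `exprList` are illustrative assumptions, not taken from the slides.

```antlr
grammar Sequences;                      // hypothetical grammar name

numbers     : INT+ ;                    // plain sequence: one or more tokens
maybeNums   : INT* ;                    // plain sequence: zero or more tokens
stats       : (stat ';')* ;             // sequence with terminator: every stat ends with ';'
exprList    : expr (',' expr)* ;        // sequence with separator: ',' appears only between items
optExprList : (expr (',' expr)*)? ;     // separator form that also allows an empty list

stat : ID '=' expr ;
expr : ID | INT ;

ID  : [a-zA-Z_]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

The terminator form puts the extra token inside the repeated group, while the separator form pulls the first item out of the loop so that no separator trails the last item.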
Choice (Alternatives) 
This is a set of alternative phrases
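A choice is written with `|` between alternatives. The following sketch shows a choice between single tokens and a choice between whole phrases; the grammar name `Choices` and the statement forms are assumptions picked for illustration.

```antlr
grammar Choices;                        // hypothetical grammar name

type : 'float' | 'int' | 'void' ;       // choice between single tokens

stat : 'return' expr ';'                // choice between whole phrases
     | ID '=' expr ';'
     | 'print' expr ';'
     ;

expr : ID | INT ;

ID  : [a-zA-Z_]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```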
Token Dependency 
The presence of one token requires the 
presence of one or more subsequent tokens
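Matching brackets are the usual case of token dependency: the rule names both tokens, so one cannot appear without the other. A minimal sketch, with a made-up grammar name and made-up rules:

```antlr
grammar Dependencies;                        // hypothetical grammar name

vector : '[' INT+ ']' ;                      // '[' requires a matching ']'
call   : ID '(' (expr (',' expr)*)? ')' ;    // '(' requires a matching ')'
group  : '{' call* '}' ;                     // '{' requires a matching '}'

expr : ID | INT ;

ID  : [a-zA-Z_]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```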
Nested Phrase 
This is a self-similar language structure
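Self-similarity is expressed with recursive rules: a rule refers to itself, directly or through another rule. The sketch below is an illustration with assumed rule names, not the grammar shown on the original slide.

```antlr
grammar Nested;                         // hypothetical grammar name

stat : 'while' '(' expr ')' stat        // a statement can contain a statement
     | '{' stat* '}'                    // a block nests any number of statements
     | ID '=' expr ';'
     ;

expr : ID '[' expr ']'                  // an index is itself an expression: a[i], a[b[1]]
     | '(' expr ')'                     // parentheses nest expressions
     | ID
     | INT
     ;

ID  : [a-zA-Z_]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```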
Common lexical structures
Lexical Starter Kit (1)
Lexical Starter Kit (2)
Lexical Starter Kit (3)
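The slide images for the starter kit did not survive extraction. As a stand-in, here is a sketch of lexer rules commonly used for identifiers, numbers, strings, comments and whitespace in ANTLR 4. The grammar name and the exact rule bodies are assumptions and may differ from what the slides showed.

```antlr
lexer grammar CommonLexerRules;         // hypothetical grammar name

// Identifiers: a letter or underscore, then letters, underscores or digits
ID : ID_LETTER (ID_LETTER | DIGIT)* ;
fragment ID_LETTER : [a-zA-Z_] ;
fragment DIGIT     : [0-9] ;

// Numbers
INT   : DIGIT+ ;
FLOAT : DIGIT+ '.' DIGIT*               // 1.  3.14
      | '.' DIGIT+                      // .1
      ;

// String literals: the non-greedy loop stops at the first unescaped quote
STRING : '"' (ESC | .)*? '"' ;
fragment ESC : '\\' [btnr"\\] ;         // \b \t \n \r \" \\

// Comments and whitespace are matched and then discarded
LINE_COMMENT : '//' ~[\r\n]* -> skip ;
COMMENT      : '/*' .*? '*/' -> skip ;
WS           : [ \t\r\n]+ -> skip ;
```

Rules marked `fragment` are helpers for other lexer rules and never become tokens on their own.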
Line between lexer and parser 
● Match and discard anything in the lexer that the parser 
does not need to see at all 
● Match common tokens such as identifiers, keywords, 
strings, and numbers in the lexer 
● Lump together into a single token type those lexical 
structures that the parser does not need to distinguish 
● Lump together anything that the parser can treat as a 
single entity 
● On the other hand, if the parser needs to pull apart a 
lump of text to process it, the lexer should pass the 
individual components as tokens to the parser
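To illustrate the last two points, consider a log line that starts with an IP address. If the parser only needs the address as a whole, the lexer can lump it into one token. This is a sketch under assumed names (`LogLumped`, `row`, `IP`); the alternative for a parser that needs the individual octets is shown in a comment.

```antlr
grammar LogLumped;                         // hypothetical grammar name

file : row+ ;
row  : IP STRING INT NL ;                  // the parser sees the address as one opaque token

// Lumped: the lexer matches the whole address, e.g. 192.168.209.85
IP     : INT '.' INT '.' INT '.' INT ;
INT    : [0-9]+ ;
STRING : '"' .*? '"' ;
NL     : '\n' ;
WS     : [ \t\r]+ -> skip ;                // the parser never needs to see this

// If the parser had to process the four numbers separately, the IP lexer rule
// would be dropped in favor of a parser rule, so that each INT arrives as its own token:
//   row : ip STRING INT NL ;
//   ip  : INT '.' INT '.' INT '.' INT ;
```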
JSON Reference 
http://json.org
JSON grammar (1) 
grammar JSON; 
json: object 
| array 
; 
object 
: '{' pair (',' pair)* '}' 
| '{' '}' // empty object 
; 
pair: STRING ':' value ; 
array 
: '[' value (',' value)* ']' 
| '[' ']' // empty array 
; 
value 
: STRING 
| NUMBER 
| object // recursion 
| array // recursion 
| 'true' // keywords 
| 'false' 
| 'null' 
;
JSON grammar (2) 
STRING : '"' (ESC | ~["\\])* '"' ; 
fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ; 
fragment UNICODE : 'u' HEX HEX HEX HEX ; 
fragment HEX : [0-9a-fA-F] ; 
NUMBER 
: '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5 
| '-'? INT EXP // 1e10 -3e4 
| '-'? INT // -3, 45 
; 
fragment INT : '0' | [1-9] [0-9]* ; // no leading zeros 
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...] 
WS : [ \t\n\r]+ -> skip ;
Typical JSON
Parse tree

Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 2)
