Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 2)

ANTLR 4
Grammars
by Alexander Vasiltsov

EBNF
● lexeme “::=” its description - a definition (“=” may be
used instead of “::=”)
● ‘...’ - literal text: a character or group of
characters
● A B - element A followed by element B
(concatenation)
● A | B - element A or B (choice)
● [A] - element A may be present or absent
(optional)
● {A} - zero or more A elements (repetition)
● (A B) - grouping of elements
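
ANTLR rules use a closely related notation, so an EBNF description can usually be transcribed almost symbol for symbol. A minimal sketch, assuming a hypothetical bracketed list of integers (the names EbnfDemo, list, item and INT are illustrative and do not come from the slides):

```antlr
grammar EbnfDemo;

// EBNF:   list ::= '[' [ item { ',' item } ] ']'
// ANTLR:  [A] is written A?   and   {A} is written A*
list : '[' (item (',' item)*)? ']' ;

// EBNF:   item ::= INT
item : INT ;

INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

Under these assumptions the rule list should accept inputs such as [], [7] and [1, 2, 3].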

Grammar patterns
● Sequence of elements
● Choice between multiple alternatives
● Token dependency - the presence of one token
requires the presence of its counterpart
elsewhere in the phrase
● Nested phrase - a self-similar language
construct

Sequence
This is a finite or arbitrarily long sequence of
tokens or subphrases. Common variants:
● Sequence with terminator
● Sequence with separator
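
The three flavours of sequence could be written in ANTLR roughly as follows. This is a sketch only: the grammar name and the rules ints, stats, exprList, stat and expr are hypothetical examples, not part of the course material.

```antlr
grammar SequenceDemo;

ints     : INT+ ;                  // plain sequence: one or more integers
stats    : (stat ';')* ;           // sequence with terminator: every stat ends with ';'
exprList : expr (',' expr)* ;      // sequence with separator: ',' sits between exprs

stat : ID '=' expr ;
expr : ID | INT ;

ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

The difference between the last two shows up at the end of the input: "a = 1; b = 2;" fits stats, while "a, 1, b" fits exprList and must not end with a comma.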

Choice (Alternatives)
This is a set of alternative phrases
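
In ANTLR a choice is expressed with the | operator. A minimal sketch, assuming a hypothetical C-like statement language (all rule and token names below are illustrative):

```antlr
grammar ChoiceDemo;

type : 'float' | 'int' | 'void' ;   // choice between three keywords

stat : type ID ';'                  // variable declaration
     | ID '=' INT ';'               // assignment
     | 'return' INT ';'             // return statement
     ;

ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```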

Token Dependency
The presence of one token requires the
presence of one or more subsequent tokens
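
Matching brackets are the typical case: once the opening symbol has been seen, the closing one must follow later in the same phrase. A sketch with hypothetical rules vector, call and args:

```antlr
grammar DependencyDemo;

vector : '[' INT+ ']' ;        // '[' requires a matching ']'
call   : ID '(' args? ')' ;    // '(' requires a matching ')'
args   : INT (',' INT)* ;

ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```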

Nested Phrase
This is a self-similar language structure
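
Self-similarity maps onto a recursive rule, that is, a rule that refers to itself. A minimal sketch, assuming a hypothetical expression language with array indexing and parentheses:

```antlr
grammar NestedDemo;

expr : ID '[' expr ']'     // a[1], a[b[1]]  (expr inside expr)
     | '(' expr ')'        // (a), ((1))
     | ID
     | INT
     ;

ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```

An input such as a[(b[1])] nests three levels of expr inside the outer one.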

Line between lexer and parser
● Match and discard anything in the lexer that the parser
does not need to see at all
● Match common tokens such as identifiers, keywords,
strings, and numbers in the lexer
● Lump together into a single token type those lexical
structures that the parser does not need to distinguish
● Lump together anything that the parser can treat as a
single entity
● On the other hand, if the parser needs to pull apart a
lump of text to process it, the lexer should pass the
individual components as tokens to the parser
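
The guidelines above might translate into a grammar along these lines. It is a sketch under assumptions: the statement syntax and every rule and token name here are hypothetical and chosen only to illustrate where the line is drawn.

```antlr
grammar BoundaryDemo;

// The parser only deals with statement structure.
stat  : ID '=' value ';' ;
value : NUMBER | STRING | ID ;

// Lumped together: the parser does not need to tell integers from floats.
NUMBER : [0-9]+ ('.' [0-9]+)? ;
// A whole string literal is passed to the parser as a single entity.
STRING : '"' ~["\r\n]* '"' ;
ID     : [a-zA-Z_] [a-zA-Z_0-9]* ;

// Matched and discarded: the parser never sees comments or whitespace.
LINE_COMMENT : '//' ~[\r\n]* -> skip ;
WS           : [ \t\r\n]+ -> skip ;

// If the parser had to inspect the parts of a lump, for example the four
// numbers of an IP address, the lexer would emit the parts as separate tokens
// and a parser rule such as  ip : INT '.' INT '.' INT '.' INT ;  would
// combine them.
```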

JSON Reference
http://json.org

JSON grammar (1)
grammar JSON;

json
    : object
    | array
    ;

object
    : '{' pair (',' pair)* '}'
    | '{' '}'                   // empty object
    ;

pair: STRING ':' value ;

array
    : '[' value (',' value)* ']'
    | '[' ']'                   // empty array
    ;

value
    : STRING
    | NUMBER
    | object                    // recursion
    | array                     // recursion
    | 'true'                    // keywords
    | 'false'
    | 'null'
    ;

JSON grammar (2)
STRING : '"' (ESC | ~["\\])* '"' ;
fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
fragment UNICODE : 'u' HEX HEX HEX HEX ;
fragment HEX : [0-9a-fA-F] ;
NUMBER
    : '-'? INT '.' [0-9]+ EXP?  // 1.35, 1.35E-9, 0.3, -4.5
    | '-'? INT EXP              // 1e10 -3e4
    | '-'? INT                  // -3, 45
    ;
fragment INT : '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
WS : [ \t\n\r]+ -> skip ;
