ANTLR v3 Overview (for ANTLR v2 users) Terence Parr University of San Francisco
Antlr V3

Published in: Technology, Business

Transcript of "Antlr V3"

  1. ANTLR v3 Overview (for ANTLR v2 users)
     Terence Parr, University of San Francisco
  2. Topics
     • Information flow
     • v3 grammars
     • Error recovery
     • Attributes
     • Tree construction
     • Tree grammars
     • Code generation
     • Internationalization
     • Runtime support
  3. Block Info Flow Diagram (diagram not captured in this transcript)
  4. Grammar Syntax
         header {…}
         /** doc comment */
         kind grammar name ;
         options {…}
         tokens {…}
         scopes…
         action
         rules …

         /** doc comment */
         rule[String s, int z] returns [int x, int y] throws E
         options {…}
         scopes
         init {…}
             :  |  ;
             exceptions

     Trees: ^(root child1 … childN)
     Note: No inheritance
  5. Grammar improvements
     • Single-element EBNF such as ID*
     • Combined parser/lexer
     • Allows both 'c' and "literal" literals
     • Multiple parameters and return values
     • Labels do not have to be unique: (x=ID|x=INT) {…$x…}
     • For combined grammars, warns when tokens are not defined
  6. Example Grammar
         grammar SimpleParser;
         program   : variable* method+ ;
         variable  : "int" ID ('=' expr)? ';' ;
         method    : "method" ID '(' ')' '{' variable* statement+ '}' ;
         statement : ID '=' expr ';'
                   | "return" expr ';'
                   ;
         expr      : ID | INT ;
         ID        : ('a'..'z'|'A'..'Z')+ ;
         INT       : '0'..'9'+ ;
         WS        : (' '|'\t'|'\n')+ {channel=99;} ;
  7. Using the parser
         CharStream in = new ANTLRFileStream("inputfile");
         SimpleParserLexer lexer = new SimpleParserLexer(in);
         CommonTokenStream tokens = new CommonTokenStream(lexer);
         SimpleParser p = new SimpleParser(tokens);
         p.program(); // invoke start rule
  8. Improved grammar warnings
     • They happen less often ;)
     • Internationalized (templates again!)
     • Give the (smallest) sample input sequence
     • Better recursion warnings
  9. Recursion Warnings
         a : a A
           | B
           ;
     t.g:2:5: Alternative 1 discovers infinite left-recursion to a from a
     t.g:2:5: Alternative 1: after matching input such as B decision cannot predict what comes next due to recursion overflow to c from b
         // with -Im 0 (secret internal parameter)
         a : b | B ;
         b : c ;
         c : B b ;
  10. Nondeterminisms
      • t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
      • As a result, alternative(s) 2 were disabled for that input
      • t.g:2:5: The following alternatives are unreachable: 2
          a : (A B|A B) C ;
          a : (A+ B|A+ B) C ;
      t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
  11. Runtime Objects of Interest
      • The lexer passes all tokens to the parser, but the parser listens to only a single "channel"; channel 99, for example, where I place WS tokens, is ignored
      • Tokens have start/stop indexes into a single text input buffer
      • Token is an abstract class
      • TokenSource: anything answering nextToken()
      • TokenStream: stream pulling from a TokenSource; LT(i), …
      • CharStream: source of characters for a lexer; LT(i), …
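The channel idea on this slide can be sketched in plain Java. This is an illustration only, not the ANTLR runtime: `ChannelToken`, `ChannelFilter`, and the use of channel 0 for the parser are assumed names and values; only channel 99 for whitespace comes from the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical token: just text plus the channel the lexer assigned. */
final class ChannelToken {
    final String text;
    final int channel;

    ChannelToken(String text, int channel) {
        this.text = text;
        this.channel = channel;
    }
}

/** Keeps every token from the lexer but exposes only one channel. */
final class ChannelFilter {
    static final int PARSER_CHANNEL = 0;  // assumed number for the parser's channel
    static final int HIDDEN_CHANNEL = 99; // channel the slides use for WS

    /** Returns the text of each token on the given channel, in input order. */
    static List<String> visible(List<ChannelToken> all, int channel) {
        List<String> out = new ArrayList<>();
        for (ChannelToken t : all) {
            if (t.channel == channel) {
                out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<ChannelToken> all = Arrays.asList(
            new ChannelToken("int", PARSER_CHANNEL),
            new ChannelToken(" ", HIDDEN_CHANNEL),
            new ChannelToken("i", PARSER_CHANNEL),
            new ChannelToken(";", PARSER_CHANNEL));
        System.out.println(visible(all, PARSER_CHANNEL)); // prints [int, i, ;]
    }
}
```

Because the whitespace tokens are filtered rather than discarded, they stay in the buffer and remain available to anything that asks for the hidden channel.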
  12. Error Recovery
      • ANTLR v3 does what Josef Grosch does in Cocktail
      • Does single-token insertion or deletion if necessary to keep going
      • Computes context-sensitive FOLLOW to do insert/delete
        • The proper context is passed to each rule invocation
        • Knows precisely what can follow this reference to r rather than what could follow any reference to r (per Wirth circa 1970)
  13. Example Error Recovery
      Note: I put in two errors each so you'll see it continues properly
      One token insertion:
          int i = 0;
          method foo( {
            int j = i;
            i = 4
          }
      [program, method]: line 2:12 mismatched token: [@14,23:23='{',<14>,2:12]; expecting type ')'
      [program, method, statement]: line 5:0 mismatched token: [@31,46:46='}',<15>,5:0]; expecting type ';'
      One token deletion:
          int i = 0;
          method foo() ) {
            int j = i;
            i = = 4;
          }
      [program, method]: line 2:13 mismatched token: [@15,24:24=')',<13>,2:13]; expecting type '{'
      [program, method, statement, expr]: line 4:6 mismatched token: [@32,47:47='=',<6>,4:6]; expecting set null
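The insert-or-delete decision behind these two examples can be sketched as follows. This is a simplified illustration, not ANTLR's recovery code: token types are plain strings, `SingleTokenRecovery` and `decide` are hypothetical names, and the FOLLOW sets are written by hand rather than computed from a grammar.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SingleTokenRecovery {
    enum Action { MATCH, DELETE_ONE, INSERT_ONE, GIVE_UP }

    /**
     * Decides what to do at position i when the parser expects `expected`.
     * `follow` is the set of tokens that may come right after `expected`
     * in this particular rule invocation context.
     */
    static Action decide(List<String> input, int i, String expected, Set<String> follow) {
        String la1 = i < input.size() ? input.get(i) : "<EOF>";
        String la2 = i + 1 < input.size() ? input.get(i + 1) : "<EOF>";
        if (la1.equals(expected)) {
            return Action.MATCH;
        }
        if (la2.equals(expected)) {
            return Action.DELETE_ONE; // la1 is spurious: skip it, then match
        }
        if (follow.contains(la1)) {
            return Action.INSERT_ONE; // expected is missing: pretend it was there
        }
        return Action.GIVE_UP;
    }

    public static void main(String[] args) {
        // "method foo( {": ')' is missing and '{' may follow ')'.
        Set<String> afterParen = new HashSet<>(Arrays.asList("{"));
        System.out.println(decide(Arrays.asList("(", "{", "int"), 1, ")", afterParen)); // prints INSERT_ONE

        // "method foo() ) {": the second ')' is spurious and '{' comes right after it.
        Set<String> afterBrace = new HashSet<>(Arrays.asList("int", "ID"));
        System.out.println(decide(Arrays.asList("(", ")", ")", "{"), 2, "{", afterBrace)); // prints DELETE_ONE
    }
}
```

The narrower the FOLLOW set passed in, the less likely the sketch is to "insert" a token in a place where the current lookahead could not really continue the parse, which is the point of computing FOLLOW per invocation context.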
  14. Attributes
      • New label syntax and multiple return values
      • Unified token, rule, parameter, return value, and tree reference syntax in actions
      • Dynamically scoped attributes!
          a[String s] returns [float y]
              : id=ID f=field (ids+=ID)+
                {$s, $y, $id, $id.text, $f.z; $ids.size();}
              ;
          field returns [int x, int z]
              : …
              ;
  15. Label properties
      • Token label reference properties
        • text, type, line, pos, channel, index, tree
      • Rule label reference properties
        • start, stop: indices of token boundaries
        • tree
        • text: text matched for the whole rule
  16. Rule Scope Attributes
      • A rule may define a scope of attributes visible to any invoked rule; operates like a stacked global variable
      • Avoids having to pass a value down
          method
          scope { String name; }
              : "method" ID '(' ')' {$name=$ID.text;} body
              ;
          body: '{' stat* '}' ;
          …
          atom
          init {… $method.name …}
              : ID | INT
              ;
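One way to picture a "stacked global variable" is a stack of scope frames that a rule pushes on entry and pops on exit. The Java below is a hand-written sketch under that assumption; it is not what ANTLR generates, and `MethodScopeDemo`, `MethodScope`, and `methodStack` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class MethodScopeDemo {
    /** One frame per active invocation of the method rule. */
    static final class MethodScope {
        String name;
    }

    /** The "stacked global": the innermost active method scope is on top. */
    static final Deque<MethodScope> methodStack = new ArrayDeque<>();

    /** Stand-in for rule method: push a scope, set name, invoke body, pop. */
    static String method(String id) {
        methodStack.push(new MethodScope());
        try {
            methodStack.peek().name = id; // plays the role of {$name=$ID.text;}
            return body();
        } finally {
            methodStack.pop();            // the scope disappears when the rule exits
        }
    }

    /** Stand-in for rule body: it neither receives nor forwards the name. */
    static String body() {
        return atom();
    }

    /** Stand-in for rule atom: reads the name without a parameter. */
    static String atom() {
        return "atom inside " + methodStack.peek().name;
    }

    public static void main(String[] args) {
        System.out.println(method("foo")); // prints atom inside foo
    }
}
```

Note that `body` has no parameter for the name, which is the "avoids having to pass a value down" point from the slide.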
  17. Global Scope Attributes
      • Named scopes; rules must explicitly request access
          scope Symbols {
            List names;
          }
          {int level=0;}
          globals
          scope Symbols;
          init {
            level++;
            $Symbols.names = new ArrayList();
          }
              : decl* {level--;}
              ;
          block
          scope Symbols;
          init {
            level++;
            $Symbols.names = new ArrayList();
          }
              : '{' decl* stat* '}' {level--;}
              ;
          decl : "int" ID ';' {$Symbols.names.add($ID);}
               ;
      *What if we want to keep the symbol tables around after parsing?
  18. Tree Support
      • TreeAdaptor: how to create and navigate trees (like ASTFactory from v2); ANTLR assumes tree nodes are of type Object
      • Tree: used by support code
      • BaseTree: list of children, w/o payload (no more child-sibling trees)
      • CommonTree: node wrapping a Token as payload
      • ParseTree: used by the interpreter to build trees
  19. Tree Construction
      • Automatic mechanism is the same as v2 except ^ is now ^^:
            expr : atom ( '+'^^ atom )* ;
      • ^ implies root of tree for the enclosing subrule:
            a : ( ID^ INT )* ;
        builds (a 1) (b 2) …
      • Token labels are $label, not #label, and rule invocation tree results are $ruleLabel.tree
      • Turn on with options {output=AST;} (one can imagine output=text for templates)
      • Option: ASTLabelType=CommonTree;
  20. Tree Rewrite Rules
      • Maps an input grammar fragment to an output tree grammar fragment
          variable
              : type declarator ';' -> ^(VAR_DEF type declarator)
              ;
          functionHeader
              : type ID '(' ( formalParameter ( ',' formalParameter )* )? ')'
                -> ^(FUNC_HDR type ID formalParameter+)
              ;
          atom
              : …
              | '(' expr ')' -> expr
              ;
  21. Mixed Rewrite/Auto Trees
      • Alternatives w/o a -> rewrite use the automatic mechanism
          b : ID INT -> INT ID
            | INT // implies -> INT
            ;
  22. Rewrites and labels
      • Labels disambiguate element references or are used to construct imaginary nodes
      • Concatenation += labels are useful too:
          forStat
              : "for" '(' start=assignStat ';' expr ';' next=assignStat ')' block
                -> ^("for" $start expr $next block)
              ;
          block
              : lc='{' variable* stat* '}' -> ^(BLOCK[$lc] variable* stat*)
              ;
          /** match string representation of tree and build tree in memory */
          tree : '^' '(' root=atom (children+=tree)+ ')' -> ^($root $children)
               | atom
               ;
  23. Loops in Rewrites
      • Repeated element: ID ID -> ^(VARS ID+) yields ^(VARS a b)
      • Repeated tree: ID ID -> ^(VARS ID)+ yields ^(VARS a) ^(VARS b)
      • Multiple elements in a loop need the same size: ID INT ID INT -> ^( R ID ^( S INT) )+ yields (R a (S 1)) (R b (S 2))
      • Checks cardinality of + and * loops
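The three loop forms can be mimicked in Java with trees written as strings. This is only a sketch of the behaviour listed on the slide: `RewriteLoops` and its methods are hypothetical names, and the exceptions are an assumed way of reporting a failed cardinality check, not ANTLR's.

```java
import java.util.ArrayList;
import java.util.List;

final class RewriteLoops {
    /** ID ID -> ^(VARS ID+): a single tree holding every matched ID. */
    static String repeatedElement(List<String> ids) {
        requireNonEmpty(ids);
        return "(VARS " + String.join(" ", ids) + ")";
    }

    /** ID ID -> ^(VARS ID)+: one tree per matched ID. */
    static List<String> repeatedTree(List<String> ids) {
        requireNonEmpty(ids);
        List<String> trees = new ArrayList<>();
        for (String id : ids) {
            trees.add("(VARS " + id + ")");
        }
        return trees;
    }

    /** ID INT ID INT -> ^(R ID ^(S INT))+: both lists must be the same size. */
    static List<String> paired(List<String> ids, List<String> ints) {
        requireNonEmpty(ids);
        if (ids.size() != ints.size()) {
            throw new IllegalStateException("rewrite elements differ in size");
        }
        List<String> trees = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            trees.add("(R " + ids.get(i) + " (S " + ints.get(i) + "))");
        }
        return trees;
    }

    /** A + loop needs at least one element to iterate over. */
    private static void requireNonEmpty(List<String> elements) {
        if (elements.isEmpty()) {
            throw new IllegalStateException("+ loop over an empty element list");
        }
    }
}
```

For the inputs `a b` and `1 2`, `paired` returns the two trees `(R a (S 1))` and `(R b (S 2))`, matching the slide's example.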
  24. Preventing cyclic structures
      • Repeated elements get duplicated
            a : INT -> INT INT ; // dups INT!
            a : INT INT -> INT+ INT+ ; // 4 INTs!
      • Repeated rule references get duplicated
            a : atom -> ^(atom atom) ; // no cycle!
      • Duplicates the whole tree for all but the first ref to an element; here the 2nd ref to atom results in a duplicated atom tree
      • *Useful example: "int x,y" -> "^(int x) ^(int y)"
            decl : type ID (',' ID)* -> ^(type ID)+ ;
      *Just noticed a bug in this one ;)
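Why duplication prevents a cycle in `a : atom -> ^(atom atom)` can be shown with a tiny tree class. This is a sketch under assumed names (`DupNode`, `CycleFreeRewrite`); it imitates the rule of thumb on the slide rather than the ANTLR runtime's tree adaptor.

```java
import java.util.ArrayList;
import java.util.List;

final class DupNode {
    final String text;
    final List<DupNode> children = new ArrayList<>();

    DupNode(String text) {
        this.text = text;
    }

    /** Deep copy: a fresh node for this node and for every descendant. */
    DupNode dupTree() {
        DupNode copy = new DupNode(text);
        for (DupNode c : children) {
            copy.children.add(c.dupTree());
        }
        return copy;
    }
}

final class CycleFreeRewrite {
    /** a : atom -> ^(atom atom): the first ref is used as-is, the second is a copy. */
    static DupNode rootWithSelfAsChild(DupNode atom) {
        DupNode root = atom;            // first reference: the original node
        DupNode child = atom.dupTree(); // second reference: a duplicate
        root.children.add(child);       // adding `atom` itself here would create a cycle
        return root;
    }
}
```

Without the copy, the node would be its own child and any recursive walk over the tree would never terminate.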
  25. Predicated rewrites
      • Use a semantic predicate to indicate which rewrite to choose
          a : ID INT -> {p1}? ID
                     -> {p2}? INT
                     ->
            ;
  26. Misc Rewrite Elements
      • Arbitrary actions
            a : atom -> ^({adaptor.createToken(INT,"9")} atom) ;
      • A rewrite always sets the rule's AST, not the subrule's
      • Reference to previous value (useful?)
          b : "int"
              ( ID -> ^(TYPE "int" ID)
              | ID '=' INT -> ^(TYPE "int" ID INT)
              )
            ;
          a : (atom -> atom) (op='+' r=atom -> ^($op $a $r) )* ;
  27. Tree Grammars
      • Syntax is the same as parser grammars; add the ^(root children…) tree element
      • Uses LL(*) also; even derives from the same superclass! The tree is serialized to include DOWN, UP imaginary tokens to encode the 2D structure for a serial parser
          variable
              : ^(VAR_DEF type ID)
              | ^(VAR_DEF type ID ^(INIT expr))
              ;
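The DOWN/UP serialization can be sketched as a preorder walk that brackets each child list. The classes below (`SerNode`, `TreeSerializer`) are hypothetical and use plain strings for tokens; the sketch assumes leaves emit no DOWN/UP pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SerNode {
    final String text;
    final List<SerNode> children;

    SerNode(String text, SerNode... children) {
        this.text = text;
        this.children = new ArrayList<>(Arrays.asList(children));
    }
}

final class TreeSerializer {
    /** Flattens a tree into a one-dimensional stream a serial parser can read. */
    static List<String> serialize(SerNode root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    private static void walk(SerNode t, List<String> out) {
        out.add(t.text);
        if (t.children.isEmpty()) {
            return; // a leaf needs no DOWN/UP bracket
        }
        out.add("DOWN");
        for (SerNode c : t.children) {
            walk(c, out);
        }
        out.add("UP");
    }

    public static void main(String[] args) {
        // ^(VAR_DEF int x ^(INIT 3))
        SerNode tree = new SerNode("VAR_DEF",
            new SerNode("int"), new SerNode("x"),
            new SerNode("INIT", new SerNode("3")));
        System.out.println(serialize(tree));
        // prints [VAR_DEF, DOWN, int, x, INIT, DOWN, 3, UP, UP]
    }
}
```

With the stream in this form, the two alternatives of `variable` differ only in whether an INIT subtree appears before the closing UP, which is something an ordinary lookahead parser can decide.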
  28. Code Generation
      • Uses StringTemplate to specify how each abstract ANTLR concept maps to code; wildly successful!
      • Separates code gen logic from output; not a single character of output in the Java code
      • Java.stg: 140 templates, 1300 lines
  29. Sample code gen templates
          /** Dump the elements one per line and stick in debugging
           *  location() trigger in front.
           */
          element() ::= <<
          <if(debug)>
          dbg.location(<it.line>,<it.pos>);<\n>
          <endif>
          <it.el><\n>
          >>

          /** match a token optionally with a label in front */
          tokenRef(token,label,elementIndex) ::= <<
          <if(label)>
          <label>=input.LT(1);<\n>
          <endif>
          match(input,<token>,FOLLOW_<token>_in_<ruleName><elementIndex>);
          >>
  30. Internationalization
      • ANTLR v3 uses StringTemplate to display all errors
      • Senses locale to load messages; en.stg: 76 templates
      • ErrorManager error number constants map to a template name; e.g.,
          RULE_REDEFINITION(file,line,col,arg) ::=
            "<loc()>rule <arg> redefinition"
          /* This factors out file location formatting; file,line,col inherited from
           * enclosing template; don't manually pass stuff in.
           */
          loc() ::= "<file>:<line>:<col>: "
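The lookup chain (error number, then template name, then locale-specific message text) can be sketched without StringTemplate. The Java below swaps in `String.format` patterns for templates, so it is an analogy rather than how ANTLR does it; `ErrorMessages`, the error number 1, and the fall-back to English for unknown locales are all assumptions.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ErrorMessages {
    static final int MSG_RULE_REDEFINITION = 1; // hypothetical error number

    /** Error number -> template name. */
    static final Map<Integer, String> TEMPLATE_NAMES = new HashMap<>();
    /** Language code -> (template name -> message pattern). */
    static final Map<String, Map<String, String>> GROUPS = new HashMap<>();

    static {
        TEMPLATE_NAMES.put(MSG_RULE_REDEFINITION, "RULE_REDEFINITION");
        Map<String, String> en = new HashMap<>();
        en.put("RULE_REDEFINITION", "rule %s redefinition");
        GROUPS.put("en", en);
    }

    /** Shared location prefix, playing the role of the loc() template. */
    static String loc(String file, int line, int col) {
        return file + ":" + line + ":" + col + ": ";
    }

    /** Renders a message for the locale, falling back to English (an assumption). */
    static String render(Locale locale, int errorNumber,
                         String file, int line, int col, String arg) {
        Map<String, String> group =
            GROUPS.getOrDefault(locale.getLanguage(), GROUPS.get("en"));
        String pattern = group.get(TEMPLATE_NAMES.get(errorNumber));
        return loc(file, line, col) + String.format(pattern, arg);
    }

    public static void main(String[] args) {
        System.out.println(render(Locale.ENGLISH, MSG_RULE_REDEFINITION, "t.g", 2, 5, "a"));
        // prints t.g:2:5: rule a redefinition
    }
}
```

Adding a language in this sketch means adding one more map to `GROUPS`; the code that raises the error only ever mentions the error number.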
  31. Runtime Support
      • Better organized, separated:
          org.antlr.runtime
          org.antlr.runtime.tree
          org.antlr.runtime.debug
      • Clean; Parser has input ptr only (except error recovery FOLLOW stack); Lexer also only has input ptr
      • 4500 lines of Java code minus BSD header
  32. Summary
      • v3 kicks ass
      • it sort of works!
      • http://www.antlr.org/download/…
      • ANTLRWorks progressing in parallel