Antlr V3

  • 1. ANTLR v3 Overview (for ANTLR v2 users). Terence Parr, University of San Francisco
  • 2. Topics
      ◦ Information flow
      ◦ v3 grammars
      ◦ Error recovery
      ◦ Attributes
      ◦ Tree construction
      ◦ Tree grammars
      ◦ Code generation
      ◦ Internationalization
      ◦ Runtime support
  • 3. Block Info Flow Diagram
  • 4. Grammar Syntax
      Grammar level:
        header {…}
        /** doc comment */
        kind grammar name;
        options {…}
        tokens {…}
        scopes…
        action
        rules…
      Rule level:
        /** doc comment */
        rule[String s, int z] returns [int x, int y] throws E
        options {…}
        scopes
        init {…}
          : … | … ;
        exceptions
      Trees: ^(root child1 … childN)
      Note: no grammar inheritance.
  • 5. Grammar improvements
      ◦ Single-element EBNF, such as ID*
      ◦ Combined parser/lexer grammars
      ◦ Allows 'c' and "literal" literals
      ◦ Multiple parameters and return values
      ◦ Labels do not have to be unique: (x=ID|x=INT) {…$x…}
      ◦ For combined grammars, warns when tokens are not defined
  • 6. Example Grammar
      grammar SimpleParser;
      program : variable* method+ ;
      variable : "int" ID ('=' expr)? ';' ;
      method : "method" ID '(' ')' '{' variable* statement+ '}' ;
      statement : ID '=' expr ';'
                | "return" expr ';'
                ;
      expr : ID | INT ;
      ID : ('a'..'z'|'A'..'Z')+ ;
      INT : '0'..'9'+ ;
      WS : (' '|'\t'|'\n')+ {channel=99;} ;
  • 7. Using the parser
      import org.antlr.runtime.*;  // CharStream, ANTLRFileStream, CommonTokenStream
      CharStream in = new ANTLRFileStream("inputfile");
      SimpleParserLexer lexer = new SimpleParserLexer(in);
      CommonTokenStream tokens = new CommonTokenStream(lexer);
      SimpleParser p = new SimpleParser(tokens);
      p.program(); // invoke start rule
  • 8. Improved grammar warnings
      ◦ They happen less often ;)
      ◦ Internationalized (templates again!)
      ◦ Give a (smallest) sample input sequence
      ◦ Better recursion warnings
  • 9. Recursion Warnings
      a : a A | B ;
        t.g:2:5: Alternative 1 discovers infinite left-recursion to a from a
      a : b | B ; b : c ; c : B b ;
        t.g:2:5: Alternative 1: after matching input such as B decision cannot predict what comes next due to recursion overflow to c from b
        (with -Im 0, a secret internal parameter)
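
The first warning concerns left recursion, which a top-down parser cannot handle. A minimal way to detect it is a depth-first search over the first symbol of each alternative. This is an illustration, not ANTLR's analysis: the class `LeftRecursion` and the map-based grammar encoding are hypothetical, and the sketch assumes no rule can match the empty string (a nullable leading rule would hide recursion behind it). It does not model the second warning, which comes from a limit inside ANTLR's own analysis.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical left-recursion check (assumes no rule can match the empty string).
// Grammar encoding: rule name -> alternatives, each alternative a list of symbols.
// Symbols that are keys of the map are rules; anything else is a token.
class LeftRecursion {
    static boolean isLeftRecursive(Map<String, List<List<String>>> g, String rule) {
        return reaches(g, rule, rule, new HashSet<>());
    }

    /** Can `from` derive a sentential form that starts with `target`? */
    private static boolean reaches(Map<String, List<List<String>>> g, String from,
                                   String target, Set<String> seen) {
        if (!seen.add(from)) return false;           // already explored this rule
        for (List<String> alt : g.get(from)) {
            if (alt.isEmpty()) continue;
            String first = alt.get(0);               // only the leftmost symbol matters
            if (first.equals(target)) return true;
            if (g.containsKey(first) && reaches(g, first, target, seen)) return true;
        }
        return false;
    }
}
```

For `a : a A | B ;` the search finds `a` as the first symbol of alternative 1. For the `a`/`b`/`c` grammar it reports no left recursion, because `c` begins with the token `B`.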
  • 10. Nondeterminisms
      a : (A B|A B) C ;
        t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
        As a result, alternative(s) 2 were disabled for that input
        t.g:2:5: The following alternatives are unreachable: 2
      a : (A+ B|A+ B) C ;
        t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
  • 11. Runtime Objects of Interest
      ◦ The lexer passes all tokens to the parser, but the parser listens to only a single "channel"; channel 99, for example, where I place WS tokens, is ignored
      ◦ Tokens have start/stop indices into a single text input buffer
      ◦ Token is an abstract class
      ◦ TokenSource: anything answering nextToken()
      ◦ TokenStream: a stream pulling from a TokenSource; LT(i), …
      ◦ CharStream: the source of characters for a lexer; LT(i), …
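
The channel idea can be sketched in a few lines: the stream keeps every token the lexer produced, but lookahead only sees tokens on the parser's channel. `Tok` and `ChannelFilteredStream` are hypothetical stand-ins, not ANTLR runtime classes; the real token stream differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified token: type, text and the channel it was emitted on.
class Tok {
    final int type;
    final String text;
    final int channel;
    Tok(int type, String text, int channel) { this.type = type; this.text = text; this.channel = channel; }
}

// Hypothetical token stream that buffers all tokens but exposes only one channel.
class ChannelFilteredStream {
    private final List<Tok> buffer;  // every token from the lexer, WS included
    private final int channel;       // the single channel the parser listens to
    private int p = 0;               // index of the current on-channel token

    ChannelFilteredStream(List<Tok> all, int channel) {
        this.buffer = new ArrayList<>(all);
        this.channel = channel;
        skipOffChannel();
    }

    private void skipOffChannel() {
        while (p < buffer.size() && buffer.get(p).channel != channel) p++;
    }

    /** The i-th on-channel token ahead (i >= 1), or null past the end. */
    Tok LT(int i) {
        int n = 0;
        for (int j = p; j < buffer.size(); j++) {
            if (buffer.get(j).channel == channel && ++n == i) return buffer.get(j);
        }
        return null;
    }

    /** Advance past the current token and any off-channel tokens after it. */
    void consume() {
        if (p < buffer.size()) { p++; skipOffChannel(); }
    }
}
```

The WS token on channel 99 stays in the buffer (so its text and position are still available) but never shows up in `LT(i)`.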
  • 12. Error Recovery
      ◦ ANTLR v3 does what Josef Grosch does in Cocktail
      ◦ Does single-token insertion or deletion if necessary to keep going
      ◦ Computes a context-sensitive FOLLOW set to do the insertion/deletion
          ▪ proper context is passed to each rule invocation
          ▪ knows precisely what can follow this particular reference to r, rather than what could follow any reference to r (per Wirth, circa 1970)
  • 13. Example Error Recovery
      Input with two missing tokens (one-token insertion):
        int i = 0; method foo( { int j = i; i = 4 }
        [program, method]: line 2:12 mismatched token: [@14,23:23='{',<14>,2:12]; expecting type ')'
        [program, method, statement]: line 5:0 mismatched token: [@31,46:46='}',<15>,5:0]; expecting type ';'
      Input with two extra tokens (one-token deletion):
        int i = 0; method foo() ) { int j = i; i = = 4; }
        [program, method]: line 2:13 mismatched token: [@15,24:24=')',<13>,2:13]; expecting type '{'
        [program, method, statement, expr]: line 4:6 mismatched token: [@32,47:47='=',<6>,4:6]; expecting set null
      Note: I put in two errors each so you'll see it continues properly.
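
The single-token repair can be sketched as a `match` routine that, on a mismatch, first tries deleting one token and then tries inserting one. This is a hypothetical mini-parser (`RecoveringMatcher`, its `header` rule and its messages are made up): tokens are plain strings and the FOLLOW sets are passed in by hand, whereas ANTLR computes them from the grammar and the rule invocation context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical mini-parser illustrating single-token deletion/insertion recovery.
// Not ANTLR's generated code: tokens are strings and FOLLOW sets are hand-supplied.
class RecoveringMatcher {
    private final List<String> input;
    private int p = 0;                          // index of the current token
    final List<String> errors = new ArrayList<>();

    RecoveringMatcher(List<String> input) { this.input = input; }

    /** Lookahead: the i-th token from the current position (i >= 1). */
    String LA(int i) {
        int j = p + i - 1;
        return j < input.size() ? input.get(j) : "<EOF>";
    }

    /**
     * Match `expected`. On a mismatch, try deletion first (the token after the bad
     * one is the expected one), then insertion (the bad token is in `follow`, the
     * set of tokens that may come right after `expected` at this point).
     */
    void match(String expected, Set<String> follow) {
        if (LA(1).equals(expected)) { p++; return; }
        if (LA(2).equals(expected)) {
            errors.add("extraneous '" + LA(1) + "', expecting '" + expected + "'");
            p += 2;                             // drop the junk token, consume the expected one
            return;
        }
        if (follow.contains(LA(1))) {
            errors.add("missing '" + expected + "' before '" + LA(1) + "'");
            return;                             // pretend it was there; consume nothing
        }
        throw new IllegalStateException("cannot recover at '" + LA(1) + "'");
    }

    /** Hypothetical rule: header : "method" ID '(' ')' '{' ; */
    void header() {
        match("method", Set.of());
        p++;                                    // ID: accept any name, for brevity
        match("(", Set.of(")"));
        match(")", Set.of("{"));
        match("{", Set.of("int", "}"));
    }
}
```

These mirror the two inputs above: `foo( {` is repaired by inserting `)`, and `foo() ) {` by deleting the extra `)`. In both cases parsing continues with the next token.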
  • 14. Attributes
      ◦ New label syntax and multiple return values
      ◦ Unified syntax in actions for token, rule, parameter, return value, and tree references
      ◦ Dynamically scoped attributes!
      a[String s] returns [float y]
          : id=ID f=field (ids+=ID)+ {$s, $y, $id, $id.text, $f.z; $ids.size();}
          ;
      field returns [int x, int z] : … ;
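
One way to picture parameters and multiple return values: each rule becomes a method, rule parameters become method parameters, and several return values travel back in a small holder object that the caller keeps under its label. The sketch below is a hand translation under that assumption; `AttributeSketch`, `FieldReturn`, `AReturn` and the bodies of both rules are hypothetical, and generated ANTLR code may look different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand translation of:
//   a[String s] returns [float y] : id=ID f=field (ids+=ID)+ {...} ;
//   field returns [int x, int z] : ... ;
// Names, the token model and field's body are made up for illustration.
class AttributeSketch {
    /** Holder carrying field's two return values ($f.x, $f.z). */
    static class FieldReturn {
        int x;
        int z;
    }

    /** Holder for a's single return value ($y). */
    static class AReturn {
        float y;
    }

    private final List<String> tokens;
    private int p = 0;

    AttributeSketch(List<String> tokens) { this.tokens = tokens; }

    FieldReturn field() {
        FieldReturn r = new FieldReturn();
        r.x = Integer.parseInt(tokens.get(p++)); // hypothetical: field is a single number
        r.z = r.x * 2;                            // hypothetical second result
        return r;
    }

    AReturn a(String s) {                         // rule parameter -> method parameter ($s)
        AReturn ret = new AReturn();
        String id = tokens.get(p++);              // id=ID ($id.text)
        FieldReturn f = field();                  // f=field
        List<String> ids = new ArrayList<>();     // ids+=ID collects every match
        do {
            ids.add(tokens.get(p++));
        } while (p < tokens.size());
        // The action can see s, id, f.z and ids.size() as ordinary Java values.
        ret.y = f.z + ids.size();
        return ret;
    }
}
```

With tokens `foo 3 b c`, `field` returns `x = 3, z = 6`, `ids` collects `b` and `c`, and the hypothetical action sets `y` to `8`.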
  • 15. Label properties
      ◦ Token label reference properties
          ▪ text, type, line, pos, channel, index, tree
      ◦ Rule label reference properties
          ▪ start, stop: indices of token boundaries
          ▪ tree
          ▪ text: text matched for the whole rule
  • 16. Rule Scope Attributes
      ◦ A rule may define a scope of attributes visible to any rule it invokes; it operates like a stacked global variable
      ◦ Avoids having to pass a value down
      method
      scope { String name; }
          : "method" ID '(' ')' {$name=$ID.text;} body
          ;
      body : '{' stat* '}' ;
      …
      atom
      init {… $method.name …}
          : ID | INT
          ;
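
A rule scope behaves like a stack of frames: each invocation of `method` pushes a frame on entry and pops it on exit, and any rule called in between reads the top frame. Below is a minimal sketch of that idea for a hand-written parser; `RuleScopeSketch`, `MethodScope` and the `Runnable` standing in for `body` are hypothetical. The nested call in the test exists only to show the stacking (SimpleParser itself does not nest methods).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a rule scope: `method` pushes a frame on entry and pops it on
// exit; any rule running in between reads the top frame as $method.name.
class RuleScopeSketch {
    static class MethodScope {
        String name;
    }

    private final Deque<MethodScope> methodStack = new ArrayDeque<>();
    final StringBuilder log = new StringBuilder();

    /** method scope { String name; } : "method" ID '(' ')' {$name=$ID.text;} body ; */
    void method(String id, Runnable body) {
        methodStack.push(new MethodScope());
        try {
            methodStack.peek().name = id;  // {$name=$ID.text;}
            body.run();                    // body, and everything it calls
        } finally {
            methodStack.pop();             // the frame dies when the rule returns
        }
    }

    /** atom : ID | INT ; reading $method.name from the innermost active frame */
    void atom(String text) {
        log.append(methodStack.peek().name).append(':').append(text).append(' ');
    }
}
```

Inside a nested call `$method.name` resolves to the inner name; once that call returns, the outer frame is visible again, and no parameter had to be threaded through `body`.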
  • 17. Global Scope Attributes
      ◦ Named scopes; rules must explicitly request access
      scope Symbols { List names; }
      {int level=0;}
      globals
      scope Symbols;
      init { level++; $Symbols.names = new ArrayList(); }
          : decl* {level--;}
          ;
      block
      scope Symbols;
      init { level++; $Symbols.names = new ArrayList(); }
          : '{' decl* stat* '}' {level--;}
          ;
      decl : "int" ID ';' {$Symbols.names.add($ID);} ;
      *What if we want to keep the symbol tables around after parsing?
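
A named scope uses the same stack idea, except the stack belongs to the scope `Symbols` rather than to one rule, so `globals`, `block` and `decl` all reach the same frames. A minimal sketch with hypothetical names (`GlobalScopeSketch`, `withSymbols`); it also hands back each frame's name list as the frame is popped, so the names outlive the scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a named (global) scope `Symbols` shared by every rule that requests it.
class GlobalScopeSketch {
    static class Symbols {
        final List<String> names = new ArrayList<>();
    }

    private final Deque<Symbols> symbolsStack = new ArrayDeque<>(); // one stack per named scope
    int level = 0;                                                   // the grammar's {int level=0;}

    /** What `globals` and `block` do: push a fresh Symbols frame, run the body, pop it. */
    List<String> withSymbols(Runnable body) {
        symbolsStack.push(new Symbols());
        level++;
        try {
            body.run();
            return symbolsStack.peek().names; // keep a handle so the names outlive the frame
        } finally {
            level--;
            symbolsStack.pop();
        }
    }

    /** decl : "int" ID ';' {$Symbols.names.add($ID);} adds to the innermost frame. */
    void decl(String id) {
        symbolsStack.peek().names.add(id);
    }

    boolean isEmpty() { return symbolsStack.isEmpty(); }
}
```

A `decl` inside a nested `block` lands in the block's frame; declarations before and after the block land in the enclosing frame.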
  • 18. Tree Support
      ◦ TreeAdaptor: how to create and navigate trees (like ASTFactory from v2); ANTLR assumes tree nodes are of type Object
      ◦ Tree: used by support code
      ◦ BaseTree: list of children, without payload (no more child-sibling trees)
      ◦ CommonTree: node wrapping a Token as payload
      ◦ ParseTree: used by the interpreter to build trees
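
The move from child-sibling trees to an explicit list of children can be sketched as a node that holds a payload and a `List` of children, which gives direct indexed access to any child. `ListTree` is a hypothetical simplification; the real `BaseTree`, `CommonTree` and `TreeAdaptor` are richer. Its `toStringTree` prints the LISP-like form, such as `(a 1)`, used on these slides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified tree node: a payload plus an explicit list of children,
// instead of v2's first-child/next-sibling links.
class ListTree {
    final String payload;                       // CommonTree wraps a Token; a String stands in here
    final List<ListTree> children = new ArrayList<>();

    ListTree(String payload) { this.payload = payload; }

    ListTree addChild(ListTree child) {
        children.add(child);
        return this;                            // allow chaining
    }

    ListTree getChild(int i) { return children.get(i); } // direct indexed access

    /** LISP-like text: a leaf prints its payload, a subtree prints "(root c1 c2 ...)". */
    String toStringTree() {
        if (children.isEmpty()) return payload;
        StringBuilder sb = new StringBuilder("(").append(payload);
        for (ListTree c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}
```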
  • 19. Tree Construction
      ◦ The automatic mechanism is the same as in v2, except ^ is now ^^: expr : atom ( '+'^^ atom )* ;
      ◦ ^ makes a node the root of the tree for the enclosing subrule: a : ( ID^ INT )* ; builds (a 1) (b 2) …
      ◦ Token labels are $label, not #label, and rule invocation tree results are $ruleLabel.tree
      ◦ Turn it on with options {output=AST;} (one can imagine output=text for templates)
      ◦ Option: ASTLabelType=CommonTree;
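
What `'+'^^` does in `expr` can be sketched as a loop: each `+` becomes a new root and adopts the tree built so far as its first child, which yields left-associative trees. `AutoTreeSketch` is a hypothetical hand-written parser over whitespace-separated tokens, not ANTLR-generated code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of automatic construction for  expr : atom ( '+'^^ atom )* ;
// Each '+' marked as root becomes the new root, adopting the tree built so far.
class AutoTreeSketch {
    static class T {
        final String text;
        final List<T> kids = new ArrayList<>();
        T(String text) { this.text = text; }

        @Override
        public String toString() {
            if (kids.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("(").append(text);
            for (T k : kids) sb.append(' ').append(k);
            return sb.append(')').toString();
        }
    }

    private final String[] toks;
    private int p = 0;

    AutoTreeSketch(String input) { this.toks = input.trim().split("\\s+"); }

    T atom() { return new T(toks[p++]); }

    T expr() {
        T root = atom();
        while (p < toks.length && toks[p].equals("+")) {
            T plus = new T(toks[p++]);  // '+'^^ : the token becomes the root...
            plus.kids.add(root);        // ...and the tree so far becomes its first child
            plus.kids.add(atom());
            root = plus;
        }
        return root;
    }
}
```

For `1 + 2 + 3` this builds `(+ (+ 1 2) 3)`.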
  • 20. Tree Rewrite Rules
      ◦ Maps an input grammar fragment to an output tree grammar fragment
      variable : type declarator ';' -> ^(VAR_DEF type declarator) ;
      functionHeader
          : type ID '(' ( formalParameter ( ',' formalParameter )* )? ')'
            -> ^(FUNC_HDR type ID formalParameter+)
          ;
      atom : … | '(' expr ')' -> expr ;
  • 21. Mixed Rewrite/Auto Trees
      ◦ Alternatives without a -> rewrite use the automatic mechanism
      b : ID INT -> INT ID
        | INT            // implies -> INT
        ;
  • 22. Rewrites and labels
      ◦ Labels disambiguate element references or are used to construct imaginary nodes
      ◦ Concatenation (+=) labels are useful too:
      forStat
          : "for" '(' start=assignStat ';' expr ';' next=assignStat ')' block
            -> ^("for" $start expr $next block)
          ;
      block : lc='{' variable* stat* '}' -> ^(BLOCK[$lc] variable* stat*) ;
      /** match string representation of tree and build tree in memory */
      tree : '^' '(' root=atom (children+=tree)+ ')' -> ^($root $children)
           | atom
           ;
  • 23. Loops in Rewrites
      ◦ Repeated element: ID ID -> ^(VARS ID+) yields ^(VARS a b)
      ◦ Repeated tree: ID ID -> ^(VARS ID)+ yields ^(VARS a) ^(VARS b)
      ◦ Multiple elements in a loop need the same size: ID INT ID INT -> ^( R ID ^( S INT) )+ yields (R a (S 1)) (R b (S 2))
      ◦ Checks cardinality for + and * loops
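
The loop semantics above can be sketched with plain lists standing in for the element streams collected while matching. To keep it short, trees are rendered directly as strings; `RewriteLoopSketch` and its methods are hypothetical, and the exception type and messages are made up. The size check is one way to enforce the "same size" rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of rewrite loops over element lists (not ANTLR's generated code).
class RewriteLoopSketch {
    /** ID ID -> ^(VARS ID+) : one tree; the loop runs inside it. */
    static String varsOnce(List<String> ids) {
        return "(VARS " + String.join(" ", ids) + ")";
    }

    /** ID ID -> ^(VARS ID)+ : the loop repeats the whole tree. */
    static List<String> varsEach(List<String> ids) {
        List<String> out = new ArrayList<>();
        for (String id : ids) out.add("(VARS " + id + ")");
        return out;
    }

    /** ID INT ID INT -> ^(R ID ^(S INT))+ : every element in the loop needs the same count. */
    static List<String> pairs(List<String> ids, List<String> ints) {
        if (ids.size() != ints.size()) {
            throw new IllegalArgumentException(
                "rewrite loop cardinality mismatch: " + ids.size() + " ID vs " + ints.size() + " INT");
        }
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("+ loop needs at least one element");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            out.add("(R " + ids.get(i) + " (S " + ints.get(i) + "))");
        }
        return out;
    }
}
```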
  • 24. Preventing cyclic structures
      ◦ Repeated elements get duplicated:
          a : INT -> INT INT ;        // dups INT!
          a : INT INT -> INT+ INT+ ;  // 4 INTs!
      ◦ Repeated rule references get duplicated:
          a : atom -> ^(atom atom) ;  // no cycle!
      ◦ Duplicates the whole tree for all but the first reference to an element; here the 2nd reference to atom results in a duplicated atom tree
      ◦ *Useful example: "int x,y" -> "^(int x) ^(int y)"
          decl : type ID (',' ID)* -> ^(type ID)+ ;
      *Just noticed a bug in this one ;)
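
Why duplication matters: if `^(atom atom)` reused the same node object for both references, that node would become its own child, which is a cycle. A minimal sketch of the "original first, copies afterwards" rule, with hypothetical names (`DupSketch`, `dupTree`, `hasCycle`):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: the first reference to an element uses the original tree,
// later references get a deep copy, so no node can end up as its own descendant.
class DupSketch {
    static class T {
        final String text;
        final List<T> kids = new ArrayList<>();
        T(String text) { this.text = text; }

        /** Deep copy of this node and its whole subtree. */
        T dupTree() {
            T copy = new T(text);
            for (T k : kids) copy.kids.add(k.dupTree());
            return copy;
        }
    }

    /** a : atom -> ^(atom atom) ; */
    static T rootedOnItself(T atom) {
        T second = atom.dupTree();  // 2nd reference: duplicated tree (copied before linking)
        atom.kids.add(second);      // 1st reference: the original becomes the root
        return atom;
    }

    /** True if some node can reach itself through child links (identity-based). */
    static boolean hasCycle(T root) {
        return hasCycle(root, new HashSet<>());
    }

    private static boolean hasCycle(T node, Set<T> onPath) {
        if (!onPath.add(node)) return true;
        for (T k : node.kids) if (hasCycle(k, onPath)) return true;
        onPath.remove(node);
        return false;
    }
}
```

After the call, the original `atom` keeps its own children and gains a fresh copy of its whole subtree as the last child; linking the same object instead (`x.kids.add(x)`) is exactly the cycle the rule prevents.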
  • 25. Predicated rewrites
      ◦ Use a semantic predicate to indicate which rewrite to choose
      a : ID INT -> {p1}? ID
                 -> {p2}? INT
                 ->
        ;
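
Choosing among predicated rewrites amounts to an ordered if/else chain, with the final, unpredicated `->` as the fallback. In this sketch the trees are reduced to token text, the predicates are `BooleanSupplier`s, and the empty string stands for the empty rewrite; `PredicatedRewriteSketch` and `rewrite` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of  a : ID INT -> {p1}? ID -> {p2}? INT -> ;
// Predicates are tried in order; the unpredicated last rewrite (empty) is the fallback.
class PredicatedRewriteSketch {
    static String rewrite(String id, String intTok, BooleanSupplier p1, BooleanSupplier p2) {
        if (p1.getAsBoolean()) return id;      // -> {p1}? ID
        if (p2.getAsBoolean()) return intTok;  // -> {p2}? INT
        return "";                             // -> (nothing)
    }
}
```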
  • 26. Misc Rewrite Elements
      ◦ Arbitrary actions:
          a : atom -> ^({adaptor.createToken(INT,"9")} atom) ;
      ◦ A rewrite always sets the rule's AST, not the subrule's:
          b : "int" ( ID -> ^(TYPE "int" ID)
                    | ID '=' INT -> ^(TYPE "int" ID INT)
                    )
            ;
      ◦ Reference to the previous value (useful?):
          a : (atom -> atom) (op='+' r=atom -> ^($op $a $r) )* ;
  • 27. Tree Grammars
      ◦ Syntax same as parser grammars, plus a ^(root children…) tree element
      ◦ Uses LL(*) too; even derives from the same superclass! The tree is serialized to include imaginary DOWN and UP tokens that encode its 2-D structure for a serial parser
      variable : ^(VAR_DEF type ID)
               | ^(VAR_DEF type ID ^(INIT expr))
               ;
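
The DOWN/UP serialization can be sketched as a preorder walk that emits `DOWN` before a node's children and `UP` after them, so an ordinary token-stream parser can walk the tree. `SerializeSketch` is hypothetical; leaves emit no DOWN/UP pair here, which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: flatten a 2-D tree into a 1-D token list with imaginary
// DOWN/UP tokens marking where a node's children begin and end.
class SerializeSketch {
    static class T {
        final String text;
        final List<T> kids = new ArrayList<>();
        T(String text, T... kids) {
            this.text = text;
            for (T k : kids) this.kids.add(k);
        }
    }

    static List<String> serialize(T t) {
        List<String> out = new ArrayList<>();
        walk(t, out);
        return out;
    }

    private static void walk(T t, List<String> out) {
        out.add(t.text);
        if (t.kids.isEmpty()) return;  // leaves need no DOWN/UP
        out.add("DOWN");
        for (T k : t.kids) walk(k, out);
        out.add("UP");
    }
}
```

Flattened this way, the two alternatives of `variable` share the prefix `VAR_DEF DOWN type ID` (when `type` is a single node) and differ only at the next token, `UP` versus `INIT`.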
  • 28. Code Generation
      ◦ Uses StringTemplate to specify how each abstract ANTLR concept maps to code; wildly successful!
      ◦ Separates code-gen logic from output; not a single character of output in the Java code
      ◦ Java.stg: 140 templates, 1300 lines
  • 29. Sample code gen templates
      /** Dump the elements one per line and stick in debugging
       *  location() trigger in front.
       */
      element() ::= <<
      <if(debug)>
      dbg.location(<it.line>,<it.pos>);<\n>
      <endif>
      <it.el><\n>
      >>
      /** match a token optionally with a label in front */
      tokenRef(token,label,elementIndex) ::= <<
      <if(label)>
      <label>=input.LT(1);<\n>
      <endif>
      match(input,<token>,FOLLOW_<token>_in_<ruleName><elementIndex>);
      >>
  • 30. Internationalization
      ◦ ANTLR v3 uses StringTemplate to display all errors
      ◦ Senses the locale to load messages; en.stg: 76 templates
      ◦ ErrorManager error-number constants map to a template name; e.g.,
      RULE_REDEFINITION(file,line,col,arg) ::= "<loc()>rule <arg> redefinition"
      /* This factors out file location formatting; file,line,col inherited from
       * enclosing template; don't manually pass stuff in.
       */
      loc() ::= "<file>:<line>:<col>: "
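
The lookup path can be sketched without StringTemplate: an error constant names a template, the template group is chosen by locale (falling back to English), and `<loc()>` is expanded using the caller's `file`, `line` and `col`. `ErrorMessages`, `ErrorId` and the naive `<name>` string substitution are hypothetical stand-ins for ANTLR's `ErrorManager` and StringTemplate, not their APIs.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch using plain string substitution (not the StringTemplate API):
// error constants name templates, and template groups are looked up per locale.
class ErrorMessages {
    enum ErrorId { RULE_REDEFINITION }

    // One template group per language; only English is defined here.
    private static final Map<String, Map<String, String>> GROUPS = new HashMap<>();
    static {
        Map<String, String> en = new HashMap<>();
        en.put("loc", "<file>:<line>:<col>: ");
        en.put("RULE_REDEFINITION", "<loc()>rule <arg> redefinition");
        GROUPS.put("en", en);
    }

    static String render(Locale locale, ErrorId id, Map<String, String> attrs) {
        Map<String, String> group = GROUPS.getOrDefault(locale.getLanguage(), GROUPS.get("en"));
        String text = group.get(id.name());                 // constant name doubles as template name
        text = text.replace("<loc()>", group.get("loc"));   // loc() reuses the caller's file/line/col
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            text = text.replace("<" + e.getKey() + ">", e.getValue());
        }
        return text;
    }
}
```

Adding a language would mean adding another group keyed by its language code, with no change to the code that reports errors.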
  • 31. Runtime Support
      ◦ Better organized and separated into packages: org.antlr.runtime, org.antlr.runtime.tree, org.antlr.runtime.debug
      ◦ Clean: Parser has only an input pointer (plus the error-recovery FOLLOW stack); Lexer also has only an input pointer
      ◦ 4500 lines of Java code, not counting the BSD header
  • 32. Summary
      ◦ v3 kicks ass
      ◦ It sort of works!
      ◦ http://www.antlr.org/download/…
      ◦ ANTLRWorks progressing in parallel
