  • 1. ANTLR v3 Overview (for ANTLR v2 users) Terence Parr University of San Francisco
  • 2. Topics
    • Information flow
    • v3 grammars
    • Error recovery
    • Attributes
    • Tree construction
    • Tree grammars
    • Code generation
    • Internationalization
    • Runtime support
  • 3. Block Info Flow Diagram
  • 4. Grammar Syntax
    header {…}
    /** doc comment */
    kind grammar name ;
    options {…}
    tokens {…}
    scopes…
    action
    rules …
    /** doc comment */
    rule[String s, int z] returns [int x, int y] throws E
    options {…}
    scopes
    init {…}
      :  |  ;
      exceptions
    Trees: ^(root child1 … childN)
    Note: No inheritance
  • 5. Grammar improvements
    • Single element EBNF like ID*
    • Combined parser/lexer
    • Allows 'c' and "literal" literals
    • Multiple parameters, return values
    • Labels do not have to be unique (x=ID|x=INT) {…$x…}
    • For combined grammars, warns when tokens are not defined
  • 6. Example Grammar
    grammar SimpleParser;
    program : variable* method+ ;
    variable: "int" ID ('=' expr)? ';' ;
    method : "method" ID '(' ')' '{' variable* statement+ '}' ;
    statement
      : ID '=' expr ';'
      | "return" expr ';'
      ;
    expr : ID | INT ;
    ID : ('a'..'z'|'A'..'Z')+ ;
    INT : '0'..'9'+ ;
    WS : (' '|'\t'|'\n')+ {channel=99;} ;
  • 7. Using the parser
    import org.antlr.runtime.*;
    CharStream in = new ANTLRFileStream("inputfile");
    SimpleParserLexer lexer = new SimpleParserLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    SimpleParser p = new SimpleParser(tokens);
    p.program(); // invoke start rule
  • 8. Improved grammar warnings
    • they happen less often ;)
    • internationalized (templates again!)
    • gives (smallest) sample input sequence
    • better recursion warnings
  • 9. Recursion Warnings
    a : a A | B ;
    t.g:2:5: Alternative 1 discovers infinite left-recursion to a from a
    // with -Im 0 (secret internal parameter)
    a : b | B ;
    b : c ;
    c : B b ;
    t.g:2:5: Alternative 1: after matching input such as B decision cannot predict what comes next due to recursion overflow to c from b
  • 10. Nondeterminisms
    a : (A B|A B) C ;
    • t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
    • As a result, alternative(s) 2 were disabled for that input
    • t.g:2:5: The following alternatives are unreachable: 2
    a : (A+ B|A+ B) C ;
    • t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
  • 11. Runtime Objects of Interest
    • Lexer passes all tokens to the parser, but parser listens to only a single “channel”; channel 99, for example, where I place WS tokens, is ignored
    • Tokens have start/stop index into single text input buffer
    • Token is an abstract class
    • TokenSource: anything answering nextToken()
    • TokenStream: stream pulling from a TokenSource; LT(i), …
    • CharStream: source of characters for a lexer; LT(i), …
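
The channel idea above can be pictured with a toy model. This is only a sketch, not the ANTLR runtime: the class names `ChannelFilterSketch`, `Tok`, and `FilteredStream` are hypothetical, and channel 0 as the parser's channel is an assumption (the slide only names channel 99 for WS).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of channel filtering; hypothetical names, not the ANTLR runtime API. */
public class ChannelFilterSketch {
    static final int DEFAULT_CHANNEL = 0;  // assumed channel the parser listens to
    static final int HIDDEN_CHANNEL = 99;  // the slide's example channel for WS

    /** Minimal token: text plus the channel the lexer assigned. */
    static class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    /** Buffers every token but exposes only one channel through LT(i). */
    static class FilteredStream {
        private final List<Tok> all;
        private final int channel;
        private int p; // index of the current on-channel token

        FilteredStream(List<Tok> all, int channel) {
            this.all = all;
            this.channel = channel;
            this.p = skipOffChannel(0);
        }

        /** Returns the first index >= i holding an on-channel token (or size). */
        private int skipOffChannel(int i) {
            while (i < all.size() && all.get(i).channel != channel) i++;
            return i;
        }

        /** i-th on-channel lookahead token (1-based), or null at end of input. */
        Tok LT(int i) {
            int q = p;
            for (int k = 1; k < i; k++) q = skipOffChannel(q + 1);
            return q < all.size() ? all.get(q) : null;
        }

        void consume() { p = skipOffChannel(p + 1); }
    }

    /** Collects the text of every token a parser on DEFAULT_CHANNEL would see. */
    static List<String> visibleTexts(List<Tok> tokens) {
        FilteredStream s = new FilteredStream(tokens, DEFAULT_CHANNEL);
        List<String> out = new ArrayList<>();
        while (s.LT(1) != null) {
            out.add(s.LT(1).text);
            s.consume();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
            new Tok("int", DEFAULT_CHANNEL), new Tok(" ", HIDDEN_CHANNEL),
            new Tok("i", DEFAULT_CHANNEL), new Tok(";", DEFAULT_CHANNEL));
        System.out.println(visibleTexts(tokens)); // prints [int, i, ;]
    }
}
```

The whitespace token stays in the buffer, so start/stop indexes into the original input remain valid even though the parser never sees it.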
  • 12. Error Recovery
    • ANTLR v3 does what Josef Grosch does in Cocktail
    • Does single token insertion or deletion if necessary to keep going
    • Computes context-sensitive FOLLOW to do insert/delete
      • proper context is passed to each rule invocation
      • knows precisely what can follow reference to r rather than what could follow any reference to r (per Wirth circa 1970)
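
A rough sketch of the single-token insert/delete decision described above. Everything here is illustrative: `SingleTokenRecoverySketch`, `recover`, and the `Action` enum are hypothetical names, tokens are plain strings, and checking deletion before insertion is an assumed ordering rather than something the slides state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy decision procedure for single-token recovery; not the ANTLR runtime. */
public class SingleTokenRecoverySketch {
    enum Action { MATCH, DELETE_ONE, INSERT_ONE, GIVE_UP }

    /**
     * Decides how to proceed when `expected` is wanted at position p.
     * `follow` is the context-sensitive set of tokens that may come right
     * after `expected` at this particular rule invocation.
     */
    static Action recover(List<String> input, int p, String expected, Set<String> follow) {
        String la1 = p < input.size() ? input.get(p) : "EOF";
        String la2 = p + 1 < input.size() ? input.get(p + 1) : "EOF";
        if (la1.equals(expected)) return Action.MATCH;
        if (la2.equals(expected)) return Action.DELETE_ONE; // la1 looks spurious: skip it
        if (follow.contains(la1)) return Action.INSERT_ONE; // expected looks missing: pretend it was there
        return Action.GIVE_UP;
    }

    public static void main(String[] args) {
        Set<String> afterRParen = new HashSet<>(Arrays.asList("{"));
        Set<String> afterLCurly = new HashSet<>(Arrays.asList("int", "ID", "return"));

        // "method foo( {" : ')' is missing, and '{' may follow ')' here
        System.out.println(recover(Arrays.asList("method", "foo", "(", "{"), 3, ")", afterRParen));
        // prints INSERT_ONE

        // "method foo() ) {" : the second ')' is extra, '{' comes right after it
        System.out.println(recover(Arrays.asList("method", "foo", "(", ")", ")", "{"), 4, "{", afterLCurly));
        // prints DELETE_ONE
    }
}
```

Because `follow` is specific to the call site, the insertion test asks what can follow this reference to the rule, not what could follow any reference to it.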
  • 13. Example Error Recovery
    Note: I put in two errors each so you’ll see it continues properly
    One token insertion
      int i = 0;
      method foo( {
        int j = i;
        i = 4
      }
      [program, method]: line 2:12 mismatched token: [@14,23:23='{',<14>,2:12]; expecting type ')'
      [program, method, statement]: line 5:0 mismatched token: [@31,46:46='}',<15>,5:0]; expecting type ';'
    One token deletion
      int i = 0;
      method foo() ) {
        int j = i;
        i = = 4;
      }
      [program, method]: line 2:13 mismatched token: [@15,24:24=')',<13>,2:13]; expecting type '{'
      [program, method, statement, expr]: line 4:6 mismatched token: [@32,47:47='=',<6>,4:6]; expecting set null
  • 14. Attributes
    • New label syntax and multiple return values
    • Unified token, rule, parameter, return value, tree reference syntax in actions
    • Dynamically scoped attributes!
    a[String s] returns [float y]
      : id=ID f=field (ids+=ID)+ {$s, $y, $id, $id.text, $f.z; $ids.size();}
      ;
    field returns [int x, int z]
      : …
      ;
  • 15. Label properties
    • Token label reference properties
      • text, type, line, pos, channel, index, tree
    • Rule label reference properties
      • start, stop; indices of token boundaries
      • tree
      • text; text matched for whole rule
  • 16. Rule Scope Attributes
    • A rule may define a scope of attributes visible to any invoked rule; operates like a stacked global variable
    • Avoids having to pass a value down
    method
    scope { String name; }
      : "method" ID '(' ')' {$name=$ID.text;} body
      ;
    body: '{' stat* '}' ;
    …
    atom
    init {… $method.name …}
      : ID | INT
      ;
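
One way to picture the "stacked global variable" is a per-rule stack that is pushed on rule entry and popped on exit. The sketch below is hand-written for illustration and is not what ANTLR generates; `RuleScopeSketch`, `MethodScope`, and `methodStack` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hand-written illustration of a dynamically scoped rule attribute. */
public class RuleScopeSketch {
    /** Attributes declared by the `method` rule's scope. */
    static class MethodScope { String name; }

    // One stack for the scoped rule; the top is the innermost active invocation.
    static final Deque<MethodScope> methodStack = new ArrayDeque<>();

    /** Stands in for rule `method`; `id` plays the role of $ID.text. */
    static String method(String id) {
        methodStack.push(new MethodScope());
        try {
            methodStack.peek().name = id; // {$name=$ID.text;}
            return body();
        } finally {
            methodStack.pop();            // the scope disappears when the rule returns
        }
    }

    /** Stands in for rule `body`; note that no value is passed down. */
    static String body() { return atom(); }

    /** Stands in for rule `atom`, which reads $method.name from the stack. */
    static String atom() { return "atom inside " + methodStack.peek().name; }

    public static void main(String[] args) {
        System.out.println(method("foo")); // prints atom inside foo
    }
}
```

Any rule reached from `method`, however deep, sees the innermost invocation's `name` without it appearing in a parameter list.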
  • 17. Global Scope Attributes
    • Named scopes; rules must explicitly request access
    scope Symbols { List names; }
    {int level=0;}
    globals
    scope Symbols;
    init { level++; $Symbols.names = new ArrayList(); }
      : decl* {level--;}
      ;
    block
    scope Symbols;
    init { level++; $Symbols.names = new ArrayList(); }
      : '{' decl* stat* '}' {level--;}
      ;
    decl : "int" ID ';' {$Symbols.names.add($ID);} ;
    *What if we want to keep the symbol tables around after parsing?
  • 18. Tree Support
    • TreeAdaptor; How to create and navigate trees (like ASTFactory from v2); ANTLR assumes tree nodes are Object type
    • Tree; used by support code
    • BaseTree; List of children, w/o payload (no more child-sibling trees)
    • CommonTree; node wrapping Token as payload
    • ParseTree; used by interpreter to build trees
  • 19. Tree Construction
    • Automatic mechanism is same as v2 except ^ is now ^^
      expr : atom ( '+'^^ atom )* ;
    • ^ implies root of tree for enclosing subrule
      a : ( ID^ INT )* ; builds (a 1) (b 2) …
    • Token labels are $label not #label and rule invocation tree results are $ruleLabel.tree
    • Turn on options {output=AST;} (one can imagine output=text for templates)
    • Option: ASTLabelType=CommonTree;
  • 20. Tree Rewrite Rules
    • Maps an input grammar fragment to an output tree grammar fragment
    variable : type declarator ';' -> ^(VAR_DEF type declarator) ;
    functionHeader
      : type ID '(' ( formalParameter ( ',' formalParameter )* )? ')'
        -> ^(FUNC_HDR type ID formalParameter+)
      ;
    atom
      : …
      | '(' expr ')' -> expr
      ;
  • 21. Mixed Rewrite/Auto Trees
    • Alternatives w/o -> rewrite use automatic mechanism
    b : ID INT -> INT ID
      | INT // implies -> INT
      ;
  • 22. Rewrites and labels
    • Disambiguates element references or used to construct imaginary nodes
    • Concatenation += labels useful too:
    forStat
      : "for" '(' start=assignStat ';' expr ';' next=assignStat ')' block
        -> ^("for" $start expr $next block)
      ;
    block
      : lc='{' variable* stat* '}' -> ^(BLOCK[$lc] variable* stat*)
      ;
    /** match string representation of tree and build tree in memory */
    tree
      : '^' '(' root=atom (children+=tree)+ ')' -> ^($root $children)
      | atom
      ;
  • 23. Loops in Rewrites
    • Repeated element: ID ID -> ^(VARS ID+) yields ^(VARS a b)
    • Repeated tree: ID ID -> ^(VARS ID)+ yields ^(VARS a) ^(VARS b)
    • Multiple elements in a loop need the same size: ID INT ID INT -> ^( R ID ^( S INT) )+ yields (R a (S 1)) (R b (S 2))
    • Checks cardinality of + and * loops
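
The third bullet can be mimicked by hand to see why the sizes must agree. This is a toy that works on lists of matched text and prints trees as strings; `RewriteLoopSketch` and `rewrite` are hypothetical names and no ANTLR classes are involved.

```java
import java.util.Arrays;
import java.util.List;

/** Toy emulation of: ID INT ID INT -> ^( R ID ^( S INT) )+ */
public class RewriteLoopSketch {
    /**
     * Walks both element lists in lockstep, emitting one (R id (S int))
     * tree per iteration. Throws if the + loop would run zero times or
     * if the two lists differ in size.
     */
    static String rewrite(List<String> ids, List<String> ints) {
        if (ids.isEmpty()) {
            throw new IllegalStateException("+ loop needs at least one element");
        }
        if (ids.size() != ints.size()) {
            throw new IllegalStateException("loop elements differ in size");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append("(R ").append(ids.get(i))
              .append(" (S ").append(ints.get(i)).append("))");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite(Arrays.asList("a", "b"), Arrays.asList("1", "2")));
        // prints (R a (S 1)) (R b (S 2))
    }
}
```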
  • 24. Preventing cyclic structures
    • Repeated elements get duplicated
      a : INT -> INT INT ; // dups INT!
      a : INT INT -> INT+ INT+ ; // 4 INTs!
    • Repeated rule references get duplicated
      a : atom -> ^(atom atom) ; // no cycle!
    • Duplicates whole tree for all but first ref to an element; here 2nd ref to atom results in a duplicated atom tree
    • *Useful example “int x,y” -> “^(int x) ^(int y)”
      decl : type ID (',' ID)* -> ^(type ID)+ ;
    *Just noticed a bug in this one ;)
  • 25. Predicated rewrites
    • Use semantic predicate to indicate which rewrite to choose from
    a : ID INT -> {p1}? ID
               -> {p2}? INT
               ->
      ;
  • 26. Misc Rewrite Elements
    • Arbitrary actions
      a : atom -> ^({adaptor.createToken(INT,"9")} atom) ;
    • rewrite always sets the rule’s AST not subrule’s
    • Reference to previous value (useful?)
    b : "int"
        ( ID -> ^(TYPE "int" ID)
        | ID '=' INT -> ^(TYPE "int" ID INT)
        )
      ;
    a : (atom -> atom) (op='+' r=atom -> ^($op $a $r) )* ;
  • 27. Tree Grammars
    • Syntax same as parser grammars, add ^(root children…) tree element
    • Uses LL(*) also; even derives from same superclass! Tree is serialized to include DOWN, UP imaginary tokens to encode 2D structure for serial parser
    variable : ^(VAR_DEF type ID) | ^(VAR_DEF type ID ^(INIT expr)) ;
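
The DOWN/UP serialization mentioned above can be sketched as a preorder walk that brackets each non-empty child list. `TreeSerializeSketch`, `Node`, and `flatten` are hypothetical names, and emitting no DOWN/UP around leaves is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy serializer that flattens a tree into a token-like sequence. */
public class TreeSerializeSketch {
    /** Minimal tree node: a label plus an ordered list of children. */
    static class Node {
        final String text;
        final List<Node> children;
        Node(String text, Node... kids) {
            this.text = text;
            this.children = new ArrayList<>(Arrays.asList(kids));
        }
    }

    /** Preorder walk; DOWN and UP bracket the children of each interior node. */
    static void flatten(Node t, List<String> out) {
        out.add(t.text);
        if (!t.children.isEmpty()) {
            out.add("DOWN");
            for (Node c : t.children) flatten(c, out);
            out.add("UP");
        }
    }

    /** Convenience wrapper returning the serialized stream as one string. */
    static String serialize(Node root) {
        List<String> out = new ArrayList<>();
        flatten(root, out);
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // ^(VAR_DEF type ID ^(INIT expr))
        Node tree = new Node("VAR_DEF",
            new Node("type"), new Node("ID"),
            new Node("INIT", new Node("expr")));
        System.out.println(serialize(tree));
        // prints VAR_DEF DOWN type ID INIT DOWN expr UP UP
    }
}
```

With the nesting encoded this way, a tree pattern such as ^(VAR_DEF type ID) is just the linear sequence VAR_DEF DOWN type ID UP, which an ordinary serial parser can match.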
  • 28. Code Generation
    • Uses StringTemplate to specify how each abstract ANTLR concept maps to code; wildly successful!
    • Separates code gen logic from output; not a single character of output in the Java code
    • Java.stg: 140 templates, 1300 lines
  • 29. Sample code gen templates
    /** Dump the elements one per line and stick in debugging
     *  location() trigger in front.
     */
    element() ::= <<
    <if(debug)>
    dbg.location(<it.line>,<it.pos>);<\n>
    <endif>
    <it.el><\n>
    >>
    /** match a token optionally with a label in front */
    tokenRef(token,label,elementIndex) ::= <<
    <if(label)>
    <label>=input.LT(1);<\n>
    <endif>
    match(input,<token>,FOLLOW_<token>_in_<ruleName><elementIndex>);
    >>
  • 30. Internationalization
    • ANTLR v3 uses StringTemplate to display all errors
    • Senses locale to load messages; en.stg: 76 templates
    • ErrorManager error number constants map to a template name; e.g.,
    RULE_REDEFINITION(file,line,col,arg) ::= "<loc()>rule <arg> redefinition"
    /* This factors out file location formatting; file,line,col inherited from
     * enclosing template; don't manually pass stuff in.
     */
    loc() ::= "<file>:<line>:<col>: "
  • 31. Runtime Support
    • Better organized, separated: org.antlr.runtime, org.antlr.runtime.tree, org.antlr.runtime.debug
    • Clean; Parser has input ptr only (except error recovery FOLLOW stack); Lexer also only has input ptr
    • 4500 lines of Java code minus BSD header
  • 32. Summary
    • v3 kicks ass
    • it sort of works!
    • http://www.antlr.org/download/…
    • ANTLRWorks progressing in parallel