    Antlr V3 Presentation Transcript

    • ANTLR v3 Overview (for ANTLR v2 users) Terence Parr University of San Francisco
    • Topics
      • Information flow
      • v3 grammars
      • Error recovery
      • Attributes
      • Tree construction
      • Tree grammars
      • Code generation
      • Internationalization
      • Runtime support
    • Block Info Flow Diagram
    • Grammar Syntax
      header {…}
      /** doc comment */
      kind grammar name ;
      options {…}
      tokens {…}
      scopes…
      action
      rules …

      /** doc comment */
      rule[String s, int z] returns [int x, int y] throws E
      options {…}
      scopes
      init {…}
        :  |
        ;
        exceptions

      Trees: ^(root child1 … childN)
      Note: No inheritance
    • Grammar improvements
      • Single element EBNF like ID*
      • Combined parser/lexer
      • Allows 'c' and "literal" literals
      • Multiple parameters, return values
      • Labels do not have to be unique (x=ID|x=INT) {…$x…}
      • For combined grammars, warns when tokens are not defined
    • Example Grammar
      grammar SimpleParser;
      program : variable* method+ ;
      variable: "int" ID ('=' expr)? ';' ;
      method : "method" ID '(' ')' '{' variable* statement+ '}' ;
      statement
        : ID '=' expr ';'
        | "return" expr ';'
        ;
      expr : ID | INT ;
      ID : ('a'..'z'|'A'..'Z')+ ;
      INT : '0'..'9'+ ;
      WS : (' '|' '|' ')+ {channel=99;} ;
    • Using the parser
      CharStream in = new ANTLRFileStream("inputfile");
      SimpleParserLexer lexer = new SimpleParserLexer(in);
      CommonTokenStream tokens = new CommonTokenStream(lexer);
      SimpleParser p = new SimpleParser(tokens);
      p.program(); // invoke start rule
    • Improved grammar warnings
      • they happen less often ;)
      • internationalized (templates again!)
      • gives (smallest) sample input sequence
      • better recursion warnings
    • Recursion Warnings
      a : a A | B ;
      t.g:2:5: Alternative 1 discovers infinite left-recursion to a from a

      // with -Im 0 (secret internal parameter)
      a : b | B ;
      b : c ;
      c : B b ;
      t.g:2:5: Alternative 1: after matching input such as B decision cannot predict what comes next due to recursion overflow to c from b
    • Nondeterminisms
      • t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
      • As a result, alternative(s) 2 were disabled for that input
      • t.g:2:5: The following alternatives are unreachable: 2
      a : (A B|A B) C ;
      a : (A+ B|A+ B) C ;
      t.g:2:5: Decision can match input such as "A B" using multiple alternatives: 1, 2
    • Runtime Objects of Interest
      • Lexer passes all tokens to the parser, but parser listens to only a single “channel”; channel 99, for example, where I place WS tokens, is ignored
      • Tokens have start/stop index into single text input buffer
      • Token is an abstract class
      • TokenSource: anything answering nextToken()
      • TokenStream: stream pulling from TokenSource; LT(i), …
      • CharStream: source of characters for a lexer; LT(i), …
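
      The channel idea above can be sketched in plain Java. This is only an illustration under stated assumptions: Tok, lex and onChannel are hypothetical names invented here, not the ANTLR runtime classes. The lexer emits every token, whitespace included, each carrying start/stop indices into the single input buffer, and the consumer simply ignores anything that is not on the channel it listens to.

```java
import java.util.ArrayList;
import java.util.List;

final class ChannelDemo {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 99; // same number the slides use for WS

    /** Hypothetical token type for illustration; not ANTLR's Token. */
    static final class Tok {
        final String text;
        final int channel;
        final int start; // index of first character in the input buffer
        final int stop;  // index of last character in the input buffer

        Tok(String text, int channel, int start, int stop) {
            this.text = text;
            this.channel = channel;
            this.start = start;
            this.stop = stop;
        }
    }

    /** Split input into whitespace runs and non-whitespace runs; whitespace goes to channel 99. */
    static List<Tok> lex(String input) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            boolean ws = Character.isWhitespace(input.charAt(i));
            while (i < input.length() && Character.isWhitespace(input.charAt(i)) == ws) {
                i++;
            }
            int channel = ws ? HIDDEN_CHANNEL : DEFAULT_CHANNEL;
            out.add(new Tok(input.substring(start, i), channel, start, i - 1));
        }
        return out;
    }

    /** The parser's view: only the tokens on the channel it listens to. */
    static List<Tok> onChannel(List<Tok> all, int channel) {
        List<Tok> visible = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) {
                visible.add(t);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Tok> all = lex("int  x");
        for (Tok t : onChannel(all, DEFAULT_CHANNEL)) {
            System.out.println(t.text + " [" + t.start + ".." + t.stop + "]");
        }
    }
}
```

      Because the whitespace token is still in the full list, its text can be recovered later from the start/stop indices even though the parser never sees it.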
    • Error Recovery
      • ANTLR v3 does what Josef Grosch does in Cocktail
      • Does single token insertion or deletion if necessary to keep going
      • Computes context-sensitive FOLLOW to do insert/delete
        • proper context is passed to each rule invocation
        • knows precisely what can follow reference to r rather than what could follow any reference to r (per Wirth circa 1970)
    • Example Error Recovery
      One token insertion:
        int i = 0;
        method foo( {
          int j = i;
          i = 4
        }
        [program, method]: line 2:12 mismatched token: [@14,23:23='{',<14>,2:12]; expecting type ')'
        [program, method, statement]: line 5:0 mismatched token: [@31,46:46='}',<15>,5:0]; expecting type ';'
      One token deletion:
        int i = 0;
        method foo() ) {
          int j = i;
          i = = 4;
        }
        [program, method]: line 2:13 mismatched token: [@15,24:24=')',<13>,2:13]; expecting type '{'
        [program, method, statement, expr]: line 4:6 mismatched token: [@32,47:47='=',<6>,4:6]; expecting set null
      Note: I put in two errors each so you’ll see it continues properly
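
      A rough sketch of single-token insertion and deletion, in plain Java. It is not the ANTLR runtime code: RecoveryDemo, la, match and parse are names made up for this illustration, tokens are plain strings, and the "follow" of each element is simply the next element of one fixed rule, where ANTLR computes a context-sensitive FOLLOW set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RecoveryDemo {
    private final List<String> input;
    private int p = 0; // index of the next unconsumed token
    final List<String> errors = new ArrayList<>();

    RecoveryDemo(List<String> input) {
        this.input = input;
    }

    /** Token type of the i-th lookahead token (1-based), or "EOF" past the end. */
    private String la(int i) {
        int idx = p + i - 1;
        if (idx >= input.size()) {
            return "EOF";
        }
        String t = input.get(idx);
        return (t.matches("[A-Za-z]+") && !t.equals("method")) ? "ID" : t;
    }

    /** Match one token; 'follow' is what may come right after it in this context. */
    private void match(String expected, String follow) {
        if (la(1).equals(expected)) {
            p++;
            return;
        }
        errors.add("mismatched token " + la(1) + "; expecting " + expected);
        if (la(2).equals(expected)) {
            p += 2; // single token deletion: skip the extra token, then consume the expected one
        } else if (la(1).equals(follow)) {
            // single token insertion: act as if 'expected' had been there, consume nothing
        } else {
            throw new IllegalStateException("cannot recover at token index " + p);
        }
    }

    /** method : "method" ID '(' ')' '{' '}' ; */
    void method() {
        String[] seq = {"method", "ID", "(", ")", "{", "}"};
        for (int i = 0; i < seq.length; i++) {
            String follow = (i + 1 < seq.length) ? seq[i + 1] : "EOF";
            match(seq[i], follow);
        }
    }

    /** Parse the given tokens and return the error messages that were reported. */
    static List<String> parse(String... tokens) {
        RecoveryDemo r = new RecoveryDemo(Arrays.asList(tokens));
        r.method();
        return r.errors;
    }

    public static void main(String[] args) {
        // missing ')': recovered by insertion
        System.out.println(parse("method", "foo", "(", "{", "}"));
        // extra ')': recovered by deletion
        System.out.println(parse("method", "foo", "(", ")", ")", "{", "}"));
    }
}
```

      In both runs the parse reaches the closing '}' after reporting exactly one error, which is the behaviour the two examples above demonstrate.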
    • Attributes
      • New label syntax and multiple return values
      • Unified token, rule, parameter, return value, tree reference syntax in actions
      • Dynamically scoped attributes!
      a[String s] returns [float y]
        : id=ID f=field (ids+=ID)+ {$s, $y, $id, $id.text, $f.z; $ids.size();}
        ;
      field returns [int x, int z]
        : …
        ;
    • Label properties
      • Token label reference properties
        • text, type, line, pos, channel, index, tree
      • Rule label reference properties
        • start, stop; indices of token boundaries
        • tree
        • text; text matched for whole rule
    • Rule Scope Attributes
      • A rule may define a scope of attributes visible to any invoked rule; operates like a stacked global variable
      • Avoids having to pass a value down
      method
      scope { String name; }
        : "method" ID '(' ')' {$name=$ID.text;} body
        ;
      body: '{' stat* '}' ;
      …
      atom
      init {… $method.name …}
        : ID
        | INT
        ;
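
      One way to picture "operates like a stacked global variable" is a stack of scope objects that a rule pushes on entry and pops on exit. The Java below is a hand-written sketch of that idea, not generated ANTLR code; ScopeDemo, MethodScope and the method signatures are assumptions made for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ScopeDemo {
    /** Hypothetical stand-in for the attributes declared by: method scope { String name; } */
    static final class MethodScope {
        String name;
    }

    private final Deque<MethodScope> methodStack = new ArrayDeque<>();
    final StringBuilder log = new StringBuilder();

    /** Rule 'method': push a fresh scope on entry, pop it on exit. */
    void method(String id, String... atoms) {
        methodStack.push(new MethodScope());
        try {
            methodStack.peek().name = id; // corresponds to {$name=$ID.text;}
            body(atoms);
        } finally {
            methodStack.pop();
        }
    }

    /** Rule 'body': knows nothing about the method name and passes nothing down. */
    void body(String... atoms) {
        for (String a : atoms) {
            atom(a);
        }
    }

    /** Rule 'atom': reads $method.name from the innermost active 'method' invocation. */
    void atom(String text) {
        log.append(text).append(" in ").append(methodStack.peek().name).append('\n');
    }

    public static void main(String[] args) {
        ScopeDemo d = new ScopeDemo();
        d.method("foo", "x", "1");
        d.method("bar", "y");
        System.out.print(d.log);
    }
}
```

      The intermediate rule body never mentions the name, which is the point of the second bullet: the value does not have to be threaded through every rule between method and atom.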
    • Global Scope Attributes
      • Named scopes; rules must explicitly request access
      scope Symbols {
        List names;
      }
      {int level=0;}
      globals
      scope Symbols;
      init {
        level++;
        $Symbols.names = new ArrayList();
      }
        : decl* {level--;}
        ;
      block
      scope Symbols;
      init {
        level++;
        $Symbols.names = new ArrayList();
      }
        : '{' decl* stat* '}' {level--;}
        ;
      decl : "int" ID ';' {$Symbols.names.add($ID);} ;
      *What if we want to keep the symbol tables around after parsing?
    • Tree Support
      • TreeAdaptor; How to create and navigate trees (like ASTFactory from v2); ANTLR assumes tree nodes are Object type
      • Tree; used by support code
      • BaseTree; List of children, w/o payload (no more child-sibling trees)
      • CommonTree; node wrapping Token as payload
      • ParseTree; used by interpreter to build trees
    • Tree Construction
      • Automatic mechanism is same as v2 except ^ is now ^^
        expr : atom ( '+'^^ atom )* ;
      • ^ implies root of tree for enclosing subrule
        a : ( ID^ INT )* ;
        builds (a 1) (b 2) …
      • Token labels are $label not #label and rule invocation tree results are $ruleLabel.tree
      • Turn on options {output=AST;} (one can imagine output=text for templates)
      • Option: ASTLabelType=CommonTree;
    • Tree Rewrite Rules
      • Maps an input grammar fragment to an output tree grammar fragment
      variable : type declarator ';' -> ^(VAR_DEF type declarator) ;
      functionHeader
        : type ID '(' ( formalParameter ( ',' formalParameter )* )? ')'
          -> ^(FUNC_HDR type ID formalParameter+)
        ;
      atom
        : …
        | '(' expr ')' -> expr
        ;
    • Mixed Rewrite/Auto Trees
      • Alternatives w/o -> rewrite use automatic mechanism
      b : ID INT -> INT ID
        | INT // implies -> INT
        ;
    • Rewrites and labels
      • Disambiguates element references or used to construct imaginary nodes
      • Concatenation += labels useful too:
      forStat
        : "for" '(' start=assignStat ';' expr ';' next=assignStat ')' block
          -> ^("for" $start expr $next block)
        ;
      block
        : lc='{' variable* stat* '}' -> ^(BLOCK[$lc] variable* stat*)
        ;
      /** match string representation of tree and build tree in memory */
      tree
        : '^' '(' root=atom (children+=tree)+ ')' -> ^($root $children)
        | atom
        ;
    • Loops in Rewrites
      • Repeated element
        ID ID -> ^(VARS ID+) yields ^(VARS a b)
      • Repeated tree
        ID ID -> ^(VARS ID)+ yields ^(VARS a) ^(VARS b)
      • Multiple elements in loop need same size
        ID INT ID INT -> ^( R ID ^( S INT) )+ yields (R a (S 1)) (R b (S 2))
      • Checks cardinality + and * loops
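
      The three loop forms can be mimicked by hand to see why their results differ. The sketch below is plain Java written for this illustration; Node and the three builder methods are hypothetical and are not what ANTLR generates. The cardinality checks are modelled as exceptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RewriteLoopDemo {
    /** Minimal n-ary tree node with a child list (hypothetical; not ANTLR's CommonTree). */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            children.addAll(Arrays.asList(kids));
        }

        @Override
        public String toString() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) {
                sb.append(' ').append(c);
            }
            return sb.append(')').toString();
        }
    }

    /** -> ^(VARS ID+) : one root, every ID becomes a child; + requires at least one ID. */
    static Node repeatedElement(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalStateException("+ loop needs at least one element");
        }
        Node root = new Node("VARS");
        for (String id : ids) {
            root.children.add(new Node(id));
        }
        return root;
    }

    /** -> ^(VARS ID)+ : one complete tree per ID. */
    static List<Node> repeatedTree(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalStateException("+ loop needs at least one element");
        }
        List<Node> out = new ArrayList<>();
        for (String id : ids) {
            out.add(new Node("VARS", new Node(id)));
        }
        return out;
    }

    /** -> ^(R ID ^(S INT))+ : both lists are walked in lockstep, so their sizes must agree. */
    static List<Node> lockstep(List<String> ids, List<String> ints) {
        if (ids.size() != ints.size()) {
            throw new IllegalArgumentException("loop elements differ in size");
        }
        List<Node> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            out.add(new Node("R", new Node(ids.get(i)), new Node("S", new Node(ints.get(i)))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a", "b");
        System.out.println(repeatedElement(ids));                 // (VARS a b)
        System.out.println(repeatedTree(ids));                    // [(VARS a), (VARS b)]
        System.out.println(lockstep(ids, Arrays.asList("1", "2"))); // [(R a (S 1)), (R b (S 2))]
    }
}
```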
    • Preventing cyclic structures
      • Repeated elements get duplicated
        a : INT -> INT INT ; // dups INT!
        a : INT INT -> INT+ INT+ ; // 4 INTs!
      • Repeated rule references get duplicated
        a : atom -> ^(atom atom) ; // no cycle!
      • Duplicates whole tree for all but first ref to an element; here 2nd ref to atom results in a duplicated atom tree
      • *Useful example "int x,y" -> "^(int x) ^(int y)"
        decl : type ID (',' ID)* -> ^(type ID)+ ;
      *Just noticed a bug in this one ;)
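
      Why duplication prevents a cycle can be shown with a few lines of Java. This is an illustrative sketch only; Node, copy and rewrite are hypothetical helpers, not ANTLR runtime calls. For a : atom -> ^(atom atom), the first reference uses the original node as the root and the second reference gets a deep copy, so the root never ends up as its own child.

```java
import java.util.ArrayList;
import java.util.List;

final class DupDemo {
    /** Minimal tree node (hypothetical; not ANTLR's CommonTree). */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }
    }

    /** Deep copy of a subtree. */
    static Node copy(Node t) {
        Node c = new Node(t.text);
        for (Node k : t.children) {
            c.children.add(copy(k));
        }
        return c;
    }

    /** a : atom -> ^(atom atom) ; */
    static Node rewrite(Node atom) {
        Node dup = copy(atom);  // second reference: a duplicate, taken before the root is modified
        atom.children.add(dup); // first reference: the original node becomes the root
        return atom;
    }

    public static void main(String[] args) {
        Node root = rewrite(new Node("x"));
        Node child = root.children.get(0);
        System.out.println(child != root);       // true: the child is a separate object
        System.out.println(child.text);          // x
        System.out.println(child.children.size()); // 0: the copy has no link back to the root
    }
}
```

      Adding atom to its own child list instead of the copy would make any tree walk loop forever, which is the situation the duplication rule avoids.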
    • Predicated rewrites
      • Use semantic predicate to indicate which rewrite to choose from
      a : ID INT
          -> {p1}? ID
          -> {p2}? INT
          ->
        ;
    • Misc Rewrite Elements
      • Arbitrary actions
        a : atom -> ^({adaptor.createToken(INT,"9")} atom) ;
      • rewrite always sets the rule’s AST not subrule’s
      • Reference to previous value (useful?)
      b : "int"
          ( ID -> ^(TYPE "int" ID)
          | ID '=' INT -> ^(TYPE "int" ID INT)
          )
        ;
      a : (atom -> atom) (op='+' r=atom -> ^($op $a $r) )* ;
    • Tree Grammars
      • Syntax same as parser grammars, add ^(root children…) tree element
      • Uses LL(*) also; even derives from same superclass! Tree is serialized to include DOWN, UP imaginary tokens to encode 2D structure for serial parser
      variable
        : ^(VAR_DEF type ID)
        | ^(VAR_DEF type ID ^(INIT expr))
        ;
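
      The DOWN/UP encoding can be illustrated with a small pre-order walk. The Java below is a sketch under assumptions: Node and serialize are names invented here, and the real runtime works on token types rather than strings. A node with children is followed by DOWN, then its children, then UP, so a one-dimensional parser can still see where each subtree begins and ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TreeStreamDemo {
    /** Minimal tree node (hypothetical; not ANTLR's CommonTree). */
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... kids) {
            this.text = text;
            this.children = new ArrayList<>(Arrays.asList(kids));
        }
    }

    /** Flatten a tree into a node sequence with DOWN/UP markers around each child list. */
    static List<String> serialize(Node t) {
        List<String> out = new ArrayList<>();
        walk(t, out);
        return out;
    }

    private static void walk(Node t, List<String> out) {
        out.add(t.text);
        if (!t.children.isEmpty()) {
            out.add("DOWN");
            for (Node c : t.children) {
                walk(c, out);
            }
            out.add("UP");
        }
    }

    public static void main(String[] args) {
        // ^(VAR_DEF int x ^(INIT 3))
        Node tree = new Node("VAR_DEF",
                new Node("int"),
                new Node("x"),
                new Node("INIT", new Node("3")));
        System.out.println(serialize(tree));
        // [VAR_DEF, DOWN, int, x, INIT, DOWN, 3, UP, UP]
    }
}
```

      With this encoding the two alternatives of variable differ only in what appears before the final UP, which is the kind of decision an LL(*) lookahead can make on a flat stream.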
    • Code Generation
      • Uses StringTemplate to specify how each abstract ANTLR concept maps to code; wildly successful!
      • Separates code gen logic from output; not a single character of output in the Java code
      • Java.stg: 140 templates, 1300 lines
    • Sample code gen templates
      /** Dump the elements one per line and stick in debugging
       *  location() trigger in front.
       */
      element() ::= <<
      <if(debug)>
      dbg.location(<it.line>,<it.pos>);<\n>
      <endif>
      <it.el><\n>
      >>

      /** match a token optionally with a label in front */
      tokenRef(token,label,elementIndex) ::= <<
      <if(label)>
      <label>=input.LT(1);<\n>
      <endif>
      match(input,<token>,FOLLOW_<token>_in_<ruleName><elementIndex>);
      >>
    • Internationalization
      • ANTLR v3 uses StringTemplate to display all errors
      • Senses locale to load messages; en.stg: 76 templates
      • ErrorManager error number constants map to a template name; e.g.,
      RULE_REDEFINITION(file,line,col,arg) ::=
        "<loc()>rule <arg> redefinition"
      /* This factors out file location formatting; file,line,col inherited from
       * enclosing template; don't manually pass stuff in.
       */
      loc() ::= "<file>:<line>:<col>: "
    • Runtime Support
      • Better organized, separated:
        org.antlr.runtime
        org.antlr.runtime.tree
        org.antlr.runtime.debug
      • Clean; Parser has input ptr only (except error recovery FOLLOW stack); Lexer also only has input ptr
      • 4500 lines of Java code minus BSD header
    • Summary
      • v3 kicks ass
      • it sort of works!
      • http://www.antlr.org/download/…
      • ANTLRWorks progressing in parallel