An Annotation Framework for Statically-Typed Syntax Trees
Loren Abrams, Ray Toal
Loyola Marymount University, Los Angeles, CA, USA
IASTED SEA 2009, 2009-11-03

Outline
  • Overview
  • Previous Work
  • Motivating Example
  • Some Theoretical Contributions
  • An Annotation Framework
  • Implementation of a Parser Generator
  • Conclusions

Goals
  • To contribute to parser generation theory and practice with a grammar annotation framework that is
      • Terse (more convention, less markup)
      • Fully declarative
      • Grammar-independent
      • Supportive of statically typed host languages
  • To demonstrate feasibility with a prototype parser generator

Contributions
  • A grammar-independent annotation framework (not “just another generator”)
  • Distillation of embedded abstract syntax tree specification (useful for understanding)
  • Definition of statically-typed AST specification
  • Prototype parser generator
      • Very lightweight
      • Self-contained, easy to integrate

Previous Research
  • Grammars: CFG, (E)BNF, XBNF, SDF, PEG
  • Parser Generators: Lex/Yacc, Flex/Bison, JavaCC, ANTLR, SableCC, Rats!
  • Tree Builders: JTB, JJTree
  • Parser Generation Design Axes
      • Concrete vs. Abstract Tree Production
      • Static vs. Dynamic Typing
      • Inline vs. External Specification

Motivating Example (1 of 2)
Given a grammar such as this one...

    ID      => @[A-Za-z][A-Za-z0-9_]*
    NUMLIT  => @\d+(\.\d+([Ee][+-]?\d+)?)?
    STRLIT  => @"[^"\p{Cc}]*"
    SKIP    => @\s+

    Program => Block
    Block   => (Dec ";")* (Stmt ";")+
    Dec     => "var" ID ("=" Exp)?
            |  "fun" ID "(" IdList? ")" "=" Exp
    Stmt    => ID "=" Exp
            |  "read" IdList
            |  "write" ExpList
            |  "while" Exp "do" Block "end"
    IdList  => ID ("," ID)*
    ExpList => Exp ("," Exp)*
    Exp     => Term (("+" | "-") Term)*
    Term    => Factor (("*" | "/") Factor)*
    Factor  => NUMLIT | STRLIT | ID | Call | "(" Exp ")"
    Call    => ID "(" ExpList? ")"

Motivating Example (2 of 2)
...we want to annotate the grammar to produce statically-typed ASTs for programs such as

    var y;
    fun half(x) = x / 2;
    while x - (5 * x) do
      write half(10.4), x+2;
      read x;
    end;

Describing the ASTs
  • Each node in the generated AST is an object of some AST node class
  • The fields of each object are name-value pairs, with values we define recursively as being
      • The value null
      • Strings (which come from token literals)
      • References to nodes
      • Lists of values

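The four kinds of field value can be illustrated with a minimal Java sketch. All names here (AstNode, VarNode, BlockNode, ValueKinds) are hypothetical and chosen only for illustration; they are not the classes the prototype generates.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical common base class for AST nodes (an assumption for this sketch).
abstract class AstNode {}

// A node whose fields are a string (from a token) and a node reference that may be null.
final class VarNode extends AstNode {
    final String id;      // string value, taken from an ID token
    final AstNode init;   // node reference; null when "var y" has no initializer
    VarNode(String id, AstNode init) { this.id = id; this.init = init; }
}

// A node whose fields are lists of values.
final class BlockNode extends AstNode {
    final List<AstNode> decs;
    final List<AstNode> stmts;
    BlockNode(List<AstNode> decs, List<AstNode> stmts) { this.decs = decs; this.stmts = stmts; }
}

final class ValueKinds {
    // Classifies a field value into one of the four kinds listed on the slide.
    static String describe(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return "string";
        if (value instanceof AstNode) return "node";
        if (value instanceof List) return "list";
        throw new IllegalArgumentException("not an AST field value: " + value.getClass());
    }

    public static void main(String[] args) {
        VarNode y = new VarNode("y", null);
        BlockNode block = new BlockNode(Arrays.<AstNode>asList(y), Arrays.<AstNode>asList());
        System.out.println(describe(y.id));       // a string field
        System.out.println(describe(y.init));     // a null field
        System.out.println(describe(block));      // a node reference
        System.out.println(describe(block.decs)); // a list field
    }
}
```

In a statically typed host language each field gets a declared type, as with String and List<AstNode> above, instead of a generic untyped child list.
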
Annotating the Grammar
  • Our contribution is to exhibit a high-level approach to annotation
  • The approach must be fully declarative and support statically typed ASTs
  • The key idea is to ensure each type of value (on the previous slide) is producible
  • Our current work is only for embedded annotations, but it should extend to AST descriptions external to the grammar

Annotation Highlights
  • Each rule execution produces a value (null, string, node-ref, list)
  • We tag syntax elements and AST node expressions: tags become field names
  • Expressions not tagged get the name of the construct (convention!)
  • Different tag binding symbols for scalar and list values
  • Simple notation for node class hierarchies

Annotation Example (1 of 3)
  • Value of rule is last value produced
  • params is a scalar variable; decs and stmts are list variables; block, exp are implicit variables
  • Var and Fun are subclasses of Dec
  • Note how some values can be null

    Program => Block {Program block}
    Block   => (decs*:Dec ";")* (stmts*:Stmt ";")+ {Block decs stmts}
    Dec     => "var" ID ("=" Exp)? {Var:Dec id exp}
            |  "fun" ID "(" params:IdList? ")" "=" Exp {Fun:Dec id params exp}

Annotation Example (2 of 3)
  • Value of IdList is just the value of id
  • left is a scalar variable, note how it gets “reassigned”

    Stmt    => ID "=" Exp {Assign:Stmt id exp}
            |  "read" IdList {Read:Stmt idList}
            |  "write" ExpList {Write:Stmt expList}
            |  "while" Exp "do" Block "end" {While:Stmt exp block}
    IdList  => id*:ID ("," id*:ID)*
    ExpList => exp*:Exp ("," exp*:Exp)*
    Exp     => left:Term (op:("+" | "-") right:Term left:{Bin:Exp op left right})*
    Term    => left:Factor (op:("*" | "/") right:Factor left:{Bin:Exp op left right})*

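The effect of “reassigning” left inside the repetition can be sketched in Java as a loop that rebinds a local variable, which yields a left-associative tree. This is only an illustration of the idea under assumed names (ExpNode, RefNode, BinNode, LeftFold); it is not the code the prototype emits.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical expression node classes, loosely mirroring {Bin:Exp op left right}.
abstract class ExpNode {}

final class RefNode extends ExpNode {
    final String id;
    RefNode(String id) { this.id = id; }
    @Override public String toString() { return id; }
}

final class BinNode extends ExpNode {
    final String op;
    final ExpNode left, right;
    BinNode(String op, ExpNode left, ExpNode right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " " + op + " " + right + ")"; }
}

final class LeftFold {
    // Mimics: Exp => left:Term (op:("+" | "-") right:Term left:{Bin:Exp op left right})*
    // terms holds the already-parsed operands; ops holds the operators between them.
    static ExpNode fold(List<ExpNode> terms, List<String> ops) {
        ExpNode left = terms.get(0);                 // left:Term
        for (int i = 0; i < ops.size(); i++) {
            ExpNode right = terms.get(i + 1);        // right:Term
            left = new BinNode(ops.get(i), left, right); // left is rebound to the new Bin node
        }
        return left;                                 // value of the rule is the last value produced
    }

    public static void main(String[] args) {
        List<ExpNode> terms = Arrays.<ExpNode>asList(new RefNode("a"), new RefNode("b"), new RefNode("c"));
        System.out.println(fold(terms, Arrays.asList("-", "+"))); // prints ((a - b) + c)
    }
}
```

When the repetition matches zero times, the loop body never runs and the rule's value is simply the first operand, so no Bin node is created for a lone Term.
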
Annotation Example (3 of 3)
  • Used value since numlit and strlit would not be nice field names
  • Type of Factor rule is Exp (most general superclass)
  • ^exp required to avoid “)” as the value

    Factor  => value:NUMLIT {NumLit:Exp value}
            |  value:STRLIT {StrLit:Exp value}
            |  ID {Ref:Exp id}
            |  Call
            |  "(" Exp ")" ^exp
    Call    => ID "(" args:ExpList? ")" {Call:Exp id args}

Parser Generator Implementation
  • A parser generator reads a description (like the one on the last three slides) and outputs
      • A scanner
      • A parser, producing an AST (only)
      • A set of AST node classes, each with setters, getters, and (possibly) visit methods
      • A visitor framework for using the generated AST (without touching the tree classes, of course)
  • Interesting implementation: token set, types, etc. are computed (inference algorithm)

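A visitor over statically typed node classes might look like the following Java sketch. It uses the classic accept/visit double dispatch, which corresponds to the case where node classes do carry visit hooks; the names (ExprVisitor, ExprNode, NumNode, BinOpNode, Evaluator) are assumptions for illustration and are not taken from the prototype's output.

```java
// Hypothetical visitor interface: one method per concrete node class.
interface ExprVisitor<R> {
    R visitNum(NumNode n);
    R visitBin(BinOpNode b);
}

abstract class ExprNode {
    abstract <R> R accept(ExprVisitor<R> v);
}

final class NumNode extends ExprNode {
    final String value; // kept as a string, since it comes from a token literal
    NumNode(String value) { this.value = value; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitNum(this); }
}

final class BinOpNode extends ExprNode {
    final String op;
    final ExprNode left, right;
    BinOpNode(String op, ExprNode left, ExprNode right) { this.op = op; this.left = left; this.right = right; }
    @Override <R> R accept(ExprVisitor<R> v) { return v.visitBin(this); }
}

// Client code: new behavior is added as a visitor, leaving the node classes unchanged.
final class Evaluator implements ExprVisitor<Double> {
    @Override public Double visitNum(NumNode n) { return Double.valueOf(n.value); }

    @Override public Double visitBin(BinOpNode b) {
        double l = b.left.accept(this);
        double r = b.right.accept(this);
        switch (b.op) {
            case "+": return l + r;
            case "-": return l - r;
            case "*": return l * r;
            case "/": return l / r;
            default: throw new IllegalArgumentException("unknown operator: " + b.op);
        }
    }

    public static void main(String[] args) {
        // 5 * (4 - 1)
        ExprNode e = new BinOpNode("*", new NumNode("5"),
                new BinOpNode("-", new NumNode("4"), new NumNode("1")));
        System.out.println(e.accept(new Evaluator()));
    }
}
```

Because each visit method receives a concretely typed node, fields such as b.left and n.value are checked at compile time, with no casts in client code.
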
Prototype Implementation (1 of 2)
  • Initial implementation is a proof of concept
  • Description elements fixed, not yet pluggable
      • :   for scalar binding
      • *:  for list binding
      • { } for node expressions
      • :   for subclassing
  • Produces incomplete parsers, though scanner, tree classes, and navigation are fully implemented

Prototype Implementation (2 of 2)
  • Java only
  • Microsyntax specification uses Java regexes (nice)
  • Packaged as altgen-m-n.jar (m and n are version numbers), under 50KB
  • Further info at http://xlg.cs.lmu.edu/altgen
  • Planned open source distribution at Google Code

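Since the microsyntax is written with Java regexes, the token patterns from the motivating grammar can be tried directly with java.util.regex. The sketch below is only a check of three of those patterns; the class and method names (MicrosyntaxDemo, isNumLit, isStrLit, isSkip) are hypothetical, and this is not how the prototype's scanner is implemented.

```java
import java.util.regex.Pattern;

final class MicrosyntaxDemo {
    // Patterns copied from the grammar's token rules, with Java string escaping added.
    static final Pattern NUMLIT = Pattern.compile("\\d+(\\.\\d+([Ee][+-]?\\d+)?)?");
    static final Pattern STRLIT = Pattern.compile("\"[^\"\\p{Cc}]*\"");
    static final Pattern SKIP   = Pattern.compile("\\s+");

    // Each helper tests whether the whole input is a single token of that kind.
    static boolean isNumLit(String s) { return NUMLIT.matcher(s).matches(); }
    static boolean isStrLit(String s) { return STRLIT.matcher(s).matches(); }
    static boolean isSkip(String s)   { return SKIP.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isNumLit("10.4"));     // digits with a fraction part
        System.out.println(isNumLit("1.5e-3"));   // fraction followed by an exponent
        System.out.println(isNumLit("1e5"));      // exponent without a fraction part
        System.out.println(isStrLit("\"hi\""));   // quoted text without control characters
        System.out.println(isStrLit("\"a\tb\"")); // contains a tab, which is in category Cc
    }
}
```

Note that NUMLIT as written allows an exponent only after a fraction part, and STRLIT excludes control characters through the Unicode category \p{Cc}.
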
Summary
  • Presentation of a terse, declarative, grammar-independent annotation framework for the generation of statically typed abstract syntax trees
  • Presentation of a prototype parser generator using the framework
  • Java implementation of the prototype is under 50KB

Questions?
