Convention-Based Syntactic Descriptions

Presentation Transcript

  • Convention-Based Syntactic Descriptions
    • Ray Toal, Derek Smith
    • Loyola Marymount University
    • CSIE 2009, Los Angeles, 2009-04-02
  • Outline
    • Introduction
    • Goals and Objectives
    • Motivation and Challenges
    • Approach
    • Conventions
    • Summary
    • Future Work
  • Objectives
    • Design an improved syntax formalism for programming languages that can both
      • be used as a concise, formal description, and
      • be used as input to a parser generator
    • Formalism must also be understandable to users of existing notations
      • so we're basically using EBNF
      • ... with some regex notation and custom extensions
  • Motivation
    • Few programming language specifications bother to separate microsyntax from macrosyntax
      • ID -> LETTER (LETTER | DIGIT | '_')*
      • STMT -> 'while' EXP 'do' BLOCK
    • Existing parser generator input languages have too much markup
    • Idea: Try to adapt convention over configuration to reduce markup requirements!
  • Challenges
    • Existing formalisms are necessarily parser-generator independent: e.g., they don't want to commit to LL or LR
      • Solution: Allow rich EBNF extensions to make LL a viable option
    • Existing generators allow code to be run during parse
      • Solution: Restrict “generation” to AST nodes only.
  • Microsyntax Example
    • For a little C-like language:
        LETTER  -> <L>
        DIGIT   -> <Nd>
        CHAR    -> [^<Cc>"] | '\' [^<Cc>]
        ID      -> LETTER (LETTER | DIGIT | '_')*
        KEYWORD -> 'var' | 'fun' | 'read' | 'write' | 'while' | 'do' | 'end'
        NUMLIT  -> DIGIT+ ('.' DIGIT+)? ([Ee] [+-]? DIGIT+)?
        STRLIT  -> '"' CHAR* '"'
        SKIP    -> <Zs> | #09 | #0A | #0D | '//' [^#0A#0D]* [#0A#0D]
  • Microsyntax
    • Rules use ->, and no delimiters needed
    • Rules must be non-recursive, and RHS can only use symbols from previous rules
    • Later rules take precedence
      • In our example 'while' is a KEYWORD, not an ID
      • This means we don't need a '-' metaoperator
    • SKIP predefined and must be last
    • Token set inferred
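The precedence convention above can be sketched in a few lines of Python. This is only an illustration, not the authors' tool: the `RULES` table and the `tokenize` function are hypothetical names, the regular expressions are rough approximations of the slide's rules (for instance, a plain space stands in for `<Zs>`), and STRLIT is left out for brevity. At each position the longest match wins, and when two rules match the same length the later rule wins.

```python
import re

# Hypothetical rule table in the slide's order. The patterns only
# approximate the microsyntax rules (e.g. <Zs> is reduced to a space).
RULES = [
    ("ID", r"[^\W\d_]\w*"),
    ("KEYWORD", r"var|fun|read|write|while|do|end"),
    ("NUMLIT", r"\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?"),
    ("SKIP", r"[ \t\n\r]|//[^\n\r]*[\n\r]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Return (rule name, lexeme) pairs, dropping SKIP matches."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, rx in COMPILED:
            m = rx.match(text, pos)
            # ">=" lets a later rule win when two matches have equal length
            if m and m.end() > pos and (best is None or m.end() >= best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, end = best
        if name != "SKIP":
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


if __name__ == "__main__":
    print(tokenize("while whilex 42"))
```

Under these assumptions `while` is reported as a KEYWORD because KEYWORD comes after ID, while `whilex` stays an ID because the ID match is longer.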
  • Quoting
    • Object language forms are quoted
    • Five quoting mechanisms
      • Codepoint: #0A, ##2029, ###0001D1CF
      • Category: <L>, <Nd>, <C>, <Zl>
      • String: 'while', ';'
      • One-of: [aeiou], [0-9A-Fa-f], [<L><Nd>_]
      • One-not-of: [^<Zs><C>], [^#0A#0D]
    • No need for escaping: can always use #, and even reposition ']', '^', and '-' in [...]
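As a rough illustration of the codepoint and category quotes, the Python sketch below decodes `#`-style codepoints and tests category membership. The helper names `decode_codepoint` and `in_category` are hypothetical, and the digit counts (2, 4, and 8 hexadecimal digits for one, two, and three `#` signs) are an assumption inferred from the three examples on the slide, not a rule stated there.

```python
import unicodedata

# Assumption inferred from the slide's examples: #, ## and ### introduce
# 2, 4 and 8 hexadecimal digits respectively.
HEX_DIGITS_FOR_HASHES = {1: 2, 2: 4, 3: 8}


def decode_codepoint(quote):
    """Turn a quote such as '#0A' or '##2029' into the character it names."""
    digits = quote.lstrip("#")
    hashes = len(quote) - len(digits)
    if HEX_DIGITS_FOR_HASHES.get(hashes) != len(digits):
        raise ValueError(f"malformed codepoint quote: {quote!r}")
    return chr(int(digits, 16))


def in_category(ch, category):
    """True if ch is in a Unicode general category such as 'L' or 'Nd'.

    A one-letter category like 'L' covers all of its subcategories.
    """
    return unicodedata.category(ch).startswith(category)


if __name__ == "__main__":
    print(repr(decode_codepoint("#0A")))
    print(in_category("a", "L"), in_category("7", "Nd"))
```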
  • Operators
    • e1 e2 sequence (whitespace between expressions)
    • e1 | e2 or
    • e? optional
    • e* zero or more
    • e+ one or more
    • e^n exactly n e's
    • (…) grouping
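Because the microsyntax operators are the familiar regular ones, each can be mapped onto a Python regular expression. The sketch below is one possible mapping, assuming hypothetical helper names (`lit`, `seq`, `alt`, `opt`, `star`, `plus`, `times`); it rebuilds the NUMLIT rule from the microsyntax example as a test case, with `\d` standing in for DIGIT.

```python
import re


def lit(s):
    """Quoted string: match s literally."""
    return re.escape(s)


def group(e):
    return f"(?:{e})"


def seq(*es):
    """e1 e2 ... : sequence."""
    return "".join(group(e) for e in es)


def alt(*es):
    """e1 | e2 : alternation."""
    return group("|".join(group(e) for e in es))


def opt(e):
    """e? : optional."""
    return group(e) + "?"


def star(e):
    """e* : zero or more."""
    return group(e) + "*"


def plus(e):
    """e+ : one or more."""
    return group(e) + "+"


def times(e, n):
    """e^n : exactly n repetitions."""
    return group(e) + "{%d}" % n


DIGIT = r"\d"
# NUMLIT -> DIGIT+ ('.' DIGIT+)? ([Ee] [+-]? DIGIT+)?
NUMLIT = seq(
    plus(DIGIT),
    opt(seq(lit("."), plus(DIGIT))),
    opt(seq("[Ee]", opt("[+-]"), plus(DIGIT))),
)

if __name__ == "__main__":
    print(bool(re.fullmatch(NUMLIT, "6.02e23")))
```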
  • Macrosyntax Example
        PROGRAM => BLOCK
        BLOCK   => (DEC ';')* (STMT ';')+
        DEC     => 'var' ID ('=' EXP)?
                |  'fun' ID '(' IDLIST? ')' '=' EXP
        STMT    => ID '=' EXP
                |  'read' IDLIST
                |  'write' EXPLIST
                |  'while' EXP 'do' BLOCK 'end'
        IDLIST  => ID (',' ID)*
        EXPLIST => EXP (',' EXP)*
        EXP     => TERM ([+-] TERM)*
        TERM    => FACTOR ([*/] FACTOR)*
        FACTOR  => NUMLIT | STRLIT | ID | CALL | '(' EXP ')'
        CALL    => ID '(' EXPLIST? ')'
  • Macrosyntax
    • Rules use =>, which means that SKIP* can appear before and after any token
    • Maximal munch assumed for tokenization
    • No delimiters needed between rules
    • First rule is the start symbol
    • Recursion is fine
    • Tokens can be introduced here, too!
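One way to picture tokens being introduced in the macrosyntax is to collect every quoted string and every simple one-of set from the rules. The Python sketch below does only that; `inferred_tokens` is a hypothetical helper, and it deliberately ignores ranges, categories, codepoints, and negated sets, so it is far from a full reader for the notation.

```python
import re

# A few macrosyntax rules copied from the example slide.
MACROSYNTAX = """
DEC  => 'var' ID ('=' EXP)? | 'fun' ID '(' IDLIST? ')' '=' EXP
STMT => ID '=' EXP | 'read' IDLIST | 'write' EXPLIST | 'while' EXP 'do' BLOCK 'end'
EXP  => TERM ([+-] TERM)*
TERM => FACTOR ([*/] FACTOR)*
"""


def inferred_tokens(grammar):
    """Collect literal tokens that appear directly in macrosyntax rules."""
    # String quotes: 'while', '=' and so on.
    tokens = set(re.findall(r"'([^']+)'", grammar))
    # Simple one-of sets such as [+-]; each listed character is a token.
    # Assumption: no ranges, categories, or negation inside the brackets.
    for chars in re.findall(r"\[([^\]]+)\]", grammar):
        tokens.update(chars)
    return tokens


if __name__ == "__main__":
    print(sorted(inferred_tokens(MACROSYNTAX)))
```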
  • Abstract Syntax
    • Source:
        var y;
        fun half(x) = x / 2;
        while x - (5 * x) do
          write half(10.4), x + 2;
          read x;
        end;
    • AST:
        (Program
          (Block
            (Var y)
            (Fun half x (/ (Ref x) (Numlit 2)))
            (While (- (Ref x) (* (Numlit 5) (Ref x)))
              (Block
                (Write (Call half (Numlit 10.4)) (+ (Ref x) (Numlit 2)))
                (Read x)))))
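The parenthesized tree on this slide maps naturally onto nested tuples. As a small sketch (the tuple representation and the `show` function are illustrative assumptions, not part of the notation), the `Fun` node from the example can be built and printed back in the same form:

```python
def show(node):
    """Render a nested-tuple AST in the parenthesized form used on the slide."""
    if isinstance(node, tuple):
        return "(" + " ".join(show(child) for child in node) + ")"
    return str(node)


# fun half(x) = x / 2
FUN_HALF = ("Fun", "half", "x", ("/", ("Ref", "x"), ("Numlit", 2)))

if __name__ == "__main__":
    print(show(FUN_HALF))
```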
  • Abstract Syntax
    • Goal: We want to define the AST for a given macrosyntax phrase with minimal markup (no code)
    • Ideas
      • AST markup consists of declarative node expressions
      • Last expression encountered is the “value”
      • Some rules don't even need AST expressions
      • Can have variables which are implicitly list valued, with special syntax for reassignment
  • Abstract Syntax
        PROGRAM => b:BLOCK {Program b}
        BLOCK   => (d:DEC ';')* (s:STMT ';')+ {Block d s}
        DEC     => 'var' i:ID ('=' e:EXP)? {Var i e}
                |  'fun' i:ID '(' p:IDLIST? ')' '=' e:EXP {Fun i p e}
        IDLIST  => i:ID (',' i:ID)*
    • Value of PROGRAM is an AST node with root Program
    • Value of d is a list of values from each DEC
    • The value of IDLIST is not an AST node; it's just a list, since the last thing evaluated was stored in i
    • We're purposely using the same variable twice
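The IDLIST rule binds the same variable `i` on every match, so its value ends up being a list. A hand-written Python equivalent might look like the sketch below; `parse_idlist` and the token-list input are assumptions for illustration, and Python's `str.isidentifier` is only a stand-in for the ID rule.

```python
def parse_idlist(tokens):
    """IDLIST => i:ID (',' i:ID)*

    Returns the list accumulated in i and the tokens left over.
    """
    if not tokens or not tokens[0].isidentifier():
        raise SyntaxError("expected ID")
    i = [tokens[0]]          # first i:ID
    rest = tokens[1:]
    while rest and rest[0] == ",":
        if len(rest) < 2 or not rest[1].isidentifier():
            raise SyntaxError("expected ID after ','")
        i.append(rest[1])    # each further i:ID extends the same list
        rest = rest[2:]
    return i, rest


if __name__ == "__main__":
    print(parse_idlist(["x", ",", "y", ")"]))
```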
  • Abstract Syntax
        EXP    => t1:TERM (o=[+-] t2=TERM t1={o t1 t2})*
        TERM   => f1:FACTOR (o=[*/] f2=FACTOR f1={o f1 f2})*
        FACTOR => n:NUMLIT {Numlit n}
               |  s:STRLIT {Strlit s}
               |  i:ID {Ref i}
               |  c:CALL
               |  '(' e:EXP ')'
    • Each time we iterate through the ([*/] FACTOR)* syntax element, the values of the variables o and f1 are reassigned
    • Here o refers to the variable because it is lowercase
    • It's okay that some of the alternatives produce AST nodes and some do not
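The reassignment in EXP and TERM amounts to a left-associative fold, which can be written by hand as a loop. The Python sketch below mirrors that reading under stated assumptions: the function names and the pre-split token list are hypothetical, AST nodes are plain tuples, and the STRLIT and CALL alternatives of FACTOR are omitted to keep it short.

```python
def parse_exp(tokens):
    """EXP => t1:TERM (o=[+-] t2=TERM t1={o t1 t2})*"""
    t1, rest = parse_term(tokens)
    while rest and rest[0] in ("+", "-"):
        o = rest[0]
        t2, rest = parse_term(rest[1:])
        t1 = (o, t1, t2)     # reassign t1 on every iteration
    return t1, rest


def parse_term(tokens):
    """TERM => f1:FACTOR (o=[*/] f2=FACTOR f1={o f1 f2})*"""
    f1, rest = parse_factor(tokens)
    while rest and rest[0] in ("*", "/"):
        o = rest[0]
        f2, rest = parse_factor(rest[1:])
        f1 = (o, f1, f2)
    return f1, rest


def parse_factor(tokens):
    """FACTOR, restricted here to NUMLIT, ID and '(' EXP ')'."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        e, rest = parse_exp(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return e, rest[1:]   # no node built: the value is just e
    if head[0].isdigit():
        return ("Numlit", head), rest
    if head[0].isalpha():
        return ("Ref", head), rest
    raise SyntaxError(f"unexpected token {head!r}")


if __name__ == "__main__":
    print(parse_exp(["x", "-", "(", "5", "*", "x", ")"]))
```

Note how the parenthesized alternative returns the inner value unchanged, matching the remark that some alternatives produce AST nodes and some do not.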
  • Summary of Conventions
    • Rules found automatically, no delimiters
    • Maximal munch assumed
    • SKIP is just another rule
    • Token set inferred
    • => implies SKIP* separators
    • AST variables lowercase; nodes capitalized
    • Value of abstract syntax object is value of last object encountered in left-to-right parse
  • More
    • The specification written in itself is in the paper
    • C, JSON, other specifications on web (http://xlg.cs.lmu.edu/ssd/)
    • Tool development ongoing
  • Summary
    • Introduced syntax notation suitable for both humans and parser generators
    • Added custom features to EBNF
    • Defined conventions to simplify the notation
    • Provided examples of use
  • Related Work
    • Krahn, Rumpe, and Völkel (2007)
      • Integrated definition of concrete and abstract syntax
    • Van Wyk and Schwerdfeger (2007)
      • Context-aware scanning
    • LR generators (e.g. AnaGram, SableCC)
      • LALR parser, AST nodes introduced with '='
  • Future Work
    • Lookahead (for macrosyntax only)
      • 'if' EXP (@2 'else' 'if' ...)* ('else' ...)?
      • Full syntactic lookahead?
    • Microsyntax lookahead (or greedy qualifiers)
      • '/*' ([^*] | *(?!/))* '*/'
      • '/*' [^]*? '*/'
    • Alternatives to maximal munch? (for Java >>)
    • Ultimate convention: AST nodes automatically generated according to syntax category
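The two block-comment patterns listed under microsyntax lookahead have close analogues in Python's `re` module, which already supports negative lookahead `(?!...)` and lazy quantifiers `*?`. The sketch below is only a comparison in Python's regex dialect, not the proposed notation itself; the constant names are hypothetical.

```python
import re

# Negative lookahead: a '*' inside the comment must not be followed by '/'.
WITH_LOOKAHEAD = re.compile(r"/\*(?:[^*]|\*(?!/))*\*/")

# Lazy (non-greedy) quantifier: stop at the first '*/'.
WITH_LAZY = re.compile(r"/\*[\s\S]*?\*/")

if __name__ == "__main__":
    text = "/* a * b */ x */"
    print(WITH_LOOKAHEAD.match(text).group(0))
    print(WITH_LAZY.match(text).group(0))
```

Both forms stop at the first `*/`, which a plain greedy `[\s\S]*` would run past.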