Convention-Based Syntactic Descriptions
Ray Toal, Derek Smith
Loyola Marymount University
CSIE 2009, Los Angeles
2009-04-02
Outline
Introduction
Goals and Objectives
Motivation and Challenges
Approach
Conventions
Summary
Future Work
Objectives
Design an improved syntax formalism for programming languages that can both
be used as a concise, formal description, and
be used as input to a parser generator
The formalism must also be understandable to users of existing notations, so we're basically using EBNF
... with some regex notation and custom extensions
Motivation
Few programming language specifications bother to separate microsyntax from macrosyntax:
ID -> LETTER (LETTER | DIGIT | '_')*
STMT -> 'while' EXP 'do' BLOCK
Existing parser generator input languages have too much markup
Idea: Try to adapt convention over configuration to reduce markup requirements!
Challenges
Existing formalisms are necessarily parser generator independent: e.g., they don't want to commit to LL or LR
Solution: Allow rich EBNF extensions to make LL a viable option
Existing generators allow code to be run during the parse
Solution: Restrict “generation” to AST nodes only
Microsyntax Example
For a little C-like language:
LETTER -> <L>
DIGIT -> <Nd>
CHAR -> [^<Cc>"\] | '\' [^<Cc>]
ID -> LETTER (LETTER | DIGIT | '_')*
KEYWORD -> 'var' | 'fun' | 'read' | 'write' | 'while' | 'do' | 'end'
NUMLIT -> DIGIT+ ('.' DIGIT+)? ([Ee] [+-]? DIGIT+)?
STRLIT -> '"' CHAR* '"'
SKIP -> <Zs> | #09 | #0A | #0D | '//' [^#0A#0D]* [#0A#0D]
Microsyntax
Rules use ->, and no delimiters are needed
Rules must be non-recursive, and the RHS can only use symbols from previous rules
Later rules take precedence
In our example 'while' is a KEYWORD, not an ID
This means we don't need a '-' meta operator
SKIP is predefined and must be last
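The conventions above can be illustrated with a small Python sketch. It is not the authors' tool, only one way such rules might drive a scanner. Several things here are assumptions made for illustration: the `{NAME}` placeholder syntax, the function names `expand` and `tokenize`, the ASCII approximations of the Unicode categories, the choice of which rules count as tokens, and the use of longest match with ties going to the later rule. CHAR and STRLIT are left out for brevity.

```python
import re

# Hypothetical Python-regex versions of the slide's rules, in slide order.
# {NAME} refers to an earlier rule. <L>, <Nd> and <Zs> are approximated
# with ASCII classes (an assumption, not the slide's Unicode categories).
RULES = [
    ("LETTER",  r"[A-Za-z]"),
    ("DIGIT",   r"[0-9]"),
    ("ID",      r"{LETTER}(?:{LETTER}|{DIGIT}|_)*"),
    ("KEYWORD", r"var|fun|read|write|while|do|end"),
    ("NUMLIT",  r"{DIGIT}+(?:\.{DIGIT}+)?(?:[Ee][+-]?{DIGIT}+)?"),
    ("SKIP",    r"[ \t\n\r]|//[^\n\r]*[\n\r]"),
]

# Assumption: these rules produce tokens; LETTER and DIGIT are helpers only.
TOKEN_KINDS = ["ID", "KEYWORD", "NUMLIT", "SKIP"]


def expand(rules):
    """Inline earlier rules into later ones, yielding one regex per rule.

    Because a rule may only mention rules defined before it, recursion and
    forward references are rejected and plain substitution is enough.
    """
    expanded = {}
    for name, body in rules:
        def substitute(match):
            ref = match.group(1)
            if ref not in expanded:
                raise ValueError(f"{name} refers to {ref}, which is not an earlier rule")
            return f"(?:{expanded[ref]})"
        expanded[name] = re.sub(r"\{(\w+)\}", substitute, body)
    return expanded


def tokenize(text, rules=RULES, token_kinds=TOKEN_KINDS):
    """Split text into (kind, lexeme) pairs, dropping SKIP matches."""
    expanded = expand(rules)
    patterns = [(kind, re.compile(expanded[kind])) for kind in token_kinds]
    tokens, pos = [], 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in patterns:
            m = pattern.match(text, pos)
            # Longest match wins; ">=" lets a later rule win a tie in length.
            if m and m.end() > pos and m.end() >= best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_kind != "SKIP":
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens
```

With these assumptions, `tokenize("while whilex")` classifies `while` as a KEYWORD because KEYWORD comes after ID and matches the same length, while `whilex` stays an ID because ID matches more characters. No explicit "ID minus KEYWORD" operator is needed.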
