Convention-Based Syntactic Descriptions

Convention-Based Syntactic Descriptions
Ray Toal, Derek Smith
Loyola Marymount University
CSIE 2009, Los Angeles, 2009-04-02
Outline
  - Introduction
  - Goals and Objectives
  - Motivation and Challenges
  - Approach
  - Conventions
  - Summary
  - Future Work
Objectives
  - Design an improved syntax formalism for programming languages that can both
    - be used as a concise, formal description, and
    - be used as input to a parser generator
  - The formalism must also be understandable to users of existing notations
    - so we're basically using EBNF
    - ... with some regex notation and custom extensions
Motivation
  - Few programming language specifications even bother to separate microsyntax from macrosyntax, e.g.:
      ID -> LETTER (LETTER | DIGIT | '_')*
      STMT -> 'while' EXP 'do' BLOCK
  - Existing parser generator input languages have too much markup
  - Idea: adapt "convention over configuration" to reduce markup requirements!
Challenges
  - Existing formalisms are necessarily parser-generator independent: e.g., we don't want to commit to LL or LR
    - Solution: allow rich EBNF extensions to make LL a viable option
  - Existing generators allow code to be run during parsing
    - Solution: restrict "generation" to AST nodes only
Microsyntax Example
  For a little C-like language:
    LETTER  -> <L>
    DIGIT   -> <Nd>
    CHAR    -> [^<Cc>"] | '\' [^<Cc>]
    ID      -> LETTER (LETTER | DIGIT | '_')*
    KEYWORD -> 'var' | 'fun' | 'read' | 'write' | 'while' | 'do' | 'end'
    NUMLIT  -> DIGIT+ ('.' DIGIT+)? ([Ee] [+-]? DIGIT+)?
    STRLIT  -> '"' CHAR* '"'
    SKIP    -> <Zs> | #09 | #0A | #0D | '//' [^#0A#0D]* [#0A#0D]
Microsyntax
  - Rules use ->; no delimiters are needed between rules
  - Rules must be non-recursive, and a rule's right-hand side can only use symbols defined in earlier rules
  - Later rules take precedence
    - In our example, 'while' is a KEYWORD, not an ID
    - This means we don't need a '-' (exception) meta-operator
  - SKIP is predefined and must be the last rule
  - The token set is inferred
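The precedence and maximal-munch rules can be illustrated with a small Python scanner. This is a minimal sketch, not the authors' tool. It assumes the example's rules have been hand-translated into Python regular expressions, and the names `RULES` and `tokenize` are hypothetical. The Unicode categories are approximated, and `\` is assumed to escape a character inside STRLIT. At each position every rule is tried and the longest match wins. On a tie the later rule wins, which is why `while` is a KEYWORD while `whilex` is an ID.

```python
import re

# Hand-translated microsyntax rules (a sketch, not the authors' generator).
# Assumed approximations: [^\W\d_] for <L>, \d for <Nd>,
# \x00-\x1f\x7f-\x9f for <Cc>, and a plain space for <Zs>.
# Rule order matters: on equal-length matches the LATER rule wins.
RULES = [
    ("ID", r"[^\W\d_](?:[^\W\d_]|\d|_)*"),
    ("KEYWORD", r"var|fun|read|write|while|do|end"),
    ("NUMLIT", r"\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?"),
    ("STRLIT", r'"(?:[^"\\\x00-\x1f\x7f-\x9f]|\\[^\x00-\x1f\x7f-\x9f])*"'),
    ("SKIP", r"[ \t\n\r]|//[^\n\r]*[\n\r]"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text):
    """Yield (rule, lexeme) pairs using maximal munch; SKIP matches are dropped."""
    pos = 0
    while pos < len(text):
        best_rule, best_end = None, pos
        for rule, regex in COMPILED:
            m = regex.match(text, pos)
            # '>=' lets a later rule take over an equally long match.
            if m and m.end() > pos and m.end() >= best_end:
                best_rule, best_end = rule, m.end()
        if best_rule is None:
            raise SyntaxError(f"no token matches at position {pos}")
        if best_rule != "SKIP":
            yield best_rule, text[pos:best_end]
        pos = best_end


print(list(tokenize("while whilex 42 // note\n")))
# [('KEYWORD', 'while'), ('ID', 'whilex'), ('NUMLIT', '42')]
```

In this sketch the rule names double as token kinds, which is one way to read "the token set is inferred."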
Quoting
  - Object-language forms are quoted
  - Five quoting mechanisms
    - Codepoint: #0A, ##2029, ###0001D1CF
    - Category: <L>, <Nd>, <C>, <Zl>
    - String: 'while', ';'
    - One-of: [aeiou], [0-9A-Fa-f], [<L><Nd>_]
    - One-not-of: [^<Zs><C>], [^#0A#0D]
  - No need for escaping: you can always use #, and can even reposition ']', '^', and '-' inside [...]
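To make the quoting forms concrete, here is a hedged Python sketch of single-character matching. The function names are hypothetical. The digit counts for the codepoint form are an assumption inferred from the slide's examples: 2, 4, or 8 hex digits after #, ##, or ###. Category names are assumed to match by prefix, so <L> covers Lu, Ll, and the other letter categories, and <C> covers Cc, Cf, and so on. Ranges such as 0-9 inside [...] are left out.

```python
import unicodedata

# Assumed hex-digit widths, inferred from the slide: #XX, ##XXXX, ###XXXXXXXX.
HEX_WIDTHS = {1: 2, 2: 4, 3: 8}


def codepoint(form):
    """Decode a codepoint form such as '#0A' or '##2029' to a character."""
    hashes = len(form) - len(form.lstrip("#"))
    digits = form[hashes:]
    if HEX_WIDTHS.get(hashes) != len(digits):
        raise ValueError(f"malformed codepoint form: {form!r}")
    return chr(int(digits, 16))


def in_category(ch, cat):
    """<L> matches every L* category; <Nd> matches only Nd (prefix match)."""
    return unicodedata.category(ch).startswith(cat)


def matches_item(ch, item):
    """Match ch against one item of a [...] class: '<Cat>', '#..', or a literal char."""
    if len(item) > 2 and item.startswith("<") and item.endswith(">"):
        return in_category(ch, item[1:-1])
    if len(item) > 1 and item.startswith("#"):
        return ch == codepoint(item)
    return ch == item


def one_of(ch, items):
    """[...]: true if ch matches any item."""
    return any(matches_item(ch, item) for item in items)


def one_not_of(ch, items):
    """[^...]: true if ch matches none of the items."""
    return not one_of(ch, items)


print(one_of("_", ["<L>", "<Nd>", "_"]))  # True
print(one_not_of("\n", ["#0A", "#0D"]))   # False
```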
Operators
  - e1 e2     sequence (whitespace between expressions)
  - e1 | e2   alternation (or)
  - e?        optional
  - e*        zero or more
  - e+        one or more
  - e^n       exactly n e's
  - (...)     grouping
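One way to see the operators is as a direct mapping onto regular-expression syntax. The Python sketch below builds regex strings, and its helper names are hypothetical. Wrapping every operand in a non-capturing group plays the role of (...) grouping, and e^n becomes the {n} quantifier. It encodes NUMLIT and ID from the microsyntax example, with <L> and <Nd> approximated by [^\W\d_] and \d (an assumption).

```python
import re


def lit(s):
    """'...' string form: match s literally."""
    return re.escape(s)


def seq(*es):
    """e1 e2: sequence."""
    return "".join(f"(?:{e})" for e in es)


def alt(*es):
    """e1 | e2: alternation."""
    return "(?:" + "|".join(f"(?:{e})" for e in es) + ")"


def opt(e):
    """e?: optional."""
    return f"(?:{e})?"


def star(e):
    """e*: zero or more."""
    return f"(?:{e})*"


def plus(e):
    """e+: one or more."""
    return f"(?:{e})+"


def power(e, n):
    """e^n: exactly n occurrences of e."""
    return f"(?:{e}){{{n}}}"


LETTER = r"[^\W\d_]"  # approximates <L>
DIGIT = r"\d"         # <Nd>
# ID -> LETTER (LETTER | DIGIT | '_')*
ID = seq(LETTER, star(alt(LETTER, DIGIT, lit("_"))))
# NUMLIT -> DIGIT+ ('.' DIGIT+)? ([Ee] [+-]? DIGIT+)?
NUMLIT = seq(plus(DIGIT), opt(seq(lit("."), plus(DIGIT))),
             opt(seq("[Ee]", opt("[+-]"), plus(DIGIT))))
HEX4 = power("[0-9A-Fa-f]", 4)  # [0-9A-Fa-f]^4

print(bool(re.fullmatch(NUMLIT, "6.02E+23")))  # True
print(re.fullmatch(NUMLIT, "6."))              # None
```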
Macrosyntax Example
  PROGRAM => BLOCK
  BLOCK   => (DEC ';')* (STMT ';')+
  DEC     => 'var' ID ('=' EXP)?
           | 'fun' ID '(' IDLIST? ')' '=' EXP
  STMT    => ID '=' EXP
           | 'read' IDLIST
           | 'write' EXPLIST
           | 'while' EXP 'do' BLOCK 'end'
  IDLIST  => ID (',' ID)*
  EXPLIST => EXP (',' EXP)*
  EXP     => TERM ([+-] TERM)*
  TERM    => FACTOR ([*/] FACTOR)*
  FACTOR  => NUMLIT | STRLIT | ID | CALL | '(' EXP ')'
  CALL    => ID '(' EXPLIST? ')'
Macrosyntax
  - Rules use =>
    - which means that SKIP* can appear before and after any token
  - Maximal munch is assumed for tokenization
  - No delimiters are needed between rules
  - The first rule defines the start symbol
  - Recursion is fine
  - Tokens can be introduced here, too!
Abstract Syntax
  Source program:
    var y;
    fun half(x) = x / 2;
    while x - (5 * x) do
      write half(10.4), x + 2;
      read x;
    end;
  Abstract syntax tree:
    (Program
      (Block
        (Var y)
        (Fun half x (/ (Ref x) (Numlit 2)))
        (While (- (Ref x) (* (Numlit 5) (Ref x)))
          (Block
            (Write (Call half (Numlit 10.4)) (+ (Ref x) (Numlit 2)))
            (Read x)))))
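The tree above can be held as nested tuples whose first element is the node label. The Python sketch below prints such a tree in the slide's parenthesized form. The tuple encoding and the name `sexp` are assumptions, not part of the notation.

```python
def sexp(node):
    """Render an AST built from nested tuples in the slide's parenthesized style."""
    if isinstance(node, tuple):
        return "(" + " ".join(sexp(child) for child in node) + ")"
    return str(node)


# The slide's tree, written as nested tuples: (label, child, child, ...).
PROGRAM_AST = (
    "Program",
    ("Block",
     ("Var", "y"),
     ("Fun", "half", "x", ("/", ("Ref", "x"), ("Numlit", "2"))),
     ("While",
      ("-", ("Ref", "x"), ("*", ("Numlit", "5"), ("Ref", "x"))),
      ("Block",
       ("Write", ("Call", "half", ("Numlit", "10.4")),
        ("+", ("Ref", "x"), ("Numlit", "2"))),
       ("Read", "x")))),
)

print(sexp(PROGRAM_AST))
```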
Abstract Syntax
  - Goal: define the AST for a given macrosyntax phrase with minimal markup (no code)
  - Ideas
    - AST markup consists of declarative node expressions
    - The last expression encountered is the "value"
    - Some rules don't even need AST expressions
    - Variables can be implicitly list-valued, with special syntax for reassignment
Abstract Syntax
  PROGRAM => b:BLOCK {Program b}
  BLOCK   => (d:DEC ';')* (s:STMT ';')+ {Block d s}
  DEC     => 'var' i:ID ('=' e:EXP)? {Var i e}
           | 'fun' i:ID '(' p:IDLIST? ')' '=' e:EXP {Fun i p e}
  IDLIST  => i:ID (',' i:ID)*
  Notes:
    - The value of PROGRAM is an AST node with root Program.
    - The value of d is a list of the values from each DEC.
    - In IDLIST we purposely use the same variable twice. The value of IDLIST is not an AST node; it is just a list, since the last thing evaluated was stored in i.
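Here is a small Python model of the variable rules. It is one possible reading, not the authors' semantics. Each variable holds a list, and `x:Y` appends Y's value. A list value, such as IDLIST's, is spliced in. A node expression `{Label a b}` splices the contents of its variables in as children, and an unbound variable contributes nothing. Under these assumptions, `var y` with no initializer yields `(Var y)` and `fun half(x) = x / 2` yields `(Fun half x ...)`, which matches the abstract syntax example. The class name `Bindings` is hypothetical.

```python
from collections import defaultdict


class Bindings:
    """Hypothetical model of annotation variables; every variable is list-valued."""

    def __init__(self):
        self.vars = defaultdict(list)

    def bind(self, name, value):
        """name:X appends X's value; a list value (e.g. from IDLIST) is spliced in."""
        if isinstance(value, list):
            self.vars[name].extend(value)
        else:
            self.vars[name].append(value)

    def node(self, label, *names):
        """{Label a b ...}: children are the spliced contents of a, b, ...
        An unbound variable (e.g. an omitted optional part) contributes nothing."""
        return (label, *[v for n in names for v in self.vars[n]])


# DEC => 'var' i:ID ('=' e:EXP)? {Var i e}   for "var y" (e is never bound)
dec = Bindings()
dec.bind("i", "y")
var_y = dec.node("Var", "i", "e")

# IDLIST => i:ID (',' i:ID)*   for "a, b": its value is the list held in i
idlist = Bindings()
idlist.bind("i", "a")
idlist.bind("i", "b")

# DEC => 'fun' i:ID '(' p:IDLIST? ')' '=' e:EXP {Fun i p e}   for "fun half(x) = x / 2"
fun = Bindings()
fun.bind("i", "half")
fun.bind("p", ["x"])
fun.bind("e", ("/", ("Ref", "x"), ("Numlit", "2")))
fun_half = fun.node("Fun", "i", "p", "e")

print(var_y)  # ('Var', 'y')
```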
Abstract Syntax
  EXP    => t1:TERM (o=[+-] t2=TERM t1={o t1 t2})*
  TERM   => f1:FACTOR (o=[*/] f2=FACTOR f1={o f1 f2})*
  FACTOR => n:NUMLIT {Numlit n}
          | s:STRLIT {Strlit s}
          | i:ID {Ref i}
          | c:CALL
          | '(' e:EXP ')'
  Notes:
    - Each time we iterate through the ([*/] FACTOR)* element, the variables o, f2, and f1 are reassigned.
    - In {o f1 f2}, o refers to the variable because it is lowercase.
    - It's okay that some of the alternatives produce AST nodes and some do not.
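The reassignment idiom in EXP and TERM corresponds to the familiar left fold in a recursive-descent parser. It also shows how EBNF repetition lets an LL parser handle left-associative operators without left recursion. The Python sketch below is an illustration under assumptions. The crude scanner, the class and method names, and the tuple encoding of nodes are hypothetical, and STRLIT and CALL are left out. Because o is a variable, the operator token itself becomes the node label, as in (- (Ref x) ...).

```python
import re


def scan(src):
    """Crude scanner for this sketch only: numbers, identifiers, and + - * / ( )."""
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]", src)


class Parser:
    def __init__(self, src):
        self.tokens = scan(src)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse(self):
        tree = self.exp()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def exp(self):
        # EXP => t1:TERM (o=[+-] t2=TERM t1={o t1 t2})*
        t1 = self.term()
        while self.peek() in ("+", "-"):
            o = self.advance()
            t2 = self.term()
            t1 = (o, t1, t2)  # reassignment: the new node is the next left operand
        return t1

    def term(self):
        # TERM => f1:FACTOR (o=[*/] f2=FACTOR f1={o f1 f2})*
        f1 = self.factor()
        while self.peek() in ("*", "/"):
            o = self.advance()
            f2 = self.factor()
            f1 = (o, f1, f2)
        return f1

    def factor(self):
        # FACTOR => n:NUMLIT {Numlit n} | i:ID {Ref i} | '(' e:EXP ')'
        token = self.advance()
        if token == "(":
            e = self.exp()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return e  # no node expression: the value of e passes through
        if token[0].isdigit():
            return ("Numlit", token)
        if token[0].isalpha() or token[0] == "_":
            return ("Ref", token)
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser("x - (5 * x)").parse())
# ('-', ('Ref', 'x'), ('*', ('Numlit', '5'), ('Ref', 'x')))
```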
Summary of Conventions
  - Rules are found automatically; no delimiters
  - Maximal munch is assumed
  - SKIP is just another rule
  - The token set is inferred
  - => implies SKIP* separators
  - AST variables are lowercase; node names are capitalized
  - The value of an abstract syntax object is the value of the last object encountered in a left-to-right parse
More
  - The specification of the notation, written in itself, is in the paper
  - Specifications for C, JSON, and other languages are on the web (http://xlg.cs.lmu.edu/ssd/)
  - Tool development is ongoing
Summary
  - Introduced a syntax notation suitable for both humans and parser generators
  - Added custom features to EBNF
  - Defined conventions to simplify the notation
  - Provided examples of use
Related Work
  - Krahn, Rumpe, and Völkel (2007)
    - Integrated definition of concrete and abstract syntax
  - Van Wyk and Schwerdfeger (2007)
    - Context-aware scanning
  - LR generators (e.g., AnaGram, SableCC)
    - LALR parsers; AST nodes introduced with '='
Future Work
  - Lookahead (for macrosyntax only)
    - 'if' EXP (@2 'else' 'if' ...)* ('else' ...)?
    - Full syntactic lookahead?
  - Microsyntax lookahead (or greedy qualifiers)
      '/*' ([^*] | *(?!/))* '*/'
      '/*' [^]*? '*/'
  - Alternatives to maximal munch? (e.g., for Java's >> in nested generic types)
  - Ultimate convention: AST nodes automatically generated according to syntax category
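The two microsyntax lookahead examples correspond to features that already exist in common regex engines: a negative lookahead and a lazy (non-greedy) repetition. The Python sketch below is only an analogy, not the proposed notation. It assumes *(?!/) means "a '*' not followed by '/'". It also substitutes [\s\S] for [^], because Python's re treats a ']' right after '[^' as a literal, so [^] is not a complete class there. A greedy version is included for contrast.

```python
import re

# '/*' ([^*] | *(?!/))* '*/' : a '*' is allowed inside only if not followed by '/'.
LOOKAHEAD = re.compile(r"/\*(?:[^*]|\*(?!/))*\*/")
# '/*' [^]*? '*/' : a lazy "any character" loop, with [\s\S] standing in for [^].
LAZY = re.compile(r"/\*[\s\S]*?\*/")
# For contrast: neither lookahead nor laziness, so it runs to the LAST '*/'.
GREEDY = re.compile(r"/\*[\s\S]*\*/")

text = "/* a * b */ x = 1; /* c */"
print(LOOKAHEAD.match(text).group())  # /* a * b */
print(LAZY.match(text).group())       # /* a * b */
print(GREEDY.match(text).group())     # /* a * b */ x = 1; /* c */
```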
