Of Rats And Dragons
Achieving Parsing Sanity



Sean Cribbs
Web Consultant
Ruby and Erlang Hacker
Saturday, September 12, 2009

Quick Review

context-free
                                grammars

G = (V, Σ, R, S)

V→w

non-deterministic
                        pushdown
                         automata

stack machine
                           w/ backtracking

massage it

ε productions

produce nothing
                         never produced

cycles

ambiguities
                               “dangling else”

if A then if B then C else D

                if A then if B then C else D

                if A then if B then C else D
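
The ambiguity can be made concrete with a textbook-style grammar in the V→w notation from the earlier slides. The nonterminals S and E and the placeholder other are illustrative, not taken from the talk; both readings below are valid derivations of the same sentence, which is exactly what makes the grammar ambiguous.

```latex
\begin{align*}
S &\to \texttt{if}\ E\ \texttt{then}\ S
  \;\mid\; \texttt{if}\ E\ \texttt{then}\ S\ \texttt{else}\ S
  \;\mid\; \mathit{other}\\[6pt]
\text{reading 1:}\quad & \texttt{if}\ A\ \texttt{then}\ (\texttt{if}\ B\ \texttt{then}\ C\ \texttt{else}\ D)\\
\text{reading 2:}\quad & \texttt{if}\ A\ \texttt{then}\ (\texttt{if}\ B\ \texttt{then}\ C)\ \texttt{else}\ D
\end{align*}
```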

parsing expression
                     grammars

top-down parsing
                       language (1970s)

direct
                      representation of
                      parsing functions

Bryan Ford 2002

focused on
                               recognizing

computer
                               languages

V←e

e1 e2
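
A minimal Erlang sketch of sequencing, assuming the parser protocol used later in the talk: a parser is a fun taking Input and Index and returning {Parsed, Rest, NewIndex} on success or {fail, Reason} on failure. The module peg_seq, the helper char/1 and the use of a plain integer Index are hypothetical illustrations, not neotoma's code.

```erlang
-module(peg_seq).
-export([seq/1, char/1]).

%% Hypothetical helper: match the single character C.
%% Index is assumed to be a plain integer offset.
char(C) ->
    fun([H | Rest], Index) when H =:= C -> {H, Rest, Index + 1};
       (_, Index) -> {fail, {expected, [C], Index}}
    end.

%% e1 e2 ...: run each parser in order, feeding the remaining input of one
%% into the next. The whole sequence fails as soon as any element fails.
seq(Parsers) ->
    fun(Input, Index) -> seq(Parsers, Input, Index, []) end.

seq([], Input, Index, Acc) ->
    {lists:reverse(Acc), Input, Index};
seq([P | Ps], Input, Index, Acc) ->
    case P(Input, Index) of
        {fail, _} = Failure -> Failure;
        {Parsed, Rest, NewIndex} -> seq(Ps, Rest, NewIndex, [Parsed | Acc])
    end.
```

Each element only ever sees the input its predecessor left behind, which is all "e1 e2" means in a PEG.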

e1 / e2
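
Ordered choice under the same assumed protocol. The function is named after the peg:choose call that appears later in the talk, but this body and the helper lit/1 are only a sketch of the idea, with a hypothetical integer Index.

```erlang
-module(peg_choice).
-export([choose/1, lit/1]).

%% Hypothetical helper: match the literal string S at the front of Input.
%% Index is assumed to be a plain integer offset.
lit(S) ->
    Len = length(S),
    fun(Input, Index) ->
        case lists:prefix(S, Input) of
            true -> {S, lists:nthtail(Len, Input), Index + Len};
            false -> {fail, {expected, S, Index}}
        end
    end.

%% e1 / e2 / ...: try each alternative at the SAME position, in order, and
%% commit to the first success. Later alternatives are never consulted.
choose(Parsers) ->
    fun(Input, Index) -> choose(Parsers, Input, Index) end.

choose([], _Input, Index) ->
    {fail, {no_alternative, Index}};
choose([P | Ps], Input, Index) ->
    case P(Input, Index) of
        {fail, _} -> choose(Ps, Input, Index);
        Success -> Success
    end.
```

Because the first success wins, choose([lit("a"), lit("ab")]) matches just "a" on the input "ab", even though the second alternative would consume more.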

e+

e*
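
Both repetition operators, again as a sketch under the assumed {Parsed, Rest, NewIndex} / {fail, Reason} protocol with an integer Index; the module name and the digit/0 helper are hypothetical.

```erlang
-module(peg_repeat).
-export([zero_or_more/1, one_or_more/1, digit/0]).

%% Hypothetical helper: match one ASCII digit. Index is an integer offset.
digit() ->
    fun([C | Rest], Index) when C >= $0, C =< $9 -> {C, Rest, Index + 1};
       (_, Index) -> {fail, {expected, digit, Index}}
    end.

%% e*: apply P as many times as it succeeds (greedy); zero matches is
%% still a success. Stops early if P succeeds without consuming input,
%% which would otherwise loop forever.
zero_or_more(P) ->
    fun(Input, Index) -> repeat(P, Input, Index, []) end.

repeat(P, Input, Index, Acc) ->
    case P(Input, Index) of
        {fail, _} -> {lists:reverse(Acc), Input, Index};
        {_, _, Index} -> {lists:reverse(Acc), Input, Index};
        {Parsed, Rest, NewIndex} -> repeat(P, Rest, NewIndex, [Parsed | Acc])
    end.

%% e+: like e*, but at least one match is required.
one_or_more(P) ->
    Star = zero_or_more(P),
    fun(Input, Index) ->
        case Star(Input, Index) of
            {[], _, _} -> {fail, {expected_one_or_more, Index}};
            Success -> Success
        end
    end.
```

Since the repetition never gives characters back, a PEG such as 'a'* 'a' fails on every input: the star consumes every a, leaving none for the final term.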

&e

!e
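
The two predicates as parser combinators, assuming the same protocol and an integer Index. Since and and not are reserved words in Erlang, the names and_p/1 and not_p/1 (like the module and lit/1) are hypothetical.

```erlang
-module(peg_pred).
-export([and_p/1, not_p/1, lit/1]).

%% Hypothetical helper: match the literal string S (Index = integer offset).
lit(S) ->
    Len = length(S),
    fun(Input, Index) ->
        case lists:prefix(S, Input) of
            true -> {S, lists:nthtail(Len, Input), Index + Len};
            false -> {fail, {expected, S, Index}}
        end
    end.

%% &e: succeed when P matches here, but consume nothing.
and_p(P) ->
    fun(Input, Index) ->
        case P(Input, Index) of
            {fail, _} = Failure -> Failure;
            _ -> {[], Input, Index}
        end
    end.

%% !e: succeed, consuming nothing, exactly when P fails here.
not_p(P) ->
    fun(Input, Index) ->
        case P(Input, Index) of
            {fail, _} -> {[], Input, Index};
            _ -> {fail, {unexpected, Index}}
        end
    end.
```

Because P may be any parser, not just a fixed number of characters, a predicate can look arbitrarily far ahead before the parse commits to anything.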

e?

“string”
                                   .
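
The two terminals, sketched under the same assumptions (hypothetical module peg_terminals, integer Index, results in the {Parsed, Rest, NewIndex} / {fail, Reason} shape).

```erlang
-module(peg_terminals).
-export([string/1, any/0]).

%% "string": match exactly the literal S, consuming its characters.
%% Index is assumed to be an integer character offset.
string(S) ->
    Len = length(S),
    fun(Input, Index) ->
        case lists:prefix(S, Input) of
            true -> {S, lists:nthtail(Len, Input), Index + Len};
            false -> {fail, {expected, S, Index}}
        end
    end.

%% . : match any single character; fail only at end of input.
any() ->
    fun([C | Rest], Index) -> {C, Rest, Index + 1};
       ([], Index) -> {fail, {unexpected_end_of_input, Index}}
    end.
```

Combined with the ! predicate, the common PEG idiom !. succeeds only at the end of input.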

PEG > regexps

combined
                               lex+parse

no ambiguity

choice is ordered

dangling else
                                 obviated
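
A small recursive-descent sketch of how ordered choice settles the dangling else. It assumes a hypothetical grammar over a pre-split token list (a real PEG would match characters directly) and treats any non-keyword token as a condition or a simple statement.

```erlang
-module(dangling_else).
-export([parse/1]).

%% Hypothetical PEG, tried top to bottom:
%%   stmt <- 'if' cond 'then' stmt 'else' stmt
%%         / 'if' cond 'then' stmt
%%         / other
%% Input is a list of token strings; cond is simplified to a single token.
parse(Tokens) ->
    case stmt(Tokens) of
        {ok, Tree, []} -> {ok, Tree};
        {ok, _Tree, Rest} -> {error, {trailing_tokens, Rest}};
        fail -> {error, no_parse}
    end.

stmt(Tokens) ->
    first([fun if_then_else/1, fun if_then/1, fun other/1], Tokens).

%% Ordered choice over token-level parsers: the first success wins.
first([], _Tokens) -> fail;
first([P | Ps], Tokens) ->
    case P(Tokens) of
        fail -> first(Ps, Tokens);
        Ok -> Ok
    end.

if_then_else(["if", Cond, "then" | Rest0]) ->
    case stmt(Rest0) of
        {ok, Then, ["else" | Rest1]} ->
            case stmt(Rest1) of
                {ok, Else, Rest2} -> {ok, {'if', Cond, Then, Else}, Rest2};
                fail -> fail
            end;
        _ -> fail
    end;
if_then_else(_) -> fail.

if_then(["if", Cond, "then" | Rest0]) ->
    case stmt(Rest0) of
        {ok, Then, Rest1} -> {ok, {'if', Cond, Then}, Rest1};
        fail -> fail
    end;
if_then(_) -> fail.

other([T | Rest]) when T =/= "if", T =/= "then", T =/= "else" -> {ok, T, Rest};
other(_) -> fail.
```

The if-then-else alternative is tried first at every level, so the innermost if claims the else; the outer if then falls back to the plain if-then alternative. The same substatement gets parsed twice along the way, which is the kind of repeated work packrat memoization removes.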

greedy repetition

unlimited
                                 lookahead
                               with predicates

no left-recursion!
                               (use *,+)
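
A sketch of the rewrite the slide suggests, for a hypothetical subtraction grammar. The left-recursive rule expr <- expr '-' num / num would make a top-down parser call expr forever without consuming input; the repetition form loops instead, and folding from the left keeps subtraction left-associative. For brevity the module evaluates as it parses rather than building a tree.

```erlang
-module(no_left_rec).
-export([expr/1]).

%% Left-recursive (never terminates top-down):  expr <- expr '-' num / num
%% Rewritten with repetition:                    expr <- num ('-' num)*
%% The tail is folded from the left, so "10-3-2" means (10-3)-2.
expr(Input) ->
    case num(Input) of
        {ok, First, Rest} -> tail(First, Rest);
        fail -> fail
    end.

%% ('-' num)*: greedily consume "-num" pairs, accumulating the result.
tail(Acc, [$- | Rest0]) ->
    case num(Rest0) of
        {ok, N, Rest1} -> tail(Acc - N, Rest1);
        fail -> {ok, Acc, [$- | Rest0]}   % leave the '-' unconsumed
    end;
tail(Acc, Rest) ->
    {ok, Acc, Rest}.

%% num <- [0-9]+
num(Input) ->
    case lists:splitwith(fun(C) -> C >= $0 andalso C =< $9 end, Input) of
        {[], _} -> fail;
        {Digits, Rest} -> {ok, list_to_integer(Digits), Rest}
    end.
```

A right fold over the same list would compute 10-(3-2) = 9 instead of 5, so the fold direction is where the associativity of the original left-recursive rule is preserved.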

Parsing
                               Techniques

Tabular
                               test every rule

Recursive-descent
                    call & consume

Predictive
                               yacc/yecc

Packrat
                               RD with memo

sacrifice memory
                           for speed
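
A toy illustration of the trade, using a hypothetical grammar in which both alternatives of s begin with rule a at the same index. Results are memoized in a map keyed by {Rule, Index}, so the body of a runs once even though the choice asks for it twice; the price is one stored entry per (rule, position) pair tried. The map and the run counter are assumptions made for the demo, not how any particular tool stores its table.

```erlang
-module(packrat_sketch).
-export([parse/1]).

%% Hypothetical grammar:
%%   s <- a 'x' / a 'y'
%%   a <- 'a'+
%% State is {Memo, BodyRuns}: Memo maps {Rule, Index} to a stored result,
%% BodyRuns counts how often the body of rule a actually executed.
parse(Input) ->
    {Result, {_Memo, BodyRuns}} = s(Input, 0, {#{}, 0}),
    {Result, BodyRuns}.

%% Ordered choice: both alternatives start with a at the same Index.
s(Input, Index, State0) ->
    case a_then(Input, Index, $x, State0) of
        {fail, State1} -> a_then(Input, Index, $y, State1);
        Ok -> Ok
    end.

%% a followed by the single character C.
a_then(Input, Index, C, State0) ->
    case a(Input, Index, State0) of
        {{ok, [C | Rest], Index1}, State1} -> {{ok, Rest, Index1 + 1}, State1};
        {_, State1} -> {fail, State1}
    end.

%% Memoized rule a: look up {a, Index} first; only run the body on a miss.
a(Input, Index, {Memo, Runs} = State) ->
    case maps:find({a, Index}, Memo) of
        {ok, Stored} ->
            {Stored, State};
        error ->
            Result = a_body(Input, Index),
            {Result, {Memo#{{a, Index} => Result}, Runs + 1}}
    end.

a_body(Input, Index) ->
    case lists:splitwith(fun(Ch) -> Ch =:= $a end, Input) of
        {[], _} -> fail;
        {As, Rest} -> {ok, Rest, Index + length(As)}
    end.
```

On "aaay" the first alternative fails at 'x', the second reuses the stored result for {a, 0}, and the run counter stays at 1.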

supports PEGs and
                   some CFGs

Treetop
                                Pappy
                               neotoma

neotoma
                 Behind the Code™

can:has(cukes) ->
                          false.

Cucumber uses
                                  Treetop

PEG → leex/yecc
                              FAIL

parsec → eParSec

HOF protocol

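    % Implements "?" PEG operator
    optional(P) ->
        fun(Input, Index) ->
            case P(Input, Index) of
                {fail, _} -> {[], Input, Index};
                {_,_,_} = Success -> Success  % {Parsed, RemainingInput, NewIndex}
            end
        end.
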
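    % PEG
    optional_space <- space?;

    % Erlang
    optional_space(Input, Index) ->
        % Parenthesize the fun before calling it: Erlang rejects optional(...)(...)
        (optional(fun space/2))(Input, Index).
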
Yay! RD!
                               make it memo

ets
                               Erlang Term
                                 Storage

{key, value}

key = Index

value = dict

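    % Memoization wrapper
    p(Inp, StartIndex, Name, ParseFun, TransformFun) ->
        % Grab the memo table from ets
        Memo = get_memo(StartIndex),
        % See if the current reduction is memoized
        case dict:find(Name, Memo) of
            % If it is, return the result
            {ok, Result} -> Result;
            % If not, attempt to parse
            _ ->
                case ParseFun(Inp, StartIndex) of
                    % If it fails, memoize the failure
                    {fail,_} = Failure ->
                        memoize(StartIndex, dict:store(Name, Failure, Memo)),
                        Failure;
                    % If it passes, transform and memoize the result.
                    {Result, InpRem, NewIndex} ->
                        Transformed = TransformFun(Result, StartIndex),
                        memoize(StartIndex, dict:store(Name, {Transformed, InpRem, NewIndex}, Memo)),
                        {Transformed, InpRem, NewIndex}
                end
        end.
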
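
The wrapper calls get_memo/1 and memoize/2, which the slides do not show. Following the layout described just before it (an ets table keyed by Index whose value is a dict of rule results), they might look like this; the module name, the table name and the setup/teardown functions are hypothetical.

```erlang
-module(peg_memo_sketch).
-export([setup/0, teardown/0, get_memo/1, memoize/2]).

%% Hypothetical table name; one named table for the whole node.
-define(TABLE, peg_memo_sketch).

%% Create a public set table holding one {Index, Dict} row per position.
setup() ->
    ets:new(?TABLE, [set, named_table, public]).

teardown() ->
    ets:delete(?TABLE).

%% Return the dict of rule results memoized at Index, or an empty dict.
get_memo(Index) ->
    case ets:lookup(?TABLE, Index) of
        [{Index, Memo}] -> Memo;
        [] -> dict:new()
    end.

%% Store the updated dict for Index, replacing any previous row.
memoize(Index, Memo) ->
    true = ets:insert(?TABLE, {Index, Memo}),
    ok.
```

A named table is global to the node, so two simultaneous parses would share it; a more robust variant would create an unnamed table per parse and keep its identifier somewhere the wrapper can reach.
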
parse_transform

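    alternative(Input, Index) ->
        peg:p(Input, Index, alternative, fun(I,P) ->
            (peg:choose([fun sequence/2, fun primary/2]))(I,P)
        end).

    rule(alternative) ->
        peg:choose([fun sequence/2, fun primary/2]);
    % ... further rule/1 clauses follow
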
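    rules <- space? declaration_sequence space?;
    declaration_sequence <- head:declaration tail:(space declaration)*;
    declaration <- nonterminal space '<-' space parsing_expression space? ';';
    parsing_expression <- choice / sequence / primary;
    choice <- head:alternative tail:(space '/' space alternative)+;
    alternative <- sequence / primary;
    primary <- prefix atomic / atomic suffix / atomic;
    sequence <- head:labeled_sequence_primary tail:(space labeled_sequence_primary)+;
    labeled_sequence_primary <- label? primary;
    label <- alpha_char alphanumeric_char* ':';
    suffix <- repetition_suffix / optional_suffix;
    optional_suffix <- '?';
    repetition_suffix <- '+' / '*';
    prefix <- '&' / '!';
    atomic <- terminal / nonterminal / parenthesized_expression;
    parenthesized_expression <- '(' space? parsing_expression space? ')';
    nonterminal <- alpha_char alphanumeric_char*;
    terminal <- quoted_string / character_class / anything_symbol;
    quoted_string <- single_quoted_string / double_quoted_string;
    double_quoted_string <- '"' string:(!'"' ("\\\\" / '\\"' / .))* '"';
    single_quoted_string <- "'" string:(!"'" ("\\\\" / "\\'" / .))* "'";
    character_class <- '[' characters:(!']' ('\\\\' . / !'\\\\' .))+ ']';
    anything_symbol <- '.';
    alpha_char <- [a-z_];
    alphanumeric_char <- alpha_char / [0-9];
    space <- (white / comment_to_eol)+;
    comment_to_eol <- '%' (!"\n" .)*;
    white <- [ \t\n\r];
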
self-hosting

Future directions

self-contained
                                   parsers

inline code in PEG

Reia
                               retem
                               sedate

questions?

Reviews grammars and parsers, and discusses my personal path toward writing my own packrat parser-generator for Erlang, called neotoma.

Given by Sean Cribbs (Web Consultant, Ruby and Erlang Hacker) at the "Evil Robot Conference" on September 12, 2009.