SlideShare a Scribd company logo
1 of 66
Round PEG,
Round Hole
parsing(functionally())



    Sean Cribbs
   Basho Technologies
Feature: Accepting invitations
 In order to open my account
 As a teacher or staff member
 I should be able to accept an email invitation

 Scenario Outline: Accept invitation
  Given I have received an email invitation at "joe@school.edu"
  When I follow the invitation link in the email
  And I complete my details as a <type>
  Then I should be logged in
  And the invitation should be accepted

  Examples:
   | type |
   | staff |
   | teacher |

 Scenario: Bad invitation code
  Given I have received an email invitation at "joe@school.edu"
  When I use a bad invitation code
  Then I should be notied that the invitation code is invalid
First, one must
      parse
Gherkin uses Treetop
OK, I’ll convert to
 leex and yecc
Denitions.

D = [0-9]
IDENT = [a-z|A-Z|0-9|_|-]

Rules.

_         : {token, {underscore, TokenLine, TokenChars}}.
-         : {token, {dash, TokenLine, TokenChars}}.
%         : {token, {tag_start, TokenLine, TokenChars}}.
.        : {token, {class_start, TokenLine, TokenChars}}.
#         : {token, {id_start, TokenLine, TokenChars}}.
{D}+         : {token, {number, TokenLine, list_to_integer(TokenChars)}}.
'(^.|.|[^'])*' :
  S = lists:sublist(TokenChars, 2, TokenLen - 2),
  {token, {string, TokenLine, S}}.
{IDENT}+ : {token, {chr, TokenLine, TokenChars}}.
{        : {token, {lcurly, TokenLine, TokenChars}}.
}        : {token, {rcurly, TokenLine, TokenChars}}.
[        : {token, {lbrace, TokenLine, TokenChars}}.
]        : {token, {rbrace, TokenLine, TokenChars}}.
@          : {token, {at, TokenLine, TokenChars}}.
,        : {token, {comma, TokenLine, TokenChars}}.
'        : {token, {quote, TokenLine, TokenChars}}.
:        : {token, {colon, TokenLine, TokenChars}}.
/         : {token, {slash, TokenLine, TokenChars}}.
!        : {token, {bang, TokenLine, TokenChars}}.
(        : {token, {lparen, TokenLine, TokenChars}}.
)        : {token, {rparen, TokenLine, TokenChars}}.
|        : {token, {pipe, TokenLine, TokenChars}}.
<          : {token, {lt, TokenLine, TokenChars}}.
>          : {token, {gt, TokenLine, TokenChars}}.
s+         : {token, {space, TokenLine, TokenChars}}.

Erlang code.
Rootsymbol template_stmt.

template_stmt   ->   doctype : '$1'.
template_stmt   ->   var_ref : '$1'.
template_stmt   ->   iter : '$1'.
template_stmt   ->   fun_call : '$1'.
template_stmt   ->   tag_decl : '$1'.

%% doctype selector
doctype -> bang bang      bang   : {doctype, "Transitional", []}.
doctype -> bang bang      bang   space : {doctype, "Transitional", []}.
doctype -> bang bang      bang   space doctype_name : {doctype, '$5', []}.
doctype -> bang bang      bang   space doctype_name space doctype_name : {doctype, '$5', '$7'}.

doctype_name -> doctype_name_elem doctype_name : '$1' ++ '$2'.
doctype_name -> doctype_name_elem : '$1'.

doctype_name_elem      ->   chr : unwrap('$1').
doctype_name_elem      ->   dash : "-".
doctype_name_elem      ->   class_start : ".".
doctype_name_elem      ->   number : number_to_list('$1').
There Be Dragons
Grammar EXPLODES
doctype   ->   bang   bang   bang.
doctype   ->   bang   bang   bang space.
doctype   ->   bang   bang   bang space doctype_name.
doctype   ->   bang   bang   bang space doctype_name space doctype_name.
tokens, blah


doctype   ->   bang   bang   bang.
doctype   ->   bang   bang   bang space.
doctype   ->   bang   bang   bang space doctype_name.
doctype   ->   bang   bang   bang space doctype_name space doctype_name.
tokens, blah


doctype   ->   bang   bang   bang.
doctype   ->   bang   bang   bang space.
doctype   ->   bang   bang   bang space doctype_name.
doctype   ->   bang   bang   bang space doctype_name space doctype_name.




                                    excessively explicit
Context-Free =
Out of Context
New Project!
PEG parser-
 generator
Parsing Expression
       Grammars
• Brian Ford 2002 Thesis and related papers
• Direct representation of parsing functions,
  like TDPL

• Superset of Regular Expressions
• Ordered Choice removes ambiguities
Dangling Else Problem
   if A then if B then C else D


   if A then if B then C else D


   if A then if B then C else D
Parsing Expressions
   are Functions
PE → Function


      ef
 sequence(e, f)
PE → Function


     e/f
  choice(e, f)
PE → Function


      e+
 one_or_more(e)
PE → Function


     (e)
      e
PE → Function


       e*
 zero_or_more(e)
PE → Function


      !e
    not(e)
PE → Function


      &e
   assert(e)
PE → Function


       e?
   optional(e)
PE → Function


      tag:e
  label(“tag”, e)
PE → Function


   “some text”
string(“some text”)
PE → Function


         [A-Z]
character_class(“[A-Z]”)
PE → Function


       .
   anything()
PE → Function

       “start” e f*
sequence(string(“start”),
  e, zero_or_more(f))
PE → Function


        “start” e f*
p_seq([p_string(“start”),
    fun e/2,
    p_zero_or_more(fun f/2)])
yeccpars0(Tokens, Tzr, State, States, Vstack) ->
  try yeccpars1(Tokens, Tzr, State, States, Vstack)
  catch
     error: Error ->
        Stacktrace = erlang:get_stacktrace(),
        try yecc_error_type(Error, Stacktrace) of
           {syntax_error, Token} ->
              yeccerror(Token);
           {missing_in_goto_table=Tag, Symbol, State} ->
              Desc = {Symbol, State, Tag},
              erlang:raise(error, {yecc_bug, ?CODE_VERSION, Desc},
                       Stacktrace)
        catch _:_ -> erlang:raise(error, Error, Stacktrace)
        end;
     %% Probably thrown from return_error/2:
     throw: {error, {_Line, ?MODULE, _M}} = Error ->
        Error
  end.

yecc_error_type(function_clause, [{?MODULE,F,[State,_,_,_,Token,_,_]} | _]) ->
  case atom_to_list(F) of
     "yeccpars2" ++ _ ->
        {syntax_error, Token};
     "yeccgoto_" ++ SymbolL ->
        {ok,[{atom,_,Symbol}],_} = erl_scan:string(SymbolL),
        {missing_in_goto_table, Symbol, State}
  end.

yeccpars1([Token | Tokens], Tzr, State, States, Vstack) ->
  yeccpars2(State, element(1, Token), States, Vstack, Token, Tokens, Tzr);


                                                                                 HUH?
yeccpars1([], {{F, A},_Line}, State, States, Vstack) ->
  case apply(F, A) of
     {ok, Tokens, Endline} ->
   yeccpars1(Tokens, {{F, A}, Endline}, State, States, Vstack);
     {eof, Endline} ->
        yeccpars1([], {no_func, Endline}, State, States, Vstack);
     {error, Descriptor, _Endline} ->
        {error, Descriptor}
  end;
yeccpars1([], {no_func, no_line}, State, States, Vstack) ->
  Line = 999999,
  yeccpars2(State, '$end', States, Vstack, yecc_end(Line), [],
          {no_func, Line});
yeccpars1([], {no_func, Endline}, State, States, Vstack) ->
  yeccpars2(State, '$end', States, Vstack, yecc_end(Endline), [],
          {no_func, Endline}).

%% yeccpars1/7 is called from generated code.
%%
%% When using the {includele, Includele} option, make sure that
%% yeccpars1/7 can be found by parsing the le without following
%% include directives. yecc will otherwise assume that an old
%% yeccpre.hrl is included (one which denes yeccpars1/5).
yeccpars1(State1, State, States, Vstack, Token0, [Token | Tokens], Tzr) ->
  yeccpars2(State, element(1, Token), [State1 | States],
         [Token0 | Vstack], Token, Tokens, Tzr);
yeccpars1(State1, State, States, Vstack, Token0, [], {{_F,_A}, _Line}=Tzr) ->
  yeccpars1([], Tzr, State, [State1 | States], [Token0 | Vstack]);
yeccpars1(State1, State, States, Vstack, Token0, [], {no_func, no_line}) ->
  Line = yecctoken_end_location(Token0),
  yeccpars2(State, '$end', [State1 | States], [Token0 | Vstack],
         yecc_end(Line), [], {no_func, Line});
yeccpars1(State1, State, States, Vstack, Token0, [], {no_func, Line}) ->
Functions are Data
PEG Functions are
  Higher-Order
Higher-Order
        Functions

• Currying - Functions that return
  functions

• Composition - Functions that take
  functions as arguments
Higher-Order in
             Parsing

•   p_optional(fun e/2) -> fun(Input, Index)

• Receives current input and index/offset
• Returns {fail,NewIndex}or
  {Result, Rest,
                 Reason}
Combinators vs.
Parsing Functions
 p_optional(fun e/2) -> fun(Input, Index)
How to Design a
    Parser
Recursive Descent


• Functions call other functions to recognize
  and consume input

• Backtrack on failure and try next option
Predictive Recursive
       Descent
• Functions call other functions to
  recognize and consume input

• Stream lookahead to determine which
  branch to take (rsts, follows)

• Fail early, retry very little
Recursive Descent
     O(2^N)
Predictive Parsers are
 Expensive to Build
“Packrat” Parsers
      O(N)
“Packrat” Parsers -
         O(N)
• Works and looks like a Recursive Descent
  Parser

• Memoizes (remembers) intermediate
  results - success and fail

• Trades speed for memory consumption
Memoization is a
Parsing Function
 memoize(Rule, Input, Index, Parser)->
 {fail, Reason} | {Result, Rest, NewIndex}
memoize(Rule, Input, Index, Parser) ->
 case get_memo(Rule, Index) of
  undened ->
   store_memo(Rule, Index, Parser(Input, Index));
  Memoized -> Memoized
 end.
How Memoization
         Helps

if_stmt <- “if” expr “then” stmt “else” stmt /
           “if” expr “then” stmt
if_stmt <- “if” expr “then” stmt “else” stmt /
           “if” expr “then” stmt




       if A then if B then C else D
if_stmt <- “if” expr “then” stmt “else” stmt /
           “if” expr “then” stmt




       if A then if B then C else D
if_stmt <- “if” expr “then” stmt “else” stmt /
           “if” expr “then” stmt




       if A then if B then C else D
if_stmt <- “if” expr “then” stmt “else” stmt /
           “if” expr “then” stmt




       if A then if B then C else D
if_stmt <- “if” expr “then” stmt “else” stmt /
           “if” expr “then” stmt




                     memoized!


       if A then if B then C else D
Enter Neotoma
http://github.com/seancribbs/neotoma
Neotoma

• Defines a metagrammar
• Generates packrat parsers
• Transformation code inline
• Memoization uses ETS
• Parses and generates itself
Let’s build a CSV
      parser
rows <- row (crlf row)* / ‘’;
rows <- row (crlf row)* / ‘’;
row <- field (field_sep field)* / ‘’;
rows <- row (crlf row)* / ‘’;
row <- field (field_sep field)* / ‘’;
eld <- quoted_eld / (!eld_sep !crlf .)*;
rows <- row (crlf row)* / ‘’;
row <- field (field_sep field)* / ‘’;
eld <- quoted_eld / (!eld_sep !crlf .)*;
quoted_field <- ‘“‘ (‘“”’ / !’”’ .)* ‘“‘;
rows <- row (crlf row)* / ‘’;
row <- field (field_sep field)* / ‘’;
eld <- quoted_eld / (!eld_sep !crlf .)*;
quoted_field <- ‘“‘ (‘“”’ / !’”’ .)* ‘“‘;
field_sep <- ‘,’;
crlf <- ‘rn’ / ‘n’;
rows <- head:row tail:(crlf row)* / ‘’;
row <- head:field tail:(field_sep field)* / ‘’;
eld <- quoted_eld / (!eld_sep !crlf .)*;
quoted_field <- ‘“‘ string:(‘“”’ / !’”’ .)* ‘“‘;
field_sep <- ‘,’;
crlf <- ‘rn’ / ‘n’;
rows <- head:row tail:(crlf row)* / ‘’
`
case Node of
  [] -> [];
  [“”] -> [];
  _ ->
    Head = proplists:get_value(head, Node),
    Tail = [Row || [_CRLF,Row] <-
            proplists:get_value(tail, Node)],
    [Head|Tail]
end
`;
1> neotoma:file(“csv.peg”).
ok
2> c(csv).
{ok, csv}
DO IT LIVE!
FP + PEG + Packrat
         = WIN
• Clear relationship between grammar and
  generated code

• Powerful abstractions, declarative code
• Simpler to understand and implement
• Terminals in context, no ambiguity
sean <- question*;

More Related Content

What's hot

Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational Biology
AtreyiB
 
Slides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersSlides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammers
Giovanni924
 

What's hot (20)

Functional Programming with Groovy
Functional Programming with GroovyFunctional Programming with Groovy
Functional Programming with Groovy
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 
Advanced Python, Part 2
Advanced Python, Part 2Advanced Python, Part 2
Advanced Python, Part 2
 
Advanced Python, Part 1
Advanced Python, Part 1Advanced Python, Part 1
Advanced Python, Part 1
 
Achieving Parsing Sanity In Erlang
Achieving Parsing Sanity In ErlangAchieving Parsing Sanity In Erlang
Achieving Parsing Sanity In Erlang
 
Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)
 
PHP Powerpoint -- Teach PHP with this
PHP Powerpoint -- Teach PHP with thisPHP Powerpoint -- Teach PHP with this
PHP Powerpoint -- Teach PHP with this
 
Matlab and Python: Basic Operations
Matlab and Python: Basic OperationsMatlab and Python: Basic Operations
Matlab and Python: Basic Operations
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational Biology
 
Learn 90% of Python in 90 Minutes
Learn 90% of Python in 90 MinutesLearn 90% of Python in 90 Minutes
Learn 90% of Python in 90 Minutes
 
Coding Guidelines - Crafting Clean Code
Coding Guidelines - Crafting Clean CodeCoding Guidelines - Crafting Clean Code
Coding Guidelines - Crafting Clean Code
 
Python and sysadmin I
Python and sysadmin IPython and sysadmin I
Python and sysadmin I
 
Learn python in 20 minutes
Learn python in 20 minutesLearn python in 20 minutes
Learn python in 20 minutes
 
Introduction to Python and TensorFlow
Introduction to Python and TensorFlowIntroduction to Python and TensorFlow
Introduction to Python and TensorFlow
 
php string part 4
php string part 4php string part 4
php string part 4
 
Slides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammersSlides chapter3part1 ruby-forjavaprogrammers
Slides chapter3part1 ruby-forjavaprogrammers
 
AmI 2016 - Python basics
AmI 2016 - Python basicsAmI 2016 - Python basics
AmI 2016 - Python basics
 
Working with text, Regular expressions
Working with text, Regular expressionsWorking with text, Regular expressions
Working with text, Regular expressions
 
Introduction to Perl and BioPerl
Introduction to Perl and BioPerlIntroduction to Perl and BioPerl
Introduction to Perl and BioPerl
 

Similar to Round PEG, Round Hole - Parsing Functionally

Hacking parse.y (RubyKansai38)
Hacking parse.y (RubyKansai38)Hacking parse.y (RubyKansai38)
Hacking parse.y (RubyKansai38)
ujihisa
 
So I am writing a CS code for a project and I keep getting cannot .pdf
So I am writing a CS code for a project and I keep getting cannot .pdfSo I am writing a CS code for a project and I keep getting cannot .pdf
So I am writing a CS code for a project and I keep getting cannot .pdf
ezonesolutions
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar Prokopec
LoĂŻc Descotte
 
Hacking Parse.y with ujihisa
Hacking Parse.y with ujihisaHacking Parse.y with ujihisa
Hacking Parse.y with ujihisa
ujihisa
 
Good Evils In Perl
Good Evils In PerlGood Evils In Perl
Good Evils In Perl
Kang-min Liu
 
Scala 2 + 2 > 4
Scala 2 + 2 > 4Scala 2 + 2 > 4
Scala 2 + 2 > 4
Emil Vladev
 
Hidden Gems of Ruby 1.9
Hidden Gems of Ruby 1.9Hidden Gems of Ruby 1.9
Hidden Gems of Ruby 1.9
Aaron Patterson
 

Similar to Round PEG, Round Hole - Parsing Functionally (20)

Elixir formatter Internals
Elixir formatter InternalsElixir formatter Internals
Elixir formatter Internals
 
SWP - A Generic Language Parser
SWP - A Generic Language ParserSWP - A Generic Language Parser
SWP - A Generic Language Parser
 
Php & my sql
Php & my sqlPhp & my sql
Php & my sql
 
GE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python ProgrammingGE8151 Problem Solving and Python Programming
GE8151 Problem Solving and Python Programming
 
Perl6 Regexen: Reduce the line noise in your code.
Perl6 Regexen: Reduce the line noise in your code.Perl6 Regexen: Reduce the line noise in your code.
Perl6 Regexen: Reduce the line noise in your code.
 
Hacking parse.y (RubyKansai38)
Hacking parse.y (RubyKansai38)Hacking parse.y (RubyKansai38)
Hacking parse.y (RubyKansai38)
 
LEX & YACC TOOL
LEX & YACC TOOLLEX & YACC TOOL
LEX & YACC TOOL
 
Php functions
Php functionsPhp functions
Php functions
 
So I am writing a CS code for a project and I keep getting cannot .pdf
So I am writing a CS code for a project and I keep getting cannot .pdfSo I am writing a CS code for a project and I keep getting cannot .pdf
So I am writing a CS code for a project and I keep getting cannot .pdf
 
Scala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar ProkopecScala presentation by Aleksandar Prokopec
Scala presentation by Aleksandar Prokopec
 
Hacking Parse.y with ujihisa
Hacking Parse.y with ujihisaHacking Parse.y with ujihisa
Hacking Parse.y with ujihisa
 
ES6 is Nigh
ES6 is NighES6 is Nigh
ES6 is Nigh
 
Hidden treasures of Ruby
Hidden treasures of RubyHidden treasures of Ruby
Hidden treasures of Ruby
 
Good Evils In Perl
Good Evils In PerlGood Evils In Perl
Good Evils In Perl
 
Scala 2 + 2 > 4
Scala 2 + 2 > 4Scala 2 + 2 > 4
Scala 2 + 2 > 4
 
How to write code you won't hate tomorrow
How to write code you won't hate tomorrowHow to write code you won't hate tomorrow
How to write code you won't hate tomorrow
 
SPL, not a bridge too far
SPL, not a bridge too farSPL, not a bridge too far
SPL, not a bridge too far
 
Lisp Macros in 20 Minutes (Featuring Clojure)
Lisp Macros in 20 Minutes (Featuring Clojure)Lisp Macros in 20 Minutes (Featuring Clojure)
Lisp Macros in 20 Minutes (Featuring Clojure)
 
Hidden Gems of Ruby 1.9
Hidden Gems of Ruby 1.9Hidden Gems of Ruby 1.9
Hidden Gems of Ruby 1.9
 
SDC - EinfĂźhrung in Scala
SDC - EinfĂźhrung in ScalaSDC - EinfĂźhrung in Scala
SDC - EinfĂźhrung in Scala
 

More from Sean Cribbs

Riak (Øredev nosql day)
Riak (Øredev nosql day)Riak (Øredev nosql day)
Riak (Øredev nosql day)
Sean Cribbs
 
Introduction to Riak and Ripple (KC.rb)
Introduction to Riak and Ripple (KC.rb)Introduction to Riak and Ripple (KC.rb)
Introduction to Riak and Ripple (KC.rb)
Sean Cribbs
 
Content Management That Won't Rot Your Brain
Content Management That Won't Rot Your BrainContent Management That Won't Rot Your Brain
Content Management That Won't Rot Your Brain
Sean Cribbs
 

More from Sean Cribbs (17)

Eventually Consistent Data Structures (from strangeloop12)
Eventually Consistent Data Structures (from strangeloop12)Eventually Consistent Data Structures (from strangeloop12)
Eventually Consistent Data Structures (from strangeloop12)
 
Eventually-Consistent Data Structures
Eventually-Consistent Data StructuresEventually-Consistent Data Structures
Eventually-Consistent Data Structures
 
A Case of Accidental Concurrency
A Case of Accidental ConcurrencyA Case of Accidental Concurrency
A Case of Accidental Concurrency
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with Ripple
 
Riak with node.js
Riak with node.jsRiak with node.js
Riak with node.js
 
Schema Design for Riak (Take 2)
Schema Design for Riak (Take 2)Schema Design for Riak (Take 2)
Schema Design for Riak (Take 2)
 
Riak (Øredev nosql day)
Riak (Øredev nosql day)Riak (Øredev nosql day)
Riak (Øredev nosql day)
 
Riak Tutorial (Øredev)
Riak Tutorial (Øredev)Riak Tutorial (Øredev)
Riak Tutorial (Øredev)
 
The Radiant Ethic
The Radiant EthicThe Radiant Ethic
The Radiant Ethic
 
Introduction to Riak and Ripple (KC.rb)
Introduction to Riak and Ripple (KC.rb)Introduction to Riak and Ripple (KC.rb)
Introduction to Riak and Ripple (KC.rb)
 
Riak with Rails
Riak with RailsRiak with Rails
Riak with Rails
 
Schema Design for Riak
Schema Design for RiakSchema Design for Riak
Schema Design for Riak
 
Introduction to Riak - Red Dirt Ruby Conf Training
Introduction to Riak - Red Dirt Ruby Conf TrainingIntroduction to Riak - Red Dirt Ruby Conf Training
Introduction to Riak - Red Dirt Ruby Conf Training
 
Introducing Riak and Ripple
Introducing Riak and RippleIntroducing Riak and Ripple
Introducing Riak and Ripple
 
Story Driven Development With Cucumber
Story Driven Development With CucumberStory Driven Development With Cucumber
Story Driven Development With Cucumber
 
Of Rats And Dragons
Of Rats And DragonsOf Rats And Dragons
Of Rats And Dragons
 
Content Management That Won't Rot Your Brain
Content Management That Won't Rot Your BrainContent Management That Won't Rot Your Brain
Content Management That Won't Rot Your Brain
 

Recently uploaded

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 

Recently uploaded (20)

Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 

Round PEG, Round Hole - Parsing Functionally

  • 1. Round PEG, Round Hole parsing(functionally()) Sean Cribbs Basho Technologies
  • 2.
  • 3. Feature: Accepting invitations In order to open my account As a teacher or staff member I should be able to accept an email invitation Scenario Outline: Accept invitation Given I have received an email invitation at "joe@school.edu" When I follow the invitation link in the email And I complete my details as a <type> Then I should be logged in And the invitation should be accepted Examples: | type | | staff | | teacher | Scenario: Bad invitation code Given I have received an email invitation at "joe@school.edu" When I use a bad invitation code Then I should be notied that the invitation code is invalid
  • 6. OK, I’ll convert to leex and yecc
  • 7. Denitions. D = [0-9] IDENT = [a-z|A-Z|0-9|_|-] Rules. _ : {token, {underscore, TokenLine, TokenChars}}. - : {token, {dash, TokenLine, TokenChars}}. % : {token, {tag_start, TokenLine, TokenChars}}. . : {token, {class_start, TokenLine, TokenChars}}. # : {token, {id_start, TokenLine, TokenChars}}. {D}+ : {token, {number, TokenLine, list_to_integer(TokenChars)}}. '(^.|.|[^'])*' : S = lists:sublist(TokenChars, 2, TokenLen - 2), {token, {string, TokenLine, S}}. {IDENT}+ : {token, {chr, TokenLine, TokenChars}}. { : {token, {lcurly, TokenLine, TokenChars}}. } : {token, {rcurly, TokenLine, TokenChars}}. [ : {token, {lbrace, TokenLine, TokenChars}}. ] : {token, {rbrace, TokenLine, TokenChars}}. @ : {token, {at, TokenLine, TokenChars}}. , : {token, {comma, TokenLine, TokenChars}}. ' : {token, {quote, TokenLine, TokenChars}}. : : {token, {colon, TokenLine, TokenChars}}. / : {token, {slash, TokenLine, TokenChars}}. ! : {token, {bang, TokenLine, TokenChars}}. ( : {token, {lparen, TokenLine, TokenChars}}. ) : {token, {rparen, TokenLine, TokenChars}}. | : {token, {pipe, TokenLine, TokenChars}}. < : {token, {lt, TokenLine, TokenChars}}. > : {token, {gt, TokenLine, TokenChars}}. s+ : {token, {space, TokenLine, TokenChars}}. Erlang code.
  • 8. Rootsymbol template_stmt. template_stmt -> doctype : '$1'. template_stmt -> var_ref : '$1'. template_stmt -> iter : '$1'. template_stmt -> fun_call : '$1'. template_stmt -> tag_decl : '$1'. %% doctype selector doctype -> bang bang bang : {doctype, "Transitional", []}. doctype -> bang bang bang space : {doctype, "Transitional", []}. doctype -> bang bang bang space doctype_name : {doctype, '$5', []}. doctype -> bang bang bang space doctype_name space doctype_name : {doctype, '$5', '$7'}. doctype_name -> doctype_name_elem doctype_name : '$1' ++ '$2'. doctype_name -> doctype_name_elem : '$1'. doctype_name_elem -> chr : unwrap('$1'). doctype_name_elem -> dash : "-". doctype_name_elem -> class_start : ".". doctype_name_elem -> number : number_to_list('$1').
  • 10. doctype -> bang bang bang. doctype -> bang bang bang space. doctype -> bang bang bang space doctype_name. doctype -> bang bang bang space doctype_name space doctype_name.
  • 11. tokens, blah doctype -> bang bang bang. doctype -> bang bang bang space. doctype -> bang bang bang space doctype_name. doctype -> bang bang bang space doctype_name space doctype_name.
  • 12. tokens, blah doctype -> bang bang bang. doctype -> bang bang bang space. doctype -> bang bang bang space doctype_name. doctype -> bang bang bang space doctype_name space doctype_name. excessively explicit
  • 15. Parsing Expression Grammars • Brian Ford 2002 Thesis and related papers • Direct representation of parsing functions, like TDPL • Superset of Regular Expressions • Ordered Choice removes ambiguities
  • 16. Dangling Else Problem if A then if B then C else D if A then if B then C else D if A then if B then C else D
  • 17. Parsing Expressions are Functions
  • 18. PE → Function ef sequence(e, f)
  • 19. PE → Function e/f choice(e, f)
  • 20. PE → Function e+ one_or_more(e)
  • 22. PE → Function e* zero_or_more(e)
  • 23. PE → Function !e not(e)
  • 24. PE → Function &e assert(e)
  • 25. PE → Function e? optional(e)
  • 26. PE → Function tag:e label(“tag”, e)
  • 27. PE → Function “some text” string(“some text”)
  • 28. PE → Function [A-Z] character_class(“[A-Z]”)
  • 29. PE → Function . anything()
  • 30. PE → Function “start” e f* sequence(string(“start”), e, zero_or_more(f))
  • 31. PE → Function “start” e f* p_seq([p_string(“start”), fun e/2, p_zero_or_more(fun f/2)])
  • 32. yeccpars0(Tokens, Tzr, State, States, Vstack) -> try yeccpars1(Tokens, Tzr, State, States, Vstack) catch error: Error -> Stacktrace = erlang:get_stacktrace(), try yecc_error_type(Error, Stacktrace) of {syntax_error, Token} -> yeccerror(Token); {missing_in_goto_table=Tag, Symbol, State} -> Desc = {Symbol, State, Tag}, erlang:raise(error, {yecc_bug, ?CODE_VERSION, Desc}, Stacktrace) catch _:_ -> erlang:raise(error, Error, Stacktrace) end; %% Probably thrown from return_error/2: throw: {error, {_Line, ?MODULE, _M}} = Error -> Error end. yecc_error_type(function_clause, [{?MODULE,F,[State,_,_,_,Token,_,_]} | _]) -> case atom_to_list(F) of "yeccpars2" ++ _ -> {syntax_error, Token}; "yeccgoto_" ++ SymbolL -> {ok,[{atom,_,Symbol}],_} = erl_scan:string(SymbolL), {missing_in_goto_table, Symbol, State} end. yeccpars1([Token | Tokens], Tzr, State, States, Vstack) -> yeccpars2(State, element(1, Token), States, Vstack, Token, Tokens, Tzr); HUH? yeccpars1([], {{F, A},_Line}, State, States, Vstack) -> case apply(F, A) of {ok, Tokens, Endline} -> yeccpars1(Tokens, {{F, A}, Endline}, State, States, Vstack); {eof, Endline} -> yeccpars1([], {no_func, Endline}, State, States, Vstack); {error, Descriptor, _Endline} -> {error, Descriptor} end; yeccpars1([], {no_func, no_line}, State, States, Vstack) -> Line = 999999, yeccpars2(State, '$end', States, Vstack, yecc_end(Line), [], {no_func, Line}); yeccpars1([], {no_func, Endline}, State, States, Vstack) -> yeccpars2(State, '$end', States, Vstack, yecc_end(Endline), [], {no_func, Endline}). %% yeccpars1/7 is called from generated code. %% %% When using the {includele, Includele} option, make sure that %% yeccpars1/7 can be found by parsing the le without following %% include directives. yecc will otherwise assume that an old %% yeccpre.hrl is included (one which denes yeccpars1/5). yeccpars1(State1, State, States, Vstack, Token0, [Token | Tokens], Tzr) -> yeccpars2(State, element(1, Token), [State1 | States], [Token0 | Vstack], Token, Tokens, Tzr); yeccpars1(State1, State, States, Vstack, Token0, [], {{_F,_A}, _Line}=Tzr) -> yeccpars1([], Tzr, State, [State1 | States], [Token0 | Vstack]); yeccpars1(State1, State, States, Vstack, Token0, [], {no_func, no_line}) -> Line = yecctoken_end_location(Token0), yeccpars2(State, '$end', [State1 | States], [Token0 | Vstack], yecc_end(Line), [], {no_func, Line}); yeccpars1(State1, State, States, Vstack, Token0, [], {no_func, Line}) ->
  • 34. PEG Functions are Higher-Order
  • 35. Higher-Order Functions • Currying - Functions that return functions • Composition - Functions that take functions as arguments
  • 36. Higher-Order in Parsing • p_optional(fun e/2) -> fun(Input, Index) • Receives current input and index/offset • Returns {fail,NewIndex}or {Result, Rest, Reason}
  • 37. Combinators vs. Parsing Functions p_optional(fun e/2) -> fun(Input, Index)
  • 38. How to Design a Parser
  • 39. Recursive Descent • Functions call other functions to recognize and consume input • Backtrack on failure and try next option
  • 40. Predictive Recursive Descent • Functions call other functions to recognize and consume input • Stream lookahead to determine which branch to take (rsts, follows) • Fail early, retry very little
  • 42. Predictive Parsers are Expensive to Build
  • 44. “Packrat” Parsers - O(N) • Works and looks like a Recursive Descent Parser • Memoizes (remembers) intermediate results - success and fail • Trades speed for memory consumption
  • 45. Memoization is a Parsing Function memoize(Rule, Input, Index, Parser)-> {fail, Reason} | {Result, Rest, NewIndex}
  • 46. memoize(Rule, Input, Index, Parser) -> case get_memo(Rule, Index) of undened -> store_memo(Rule, Index, Parser(Input, Index)); Memoized -> Memoized end.
  • 47. How Memoization Helps if_stmt <- “if” expr “then” stmt “else” stmt / “if” expr “then” stmt
  • 48. if_stmt <- “if” expr “then” stmt “else” stmt / “if” expr “then” stmt if A then if B then C else D
  • 49. if_stmt <- “if” expr “then” stmt “else” stmt / “if” expr “then” stmt if A then if B then C else D
  • 50. if_stmt <- “if” expr “then” stmt “else” stmt / “if” expr “then” stmt if A then if B then C else D
  • 51. if_stmt <- “if” expr “then” stmt “else” stmt / “if” expr “then” stmt if A then if B then C else D
  • 52. if_stmt <- “if” expr “then” stmt “else” stmt / “if” expr “then” stmt memoized! if A then if B then C else D
  • 54. Neotoma • Denes a metagrammar • Generates packrat parsers • Transformation code inline • Memoization uses ETS • Parses and generates itself
  • 55. Let’s build a CSV parser
  • 56. rows <- row (crlf row)* / ‘’;
  • 57. rows <- row (crlf row)* / ‘’; row <- eld (eld_sep eld)* / ‘’;
  • 58. rows <- row (crlf row)* / ‘’; row <- eld (eld_sep eld)* / ‘’; eld <- quoted_eld / (!eld_sep !crlf .)*;
  • 59. rows <- row (crlf row)* / ‘’; row <- eld (eld_sep eld)* / ‘’; eld <- quoted_eld / (!eld_sep !crlf .)*; quoted_eld <- ‘“‘ (‘“”’ / !’”’ .)* ‘“‘;
  • 60. rows <- row (crlf row)* / ‘’; row <- eld (eld_sep eld)* / ‘’; eld <- quoted_eld / (!eld_sep !crlf .)*; quoted_eld <- ‘“‘ (‘“”’ / !’”’ .)* ‘“‘; eld_sep <- ‘,’; crlf <- ‘rn’ / ‘n’;
  • 61. rows <- head:row tail:(crlf row)* / ‘’; row <- head:eld tail:(eld_sep eld)* / ‘’; eld <- quoted_eld / (!eld_sep !crlf .)*; quoted_eld <- ‘“‘ string:(‘“”’ / !’”’ .)* ‘“‘; eld_sep <- ‘,’; crlf <- ‘rn’ / ‘n’;
  • 62. rows <- head:row tail:(crlf row)* / ‘’ ` case Node of [] -> []; [“”] -> []; _ -> Head = proplists:get_value(head, Node), Tail = [Row || [_CRLF,Row] <- proplists:get_value(tail, Node)], [Head|Tail] end `;
  • 65. FP + PEG + Packrat = WIN • Clear relationship between grammar and generated code • Powerful abstractions, declarative code • Simpler to understand and implement • Terminals in context, no ambiguity

Editor's Notes