Parsing

    Parsing: Presentation Transcript

    • 1 Higher-Order Parsing
      Advanced Parsing Techniques (for Perl)
      Mark Dominus, Plover Systems Co., mjd-perl-hop@plover.com
      Version 2.0, July 2007
      Note: the slides you have are an abridged version. Complete ones are available from:
          http://pic.plover.com/TPC/Parsing/Parsing.pdf
          http://pic.plover.com/TPC/Parsing/Parsing-2up.pdf
          http://pic.plover.com/TPC/Parsing/Parsing-4up.pdf
      Please enjoy, but do not distribute electronically.
      Copyright © 2007 M. J. Dominus
    • 2 Parsing
      Parsing is the process of taking an unstructured input, such as a sequence of characters, and turning it into a data structure, such as a record, an object, or a value.
      For example: read the configuration file and build an object that represents the configuration.
    • 3 Open vs. closed systems
      Some people like closed systems:
          The system should just do all the stuff you need it to
          It should have a feature for every use-case
          You should be able to use it without understanding what it is doing
      Example: Microsoft Windows
    • 4 Open vs. closed systems
      I prefer open systems:
          The system should provide modules for doing simple common things
          The modules should be composable into specialized assemblages
          It should be possible to assemble a solution for every use-case
          It should be easy to build new modules
      Example: Unix
    • 5 Open vs. closed systems
      Benefit of open systems: flexible, powerful, unlimited
      Drawback: requires more understanding
      Parse::RecDescent is a really excellent closed system
      We’re going to see an open one, HOP::Parser
    • 6 Example: graphing program
      Suppose we want to read a web user’s input
      It will be a mathematical function, like:
          (x^2 + 3*x) * sin(x * 2) + 14
      We will emit a web page with a graph of their function
      Easy solution: use eval to turn user input into compiled Perl code
    • 7 Example: graphing program
          my $function = eval $code;
          my $y = $function->($x);
      I don’t need to explain all the things that can go wrong here, do I?
      Even if it could be made safe, it has some problems:
          (x^2 + 3*x) * sin(x * 2) + 14
      In Perl, ^ means bitwise exclusive or, not exponentiation
      Alternative: implement an evaluator for expressions
          Input: string
          Output: compiled code, or an abstract syntax tree, or a specialized data structure, or an Expression object, ...
    • 8 Lexing
      First, our program must identify things like numbers
      Idea: preprocess the input, turning it from a character string into a list of tokens
      Each token is an atomic piece of input. Examples: sin, x, +, 12345
      Humans do this when they read:
          First, turn the sequence of characters into a sequence of words
          Then, try to understand the structure of the sentence based on the meanings of the words
      This is called lexing
    • 9 Lexing
      Lexing is mostly a matter of simple pattern matching
      Perl actually has special regex features just for this purpose
    • 10 Lexing
      Consider this:
          $s =~ /pattern/;
      This normally locates the first appearance of pattern
      In lexing, we want to scan the entire string left to right:
          "sin(x * 2)"
      Having recognized sin, we then want to look at the next position, try various things at that position, and eventually figure out that the next token is (
      Perl has supported that since about 1989. It’s /.../g
    • 11 /.../g
      Each scalar in Perl can have a "current match position"
      Usually, this is ignored, but matching the scalar with /.../g uses it
      Instead of starting at the beginning, matching starts at the current position
      Afterward, the current position is updated to the end of the match, or reset if the match fails
          while ("parliamentarian" =~ /(.*?a)/g) {
              print "<$1>\n";
          }
      This prints:
          <pa>
          <rlia>
          <menta>
          <ria>
      Without /g, this is an infinite loop: it just prints <pa> over and over
      Point of interest: the pos() function can get and set the matching position
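The match-position behavior described on this slide can be observed directly with pos(). The following is a minimal sketch, not part of the original slides; the variable names `@seen` and `$after_reset` are illustrative only.

```perl
use strict;
use warnings;

my $s = "parliamentarian";
my @seen;

# In scalar context, each /g match resumes where the previous one ended.
while ($s =~ /(.*?a)/g) {
    push @seen, [$1, pos($s)];   # pos() is the offset just past the match
}

printf "<%s> ends at offset %d\n", @$_ for @seen;

# pos() can also be assigned to: restart matching from offset 2.
pos($s) = 2;
my $matched     = $s =~ /(.*?a)/g;
my $after_reset = $matched ? $1 : undef;
print "after pos(\$s) = 2: <$after_reset>\n";
```

When the loop's final match fails, the position is reset, which is why the script has to assign pos() explicitly before matching again.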
    • 12 \G
      /.../g isn’t quite what we want. Consider:
          if ("sin(37)" =~ /(\d+)/g) {
              print "<$1>\n";
          }
      This does find the 37 on the first try, but we don’t want that
      We want to find sin on the first try, then ( on the second try, then 37 on the third try, and only on the third try
      Like all regex matches, /.../g will search rightwards
    • 13 \G
          if ("sin(37)" =~ /(\d+)/g) {
              print "<$1>\n";
          }
      We need something like /^.../
      Except that instead of anchoring the match to the beginning of the string, it should be anchored to the current matching position
      This is what \G does:
          if ("sin(37)" =~ /\G(\d+)/g) {
              # Match fails unless current position was at the 37
              print "<$1>\n";
          }
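To make the anchoring concrete, here is a small sketch, not from the original slides, that tries the same digit pattern with and without \G at different positions. The variable names are illustrative, and the `//` operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my $s = "sin(37)";

# Unanchored: the regex engine slides rightwards until it finds digits.
pos($s) = 0;
my $unanchored = $s =~ /(\d+)/g ? $1 : undef;

# Anchored with \G: the match must begin exactly at pos($s), here offset 0.
pos($s) = 0;
my $anchored_at_0 = $s =~ /\G(\d+)/g ? $1 : undef;

# Move the position to offset 4, where "37" begins, and try again.
pos($s) = 4;
my $anchored_at_4 = $s =~ /\G(\d+)/g ? $1 : undef;

printf "unanchored:    %s\n", $unanchored    // "no match";
printf "\\G at offset 0: %s\n", $anchored_at_0 // "no match";
printf "\\G at offset 4: %s\n", $anchored_at_4 // "no match";
```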
    • 14 /.../c
      There’s one more problem we need to solve. Our lexer will look something like this:
          return "NUMBER" if $s =~ m/\G (\d+) /gx;
          return "VAR"    if $s =~ m/\G (\w+) /gx;
          ...
      If the first match fails, it destroys the current matching position, but the second one needs that position
      Possible solution: use pos() to save the position before each match, then use pos() again to restore it after
      Better solution: /.../c
      With /.../c, the position is retained if the match fails
      We have now seen all the weird regex features we need
      (Everyone already knows about /.../x, right?)
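The difference /c makes can be checked by inspecting pos() after a failed match. This is a minimal sketch, not from the original slides, with illustrative variable names.

```perl
use strict;
use warnings;

my $s = "x+1";

# Without /c: a failed /g match resets pos() to undef.
pos($s) = 0;
my $ok1 = $s =~ /\G\d+/g;        # fails: "x" is not a digit
my $pos_without_c = pos($s);

# With /c: a failed match leaves pos() where it was.
pos($s) = 0;
my $ok2 = $s =~ /\G\d+/gc;       # fails again
my $pos_with_c = pos($s);

# So the next alternative can be tried at the same position.
my $var = $s =~ /\G(\w+)/gc ? $1 : undef;

print "without /c: pos is ", (defined $pos_without_c ? $pos_without_c : "undef"), "\n";
print "with /c:    pos is ", (defined $pos_with_c ? $pos_with_c : "undef"), "\n";
print "next token candidate: $var\n";
```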
    • 15 Lexing
          my %builtin = (sin => 1, cos => 1, sqrt => 1);
          sub lex {
              my $s = shift;
              my @t;
              TOKEN: while (1) {
                  last if $s =~ m/\G\z/gxc;
                  push @t, ["NUMBER", $1] if $s =~ m/\G (\d+) /gxc;
                  push @t, $builtin{$1} ? ["FUNCTION", $1] : ["VAR", $1]
                                    if $s =~ m/\G ([A-Za-z]\w*) /gxc;
                  push @t, ["^"]    if $s =~ m/\G ( \^ | \*\* ) /gxc;
                  push @t, ["+"]    if $s =~ m/\G \+ /gxc;
                  push @t, ["*"]    if $s =~ m/\G \* /gxc;
                  push @t, ["("]    if $s =~ m/\G \( /gxc;
                  push @t, [")"]    if $s =~ m/\G \) /gxc;
                  next TOKEN        if $s =~ m/\G \s+ /gxc;
                  die "Unknown character '$1' at ..."
                                    if $s =~ m/\G (.) /gxc;
              }
              return @t;
          }
      This looks hairy, but most of the code is quite straightforward
      Note that it scans the entire input and returns a list of tokens
      A minor variation avoids this defect
    • 16 Lexing
          my %builtin = (sin => 1, cos => 1, sqrt => 1);
          sub make_lexer {
              my $s = shift;
              my $lexer = sub {
                  TOP: {
                      return undef             if $s =~ m/\G\z/gxc;
                      return ["NUMBER", $1]    if $s =~ m/\G (\d+) /gxc;
                      return $builtin{$1} ? ["FUNCTION", $1] : ["VAR", $1]
                                               if $s =~ m/\G ([A-Za-z]\w*) /gxc;
                      return ["^"]             if $s =~ m/\G ( \^ | \*\* ) /gxc;
                      return ["+"]             if $s =~ m/\G \+ /gxc;
                      return ["*"]             if $s =~ m/\G \* /gxc;
                      return ["("]             if $s =~ m/\G \( /gxc;
                      return [")"]             if $s =~ m/\G \) /gxc;
                      redo TOP                 if $s =~ m/\G \s+ /gxc;
                      die "Unknown character '$1' at ..."
                                               if $s =~ m/\G (.) /gxc;
              }};
              return $lexer;
          }
      make_lexer() captures the input in $s
      It then builds an anonymous function ("thunk")
      When called, the thunk analyzes the next part of the string
      It locates and returns the next token
    • 17 Lexing
          sub make_lexer {
              my $s = shift;
              my $lexer = sub { ... }};
              return $lexer;
          }
      To use this:
          my $lexer = make_lexer($input);
          while (my $token = $lexer->()) {
              my ($type, $value) = @$token;
              # do something with $type and $value
          }
      It may be convenient to have:
          sub type  { my $token = shift; $token->[0] }
          sub value { my $token = shift; $token->[1] }
      Then:
          while (my $token = $lexer->()) {
              if (type($token) eq "NUMBER") {
                  # do something with value($token)
              }
          }
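As a quick check of the on-demand lexer, the following sketch runs it over the graphing example from slide 6. It follows the make_lexer code from the slides, with two assumptions of mine: the elided error message is shortened, and the `^`/`**` alternation uses a non-capturing group. The `@tokens` array and the printing code are illustrative additions.

```perl
use strict;
use warnings;

my %builtin = (sin => 1, cos => 1, sqrt => 1);

sub make_lexer {
    my $s = shift;
    return sub {
        TOP: {
            return undef          if $s =~ m/\G\z/gxc;         # end of input
            return ["NUMBER", $1] if $s =~ m/\G (\d+) /gxc;
            return $builtin{$1} ? ["FUNCTION", $1] : ["VAR", $1]
                                  if $s =~ m/\G ([A-Za-z]\w*) /gxc;
            return ["^"]          if $s =~ m/\G (?: \^ | \*\* ) /gxc;
            return ["+"]          if $s =~ m/\G \+ /gxc;
            return ["*"]          if $s =~ m/\G \* /gxc;
            return ["("]          if $s =~ m/\G \( /gxc;
            return [")"]          if $s =~ m/\G \) /gxc;
            redo TOP              if $s =~ m/\G \s+ /gxc;       # skip whitespace
            die "Unknown character '$1'\n" if $s =~ m/\G (.) /gxc;
        }
    };
}

my $lexer = make_lexer("(x^2 + 3*x) * sin(x * 2) + 14");
my @tokens;
while (my $token = $lexer->()) {
    push @tokens, $token;
}

# Show each token as TYPE or TYPE:VALUE.
print join(" ", map { defined $_->[1] ? "$_->[0]:$_->[1]" : $_->[0] } @tokens), "\n";
```

Operator and parenthesis tokens carry only a type, so their value slot is undefined; only NUMBER, VAR, and FUNCTION tokens have a value.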
    • 18 Lexing
          sub make_lexer {
              my $s = shift;
              my $lexer = sub { ... }};
              return $lexer;
          }
      We’ll assume from now on that the tokens are in an array
      That’s mostly for convenience of exposition
      A better approach is to generate them on demand, as above
      But the implementation above makes it hard to go backward, and I didn’t want to get into the details of how to make it work
      So instead:
          sub first {
              my ($first, @rest) = @{shift()};
              return $first;
          }
          sub rest {
              my ($first, @rest) = @{shift()};
              return \@rest;    # an array reference, so it can be passed to the next parser
          }
    • 19 Lexing
          return ["^"] if $s =~ m/\G ( \^ | \*\* ) /gxc;
      Notice how the lexer can recognize both ^ and ** and eliminate the distinction
      This is a very common sort of behavior
      For example, Perl’s lexer effaces the distinction between for and foreach
      By the time the parser gets the token, it no longer knows which was written
      This tends to simplify the parser
      Similarly, the Perl lexer returns identical results for:
          "O'Reilly"
          qq{O'Reilly}
          'O\'Reilly'
          q.O'Reilly.
          "\x4f\x27\x52\x65\x69\x6c\x6c\x79"
      Also notice how ** is lexed as a power operator, not as two multiplication signs:
          return ["^"] if $s =~ m/\G ( \^ | \*\* ) /gxc;
          ...
          return ["*"] if $s =~ m/\G \* /gxc;
      Okay, we are finally ready to parse
    • 20 Grammars
          expression -> "(" expression ")"
                      | term ("+" expression | nothing)
          term       -> factor ("*" term | nothing)
          factor     -> atom ("^" NUMBER | nothing)
          atom       -> NUMBER
                      | VAR
                      | FUNCTION "(" expression ")"
      Typical expression: ( 4 + 3 ) * x * f(z)
    • 21 Recursive-descent parsing
      This is the method used by Parse::RecDescent too
      Idea: each grammar rule becomes a function
      The job of the function expression is to parse an expression
      If it succeeds, it returns a data structure representing the expression
      If not, it returns undef
    • 22 Recursive-descent parsing
      If you have a rule like this:
          expression -> "(" expression ")"
                      | term ("+" expression | nothing)
      You have functions called expression and term
      expression looks to see if the next token is (
      If so, it calls itself recursively, and then looks for the )
      Otherwise it calls term to look for a term
      If term fails, expression does too
      Otherwise it looks to see if there’s a + sign and another expression
    • 23 Recursive-descent parsing
      The description on the previous slide sounds complicated
      But there are only a few fundamental operations:
          Look for a certain token
          Look for either of x or y
          Look for x followed by y
          Look for nothing
      A HOP::Parser parser will be a function that takes a token array
      It examines some tokens
      If it likes what it sees, it constructs a value, then returns the value and the remaining tokens
      Otherwise, it returns undef
    • 24 Basic parsers
          expression -> "(" expression ")"
                      | term ("+" expression | nothing)
      The simplest parser is the one that corresponds to nothing
      It consumes no tokens and always succeeds:
          sub nothing {
              my $tokens = shift;
              return (undef, $tokens);
          }
      The undef here is a dummy value
    • 25 Token parsers
      The next simplest is a parser that looks for a particular token:
          sub lookfor_PLUS {
              my $tokens = shift;
              my $tok = first($tokens);
              if (type($tok) eq "+") {
                  return ("+", rest($tokens));
              } else {
                  return;    # failure
              }
          }
          sub lookfor_NUMBER {
              my $tokens = shift;
              my $tok = first($tokens);
              if (type($tok) eq "NUMBER") {
                  return (value($tok), rest($tokens));
              } else {
                  return;    # failure
              }
          }
      Note that the "value" returned by lookfor_NUMBER is the value of the number token it finds
    • 26 Token parsers
      Rather than write 9 similar lookfor functions, we’ll have another function build them as required:
          sub lookfor {
              my $target = shift;
              my $parser = sub {
                  my $tokens = shift;
                  my $tok = first($tokens);
                  if (type($tok) eq $target) {
                      return (value($tok), rest($tokens));
                  } else {
                      return;    # failure
                  }
              };
              return $parser;
          }
      Now instead of lookfor_PLUS we just use lookfor("+")
      Instead of lookfor_NUMBER we just use lookfor("NUMBER")
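Here is a small runnable sketch of lookfor applied to a hand-built token array. It is not from the original slides. It assumes that the token list is an array reference of [TYPE, VALUE] pairs and that rest returns an array reference, and it adds one guard of my own: running out of tokens counts as a failure.

```perl
use strict;
use warnings;

# Token helpers as on the earlier slides.
sub type  { my $token = shift; $token->[0] }
sub value { my $token = shift; $token->[1] }
sub first { my ($first, @rest) = @{ shift() }; return $first }
sub rest  { my ($first, @rest) = @{ shift() }; return \@rest }

sub lookfor {
    my $target = shift;
    return sub {
        my $tokens = shift;
        return unless @$tokens;               # assumption: no tokens left means failure
        my $tok = first($tokens);
        return unless type($tok) eq $target;  # failure is the empty list
        return (value($tok), rest($tokens));
    };
}

# Tokens for "37 + 5".
my $tokens = [ ["NUMBER", 37], ["+"], ["NUMBER", 5] ];

my ($value, $remaining) = lookfor("NUMBER")->($tokens);
my @failed              = lookfor("+")->($tokens);

print "value=$value, ", scalar(@$remaining), " tokens left\n";
print "lookfor(\"+\") at the start: ", (@failed ? "matched" : "failed"), "\n";
```

Because failure is signalled by returning the empty list, a caller can test a parser's result with a list assignment in boolean context, which is what conc and alt do on the following slides.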
    • 27 Concatenation
      Let’s pretend for a moment that atom has only this one rule:
          atom -> "FUNC" "(" expression ")"
      We could write atom like this:
          sub atom {
              my $t1 = shift;
              my ($funcname, $expr, $t2, $t3, $t4, $t5);
              # each list assignment is parenthesized because && binds tighter than =
              if (   (($funcname, $t2) = lookfor("FUNC")->($t1))
                  && ((undef, $t3)     = lookfor("(")->($t2))
                  && (($expr, $t4)     = expression($t3))
                  && ((undef, $t5)     = lookfor(")")->($t4))) {
                  my $val = ... $funcname ... $expr ...;
                  return ($val, $t5);
              } else {
                  return;    # failure
              }
          }
      But many other functions would also follow this pattern
      So instead we’ll write a function that assembles small parsers into big ones
      Given parser functions A, B, etc.:
          conc(A, B, ...)
      will return a parser function that looks for A, then B, etc.
    • 28 Concatenation
          sub conc {
              my @p = @_;
              my $parser = sub {
                  my $tokens = shift;
                  my @results;
                  for my $p (@p) {
                      my ($result, $t_new) = $p->($tokens)
                          or return;    # failure
                      push @results, $result;
                      $tokens = $t_new;
                  }
                  # all parsers succeeded
                  return (\@results, $tokens);
              };
              return $parser;
          }
      With this definition, atom becomes simply:
          $atom = conc(lookfor("FUNC"),
                       lookfor("("),
                       $expression,
                       lookfor(")"),
                      );
    • 29 Concatenation
      Similarly, the rule
          expression -> "(" expression ")"
      translates to:
          $expression = conc(lookfor("("),
                             $expression,
                             lookfor(")"),
                            );
      Oops, no, not quite
    • 30 Concatenation
          $expression = conc(lookfor("("),
                             $expression,
                             lookfor(")"),
                            );
      We can’t use $expression before we’ve defined it
      But we can pull a very sly trick
      We’ll define a proxy parser, to postpone use of $expression until call time:
          my $expression;
          my $EXPRESSION = sub { $expression->(@_) };
          $expression = conc(lookfor("("),
                             $EXPRESSION,
                             lookfor(")"),
                            );
      By the time $expression is actually called, it will be completely defined
    • 31 Alternation
      Atoms come in three varieties, not just one:
          atom -> NUMBER
                | VAR
                | function "(" expression ")"
      So we need the atom parser to try these three different things
      It fails only if the upcoming tokens match none of them
      Something like this:
          sub atom {
              my $in = shift;
              my ($result, $out);
              my $alt3 = conc(lookfor("FUNC"),
                              lookfor("("),
                              $EXPRESSION,
                              lookfor(")"),
                             );
              if (($result, $out) = lookfor("NUMBER")->($in)) {
                  return ($result, $out);
              } elsif (($result, $out) = lookfor("VAR")->($in)) {
                  return ($result, $out);
              } elsif (($result, $out) = $alt3->($in)) {
                  return ($result, $out);
              } else {
                  return;
              }
          }
      But again, many functions will follow this same pattern
      So instead we’ll write a function that assembles small parsers into big ones
      Given parser functions A, B, etc.:
          alt(A, B, ...)
      will return a parser function that looks for A or for B, etc.
    • 32 Alternation
          sub alt {
              my @p = @_;
              my $parser = sub {
                  my $in = shift;
                  for my $p (@p) {
                      if (my ($result, $out) = $p->($in)) {
                          return ($result, $out);
                      }
                  }
                  return;    # failure
              };
              return $parser;
          }
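The sketch below puts conc and alt together on a tiny atom-like parser. It is not from the original slides. The helper definitions follow the slides, with the assumptions that conc returns its results as an array reference and that an empty token list makes lookfor fail; `$mini_atom` is a hypothetical name, and it uses NUMBER in place of a full expression to stay short.

```perl
use strict;
use warnings;

sub type  { my $token = shift; $token->[0] }
sub value { my $token = shift; $token->[1] }
sub first { my ($first, @rest) = @{ shift() }; return $first }
sub rest  { my ($first, @rest) = @{ shift() }; return \@rest }

sub lookfor {
    my $target = shift;
    return sub {
        my $tokens = shift;
        return unless @$tokens;               # assumption: no tokens left means failure
        my $tok = first($tokens);
        return unless type($tok) eq $target;
        return (value($tok), rest($tokens));
    };
}

# Sequence: every parser must succeed, each starting where the last stopped.
sub conc {
    my @p = @_;
    return sub {
        my $tokens = shift;
        my @results;
        for my $p (@p) {
            my ($result, $t_new) = $p->($tokens) or return;
            push @results, $result;
            $tokens = $t_new;
        }
        return (\@results, $tokens);
    };
}

# Choice: the first parser that succeeds wins.
sub alt {
    my @p = @_;
    return sub {
        my $in = shift;
        for my $p (@p) {
            if (my ($result, $out) = $p->($in)) {
                return ($result, $out);
            }
        }
        return;
    };
}

# Hypothetical simplified atom: NUMBER | FUNCTION "(" NUMBER ")"
my $mini_atom = alt(lookfor("NUMBER"),
                    conc(lookfor("FUNCTION"), lookfor("("),
                         lookfor("NUMBER"),   lookfor(")")));

my ($call, $rest1) = $mini_atom->([ ["FUNCTION", "sin"], ["("], ["NUMBER", 37], [")"] ]);
my ($num,  $rest2) = $mini_atom->([ ["NUMBER", 5], ["+"] ]);
my @none           = $mini_atom->([ ["+"] ]);

print "call: $call->[0] applied to $call->[2]\n";
print "number: $num, tokens left: ", scalar(@$rest2), "\n";
print "atom at '+': ", (@none ? "matched" : "failed"), "\n";
```

The parentheses contribute undef to the result array because those tokens carry no value; the semantic-value slides later deal with turning such arrays into something useful.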
    • 33 Parsers
      With this definition, a complete definition of atom is:
          $atom = alt(lookfor("NUMBER"),
                      lookfor("VAR"),
                      conc(lookfor("FUNC"),
                           lookfor("("),
                           $EXPRESSION,
                           lookfor(")"),
                          ));
      Similarly, here’s $factor:
          # factor -> atom ("^" NUMBER | nothing)
          $factor = conc($ATOM,
                         alt(conc(lookfor("^"), lookfor("NUMBER")),
                             \&nothing));
      Here’s $term:
          # term -> factor ("*" term | nothing)
          $term = conc($FACTOR,
                       alt(conc(lookfor("*"), $TERM),
                           \&nothing));
    • 34 Parsers
      Here’s $expression:
          # expression -> "(" expression ")"
          #             | term ("+" expression | nothing)
          $expression = alt(conc(lookfor("("),
                                 $EXPRESSION,
                                 lookfor(")")),
                            conc($TERM,
                                 alt(conc(lookfor("+"), $EXPRESSION),
                                     \&nothing)));
      This doesn’t look great, but:
          1. When you consider how much it’s doing, it’s amazingly brief, and
          2. We can use operator overloading and rewrite it as:
          $expression = L("(") - $EXPRESSION - L(")")
                      | $TERM - (L("+") - $EXPRESSION | $nothing);
    • 35 Overloading
          $expression = L("(") - $EXPRESSION - L(")")
                      | $TERM - (L("+") - $EXPRESSION | $nothing);
      This looks almost exactly like the grammar rule we’re implementing:
          expression -> "(" expression ")"
                      | term ("+" expression | nothing)
      It also looks a lot like a Parse::RecDescent specification
      But it’s actually Perl code, not a limited sub-language
      I’ll use this notation from now on
    • 36 Complete expression parser
          use HOP::Parser qw(:overloading $nothing);
          my ($expression, $term, $factor, $atom);
          my $EXPRESSION = sub { $expression->(@_) };
          my $TERM       = sub { $term->(@_) };
          $atom = L("NUMBER")
                | L("VAR")
                | L("FUNCTION") - L("(") - $EXPRESSION - L(")");
          $factor = $atom - (L("^") - L("NUMBER") | $nothing);
          $term = $factor - (L("*") - $TERM | $nothing);
          $expression = L("(") - $EXPRESSION - L(")")
                      | $term - (L("+") - $EXPRESSION | $nothing);
      We need to use the capitalized versions when referring to parsers we haven’t defined yet
      Otherwise, either version will do
    • 37 Parsers
      So far we’ve done a bunch of work to build a parser system
      It has some modular, interchangeable parts
      We can use these to manufacture all kinds of parsers
      But it’s only gotten us to the same place that use Parse::RecDescent does
      Is there any other benefit?
      Yes. Parse::RecDescent is what it is, and goes only as far as it goes
      Our system is only getting started
    • 38 Optional items
      Regexes have a ? notation that means an item is optional
      We might want to say something like:
          term -> factor optional("*" term)
      We can define optional quite easily:
          sub optional {
              my $p = shift;
              return alt($p, $nothing);
          }
      Now this:
          $term = $factor - (L("*") - $TERM | $nothing);
      becomes this:
          $term = $factor - optional(L("*") - $TERM);
      Parse::RecDescent provides this with its (?) notation
    • 39 Repeated items
      Similarly, we can implement repeat
      Suppose $p looks for something like P
      Then repeat($p) will return a parser that looks for PPPPP, with zero or more P’s
      That is:
          repeat(P) -> P repeat(P) | nothing
      Since the definition of repeat uses itself, we’ll use the proxy parser technique:
          sub repeat {
              my $p = shift;
              my $repeat;
              my $REPEAT = sub { $repeat->(@_) };    # proxy parser
              $repeat = alt(conc($p, $REPEAT), $nothing);
              return $repeat;
          }
      Parse::RecDescent provides this with its (s?) notation
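To see what repeat actually returns, this sketch, which is not from the original slides, runs it over three NUMBER tokens followed by a "+". It uses the non-overloaded conc and alt and passes `\&nothing` where the slide uses the `$nothing` exported by HOP::Parser; the flattening loop and `@values` are illustrative additions.

```perl
use strict;
use warnings;

sub type  { my $token = shift; $token->[0] }
sub value { my $token = shift; $token->[1] }
sub first { my ($first, @rest) = @{ shift() }; return $first }
sub rest  { my ($first, @rest) = @{ shift() }; return \@rest }

sub nothing { my $tokens = shift; return (undef, $tokens) }

sub lookfor {
    my $target = shift;
    return sub {
        my $tokens = shift;
        return unless @$tokens;               # assumption: no tokens left means failure
        my $tok = first($tokens);
        return unless type($tok) eq $target;
        return (value($tok), rest($tokens));
    };
}

sub conc {
    my @p = @_;
    return sub {
        my $tokens = shift;
        my @results;
        for my $p (@p) {
            my ($result, $t_new) = $p->($tokens) or return;
            push @results, $result;
            $tokens = $t_new;
        }
        return (\@results, $tokens);
    };
}

sub alt {
    my @p = @_;
    return sub {
        my $in = shift;
        for my $p (@p) {
            if (my ($result, $out) = $p->($in)) { return ($result, $out) }
        }
        return;
    };
}

sub repeat {
    my $p = shift;
    my $repeat;
    my $REPEAT = sub { $repeat->(@_) };       # proxy parser, resolved at call time
    $repeat = alt(conc($p, $REPEAT), \&nothing);
    return $repeat;
}

my $numbers = repeat(lookfor("NUMBER"));
my ($nested, $left) = $numbers->([ ["NUMBER", 1], ["NUMBER", 2], ["NUMBER", 3], ["+"] ]);

# The result is nested pairs: [1, [2, [3, undef]]]. Walk it to collect the values.
my @values;
for (my $node = $nested; defined $node; $node = $node->[1]) {
    push @values, $node->[0];
}
print "values: @values; tokens left: ", scalar(@$left), "\n";
```

The nested-pair shape of the result is the reason the later slides introduce tagged nests and un_nest.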
    • 40 Lists
      Here’s something else that’s common in programming languages:
          Comma-separated expression lists
          Or semicolon-separated statement blocks
          Or ...
          sub list_of {
              my ($item, $separator) = @_;
              $separator = lookfor("COMMA") unless defined $separator;
              # repeat takes a single parser, so the separator-item pair is wrapped in conc
              conc($item,
                   repeat(conc($separator, $item)),
                   optional($separator));
          }
      Now comma-separated lists:
          $list = conc(lookfor("("),
                       list_of($EXPRESSION),
                       lookfor(")"));
      Semicolon-separated statement blocks:
          $block = conc(lookfor("{"),
                        list_of($STATEMENT, lookfor(";")),
                        lookfor("}"));
      We have now gone beyond Parse::RecDescent
    • 41 Labeled blocks
      In chapter 9 of Higher-Order Perl, I built a drawing system
      The input language to the drawing system contained constructions like:
          constraints { ... }
      And:
          define square extends rectangle { ... }
      So I used an even higher-level parser constructor:
          sub labeled_block {
              my ($header, $item, $separator) = @_;
              $separator = lookfor(";") unless defined $separator;
              conc($header,
                   lookfor("{"),
                   list_of($item, $separator),
                   lookfor("}"));
          }
      And defined more complex parsers with it:
          $constraint_block = labeled_block(L("CONSTRAINTS"), $constraint);
          $definition = labeled_block($Definition_header, $declaration);
    • 42 New tools
      We’ve built all this up just by gluing together a very few basic tools:
          lookfor()
          conc()
          alt()
      But the tools themselves were simple
      If we needed some new tool, we could build it
      For example, "look for A, but only if it doesn’t also look like B":
          sub this_but_not_that {
              my ($A, $B) = @_;
              my $parser = sub {
                  my $in = shift;
                  my ($res, $out) = $A->($in) or return;
                  if ($B->($in)) { return; }
                  return ($res, $out);
              };
              return $parser;
          }
    • 43 New tools
      Or "do what A does, but only if the result satisfies some condition":
          sub side_condition {
              my ($A, $condition) = @_;
              my $parser = sub {
                  my $in = shift;
                  my ($res, $out) = $A->($in) or return;
                  $condition->($res) or return;
                  return ($res, $out);
              };
              return $parser;
          }
    • 44 New tools
      Or "do what $A does, but only some number of times":
          sub repeat_n {
              my ($A, $min, $max) = @_;
              die ... unless defined $min;
              $max = $min unless defined $max;
              return sub {
                  my $input = shift;
                  my @results;
                  while (@results < $max) {
                      # parsers return (value, remaining tokens), in that order
                      my ($result, $rest) = $A->($input) or last;
                      push @results, $result;
                      $input = $rest;
                  }
                  return if @results < $min;    # failure: too few repetitions
                  return (new_nest(\@results), $input);
              };
          }
      We’ll see more examples of tool-building as we go on
    • 45 Open systems again
      Sorry to keep harping on this, but I think it’s important
          1. By providing a few interchangeable parts, we enable not only powerful parsers, but ways to build tools to build even more powerful parsers
          2. Since the tools themselves are simple, it’s easy to make new ones
      A small amount of effort put into new tools pays off big
      Functional programming style enables use of functions as modular components
      Functions like lookfor and conc manufacture other functions as needed
    • 46 Semantic values
      One essential matter I’ve left out until now
      We want our expression parser to evaluate expressions
      I talked about the parsing, but where’s the evaluation?
      Suppose I have a token list for an expression like 2 + 3 * 4
      I hand this token list to the $expression function
      I said it would return "a value". What does it actually return?
      Probably something like:
          [ 2, '+', [ 3, '*', 4 ] ]
      We wanted it to return 14
    • 47 Semantic values
      Let’s consider this token list:
          [ ['NUM', 2], ['*'], ['NUM', 3] ]
      This is a term. What if we pass this to the term parser?
          $term = conc($FACTOR,
                       alt(conc(lookfor("*"), $TERM),
                           \&nothing));
      The values for the various sub-parts are 2, "*", and 3
      The inner conc gets the latter two and returns ["*", 3]:
          sub conc {
              ...
                  for my $p (@p) {
                      my ($result, $t_new) = $p->($tokens)
                          or return;    # failure
                      push @results, $result;
                      $tokens = $t_new;
                  }
                  # all parsers succeeded
                  return (\@results, $tokens);
              };
              return $parser;
          }
      alt returns ["*", 3] unchanged
      The outer conc gets the 2 and the ["*", 3] and returns [2, ["*", 3]]
      That’s the return value from $term
      But we wanted it to return 6
    • 48 Semantic values
      The first thing to do is to un-nest the nested arrays from conc
      This is a little trickier than it sounds
      This doesn’t work:
          sub un_nest {
              my $pair = shift;
              my ($a, $b) = @$pair;
              return ($a, un_nest($b));
          }
      Problem 1: no base case for the recursion, so it never finishes
          sub un_nest {
              my $pair = shift;
              my ($a, $b) = @$pair;
              return ($a, ref($b) ? un_nest($b) : ($b));
          }
      Problem 2: now it terminates, but the test is all wrong
    • 49 Semantic values
      Solution: have conc tag its nests unambiguously:
          sub conc {
              ...
              return (new_nest(\@results), $tokens);
              ...
          }
          sub new_nest {
              my $items = shift;
              bless $items => "Hop::Parser::NEST";
          }
          sub is_nest {
              ref($_[0]) eq "Hop::Parser::NEST";
          }
          sub un_nest {
              my $pair = shift;
              return $pair unless is_nest($pair);
              my ($a, $b) = @$pair;
              return ($a, un_nest($b));
          }
    • 50 Semantic values
      Why did we bother to tag the output of conc? So we can do this:
          $term = conc($FACTOR,
                       alt(conc(lookfor("*"), $TERM),
                           \&nothing))
                  >> sub { my ($a, $b, $c) = @_;
                           if (defined $c) { return $a * $c }
                           else            { return $a }
                         };
      The >> operator takes a parser and a function
      It returns a new parser that parses the same as the original one
      But the value produced by the new parser is filtered through the function
      The value of $FACTOR becomes $a inside the function
      The value of lookfor("*"), if present, becomes $b
      The value of $TERM, if present, becomes $c
      The resulting parser takes (2, "*", 3) and returns 6
    • 51 Semantic values
          $term = $FACTOR - (L("*") - $TERM | $nothing)
                  >> sub { my ($a, $b, $c) = @_;
                           if (defined $c) { return $a * $c }
                           else            { return $a }
                         };
      How does >> work? It’s overloaded to call this transform function:
          sub transform {
              my ($p, $f) = @_;
              my $parser = sub {
                  my $tokens = shift;
                  my ($result, $t_new) = $p->($tokens)
                      or return;    # if $p fails, so does the transformed parser
                  my @result = un_nest($result);
                  return ($f->(@result), $t_new);
              };
              return $parser;
          }
      transform undoes the effects of the concatenation
      Then uses $f to filter the resulting list of values
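The following sketch wires transform, the tagged nests, and the term rule together without operator overloading, so the whole pipeline can be run on its own. It is not from the original slides and makes several assumptions: `$factor` is simplified to a bare NUMBER, transform fails when the underlying parser fails, an empty token list makes lookfor fail, and the variable names inside un_nest and the action are mine.

```perl
use strict;
use warnings;

sub type  { my $token = shift; $token->[0] }
sub value { my $token = shift; $token->[1] }
sub first { my ($first, @rest) = @{ shift() }; return $first }
sub rest  { my ($first, @rest) = @{ shift() }; return \@rest }

sub nothing { my $tokens = shift; return (undef, $tokens) }

sub lookfor {
    my $target = shift;
    return sub {
        my $tokens = shift;
        return unless @$tokens;               # assumption: no tokens left means failure
        my $tok = first($tokens);
        return unless type($tok) eq $target;
        return (value($tok), rest($tokens));
    };
}

# Nests are blessed so they can be told apart from ordinary array values.
sub new_nest { my $items = shift; bless $items => "Hop::Parser::NEST" }
sub is_nest  { ref($_[0]) eq "Hop::Parser::NEST" }
sub un_nest {
    my $pair = shift;
    return $pair unless is_nest($pair);
    my ($head, $tail) = @$pair;
    return ($head, un_nest($tail));
}

sub conc {
    my @p = @_;
    return sub {
        my $tokens = shift;
        my @results;
        for my $p (@p) {
            my ($result, $t_new) = $p->($tokens) or return;
            push @results, $result;
            $tokens = $t_new;
        }
        return (new_nest(\@results), $tokens);
    };
}

sub alt {
    my @p = @_;
    return sub {
        my $in = shift;
        for my $p (@p) {
            if (my ($result, $out) = $p->($in)) { return ($result, $out) }
        }
        return;
    };
}

sub transform {
    my ($p, $f) = @_;
    return sub {
        my $tokens = shift;
        my ($result, $t_new) = $p->($tokens) or return;
        return ($f->(un_nest($result)), $t_new);
    };
}

# term -> factor ("*" term | nothing), with factor simplified to NUMBER.
my $term;
my $TERM   = sub { $term->(@_) };
my $factor = lookfor("NUMBER");
$term = transform(
    conc($factor, alt(conc(lookfor("*"), $TERM), \&nothing)),
    sub { my ($x, $op, $y) = @_; defined $y ? $x * $y : $x },
);

my ($product, $left) = $term->([ ["NUMBER", 2], ["*"], ["NUMBER", 3], ["*"], ["NUMBER", 4] ]);
print "2 * 3 * 4 = $product\n";
```

In this sketch the "*" token contributes undef as its value, so the action decides between the two cases by checking whether the third value is defined, as the slide's action does.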
    • 52 Semantic values
      We can now tag parsers with arbitrary actions:
          my %VAR;
          my %builtin = (sin => sub { return sin($_[0]) },
                         cos => ...
                        );
          $atom = L("NUMBER")
                | L("VAR") >> sub { my $name = shift; $VAR{$name} }
                | L("FUNCTION") - L("(") - $EXPRESSION - L(")")
                    >> sub { my ($f_name, undef, $arg, undef) = @_;
                             my $f = $builtin{$f_name}
                                 or die "Unknown function name '$f_name'...\n";
                             return $f->($arg);
                           };
          $factor = $atom - (L("^") - L("NUMBER") | $nothing)
                    >> sub { my ($base, undef, $power) = @_;
                             $power = 1 unless defined $power;
                             return $base ** $power;
                           };
          ...
    • 53 Semantic values
          my %VAR;
      Hmm, we have no assignment expressions
          $expression = L("(") - $EXPRESSION - L(")") >> sub { $_[1] }
                      | $term
                      | $term - L("+") - $EXPRESSION >> sub { $_[0] + $_[2] }
                      | L("VAR") - L("=") - $EXPRESSION
                          >> sub { my ($name, $val) = @_[0, 2];
                                   $VAR{$name} = $val;
                                   return $val;
                                 }
      Our grammar here supports a = b = c = 3, just like Perl
    • 54 Left recursion
      My examples so far have involved + and *
      This is because they work OK as written so far
      Unfortunately, - and / introduce a technical issue
      Adding them looks straightforward at first. Something like this:
          $expression = L("VAR") - L("=") - $EXPRESSION
                          >> sub { return $VAR{$_[0]} = $_[2] }
                      | $TERM - L('+') - $EXPRESSION >> sub { $_[0] + $_[2] }
                      | $TERM - L('-') - $EXPRESSION >> sub { $_[0] - $_[2] }
                      | L("(") - $EXPRESSION - L(")") >> sub { $_[1] }
                      | $TERM
                      ;
      But this doesn’t quite work
    • 55 Left recursion
          $expression = ...
                      | $TERM - L('+') - $EXPRESSION >> sub { $_[0] + $_[2] }
                      | $TERM - L('-') - $EXPRESSION >> sub { $_[0] - $_[2] }
                      ... ;
      Consider 8 + 4 + 1. This is ambiguous
      Is it a term ("8") plus an expression ("4 + 1")?
      Or is it an expression ("8 + 4") plus a term ("1")?
      Our grammar says it’s the former
      The value of the latter expression ("4 + 1") is 5
      The value of the entire expression is 8 + 5 = 13
      No problem so far
    • 56 Left recursion
          $expression = ...
                      | $TERM - L('+') - $EXPRESSION >> sub { $_[0] + $_[2] }
                      | $TERM - L('-') - $EXPRESSION >> sub { $_[0] - $_[2] }
                      ... ;
      Now consider 8 - 4 - 1
      This is a term ("8") minus an expression ("4 - 1")
      The value of the latter expression ("4 - 1") is 3
      The value of the entire expression is 8 - 3 = 5
      Unfortunately, it was supposed to be 3, not 5
      Whoops
    • 57 Left recursion
          $expression = ...
                      | $TERM - L('+') - $EXPRESSION >> sub { $_[0] + $_[2] }
                      | $TERM - L('-') - $EXPRESSION >> sub { $_[0] - $_[2] }
                      ... ;
      Okay, no problem
      8 - 4 - 3 means ((8 - 4) - 3) and not (8 - (4 - 3))
      So we should have said that an expression is expression - term:
          $expression = ...
                      | $EXPRESSION - L('+') - $TERM >> sub { $_[0] + $_[2] }
                      | $EXPRESSION - L('-') - $TERM >> sub { $_[0] - $_[2] }
                      ... ;
      Unfortunately, this does not work at all
    • 58 Left recursion
          $expression = ...
                      | $EXPRESSION - L('+') - $TERM >> sub { $_[0] + $_[2] }
                      | $EXPRESSION - L('-') - $TERM >> sub { $_[0] - $_[2] }
                      ... ;
      What is the expression parser doing?
      If it gets to the new "+" rule, what will it do first?
      Why, it will call itself recursively, on the entire input stream
      The recursive call will also invoke the expression parser recursively, on the entire input stream
      The recursive call will also invoke the expression parser recursively, on the entire input stream
      The recursive call will also invoke the expression parser recursively, on the entire input stream
      The recursive call will also invoke the expression parser recursively, on the entire input stream
      ...
      This problem is called left recursion
    • 59 Left recursion
      I wish I had a good way to fix this
      Before I show you how to fix it, I want to answer a question I am afraid you might be thinking:
          Geez, why not just use Parse::RecDescent and avoid all these issues completely?
      Well, I already said why I like this thing better in spite of the technical issues
      But in this case, there’s a more compelling reason:
          #!/usr/bin/perl
          use Parse::RecDescent;
          my $g = Parse::RecDescent->new("a: a b | c");
      Output:
          ERROR: Rule "a" is left-recursive.
      This problem affects all recursive-descent parsing
      So let’s forge ahead
    • 60 Left recursion elimination
      Left recursion can be eliminated from a grammar
      Let’s consider this trivial left-recursive grammar:
          A -> Ab|c
      If this ever terminates, it’s because A is actually a c
      So the resulting sentence must begin with c
      It could be any of:
          c
          cb
          cbb
          cbbb
          ...
      Clearly, this grammar is equivalent:
          A -> c repeat(b)
    • 61 Left recursion elimination
      This simple rule:
          A -> Ab|c
      Becomes this:
          A -> c repeat(b)
      More generally, this:
          P -> Ps|Pt|x|y;
      Becomes this:
          P -> (x | y) repeat(s | t)
      We’ll rewrite our left-recursive expression rule and see what happens
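      The rewritten rule A -> c repeat(b) can be checked with a plain loop. This is a minimal sketch, not the talk's combinator library: match_A is a hypothetical helper, and tokens are assumed to be one-letter strings.

```perl
use strict;
use warnings;

# Hypothetical helper: recognize A -> c repeat(b) over a list of one-letter
# tokens. Returns the number of tokens consumed, or undef on failure.
sub match_A {
    my @tokens = @_;
    return unless @tokens && $tokens[0] eq 'c';     # the mandatory leading c
    my $i = 1;
    $i++ while $i < @tokens && $tokens[$i] eq 'b';  # repeat(b): zero or more b
    return $i;
}

my $consumed = match_A(qw(c b b));
print "consumed $consumed tokens\n";
```

      The loop never calls match_A recursively, so the infinite descent of the left-recursive form cannot occur.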
    • 62 Left recursion elimination
      The expression parser turns into something like this:
          $expression = $simple_expression - repeat($TERM_SUM);

          $simple_expression = L("VAR") - L("=") - $expression
                             | $TERM
                             | ...
                             ;

          $term_sum = L('+') - $TERM
                    | L('-') - $TERM
                    ;
      That wasn’t too bad
      8 - 4 - 3 parses as:
          the $simple_expression "8",
          followed by the $TERM_SUM "- 4",
          followed by the $TERM_SUM "- 3"
      But what do we do about the semantic values?
      What is the value of a $term_sum item, for example?
    • 63 Left recursion elimination
          $term_sum = L('+') - $TERM
                    | L('-') - $TERM
                    ;
      We’ll say that the "value" of - 4 is a function
      One that takes the rest of the expression, to be determined later
      and then subtracts the 4 from it
          $term_sum = L('+') - $TERM
                          >> sub { my $a = $_[1];
                                   return sub { $_[0] + $a };
                                 }
                    | L('-') - $TERM
                          >> sub { my $a = $_[1];
                                   return sub { $_[0] - $a };
                                 }
                    ;
    • 64 Left recursion elimination
          $expression = $simple_expression - repeat($TERM_SUM);
      The value of $TERM_SUM, say - 13, is a function like this:
          sub { $_[0] - 13 }
      The value of repeat($TERM_SUM) is a list of such functions
      (Actually a nest, but >> will take care of that)
      The value of $simple_expression is just a number, like before
      For example, when parsing 8 - 4 - 3, the $simple_expression is 8
      The $TERM_SUM items are
          sub { $_[0] - 4 }
          sub { $_[0] - 3 }
      We need to apply these functions to the initial 8 value in order:
          $expression = $simple_expression - repeat($TERM_SUM)
              >> sub { my ($left, @term_sum_functions) = @_;
                       my $result = $left;
                       for my $f (@term_sum_functions) {
                           $result = $f->($result);
                       }
                       return $result;   # the accumulated value is the value of the expression
                     };
      Then we need to similarly eliminate left-recursion from $term
      And handle multiplication and division along the way
      The result has a lot of code, but it’s formulaic
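      The left-to-right application of the $TERM_SUM functions can be tried on its own, without any parser. The following is a sketch under stated assumptions: make_term_sum and apply_in_order are hypothetical helper names that stand in for the semantic actions, not functions from the talk's module.

```perl
use strict;
use warnings;

# Build the "value" of a term_sum item such as "- 4": a function that
# waits for the left operand. make_term_sum is a hypothetical helper.
sub make_term_sum {
    my ($op, $right) = @_;
    return $op eq '+' ? sub { $_[0] + $right }
                      : sub { $_[0] - $right };
}

# Apply the functions to the initial value from left to right.
sub apply_in_order {
    my ($left, @term_sum_functions) = @_;
    my $result = $left;
    for my $f (@term_sum_functions) {
        $result = $f->($result);
    }
    return $result;
}

# 8 - 4 - 3 is evaluated as ((8 - 4) - 3)
my $value = apply_in_order(8, make_term_sum('-', 4), make_term_sum('-', 3));
print "$value\n";    # prints 1
```

      Because each function receives the running result as its left operand, the subtraction associates to the left, which is what the right-recursive grammar got wrong.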
    • 65 Operators
      Parsing arithmetic-type expressions is not too uncommon
          $term = $simple_term - repeat($atom_product);
          $expression = $simple_expression - repeat($term_sum);
      As I said, there’s a lot of formulaic code in the left-recursion elimination
      Solution to "a lot of formulaic code": Encapsulate it into a function
      The design of our module makes it easy to write an operator function that does this:
          $term = $simple_term
                | operator($ATOM, [lookfor(['*']), sub { $_[0] * $_[1] }],
                                  [lookfor(['/']), sub { $_[0] / $_[1] }])
                ;

          $expression = $simple_expression
                      | operator($TERM, [lookfor(['+']), sub { $_[0] + $_[1] }],
                                        [lookfor(['-']), sub { $_[0] - $_[1] }])
                      ;
    • 66 Operators
      From this, one can easily build an expression-parser-generator-function
      Give it an entire operator precedence table
      Let it generate the parser for expressions
      Something like this:
          $expression = expression_parser(
              ATOM => $ATOM,
              OPS  => [[lookfor(['^']), sub { $a ** $b }, "right-assoc"],
                       [lookfor(['*']), sub { $a * $b },
                        lookfor(['/']), sub { $a / $b }],
                       [lookfor(['+']), sub { $a + $b },
                        lookfor(['-']), sub { $a - $b }],
                      ],
          );
      This little bit of code writes a function that parses an input like 2 + 3 * 4 and calculates the result (14)
      It handles precedence, parentheses, associativity...
      All the complications are encapsulated inside of the function
    • 67 Parsing outlines
      Case study: parsing outlines
      An outline is a piece of text that looks like this:
          * Desserts
            + Pie
              o Custard
              o Fruit
                - Berry
                - Peach
            + Cookies
              o Chocolate chip
              o Oatmeal
            + Cake
              o Pineapple upside-down
              o Angel food
      There is an implied tree structure here:
          [ "Desserts",
            [ "Pie", "Custard", [ "Fruit", "Berry", "Peach" ]],
            [ "Cookies", "Chocolate chip", "Oatmeal" ],
            [ "Cake", "Pineapple upside-down", "Angel food" ]]
      Each bullet item parses as either a plain string
      (If it has no sub-items)
      Or as an array with a string and then some more bullet items
    • 68 Outline lexer
      Our previous lexers all ignored whitespace
      This is very typical
      But clearly inappropriate here!
      A line like this:
              o Chocolate chip
      Will turn into a token like this:
          [4, "o", "Chocolate chip"]
    • 69 Outline lexer
          sub make_outline_lexer {
              my $s = shift;
              my $lexer = sub {
                  # Match in scalar context so that /gc consumes one line per call
                  $s =~ /\G(.*)\n/gc or return;
                  my $next_line = $1;
                  # Leading whitespace, one bullet character, then the label
                  my ($ws, $bullet, $label) =
                      $next_line =~ /^(\s*)([#*.+ox-])\s*(.*)/
                          or die ...;
                  return [length($ws), $bullet, $label];
              };
              return $lexer;
          }
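      To see what tokens such a lexer produces, here is a standalone sketch that can be run as-is. The name make_outline_lexer_demo and the text of the die message are assumptions made for illustration; the token layout [indentation, bullet, label] is the one described in the slides.

```perl
use strict;
use warnings;

# Hypothetical standalone lexer: returns a closure that yields one
# [indent, bullet, label] token per call, or undef at end of input.
sub make_outline_lexer_demo {
    my $s = shift;
    return sub {
        # Scalar-context match: /g advances pos($s) by exactly one line
        $s =~ /\G(.*)\n/gc or return;
        my $line = $1;
        my ($ws, $bullet, $label) = $line =~ /^(\s*)([#*.+ox-])\s*(.*)/
            or die "Unrecognized outline line: '$line'\n";
        return [length($ws), $bullet, $label];
    };
}

my $lexer = make_outline_lexer_demo(
    "* Desserts\n  + Cookies\n    o Chocolate chip\n"
);
while (my $tok = $lexer->()) {
    print join(", ", @$tok), "\n";
}
```

      The indentation width is kept as the first field of each token because the parser, not the lexer, decides what the indentation means.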
    • 70 Outline grammar
      The basic grammar here is quite simple:
          tree -> item repeat(subtree)
          subtree -> tree
      The interesting issue is handling the indentation:
          tree -> item <<record indentation of item>> repeat(subtree)
          subtree -> <<check indentation of next item>> tree
      After we see the item at the root of the tree, record how far it is indented
      When parsing a subtree, check to make sure it is indented farther than the item we recorded last time
      If not, it isn’t a subtree, so the parsing should fail
      The next item will be parsed as a subtree of some other item
    • 71 Auxiliary action
      This handy utility peeks at the next token and performs an action
      If the action returns true, it continues; otherwise it fails
      Like nothing(), it never consumes any input
          sub action {
              my $action = shift;
              return sub {
                  my $input = shift;
                  my $next = first($input);                    # peek, do not consume
                  my $result = $action->($next, $input);
                  return $result || undef;
              };
          }
      So, for example:
          $parser = $P
                  | action(sub { print "It wasn't P, trying Q...\n" }) - $Q
                  ;
    • 72 Outline parser
          my @INDENT;

          sub record_indent {
              my $token = shift;
              push @INDENT, $token->[0];
              return $token;
          }

          sub erase_indent { pop @INDENT; }

          $tree = action(\&record_indent) - repeat($subtree) - action(\&erase_indent);

          # Succeed if current item indented
          sub check_indent {
              my $token = shift;
              return $token->[0] > $INDENT[-1];
          }

          $subtree = action(\&check_indent) - $tree;
    • 73 Outline parser semantic value
          $tree = action(\&record_indent)
                - (repeat($subtree) >> sub { my @subtrees = @_; \@subtrees; } )
                - action(\&erase_indent)
               >> sub { my ($root_item, $subtrees) = @_;
                        my $root_text = $root_item->[2];
                        if (@$subtrees) { return [$root_text, @$subtrees]; }
                        else { return $root_text; }
                      };
      We’re done
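      The same indentation rule can be sketched without the combinators, as an ordinary recursive function over a token array. This is only an illustration: parse_tree is a hypothetical name, tokens are assumed to be [indentation, bullet, label] triples, and the Perl call stack plays the role that @INDENT plays in the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one tree starting at $tokens->[$$pos].
# $pos is a reference to the current index into the token array.
sub parse_tree {
    my ($tokens, $pos) = @_;
    my $root = $tokens->[$$pos++];
    my ($indent, undef, $label) = @$root;
    my @subtrees;
    # A following item is a subtree only if it is indented farther than the root
    while ($$pos < @$tokens && $tokens->[$$pos][0] > $indent) {
        push @subtrees, parse_tree($tokens, $pos);
    }
    # Leaf items are plain strings; others are [label, subtrees...]
    return @subtrees ? [$label, @subtrees] : $label;
}

my @tokens = (
    [0, '*', 'Desserts'],
    [2, '+', 'Pie'],
    [4, 'o', 'Custard'],
    [4, 'o', 'Fruit'],
    [6, '-', 'Berry'],
    [6, '-', 'Peach'],
    [2, '+', 'Cookies'],
    [4, 'o', 'Oatmeal'],
);
my $pos  = 0;
my $tree = parse_tree(\@tokens, \$pos);
print "root: $tree->[0], first child: $tree->[1][0]\n";
```

      When the indentation check fails, the inner call simply returns and the item is picked up by an enclosing call, which mirrors "the next item will be parsed as a subtree of some other item".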
    • 74 Backtracking
      There’s one last major issue that I swept under the rug
      Here’s a simple demonstration:
          A -> PQ
          P -> a|ab
          Q -> c
      Suppose the input tokens are a b c
      Parsing should succeed, but...
      The A parser will first call P
      P will try the first alternative, a
      This will succeed, so it will return the remaining tokens, b c
      The A parser will then call Q
      This will fail, because the next token is b
    • 75 Backtracking
          A -> PQ
          P -> a|ab
          Q -> c
      After Q fails, we need the parser to backtrack
      It should return to the last alternation and try other alternatives
      But our implementation of alt doesn’t do that:
          sub alt {
              ...
              for my $p (@p) {
                  if (my ($result, $out) = $p->($in)) {
                      return ($result, $out);
                  }
              }
              ...
          }
      Once it selects an alternative, it is committed
      You can get a lot of stuff done without backtracking
      But sooner or later you’ll write a grammar that requires it to work properly
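      The failure on a b c can be reproduced with a tiny self-contained model of the committed design. Everything here is an assumption made for illustration: tok, seq and alt_committed are hypothetical names, and tokens are plain strings in an array reference rather than the talk's token streams.

```perl
use strict;
use warnings;

# Old-style parsers: take an array ref of tokens and return
# (value, remaining tokens) on success or the empty list on failure.
sub tok {
    my $want = shift;
    return sub {
        my $in = shift;
        return unless @$in && $in->[0] eq $want;
        return ($want, [ @$in[1 .. $#$in] ]);
    };
}

sub seq {
    my @p = @_;
    return sub {
        my $in = shift;
        my @values;
        for my $p (@p) {
            my ($v, $out) = $p->($in) or return;   # any failure fails the sequence
            push @values, $v;
            $in = $out;
        }
        return (\@values, $in);
    };
}

sub alt_committed {
    my @p = @_;
    return sub {
        my $in = shift;
        for my $p (@p) {
            if (my ($result, $out) = $p->($in)) {
                return ($result, $out);            # committed: never revisited
            }
        }
        return;
    };
}

# A -> PQ, P -> a|ab, Q -> c
my $P = alt_committed(tok('a'), seq(tok('a'), tok('b')));
my $A = seq($P, tok('c'));
my @r = $A->([qw(a b c)]);
print((@r ? "parsed" : "failed"), "\n");    # prints "failed"
```

      Reordering P as ab|a happens to rescue this particular input, but that is a property of this grammar and not a general fix.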
    • 76 Continuations
      There are a number of ways to add the required backtracking behavior
      One such is to change the structure of the parsers
      Instead of this:
          tokens -> result, tokens
      We will structure them like this:
          tokens, continuation -> result
      Each parser will get a continuation argument
      The continuation argument says what to do next
      When a parser succeeds, it does not return immediately
      Instead, it invokes the continuation parser on the remaining input
    • 77 Continuations
      The modified version of alt looks like this:
          sub alt {
              my @p = @_;
              my $parser = sub {
                  my ($in, $continuation) = @_;
                  for my $p (@p) {
                      my $result = $p->($in, $continuation);
                      return $result if defined $result;
                  }
                  return; # failure
              };
              return $parser;
          }
      The alt parser returns only when it’s sure the continuation succeeds
    • 78 Continuations
      Of course, all the other parsers must handle the continuation argument:
          sub lookfor {
              my $target = shift;
              my $parser = sub {
                  my ($tokens, $continuation) = @_;
                  my $tok = first($tokens);
                  if (type($tok) eq $target) {
                      return $continuation->(rest($tokens));
                  } else {
                      return; # failure
                  }
              };
              return $parser;
          }
      A lookfor parser now succeeds if the next token has the right type
      and if the continuation succeeds on the remainder of the input
    • 79 Continuations
      The nothing() parser is still simple:
          sub nothing {
              my ($tokens, $continuation) = @_;
              return $continuation->($tokens);
          }
    • 80 Continuations
      Concatenation gets more complicated
      Consider P -> ABC
      What is the continuation of A here?
      It’s a parser that will invoke B and pass it C as its continuation
          sub conc2 {
              my ($A, $B) = @_;
              my $parser = sub {  # Concatenation of A and B
                  my ($Aval, $Bval);
                  my ($tokens, $C) = @_;
                  my $A_cont = sub {
                      my $B_input = shift;
                      $Bval = $B->($B_input, $C);
                      return $Bval;
                  };
                  $Aval = $A->($tokens, $A_cont);
                  if (defined $Aval) {
                      return new_nest([$Aval, $Bval]);
                  } else {
                      return;
                  }
              };
              return $parser;
          }
    • 81 Concatenation
          sub conc2 {
              my ($A, $B) = @_;
              ...
              return $parser;
          }
      This suffices to implement the overloaded - operator
      Also if we still want full conc, we can do something like this:
          # "folded" version of conc2
          sub conc {
              my @p = @_;
              return \&nothing if @p == 0;
              return $p[0] if @p == 1;
              return conc2(@p) if @p == 2;
              my ($A, $B, @rest) = @p;
              return conc(conc2($A, $B), @rest);
          }
    • 82 Continuations
      We had to change the definitions of the four basic parser constructors
      But once this is done, everything else Just Works
      For example, repeat survives completely unchanged:
          sub repeat {
              my $p = shift;
              my $repeat;
              my $REPEAT = sub { $repeat->(@_) };  # Proxy parser
              $repeat = alt(conc($p, $REPEAT), $nothing);
              return $repeat;
          }
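      A reduced model of the continuation style shows the backtracking at work on the earlier grammar A -> PQ, P -> a|ab, Q -> c. This is a sketch under assumptions: tok_k, alt_k and conc2_k are hypothetical names, tokens are plain strings in an array reference, and the parsers only recognize input (they return a true value or undef) instead of building the nested values that the talk's conc2 produces.

```perl
use strict;
use warnings;

# Continuation-style parsers: ($tokens, $continuation) -> result,
# where undef means failure.
sub tok_k {
    my $want = shift;
    return sub {
        my ($in, $k) = @_;
        return unless @$in && $in->[0] eq $want;
        return $k->([ @$in[1 .. $#$in] ]);   # succeed only if the rest succeeds
    };
}

sub alt_k {
    my @p = @_;
    return sub {
        my ($in, $k) = @_;
        for my $p (@p) {
            my $result = $p->($in, $k);
            return $result if defined $result;
        }
        return;                              # every alternative failed
    };
}

sub conc2_k {
    my ($A, $B) = @_;
    return sub {
        my ($in, $k) = @_;
        # A's continuation runs B on whatever A leaves, then the original continuation
        return $A->($in, sub { $B->($_[0], $k) });
    };
}

# Final continuation: succeed only if all input was consumed
my $at_end = sub { @{ $_[0] } == 0 ? "ok" : undef };

my $P = alt_k(tok_k('a'), conc2_k(tok_k('a'), tok_k('b')));
my $A = conc2_k($P, tok_k('c'));
my $ok = defined $A->([qw(a b c)], $at_end);
print(($ok ? "parsed" : "failed"), "\n");    # prints "parsed"
```

      The first alternative of P matches a, but its continuation (Q, then end of input) fails on b c, so alt_k moves on to ab, whose continuation succeeds.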
    • 83 Error handling and recovery
      Our parsers so far have bad error-handling behavior
      Consider a parser for a programming language
      A syntax error 90% of the way through the file will cause backtracking
      The parser will try to find an alternative parse that makes the syntax error OK
      This could be very time-consuming
      And it is probably a waste of effort
      And the resulting failure will be reported at the beginning of the file
      We can do better
    • 84 Error handling and recovery
          $statement = $assignment_statement
                     | $compound_statement
                     | $quit_statement
                     | error("Can't parse statement", L("SEMICOLON"), $statement)
                     ;
      error(...) makes an error-handling parser
      When invoked, it issues the indicated error message
      Then it gobbles and discards input up to a SEMICOLON token
      Then it invokes $statement to try to continue parsing
          sub error {
              my ($msg, $resync, $continue) = @_;
              return sub {
                  my ($input, $old_continuation) = @_;
                  warn $msg;
                  # Discard tokens until $resync says upcoming input is OK
                  until ($resync->($input, $nothing)) {
                      $input = rest($input);
                  }
                  # Skip the resync token itself, then resume parsing
                  $continue->(rest($input), $old_continuation);
              };
          }
      Have I ranted enough yet about open systems that make it easy to build new utilities and tools?
    • 85 Error handling and recovery
      Another useful technique is to instrument the lexer
      It can annotate tokens with line number information
          sub make_annotating_lexer {
              my $s = shift;
              my $line = 0;
              my $lexer = sub {
                  TOP: {
                      if ($s =~ /\G\n/gc) { $line++; redo TOP; }
                      ...
                  }
              };
              # Wrap the lexer so that every token carries the current line number
              return sub {
                  my $tok = $lexer->();
                  return [@$tok, $line];
              };
          }
    • 86 Error handling and recovery
      If tokens are annotated with line number information, error can use it:
          sub error {
              my ($msg, $resync, $continue) = @_;
              return sub {
                  my ($input, $old_continuation) = @_;
                  warn "$msg at line ", $input->[-1], " of input\n";
                  ...
              };
          }
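      The two ideas, resynchronizing at a SEMICOLON and reporting the line number carried by the token, can be combined in a small loop-based sketch. It is not the continuation-based error parser from the slides: parse_statements, the statement rule IDENT EQUALS NUMBER SEMICOLON, and the [type, value, line] token layout are all assumptions chosen to keep the example self-contained.

```perl
use strict;
use warnings;

# Tokens are [type, value, line]. Hypothetical statement rule:
#   IDENT EQUALS NUMBER SEMICOLON
# Returns (array ref of [name, value] pairs, array ref of error messages).
sub parse_statements {
    my @tokens = @_;
    my (@assignments, @errors);
    my $i = 0;
    while ($i < @tokens) {
        my @want = qw(IDENT EQUALS NUMBER SEMICOLON);
        # Types of the next four tokens, padded with EOF past the end
        my @got = map { $i + $_ < @tokens ? $tokens[$i + $_][0] : 'EOF' } 0 .. 3;
        if ("@got" eq "@want") {
            push @assignments, [ $tokens[$i][1], $tokens[$i + 2][1] ];
            $i += 4;
            next;
        }
        push @errors, "Can't parse statement at line $tokens[$i][2]";
        # Resynchronize: discard tokens up to and including the next SEMICOLON
        $i++ while $i < @tokens && $tokens[$i][0] ne 'SEMICOLON';
        $i++;
    }
    return (\@assignments, \@errors);
}

my @tokens = (
    ['IDENT', 'x', 1], ['EQUALS', '=', 1], ['NUMBER', 1, 1], ['SEMICOLON', ';', 1],
    ['IDENT', 'y', 2], ['NUMBER', 2, 2], ['SEMICOLON', ';', 2],
    ['IDENT', 'z', 3], ['EQUALS', '=', 3], ['NUMBER', 3, 3], ['SEMICOLON', ';', 3],
);
my ($good, $bad) = parse_statements(@tokens);
print scalar(@$good), " statements parsed\n";
print "$_\n" for @$bad;
```

      The malformed statement on line 2 is reported once, and parsing resumes with the statement on line 3 instead of giving up or backtracking through the whole input.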
    • 87 Thank You!
      Talk slides:
          http://pic.plover.com/TPC/Parsing/Parsing.pdf
          http://pic.plover.com/TPC/Parsing/Parsing-2up.pdf
          http://pic.plover.com/TPC/Parsing/Parsing-4up.pdf
      Example code:
          http://hop.perl.plover.com/Examples/Chap8/
      Any questions?
    • 88 Bonus slides
      If this talk were a DVD, this would be the "bonus features" section
      Or maybe "deleted scenes"
      Or perhaps "blooper reel"
    • 89 Every program parses
      Perl has a lot of features that let you ignore the fact that you’re parsing
      This is a rudimentary parser:
          while (<$fh>) {
              # do something with $_
          }
      This would be more obvious if the code were in C
      As the input you’re parsing becomes more complicated, the code becomes more elaborate
      At some point it may exceed your ability to keep up with ad-hoc mechanisms
      So we have parsing systems like Parse::RecDescent
    • 90 Continuations
      Sometimes we want a version of alt that has the old behavior
      Once it sees an alternative that seems to work, it commits to it
      Often this is useful for efficiency reasons
      Consider:
          number -> '+' repeat(digit)
                  | '-' repeat(digit)
                  | '0x' repeat(digit)
                  ;
      Suppose the next token is in fact a +
      But the subsequent parsing fails
      There’s no point in having number try the other two alternatives
      This operation is called biased choice