Recursive Descent Parsing
In practice with PHP
Plan for the next 40 mins
1. Walk through creating a Parsing Expression Grammar
and scannerless predictive recursive descent parser
for a subset of print_r output.
2. Talk about why anyone would want to do such a
thing.
Source Code: https://bit.ly/dpc14rdp
Disclaimer: I am not …
I am Boy Baukema
Senior Software Engineer @ ibuildings.nl
print_r
(PHP 4, PHP 5)
print_r — Prints human-readable information about a
variable
An example
Array	
(	
[Talk] => Array	
(	
[Title] => Ansible: Orchestrate	
[Type] => 3
Just one problem…
It’s unparsable.
No escaping
> print_r(array("a"=>"\n [b] => evil"));
Array	
(	
[a] => 	
[b] => evil	
)
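The ambiguity can be checked directly. The following is a minimal sketch, assuming the four-space indentation that print_r uses for array entries; the variable names are illustrative only. It builds two different arrays and compares their print_r text.

```php
<?php
// One element whose value contains a newline followed by a fake "[b] => evil" entry.
$tricky = array("a" => "\n    [b] => evil");

// Two genuine elements.
$honest = array("a" => "", "b" => "evil");

// With true as the second argument, print_r() returns the text instead of printing it.
$trickyText = print_r($tricky, true);
$honestText = print_r($honest, true);

// Different arrays, identical text: nothing in the output tells them apart.
var_dump($trickyText === $honestText);
```

Because both arrays serialise to the same characters, any parser for this format has to pick one reading; the grammar built in this talk picks the two-element one.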
print_r
* for anything non-trivial
“…it’s a technique that isn't as widely known as it
should be. Many people are under the impression
that using it is quite hard. I think that this fear often
comes from the fact that Syntax-Directed
Translation is usually described in the context of
parsing a general-purpose language—which
introduces a lot of complexities that you don't face
with a DSL.”
–Martin Fowler
V1 - An empty array
> print_r(array());
Array
(
)
ARRAY <- ARRAY_START
LF
PAREN_OPEN
LF
PAREN_CLOSE
LF
ARRAY_START <- 'Array'
LF <- "\n"
PAREN_OPEN <- '('
PAREN_CLOSE <- ')'
PrintRLang\V1\RecursiveDescentParser
- $content : string
+ __construct ( string $content )	
+ consume ( string $terminal )	
+ lookAhead ( string $terminal )
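The class diagram above only lists signatures. The following is a minimal sketch of what such a class could look like; the method bodies, the return types and the exception type are assumptions, not the code from the talk's repository.

```php
<?php
class RecursiveDescentParser
{
    /** @var string The input that has not been consumed yet. */
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    // Reports whether the remaining input starts with $terminal; consumes nothing.
    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    // Removes $terminal from the front of the input, or fails loudly.
    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        // The cast covers PHP versions where substr() returns false at end of input.
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
}

$parser = new RecursiveDescentParser("Array\n(\n)\n");
$startsWithArray = $parser->lookAhead('Array'); // true, input untouched
$parser->consume('Array');                      // input is now "\n(\n)\n"
$startsWithLf = $parser->lookAhead("\n");       // true
```

Working directly on the string, one terminal at a time, is what makes the parser "scannerless": there is no separate tokenizer pass.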
PrintRLang\V1\ArrayParser
- $parser : RecursiveDescentParser
+ __construct(RecursiveDescentParser $parser)	
+ parse(): array	
+ arrayStart()	
+ lf()	
+ braceOpen()	
+ braceClose()
$parser = new PrintRLang\ArrayParser(
new PrintRLang\RecursiveDescentParser(
"Array\n(\n)\n"
)
);
$parser->parse();
public function parse() {	
$this->arrayStart();	
$this->lf();	
$this->braceOpen();	
$this->lf();
$this->braceClose();	
$this->lf();	
return array();	
}
A r r a y \n ( \n ) \n
public function arrayStart() {
$this->parser->consume('Array');
}
\n ( \n ) \n
\n ( \n ) \n
public function lf() {
$this->parser->consume("\n");
}
( \n ) \n
( \n ) \n
public function braceOpen() {
$this->parser->consume('(');
}
\n ) \n
\n ) \n
public function lf() {
$this->parser->consume("\n");
}
) \n
) \n
public function braceClose() {
$this->parser->consume(')');
}
\n
\n
public function lf() {
$this->parser->consume("\n");
}
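Put together, the V1 steps above can be run end to end. This is a sketch without the PrintRLang namespace; the body of RecursiveDescentParser is an assumption (the slides show only its signatures), while the ArrayParser methods follow the slides.

```php
<?php
// Assumed minimal body for the V1 RecursiveDescentParser.
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
}

// One method per grammar rule; parse() mirrors the ARRAY rule line by line.
class ArrayParser
{
    private $parser;

    public function __construct(RecursiveDescentParser $parser)
    {
        $this->parser = $parser;
    }

    public function parse(): array
    {
        $this->arrayStart();
        $this->lf();
        $this->braceOpen();
        $this->lf();
        $this->braceClose();
        $this->lf();
        return array();
    }

    public function arrayStart() { $this->parser->consume('Array'); }
    public function lf()         { $this->parser->consume("\n"); }
    public function braceOpen()  { $this->parser->consume('('); }
    public function braceClose() { $this->parser->consume(')'); }
}

// The happy path: exactly the text print_r(array()) produces.
$empty = (new ArrayParser(new RecursiveDescentParser("Array\n(\n)\n")))->parse();

// Anything else fails at the first terminal that does not match.
$error = null;
try {
    (new ArrayParser(new RecursiveDescentParser("Array\n[\n]\n")))->parse();
} catch (RuntimeException $e) {
    $error = $e->getMessage();
}
```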
V2 - Array of strings
Array	
(	
[Room] => E104	
[Difficulty] => 2	
[Type] => 1	
)
ARRAY <- ARRAY_START	
LF	
PAREN_OPEN	
LF	
ARRAY_ASSIGN*	
PAREN_CLOSE	
LF
Kleene star
translates to:
ARRAY_ASSIGN*
while (lookAhead(' '))	
$result = arrayAssign($result)
ARRAY_ASSIGN <- SPACE+	
ARRAY_KEY	
SPACE	
FAT_ARROW	
SPACE	
ARRAY_VALUE	
LF
Kleene plus
SPACE+ === SPACE SPACE*
Kleene plus implemented
space()	
while (lookAhead(' '))	
space()
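Both repetition operators reduce to a lookahead-driven loop. The sketch below shows SPACE* and SPACE+ as standalone functions; the function names are hypothetical and the parser body is an assumed stand-in for the talk's RecursiveDescentParser.

```php
<?php
// Stand-in for the talk's RecursiveDescentParser; the method bodies are assumed.
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
}

// SPACE* : loop while the lookahead matches; zero iterations is fine.
function zeroOrMoreSpaces(RecursiveDescentParser $parser): int
{
    $count = 0;
    while ($parser->lookAhead(' ')) {
        $parser->consume(' ');
        $count++;
    }
    return $count;
}

// SPACE+ === SPACE SPACE* : one mandatory match, then the star loop.
function oneOrMoreSpaces(RecursiveDescentParser $parser): int
{
    $parser->consume(' ');
    return 1 + zeroOrMoreSpaces($parser);
}

$parser = new RecursiveDescentParser('    [Room]');
$indent = oneOrMoreSpaces($parser);  // consumes the four leading spaces
$none   = zeroOrMoreSpaces($parser); // next character is '[', so nothing is consumed
```

The star version never fails, while the plus version throws when not even one space is present. That difference is the whole distinction between the two operators.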
ARRAY_KEY <- BRACKET_OPEN	
KEY_VALUE	
BRACKET_CLOSE	
KEY_VALUE <- !BRACKET_CLOSE
ARRAY_VALUE <- !LF
PrintRLang\V2\RecursiveDescentParser
- $content : string
+ __construct ( string $content )	
+ consume ( string $terminal )	
+ consumeRegex( string $regex )
+ lookAhead ( string $terminal )	
+ lookAheadRegex( string $regex )
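One way the two regex methods could work is sketched below. It assumes that $regex is a complete PCRE pattern including delimiters, and it appends the A (anchored) modifier so that a match must start at the current position. The bodies are assumptions; only the method names come from the diagram.

```php
<?php
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }

    // Like lookAhead(), but for a pattern; 'A' anchors it at the current position.
    public function lookAheadRegex(string $regex): bool
    {
        return preg_match($regex . 'A', $this->content) === 1;
    }

    // Consumes and returns whatever the anchored pattern matches.
    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $matches) !== 1) {
            throw new RuntimeException("Expected a match for $regex");
        }
        $this->content = (string) substr($this->content, strlen($matches[0]));
        return $matches[0];
    }
}

$parser = new RecursiveDescentParser("[Room] => E104\n");
$parser->consume('[');
$key = $parser->consumeRegex('/[^\]]+/');   // KEY_VALUE: everything up to ']'
$parser->consume('] => ');
$value = $parser->consumeRegex('/[^\n]*/'); // ARRAY_VALUE: everything up to LF
```

Negated character classes are a compact way to express rules such as "anything that is not a closing bracket", which would otherwise need a character-by-character loop.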
PrintRLang\V2\ArrayParser
- $parser : RecursiveDescentParser
...
+ arrayAssign( array $result )
+ arrayKey() : string
+ arrayValue() : string
+ space()
+ fatArrow()
...
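A sketch of how the V2 rules could map onto these methods is given below. The method names follow the diagram; the bodies, the regular expressions used for KEY_VALUE and ARRAY_VALUE, and the compact parser stand-in are assumptions.

```php
<?php
// Assumed stand-in for the V2 RecursiveDescentParser (only what this sketch needs).
class RecursiveDescentParser
{
    private $content;
    public function __construct(string $content) { $this->content = $content; }
    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }
    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $m) !== 1) {
            throw new RuntimeException("Expected a match for $regex");
        }
        $this->content = (string) substr($this->content, strlen($m[0]));
        return $m[0];
    }
}

class ArrayParser
{
    private $parser;
    public function __construct(RecursiveDescentParser $parser) { $this->parser = $parser; }

    public function parse(): array
    {
        $result = array();
        $this->parser->consume('Array');
        $this->parser->consume("\n");
        $this->parser->consume('(');
        $this->parser->consume("\n");
        // ARRAY_ASSIGN* : every assignment line starts with a space.
        while ($this->parser->lookAhead(' ')) {
            $result = $this->arrayAssign($result);
        }
        $this->parser->consume(')');
        $this->parser->consume("\n");
        return $result;
    }

    public function arrayAssign(array $result): array
    {
        $this->space();                         // SPACE+ === SPACE SPACE*
        while ($this->parser->lookAhead(' ')) {
            $this->space();
        }
        $key = $this->arrayKey();
        $this->space();
        $this->fatArrow();
        $this->space();
        $result[$key] = $this->arrayValue();
        $this->parser->consume("\n");
        return $result;
    }

    public function arrayKey(): string
    {
        $this->parser->consume('[');
        $key = $this->parser->consumeRegex('/[^\]]+/'); // KEY_VALUE
        $this->parser->consume(']');
        return $key;
    }

    public function arrayValue(): string { return $this->parser->consumeRegex('/[^\n]*/'); }
    public function space()    { $this->parser->consume(' '); }
    public function fatArrow() { $this->parser->consume('=>'); }
}

$text   = print_r(array('Room' => 'E104', 'Difficulty' => 2, 'Type' => 1), true);
$parsed = (new ArrayParser(new RecursiveDescentParser($text)))->parse();
```

Every value comes back as a string, because print_r output carries no type information.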
V3 - Array of Arrays
Array	
(	
[Talk] => Array	
(	
[Title] => Ansible: Orchestrate	
[Type] => 3	
)	
)
ARRAY_VALUE <- ARRAY /
               STRING
STRING      <- !LF
ARRAY <- ARRAY_START	
LF	
SPACE*	
PAREN_OPEN	
LF	
ARRAY_ASSIGN*	
SPACE*	
PAREN_CLOSE
PrintRLang\V3\ArrayParser
- $parser : RecursiveDescentParser
...
+ string()	
...
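The recursion that gives the technique its name appears in V3: arrayValue() may call parse() again. The sketch below runs against real print_r output and rests on several assumptions that are not on the slides: the parser bodies, the regex lookahead for "spaces followed by [" (the closing parenthesis of a nested array is indented as well), and LF+ after a value, because print_r prints a blank line after a nested array. For brevity, space() and fatArrow() are folded into a single consume() call.

```php
<?php
// Assumed stand-in for the talk's RecursiveDescentParser.
class RecursiveDescentParser
{
    private $content;
    public function __construct(string $content) { $this->content = $content; }
    public function lookAheadRegex(string $regex): bool
    {
        return preg_match($regex . 'A', $this->content) === 1;
    }
    public function consume(string $terminal): string
    {
        if (strncmp($this->content, $terminal, strlen($terminal)) !== 0) {
            throw new RuntimeException('Expected ' . json_encode($terminal));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }
    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $m) !== 1) {
            throw new RuntimeException("Expected a match for $regex");
        }
        $this->content = (string) substr($this->content, strlen($m[0]));
        return $m[0];
    }
}

class ArrayParser
{
    private $parser;
    public function __construct(RecursiveDescentParser $parser) { $this->parser = $parser; }

    // ARRAY <- ARRAY_START LF SPACE* PAREN_OPEN LF ARRAY_ASSIGN* SPACE* PAREN_CLOSE
    public function parse(): array
    {
        $result = array();
        $this->parser->consume('Array');
        $this->parser->consume("\n");
        $this->parser->consumeRegex('/ */');
        $this->parser->consume('(');
        $this->parser->consume("\n");
        // A single space is not enough lookahead here: require SPACE+ then '['.
        while ($this->parser->lookAheadRegex('/ +\[/')) {
            $result = $this->arrayAssign($result);
        }
        $this->parser->consumeRegex('/ */');
        $this->parser->consume(')');
        return $result;
    }

    public function arrayAssign(array $result): array
    {
        $this->parser->consumeRegex('/ +/');
        $this->parser->consume('[');
        $key = $this->parser->consumeRegex('/[^\]]+/');
        $this->parser->consume('] => ');
        $result[$key] = $this->arrayValue();
        $this->parser->consumeRegex('/\n+/'); // assumed LF+, see the note above
        return $result;
    }

    // ARRAY_VALUE <- ARRAY / STRING : ordered choice, ARRAY is tried first.
    public function arrayValue()
    {
        if ($this->parser->lookAheadRegex('/Array\n *\(\n/')) {
            return $this->parse(); // descend into the nested array
        }
        return $this->string();
    }

    public function string(): string { return $this->parser->consumeRegex('/[^\n]*/'); }
}

$input  = array('Talk' => array('Title' => 'Ansible: Orchestrate', 'Type' => 3));
$parsed = (new ArrayParser(new RecursiveDescentParser(print_r($input, true))))->parse();
```

The lookahead decides which alternative to take before anything is consumed, so the parser never has to backtrack; that is the "predictive" part.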
Why?
“If you don't know how parsing works, you'll do it
badly with regular expressions, or if you don't know
those, then with hand-rolled state machines that are
thousands of lines of incomprehensible code that
doesn't actually work.”
– Steve Yegge, Rich Programmer Food
Mail::RFC822::Address
(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:
rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(
?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[
t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-0
31]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*
](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+
(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:
(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)
?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[
t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)
?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t]
)*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[
t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*
)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t]
)+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)
*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+
|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:r
n)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:
rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t
]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031
]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](
?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?
:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?
:rn)?[ t])*))*>(?:(?:rn)?[ t])*)|(?:[^()<>@,;:".[] 000-031]+(?:(?
:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?
[ t]))*"(?:(?:rn)?[ t])*)*:(?:(?:rn)?[ t])*(?:(?:(?:[^()<>@,;:".[]
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>
@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"
(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?
:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:[^()<>@,;:".[] 000-
031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|.|(
?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)?[ t])*(?:@(?:[^()<>@,;
:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([
^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:"
.[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[
]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[ t])*(?:[^()<>@,;:".
[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]
r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[]
000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]
|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?(?:[^()<>@,;:".[] 0
00-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?:[^"r]|
.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[^()<>@,
;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]]))|"(?
:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*))*@(?:(?:rn)?[ t])*
(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t])*(?:[
^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[]
]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(?:rn)?[ t])*)(?:,s*(
?:(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(
?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[
["()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t
])*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t
])+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?
:.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|
Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*|(?:
[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".[
]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)*<(?:(?:rn)
?[ t])*(?:@(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["
()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)
?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>
@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*(?:,@(?:(?:rn)?[
t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,
;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:.(?:(?:rn)?[ t]
)*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:
".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*)*:(?:(?:rn)?[ t])*)?
(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[["()<>@,;:".
[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])*)(?:.(?:(?:
rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z|(?=[[
"()<>@,;:".[]]))|"(?:[^"r]|.|(?:(?:rn)?[ t]))*"(?:(?:rn)?[ t])
*))*@(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])
+|Z|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*)(?:
.(?:(?:rn)?[ t])*(?:[^()<>@,;:".[] 000-031]+(?:(?:(?:rn)?[ t])+|Z
|(?=[["()<>@,;:".[]]))|[([^[]r]|.)*](?:(?:rn)?[ t])*))*>(?:(
?:rn)?[ t])*))*)?;s*)
Useful applications I’ve seen
REST API with CQL querying (MediaMosa.org)
Migrating wiki content
Parsing log files
Parsing obscure specifications (ARF)
Concise configuration files
Domain Specific Languages
“a DSL is a front-end to a library providing a different
style of manipulation to the command-query API.”
–Martin Fowler, Domain Specific Languages
Rules for building a parser
Consider using an existing parser.
Consider porting one from another language.
Consider XML or the new XMLs: JSON / YAML
Consider working around it.
Then and only then consider building your own
parser
Where to from here?
Let’s build a parser!
http://protalk.me/dpcradio-lets-build-a-parser
Thank you for your time and attention!
Questions?
Tweet to @relaxnow
Rate @ https://joind.in/10859
Slides @ https://joind.in/10859
Code @ https://bit.ly/dpc14rdp
