Recursive descent parsing

Published in: Technology, Education
  1. 1. Recursive Descent Parsing In practice with PHP
  2. 2. Plan for the next 40 mins
        1. Walk through creating a Parsing Expression Grammar and a scannerless, predictive recursive descent parser for a subset of print_r output.
        2. Talk about why anyone would want to do such a thing.
        Source Code: https://bit.ly/dpc14rdp
  3. 3. Disclaimer: I am not …
  4. 4. I am Boy Baukema Senior Software Engineer @ ibuildings.nl
  5. 5. print_r (PHP 4, PHP 5) print_r — Prints human-readable information about a variable
  6. 6. An example Array ( [Talk] => Array ( [Title] => Ansible: Orchestrate [Type] => 3
  7. 7. Just one problem… It’s unparsable.
  8. 8. No escaping: > print_r(array("a"=>"\n [b] => evil")); Array ( [a] => [b] => evil )
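The missing escaping can be demonstrated with a minimal sketch. The variable names are hypothetical, and the forged value assumes the four-space indentation that print_r uses for top-level array elements:

```php
<?php
// Two different arrays: one really has a key "b", the other only has a
// string value that *looks like* a second element once it is printed.
$honest = array('a' => 'x', 'b' => 'evil');
$forged = array('a' => "x\n    [b] => evil");

// Passing true as the second argument makes print_r return its output.
$honestDump = print_r($honest, true);
$forgedDump = print_r($forged, true);

// The dumps are byte-for-byte identical, so no parser can tell them apart.
var_dump($honestDump === $forgedDump); // bool(true)
```

Because the two dumps are identical, any parser for print_r output has to pick one interpretation, which is why the talk restricts itself to a subset of the format.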
  9. 9. print_r * for anything non-trivial
  10. 10. –Martin Fowler “…it’s a technique that isn't as widely known as it should be. Many people are under the impression that using it is quite hard. I think that this fear often comes from the fact that Syntax-Directed Translation is usually described in the context of parsing a general-purpose language—which introduces a lot of complexities that you don't face with a DSL.”
  11. 11. V1 - An empty array Source Code: https://bit.ly/dpc14rdp
  12. 12. > print_r(array()); Array ( )
  13. 13. ARRAY <- ARRAY_START LF PAREN_OPEN LF PAREN_CLOSE LF
  14. 14. ARRAY_START <- ‘Array’
        LF <- “\n”
        PAREN_OPEN <- ‘(’
        PAREN_CLOSE <- ‘)’
  15. 15. PrintRLang V1: RecursiveDescentParser
        - $content : string
        + __construct(string $content)
        + consume(string $terminal)
        + lookAhead(string $terminal)
        Source Code: https://bit.ly/dpc14rdp
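A class with this interface might look roughly like the sketch below. Only the method names and the `$content` property come from the slide; the method bodies, the return types, and the `remaining()` helper are assumptions made for illustration (PHP 7 or later), not the code from the talk's repository.

```php
<?php
// Hypothetical sketch of the V1 scanner: it owns the unparsed input and
// lets grammar rules peek at it (lookAhead) or eat a terminal (consume).
class RecursiveDescentParser
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    // True if the remaining input starts with $terminal; consumes nothing.
    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    // Remove $terminal from the front of the input, or fail loudly.
    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException(
                'Expected ' . var_export($terminal, true)
                . ' but found ' . var_export(substr($this->content, 0, 10), true)
            );
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }

    // Not on the slide: exposes the unparsed rest, used only for the demo.
    public function remaining(): string
    {
        return $this->content;
    }
}

$scanner = new RecursiveDescentParser("Array\n(\n)\n");
$scanner->consume('Array');
$scanner->consume("\n");
var_dump($scanner->lookAhead('(')); // bool(true)
```

Throwing on a failed `consume` keeps the grammar methods free of error handling: each rule simply states what it expects next.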
  16. 16. PrintRLang V1: ArrayParser
        - $parser : RecursiveDescentParser
        + __construct(RecursiveDescentParser $parser)
        + parse(): array
        + arrayStart()
        + lf()
        + braceOpen()
        + braceClose()
        Source Code: https://bit.ly/dpc14rdp
  17. 17. $parser = new PrintRLang\ArrayParser( new PrintRLang\RecursiveDescentParser( "Array\n(\n)\n" ) ); $parser->parse();
  18. 18. public function parse() { $this->arrayStart(); $this->lf(); $this->braceOpen(); $this->lf(); $this->braceClose(); $this->lf(); return array(); }
  19. 19. Input: "Array\n(\n)\n". public function arrayStart() { $this->parser->consume('Array'); } Remaining: "\n(\n)\n"
  20. 20. Input: "\n(\n)\n". public function lf() { $this->parser->consume("\n"); } Remaining: "(\n)\n"
  21. 21. Input: "(\n)\n". public function braceOpen() { $this->parser->consume('('); } Remaining: "\n)\n"
  22. 22. Input: "\n)\n". public function lf() { $this->parser->consume("\n"); } Remaining: ")\n"
  23. 23. Input: ")\n". public function braceClose() { $this->parser->consume(')'); } Remaining: "\n"
  24. 24. Input: "\n". public function lf() { $this->parser->consume("\n"); } Remaining: (empty)
  25. 25. V2 - Array of strings Source Code: https://bit.ly/dpc14rdp
  26. 26. Array ( [Room] => E104 [Difficulty] => 2 [Type] => 1 )
  27. 27. ARRAY <- ARRAY_START LF PAREN_OPEN LF ARRAY_ASSIGN* PAREN_CLOSE LF
  28. 28. Kleene star translates to a loop: ARRAY_ASSIGN* becomes while (lookAhead(' ')) { $result = arrayAssign($result); }
  29. 29. ARRAY_ASSIGN <- SPACE+ ARRAY_KEY SPACE FAT_ARROW SPACE ARRAY_VALUE LF
  30. 30. Kleene plus: SPACE+ === SPACE SPACE*
  31. 31. Kleene plus implemented: space(); while (lookAhead(' ')) { space(); }
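The two translations above can be sketched as plain functions. This is a minimal illustration, assuming PHP 7.1 or later; the free functions `lookAhead`, `consume`, `spaceStar` and `spacePlus` are hypothetical stand-ins that work on the remaining input by reference rather than through the parser classes on the slides.

```php
<?php
// True if $input starts with $terminal; consumes nothing.
function lookAhead(string $input, string $terminal): bool
{
    return strncmp($input, $terminal, strlen($terminal)) === 0;
}

// Strip $terminal from the front of $input or throw.
function consume(string &$input, string $terminal): void
{
    if (!lookAhead($input, $terminal)) {
        throw new RuntimeException("Expected '$terminal'");
    }
    $input = (string) substr($input, strlen($terminal));
}

// SPACE* : zero or more, so a while loop guarded by a lookahead.
function spaceStar(string &$input): int
{
    $count = 0;
    while (lookAhead($input, ' ')) {
        consume($input, ' ');
        $count++;
    }
    return $count;
}

// SPACE+ === SPACE SPACE* : one mandatory match, then the star.
function spacePlus(string &$input): int
{
    consume($input, ' ');
    return 1 + spaceStar($input);
}

$input = '    [Room] => E104';
echo spacePlus($input), "\n"; // 4
echo $input, "\n";            // [Room] => E104
```

The only difference between the two is the unconditional first `consume`, which is what makes `SPACE+` fail on input that does not start with a space.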
  32. 32. ARRAY_KEY <- BRACKET_OPEN KEY_VALUE BRACKET_CLOSE
        KEY_VALUE <- !BRACKET_CLOSE
  33. 33. ARRAY_VALUE <- !LF
  34. 34. PrintRLang V2: RecursiveDescentParser
        - $content : string
        + __construct(string $content)
        + consume(string $terminal)
        + consumeRegex(string $regex)
        + lookAhead(string $terminal)
        + lookAheadRegex(string $regex)
        Source Code: https://bit.ly/dpc14rdp
  35. 35. PrintRLang V2: ArrayParser
        - $parser : RecursiveDescentParser
        ...
        + arrayAssign(array $result)
        + arrayKey() : string
        + arrayValue() : string
        + space()
        + fatArrow()
        ...
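To make the V2 additions concrete, here is a sketch of how `consumeRegex` and the `ARRAY_ASSIGN` rule could fit together. The class name `Scanner`, the free function `arrayAssign`, and all method bodies are assumptions for illustration; in particular, anchoring the pattern with PCRE's `A` modifier is one possible choice, and the negated rules `!BRACKET_CLOSE` and `!LF` are read here as "any run of characters up to that terminal".

```php
<?php
// Hypothetical stand-in for the V2 RecursiveDescentParser.
class Scanner
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    public function lookAhead(string $terminal): bool
    {
        return strncmp($this->content, $terminal, strlen($terminal)) === 0;
    }

    public function consume(string $terminal): string
    {
        if (!$this->lookAhead($terminal)) {
            throw new RuntimeException('Expected ' . var_export($terminal, true));
        }
        $this->content = (string) substr($this->content, strlen($terminal));
        return $terminal;
    }

    // Match $regex at the current position only: the appended A modifier
    // (PCRE_ANCHORED) stops preg_match from skipping ahead in the input.
    public function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $m) !== 1) {
            throw new RuntimeException("Expected input matching $regex");
        }
        $this->content = (string) substr($this->content, strlen($m[0]));
        return $m[0];
    }
}

// ARRAY_ASSIGN <- SPACE+ ARRAY_KEY SPACE FAT_ARROW SPACE ARRAY_VALUE LF
function arrayAssign(Scanner $s, array $result): array
{
    $s->consumeRegex('/ +/');              // SPACE+
    $s->consume('[');                      // BRACKET_OPEN
    $key = $s->consumeRegex('/[^\]]+/');   // KEY_VALUE <- !BRACKET_CLOSE
    $s->consume(']');                      // BRACKET_CLOSE
    $s->consume(' ');                      // SPACE
    $s->consume('=>');                     // FAT_ARROW
    $s->consume(' ');                      // SPACE
    $value = $s->consumeRegex('/[^\n]*/'); // ARRAY_VALUE <- !LF
    $s->consume("\n");                     // LF
    $result[$key] = $value;
    return $result;
}

$s = new Scanner("    [Room] => E104\n    [Difficulty] => 2\n");
$result = array();
while ($s->lookAhead(' ')) {               // ARRAY_ASSIGN*
    $result = arrayAssign($s, $result);
}
print_r($result);
```

Every value comes back as a string, since print_r output carries no type information.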
  36. 36. V3 - Array of Arrays
  37. 37. Array ( [Talk] => Array ( [Title] => Ansible: Orchestrate [Type] => 3 ) )
  38. 38. ARRAY_VALUE <- ARRAY / STRING
        STRING <- !LF
  39. 39. ARRAY <- ARRAY_START LF SPACE* PAREN_OPEN LF ARRAY_ASSIGN* SPACE* PAREN_CLOSE
  40. 40. PrintRLang V3: ArrayParser
        - $parser : RecursiveDescentParser
        ...
        + string()
        ...
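The recursion in V3 can be sketched end to end as follows. This is an illustrative reconstruction, not the talk's code: the class name `PrintRParser` is hypothetical, the scanner is folded into the parser, and adjacent terminals are merged into single regular expressions to keep it short. Two choices go beyond the slide grammar and are marked in comments: the loop predicts `ARRAY_ASSIGN` by looking for "spaces then `[`" (spaces alone no longer suffice, because the closing parenthesis of a nested array is indented too), and one extra line feed is accepted after a nested array, since real print_r output contains an empty line there.

```php
<?php
class PrintRParser
{
    private $content;

    public function __construct(string $content)
    {
        $this->content = $content;
    }

    // The A modifier anchors the match at the current position.
    private function lookAheadRegex(string $regex): bool
    {
        return preg_match($regex . 'A', $this->content) === 1;
    }

    private function consumeRegex(string $regex): string
    {
        if (preg_match($regex . 'A', $this->content, $m) !== 1) {
            throw new RuntimeException(
                "Expected $regex at: " . var_export(substr($this->content, 0, 20), true)
            );
        }
        $this->content = (string) substr($this->content, strlen($m[0]));
        return $m[0];
    }

    // ARRAY <- ARRAY_START LF SPACE* PAREN_OPEN LF ARRAY_ASSIGN* SPACE* PAREN_CLOSE
    public function parseArray(): array
    {
        $this->consumeRegex('/Array\n *\(\n/');
        $result = array();
        // Assumption: predict ARRAY_ASSIGN with "spaces then [".
        while ($this->lookAheadRegex('/ +\[/')) {
            $result = $this->arrayAssign($result);
        }
        $this->consumeRegex('/ *\)/');
        return $result;
    }

    // ARRAY_ASSIGN <- SPACE+ ARRAY_KEY SPACE FAT_ARROW SPACE ARRAY_VALUE LF
    private function arrayAssign(array $result): array
    {
        $this->consumeRegex('/ +\[/');
        $key = $this->consumeRegex('/[^\]]+/');
        $this->consumeRegex('/\] => /');
        $value = $this->arrayValue();
        $this->consumeRegex('/\n/');
        if (is_array($value)) {
            // Assumption: swallow the empty line print_r emits after a
            // nested array; the pattern also matches when it is absent.
            $this->consumeRegex('/\n?/');
        }
        $result[$key] = $value;
        return $result;
    }

    // ARRAY_VALUE <- ARRAY / STRING (ordered choice: try ARRAY first)
    private function arrayValue()
    {
        if ($this->lookAheadRegex('/Array\n *\(\n/')) {
            return $this->parseArray();    // the recursive descent step
        }
        return $this->consumeRegex('/[^\n]*/'); // STRING <- !LF
    }
}

$dump = print_r(array('Talk' => array('Title' => 'Ansible: Orchestrate', 'Type' => 3)), true);
$parser = new PrintRParser($dump);
var_export($parser->parseArray());
```

The ordered choice in `arrayValue()` is where the parser calls back into `parseArray()`, so arbitrarily deep nesting needs no extra code. The integer `3` comes back as the string `'3'`.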
  41. 41. Why?
  42. 42. – Steve Yegge, Rich Programmer Food “If you don't know how parsing works, you'll do it badly with regular expressions, or if you don't know those, then with hand-rolled state machines that are thousands of lines of incomprehensible code that doesn't actually work.”
  43. 43. Mail::RFC822::Address: a single regular expression for validating RFC 822 email addresses, shown in full across slides 43 to 47 (several thousand characters of nested (?:...) groups and character classes).
  48. 48. Useful applications I’ve seen
        - REST API with CQL querying (MediaMosa.org)
        - Migrating wiki content
        - Parsing log files
        - Parsing obscure specifications (ARF)
        - Concise configuration files
        - Domain Specific Languages
  49. 49. –Martin Fowler, Domain Specific Languages “a DSL is a front-end to a library providing a different style of manipulation to the command-query API.”
  50. 50. Rules for building a parser
        - Consider using an existing parser.
        - Consider porting one from another language.
        - Consider XML or the new XMLs: JSON / YAML.
        - Consider working around it.
        - Then, and only then, consider building your own parser.
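As an illustration of the "consider JSON" rule, the value that print_r cannot represent unambiguously survives a JSON round trip untouched. This is a minimal sketch with hypothetical variable names, using PHP's built-in `json_encode` and `json_decode`:

```php
<?php
// The same "evil" value that made print_r output ambiguous.
$data = array('a' => "x\n    [b] => evil");

// json_encode escapes the newline, so the structure stays unambiguous.
$json = json_encode($data);

// Passing true makes json_decode return associative arrays.
$roundTripped = json_decode($json, true);

var_dump($roundTripped === $data); // bool(true)
```

When the producer of the data can be changed, switching the output format this way avoids writing a parser at all.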
  51. 51. Where to from here?
  52. 52. Let’s build a parser! http://protalk.me/dpcradio-lets-build-a-parser
  53. 53. Thank you for your time and attention! Questions? Tweet to @relaxnow. Rate @ https://joind.in/10859, Slides @ https://joind.in/10859, Code @ https://bit.ly/dpc14rdp
