
"Error Recovery" by @alaz at scalaby#8


  1. Parsers Error Recovery for Practical Use
     Alexander Azarov, azarov@osinka.ru / Osinka.ru
     February 18, 2012
  2. Context: Osinka
     • Forum of the “BB” kind (2.5M+ pages/day, 8M+ posts total)
     • User-generated content: BBCode markup
     • Migrating to a Scala backend
  3. Context: Plan
     • Why parser combinators?
     • Why error recovery?
     • Example of error recovery
     • Results
  4. Context: BBCode
     A few BBCode tags and the HTML they map to:
         [b]bold[/b]             <b>bold</b>
         [i]italic[/i]           <i>italic</i>
         [url=href]text[/url]    <a href="href">text</a>
         [img]href[/img]         <img src="href"/>
  5. Context: BBCode example
         [quote="Nick"]
         original [b]text[/b]
         [/quote]
         Here it is the reply with
         [url=http://www.google.com]link[/url]
  6. Task: Why parser combinators, 1
     • Regexp maintenance is a headache
     • Bugs are extremely hard to find
     • No markup errors
  7. Task: Why parser combinators, 2
     One post source, many views:
     • HTML render for the Web
     • textual view for emails
     • text-only short summary
     • text-only for the full-text search indexer
  8. Task: Why parser combinators, 3
     Post analysis algorithms:
     • links (e.g. automated spam analysis)
     • images
     • whatever structure analysis we’d want
  9. Task: Universal AST
     • One AST
     • different printers
     • various traversal algorithms
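The "one AST, different printers, various traversals" idea can be sketched in a few lines. This is only an illustration: the `Bold` and `Url` node types, the `AstViews` object and its function names are hypothetical and are not the AST used later in the talk.

```scala
object AstViews {
  // Hypothetical mini-AST, for illustration only
  sealed trait Node
  final case class Text(text: String) extends Node
  final case class Bold(children: List[Node]) extends Node
  final case class Url(href: String, children: List[Node]) extends Node

  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;")
      .replace(">", "&gt;").replace("\"", "&quot;")

  // Printer 1: HTML rendering for the web
  def html(nodes: List[Node]): String = nodes.map {
    case Text(t)    => escape(t)
    case Bold(cs)   => "<b>" + html(cs) + "</b>"
    case Url(h, cs) => "<a href=\"" + escape(h) + "\">" + html(cs) + "</a>"
  }.mkString

  // Printer 2: text-only view (emails, summaries, search indexer)
  def plain(nodes: List[Node]): String = nodes.map {
    case Text(t)    => t
    case Bold(cs)   => plain(cs)
    case Url(_, cs) => plain(cs)
  }.mkString

  // Traversal: collect every link target, e.g. as input for spam analysis
  def links(nodes: List[Node]): List[String] = nodes.flatMap {
    case Text(_)    => Nil
    case Bold(cs)   => links(cs)
    case Url(h, cs) => h :: links(cs)
  }
}
```

For a post such as `List(Text("see "), Url("http://www.google.com", List(Bold(List(Text("link"))))))`, `plain` yields `"see link"` and `links` yields `List("http://www.google.com")`. The parser runs once; every view is just another function over the same tree.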
  10. Problem: Sounds great. But.
      This all looks like a perfect world. But what’s the catch?
  11. Problem: Sounds great. But.
      Humans. They make mistakes.
  12. Problem: Sounds great. But.
      Humans. They make mistakes. Example:
          [quote]
          [url=http://www.google.com]
          [img]http://www.image.com
          [/url[/img]
          [/b]
  13. Problem: User-Generated Content
      Erroneous markup:
      • People make mistakes,
      • but no one wants to see an empty post.
      • We have to show something meaningful in any case.
  14. Problem: Black or White World
      • Scala parser combinators assume valid input
      • Parser result: Success | NoSuccess
      • no error recovery out of the box
  15. Solution: Error recovery, our approach
      • Our parser never breaks
      • It generates “error nodes” instead
  16. Solution: Approach, error nodes
      • Error nodes are part of the AST; a FailNode contains the possible causes of the failure
      • They are meaningful for
        • highlighting in the editor
        • marking posts that have markup failures (for moderators and other users to see)
  17. Solution: Approach, input & unpaired tags
      • Treat all input except tags as text
        • E.g. [tag]text[/tag] is a text node
      • Unpaired tags are matched as the last choice: markup errors
  18. Example
  19. Example: Trivial BBCode markup
      Example (trivial "one tag" BBCode):
          Simplest [font=bold]BBCode [font=red]example[/font][/font]
      • has only one tag, font
      • though it may have an argument
  20. Example: Corresponding AST
          trait Node

          case class Text(text: String) extends Node

          case class Font(arg: Option[String], subnodes: List[Node]) extends Node
  21. Example: Parser
      BBCode parser:
          lazy val nodes = rep(font | text)

          // any run of characters that does not start a font tag
          lazy val text =
            rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ {
              texts => Text(texts.mkString)
            }

          lazy val font: Parser[Node] = {
            fontOpen ~ nodes <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
            }
          }
  22. Example: Valid markup
      Scalatest:
          describe("parser") {
            it("keeps spaces") {
              parse(" ") must equal(Right(Text(" ") :: Nil))
              parse(" \n ") must equal(Right(Text(" \n ") :: Nil))
            }
            it("parses text") {
              parse("plain text") must equal(Right(Text("plain text") :: Nil))
            }
            it("parses bbcode-like text") {
              parse("plain [tag] [fonttext") must equal(
                Right(Text("plain [tag] [fonttext") :: Nil))
            }
          }
  23. Example: Invalid markup
      Scalatest (without recovery, unpaired tags make the whole parse fail):
          describe("error markup") {
            it("results in error") {
              parse("t[/font]") must be('left)
              parse("[font]t") must be('left)
            }
          }
  24. Example: Recovery, extra AST node
      FailNode:
          case class FailNode(reason: String, markup: String) extends Node
  25. Example: Recovery, helper methods
      Explicitly return FailNode:
          protected def failed(reason: String) = FailNode(reason, "")
      Enrich FailNode with markup:
          protected def recover(p: => Parser[Node]): Parser[Node] =
            Parser { in =>
              val r = p(in)
              // the source text consumed by p; evaluated only for a FailNode
              lazy val markup =
                in.source.subSequence(in.offset, r.next.offset).toString
              r match {
                case Success(node: FailNode, next) =>
                  Success(node.copy(markup = markup), next)
                case other =>
                  other
              }
            }
  26. Example: Recovery, parser rules
      • never break (provide “lone tag” parsers)
      • return FailNode explicitly if needed
      nodes:
          lazy val nodes = rep(node | missingOpen)
          lazy val node = font | text | missingClose
  27. Example: “Missing open tag” parser
      Catching a lone [/font]:
          def missingOpen = recover {
            fontClose ^^^ { failed("missing open") }
          }
  28. Example: Argument check
      font may have limits on its argument:
          lazy val font: Parser[Node] = recover {
            fontOpen ~ rep(node) <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes =>
                // arg is null when the tag has no argument
                if (arg == null || allowedFontArgs.contains(arg))
                  Font(Option(arg), subnodes)
                else
                  failed("arg incorrect")
            }
          }
  29. Example: Passes markup error tests
      Scalatest:
          describe("recovery") {
            it("reports incorrect arg") {
              parse("[font=b]t[/font]") must equal(Right(
                FailNode("arg incorrect", "[font=b]t[/font]") :: Nil
              ))
            }
            it("recovers extra ending tag") {
              parse("t[/font]") must equal(Right(
                Text("t") :: FailNode("missing open", "[/font]") :: Nil
              ))
            }
  30. Example: Passes longer tests
      Scalatest (continues the "recovery" block):
            it("recovers extra starting tag in a longer sequence") {
              parse("[font][font]t[/font]") must equal(Right(
                FailNode("missing close", "[font]") ::
                Font(None, Text("t") :: Nil) :: Nil
              ))
            }
            it("recovers extra ending tag in a longer sequence") {
              parse("[font]t[/font][/font]") must equal(Right(
                Font(None, Text("t") :: Nil) ::
                FailNode("missing open", "[/font]") :: Nil
              ))
            }
          }
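The fragments on slides 20 to 30 can be assembled into one self-contained sketch. Several pieces are not shown on the slides and are assumptions here: the `fontOpen` and `fontClose` regexes, the `missingClose` rule (written by analogy with `missingOpen`), the contents of `allowedFontArgs`, and the `parseMarkup` wrapper. The sketch also assumes the `scala-parser-combinators` module is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

object FontBBCode extends RegexParsers {
  sealed trait Node
  case class Text(text: String) extends Node
  case class Font(arg: Option[String], subnodes: List[Node]) extends Node
  case class FailNode(reason: String, markup: String) extends Node

  // Whitespace is content in a forum post, so never skip it
  override val skipWhitespace = false

  // Assumed definitions, not shown on the slides
  private val allowedFontArgs = Set("bold", "red")
  private val fontOpen  = """\[font(=([^\]]+))?\]""".r
  private val fontClose = """\[/font\]""".r

  private def failed(reason: String): Node = FailNode(reason, "")

  // Fill a FailNode with the exact source text the wrapped parser consumed
  private def recover(p: => Parser[Node]): Parser[Node] = Parser { in =>
    p(in) match {
      case Success(node: FailNode, next) =>
        val markup = in.source.subSequence(in.offset, next.offset).toString
        Success(node.copy(markup = markup), next)
      case other => other
    }
  }

  lazy val nodes: Parser[List[Node]] = rep(node | missingOpen)
  lazy val node: Parser[Node] = font | text | missingClose

  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ { cs => Text(cs.mkString) }

  lazy val font: Parser[Node] = recover {
    fontOpen ~ rep(node) <~ fontClose ^^ {
      case fontOpen(_, arg) ~ subnodes =>
        if (arg == null || allowedFontArgs.contains(arg)) Font(Option(arg), subnodes)
        else failed("arg incorrect")
    }
  }

  // Lone tags are tried last and become error nodes instead of parse failures
  lazy val missingOpen: Parser[Node]  = recover { fontClose ^^^ failed("missing open") }
  lazy val missingClose: Parser[Node] = recover { fontOpen ^^^ failed("missing close") }

  def parseMarkup(src: String): Either[String, List[Node]] =
    parseAll(nodes, src) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

The ordering of alternatives carries the recovery logic: `font` is tried first, and only when it fails (for example because no closing tag follows) does the parser fall back to `text` and finally to `missingClose`, which consumes just the opening tag as a `FailNode`.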
  31. Example: Examples source code
      Source code, specs: https://github.com/alaz/slides-err-recovery
  32. Results: Production use outlines
      • It works reliably
      • Lower maintenance costs
      • Performance (see next slides)
      • Beware: Scala parser combinators are not thread-safe.
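One common way to live with a parser that is not thread-safe is to give every thread its own instance. The talk does not say how the issue was handled in production, so the following is only a sketch of that general technique; `PostParser` is a hypothetical stand-in for a combinator-based parser with internal mutable state.

```scala
object PerThreadParser {
  // Hypothetical stand-in for a parser that keeps mutable state between calls
  final class PostParser {
    private var lastInput: String = "" // unsafe to share across threads
    def parse(src: String): List[String] = {
      lastInput = src
      src.split(" ").toList.filter(_.nonEmpty)
    }
  }

  // Each thread lazily creates and then reuses its own parser instance
  private val local: ThreadLocal[PostParser] = new ThreadLocal[PostParser] {
    override def initialValue(): PostParser = new PostParser
  }

  def current: PostParser = local.get()

  def parse(src: String): List[String] = current.parse(src)
}
```

Creating a fresh parser object per call would also avoid sharing, at the cost of rebuilding the grammar each time.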
  33. Results: Performance
      The biggest problem is performance.
      Benchmarks (real codebase):
          Input               PHP      Scala
          Typical    8k       5.3ms    51ms
          Big w/err  76k      136ms    1245ms
      Workaround: caching
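The slide names caching as the workaround without saying what is cached or how. A minimal sketch, assuming the parse result is memoized by the full post source; `ParseCache` and `Cached` are hypothetical names, and the cache is unbounded, which a real forum would need to address with eviction.

```scala
import scala.collection.concurrent.TrieMap

object ParseCache {
  // Wraps any String => A parse function and remembers results by source text
  final class Cached[A](parse: String => A) {
    private val cache = TrieMap.empty[String, A]

    def apply(src: String): A = cache.getOrElseUpdate(src, parse(src))

    def size: Int = cache.size
  }
}
```

Posts are read far more often than they are edited, so with a cache like this the slow parse is paid roughly once per distinct post body rather than once per page view.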
  34. Results: Surprise!
      • Never give up
      • find a good motivator instead (e.g. a presentation for Scala.by)
  35. Results: Performance, success story
      • Want performance? Do not use Lexical
      • Forget those scary numbers!
      Benchmarks (real codebase):
          Input               PHP      Scala     Scala, no Lexical
          Typical    8k       5.3ms    51ms      16ms
          Big w/err  76k      136ms    1245ms    31ms
      Thank you, Scala.by!
  36. Results: Thank you
      • Email: azarov@osinka.ru
      • Twitter: http://twitter.com/aazarov
      • Source code, specs: https://github.com/alaz/slides-err-recovery
