Scala parsers Error Recovery in Production

Using Scala parser combinators to parse BBCode markup in the production system at club.osinka.ru:

- Treating invalid markup in parsers: recovery of errors
- Scala parser combinators performance.

Published in: Technology

  1. Parsers Error Recovery for Practical Use
     Alexander Azarov, azarov@osinka.ru / Osinka.ru
     February 18, 2012
  2. Context: Osinka
     - Forum of “BB” kind (2.5M+ pages/day, 8M+ posts total)
     - User generated content: BBCode markup
     - Migrating to Scala: backend
  3. Context: Plan
     - Why parser combinators?
     - Why error recovery?
     - Example of error recovery
     - Results
  4. Context: BBCode
     A few BBCode tags and the HTML they render to:

         BBCode                   HTML
         [b]bold[/b]              <b>bold</b>
         [i]italic[/i]            <i>italic</i>
         [url=href]text[/url]     <a href="href">text</a>
         [img]href[/img]          <img src="href"/>

  5. Context: BBCode example

         [quote="Nick"]
         original [b]text[/b]
         [/quote]
         Here it is the reply with
         [url=http://www.google.com]link[/url]
  6. Task: Why parser combinators, 1
     - Regexp maintenance is a headache
     - Bugs extremely hard to find
     - No markup errors
  7. Task: Why parser combinators, 2
     One post source, many views:
     - HTML render for Web
     - textual view for emails
     - text-only short summary
     - text-only for full-text search indexer
  8. Task: Why parser combinators, 3
     Post analysis algorithms:
     - links (e.g. automated spam analysis)
     - images
     - whatever structure analysis we’d want
  9. Task: Universal AST
     - One AST
     - different printers
     - various traversal algorithms
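
As a rough illustration of "one AST, different printers", the sketch below renders the same tree two ways. The `Text` and `Font` node shapes are the ones this deck uses in its own example; the `Printers` object, the `escape` helper, and the `<span class="...">` mapping for `font` are assumptions made for illustration, not the production printers.

```scala
// AST shape borrowed from the deck's "one tag" example.
sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node

// Hypothetical printers: two independent traversals of the same tree.
object Printers {
  // Minimal HTML escaping for text and attribute values.
  private def escape(s: String): String =
    s.replace("&", "&amp;").replace("<", "&lt;")
      .replace(">", "&gt;").replace("\"", "&quot;")

  // HTML view for the Web; the span/class mapping is an illustrative choice.
  def html(nodes: List[Node]): String = nodes.map {
    case Text(t)            => escape(t)
    case Font(Some(a), sub) => "<span class=\"" + escape(a) + "\">" + html(sub) + "</span>"
    case Font(None, sub)    => "<span>" + html(sub) + "</span>"
  }.mkString

  // Text-only view, e.g. for emails or a search indexer: markup is dropped.
  def plain(nodes: List[Node]): String = nodes.map {
    case Text(t)      => t
    case Font(_, sub) => plain(sub)
  }.mkString
}
```

Adding another view (a short summary, a link extractor) would then be one more traversal over `List[Node]`, with no change to the parser.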
  10. Problem: Sounds great. But.
      This all looks like a perfect world. But what’s the catch?
  11. Problem: Sounds great. But.
      Humans. They make mistakes.
  12. Problem: Sounds great. But.
      Humans. They make mistakes. Example:

          [quote]
          [url=http://www.google.com]
          [img]http://www.image.com
          [/url[/img]
          [/b]

  13. Problem: User-Generated Content
      Erroneous markup:
      - People make mistakes,
      - but no one wants to see an empty post.
      - We have to show something meaningful in any case.
  14. Problem: Black or White World
      - Scala parser combinators assume valid input
      - Parser result: Success | NoSuccess
      - No error recovery out of the box
  15. Solution: Error recovery: our approach
      - Our parser never breaks
      - It generates “error nodes” instead
  16. Solution: Approach: Error nodes
      - FailNode is part of the AST and contains the possible causes of the failure
      - Error nodes are meaningful:
        - for highlighting in the editor
        - to mark posts having failures in markup (for moderators/other users to see this)
  17. Solution: Approach: input & unpaired tags
      - Treat all input except known tags as text
        E.g. [tag]text[/tag] is a text node
      - Unpaired tags are matched as the last choice: markup errors
  18. Example
  19. Example: Trivial BBCode markup
      Example (trivial "one tag" BBCode):

          Simplest [font=bold]BBCode [font=red]example[/font][/font]

      - has only one tag, font
      - though it may have an argument
  20. Example: Corresponding AST

          trait Node
          case class Text(text: String) extends Node
          case class Font(arg: Option[String], subnodes: List[Node]) extends Node

  21. Example: Parser
      BBCode parser:

          lazy val nodes = rep(font | text)

          // Any run of characters that does not start a font tag is text
          lazy val text =
            rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ {
              texts => Text(texts.mkString)
            }

          lazy val font: Parser[Node] = {
            fontOpen ~ nodes <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
            }
          }
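
The slide leaves out the surrounding object and the `fontOpen` / `fontClose` definitions. A minimal self-contained sketch of how the pieces might fit together follows. It assumes the `scala-parser-combinators` module is on the classpath (it shipped with the standard library at the time of the talk); the two regexes, the `StrictBBParser` name, and the `parse` wrapper returning `Either` are assumptions chosen to match the slide's pattern match and tests, not code taken from the deck.

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node

object StrictBBParser extends RegexParsers {
  // Whitespace is content in a forum post, so it must not be skipped.
  override def skipWhitespace = false

  // Assumed tag regexes: group 2 of fontOpen captures the optional argument.
  val fontOpen  = """\[font(=([^\]]*))?\]""".r
  val fontClose = """\[/font\]""".r

  lazy val nodes: Parser[List[Node]] = rep(font | text)

  // One character at a time, as long as no font tag starts here.
  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) }

  lazy val font: Parser[Node] =
    fontOpen ~ nodes <~ fontClose ^^ {
      // Re-match the consumed open tag to extract the argument (null if absent).
      case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes)
    }

  // Hypothetical entry point: Left carries the parser's failure message.
  def parse(in: String): Either[String, List[Node]] =
    parseAll(nodes, in) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

With this grammar any unpaired tag makes the whole parse fail, which is the "black or white" behaviour the recovery slides set out to remove.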
  22. Example: Valid markup
      Scalatest:

          describe("parser") {
            it("keeps spaces") {
              parse(" ") must equal(Right(Text(" ") :: Nil))
              parse(" \n ") must equal(Right(Text(" \n ") :: Nil))
            }
            it("parses text") {
              parse("plain text") must equal(Right(Text("plain text") :: Nil))
            }
            it("parses bbcode-like text") {
              parse("plain [tag] [fonttext") must equal(
                Right(Text("plain [tag] [fonttext") :: Nil))
            }
          }

  23. Example: Invalid markup
      Scalatest:

          describe("error markup") {
            it("results in error") {
              parse("t[/font]") must be('left)
              parse("[font]t") must be('left)
            }
          }

  24. Example: Recovery: Extra AST node
      FailNode:

          case class FailNode(reason: String, markup: String) extends Node
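
To illustrate why an error node inside the AST is useful, here is a small sketch of two traversals that consume `FailNode`. The node classes follow the slides; the `Failures` object and both of its methods are hypothetical helpers, written under the assumption that a failed fragment should be shown as its raw markup and that a post with any `FailNode` should be flagged.

```scala
sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

// Hypothetical consumers of error nodes.
object Failures {
  // Collect every FailNode, depth first, e.g. to flag a post for moderators.
  def collect(nodes: List[Node]): List[FailNode] = nodes.flatMap {
    case f: FailNode  => List(f)
    case Font(_, sub) => collect(sub)
    case _: Text      => Nil
  }

  // Text-only rendering that falls back to the raw markup of a failed
  // fragment, so the reader still sees something instead of an empty post.
  def plain(nodes: List[Node]): String = nodes.map {
    case Text(t)             => t
    case Font(_, sub)        => plain(sub)
    case FailNode(_, markup) => markup
  }.mkString
}
```

An editor could use the `reason` and `markup` fields returned by `collect` to highlight the offending fragment.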
  25. Example: Recovery: helper methods
      Explicitly return FailNode:

          protected def failed(reason: String) = FailNode(reason, "")

      Enrich FailNode with markup:

          protected def recover(p: => Parser[Node]): Parser[Node] =
            Parser { in =>
              val r = p(in)
              // The source text consumed by p, computed only if needed
              lazy val markup =
                in.source.subSequence(in.offset, r.next.offset).toString
              r match {
                case Success(node: FailNode, next) =>
                  Success(node.copy(markup = markup), next)
                case other =>
                  other
              }
            }

  26. Example: Recovery: Parser rules
      - never break (provide “alone tag” parsers)
      - return FailNode explicitly if needed

      nodes:

          lazy val nodes = rep(node | missingOpen)
          lazy val node = font | text | missingClose

  27. Example: “Missing open tag” parser
      Catching a lone [/font]:

          def missingOpen = recover {
            fontClose ^^^ { failed("missing open") }
          }

  28. Example: Argument check
      font may have limits on its argument:

          lazy val font: Parser[Node] = recover {
            fontOpen ~ rep(node) <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes =>
                // arg is null when the tag has no argument
                if (arg == null || allowedFontArgs.contains(arg))
                  Font(Option(arg), subnodes)
                else
                  failed("arg incorrect")
            }
          }
  29. Example: Passes markup error tests
      Scalatest:

          describe("recovery") {
            it("reports incorrect arg") {
              parse("[font=b]t[/font]") must equal(Right(
                FailNode("arg incorrect", "[font=b]t[/font]") :: Nil
              ))
            }
            it("recovers extra ending tag") {
              parse("t[/font]") must equal(Right(
                Text("t") :: FailNode("missing open", "[/font]") :: Nil
              ))
            }
          }

  30. Example: Passes longer tests
      Scalatest:

          it("recovers extra starting tag in a longer sequence") {
            parse("[font][font]t[/font]") must equal(Right(
              FailNode("missing close", "[font]") ::
              Font(None, Text("t") :: Nil) :: Nil
            ))
          }
          it("recovers extra ending tag in a longer sequence") {
            parse("[font]t[/font][/font]") must equal(Right(
              Font(None, Text("t") :: Nil) ::
              FailNode("missing open", "[/font]") :: Nil
            ))
          }
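
The recovery slides can be assembled into one runnable grammar roughly as follows. This is a sketch under several assumptions, each marked in the code: the two tag regexes, the `allowedFontArgs` whitelist, the `parse` wrapper, and in particular `missingClose`, which the slides reference but never show (it is written here by symmetry with `missingOpen`). It also assumes the `scala-parser-combinators` module is available.

```scala
import scala.util.parsing.combinator.RegexParsers

sealed trait Node
case class Text(text: String) extends Node
case class Font(arg: Option[String], subnodes: List[Node]) extends Node
case class FailNode(reason: String, markup: String) extends Node

object RecoveringBBParser extends RegexParsers {
  override def skipWhitespace = false

  val fontOpen  = """\[font(=([^\]]*))?\]""".r   // assumed regex
  val fontClose = """\[/font\]""".r              // assumed regex
  val allowedFontArgs = Set("bold", "red")       // hypothetical whitelist

  protected def failed(reason: String) = FailNode(reason, "")

  // If p produced a FailNode, attach the source text that p consumed.
  protected def recover(p: => Parser[Node]): Parser[Node] = Parser { in =>
    p(in) match {
      case Success(node: FailNode, next) =>
        val markup = in.source.subSequence(in.offset, next.offset).toString
        Success(node.copy(markup = markup), next)
      case other => other
    }
  }

  lazy val nodes: Parser[List[Node]] = rep(node | missingOpen)
  lazy val node: Parser[Node] = font | text | missingClose

  lazy val text: Parser[Node] =
    rep1(not(fontOpen | fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) }

  lazy val font: Parser[Node] = recover {
    fontOpen ~ rep(node) <~ fontClose ^^ {
      case fontOpen(_, arg) ~ subnodes =>
        if (arg == null || allowedFontArgs.contains(arg)) Font(Option(arg), subnodes)
        else failed("arg incorrect")
    }
  }

  // A lone [/font] becomes an error node instead of a parse failure.
  def missingOpen: Parser[Node] = recover { fontClose ^^^ failed("missing open") }

  // Assumption: a lone [font...] is handled symmetrically.
  def missingClose: Parser[Node] = recover { fontOpen ^^^ failed("missing close") }

  // Hypothetical entry point with the Either shape used in the tests.
  def parse(in: String): Either[String, List[Node]] =
    parseAll(nodes, in) match {
      case Success(result, _) => Right(result)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

Because every position in the input is matched by `text`, a paired `font`, or one of the two "alone tag" parsers, `nodes` always consumes the whole input, which is what "never breaks" means in practice.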
  31. Example: Examples source code
      Source code, specs: https://github.com/alaz/slides-err-recovery
  32. Results: Production use outlines
      - It works reliably
      - Lower maintenance costs
      - Performance (see next slides)
      - Beware: Scala parser combinators are not thread-safe.
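
The deck does not say how the thread-safety problem was handled. One conservative option, assuming the unsafe state lives in the parser object itself, is to give every thread its own parser instance. The `PerThread` helper below is a hypothetical sketch of that idea and is not taken from the deck.

```scala
// Hypothetical helper: lazily creates one instance of a
// non-thread-safe object per thread and reuses it on that thread.
final class PerThread[A <: AnyRef](create: () => A) {
  private val local = new ThreadLocal[A] {
    override def initialValue(): A = create()
  }

  // The calling thread's own instance.
  def get: A = local.get()
}
```

This would require the grammar to be a `class` rather than an `object`, for example `new PerThread(() => new BBCodeParser)`, where `BBCodeParser` is an assumed class form of the grammar.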
  33. Results: Performance
      The biggest problem is performance.
      Benchmarks (real codebase):

                              PHP       Scala
          Typical     8k      5.3ms     51ms
          Big w/err   76k     136ms     1245ms

      Workaround: caching
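
The slide only names caching as the workaround. A minimal sketch of what that could look like is a memoizing wrapper around the parse function, keyed by the post source. The `ParseCache` class is hypothetical; a production cache would presumably bound its size and might key on a post id and revision instead of the full text.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.function.{Function => JFunction}

// Hypothetical unbounded cache: each distinct source is parsed at most once.
final class ParseCache[A](parse: String => A) {
  private val cache = new ConcurrentHashMap[String, A]()

  def apply(source: String): A =
    cache.computeIfAbsent(source, new JFunction[String, A] {
      def apply(s: String): A = parse(s)
    })
}
```

Since posts are read far more often than they are edited, repeated renders of the same post would hit the cache instead of the parser.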
  34. Results: Surprise!
      - Never give up
      - find a good motivator instead (e.g. a presentation for Scala.by)
  35. Results: Performance: success story
      - Want performance? Do not use Lexical
      - Forget those scary numbers!

      Benchmarks (real codebase):

                              PHP       Scala (before)   Scala (now)
          Typical     8k      5.3ms     51ms             16ms
          Big w/err   76k     136ms     1245ms           31ms

      Thank you, Scala.by!
  36. Results: Thank you
      - Email: azarov@osinka.ru
      - Twitter: http://twitter.com/aazarov
      - Source code, specs: https://github.com/alaz/slides-err-recovery
