"Error Recovery" by @alaz at scalaby#8
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
1,163
On Slideshare
557
From Embeds
606
Number of Embeds
2

Actions

Shares
Downloads
1
Comments
0
Likes
0

Embeds 606

http://scala.by 545
http://localhost 61

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Parsers Error Recovery for Practical Use Parsers Error Recovery for Practical Use Alexander Azarov azarov@osinka.ru / Osinka.ru February 18, 2012
  • 2. Parsers Error Recovery for Practical Use ContextOsinka Forum of “BB” kind (2.5M+ pages/day, 8M+ posts total) User generated content: BBCode markup Migrating to Scala Backend
  • 3. Parsers Error Recovery for Practical Use ContextPlan Why parser combinators? Why error recovery? Example of error recovery Results
  • 4. Parsers Error Recovery for Practical Use ContextBBCode Few BBCode tags [b]bold[/b] <b>bold</b> [i]italic[/i] <i>italic</i> [url=href]text[/url] <a href="href">text</a> [img]href[/img] <img src="href"/>
  • 5. Parsers Error Recovery for Practical Use ContextBBCode example example of BBCode Example [quote="Nick"] original [b]text[/b] [/quote] Here it is the reply with [url=http://www.google.com]link[/url]
  • 6. Parsers Error Recovery for Practical Use TaskWhy parser combinators, 1 Regexp maintenance is a headache Bugs extremely hard to find No markup errors
  • 7. Parsers Error Recovery for Practical Use TaskWhy parser combinators, 2 One post source, many views HTML render for Web textual view for emails text-only short summary text-only for full-text search indexer
  • 8. Parsers Error Recovery for Practical Use TaskWhy parser combinators, 3 Post analysis algorithms links (e.g. spam automated analysis) images whatever structure analysis we’d want
  • 9. Parsers Error Recovery for Practical Use TaskUniversal AST One AST different printers various traversal algorithms
  • 10. Parsers Error Recovery for Practical Use ProblemSounds great. But. This all looks like a perfect world. But what’s the catch??
  • 11. Parsers Error Recovery for Practical Use ProblemSounds great. But. Humans. They do mistakes.
  • 12. Parsers Error Recovery for Practical Use ProblemSounds great. But. Humans. Example They do mistakes. [quote] [url=http://www.google.com] [img]http://www.image.com [/url[/img] [/b]
  • 13. Parsers Error Recovery for Practical Use ProblemUser-Generated Content: Problem Erroneous markup People do mistakes, But no one wants to see empty post, We have to show something meaningful in any case
  • 14. Parsers Error Recovery for Practical Use ProblemBlack or White World Scala parser combinators assume valid input Parser result: Success | NoSuccess no error recovery out of the box
  • 15. Parsers Error Recovery for Practical Use SolutionError recovery: our approach Our Parser never breaks It generates “error nodes” instead
  • 16. Parsers Error Recovery for Practical Use SolutionApproach: Error nodes Part of AST, FailNode contains the possible causes of the failure They are meaningful for highlighting in editor to mark posts having failures in markup (for moderators/other users to see this)
  • 17. Parsers Error Recovery for Practical Use SolutionApproach: input & unpaired tags Assume all input except tags as text E.g. [tag]text[/tag] is a text node Unpaired tags as the last choice: markup errors
  • 18. Parsers Error Recovery for Practical Use ExampleExample Example
  • 19. Parsers Error Recovery for Practical Use ExampleTrivial BBCode markup Example (Trivial "one tag" BBCode) Simplest [font=bold]BBCode [font=red]example[/font][/font] has only one tag, font though it may have an argument
  • 20. Parsers Error Recovery for Practical Use ExampleCorresponding AST AST trait Node case class Text(text: String) extends Node case class Font(arg: Option[String], subnodes: List[Node]) extends Node
  • 21. Parsers Error Recovery for Practical Use ExampleParser BBCode parser lazy val nodes = rep(font | text) lazy val text = rep1(not(fontOpen|fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) } lazy val font: Parser[Node] = { fontOpen ~ nodes <~ fontClose ^^ { case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes) } }
  • 22. Parsers Error Recovery for Practical Use ExampleValid markup Scalatest describe("parser") { it("keeps spaces") { parse(" ") must equal(Right(Text(" ") :: Nil)) parse(" n ") must equal(Right(Text(" n ") :: Nil)) } it("parses text") { parse("plain text") must equal(Right(Text("plain text") :: Nil)) } it("parses bbcode-like text") { parse("plain [tag] [fonttext") must equal(Right(Text(" plain [tag] [fonttext") :: Nil)) }
  • 23. Parsers Error Recovery for Practical Use ExampleInvalid markup Scalatest describe("error markup") { it("results in error") { parse("t[/font]") must be(’left) parse("[font]t") must be(’left) } }
  • 24. Parsers Error Recovery for Practical Use ExampleRecovery: Extra AST node FailNode case class FailNode(reason: String, markup: String) extends Node
  • 25. Parsers Error Recovery for Practical Use ExampleRecovery: helper methods Explicitly return FailNode protected def failed(reason: String) = FailNode(reason, "") Enrich FailNode with markup protected def recover(p: => Parser[Node]): Parser[Node] = Parser { in => val r = p(in) lazy val markup = in.source.subSequence(in.offset, r.next.offset ).toString r match { case Success(node: FailNode, next) => Success(node.copy(markup = markup), next) case other => other
  • 26. Parsers Error Recovery for Practical Use ExampleRecovery: Parser rules never break (provide “alone tag” parsers) return FailNode explicitly if needed nodes lazy val nodes = rep(node | missingOpen) lazy val node = font | text | missingClose
  • 27. Parsers Error Recovery for Practical Use Example“Missing open tag” parser Catching alone [/font] def missingOpen = recover { fontClose ^^^ { failed("missing open") } }
  • 28. Parsers Error Recovery for Practical Use ExampleArgument check font may have limits on argument lazy val font: Parser[Node] = recover { fontOpen ~ rep(node) <~ fontClose ^^ { case fontOpen(_, arg) ~ subnodes => if (arg == null || allowedFontArgs.contains(arg)) Font( Option(arg), subnodes) else failed("arg incorrect") } }
  • 29. Parsers Error Recovery for Practical Use ExamplePasses markup error tests Scalatest describe("recovery") { it("reports incorrect arg") { parse("[font=b]t[/font]") must equal(Right( FailNode("arg incorrect", "[font=b]t[/font]") :: Nil )) } it("recovers extra ending tag") { parse("t[/font]") must equal(Right( Text("t") :: FailNode("missing open", "[/font]") :: Nil )) }
  • 30. Parsers Error Recovery for Practical Use ExamplePasses longer tests Scalatest it("recovers extra starting tag in a longer sequence") { parse("[font][font]t[/font]") must equal(Right( FailNode("missing close", "[font]") :: Font(None, Text("t ") :: Nil) :: Nil )) } it("recovers extra ending tag in a longer sequence") { parse("[font]t[/font][/font]") must equal(Right( Font(None, Text("t") :: Nil) :: FailNode("missing open", " [/font]") :: Nil ))
  • 31. Parsers Error Recovery for Practical Use ExampleExamples source code Source code, specs: https://github.com/alaz/slides-err-recovery
  • 32. Parsers Error Recovery for Practical Use ResultsProduction use outlines It works reliably Lower maintenance costs Performance (see next slides) Beware: Scala parser combinators are not thread-safe.
  • 33. Parsers Error Recovery for Practical Use ResultsPerformance The biggest problem is performance. Benchmarks (real codebase) PHP Scala Typical 8k 5.3ms 51ms Big w/err 76k 136ms 1245ms Workaround: caching
  • 34. Parsers Error Recovery for Practical Use ResultsSurprise! Never give up find a good motivator instead (e.g. presentation for Scala.by)
  • 35. Parsers Error Recovery for Practical Use ResultsPerformance: success story Want performance? Do not use Lexical Forget those scary numbers! Benchmarks (real codebase) PHP Scala Typical 8k 5.3ms 51ms 16ms Big w/err 76k 136ms 1245ms 31ms Thank you, Scala.by!
  • 36. Parsers Error Recovery for Practical Use ResultsThank you Email: azarov@osinka.ru Twitter: http://twitter.com/aazarov Source code, specs: https://github.com/alaz/slides-err-recovery