SlideShare a Scribd company logo
1 of 36
Download to read offline
Parsers Error Recovery for Practical Use




            Parsers Error Recovery for Practical Use

                                           Alexander Azarov

                                      azarov@osinka.ru / Osinka.ru

                                           February 18, 2012
Parsers Error Recovery for Practical Use
  Context




Osinka



              Forum of “BB” kind
                      (2.5M+ pages/day, 8M+ posts total)
                             User generated content: BBCode markup

              Migrating to Scala
                      Backend
Parsers Error Recovery for Practical Use
  Context




Plan




              Why parser combinators?
              Why error recovery?
              Example of error recovery
              Results
Parsers Error Recovery for Practical Use
  Context




BBCode



      Few BBCode tags

               [b]bold[/b]                 <b>bold</b>
               [i]italic[/i]               <i>italic</i>
               [url=href]text[/url]        <a href="href">text</a>
               [img]href[/img]             <img src="href"/>
Parsers Error Recovery for Practical Use
  Context




BBCode example


              example of BBCode

      Example

      [quote="Nick"]
      original [b]text[/b]
      [/quote]
      Here it is the reply with
      [url=http://www.google.com]link[/url]
Parsers Error Recovery for Practical Use
  Task




Why parser combinators, 1




              Regexp maintenance is a headache
              Bugs extremely hard to find
              No markup errors
Parsers Error Recovery for Practical Use
  Task




Why parser combinators, 2



      One post source, many views

              HTML render for Web
              textual view for emails
              text-only short summary
              text-only for full-text search indexer
Parsers Error Recovery for Practical Use
  Task




Why parser combinators, 3



      Post analysis algorithms

              links (e.g. spam automated analysis)
              images
              whatever structure analysis we’d want
Parsers Error Recovery for Practical Use
  Task




Universal AST




      One AST

              different printers
              various traversal algorithms
Parsers Error Recovery for Practical Use
  Problem




Sounds great. But.




                              This all looks like a perfect world.
                                   But what’s the catch??
Parsers Error Recovery for Practical Use
  Problem




Sounds great. But.




 Humans.
 They do mistakes.
Parsers Error Recovery for Practical Use
  Problem




Sounds great. But.




 Humans.                                   Example
 They do mistakes.
                                           [quote]
                                           [url=http://www.google.com]
                                           [img]http://www.image.com
                                           [/url[/img]
                                           [/b]
Parsers Error Recovery for Practical Use
  Problem




User-Generated Content: Problem



      Erroneous markup

              People do mistakes,
              But no one wants to see empty post,
              We have to show something meaningful in any case
Parsers Error Recovery for Practical Use
  Problem




Black or White World




              Scala parser combinators assume valid input
                      Parser result: Success | NoSuccess
                             no error recovery out of the box
Parsers Error Recovery for Practical Use
  Solution




Error recovery: our approach




              Our Parser never breaks
              It generates “error nodes” instead
Parsers Error Recovery for Practical Use
  Solution




Approach: Error nodes



              Part of AST, FailNode contains the possible causes of the
              failure
              They are meaningful
                      for highlighting in editor
                      to mark posts having failures in markup (for moderators/other
                      users to see this)
Parsers Error Recovery for Practical Use
  Solution




Approach: input & unpaired tags




              Assume all input except tags as text
                      E.g. [tag]text[/tag] is a text node

              Unpaired tags as the last choice: markup errors
Parsers Error Recovery for Practical Use
  Example




Example




                                           Example
Parsers Error Recovery for Practical Use
  Example




Trivial BBCode markup



      Example (Trivial "one tag" BBCode)

      Simplest [font=bold]BBCode [font=red]example[/font][/font]

              has only one tag, font
              though it may have an argument
Parsers Error Recovery for Practical Use
  Example




Corresponding AST



      AST
      trait Node

      case class Text(text: String) extends Node
      case class Font(arg: Option[String], subnodes: List[Node]) extends
          Node
Parsers Error Recovery for Practical Use
  Example




Parser

      BBCode parser

          lazy val nodes = rep(font | text)
          lazy val text =
            rep1(not(fontOpen|fontClose) ~> "(?s).".r) ^^ {
              texts => Text(texts.mkString)
            }
          lazy val font: Parser[Node] = {
            fontOpen ~ nodes <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes => Font(Option(arg),
                   subnodes)
            }
          }
Parsers Error Recovery for Practical Use
  Example




Valid markup
      Scalatest
          describe("parser") {
            it("keeps spaces") {
               parse(" ") must equal(Right(Text(" ") :: Nil))
               parse(" n ") must equal(Right(Text(" n ") :: Nil))
            }
            it("parses text") {
               parse("plain text") must equal(Right(Text("plain text") ::
                    Nil))
            }
            it("parses bbcode-like text") {
               parse("plain [tag] [fonttext") must equal(Right(Text("
                    plain [tag] [fonttext") :: Nil))
            }
Parsers Error Recovery for Practical Use
  Example




Invalid markup



      Scalatest
          describe("error markup") {
            it("results in error") {
               parse("t[/font]") must be(’left)
               parse("[font]t") must be(’left)
            }
          }
Parsers Error Recovery for Practical Use
  Example




Recovery: Extra AST node




      FailNode
      case class FailNode(reason: String, markup: String) extends Node
Parsers Error Recovery for Practical Use
  Example




Recovery: helper methods
      Explicitly return FailNode

          protected def failed(reason:     String) = FailNode(reason, "")


      Enrich FailNode with markup

          protected def recover(p: => Parser[Node]): Parser[Node] =
            Parser { in =>
              val r = p(in)
              lazy val markup = in.source.subSequence(in.offset, r.next.offset
                   ).toString
              r match {
                case Success(node: FailNode, next) =>
                  Success(node.copy(markup = markup), next)
                case other =>
                  other
Parsers Error Recovery for Practical Use
  Example




Recovery: Parser rules



              never break (provide “alone tag” parsers)
              return FailNode explicitly if needed

      nodes
          lazy val nodes = rep(node | missingOpen)
          lazy val node = font | text | missingClose
Parsers Error Recovery for Practical Use
  Example




“Missing open tag” parser




      Catching alone [/font]

          def missingOpen = recover {
            fontClose ^^^ { failed("missing open") }
          }
Parsers Error Recovery for Practical Use
  Example




Argument check


      font may have limits on argument

          lazy val font: Parser[Node] = recover {
            fontOpen ~ rep(node) <~ fontClose ^^ {
              case fontOpen(_, arg) ~ subnodes =>
                if (arg == null || allowedFontArgs.contains(arg)) Font(
                     Option(arg), subnodes)
                else failed("arg incorrect")
            }
          }
Parsers Error Recovery for Practical Use
  Example




Passes markup error tests

      Scalatest
          describe("recovery") {
            it("reports incorrect arg") {
               parse("[font=b]t[/font]") must equal(Right(
                  FailNode("arg incorrect", "[font=b]t[/font]") :: Nil
               ))
            }
            it("recovers extra ending tag") {
               parse("t[/font]") must equal(Right(
                  Text("t") :: FailNode("missing open", "[/font]") :: Nil
               ))
            }
Parsers Error Recovery for Practical Use
  Example




Passes longer tests

      Scalatest
             it("recovers extra starting tag in a longer sequence") {
                parse("[font][font]t[/font]") must equal(Right(
                   FailNode("missing close", "[font]") :: Font(None, Text("t
                        ") :: Nil) :: Nil
                ))
             }
             it("recovers extra ending tag in a longer sequence") {
                parse("[font]t[/font][/font]") must equal(Right(
                   Font(None, Text("t") :: Nil) :: FailNode("missing open", "
                        [/font]") :: Nil
                ))
Parsers Error Recovery for Practical Use
  Example




Examples source code




              Source code, specs:
              https://github.com/alaz/slides-err-recovery
Parsers Error Recovery for Practical Use
  Results




Production use outlines




              It works reliably
              Lower maintenance costs
              Performance (see next slides)
              Beware: Scala parser combinators are not thread-safe.
Parsers Error Recovery for Practical Use
  Results




Performance


              The biggest problem is performance.

      Benchmarks (real codebase)

                                                 PHP     Scala
                                 Typical 8k      5.3ms   51ms
                                 Big w/err 76k   136ms   1245ms


              Workaround: caching
Parsers Error Recovery for Practical Use
  Results




Surprise!




                                           Never give up

              find a good motivator instead (e.g. presentation for Scala.by)
Parsers Error Recovery for Practical Use
  Results




Performance: success story

              Want performance? Do not use Lexical
              Forget those scary numbers!

      Benchmarks (real codebase)

                                             PHP     Scala
                             Typical 8k      5.3ms   51ms 16ms
                             Big w/err 76k   136ms   1245ms 31ms


              Thank you, Scala.by!
Parsers Error Recovery for Practical Use
  Results




Thank you




              Email: azarov@osinka.ru
              Twitter: http://twitter.com/aazarov
              Source code, specs:
              https://github.com/alaz/slides-err-recovery

More Related Content

What's hot

Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational BiologyAtreyiB
 
Perl 5.10 in 2010
Perl 5.10 in 2010Perl 5.10 in 2010
Perl 5.10 in 2010guest7899f0
 
The Swift Compiler and Standard Library
The Swift Compiler and Standard LibraryThe Swift Compiler and Standard Library
The Swift Compiler and Standard LibrarySantosh Rajan
 
Yapc::NA::2009 - Command Line Perl
Yapc::NA::2009 - Command Line PerlYapc::NA::2009 - Command Line Perl
Yapc::NA::2009 - Command Line PerlBruce Gray
 
Linked to ArrayList: the full story
Linked to ArrayList: the full storyLinked to ArrayList: the full story
Linked to ArrayList: the full storyJosé Paumard
 
Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...
Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...
Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...Puppet
 
C++ for Java Developers (JavaZone 2017)
C++ for Java Developers (JavaZone 2017)C++ for Java Developers (JavaZone 2017)
C++ for Java Developers (JavaZone 2017)Patricia Aas
 
Java 8 presentation
Java 8 presentationJava 8 presentation
Java 8 presentationVan Huong
 
Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Prof. Wim Van Criekinge
 
Python - Getting to the Essence - Points.com - Dave Park
Python - Getting to the Essence - Points.com - Dave ParkPython - Getting to the Essence - Points.com - Dave Park
Python - Getting to the Essence - Points.com - Dave Parkpointstechgeeks
 
Perl 101 - The Basics of Perl Programming
Perl  101 - The Basics of Perl ProgrammingPerl  101 - The Basics of Perl Programming
Perl 101 - The Basics of Perl ProgrammingUtkarsh Sengar
 
You Can Do It! Start Using Perl to Handle Your Voyager Needs
You Can Do It! Start Using Perl to Handle Your Voyager NeedsYou Can Do It! Start Using Perl to Handle Your Voyager Needs
You Can Do It! Start Using Perl to Handle Your Voyager NeedsRoy Zimmer
 

What's hot (19)

Intro to Perl and Bioperl
Intro to Perl and BioperlIntro to Perl and Bioperl
Intro to Perl and Bioperl
 
Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational Biology
 
Java8
Java8Java8
Java8
 
Perl 5.10 in 2010
Perl 5.10 in 2010Perl 5.10 in 2010
Perl 5.10 in 2010
 
The Swift Compiler and Standard Library
The Swift Compiler and Standard LibraryThe Swift Compiler and Standard Library
The Swift Compiler and Standard Library
 
Perl Basics with Examples
Perl Basics with ExamplesPerl Basics with Examples
Perl Basics with Examples
 
Java 8 Lambda and Streams
Java 8 Lambda and StreamsJava 8 Lambda and Streams
Java 8 Lambda and Streams
 
Python
PythonPython
Python
 
Yapc::NA::2009 - Command Line Perl
Yapc::NA::2009 - Command Line PerlYapc::NA::2009 - Command Line Perl
Yapc::NA::2009 - Command Line Perl
 
Linked to ArrayList: the full story
Linked to ArrayList: the full storyLinked to ArrayList: the full story
Linked to ArrayList: the full story
 
Unit v
Unit vUnit v
Unit v
 
Power of Puppet 4
Power of Puppet 4Power of Puppet 4
Power of Puppet 4
 
Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...
Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...
Can you upgrade to Puppet 4.x? (Beginner) Can you upgrade to Puppet 4.x? (Beg...
 
C++ for Java Developers (JavaZone 2017)
C++ for Java Developers (JavaZone 2017)C++ for Java Developers (JavaZone 2017)
C++ for Java Developers (JavaZone 2017)
 
Java 8 presentation
Java 8 presentationJava 8 presentation
Java 8 presentation
 
Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013
 
Python - Getting to the Essence - Points.com - Dave Park
Python - Getting to the Essence - Points.com - Dave ParkPython - Getting to the Essence - Points.com - Dave Park
Python - Getting to the Essence - Points.com - Dave Park
 
Perl 101 - The Basics of Perl Programming
Perl  101 - The Basics of Perl ProgrammingPerl  101 - The Basics of Perl Programming
Perl 101 - The Basics of Perl Programming
 
You Can Do It! Start Using Perl to Handle Your Voyager Needs
You Can Do It! Start Using Perl to Handle Your Voyager NeedsYou Can Do It! Start Using Perl to Handle Your Voyager Needs
You Can Do It! Start Using Perl to Handle Your Voyager Needs
 

Viewers also liked

Parsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix KliuchnikovParsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix KliuchnikovVasil Remeniuk
 
Desarrollo eco turistico-trabajo final nacional
Desarrollo eco turistico-trabajo  final nacionalDesarrollo eco turistico-trabajo  final nacional
Desarrollo eco turistico-trabajo final nacionalflacamia
 
Cristobal suarez presentación_2_en_1
Cristobal suarez presentación_2_en_1Cristobal suarez presentación_2_en_1
Cristobal suarez presentación_2_en_1Educared2011
 
SMM Guide: How to Promote a Bank on Social Media. A Reference Manual
SMM Guide: How to Promote a Bank on Social Media. A Reference ManualSMM Guide: How to Promote a Bank on Social Media. A Reference Manual
SMM Guide: How to Promote a Bank on Social Media. A Reference ManualPSBSMM
 
Considering Email Marketing integration with Omniture?
Considering Email Marketing integration with Omniture?Considering Email Marketing integration with Omniture?
Considering Email Marketing integration with Omniture?bricedubosq
 
ADC PRS Study-Paper OGJ- June 2014
ADC PRS Study-Paper OGJ- June 2014ADC PRS Study-Paper OGJ- June 2014
ADC PRS Study-Paper OGJ- June 2014Xavier MICHEL
 
Caso Práctico: Planificación Anual Social Media
Caso Práctico: Planificación Anual Social MediaCaso Práctico: Planificación Anual Social Media
Caso Práctico: Planificación Anual Social MediaJacqueline Juárez
 
Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...
Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...
Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...FUSADES
 
Prevención y el iafa
Prevención y el iafaPrevención y el iafa
Prevención y el iafaMarta Siles
 
How to use social media for marketing your music business (Midemnet 2010)
How to use social media for marketing your music business (Midemnet 2010)How to use social media for marketing your music business (Midemnet 2010)
How to use social media for marketing your music business (Midemnet 2010)Gerd Leonhard
 
Cómo Atraer Dinero A Tu Vida
Cómo Atraer Dinero A Tu VidaCómo Atraer Dinero A Tu Vida
Cómo Atraer Dinero A Tu VidaGustavo Vallejo
 
Lo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civil
Lo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civilLo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civil
Lo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civilblogmodabodas93
 

Viewers also liked (20)

Parsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix KliuchnikovParsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
Parsers Combinators in Scala, Ilya @lambdamix Kliuchnikov
 
Natural Awakenings, We Are Coaches
Natural Awakenings, We Are CoachesNatural Awakenings, We Are Coaches
Natural Awakenings, We Are Coaches
 
Church of scientology l. ron hubbard part 05
Church of scientology l. ron hubbard part 05Church of scientology l. ron hubbard part 05
Church of scientology l. ron hubbard part 05
 
Desarrollo eco turistico-trabajo final nacional
Desarrollo eco turistico-trabajo  final nacionalDesarrollo eco turistico-trabajo  final nacional
Desarrollo eco turistico-trabajo final nacional
 
Navegadores
NavegadoresNavegadores
Navegadores
 
Cristobal suarez presentación_2_en_1
Cristobal suarez presentación_2_en_1Cristobal suarez presentación_2_en_1
Cristobal suarez presentación_2_en_1
 
PLASTICHOME
PLASTICHOMEPLASTICHOME
PLASTICHOME
 
SMM Guide: How to Promote a Bank on Social Media. A Reference Manual
SMM Guide: How to Promote a Bank on Social Media. A Reference ManualSMM Guide: How to Promote a Bank on Social Media. A Reference Manual
SMM Guide: How to Promote a Bank on Social Media. A Reference Manual
 
Puntos negros bfs 2013
Puntos negros bfs 2013Puntos negros bfs 2013
Puntos negros bfs 2013
 
Curso seo basico
Curso seo basicoCurso seo basico
Curso seo basico
 
Considering Email Marketing integration with Omniture?
Considering Email Marketing integration with Omniture?Considering Email Marketing integration with Omniture?
Considering Email Marketing integration with Omniture?
 
ADC PRS Study-Paper OGJ- June 2014
ADC PRS Study-Paper OGJ- June 2014ADC PRS Study-Paper OGJ- June 2014
ADC PRS Study-Paper OGJ- June 2014
 
Caso Práctico: Planificación Anual Social Media
Caso Práctico: Planificación Anual Social MediaCaso Práctico: Planificación Anual Social Media
Caso Práctico: Planificación Anual Social Media
 
Cual Es Tu Objetivo
Cual Es Tu ObjetivoCual Es Tu Objetivo
Cual Es Tu Objetivo
 
Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...
Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...
Una Ley Procesal Constitucional para el Siglo XXI: reflexiones para una refor...
 
Prevención y el iafa
Prevención y el iafaPrevención y el iafa
Prevención y el iafa
 
How to use social media for marketing your music business (Midemnet 2010)
How to use social media for marketing your music business (Midemnet 2010)How to use social media for marketing your music business (Midemnet 2010)
How to use social media for marketing your music business (Midemnet 2010)
 
Cómo Atraer Dinero A Tu Vida
Cómo Atraer Dinero A Tu VidaCómo Atraer Dinero A Tu Vida
Cómo Atraer Dinero A Tu Vida
 
Lo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civil
Lo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civilLo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civil
Lo aconsejable que tendrias que tener en cuenta acerca anillos de boda al civil
 
Estas Preparado Para Encontrarte Con Dios
Estas Preparado Para Encontrarte Con DiosEstas Preparado Para Encontrarte Con Dios
Estas Preparado Para Encontrarte Con Dios
 

Similar to Scala parsers Error Recovery in Production

New features in abap
New features in abapNew features in abap
New features in abapSrihari J
 
The GO Language : From Beginners to Gophers
The GO Language : From Beginners to GophersThe GO Language : From Beginners to Gophers
The GO Language : From Beginners to GophersAlessandro Sanino
 
Java 8 New features
Java 8 New featuresJava 8 New features
Java 8 New featuresSon Nguyen
 
Introduction to Client-Side Javascript
Introduction to Client-Side JavascriptIntroduction to Client-Side Javascript
Introduction to Client-Side JavascriptJulie Iskander
 
Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...
Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...
Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...Amazon Web Services
 
Design Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron PattersonDesign Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron PattersonManageIQ
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17Daniel Eriksson
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?lichtkind
 
Hello, I need help with the following assignmentThis assignment w.pdf
Hello, I need help with the following assignmentThis assignment w.pdfHello, I need help with the following assignmentThis assignment w.pdf
Hello, I need help with the following assignmentThis assignment w.pdfnamarta88
 
Scala is java8.next()
Scala is java8.next()Scala is java8.next()
Scala is java8.next()daewon jeong
 
Visual Studio 2010 and .NET 4.0 Overview
Visual Studio 2010 and .NET 4.0 OverviewVisual Studio 2010 and .NET 4.0 Overview
Visual Studio 2010 and .NET 4.0 Overviewbwullems
 
Java Performance Tips (So Code Camp San Diego 2014)
Java Performance Tips (So Code Camp San Diego 2014)Java Performance Tips (So Code Camp San Diego 2014)
Java Performance Tips (So Code Camp San Diego 2014)Kai Chan
 
Functional programming in java 8 by harmeet singh
Functional programming in java 8 by harmeet singhFunctional programming in java 8 by harmeet singh
Functional programming in java 8 by harmeet singhHarmeet Singh(Taara)
 
Input/Output Exploring java.io
Input/Output Exploring java.ioInput/Output Exploring java.io
Input/Output Exploring java.ioNilaNila16
 
Python Programming Basics for begginners
Python Programming Basics for begginnersPython Programming Basics for begginners
Python Programming Basics for begginnersAbishek Purushothaman
 

Similar to Scala parsers Error Recovery in Production (20)

New features in abap
New features in abapNew features in abap
New features in abap
 
The GO Language : From Beginners to Gophers
The GO Language : From Beginners to GophersThe GO Language : From Beginners to Gophers
The GO Language : From Beginners to Gophers
 
Java 8
Java 8Java 8
Java 8
 
Java 8 New features
Java 8 New featuresJava 8 New features
Java 8 New features
 
Introduction to Client-Side Javascript
Introduction to Client-Side JavascriptIntroduction to Client-Side Javascript
Introduction to Client-Side Javascript
 
Colloquium Report
Colloquium ReportColloquium Report
Colloquium Report
 
Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...
Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...
Distributed, Incremental Dataflow Processing on AWS with GRAIL's Reflow (CMP3...
 
Design Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron PattersonDesign Summit - Rails 4 Migration - Aaron Patterson
Design Summit - Rails 4 Migration - Aaron Patterson
 
Meetup C++ A brief overview of c++17
Meetup C++  A brief overview of c++17Meetup C++  A brief overview of c++17
Meetup C++ A brief overview of c++17
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?
 
Hello, I need help with the following assignmentThis assignment w.pdf
Hello, I need help with the following assignmentThis assignment w.pdfHello, I need help with the following assignmentThis assignment w.pdf
Hello, I need help with the following assignmentThis assignment w.pdf
 
Scala is java8.next()
Scala is java8.next()Scala is java8.next()
Scala is java8.next()
 
Visual Studio 2010 and .NET 4.0 Overview
Visual Studio 2010 and .NET 4.0 OverviewVisual Studio 2010 and .NET 4.0 Overview
Visual Studio 2010 and .NET 4.0 Overview
 
Python and You Series
Python and You SeriesPython and You Series
Python and You Series
 
Rails on Oracle 2011
Rails on Oracle 2011Rails on Oracle 2011
Rails on Oracle 2011
 
Java Performance Tips (So Code Camp San Diego 2014)
Java Performance Tips (So Code Camp San Diego 2014)Java Performance Tips (So Code Camp San Diego 2014)
Java Performance Tips (So Code Camp San Diego 2014)
 
Functional programming in java 8 by harmeet singh
Functional programming in java 8 by harmeet singhFunctional programming in java 8 by harmeet singh
Functional programming in java 8 by harmeet singh
 
Annotation processing
Annotation processingAnnotation processing
Annotation processing
 
Input/Output Exploring java.io
Input/Output Exploring java.ioInput/Output Exploring java.io
Input/Output Exploring java.io
 
Python Programming Basics for begginners
Python Programming Basics for begginnersPython Programming Basics for begginners
Python Programming Basics for begginners
 

Recently uploaded

Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Recently uploaded (20)

Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Scala parsers Error Recovery in Production

  • 1. Parsers Error Recovery for Practical Use Parsers Error Recovery for Practical Use Alexander Azarov azarov@osinka.ru / Osinka.ru February 18, 2012
  • 2. Parsers Error Recovery for Practical Use Context Osinka Forum of “BB” kind (2.5M+ pages/day, 8M+ posts total) User generated content: BBCode markup Migrating to Scala Backend
  • 3. Parsers Error Recovery for Practical Use Context Plan Why parser combinators? Why error recovery? Example of error recovery Results
  • 4. Parsers Error Recovery for Practical Use Context BBCode Few BBCode tags [b]bold[/b] <b>bold</b> [i]italic[/i] <i>italic</i> [url=href]text[/url] <a href="href">text</a> [img]href[/img] <img src="href"/>
  • 5. Parsers Error Recovery for Practical Use Context BBCode example example of BBCode Example [quote="Nick"] original [b]text[/b] [/quote] Here it is the reply with [url=http://www.google.com]link[/url]
  • 6. Parsers Error Recovery for Practical Use Task Why parser combinators, 1 Regexp maintenance is a headache Bugs extremely hard to find No markup errors
  • 7. Parsers Error Recovery for Practical Use Task Why parser combinators, 2 One post source, many views HTML render for Web textual view for emails text-only short summary text-only for full-text search indexer
  • 8. Parsers Error Recovery for Practical Use Task Why parser combinators, 3 Post analysis algorithms links (e.g. spam automated analysis) images whatever structure analysis we’d want
  • 9. Parsers Error Recovery for Practical Use Task Universal AST One AST different printers various traversal algorithms
  • 10. Parsers Error Recovery for Practical Use Problem Sounds great. But. This all looks like a perfect world. But what’s the catch??
  • 11. Parsers Error Recovery for Practical Use Problem Sounds great. But. Humans. They do mistakes.
  • 12. Parsers Error Recovery for Practical Use Problem Sounds great. But. Humans. Example They do mistakes. [quote] [url=http://www.google.com] [img]http://www.image.com [/url[/img] [/b]
  • 13. Parsers Error Recovery for Practical Use Problem User-Generated Content: Problem Erroneous markup People do mistakes, But no one wants to see empty post, We have to show something meaningful in any case
  • 14. Parsers Error Recovery for Practical Use Problem Black or White World Scala parser combinators assume valid input Parser result: Success | NoSuccess no error recovery out of the box
  • 15. Parsers Error Recovery for Practical Use Solution Error recovery: our approach Our Parser never breaks It generates “error nodes” instead
  • 16. Parsers Error Recovery for Practical Use Solution Approach: Error nodes Part of AST, FailNode contains the possible causes of the failure They are meaningful for highlighting in editor to mark posts having failures in markup (for moderators/other users to see this)
  • 17. Parsers Error Recovery for Practical Use Solution Approach: input & unpaired tags Assume all input except tags as text E.g. [tag]text[/tag] is a text node Unpaired tags as the last choice: markup errors
  • 18. Parsers Error Recovery for Practical Use Example Example Example
  • 19. Parsers Error Recovery for Practical Use Example Trivial BBCode markup Example (Trivial "one tag" BBCode) Simplest [font=bold]BBCode [font=red]example[/font][/font] has only one tag, font though it may have an argument
  • 20. Parsers Error Recovery for Practical Use Example Corresponding AST AST trait Node case class Text(text: String) extends Node case class Font(arg: Option[String], subnodes: List[Node]) extends Node
  • 21. Parsers Error Recovery for Practical Use Example Parser BBCode parser lazy val nodes = rep(font | text) lazy val text = rep1(not(fontOpen|fontClose) ~> "(?s).".r) ^^ { texts => Text(texts.mkString) } lazy val font: Parser[Node] = { fontOpen ~ nodes <~ fontClose ^^ { case fontOpen(_, arg) ~ subnodes => Font(Option(arg), subnodes) } }
  • 22. Parsers Error Recovery for Practical Use Example Valid markup Scalatest describe("parser") { it("keeps spaces") { parse(" ") must equal(Right(Text(" ") :: Nil)) parse(" n ") must equal(Right(Text(" n ") :: Nil)) } it("parses text") { parse("plain text") must equal(Right(Text("plain text") :: Nil)) } it("parses bbcode-like text") { parse("plain [tag] [fonttext") must equal(Right(Text(" plain [tag] [fonttext") :: Nil)) }
  • 23. Parsers Error Recovery for Practical Use Example Invalid markup Scalatest describe("error markup") { it("results in error") { parse("t[/font]") must be(’left) parse("[font]t") must be(’left) } }
  • 24. Parsers Error Recovery for Practical Use Example Recovery: Extra AST node FailNode case class FailNode(reason: String, markup: String) extends Node
  • 25. Parsers Error Recovery for Practical Use Example Recovery: helper methods Explicitly return FailNode protected def failed(reason: String) = FailNode(reason, "") Enrich FailNode with markup protected def recover(p: => Parser[Node]): Parser[Node] = Parser { in => val r = p(in) lazy val markup = in.source.subSequence(in.offset, r.next.offset ).toString r match { case Success(node: FailNode, next) => Success(node.copy(markup = markup), next) case other => other
  • 26. Parsers Error Recovery for Practical Use Example Recovery: Parser rules never break (provide “alone tag” parsers) return FailNode explicitly if needed nodes lazy val nodes = rep(node | missingOpen) lazy val node = font | text | missingClose
  • 27. Parsers Error Recovery for Practical Use Example “Missing open tag” parser Catching alone [/font] def missingOpen = recover { fontClose ^^^ { failed("missing open") } }
  • 28. Parsers Error Recovery for Practical Use Example Argument check font may have limits on argument lazy val font: Parser[Node] = recover { fontOpen ~ rep(node) <~ fontClose ^^ { case fontOpen(_, arg) ~ subnodes => if (arg == null || allowedFontArgs.contains(arg)) Font( Option(arg), subnodes) else failed("arg incorrect") } }
  • 29. Parsers Error Recovery for Practical Use Example Passes markup error tests Scalatest describe("recovery") { it("reports incorrect arg") { parse("[font=b]t[/font]") must equal(Right( FailNode("arg incorrect", "[font=b]t[/font]") :: Nil )) } it("recovers extra ending tag") { parse("t[/font]") must equal(Right( Text("t") :: FailNode("missing open", "[/font]") :: Nil )) }
  • 30. Parsers Error Recovery for Practical Use Example Passes longer tests Scalatest it("recovers extra starting tag in a longer sequence") { parse("[font][font]t[/font]") must equal(Right( FailNode("missing close", "[font]") :: Font(None, Text("t ") :: Nil) :: Nil )) } it("recovers extra ending tag in a longer sequence") { parse("[font]t[/font][/font]") must equal(Right( Font(None, Text("t") :: Nil) :: FailNode("missing open", " [/font]") :: Nil ))
  • 31. Parsers Error Recovery for Practical Use Example Examples source code Source code, specs: https://github.com/alaz/slides-err-recovery
  • 32. Parsers Error Recovery for Practical Use Results Production use outlines It works reliably Lower maintenance costs Performance (see next slides) Beware: Scala parser combinators are not thread-safe.
  • 33. Parsers Error Recovery for Practical Use Results Performance The biggest problem is performance. Benchmarks (real codebase) PHP Scala Typical 8k 5.3ms 51ms Big w/err 76k 136ms 1245ms Workaround: caching
  • 34. Parsers Error Recovery for Practical Use Results Surprise! Never give up find a good motivator instead (e.g. presentation for Scala.by)
  • 35. Parsers Error Recovery for Practical Use Results Performance: success story Want performance? Do not use Lexical Forget those scary numbers! Benchmarks (real codebase) PHP Scala Typical 8k 5.3ms 51ms 16ms Big w/err 76k 136ms 1245ms 31ms Thank you, Scala.by!
  • 36. Parsers Error Recovery for Practical Use Results Thank you Email: azarov@osinka.ru Twitter: http://twitter.com/aazarov Source code, specs: https://github.com/alaz/slides-err-recovery