         HTML5::Sanitizer
  Sanitizing HTML 5 with Perl 5


                 Uwe Voelker

                     XING AG


            August 16th 2011




            Uwe Voelker    HTML5::Sanitizer
1   Introduction

2   HTML parser choice

3   HTML5::Sanitizer internals

4   HTML5::Sanitizer usage

5   Conclusion




1   Introduction
       Task: WYSIWYG editor
       Team
       Live example



Task: WYSIWYG editor



     integrate a WYSIWYG editor into XING
     frontend architect researched open source solutions
     none was suitable, mostly for security reasons
     decision was made to build it in-house
     goals: secure; share profiles (allowed tags) between frontend
     and backend





Team




 Christopher Blum        Ingo Chao                           Uwe Voelker
 JavaScript              QA (HTML5/CSS)                      Perl



Live example





2   HTML parser choice
     CPAN modules
     Evaluation
     Final decision



HTML parsers on CPAN



     HTML::Parser
     HTML::TreeBuilder
     HTML::TreeBuilder::LibXML
     XML::LibXML
     HTML::HTML5::Parser
     Marpa::HTML
     ...




started with HTML::HTML5::Parser (HH5P)
because it understands the semantics of HTML 5 tags
but it also turned the first URL into the second (&copy decoded as ©):
    http://example.com/?section=2&copy=3&lang=en
    http://example.com/?section=2©=3&lang=en
final choice: XML::LibXML
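
The mangled URL comes from the bare `&copy`, which a lenient parser may read as the named character reference for ©. On the producing side the ambiguity disappears when ampersands in attribute values are written as `&amp;`. A minimal sketch, assuming a hypothetical helper `escape_attr` that is not part of HTML5::Sanitizer:

```perl
use strict;
use warnings;

# Hypothetical helper (not from HTML5::Sanitizer): escape a value
# before it is written into a double-quoted HTML attribute.
sub escape_attr {
    my ($value) = @_;
    $value =~ s/&/&amp;/g;     # first, so later entities are not re-escaped
    $value =~ s/"/&quot;/g;
    $value =~ s/</&lt;/g;
    $value =~ s/>/&gt;/g;
    return $value;
}

my $url = 'http://example.com/?section=2&copy=3&lang=en';
print '<a href="', escape_attr($url), '">link</a>', "\n";
# prints <a href="http://example.com/?section=2&amp;copy=3&amp;lang=en">link</a>
```

A sanitizer cannot assume that incoming markup was escaped this way, so how the parser treats a bare `&copy` still matters.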





3   HTML5::Sanitizer internals
     Processing Phases
     Parsing
     Converting
     Writing


Processing phases




      preprocessing (e.g. migration)
      parsing (HTML → DOM tree)
      converting (rebuild tree according to profile)
      writing (DOM tree → HTML)





Parsing HTML with XML::LibXML

  use XML::LibXML;

  my $parser = XML::LibXML->new(
      encoding          => 'UTF-8',
      recover           => 2,
      keep_blanks       => 1,
      no_cdata          => 1,
      expand_entities   => 1,
      no_network        => 1,
      suppress_errors   => 1,
      suppress_warnings => 1,
  );

  my $doc = $parser->parse_html_string(
      $html,
      {
          no_cdata          => 1,
          suppress_errors   => 1,
          suppress_warnings => 1,
      },
  );
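
To see what the parser hands to the next phase, the resulting DOM tree can be walked and printed. The following is a minimal sketch, not code from HTML5::Sanitizer; the helper `dump_tree` and the sample input are made up for illustration, and only a subset of the parser options is used:

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new(
    recover           => 2,
    suppress_errors   => 1,
    suppress_warnings => 1,
);
my $doc = $parser->parse_html_string('<p>Hello <b>world</b></p>');

# The HTML parser wraps fragments in <html><body>; start from <body>.
my ($body) = $doc->findnodes('//body');
die "no <body> element found\n" unless $body;

# Hypothetical helper: print element and text nodes, indented by depth.
sub dump_tree {
    my ($node, $depth) = @_;
    my $indent = '  ' x $depth;
    for my $child ($node->childNodes) {
        if ($child->nodeType == XML_ELEMENT_NODE) {
            print $indent, 'ELEMENT ', $child->nodeName, "\n";
            dump_tree($child, $depth + 1);
        }
        elsif ($child->nodeType == XML_TEXT_NODE) {
            print $indent, 'TEXT "', $child->nodeValue, '"', "\n";
        }
    }
    return;
}

dump_tree($body, 0);
```

Only ELEMENT and TEXT nodes are printed here because those are the node types the converting phase looks at.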




Converting - rebuilding DOM tree



     loop through every node (only ELEMENT and TEXT nodes)
     drop unwanted elements completely (e.g. <script>)
     change unknown elements to <span>
     possibly change the tag name (per profile)
     transform (or copy) attributes
     proceed recursively with child nodes
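
The steps above can be sketched as a recursive copy into a fresh tree. This is an illustration under assumptions, not the module's actual code: the `%RULES` hash stands in for a profile (a missing entry is treated as "unknown tag"), attribute handling is reduced to `set_class`, and the input is loaded with `load_xml` so the result is easy to predict.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for a profile: tag name => rule hashref.
my %RULES = (
    p      => {},
    big    => { rename_tag => 'span', set_class => 'big' },
    script => { remove => 1 },
);

# Copy the children of $src into $dst, applying the rules on the way.
sub convert {
    my ($src, $dst, $doc) = @_;
    for my $child ($src->childNodes) {
        my $type = $child->nodeType;
        if ($type == XML_TEXT_NODE) {
            $dst->appendChild($doc->createTextNode($child->nodeValue));
        }
        elsif ($type == XML_ELEMENT_NODE) {
            my $tag  = lc($child->nodeName);
            my $rule = $RULES{$tag};
            next if $rule && $rule->{remove};    # drop the whole subtree
            my $name = !$rule ? 'span' : ($rule->{rename_tag} || $tag);
            my $el   = $doc->createElement($name);
            $el->setAttribute(class => $rule->{set_class})
                if $rule && defined $rule->{set_class};
            $dst->appendChild($el);
            convert($child, $el, $doc);          # recurse into children
        }
        # every other node type (comments, CDATA, ...) is skipped
    }
    return;
}

my $in = XML::LibXML->load_xml(
    string => '<div><p>Hi <big>there</big><blink>!</blink>'
            . '<script>x()</script></p></div>',
);
my $out  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $root = $out->createElement('div');
$out->setDocumentElement($root);
convert($in->documentElement, $root, $out);

print $root->toString, "\n";
# prints <div><p>Hi <span class="big">there</span><span>!</span></p></div>
```

Building a new tree rather than editing the parsed one means that nothing reaches the output unless a rule explicitly lets it through.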




Writing HTML

     mainly for additional escapes
     could not find a nice way to integrate this in XML::LibXML

  $text =~ s/&/&amp;/g;
  $text =~ s/'/&#39;/g;
  $text =~ s/"/&quot;/g;
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  $text =~ s/`/&#96;/g;
  $text =~ s/\{/&#123;/g;
  $text =~ s/\}/&#125;/g;
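
Order matters in this list: `&` has to be replaced first, otherwise the ampersands introduced by the later substitutions would be escaped a second time. A minimal sketch that wraps the same substitutions in a hypothetical function `escape_text` (the name and the sample input are assumptions):

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitutions shown on the slide.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;      # first, so the entities below stay intact
    $text =~ s/'/&#39;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/`/&#96;/g;
    $text =~ s/\{/&#123;/g;
    $text =~ s/\}/&#125;/g;
    return $text;
}

print escape_text(q(Tom & Jerry's <b>"{x}"</b>)), "\n";
# prints Tom &amp; Jerry&#39;s &lt;b&gt;&quot;&#123;x&#125;&quot;&lt;/b&gt;
```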


4   HTML5::Sanitizer usage
     Usage
     Profile
     Examples
     Debugging


Usage



 # construct object
 my $sanitizer = HTML5::Sanitizer->new(
     profile => 'My::Profile',
 );

 # call process()
 my $clean = $sanitizer->process($html);




Profile


     you have to write your own profile
     class with just one method: element($tag)
     return undef or a hashref with:
         remove            remove complete sub tree (boolean)
         rename_tag        rename tag (string)
         set_attributes    set these attributes (hashref)
         check_attributes  check/transform these attributes (hashref)
         set_class         set class (string)
         add_class         add class from other attributes (hashref)



Examples - script



      completely remove <script> (including all children)

  {
      remove => 1,
  }

      otherwise it would be converted to <span>
      and all its children processed recursively




Examples - big



      <big> → <span class="big">

  {
      rename_tag => 'span',
      set_class  => 'big',
  }




Examples - a



      add rel="nofollow" and target="_blank" to every link

  {
      set_attributes => {
          rel    => 'nofollow',
          target => '_blank',
      },
  }




Examples - font
  rename_tag => 'span',
  add_class  => { size => 'size_font' },

  sub class_size_font {
      my ($self, $val) = @_;
      return unless $val;
      return 'size-xx-large' if $val eq '7';
      # ...
      return 'size-xx-small' if $val eq '1';

      return 'size-larger'  if $val =~ /^\+/;
      return 'size-smaller' if $val =~ /^-/;
      return;
  }
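
Collected into one class, the example rules might look like the sketch below. Several things here are assumptions rather than facts from the slides: that `element` is called as a class method, that rules are kept in a lookup hash, and that returning undef marks a tag as unknown. The key names use the underscore spelling of the profile options.

```perl
use strict;
use warnings;

package My::Profile;

# Rules per tag, taken from the examples; the storage layout is an assumption.
my %RULES = (
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
    # class_size_font() from the font example would live in this package too
    font   => { rename_tag => 'span', add_class => { size => 'size_font' } },
);

# The single method a profile has to provide.
sub element {
    my ($class, $tag) = @_;
    return $RULES{ lc $tag };    # undef for tags the profile does not list
}

package main;

for my $tag (qw(script big blink)) {
    my $rule = My::Profile->element($tag);
    print "$tag: ", ($rule ? join(', ', sort keys %$rule) : 'undef'), "\n";
}
# prints:
#   script: remove
#   big: rename_tag, set_class
#   blink: undef
```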

Debugging

        if the result is not as expected, you can access intermediate
        results:

  my $res = $sanitizer->process($html, { return_result ...

  # see HTML5::Sanitizer::Result
  say $res->input;
  say $res->preprocessed;
  say $res->parsed_doc->toString;
  say $res->converted_doc->toString;
  say $res->output;

  print $res->debug_output;

5   Conclusion

Repositories



      HTML5::Sanitizer (backend)
      http://github.com/xing/html5-sanitizer
      wysihtml5 (JavaScript frontend)
      http://github.com/xing/wysihtml5
      Feedback? uwe@uwevoelker.de




Questions?




 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 

Sanitizing HTML 5 with Perl 5

  • 1. Introduction HTML parser choice HTML5::Sanitizer interna HTML5::Sanitizer usage Conclusion HTML5::Sanitizer Sanitizing HTML 5 with Perl 5 Uwe Voelker XING AG August 16th 2011 Uwe Voelker HTML5::Sanitizer
  • 2. 1 Introduction, 2 HTML parser choice, 3 HTML5::Sanitizer interna, 4 HTML5::Sanitizer usage, 5 Conclusion
  • 3. 1 Introduction (Task: WYSIWYG editor; Team; Live example)
  • 4-6. Task: WYSIWYG editor
      • integrate a WYSIWYG editor into XING
      • the frontend architect researched open source solutions
      • none was suitable, mostly for security reasons
      • the decision was made to build it in-house
      • goals: secure; share profiles (allowed tags) between frontend and backend
  • 7. Team
      • Christopher Blum: JavaScript
      • Ingo Chao: QA (HTML5/CSS)
      • Uwe Voelker: Perl
  • 8. Live example
  • 9. 2 HTML parser choice (CPAN modules; Evaluation; Final decision)
  • 10. HTML parser on CPAN
      • HTML::Parser
      • HTML::TreeBuilder
      • HTML::TreeBuilder::LibXML
      • XML::LibXML
      • HTML::HTML5::Parser
      • Marpa::HTML
      • ...
  • 12-15.
      • started with HTML::HTML5::Parser (HH5P) because it understands the semantics of HTML 5 tags
      • but it also did this:
            http://example.com/?section=2&copy=3&lang=en
            http://example.com/?section=2&copy;=3&lang=en
      • final choice: XML::LibXML
  • 16. 3 HTML5::Sanitizer interna (Processing Phases; Parsing; Converting; Writing)
  • 17-20. Processing phases
      • preprocessing (e.g. migration)
      • parsing (HTML → DOM tree)
      • converting (rebuild tree according to profile)
      • writing (DOM tree → HTML)
  • 21-22. Parsing HTML with XML::LibXML
        use XML::LibXML;

        my $parser = XML::LibXML->new(
            encoding          => 'UTF-8',
            recover           => 2,    # recover from malformed markup without reporting it
            keep_blanks       => 1,
            no_cdata          => 1,
            expand_entities   => 1,
            no_network        => 1,    # never fetch external resources while parsing
            suppress_errors   => 1,
            suppress_warnings => 1,
        );

        my $doc = $parser->parse_html_string(
            $html,
            {
                no_cdata          => 1,
                suppress_errors   => 1,
                suppress_warnings => 1,
            },
        );
  • 23-26. Converting - rebuilding DOM tree
      • loop through every node (only ELEMENT and TEXT)
      • drop unwanted elements completely (e.g. <script>)
      • change unknown elements to <span>
      • possibly change the tag name (according to the profile)
      • transform (or copy) attributes
      • proceed recursively with child nodes
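    The converting steps above can be sketched roughly as follows. This is a minimal illustration, not the HTML5::Sanitizer source: it walks a toy tree of plain hashes instead of an XML::LibXML DOM, the names %PROFILE and convert_node are hypothetical, and the rule keys (remove, rename_tag, set_attributes, set_class) assume the underscore spelling of the profile keys. For brevity it copies no attributes from the input and sets only those named by the profile.

```perl
use strict;
use warnings;

# Toy node shapes (an assumption, standing in for XML::LibXML nodes):
#   text node:    { text => 'string' }
#   element node: { tag => 'name', attrs => {...}, children => [...] }

# Hypothetical profile: tag name => rule. Tags without an entry are "unknown".
my %PROFILE = (
    p      => {},
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
);

sub convert_node {
    my ($node) = @_;

    # Text nodes are copied unchanged.
    return { text => $node->{text} } if exists $node->{text};

    # Unknown elements are treated like a rename to <span>.
    my $rule = $PROFILE{ $node->{tag} } || { rename_tag => 'span' };

    # Unwanted elements are dropped together with their whole subtree.
    return if $rule->{remove};

    my $tag   = $rule->{rename_tag} // $node->{tag};
    my %attrs = %{ $rule->{set_attributes} || {} };
    $attrs{class} = $rule->{set_class} if defined $rule->{set_class};

    # Recurse into the children; removed children return an empty list here.
    my @children = map { convert_node($_) } @{ $node->{children} || [] };

    return { tag => $tag, attrs => \%attrs, children => \@children };
}
```

    Building a new tree rather than editing the parsed one in place means that anything the profile does not explicitly produce (for example an onclick attribute) never reaches the output.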
  • 27-28. Writing HTML
      • mainly for additional escapes
      • could not find a nice way to integrate this in XML::LibXML
            $text =~ s/&/&amp;/g;
            $text =~ s/'/&#39;/g;
            $text =~ s/"/&quot;/g;
            $text =~ s/</&lt;/g;
            $text =~ s/>/&gt;/g;
            $text =~ s/`/&#96;/g;
            $text =~ s/\{/&#123;/g;
            $text =~ s/\}/&#125;/g;
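    As a small sketch, the substitutions above can be wrapped in a helper. The name escape_text is hypothetical and not taken from HTML5::Sanitizer. The order matters: the ampersand has to be replaced first, otherwise the ampersands of the entities produced by the later substitutions would themselves be escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a text node for output.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;      # must run first, see note above
    $text =~ s/'/&#39;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/`/&#96;/g;
    $text =~ s/\{/&#123;/g;
    $text =~ s/\}/&#125;/g;
    return $text;
}

print escape_text(q{<b>"Tom" & 'Jerry'</b>}), "\n";
```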
  • 29. 4 HTML5::Sanitizer usage (Usage; Profile; Examples; Debugging)
  • 30. Usage
        # construct object
        my $sanitizer = HTML5::Sanitizer->new(
            profile => 'My::Profile',
        );

        # call process()
        my $clean = $sanitizer->process($html);
  • 31-33. Profile
      • you have to build your own
      • class with just one method: element($tag)
      • return undef or a hashref with:
            remove            remove complete sub tree (boolean)
            rename_tag        rename tag (string)
            set_attributes    set these attributes (hashref)
            check_attributes  check/transform these attributes (hashref)
            set_class         set class (string)
            add_class         add class from other attributes (hashref)
  • 34-36. Examples - script
      • completely remove <script> (including all children)
            {
                remove => 1,
            }
      • otherwise it would be converted to <span> and all children processed recursively
  • 37-38. Examples - big
      • <big> → <span class="big">
            {
                rename_tag => 'span',
                set_class  => 'big',
            }
  • 39-40. Examples - a
      • add rel="nofollow" and target="_blank" to every link
            {
                set_attributes => {
                    rel    => 'nofollow',
                    target => '_blank',
                },
            }
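    To show how the example rules for script, big and a might sit together in one profile, here is a minimal sketch of a profile class. The class name My::Profile comes from the usage slide; the %RULES table, the new constructor and the lower-casing of the tag name are assumptions made for illustration, and the key names assume the underscore spelling (rename_tag, set_class, set_attributes).

```perl
use strict;
use warnings;

package My::Profile {
    # One rule per known tag (assumed layout, for illustration only).
    my %RULES = (
        script => { remove => 1 },
        big    => { rename_tag => 'span', set_class => 'big' },
        a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
    );

    sub new {
        my ($class) = @_;
        return bless {}, $class;
    }

    # The single method a profile has to provide:
    # returns a rule hashref, or undef for tags the profile does not know.
    sub element {
        my ($self, $tag) = @_;
        return $RULES{ lc $tag };
    }
}
```

    Such a class would then be named in the constructor call from the usage slide, HTML5::Sanitizer->new(profile => 'My::Profile').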
  • 41-42. Examples - font
            rename_tag => 'span',
            add_class  => { size => 'size_font' },

            sub class_size_font {
                my ($self, $val) = @_;
                return unless $val;
                return 'size-xx-large' if $val eq '7';
                # ...
                return 'size-xx-small' if $val eq '1';
                return 'size-larger'   if $val =~ /^\+/;   # relative size such as "+1"
                return 'size-smaller'  if $val =~ /^-/;    # relative size such as "-1"
                return;
            }
  • 43. Debugging
      • if the result is not as expected, you can access intermediate results:
            my $res = $sanitizer->process($html, { return_result ...
            # see HTML5::Sanitizer::Result
            say $res->input;
            say $res->preprocessed;
            say $res->parsed_doc->toString;
            say $res->converted_doc->toString;
            say $res->output;
            print $res->debug_output;
  • 44-46. Repositories
      • HTML5::Sanitizer (backend): http://github.com/xing/html5-sanitizer
      • wysihtml5 (JavaScript frontend): http://github.com/xing/wysihtml5
      • Feedback? uwe@uwevoelker.de
  • 47. Questions?