Introduction
   HTML parser choice
HTML5::Sanitizer internals
 HTML5::Sanitizer usage
             Conclusion




         HTML5::Sanitizer
  Sanitizing HTML 5 with Perl 5


                 Uwe Voelker

                     XING AG


            August 16th 2011




            Uwe Voelker    HTML5::Sanitizer




1   Introduction

2   HTML parser choice

3   HTML5::Sanitizer internals

4   HTML5::Sanitizer usage

5   Conclusion








1   Introduction
       Task: WYSIWYG editor
       Team
       Live example





Task: WYSIWYG editor



     integrate a WYSIWYG editor into XING
     frontend architect researched open source solutions
     none was suitable, mostly for security reasons
     decision was made to build it in-house
     goals: secure; share profiles (allowed tags) between frontend
     and backend






Team




 Christopher Blum        Ingo Chao                           Uwe Voelker
 JavaScript              QA (HTML5/CSS)                      Perl




Live example





2   HTML parser choice
     CPAN modules
     Evaluation
     Final decision



HTML parsers on CPAN



     HTML::Parser
     HTML::TreeBuilder
     HTML::TreeBuilder::LibXML
     XML::LibXML
     HTML::HTML5::Parser
     Marpa::HTML
     ...








started with HTML::HTML5::Parser (HH5P)
because it understands the semantics of HTML 5 tags
but it also decoded "&copy" inside a URL as the © entity:
    http://example.com/?section=2&copy=3&lang=en
    http://example.com/?section=2©=3&lang=en
final choice: XML::LibXML





3   HTML5::Sanitizer internals
     Processing Phases
     Parsing
     Converting
     Writing



Processing phases




      preprocessing (e.g. migration)
      parsing (HTML → DOM tree)
      converting (rebuild tree according to profile)
      writing (DOM tree → HTML)






Parsing HTML with XML::LibXML

  use XML::LibXML;

  my $parser = XML::LibXML->new(
      encoding          => 'UTF-8',
      recover           => 2,    # recover from parse errors silently
      keep_blanks       => 1,
      no_cdata          => 1,
      expand_entities   => 1,
      no_network        => 1,    # never fetch external resources
      suppress_errors   => 1,
      suppress_warnings => 1,
  );




  my $doc = $parser->parse_html_string(
      $html,
      {
          no_cdata          => 1,
          suppress_errors   => 1,
          suppress_warnings => 1,
      },
  );
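
To get a feeling for what the recovering parser does with broken input, the sketch below feeds it markup with unclosed tags and lists the elements of the resulting tree. The sample markup is made up for this illustration, and only a subset of the options shown above is used.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new(recover => 2, no_network => 1);

# <p> and <b> are never closed; the parser closes them while building the tree.
my $doc = $parser->parse_html_string(
    '<p>Hello <b>world',
    { suppress_errors => 1, suppress_warnings => 1 },
);

# Print the name of every element in document order.
print $_->nodeName, "\n" for $doc->findnodes('//*');
```

The result is a regular XML::LibXML document, so the later phases can work with the usual DOM and XPath methods.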






Converting - rebuilding DOM tree



     loop through every node (only ELEMENT and TEXT nodes)
     drop unwanted elements completely (e.g. <script>)
     change unknown elements to <span>
     possibly change the tag name (according to the profile)
     transform (or copy) attributes
     proceed recursively with child nodes
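
The steps above can be sketched with plain XML::LibXML calls. This is not the module's actual code: the %RULES table and the convert function are invented for this illustration, the rules are hard-coded instead of coming from a profile class, and attributes are simply dropped rather than transformed.

```perl
use strict;
use warnings;
use XML::LibXML qw(:libxml);

# Toy rules for this sketch only (not a real HTML5::Sanitizer profile).
my %RULES = (
    p      => {},
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
);

# Rebuild $old below $new_parent according to %RULES, then recurse.
sub convert {
    my ($old, $new_parent, $new_doc) = @_;

    if ($old->nodeType == XML_TEXT_NODE) {
        $new_parent->appendChild($new_doc->createTextNode($old->nodeValue));
        return;
    }
    return unless $old->nodeType == XML_ELEMENT_NODE;    # skip comments etc.

    my $tag  = lc($old->nodeName);
    my $rule = $RULES{$tag};
    return if $rule && $rule->{remove};                  # drop the whole subtree

    # Unknown elements become <span>; known ones may be renamed.
    my $name = $rule ? ($rule->{rename_tag} // $tag) : 'span';
    my $new  = $new_doc->createElement($name);
    $new->setAttribute(class => $rule->{set_class}) if $rule && $rule->{set_class};
    $new_parent->appendChild($new);

    convert($_, $new, $new_doc) for $old->childNodes;
    return;
}

my $html    = '<p>Hi <big>there</big><blink>!</blink><script>alert(1)</script></p>';
my $old_doc = XML::LibXML->new(recover => 2)->parse_html_string(
    $html, { suppress_errors => 1, suppress_warnings => 1 },
);
my ($body) = $old_doc->findnodes('//body');

# The converted tree is built in a fresh document below a <div> wrapper.
my $new_doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
my $new_root = $new_doc->createElement('div');
$new_doc->setDocumentElement($new_root);
convert($_, $new_root, $new_doc) for $body->childNodes;

print $new_root->toString, "\n";
```

Building a second tree instead of editing the parsed one means that only nodes the rules explicitly produce can reach the output.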






Writing HTML

     mainly for additional escapes
     could not find a nice way to integrate this in XML::LibXML

  $text =~ s/&/&amp;/g;
  $text =~ s/'/&#39;/g;
  $text =~ s/"/&quot;/g;
  $text =~ s/</&lt;/g;
  $text =~ s/>/&gt;/g;
  $text =~ s/`/&#96;/g;
  $text =~ s/\{/&#123;/g;
  $text =~ s/\}/&#125;/g;
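
The substitutions above only work if & is handled first; otherwise the & of an entity produced by an earlier line would be escaped a second time. A sketch of an equivalent single-pass variant with a lookup table, where the order no longer matters. The names escape_text and %ESCAPE are made up for this illustration and are not taken from HTML5::Sanitizer.

```perl
use strict;
use warnings;

# The same eight replacements as a lookup table.
my %ESCAPE = (
    '&' => '&amp;',
    "'" => '&#39;',
    '"' => '&quot;',
    '<' => '&lt;',
    '>' => '&gt;',
    '`' => '&#96;',
    '{' => '&#123;',
    '}' => '&#125;',
);

# Escape every special character in one pass over the text.
sub escape_text {
    my ($text) = @_;
    $text =~ s/([&'"<>`{}])/$ESCAPE{$1}/g;
    return $text;
}

print escape_text(q{Tom & 'Jerry' <b>{1}</b>}), "\n";
# prints: Tom &amp; &#39;Jerry&#39; &lt;b&gt;&#123;1&#125;&lt;/b&gt;
```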



4   HTML5::Sanitizer usage
     Usage
     Profile
     Examples
     Debugging



Usage



 # construct object
 my $sanitizer = HTML5::Sanitizer->new(
     profile => 'My::Profile',
 );

 # call process()
 my $clean = $sanitizer->process($html);






Profile


     you have to build your own
     class with just one method: element($tag)
     return undef or a hashref with:
         remove            remove complete subtree (boolean)
         rename_tag        rename tag (string)
         set_attributes    set these attributes (hashref)
         check_attributes  check/transform these attributes (hashref)
         set_class         set class (string)
         add_class         add class from other attributes (hashref)





Examples - script



      completely remove <script> (including all children)

  {
      remove => 1,
  }

      without this rule, <script> would be converted to <span>
      and all its children processed recursively






Examples - big



      <big> → <span class="big">

  {
      rename_tag => 'span',
      set_class  => 'big',
  }






Examples - a



      add rel="nofollow" and target="_blank" to every link

  {
      set_attributes => {
          rel    => 'nofollow',
          target => '_blank',
      },
  }






Examples - font
  rename_tag => 'span',
  add_class  => { size => 'size_font' },

  sub class_size_font {
      my ($self, $val) = @_;
      return unless $val;
      return 'size-xx-large' if $val eq '7';
      # ...
      return 'size-xx-small' if $val eq '1';

      # relative sizes such as "+1" or "-2"
      return 'size-larger'  if $val =~ /^\+/;
      return 'size-smaller' if $val =~ /^-/;
      return;
  }
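
Put together, the four examples might end up in a profile class like the following sketch. The hash keys are the ones from the slides; the %RULES table, the empty hashref for tags that are kept unchanged, and calling element() as a class method are assumptions made for this illustration, since the slides do not show a complete profile.

```perl
package My::Profile;
use strict;
use warnings;

# One entry per known tag; every other tag is unknown to this profile.
my %RULES = (
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
    # 'size_font' refers to a class_size_font method as in the font example
    font   => { rename_tag => 'span', add_class => { size => 'size_font' } },
    p      => {},    # assumption: an empty hashref keeps the tag as it is
);

# The single method of a profile: a hashref for known tags, undef otherwise.
sub element {
    my ($self, $tag) = @_;
    return $RULES{ lc($tag) };
}

package main;

my $rule = My::Profile->element('big');
print "big is renamed to $rule->{rename_tag}\n";

my $unknown = My::Profile->element('blink');
print "blink is ", (defined $unknown ? "known" : "unknown"), "\n";
```

Keeping the rules in one data structure also fits the goal of sharing profiles between frontend and backend, because a plain table is easy to export in another format.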


Debugging

        if the result is not as expected, you can access intermediate
        results:

  my $res = $sanitizer->process($html, { return_result ...

  # see HTML5::Sanitizer::Result
  say $res->input;
  say $res->preprocessed;
  say $res->parsed_doc->toString;
  say $res->converted_doc->toString;
  say $res->output;

  print $res->debug_output;



Repositories



      HTML5::Sanitizer (backend)
      http://github.com/xing/html5-sanitizer
      wysihtml5 (JavaScript frontend)
      http://github.com/xing/wysihtml5
      Feedback? uwe@uwevoelker.de






Questions?




flufftailshop
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 

Recently uploaded (20)

Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 

Sanitizing HTML 5 with Perl 5

  • 1. HTML5::Sanitizer: Sanitizing HTML 5 with Perl 5. Uwe Voelker, XING AG, August 16th 2011
  • 2. Outline: 1 Introduction; 2 HTML parser choice; 3 HTML5::Sanitizer internals; 4 HTML5::Sanitizer usage; 5 Conclusion
  • 3. 1 Introduction (Task: WYSIWYG editor; Team; Live example)
  • 6. Task: WYSIWYG editor
      • integrate a WYSIWYG editor into XING
      • the frontend architect researched open source solutions
      • none was suitable, mostly for security reasons
      • the decision was made to build it in-house
      • goals: secure; share profiles (allowed tags) between frontend and backend
  • 7. Team: Christopher Blum (JavaScript), Ingo Chao (QA, HTML5/CSS), Uwe Voelker (Perl)
  • 8. Live example
  • 9. 2 HTML parser choice (CPAN modules; Evaluation; Final decision)
  • 10. HTML parsers on CPAN
      • HTML::Parser
      • HTML::TreeBuilder
      • HTML::TreeBuilder::LibXML
      • XML::LibXML
      • HTML::HTML5::Parser
      • Marpa::HTML
      • ...
  • 15. Evaluation and final decision
      • started with HTML::HTML5::Parser (HH5P) because it understands the semantics of HTML 5 tags
      • but it also did this: http://example.com/?section=2&copy=3&lang=en → http://example.com/?section=2&copy;=3&lang=en
      • final choice: XML::LibXML
  • 16. 3 HTML5::Sanitizer internals (Processing phases; Parsing; Converting; Writing)
  • 20. Processing phases
      • preprocessing (e.g. migration)
      • parsing (HTML → DOM tree)
      • converting (rebuild the tree according to the profile)
      • writing (DOM tree → HTML)
  • 21. Parsing HTML with XML::LibXML
      use XML::LibXML;
      my $parser = XML::LibXML->new(
          encoding          => 'UTF-8',
          recover           => 2,
          keep_blanks       => 1,
          no_cdata          => 1,
          expand_entities   => 1,
          no_network        => 1,
          suppress_errors   => 1,
          suppress_warnings => 1,
      );
  • 22. Parsing HTML with XML::LibXML
      my $doc = $parser->parse_html_string(
          $html,
          {
              no_cdata          => 1,
              suppress_errors   => 1,
              suppress_warnings => 1,
          },
      );
  • 26. Converting: rebuilding the DOM tree
      • loop through every node (only ELEMENT and TEXT nodes)
      • drop unwanted elements completely (e.g. <script>)
      • change unknown elements to <span>
      • possibly change the tag name (according to the profile)
      • transform (or copy) attributes
      • proceed recursively with the child nodes
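The converting steps above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's code: the tree is a stand-in built from hashrefs and strings rather than XML::LibXML nodes, `%RULES` and `convert` are hypothetical names, an empty rule is assumed to mean "keep the element", and attribute copying is left out.

```perl
use strict;
use warnings;

# Hypothetical rules in the shape the profile slides describe.
# Assumption of this sketch: an empty hashref means "keep the element as-is".
my %RULES = (
    script => { remove => 1 },
    p      => {},
    big    => { rename_tag => 'span', set_class => 'big' },
);

# Stand-in tree: elements are hashrefs { tag, children }, text nodes are strings.
sub convert {
    my ($node) = @_;
    return $node unless ref $node;        # text node: copy unchanged

    my $rule = $RULES{ $node->{tag} };
    return if $rule && $rule->{remove};   # unwanted element: drop whole subtree

    # Unknown elements become <span>; known ones may be renamed by the rule.
    my $tag = $rule ? ( $rule->{rename_tag} // $node->{tag} ) : 'span';

    my %attrs;                            # attribute copying is left out here
    $attrs{class} = $rule->{set_class} if $rule && defined $rule->{set_class};

    return {
        tag      => $tag,
        attrs    => \%attrs,
        children => [ map { convert($_) } @{ $node->{children} || [] } ],
    };
}

my $tree = {
    tag      => 'div',
    children => [
        'Hi ',
        { tag => 'big',    children => ['there'] },
        { tag => 'script', children => ['alert(1)'] },
    ],
};
my $clean = convert($tree);
printf "%s with %d children\n", $clean->{tag}, scalar @{ $clean->{children} };
```

Because a removed element returns an empty list inside `map`, the `<script>` subtree simply disappears from its parent's child list, while the unknown `<div>` is kept as a `<span>`.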
  • 28. Writing HTML
      • mainly for additional escapes
      • could not find a nice way to integrate this in XML::LibXML
      $text =~ s/&/&amp;/g;
      $text =~ s/'/&#39;/g;
      $text =~ s/"/&quot;/g;
      $text =~ s/</&lt;/g;
      $text =~ s/>/&gt;/g;
      $text =~ s/`/&#96;/g;
      $text =~ s/\{/&#123;/g;
      $text =~ s/\}/&#125;/g;
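The same escapes can also be written as a single substitution driven by a lookup table. The sketch below is an alternative formulation, not the module's code; `escape_text` and `%ESCAPE` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper; the character set is the one listed on the slide.
my %ESCAPE = (
    '&' => '&amp;',
    "'" => '&#39;',
    '"' => '&quot;',
    '<' => '&lt;',
    '>' => '&gt;',
    '`' => '&#96;',
    '{' => '&#123;',
    '}' => '&#125;',
);

sub escape_text {
    my ($text) = @_;
    # One pass over the string, so an "&" produced by a replacement
    # is never escaped a second time.
    $text =~ s/([&'"<>`{}])/$ESCAPE{$1}/g;
    return $text;
}

print escape_text(q{<b>Tom & Jerry's {tpl}</b>}), "\n";
```

With separate substitutions the `&` rule has to run first, as it does on the slide; the single pass removes that ordering constraint.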
  • 29. 4 HTML5::Sanitizer usage (Usage; Profile; Examples; Debugging)
  • 30. Usage
      # construct object
      my $sanitizer = HTML5::Sanitizer->new(
          profile => 'My::Profile',
      );
      # call process()
      my $clean = $sanitizer->process($html);
  • 33. Profile
      • you have to build your own
      • a class with just one method: element($tag)
      • it returns undef or a hashref with:
          remove            remove the complete subtree (boolean)
          rename_tag        rename the tag (string)
          set_attributes    set these attributes (hashref)
          check_attributes  check/transform these attributes (hashref)
          set_class         set the class (string)
          add_class         add a class from other attributes (hashref)
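A profile class along these lines might look like the following sketch. The rule contents come from the examples in this talk; the `%RULES` table, the lower-casing of the tag name, and the assumption that `element` is called as a class method are illustrative guesses, not confirmed details of HTML5::Sanitizer.

```perl
package My::Profile;
use strict;
use warnings;

# Hypothetical rule table, one hashref per tag the profile knows about.
my %RULES = (
    script => { remove => 1 },
    big    => { rename_tag => 'span', set_class => 'big' },
    a      => { set_attributes => { rel => 'nofollow', target => '_blank' } },
);

# Assumed to be called as My::Profile->element($tag).
# Returns the rule hashref, or undef for tags the profile does not know.
sub element {
    my ($class, $tag) = @_;
    return $RULES{ lc $tag };
}

package main;

my $rule = My::Profile->element('BIG');
print "$rule->{rename_tag} class=$rule->{set_class}\n";
```

Keeping the rules in one table makes it easy to compare them with the allowed-tag list used by the frontend, which was one of the stated goals.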
  • 36. Examples: script
      • completely remove <script> (including all children)
          { remove => 1 }
      • otherwise it would be converted to <span> and all children processed recursively
  • 38. Examples: big
      • <big> → <span class="big">
          {
              rename_tag => 'span',
              set_class  => 'big',
          }
  • 40. Examples: a
      • add rel="nofollow" and target="_blank" to every link
          {
              set_attributes => {
                  rel    => 'nofollow',
                  target => '_blank',
              },
          }
  • 42. Examples: font
      rename_tag => 'span',
      add_class  => { size => 'size_font' },

      sub class_size_font {
          my ($self, $val) = @_;
          return unless $val;
          return 'size-xx-large' if $val eq '7';
          # ...
          return 'size-xx-small' if $val eq '1';
          return 'size-larger'   if $val =~ /^\+/;
          return 'size-smaller'  if $val =~ /^-/;
          return;
      }
  • 43. Debugging
      • if the result is not as expected, you can access intermediate results:
      my $res = $sanitizer->process($html, { return_result ...
      # see HTML5::Sanitizer::Result
      say $res->input;
      say $res->preprocessed;
      say $res->parsed_doc->toString;
      say $res->converted_doc->toString;
      say $res->output;
      print $res->debug_output;
  • 46. Repositories
      • HTML5::Sanitizer (backend): http://github.com/xing/html5-sanitizer
      • wysihtml5 (JavaScript frontend): http://github.com/xing/wysihtml5
      • Feedback? uwe@uwevoelker.de
  • 47. Questions?