Conversion of non-semantic HTML to semantic based on visual cues




Semantic Html

  1. Conversion of non-semantic HTML to semantic based on visual cues. Rune Kaagaard, 07/01/2009. Copyright 2009.
  2. About Rune Kaagaard
     Musician / Programmer.
     Works for Prescriba, a Danish Healthcare / Health Information company.
     rumi.kg@gmail.com, skype: rune_kg, MSN: rune_kg@hotmail.com
  3. What is he talking about?
     The problem: You have a lot of HTML that is messy and not semantic. Bad for SEO. Bad for editors.
     The goal: Convert the HTML into clean, pretty semantic HTML.
     The idea: Try to understand the page [more] like a person would. Where is it positioned? How big is the text? What font is used? What distance does it have to other elements?
     The tools: Webkit; PyQT; PHP with htmlpurifier, Tidy and phpQuery.
  4. How to render a webpage headless on the server?
     Webkit from PyQT running inside Xvfb!
     xvfb-run --server-args='-screen 0, 640x480x24' [PATH]/whm.py --url='[URL]' --js-file=[FILE_PATH] --output-file=[FILE_PATH]
     Xvfb is Linux only. The command returns the output from the .js code.
  5. What to do with this info?
     Strip out everything that does not have semantic meaning.
     Use information about position to transform into semantic HTML.
     Clean up everything again.
  6. Code
     Code examples not in presentation being shown... :)
  7. Read more
     http://code.google.com/p/py-webkit-html-manipulator
     http://drupal.org/project/clean
  8. More from same author
     http://drupal.org/project/autoadmin
     http://drupal.org/project/flexibody
     http://code.google.com/p/phpetris/
     http://code.google.com/p/php-alternative-syntax/
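
The xvfb-run invocation on slide 4 can also be assembled from Python before it is handed to a process runner. The sketch below only builds the argument list. The function name `build_render_command` and the example paths and URL are assumptions made for illustration; the flags (`--server-args`, `--url`, `--js-file`, `--output-file`) and the screen setting are copied from the slide.

```python
import shlex


def build_render_command(script_path, url, js_file, output_file,
                         screen="0, 640x480x24"):
    """Build the xvfb-run argument list shown on slide 4.

    The function name and parameters are hypothetical; only the flags
    and the default screen value come from the slide.
    """
    return [
        "xvfb-run",
        f"--server-args=-screen {screen}",
        script_path,
        f"--url={url}",
        f"--js-file={js_file}",
        f"--output-file={output_file}",
    ]


if __name__ == "__main__":
    # Placeholder paths and URL, standing in for [PATH], [URL], [FILE_PATH].
    cmd = build_render_command(
        "/opt/whm/whm.py",
        "http://example.com/",
        "/tmp/extract.js",
        "/tmp/output.txt",
    )
    # Print a shell-quoted version for inspection instead of running it.
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would start the render, assuming xvfb-run and the whm.py script are installed on the machine. Building a list rather than a shell string avoids quoting problems with URLs that contain special characters.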

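
Slides 3 and 5 describe using visual cues such as text size and position to decide what an element means. The deck does not show its code, so the following is only a minimal sketch of one possible heuristic, not the author's method. It assumes the rendering step has already produced a list of dictionaries with the hypothetical keys `text`, `font_size`, `top` and `left`. The most common font size is treated as body text, and each larger size becomes a heading level.

```python
from collections import Counter
from html import escape


def assign_tags(elements):
    """Turn measured text blocks into simple semantic HTML.

    Assumed input: dicts with 'text', 'font_size', 'top', 'left'
    (hypothetical field names, not taken from the presentation).
    """
    sizes = Counter(e["font_size"] for e in elements)
    # Assume the most frequent font size is ordinary body text.
    body_size = sizes.most_common(1)[0][0]
    # Larger sizes become headings: the largest is h1, the next h2, ...
    heading_sizes = sorted((s for s in sizes if s > body_size), reverse=True)
    level = {size: min(i + 1, 6) for i, size in enumerate(heading_sizes)}

    lines = []
    # Reading order: top to bottom, then left to right.
    for e in sorted(elements, key=lambda e: (e["top"], e["left"])):
        if e["font_size"] in level:
            tag = f"h{level[e['font_size']]}"
        else:
            tag = "p"
        lines.append(f"<{tag}>{escape(e['text'])}</{tag}>")
    return "\n".join(lines)


sample = [
    {"text": "Body one", "font_size": 14, "top": 130, "left": 0},
    {"text": "Page title", "font_size": 32, "top": 0, "left": 0},
    {"text": "Intro", "font_size": 14, "top": 50, "left": 0},
    {"text": "Section", "font_size": 20, "top": 100, "left": 0},
    {"text": "Body two", "font_size": 14, "top": 160, "left": 0},
]

if __name__ == "__main__":
    print(assign_tags(sample))
```

A real converter would also need the other cues the slides mention, such as font family and distance to neighbouring elements. This sketch uses only size and position to keep the idea visible.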