dylan@pretaweb.com | Plone Conf 2010 | Dylan Jay
FunnelWeb
Easy Content Conversions
Dylan Jay
PretaWeb
Content Conversions suck

Large existing sites

Static html or old CMS

Hard to quote on

Content audit

Use Plone to fix content

Convert Docs to Pages (coming...)
History

2008 - Obrien Intranet

2009 – pretaweb.funnelweb (deprecated)

Plone UI > Actions > Import

2010 – transmogrify.* release on PyPI

2010 – collective.developermanual

Sphinx to Plone

2010 – funnelweb Recipe + Script

Thanks – Dylan Jay, Vitaliy Podoba, Rok Garbas, Mikko Ohtamaa, Tim Knap
Demo
funnelweb.recipe

Add to buildout
[funnelweb]
recipe = funnelweb
crawler-url=http://www.whitehouse.gov
bin/funnelweb

Crawls

Caches locally

Filters

Removes template

Restructures

Determines title, hidden, etc.

Uploads to Plone
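
The steps above run as a chain of sections, each consuming the items yielded by the one before it. A minimal sketch in plain Python generators, assuming hypothetical stage names (`crawl`, `drop_ignored`, `guess_title`) and dict items with `_path` and `text` keys; it illustrates the idea and is not funnelweb's actual code:

```python
def crawl(pages):
    """Source stage: yield one item dict per page (stands in for the crawler)."""
    for path, html in pages.items():
        yield {"_path": path, "text": html}


def drop_ignored(items, ignored_suffixes=(".css", ".js")):
    """Filter stage: skip items whose path ends with an ignored suffix."""
    for item in items:
        if not item["_path"].endswith(ignored_suffixes):
            yield item


def guess_title(items):
    """Enrichment stage: derive a title from the last path segment."""
    for item in items:
        name = item["_path"].rstrip("/").split("/")[-1] or "home"
        item["title"] = name.rsplit(".", 1)[0].replace("-", " ").title()
        yield item


def run_pipeline(pages):
    """Chain the stages; nothing runs until the result is iterated."""
    return list(guess_title(drop_ignored(crawl(pages))))


if __name__ == "__main__":
    site = {"/about-us.html": "<p>About</p>", "/style.css": "body {}"}
    for item in run_pipeline(site):
        print(item["_path"], "->", item["title"])
```

Because every stage is a generator, items flow through one at a time, so a large site never has to be held in memory all at once.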
Common Options

crawler:site_url

crawler:ignore

ploneupload:target

template1:description

template1:text

*-disable
Command Line

bin/funnelweb --crawler:max=50 --localupload:output=var/funnelwebdebug
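
Each override has the form `--section:option=value`. A rough sketch of how such arguments could be split into per-section options; `parse_overrides` is a hypothetical helper written for illustration, not part of funnelweb:

```python
def parse_overrides(argv):
    """Turn ['--crawler:max=50', ...] into {'crawler': {'max': '50'}, ...}."""
    overrides = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # not a --name=value argument
        key, value = arg[2:].split("=", 1)
        if ":" not in key:
            continue  # plain option such as --pipeline=..., not a section override
        section, option = key.split(":", 1)
        overrides.setdefault(section, {})[option] = value
    return overrides


if __name__ == "__main__":
    args = ["--crawler:max=50", "--localupload:output=var/funnelwebdebug"]
    print(parse_overrides(args))
```

Values stay as strings here; converting `max` to an integer is left to whichever section reads it.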
Viewing the Pipeline

bin/funnelweb --pipeline
Custom pipeline

bin/funnelweb --pipeline > pipeline.cfg

{edit} pipeline.cfg

bin/funnelweb --pipeline=pipeline.cfg
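
A pipeline file follows the collective.transmogrifier layout: a `[transmogrifier]` section lists the section names in order, and each named section declares its blueprint. A sketch of reading such a file with Python's `configparser`; the section names and option values in `SAMPLE` are assumptions for illustration, not the real output of `bin/funnelweb --pipeline`:

```python
import configparser

# Hypothetical two-section pipeline, for illustration only.
SAMPLE = """
[transmogrifier]
pipeline =
    crawler
    drop

[crawler]
blueprint = transmogrify.webcrawler
max = 50

[drop]
blueprint = collective.transmogrifier.sections.condition
condition = python:item['_path'] != 'private'
"""


def read_pipeline(text):
    """Return [(section_name, blueprint_name), ...] in pipeline order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    names = parser.get("transmogrifier", "pipeline").split()
    return [(name, parser.get(name, "blueprint")) for name in names]


if __name__ == "__main__":
    for name, blueprint in read_pipeline(SAMPLE):
        print(name, "uses", blueprint)
```

Reordering, removing, or adding names in the `pipeline` list is how the edited file changes what `bin/funnelweb` does.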
Making your own blueprint
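from zope.interface import classProvides, implements
from collective.transmogrifier.interfaces import ISection, ISectionBlueprint

class MyBlueprint(object):
    classProvides(ISectionBlueprint)
    implements(ISection)

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous  # iterator over items from the previous section

    def __iter__(self):
        for item in self.previous:
            dosomethingto(item)  # placeholder for your own per-item processing
            yield item

<utility component=".myblueprint.MyBlueprint"
         name="transmogrify.myblueprint" />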
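
To try the section pattern without a Zope environment, the same constructor signature can be exercised with plain iterables. A minimal sketch, assuming a hypothetical `TitleCaseSection` and a hypothetical `key` option; the interface declarations and ZCML registration are deliberately left out:

```python
class TitleCaseSection(object):
    """Section sketch: title-cases one key on every item passing through."""

    def __init__(self, transmogrifier, name, options, previous):
        self.previous = previous
        # 'key' is a hypothetical option name read from the section's config.
        self.key = options.get("key", "title")

    def __iter__(self):
        for item in self.previous:
            if self.key in item:
                item[self.key] = item[self.key].title()
            yield item  # always pass the item on, changed or not


if __name__ == "__main__":
    source = iter([{"title": "about us"}, {"_path": "/logo.png"}])
    section = TitleCaseSection(None, "titlecase", {"key": "title"}, source)
    print(list(section))
```

Passing `None` for the transmogrifier works only because this sketch never uses it; a real section may need it to reach the context.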
transmogrify.webcrawler

transmogrify.webcrawler

Crawls site or cache for content

transmogrify.webcrawler.typerecognitor

Sets Plone content type based on MIME type

transmogrify.webcrawler.cache

Saves content to disk
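
The type recognition step can be pictured as a lookup from MIME type to Plone content type. A sketch under the assumption that HTML becomes a Document, images become Image, and everything else becomes File; the actual mapping used by typerecognitor may differ:

```python
def plone_type_for(mime_type):
    """Pick a Plone content type name for a MIME type (assumed mapping)."""
    if mime_type in ("text/html", "application/xhtml+xml"):
        return "Document"
    if mime_type and mime_type.startswith("image/"):
        return "Image"
    return "File"  # fallback for PDFs, archives, unknown types, etc.


if __name__ == "__main__":
    for mime in ("text/html", "image/png", "application/pdf"):
        print(mime, "->", plone_type_for(mime))
```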
transmogrify.htmlcontentextractor

transmogrify.htmlcontentextractor

Provide XPath for title, description, text etc.

transmogrify.htmlcontentextractor.auto

Guesses XPaths from content
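
XPath-based extraction amounts to one expression per field. A small sketch using the standard library's ElementTree, which supports only a subset of XPath and needs well-formed markup; a real crawl would need an HTML-tolerant parser. The rules and the sample page are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: one XPath per field to extract.
RULES = {
    "title": ".//h1",
    "description": ".//p[@class='summary']",
    "text": ".//div[@id='content']",
}

PAGE = (
    "<html><body><h1>About us</h1>"
    "<p class='summary'>Who we are.</p>"
    "<div id='content'><p>Founded in 2001.</p></div>"
    "</body></html>"
)


def extract(html, rules=RULES):
    """Return {field: text} for every rule whose XPath matches the page."""
    root = ET.fromstring(html)
    fields = {}
    for field, xpath in rules.items():
        node = root.find(xpath)
        if node is not None:
            fields[field] = "".join(node.itertext()).strip()
    return fields


if __name__ == "__main__":
    print(extract(PAGE))
```

Fields whose XPath finds nothing are simply omitted, which makes it easy to spot pages that do not fit the template.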
transmogrify.siteanalyser

transmogrify.siteanalyser.relinker

Moves, renames, url tidying

transmogrify.siteanalyser.title

Guess page titles

transmogrify.siteanalyser.defaultpage

Move index pages into folders

transmogrify.siteanalyser.attach

Move attachments closer to pages
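
Moving or renaming an item only works if links pointing at it are rewritten too. A simplified sketch of that idea, assuming items are dicts with `_path` and `text` keys and that all renames are known up front; `relink` is a hypothetical function, not the relinker's real implementation:

```python
def relink(items, renames):
    """Apply path renames and rewrite href values that point at old paths."""
    for item in items:
        item["_path"] = renames.get(item["_path"], item["_path"])
        if "text" in item:
            text = item["text"]
            for old, new in renames.items():
                text = text.replace('href="%s"' % old, 'href="%s"' % new)
            item["text"] = text
        yield item


if __name__ == "__main__":
    pages = [
        {"_path": "/aboutus.htm", "text": "<p>About</p>"},
        {"_path": "/index.html", "text": '<a href="/aboutus.htm">About</a>'},
    ]
    for page in relink(pages, {"/aboutus.htm": "/about-us"}):
        print(page)
```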
transmogrify.ploneremote

Remoteconstructor

Adds content to Plone via XML-RPC

Remoteschemaupdater

Updates content of existing object

Remotenavigationexcluder

Hides content not in the original site's navigation

Remoteworkflowupdater

Publish content

Remoteredirector

Creates aliases for items that have moved
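
Adding content over XML-RPC means serializing a method call and posting it to the target folder's URL. A sketch of the serialization step with the standard library, assuming the remote method is Plone's `invokeFactory`; the slides only say that content is added over XML-RPC, and no network request is made here:

```python
import xmlrpc.client


def build_create_request(portal_type, object_id):
    """Serialize an XML-RPC call asking a folder to create new content."""
    return xmlrpc.client.dumps((portal_type, object_id), methodname="invokeFactory")


if __name__ == "__main__":
    payload = build_create_request("Document", "about-us")
    print(payload)
    # Decode it again to confirm what a server would receive.
    params, method = xmlrpc.client.loads(payload)
    print(method, params)
```

In practice `xmlrpc.client.ServerProxy` would build and send this payload in one step, given the folder URL and credentials.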
Other blueprints

transmogrify.pathsorter

Puts folders before content, and content in the right order

collective.transmogrifier.sections.condition

Useful to drop certain content
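
Ordering matters because a folder has to exist before anything can be uploaded into it. A minimal sketch of path sorting, assuming items carry a `_path` key; comparing paths segment by segment guarantees that a parent sorts before its children. `sort_by_path` is a hypothetical helper, not pathsorter's own code:

```python
def sort_by_path(items):
    """Order items so that every folder precedes the content inside it."""
    return sorted(items, key=lambda item: item["_path"].strip("/").split("/"))


if __name__ == "__main__":
    unsorted_items = [
        {"_path": "/about/team/alice"},
        {"_path": "/about"},
        {"_path": "/about/team"},
    ]
    for item in sort_by_path(unsorted_items):
        print(item["_path"])
```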
Where to get it

http://github.com:djay/funnelweb.git

http://github.com:djay/transmogrify.*

PyPI release TBA
#TODO
• Extract content styles into visual editor
Thanks
• djay@pretaweb.com
• IRC: djjay
• Twitter: djay75