SlideShare a Scribd company logo
Web Scraping
                               @alvaro_aguirre




Saturday, November 5, 2011
In search of our cosmic origins...




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Data Scraping
                                  vs
                             Web Scraping


Saturday, November 5, 2011
Data Scraping
                    <html>
                             <header></header>
                             <body>
                             .....
                             </body>
                    </html>




Saturday, November 5, 2011
Web Scraping




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Deliverance
                               XDV
                             Diazo



Saturday, November 5, 2011
Diazo




Saturday, November 5, 2011
Saturday, November 5, 2011
<replace css:content=”h1” css:theme=”#main” />




Saturday, November 5, 2011
<drop css:content=”h1” />

                             <drop css:theme=”breadcrumbs” />




Saturday, November 5, 2011
<replace css:theme=”#header” content=”#header-
                       element” if-content=”” />




Saturday, November 5, 2011
<drop css:theme="#info-box" if-path="/news"/>




Saturday, November 5, 2011
<theme/>
                         <notheme/>
                         <replace/>
                         <before/>
                         <after/>
                         <drop/>
                         <strip/>
                         <merge/>
                         <copy/>




Saturday, November 5, 2011
<replace css:theme="#details">
          <dl id="details">
            <xsl:for-each css:select="table#details > tr">
              <dt><xsl:copy-of select="td[1]/text()" /></dt>
              <dd><xsl:copy-of select="td[2]/node()"/></dd>
            </xsl:for-each>
          </dl>
       </replace>/></dt>




      <table id="details">
                                              <dl id="details">
          <tr>
                                                  <dt>One</dt>
               <td>One</td>
                                                  <dd>1</dd>
               <td>1</td>
                                                  <dt>Two</dt>
          </tr>
                                                  <dd>2</dd>
          <tr>
                                              </dl>
               <td>Two</td>
               <td>2</td>
          </tr>
      </table>




Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Saturday, November 5, 2011
Tools



Saturday, November 5, 2011
External Content




Saturday, November 5, 2011
Saturday, November 5, 2011
• development of web & mobile interfaces
                    • legacy apps integrations
                    • prototypes
                    • low coupling


Saturday, November 5, 2011
from diazo.compiler import compile_theme
                             from lxml import etree
                             from diazo.compiler import compile_theme

                             absolute_prefix = "/static"

                             rules = "rules.xml"
                             theme = "theme.html"

                             compiled_theme = compile_theme(rules, theme,
                                                            absolute_prefix=absolute_prefix)

                             transform = etree.XSLT(compiled_theme)
                             content = etree.parse(some_content)
                             transformed = transform(content)

                             output = etree.tostring(transformed)




Saturday, November 5, 2011
github/aaguirre




Saturday, November 5, 2011
diazo.org




Saturday, November 5, 2011
plone.org




Saturday, November 5, 2011
gracias!




Saturday, November 5, 2011

More Related Content

Similar to Web Scraping using Diazo!

What's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceWhat's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceSencha
 
Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0James Thomas
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPsmueller_sandsmedia
 
JS Tooling in Rails 3.1
JS Tooling in Rails 3.1JS Tooling in Rails 3.1
JS Tooling in Rails 3.1Duda Dornelles
 
Javascript - How to avoid the bad parts
Javascript - How to avoid the bad partsJavascript - How to avoid the bad parts
Javascript - How to avoid the bad partsMikko Ohtamaa
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기형우 안
 
永不止步的“重构”
永不止步的“重构”永不止步的“重构”
永不止步的“重构”Kejun Zhang
 
A Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemA Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemLeonard Axelsson
 
让开发也懂前端
让开发也懂前端让开发也懂前端
让开发也懂前端lifesinger
 
Parser combinators
Parser combinatorsParser combinators
Parser combinatorslifecoder
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Leonardo Borges
 
Ext GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesExt GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesSencha
 
HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)Mediacurrent
 
Rails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyRails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyBlazing Cloud
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHPConFoo
 
The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011Mikko Ohtamaa
 

Similar to Web Scraping using Diazo! (20)

Semantic chop
Semantic chopSemantic chop
Semantic chop
 
What's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James PearceWhat's new in HTML5, CSS3 and JavaScript, James Pearce
What's new in HTML5, CSS3 and JavaScript, James Pearce
 
Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0Moving to Dojo 1.7 and the path to 2.0
Moving to Dojo 1.7 and the path to 2.0
 
international PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHPinternational PHP2011_ilia alshanetsky_Hidden Features of PHP
international PHP2011_ilia alshanetsky_Hidden Features of PHP
 
JS Tooling in Rails 3.1
JS Tooling in Rails 3.1JS Tooling in Rails 3.1
JS Tooling in Rails 3.1
 
Javascript - How to avoid the bad parts
Javascript - How to avoid the bad partsJavascript - How to avoid the bad parts
Javascript - How to avoid the bad parts
 
잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기잘 알려지지 않은 Php 코드 활용하기
잘 알려지지 않은 Php 코드 활용하기
 
Sequel @ madrid-rb
Sequel @  madrid-rbSequel @  madrid-rb
Sequel @ madrid-rb
 
永不止步的“重构”
永不止步的“重构”永不止步的“重构”
永不止步的“重构”
 
A Tour Through the Groovy Ecosystem
A Tour Through the Groovy EcosystemA Tour Through the Groovy Ecosystem
A Tour Through the Groovy Ecosystem
 
让开发也懂前端
让开发也懂前端让开发也懂前端
让开发也懂前端
 
Parser combinators
Parser combinatorsParser combinators
Parser combinators
 
Xml
XmlXml
Xml
 
Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011) Clouds against the Floods (RubyConfBR2011)
Clouds against the Floods (RubyConfBR2011)
 
Ext GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced TemplatesExt GWT 3.0 Advanced Templates
Ext GWT 3.0 Advanced Templates
 
HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)HTML5 & CSS3 in Drupal (on the Bayou)
HTML5 & CSS3 in Drupal (on the Bayou)
 
Rails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_manyRails ORM De-mystifying Active Record has_many
Rails ORM De-mystifying Active Record has_many
 
The Solar Framework for PHP
The Solar Framework for PHPThe Solar Framework for PHP
The Solar Framework for PHP
 
The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011The Easy Way - Plone Conference 2011
The Easy Way - Plone Conference 2011
 
Xml presentation
Xml presentationXml presentation
Xml presentation
 

Recently uploaded

IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 

Recently uploaded (20)

IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 

Web Scraping using Diazo!