Citizen Activism using Scrubyt and RoR
   Only partially available online
     Formatted as web page or PDF
   Hard to search
   Can’t subscribe
   Can’t visualize
   Can’t re-use
Publishing Structured Feeds
• Ability to subscribe to interesting data
• Data streams can be ‘mashed’ in new ways.

Data Visualization
• Makes it easy to find new patterns.

Collaborative Organization
• Tagging, Voting, Sharing

Crowdsourcing
• Combines skills and input of large numbers of people
[Cycle diagram: Governments publish data streams → 3rd Party Tools → Citizens monitor data streams → Issues are detected → Issues are resolved]

•   Governments publish data streams
•   3rd parties create tools for analysis and oversight
•   Citizens collaboratively monitor their government
•   Citizens detect issues, give feedback
•   Issues are resolved
Why can’t the government do everything?

   Government has little incentive
     ▪ Usually has disincentive
   Don’t want a single monolithic solution
     ▪ Want to allow evolution of best-of-breed tools
   Tools created by citizens, for citizens
   Focus:
     US Congress
     California Legislature

   Gives grants to online transparency tools
   $3.5 M Seed
A recent US Congress bill

Groups for bill     Groups against bill
Votes
Donations
Publishing Structured Feeds
• MAPLight is a mashup of data streams from different sources.

Data Visualization
• MAPLight makes relationship between money and votes visible.

Collaborative Organization
• Advocacy group tags donating companies as belonging to interest groups.

Crowdsourcing
• Thousands of journalists, advocates, and citizens can browse data and flag issues.
   Accelerate online transparency

[Graphic labels: Ideas, Skills, Funds]

   Raise Awareness
     With public
     With government
   Raise Money
   Fund External Development:
     Grants
     Contests
Prove Concept

Get Publicity
Direct Attention and Money to Online Tools For Transparency

   Raise Awareness
   Show What’s Possible
   2003 Directive: Must publish travel and hospitality expenses on the web

   No standards for presentation defined
124 Departments
  - All different
Standardize
• Scrape data into standard format

Stream
• Publish RSS feeds

Visualize
• Provide basic visualization app
• Run contest
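
The Standardize and Stream steps above could look roughly like the following. This is a minimal sketch, not the project's actual code: the `ExpenseRecord` fields, the helper names, the sample data, and the example.org URLs are all assumptions made for illustration. The feed is assembled by hand as RSS 2.0 so the sketch has no dependencies.

```ruby
# Hypothetical standard format for one scraped expense report.
ExpenseRecord = Struct.new(:department, :official, :purpose,
                           :amount_cents, :report_url, keyword_init: true)

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape the characters that are unsafe inside XML text nodes.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Render one record as an RSS 2.0 <item>.
def rss_item(record)
  amount = format("%.2f", record.amount_cents / 100.0)
  title = "#{record.department}: #{record.official} ($#{amount})"
  "<item><title>#{xml_escape(title)}</title>" \
    "<link>#{xml_escape(record.report_url)}</link>" \
    "<description>#{xml_escape(record.purpose)}</description></item>"
end

# Wrap all items in a single RSS 2.0 channel.
def rss_feed(records, title:, link:, description:)
  items = records.map { |record| "    #{rss_item(record)}" }.join("\n")
  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0">
      <channel>
        <title>#{xml_escape(title)}</title>
        <link>#{xml_escape(link)}</link>
        <description>#{xml_escape(description)}</description>
    #{items}
      </channel>
    </rss>
  XML
end

# Made-up sample data, standing in for records scraped from two departments.
records = [
  ExpenseRecord.new(department: "Department A & B", official: "J. Doe",
                    purpose: "Conference travel", amount_cents: 123_450,
                    report_url: "http://example.org/reports/1"),
  ExpenseRecord.new(department: "Department C", official: "A. Roe",
                    purpose: "Hospitality", amount_cents: 7_510,
                    report_url: "http://example.org/reports/2")
]

feed = rss_feed(records, title: "Expense reports (sample)",
                link: "http://example.org/",
                description: "Scraped travel and hospitality expenses")
puts feed
```

Once every department's pages are reduced to the same record shape, the feed and any visualization can be built without knowing which department a record came from.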
1. LEARNING TEMPLATE
  Input
  • Example Page
  • Example Text
  Output:
  • XML
  • Production Scraper

2. PRODUCTION SCRAPER
  Input
  • Any Page with Same Format
  Output:
  • XML
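
The two-phase idea above can be illustrated with a toy version in plain Ruby. This is a sketch of the concept only and is not how scrubyt works internally: the "learning" here simply remembers the opening tag around the example text, and `learn_pattern`, `scrape`, `to_xml`, and the sample pages are all hypothetical.

```ruby
# Learning phase: find the example text in the example page and build a
# pattern from the tag that directly encloses it.
def learn_pattern(example_page, example_text)
  match = example_page.match(
    /(<(\w+)[^>]*>)\s*#{Regexp.escape(example_text)}\s*<\/\2>/
  )
  return nil unless match

  open_tag = match[1]
  tag_name = match[2]
  /#{Regexp.escape(open_tag)}\s*(.*?)\s*<\/#{tag_name}>/m
end

# Production phase: apply the learned pattern to any page with the same format.
def scrape(page, pattern)
  page.scan(pattern).flatten
end

# Emit the scraped values as simple XML. Values are copied as found in the
# page, so they are assumed to be entity-escaped already.
def to_xml(values, field)
  rows = values.map { |value| "  <#{field}>#{value}</#{field}>" }
  "<records>\n#{rows.join("\n")}\n</records>"
end

# Made-up pages for illustration.
EXAMPLE_PAGE = '<table><tr><td class="name">J. Doe</td>' \
               '<td class="amount">$120.00</td></tr></table>'
OTHER_PAGE = '<table><tr><td class="name">A. Roe</td>' \
             '<td class="amount">$75.10</td></tr>' \
             '<tr><td class="name">B. Poe</td>' \
             '<td class="amount">$9.99</td></tr></table>'

pattern = learn_pattern(EXAMPLE_PAGE, "$120.00")
amounts = scrape(OTHER_PAGE, pattern)
puts to_xml(amounts, "amount")
```

A pattern tied to one opening tag breaks as soon as a page uses different markup, which is one reason a real tool generalizes to XPath expressions instead.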
   Create a system where non-coders can train a scraper.
PRO
   Ability to use ‘learning’ example (sometimes)
   Syntax integrates XML builder
   Supports all Hpricot XPath operations

CON
   Learning mode fails hard
   Doesn’t always learn

Note: For compatibility reasons, this project uses an older version of scrubyt. Issues may be fixed in a newer version.
   Create a system where non-coders can train a scraper.

.... Didn’t work.
Still need coders w/ the following expertise:

 1. XPath XML resolution


 2. Regular Expressions


 3. Firebug
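
As an illustration of the regular-expression skill listed above, a volunteer might need to normalize amounts and date ranges lifted from a department page. This is a hedged sketch: the input formats and the helper names `parse_amount_cents` and `parse_period` are assumptions, not formats taken from any actual department.

```ruby
# Convert a currency string such as "$1,234.50" to integer cents.
# Returns nil when the text does not look like an amount.
def parse_amount_cents(text)
  match = text.match(/\A\s*\$?\s*(\d{1,3}(?:,\d{3})*|\d+)(?:\.(\d{2}))?\s*\z/)
  return nil unless match

  dollars = match[1].delete(",").to_i
  cents = (match[2] || "0").to_i
  dollars * 100 + cents
end

# Pull the first and last ISO-style dates (YYYY-MM-DD) out of a free-text
# period such as "2008-03-14 to 2008-03-16".
def parse_period(text)
  dates = text.scan(/\d{4}-\d{2}-\d{2}/)
  return nil if dates.empty?

  [dates.first, dates.last]
end

puts parse_amount_cents("$1,234.50")
puts parse_period("2008-03-14 to 2008-03-16").inspect
```

Storing amounts as integer cents avoids floating-point rounding when totals are summed across departments.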
1. Open This Link
2. Paste This Text

...created in the background

Go To Next Level

Split Level: Two Types of Links
Open This Link

Select Element
Get the XPath

Split Level: Two Types of Links

...created in the background

Test Random Reports
Send Home
   Goal: Finish scraping in one day
       12/124 Completed: 112 to go
       5-20 Volunteers
       5-20 min. per department
       Downloadable app w/ setup instructions
       Integrated examples


   Benefits:
     Excuse to use scrubyt, firebug
     On-site tutorial + guidance
     Easy intro to a Rails App
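
A quick check of the one-day goal, using only the figures on the slide above. The sketch assumes the work splits evenly across volunteers and ignores setup time; `minutes_per_volunteer` is a hypothetical helper.

```ruby
TOTAL_DEPARTMENTS = 124
COMPLETED = 12
REMAINING = TOTAL_DEPARTMENTS - COMPLETED # 112 departments to go

# Minutes of work per volunteer if the departments are split evenly.
def minutes_per_volunteer(departments, minutes_each, volunteers)
  (departments * minutes_each).fdiv(volunteers)
end

# Best case: 5 minutes per department, 20 volunteers.
best = minutes_per_volunteer(REMAINING, 5, 20)
# Worst case: 20 minutes per department, 5 volunteers.
worst = minutes_per_volunteer(REMAINING, 20, 5)

puts "Best case:  #{best} minutes per volunteer"
puts "Worst case: #{worst} minutes per volunteer (#{(worst / 60).round(1)} hours)"
```

Under these assumptions the load ranges from 28 minutes to 448 minutes (about 7.5 hours) per volunteer, so the goal is feasible even in the worst case, though with little slack.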
Jennifer Bell
visiblegovernment.ca

VisibleGovernment.ca Expense Visualizer Pilot - Montreal on Rails
