VisibleGovernment.ca Expense Visualizer Pilot - Montreal on Rails

Presented at Montreal Ruby on Rails.


  1. Citizen Activism using Scrubyt and RoR
  2. • Only partially available online
         ▪ Formatted as web page or PDF
     • Hard to search
     • Can’t subscribe
     • Can’t visualize
     • Can’t re-use
  3. Publishing Structured Feeds
     • Ability to subscribe to interesting data
     • Data streams can be ‘mashed’ in new ways
     Data Visualization
     • Makes it easy to find new patterns
     Collaborative Organization
     • Tagging, Voting, Sharing
     Crowdsourcing
     • Combines skills and input of large numbers of people
  4. • Governments publish data streams
     • 3rd parties create tools for analysis and oversight
     • Citizens collaboratively monitor their government
     • Citizens detect issues, give feedback
     • Issues are resolved
     [Cycle diagram: Governments publish data streams → 3rd Party Tools → Citizens monitor data streams → Issues are detected → Issues are resolved]
  5. Why can’t the government do everything?
     • Government has little incentive
         ▪ Usually has disincentive
     • Don’t want a single monolithic solution
         ▪ Want to allow evolution of best-of-breed tools
     • Tools created by citizens, for citizens
  6. • Focus:
         ▪ US Congress
         ▪ California Legislature
     • Gives grants to online transparency tools
     • $3.5 M seed
  7. A recent US Congress bill
     [Screenshot labels: Groups for bill; Groups against bill]
  8. [Screenshot labels: Votes; Donations]
  9. Publishing Structured Feeds
     • MAPLight is a mashup of data streams from different sources.
     Data Visualization
     • MAPLight makes the relationship between money and votes visible.
     Collaborative Organization
     • An advocacy group tags donating companies as belonging to interest groups.
     Crowdsourcing
     • Thousands of journalists, advocates, and citizens can browse data and flag issues.
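MAPLight's mashup amounts to joining two data streams on a shared key, with crowdsourced tags deciding which side of a bill each donor group is on. A minimal Ruby sketch of that join follows; the record types, group names, amounts, and the `money_vs_votes` helper are hypothetical and are not MAPLight's actual schema or method.

```ruby
# Hypothetical, simplified records: not MAPLight's real schema.
Donation = Struct.new(:legislator, :group, :amount)
Vote     = Struct.new(:legislator, :position) # :yes or :no

# Crowdsourced tagging step (assumed): each interest group is marked
# as supporting or opposing the bill.
GROUP_STANCE = { "Group A" => :support, "Group B" => :oppose }.freeze

# Join the two streams on legislator name: for each recorded vote, total the
# money that legislator received from supporters and from opponents.
def money_vs_votes(donations, votes)
  votes.map do |vote|
    totals = Hash.new(0)
    donations.each do |d|
      next unless d.legislator == vote.legislator
      stance = GROUP_STANCE[d.group]
      totals[stance] += d.amount if stance # untagged groups are ignored
    end
    { legislator: vote.legislator, vote: vote.position,
      from_supporters: totals[:support], from_opponents: totals[:oppose] }
  end
end

donations = [
  Donation.new("Smith", "Group A", 5_000),
  Donation.new("Smith", "Group B", 1_000),
  Donation.new("Jones", "Group B", 2_500)
]
votes = [Vote.new("Smith", :yes), Vote.new("Jones", :no)]

money_vs_votes(donations, votes).each { |row| p row }
```

Putting the vote next to both totals is what makes the money/vote relationship visible; the visualization layer only has to chart these rows.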
 10. Accelerate online transparency
     • Raise awareness
         ▪ With public
         ▪ With government
     • Raise money
     • Fund external development:
         ▪ Grants
         ▪ Contests
     [Diagram labels: Ideas, Skills, Funds]
 11. [Diagram: Prove Concept → Get Publicity → Direct Attention and Money to Online Tools for Transparency; also labeled: Raise Awareness, Show What’s Possible]
 12. • 2003 Directive: Must publish travel and hospitality expenses on the web
     • No standards for presentation defined
 13. 124 Departments
     - All different
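One way to cope with 124 differently formatted sites is to keep a small label map per department and convert every scraped row into one common record. The sketch below assumes hypothetical department keys, column labels, and an `ExpenseRecord` layout; the pilot's real standard format is not shown in the slides. Amounts are kept as integer cents to avoid floating-point rounding.

```ruby
require "date"

# Hypothetical common record for one travel or hospitality expense entry.
ExpenseRecord = Struct.new(:department, :name, :purpose, :date, :total_cents)

# Hypothetical per-department label maps: each site names the same fields differently.
FIELD_MAPS = {
  "dept_a" => { "Name" => :name, "Purpose" => :purpose, "Date" => :date, "Total" => :total },
  "dept_b" => { "Traveller" => :name, "Reason for Travel" => :purpose,
                "Start Date" => :date, "Total Cost" => :total }
}.freeze

# Convert one scraped row (label => raw string) into an ExpenseRecord.
def normalize(department, raw_row)
  map = FIELD_MAPS.fetch(department) # raises KeyError for an unmapped department
  fields = {}
  raw_row.each do |label, value|
    key = map[label.strip]
    fields[key] = value.strip if key # labels outside the map are dropped
  end
  ExpenseRecord.new(
    department,
    fields[:name],
    fields[:purpose],
    Date.parse(fields[:date]),                           # e.g. "2007-11-05" or "November 5, 2007"
    (Rational(fields[:total].delete("$,")) * 100).round  # "$1,234.50" -> 123450 cents
  )
end

a = normalize("dept_a", { "Name" => "J. Doe", "Purpose" => "Conference",
                          "Date" => "2007-11-05", "Total" => "$1,234.50" })
b = normalize("dept_b", { "Traveller" => "R. Roe", "Reason for Travel" => "Site visit",
                          "Start Date" => "November 5, 2007", "Total Cost" => "980.00" })
p a, b
```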
 14. Standardize
     • Scrape data into standard format
     Stream
     • Publish RSS feeds
     Visualize
     • Provide basic visualization app
     • Run contest
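For the Stream step, standardized entries could be published as one RSS 2.0 feed per department. This sketch uses REXML from Ruby's standard distribution; the `Expense` fields, the feed URL, and the title wording are assumptions, not the pilot's actual feed layout.

```ruby
require "date"
require "rexml/document"

# Hypothetical standardized expense entry; field names are assumptions.
Expense = Struct.new(:id, :name, :purpose, :date, :total_cents)

# Build an RSS 2.0 feed (returned as an XML string) with one <item> per expense.
def expenses_to_rss(department, feed_url, expenses)
  doc = REXML::Document.new
  doc << REXML::XMLDecl.new("1.0", "UTF-8")
  channel = doc.add_element("rss", "version" => "2.0").add_element("channel")
  channel.add_element("title").text = "#{department}: travel and hospitality expenses"
  channel.add_element("link").text = feed_url
  channel.add_element("description").text = "Expense entries scraped into a standard format"

  expenses.each do |e|
    item = channel.add_element("item")
    dollars = format("%.2f", e.total_cents / 100.0)
    item.add_element("title").text = "#{e.name}: $#{dollars} (#{e.purpose})"
    # A stable, non-URL identifier lets feed readers skip entries already seen.
    item.add_element("guid", "isPermaLink" => "false").text = "#{department}-#{e.id}"
    item.add_element("pubDate").text = e.date.rfc2822
  end

  xml = +""
  doc.write(xml) # compact output, no added whitespace
  xml
end

feed = expenses_to_rss(
  "dept_a", "http://example.org/feeds/dept_a.xml",
  [Expense.new(1, "J. Doe", "Conference", Date.new(2007, 11, 5), 123_450),
   Expense.new(2, "R. Roe", "Site visit", Date.new(2007, 11, 19), 98_000)]
)
puts feed
```

The fixed `guid` per entry is what lets a subscriber see only newly published expenses.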
 15. 1. LEARNING TEMPLATE
          Input:  • Example Page
                  • Example Text
          Output: • XML
                  • Production Scraper
     2. PRODUCTION SCRAPER
          Input:  • Any Page with Same Format
          Output: • XML
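The two stages can be illustrated without Scrubyt itself. In this sketch the learning step finds the element holding each example string and records a positional XPath to it; the production step applies those paths to another page with the same layout and emits XML. It assumes well-formed XHTML so that REXML can parse it (real pages usually need a forgiving HTML parser such as Hpricot), and the helper names are hypothetical; it does not reproduce Scrubyt's own API or learning heuristics.

```ruby
require "rexml/document"

# 1-based position of `el` among its same-named element siblings.
def position_among_siblings(el)
  el.parent.elements.to_a(el.name).index { |s| s.equal?(el) } + 1
end

# LEARNING TEMPLATE (sketch): for each field, find the first element whose
# text equals the example text and record its positional XPath.
def learn_xpaths(example_page, examples)
  doc = REXML::Document.new(example_page)
  examples.transform_values do |example_text|
    node = REXML::XPath.match(doc, "//*").find { |el| el.text.to_s.strip == example_text }
    raise ArgumentError, "example text not found: #{example_text}" unless node
    steps = []
    until node.is_a?(REXML::Document) # walk up to the document root
      steps.unshift("#{node.name}[#{position_among_siblings(node)}]")
      node = node.parent
    end
    "/" + steps.join("/")
  end
end

# PRODUCTION SCRAPER (sketch): apply the learned XPaths to any page with
# the same layout and emit the values as one XML <record>.
def production_scrape(page, xpaths)
  doc = REXML::Document.new(page)
  out = REXML::Document.new
  record = out.add_element("record")
  xpaths.each do |field, xpath|
    node = REXML::XPath.first(doc, xpath)
    record.add_element(field).text = node && node.text.to_s.strip
  end
  xml = +""
  out.write(xml)
  xml
end

EXAMPLE_PAGE = <<~HTML
  <html><body><table>
    <tr><th>Name</th><td>J. Doe</td></tr>
    <tr><th>Total</th><td>$1,234.50</td></tr>
  </table></body></html>
HTML
OTHER_PAGE = EXAMPLE_PAGE.sub("J. Doe", "R. Roe").sub("$1,234.50", "$980.00")

xpaths = learn_xpaths(EXAMPLE_PAGE, { "name" => "J. Doe", "total" => "$1,234.50" })
puts xpaths # in this sketch, the "production scraper" is just these paths
puts production_scrape(OTHER_PAGE, xpaths)
```

Purely positional paths are brittle: if a department adds a row, every later field shifts. That is one reason a learned scraper can silently fail on pages that only look the same.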
 16. Create a system where non-coders can train a scraper.
 17. PRO
     • Ability to use a ‘learning’ example
     • Syntax integrates XML builder
     • Supports all Hpricot XPath operations
     CON
     • Learning mode fails hard (sometimes)
     • Doesn’t always learn
     Note: For compatibility reasons, this project uses an older version of scrubyt. These issues may be fixed in newer versions.
 18. Create a system where non-coders can train a scraper.
     ... Didn’t work.
 19. Still need coders w/ the following expertise:
     1. XPath XML resolution
     2. Regular Expressions
     3. Firebug
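XPath gets the scraper to the right cell, but the cell's text still has to be turned into clean values, which is where regular expressions come in. The formats handled below (dollar amounts with optional thousands separators, ISO-style dates) are assumptions about what a department's page might contain.

```ruby
require "date"

# A dollar amount such as "$1,234.50", "1234.50" or "$980" (cents optional).
# The first alternative requires thousands separators so that "1234.50"
# falls through to \d+ instead of stopping at "123".
AMOUNT = /\$?\s*(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?/
# An ISO-style date such as "2007-11-05".
ISO_DATE = /\b(\d{4})-(\d{2})-(\d{2})\b/

# First amount in the text, as integer cents; nil if there is none.
def parse_cents(text)
  m = AMOUNT.match(text) or return nil
  m[1].delete(",").to_i * 100 + m[2].to_i # missing cents group -> nil.to_i == 0
end

# Every ISO date in the text, e.g. the two ends of a "2007-11-05 to 2007-11-07" trip.
def parse_dates(text)
  text.scan(ISO_DATE).map { |y, mo, d| Date.new(y.to_i, mo.to_i, d.to_i) }
end

p parse_cents("Total: $1,234.50")
p parse_dates("2007-11-05 to 2007-11-07")
```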
 20. 1. Open This Link
     2. Paste This Text
 21. ...created in the background
 22. Go To Next Level
 23. Split Level: Two Types of Links
     Open This Link
 24. Select Element
     Get the XPath
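A path taken from the browser (for example, with Firebug's Copy XPath command) points at one cell of the live DOM. Two adjustments are commonly needed before a scraper can use it, and this sketch applies both. It assumes a table-based report layout; the copied path and the `generalize_xpath` helper are hypothetical.

```ruby
require "rexml/document"

# Turn the XPath of one selected cell, as copied from the browser, into an
# XPath for the whole column (sketch; both rules are assumptions to check per site).
def generalize_xpath(copied, row_tag: "tr")
  steps = copied.split("/")
  # The browser's DOM usually contains <tbody> even when the raw HTML does not,
  # so a copied path may not match the page the scraper downloads.
  steps.reject! { |s| s.match?(/\Atbody(\[\d+\])?\z/) }
  # Drop the index on the innermost row so the path matches every row.
  last_row = steps.rindex { |s| s.match?(/\A#{row_tag}(\[\d+\])?\z/) }
  steps[last_row] = row_tag if last_row
  steps.join("/")
end

PAGE = <<~HTML
  <html><body><div>Menu</div><div><table>
    <tr><td>J. Doe</td><td>$1,234.50</td></tr>
    <tr><td>R. Roe</td><td>$980.00</td></tr>
  </table></div></body></html>
HTML

copied = "/html/body/div[2]/table/tbody/tr[2]/td[2]" # hypothetical copied path
column = generalize_xpath(copied)
puts column
totals = REXML::XPath.match(REXML::Document.new(PAGE), column).map(&:text)
p totals
```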
 25. Split Level: Two Types of Links
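A split level mixes links that lead to another page of links with links that lead straight to a report. The sketch below assumes the two kinds can be told apart by URL pattern and walks the levels breadth-first; the patterns, the site map, and `collect_reports` are hypothetical stand-ins for a real fetch-and-XPath step.

```ruby
LISTING = %r{/list/}   # assumed URL pattern for pages that lead to another level
REPORT  = %r{/report/} # assumed URL pattern for individual expense reports

# Walk the levels breadth-first and collect report URLs.
# `links_on` is any callable returning the hrefs found on a page; a real
# scraper would fetch the page and apply an XPath such as //a/@href.
def collect_reports(start_url, links_on)
  queue = [start_url]
  seen = { start_url => true }
  reports = []
  until queue.empty?
    links_on.call(queue.shift).each do |href|
      next if seen[href] # avoid loops such as "back to index" links
      seen[href] = true
      if href.match?(REPORT)
        reports << href
      elsif href.match?(LISTING)
        queue << href # go to the next level
      end
    end
  end
  reports
end

# Hypothetical site: the index mixes a listing link, a report link, and noise.
SITE = {
  "/dept/index"        => ["/dept/list/2007-q1", "/dept/report/1", "/dept/about"],
  "/dept/list/2007-q1" => ["/dept/report/2", "/dept/report/3", "/dept/index"]
}.freeze

reports = collect_reports("/dept/index", ->(url) { SITE.fetch(url, []) })
p reports
```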
 26. ...created in the background
 27. Test Random Reports
     Send Home
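Testing random reports can be automated by running the trained XPaths on a sample of pages and flagging any report where a field comes back missing or empty, which usually means that page uses a different layout. A sketch with hypothetical pages, paths, and a `spot_check` helper; a fixed random seed makes the sample repeatable.

```ruby
require "rexml/document"

# Run the trained XPaths on a random sample of report pages and return the
# URLs where any field is missing or empty.
def spot_check(pages, xpaths, sample_size: 5, random: Random.new)
  pages.keys.sample(sample_size, random: random).select do |url|
    doc = REXML::Document.new(pages[url])
    xpaths.values.any? do |xpath|
      node = REXML::XPath.first(doc, xpath)
      node.nil? || node.text.to_s.strip.empty?
    end
  end
end

xpaths = { "name" => "/html/body/table/tr/td[1]", "total" => "/html/body/table/tr/td[2]" }
pages = {
  "report/1" => "<html><body><table><tr><td>J. Doe</td><td>$10.00</td></tr></table></body></html>",
  "report/2" => "<html><body><table><tr><td>R. Roe</td><td></td></tr></table></body></html>",
  "report/3" => "<html><body><p>Layout changed</p></body></html>"
}

failures = spot_check(pages, xpaths, sample_size: 3, random: Random.new(42))
p failures.sort
```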
 28. • Goal: Finish scraping in one day
         ▪ 12/124 completed: 112 to go
         ▪ 5-20 volunteers
         ▪ 5-20 min. per department
     • Downloadable app w/ setup instructions
         ▪ Integrated examples
     • Benefits:
         ▪ Excuse to use scrubyt, Firebug
         ▪ On-site tutorial + guidance
         ▪ Easy intro to a Rails app
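The slide's figures can be checked against the one-day goal. This sketch assumes the remaining departments are split evenly and that each volunteer works through their share one after another; it uses no data beyond the slide.

```ruby
TOTAL_DEPARTMENTS = 124
COMPLETED = 12
remaining = TOTAL_DEPARTMENTS - COMPLETED # 112 departments

# Wall-clock minutes: departments per volunteer (rounded up) times minutes each.
def wall_clock_minutes(departments, volunteers, minutes_per_dept)
  departments.fdiv(volunteers).ceil * minutes_per_dept
end

best  = wall_clock_minutes(remaining, 20, 5)   # at most 6 departments each  -> 30 minutes
worst = wall_clock_minutes(remaining, 5, 20)   # at most 23 departments each -> 460 minutes
total_effort = [remaining * 5, remaining * 20] # 560 to 2,240 volunteer-minutes
puts "#{best} to #{worst} minutes (#{(worst / 60.0).round(1)} h worst case)"
```

Even the slowest combination, 5 volunteers at 20 minutes per department, comes to about 460 minutes, so the goal holds up as long as the per-department estimate does.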
 29. Jennifer Bell
     visiblegovernment.ca
