Open Government Data and MongoDB

Given at MongoDC on June 27, 2011.

  1. Open Government Data & MongoDB. Luigi Montanez, luigi@sunlightfoundation.com
  2. Question? @LuigiMontanez
  3. Open Data + Open Source = Open Government
  4. MongoDB enables open data
  5. Opening Up Data
     ✴ Gather data from disparate sources
       ✴ Data dumps (SQL, fixed-width columns)
       ✴ Web scraping
       ✴ Text/PDF parsing
     ✴ Serving RESTful JSON APIs
  6. JSON
     ✴ Tree structure, not tabular
     ✴ Still relational
     ✴ JSON for data, XML for documents
     ✴ Closely resembles native data structures
     ✴ No manual parsing needed
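
The last two bullets on the JSON slide can be illustrated with a minimal sketch. It assumes a small payload shaped like the legislator records shown later in the talk; the variable names are illustrative and not taken from any Sunlight codebase.

```javascript
// A small JSON payload shaped like the legislator records later in the talk.
const payload =
  '{"legislator": {"last_name": "Lee", "state": "CA", "earmarks": {"total_number": 28}}}';

// JSON.parse returns plain objects, strings, and numbers, so there is
// no hand-written parser and no document tree to walk.
const record = JSON.parse(payload);

// Nested values are reached with ordinary property access.
const lastName = record.legislator.last_name;
const earmarkCount = record.legislator.earmarks.total_number;

console.log(lastName, earmarkCount);
```

The parsed result is an ordinary object tree, which is the sense in which JSON "closely resembles native data structures".
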
  7. Three Projects
     ✴ Poligraft
     ✴ Real Time Congress API
     ✴ Open State Project
  8. Three Projects
     ✴ Poligraft
     ✴ Real Time Congress API
     ✴ Open State Project
  9. App design drives schema design
  10. {
        "title": "President Obama's climate Plan B in hot water - Darren Samuelsohn - POLITICO.com"
      }
  11. {
        "title": "President Obama's climate Plan B in hot water - Darren Samuelsohn - POLITICO.com",
        "slug": "EOsc",
        "source_url": "http://www.politico.com/news/stories/0810/40534.html",
        "content": "................."
      }
  12. {
        "title": "President Obama's climate Plan B in hot water - Darren Samuelsohn - POLITICO.com",
        "slug": "EOsc",
        "source_url": "http://www.politico.com/news/stories/0810/40534.html",
        "content": ".................",
        "entities": [...]
      }
  13. {
        "title": "President Obama's climate Plan B in hot water - Darren Samuelsohn - POLITICO.com",
        "slug": "EOsc",
        "source_url": "http://www.politico.com/news/stories/0810/40534.html",
        "content": ".................",
        "entities": [
          {
            "name": "Barack Obama",
            "type": "politician"
          },
          ...
        ]
      }
  14. {
        "title": "President Obama's climate Plan B in hot water - Darren Samuelsohn - POLITICO.com",
        "slug": "EOsc",
        "source_url": "http://www.politico.com/news/stories/0810/40534.html",
        "content": ".................",
        "entities": [
          {
            "name": "Barack Obama",
            "type": "politician",
            "breakdown": {"indiv": "33", "pac": "67"},
            "top_industries": ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"]
          },
          ...
        ]
      }
  15. Natural Schemas
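
One way to read "app design drives schema design" is that everything a page needs sits in a single document. The sketch below assumes a report shaped like the Poligraft example on slide 14, with the elided fields left out. `describeEntities` is a hypothetical helper, not part of Poligraft.

```javascript
// Document shaped like the Poligraft example on slide 14 (elided fields omitted).
const report = {
  title: "President Obama's climate Plan B in hot water - Darren Samuelsohn - POLITICO.com",
  slug: "EOsc",
  entities: [
    {
      name: "Barack Obama",
      type: "politician",
      breakdown: { indiv: "33", pac: "67" },
      top_industries: ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"],
    },
  ],
};

// Hypothetical helper: build one display line per embedded entity.
// No join is needed because the entities live inside the report itself.
function describeEntities(doc) {
  return doc.entities.map(
    (e) =>
      `${e.name} (${e.type}): indiv=${e.breakdown.indiv}, pac=${e.breakdown.pac}; ` +
      `top industry: ${e.top_industries[0]}`
  );
}

const lines = describeEntities(report);
console.log(lines.join("\n"));
```

Because the entities are embedded, rendering a report is a single read followed by plain array iteration.
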
  16. Three Projects
      ✴ Poligraft
      ✴ Real Time Congress API
      ✴ Open State Project
  17. Real-Time Congress API (photo credit: vgm8383 on Flickr)
  18. Android App: “Congress”
  19. Politiwidgets
  20. Requirements
      ✴ Aggregate lots of data: Biographical, Bills, Votes, Earmarks, Video Clips, Floor Updates, Legislative Documents, Committee Schedules, Contributions, Interest Group Ratings
      ✴ Lightweight responses
  21. {legislator: {
        in_office: true,
        title: "Rep",
        nickname: "",
        district: "9",
        bioguide_id: "L000551",
        govtrack_id: "400237",
        phone: "202-225-2661",
        website: "http://lee.house.gov/index.html",
        twitter_id: "",
        last_name: "Lee",
        name_suffix: "",
        last_updated: "2010/04/13 00:00:14 +0000",
        party: "D",
        chamber: "house",
        state: "CA",
        youtube_url: "http://www.youtube.com/RepLee",
        first_name: "Barbara",
        gender: "F",
        congress_office: "2444 Rayburn House Office Building",
        earmarks: {
          average_number: 20,
          total_amount: 10000000,
          average_amount: 22994535,
          total_number: 28,
          last_updated: "2010-03-18",
          fiscal_year: 2010
        }
        ...
      }}
  22. // limit selection to a subset of fields
      db.people.find( { first_name : "john" }, { last_name : 1, address : 1 } );
      // use dot-notation to dig into an object
      db.people.find( { state : "CA" }, { "address.zip_code" : 1 } );
  23. ?sections=last_name,first_name,state,earmarks
      {legislator: {
        last_name: "Lee",
        first_name: "Barbara",
        state: "CA",
        earmarks: {
          average_number: 20,
          total_amount: 10000000,
          average_amount: 22994535,
          total_number: 28,
          last_updated: "2010-03-18",
          fiscal_year: 2010
        }
      }}
  24. ?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number
      {legislator: {
        last_name: "Lee",
        first_name: "Barbara",
        state: "CA",
        earmarks: {
          total_amount: 10000000,
          total_number: 28
        }
      }}
  25. Partial responses make payloads smaller
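
The `?sections=` parameter on slides 23 and 24 lines up with the field-selection queries on slide 22. The following is a minimal sketch of how such a parameter might be mapped to a projection, not the Real Time Congress API's actual code. `sectionsToProjection` and `project` are hypothetical helpers. `project` imitates inclusion-style field selection on a plain object so the example runs without a database, and it does not handle arrays.

```javascript
// Turn "a,b.c" into a projection object: { a: 1, "b.c": 1 }.
function sectionsToProjection(sections) {
  const projection = {};
  for (const field of sections.split(",")) {
    const name = field.trim();
    if (name) projection[name] = 1;
  }
  return projection;
}

// Copy only the projected fields out of a plain object, following dot notation.
function project(doc, projection) {
  const out = {};
  for (const path of Object.keys(projection)) {
    const keys = path.split(".");
    let src = doc;
    let dst = out;
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      // Stop if the path does not exist in the source document.
      if (src === null || typeof src !== "object" || !(key in src)) break;
      if (i === keys.length - 1) {
        dst[key] = src[key];
      } else {
        if (!(key in dst)) dst[key] = {};
        dst = dst[key];
        src = src[key];
      }
    }
  }
  return out;
}

// A subset of the legislator record from slide 21.
const legislator = {
  last_name: "Lee", first_name: "Barbara", state: "CA", party: "D", chamber: "house",
  earmarks: { average_number: 20, total_amount: 10000000, total_number: 28, fiscal_year: 2010 },
};

const projection = sectionsToProjection(
  "last_name,first_name,state,earmarks.total_amount,earmarks.total_number"
);
const partial = project(legislator, projection);

console.log(JSON.stringify(partial));
// The partial response serializes to fewer characters than the full record.
console.log(JSON.stringify(partial).length < JSON.stringify(legislator).length);
```

With a real MongoDB collection, the projection object would be passed as the second argument to `find`, as on slide 22, so the trimming happens in the database rather than in application code.
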
  26. Three Projects
      ✴ Poligraft
      ✴ Real Time Congress API
      ✴ Open State Project
  27. 50 States = 50 Formats
  28. Schemalessness allows for granular control
  29. Custom Fields
      ✴ Traditional RDBMS
        ✴ Update the schema for new fields, run a migration, feel icky
        ✴ Create a custom_fields table
      ✴ MongoDB
        ✴ Just store it
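
"Just store it" can be sketched with an in-memory stand-in for a collection. Everything here is hypothetical: the `bills` array, the `store` helper, and the field names (including `companion_bill`) are made up for illustration and do not come from the Open State Project. A real application would call a MongoDB driver instead.

```javascript
// In-memory stand-in for a schemaless collection.
const bills = [];

// Store a document as-is; no schema change or migration is required.
function store(doc) {
  bills.push({ ...doc });
}

// Two states, two shapes: the second record carries an extra custom field.
store({ state: "ca", bill_id: "AB 1", title: "Example bill" });
store({ state: "tx", bill_id: "HB 2", title: "Another example", companion_bill: "SB 2" });

// Documents that lack the custom field are simply skipped by this filter.
const withCompanion = bills.filter((b) => "companion_bill" in b);

console.log(withCompanion.map((b) => b.bill_id));
```

The relational alternatives on the slide would need either a new column or a separate custom_fields table to hold the extra field.
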
  30. Speaking JSON natively
  31. [Diagram labels: Python, Source, Scraped JSON, PostgreSQL, Transform]
  32. [Diagram labels: Source, Scraped JSON, MongoDB]
  33. Three Projects
      ✴ Poligraft
      ✴ Real Time Congress API
      ✴ Open State Project
  34. Developer Happiness
  35. Thanks! sunlightlabs.com, @LuigiMontanez
