Sunlight Labs & MongoDB @ MongoDC

A look at how Sunlight Labs uses MongoDB. Presented at MongoDC on November 18, 2010.

Transcript

  • 1. MongoDB @ Sunlight Luigi Montanez luigi@sunlightfoundation.com
  • 2. Question? @LuigiMontanez
  • 3. Open Source + Open Data = Open Government
  • 4. High Quality Raw Data ✴ First: Raw data in JSON, XML, or CSV ✴ Second: RESTful APIs in JSON or XML ✴ Third: Nothing else...
  • 5. MongoDB enables open data
  • 6. JSON has won (among developers)
  • 7. Opening Up Data ✴ Storing data from disparate sources ✴ Data dumps ✴ Web scraping ✴ Text/PDF parsing ✴ Serving RESTful JSON APIs
  • 8. Three Projects ✴ National Data Catalog ✴ Real-Time Congress API ✴ Open State Project
  • 9. Three Projects ✴ National Data Catalog ✴ Real-Time Congress API ✴ Open State Project
  • 10. App design drives schema design
  • 11. { "title": "Worldwide M1+ Earthquakes, Past Hour" }
  • 12. { "title": "Worldwide M1+ Earthquakes, Past Hour", "description": "Real-time, worldwide earthquake list for the past h "homepage": "http://data.gov/raw/32", "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/", "organization": "Department of the Interior", "original_catalog": "data.gov", }
  • 13. { "title": "Worldwide M1+ Earthquakes, Past Hour", "description": "Real-time, worldwide earthquake list for the past "homepage": "http://data.gov/raw/32", "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/", "organization_id": "4cbcc0ff2c34576ba4000001", "catalog_id": "4cbcc0ab2d34d76b97020433", }
  • 14. { "title": "Worldwide M1+ Earthquakes, Past Hour", "description": "Real-time, worldwide earthquake list for the past h "homepage": "http://data.gov/raw/32", "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/", "organization": { "name": "Department of the Interior", "id": "4cbcc0ff2c34576ba4000001", "slug": "us-dept-of-interior" }, "original_catalog": { "name": "data.gov", "id": "4cbcc0ab2d34d76b97020433", "slug": "datagov" } }
  • 15. { "title": "Worldwide M1+ Earthquakes, Past Hour", "description": "Real-time, worldwide earthquake list for the past h "homepage": "http://data.gov/raw/32", "official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/", "organization": { "name": "Department of the Interior", "id": "4cbcc0ff2c34576ba4000001", "slug": "us-dept-of-interior" }, "original_catalog": { "name": "data.gov", "id": "4cbcc0ab2d34d76b97020433", "slug": "datagov" }, "downloads": [ { "type": "csv", "url": "http://data.gov/download/32 "ratings" : { "average_rating": 3.5, "rating_count": 23 }, "comments": [] }
  • 16. User-centric data? ✴ Source document: contains collection of user data ✴ User document: contains collection of source data ✴ UserSource document ✴ Rating, Favorite, Note docs
  • 17. Freedom of choice
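
Slides 13 and 14 contrast two shapes for the same catalog entry: one that stores only `organization_id` and `catalog_id`, and one that embeds a small `name`/`id`/`slug` sub-document. The following is a minimal sketch of moving from the first shape to the second. It is not taken from the National Data Catalog code: the lookup tables and the `embedReferences` helper are hypothetical, and only the ids, names, and slugs come from the slides.

```javascript
// Hypothetical in-memory lookup tables standing in for the organization and
// catalog collections. The ids, names and slugs are the ones shown on slide 14.
const organizations = {
  "4cbcc0ff2c34576ba4000001": {
    name: "Department of the Interior",
    slug: "us-dept-of-interior",
  },
};
const catalogs = {
  "4cbcc0ab2d34d76b97020433": { name: "data.gov", slug: "datagov" },
};

// Hypothetical helper: swap the *_id references (slide 13) for embedded
// sub-documents (slide 14) so a reader gets display data without a second query.
function embedReferences(source) {
  const { organization_id, catalog_id, ...rest } = source;
  const org = organizations[organization_id];
  const cat = catalogs[catalog_id];
  return {
    ...rest,
    organization: { name: org.name, id: organization_id, slug: org.slug },
    original_catalog: { name: cat.name, id: catalog_id, slug: cat.slug },
  };
}

const referenced = {
  title: "Worldwide M1+ Earthquakes, Past Hour",
  homepage: "http://data.gov/raw/32",
  organization_id: "4cbcc0ff2c34576ba4000001",
  catalog_id: "4cbcc0ab2d34d76b97020433",
};

const embedded = embedReferences(referenced);
console.log(JSON.stringify(embedded, null, 2));
```

The trade-off in this sketch is the usual one for embedding: reads need a single document, but a renamed organization has to be updated in every entry that embeds it.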
  • 18. Three Projects ✴ National Data Catalog ✴ Real-Time Congress API ✴ Open State Project
  • 19. Real-Time Congress API (Drumbone) Credit: vgm8383 on Flickr
  • 20. Android App: “Congress”
  • 21. Politiwidgets
  • 22. Requirements ✴ Aggregate lots of data: Biographical, Bills, Votes, Earmarks, Video Clips, Floor Updates, Legislative Documents, Committee Schedules, Contributions, Interest Group Ratings ✴ Lightweight responses
  • 23. {legislator: { in_office: true, title: "Rep", nickname: "", district: "9", bioguide_id: "L000551", govtrack_id: "400237", phone: "202-225-2661", website: "http://lee.house.gov/index.html", twitter_id: "", last_name: "Lee", name_suffix: "", last_updated: "2010/04/13 00:00:14 +0000", party: "D", chamber: "house", state: "CA", youtube_url: "http://www.youtube.com/RepLee", first_name: "Barbara", gender: "F", congress_office: "2444 Rayburn House Office Building", earmarks: { average_number: 20, total_amount: 10000000, average_amount: 22994535, total_number: 28, last_updated: "2010-03-18", fiscal_year: 2010, } ... }
  • 24. // limit selection to a subset of fields
        db.people.find( { 'first_name' : 'john' }, { 'last_name' : 1, 'address' : 1 } );
        // use dot-notation to dig into an object
        db.people.find( { 'state': 'CA' }, { 'address.zip_code': 1 } );
  • 25. {legislator: { last_name: "Lee", first_name: "Barbara", state: "CA", earmarks: { average_number: 20, total_amount: 10000000, average_amount: 22994535, total_number: 28, last_updated: "2010-03-18", fiscal_year: 2010, } } ?sections=last_name,first_name,state,earmarks
  • 26. {legislator: { last_name: "Lee", first_name: "Barbara", state: "CA", earmarks: { total_amount: 10000000, total_number: 28 } } ?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number
  • 27. Partial responses make payloads smaller
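
Slides 24 to 26 suggest that the `?sections=` parameter maps onto MongoDB's field selection, including dot-notation. The slides do not show how the API does this, so the following is only a sketch under that assumption. `sectionsToProjection` and `pick` are hypothetical helpers, and `pick` imitates inclusion-style field selection on a plain object so the example runs without a database; it does not handle arrays or `_id`.

```javascript
// Hypothetical helper: turn "a,b,c.d" into a projection object like the second
// argument to find() on slide 24, e.g. { a: 1, b: 1, "c.d": 1 }.
function sectionsToProjection(sections) {
  const projection = {};
  for (const field of sections.split(",")) {
    const name = field.trim();
    if (name) projection[name] = 1;
  }
  return projection;
}

// Hypothetical stand-in for the database: copy only the projected paths out of
// a plain object, following dots into nested objects.
function pick(doc, projection) {
  const out = {};
  for (const path of Object.keys(projection)) {
    const keys = path.split(".");
    let src = doc;
    let dst = out;
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      if (src === null || typeof src !== "object" || !(key in src)) break;
      if (i === keys.length - 1) {
        dst[key] = src[key]; // leaf: copy the value
      } else {
        if (!(key in dst)) dst[key] = {}; // intermediate: descend
        dst = dst[key];
        src = src[key];
      }
    }
  }
  return out;
}

// Fields taken from the legislator document on slide 25.
const legislator = {
  last_name: "Lee",
  first_name: "Barbara",
  state: "CA",
  party: "D",
  earmarks: {
    average_number: 20,
    total_amount: 10000000,
    total_number: 28,
    fiscal_year: 2010,
  },
};

const projection = sectionsToProjection(
  "last_name,first_name,state,earmarks.total_amount,earmarks.total_number"
);
const partial = pick(legislator, projection);
console.log(JSON.stringify(partial));
```

With a real collection, the same `projection` object would be passed as the second argument to `find`, in the style of slide 24.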
  • 28. Three Projects ✴ National Data Catalog ✴ Real-Time Congress API ✴ Open State Project
  • 29. 50 States = 50 Formats
  • 30. Schemalessness allows for losslessness
  • 31. Source → Scraped JSON → Python Transform → PostgreSQL
  • 32. Source → Scraped JSON → MongoDB
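
Slides 30 to 32 argue that storing scraped JSON directly avoids the loss a fixed-schema transform can introduce. The following sketch illustrates that point and is not the Open State Project's code. The records, field names, column list, and helper functions are all invented for the example, and a plain array stands in for a MongoDB collection.

```javascript
// Invented scraped records: two states reporting different extra fields.
const scraped = [
  { state: "ca", bill_id: "AB 1", title: "Example bill A", subjects: ["Education"] },
  { state: "tx", bill_id: "HB 2", title: "Example bill B", companion_bill: "SB 9" },
];

// Slide 31 style (assumed): a transform maps every record onto fixed columns.
const COLUMNS = ["state", "bill_id", "title"];

function toFixedRow(record) {
  const row = {};
  for (const column of COLUMNS) {
    row[column] = column in record ? record[column] : null;
  }
  return row;
}

// Which fields does the fixed-column transform throw away?
function droppedFields(record) {
  return Object.keys(record).filter((key) => !COLUMNS.includes(key));
}

// Slide 32 style (assumed): keep each scraped record whole.
const collection = []; // stand-in for a MongoDB collection

function storeAsIs(record) {
  collection.push({ ...record });
}

for (const record of scraped) {
  storeAsIs(record);
  console.log(record.state, "fixed schema would drop:", droppedFields(record));
}
```

In this sketch the state-specific fields survive in `collection`, while `toFixedRow` keeps only the shared columns.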
  • 33. Three Projects ✴ National Data Catalog ✴ Real-Time Congress API ✴ Open State Project
  • 34. Thanks! sunlightlabs.com @LuigiMontanez