Sunlight Labs & MongoDB @ MongoDC

MongoDB
@ Sunlight
Luigi Montanez
luigi@sunlightfoundation.com

Questions? @LuigiMontanez
Open Source + Open Data
=
Open Government

High Quality Raw Data
✴ First: Raw data in JSON, XML, or CSV
✴ Second: RESTful APIs in JSON or XML
✴ Third: Nothing else...

MongoDB enables
open data

JSON has won
(among developers)

Opening Up Data
✴ Storing data from disparate sources
✴ Data dumps
✴ Web scraping
✴ Text/PDF parsing
✴ Serving RESTful JSON APIs

Three Projects
✴ National Data Catalog
✴ Real-Time Congress API
✴ Open State Project

App design
drives
schema design

{
"title": "Worldwide M1+ Earthquakes, Past Hour"
}

{
"title": "Worldwide M1+ Earthquakes, Past Hour",
"description": "Real-time, worldwide earthquake list for the past h
"homepage": "http://data.gov/raw/32",
"official_docs": "http://earthquake.usgs.gov/eqcenter/catalogs/",
"organization": "Department of the Interior",
"original_catalog": "data.gov"
}

{
"description": "Real-time, worldwide earthquake list for the past
"organization_id": "4cbcc0ff2c34576ba4000001",
"catalog_id": "4cbcc0ab2d34d76b97020433"
}

{
"organization": { "name": "Department of the Interior",
"id": "4cbcc0ff2c34576ba4000001",
"slug": "us-dept-of-interior"
},
"original_catalog": { "name": "data.gov",
"id": "4cbcc0ab2d34d76b97020433",
"slug": "datagov"
}
}

{
"organization": {
"name": "Department of the Interior",
"id": "4cbcc0ff2c34576ba4000001",
"slug": "us-dept-of-interior"
},
"original_catalog": {
"name": "data.gov",
"id": "4cbcc0ab2d34d76b97020433",
"slug": "datagov"
},
"downloads": [ { "type": "csv", "url": "http://data.gov/download/32
"ratings" : {
"average_rating": 3.5,
"rating_count": 23
},
"comments": []
}
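The progression above moves from bare IDs to embedded sub-documents, so a source page can be rendered from a single document. A minimal sketch of that embedding step, assuming a hypothetical `embedReference` helper and a hypothetical `internal_notes` field that the source page does not need:

```javascript
// Hypothetical helper: copy only the fields the app displays
// (name, id, slug) from a referenced document into its parent.
function embedReference(doc) {
  return { name: doc.name, id: doc.id, slug: doc.slug };
}

// The organization's own document. `internal_notes` is a made-up
// field standing in for anything the source page never shows.
const organization = {
  id: "4cbcc0ff2c34576ba4000001",
  name: "Department of the Interior",
  slug: "us-dept-of-interior",
  internal_notes: "hypothetical field, not embedded"
};

// The source document carries a small denormalized copy.
const source = {
  title: "Worldwide M1+ Earthquakes, Past Hour",
  organization: embedReference(organization)
};

console.log(JSON.stringify(source, null, 2));
```

The trade-off is that the copy is denormalized: if an organization is renamed, every source document that embeds it has to be updated as well.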

User-centric data?
✴ Source document: contains collection of
user data
✴ User document: contains collection of
source data
✴ UserSource document
✴ Rating, Favorite, Note docs
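One way to combine the last option with the embedded `ratings` summary shown earlier is to keep each rating as its own document and cache the aggregate on the source. This is only a sketch: the rating document shape (`user_id`, `source_id`, `value`) and the `summarizeRatings` name are assumptions, not taken from the National Data Catalog code.

```javascript
// Hypothetical Rating documents, one per user per source.
const ratings = [
  { user_id: "u1", source_id: "s1", value: 3 },
  { user_id: "u2", source_id: "s1", value: 4 }
];

// Build the summary that would be embedded in the source document
// under "ratings" (average_rating, rating_count).
function summarizeRatings(ratingDocs) {
  const count = ratingDocs.length;
  if (count === 0) {
    return { average_rating: null, rating_count: 0 };
  }
  const total = ratingDocs.reduce((sum, r) => sum + r.value, 0);
  return { average_rating: total / count, rating_count: count };
}

console.log(summarizeRatings(ratings));
```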

Freedom of choice

Real-Time Congress API
(Drumbone)
Credit: vgm8383 on Flickr

Requirements
✴ Aggregate lots of data
Biographical, Bills, Votes, Earmarks,
Video Clips, Floor Updates, Legislative
Documents, Committee Schedules,
Contributions, Interest Group Ratings
✴ Lightweight responses

{legislator: {
in_office: true,
title: "Rep",
nickname: "",
district: "9",
bioguide_id: "L000551",
govtrack_id: "400237",
phone: "202-225-2661",
website: "http://lee.house.gov/index.html",
twitter_id: "",
last_name: "Lee",
name_suffix: "",
last_updated: "2010/04/13 00:00:14 +0000",
party: "D",
chamber: "house",
state: "CA",
youtube_url: "http://www.youtube.com/RepLee",
first_name: "Barbara",
gender: "F",
congress_office: "2444 Rayburn House Office Building",
earmarks: {
average_number: 20,
total_amount: 10000000,
average_amount: 22994535,
total_number: 28,
last_updated: "2010-03-18",
fiscal_year: 2010
},
...
}}

// limit selection to a subset of fields
db.people.find( { 'first_name' : 'john' },
{ 'last_name' : 1,
'address' : 1 } );
// use dot-notation to dig into an object
db.people.find( { 'state': 'CA' },
{ 'address.zip_code': 1 } );

{legislator: {
last_name: "Lee",
state: "CA",
earmarks: {
average_number: 20,
average_amount: 22994535,
total_number: 28,
last_updated: "2010-03-18",
fiscal_year: 2010
}
}}
?sections=last_name,first_name,state,earmarks

{legislator: {
last_name: "Lee",
state: "CA",
earmarks: {
total_number: 28
}
}}
?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number
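The `sections` parameter maps naturally onto the field-selection argument of `find()` shown earlier. A minimal sketch of that mapping, assuming a hypothetical `sectionsToProjection` helper; the actual Real-Time Congress implementation may differ:

```javascript
// Turn "last_name,state,earmarks.total_number" into a MongoDB
// projection object such as { last_name: 1, state: 1, ... }.
function sectionsToProjection(sections) {
  const projection = {};
  for (const name of sections.split(",")) {
    const field = name.trim();
    if (field) {
      projection[field] = 1; // 1 means "include this field"
    }
  }
  return projection;
}

const projection = sectionsToProjection(
  "last_name,first_name,state,earmarks.total_amount,earmarks.total_number"
);
console.log(projection);

// In the mongo shell this could then be passed as the second argument
// (the collection name `legislators` is an assumption):
//   db.legislators.find({ bioguide_id: "L000551" }, projection);
```

Dot-notation keys pass through unchanged, so nested selections such as `earmarks.total_number` need no special handling. A production version would likely also check the requested names against a list of allowed fields.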

Partial responses
make payloads
smaller
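To make the size difference concrete, the sketch below applies a dot-notation field selection to a plain object and compares the serialized lengths. `pickFields` is a hypothetical stand-in for what the database does server-side, and the document is a trimmed copy of the legislator record above.

```javascript
// Copy only the requested paths (dot-notation allowed) into a new object.
function pickFields(doc, paths) {
  const out = {};
  for (const path of paths) {
    const keys = path.split(".");
    let src = doc;
    let dst = out;
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      if (src === null || typeof src !== "object" || !(key in src)) break;
      if (i === keys.length - 1) {
        dst[key] = src[key];               // leaf: copy the value
      } else {
        if (!(key in dst)) dst[key] = {};  // create the nested container
        dst = dst[key];
        src = src[key];
      }
    }
  }
  return out;
}

const legislator = {
  last_name: "Lee",
  first_name: "Barbara",
  state: "CA",
  phone: "202-225-2661",
  congress_office: "2444 Rayburn House Office Building",
  earmarks: { total_number: 28, total_amount: 10000000, fiscal_year: 2010 }
};

const partial = pickFields(legislator, ["last_name", "earmarks.total_number"]);
console.log(JSON.stringify(legislator).length, "characters in the full document");
console.log(JSON.stringify(partial).length, "characters in the partial response");
```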

50 States =
50 Formats

Schemalessness
allows for
losslessness
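A small illustration of the idea, using invented records: each state publishes different fields, and a schemaless document can keep all of them instead of dropping whatever does not fit a fixed set of columns. The field names, values, and the `toDocument` helper are hypothetical.

```javascript
// Hypothetical scraped records; field names and values are invented.
const scrapedBills = [
  { state: "CA", record: { bill_id: "AB 1", title: "Example bill", urgency_clause: true } },
  { state: "TX", record: { bill_id: "HB 1", title: "Example bill", companion_bill: "SB 1" } }
];

// Keep every scraped field and add the state; nothing is discarded
// because a column for it does not exist.
function toDocument(state, record) {
  return { ...record, state };
}

const documents = scrapedBills.map(({ state, record }) => toDocument(state, record));
console.log(documents);
```

Both documents could live in the same collection even though only one has `urgency_clause` and only the other has `companion_bill`.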

Source → Scraped JSON → Python Transform → PostgreSQL

Thanks!
sunlightlabs.com
@LuigiMontanez
