1 Year Re-cap
● Open Sourced It (for real)
● Improved the API (xml/json)
● Decreased Load Times
● Restructured the Back-end
● Basic Documentation
● Wrapped into a build system

The next year
● In general
  ● Data Quality and Documentation
  ● Usage Tracking and Statistics
  ● User Interface Improvements
  ● Further separation of the Platform and Service
● Right now
  ● Data Quality, Data Quality, Data Quality
  ● And a little bit of documentation

The Senate has Legislative Data Quality issues?

Well, not exactly
● Legislative Research Service (LRS) has the data
  ● Big, ancient mainframe to boot
● They FTP us updates every 5 minutes
  ● In SOBI formats (what?)
  ● With some XML mixed in
● We parse it back into XML/JSON/SQL structure

Reasons for Difficulty
● Poorly Documented SOBI behavior
● Formatted as a change log (sometimes)
  ● Finding sources of error can be hard
● LRS is not co-operative

Solutions
● Version Control
  ● Write objects to JSON/XML files
  ● With Git, commit each new version
    – Commit message points to the source SOBI
  ● Use git to trace data errors back to SOBI files
● Unit Test known corner cases
● Periodically do a scrape check?

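The version-control idea above can be sketched roughly as follows. This is a minimal illustration, not the project's actual parser code: the function names, the `bills/` directory layout, the `Source-SOBI:` message trailer, and the example file name are all assumptions made for the sketch. It writes one parsed object to a JSON file with stable key order (so diffs stay small), then commits it with a message naming the SOBI file that produced the change.

```python
import json
import subprocess
from pathlib import Path


def write_object(repo: Path, kind: str, obj_id: str, data: dict) -> Path:
    """Write one parsed object to <repo>/<kind>/<obj_id>.json (hypothetical layout)."""
    path = repo / kind / f"{obj_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the output stable, so git diffs show only real changes.
    text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def commit_message(obj_id: str, sobi_file: str) -> str:
    """Build a commit message that points back to the source SOBI file."""
    return f"Update {obj_id}\n\nSource-SOBI: {sobi_file}"


def commit_object(repo: Path, path: Path, obj_id: str, sobi_file: str,
                  run=subprocess.run) -> bool:
    """Stage and commit one object file; return False if nothing changed."""
    rel = path.relative_to(repo).as_posix()
    run(["git", "add", "--", rel], cwd=repo, check=True)
    # `git diff --cached --quiet` exits 0 when the staged file has no changes.
    staged = run(["git", "diff", "--cached", "--quiet", "--", rel], cwd=repo)
    if staged.returncode == 0:
        return False
    run(["git", "commit", "-m", commit_message(obj_id, sobi_file)],
        cwd=repo, check=True)
    return True
```

With files committed this way, `git log -- bills/<id>.json` lists every version of an object along with the SOBI file named in each commit message, which is the trace-back step described in the slide. The `run` parameter exists only so the git calls can be swapped out in tests.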
Progress
✔ Parsing has been overhauled
✔ Objects are written to file
✔ Bugs have been found and fixed
✔ Periodic Scrapes are approved

A short task list
✗ Integrate git into the parsing system
✗ Document expected behavior
✗ Write a small test suite
✗ Try to avoid having to scrape

HFOSS Symposium 2011
● Bryan Sivak – Civic Commons
● Mark Prutsalis – Sahana Foundation
● Many universities, Mozilla, Google
● David, Moorthy, Brian, and myself!
● 1 hour and a few 3 x 4 posters