Loose-Schema Databases and Heterogenous Data

Traditional databases are designed around homogenous data. Heterogenous data can be shoehorned into a homogenous format, but constraints are hard to enforce. A new generation of databases is emerging that embraces data in varying formats from varying sources. Access may be through familiar relational operations or through radically different operations such as Map-Reduce. In this talk Doug will introduce us to what is happening in the post-SQL database space.

BIO: Doug Reeder has taught robotics and programmed for researchers at OSU. Currently he writes task-list management software in JavaScript for HP webOS smartphones and tablets.

Published in: Technology
Loose-Schema Databases and Heterogenous Data

  Loose-Schema Databases and Heterogenous Data
    • Doug Reeder
    • Hominid Software (µ-ISV)
    • Task-list management for HP webOS

  Heterogenous Data
    • Different communities conceptualize similar things differently
        • Music recording: artist vs. composer+performer+conductor
        • Users expect computers to adapt to them
    • Single-source data can be heterogenous
        • Medical tests: any given patient has only had a few of many tests

  Heterogenous Data, Homogenous Schema
    • converting to homogenous format -> loss or alteration of data
        • Contact: title+given+middle+family+suffix -> given+family
        • Acceptable for 1-way transfer
    • round-trip data to external sources
        • alteration generates a spurious change notification
        • losing data is unacceptable

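The round-trip problem on this slide can be made concrete with a minimal sketch. The contact values and the helper names `to_homogeneous` and `changed_fields` are hypothetical, chosen only for illustration; the five name parts are the ones listed on the slide.

```python
# Hypothetical source contact using the five name parts from the slide.
SOURCE = {
    "title": "Dr.",
    "given": "Maria",
    "middle": "J.",
    "family": "Lopez",
    "suffix": "Jr.",
}

def to_homogeneous(contact):
    """Squeeze a rich contact into a given+family schema (lossy)."""
    return {"given": contact["given"], "family": contact["family"]}

def changed_fields(original, returned):
    """Fields the external source would see as modified after a round trip."""
    return sorted(k for k in original if original[k] != returned.get(k))

stored = to_homogeneous(SOURCE)
round_tripped = dict(stored)  # what gets sent back to the external source
spurious = changed_fields(SOURCE, round_tripped)
print(spurious)  # ['middle', 'suffix', 'title']
```

Nothing was edited by the user, yet three fields would be reported as changed (and lost) when the record is written back.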
  Heterogenous Data, Manual Union Schema
    • Union of all fields in any source schema
    • schema must be updated before data with new source schema can be stored
    • storage may be inefficient
    • works fine for music meta-data, works poorly for medical tests

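A small sketch of why a union schema can be wasteful for sparse data such as medical tests. The patient records and test names below are made up for illustration, and `union_schema` and `to_table` are hypothetical helpers, not part of any database.

```python
# Hypothetical lab results: each patient has had only a few of many tests.
records = [
    {"patient": "A", "glucose": 5.4},
    {"patient": "B", "hemoglobin": 13.9, "ferritin": 80},
    {"patient": "C", "tsh": 2.1},
]

def union_schema(rows):
    """Collect every field seen in any source record, in first-seen order."""
    columns = []
    for row in rows:
        for field in row:
            if field not in columns:
                columns.append(field)
    return columns

def to_table(rows, columns):
    """Pad each record to the full union schema; missing values become None."""
    return [[row.get(col) for col in columns] for row in rows]

columns = union_schema(records)
table = to_table(records, columns)
cells = len(columns) * len(table)
empty = sum(value is None for line in table for value in line)
print(columns)       # ['patient', 'glucose', 'hemoglobin', 'ferritin', 'tsh']
print(empty, cells)  # 8 15
```

With only three patients, more than half of the cells are already empty, and any new test type would require a schema change before its results could be stored.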
  Loose-Schema DB
    • Need not specify fields & types in advance
    • Array & object fields allowed
        • If value is atomic for the purposes of the database, it doesn't violate 1NF
    • Still normalize and selectively denormalize
    • May still enforce field constraints
    • Can't enforce foreign-key relations (no JOINs)

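To illustrate how field constraints can coexist with a loose schema, here is a minimal sketch using plain dictionaries as documents. The field names (`title`, `tags`, `due`) and the `validate` function are assumptions made for this example, not part of any particular database.

```python
def validate(doc):
    """Return a list of constraint violations; an empty list means accepted."""
    errors = []
    title = doc.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title must be a non-empty string")
    if "tags" in doc:
        tags = doc["tags"]
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            errors.append("tags must be a list of strings")
    if "due" in doc and not isinstance(doc["due"], dict):
        errors.append("due must be an object")
    return errors

# Documents with different shapes; only "title" is required.
rich = {"title": "Buy milk", "tags": ["errand", "home"],
        "due": {"date": "2011-06-01", "tz": "US/Eastern"}}
minimal = {"title": "Call Bob"}
bad = {"title": "", "tags": ["x", 3]}

print(validate(rich))     # []
print(validate(minimal))  # []
print(validate(bad))
```

The array and object fields are stored as single values, and the validator constrains only the fields that happen to be present.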
  CouchDB
    • stores JSON "documents"
    • RESTful HTTP API + client libs
    • crash-only file is always consistent
    • lockless & optimistic
        • if revision number doesn't match, update fails.
        • Resolution similar to Subversion: client merge, re-submit

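The lockless, optimistic update cycle can be sketched with a toy in-memory store. This is not the CouchDB API: `TinyStore`, `ConflictError` and `update_with_retry` are hypothetical names, and revisions are plain integers here for simplicity.

```python
class ConflictError(Exception):
    """Raised when the caller's revision is stale."""

class TinyStore:
    """Toy in-memory store that mimics revision checking; not CouchDB itself."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev

def update_with_retry(store, doc_id, change, attempts=3):
    """Fetch the latest revision, merge the change client-side, re-submit."""
    for _ in range(attempts):
        rev, body = store.get(doc_id)
        body.update(change)
        try:
            return store.put(doc_id, body, rev)
        except ConflictError:
            continue
    raise ConflictError(f"{doc_id}: gave up after {attempts} attempts")

store = TinyStore()
store.put("task1", {"title": "Buy milk", "done": False})

# Two clients read the same revision, then both try to write.
rev_a, doc_a = store.get("task1")
rev_b, doc_b = store.get("task1")
doc_a["done"] = True
store.put("task1", doc_a, rev_a)          # first writer wins

doc_b["title"] = "Buy oat milk"
try:
    store.put("task1", doc_b, rev_b)      # stale revision
    conflict = False
except ConflictError:
    conflict = True

final_rev = update_with_retry(store, "task1", {"title": "Buy oat milk"})
```

No locks are taken at any point. The second writer simply learns that its revision is stale, merges against the current document, and submits again.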
  CouchDB Queries
    • only Map-Reduce on pre-defined views
    • 1st query against a view is slow; later only require incremental updating
    • map func (JS or Erlang) maps document to N-dimensional array of buckets
    • reduce func (optional) computes sums, averages, unions, etc over one or more dimensions

  Data From Two Task Systems

  Map-Reduce Completed Tasks
    • Map: for each ancestor path, write [# completed, 1]
    • Reduce: sum 1st and 2nd member of pair

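The completed-tasks view can be sketched as follows. The original slide with the data from the two task systems is not reproduced in the transcript, so the two document shapes below (`path`/`done` versus `folder`/`status`) are invented stand-ins, and the Python functions only imitate what a CouchDB map and reduce function would do.

```python
from collections import defaultdict

# Hypothetical documents from two task systems with different field names.
tasks = [
    {"path": ["Home", "Kitchen"], "title": "Buy milk", "done": True},
    {"path": ["Home", "Kitchen"], "title": "Fix tap", "done": False},
    {"folder": "Home/Garden", "name": "Mow lawn", "status": "completed"},
    {"folder": "Work", "name": "File report", "status": "needsAction"},
]

def map_task(doc):
    """For each ancestor path of the task, emit the pair (# completed, 1)."""
    path = doc["path"] if "path" in doc else doc["folder"].split("/")
    completed = doc.get("done", doc.get("status") == "completed")
    for depth in range(1, len(path) + 1):
        yield tuple(path[:depth]), (1 if completed else 0, 1)

def reduce_pairs(pairs):
    """Sum the 1st and 2nd member of each pair separately."""
    return (sum(p[0] for p in pairs), sum(p[1] for p in pairs))

def run_view(documents):
    buckets = defaultdict(list)
    for doc in documents:
        for key, value in map_task(doc):
            buckets[key].append(value)
    return {key: reduce_pairs(values) for key, values in buckets.items()}

progress = run_view(tasks)
for key, (completed, total) in sorted(progress.items()):
    print("/".join(key), f"{completed}/{total}")
```

Each ancestor ends up with a (completed, total) pair, so progress for any folder can be read directly without walking the hierarchy, and the map function absorbs the differences between the two source formats.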
  Conclusion
    • Heterogenous data may be more naturally processed using a non-relational model
    • couchdb.apache.org
    • Questions?
