Non-Relational Databases: This hurts. I like it.

    Non-Relational Databases: This hurts. I like it. - Presentation Transcript

    1. Non-Relational Databases: This hurts. I like it. Christopher Groskopf / bouvard / @onyxfish
    2. Outline
      • First!
        • A Hypothetical
      • Second!
        • Platforms
      • Third!
        • Voter's Daily and CouchDB
    3. First! A Hypothetical
    4. I want to query space.
    5. The Kepler Mission
      • NASA's search for extra-solar planets
      • 100,000 stars
      • 3.5 years of constant observation
      • Sensitive measurements
      • How would you store this data so that your researchers can analyze it effectively?
      • (Hint: It is probably not sqlite on a thumb drive.)
    6. The Relational Model
    7. Pros and Cons
      • Pros
        • SQL lets you query all the data at once
        • Enforces data integrity
        • Minimizes repetition
        • Proven
        • Familiar
          • To your DBA
          • To your users
      • Cons
        • Rigidly schematic
        • Joins rapidly become a bottleneck
        • Difficult to scale up
        • Gets in the way of parallelization
        • Optimization may undercut the benefits of normalization
    8. The Non-Relational Model
    9. Pros and Cons
      • Pros
        • Schema-less
        • Master ↔ Master replication
        • Scales well
        • Map/Reduce means everything runs in parallel
        • Built for the web
      • Cons
        • No SQL
        • Integrity-enforcement migrates to code
        • Limited ORM tooling
        • Significant learning curve
        • Proven only in a subset of cases
    10. Second! Platforms
    11. Traits of NRDBs
      • Usually they are a key/value datastore
      • Often, they offer Master ↔ Master replication
      • In most cases they store schema-less data
      • Typically they scale by “automatic” sharding
      • Sometimes they offer “eventual consistency”
      • For the most part they are fast
      • Generally they are targeted at web applications
      • Frequently we can't define what they are
    12. Used Memcache?
      • Memcache is a high-availability key/value store
      • Imagine if Memcache was your database
      • That is more or less what an NRDB is
      • Except that everything is permanently “cached” to disk
      • And only the most common result sets are held in RAM (it could be all of them)
      • In most cases this is faster than computing fresh results based on indices (that is, SQL)
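The Memcache analogy above can be sketched in a few lines. This is only an illustration under stated assumptions: `TinyStore`, its `disk` map (a stand-in for persistent storage), and the `hotLimit` parameter are hypothetical names, and no real NRDB is claimed to work this way internally.

```javascript
// Hypothetical sketch: `disk` stands in for persistent storage and `hot`
// for the limited set of results kept in RAM. Neither models a real NRDB.
class TinyStore {
  constructor(hotLimit) {
    this.disk = new Map(); // every value is always written here
    this.hot = new Map();  // small cache; a Map remembers insertion order
    this.hotLimit = hotLimit;
    this.diskReads = 0;    // counts how often a read had to go to "disk"
  }

  put(key, value) {
    this.disk.set(key, value);
    this.hot.delete(key); // drop any stale cached copy
  }

  get(key) {
    if (this.hot.has(key)) {
      // Cache hit: re-insert the key so it counts as most recently used.
      const cached = this.hot.get(key);
      this.hot.delete(key);
      this.hot.set(key, cached);
      return cached;
    }
    if (!this.disk.has(key)) return undefined;
    this.diskReads += 1;
    const value = this.disk.get(key);
    this.hot.set(key, value);
    if (this.hot.size > this.hotLimit) {
      // Evict the least recently used key (the first one in the Map).
      this.hot.delete(this.hot.keys().next().value);
    }
    return value;
  }
}

const store = new TinyStore(2);
store.put("events:2006-12", 1);
store.get("events:2006-12"); // first read goes to "disk"
store.get("events:2006-12"); // second read is served from RAM
console.log(store.diskReads);
```

The point of the sketch is the read path: a repeated lookup of a stored result never recomputes anything, which is the contrast the slide draws with building results from indices on every query.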
    13. Top 10 NRDBs...
      • Azure Table Storage
      • Berkeley DB
      • BigTable
      • Cassandra
      • CouchDB
      • HyperTable
      • MongoDB
      • Project Voldemort
      • SimpleDB
      • Tokyo Cabinet
    14. ...and their backers
      • Azure Table Storage -> Microsoft (2, 6)
      • Berkeley DB -> Oracle (3)
      • BigTable -> Google (4, 1)
      • Cassandra -> Facebook (2)
      • CouchDB -> IBM (1)
      • HyperTable -> Baidu (9)
      • MongoDB -> SourceForge (182)
      • Project Voldemort -> LinkedIn (65)
      • SimpleDB -> Amazon (29)
      • Tokyo Cabinet -> Mixi (88)
      Blue: Largest software companies according to Forbes (2009)
      Red: Highest traffic websites according to Alexa (as of 9/17)
    15. This is not a fad.
    16. Primary Use-cases
      • Ridiculous scale
      • Unstructured data
      • Massive datasets (broad > deep)
      • Fuzzy and/or fault tolerant data
      • Versioned data
      • Logging
      • When eventual consistency is good enough
    17. If you are storing a JSON or XML string in your SQL database: I Have Your Medicine
    18. Mis-use Cases
      • SQL is a prerequisite
      • Deeply hierarchical datasets
      • Data integrity that must be enforced by a DBA
      • High security applications where the database must enforce that security (LAN/WAN facing)
      • Transactional data (banking, analytics, etc.)
      • Usage is highly unpredictable, combinatorial, or likely to change suddenly
    19. Third! Voter's Daily and CouchDB
    20. My Requirements
      • Loss-less data structures (non-uniform data)
      • Loosely coupled dependency
        • Portable
        • RESTful
      • Scalable without refactoring
      • Understood by the Gov2.0 community
      • Reusable / Educational / Transparent
    21. CouchDB
      • Schema-less
      • “Speaks” JSON
      • “Thinks” JavaScript (optionally, Python)
      • RESTful API
      • Pre-collates Views (on insert) for fast reads
      • Supports Master ↔ Master replication
      • “Futon” management interface
      • Written in Erlang
    22. An Example JSON Document
      {
        "_id": "2006-12-06T00:00:00Z - C-SPAN House Ways and Means Committee Schedule Scraper",
        "_rev": "1-2ca577e0a4a25ad2704fdf5a20161f9f",
        "datetime": "2006-12-06T00:00:00Z",
        "end_datetime": null,
        "title": "Hearing on Patient Safety and Quality Issues in End Stage Renal Disease Treatment",
        "description": null,
        "branch": "Legislative",
        "entity": "House of Representatives",
        "source_url": "http://www3.capwiz.com/c-span/dbq/officials/schedule.dbq?committee=hways&command=committee_schedules&chambername=House&chamber=H&period=",
        "source_text": "<span class=\"cwnormalbold\">DECEMBER 06, 2006<br /></span> \u000a\u0009<span class=\"cwnormal\">Hearing on Patient Safety and Quality Issues in End Stage Renal Disease Treatment<br /></span>",
        "access_datetime": "2009-09-28T04:19:02Z",
        "parser_name": "C-SPAN House Ways and Means Committee Schedule Scraper",
        "parser_version": "0.1"
      }
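Because CouchDB is RESTful and speaks JSON, storing a document like this one comes down to an HTTP PUT to the database path followed by the document id. The sketch below only builds such a request and does not send it. The helper name `buildPutRequest`, the `votersdaily` database name, the shortened `_id`, and the local host URL are assumptions for illustration, not details from the talk.

```javascript
// Hypothetical helper: builds (but does not send) the HTTP request that
// would store a document under its _id. Host and database are assumptions.
function buildPutRequest(baseUrl, database, doc) {
  return {
    method: "PUT",
    // The _id contains spaces and colons, so it has to be URL-encoded.
    url: baseUrl + "/" + database + "/" + encodeURIComponent(doc._id),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

const request = buildPutRequest("http://localhost:5984", "votersdaily", {
  _id: "2006-12-06T00:00:00Z - Example Scraper", // made-up, shortened id
  branch: "Legislative",
  entity: "House of Representatives",
});

console.log(request.method, request.url);
```

Since the schema lives in the document itself, adding a new field later needs no migration: the next PUT simply carries the extra key.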
    23. Views with Map/Reduce I
        All events scraped for the Supreme Court
        Map:
          function (doc) {
            if (doc.entity == "Supreme Court") {
              emit(doc.datetime, doc);
            }
          }
        Reduce: None
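To see what this map function produces without a running CouchDB, it can be driven by a small harness in plain Node. This is a sketch under assumptions: `runMap` is a hypothetical helper, the sample documents are made up, and `emit` is passed in as a parameter here, whereas CouchDB provides it as a global inside its view server.

```javascript
// Hypothetical harness: CouchDB supplies emit() as a global inside its view
// server; here it is passed in as a parameter so the sketch runs in plain Node.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // View rows come back ordered by key; plain string comparison is a
  // simplification of CouchDB's real collation rules.
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Made-up documents that reuse field names from the example document.
const sampleDocs = [
  { _id: "a", entity: "Supreme Court", datetime: "2009-10-05T00:00:00Z" },
  { _id: "b", entity: "House of Representatives", datetime: "2006-12-06T00:00:00Z" },
  { _id: "c", entity: "Supreme Court", datetime: "2009-03-02T00:00:00Z" },
];

const supremeCourtRows = runMap((doc, emit) => {
  if (doc.entity == "Supreme Court") {
    emit(doc.datetime, doc);
  }
}, sampleDocs);

// Only the two Supreme Court events remain, ordered by their datetime key.
console.log(supremeCourtRows.map((row) => row.id));
```

Keying the rows by `datetime` is what makes date-range queries cheap: the view is already sorted by the field you want to slice on.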
    24. Views with Map/Reduce II
        All events counted by event month
        Map:
          function (doc) {
            var month = doc.datetime.substr(0, 7);
            emit(month, 1);
          }
        Reduce:
          function (key, values) {
            return sum(values);
          }
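The map and reduce pair for counting events by month can be emulated end to end in plain Node. Treat this as a simplified sketch: `runView` is a hypothetical helper, `sum` is defined locally as a stand-in for the one CouchDB provides, and the reduce call is simplified to a single key with its values, whereas CouchDB passes a list of key/id pairs and may re-reduce partial results.

```javascript
// Hypothetical stand-ins for what CouchDB's view server provides: emit() is
// passed in as a parameter and sum() is defined locally.
function sum(values) {
  return values.reduce((total, value) => total + value, 0);
}

function runView(mapFn, reduceFn, docs) {
  // Map phase: collect every emitted value under its key.
  const groups = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: one call per distinct key, roughly what a grouped query does.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(key, values);
  }
  return result;
}

// Made-up documents; only the datetime field matters for this view.
const eventDocs = [
  { datetime: "2009-10-05T00:00:00Z" },
  { datetime: "2006-12-06T00:00:00Z" },
  { datetime: "2009-10-19T00:00:00Z" },
];

const countsByMonth = runView(
  (doc, emit) => {
    const month = doc.datetime.substr(0, 7); // "YYYY-MM"
    emit(month, 1);
  },
  (key, values) => sum(values),
  eventDocs
);

console.log(countsByMonth);
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across machines, which is the parallelism the earlier slides refer to.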
    25. Lessons
      • Unlearning normalization is very difficult
      • Harnessing “high availability” requires a large up-front investment of development time
      • Map/Reduce and SQL shouldn't even be used in the same sentence (GQL is a stupid name)
      • Schema-less data is fantastic
      • Integrity checking in code is not so bad (that is what abstraction is for)
      • Doing Joins in code is actually very liberating
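The last two lessons can be made concrete with a small sketch of an integrity check and a join done in application code. Everything here is illustrative: `validateEvent`, `joinByKey`, the validation rules, and the parser documents with their `maintainer` field are hypothetical, and only the `title`, `datetime`, and `parser_name` field names come from the example document.

```javascript
// Integrity check in code: returns a list of problems instead of relying on
// database constraints. The specific rules are assumptions for this sketch.
function validateEvent(doc) {
  const problems = [];
  if (typeof doc.title !== "string" || doc.title.length === 0) {
    problems.push("title is required");
  }
  if (Number.isNaN(Date.parse(doc.datetime))) {
    problems.push("datetime must be a parseable date");
  }
  return problems;
}

// Join in code: attach the matching right-hand document to each left-hand one.
function joinByKey(leftDocs, rightDocs, leftField, rightField) {
  // Index the right-hand side once so each lookup is a single Map access.
  const index = new Map(rightDocs.map((doc) => [doc[rightField], doc]));
  return leftDocs.map((doc) => ({
    ...doc,
    joined: index.get(doc[leftField]) || null, // null when nothing matches
  }));
}

// Made-up data for illustration only.
const events = [
  { title: "Hearing A", datetime: "2006-12-06T00:00:00Z", parser_name: "House Scraper" },
  { title: "Hearing B", datetime: "2009-09-28T04:19:02Z", parser_name: "Unknown Scraper" },
];
const parsers = [{ name: "House Scraper", maintainer: "example-maintainer" }];

const joinedEvents = joinByKey(events, parsers, "parser_name", "name");
console.log(joinedEvents.map((event) => event.joined && event.joined.maintainer));
console.log(validateEvent({ title: "", datetime: "not a date" }));
```

Doing the join in code means the application decides what a missing match means (here, `null`), rather than the choice being fixed by an inner or outer join in the query.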
    26. Conclusions
      • You (probably) do not need an NRDB
      • But you ought to learn one anyway
      • It's not just for Twitter and bleeding edge startups
      • Amazon, Facebook, Google, IBM, and Microsoft all get this
      • Sometimes it is simply the right tool for the job
    27. Links
      • CouchDB: http://couchdb.apache.org/
      • CouchDB & Map/Reduce Emulator: http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html
      • NASA's Kepler Mission: http://kepler.nasa.gov/
      • ReadWriteWeb on NRDBs: http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php
      • Voter's Daily: http://github.com/bouvard/votersdaily

    Delivered at San Luis Obispo .NET Users Group on Oc...

    CC Attribution License
