Non-Relational Databases and World Domination

Apparently NoSQL is all the rage these days, but what does it really mean, and what technologies are out there? When should you use a non-relational database? How do you decide which one to use to achieve world domination? How do I use CouchDB with Ruby on Rails?

Transcript

  • 1. Non-Relational Databases and World Domination Jason Davies Thursday, 3 December 2009
  • 2. Overview • Relational vs. Non-Relational • Why Switch? • Non-Relational Solutions • Document Databases • Key Value Stores • CouchDB Features
  • 3. Relational Databases
  • 4. Relational Databases • Relational algebra: union, intersection, difference, Cartesian product • Easy to perform dynamic queries • Fixed Schemas • Normalisation
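As a rough illustration of the set operations listed above, Ruby's array operators behave like relational union, intersection, difference and Cartesian product when each relation is an array of tuples. The `staff`, `managers` and `offices` relations are made-up examples, and this ignores the selection, projection and join operators a real relational engine also provides.

```ruby
# Relations modelled as arrays of tuples; the data is made up.
staff    = [["alice", "eng"], ["bob", "ops"], ["carol", "eng"]]
managers = [["carol", "eng"], ["dave", "sales"]]

union        = staff | managers  # tuples in either relation, no duplicates
intersection = staff & managers  # tuples present in both relations
difference   = staff - managers  # tuples in staff but not in managers

# Cartesian product: every staff tuple paired with every office tuple.
offices = [["london"], ["paris"]]
product = staff.product(offices).map { |person, office| person + office }

p union.size     # 4
p intersection   # [["carol", "eng"]]
p difference     # [["alice", "eng"], ["bob", "ops"]]
p product.first  # ["alice", "eng", "london"]
```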
  • 5. Non-Relational Databases • Everything else! • A myriad of features, including: • Key-Value stores with external indexers • Schemaless • RESTful APIs
  • 6. CAP Theorem • Three requirements for applications in a distributed environment: • Consistency • Availability • Partition tolerance • Pick two
  • 7. Why Switch? • Data structure • Scalability • The New Cool
  • 8. Data Structure Symptoms
  • 9. Sparse Data • Tables with many columns, only a few of which are used by any particular row
  • 10. Attribute Tables • Each row is (fkey, att_name, att_value)
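To see why attribute tables hurt, here is a minimal Ruby sketch, with hypothetical rows, that rebuilds entities from `(fkey, att_name, att_value)` rows. Every read needs a grouping step, and because `att_value` is a single column, every value comes back as a string; a document database would store the same entity as one record with native types.

```ruby
# Hypothetical attribute-table rows: (fkey, att_name, att_value).
rows = [
  [1, "name",   "Darth Vader"],
  [1, "age",    "63"],
  [2, "name",   "Yoda"],
  [1, "helmet", "true"],
]

# Rebuild one hash per entity by grouping rows on the foreign key.
# Values are all strings because att_value is a single column.
entities = rows.group_by(&:first).transform_values do |attrs|
  attrs.to_h { |_fkey, name, value| [name, value] }
end

# The same entity as a document keeps its shape and native types.
vader_doc = { "name" => "Darth Vader", "age" => 63, "helmet" => true }

p entities[1]
p vader_doc
```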
  • 11. Data Dumps • Given up on using columns for structured data • Instead simply serialising it (JSON, YAML, XML, etc.) and dumping the strings into the database
  • 12. Too Many Joins • Schemas involving large numbers of many-to-many join tables or tree-like structures
  • 13. Frequent Schema Changes • May be fine for small databases • Can be tedious • Rebuilding indexes is slow for millions of rows
  • 14. Scalability
  • 17. Write Capacity • If read capacity is the problem, then set up master-slave replication • Writes still all go to the master, so write capacity is much harder to scale
  • 18. Too Much Data • Too much for one server to hold • Hard to shard the data sensibly
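One reason sharding is hard to do sensibly: the obvious scheme, hashing the key modulo the number of servers, moves most keys whenever a server is added. The sketch below assumes CRC32 as the hash and made-up `user:N` keys; it only counts how many keys would have to migrate when going from four servers to five.

```ruby
require "zlib"

# Naive sharding: pick a server by hashing the key modulo the server count.
# Zlib.crc32 is used because it is stable across Ruby processes
# (String#hash is randomised per process).
def shard_for(key, server_count)
  Zlib.crc32(key) % server_count
end

keys = (1..1000).map { |i| "user:#{i}" }

before = keys.to_h { |k| [k, shard_for(k, 4)] }
after  = keys.to_h { |k| [k, shard_for(k, 5)] }

moved = keys.count { |k| before[k] != after[k] }
puts "#{moved} of #{keys.size} keys change server going from 4 to 5 servers"
```

With this scheme roughly four out of five keys change server, since a key only stays put when its hash leaves the same remainder modulo 4 and modulo 5. Dynamo-style stores avoid this with consistent hashing.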
  • 19. Non-Relational Solutions
  • 20. Diverse Ecosystem • Column-oriented databases • Document-oriented databases • Key-value stores • Graph-oriented databases • Distributed databases • MapReduce
  • 21. BigTable • “a sparse, distributed, persistent multi-dimensional sorted map” • Designed to scale into the petabyte range • HBase (Java, Hadoop) • Hypertable • Cassandra (Facebook, based on Amazon’s Dynamo)
  • 22. Document Databases • Arbitrary number of “sparse” attributes per document • Documents often map well to JSON, e.g. in CouchDB • Cons: usually can’t perform joins or transactions spanning multiple documents
  • 23. Graph Databases • Good for highly interconnected data • Focus on the relationships between items • Optimised for querying transitive relationships, i.e. variable-length chains of joins • Neo4j, AllegroGraph, Sesame
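The “variable-length chains of joins” point is easiest to see as a traversal. This hypothetical Ruby sketch runs a breadth-first search over an in-memory adjacency list (the `KNOWS` data is invented) and returns everyone reachable from a start node with their distance in hops; a relational schema would need one self-join per hop, or a recursive query. It illustrates the kind of query graph databases optimise, not how Neo4j or the others implement it.

```ruby
# Hypothetical "knows" graph as an adjacency list.
KNOWS = {
  "alice" => ["bob"],
  "bob"   => ["carol", "dave"],
  "carol" => ["erin"],
  "dave"  => [],
  "erin"  => ["alice"], # a cycle back to the start
}.freeze

# Breadth-first search: returns { person => hops } for everyone reachable
# from `start`, however long the chain of relationships is.
def reachable(graph, start)
  hops  = { start => 0 }
  queue = [start]
  until queue.empty?
    person = queue.shift
    graph.fetch(person, []).each do |friend|
      next if hops.key?(friend) # already visited; this also stops cycles
      hops[friend] = hops[person] + 1
      queue << friend
    end
  end
  hops.reject { |name, _| name == start }
end

p reachable(KNOWS, "alice")
```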
  • 24. Distributed K-V Stores • Giant hash table/dictionary • Mainly solve data scalability problems • Transparently partition and replicate data • Cons: • eventual consistency or other distributed transaction protocols • hard to do integrity constraints, hard to catch application bugs
  • 25. Distributed K-V Stores • Scalaris, Dynomite, Ringo: data consistency • MemcacheDB, Tokyo Cabinet: low latency
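The “transparently partition and replicate” idea can be sketched with a consistent-hash ring in the style of Amazon's Dynamo. This is a simplified model, not the algorithm of any store named above: node names, MD5 positions, 64 virtual points per node and two replicas are all assumptions. Each key is stored on the first distinct nodes found walking clockwise from its hash, and adding a node only takes over the keys that now land on its points.

```ruby
require "digest"

# Minimal consistent-hash ring; all parameters are illustrative.
class Ring
  def initialize(nodes, points_per_node: 64)
    @ring = nodes.flat_map do |node|
      (0...points_per_node).map { |i| [position("#{node}##{i}"), node] }
    end.sort_by(&:first)
  end

  # Walk clockwise from the key's position and collect the first
  # `replicas` distinct nodes: they hold the key and its copies.
  def nodes_for(key, replicas: 2)
    target = position(key)
    start = @ring.bsearch_index { |pos, _| pos >= target } || 0 # wrap around
    owners = []
    @ring.size.times do |offset|
      node = @ring[(start + offset) % @ring.size][1]
      owners << node unless owners.include?(node)
      break if owners.size == replicas
    end
    owners
  end

  private

  # First 8 hex digits of MD5, read as an integer, give a 32-bit position.
  def position(str)
    Digest::MD5.hexdigest(str)[0, 8].to_i(16)
  end
end

keys   = (1..1000).map { |i| "user:#{i}" }
before = Ring.new(%w[a b c d])
after  = Ring.new(%w[a b c d e])

moved = keys.count { |k| before.nodes_for(k, replicas: 1) != after.nodes_for(k, replicas: 1) }
puts "#{moved} of #{keys.size} primary owners change after adding node e"
p before.nodes_for("user:42")
```

Compared with modulo sharding, only the keys that the new node takes over move (roughly a fifth here), and replication falls out of the same walk around the ring.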
  • 26. Apache CouchDB
  • 27. CouchDB and Ruby
      require 'couchrest'
      # with !, it creates the database if it doesn't already exist
      @db = CouchRest.database!("http://127.0.0.1:5984/couchrest-test")
      response = @db.save_doc({ :key => 'value', 'another key' => 'another value' })
      doc = @db.get(response['id'])
      puts doc.inspect
  • 28. CouchDB and Ruby
      @db.bulk_save([
        { "wild" => "and random" },
        { "mild" => "yet local" },
        { "another" => ["set", "of", "keys"] }
      ])
      # returns ids and revs of the current docs
      puts @db.documents.inspect
  • 29. CouchDB and Ruby
      # design document whose "test" view emits every field not starting with "_"
      @db.save_doc({
        "_id" => "_design/first",
        :views => {
          :test => {
            :map => "function(doc){for(var w in doc) { if(!w.match(/^_/))emit(w,doc[w])}}"
          }
        }
      })
      puts @db.view('first/test')['rows'].inspect
  • 30. CouchDB and Ruby • Read more about CouchRest on GitHub • Also check out newcomer RubyAqua
  • 31. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 33. Documents (photo: http://www.flickr.com/photos/stilleben2001/223243329/)
  • 34. Schema-Free (JSON)
      {
        "_id": "BCCD12CBB",
        "_rev": "AB764C",
        "type": "person",
        "name": "Darth Vader",
        "age": 63,
        "headwear": ["Helmet", "Sombrero"],
        "dark_side": true
      }
  • 38. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 39. Document-Oriented, Not Relational • Documents in the Real World™ • Bills, letters, tax forms… • Same type != same structure • Can be out of date (so what?) • No references
  • 40. Document-Oriented, Not Relational • Documents in the Real World™ • Bills, letters, tax forms… • Same type != same structure • Can be out of date (so what?) • No references • Annotated: Natural Data; Behaviour
  • 41. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 42. Highly Concurrent
  • 45. Highly Concurrent • Functional languages (CouchDB is written in Erlang) are well suited to parallelism • Lightweight “processes” and message-passing; “shared-nothing” • Easy to create fault-tolerant systems
  • 46. MVCC • Multiversion Concurrency Control • Reads: lock-free; never block • Potential for massive horizontal scaling • Writes: all-or-nothing • Success • Fail: conflict error, fetch and try again
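The write rule on the MVCC slide (all-or-nothing, or a conflict error that you handle by fetching and trying again) can be sketched with a tiny in-memory store. `TinyStore`, `ConflictError` and `update_with_retry` are hypothetical names; the `N-suffix` revision format mirrors CouchDB's `_rev` strings, but the suffix here is random rather than derived from the document. Reads never take a lock; they just return a copy of the latest version.

```ruby
require "securerandom"

class ConflictError < StandardError; end

# Hypothetical store: an update must quote the revision it was based on.
class TinyStore
  def initialize
    @docs = {}
  end

  # Reads never block: they return a copy of the current version.
  def get(id)
    @docs.fetch(id).dup
  end

  # All-or-nothing: store the whole doc under a new revision, or change
  # nothing and raise ConflictError if the quoted revision is stale.
  def put(id, doc)
    current = @docs[id]
    if current && current["_rev"] != doc["_rev"]
      raise ConflictError, "document update conflict"
    end
    generation = current ? current["_rev"].split("-").first.to_i + 1 : 1
    @docs[id] = doc.merge("_rev" => "#{generation}-#{SecureRandom.hex(4)}")
    @docs[id]["_rev"]
  end
end

# Fetch, modify, save; on a conflict, fetch again and retry.
def update_with_retry(store, id, attempts: 3)
  attempts.times do
    doc = store.get(id)
    yield doc
    return store.put(id, doc)
  rescue ConflictError
    next
  end
  raise ConflictError, "gave up after #{attempts} attempts"
end

store = TinyStore.new
store.put("vader", { "name" => "Darth Vader" })
stale = store.get("vader")                                 # based on revision 1
store.put("vader", store.get("vader").merge("age" => 63))  # another writer wins

conflicted = begin
  store.put("vader", stale.merge("dark_side" => true))
  false
rescue ConflictError
  true
end

update_with_retry(store, "vader") { |doc| doc["dark_side"] = true }
p conflicted
p store.get("vader")
```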
  • 47. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 48. RESTful CRUD • Create: HTTP PUT /db/mydocid • Read: HTTP GET /db/mydocid • Update: HTTP PUT /db/mydocid • Delete: HTTP DELETE /db/mydocid
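The CRUD mapping above translates directly into Ruby's standard `Net::HTTP` request classes. This sketch only builds the requests (nothing is sent); `db`, `mydocid` and the revision `1-abc` are placeholders. One detail the slide leaves out: CouchDB rejects updates and deletes that do not quote the document's current revision, so the update body carries `_rev` and the delete passes `?rev=`.

```ruby
require "net/http"
require "json"

# Builds (but does not send) the four CRUD requests for one document.
def crud_requests(db, id, rev)
  path = "/#{db}/#{id}"

  create = Net::HTTP::Put.new(path, "Content-Type" => "application/json")
  create.body = JSON.generate("title" => "Hello")

  read = Net::HTTP::Get.new(path)

  # Updates must include the current revision in the body.
  update = Net::HTTP::Put.new(path, "Content-Type" => "application/json")
  update.body = JSON.generate("_rev" => rev, "title" => "Hello again")

  # Deletes must name the revision being deleted.
  delete = Net::HTTP::Delete.new("#{path}?rev=#{rev}")

  { create: create, read: read, update: update, delete: delete }
end

reqs = crud_requests("db", "mydocid", "1-abc")
reqs.each { |op, req| puts "#{op}: #{req.method} #{req.path}" }

# To send one against a local CouchDB (assumed at 127.0.0.1:5984):
#   Net::HTTP.start("127.0.0.1", 5984) { |http| http.request(reqs[:read]) }
```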
  • 49. RESTful Example
      require 'couchrest'
      require 'json'
      require 'net/http'
      couch = CouchRest.database!("http://127.0.0.1:5984/tweets")
      # 2009-era Twitter API endpoint (since retired)
      tweets_url = "http://twitter.com/statuses/user_timeline.json"
      tweets = JSON.parse(Net::HTTP.get(URI(tweets_url)))
      couch.bulk_save(tweets)
  • 50. Cacheability • Both documents and views return ETags • Clients send If-None-Match • CouchDB responds with 304 Not Modified and bypasses a potentially expensive lookup • Can use Varnish or Squid as a caching proxy • Proxy-friendly
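The ETag round trip can be sketched as a plain Ruby function standing in for the server. The handler and data are hypothetical; here the ETag is simply the quoted revision, which is how CouchDB tags single documents. A matching `If-None-Match` short-circuits to `304` with an empty body, which is what lets a proxy like Varnish or Squid keep serving its cached copy.

```ruby
require "json"

# Hypothetical handler: returns [status, headers, body].
def conditional_get(doc, if_none_match = nil)
  etag = %("#{doc["_rev"]}") # ETags are quoted strings
  # Client already has this version: skip building the body entirely.
  return [304, { "ETag" => etag }, ""] if if_none_match == etag

  [200, { "ETag" => etag }, JSON.generate(doc)]
end

doc = { "_id" => "vader", "_rev" => "1-abc", "name" => "Darth Vader" }

status, headers, _body = conditional_get(doc)              # first fetch
status2, _, body2 = conditional_get(doc, headers["ETag"])  # revalidation
p [status, status2, body2.empty?]                          # [200, 304, true]
```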
  • 51. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 52. JavaScript-Powered Map/Reduce • Map functions extract data from your documents • Reduce functions aggregate intermediate values • The kicker: results are stored in B-trees and updated incrementally, so only changed documents are re-mapped
  • 53. (Image: http://horicky.blogspot.com/2008/10/couchdb-implementation.html)
  • 54. Map/Reduce Views
      Docs:
        {"user": "Chris", "points": 3}
        {"user": "Joe",   "points": 10}
        {"user": "Alice", "points": 5}
        {"user": "Mary",  "points": 9}
        {"user": "Bob",   "points": 7}
      Map:
        function(doc) {
          if (doc.user && doc.points) {
            emit(doc.user, doc.points);
          }
        }
      Map output:
        {"key": "Alice", "value": 5}
        {"key": "Bob",   "value": 7}
        {"key": "Chris", "value": 3}
        {"key": "Joe",   "value": 10}
        {"key": "Mary",  "value": 9}
      Reduce:
        function(keys, values, rereduce) {
          return sum(values);
        }
      Reduce output:
        Alice ... Chris: 15
        Everyone: 34
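The same view can be simulated in Ruby to check the numbers on the slide. This is a sketch of the semantics only: `map_rows` mirrors the JavaScript map function, `reduce_sum` mirrors `sum(values)`, and `reduce_range` stands in for querying a key range. It does not model CouchDB's B-tree; the last lines only show why a sum can be re-reduced from partial results, which is what lets CouchDB cache reductions in B-tree nodes.

```ruby
# The slide's five documents.
DOCS = [
  { "user" => "Chris", "points" => 3 },
  { "user" => "Joe",   "points" => 10 },
  { "user" => "Alice", "points" => 5 },
  { "user" => "Mary",  "points" => 9 },
  { "user" => "Bob",   "points" => 7 },
].freeze

# Map: emit [user, points] rows, kept sorted by key like a view.
# (Unlike JavaScript, Ruby treats 0 as truthy, so 0 points would be emitted.)
def map_rows(docs)
  docs.flat_map do |doc|
    doc["user"] && doc["points"] ? [[doc["user"], doc["points"]]] : []
  end.sort_by(&:first)
end

# Reduce: the equivalent of `return sum(values);`.
def reduce_sum(values)
  values.sum
end

# Reduce over an inclusive key range, like startkey/endkey on a view.
def reduce_range(rows, startkey = nil, endkey = nil)
  selected = rows.select do |key, _|
    (startkey.nil? || key >= startkey) && (endkey.nil? || key <= endkey)
  end
  reduce_sum(selected.map(&:last))
end

rows = map_rows(DOCS)
puts "Alice ... Chris: #{reduce_range(rows, "Alice", "Chris")}" # Alice ... Chris: 15
puts "Everyone: #{reduce_range(rows)}"                          # Everyone: 34

# Rereduce: combine partial sums; summing is associative, so same answer.
partials = rows.map(&:last).each_slice(2).map { |chunk| reduce_sum(chunk) }
puts "Everyone via rereduce: #{reduce_sum(partials)}"
```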
  • 57. Render Views as HTML • List function: lists/index.js • URL: /drl/_list/sofa/index/recent-posts?descending=true&limit=8
  • 58. Server-Side JavaScript • _show for transforming documents • _list for transforming views • _update for transforming PUTs/POSTs • Code-sharing between client and server • Easy deployment
  • 59. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 60. Replication • Incremental • Near-real-time • Clustered mirrors • Scheduled • Ad-hoc
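“Incremental” here means a replication run only copies what changed since the last run. A minimal in-memory sketch, assuming a source that numbers every change with an increasing sequence (similar in spirit to CouchDB's changes feed) and a target that records a checkpoint; `Source`, `Target` and `changes_since` are hypothetical names, and deletions and conflicts are not modelled.

```ruby
# Hypothetical source database: every write gets the next sequence number.
class Source
  attr_reader :docs

  def initialize
    @docs = {}
    @changes = [] # [seq, id] pairs in the order the writes happened
    @seq = 0
  end

  def put(id, doc)
    @seq += 1
    @docs[id] = doc
    @changes << [@seq, id]
  end

  # Ids changed after `since`, each with its latest sequence, in order.
  def changes_since(since)
    @changes.select { |seq, _| seq > since }
            .group_by(&:last)
            .map { |id, entries| [entries.last.first, id] }
            .sort
  end
end

# Hypothetical target: remembers the last sequence it has copied.
class Target
  attr_reader :docs, :checkpoint

  def initialize
    @docs = {}
    @checkpoint = 0
  end

  # Copies only documents changed since the checkpoint; returns the count.
  def replicate_from(source)
    copied = 0
    source.changes_since(@checkpoint).each do |seq, id|
      @docs[id] = source.docs[id].dup
      @checkpoint = seq
      copied += 1
    end
    copied
  end
end

source = Source.new
source.put("a", { "v" => 1 })
source.put("b", { "v" => 1 })

target = Target.new
first  = target.replicate_from(source) # copies a and b
source.put("a", { "v" => 2 })
second = target.replicate_from(source) # copies only a
third  = target.replicate_from(source) # nothing new
p [first, second, third]               # [2, 1, 0]
```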
  • 61. “Ground Computing” @jhuggins http://www.flickr.com/photos/mcpig/872293700/
  • 62. http://www.flickr.com/photos/hercwad/2290378571/
  • 63. Latency Sucks
  • 64. Stuart Langridge - Canonical
  • 75. Conflicts
  • 76. Conflict resolution by example A B
  • 77. Conflict resolution by example A B ❦
  • 79. Conflict resolution by example A B ❦ ❦
  • 81. Conflict resolution by example A B ❦ ✿ ❦
  • 83. Conflict resolution by example A B ✿
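After a conflict, every replica has to pick the same winner without talking to the others, so the choice must be deterministic. The sketch below follows CouchDB's rule in simplified form: the revision with the longer edit history (the higher `N` in `N-hash`) wins, and ties go to the revision string that sorts highest. It ignores deleted revisions, which CouchDB also takes into account, and the revision hashes are made up. The losing revision is not thrown away; it stays available so the application can merge it.

```ruby
# Simplified deterministic winner selection: longest history first
# (the N in "N-hash"), then the revision string that sorts highest.
def winning_revision(revs)
  revs.max_by { |rev| [rev.split("-", 2).first.to_i, rev] }
end

# Replicas A and B each edited revision 1 of the same document while
# apart; after replication both hold both revisions. Hashes are made up.
from_a = "2-7051cbe5c8faecd085a3fa619e6e6337"
from_b = "2-b91bb807b4685080c6a651115ff558f5"

# Both replicas agree regardless of the order they see the revisions in.
p winning_revision([from_a, from_b]) == winning_revision([from_b, from_a]) # true
p winning_revision([from_a, from_b]) == from_b # true: "2-b9..." sorts after "2-70..."

# A longer history always wins; N is compared numerically, so 10 beats 9.
p winning_revision(["10-0000", "9-ffff"]) # "10-0000"
```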
  • 87. Features • Schema-Free (JSON) • Document-Oriented, Not Relational • Highly Concurrent • RESTful HTTP API • JavaScript-Powered Map/Reduce • N-Master Replication • Robust Storage
  • 88. Robust Storage • Append-Only File Structure • Designed to Crash • Instant-On
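Why an append-only file can be “designed to crash”: nothing already written is ever modified, so a crash can at worst leave an incomplete record at the end. This hypothetical sketch stores one JSON line per write, rebuilds its index by replaying the file on start-up, and drops a trailing partial line, using a `StringIO` in place of a file. CouchDB's real format (appended B-tree nodes plus headers) is more involved, and because it only has to find the last valid header it starts instantly, whereas this sketch replays everything.

```ruby
require "json"
require "stringio"

# Hypothetical append-only store: one JSON line per write, never rewritten.
class AppendOnlyStore
  def initialize(io)
    @io = io
    @index = {}
    recover
  end

  def put(id, doc)
    @io.seek(0, IO::SEEK_END)
    @io.write(JSON.generate("id" => id, "doc" => doc) + "\n")
    @index[id] = doc
  end

  def get(id)
    @index[id]
  end

  private

  # Replay every complete record; later records for an id win.
  def recover
    @io.rewind
    good_bytes = 0
    @io.each_line do |line|
      break unless line.end_with?("\n") # partial record left by a crash
      record = JSON.parse(line)
      @index[record["id"]] = record["doc"]
      good_bytes += line.bytesize
    end
    @io.truncate(good_bytes) # drop the partial tail, if any
  end
end

file = StringIO.new
store = AppendOnlyStore.new(file)
store.put("vader", { "age" => 63 })
store.put("vader", { "age" => 64 })

# Simulate a crash that left half a record at the end of the file.
file.write('{"id":"vader","doc":{"age":')

recovered = AppendOnlyStore.new(file)
p recovered.get("vader")
```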
  • 89. Robust
  • 91. Thanks! www.jasondavies.com @jasondavies