SDEC2011 NoSQL Data modelling

Published in: Technology, Education
  1. NoSQL Data Modeling: Concepts and Cases
     Shashank Tiwari
     blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com
  2. NoSQL?
  3. NoSQL: Various Shapes and Sizes
     • Document Databases
     • Column-family Oriented Stores
     • Key/value Data Stores
     • XML Databases
     • Object Databases
     • Graph Databases
  4. Key Questions
     • How do I model data for my application?
     • How do I determine which one is right for me?
     • Can I easily shift from one database to the other?
     • Is there a standard way of storing, accessing, and querying data?
  5. Agenda for this session
     • Explore some of the main NoSQL products
     • Understand how they are similar and different
     • How best to use these products in the stack
  6. Document Databases
     • also GenieDB, SimpleDB
  7. What is a document db?
     • One that stores documents
     • Popular options:
       • MongoDB -- C++
       • CouchDB -- Erlang
       • Also Amazon’s SimpleDB
     • ...what exactly is a document?
  8. In the real world
     • (Source: http://guide.couchdb.org/draft/why.html)
  9. In terms of JSON
     • {name: "John Doe", zip: 10001}
  10. What about db schema?
      • Schema-less
      • Different documents could be stored in a single collection
  11. Data types: MongoDB
      • Essential JSON types:
        • string
        • integer
        • boolean
        • double
  12. Data types: MongoDB (...cont)
      • Additional JSON types
        • null, array and object
      • BSON types -- binary encoded serialization of JSON-like documents
        • date, binary data, object id, regular expression and code
        • (Reference: bsonspec.org)
  13. A BSON example: object id
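The figure for the object id slide is not part of this transcript. As a rough illustration, the sketch below assumes the BSON layout in which a 12-byte ObjectId starts with a 4-byte big-endian Unix timestamp. The remaining 8 bytes are treated as opaque here. The sample value is the ObjectId that appears in the read examples of this deck, and `objectid_timestamp` is a hypothetical helper name, not a driver API.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-hex-digit ObjectId string."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # Assumption: the first 4 bytes are seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Because the timestamp leads the value, ObjectIds generated later sort after earlier ones at one-second granularity.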
  14. Data types: CouchDB
      • Everything JSON
      • Large objects: attachments
  15. CRUD operations for documents
      • Create
      • Read
      • Update
      • Delete
  16. MongoDB: Create Document
      • use mydb
      • w = {name: "John Doe", zip: 10001};
      • db.location.save(w);
  17. Create db and collection
      • Lazily created
      • Implicitly created
      • use mydb
      • db.collection.save(w)
  18. MongoDB: Read Document
      • db.location.find({zip: 10001});
      • { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  19. MongoDB: Read Document (...cont)
      • db.location.find({name: "John Doe"});
      • { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  20. MongoDB: Update Document
      • Atomic operations on single documents
      • db.location.update( { name: "John Doe" }, { $set: { name: "Jane Doe" } } );
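The create, read and update slides can be mimicked without a server. The following is a toy in-memory emulation, not the MongoDB API: `save`, `find` and `update_set` are hypothetical helpers that copy the shell semantics shown above (query by example, `$set` applied to the first matching document), and the integer `_id` is a stand-in for a real ObjectId.

```python
import copy
import itertools

_ids = itertools.count(1)  # stand-in for ObjectId generation (assumption)


def save(collection, doc):
    """Store a copy of doc, assigning an _id if it has none."""
    stored = copy.deepcopy(doc)
    stored.setdefault("_id", next(_ids))
    collection.append(stored)
    return stored["_id"]


def matches(doc, query):
    """Query by example: every key/value pair in query must equal the doc's."""
    return all(doc.get(key) == value for key, value in query.items())


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


def update_set(collection, query, changes):
    """Emulate {$set: changes} on the first matching document only."""
    for doc in collection:
        if matches(doc, query):
            doc.update(changes)  # other fields of the document are untouched
            return 1
    return 0


location = []  # a "collection" is just a list of dicts here
save(location, {"name": "John Doe", "zip": 10001})
update_set(location, {"name": "John Doe"}, {"name": "Jane Doe"})
print(find(location, {"zip": 10001}))
```

The point of the sketch is that `$set` changes only the named field, so `zip` survives the update and the document can still be found by it.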
  21. CouchDB: RESTful
      • Supports REST verbs: GET, HEAD, PUT, POST, DELETE
      • Supports Replication
      • Supports the notion of attachments
      • Could work in offline modes and supports small footprint profiles
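To make the REST verbs concrete, here is a minimal sketch that only builds the HTTP requests and does not send them. It assumes a CouchDB instance at `http://localhost:5984`. The database name `mydb`, the document id `john` and the revision `1-abc` are placeholders, and `couch_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumption: local CouchDB on its default port


def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request for a CouchDB resource."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)


create_db = couch_request("PUT", "/mydb")
create_doc = couch_request("PUT", "/mydb/john", {"name": "John Doe", "zip": 10001})
read_doc = couch_request("GET", "/mydb/john")
# Deleting requires the document's current revision; "1-abc" is a placeholder.
delete_doc = couch_request("DELETE", "/mydb/john?rev=1-abc")

for req in (create_db, create_doc, read_doc, delete_doc):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it, which is left out so the sketch runs without a server.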
  22. Sorted Ordered Column-family Datastores
      • Sorted
      • Ordered
      • Distributed
      • Map
  23. Essential schema
  24. Multi-dimensional View
  25. A Map/Hash View
      {
        "row_key_1" : {
          "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
          "location" : { "zip" : "94301" },
          ...
        }
      }
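The map view above can be sketched as nested dictionaries plus a sorted list of row keys, which is what makes range scans cheap. This is an illustration only: `row_key_2` and its values are hypothetical, `get` and `scan` are made-up helper names, and the per-cell timestamp dimension that stores such as HBase maintain is left out.

```python
import bisect

# row key -> column family -> column qualifier -> value
table = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    },
    "row_key_2": {  # hypothetical second row; rows need not share columns
        "name": {"first_name": "Very", "last_name": "Happy"},
    },
}

sorted_keys = sorted(table)  # rows are kept in row-key order


def get(row_key, family, qualifier):
    """Point lookup; returns None when the row or column is absent."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


def scan(start_key, stop_key):
    """Yield (row_key, row) for start_key <= row_key < stop_key, in key order."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    for key in sorted_keys[lo:hi]:
        yield key, table[key]


print(get("row_key_1", "location", "zip"))
print([key for key, _ in scan("row_key_1", "row_key_3")])
```

A missing column costs nothing to store, which is why sparse rows fit this model well.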
  26. Architectural View (HBase)
  27. The Persistence Mechanism
  28. Model Wrappers (The GAE Way)
      • Python
        • Model, Expando, PolyModel
      • Java
        • JDO, JPA
  29. HBase Data Access
      • Thrift + Avro
      • Java API -- HTable, HBaseAdmin
      • Hive (SQL-like)
      • MapReduce -- sink and/or source
  30. Transactions
      • Atomic at the row level
      • GAE Entity Groups
  31. Indexes
      • Row ordered
      • Secondary indexes
      • GAE style multiple indexes
        • thinking from output to query
  32. Use cases
      • Many of Google’s products
      • Facebook Messaging
      • StumbleUpon
        • OpenTSDB
      • Mahalo, Ning, Meetup, Twitter, Yahoo!
      • Lily -- open source CMS built on HBase & Solr
  33. Brewer’s CAP Theorem
      • http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
      • http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
  34. Distributed Systems & Consistency (case: success)
  35. Distributed Systems & Consistency (case: failure)
  36. Binding by Transactions
  37. Consistency Spectrum
  38. Inconsistency Window
  39. RWN Math
      • R – Number of nodes that are read from.
      • W – Number of nodes that are written to.
      • N – Total number of nodes in the cluster.
      • In general: R < N and W < N for higher availability
  40. R + W > N
      • Easy to determine consistent state
      • R + W = 2N (that is, R = N and W = N)
        • absolutely consistent, can provide ACID guarantee
      • In all cases when R + W > N there is some overlap between read and write nodes.
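The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible set of R read nodes and W write nodes out of N and reports whether they always share at least one node. `always_overlap` is a hypothetical helper written for this check.

```python
from itertools import combinations


def always_overlap(n, r, w):
    """True if every possible read set shares a node with every write set."""
    nodes = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(nodes, r)
        for writes in combinations(nodes, w)
    )


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w} R+W>N={r + w > n} overlap={always_overlap(n, r, w)}")
```

When R + W <= N, it is possible to pick a read set that misses the nodes holding the latest write, which is the source of stale reads.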
  41. R = 1, W = N
      • Suits workloads with more reads than writes
      • W = N
        • 1 node failure = entire system unavailable for writes
  42. R = N, W = 1
      • W = 1
        • Chance of data inconsistency quite high
      • R = N
        • Read only possible when all nodes in the cluster are available
  43. R = W = ceiling((N + 1)/2)
      • Effective quorum for eventual consistency
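As a small worked example of the quorum formula, the sketch below computes ceiling((N + 1)/2) for a few cluster sizes and shows that R + W always exceeds N with this choice. The "tolerates" column is derived here (N minus the quorum size) and is not taken from the slides.

```python
import math


def quorum(n):
    """Quorum size R = W = ceiling((N + 1) / 2) for a cluster of n nodes."""
    return math.ceil((n + 1) / 2)


for n in range(1, 8):
    q = quorum(n)
    print(f"N={n}: R=W={q}, R+W={2 * q}, tolerates {n - q} unavailable node(s)")
```

Compared with R = 1, W = N or R = N, W = 1, a quorum keeps both reads and writes available when a minority of nodes is down.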
  44. Eventual consistency variants
      • Causal consistency -- A writes and informs B, then B always sees the updated value
      • Read-your-writes consistency -- A writes a new value and never sees the old one
      • Session consistency -- read-your-writes consistency within a client session
      • Monotonic read consistency -- once a process has seen a new value, it never sees a previous value
      • Monotonic write consistency -- writes by the same process are serialized
  45. Dynamo Techniques
      • Consistent hashing (incremental scalability)
      • Vector clocks (high availability for writes)
      • Sloppy quorum and hinted handoff (recover from temporary failure)
      • Gossip-based membership protocol (periodic, pairwise, inter-process interactions, low reliability, random peer selection)
      • Anti-entropy using Merkle trees
      • (Source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
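Of the techniques listed, vector clocks are the easiest to show in a few lines. The following is a minimal sketch, assuming a clock is a plain dictionary from node name to counter. The function names and node names are hypothetical, and a real system would also truncate old entries.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen everything recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify the causal relationship between two versions."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # conflicting siblings that must be reconciled


def merge(a, b):
    """Pointwise maximum, used once conflicting versions are reconciled."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


v1 = increment({}, "node_x")  # first write, handled by node_x
v2 = increment(v1, "node_y")  # later write coordinated by node_y
v3 = increment(v1, "node_z")  # independent write coordinated by node_z

print(compare(v2, v1))
print(compare(v2, v3))
print(merge(v2, v3))
```

Writes are accepted even when they may conflict, and the clocks let a later read detect that `v2` and `v3` diverged from the same ancestor.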
  46. Consistent Hashing
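The consistent hashing figure is not part of this transcript, so here is a minimal sketch of the idea. It assumes MD5 as the hash function and 64 virtual nodes per physical node, both arbitrary choices for illustration. `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place vnodes points for this node on the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        """First node at or after the key's position, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]


keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["node_a", "node_b", "node_c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node_d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node_d")
```

Every key that moves lands on the new node and the rest stay where they were, which is the incremental scalability property named on the Dynamo slide.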
  47. CouchDB MVCC Style
      • (Source: http://guide.couchdb.org/draft/consistency.html)
  48. Key/value Stores
      • Memcached
      • Membase
      • Redis
      • Tokyo Cabinet
      • Kyoto Cabinet
      • Berkeley DB
  49. Questions?
      • blog: shanky.org | twitter: @tshanky
      • st@treasuryofideas.com