SDEC2011 NoSQL Data modelling


Transcript

  • 1. NoSQL Data Modeling: Concepts and Cases• Shashank Tiwari• blog: shanky.org | twitter: @tshanky• st@treasuryofideas.com
  • 2. NoSQL?
  • 3. NoSQL: Various Shapes and Sizes• Document Databases• Column-family Oriented Stores• Key/value Data stores• XML Databases• Object Databases• Graph Databases
  • 4. Key Questions• How do I model data for my application?• How do I determine which one is right for me?• Can I easily shift from one database to the other?• Is there a standard way of storing, accessing, and querying data?
  • 5. Agenda for this session• Explore some of the main NoSQL products• Understand how they are similar and different• How best to use these products in the stack
  • 6. Document Databases• also GenieDB, SimpleDB
  • 7. What is a document db?• One that stores documents• Popular options: • MongoDB -- C++ • CouchDB -- Erlang • Also Amazon’s SimpleDB• ...what exactly is a document?
  • 8. In the real world• (Source: http://guide.couchdb.org/draft/why.html)
  • 9. In terms of JSON• {name: "John Doe", zip: 10001}
  • 10. What about db schema?• Schema-less• Different documents could be stored in a single collection
  • 11. Data types: MongoDB• Essential JSON types:• string• integer• boolean• double
  • 12. Data types: MongoDB (...cont)• Additional JSON types• null, array and object• BSON types -- binary-encoded serialization of JSON-like documents • date, binary data, object id, regular expression and code • (Reference: bsonspec.org)
  • 13. A BSON example: object id
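The object id that shows up in the read-document slides of this deck, 4c97053abe67000000003857, can be unpacked by hand. What follows is a minimal sketch, assuming only the part of the layout that is stable across MongoDB versions: the first 4 of the 12 bytes are a big-endian Unix timestamp in seconds. The helper name objectid_timestamp is hypothetical, and the remaining 8 bytes are left uninterpreted because their meaning differs between versions.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id: str) -> datetime:
    """Return the creation time embedded in a 12-byte object id (hex form)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an object id is exactly 12 bytes (24 hex characters)")
    # Bytes 0-3 hold seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The id is the one printed in the deck's find() output.
created = objectid_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Because the timestamp leads the id, ids created later sort after ids created earlier at one-second granularity.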
  • 14. Data types: CouchDB• Everything JSON• Large objects: attachments
  • 15. CRUD operations for documents• Create• Read• Update• Delete
  • 16. MongoDB: Create Document• use mydb• w = {name: "John Doe", zip: 10001};• db.location.save(w);
  • 17. Create db and collection• Lazily created• Implicitly created• use mydb• db.collection.save(w)
  • 18. MongoDB: Read Document• db.location.find({zip: 10001});• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 19. MongoDB: Read Document (...cont)• db.location.find({name: "John Doe"});• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 20. MongoDB: Update Document• Atomic operations on single documents• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
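The create, read and update calls above need a running MongoDB server. To make their semantics concrete without one, here is a toy in-memory stand-in. It is a sketch under stated assumptions, not MongoDB's implementation: the class ToyCollection is hypothetical, ids are plain integers rather than object ids, save always inserts, find supports equality matches only, and update handles only $set on the first matching document.

```python
import itertools


class ToyCollection:
    """In-memory stand-in for a document collection (hypothetical, not MongoDB)."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    def save(self, doc):
        # Copy the document and assign an id if the caller did not supply one.
        stored = dict(doc)
        stored.setdefault("_id", next(self._ids))
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query):
        # Equality match on every key in the query document.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]

    def update(self, query, change):
        # Apply a $set-style change to the first matching document only.
        for doc in self.find(query):
            doc.update(change.get("$set", {}))
            return 1
        return 0


location = ToyCollection()
location.save({"name": "John Doe", "zip": 10001})
# Schema-less: a document with a different shape goes into the same collection.
location.save({"name": "Acme Corp", "zip": 10001, "floors": 12})
location.update({"name": "John Doe"}, {"$set": {"name": "Jane Doe"}})
print(location.find({"zip": 10001}))
```

The second save uses made-up data to illustrate the schema-less point from slide 10.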
  • 21. CouchDB: RESTful• Supports REST verbs: GET, HEAD, PUT, POST, DELETE• Supports replication• Supports the notion of attachments• Can work offline and supports small-footprint profiles
  • 22. Sorted Ordered Column-family Datastores• Sorted• Ordered• Distributed• Map
  • 23. Essential schema
  • 24. Multi-dimensional View
  • 25. A Map/Hash View• { "row_key_1" : { "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" }, "location" : { "zip": "94301" } } }
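The map view on this slide can be read as a three-level lookup: row key, then column family, then column. The sketch below assumes that "name" and "location" are both column families of "row_key_1". The second row, the helper names get_cell and scan, and the use of a plain dict are illustrative assumptions; a real column-family store keeps rows physically sorted by key rather than sorting on each scan.

```python
table = {
    # Hypothetical second row, inserted out of order on purpose.
    "row_key_2": {"name": {"first_name": "Able", "last_name": "Baker"}},
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
    },
}


def get_cell(table, row_key, family, column, default=None):
    """Look up one cell: row key -> column family -> column."""
    return table.get(row_key, {}).get(family, {}).get(column, default)


def scan(table, start, stop):
    """Yield (row_key, row) pairs with start <= row_key < stop, in key order."""
    for key in sorted(table):
        if start <= key < stop:
            yield key, table[key]


print(get_cell(table, "row_key_1", "location", "zip"))
print([key for key, _ in scan(table, "row_key_1", "row_key_3")])
```

A missing column is not an error here: rows are sparse, so row_key_2 simply has no "location" family.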
  • 26. Architectural View (HBase)
  • 27. The Persistence Mechanism
  • 28. Model Wrappers (The GAE Way)• Python • Model, Expando, PolyModel• Java • JDO, JPA
  • 29. HBase Data Access• Thrift + Avro• Java API -- HTable, HBaseAdmin• Hive (SQL like)• MapReduce -- sink and/or source
  • 30. Transactions• Atomic row level• GAE Entity Groups
  • 31. Indexes• Row ordered• Secondary indexes• GAE style multiple indexes • thinking from output to query
  • 32. Use cases• Many of Google’s products• Facebook Messaging• StumbleUpon • OpenTSDB• Mahalo, Ning, Meetup, Twitter, Yahoo!• Lily -- open source CMS built on HBase & Solr
  • 33. Brewer’s CAP Theorem• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
  • 34. Distributed Systems & Consistency (case: success)
  • 35. Distributed Systems & Consistency (case: failure)
  • 36. Binding by Transactions
  • 37. Consistency Spectrum
  • 38. Inconsistency Window
  • 39. RWN Math• R – Number of nodes that are read from.• W – Number of nodes that are written to.• N – Total number of nodes in the cluster.• In general: R < N and W < N for higher availability
  • 40. R+W>N• Easy to determine consistent state• R + W = 2N • absolutely consistent, can provide ACID guarantees• In all cases when R + W > N there is some overlap between read and write nodes.
  • 41. R = 1, W = N• Suits workloads with more reads than writes• W = N • a single node failure makes the entire system unavailable for writes
  • 42. R = N, W = 1• W = 1 • Chance of data inconsistency quite high• R = N • Read only possible when all nodes in the cluster are available
  • 43. R = W = ceiling((N + 1)/2)• Effective quorum for eventual consistency
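The R/W/N rules on the last few slides can be checked by brute force for small clusters. The sketch below enumerates every possible read set and write set and confirms that they always share at least one node exactly when R + W > N, then computes the quorum size ceiling((N + 1)/2). The function names quorum and always_overlap are hypothetical, and nodes are modelled as plain integers.

```python
from itertools import combinations
from math import ceil


def quorum(n: int) -> int:
    """Quorum size from the slide: ceiling((N + 1) / 2)."""
    return ceil((n + 1) / 2)


def always_overlap(r: int, w: int, n: int) -> bool:
    """True if every read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(nodes, r)
               for ws in combinations(nodes, w))


# Compare the brute-force answer with the R + W > N rule for clusters of 1 to 6 nodes.
mismatches = [(r, w, n)
              for n in range(1, 7)
              for r in range(1, n + 1)
              for w in range(1, n + 1)
              if always_overlap(r, w, n) != (r + w > n)]
print(mismatches)

for n in (3, 4, 5):
    q = quorum(n)
    print(n, q, always_overlap(q, q, n))
```

With R = W = quorum(N), a read is guaranteed to touch at least one node that took the latest write, while up to N - quorum(N) nodes can be down without blocking either operation.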
  • 44. Eventual consistency variants• Causal consistency -- A writes and informs B, then B always sees the updated value• Read-your-writes consistency -- A writes a new value and never sees the old one• Session consistency -- read-your-writes consistency within a client session• Monotonic read consistency -- once a process has seen a new value, it never sees a previous value• Monotonic write consistency -- writes by the same process are serialized
  • 45. Dynamo Techniques• Consistent Hashing (Incremental scalability)• Vector clocks (high availability for writes)• Sloppy quorum and hinted handoff (recover from temporary failure)• Gossip based membership protocol (periodic, pairwise, inter-process interactions, low reliability, random peer selection)• Anti-entropy using Merkle trees• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
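Of the techniques listed, vector clocks are the easiest to show in a few lines. The sketch below is a generic vector clock, not Dynamo's actual code: a clock is a dict from node name to counter, and the function names increment, descends, compare and merge are hypothetical. Two versions whose clocks are "concurrent" are the ones a client would have to reconcile.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of the clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: equal, after, before, or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"


def merge(a: dict, b: dict) -> dict:
    """Component-wise maximum, used once conflicting versions are reconciled."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


v1 = increment({}, "A")   # first write, handled by node A
v2 = increment(v1, "A")   # follow-up write through A
v3 = increment(v1, "B")   # parallel write through B, also based on v1
print(compare(v2, v1), compare(v2, v3), merge(v2, v3))
```

Accepting both v2 and v3 instead of rejecting one is what keeps writes available; the cost is that a later read may see two versions and must merge them.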
  • 46. Consistent Hashing
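A minimal consistent-hashing sketch, to make the "incremental scalability" claim concrete. Assumptions: the class HashRing, the node names, the 64 virtual nodes per physical node, and the use of MD5 purely as a well-spread hash are all illustrative choices, not details taken from the slide.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node is placed at several positions (virtual nodes).
        for i in range(self.replicas):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._ring, (_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), {after[k] for k in moved})
```

Only the keys that land on the new node change owner; everything else stays where it was, which is the property that plain `hash(key) % node_count` placement lacks.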
  • 47. CouchDB MVCC Style• (Source: http://guide.couchdb.org/draft/consistency.html)
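The practical consequence of MVCC for a client is that an update must name the revision it was based on, and a write against a stale revision is rejected instead of blocking. The toy store below sketches that rule only. It is not CouchDB's implementation: ToyMVCCStore and Conflict are hypothetical names, revisions are plain integers rather than CouchDB's revision strings, and old revisions are not retained.

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class ToyMVCCStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        # A new document has no current revision, so rev must be None.
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if rev != current_rev:
            raise Conflict(f"expected revision {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = ToyMVCCStore()
rev1 = store.put("doc-1", {"name": "John Doe", "zip": 10001})

# Two clients read the same revision, then both try to write.
rev_a, doc_a = store.get("doc-1")
rev_b, doc_b = store.get("doc-1")

doc_a["name"] = "Jane Doe"
rev2 = store.put("doc-1", doc_a, rev=rev_a)      # first writer wins

doc_b["zip"] = 94301
try:
    store.put("doc-1", doc_b, rev=rev_b)         # stale revision
    conflict_seen = False
except Conflict:
    conflict_seen = True
print(rev2, conflict_seen)
```

The losing client is expected to re-read the document, reapply its change to the latest revision, and retry.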
  • 48. Key/value Stores• Memcached• Membase• Redis• Tokyo Cabinet• Kyoto Cabinet• Berkeley DB
  • 49. Questions?• blog: shanky.org | twitter: @tshanky• st@treasuryofideas.com