SDEC2011 NoSQL Data modelling
Transcript

  • 1. NoSQL Data Modeling: Concepts and Cases
      Shashank Tiwari
      blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com
  • 2. NoSQL?
  • 3. NoSQL: Various Shapes and Sizes
      • Document Databases
      • Column-family Oriented Stores
      • Key/value Data Stores
      • XML Databases
      • Object Databases
      • Graph Databases
  • 4. Key Questions
      • How do I model data for my application?
      • How do I determine which one is right for me?
      • Can I easily shift from one database to the other?
      • Is there a standard way of storing, accessing, and querying data?
  • 5. Agenda for this session
      • Explore some of the main NoSQL products
      • Understand how they are similar and different
      • How best to use these products in the stack
  • 6. Document Databases
      • Also GenieDB, SimpleDB
  • 7. What is a document db?
      • One that stores documents
      • Popular options:
          • MongoDB (C++)
          • CouchDB (Erlang)
          • Also Amazon's SimpleDB
      • ...what exactly is a document?
  • 8. In the real world
      • (Source: http://guide.couchdb.org/draft/why.html)
  • 9. In terms of JSON
      { "name": "John Doe", "zip": 10001 }
  • 10. What about the db schema?
      • Schema-less
      • Documents with different structures can be stored in a single collection
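A minimal sketch of the schema-less idea, assuming a collection is modeled as a plain Python list of dicts. The `location` name matches the MongoDB examples in this deck. The second document and its fields are hypothetical illustrations. Two documents with different shapes sit in the same collection, and each one round-trips through JSON unchanged.

```python
import json

# A hypothetical "location" collection: no schema is enforced, so
# documents in the same collection can carry different fields.
location = [
    {"name": "John Doe", "zip": 10001},
    {"name": "Jane Roe", "city": "New York", "tags": ["home", "work"]},  # hypothetical
]

# Each document serializes to JSON and parses back to an equal dict.
for doc in location:
    encoded = json.dumps(doc)
    assert json.loads(encoded) == doc
    print(encoded)

# The field names differ from one document to the next.
print([sorted(doc) for doc in location])
```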
  • 11. Data types: MongoDB
      • Essential JSON types:
          • string
          • integer
          • boolean
          • double
  • 12. Data types: MongoDB (...cont)
      • Additional JSON types: null, array and object
      • BSON types (BSON is a binary-encoded serialization of JSON-like documents):
          • date, binary data, object id, regular expression and code
          • (Reference: bsonspec.org)
  • 13. A BSON example: object id
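The slide's ObjectId example was an image. As a small illustration, the sketch below decodes the creation time embedded in an ObjectId. A MongoDB ObjectId is 12 bytes, and its first 4 bytes are a big-endian count of seconds since the Unix epoch. The layout of the other 8 bytes has changed across MongoDB versions, so this sketch deliberately ignores them. The function name `object_id_timestamp` is hypothetical and is not a driver API.

```python
from datetime import datetime, timezone


def object_id_timestamp(oid_hex: str) -> datetime:
    """Return the creation time encoded in a MongoDB ObjectId (hypothetical helper).

    Only the leading 4-byte timestamp is decoded; the remaining 8 bytes
    (whose layout differs between MongoDB versions) are ignored.
    """
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId must be 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")  # big-endian seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The ObjectId returned by the find() examples in this deck.
print(object_id_timestamp("4c97053abe67000000003857").isoformat())
# 2010-09-20T06:54:50+00:00
```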
  • 14. Data types: CouchDB
      • Everything is JSON
      • Large objects are stored as attachments
  • 15. CRUD operations for documents
      • Create
      • Read
      • Update
      • Delete
  • 16. MongoDB: Create Document
      use mydb
      w = {name: "John Doe", zip: 10001};
      db.location.save(w);
      (current MongoDB shells use db.location.insertOne(w) instead of save())
  • 17. Create db and collection
      • Databases and collections are created lazily and implicitly, on first write:
          use mydb
          db.collection.save(w)
  • 18. MongoDB: Read Document
      db.location.find({zip: 10001});
      { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 19. MongoDB: Read Document (...cont)
      db.location.find({name: "John Doe"});
      { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
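The find() calls above use query by example: a document matches when every field named in the query has the given value. The sketch below shows that matching rule over an in-memory list. The `find` function is a hypothetical stand-in, not the MongoDB driver. It handles exact-match queries only and ignores operators such as `$gt`. The second document and its `_id` are made up for illustration.

```python
def find(collection, query):
    """Return documents whose fields equal every key/value pair in `query`.

    Hypothetical in-memory stand-in for db.<collection>.find(query);
    supports exact matches only, no query operators.
    """
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


location = [
    {"_id": "4c97053abe67000000003857", "name": "John Doe", "zip": 10001},
    {"_id": "hypothetical-2", "name": "Jane Roe", "zip": 94301},
]

print(find(location, {"zip": 10001}))
print(find(location, {"name": "John Doe"}))
```

An empty query matches every document, just as `find({})` does in the shell.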
  • 20. MongoDB: Update Document
      • Operations on a single document are atomic
      db.location.update( { name: "John Doe" }, { $set: { name: "Jane Doe" } } );
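The `$set` modifier changes only the named fields and leaves the rest of the document alone. The sketch below imitates that on an in-memory list. `update_set` is a hypothetical helper, not a MongoDB API. Like the shell's update() with default options, it modifies at most one matching document. It runs single-threaded, so it does not model MongoDB's single-document atomicity under concurrent writers.

```python
def update_set(collection, query, fields):
    """Apply {"$set": fields} to the first document matching `query` (hypothetical helper).

    Returns the number of documents modified (0 or 1).
    """
    for doc in collection:
        if all(key in doc and doc[key] == value for key, value in query.items()):
            doc.update(fields)  # overwrite or add only the named fields
            return 1
    return 0


location = [{"name": "John Doe", "zip": 10001}]
modified = update_set(location, {"name": "John Doe"}, {"name": "Jane Doe"})
print(modified, location)  # 1 [{'name': 'Jane Doe', 'zip': 10001}]
```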
  • 21. CouchDB: RESTful
      • Supports the REST verbs GET, HEAD, PUT, POST and DELETE
      • Supports replication
      • Supports the notion of attachments
      • Can work in offline mode and supports small-footprint profiles
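CouchDB exposes CRUD as plain HTTP, so its API can be sketched with the standard library alone. The sketch below builds the requests but does not send them. It assumes a local CouchDB on its default port 5984, a database named `mydb`, and a document id `john`, all of which are illustrative. `REV` stands in for the document's current `_rev`, which CouchDB requires when deleting. Sending a request is a matter of passing it to `urllib.request.urlopen`.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # assumption: local CouchDB on its default port


def couch_request(method, path, body=None):
    """Build (but do not send) an HTTP request against CouchDB's REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE + path, data=data, headers=headers, method=method)


calls = [
    couch_request("PUT", "/mydb"),                                           # create the database
    couch_request("PUT", "/mydb/john", {"name": "John Doe", "zip": 10001}),  # create a document
    couch_request("GET", "/mydb/john"),                                      # read it back
    couch_request("HEAD", "/mydb/john"),                                     # headers only, no body
    couch_request("DELETE", "/mydb/john?rev=REV"),                           # REV: placeholder for the current _rev
]

for req in calls:
    print(req.get_method(), req.full_url)
```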
  • 22. Sorted Ordered Column-family Datastores
      • Sorted
      • Ordered
      • Distributed
      • Map
  • 23. Essential schema
  • 24. Multi-dimensional View
  • 25. A Map/Hash View
      {
        "row_key_1" : {
          "name" : {
            "first_name" : "Jolly",
            "last_name" : "Goodfellow"
          },
          "location" : { "zip" : "94301" }
        }
      }
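The map view above nests row key → column family → column → value. Stores such as HBase also keep timestamped versions of each cell. The toy model below captures that shape in memory. `ColumnFamilyTable` and its methods are hypothetical and are not HBase's client API. Column families are declared up front, as HBase requires, while columns inside a family can be added freely. The `ts` values and the second `zip` version are made up for illustration.

```python
class ColumnFamilyTable:
    """Toy sorted column-family map (hypothetical, not HBase's API):
    row key -> column family -> column qualifier -> {timestamp: value}.
    """

    def __init__(self, families):
        self.families = set(families)  # families are fixed when the table is created
        self.rows = {}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})
        cells[ts] = value  # each write adds a version keyed by timestamp

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self):
        """Yield (row key, row) pairs in sorted row-key order."""
        for row in sorted(self.rows):
            yield row, self.rows[row]


t = ColumnFamilyTable(["name", "location"])
t.put("row_key_1", "name", "first_name", "Jolly", ts=1)
t.put("row_key_1", "name", "last_name", "Goodfellow", ts=1)
t.put("row_key_1", "location", "zip", "94301", ts=1)
t.put("row_key_1", "location", "zip", "10001", ts=2)  # hypothetical newer version
print(t.get("row_key_1", "location", "zip"))  # 10001
```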
  • 26. Architectural View (HBase)
  • 27. The Persistence Mechanism
  • 28. Model Wrappers (The GAE Way)
      • Python: Model, Expando, PolyModel
      • Java: JDO, JPA
  • 29. HBase Data Access
      • Thrift + Avro
      • Java API: HTable, HBaseAdmin
      • Hive (SQL-like)
      • MapReduce: as sink and/or source
  • 30. Transactions
      • Atomic at the row level
      • GAE entity groups
  • 31. Indexes
      • Row ordered
      • Secondary indexes
      • GAE-style multiple indexes
          • thinking from output to query
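Because rows are kept in key order, "thinking from output to query" often means designing the row key so that the rows you want come back as one contiguous range. The sketch below keeps keys sorted with `bisect` and serves start-inclusive, stop-exclusive range scans, mirroring how an HBase scan treats its start and stop rows. `SortedRowIndex` is a hypothetical class, and the `user#date` key layout is an illustrative key design.

```python
import bisect


class SortedRowIndex:
    """Toy row-ordered store (hypothetical): keys stay sorted, so a range
    scan is two binary searches plus a slice."""

    def __init__(self):
        self.keys = []
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # keep keys in sorted order
        self.values[key] = value

    def scan(self, start, stop):
        """Rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return [(k, self.values[k]) for k in self.keys[lo:hi]]


idx = SortedRowIndex()
idx.put("user42#2011-11-29", "post")
idx.put("user7#2011-11-29", "login")
idx.put("user42#2011-11-28", "login")
idx.put("user42#2011-11-30", "logout")

# All events for user42: "$" sorts immediately after "#", so this range
# covers exactly the keys that start with "user42#".
print(idx.scan("user42#", "user42$"))
```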
  • 32. Use cases
      • Many Google products
      • Facebook Messaging
      • StumbleUpon
          • OpenTSDB
      • Mahalo, Ning, Meetup, Twitter, Yahoo!
      • Lily: open source CMS built on HBase & Solr
  • 33. Brewer's CAP Theorem
      • http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
      • http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
  • 34. Distributed Systems & Consistency (case: success)
  • 35. Distributed Systems & Consistency (case: failure)
  • 36. Binding by Transactions
  • 37. Consistency Spectrum
  • 38. Inconsistency Window
  • 39. RWN Math
      • R: number of nodes that are read from
      • W: number of nodes that are written to
      • N: total number of nodes (more precisely, the number of replicas of each data item)
      • In general, R < N and W < N for higher availability
  • 40. R + W > N
      • Easy to determine a consistent state
      • R + W = 2N: absolutely consistent, can provide an ACID guarantee
      • Whenever R + W > N, the nodes read from and the nodes written to overlap in at least one node
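The overlap claim follows from the pigeonhole principle. R + W node slots cannot be filled from only N nodes without reusing at least one node. For small clusters the claim can also be checked by brute force. The sketch below enumerates every possible read set and write set. `quorums_always_overlap` is a hypothetical helper written for this illustration.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every R-node read set shares at least one node with
    every W-node write set in a group of N nodes (exhaustive check)."""
    nodes = range(n)
    return all(
        set(reads) & set(writes)  # an empty intersection is falsy
        for reads in combinations(nodes, r)
        for writes in combinations(nodes, w)
    )


print(quorums_always_overlap(3, 2, 2))  # True:  2 + 2 > 3
print(quorums_always_overlap(3, 1, 2))  # False: 1 + 2 = 3, not > 3
```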
  • 41. R = 1, W = N
      • Suits workloads with more reads than writes
      • W = N: a single node failure makes the whole system unavailable for writes
  • 42. R = N, W = 1
      • W = 1: chance of data inconsistency is quite high
      • R = N: a read is only possible when all nodes in the cluster are available
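  • 43. R = W = ceiling((N + 1)/2)
      • Effective quorum for eventual consistency
      • Since R + W = 2 × ceiling((N + 1)/2) ≥ N + 1 > N, every read quorum overlaps every write quorum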
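A quick worked check of the quorum size. For integer N, ceiling((N + 1)/2) equals floor(N/2) + 1, which is the smallest majority. Using it for both R and W keeps R + W > N. It also lets reads and writes proceed while N − R nodes are unavailable. `majority_quorum` is a hypothetical helper name.

```python
import math


def majority_quorum(n):
    """R = W = ceiling((N + 1) / 2): the smallest majority of N nodes."""
    return math.ceil((n + 1) / 2)


for n in range(1, 8):
    q = majority_quorum(n)
    assert q == n // 2 + 1  # same value as floor(N/2) + 1
    assert q + q > n        # read and write quorums always overlap
    print(f"N={n}: R=W={q}, tolerates {n - q} unavailable node(s)")
```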
  • 44. Eventual consistency variants
      • Causal consistency: if A writes a value and informs B, then B always sees the updated value
      • Read-your-writes consistency: once A writes a new value, A never sees the old one
      • Session consistency: read-your-writes consistency within a client session
      • Monotonic read consistency: once a process has seen a new value, it never gets a previous value
      • Monotonic write consistency: writes by the same process are serialized
  • 45. Dynamo Techniques
      • Consistent hashing (incremental scalability)
      • Vector clocks (high availability for writes)
      • Sloppy quorum and hinted handoff (recovery from temporary failures)
      • Gossip-based membership protocol (periodic, pairwise, inter-process interactions; low reliability; random peer selection)
      • Anti-entropy using Merkle trees
      • (Source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
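Vector clocks let Dynamo tell a newer version of an object from a concurrent, conflicting one. The sketch below represents a clock as a dict from node name to counter. The scenario follows the Sx/Sy/Sz style of example used in the Dynamo paper: Sx writes twice, then Sy and Sz each update that version independently, and a later write reconciles them. The function names are hypothetical, and real systems also prune old clock entries.

```python
def vc_increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def vc_merge(a, b):
    """Pointwise maximum of two clocks, used when reconciling versions."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def vc_compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent' (a conflict)."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


d1 = vc_increment({}, "Sx")      # {Sx: 1}
d2 = vc_increment(d1, "Sx")      # {Sx: 2}
d3 = vc_increment(d2, "Sy")      # {Sx: 2, Sy: 1}
d4 = vc_increment(d2, "Sz")      # {Sx: 2, Sz: 1}
print(vc_compare(d2, d3))        # before
print(vc_compare(d3, d4))        # concurrent
d5 = vc_increment(vc_merge(d3, d4), "Sx")  # reconciled write by Sx
print(d5 == {"Sx": 3, "Sy": 1, "Sz": 1})   # True
```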
  • 46. Consistent Hashing
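The consistent hashing slide was a diagram. The sketch below shows the usual construction. Each node is hashed onto a ring at several points ("virtual nodes"). A key belongs to the first node point found moving clockwise from the key's own hash. The Dynamo paper describes MD5-hashing keys onto the ring, and MD5 is used here only to spread values evenly. The class name, the `vnodes=64` default, and the node and key names are all illustrative. The usage at the end shows the incremental-scalability property: adding a node moves only the keys that now belong to the new node.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent hashing with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # MD5 spreads values uniformly; it is not used for security here.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")

    def add(self, node):
        for i in range(self.vnodes):
            point = self._hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def node_for(self, key):
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["A", "B", "C"])
keys = [f"user{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("D")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved, all to D:", all(after[k] == "D" for k in moved))
```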
  • 47. CouchDB MVCC Style
      • (Source: http://guide.couchdb.org/draft/consistency.html)
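CouchDB's MVCC approach is optimistic. Every document version carries a revision id, and a write must name the revision it was based on. If someone else has written in the meantime, CouchDB rejects the stale write with an HTTP 409 Conflict instead of overwriting the other change. The in-memory sketch below imitates that rule. `MVCCStore` and `Conflict` are hypothetical names. The revision id follows the "generation-digest" shape of CouchDB's `_rev`, but the digest here is simplified and is not how CouchDB computes it.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write is based on a stale revision (409 in CouchDB)."""


class MVCCStore:
    """Toy document store with CouchDB-style optimistic concurrency (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        if rev != current_rev:  # the writer did not see the latest version
            raise Conflict(f"{doc_id}: expected rev {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        # Simplified digest of the new content (not CouchDB's real algorithm).
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = {**body, "_id": doc_id, "_rev": new_rev}
        return new_rev


store = MVCCStore()
rev1 = store.put("john", {"name": "John Doe", "zip": 10001})
alice = store.get("john")  # two clients read the same revision
bob = store.get("john")
rev2 = store.put("john", {"name": "John Doe", "zip": 94301}, rev=alice["_rev"])
try:
    store.put("john", {"name": "Johnny Doe", "zip": 10001}, rev=bob["_rev"])
except Conflict as err:
    print("rejected:", err)  # Bob must re-read the document and retry
```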
  • 48. Key/value Stores
      • Memcached
      • Membase
      • Redis
      • Tokyo Cabinet
      • Kyoto Cabinet
      • Berkeley DB
  • 49. Questions?
      blog: shanky.org | twitter: @tshanky | st@treasuryofideas.com
