SDEC2011 NoSQL Data modelling
 

    SDEC2011 NoSQL Data modelling: Presentation Transcript

    • NoSQL Data Modeling: Concepts and Cases• Shashank Tiwari• blog: shanky.org | twitter: @tshanky• st@treasuryofideas.com
    • NoSQL?
    • NoSQL : Various Shapes and Sizes• Document Databases• Column-family Oriented Stores• Key/value Data stores• XML Databases• Object Databases• Graph Databases
    • Key Questions• How do I model data for my application?• How do I determine which one is right for me?• Can I easily shift from one database to the other?• Is there a standard way of storing, accessing, and querying data?
    • Agenda for this session• Explore some of the main NoSQL products• Understand how they are similar and different• How best to use these products in the stack
    • Document Databases• also GenieDB, SimpleDB
    • What is a document db?• One that stores documents• Popular options: • MongoDB -- C++ • CouchDB -- Erlang • Also Amazon’s SimpleDB• ...what exactly is a document?
    • In the real world• (Source: http://guide.couchdb.org/draft/why.html)
    • In terms of JSON• {"name": "John Doe", "zip": 10001}
    • What about db schema?• Schema-less• Different documents could be stored in a single collection
    • Data types: MongoDB• Essential JSON types:• string• integer• boolean• double
    • Data types: MongoDB (...cont)• Additional JSON types• null, array and object• BSON types -- binary encoded serialization of JSON like documents • date, binary data, object id, regular expression and code • (Reference: bsonspec.org)
    • A BSON example: object id
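A minimal sketch of what an object id carries, assuming the standard 12-byte BSON ObjectId layout in which the first 4 bytes are a big-endian Unix timestamp in seconds. The helper name `objectid_timestamp` is hypothetical; the hex string is the `_id` value that appears in this deck's MongoDB read example.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id: str) -> datetime:
    """Return the creation time embedded in a 24-hex-character ObjectId string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    # Assumption: bytes 0..3 hold seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Because the timestamp leads the id, ids generated later sort after earlier ones, which is one reason an object id can double as a rough creation-order key.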
    • Data types: CouchDB• Everything JSON• Large objects: attachments
    • CRUD operations for documents• Create• Read• Update• Delete
    • MongoDB: Create Document• use mydb• w = {name: "John Doe", zip: 10001};• db.location.save(w);
    • Create db and collection• Lazily created• Implicitly created• use mydb• db.collection.save(w)
    • MongoDB: Read Document• db.location.find({zip: 10001});• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
    • MongoDB: Read Document (...cont)• db.location.find({name: "John Doe"});• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
    • MongoDB: Update Document• Atomic operations on single documents• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
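The create, read, and update slides above can be mimicked with a toy in-memory collection. This is only a sketch of the semantics, not the MongoDB API: the `Collection` class and its methods are hypothetical, matching is plain equality, and `update` touches only the first matching document.

```python
class Collection:
    """Toy in-memory stand-in for a document collection (not the MongoDB API)."""

    def __init__(self):
        self._docs = []
        self._next_id = 1

    def _matches(self, doc, query):
        # Equality match on every key in the query.
        return all(doc.get(k) == v for k, v in query.items())

    def save(self, doc):
        # Assign an _id on first save if the caller did not supply one.
        stored = dict(doc)
        stored.setdefault("_id", self._next_id)
        self._next_id += 1
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query):
        return [d for d in self._docs if self._matches(d, query)]

    def update(self, query, change):
        # Apply {"$set": {...}} to the first matching document only.
        for d in self._docs:
            if self._matches(d, query):
                d.update(change.get("$set", {}))
                return 1
        return 0

    def remove(self, query):
        before = len(self._docs)
        self._docs = [d for d in self._docs if not self._matches(d, query)]
        return before - len(self._docs)

location = Collection()
location.save({"name": "John Doe", "zip": 10001})
location.update({"name": "John Doe"}, {"$set": {"name": "Jane Doe"}})
found = location.find({"zip": 10001})
print(found)
```

Note that the two saved shapes need not agree: nothing in `save` checks field names, which is the schema-less behaviour described earlier in the deck.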
    • CouchDB: RESTful• Supports REST verbs: GET, HEAD, PUT, POST, DELETE• Supports Replication• Supports the notion of attachments• Could work in offline modes and supports small footprint profiles
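To make the REST verbs concrete, here is a sketch that builds (but does not send) the HTTP requests for creating a database, creating a document, and reading it back. The base URL assumes a local CouchDB on its default port 5984; the database name `mydb`, the document id `john`, and the helper `couch_request` are hypothetical.

```python
import json
from urllib.request import Request

BASE = "http://localhost:5984"  # assumption: local CouchDB, default port

def couch_request(method, path, body=None):
    """Build an HTTP request for a CouchDB-style REST call without sending it."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE + path, data=data, headers=headers, method=method)

create_db = couch_request("PUT", "/mydb")
create_doc = couch_request("PUT", "/mydb/john", {"name": "John Doe", "zip": 10001})
read_doc = couch_request("GET", "/mydb/john")

for req in (create_db, create_doc, read_doc):
    print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would actually issue the call; that step is left out so the sketch runs without a server.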
    • Sorted Ordered Column-family Datastores• Sorted• Ordered• Distributed• Map
    • Essential schema
    • Multi-dimensional View
    • A Map/Hash View• { "row_key_1" : { "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" }, "location" : { "zip" : "94301" } }, ... }
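The map view can be sketched as a sorted map of row key to column family to column to value. This is an illustration under stated assumptions, not how HBase or Bigtable is implemented: `SortedRowStore` is a hypothetical class, and the second row is invented purely to show a range scan.

```python
import bisect

class SortedRowStore:
    """Toy sorted map: row_key -> column_family -> column -> value."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> {family: {column: value}}

    def put(self, row_key, family, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start, stop):
        """Yield (row_key, row) for start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]

store = SortedRowStore()
store.put("row_key_2", "name", "first_name", "Happy")  # hypothetical second row
store.put("row_key_1", "name", "first_name", "Jolly")
store.put("row_key_1", "name", "last_name", "Goodfellow")
store.put("row_key_1", "location", "zip", "94301")
scanned = [key for key, _ in store.scan("row_key_1", "row_key_3")]
print(scanned)
```

Keeping row keys sorted is what makes range scans cheap, so row-key design largely determines which queries such a store can answer efficiently.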
    • Architectural View (HBase)
    • The Persistence Mechanism
    • Model Wrappers (The GAE Way)• Python • Model, Expando, PolyModel• Java • JDO, JPA
    • HBase Data Access• Thrift + Avro• Java API -- HTable, HBaseAdmin• Hive (SQL like)• MapReduce -- sink and/or source
    • Transactions• Atomic row level• GAE Entity Groups
    • Indexes• Row ordered• Secondary indexes• GAE style multiple indexes • thinking from output to query
    • Use cases• Many of Google’s products• Facebook Messaging• StumbleUpon• OpenTSDB• Mahalo, Ning, Meetup, Twitter, Yahoo!• Lily -- open source CMS built on HBase & Solr
    • Brewer’s CAP Theorem• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
    • Distributed Systems & Consistency (case: success)
    • Distributed Systems & Consistency (case: failure)
    • Binding by Transactions
    • Consistency Spectrum
    • Inconsistency Window
    • RWN Math• R – Number of nodes that are read from.• W – Number of nodes that are written to.• N – Total number of nodes in the cluster.• In general: R < N and W < N for higher availability
    • R + W > N• Easy to determine consistent state• R + W = 2N • absolutely consistent, can provide ACID guarantee• In all cases when R + W > N there is some overlap between read and write nodes.
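The overlap claim can be checked by brute force on small clusters. The sketch below enumerates every possible read set of size R and write set of size W and tests whether they always share a node; `always_overlap` is a hypothetical helper name.

```python
from itertools import combinations

def always_overlap(n, r, w):
    """True if every read set of size r shares a node with every write set of size w."""
    nodes = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(nodes, r)
        for writes in combinations(nodes, w)
    )

for r, w in [(1, 3), (2, 2), (1, 1), (2, 1)]:
    print(f"N=3 R={r} W={w} R+W>N={r + w > 3} overlap={always_overlap(3, r, w)}")
```

For every combination tried, the read and write sets are guaranteed to intersect exactly when R + W > N, so at least one node in any read has seen the latest acknowledged write.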
    • R = 1, W = N• Suits workloads with more reads than writes• W = N: one node failure makes the entire system unavailable for writes
    • R = N, W = 1• W = 1: chance of data inconsistency quite high• R = N: reads only possible when all nodes in the cluster are available
    • R = W = ceiling((N + 1)/2)• Effective quorum for eventual consistency
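A short worked version of the quorum formula, as a sketch. The function names are hypothetical; `tolerated_failures` simply counts how many nodes can be down while a quorum of that size is still reachable.

```python
import math

def quorum_size(n):
    """R = W = ceiling((N + 1) / 2), the formula on the slide."""
    return math.ceil((n + 1) / 2)

def tolerated_failures(n):
    """Nodes that can be down while a quorum is still reachable."""
    return n - quorum_size(n)

for n in (3, 4, 5):
    q = quorum_size(n)
    print(f"N={n}: R=W={q}, R+W={2 * q}, tolerates {tolerated_failures(n)} failure(s)")
```

With this choice R + W always exceeds N, so reads and writes overlap, while neither operation needs every node to be up.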
    • Eventual consistency variants• Causal consistency -- if A writes and then informs B, B always sees the updated value• Read-your-writes consistency -- after A writes a new value, A never sees the old one• Session consistency -- read-your-writes consistency within a client session• Monotonic read consistency -- once a process has seen a new value, it is never returned a previous value• Monotonic write consistency -- writes by the same process are serialized
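One way to picture session consistency is a client that remembers the newest version it has written or read and refuses answers from replicas that lag behind it. The sketch below is a deliberately simplified model: `Replica` and `Session` are hypothetical classes, versions are plain integers, and replication between replicas is done by hand.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Replicas only ever move forward in version.
        if version > self.version:
            self.version, self.value = version, value

class Session:
    """Client session enforcing read-your-writes and monotonic reads."""

    def __init__(self):
        self.min_version = 0  # newest version this session has written or read

    def write(self, replica, value):
        version = replica.version + 1
        replica.apply(version, value)
        self.min_version = max(self.min_version, version)

    def read(self, replicas):
        # Use the first replica at least as fresh as what this session has seen.
        for replica in replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough yet")

fresh, stale = Replica(), Replica()
session = Session()
session.write(fresh, "v1")             # the write has not reached `stale` yet
value = session.read([stale, fresh])   # skips the stale replica
other = Session()
other_value = other.read([stale, fresh])  # a different session may still see old data
print(value, other_value)
```

The guarantee is scoped to the session: the writer never sees its own write disappear, while an unrelated client can still read the older state until replication catches up.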
    • Dynamo Techniques• Consistent Hashing (Incremental scalability)• Vector clocks (high availability for writes)• Sloppy quorum and hinted handoff (recover from temporary failure)• Gossip based membership protocol (periodic, pair wise, inter-process interactions, low reliability, random peer selection)• Anti-entropy using Merkle trees• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
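Of the techniques listed, vector clocks are the easiest to show in a few lines. The following is a minimal sketch, not Dynamo's implementation: clocks are plain dicts of node name to counter, and the node names are hypothetical.

```python
def increment(clock, node):
    """Return a copy of the clock with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a, b):
    """True if clock a has seen everything clock b has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"  # neither dominates: the versions must be reconciled

def merge(a, b):
    """Component-wise maximum, used after reconciling concurrent versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}

base = increment({}, "node_x")     # written once via node_x
left = increment(base, "node_y")   # then updated via node_y
right = increment(base, "node_z")  # and, independently, via node_z
print(compare(left, base), "|", compare(left, right))
```

Writes are never rejected in this scheme; conflicting versions are kept side by side and detected as concurrent, which is how the technique trades immediate consistency for write availability.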
    • Consistent Hashing
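A sketch of a consistent-hash ring with virtual nodes, to show why adding a node moves only part of the key space. Everything here is illustrative: `HashRing`, the node names, the choice of SHA-256, and 64 virtual nodes per physical node are assumptions, not details from the deck or from Dynamo.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self._vnodes = vnodes
        self._points = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node):
        # Each physical node is placed at several positions on the ring.
        for i in range(self._vnodes):
            point = self._hash(f"{node}#{i}")
            self._owners[point] = node
            bisect.insort(self._points, point)

    def lookup(self, key):
        """Return the node owning `key`: the first ring position clockwise from its hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[idx]]

keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node_a", "node_b", "node_c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node_d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that changes owner moves to the newly added node; keys never shuffle between the existing nodes, which is the incremental-scalability property named on the Dynamo slide.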
    • CouchDB MVCC Style• (Source: http://guide.couchdb.org/draft/consistency.html)
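The MVCC idea can be sketched as a store that accepts an update only when the caller presents the current revision. This is a simplified model rather than CouchDB itself: `MvccStore` and `Conflict` are hypothetical names, and revisions are plain integer counters instead of CouchDB's revision strings.

```python
class Conflict(Exception):
    """Raised when an update is based on a revision that is no longer current."""

class MvccStore:
    """Toy revision-checked document store."""

    def __init__(self):
        self._docs = {}  # doc id -> (rev, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev

store = MvccStore()
store.put("john", {"name": "John Doe", "zip": 10001})
rev_a, doc_a = store.get("john")  # two clients read the same revision
rev_b, doc_b = store.get("john")
doc_a["zip"] = 10002
store.put("john", doc_a, rev=rev_a)      # first writer succeeds
try:
    store.put("john", doc_b, rev=rev_b)  # second writer holds a stale revision
    outcome = "accepted"
except Conflict:
    outcome = "conflict"
print(outcome)
```

No locks are taken: readers always get a complete revision, and the losing writer must re-read and retry, which is the optimistic style the slide refers to.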
    • Key/value Stores• Memcached• Membase• Redis• Tokyo Cabinet• Kyoto Cabinet• Berkeley DB
    • Questions?• blog: shanky.org | twitter: @tshanky• st@treasuryofideas.com