SDEC2011 NoSQL concepts and models

Transcript

  • 1. What Exactly is NoSQL?• Document databases, Column-family stores, Key-value pairs, more• Shashank Tiwari• blog: shanky.org | twitter: @tshanky• st@treasuryofideas.com
  • 2. NoSQL?
  • 3. NoSQL : Various Shapes and Sizes• Document Databases• Column-family Oriented Stores• Key/value Data stores• XML Databases• Object Databases• Graph Databases
  • 4. Document Databases• mostly MongoDB, little CouchDB
  • 5. What is a document db?• One that stores documents• Popular options: • MongoDB -- C++ • CouchDB -- Erlang • Also Amazon’s SimpleDB• ...what exactly is a document?
  • 6. In the real world• (Source: http://guide.couchdb.org/draft/why.html)
  • 7. In terms of JSON• {"name": "John Doe",• "zip": 10001}
  • 8. What about db schema?• Schema-less• Different documents could be stored in a single collection
  • 9. Data types: MongoDB• Essential JSON types:• string• integer• boolean• double
  • 10. Data types: MongoDB (...cont)• Additional JSON types• null, array and object• BSON types -- binary-encoded serialization of JSON-like documents • date, binary data, object id, regular expression and code • (Reference: bsonspec.org)
  • 11. A BSON example: object id
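The object id slide has no surviving text, so here is a minimal sketch of one property of the type: an ObjectId is 12 bytes, and its first four bytes are a big-endian count of seconds since the Unix epoch. The function name `objectid_timestamp` is made up for illustration, and the sample id is the one that appears in the read examples later in this deck.

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id: str) -> datetime:
    """Return the creation time embedded in a 24-character ObjectId hex string."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    # The leading 4 bytes hold seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```

Only the timestamp prefix is decoded here; the layout of the remaining eight bytes has changed between driver versions and is left alone.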
  • 12. Data types: CouchDB• Everything JSON• Large objects: attachments
  • 13. CRUD operations for documents• Create• Read• Update• Delete
  • 14. MongoDB: Create Document• use mydb• w = {name: "John Doe", zip: 10001};• db.location.save(w);
  • 15. Create db and collection• Lazily created• Implicitly created• use mydb• db.collection.save(w)
  • 16. MongoDB: Read Document• db.location.find({zip: 10001});• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 17. MongoDB: Read Document (...cont)• db.location.find({name: "John Doe"});• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
  • 18. MongoDB: Update Document• Atomic operations on single documents• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
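The create, read and update slides can be mimicked without a server. The sketch below is a toy in-memory stand-in, not MongoDB and not a driver API: `ToyCollection` and its methods are hypothetical names that copy the shape of `save`, `find` and `update` with `$set`, and a `remove` is added for the "Delete" item of the CRUD slide. It also shows the schema-less point: two documents with different fields live in the same collection.

```python
import copy
import itertools


class ToyCollection:
    """Hypothetical in-memory stand-in for a schema-less collection."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    @staticmethod
    def _matches(doc, query):
        # A document matches when every queried field has the queried value.
        return all(doc.get(field) == value for field, value in query.items())

    def save(self, doc):
        stored = copy.deepcopy(doc)
        stored.setdefault("_id", next(self._ids))  # assign an id if missing
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query=None):
        query = query or {}
        return [copy.deepcopy(d) for d in self._docs if self._matches(d, query)]

    def update(self, query, change):
        """Apply a {"$set": {...}} change to the first matching document."""
        for doc in self._docs:
            if self._matches(doc, query):
                doc.update(change["$set"])
                return 1
        return 0

    def remove(self, query):
        before = len(self._docs)
        self._docs = [d for d in self._docs if not self._matches(d, query)]
        return before - len(self._docs)


location = ToyCollection()
location.save({"name": "John Doe", "zip": 10001})
location.save({"name": "Acme HQ", "city": "Palo Alto", "tags": ["office"]})
location.update({"name": "John Doe"}, {"$set": {"name": "Jane Doe"}})
print(location.find({"zip": 10001}))
```

The update touches a single document in one step, which is the only atomicity this toy tries to mirror.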
  • 19. Indexes (explain)• db.ratings.find().explain();
  • 20. Indexes (explain output)• {• "cursor" : "BasicCursor",• "nscanned" : 1000209,• "nscannedObjects" : 1000209,• "n" : 1000209,• "millis" : 1549,• "indexBounds" : {
  • 21. Indexes (ensure index)• db.ratings.ensureIndex({ movie_id:1 });• db.ratings.ensureIndex({ movie_id:-1 });
  • 22. Indexes (explain when index used)• {• "cursor" : "BtreeCursor movie_id_1",• "nscanned" : 2077,• "nscannedObjects" : 2077,• "n" : 2077,• "millis" : 2,• "indexBounds" : {
  • 23. Indexes (get indexes)• db.ratings.getIndexes();
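The gap between the two explain outputs (every document scanned versus only the matching ones) can be reproduced in miniature. This is a sketch under stated assumptions: a sorted list searched with `bisect` stands in for the B-tree, the ratings data is randomly generated, and `full_scan`, `build_index` and `index_lookup` are illustrative names, not MongoDB internals.

```python
import bisect
import random

random.seed(7)
ratings = [
    {"movie_id": random.randint(1, 500), "rating": random.randint(1, 5)}
    for _ in range(10_000)
]


def full_scan(docs, movie_id):
    """Check every document; returns (hits, number of documents scanned)."""
    hits, scanned = [], 0
    for doc in docs:
        scanned += 1
        if doc["movie_id"] == movie_id:
            hits.append(doc)
    return hits, scanned


def build_index(docs, field):
    """Sorted (value, position) pairs: a stand-in for an ascending B-tree index."""
    return sorted((doc[field], pos) for pos, doc in enumerate(docs))


def index_lookup(docs, index, value):
    """Jump to the first entry for value, then walk only the matching entries."""
    i = bisect.bisect_left(index, (value, -1))
    hits = []
    while i < len(index) and index[i][0] == value:
        hits.append(docs[index[i][1]])
        i += 1
    return hits, len(hits)


movie_index = build_index(ratings, "movie_id")
scan_hits, scan_scanned = full_scan(ratings, 42)
idx_hits, idx_scanned = index_lookup(ratings, movie_index, 42)
print(f"full scan touched {scan_scanned} documents, index touched {idx_scanned}")
```

Both paths return the same documents; only the amount of work differs, which is what `nscanned` reports in the slides.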
  • 24. Sorted Ordered Column-family Datastores• Sorted• Ordered• Distributed• Map
  • 25. Essential schema
  • 26. Multi-dimensional View
  • 27. A Map/Hash View• {• "row_key_1" : { "name" : {• "first_name" : "Jolly", "last_name" : "Goodfellow"• },• "location" : { "zip": "94301" },
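The "sorted, ordered map" idea can be sketched as a nested map from row key to column family to column, with the row keys kept in sorted order so that range scans are cheap. `SortedRowStore` is a hypothetical toy for illustration and says nothing about how HBase or Bigtable are actually implemented; the second row is invented sample data.

```python
import bisect


class SortedRowStore:
    """Toy model: row key -> column family -> column -> value, rows kept sorted."""

    def __init__(self):
        self._keys = []  # row keys in sorted order
        self._rows = {}  # row key -> {family: {column: value}}

    def put(self, row_key, family, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].setdefault(family, {})[column] = value

    def get(self, row_key):
        return self._rows.get(row_key, {})

    def scan(self, start, stop):
        """Yield (row_key, row) for start <= row_key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


store = SortedRowStore()
store.put("row_key_2", "name", "first_name", "Very")  # inserted out of order
store.put("row_key_1", "name", "first_name", "Jolly")
store.put("row_key_1", "name", "last_name", "Goodfellow")
store.put("row_key_1", "location", "zip", "94301")

for key, row in store.scan("row_key_1", "row_key_3"):
    print(key, row)
```

Rows need not share columns: `row_key_2` has no `location` family at all, and nothing is stored for the missing cells.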
  • 28. Architectural View (HBase)
  • 29. The Persistence Mechanism
  • 30. The underlying file format
  • 31. Model Wrappers (The GAE Way)• Python • Model, Expando, PolyModel• Java • JDO, JPA
  • 32. HBase Data Access• Thrift + Avro• Java API -- HTable, HBaseAdmin• Hive (SQL like)• MapReduce -- sink and/or source
  • 33. Transactions• Atomic row level• GAE Entity Groups
  • 34. Indexes• Row ordered• Secondary indexes• GAE style multiple indexes • thinking from output to query
  • 35. Use cases• Many of Google’s products• Facebook Messaging• StumbleUpon • Open TSDB• Mahalo, Ning, Meetup, Twitter, Yahoo!• Lily -- open source CMS built on HBase & Solr
  • 36. Brewer’s CAP Theorem• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
  • 37. Distributed Systems & Consistency (case: success)
  • 38. Distributed Systems & Consistency (case: failure)
  • 39. Binding by Transactions
  • 40. Consistency Spectrum
  • 41. Inconsistency Window
  • 42. RWN Math• R – Number of nodes that are read from.• W – Number of nodes that are written to.• N – Total number of nodes in the cluster.• In general: R < N and W < N for higher availability
  • 43. R + W > N• Easy to determine consistent state• R + W = 2N • absolutely consistent, can provide ACID guarantee• In all cases when R + W > N there is some overlap between read and write nodes.
  • 44. R = 1, W = N• more reads than writes• W = N • 1 node failure = entire system unavailable
  • 45. R = N, W = 1• W = 1 • Chance of data inconsistency quite high• R = N • Read only possible when all nodes in the cluster are available
  • 46. R = W = ceiling((N + 1)/2)• Effective quorum for eventual consistency
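The R/W/N arithmetic from the preceding slides is easy to check by brute force. The sketch below assumes nodes are interchangeable and numbered 0..N-1; `quorum_size`, `overlaps` and `always_intersect` are illustrative helper names, not part of any datastore's API.

```python
import itertools
import math


def quorum_size(n: int) -> int:
    """R = W = ceiling((N + 1) / 2), the majority quorum from the slide."""
    return math.ceil((n + 1) / 2)


def overlaps(r: int, w: int, n: int) -> bool:
    """True when every read set must share a node with every write set."""
    return r + w > n


def always_intersect(r: int, w: int, n: int) -> bool:
    """Brute-force check: does every R-subset meet every W-subset of N nodes?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in itertools.combinations(nodes, r)
        for write_set in itertools.combinations(nodes, w)
    )


for n in (3, 4, 5):
    q = quorum_size(n)
    print(f"N={n}: R=W={q}, overlap guaranteed: {always_intersect(q, q, n)}")
```

With N = 3 the quorum is 2, so any two readers and any two writers share at least one node; with R = W = 1 they may miss each other entirely.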
  • 47. Eventual consistency variants• Causal consistency -- A writes and informs B, then B always sees the updated value• Read-your-writes consistency -- A writes a new value and never sees the old one• Session consistency -- read-your-writes consistency within a client session• Monotonic read consistency -- once a new value has been seen, a previous value is never returned• Monotonic write consistency -- writes by the same process are serialized
  • 48. Dynamo Techniques• Consistent Hashing (Incremental scalability)• Vector clocks (high availability for writes)• Sloppy quorum and hinted handoff (recover from temporary failure)• Gossip based membership protocol (periodic, pairwise, inter-process interactions, low reliability, random peer selection)• Anti-entropy using Merkle trees• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
  • 49. Consistent Hashing
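The slide itself is a diagram, so here is a minimal sketch of the technique: nodes and keys are hashed onto the same ring, and a key belongs to the first node point at or after its own hash, wrapping around at the end. The class name `HashRing`, the use of MD5, and the 64 points per node are assumptions chosen for illustration, not Dynamo's actual parameters.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 used only for spreading)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node is placed at several positions to even out the load.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's position, wrapping to the start.
        idx = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

This is the "incremental scalability" property from the Dynamo slide: adding a node relocates only the keys that the new node takes over, and every relocated key lands on the new node.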
  • 50. Vector clocks (a trivial example)• 4 hackers: Joe, Hillary, Eric and Ajay decide to meet up• Joe -- suggests Palo Alto (t0)• Hillary and Eric -- decide to meet in Mountain View (t1)• Eric and Ajay -- decide to meet in Los Altos (t2)• Joe mails: PA, Hillary responds: Mtn View, Ajay responds: Los Altos (t3) • both Hillary and Ajay say: Eric knows
  • 51. Vector clocks (how it works)• Venue : Palo Alto• Vector Clock: Joe (ver 1)• Venue: Mountain View• Vector Clock: Joe (ver 1), Hillary (ver 1), Eric (ver 1)• Venue: Los Altos• Vector Clock: Joe (ver 1), Ajay (ver 1), Eric (ver 1)
  • 52. Vector clock (resolution)• Venue : Palo Alto• Vector Clock: Joe (ver 1)• Venue: Mountain View• Vector Clock: Joe (ver 1), Hillary (ver 1), Ajay (ver 0), Eric (ver 2)• Venue: Los Altos• Vector Clock: Joe (ver 1), Hillary (ver 0), Ajay (ver 1), Eric (ver 1)
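The comparison rule behind the meetup example can be sketched in a few lines. A clock is modeled as a plain dict from actor to version, with a missing actor counting as version 0; the helper names `descends`, `compare`, `merge` and `increment` are illustrative, and the three clocks are taken from the "how it works" slide.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= version for actor, version in b.items())


def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"  # neither has seen the other: a conflict to resolve


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: the smallest clock that descends from both."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def increment(clock: dict, actor: str) -> dict:
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


palo_alto = {"Joe": 1}
mountain_view = {"Joe": 1, "Hillary": 1, "Eric": 1}
los_altos = {"Joe": 1, "Ajay": 1, "Eric": 1}

print(compare(mountain_view, palo_alto))
print(compare(mountain_view, los_altos))

# One way to resolve: Eric, who took part in both, picks a venue and writes
# it with a clock that supersedes both conflicting versions.
resolved = increment(merge(mountain_view, los_altos), "Eric")
print(resolved)
```

Both later venues supersede Palo Alto, but neither supersedes the other, which is why the group has to ask Eric. How the deck's own resolution slide assigns versions is not reproduced here; the last step is only one possible resolution.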
  • 53. CouchDB MVCC Style• (Source: http://guide.couchdb.org/draft/consistency.html)
  • 54. Key/value Stores• Memcached• Membase• Redis• Tokyo Cabinet• Kyoto Cabinet• Berkeley DB
  • 55. Redis -- a key-value data structure server• open source key-value store• a data structure server • values in key-value pairs can be strings, hashes, lists, sets, sorted sets
  • 56. Where to find it?• redis.io• download a copy from http://redis.io/download
  • 57. Who is building it?• Core developers • Salvatore Sanfilippo, twitter: @antirez • Pieter Noordhuis, twitter: @pnoordhuis• Main sponsor • VMware
  • 58. Written in• ANSI C • runs on POSIX compliant systems with no external dependencies
  • 59. How can it be used?• as an in memory data store • with option to persist to disk• in standalone mode or as a master-slave replicated set • Redis cluster -- coming soon! (June 2011)• as cache
  • 60. Redis Architecture
  • 61. Download and install• curl -O http://redis.googlecode.com/files/redis-2.2.0-rc4.tar.gz • (just a 436kb download)• tar zxvf redis-2.2.0-rc4.tar.gz• cd redis-2.2.0-rc4• make && make install (installs in /usr/local/bin)• make test (to be sure it was installed correctly)
  • 62. Start the redis-server• /usr/local/bin/redis-server• ...Server started, Redis version 2.1.12• ...The server is now ready to accept connections on port 6379
  • 63. Connect with redis-cli• /usr/local/bin/redis-cli• redis> set key1 val1• OK• redis> get key1• "val1"
  • 64. String key-value pairs• like memcached • with persistence• key and value -- binary-safe strings
  • 65. Binary-safe?• redis> set "a key _" "another value"• OK• redis> get "a key _"• "another value"
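One way to see why spaces (or any other bytes) are harmless in keys and values: in the Redis request protocol each argument is sent as a bulk string prefixed with its byte length, so the server never has to look for a delimiter inside the data. The sketch below only builds the bytes such a request would contain; `encode_command` is a hypothetical helper, not part of redis-cli or any client library.

```python
def encode_command(*args: bytes) -> bytes:
    """Encode a command as an array of length-prefixed bulk strings."""
    parts = [b"*%d\r\n" % len(args)]            # number of arguments
    for arg in args:
        parts.append(b"$%d\r\n" % len(arg))     # byte length of this argument
        parts.append(arg + b"\r\n")             # the raw bytes, unmodified
    return b"".join(parts)


request = encode_command(b"SET", b"a key _", b"another value")
print(request)

# Embedded NUL bytes and even CRLF are fine: the length prefix says how many
# bytes belong to the argument.
tricky = encode_command(b"SET", b"k", b"\x00\r\n\xff")
print(tricky)
```

Because the length is counted in bytes, the same encoding works for text in any encoding as well as for arbitrary binary payloads.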
  • 66. Questions?• blog: shanky.org | twitter: @tshanky• st@treasuryofideas.com
