What Exactly is NoSQL?
Document databases, Column-family stores, Key-value pairs, more


Shashank Tiwari
blog: shanky.org | twitter: @tshanky
st@treasuryofideas.com
NoSQL?
NoSQL : Various Shapes and Sizes

• Document Databases


• Column-family Oriented Stores


• Key/value Data stores


• XML Databases


• Object Databases


• Graph Databases
Document Databases




• mostly MongoDB, a little CouchDB
What is a document db?

• One that stores documents


• Popular options:


  • MongoDB -- C++


  • CouchDB -- Erlang


  • Also Amazon’s SimpleDB


• ...what exactly is a document?
In the real world




• (Source: http://guide.couchdb.org/draft/why.html)
In terms of JSON

• {name: "John Doe", zip: 10001}
What about db schema?

• Schema-less


• Different documents could be stored in a single collection
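
A minimal sketch of what "schema-less" means in practice. A plain Python list stands in for a collection here; this is not the MongoDB API, and the second document's fields (`city`, `tags`) are made up for illustration.

```python
# A plain list stands in for a collection; each dict stands in for a document.
location = []

# Two documents with different fields sit side by side: no schema is declared.
location.append({"name": "John Doe", "zip": 10001})
location.append({"name": "Jane Doe", "city": "Palo Alto", "tags": ["home", "work"]})

# Each document carries its own set of fields.
field_sets = [sorted(doc.keys()) for doc in location]
print(field_sets)
```

The cost of this flexibility is that the application, not the database, has to cope with documents that lack a field.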
Data types: MongoDB

• Essential JSON types:


• string


• integer


• boolean


• double
Data types: MongoDB (...cont)

• Additional JSON types


• null, array and object


• BSON types -- binary-encoded serialization of JSON-like documents


   • date, binary data, object id, regular expression and code


   • (Reference: bsonspec.org)
A BSON example: object id
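
An object id is a 12-byte BSON value whose first 4 bytes hold a big-endian Unix timestamp in seconds. The sketch below decodes only that timestamp. The helper name `object_id_timestamp` is an assumption for illustration, and the remaining 8 bytes are left undecoded because their layout has varied between MongoDB versions.

```python
from datetime import datetime, timezone

def object_id_timestamp(hex_id: str) -> datetime:
    """Return the creation time encoded in the first 4 bytes of an object id."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an object id is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[:4], byteorder="big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Sample id taken from the find() output shown in this deck.
created = object_id_timestamp("4c97053abe67000000003857")
print(created.isoformat())
```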
Data types: CouchDB

• Everything JSON


• Large objects: attachments
CRUD operations for documents

• Create


• Read


• Update


• Delete
MongoDB: Create Document

• use mydb


• w = {name: "John Doe", zip: 10001};


• db.location.save(w);
Create db and collection

• Lazily created


• Implicitly created


• use mydb


• db.collection.save(w)
MongoDB: Read Document

• db.location.find({zip: 10001});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Read Document (...cont)

• db.location.find({name: "John Doe"});


• { "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe",
  "zip" : 10001 }
MongoDB: Update Document

• Atomic operations on single documents


• db.location.update( { name:"John Doe" }, { $set: { name: "Jane Doe" } } );
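
The create, read and update calls above can be mimicked with a small in-memory stand-in. This is a sketch only: the `Collection` class is hypothetical, it is not the MongoDB driver, and it covers just query-by-example matching and the `$set` modifier, applied to the first matching document.

```python
import itertools

class Collection:
    """Tiny in-memory stand-in for a document collection (not the MongoDB driver)."""

    def __init__(self):
        self._docs = []
        self._ids = itertools.count(1)

    def save(self, doc):
        # Assign an _id on insert, as a document store would.
        stored = dict(doc)
        stored.setdefault("_id", next(self._ids))
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query):
        # Query by example: every key in the query must match exactly.
        return [d for d in self._docs
                if all(d.get(k) == v for k, v in query.items())]

    def update(self, query, change):
        # Apply a $set modifier to the first matching document only.
        for d in self._docs:
            if all(d.get(k) == v for k, v in query.items()):
                d.update(change.get("$set", {}))
                return 1
        return 0

location = Collection()
location.save({"name": "John Doe", "zip": 10001})
updated = location.update({"name": "John Doe"}, {"$set": {"name": "Jane Doe"}})
after = location.find({"zip": 10001})
print(updated, after)
```

Because `$set` touches only the named field, the `zip` value and the `_id` survive the update.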
Indexes
(explain)

• db.ratings.find().explain();
Indexes
(explain output)

• {
     "cursor" : "BasicCursor",
     "nscanned" : 1000209,
     "nscannedObjects" : 1000209,
     "n" : 1000209,
     "millis" : 1549,
     "indexBounds" : {
Indexes
(ensure index)

• db.ratings.ensureIndex({ movie_id:1 });


• db.ratings.ensureIndex({ movie_id:-1 });
Indexes
(explain when index used)

• {
     "cursor" : "BtreeCursor movie_id_1",
     "nscanned" : 2077,
     "nscannedObjects" : 2077,
     "n" : 2077,
     "millis" : 2,
     "indexBounds" : {
Indexes
(get indexes)

• db.ratings.getIndexes();
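
The gap between the two explain outputs (every document scanned versus only the matching ones) can be reproduced in a few lines. This sketch uses made-up data and a hash map as the index; MongoDB itself uses a B-tree, so the code illustrates the effect rather than the implementation.

```python
from collections import defaultdict

# Hypothetical ratings data: 10,000 documents spread over 100 movie ids.
ratings = [{"movie_id": i % 100, "rating": i % 5 + 1} for i in range(10_000)]

def full_scan(docs, movie_id):
    """Without an index every document is examined (like a BasicCursor)."""
    scanned = 0
    hits = []
    for doc in docs:
        scanned += 1
        if doc["movie_id"] == movie_id:
            hits.append(doc)
    return hits, scanned

def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc[field]].append(pos)
    return index

def index_lookup(docs, index, value):
    """With an index only the matching documents are touched."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)

movie_index = build_index(ratings, "movie_id")
scan_hits, scan_count = full_scan(ratings, 42)
idx_hits, idx_count = index_lookup(ratings, movie_index, 42)
print(scan_count, idx_count)
```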
Sorted Ordered Column-family Datastores

• Sorted


• Ordered


• Distributed


• Map
Essential schema
Multi-dimensional View
A Map/Hash View

• {
    "row_key_1" : {
      "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
      "location" : { "zip": "94301" },
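
A minimal sketch of the sorted-map view, assuming a nested layout of row key -> column family -> column -> value. Rows other than `row_key_1` are invented for illustration, and a real column-family store keeps this map distributed and on disk rather than in a dict.

```python
from bisect import bisect_left

# row key -> column family -> column -> value
table = {
    "row_key_2": {"name": {"first_name": "Happy", "last_name": "Camper"},
                  "location": {"zip": "10001"}},
    "row_key_1": {"name": {"first_name": "Jolly", "last_name": "Goodfellow"},
                  "location": {"zip": "94301"}},
    "row_key_3": {"name": {"first_name": "Lucky"}},  # sparse: no location family
}

# Rows are kept sorted by key, which makes range scans cheap.
sorted_keys = sorted(table)

def scan(start, stop):
    """Return (row_key, row) pairs with start <= row_key < stop, in key order."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [(k, table[k]) for k in sorted_keys[lo:hi]]

rows = scan("row_key_1", "row_key_3")
zip_code = table["row_key_1"]["location"]["zip"]
print([k for k, _ in rows], zip_code)
```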
Architectural View (HBase)
The Persistence Mechanism
The underlying file format
Model Wrappers (The GAE Way)

• Python


  • Model, Expando, PolyModel


• Java


  • JDO, JPA
HBase Data Access

• Thrift + Avro


• Java API -- HTable, HBaseAdmin


• Hive (SQL like)


• MapReduce -- sink and/or source
Transactions

• Atomic row level


• GAE Entity Groups
Indexes

• Row ordered


• Secondary indexes


• GAE style multiple indexes


  • thinking from output to query
Use cases

• Many of Google’s products


• Facebook Messaging


• StumbleUpon


  • Open TSDB


• Mahalo, Ning, Meetup, Twitter, Yahoo!


• Lily -- open source CMS built on HBase & Solr
Brewer’s CAP Theorem




• http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf


• http://theory.lcs.mit.edu/tds/papers/Gilbert/Brewer6.ps
Distributed Systems & Consistency (case: success)
Distributed Systems & Consistency (case: failure)
Binding by Transactions
Consistency Spectrum
Inconsistency Window
RWN Math

• R – Number of nodes that are read from.


• W – Number of nodes that are written to.


• N – Total number of nodes in the cluster.




• In general: R < N and W < N for higher availability
R+W>N

• Easy to determine consistent state


• R + W = 2N


  • absolutely consistent, can provide ACID guarantees


• In all cases when R + W > N there is some overlap between read and write
  nodes.
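
The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible read set and write set; the function name `always_overlap` is an assumption for illustration.

```python
from itertools import combinations

def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible read set shares a node with every write set."""
    nodes = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))

# N = 3: R = 2, W = 2 gives R + W > N; R = 1, W = 2 gives R + W = N.
print(always_overlap(3, 2, 2), always_overlap(3, 1, 2))
```

When R + W <= N, some read set misses the latest write entirely, which is where stale reads come from.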
R = 1, W = N

• suits workloads with more reads than writes


• W = N


  • 1 node failure = writes fail; the system is unavailable for updates
R = N, W = 1

• W = 1


 • Chance of data inconsistency quite high


• R = N


 • Read only possible when all nodes in the cluster are available
R = W = ceiling ((N + 1)/2)
Effective quorum for eventual consistency
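
A quick check of the quorum formula, as a sketch. The helper name `majority_quorum` is an assumption; the arithmetic is just the expression on this slide.

```python
import math

def majority_quorum(n: int) -> int:
    """Smallest R = W that still satisfies R + W > N."""
    return math.ceil((n + 1) / 2)

for n in (3, 4, 5):
    q = majority_quorum(n)
    # n - q nodes may be down while reads and writes still reach a quorum.
    print(n, q, n - q)
```

With N = 5, for example, R = W = 3 and two nodes can fail without blocking reads or writes.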
Eventual consistency variants

• Causal consistency -- if A writes and then informs B, B always sees the
  updated value


• Read-your-writes consistency -- after A writes a new value, A never sees the
  old one


• Session consistency -- read-your-writes consistency within a client session


• Monotonic read consistency -- once a process has seen a new value, it never
  sees an earlier one


• Monotonic write consistency -- writes by the same process are serialized
Dynamo Techniques

• Consistent Hashing (Incremental scalability)


• Vector clocks (high availability for writes)


• Sloppy quorum and hinted handoff (recover from temporary failure)


• Gossip-based membership protocol (periodic, pairwise, inter-process
  interactions, low reliability, random peer selection)


• Anti-entropy using Merkle trees


• (source: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
Consistent Hashing
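
A minimal sketch of a consistent-hash ring, to show why it gives incremental scalability: adding a node moves only the keys that the new node takes over. The class name `HashRing`, the node names, the use of SHA-256 and the count of 64 virtual nodes are all illustrative assumptions, not details from the Dynamo paper.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=64):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _position(label: str) -> int:
        return int(hashlib.sha256(label.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each physical node is placed on the ring at several positions.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key.
        idx = bisect.bisect_left(self._ring, (self._position(key), ""))
        return self._ring[idx % len(self._ring)][1]

keys = [f"key{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

With a plain `hash(key) % N` scheme, by contrast, changing N would reassign most keys.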
Vector clocks (a trivial example)

• 4 hackers: Joe, Hillary, Eric and Ajay decide to meet up


• Joe -- suggests Palo Alto (t0)


• Hillary and Eric -- decide to meet in Mountain View (t1)


• Eric and Ajay -- decide to meet in Los Altos (t2)


• Joe mails: PA, Hillary responds: Mtn View, Ajay responds: Los Altos (t3)


   • both Hillary and Ajay say: Eric knows
Vector clocks (how it works)

• Venue : Palo Alto


• Vector Clock: Joe (ver 1)


• Venue: Mountain View


• Vector Clock: Joe (ver 1), Hillary (ver 1), Eric (ver 1)


• Venue: Los Altos


• Vector Clock: Joe (ver 1), Ajay (ver 1), Eric (ver 1)
Vector clock (resolution)

• Venue : Palo Alto


• Vector Clock: Joe (ver 1)


• Venue: Mountain View


• Vector Clock: Joe (ver 1), Hillary (ver 1), Ajay (ver 0), Eric (ver 2)


• Venue: Los Altos


• Vector Clock: Joe (ver 1), Hillary (ver 0), Ajay (ver 1), Eric (ver 1)
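
The clocks above can be compared mechanically. The sketch below uses the standard component-wise comparison, with a missing entry treated as version 0; the function names are assumptions for illustration. Under that rule both later venues descend from Palo Alto, while Mountain View and Los Altos are concurrent, so that conflict cannot be resolved by the clocks alone and someone (Eric, in the example) has to reconcile it.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a >= b component-wise)."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"
    if descends(b, a):
        return "b descends from a"
    return "concurrent"

palo_alto = {"Joe": 1}
mountain_view = {"Joe": 1, "Hillary": 1, "Ajay": 0, "Eric": 2}
los_altos = {"Joe": 1, "Hillary": 0, "Ajay": 1, "Eric": 1}

print(compare(mountain_view, palo_alto))
print(compare(mountain_view, los_altos))
```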
CouchDB MVCC Style




• (Source: http://guide.couchdb.org/draft/consistency.html)
Key/value Stores

• Memcached


• Membase


• Redis


• Tokyo Cabinet


• Kyoto Cabinet


• Berkeley DB
Redis -- a key-value data structure server

• open source key-value store


• a data structure server


   • values in key-value pairs can be strings, hashes, lists, sets, sorted sets
Where to find it?

• redis.io


• download a copy from http://redis.io/download
Who is building it?

• Core developers


  • Salvatore Sanfilippo, twitter: @antirez


  • Pieter Noordhuis, twitter: @pnoordhuis


• Main sponsor


  • VMware
Written in

• ANSI C


  • runs on POSIX compliant systems with no external dependencies
How can it be used?

• as an in memory data store


  • with option to persist to disk


• in standalone mode or as a master-slave replicated set


  • Redis cluster -- coming soon! (June 2011)


• as cache
Redis Architecture
Download and install

• curl -O http://redis.googlecode.com/files/redis-2.2.0-rc4.tar.gz


      • (just a 436kb download)


• tar zxvf redis-2.2.0-rc4.tar.gz


• cd redis-2.2.0-rc4


• make && make install (installs in /usr/local/bin)


• make test (to be sure you installed it correctly)
Start the redis-server

• /usr/local/bin/redis-server




• ...Server started, Redis version 2.1.12


• ...The server is now ready to accept connections on port 6379
Connect with redis-cli

• /usr/local/bin/redis-cli




• redis> set key1 val1


• OK


• redis> get key1


• "val1"
String key-value pairs

• like memcached


   • with persistence


• key and value -- binary-safe strings
Binary-safe?

• redis> set "a key _" "another value"


• OK


• redis> get "a key _"


• "another value"
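
One way to see why keys and values can be binary-safe: the Redis wire protocol sends each argument as a length-prefixed bulk string, so spaces, NUL bytes and even CRLF inside a key need no escaping. The sketch below only builds the request bytes and does not talk to a server; the helper name `encode_command` is an assumption for illustration.

```python
def encode_command(*args: bytes) -> bytes:
    """Encode a command as an array of length-prefixed bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)

# The command from the slide: the key contains spaces.
wire = encode_command(b"SET", b"a key _", b"another value")

# A key holding a NUL byte and a CRLF is framed the same way.
binary = encode_command(b"SET", b"k\x00\r\n", b"\xff\x00")
print(wire)
```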
Questions?




• blog: shanky.org | twitter: @tshanky


• st@treasuryofideas.com

SDEC2011 NoSQL concepts and models
