Agenda
What is NoSQL
Databases Overview
Aggregate Data Models
Distribution Models
Consistency
NWR
Purpose of this talk
Just to share some information
To have an enjoyable time
Facilitate the discussion
(questions are welcome)
Rise of NoSQL
Inspired by 2 papers:
Amazon Dynamo
Google BigTable
What is NoSQL
Not a well-defined term
(originally just the name of a single meetup
in San Francisco in 2009)
So, what does it stand for?
It is better to pay attention to what it
means rather than what it stands for
Common characteristics of
NoSQL
● Don't use SQL as a query language
(they provide their own query mechanisms)
● Non relational
● Open-source projects
● Run on clusters
● Developed in the 21st century
● Schemaless
Schemaless
Even when the database is schemaless, there is
still an implicit schema in the application code
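The implicit schema can be illustrated with a minimal sketch. A plain dict stands in for a schemaless store here, and the field names (`name`, `email`) and the `send_welcome` function are hypothetical: the store accepts any shape, but the application code still assumes certain fields.

```python
# A plain dict stands in for a schemaless key-value/document store (assumption).
store = {}

# The store accepts documents of any shape without complaint.
store["user:1"] = {"name": "Ann", "email": "ann@example.com"}
store["user:2"] = {"full_name": "Bob"}  # different field names, still accepted


def send_welcome(user_id):
    """Application code that relies on 'name' and 'email' being present."""
    doc = store[user_id]
    missing = [field for field in ("name", "email") if field not in doc]
    if missing:
        # The implicit schema surfaces here, at read time, not at write time.
        raise ValueError(f"{user_id} is missing fields: {missing}")
    return f"Welcome {doc['name']} <{doc['email']}>"
```

Calling `send_welcome("user:1")` works, while `send_welcome("user:2")` fails: the schema did not go away, it moved into the code.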
Why do you use NoSQL
To operate on big data spread across
multiple machines in a cluster
Increase developer productivity
(even if there is no demand for big data)
What is wrong with traditional
RDBMS
● Nothing really, they will not disappear
(who knows ;)
● Well-defined tools
(there is even a whole profession behind
them: the DBA)
● The choice is not black and white: NoSQL
and RDBMS will continue to work closely
together, hence the rise of Polyglot
Persistence
But RDBMS is not perfect
Impedance mismatch
Running on a cluster is a challenge
NoSQL World (major ones)
Document Oriented
Key-Value
Column-Family
Graph Databases
Data Model
Aggregate Oriented VS Relational
- Access by key
- Make it easier to manage data storage over
clusters
- Usually you adapt your aggregate/data model to
the query patterns your application has
Aggregate – a collection of related objects that we wish to treat as a unit
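As a rough illustration of an aggregate, the sketch below stores an order together with its line items and shipping address under one key. The field names and the dict-based `store` are assumptions made for the example, not any particular database's API.

```python
import json

# One order aggregate: the order, its line items and its shipping address
# travel together as a single unit (all field names are illustrative).
order = {
    "id": "order:1001",
    "customer_id": "customer:42",
    "shipping_address": {"city": "Amsterdam", "country": "NL"},
    "items": [
        {"sku": "book-123", "qty": 1, "price": 30.0},
        {"sku": "pen-7", "qty": 2, "price": 1.5},
    ],
}

# A dict stands in for an aggregate-oriented store: one key, one aggregate.
store = {order["id"]: json.dumps(order)}

# Access by key returns the whole unit; no joins are needed to rebuild it.
loaded = json.loads(store["order:1001"])
total = sum(item["qty"] * item["price"] for item in loaded["items"])
```

In a relational model the same data would typically be split across orders, order lines and addresses tables and reassembled with joins.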
ACID
NoSQL has ACID, but only within the scope
of one aggregate
(we can atomically manipulate a single
aggregate at a time)
Graph databases typically do support full ACID transactions
Distribution Models
● Single Server (no distribution at all)
● Sharding (can be combined with replication)
(shard key – range based or hash based)
● Master-Slave Replication (“read” scalability)
(writes to M, reads can be done from S)
(M – single point of failure)
● Peer-to-Peer Replication (common in column-family stores)
(consistency issue)
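The two shard key strategies mentioned above (range based and hash based) can be sketched as follows. The shard names, the range boundaries and both function names are hypothetical, and real systems usually add rebalancing or consistent hashing on top.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def hash_shard(key):
    """Hash based: spreads keys evenly, but neighbouring keys scatter."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Range based: each shard owns a contiguous slice of the key space.
# Keys below "h" go to shard-0, keys from "h" up to (not including) "p"
# go to shard-1, and the rest go to shard-2.
RANGE_BOUNDS = ["h", "p"]


def range_shard(key):
    """Range based: keeps neighbouring keys together, good for range scans."""
    return SHARDS[bisect_right(RANGE_BOUNDS, key)]
```

Range-based keys keep related data on the same shard but can create hot spots; hash-based keys balance load at the cost of scattering range queries.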
(Eventual) Consistency
The actual trade-off is between latency and consistency
NWR
● N – number of nodes to replicate to
(replication factor, number of copies in
the cluster)
● W – number of nodes that must acknowledge
a write before it is considered successful
● R – number of nodes that must respond to
a read before it is considered successful
NWR
● W + R <= N – eventual consistency
(eventually all the nodes in the cluster will get
the data)
● W = N, R = 1 – consistency by writes
(what RDBMS does)
● W = 1, R = N – consistency by reads
(conflicts must be resolved somehow)
● W + R > N – consistency by quorum
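The four cases above can be written as a small classifier. This is only a sketch using the labels from this slide; the function name `classify` is an assumption. Note that W = N, R = 1 and W = 1, R = N are themselves special cases of W + R > N.

```python
def classify(n, w, r):
    """Label an (N, W, R) setting using the categories from the slide."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    if w + r <= n:
        # A read set may miss every node that took the latest write.
        return "eventual consistency"
    if w == n and r == 1:
        return "consistency by writes"
    if w == 1 and r == n:
        return "consistency by reads"
    # Any other setting with W + R > N: read and write sets must overlap.
    return "consistency by quorum"
```

For example, with N = 3, `classify(3, 1, 1)` falls in the eventual consistency case and `classify(3, 2, 2)` in the quorum case.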
Quorum (W+R > N)
Read from more than half and
write to more than half of the N replicas
(QUORUM = floor(N/2) + 1)
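A short sketch of why the quorum rule works: if W + R > N, every possible write set shares at least one node with every possible read set, so a read always reaches a node that saw the latest write. The helper names `quorum` and `always_overlap` are assumptions, and the check is brute force, so it is only meant for small N.

```python
from itertools import combinations


def quorum(n):
    """Smallest majority of n replicas: floor(n / 2) + 1."""
    return n // 2 + 1


def always_overlap(n, w, r):
    """Brute-force check that every write set shares a node with every read set."""
    nodes = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )
```

With N = 3 the quorum is 2, and `always_overlap(3, 2, 2)` holds, whereas `always_overlap(3, 1, 1)` does not.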
Books

NoSQL Talk at eBuddy
