NoSQL in Perspective




                           Jeff Smith
           jeffreyksmithjr@gmail.com
NoSQL on Wikipedia


92 databases
8 types
6 sub-types
Easy Questions
Is this a graph?
Do I already have XML or JSON?
Is this a caching problem?
Paul Graham
on Programming Languages


                    Lisp
    C
Math Problem
Lisp is just math.
Math doesn't get stale.
What in databases is just math?
Putting the R in RDBMSes
 Relation

            Attributes
 Tuples
Database Analogy
C is to Lisp
as
Relational Algebra is to Relational Calculus

C : Lisp :: Relational Algebra : Relational Calculus
Relational Algebra in Action
Relational Algebra:
R ⋉ S = { t : t ∈ R, s ∈ S, Fun(t ∪ s) }




SQL:
SELECT * FROM audience WHERE clue > 0;
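A minimal sketch of the semijoin R ⋉ S defined above, assuming relations are modeled as lists of Python dicts (attribute name to value). The helper name `semijoin` and the sample `employee` and `dept` relations are illustrative, not from the talk. Fun(t ∪ s) is read here as "t and s agree on every attribute they share".

```python
def semijoin(r, s):
    """Return the tuples of r that match at least one tuple of s.

    Two tuples match when they agree on every attribute they share,
    which is what makes their union a function (Fun(t ∪ s)).
    """
    result = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append(t)
                break  # one matching tuple in s is enough
    return result


# Hypothetical sample relations, for illustration only.
employee = [
    {"name": "Harry", "dept": "Finance"},
    {"name": "Sally", "dept": "Sales"},
    {"name": "George", "dept": "Production"},
]
dept = [
    {"dept": "Sales", "manager": "Harriet"},
    {"dept": "Production", "manager": "Charles"},
]

matched = semijoin(employee, dept)
```

Only the attributes of R survive in the result; S acts purely as a filter.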
Relational Calculus in Action?
Relational Calculus:
{ t : {name} | ∃ s : {name, wage} ( Employee(s) ∧ s.wage = 50.000 ∧ t.name = s.name ) }




Relevant Implemented Language:
This space under construction.
Relational Model Utility


Essentially, all models are wrong,
  but some are useful.
- George E. P. Box
When relations are wrong
Sparse data
Irregular data
Poorly understood interrelationships
No definable indexes
Big data
No vertically scalable hardware
Papers Read Around the World
Google's BigTable:
  http://research.google.com/archive/bigtable.html
Amazon's Dynamo:
  http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Lessons from
 Functional Programming
MapReduce:
  http://research.google.com/archive/mapreduce.html
MapReduce
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));       [1]
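The word-count pseudocode above can be approximated in a single process. This is a rough sketch, not a distributed implementation: `map_fn`, `reduce_fn`, and `run_mapreduce` are names chosen for illustration, and the in-memory grouping step stands in for the shuffle that a real framework performs across machines.

```python
from collections import defaultdict


def map_fn(key, value):
    # key: document name
    # value: document contents
    for word in value.split():
        yield word, "1"


def reduce_fn(key, values):
    # key: a word
    # values: an iterable of counts, each a string
    return str(sum(int(v) for v in values))


def run_mapreduce(documents):
    # Shuffle step: group every intermediate value under its key.
    groups = defaultdict(list)
    for name, contents in documents.items():
        for word, count in map_fn(name, contents):
            groups[word].append(count)
    # Reduce step: one call per distinct key.
    return {word: reduce_fn(word, counts) for word, counts in groups.items()}


# Hypothetical input documents, for illustration only.
counts = run_mapreduce({"a.txt": "to be or not to be", "b.txt": "to do"})
```

As in the pseudocode, counts travel as strings between the two phases, so `counts["to"]` is the string `"3"` rather than an integer.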
CAP Theorem


Consistency
Availability
Partition tolerance
CAP Theorem?
               Availability




 Consistency             Partition Tolerance
Sacrifice Availability




  Consistency        Partition Tolerance
Then, sacrifice what?
  Availability          Availability




  Consistency      Partition Tolerance
PACELC
In the event of a Partition,
    does the system prioritize
    Availability
    or
    Consistency?
Else
    does the system prioritize
    Latency
    or
    Consistency?
PACELC as a Tree




       Partition                     Else


Availability Consistency   Latency   Consistency
Traditional RDBMSes: PC/EC




   Partition                 Else


        Consistency          Consistency
Eventually Consistent: PA/EL




        Partition              Else


Availability         Latency
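The two PACELC choices can be written down as a small record, which makes labels such as PC/EC and PA/EL mechanical to derive. This is only a notation sketch; the `Pacelc` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pacelc:
    on_partition: str  # "A" (availability) or "C" (consistency)
    otherwise: str     # "L" (latency) or "C" (consistency)

    def __post_init__(self):
        if self.on_partition not in ("A", "C"):
            raise ValueError("on_partition must be 'A' or 'C'")
        if self.otherwise not in ("L", "C"):
            raise ValueError("otherwise must be 'L' or 'C'")

    def label(self):
        # Partition choice first, then the Else choice.
        return f"P{self.on_partition}/E{self.otherwise}"


traditional_rdbms = Pacelc(on_partition="C", otherwise="C")
eventually_consistent = Pacelc(on_partition="A", otherwise="L")
```

Here `traditional_rdbms.label()` gives `"PC/EC"` and `eventually_consistent.label()` gives `"PA/EL"`, matching the two slides above.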
ELC: Replication Options


1. Update all nodes
2. Update the master node first
3. Update an arbitrary node first
Best of both worlds?




     SQL
HadoopDB
MySQL Cluster
Riak Demo
N: number of copies persisted
R: number of copies that must answer a read
W: number of copies that must acknowledge a write
Strong Consistency: R + W > N
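The quorum condition is simple arithmetic: when R + W > N, every read set overlaps every write set in at least one copy, so a read always reaches a copy that saw the latest acknowledged write. A minimal sketch of the check follows; the function name and the sample values are illustrative and do not call any Riak API.

```python
def is_strongly_consistent(n, r, w):
    """Return True when read and write quorums must overlap (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


# Illustrative settings only.
quorum_reads_and_writes = is_strongly_consistent(n=3, r=2, w=2)  # 2 + 2 > 3
fast_but_eventual = is_strongly_consistent(n=3, r=1, w=1)        # 1 + 1 <= 3
```

Lowering R or W reduces the number of copies each request waits on, at the cost of the overlap guarantee.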
Thanks




                         Jeff Smith
         jeffreyksmithjr@gmail.com
