Brad Anderson presented on NoSQL databases and CouchDB. He discussed how relational databases scale poorly and impose rigid schemas, and why NoSQL databases like CouchDB are a better fit for large, growing datasets. CouchDB is a document-oriented database written in Erlang that exposes a RESTful JSON API and supports MapReduce views and incremental replication. It can be deployed on a cloud platform to improve scalability, redundancy, and query distribution.
5. RELATIONAL DATABASES
RDBMS
• Rigid Schema / ORM fun
1970-2009
• Scalability
• Everything is a Nail
http://www.flickr.com/photos/36041246@N00/3419197777/
6. MASTER-SLAVE
• Master-Slave Replication
• One (and only one) master
• One or more slaves
• All writes go to the master, replicated to slaves
• Reads balanced among master and slaves
• Issues
• single point of failure
• single point of bottleneck
• static topology
7. MASTER-MASTER
• Master-Master Replication
• One or more masters
• Writes and reads can go to any master
• Writes are replicated among masters
• Issues
• limited performance and scalability (typically due to 2PC)
• complexity
• static topology
8. VERTICAL PARTITION
• Vertical Partitioning
• Put tables belonging to different functional areas on different database nodes
• Scale data & load by function
• Move joins to the application level
• Issues
• no longer truly relational
• a functional area grows too much
9. HORIZONTAL PARTITION
• Horizontal Partitioning
• Split tables by key and put partitions (shards) on different nodes
• Scale data & load by key
• Move joins to the application level
• Issues
• no longer truly relational
• a partition grows too much
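The key-based split above can be sketched in a few lines. This is a minimal, hypothetical illustration — the hash function, shard count, and names are mine, not from the talk:

```javascript
// Minimal sketch of horizontal partitioning (sharding) by key.
// djb2-style string hash; real systems use stronger hash functions.
function shardFor(key, numShards) {
  let h = 5381;
  for (const ch of key) h = ((h * 33) ^ ch.charCodeAt(0)) >>> 0;
  return h % numShards; // deterministic shard index for this key
}

// Each user row lives on exactly one shard, chosen by its key --
// which is why cross-shard joins must move to the application level.
const numShards = 4;
const usersByShard = Array.from({ length: numShards }, () => []);
for (const user of ["alice", "bob", "carol", "dave"]) {
  usersByShard[shardFor(user, numShards)].push(user);
}
```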
10. CACHING
• Put a cache in front of your database
• Distribute
• Write-through for scaling reads
• Write-behind for scaling reads and writes
• Issues
• “only” scales read/write load
• invalidation
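The write-through vs. write-behind distinction can be sketched like this — a toy cache, with all class and variable names invented for illustration:

```javascript
// Sketch of write-through caching: every write updates the cache and the
// database synchronously, so reads can be served from the cache.
class WriteThroughCache {
  constructor(db) { this.db = db; this.cache = new Map(); }
  write(key, value) {
    this.cache.set(key, value); // cache updated...
    this.db.set(key, value);    // ...and the database, before returning
  }
  read(key) {
    if (this.cache.has(key)) return this.cache.get(key); // cache hit
    const value = this.db.get(key);                      // cache miss
    this.cache.set(key, value);
    return value;
  }
}
// Write-behind would instead queue the db.set and flush later, scaling
// writes too -- at the cost of a window where the database is stale.
```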
12. NOSQL
NOT ONLY SQL
A moniker for different data storage systems
solving very different problems,
all where a relational database is not the right fit.
13. RIGHT FIT
• Google indexes 400 PB / day (2007)
• CERN’s LHC generates 100 PB / sec
• Unique data created each year (IDC, 2007)
• 2007: 40 EB
• 2010: 988 EB (exponential growth)
15. BIG TAKEAWAY
[Diagram: small “function” units interleaved among many “data” blocks across nodes]
Bring the function to the data
17. HUH? ERLANG?
• Programming language created at Ericsson (20 yrs old now)
• Designed for scalable, long-lived systems
• Compiled, Functional, Dynamically Typed, Open Source
18. 3 BIGGIES
• Massively Concurrent
• green processes: very lightweight, not OS threads
• Seamlessly Distributed
• node = OS process = VM; Erlang processes can live anywhere
• Fault Tolerant
• 99.9999999% uptime ≈ 31 ms downtime per year - AXD301
20. COUCHDB
• Schema-free document database server
• Robust, highly concurrent, fault-tolerant
• RESTful JSON API
• Futon web admin console
• MapReduce system for generating custom views
• Bi-directional incremental replication
• couchapp: lightweight HTML+JavaScript apps served directly from CouchDB, using views to transform JSON
21. FROM INTEREST TO ADOPTION
• 100+ production users
• 3 books being written
• Active commercial development
• Rapidly maturing
• Vibrant, open community
22. OF THE WEB
Django may be built for the Web, but
CouchDB is built of the Web. I've never
seen software that so completely embraces
the philosophies behind HTTP ... this is
what the software of the future looks like.
Jacob Kaplan-Moss
October 17 2007
http://jacobian.org/writing/of-the-web/
23. DOCUMENTS
• Documents are JSON Objects
• Underscore-prefixed fields are reserved
• Documents can have binary attachments
• MVCC _rev deterministically generated from doc content
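A document along these lines might look like the following. The field values are invented for illustration — only the underscore-prefixed names (`_id`, `_rev`, `_attachments`) are CouchDB's:

```javascript
// A CouchDB document is just a JSON object. Underscore-prefixed fields
// are reserved by the server.
const doc = {
  _id: "nosql-talk",                 // document id
  _rev: "1-946b7d1c",                // MVCC revision, derived from content
  type: "presentation",              // ordinary application fields...
  title: "NoSQL and CouchDB",
  tags: ["nosql", "couchdb", "erlang"],
  _attachments: {                    // binary attachments live here
    "slides.pdf": { content_type: "application/pdf", stub: true }
  }
};

// The reserved fields are exactly the underscore-prefixed ones:
const reserved = Object.keys(doc).filter((k) => k.startsWith("_"));
```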
24. ROBUST
• Never overwrite previously committed data
• In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”
• Take snapshots with “cp”
• Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput
25. CONCURRENT
• Erlang approach: lightweight processes to model the natural concurrency in a problem
• For CouchDB that means one process per TCP connection
• Lock-free architecture; each process works with an MVCC snapshot of a DB
• Performance degrades gracefully under heavy concurrent load
26. REST API
• Create: PUT /mydb/mydocid
• Retrieve: GET /mydb/mydocid
• Update: PUT /mydb/mydocid
• Delete: DELETE /mydb/mydocid
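The CRUD-to-HTTP mapping can be sketched as a tiny helper. This builds the method + URL pairs rather than sending them, so it runs without a server; the helper name is hypothetical:

```javascript
// Sketch: map CouchDB CRUD operations onto HTTP requests.
// Note: updates and deletes must also carry the document's current _rev
// (in the body or ?rev= query string) to pass MVCC conflict detection.
const couchRequest = (op, db, docid) => {
  const methods = { create: "PUT", retrieve: "GET", update: "PUT", delete: "DELETE" };
  return { method: methods[op], url: `/${db}/${docid}` };
};

// With a real server, you would send the request, e.g.:
//   curl -X PUT http://localhost:5984/mydb/mydocid -d '{"hello":"world"}'
```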
28. VIEWS
• Custom, persistent representations of document data
• “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting
• Generated using MapReduce functions written in JavaScript (and other languages)
• Each view must have a map function and may also have a reduce function
• Leverages view collation, rich view query API
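A view's map and reduce functions look like this. The `emit` shim and the sample docs are mine, so the sketch runs locally; CouchDB's actual server invokes these with grouped keys and a rereduce pass:

```javascript
// Sketch of a CouchDB view: map emits key/value rows per document,
// and an optional reduce folds the emitted values.
const rows = [];
const emit = (key, value) => rows.push({ key, value }); // local shim

// Map: one row per tag per document.
function map(doc) {
  if (doc.tags) for (const tag of doc.tags) emit(tag, 1);
}

// Reduce: sum the values for a key (also valid as a rereduce step).
function reduce(keys, values, rereduce) {
  return values.reduce((a, b) => a + b, 0);
}

const docs = [
  { _id: "a", tags: ["nosql", "erlang"] },
  { _id: "b", tags: ["nosql"] },
];
docs.forEach(map);
const nosqlCount = reduce(
  null,
  rows.filter((r) => r.key === "nosql").map((r) => r.value),
  false
);
```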
31. INCREMENTAL
• Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date
• Leaf nodes store map results, inner nodes store reductions of children
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
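Why inner-node reductions make view updates incremental can be sketched with a toy two-level tree (a deliberate simplification of the real B-tree):

```javascript
// Sketch: leaves hold map output, inner nodes cache the reduction of their
// children, so updating one leaf only re-reduces along that leaf's path.
const sum = (values) => values.reduce((a, b) => a + b, 0);

// Toy two-level "B-tree": leaves of map values, one inner node per leaf.
const leaves = [[1, 2], [3, 4], [5]];
const innerReductions = leaves.map(sum); // cached per inner node: 3, 7, 5
let root = sum(innerReductions);         // 15

// Update one leaf: only its inner node and the root are recomputed;
// the other cached reductions are reused untouched.
leaves[1].push(10);
innerReductions[1] = sum(leaves[1]);
root = sum(innerReductions);
```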
32. REPLICATION
• Peer-based, bi-directional replication using normal HTTP calls
• Mediated by a replicator process which can live on the source, target, or somewhere else entirely
• Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)
• Applications (_design documents) replicate along with the data
• Ideal for offline applications -- “ground computing”
35. ARCHITECTURE
• Each cluster is a ring of nodes (Dynamo, Dynomite)
• Any node can handle any request (consistent hashing)
• O(1) lookup, with at most one hop
• nodes own partitions (ring is divided)
• data are distributed evenly across partitions and replicas
• mapreduce functions are passed to nodes for execution
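The consistent-hashing lookup the ring uses can be sketched as follows. The hash function and node names are illustrative, not Dynomite's actual implementation:

```javascript
// Minimal consistent-hashing sketch: nodes own points on a ring; a key is
// served by the first node at or after hash(key), wrapping around.
const RING_SIZE = 2 ** 32;
function hash(s) {
  let h = 2166136261; // FNV-1a, for illustration only
  for (const ch of s) h = ((h ^ ch.charCodeAt(0)) * 16777619) >>> 0;
  return h % RING_SIZE;
}

// Ring positions for four nodes, sorted once up front.
const nodes = ["node1", "node2", "node3", "node4"]
  .map((name) => ({ name, pos: hash(name) }))
  .sort((a, b) => a.pos - b.pos);

// Lookup: first node clockwise from the key's position (wrap to nodes[0]).
function ownerOf(key) {
  const p = hash(key);
  return (nodes.find((n) => n.pos >= p) || nodes[0]).name;
}
```

Because any node holds the same ring table, any node can route any request — which is the "O(1), with a hop" property above.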
36. RESEARCH
• Google’s MapReduce, http://bit.ly/bJbyq5
• Amazon’s Dynamo, http://bit.ly/b7FlsN
• CAP theorem, http://bit.ly/bERr2H
37. CLUSTER CONTROLS
• N - Replication factor
• Q - Partitions = 2^Q
• R - Read quorum
• W - Write quorum
• These constants define the cluster
38. N
N = Number of replicas per item stored in cluster
(trades off consistency, throughput, durability)
39. Q
2^Q = Number of partitions (shards) in cluster
T = Number of nodes in cluster
2^Q / T = Number of partitions per node
(trades off throughput, scalability)
40. R
R = Number of successful reads before returning value(s) to client
(trades off latency, consistency)
41. W
W = Number of successful writes before returning ‘success’ to client
(trades off latency, durability)
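The arithmetic behind these constants, worked through with illustrative values (the `R + W > N` overlap rule is the Dynamo-style consistency heuristic, not a claim about any specific deployment):

```javascript
// Worked example of the cluster constants (values illustrative).
const N = 3; // replicas per item
const Q = 3; // 2^Q partitions in the cluster
const T = 4; // nodes in the cluster

const partitions = 2 ** Q;                // 8 shards total
const partitionsPerNode = partitions / T; // 2 shards per node

// Dynamo-style rule of thumb: R + W > N means any read quorum overlaps
// any write quorum, so a read always sees the latest acknowledged write.
const R = 2, W = 2;
const quorumsOverlap = R + W > N;
```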
42. [Cluster diagram: a load balancer in front of a ring of four nodes, with partitions A–G distributed around the ring]
43. request — PUT http://boorad.cloudant.com/dbname/blah?w=2
44. [The load balancer forwards the request to any node in the ring]
45. [hash(blah) = E locates the owning partition]
46.–47. [With N=3, W=2, R=2, the write is sent to the three replicas of partition E]
48. [Even with one node down, W=2 of the N=3 replicas can still acknowledge, so the PUT succeeds]
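The `?w=2` walkthrough on the preceding slides can be sketched as a quorum write — a toy simulation, with all names invented:

```javascript
// Sketch of the w=2 write: the coordinator sends the write to all N
// replicas of the owning partition and reports success once W acknowledge.
function quorumWrite(replicas, W) {
  const acks = replicas.filter((r) => r.up).length; // down nodes never ack
  return { acks, ok: acks >= W };
}

// All three replicas up: 3 acks, success.
const healthy = quorumWrite([{ up: true }, { up: true }, { up: true }], 2);

// One node down (slide 48): still 2 acks >= W, so the PUT succeeds.
const degraded = quorumWrite([{ up: true }, { up: false }, { up: true }], 2);
```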
49. RESULT
• For standalone or cluster
• one REST API
• one URL
• For cluster
• redundant data
• distributed queries
• scale out
Speaker notes (Erlang):
• 20 yrs old, open source since mid-90’s, iirc
• like a mobile telephone grid
• compiled (but to bytecode for a VM), open source
• Why Erlang? Three big ticket items: massively concurrent; seamlessly distributed into multi-machine clusters; extremely fault tolerant
• Great for my projects: data storage & retrieval, scalable web apps
• Maybe not so hot for computationally intensive projects, unless they lend themselves to parallelism