• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
DevNation Atlanta
 

DevNation Atlanta

on

  • 1,039 views

A talk given at DevNation Atlanta, 4/3/2010

A talk given at DevNation Atlanta, 4/3/2010

Statistics

Views

Total Views
1,039
Views on SlideShare
1,037
Embed Views
2

Actions

Likes
0
Downloads
10
Comments
0

2 Embeds 2

http://www.linkedin.com 1
https://www.linkedin.com 1

Accessibility

Categories

Upload Details

Uploaded via as Apple Keynote

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • <br />
  • <br />
  • <br />
  • You are not Google <br /> <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • Right Tool for the Job <br />
  • small companies <br /> huge data <br /> information from data <br />
  • <br />
  • <br />
  • <br />
  • 20 yrs old, open source since mid-90&#x2019;s, iirc. <br /> <br /> like a mobile telephone grid <br /> <br /> compiled (but to bytecode for a VM) <br /> <br /> open source <br /> <br />
  • <br />
  • Cluster Of Unreliable Commodity Hardware <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />

DevNation Atlanta DevNation Atlanta Presentation Transcript

  • NOSQL, COUCHDB AND THE CLOUD Brad Anderson Cloudant 1
  • BRAD ANDERSON • BS Hotel Management • Restaurant Chain Data - econometric modeling, BI/DW • Open Source - trac, dsource.org, couchdb • NOSQLEast 2009 • Cloudant • http://twitter.com/boorad 2
  • AGENDA • NOSQL • COUCHDB • Erlang • Cloud • Dynamo • MapReduce 3
  • IF YOU • don’t have ‘medium data’ or ‘big data’ • arecool with 25K loc object-relational mappers • love an ops challenge • areokay paying Uncle Larry 4
  • IF YOU • don’t have ‘medium data’ or ‘big data’ • arecool with 25K loc object-relational mappers • love an ops challenge • areokay paying Uncle Larry 4
  • IF NOT http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg 5
  • RELATIONAL DATABASES RDBMS • Rigid Schema / ORM fun 1970-2010 • Scale Up • Everything is a Nail http://www.flickr.com/photos/36041246@N00/3419197777/ 6
  • SCALING RDBMS • Replication Sucks • master-slave • master-master • Partitioning Sucks • vertical (by functional area) • horizontal (by some key, say time) • Caching sort of works 7
  • OKAY, NOT SCREWED http://www.bigfatmoneybags.com/blog/wp-content/uploads/2009/12/screwed.jpg 8
  • RELATIONAL DATABASES RDBMS • Not Dead 1970- • Just have a ‘smell’ for certain tasks http://www.flickr.com/photos/36041246@N00/3419197777/ 9
  • NOSQL NOT ONLY SQL A moniker for different data storage systems solving very different problems, all where a relational database is not the right fit. 10
  • RIGHT FIT • Google indexes 400 Pb / day (2007) • CERN, LHC generates 100 Pb / sec • Unique data created each year (IDC, 2007) • 2007 40 Eb • 2010 988 Eb (exponential growth) • Flightcaster 11
  • FOUR CATEGORIES • Key/Value Stores • Dynomite, Voldemort, Tokyo • Document Stores • CouchDB, MongoDB • Column Stores / BigTable • HBase, Hypertable, Cassandra • Graph Databases • Neo4j, AllegroGraph, VertexDB 12
  • BIG TAKEAWAY function data data data data data data data data data data 13
  • BIG TAKEAWAY function function data function function data data function function data data data data data data data data function function data data data data data data function function data data function data Bring the function to the data 13
  • 14
  • HUH? ERLANG? • Programming Language created at Ericsson (20 yrs old now) • Designed for scalable, long-lived systems • Compiled, Functional, Dynamically Typed, Open Source 15
  • 3 BIGGIES • Massively Concurrent • green threads, very lightweight != os threads • Seamlessly Distributed • node = os thread = VM, processes can live anywhere • Fault Tolerant • 99.9999999 = 32ms downtime per year - AXD301 16
  • Of fi cia lB et a! CouchDB Apache 17
  • COUCHDB • Schema-free document database server • Robust, highly concurrent, fault-tolerant • RESTful JSON API • Futon web admin console • MapReduce system for generating custom views • Bi-directional incremental replication • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON 18
  • FROM INTEREST TO ADOPTION • 100+ production users • Active commercial •3 development books being written • Rapidly maturing • Vibrant, open community 19
  • OF THE WEB Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like. Jacob Kaplan-Moss October 17 2007 http://jacobian.org/writing/of-the-web/ 20
  • DOCUMENTS • Documents are JSON Objects • Underscore-prefixed fields are reserved • Documents can have binary attachments • MVCC _rev deterministically generated from doc content 21
  • ROBUST • Never overwrite previously committed data • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair” • Take snapshots with “cp” • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput 22
  • CONCURRENT • Erlang approach: lightweight processes to model the natural concurrency in a problem • For CouchDB that means one process per TCP connection • Lock-free architecture; each process works with an MVCC snapshot of a DB. • Performance degrades gracefully under heavy concurrent load 23
  • REST API • Create PUT /mydb/mydocid • Retrieve GET /mydb/mydocid • Update PUT /mydb/mydocid • Delete DELETE /mydb/mydocid 24
  • 25
  • VIEWS • Custom, persistent representations of document data • “Closeto the metal” -- no dynamic queries in production, so you know exactly what you’re getting • Generated using MapReduce functions written in JavaScript (and other languages) view must have a map function and may also have a • Each reduce function • Leverages view collation, rich view query API 26
  • DOCUMENTS BY AUTHOR 27
  • INCREMENTAL • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date • Leafnodes store map results, inner nodes store reductions of children http://horicky.blogspot.com/2008/10/couchdb-implementation.html 28
  • REPLICATION • Peer-based, bi-directional replication using normal HTTP calls • Mediated by a replicator process which can live on the source, target, or somewhere else entirely • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon) • Applications (_design documents) replicate along with the data • Ideal for offline applications -- “ground computing” 29
  • CLOUD 30
  • SHOWROOM A cluster of couches 31
  • Help me, this name sucks! SHOWROOM A cluster of couches 31
  • ARCHITECTURE • Each cluster is a ring of nodes (Dynamo, Dynomite) • Any node can handle request (consistent hashing) • O(1), with a hop • nodes own partitions (ring is divided) • data are distributed evenly across partitions and replicas • mapreduce functions are passed to nodes for execution
  • RESEARCH • Google’s MapReduce, http://bit.ly/bJbyq5 • Amazon’s Dynamo, http://bit.ly/b7FlsN • CAP theorem, http://bit.ly/bERr2H
  • CLUSTER CONTROLS •N - Replication Q •Q - Partitions = 2 •R - Read Quorum •W - Write Quorum • These constants define the cluster
  • N Consistency Throughput Durability N = Number of replicas per item stored in cluster
  • Q Throughput Scalability 2^Q = Number of partitions (shards) in cluster T = Number of nodes in cluster 2^Q / T = Number of partitions per node
  • R Latency Consistency R = Number of successful reads before returning value(s) to client
  • W Latency Durability W = Number of successful writes before returning ‘success’ to client
  • Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X E C N od e D 3 E F D No de E 4 F G
  • request PUT http://boorad.cloudant.com/dbname/blah?w=2 Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X E C N od e D 3 E F D No de E 4 F G
  • request PUT http://boorad.cloudant.com/dbname/blah?w=2 Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X E C N od e D 3 E F D No de E 4 F G
  • request PUT http://boorad.cloudant.com/dbname/blah?w=2 Load Balancer Node 1 24 No de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • request PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 Load Balancer R=2 Node 1 24 No de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • request PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 Load Balancer R=2 Node 1 24 No de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • request PUT http://boorad.cloudant.com/dbname/blah?w=2 N=3 W=2 Load Balancer R=2 24 Node 1 No node down de A B C D de No B 2 A Z C Y D X hash(blah) = E E C N od e D 3 E F D No de E 4 F G
  • RESULT • For standalone or cluster • one REST API • one URL • For cluster • redundant data • distributed queries • scale out 40
  • DEMO? 41
  • QUESTIONS?
  • CREDITS • Emil Eifrem, http://bit.ly/5D40WQ • Sergio Bossa, http://bit.ly/c9UoRZ • Cliff Moon, http://bit.ly/bX887c 43