20100310 Miller Sts


Published on

My 10 minute overview of nosql and 20 minutes on couchdb. Additional slides on mapreduce views for complex queries added.

Published in: Technology, Education
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

20100310 Miller Sts

  1. 1. Of fi cia lB et a! Apache CouchDB 1
  2. 2. INTRODUCTIONS • Mike Miller • not an Apache CouchDB committer • ParticlePhysics Faculty at U. of Washington • Cloudant cofounder, Chief Scientist • Background: machine learning, analysis, big data, globally distributed systems STS 3/10/2010 2 Mike Miller UW/Cloudant
  3. 3. NEW PROBLEMS => NEW SOLUTIONS ( http://www.economist.com/specialreports/ PrinterFriendly.cfm?story_id=15557443) Big Data Dynamic, Realtime STS 3/10/2010 3 Mike Miller UW/Cloudant
  4. 4. NEW PROBLEMS => NEW SOLUTIONS “old” “new” Scale this horizontally/linearly STS 3/10/2010 4 Mike Miller UW/Cloudant
  5. 5. CAP THEOREM read record consistency availability partition tolerance write record masterless, scalable Brewer: Pick two (at a time) (e.g., Cloudant’s Couch) STS 3/10/2010 5 Mike Miller UW/Cloudant
  6. 6. NoSQL: Not Only SQL • Name blame: Eric Evans from Rackspace • Pragmatic response to new class of problems poorly fit to RDBMS: • heterogeneous/dynamic data, geographic/network separation, scaling/concurrency • Not an abandonment of 40+ years of DB research • From home-brew => general solutions • Best collection of online talks: https://nosqleast.com/2009/#agenda STS 3/10/2010 6 Mike Miller UW/Cloudant
  7. 7. TAXONOMY: 1 E. Eifrem, no:sql(east) STS 3/10/2010 7 Mike Miller UW/Cloudant
  8. 8. TAXONOMY: 2 E. Eifrem, no:sql(east) STS 3/10/2010 8 Mike Miller UW/Cloudant
  9. 9. EMIL’S CLASSIFICATION but neglects: concurrency, performance, reliability... STS 3/10/2010 9 Mike Miller UW/Cloudant
  10. 10. Cl os in gi n on 1.0 Apache CouchDB 10
  11. 11. CouchDB • easy, flexible, robust, concurrent, replicates • Document: Schema-free database server • RESTful JSON API • views: MapReduce system for generating custom • replication: Bi-directional incremental • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON STS 3/10/2010 11 Mike Miller UW/Cloudant
  12. 12. OF THE WEB Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like. Jacob Kaplan-Moss October 17 2007 http://jacobian.org/writing/of-the-web/ STS 3/10/2010 12 Mike Miller UW/Cloudant
  13. 13. SHOULD YOU CARE? The internet happened, and we ignored it. In retrospect that was a mistake. Paraphrased from Bill Warner (Founder Avid, Wildfire) YCombinator Summer, 2008 CouchDB: An enabling technology STS 3/10/2010 13 Mike Miller UW/Cloudant
  14. 14. FROM INTEREST TO ADOPTION • 100+ production apps • Active commercial •3 development books being written • Rapidly maturing • Vibrant, open STS 3/10/2010 community 14 Mike Miller UW/Cloudant
  15. 15. LETS TAKE A LOOK STS 3/10/2010 15 Mike Miller UW/Cloudant
  16. 16. DOCUMENTS • Documents are JSON Objects • Underscore-prefixed fields are reserved • Documents can have binary attachments • MVCC _rev deterministically generated from doc content STS 3/10/2010 16 Mike Miller UW/Cloudant
  17. 17. REST API • Create PUT /mydb/mydocid • Retrieve GET /mydb/mydocid etag header plug-n-play cache • Update PUT /mydb/mydocid • Delete DELETE /mydb/mydocid STS 3/10/2010 17 Mike Miller UW/Cloudant
  18. 18. STS 3/10/2010 18 Mike Miller UW/Cloudant
  19. 19. ROBUST • Never overwrite previously committed data • Documents stored in append, only b+trees, ‘copy-on-write’ • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair” • Take snapshots with “cp” • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput STS 3/10/2010 19 Mike Miller UW/Cloudant
  20. 20. CONCURRENT • Erlang approach: lightweight processes to model the natural concurrency in a problem • For CouchDB that means one process per TCP connection • Lock-free architecture; each process works with an MVCC snapshot of a DB. • Performance degrades gracefully under heavy concurrent load STS 3/10/2010 20 Mike Miller UW/Cloudant
  21. 21. CONCURRENCY IN ACTION: BBC • WYSIWYG: graceful at scale: load, volume, concurrency STS 3/10/2010 21 Mike Miller UW/Cloudant
  22. 22. VIEWS: B.Y.O. INDEX • Custom, persistent representations of document data • “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting • Generated using MapReduce functions written in JavaScript (and other languages) • Each view must have a map function and may also have a reduce function • Leverages view collation, rich view query API STS 3/10/2010 22 Mike Miller UW/Cloudant
  23. 23. VIEWS: INCREMENTAL • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date • Leaf nodes store map results, inner nodes store reductions of children http://horicky.blogspot.com/2008/10/couchdb-implementation.html STS 3/10/2010 23 Mike Miller UW/Cloudant
  24. 24. VIEWS: EXAMPLE key value STS 3/10/2010 24 Mike Miller UW/Cloudant
  25. 25. DOCUMENTS BY AUTHOR key value STS 3/10/2010 25 Mike Miller UW/Cloudant
  26. 26. WORD COUNT STS 3/10/2010 26 Mike Miller UW/Cloudant
  27. 27. RELATIONAL STS 3/10/2010 27 Mike Miller UW/Cloudant
  28. 28. REPLICATION • Peer-based, bi-directional replication using normal HTTP calls • Mediated by a replicator process which can live on the source, target, or somewhere else entirely • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon) • Applications (_design documents) replicate along with the data • Ideal for offline applications -- “ground computing” STS 3/10/2010 28 Mike Miller UW/Cloudant
  29. 29. REPLICATION Source Replicator Target What updates do you have since X? Which of these GET /db/_changes?since=X are you missing? POST /db/_missing_revs Target needs these id@rev documents Save these documents [GET /db/doc?open_revs=...] POST /db/_bulk_docs Record the latest source update seen by the target STS 3/10/2010 29 Mike Miller UW/Cloudant
  30. 30. MASTER-SLAVE STS 3/10/2010 30 Mike Miller UW/Cloudant
  31. 31. MASTER-MASTER STS 3/10/2010 31 Mike Miller UW/Cloudant
  32. 32. ROBUST MULTI-MASTER STS 3/10/2010 32 Mike Miller UW/Cloudant
  33. 33. CONFLICT RESOLUTION • Replication can introduce conflicts in a multi-master setup • CouchDB deterministically chooses a winner; the loser is saved as a conflict revision • Conflict revisions are replicated; if you replicate A→B and B→A, both will agree on the winning and losing revisions STS 3/10/2010 33 Mike Miller UW/Cloudant
  34. 34. CONFLICT RESOLUTION Case 1: save the same update to multiple replication partners PUT /a/foo PUT /b/foo replicate No Conflict STS 3/10/2010 34 Mike Miller UW/Cloudant
  35. 35. CONFLICT RESOLUTION Case 2: save different updates to the same document PUT /a/foo PUT /b/foo replicate Conflict STS 3/10/2010 35 Mike Miller UW/Cloudant
  36. 36. CONFLICT RESOLUTION replicate No Conflict STS 3/10/2010 36 Mike Miller UW/Cloudant
  37. 37. CONFLICT RESOLUTION save different updates to the same document to replication partners STS 3/10/2010 37 Mike Miller UW/Cloudant
  38. 38. REPLICATION IN ACTION: UBUNTU 9.10 “When Ubuntu 9.10 is released on October 29th every single Ubuntu user will have an address book stored in CouchDB that replicates with one.ubuntu.com, and Tomboy notes that are replicated via a web API at the application but then stored in CouchDB and carried along in the CouchDB replication that we have set up. Optionally they can also store all their Firefox bookmarks in CouchDB and have those replicated as well. We'll be doing our best to help teach application developers to use CouchDB in order to ʻcloud-enableʼ their apps. A couch on every desktop!” Elliot Murphy, 10/13/2009 replication is a potential game-changer STS 3/10/2010 38 Mike Miller UW/Cloudant
  39. 39. STUFF I DIDN’T TALK ABOUT • Server-side processing: validate, upsert, show, list • Authentication using HTTP Basic, Cookie, OAuth • couchapp: serve lightweight HTML+JavaScript web apps directly out of CouchDB using _show and _list functions to transform JSON [1] • _external indexers, including full-text search powered by CouchDB-Lucene [2] • The many excellent client libraries, view servers, etc. contributed by the community [3] • SQLite wrapper for CouchDB [4], and much more... [1]: http://github.com/couchapp/couchapp [2]: http://github.com/rnewson/couchdb-lucene [3]: http://wiki.apache.org/couchdb/Related_Projects [4]: http://apsw.googlecode.com/svn/publish/couchdb.html STS 3/10/2010 39 Mike Miller UW/Cloudant
  40. 40. TAKE IT FOR A SPIN hosted free: cloudant.com easy offline: CouchDBX STS 3/10/2010 40 Mike Miller UW/Cloudant
  41. 41. relax STS 3/10/2010 41 Mike Miller UW/Cloudant