20100310 Miller Sts
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share

20100310 Miller Sts

  • 3,170 views
Uploaded on

My 10 minute overview of nosql and 20 minutes on couchdb. Additional slides on mapreduce views for complex queries added.

My 10 minute overview of nosql and 20 minutes on couchdb. Additional slides on mapreduce views for complex queries added.

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,170
On Slideshare
3,165
From Embeds
5
Number of Embeds
1

Actions

Shares
Downloads
26
Comments
0
Likes
2

Embeds 5

http://www.slideshare.net 5

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Of fi cia lB et a! Apache CouchDB 1
  • 2. INTRODUCTIONS • Mike Miller • not an Apache CouchDB committer • ParticlePhysics Faculty at U. of Washington • Cloudant cofounder, Chief Scientist • Background: machine learning, analysis, big data, globally distributed systems STS 3/10/2010 2 Mike Miller UW/Cloudant
  • 3. NEW PROBLEMS => NEW SOLUTIONS ( http://www.economist.com/specialreports/ PrinterFriendly.cfm?story_id=15557443) Big Data Dynamic, Realtime STS 3/10/2010 3 Mike Miller UW/Cloudant
  • 4. NEW PROBLEMS => NEW SOLUTIONS “old” “new” Scale this horizontally/linearly STS 3/10/2010 4 Mike Miller UW/Cloudant
  • 5. CAP THEOREM read record consistency availability partition tolerance write record masterless, scalable Brewer: Pick two (at a time) (e.g., Cloudant’s Couch) STS 3/10/2010 5 Mike Miller UW/Cloudant
  • 6. NoSQL: Not Only SQL • Name blame: Eric Evans from Rackspace • Pragmatic response to new class of problems poorly fit to RDBMS: • heterogeneous/dynamic data, geographic/network separation, scaling/concurrency • Not an abandonment of 40+ years of DB research • From home-brew => general solutions • Best collection of online talks: https://nosqleast.com/2009/#agenda STS 3/10/2010 6 Mike Miller UW/Cloudant
  • 7. TAXONOMY: 1 E. Eifrem, no:sql(east) STS 3/10/2010 7 Mike Miller UW/Cloudant
  • 8. TAXONOMY: 2 E. Eifrem, no:sql(east) STS 3/10/2010 8 Mike Miller UW/Cloudant
  • 9. EMIL’S CLASSIFICATION but neglects: concurrency, performance, reliability... STS 3/10/2010 9 Mike Miller UW/Cloudant
  • 10. Cl os in gi n on 1.0 Apache CouchDB 10
  • 11. CouchDB • easy, flexible, robust, concurrent, replicates • Document: Schema-free database server • RESTful JSON API • views: MapReduce system for generating custom • replication: Bi-directional incremental • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON STS 3/10/2010 11 Mike Miller UW/Cloudant
  • 12. OF THE WEB Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like. Jacob Kaplan-Moss October 17 2007 http://jacobian.org/writing/of-the-web/ STS 3/10/2010 12 Mike Miller UW/Cloudant
  • 13. SHOULD YOU CARE? The internet happened, and we ignored it. In retrospect that was a mistake. Paraphrased from Bill Warner (Founder Avid, Wildfire) YCombinator Summer, 2008 CouchDB: An enabling technology STS 3/10/2010 13 Mike Miller UW/Cloudant
  • 14. FROM INTEREST TO ADOPTION • 100+ production apps • Active commercial •3 development books being written • Rapidly maturing • Vibrant, open STS 3/10/2010 community 14 Mike Miller UW/Cloudant
  • 15. LETS TAKE A LOOK STS 3/10/2010 15 Mike Miller UW/Cloudant
  • 16. DOCUMENTS • Documents are JSON Objects • Underscore-prefixed fields are reserved • Documents can have binary attachments • MVCC _rev deterministically generated from doc content STS 3/10/2010 16 Mike Miller UW/Cloudant
  • 17. REST API • Create PUT /mydb/mydocid • Retrieve GET /mydb/mydocid etag header plug-n-play cache • Update PUT /mydb/mydocid • Delete DELETE /mydb/mydocid STS 3/10/2010 17 Mike Miller UW/Cloudant
  • 18. STS 3/10/2010 18 Mike Miller UW/Cloudant
  • 19. ROBUST • Never overwrite previously committed data • Documents stored in append, only b+trees, ‘copy-on-write’ • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair” • Take snapshots with “cp” • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput STS 3/10/2010 19 Mike Miller UW/Cloudant
  • 20. CONCURRENT • Erlang approach: lightweight processes to model the natural concurrency in a problem • For CouchDB that means one process per TCP connection • Lock-free architecture; each process works with an MVCC snapshot of a DB. • Performance degrades gracefully under heavy concurrent load STS 3/10/2010 20 Mike Miller UW/Cloudant
  • 21. CONCURRENCY IN ACTION: BBC • WYSIWYG: graceful at scale: load, volume, concurrency STS 3/10/2010 21 Mike Miller UW/Cloudant
  • 22. VIEWS: B.Y.O. INDEX • Custom, persistent representations of document data • “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting • Generated using MapReduce functions written in JavaScript (and other languages) • Each view must have a map function and may also have a reduce function • Leverages view collation, rich view query API STS 3/10/2010 22 Mike Miller UW/Cloudant
  • 23. VIEWS: INCREMENTAL • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date • Leaf nodes store map results, inner nodes store reductions of children http://horicky.blogspot.com/2008/10/couchdb-implementation.html STS 3/10/2010 23 Mike Miller UW/Cloudant
  • 24. VIEWS: EXAMPLE key value STS 3/10/2010 24 Mike Miller UW/Cloudant
  • 25. DOCUMENTS BY AUTHOR key value STS 3/10/2010 25 Mike Miller UW/Cloudant
  • 26. WORD COUNT STS 3/10/2010 26 Mike Miller UW/Cloudant
  • 27. RELATIONAL STS 3/10/2010 27 Mike Miller UW/Cloudant
  • 28. REPLICATION • Peer-based, bi-directional replication using normal HTTP calls • Mediated by a replicator process which can live on the source, target, or somewhere else entirely • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon) • Applications (_design documents) replicate along with the data • Ideal for offline applications -- “ground computing” STS 3/10/2010 28 Mike Miller UW/Cloudant
  • 29. REPLICATION Source Replicator Target What updates do you have since X? Which of these GET /db/_changes?since=X are you missing? POST /db/_missing_revs Target needs these id@rev documents Save these documents [GET /db/doc?open_revs=...] POST /db/_bulk_docs Record the latest source update seen by the target STS 3/10/2010 29 Mike Miller UW/Cloudant
  • 30. MASTER-SLAVE STS 3/10/2010 30 Mike Miller UW/Cloudant
  • 31. MASTER-MASTER STS 3/10/2010 31 Mike Miller UW/Cloudant
  • 32. ROBUST MULTI-MASTER STS 3/10/2010 32 Mike Miller UW/Cloudant
  • 33. CONFLICT RESOLUTION • Replication can introduce conflicts in a multi-master setup • CouchDB deterministically chooses a winner; the loser is saved as a conflict revision • Conflict revisions are replicated; if you replicate A→B and B→A, both will agree on the winning and losing revisions STS 3/10/2010 33 Mike Miller UW/Cloudant
  • 34. CONFLICT RESOLUTION Case 1: save the same update to multiple replication partners PUT /a/foo PUT /b/foo replicate No Conflict STS 3/10/2010 34 Mike Miller UW/Cloudant
  • 35. CONFLICT RESOLUTION Case 2: save different updates to the same document PUT /a/foo PUT /b/foo replicate Conflict STS 3/10/2010 35 Mike Miller UW/Cloudant
  • 36. CONFLICT RESOLUTION replicate No Conflict STS 3/10/2010 36 Mike Miller UW/Cloudant
  • 37. CONFLICT RESOLUTION save different updates to the same document to replication partners STS 3/10/2010 37 Mike Miller UW/Cloudant
  • 38. REPLICATION IN ACTION: UBUNTU 9.10 “When Ubuntu 9.10 is released on October 29th every single Ubuntu user will have an address book stored in CouchDB that replicates with one.ubuntu.com, and Tomboy notes that are replicated via a web API at the application but then stored in CouchDB and carried along in the CouchDB replication that we have set up. Optionally they can also store all their Firefox bookmarks in CouchDB and have those replicated as well. We'll be doing our best to help teach application developers to use CouchDB in order to ʻcloud-enableʼ their apps. A couch on every desktop!” Elliot Murphy, 10/13/2009 replication is a potential game-changer STS 3/10/2010 38 Mike Miller UW/Cloudant
  • 39. STUFF I DIDN’T TALK ABOUT • Server-side processing: validate, upsert, show, list • Authentication using HTTP Basic, Cookie, OAuth • couchapp: serve lightweight HTML+JavaScript web apps directly out of CouchDB using _show and _list functions to transform JSON [1] • _external indexers, including full-text search powered by CouchDB-Lucene [2] • The many excellent client libraries, view servers, etc. contributed by the community [3] • SQLite wrapper for CouchDB [4], and much more... [1]: http://github.com/couchapp/couchapp [2]: http://github.com/rnewson/couchdb-lucene [3]: http://wiki.apache.org/couchdb/Related_Projects [4]: http://apsw.googlecode.com/svn/publish/couchdb.html STS 3/10/2010 39 Mike Miller UW/Cloudant
  • 40. TAKE IT FOR A SPIN hosted free: cloudant.com easy offline: CouchDBX STS 3/10/2010 40 Mike Miller UW/Cloudant
  • 41. relax STS 3/10/2010 41 Mike Miller UW/Cloudant