20100310 Miller Sts

My 10-minute overview of NoSQL and 20 minutes on CouchDB. Additional slides on MapReduce views for complex queries added.

Published in: Technology, Education

Transcript of "20100310 Miller Sts"

  1. Official Beta! Apache CouchDB
  2. INTRODUCTIONS
     • Mike Miller
     • Not an Apache CouchDB committer
     • Particle Physics Faculty at U. of Washington
     • Cloudant cofounder, Chief Scientist
     • Background: machine learning, analysis, big data, globally distributed systems
     STS 3/10/2010, Mike Miller, UW/Cloudant
  3. NEW PROBLEMS => NEW SOLUTIONS
     (http://www.economist.com/specialreports/PrinterFriendly.cfm?story_id=15557443)
     • Big Data
     • Dynamic, Realtime
  4. NEW PROBLEMS => NEW SOLUTIONS
     • “old”, “new”
     • Scale this horizontally/linearly
  5. CAP THEOREM
     [Diagram: consistency, availability, partition tolerance; read record, write record]
     • masterless, scalable
     • Brewer: Pick two (at a time) (e.g., Cloudant’s Couch)
  6. NoSQL: Not Only SQL
     • Name blame: Eric Evans from Rackspace
     • Pragmatic response to a new class of problems poorly fit to RDBMS:
       • heterogeneous/dynamic data, geographic/network separation, scaling/concurrency
     • Not an abandonment of 40+ years of DB research
     • From home-brew => general solutions
     • Best collection of online talks: https://nosqleast.com/2009/#agenda
  7. TAXONOMY: 1 (E. Eifrem, no:sql(east))
  8. TAXONOMY: 2 (E. Eifrem, no:sql(east))
  9. EMIL’S CLASSIFICATION
     • but neglects: concurrency, performance, reliability...
  10. Closing in on 1.0: Apache CouchDB
  11. CouchDB
     • easy, flexible, robust, concurrent, replicates
     • Document: Schema-free database server
     • RESTful JSON API
     • views: MapReduce system for generating custom
     • replication: Bi-directional incremental
     • couchapp: lightweight HTML+JavaScript apps served directly from CouchDB using views to transform JSON
  12. OF THE WEB
     “Django may be built for the Web, but CouchDB is built of the Web. I've never seen software that so completely embraces the philosophies behind HTTP ... this is what the software of the future looks like.”
     Jacob Kaplan-Moss, October 17 2007, http://jacobian.org/writing/of-the-web/
  13. SHOULD YOU CARE?
     “The internet happened, and we ignored it. In retrospect that was a mistake.”
     Paraphrased from Bill Warner (Founder Avid, Wildfire), YCombinator, Summer 2008
     CouchDB: An enabling technology
  14. FROM INTEREST TO ADOPTION
     • 100+ production apps
     • Active commercial development
     • 3 books being written
     • Rapidly maturing
     • Vibrant, open community
  15. LET’S TAKE A LOOK
  16. DOCUMENTS
     • Documents are JSON Objects
     • Underscore-prefixed fields are reserved
     • Documents can have binary attachments
     • MVCC: _rev deterministically generated from doc content
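
The last bullet on slide 16 can be illustrated with a small sketch. It hashes a canonical JSON form of the document to derive a revision string. This is an assumption made for illustration and not CouchDB's actual algorithm; the helper name `sketch_rev` and the sample document are hypothetical.

```python
import hashlib
import json


def sketch_rev(doc, generation=1):
    """Return 'N-<md5 hex>' derived from the document body.

    Illustration only: CouchDB computes its own revision hashes differently.
    Any existing _rev field is ignored so the result depends on content alone.
    """
    body = {k: v for k, v in doc.items() if k != "_rev"}
    # Sorted keys and fixed separators give one canonical byte string per content.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"


# Hypothetical document; underscore-prefixed fields are the reserved ones.
doc = {"_id": "mydocid", "title": "relax", "tags": ["couchdb", "nosql"]}
same_content_other_order = {"tags": ["couchdb", "nosql"], "title": "relax", "_id": "mydocid"}

print(sketch_rev(doc) == sketch_rev(same_content_other_order))
```

The point of a content-derived revision is that two peers saving identical content arrive at the same revision identifier without coordinating.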
  17. REST API
     • Create: PUT /mydb/mydocid
     • Retrieve: GET /mydb/mydocid (etag header: plug-n-play cache)
     • Update: PUT /mydb/mydocid
     • Delete: DELETE /mydb/mydocid
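
The four verbs on slide 17 map onto plain HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them. The base URL (a local server on port 5984), the helper names, and the sample revision are assumptions. Passing the current `_rev` on update and delete is how CouchDB guards against lost updates, a detail the slide does not show.

```python
import json
import urllib.parse
import urllib.request

BASE = "http://localhost:5984"  # assumption: a CouchDB server running locally


def put_document(db, doc_id, doc):
    """Create or update: PUT /db/docid. For an update, doc must carry the current _rev."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{BASE}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def get_document(db, doc_id, etag=None):
    """Retrieve: GET /db/docid. A cached ETag lets any HTTP cache revalidate cheaply."""
    headers = {"If-None-Match": etag} if etag else {}
    return urllib.request.Request(f"{BASE}/{db}/{doc_id}", method="GET", headers=headers)


def delete_document(db, doc_id, rev):
    """Delete: DELETE /db/docid?rev=... naming the revision being deleted."""
    query = urllib.parse.urlencode({"rev": rev})
    return urllib.request.Request(f"{BASE}/{db}/{doc_id}?{query}", method="DELETE")


for request in (
    put_document("mydb", "mydocid", {"title": "relax"}),
    get_document("mydb", "mydocid", etag='"1-abc"'),
    delete_document("mydb", "mydocid", "1-abc"),
):
    print(request.get_method(), request.full_url)
```

To execute one of these against a running server, pass it to `urllib.request.urlopen`.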
  18. [Slide without text]
  19. ROBUST
     • Never overwrite previously committed data
     • Documents stored in append-only B+trees, ‘copy-on-write’
     • In the event of a server crash or power failure, just restart CouchDB -- there is no “repair”
     • Take snapshots with “cp”
     • Configurable levels of durability: can choose to fsync after every update, or less often to gain better throughput
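
The "never overwrite" idea on slide 19 can be sketched with a toy append-only store. This is a deliberately simplified model: it uses a Python list and a linear scan where CouchDB uses B+trees on disk, and the class and method names are hypothetical.

```python
class AppendOnlyStore:
    """Toy append-only store: updates add records, nothing is rewritten in place."""

    def __init__(self):
        self._log = []  # committed records, oldest first

    def put(self, doc_id, body):
        """Append a new version and return the log length after the commit."""
        self._log.append((doc_id, dict(body)))
        return len(self._log)

    def get(self, doc_id, as_of=None):
        """Return the newest version, or the newest one visible at log length as_of."""
        end = len(self._log) if as_of is None else as_of
        for record_id, body in reversed(self._log[:end]):
            if record_id == doc_id:
                return body
        return None


store = AppendOnlyStore()
snapshot = store.put("foo", {"value": 1})  # remember the log length as a snapshot
store.put("foo", {"value": 2})             # the update appends, the old record stays

print(store.get("foo"))                  # newest committed version
print(store.get("foo", as_of=snapshot))  # what a reader holding the snapshot sees
```

Because earlier records are never touched, a reader holding an older position keeps a consistent view while writers append, and a crash can at worst leave an incomplete record at the tail.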
  20. CONCURRENT
     • Erlang approach: lightweight processes to model the natural concurrency in a problem
     • For CouchDB that means one process per TCP connection
     • Lock-free architecture; each process works with an MVCC snapshot of a DB
     • Performance degrades gracefully under heavy concurrent load
  21. CONCURRENCY IN ACTION: BBC
     • WYSIWYG: graceful at scale: load, volume, concurrency
  22. VIEWS: B.Y.O. INDEX
     • Custom, persistent representations of document data
     • “Close to the metal” -- no dynamic queries in production, so you know exactly what you’re getting
     • Generated using MapReduce functions written in JavaScript (and other languages)
     • Each view must have a map function and may also have a reduce function
     • Leverages view collation, rich view query API
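
The map/reduce structure described on slide 22 can be mimicked in a few lines. Real CouchDB views are typically written in JavaScript and stored in design documents. The Python below is only a model of the data flow, and the function names, sample documents, and plain string sort (in place of CouchDB's view collation) are assumptions.

```python
from itertools import groupby


def build_view(docs, map_fn):
    """Run map_fn over every document and return rows sorted by (key, doc id)."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append((key, doc["_id"], value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def reduce_by_key(rows, reduce_fn):
    """Apply reduce_fn to the values of each key, like a grouped view query."""
    return {
        key: reduce_fn([value for _, _, value in group])
        for key, group in groupby(rows, key=lambda row: row[0])
    }


def by_author(doc):
    """Hypothetical map function: emit (author, 1) for documents that have an author."""
    if "author" in doc:
        yield doc["author"], 1


docs = [
    {"_id": "a", "author": "miller", "title": "CouchDB"},
    {"_id": "b", "author": "evans", "title": "NoSQL"},
    {"_id": "c", "author": "miller", "title": "Views"},
    {"_id": "d", "title": "no author field"},
]

rows = build_view(docs, by_author)
counts = reduce_by_key(rows, sum)
print(rows)
print(counts)
```

The map function alone already gives a sorted index that can be scanned by key range; the reduce function is optional, as the slide notes.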
  23. VIEWS: INCREMENTAL
     • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date
     • Leaf nodes store map results, inner nodes store reductions of children
     http://horicky.blogspot.com/2008/10/couchdb-implementation.html
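
Why cached reductions make updates cheap can be seen in a two-level toy version of the tree from slide 23. This sketch assumes a fixed leaf size, a single inner level, and a reduce function that can be re-applied to its own outputs (such as `sum`); a real B-tree has more levels and rebalances. The class name is hypothetical.

```python
class ReducedLeaves:
    """Leaves hold mapped values; one inner level caches a reduction per leaf."""

    def __init__(self, values, leaf_size=4, reduce_fn=sum):
        self.reduce_fn = reduce_fn
        self.leaves = [
            list(values[i:i + leaf_size]) for i in range(0, len(values), leaf_size)
        ]
        # Cached reduction of each leaf, computed once up front.
        self.leaf_reductions = [reduce_fn(leaf) for leaf in self.leaves]
        self.leaves_recomputed = 0

    def update(self, leaf_index, position, value):
        """Change one value and refresh only the reduction of the affected leaf."""
        self.leaves[leaf_index][position] = value
        self.leaf_reductions[leaf_index] = self.reduce_fn(self.leaves[leaf_index])
        self.leaves_recomputed += 1

    def total(self):
        """Reduce the cached per-leaf reductions instead of rescanning every value."""
        return self.reduce_fn(self.leaf_reductions)


tree = ReducedLeaves([1, 2, 3, 4, 5, 6, 7, 8], leaf_size=4)
before = tree.total()
tree.update(0, 0, 10)  # only the first leaf's reduction is recomputed
after = tree.total()
print(before, after, tree.leaves_recomputed)
```

With more levels the same idea means an update touches one path from leaf to root rather than the whole index.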
  24. VIEWS: EXAMPLE
     [Table: key, value]
  25. DOCUMENTS BY AUTHOR
     [Table: key, value]
  26. WORD COUNT
  27. RELATIONAL
  28. REPLICATION
     • Peer-based, bi-directional replication using normal HTTP calls
     • Mediated by a replicator process which can live on the source, target, or somewhere else entirely
     • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function (coming soon)
     • Applications (_design documents) replicate along with the data
     • Ideal for offline applications -- “ground computing”
  29. REPLICATION (Source, Replicator, Target)
     • Replicator to source: “What updates do you have since X?” GET /db/_changes?since=X
     • Replicator to target: “Which of these are you missing?” POST /db/_missing_revs
     • Replicator to source: “Target needs these id@rev documents” [GET /db/doc?open_revs=...]
     • Replicator to target: “Save these documents” POST /db/_bulk_docs
     • Record the latest source update seen by the target
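
The exchange on slide 29 can be walked through with two in-memory stand-ins for the source and target databases. No HTTP is involved here. The class `Db`, its methods, and the integer checkpoint are hypothetical simplifications that mirror the roles of `_changes`, `_missing_revs`, `open_revs`, and `_bulk_docs` named on the slide.

```python
class Db:
    """In-memory stand-in for a database: revisions plus an ordered update log."""

    def __init__(self):
        self.revs = {}  # (doc_id, rev) -> document body
        self.log = []   # update sequence, oldest first

    def save(self, doc_id, rev, body):
        """Store a revision unless it is already present (role of _bulk_docs)."""
        if (doc_id, rev) not in self.revs:
            self.revs[(doc_id, rev)] = body
            self.log.append((doc_id, rev))

    def changes(self, since):
        """Return updates after sequence `since` and the latest sequence (role of _changes)."""
        return self.log[since:], len(self.log)

    def missing_revs(self, candidates):
        """Return the candidate revisions this database lacks (role of _missing_revs)."""
        return [c for c in candidates if c not in self.revs]


def replicate(source, target, checkpoint=0):
    """One replication pass; returns the new checkpoint to record for next time."""
    changed, last_seq = source.changes(checkpoint)
    missing = target.missing_revs(changed)
    for doc_id, rev in missing:  # fetch from source, then save on target
        target.save(doc_id, rev, source.revs[(doc_id, rev)])
    return last_seq


source, target = Db(), Db()
source.save("foo", "1-a", {"value": 1})
source.save("bar", "1-b", {"value": 2})

checkpoint = replicate(source, target)
source.save("foo", "2-c", {"value": 3})
checkpoint = replicate(source, target, checkpoint)  # only the new revision moves
print(checkpoint, sorted(target.revs))
```

Recording the checkpoint is what makes replication incremental: the next pass asks only for changes after it.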
  30. MASTER-SLAVE
  31. MASTER-MASTER
  32. ROBUST MULTI-MASTER
  33. CONFLICT RESOLUTION
     • Replication can introduce conflicts in a multi-master setup
     • CouchDB deterministically chooses a winner; the loser is saved as a conflict revision
     • Conflict revisions are replicated; if you replicate A→B and B→A, both will agree on the winning and losing revisions
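
The agreement property on slide 33 follows from any rule that depends only on the set of conflicting revisions. The sketch below uses one such rule: highest generation number first, then the higher hash string. This is an assumption for illustration; CouchDB's real rule also considers deleted revisions, and the function name is hypothetical.

```python
def pick_winner(revs):
    """Return (winner, losers) using an ordering that ignores arrival order.

    Revisions look like 'N-hash'. Higher N wins; ties go to the larger hash string.
    """
    def sort_key(rev):
        generation, _, digest = rev.partition("-")
        return (int(generation), digest)

    ordered = sorted(revs, key=sort_key, reverse=True)
    return ordered[0], ordered[1:]


# Two peers that saw the same conflicting revisions in a different order.
on_a = pick_winner(["2-aaa", "2-bbb"])
on_b = pick_winner(["2-bbb", "2-aaa"])
print(on_a, on_b)
```

Both peers reach the same answer without talking to each other, and the losing revision is kept so an application can still inspect or merge it.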
  34. CONFLICT RESOLUTION
     Case 1: save the same update to multiple replication partners
     PUT /a/foo, PUT /b/foo → replicate → No Conflict
  35. CONFLICT RESOLUTION
     Case 2: save different updates to the same document
     PUT /a/foo, PUT /b/foo → replicate → Conflict
  36. CONFLICT RESOLUTION
     replicate → No Conflict
  37. CONFLICT RESOLUTION
     save different updates to the same document to replication partners
  38. REPLICATION IN ACTION: UBUNTU 9.10
     “When Ubuntu 9.10 is released on October 29th every single Ubuntu user will have an address book stored in CouchDB that replicates with one.ubuntu.com, and Tomboy notes that are replicated via a web API at the application but then stored in CouchDB and carried along in the CouchDB replication that we have set up. Optionally they can also store all their Firefox bookmarks in CouchDB and have those replicated as well. We'll be doing our best to help teach application developers to use CouchDB in order to ‘cloud-enable’ their apps. A couch on every desktop!”
     Elliot Murphy, 10/13/2009
     replication is a potential game-changer
  39. STUFF I DIDN’T TALK ABOUT
     • Server-side processing: validate, upsert, show, list
     • Authentication using HTTP Basic, Cookie, OAuth
     • couchapp: serve lightweight HTML+JavaScript web apps directly out of CouchDB using _show and _list functions to transform JSON [1]
     • _external indexers, including full-text search powered by CouchDB-Lucene [2]
     • The many excellent client libraries, view servers, etc. contributed by the community [3]
     • SQLite wrapper for CouchDB [4], and much more...
     [1]: http://github.com/couchapp/couchapp
     [2]: http://github.com/rnewson/couchdb-lucene
     [3]: http://wiki.apache.org/couchdb/Related_Projects
     [4]: http://apsw.googlecode.com/svn/publish/couchdb.html
  40. TAKE IT FOR A SPIN
     • hosted free: cloudant.com
     • easy offline: CouchDBX
  41. relax