Learning To Relax

Alan Hoffman's WindyCityDB talk on beginning CouchDB.

Transcript

  • 1. LEARNING TO RELAX: CouchDB for Beginners (Windy City DB, June 26, 2010)
  • 2. OUTLINE
    • Introduction and Overview
    • CouchDB Basics
    • Special Topics in Relaxation: Scaling CouchDB
    • Use Cases in the Wild
    • Takeaways
  • 3. HI
    • Alan Hoffman (@_hoffman, alan@cloudant.com)
    • Experimental particle physicist
    • Background: machine learning, big data analysis, distributed systems
    • Co-founder of Cloudant (hosted CouchDB)
    • Not a committer, but...
  • 4. COUCH: THE BIG PICTURE
    • Apache project
    • Schema-free document database management system
    • Robust, concurrent, fault-tolerant
    • RESTful JSON API
    • Custom persistent views using MapReduce
    • Bi-directional incremental replication
    • Futon web admin console
  • 5. WHO CARES?
    • “The internet happened, and we ignored it. In retrospect, that was a mistake.” (Bill Warner; Avid, Wildfire, Techstars; summer 2008)
    • Disruptive technologies enable new businesses.
  • 6. DOCUMENTS
    • (Annotated example document: primary key, MVCC & insta-cache, nested structures, binary attachments)
    • Reserved fields are prefixed with an underscore (e.g. _id, _rev, _attachments).
    • The MVCC _rev is generated deterministically from the document content.
    • Binary attachments are stored with the document.
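A minimal sketch of what such a document might look like on the wire, built as a Python dict so it can be printed as JSON. The document id `alan-hoffman`, the field values, and the two helper functions are hypothetical; `_id` and `_attachments` (with base64 `data` for an inline attachment) are CouchDB's reserved fields. `_rev` is omitted because CouchDB assigns it when the document is first saved.

```python
import base64
import json

# Hypothetical body for PUT /mydb/alan-hoffman. CouchDB assigns _rev on save,
# so a brand-new document leaves it out.
doc = {
    "_id": "alan-hoffman",  # primary key (reserved field)
    "type": "speaker",
    "talk": {  # nested structures are ordinary JSON objects and arrays
        "title": "Learning To Relax",
        "event": "Windy City DB",
        "tags": ["couchdb", "nosql"],
    },
    "_attachments": {  # inline binary attachment, base64-encoded
        "notes.txt": {
            "content_type": "text/plain",
            "data": base64.b64encode(b"relax").decode("ascii"),
        }
    },
}


def reserved_fields(d):
    """Names of top-level fields reserved by CouchDB (leading underscore)."""
    return sorted(k for k in d if k.startswith("_"))


def user_fields(d):
    """Top-level fields that belong to the application."""
    return {k: v for k, v in d.items() if not k.startswith("_")}


print(reserved_fields(doc))  # ['_attachments', '_id']
print(json.dumps(user_fields(doc), sort_keys=True))
```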
  • 7. RESTFUL API
    • Create: PUT /mydb/mydocid
    • Retrieve: GET /mydb/mydocid
    • Update: PUT /mydb/mydocid
    • Delete: DELETE /mydb/mydocid
    • List all documents: GET /mydb/_all_docs?include_docs=true
    • “Built of the Web. Completely embraces... HTTP” (Jacob Kaplan-Moss, October 2007)
    • Reference: http://wiki.apache.org/couchdb/Reference
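The CRUD calls above map one-to-one onto HTTP requests. The sketch below, assuming a CouchDB listening on `localhost:5984` and hypothetical `_rev` values, only builds the requests with Python's `urllib.request` so they can be inspected; passing one to `urllib.request.urlopen()` would actually send it. It also shows a detail the slide leaves out: updates and deletes must name the current revision.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumed local CouchDB on the default port


def couch_request(method, path, body=None):
    """Build (but do not send) a CouchDB REST request; urlopen() would send it."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# The _rev values are hypothetical placeholders; real ones come from CouchDB's responses.
create = couch_request("PUT", "/mydb/mydocid", {"title": "relax"})
retrieve = couch_request("GET", "/mydb/mydocid")
# An update must carry the current _rev; a stale one is rejected with 409 Conflict.
update = couch_request("PUT", "/mydb/mydocid", {"_rev": "1-aaaa", "title": "relax more"})
# A delete names the revision being deleted.
delete = couch_request("DELETE", "/mydb/mydocid?rev=2-bbbb")
all_docs = couch_request("GET", "/mydb/_all_docs?include_docs=true")

for r in (create, retrieve, update, delete, all_docs):
    print(r.get_method(), r.full_url)
```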
  • 8. VIEWS
    • (Diagram: map turns documents into key/value rows; reduce combines the values)
    • Docs can be indexed by any attribute using views: custom, persistent representations of the data.
    • Each view must have a map function and may also have a reduce function.
    • View indices are stored in B-trees for efficient lookup by map key.
    • Views are stored in special documents called _design documents.
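To make the map/reduce split concrete, here is a hypothetical design document with a JavaScript map function and CouchDB's built-in `_count` reduce, followed by a small Python stand-in that mimics what the view engine does: emit rows, sort them by key, then reduce. The names (`_design/talks`, `by_event`) and the data are illustrative, `query` loosely corresponds to `GET /db/_design/talks/_view/by_event?key="WindyCityDB"`, and Python's sort order only approximates CouchDB's collation.

```python
# Hypothetical design document. CouchDB stores view code as JavaScript source
# strings; "_count" is a built-in reduce function.
design_doc = {
    "_id": "_design/talks",
    "views": {
        "by_event": {
            "map": "function(doc) { if (doc.event) { emit(doc.event, 1); } }",
            "reduce": "_count",
        }
    },
}


def map_by_event(doc):
    """Python stand-in for the JavaScript map function above."""
    if doc.get("event"):
        yield doc["event"], 1


def build_index(docs, map_fn):
    """Map every doc and sort rows by (key, doc id), roughly as a view index orders them."""
    return sorted((key, doc["_id"], value) for doc in docs for key, value in map_fn(doc))


def query(rows, key=None, reduce=True):
    """Select rows (optionally by exact key); with reduce, count them like _count."""
    selected = [r for r in rows if key is None or r[0] == key]
    return len(selected) if reduce else selected


docs = [
    {"_id": "a", "event": "WindyCityDB"},
    {"_id": "b", "event": "OtherConf"},
    {"_id": "c", "event": "WindyCityDB"},
    {"_id": "d"},  # no event field, so the map emits nothing
]
rows = build_index(docs, map_by_event)
print(rows)
print(query(rows, key="WindyCityDB"))  # 2
```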
  • 9. INCREMENTAL
    • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up to date.
    • Only new or changed docs get re-indexed.
    • Leaf nodes store map results; inner nodes store reductions of their children.
    • Source: http://horicky.blogspot.com/2008/10/couchdb-implementation.html
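A toy model of incremental indexing, assuming each write gets a sequence number (as CouchDB's `_changes` feed exposes). The view remembers the last sequence it processed and re-runs the map only for documents changed since then. The class and function names are hypothetical, deletions are not handled, and plain dicts stand in for the B-tree.

```python
class Database:
    """Toy store that stamps every write with an increasing update sequence."""

    def __init__(self):
        self.docs = {}
        self.seq = 0
        self.last_change = {}  # doc id -> seq of its most recent write

    def put(self, doc_id, doc):
        self.seq += 1
        self.docs[doc_id] = doc
        self.last_change[doc_id] = self.seq

    def changes_since(self, since):
        """(seq, doc id) pairs changed after `since`, oldest first, like _changes?since=N."""
        return sorted((s, d) for d, s in self.last_change.items() if s > since)


class IncrementalView:
    """Keeps map output per doc and re-maps only docs changed since the last update."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.last_seq = 0
        self.rows_by_doc = {}
        self.map_calls = 0  # counts map invocations to show the savings

    def update(self, db):
        for seq, doc_id in db.changes_since(self.last_seq):
            # Replace this doc's old rows with freshly mapped ones.
            self.rows_by_doc[doc_id] = list(self.map_fn(db.docs[doc_id]))
            self.map_calls += 1
            self.last_seq = seq

    def rows(self):
        return sorted((k, d, v) for d, kvs in self.rows_by_doc.items() for k, v in kvs)


def by_type(doc):
    if "type" in doc:
        yield doc["type"], 1


db = Database()
db.put("a", {"type": "talk"})
db.put("b", {"type": "talk"})
view = IncrementalView(by_type)
view.update(db)  # maps a and b
db.put("b", {"type": "speaker"})
view.update(db)  # maps only b
print(view.rows(), view.map_calls)
```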
  • 10. ROBUST
    • Never overwrites previously committed data
    • Append-only B+trees (“copy-on-write”)
    • Server crash or power failure? Just restart CouchDB; there is no “repair” step.
    • Take snapshots with “cp” (J. Chris Anderson)
    • ACID at the single-document level
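The append-only idea can be sketched with a JSON-lines file: documents are appended, a header record then marks the commit, and recovery ignores anything after the last complete header. This is a simplification (CouchDB's real file format is binary, and it finds the last valid header when it opens a file); `AppendOnlyStore` and the file name are hypothetical.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy append-only file: doc records followed by a header that commits them."""

    def __init__(self, path):
        self.path = path

    def commit(self, docs):
        with open(self.path, "a", encoding="utf-8") as f:
            for doc in docs:
                f.write(json.dumps({"doc": doc}) + "\n")
            f.write(json.dumps({"header": True}) + "\n")  # commit point
            f.flush()
            os.fsync(f.fileno())  # make the commit durable before returning

    def recover(self):
        """Rebuild state from committed records; anything after the last header is ignored."""
        committed, pending = {}, {}
        if not os.path.exists(self.path):
            return committed
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn write at the tail: stop here
                if "header" in record:
                    committed.update(pending)
                    pending = {}
                else:
                    pending[record["doc"]["_id"]] = record["doc"]
        return committed


path = os.path.join(tempfile.mkdtemp(), "toy.couch")
store = AppendOnlyStore(path)
store.commit([{"_id": "a", "v": 1}])
store.commit([{"_id": "a", "v": 2}])  # new version appended; v1 stays on disk
with open(path, "a", encoding="utf-8") as f:
    f.write('{"doc": {"_id": "b"')  # simulate a crash mid-write
print(store.recover())  # {'a': {'_id': 'a', 'v': 2}}
```

Because nothing is ever overwritten, copying the file at any moment yields a usable snapshot: the copy simply ends at some earlier commit point.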
  • 11. REPLICATION
    • (Screenshot: the Futon replicator with source, target, and progress)
    • The beauty of MVCC: replication in one click
    • CouchDB => “cloud ready”
  • 12. REPLICATION
    • Peer-based, bi-directional replication over normal HTTP
    • Mediated by a replicator process that can live on the source, the target, or somewhere else entirely
    • Replicate only the documents in a DB that meet criteria defined in a custom filter function
    • Applications (_design documents) replicate along with the data
    • Ideal for offline applications: “ground computing”
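A minimal sketch of incremental, checkpointed replication between two in-memory databases. The replicator reads the source's changes since a checkpoint, copies any revision the target lacks, and returns a new checkpoint; running it in both directions gives bi-directional replication without documents bouncing back and forth. `ToyCouch`, `replicate`, and the revision ids are hypothetical, and conflict handling is deliberately left out.

```python
class ToyCouch:
    """Toy database: latest doc per id plus an ordered log of writes."""

    def __init__(self):
        self.docs = {}
        self.log = []  # (seq, doc id)

    def save(self, doc):
        self.docs[doc["_id"]] = doc
        self.log.append((len(self.log) + 1, doc["_id"]))

    def changes(self, since):
        """Newest seq per doc id for writes after `since`, oldest first."""
        latest = {}
        for seq, doc_id in self.log:
            if seq > since:
                latest[doc_id] = seq
        return sorted((seq, doc_id) for doc_id, seq in latest.items())


def replicate(source, target, since=0):
    """Push docs whose revision the target lacks; return a checkpoint for the next run.

    Simplification: a differing revision simply overwrites the target's copy.
    Real CouchDB compares revision histories and keeps conflicts instead.
    """
    checkpoint = since
    for seq, doc_id in source.changes(since):
        doc = source.docs[doc_id]
        if target.docs.get(doc_id, {}).get("_rev") != doc["_rev"]:
            target.save(dict(doc))
        checkpoint = seq
    return checkpoint


a, b = ToyCouch(), ToyCouch()
a.save({"_id": "x", "_rev": "1-aaaa"})
a.save({"_id": "y", "_rev": "1-bbbb"})
checkpoint = replicate(a, b)  # copies x and y
a.save({"_id": "x", "_rev": "2-cccc"})
checkpoint = replicate(a, b, checkpoint)  # only x is examined and copied
replicate(b, a)  # reverse direction: nothing new to copy
print(checkpoint, b.docs["x"]["_rev"], len(a.log))
```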
  • 13. FILTERED REPLICATION
    • Write the filter function
    • Embed it in a design doc
    • Specify it in the replication request
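The three steps, sketched as the JSON bodies involved and assuming the 0.11-era API as we understand it: a filter function stored as JavaScript source under a design doc's `filters` field, and a `POST /_replicate` body that names it as `designdoc/filtername` and passes `query_params` through to `req.query`. The database names, host, and `by_region` filter are hypothetical; the Python function only mirrors the JavaScript so the selection can be checked locally.

```python
import json

# Steps 1 and 2: a hypothetical filter, stored as JavaScript source in a design doc.
filter_ddoc = {
    "_id": "_design/app",
    "filters": {
        "by_region": "function(doc, req) { return doc.region === req.query.region; }"
    },
}

# Step 3: body for POST /_replicate naming the filter as "<design doc>/<filter>".
replication_request = {
    "source": "orders",
    "target": "http://example.com:5984/orders_chicago",
    "filter": "app/by_region",
    "query_params": {"region": "chicago"},
}


def by_region(doc, query):
    """Python stand-in for the JavaScript filter above."""
    return doc.get("region") == query.get("region")


docs = [{"_id": "1", "region": "chicago"}, {"_id": "2", "region": "boston"}]
selected = [d["_id"] for d in docs if by_region(d, replication_request["query_params"])]
print(json.dumps(replication_request))
print(selected)  # ['1']
```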
  • 14. MULTI-COUCH SETUPS
    • (Diagrams: master-slave, master-master, and robust multi-master topologies)
  • 15. CONFLICTS
    • (Diagram: PUT /a/foo and PUT /b/foo on two databases, then replication, produce a conflict)
    • Replication can introduce conflicts in a multi-master setup.
    • CouchDB deterministically chooses a winner, but the loser is saved with the document as a conflicting rev.
    • Conflicting revs are replicated; source and target agree on the winning and losing revs.
    • Compaction discards old revision bodies, but losing conflict revs remain until they are explicitly deleted.
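The deterministic choice of winner can be sketched as a sort rule. As we understand CouchDB's documented behavior, non-deleted leaf revisions beat deleted ones, then the revision with the higher edit count wins, and remaining ties go to the revision id that sorts highest; the losers show up under `_conflicts` when the document is fetched with `?conflicts=true`. The revision ids and helper names below are hypothetical.

```python
def rev_pos(rev):
    """Edit count encoded in a revision id such as '2-aaaa'."""
    return int(rev.split("-", 1)[0])


def pick_winner(leaves):
    """Pick the winning leaf: non-deleted beats deleted, then higher edit count,
    then the revision string that sorts highest. Every replica applies the same
    rule to the same leaves, so they all agree without talking to each other.
    """
    return max(
        leaves,
        key=lambda leaf: (not leaf.get("_deleted", False), rev_pos(leaf["_rev"]), leaf["_rev"]),
    )


# Hypothetical leaves after PUT /a/foo and PUT /b/foo were replicated together.
leaves = [
    {"_id": "foo", "_rev": "2-aaaa", "title": "edited on a"},
    {"_id": "foo", "_rev": "2-bbbb", "title": "edited on b"},
]
winner = pick_winner(leaves)
losers = [leaf["_rev"] for leaf in leaves if leaf is not winner]
print(winner["_rev"], losers)  # 2-bbbb ['2-aaaa']
```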
  • 16. BUILDING A BIG COUCH
    • (Title graphic: “Why CouchDB Doesn’t Scale”, with “Doesn’t” edited by hand)
  • 17. WHAT WE TALK ABOUT WHEN WE TALK ABOUT SCALING
    • Horizontal scaling: more servers create more capacity.
    • Transparent to the application: adding capacity should not affect the application's business logic.
    • No single point of failure.
    • (Physics joke: pseudo scalars)
    • http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
  • 18. COUCHDB LOUNGE
    • (Diagram: the application sends PUT/GET requests through Dumbproxy (nginx) and view requests (GET /_design/...) through Smartproxy)
    • Proxy-based partitioning and clustering
    • Originally designed for use at Meebo
    • Uses consistent hashing to partition docs across nodes
    • Dumbproxy: an nginx module that handles simple GETs and PUTs
    • Smartproxy: a Twisted/Python daemon that handles view requests
    • Want to know more? R. Leeds (tilgovi): http://tilgovi.github.com/couchdb-lounge/
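A generic consistent-hashing ring, sketched to show why adding a node moves only a fraction of the documents. This is not Lounge's implementation; the MD5 hash, the 64 virtual nodes per server, and the server names `couch1` through `couch4` are assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(value):
    """Place a string on the ring; MD5 is an arbitrary, stable choice for this sketch."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hashing ring with virtual nodes (not Lounge's own code)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.points = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.points, (ring_hash(f"{node}#{i}"), node))

    def node_for(self, doc_id):
        """The first ring point at or after the doc's hash (wrapping around) owns the doc."""
        i = bisect.bisect_left(self.points, (ring_hash(doc_id),))
        return self.points[i % len(self.points)][1]


ring = HashRing(["couch1", "couch2", "couch3"])
ids = [f"doc{i}" for i in range(1000)]
before = {d: ring.node_for(d) for d in ids}
ring.add("couch4")
after = {d: ring.node_for(d) for d in ids}
moved = [d for d in ids if before[d] != after[d]]
print(len(moved), "of", len(ids), "docs moved, all to", set(after[d] for d in moved))
```

In expectation about a quarter of the documents move, and every one of them moves to the new node; a naive `hash % node_count` scheme would instead reassign most documents.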
  • 19. OPEN CLOUDANT
    • (Diagram: a load balancer in front of a ring of nodes; for PUT http://alan.cloudant.com/dbname/blah?w=2 with N=3, W=2, R=2, hash(blah) selects the nodes that store the document)
    • Clustering in a ring (à la Dynamo)
    • Any node can handle a request
    • O(1) lookup
    • Quorum system (N, R, W)
    • Views distributed like documents
    • Distributed Erlang
    • Masterless
    • ✓ Horizontally scalable
    • ✓ No SPOF
    • ✓ Transparent to the application
    • Coming soon to a GitHub near you!
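A generic Dynamo-style sketch of the quorum idea, not Cloudant's code: hash the document id to a position on a fixed ring, take that node and the next N-1 as replicas, and accept a read or write once R or W of them respond. The seven node names and the use of SHA-1 are assumptions; the slide's `?w=2` corresponds to `needed=2` here.

```python
import hashlib

NODES = ["A", "B", "C", "D", "E", "F", "G"]  # hypothetical ring of seven nodes


def preference_list(key, n=3, nodes=NODES):
    """The node owning hash(key) plus the next n-1 nodes clockwise hold the replicas.
    Locating them is a constant-time index calculation, not a search."""
    start = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(n)]


def quorum_met(replicas, responding, needed):
    """A read (needed=R) or write (needed=W) succeeds once enough replicas answer."""
    return sum(node in responding for node in replicas) >= needed


def quorums_overlap(n, r, w):
    """With R + W > N, every read quorum shares at least one node with every write quorum."""
    return r + w > n


replicas = preference_list("blah")  # N = 3 copies of doc "blah"
one_down = set(NODES) - {replicas[2]}  # one replica is unreachable
print(replicas)
print(quorum_met(replicas, one_down, needed=2))  # True: a W=2 write still succeeds
print(quorums_overlap(3, 2, 2))  # True
```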
  • 20. IN THE WILD
    • 15+ million deployments
    • Active commercial support
    • 3 books
    • 1.0 imminent
    • Vibrant, open community
  • 21. CASE #1: REALTIME ANALYTICS
    • Analytics on high-rate advertising data
    • ETL analysis workflow too slow for their customers (24-hour cycle)
    • Needed a realtime solution
    • Complicated SQL stored procedures for social graph analysis required 40+ Postgres tables
    • Replaced it all with a single CouchDB document type and two views:
      • group-level collation to bin data at multiple granularities => customers get updated results in seconds, not hours
      • a single view (30 lines of JS) for graph analysis
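Group-level collation works on array keys: with `?group_level=N`, the reduce runs over all rows that share the first N key elements. The sketch below mimics that with a `_sum`-style reduce in Python, assuming a hypothetical view that emits `[year, month, day, hour]` keys with hit counts; level 1 bins by year, level 2 by month, and so on.

```python
from collections import defaultdict

# Hypothetical map output: [year, month, day, hour] keys with hit counts as values.
rows = [
    ([2010, 6, 25, 23], 4),
    ([2010, 6, 26, 9], 3),
    ([2010, 6, 26, 10], 5),
    ([2010, 7, 1, 0], 2),
]


def group_level(rows, level):
    """Mimic ?group_level=N with a _sum reduce: sum values per N-element key prefix."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[tuple(key[:level])] += value
    return [(list(prefix), total) for prefix, total in sorted(totals.items())]


for level in (1, 2, 3):
    print(level, group_level(rows, level))
# 1 [([2010], 14)]
# 2 [([2010, 6], 12), ([2010, 7], 2)]
# 3 [([2010, 6, 25], 4), ([2010, 6, 26], 8), ([2010, 7, 1], 2)]
```

One view thus serves every granularity; CouchDB computes these groupings from the reductions stored in its B-tree rather than rescanning rows as this sketch does.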
  • 22. MONEY QUOTE
    • “Migrating to CouchDB really opened a lot of doors for us product-wise. The time delay between data arriving in our systems and becoming available to our customers went from 24 hours to less than 30 min - on similar hardware - even while we greatly increased the level of granularity that our processing provided.”
  • 23. CASE #2: EASYBIB
    • Online bibliography service, ~10 years old, initially built on MySQL (and ColdFusion)
    • Had suffered through many migrations
    • Choice: massive sharding and replication of MySQL vs. “another option”
    • Why Couch:
      • Schema-free (replacing 40-50 tables with 3 DBs)
      • Easily scalable
      • Strong community support
    • “In your best Borat voice: ‘Great Success!’”
  • 24. CASE #3: MEEBO
    • “All your friends and networks, from wherever you are.”
    • Why Couch?
      • No schema (and ergo, no schema migrations)
      • Replication
      • Could handle queries that would break on a sharded RDBMS
      • REST interface: easy to reuse existing tools and libraries
      • Easy to write a proxy layer that keeps sharding out of the app logic
    • Wishes? Speed, API stability, native clustering
  • 27. PARAPHRASING THE MASSES
    • Why CouchDB?
      • Simple, robust, concurrent, fun, scalable, powerful
      • Successful in production, active community, industry adoption
    • Why not Couch?
      • Missing features:
        • Ad hoc queries: true, by design
        • Authz/authn: included in 0.11
      • Doesn't scale: Lounge, Pillow, Open Cloudant, etc.
      • Too new (API still changing, still alpha): 0.11 feature freeze and 1.0 imminent
      • “Too slow”: perhaps, but...
  • 28. DESERVING OF MORE TIME
    • CouchApp: an HTML+JS framework for building lightweight, portable apps and serving them directly from CouchDB (http://github.com/couchapp/couchapp/)
    • External indexers like CouchDB-Lucene (http://github.com/rnewson/couchdb-lucene)
    • The plethora of client libraries and tools...
  • 29. TRY IT OUT
    • Hosted, free: Cloudant.com
    • Easy offline: CouchDBX
  • 30. THANK YOU
    • Books:
      • CouchDB: The Definitive Guide. J. Chris Anderson, Jan Lehnardt, Noah Slater
      • Beginning CouchDB. Joe Lennon
    • Web:
      • http://wiki.apache.org/couchdb/
      • http://planet.couchdb.org/
    • IRC:
      • Freenode #couchdb
      • Freenode #cloudant
  • 31. relax
  • 32. AUTHZ/AUTHN
    • Remember, Couch acts like a web service.
    • Authentication:
      • 0.11+ ships with support for OAuth, cookie, and basic authentication
      • Handlers are specified in a config file
      • Users are defined in an authentication database (“_users” by default)
    • Authorization:
      • 3 levels: DB reader, DB admin, server admin
      • Per-DB roles are defined in the security document
  • 33. EXAMPLES
    • (Screenshots: a user document and a security document)
    • Caution! Do not leave the arrays blank.
    • http://wiki.apache.org/couchdb/Security_Features_Overview
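A sketch of the two documents from the slide, assuming the 0.11/1.0-era formats as we understand them: a `_users` document whose `password_sha` is the hex SHA-1 of the password concatenated with the salt, and a `_security` object with `admins` and `readers` lists (later releases changed both the hashing scheme and the `readers` name). The user, role, and helper names are hypothetical. The last helper encodes the slide's caution: with both reader lists empty, the database is readable by anyone.

```python
import hashlib
import os


def make_user_doc(name, password, roles):
    """User doc in the assumed 0.11/1.0-era _users format: password_sha = SHA1(password + salt)."""
    salt = os.urandom(16).hex()
    return {
        "_id": "org.couchdb.user:" + name,
        "type": "user",
        "name": name,
        "roles": roles,
        "salt": salt,
        "password_sha": hashlib.sha1((password + salt).encode("utf-8")).hexdigest(),
    }


def check_password(user_doc, password):
    """Hypothetical local helper: recompute the salted hash and compare."""
    digest = hashlib.sha1((password + user_doc["salt"]).encode("utf-8")).hexdigest()
    return digest == user_doc["password_sha"]


# Body for PUT /mydb/_security (hypothetical names and roles).
security_doc = {
    "admins": {"names": ["alan"], "roles": []},
    "readers": {"names": [], "roles": ["analysts"]},
}


def is_world_readable(sec):
    """Empty reader names AND roles leave the database readable by anyone."""
    readers = sec.get("readers", {})
    return not readers.get("names") and not readers.get("roles")


user = make_user_doc("alan", "relax", ["analysts"])
print(user["_id"], check_password(user, "relax"), check_password(user, "tense"))
print(is_world_readable(security_doc))  # False
```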
  • 34. DRAWBACKS
    • “Futon: difficult to use for installations that have a lot of DBs (1000+)”
    • “Tools for managing design docs are deficient”
    • “Client libraries too focused on Couch as the ‘M’ in MVC apps.”
    • “Couch 1.0 is a moving target”