LEARNING TO RELAX:
   CouchDB for Beginners
        Windy City DB


Learning To Relax


Alan Hoffman's WindyCityDB talk on beginning CouchDB.

Published in: Technology, Education
  • Learning To Relax

    1. LEARNING TO RELAX: CouchDB for Beginners
       Windy City DB
    2. OUTLINE
       • Introduction and Overview
       • CouchDB Basics
       • Special Topics in Relaxation: Scaling CouchDB
       • Use Cases In the Wild
       • Takeaways
       Windy City DB, June 26, 2010
    3. HI
       • Alan Hoffman
         • @_hoffman
         • alan@cloudant.com
       • Experimental particle physicist
       • Background: machine learning, big data analysis, distributed systems
       • Co-founder of Cloudant (Hosted Couch)
       • Not a committer, but...
    4. COUCH: THE BIG PICTURE
       • Apache project
       • Schema-free document database management system
       • Robust, concurrent, fault-tolerant
       • RESTful JSON API
       • Custom persistent views using MapReduce
       • Bi-directional incremental replication
       • Futon web admin console
    5. WHO CARES?
       “The internet happened, and we ignored it.
        In retrospect, that was a mistake.”
       -Bill Warner (Avid, Wildfire, Techstars), Summer, 2008
       Disruptive technologies enable new business
    6. DOCUMENTS
       (Figure labels: Primary Key, MVCC & Insta-cache, Nested Structures, Binary Attachments)
       • Reserved fields are prefixed with an underscore
       • MVCC _rev deterministically generated from doc content
       • Binary attachments
    7. RESTFUL API
       • Create: PUT /mydb/mydocid
       • Retrieve: GET /mydb/mydocid
       • Update: PUT /mydb/mydocid
       • Delete: DELETE /mydb/mydocid
       • GET /mydb/_all_docs?include_docs=true
       “Built of the Web
        Completely embraces... HTTP”
       -Jacob Kaplan-Moss, October 2007
       http://wiki.apache.org/couchdb/Reference
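
The four verbs above map directly onto HTTP requests. The sketch below only builds the requests with the Python standard library and does not send them; the base URL (a local server on port 5984), the document body, and the revision string "1-abc" are placeholders chosen for illustration, not values from the talk.

```python
import json
import urllib.request

BASE = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request for the given CouchDB path."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


create = build_request("PUT", "/mydb/mydocid", {"talk": "Learning to Relax"})
retrieve = build_request("GET", "/mydb/mydocid")
# An update is a PUT that carries the current _rev; "1-abc" is a placeholder.
update = build_request("PUT", "/mydb/mydocid", {"_rev": "1-abc", "talk": "Relax"})
# A delete names the revision being deleted in the query string.
delete = build_request("DELETE", "/mydb/mydocid?rev=1-abc")

# To actually send one (requires a running server):
#     with urllib.request.urlopen(create) as resp:
#         print(json.load(resp))
```

Create and update share the same verb and URL; the only difference is whether the body names the revision being replaced.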
    8. VIEWS
       (Figure: map and reduce producing key/value rows)
       • Docs can be indexed by any attribute using views: custom, persistent representations of the data.
       • Each view must have a map function and may also have a reduce function
       • View indices are stored in B-trees for efficient lookup by map key
       • Stored in special documents called _design documents
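
A small simulation may help make the map/reduce split concrete. CouchDB view functions are normally written in JavaScript and stored in a _design document; the Python below only imitates the data flow, and the sample documents, `map_fn`, `reduce_fn`, and `run_view` are all hypothetical names invented for this sketch.

```python
from collections import defaultdict

docs = [
    {"_id": "a", "type": "talk", "city": "Chicago"},
    {"_id": "b", "type": "talk", "city": "Boston"},
    {"_id": "c", "type": "talk", "city": "Chicago"},
]


def map_fn(doc):
    """Emit one (key, value) row per talk, keyed by city."""
    if doc.get("type") == "talk":
        yield doc["city"], 1


def reduce_fn(values):
    """Count rows; plays the role of a simple sum reduce."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn=None):
    # Rows are sorted by key, mirroring lookup by map key in the view index.
    rows = sorted((k, v) for doc in docs for k, v in map_fn(doc))
    if reduce_fn is None:
        return rows
    grouped = defaultdict(list)
    for key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(vals) for key, vals in grouped.items()}


print(run_view(docs, map_fn))             # map-only rows, sorted by key
print(run_view(docs, map_fn, reduce_fn))  # one reduced value per key
```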
    9. INCREMENTAL
       • Computing a view can be expensive, so CouchDB saves the result in a B-tree and keeps it up-to-date
       • Only new docs or changed docs get ‘re-indexed’
       • Leaf nodes store map results, inner nodes store reductions of children
       http://horicky.blogspot.com/2008/10/couchdb-implementation.html
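
The "only changed docs get re-indexed" idea can be sketched with an update sequence number. This is a toy under stated assumptions: it keeps rows in a plain dict rather than a B-tree, stores no reductions, and the class name `IncrementalIndex` and the `(seq, doc)` change log are invented for illustration.

```python
class IncrementalIndex:
    """Toy view index that re-maps only docs changed since the last update."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> rows emitted for that doc
        self.last_seq = 0      # highest update sequence already indexed
        self.mapped = 0        # how many times map_fn has run in total

    def update(self, changes):
        """changes: iterable of (seq, doc) pairs in increasing seq order."""
        for seq, doc in changes:
            if seq <= self.last_seq:
                continue  # already indexed, skip
            self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))
            self.mapped += 1
            self.last_seq = seq

    def rows(self):
        return sorted(r for rows in self.rows_by_doc.values() for r in rows)


index = IncrementalIndex(lambda doc: [(doc["city"], 1)])
log = [(1, {"_id": "a", "city": "Chicago"}), (2, {"_id": "b", "city": "Boston"})]
index.update(log)
log.append((3, {"_id": "a", "city": "Austin"}))  # doc "a" changed
index.update(log)  # only the change at seq 3 is mapped this time
```

After the second update the map function has run three times in total, not four, and the old row for doc "a" has been replaced.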
    10. ROBUST
        • Never overwrite previously committed data
        • Append only b+trees, ‘copy-on-write’
        • Server crash, power failure? just restart CouchDB -- there is no “repair”
        • Take snapshots with “cp”
        J.C. Anderson
        • ACID at the single document level
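
Why an append-only file needs no repair step can be shown with a flat log. This is a deliberate simplification: CouchDB appends B+tree nodes, while the sketch below appends one JSON record per line, and the file name and helper functions are hypothetical.

```python
import json
import os
import tempfile


def append_doc(path, doc):
    """Append a new version of a doc; earlier bytes are never modified."""
    with open(path, "ab") as f:
        f.write(json.dumps(doc).encode("utf-8") + b"\n")


def load_latest(path):
    """Scan the log; the last complete record for each _id wins."""
    latest = {}
    with open(path, "rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                break  # torn final write from a crash: ignore it
            doc = json.loads(line)
            latest[doc["_id"]] = doc
    return latest


path = os.path.join(tempfile.mkdtemp(), "toy.couch")
append_doc(path, {"_id": "foo", "n": 1})
append_doc(path, {"_id": "foo", "n": 2})
with open(path, "ab") as f:
    f.write(b'{"_id": "foo", "n"')  # simulate a crash in the middle of a write
state = load_latest(path)
print(state)
```

Because committed records are never touched, the half-written tail is simply skipped on restart, and copying the file at any moment yields a usable snapshot.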
    11. REPLICATION
        (Figure labels: source, target, progress, one click)
        • The beauty of MVCC
        • CouchDB => “Cloud ready”
    12. REPLICATION
        • Peer-based, bi-directional replication using normal HTTP
        • Mediated by a replicator process which can live on the source, target, or somewhere else entirely
        • Replicate a subset of documents in a DB meeting criteria defined in a custom filter function
        • Applications (_design documents) replicate along with the data
        • Ideal for offline applications: “ground computing”
    13. FILTERED REPLICATION
        • Write the filter function
        • Embed it in a design doc
        • Specify in the replication request
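
The three steps can be sketched as plain data. The slide's own example did not survive extraction, so everything specific here is an assumption for illustration: the design doc name "app", the filter name "talks", the "type" field, and the target URL. The filter itself is JavaScript, as CouchDB expects; `py_filter` is only a Python stand-in to show which docs would pass.

```python
import json

# Step 1: the filter function. This one passes only documents whose
# (hypothetical) "type" field is "talk".
FILTER_JS = 'function(doc, req) { return doc.type === "talk"; }'

# Step 2: embed it in a design doc under "filters".
design_doc = {
    "_id": "_design/app",  # hypothetical design doc name
    "filters": {"talks": FILTER_JS},
}

# Step 3: name it as "<design doc>/<filter>" in the replication request
# body, which would be POSTed to /_replicate.
replication_request = {
    "source": "mydb",
    "target": "http://example.com:5984/mydb",  # placeholder target
    "filter": "app/talks",
}


def py_filter(doc):
    """Python stand-in for FILTER_JS, used only to show which docs pass."""
    return doc.get("type") == "talk"


docs = [{"_id": "a", "type": "talk"}, {"_id": "b", "type": "note"}]
replicated = [d["_id"] for d in docs if py_filter(d)]
print(json.dumps(replication_request, indent=2))
print(replicated)
```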
    14. MULTI-COUCH SETUPS
        • Master-Slave
        • Master-Master
        • Robust Multi-Master
    15. CONFLICTS
        (Figure: PUT /a/foo and PUT /b/foo, then replicate => Conflict)
        • Replication can introduce conflicts in a multi-master setup
        • CouchDB deterministically chooses a winner but the loser is saved with the document as a conflicting rev
        • Conflicting revs are replicated; both source and target will agree on winning and losing revs
        • Compacting the DB removes all losing revs
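
What "deterministically chooses a winner" buys you is that every node reaches the same answer without talking to the others. The sketch below assumes a rule of the form "highest revision generation wins, ties broken by comparing the hash part of the rev as a string"; treat it as an illustration of determinism rather than a statement of CouchDB's exact rule (deleted revisions, for instance, are ignored here). The rev strings are made up.

```python
def parse_rev(rev):
    """Split a rev such as '2-abc' into (generation, hash)."""
    gen, _, digest = rev.partition("-")
    return int(gen), digest


def pick_winner(revs):
    """Pick the same winner on every node, whatever order the revs arrive in."""
    return max(revs, key=parse_rev)


# The same doc edited independently on nodes a and b, then replicated both ways.
on_a = pick_winner(["2-7c1e", "2-b91f"])
on_b = pick_winner(["2-b91f", "2-7c1e"])
losers_on_a = [r for r in ["2-7c1e", "2-b91f"] if r != on_a]
print(on_a, on_b, losers_on_a)
```

Both nodes pick the same rev, and the losing rev stays available as a conflict until the application resolves it.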
    16. BUILDING A BIG COUCH
        Why CouchDB ^(Doesn’t) Doesn’t Scale
    17. WHAT WE TALK ABOUT WHEN WE TALK ABOUT SCALING
        • Horizontal scaling: more servers create more capacity
        • Transparent to the application: adding more capacity should not affect the business logic of the application.
        • No single point of failure.
        Physics Joke! Pseudo Scalars
        http://adam.heroku.com/past/2009/7/6/sql_databases_dont_scale/
    18. COUCHDB LOUNGE
        (Figure labels: application, PUT/GET, Dumbproxy (nginx), Smartproxy, GET /_design/...)
        • Proxy-based partitioning and clustering
        • Designed originally for use at Meebo
        • Uses consistent hashing to partition docs across nodes
        • Dumbproxy - nginx module that handles simple GETs and PUTs
        • Smartproxy - A twisted/python daemon that handles view requests
        • Want to know more? R. Leeds (tilgovi) http://tilgovi.github.com/couchdb-lounge/
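
Consistent hashing, which the Lounge slide names as its partitioning scheme, can be sketched generically as a hash ring. This is not Lounge's code: the class `HashRing`, the node names, the use of SHA-256, and the 64 points per node are all assumptions picked for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a point on the ring (an integer)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=64):
        # Each node gets several points on the ring to even out the load.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, doc_id):
        """First ring point at or after the doc's hash, wrapping around."""
        i = bisect.bisect_left(self._keys, _hash(doc_id)) % len(self._keys)
        return self._points[i][1]


ids = [f"doc{i}" for i in range(1000)]
ring3 = HashRing(["couch1", "couch2", "couch3"])
ring4 = HashRing(["couch1", "couch2", "couch3", "couch4"])
moved = [i for i in ids if ring3.node_for(i) != ring4.node_for(i)]
print(f"{len(moved)} of {len(ids)} docs moved after adding a node")
```

The point of the scheme is visible in `moved`: when a fourth node joins, the only docs that change owner are the ones the new node takes over, so the other three nodes exchange nothing among themselves.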
    19. OPEN CLOUDANT
        (Figure: ring of nodes behind a load balancer; example request
         PUT http://alan.cloudant.com/dbname/blah?w=2 with N=3, W=2, R=2 and hash(blah) = E)
        • Clustering in a ring (a la Dynamo)
        • Any node can handle a request
        • O(1) lookup
        • Quorum system (N, R, W)
        • Views distributed like documents
        • Distributed Erlang
        • Masterless
        ✓ Horizontally Scalable
        ✓ No SPOF
        ✓ Transparent to the application
        Coming soon to a github near you!
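
The (N, R, W) quorum numbers on this slide follow the usual Dynamo-style arithmetic: with N copies of each doc, a read that waits for R replies is guaranteed to see the latest acknowledged write when R + W > N. The helpers below are a generic sketch of that arithmetic, not Cloudant's implementation; the function names and the `(version, value)` reply shape are invented for illustration.

```python
def quorum_overlaps(n, r, w):
    """True when every read quorum shares a node with every write quorum."""
    return r + w > n


def write_ok(acks, w):
    """A write succeeds once at least w replicas acknowledge it."""
    return acks >= w


def read_latest(replies, r):
    """replies: list of (version, value) pairs from replicas that answered.

    Returns the newest value, or None if fewer than r replicas replied.
    """
    if len(replies) < r:
        return None
    return max(replies, key=lambda reply: reply[0])[1]


# The slide's settings: N=3, W=2, R=2.
print(quorum_overlaps(3, 2, 2))
print(read_latest([(1, "old"), (2, "new")], r=2))
```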
    20. IN THE WILD
        • 15+ million deployments
        • Active commercial support
        • 3 books
        • 1.0 imminent
        • Vibrant, open community
    21. CASE #1: REALTIME ANALYTICS
        • Analytics on high-rate advertising data
        • ETL analysis workflow too slow for their customers (24 hr cycle)
        • Needed a realtime solution
        • Complicated SQL stored procedures for social graph analysis required 40+ postgres tables
        • Replaced it all with a single CouchDB document type and two views:
          • group level collation to bin data at multiple granularities => customers get updated results in seconds, not hours
          • single view (30 lines of JS) for graph analysis.
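
"Group level collation to bin data at multiple granularities" refers to views whose keys are arrays, queried with different group_level values. The simulation below shows the idea in Python; the key layout [year, month, day, hour], the counts, and the `group` helper are assumptions made up for this sketch, since the customer's actual views are not shown in the talk.

```python
from collections import defaultdict

# Rows as a map function might emit them: key = [year, month, day, hour].
rows = [
    ([2010, 6, 26, 9], 120),
    ([2010, 6, 26, 10], 80),
    ([2010, 6, 27, 9], 50),
    ([2010, 7, 1, 12], 30),
]


def group(rows, group_level):
    """Sum values over keys truncated to their first group_level elements."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[tuple(key[:group_level])] += value
    return dict(totals)


per_day = group(rows, 3)
per_month = group(rows, 2)
print(per_day)
print(per_month)
```

One index answers hourly, daily, and monthly questions; only the group level in the query changes.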
    22. MONEY QUOTE
        “Migrating to CouchDB really opened a lot of doors for us product-wise.
         The time delay between data arriving in our systems and becoming available
         to our customers went from 24 hours to less than 30 min - on similar
         hardware - even while we greatly increased the level of granularity that
         our processing provided”
    23. CASE #2: EASYBIB
        • Online bibliography service, ~10 years old, initially built on MySQL (and ColdFusion)
        • Had suffered through many migrations
        • Choice: massive sharding and replication of MySQL v. “another option”
        • Why Couch:
          • Schema Free (replacing 40 - 50 tables with 3 DBs)
          • Easily scalable
          • Strong community support
        “In your best Borat voice: ‘Great Success!’”
    24. CASE #3: MEEBO
        • “All your friends and networks, from wherever you are.”
        • Why Couch?
          • No Schema (and ergo, no schema migrations)
          • Replication
          • Could deal with queries that would break on a sharded RDBMS
          • REST interface -- easy to re-use existing tools and libraries
          • Easy to write a proxy layer that keeps sharding out of the app logic
        • Wishes? Speed, API stability, native clustering
    25. PARAPHRASING THE MASSES
        • Why CouchDB?
          • Simple, robust, concurrent, fun
          • successful in production
        • Why Not Couch?
          • Missing Features
            • ad hoc queries
            • authz/authn
            • doesn’t scale
          • Too New -- api still changing, still alpha
          • “Too Slow”
    26. PARAPHRASING THE MASSES
        • Why CouchDB?
          • Simple, robust, concurrent, fun, scalable, powerful
          • successful in production, active community, industry adoption
        • Why Not Couch?
          • Missing Features
            • ad hoc queries
            • authz/authn
            • doesn’t scale
          • Too New -- api still changing, still alpha
          • “Too Slow”
    27. PARAPHRASING THE MASSES
        • Why CouchDB?
          • Simple, robust, concurrent, fun, scalable, powerful
          • successful in production, active community, industry adoption
        • Why Not Couch?
          • Missing Features
            • ad hoc queries: True, by design
            • authz/authn: Included in 0.11
            • doesn’t scale: Lounge, Pillow, Open Cloudant, etc
          • Too New -- api still changing, still alpha: 0.11 Feature freeze and 1.0 imminent
          • “Too Slow”: Perhaps, but...
    28. DESERVING OF MORE TIME
        • CouchApp: HTML+JS framework for building lightweight, portable apps and serving them directly from CouchDB
          • http://github.com/couchapp/couchapp/
        • External indexers like CouchDB-Lucene
          • http://github.com/rnewson/couchdb-lucene
        • The plethora of client libraries and tools...
    29. TRY IT OUT
        • Hosted Free: Cloudant.com
        • Easy Offline: CouchDBX
    30. THANK YOU
        • Books
          • CouchDB: The Definitive Guide. J. Chris Anderson, Jan Lehnardt, Noah Slater
          • Beginning CouchDB. Joe Lennon
        • Web
          • http://wiki.apache.org/couchdb/
          • http://planet.couchdb.org/
        • IRC
          • Freenode #couchdb
          • Freenode #cloudant
    31. relax
    32. AUTHZ/AUTHN
        • Remember, Couch acts like a web service
        • Authentication:
          • 0.11+ ships with support for OAuth, cookie, and basic
          • Handlers specified in a config file
          • Users defined in authentication database (“_users” by default)
        • Authorization
          • 3 levels: DB reader, DB admin, Server Admin
          • Per DB roles defined in security document
    33. EXAMPLES
        • User Document
        • Security Document
        Caution! Do not leave arrays blank
        http://wiki.apache.org/couchdb/Security_Features_Overview
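
The slide's user and security documents were images and did not survive in the transcript. As a stand-in, here is a sketch of what such documents looked like in the 0.11 era, to the best of my understanding: the "org.couchdb.user:" id prefix, the salted SHA-1 `password_sha` field, and the "admins"/"readers" layout of the security document are assumptions to check against the wiki page linked above, and the user name, salt, and roles are made up.

```python
import hashlib
import json


def make_user_doc(name, password, salt, roles):
    """Build a user doc in the style of the 0.11-era "_users" database."""
    return {
        "_id": "org.couchdb.user:" + name,
        "type": "user",
        "name": name,
        "roles": roles,
        "salt": salt,
        # Assumption: password_sha is the hex SHA-1 of password + salt.
        "password_sha": hashlib.sha1((password + salt).encode("utf-8")).hexdigest(),
    }


user_doc = make_user_doc("alan", "relax", "hypothetical-salt", ["staff"])

# Per-database security document: who is a DB admin, who is a DB reader.
# Following the slide's caution, no array is left blank in this example.
security_doc = {
    "admins": {"names": ["alan"], "roles": ["ops"]},
    "readers": {"names": ["guest"], "roles": ["staff"]},
}

print(json.dumps(user_doc, indent=2))
print(json.dumps(security_doc, indent=2))
```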
    34. DRAWBACKS
        • “Futon -- difficult to use for installations that have a lot of DBs (1000+)”
        • “Tools for managing design docs are deficient”
        • “Client libraries too focused on Couch as the ‘M’ in MVC apps.”
        • “Couch 1.0 is a moving target”
