Real World CouchDB

CouchDB has several features that help it stand out from the other databases in this rapidly growing field: incremental map/reduce, peer-to-peer replication, mobile device synchronization, a real-time update feed, and the ability to host an application in the database itself (known as a CouchApp), to name a few. See how companies such as the BBC, Radical Dynamic, Signal, and Incandescent Software are using CouchDB to solve their real-world challenges.

  1. Real World CouchDB: John Wood, Windy City DB 2011, @johnpwood
  2. About Me
  3. • Software Developer at Signal
     • Coding for about 15 years
     • Working with CouchDB for 2.5 years (in production for about 2 years)
     • Enjoy tinkering with data storage solutions
  4. CouchDB Overview
  5. Document Database
     {
       "_id": "2d7f015226a05b6940984bbe39004fde",
       "_rev": "2-477f6ab2dec6df185de1a078d270d8",
       "first_name": "John",
       "last_name": "Wood",
       "interests": ["hacking", "fishing", "reading"],
       "offspring": [
         { "name": "Dylan", "age": 6 },
         { "name": "Chloe", "age": 3 }
       ]
     }
  6. Strong Focus on Replication
  7. RESTful API
     # Create
     POST http://localhost:5984/employees
     # Read
     GET http://localhost:5984/employees/1
     # Update (the document body must include its current _rev)
     PUT http://localhost:5984/employees/1
     # Delete (the current revision is passed as ?rev=...)
     DELETE http://localhost:5984/employees/1
  8. Queried and Indexed with MapReduce
     function(doc) {
       if (doc.first_name == "John") emit(doc._id, 1);
     }
     function(keys, values, rereduce) {
       return sum(values);
     }
  9. Ultra Durable
  10. Erlang OTP
  11. Views
  12. MapReduce
      // Map
      function(doc) {
        emit(doc._id, 1);
      }
      // Reduce
      function(keys, values, rereduce) {
        return sum(values);
      }
  13. MapReduce
      // Map: one row per dependent, keyed by the employee's _id
      function(doc) {
        if (doc.dependents) {
          for (var i = 0; i < doc.dependents.length; i++) {
            emit(doc._id, doc.dependents[i]);
          }
        }
      }
      // Reduce: the built-in _count
      _count
  14. MapReduce
      function(keys, values, rereduce) {
        // Merge the per-type counts for one status ("OK" or "ERR") into totals.
        function sum(type_counts, totals, status) {
          if (type_counts[status]) {
            if (!totals[status]) {
              totals[status] = {};
            }
            var status_totals = totals[status];
            var status_type_counts = type_counts[status];
            for (var key in status_type_counts) { // MO, MT, CM, etc.
              var count = status_type_counts[key];
              if (!status_totals[key]) {
                status_totals[key] = count;
              } else {
                status_totals[key] += count;
              }
            }
          }
        }
        var totals = {};
        // values should be something like
        // {"OK":{"MO":1234,"MT":1000,"CM":20},"ERR":{"MO":1,"MT":1}}
        // The result has the same shape, so the function also works for rereduce.
        for (var i = 0; i < values.length; i++) {
          var message_count = values[i];
          sum(message_count, totals, "OK");
          sum(message_count, totals, "ERR");
        }
        return totals;
      }
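The corrected reduce function can be exercised outside CouchDB by calling it directly from Node.js. The sketch below uses a condensed but equivalent copy of it. The message counts are made up, and splitting them into two batches only imitates how CouchDB may reduce parts of the index separately and then call the function again with rereduce set to true. The check is that reducing everything at once and reducing in two stages give the same totals.

```javascript
// Condensed copy of the corrected reduce function, given a name so it can be
// called directly (CouchDB itself receives it as an anonymous function).
var statusReduce = function(keys, values, rereduce) {
  function sum(type_counts, totals, status) {
    if (type_counts[status]) {
      if (!totals[status]) { totals[status] = {}; }
      for (var key in type_counts[status]) {
        totals[status][key] = (totals[status][key] || 0) + type_counts[status][key];
      }
    }
  }
  var totals = {};
  for (var i = 0; i < values.length; i++) {
    sum(values[i], totals, "OK");
    sum(values[i], totals, "ERR");
  }
  return totals;
};

// Hypothetical map values, split into two batches. The function ignores keys,
// so null is passed instead of real [key, docid] pairs.
var batchA = [{ "OK": { "MO": 2, "MT": 1 } }, { "ERR": { "MO": 1 } }];
var batchB = [{ "OK": { "MO": 3, "CM": 4 }, "ERR": { "MT": 1 } }];

// First pass (rereduce = false) over each batch of map values.
var partialA = statusReduce(null, batchA, false);
var partialB = statusReduce(null, batchB, false);

// Rereduce pass: partial results have the same shape as the map values,
// so the same function can merge them.
var total = statusReduce(null, [partialA, partialB], true);

// A single pass over all values should give the same totals.
var direct = statusReduce(null, batchA.concat(batchB), false);

console.log(JSON.stringify(total));
// {"OK":{"MO":5,"MT":1,"CM":4},"ERR":{"MO":1,"MT":1}}
```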
  15. Design Documents
      {
        "_id": "_design/stats",
        "views": {
          "total_employees": {
            "map": "function(doc) { emit(doc._id, 1); }",
            "reduce": "function(keys, values, rereduce) { return sum(values); }"
          },
          "by_lastname": {
            "map": "function(doc) { emit(doc.last_name, null); }"
          },
          "dependents": {
            "map": "function(doc) { if (doc.dependents) { for (var i = 0; i < doc.dependents.length; i++) { emit(doc._id, doc.dependents[i]); } } }",
            "reduce": "_count"
          }
        }
      }
  16. MapReduce
      Documents:
      { "_id": "1", "first_name": "Robert", "last_name": "Johnson", "date_hired": "2010/01/10",
        "dependents": [
          { "first_name": "Margie", "last_name": "Johnson" },
          { "first_name": "Charlie", "last_name": "Johnson" },
          { "first_name": "Sophie", "last_name": "Johnson" }
        ],
        "salary": 250000 }
      { "_id": "2", "first_name": "Jim", "last_name": "Jones", "date_hired": "2010/02/11", "salary": 150000 }
      { "_id": "3", "first_name": "Sally", "last_name": "Stevenson", "date_hired": "2010/04/23", "salary": 100000 }
      { "_id": "4", "first_name": "Bob", "last_name": "Smith", "salary": 80000, "date_hired": "2010/03/11",
        "dependents": [ { "first_name": "Susan", "last_name": "Smith" } ] }
      Result of the dependents map function:
      {"total_rows":4,"offset":0,"rows":[
        {"id":"1","key":"1","value":{"first_name":"Margie","last_name":"Johnson"}},
        {"id":"1","key":"1","value":{"first_name":"Charlie","last_name":"Johnson"}},
        {"id":"1","key":"1","value":{"first_name":"Sophie","last_name":"Johnson"}},
        {"id":"4","key":"4","value":{"first_name":"Susan","last_name":"Smith"}}
      ]}
  17. MapReduce (same documents as slide 16)
      {"rows":[
        {"key":null,"value":4}
      ]}
  18. MapReduce (same documents as slide 16)
      {"rows":[
        {"key":"1","value":3},
        {"key":"4","value":1}
      ]}
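The outputs on slides 16 to 18 can be reproduced with a toy, in-memory imitation of the view engine. This is a sketch under simplifying assumptions, not how CouchDB works internally: queryView and countReduce are hypothetical helpers, countReduce only imitates the built-in _count, keys are compared as plain strings rather than with CouchDB's collation, and nothing is stored in a B-tree or updated incrementally. The employee documents are trimmed to the fields the dependents view reads.

```javascript
// Rows emitted by the map function that is currently running.
var emitted = [];
var currentDoc = null;

// CouchDB map functions call a global emit(key, value); this stand-in records
// the row together with the id of the document being mapped.
function emit(key, value) {
  emitted.push({ id: currentDoc._id, key: key, value: value });
}

// Hypothetical stand-in for the built-in _count: count rows on the first pass,
// add up partial counts on rereduce.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function(a, b) { return a + b; }, 0);
  }
  return values.length;
}

// The "dependents" map function from the design document.
var dependentsMap = function(doc) {
  if (doc.dependents) {
    for (var i = 0; i < doc.dependents.length; i++) {
      emit(doc._id, doc.dependents[i]);
    }
  }
};

// Hypothetical toy view engine: map every document, sort rows by key, then
// optionally reduce all rows (group = false) or each run of equal keys (group = true).
function queryView(docs, map, reduce, group) {
  emitted = [];
  docs.forEach(function(doc) { currentDoc = doc; map(doc); });
  // Plain comparison of string keys; CouchDB has its own collation rules.
  var rows = emitted.slice().sort(function(a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  if (!reduce) {
    return { total_rows: rows.length, offset: 0, rows: rows };
  }
  // Without grouping, everything is reduced into one row whose key is null.
  var groups = group ? [] : (rows.length ? [{ key: null, rows: rows }] : []);
  if (group) {
    rows.forEach(function(row) {
      var last = groups[groups.length - 1];
      if (last && last.key === row.key) { last.rows.push(row); }
      else { groups.push({ key: row.key, rows: [row] }); }
    });
  }
  return {
    rows: groups.map(function(g) {
      var keys = g.rows.map(function(r) { return [r.key, r.id]; });
      var values = g.rows.map(function(r) { return r.value; });
      return { key: g.key, value: reduce(keys, values, false) };
    })
  };
}

var employees = [
  { _id: "1", first_name: "Robert", last_name: "Johnson", dependents: [
    { first_name: "Margie", last_name: "Johnson" },
    { first_name: "Charlie", last_name: "Johnson" },
    { first_name: "Sophie", last_name: "Johnson" }
  ] },
  { _id: "2", first_name: "Jim", last_name: "Jones" },
  { _id: "3", first_name: "Sally", last_name: "Stevenson" },
  { _id: "4", first_name: "Bob", last_name: "Smith", dependents: [
    { first_name: "Susan", last_name: "Smith" }
  ] }
];

var mapped = queryView(employees, dependentsMap, null, false);         // 4 rows, as on slide 16
var reduced = queryView(employees, dependentsMap, countReduce, false); // {"key":null,"value":4}
var grouped = queryView(employees, dependentsMap, countReduce, true);  // "1" -> 3, "4" -> 1
console.log(JSON.stringify(grouped.rows));
// [{"key":"1","value":3},{"key":"4","value":1}]
```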
  19. View Structure (diagram from http://guide.couchdb.org/editions/1/en/views.html)
  20. View Structure: querying with ?key="ch" (diagram from http://guide.couchdb.org/editions/1/en/views.html)
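Because a view keeps its rows sorted by key, a query such as ?key="ch" only needs to find one contiguous run of rows, and startkey/endkey ranges work the same way. The sketch below models the index as a sorted array searched with binary search. lowerBound, queryRange and queryKey are hypothetical helpers, the rows are invented, and real CouchDB walks a B-tree on disk using its own collation order.

```javascript
// Hypothetical helper: first position whose key is >= target (binary search).
function lowerBound(rows, target) {
  var lo = 0;
  var hi = rows.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (rows[mid].key < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Hypothetical helper: rows with startkey <= key <= endkey. The end is
// inclusive, matching CouchDB's default behaviour for endkey.
function queryRange(rows, startkey, endkey) {
  var result = [];
  for (var i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return result;
}

// ?key=... behaves like a range whose start and end are the same key.
function queryKey(rows, key) {
  return queryRange(rows, key, key);
}

// Hypothetical view rows, already sorted by key as the index keeps them.
var index = [
  { id: "d1", key: "a", value: 1 },
  { id: "d2", key: "b", value: 2 },
  { id: "d3", key: "ch", value: 3 },
  { id: "d4", key: "ch", value: 4 },
  { id: "d5", key: "d", value: 5 }
];

function ids(rows) {
  return rows.map(function(r) { return r.id; });
}

console.log(JSON.stringify(ids(queryKey(index, "ch"))));        // ["d3","d4"]
console.log(JSON.stringify(ids(queryRange(index, "b", "ch")))); // ["d2","d3","d4"]
```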
  21. Real World Example
  22. The Problem
      • Reports that used data in some large tables (30M+ rows) were taking a very long time to create
        ◦ Increasing query execution times
        ◦ Occasional page timeouts
      • Limited resources for high-powered hardware or the leading relational database product
      • Database migrations on these large tables were taking an increasingly long time to run
  23. The Solution
      • Using CouchDB as an archive database
      • Migrated old data from the tables to CouchDB, dramatically reducing the table sizes and speeding up queries that still hit those tables
      • Rewrote SQL queries as views that fetch data from the archive database, dramatically reducing the time needed to fetch the old data
      • Views updated nightly with the new set of archived data
  24. Replication
  25. POST /_replicate
      {"source": "database", "target": "http://example.org/database"}
  26. Unidirectional
  27. Bidirectional
  28. Continuous
  29. Filtered
      function(doc, req) {
        if (doc.type && doc.type == "foo") {
          return true;
        } else {
          return false;
        }
      }
  30. Filtered
      function(doc, req) {
        if (doc.type && doc.type == req.query.doc_type) {
          return true;
        } else {
          return false;
        }
      }
  31. Named
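The replication variants on slides 26 to 30 differ mainly in the JSON posted to /_replicate. The sketch below only builds those request bodies, so it runs without a server. replicationBody and bidirectional are hypothetical helpers and filters/by_type is a made-up filter name; source, target, continuous, filter and query_params are standard _replicate fields, and a bidirectional setup is simply two one-way replications in opposite directions. Sending a body would be a POST with Content-Type: application/json to the server's /_replicate endpoint.

```javascript
// Hypothetical helper: build the JSON body for POST /_replicate.
// options (all optional):
//   continuous:  keep replicating new changes instead of stopping when caught up
//   filter:      "designdoc/filtername" of a filter function
//   queryParams: values the filter function sees as req.query
function replicationBody(source, target, options) {
  options = options || {};
  var body = { source: source, target: target };
  if (options.continuous) body.continuous = true;
  if (options.filter) body.filter = options.filter;
  if (options.queryParams) body.query_params = options.queryParams;
  return body;
}

// Hypothetical helper: bidirectional replication is two one-way replications.
function bidirectional(local, remote, options) {
  return [replicationBody(local, remote, options), replicationBody(remote, local, options)];
}

var oneWay = replicationBody("database", "http://example.org/database");
var twoWay = bidirectional("database", "http://example.org/database", { continuous: true });
var filtered = replicationBody("database", "http://example.org/database", {
  filter: "filters/by_type",       // hypothetical design document and filter name
  queryParams: { doc_type: "foo" } // read by the filter as req.query.doc_type
});
console.log(JSON.stringify(filtered));
// {"source":"database","target":"http://example.org/database","filter":"filters/by_type","query_params":{"doc_type":"foo"}}
```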
  32. Conflicts
  33. "_conflicts": ["2-7c971bb974251ae8541b8fe045964219"]
  34. Finding Conflicts
      function(doc) {
        if (doc._conflicts) {
          emit(doc._conflicts, null);
        }
      }
      {"total_rows":1,"offset":0,"rows":[
        {"id":"foo","key":["2-7c971bb974251ae8541b8fe045964219"],"value":null}
      ]}
      http://guide.couchdb.org/draft/conflicts.html
  35. Resolving Conflicts
      # Step 1: save the merged data on top of the winning revision (include its current _rev)
      PUT /db/document {... merged data ...}
      # Step 2: delete the losing revision
      DELETE /db/document?rev=2-7c971bb974251ae8541b8fe045964219
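The two resolution steps can be generated from the document returned by GET /db/document?conflicts=true. In this sketch resolutionRequests is a hypothetical helper, and the winning revision and merged fields are invented; only the conflicting revision id comes from the slides. How to merge the conflicting versions is an application decision that the code does not attempt.

```javascript
// Hypothetical helper: build the two-step resolution for one document.
// 1. PUT the merged content on top of the winning revision (its _rev is required),
// 2. DELETE every conflicting revision listed in _conflicts.
// winner is the document as fetched with ?conflicts=true.
function resolutionRequests(db, winner, merged) {
  var path = "/" + db + "/" + encodeURIComponent(winner._id);
  var body = Object.assign({}, merged, { _id: winner._id, _rev: winner._rev });
  delete body._conflicts; // metadata reported by CouchDB, not something to save
  var requests = [{ method: "PUT", path: path, body: body }];
  (winner._conflicts || []).forEach(function(rev) {
    requests.push({ method: "DELETE", path: path + "?rev=" + encodeURIComponent(rev) });
  });
  return requests;
}

// Hypothetical winning revision and merge result.
var winner = {
  _id: "foo",
  _rev: "2-0f9a3c1e5b7d4a2c8e6f1b3d5a7c9e0f",
  email: "john@example.org",
  _conflicts: ["2-7c971bb974251ae8541b8fe045964219"]
};
var merged = { email: "john@example.org", phone: "555-0100" };

var plan = resolutionRequests("db", winner, merged);
plan.forEach(function(r) { console.log(r.method, r.path); });
// PUT /db/foo
// DELETE /db/foo?rev=2-7c971bb974251ae8541b8fe045964219
```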
  36. Real World Example http://www.couchbase.com/case-studies/bbc
  37. The Problem
      • Needed to make sure the site was always up and available, even in the face of a data center catastrophe
      • Needed a solution that could easily replicate data between two or more data centers
      • Needed the solution to store data in a safe and reliable way
      http://www.couchbase.com/case-studies/bbc
  38. The Solution
      • Using CouchDB to create a multi-master, multi-data center failover configuration
      • 32 nodes in the cluster
        ◦ 16 nodes in each of their two data centers
        ◦ 8 primary nodes, 8 backup nodes
      • A terabyte of data
      • 150 to 170 million requests per day
      http://www.couchbase.com/case-studies/bbc
  39. My Guess at BBC's Replication Setup (diagram: an app load balancer in front of two data centers, each with primary nodes P1 to P8 and backup nodes B1 to B8)
  40. Change Notifications
  41. {"seq":12,"id":"foo","changes":[{"rev":"1-23202479633c2b380f79507a776743d5"}]}
  42. Polling
      GET /db/_changes
      {"results":[
        {"seq":1,"id":"test","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]},
        {"seq":2,"id":"test2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]}
      ],"last_seq":2}
  43. Polling
      GET /db/_changes?since=1
      {"results":[
        {"seq":2,"id":"test2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]}
      ],"last_seq":2}
  44. Polling
      GET /db/_changes?since=1&include_docs=true
      {"results":[
        {"seq":2,"id":"test2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}],
         "doc":{"_id":"test2","name":"John","age":"33","_rev":"1-e18422e6a82d0f2157d74b5dcf457997"}}
      ],"last_seq":2}
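A polling client only has to remember the last_seq of each response and pass it back as since on its next request. The sketch below uses the response from slide 42; changesPath and applyChanges are hypothetical helpers, and the HTTP request itself is left out. Newer CouchDB releases return sequence values as opaque strings rather than integers, so the sketch passes since through unchanged instead of doing arithmetic on it.

```javascript
// Hypothetical helper: build the next polling request. since asks only for
// changes after that sequence; include_docs=true adds each document to its row.
function changesPath(db, since, includeDocs) {
  var path = "/" + db + "/_changes?since=" + encodeURIComponent(since);
  if (includeDocs) {
    path += "&include_docs=true";
  }
  return path;
}

// Hypothetical helper: record each change in one _changes response and
// return the new checkpoint (last_seq).
function applyChanges(response, seen) {
  response.results.forEach(function(change) {
    seen.push({ seq: change.seq, id: change.id, rev: change.changes[0].rev });
  });
  return response.last_seq;
}

var seen = [];
var since = 0;

// The response to GET /db/_changes shown on slide 42.
var response = {"results":[
  {"seq":1,"id":"test","changes":[{"rev":"1-aaa8e2a031bca334f50b48b6682fb486"}]},
  {"seq":2,"id":"test2","changes":[{"rev":"1-e18422e6a82d0f2157d74b5dcf457997"}]}
],"last_seq":2};

since = applyChanges(response, seen);
console.log(changesPath("db", since, true));
// /db/_changes?since=2&include_docs=true
```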
  45. Long Polling
      GET /db/_changes?feed=longpoll&since=2
  46. Continuous Changes
      GET /db/_changes?feed=continuous
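With feed=continuous the response stays open: each change arrives as one JSON object on its own line, an empty line is sent as a heartbeat when the heartbeat parameter is used, and an object holding last_seq marks the end of the feed. Because network chunks can split a line anywhere, a client has to buffer partial lines. The parser below is a minimal sketch of that; createChangesParser is a hypothetical helper, and the second change and its revision are invented.

```javascript
// Hypothetical helper: incremental parser for GET /db/_changes?feed=continuous.
// One JSON object per line; empty lines are heartbeats; a {"last_seq": ...}
// line may arrive when the feed closes. Partial lines are buffered until
// their newline arrives.
function createChangesParser(onChange, onEnd) {
  var buffer = "";
  return function push(chunk) {
    buffer += chunk;
    var newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      var line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line === "") {
        continue; // heartbeat
      }
      var message = JSON.parse(line);
      if ("last_seq" in message) {
        onEnd(message.last_seq);
      } else {
        onChange(message);
      }
    }
  };
}

var ids = [];
var lastSeq = null;
var push = createChangesParser(
  function(change) { ids.push(change.id); },
  function(seq) { lastSeq = seq; }
);

// The change line from slide 41 split across two chunks, then a heartbeat,
// a hypothetical second change, and the closing last_seq line.
push('{"seq":12,"id":"foo","changes":[{"rev":"1-2320247963');
push('3c2b380f79507a776743d5"}]}\n\n');
push('{"seq":13,"id":"bar","changes":[{"rev":"1-abc"}]}\n{"last_seq":13}\n');
console.log(ids, lastSeq); // [ 'foo', 'bar' ] 13
```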
  47. Filtered Changes
      GET /db/_changes?filter=filters/signal_employees
      function(doc, req) {
        if (doc.company == "Signal") {
          return true;
        } else {
          return false;
        }
      }
  48. Filtered Changes
      GET /db/_changes?filter=filters/employees&company=Signal
      function(doc, req) {
        if (doc.company == req.query.company) {
          return true;
        } else {
          return false;
        }
      }
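The parameterised filter on slide 48 works because the query-string parameters of the _changes request, including custom ones such as company, are handed to the filter as req.query. The sketch below imitates that outside CouchDB for illustration; reqFromUrl is a hypothetical helper that builds only the query part of the request object, and the documents are invented.

```javascript
// The parameterised filter from slide 48, condensed to a boolean expression.
var employeesFilter = function(doc, req) {
  return doc.company == req.query.company;
};

// Hypothetical helper: a rough model of req.query, built from every
// query-string parameter of the _changes URL.
function reqFromUrl(url) {
  var query = {};
  new URL(url, "http://localhost:5984").searchParams.forEach(function(value, key) {
    query[key] = value;
  });
  return { query: query };
}

// Hypothetical documents; only those whose company matches pass the filter.
var docs = [
  { _id: "a", company: "Signal" },
  { _id: "b", company: "Acme" },
  { _id: "c" }
];

var req = reqFromUrl("/db/_changes?filter=filters/employees&company=Signal");
var passed = docs
  .filter(function(doc) { return employeesFilter(doc, req); })
  .map(function(doc) { return doc._id; });
console.log(passed); // [ 'a' ]
```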
  49. Real World Example http://www.couchbase.com/case-studies/skechers
  50. Skechers
      • Already using CouchDB to help power www.skechers.com
      • Used the _changes long-poll feature to add a “What’s happening now” widget to the main page
      • Updates are processed in real time
      • The widget was written in just a few hours, with most of the code handling the display of the data
      http://www.couchbase.com/case-studies/skechers
  51. Real World Example http://www.dimagi.com/pulling-data-from-couchdb-to-a-relational-database-made-easy-with-_changes/
  52. “Perhaps at the top of the list of ‘things that are annoying in CouchDB’ is general reporting.” (image: http://browsertoolkit.com/fault-tolerance.png)
  53. dimagi
      • “Perhaps at the top of the list of ‘things that are annoying in CouchDB’ is general reporting.”
      • CouchDB views are not nearly as flexible as SQL
      • Using the _changes feed to mirror changes made in CouchDB over to a relational database
      • The relational database is used for more extensive reporting
      • “Couch to SQL in 20 lines of code!”
      http://www.dimagi.com/pulling-data-from-couchdb-to-a-relational-database-made-easy-with-_changes/
  54. Real World Example: couchdb-lucene https://github.com/rnewson/couchdb-lucene
  55. couchdb-lucene
      • Provides full-text search for data stored in CouchDB
      • Uses the continuous _changes feed to stay notified of the most recent changes in the database
      • Documents are included with change notifications
      • The index is updated shortly after a document is saved in CouchDB
      https://github.com/rnewson/couchdb-lucene
  56. Mobile Device Support
  57. Image credit: http://happyclouddesign.blogspot.com/
  58. Image credit: http://gmflightlog.blogspot.com
  59. Image credit: http://gmflightlog.blogspot.com
  60. Image credit: http://happyclouddesign.blogspot.com/
  61. Real World Example http://www.couchbase.com/case-studies/groupcomplete
  62. The Problem
      • Looking to modernize mobile data collection (surveys, etc.)
      • People collecting the data (mobile workers) have limited ability to review or modify data once it is submitted
      • Mobile workers work in a void, unable to collaborate with their team members, which increases the likelihood of double entry and duplicated effort
      • Mobile workers don’t have access to aggregated data, as aggregation is usually done on the back end, where the data is sent
      • Access to a laptop or desktop is required to perform certain tasks
      http://www.couchbase.com/case-studies/groupcomplete
  63. The Solution
      • Cluster of CouchDB servers with shared forms, data, and profiles for the mobile devices collecting data
      • A native application running on the device collects the data and interacts with a local CouchDB server
      • The native application can access the data on the remote servers, or locally via replicated databases served by CouchDB running on the device
      • Since data can be stored locally, access is fast and unaffected by spotty network availability
      http://www.couchbase.com/case-studies/groupcomplete
  64. The Solution
      • Mobile workers can easily share form templates and data
      • The application manages conflicts, allowing mobile workers to update, correct, and revise collected data at any time
      • Resolved conflicts are distributed to the team via standard replication, so everybody has the same data
      • Rich media (pictures, audio, video) is stored as _attachments
      http://www.couchbase.com/case-studies/groupcomplete
  65. CouchApps
  66. HTTP/1.1 200 OK
      Server: CouchDB/1.0.2 (Erlang OTP/R14B)
      Date: Tue, 07 Jun 2011 12:24:36 GMT
      Content-Type: text/plain;charset=utf-8
      Content-Length: 40
      Cache-Control: must-revalidate

      {"couchdb":"Welcome","version":"1.0.2"}
  67. !=
  68. Open Data
  69. https://github.com/benoitc/afgwardiary
  70. JSON != HTML
  71. Show Functions
      function(doc, req) {
        return "<h1>" + doc.title + "</h1>";
      }
      http://guide.couchdb.org/draft/show.html
  72. Show Functions
      function(doc, req) {
        return {
          body : "<foo>" + doc.title + "</foo>",
          headers : {
            "Content-Type" : "application/xml",
            "X-My-Own-Header": "foo"
          }
        }
      }
      http://guide.couchdb.org/draft/show.html
  73. List Functions
      function(head, req) {
        var row;
        send("<ul>");
        while (row = getRow()) {
          send("<li>" + row.value + "</li>");
        }
        return "</ul>";
      }
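Show and list functions can be tried outside CouchDB by supplying stand-ins for the functions the query server gives them. In the sketch below, getRow and send are the names the list API uses, but their implementations here are simplified fakes; runList is a hypothetical driver, and the rows and document are invented. As in CouchDB, whatever the list function returns is sent after the chunks it passed to send.

```javascript
// Simplified fakes for the query server's list API: getRow() hands out view
// rows one at a time (null once they run out) and send() collects output chunks.
var pendingRows = [];
var chunks = [];
function getRow() { return pendingRows.length > 0 ? pendingRows.shift() : null; }
function send(chunk) { chunks.push(chunk); }

// The list function from slide 73.
var listFn = function(head, req) {
  var row;
  send("<ul>");
  while ((row = getRow())) {
    send("<li>" + row.value + "</li>");
  }
  return "</ul>";
};

// The show function from slide 72, returning a body plus custom headers.
var showFn = function(doc, req) {
  return {
    body: "<foo>" + doc.title + "</foo>",
    headers: { "Content-Type": "application/xml", "X-My-Own-Header": "foo" }
  };
};

// Hypothetical driver: run a list function over some rows and append its
// return value to the streamed output.
function runList(fn, rows) {
  pendingRows = rows.slice();
  chunks = [];
  var tail = fn({ total_rows: rows.length, offset: 0 }, { query: {} });
  if (tail) {
    chunks.push(tail);
  }
  return chunks.join("");
}

// Hypothetical view rows and document.
var html = runList(listFn, [{ key: 1, value: "First post" }, { key: 2, value: "Second post" }]);
var response = showFn({ title: "Hello" }, { query: {} });
console.log(html);
// <ul><li>First post</li><li>Second post</li></ul>
console.log(response.headers["Content-Type"], response.body);
// application/xml <foo>Hello</foo>
```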
  74. String concatenation to build HTML? Ewww!
      How do I get all of my JavaScript into CouchDB?
      Can I use my existing development tools?
      What about images, CSS files, and other resources?
  75. The CouchApp Project
      • Scripts that allow you to easily deploy your CouchApp from your file system to CouchDB
      • Where the files live on your file system determines where they will be pushed to in the database. myapp/views/foobar/map.js will be pushed to _design/myapp, into a view named foobar, as the map function.
      https://github.com/couchapp/couchapp
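The directory convention on slide 75 can be modelled as a function from file paths to design-document fields. This is a simplified sketch, not the couchapp tool itself, which has additional rules: buildDesignDocs is a hypothetical helper and the file contents are invented. It also suggests why a file at templates/blog_post.html can be read as this.templates.blog_post in the show function on slide 78.

```javascript
// Hypothetical helper modelling the layout rule: the top directory names the
// design document, further directories become nested properties, and the file
// name without its extension holds the file's contents. Files under
// _attachments/ are collected as attachments instead; a real push would also
// base64-encode them and set a content type.
function buildDesignDocs(files) {
  var result = {};
  Object.keys(files).forEach(function(path) {
    var parts = path.split("/");
    var id = "_design/" + parts[0];
    var ddoc = result[id] || (result[id] = { _id: id });
    var rest = parts.slice(1);
    if (rest[0] === "_attachments") {
      ddoc._attachments = ddoc._attachments || {};
      ddoc._attachments[rest.slice(1).join("/")] = files[path];
      return;
    }
    var target = ddoc;
    for (var i = 0; i < rest.length - 1; i++) {
      target = target[rest[i]] || (target[rest[i]] = {});
    }
    var name = rest[rest.length - 1].replace(/\.[^.]*$/, ""); // strip extension
    target[name] = files[path];
  });
  return result;
}

// Hypothetical file tree; the first path is the example from slide 75.
var ddocs = buildDesignDocs({
  "myapp/views/foobar/map.js": "function(doc) { emit(doc._id, null); }",
  "myapp/templates/blog_post.html": "<h1>{{title}}</h1>",
  "myapp/_attachments/index.html": "<html></html>"
});
console.log(Object.keys(ddocs)); // [ '_design/myapp' ]
console.log(ddocs["_design/myapp"].views.foobar.map);
```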
  76. The CouchApp Project
      • Evently: a declarative, CouchDB-friendly jQuery library for writing event-based JavaScript applications
      • jquery.couch.js: JavaScript library for communicating with CouchDB
      • jquery.pathbinder.js: framework for triggering events based on paths in the URL hash
      • mustache.js: a simple JavaScript template framework
      https://github.com/couchapp/couchapp
  77. Templates
      <!DOCTYPE html>
      <html>
        <head>
          <title>Example</title>
          <link rel="stylesheet" href="../../style/screen.css" type="text/css">
        </head>
        <body>
          <h1 id="post_title">{{title}}</h1>
          <div id="post_body">{{body}}</div>
          <script src="../../script/awesome.js"></script>
        </body>
      </html>
  78. Templates
      // Show Function
      function(doc, req) {
        var mustache = require("vendor/couchapp/lib/mustache");
        var data = {
          title : doc.title,
          body : doc.body
        };
        return mustache.to_html(this.templates.blog_post, data);
      }
  79. Real World Example http://www.couchbase.com/case-studies/incandescent
  80. The Problem
      • Wanted to develop a web-based solution for managing a veterinary clinic (patients, procedures, back office, etc.)
      • Needed something that could operate in an environment without an internet connection
      • Wanted something flexible enough to scale up to a SaaS offering
      http://www.couchbase.com/case-studies/incandescent
  81. The Solution
      • The application was built as an installable CouchApp
      • Written entirely in HTML and JavaScript
      • Developed using Backbone.js
      • Platform independent, running on all platforms and browsers (iPad too!)
      • iPhone and Android versions in development
      http://www.couchbase.com/case-studies/incandescent
  82. Resources
  83. Resources
      • CouchDB Project Website: http://couchdb.apache.org
      • CouchDB: The Definitive Guide: http://guide.couchdb.org
      • CouchDB Project Wiki: http://wiki.apache.org/couchdb
      • CouchApps: http://couchapp.org
  84. ?
  85. Thanks! john@johnpwood.net @johnpwood
