CouchDB has several features that help it stand out from the other databases in this rapidly growing field. Incremental map/reduce, peer to peer replication, mobile device synchronization, a realtime update feed, and the ability to host an application in the database itself (also known as a Couchapp) are just a few. See how companies such as the BBC, Radical Dynamic, Signal, and Incandescent Software are using CouchDB to solve their real world challenges.
3. • Software Developer at Signal
• Coding for about 15 years
• Working with CouchDB for 2.5 years (in
production for about 2 years)
• Enjoy tinkering with data storage solutions
7. RESTful API
# Create
POST http://localhost:5984/employees
# Read
GET http://localhost:5984/employees/1
# Update
PUT http://localhost:5984/employees/1
# Delete
DELETE http://localhost:5984/employees/1
8. Queried and Indexed
with MapReduce
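// Map: emit a 1 for every document whose first_name is "John"
function(doc) {
  if (doc.first_name == "John")
    emit(doc._id, 1);
}
// Reduce: add up the emitted values to get a count
function(keys, values, rereduce) {
  return sum(values);
}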
13. MapReduce
// Map
function(doc) {
  if (doc.dependents) {
    for (var i = 0; i < doc.dependents.length; i++) {
      emit(doc._id, doc.dependents[i]);
    }
  }
}
// Reduce
_count
14. MapReduce
function(keys, values, rereduce) {
  // Adds the counts for one status from type_counts into totals
  function sum(type_counts, totals, status) {
    if (type_counts[status]) { // OK or ERR
      if (!totals[status]) {
        totals[status] = {};
      }
      var status_totals = totals[status];
      var status_type_counts = type_counts[status];
      for (var key in status_type_counts) { // MO, MT, CM, etc.
        var count = status_type_counts[key];
        if (!status_totals[key]) {
          status_totals[key] = count;
        } else {
          status_totals[key] += count;
        }
      }
    }
  }
  var totals = {};
  // values should be something like
  // {"OK":{"MO":1234,"MT":1000,"CM":20},"ERR":{"MO":1,"MT": 1}}
  for (var i = 0; i < values.length; i++) {
    var message_count = values[i];
    sum(message_count, totals, 'OK');
  }
  return totals;
}
22. The Problem
• Reports that utilized data in some large tables (30M+
rows) were taking a very long time to create
• Increasing query execution times
• Occasional page timeouts
• Limited resources for super powered hardware or the
leading relational database product
• Database migrations on these large tables were taking an
increasingly long time to run
23. The Solution
• Using CouchDB as an archive database
• Migrated old data in tables to CouchDB, dramatically
reducing the table sizes, speeding up queries that were
still hitting those tables
• Re-wrote SQL queries as views to fetch data from the
archive database, dramatically reducing the amount of
time needed to fetch the old data
• Views updated nightly with the new set of archived data
37. The Problem
• Needed to make sure the site was always up and available,
even in the face of a data center catastrophe
• Needed a solution that could easily replicate data
between two or more data centers
• Needed the solution to store data in a safe and reliable
way
http://www.couchbase.com/case-studies/bbc
38. The Solution
• Using CouchDB to create a multi-master, multi-data
center failover configuration
• 32 nodes in the cluster
• 16 nodes in each of their two data centers
• 8 primary nodes, 8 backup nodes
• Terabyte of data
• 150 - 170 million requests per day
http://www.couchbase.com/case-studies/bbc
50. Skechers
• Already using CouchDB to help power
www.sketchers.com
• Utilized the _changes long poll feature to add a “What’s
happening now” widget to the main page
• Updates are processed in real time
• The widget was written in just a few hours, with the
majority of the code handling the display of the data
http://www.couchbase.com/case-studies/skechers
52. “Perhaps at the top of the list of ‘things that are annoying in CouchDB’ is
general reporting.”
http://browsertoolkit.com/fault-tolerance.png
53. dimagi
• “Perhaps at the top of the list of ‘things that are annoying
in CouchDB’ is general reporting.”
• CouchDB views are not nearly as flexible as SQL
• Using the _changes feed to mirror changes made in
CouchDB over to a relational database
• The relational database is used for more extensive reporting
• “Couch to SQL in 20 lines of code!”
http://www.dimagi.com/pulling-data-from-couchdb-to-a-relational-database-made-easy-with-_changes/
55. couchdb-lucene
• Provides full text search functionality for data stored in
CouchDB
• Uses the continuous _changes feed to stay notified of the
most recent changes in the database
• Documents are included with change notifications
• Index is updated shortly after document is saved in
CouchDB
https://github.com/rnewson/couchdb-lucene
64. The Problem
• Looking to modernize mobile data collection (surveys, etc)
• People collecting the data (mobile workers) have limited ability to
review or modify data once it is submitted
• Mobile workers work in a void, unable to collaborate with their
team members, and increasing the likelihood of double-entry and
duplicated effort
• Mobile workers don’t have access to aggregated data, as this is
usually done on the back end, where the data is sent
• Access to a laptop or desktop is required to perform certain tasks
http://www.couchbase.com/case-studies/groupcomplete
65. The Solution
• Cluster of CouchDB servers with shared forms, data, and profiles for
mobile devices collecting data
• A native application running on the device collects the data, and
interacts with a local CouchDB server
• Native application can access the data on the remote servers, or
locally via replicated databases served by CouchDB running on the
device
• Since data can be stored locally, access is fast, and unaffected by spotty
network availability
http://www.couchbase.com/case-studies/groupcomplete
66. The Solution
• Mobile workers can easily share form templates and data
• The application manages conflicts, allowing mobile workers to
update, correct, and revise collected data at any time
• Resolved conflicts are distributed to the team via standard
replication, so everybody has the same data
• Rich media (pictures, audio, video) stored as _attachments
http://www.couchbase.com/case-studies/groupcomplete
80. String concatenation to build HTML? Ewww!
How do I get all of my Javascript into CouchDB?
Can I use my existing development tools?
What about images, CSS files, and other resources?
81. The CouchApp Project
• Scripts that allow you to easily deploy your CouchApp from your
file system to CouchDB
• Where the files live on your filesystem determines where they will
be pushed to the database. myapp/views/foobar/map.js will be
pushed to _design/myapp, into a view named foobar, as the map
function.
https://github.com/couchapp/couchapp
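
The path convention on this slide can be illustrated with a small sketch. This is not the CouchApp tool's own code, only an illustration of the mapping it describes; the function name `buildDesignDoc` and the sample view body are hypothetical.

```javascript
// Illustration only: fold a map of file paths to file contents into a
// design document, following the path convention described on the slide.
function buildDesignDoc(appName, files) {
  var ddoc = { _id: "_design/" + appName };
  Object.keys(files).forEach(function (path) {
    // "myapp/views/foobar/map.js" -> ["views", "foobar", "map"]
    var parts = path.replace(/\.js$/, "").split("/").slice(1);
    var node = ddoc;
    for (var i = 0; i < parts.length - 1; i++) {
      node[parts[i]] = node[parts[i]] || {};
      node = node[parts[i]];
    }
    node[parts[parts.length - 1]] = files[path];
  });
  return ddoc;
}

// Hypothetical file contents for the view named on the slide.
var ddoc = buildDesignDoc("myapp", {
  "myapp/views/foobar/map.js": "function(doc) { emit(doc._id, null); }"
});

console.log(JSON.stringify(ddoc, null, 2));
```

The resulting object has the map function stored as a string under `views.foobar.map`, which is the shape CouchDB expects for views in a design document.
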
82. The CouchApp Project
• Evently - A declarative, CouchDB friendly jQuery library for writing
event based Javascript applications
• jquery.couch.js - Javascript library for communicating with CouchDB
• jquery.pathbinder.js - Framework for triggering events based on
paths in URL hash
• mustache.js - A simple javascript template framework
https://github.com/couchapp/couchapp
84. Templates
// Show Function
function(doc, req) {
  var mustache = require("vendor/couchapp/lib/mustache");
  var data = {
    title : doc.title,
    body : doc.body
  };
  return mustache.to_html(this.templates.blog_post, data);
}
86. The Problem
• Wanted to develop a web based solution for managing a
veterinary clinic (managing patients, procedures, back
office, etc)
• Needed something that could operate in an environment
without an internet connection
• Wanted something flexible enough to scale up to a SaaS
offering
http://www.couchbase.com/case-studies/incandescent
87. The Solution
• The application was built as an installable CouchApp
• Written entirely in HTML and Javascript
• Developed using Backbone.js
• Platform independent, running on all platforms and
browsers (iPad too!)
• iPhone and Android versions in development
http://www.couchbase.com/case-studies/incandescent
* In this talk I’ll be discussing some of CouchDB’s key features, and how they’re being used in the real world
* Signal provides a web based service for conducting and managing digital marketing campaigns over SMS, email, web, Facebook, Twitter, and other channels
* We have been working with NoSQL databases lately because we collect a lot of data while these campaigns are in progress
* In addition, our customers collect a lot of data about their customers
* Managing all of this data has proven challenging, and we’re always looking for better ways to do it
* Documents must contain valid JSON
* _id must be unique in the database
* Other than that, no requirements. Documents in the same database can be wildly different from one another.
* Documents are self contained. Relationships are not supported.
* Binary attachments can be stored in the _attachments property
* Designed with replication and offline operation in mind
* Incredibly easy to replicate from one database to another
* All interaction is done via HTTP
* Administration (creating databases, triggering replication, triggering a compaction, etc) is also done via HTTP
* Documents are treated as resources
* This simple map function will emit the doc id and a “1” for every document with a first_name of “John”. The reduce function then sums those values, effectively giving me a count of all documents in the database with first_name of “John”.
* CouchDB does not support ad hoc queries
* In order to query data, CouchDB must first build an index by passing all of the documents in the database through a map function, and then optionally reducing their results
* MapReduce can be slow on large datasets. But once the index is built, queries are fast.
* CouchDB has a crash only design where the server does not go through a shutdown process.
* You can kill the process at any time, and your data will be safe. In fact, killing the process is the way you shut down CouchDB.
* Uses multi version concurrency control, meaning it never overwrites committed data. Instead, it always appends new data. This dramatically reduces the risk of data corruption.
* Erlang OTP has a strong emphasis on concurrent operations and fault tolerance, something that CouchDB takes advantage of
* Views are how data is queried using CouchDB
* Views add structure back to unstructured data, so it can be queried
* Views are made up of a map function and an optional reduce function, which aggregates the results of the map function
* Views build indexes of key/value pairs
* Keys can be single values, or arrays of values
* Values can be just about anything, including single values, hashes, or arrays
* View results are sorted by key (can be in ascending or descending order)
* View indexes are stored on disk separately from the database
* Once built, indexes are updated incrementally as documents are added/updated/deleted. This is the biggest difference from Google/Hadoop style MapReduce, which maps through the entire dataset on each execution.
* This map function checks to see if the document has a property called “dependents”. If it does, it loops through all of the elements in the dependents array, emitting each element. The reduce function then counts the number of dependents emitted.
* Map functions have access to the document, and can interrogate its data
* A single document may emit 0 or many key/value pairs
* CouchDB has a few built in reduce functions, like _count, _sum, and _stats
* Views can do more than simply count and sum single values
* Views can get complicated, depending on the values they are working with
* This example basically sums the values in a hash
* Views are stored in special documents called design documents.
* Views stored in the same design document share a data structure on disk.
* This is important to note, because changes that require one view to be rebuilt will impact all views in the same design document.
* Also, view indexes can take up a lot of space on disk. Grouping related views in the same design document is a way to save on disk space.
* This is something to keep in mind when you are designing views.
* Here is some sample output from the map function we saw previously that emitted information about a document’s dependents
* You can instruct CouchDB not to reduce the results via a query parameter
* Here is the reduced result, which is a count of all of the dependents in the database
* The view API also lets you group your results by key
* Here we see the number of dependents in each document, broken out by the document’s id
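
The query options mentioned in these notes can be sketched as a small URL builder. This is a minimal sketch: the design document name `people` and view name `dependents` are hypothetical, the `employees` database is borrowed from the REST slide, and `reduce` and `group` are the view query parameters being illustrated.

```javascript
// Build a view query URL. The design document and view names passed in
// below are hypothetical; reduce and group are view query parameters.
function viewUrl(base, db, ddoc, view, options) {
  var query = new URLSearchParams(options || {}).toString();
  var url = base + "/" + db + "/_design/" + ddoc + "/_view/" + view;
  return query ? url + "?" + query : url;
}

var base = "http://localhost:5984";

// Raw map output: skip the reduce step.
var mapOnly = viewUrl(base, "employees", "people", "dependents", { reduce: false });

// Reduced output with one row per key (here, one count per document id).
var grouped = viewUrl(base, "employees", "people", "dependents", { group: true });

console.log(mapOnly);
console.log(grouped);
```
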
* View indexes are stored as a B+ tree on disk
* B+ trees are nice because they are very flat data structures, and use a minimal number of disk seeks to fetch data.
* Leaf nodes store results; parent nodes store the reduction of their child results
* When CouchDB determines that a query would include all of the sub nodes of a parent node, it’ll simply pull the value from the parent node, preventing it from having to rereduce all of the children
* It then pulls the values from the other nodes, and runs the reduce function with all of the values it pulled.
* Here, it pulls the “3” from the parent, and then the “1” from the element to the left, and reduces those values to come up with the result.
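
The way a stored reduction gets reused can be sketched with a count-style reduce function. This only illustrates the idea and is not CouchDB's index code; the row values are made up, and the numbers mirror the “3” from the parent node and the “1” from the neighbouring element in the notes.

```javascript
// A count-style reduce function. On the first pass it counts emitted values;
// on rereduce it adds up counts that were already computed for child nodes.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    return values.reduce(function (total, n) { return total + n; }, 0);
  }
  return values.length;
}

// Reduction already stored on a parent node covering three rows
// (the row values are made up for illustration).
var parentReduction = countReduce(null, ["a", "b", "c"], false);

// Reduction of the single neighbouring row that falls inside the query range.
var neighbourReduction = countReduce(null, ["d"], false);

// Final answer: rereduce the two partial results instead of revisiting rows.
var total = countReduce(null, [parentReduction, neighbourReduction], true);

console.log(total);
```
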
* CouchDB’s bread and butter. This is what it was built for.
* Synchronizes two copies of the same database, either on the same database server or different database servers
* Simply tells CouchDB what database to replicate, and where to replicate it
* Replication is designed to handle failure gracefully. Will simply pick up where it left off.
* Replicates changes from one database to the other
* CouchDB will compare databases, finding documents that differ, and then submit a batch of changed documents to the target (until all changes are transferred)
* CouchDB increments a sequence number every time the DB is changed. CouchDB uses that sequence number to help find differences between two databases.
* Bidirectional replication simply consists of two unidirectional replication requests, with the source and target switched
* When complete, both databases will be in sync
* Keeps an HTTP connection open, and streams changes to the target
* Replicates changes as they are committed
* All documents eligible for replication are fed through a filter function. The document will only be replicated if the function returns true.
* Provide filter function and any query params in the request: "filter":"myddoc/myfilter", "query_params": {"key":"value"}
* Filter functions also have access to the replication request, which can be used to support more dynamic behavior
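
A filter function of the kind described here might look like the sketch below. The `type` field and the `type` query parameter are assumptions chosen for illustration. In CouchDB the function would be stored under `filters` in a design document; the assignment to `byType` and the hand-built `req` object exist only so the sketch can run on its own.

```javascript
// Filter function: return true to let a document through.
// req.query holds the query parameters supplied with the request.
var byType = function (doc, req) {
  return doc.type === req.query.type;
};

// Stand-in for the request object CouchDB would pass in (hypothetical values).
var req = { query: { type: "employee" } };

var keepsEmployee = byType({ _id: "1", type: "employee" }, req);
var keepsInvoice = byType({ _id: "2", type: "invoice" }, req);

console.log(keepsEmployee, keepsInvoice);
```
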
* Provide doc_ids in the request: "doc_ids":["foo","bar","baz"]
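
The replication options covered in these notes can be combined into the JSON body that is POSTed to `/_replicate`. A minimal sketch: the helper name `replicationRequest`, the database names, and the remote host are hypothetical, and only the `source`, `target`, `continuous`, `filter`, `query_params`, and `doc_ids` fields are used.

```javascript
// Build the JSON body for a POST to /_replicate.
function replicationRequest(source, target, options) {
  var body = { source: source, target: target };
  options = options || {};
  if (options.continuous) body.continuous = true;
  if (options.filter) body.filter = options.filter;
  if (options.queryParams) body.query_params = options.queryParams;
  if (options.docIds) body.doc_ids = options.docIds;
  return body;
}

// Hypothetical source and target databases.
var local = "employees";
var remote = "http://example.org:5984/employees";

// Bidirectional replication is two requests with source and target swapped.
var push = replicationRequest(local, remote, { continuous: true });
var pull = replicationRequest(remote, local, { continuous: true });

// Filtered replication and named-document replication.
var filtered = replicationRequest(local, remote, {
  filter: "myddoc/myfilter",
  queryParams: { key: "value" }
});
var named = replicationRequest(local, remote, { docIds: ["foo", "bar", "baz"] });

console.log(JSON.stringify(push));
console.log(JSON.stringify(filtered));
console.log(JSON.stringify(named));
```
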
* CouchDB was designed to operate offline. If both databases continue to process updates while “disconnected” from each other, conflicts are bound to happen. A conflict happens when the same document is updated in both databases.
* During replication, CouchDB will detect when there are multiple versions of the same document, and CouchDB records the conflict
* A winner is chosen by CouchDB, because it needs a “latest” document
* The losing revision is stored as the previous version of the document, and is made available for merging
* Losing revisions are identified by the _conflicts property
* CouchDB does not attempt to merge conflicting documents
* Best way of finding conflicting documents is via a view, like this one
* Could have a job that runs this view periodically, and merges conflicting documents
* You can also resolve conflicts on read. By default, CouchDB will not return the _conflicts property when a document is fetched. However, it will if you include a parameter telling it to do so in your HTTP request to fetch the document. With that information, you can take care of conflicts as you encounter them.
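
The view from the slide is not reproduced in these notes; a commonly used form of a conflicts view is sketched below. The `emit` stub and the sample documents are hypothetical scaffolding so the map function can run outside CouchDB, where `emit` is supplied by the view server.

```javascript
// Stand-in for CouchDB's emit(), collecting rows in memory (harness only).
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: emit one row per conflicting revision of a document.
var conflictsMap = function (doc) {
  if (doc._conflicts) {
    for (var i = 0; i < doc._conflicts.length; i++) {
      emit(doc._id, doc._conflicts[i]);
    }
  }
};

// Hypothetical documents: only the first one is in conflict.
conflictsMap({ _id: "a", _rev: "2-abc", _conflicts: ["2-def"] });
conflictsMap({ _id: "b", _rev: "1-123" });

console.log(rows);
```

For the resolve-on-read approach, the parameter in question is `conflicts=true` on the document GET request, which asks CouchDB to include `_conflicts` in the response.
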
* To resolve conflicts, save the merged content as a new version of the document, and delete the conflicting revisions.
* Compacting the database will also remove losing revisions
* Enough replication going on here that these data centers could easily be confused for a rabbit farm
* Red line = CouchDB Continuous Replication
* Each primary server capable of handling reads and writes, for scalability
* Matching primary servers in each datacenter are continuously replicating changes to each other, to keep them in sync
* In addition, each primary continuously replicates changes to its backup, so the backup is ready to take over at any point in time
* API for being notified of changes in the database
* The contents of an item in the change notification feed:
  seq - The update_seq number in the database that was created for this change
  id - The id of the document that was changed
  changes - What was changed in the document
* Polling will pull changes by request
* When no arguments are specified, all changes will be displayed
* You can easily get all changes since a given sequence number
* You can include the document content in the change notification with the include_docs query parameter
* Avoids another call to CouchDB for document information
* For less frequent updates
* Will hold the connection open until we get an update. Connection is closed after update is received.
* Using this avoids the need to continuously poll for changes
* Will hold the connection open indefinitely.
* Results are streamed in as individual chunks of JSON, making them easy to parse.
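
Because each change arrives as its own line of JSON, a client can parse the continuous feed line by line. A minimal sketch: the sample text is made up, shaped after the seq, id, and changes fields listed in these notes, and blank lines are skipped on the assumption that they are heartbeats.

```javascript
// Split a chunk of the continuous feed into change objects, one per line.
function parseChanges(text) {
  return text
    .split("\n")
    .filter(function (line) { return line.trim() !== ""; })
    .map(function (line) { return JSON.parse(line); });
}

// Made-up sample: two notifications separated by a blank heartbeat line.
var sample =
  '{"seq":1,"id":"foo","changes":[{"rev":"1-abc"}]}\n' +
  '\n' +
  '{"seq":2,"id":"bar","changes":[{"rev":"1-def"}]}\n';

var changes = parseChanges(sample);

console.log(changes.map(function (c) { return c.seq + " " + c.id; }));
```

A real client would also need to buffer partial lines, since a network chunk can end in the middle of a notification.
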
* Like replication, you have the ability to filter change notifications
* Only changed documents that pass the specified filter function by returning true will be sent to the client
* Filter functions have access to a request object, which includes any query parameters that were specified
* Provides the ability for more dynamic functionality
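
The `_changes` options covered in these notes all travel as query parameters. A minimal sketch, assuming the `employees` database from the REST slide and a hypothetical filter `myddoc/myfilter` that reads a hypothetical `type` parameter; `since`, `include_docs`, `feed`, and `filter` are the parameters being illustrated.

```javascript
// Build a _changes URL from a set of query parameters.
function changesUrl(base, db, options) {
  var query = new URLSearchParams(options || {}).toString();
  var url = base + "/" + db + "/_changes";
  return query ? url + "?" + query : url;
}

var base = "http://localhost:5984";

// Plain polling: every change in the database.
var all = changesUrl(base, "employees");

// Changes after sequence 42, with document bodies included.
var recent = changesUrl(base, "employees", { since: 42, include_docs: true });

// Long polling: the connection stays open until the next change arrives.
var longpoll = changesUrl(base, "employees", { feed: "longpoll", since: 42 });

// Continuous feed restricted by a filter; extra parameters reach the filter.
var filtered = changesUrl(base, "employees", {
  feed: "continuous",
  filter: "myddoc/myfilter",
  type: "employee"
});

console.log(all);
console.log(recent);
console.log(longpoll);
console.log(filtered);
```
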
* One area where CouchDB really stands out is mobile device support
* CouchDB runs natively on iOS (iPhone and iPad) and Android
* WebOS has a local storage solution capable of syncing with CouchDB
* Many native applications pull data over the network from a remote service
* This works fine when the network is nice
* But sometimes the network is mean. No bars. Spotty coverage. Too much traffic.
* These apps either don’t work, will only give you access to features that don’t require data from the network, or are incredibly slow when there are network issues
* If your app uses CouchDB, you’d be able to continue interacting with your data locally when the network is unavailable
* Data access will be quick and snappy, regardless
* When the network becomes available, you can easily sync your changes to a remote CouchDB instance
* Your app doesn’t need to worry about replicating changes. CouchDB will do it for you.
* Once synced, your changes will be available elsewhere
* CouchDB has minimal impact on battery life when idling
* It’s also well known that radio transmissions are very expensive with regards to power consumption. Apps that use local data will be much easier on the battery than apps that continuously pull data over the network.
* CouchApps are HTML/Javascript applications that are served right out of CouchDB
* I’ve read that one of CouchDB’s original design goals was to one day be able to serve web applications, which highlights how different it really is from traditional databases
* CouchDB speaks HTTP
* Can easily serve static files that are stored as attachments inside a document
* Also supports URL rewriting and Virtual Hosts
* Just because CouchDB can serve your web application does not mean it’s a full blown web framework
* But, some applications don’t need a fully featured web framework
* In fact, more and more applications are falling into this category. Many believe that this is the future of web applications.
* Javascript frameworks like Backbone.js and Spine, along with template frameworks like mustache.js, are making it much easier to write web applications entirely in Javascript and HTML that run on the client, only using the server to fetch and store data. This is the sort of application that would be great as a CouchApp.
* Traditional 3-tier architecture
* Client machines, web servers, and a database
* CouchApps eliminate the need for separate web and database servers, since CouchDB can both serve the application, and act as the database
* Can easily deploy multiple CouchDB instances, replicating changes between them, for scalability
* CouchDB provides an ETag header, which if used properly can further reduce the load on the servers
* If this architecture works, then....
* If your application is capable of talking to CouchDB remotely, then it can just as easily talk to a local instance of CouchDB
* Only difference in the application is the URL used to connect to CouchDB
* This scale down architecture allows you to replicate your application and data to a local instance of CouchDB, and run your application in “offline” mode
* No need for a powerful client machine
* If your CouchApp is optimized for viewing on a mobile device, the same scale down architecture applies to mobile devices as well
* Not only are CouchApps open source by nature, but they are also open data
* Kabul War Diary CouchApp is a perfect example of this
* This is a CouchApp created by one of the CouchDB committers
* Wraps WikiLeaks data in an application that allows you to easily browse it
* Because it’s a CouchApp, the data and application can easily, via a single command, be replicated to any CouchDB instance
* How can CouchDB serve web applications when the documents are stored in JSON?
* My web browser can’t make a pretty web page out of JSON. It needs HTML.
* Show functions let you transform documents into some other format (HTML, XML, etc)
* This example simply throws the document’s title inside an HTML heading tag
* Stored in design documents like views
* Show functions cannot update the database, and are side effect free. Hence, the output can easily be cached.
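
The slide's example is not reproduced in these notes; a show function of the kind described might look like this sketch. The assignment to `showTitle` and the sample document are scaffolding for running it outside CouchDB, where the function alone would be stored under `shows` in a design document.

```javascript
// Show function: turn a document into HTML by wrapping its title in a heading.
var showTitle = function (doc, req) {
  return "<h1>" + doc.title + "</h1>";
};

// Hypothetical document and an empty stand-in for the request object.
var html = showTitle({ _id: "post-1", title: "Hello CouchDB" }, {});

console.log(html);
```

A real application would also escape the title before embedding it in HTML.
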
* CouchDB lets you set response headers as well
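
To set headers, a show function can return an object instead of a plain string. A minimal sketch, assuming a document with a `title` field; the XML shape and the name `showXml` are made up for illustration.

```javascript
// Show function returning a response object with explicit headers.
var showXml = function (doc, req) {
  return {
    headers: { "Content-Type": "application/xml" },
    body: "<post><title>" + doc.title + "</title></post>"
  };
};

// Hypothetical document and an empty stand-in for the request object.
var response = showXml({ title: "Hello CouchDB" }, {});

console.log(response.headers["Content-Type"]);
console.log(response.body);
```
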
* List functions let you transform view results into something else (HTML, CSV files, etc)
* The send function sends a chunk of HTML to the client
* The return function sends the final chunk
* This function simply sticks the “values” of the view results inside an unordered list
* Since we’re sending data to the client programmatically, we can filter and aggregate view result data on the fly, letting you tweak the data before sending it to the client
* Like show functions, list functions cannot update the database and are side effect free.
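
The list function from the slide is not reproduced here; the sketch below follows the description in these notes. `getRow` and `send` are provided by CouchDB inside list functions; the stubs and sample rows below are hypothetical scaffolding so the function can run on its own.

```javascript
// Stand-ins for the helpers CouchDB provides to list functions (harness only).
var viewRows = [{ key: "a", value: "Alice" }, { key: "b", value: "Bob" }];
var output = [];
function getRow() { return viewRows.shift(); }
function send(chunk) { output.push(chunk); }

// List function: render the value of each view row as an unordered list item.
var listValues = function (head, req) {
  var row;
  send("<ul>");
  while ((row = getRow())) {
    send("<li>" + row.value + "</li>");
  }
  return "</ul>"; // the returned string goes out as the final chunk
};

// Run the function and treat its return value as the last chunk.
output.push(listValues({}, {}));
var page = output.join("");

console.log(page);
```
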
* Right now there are probably a few questions running through your head
* Enables you to continue using the tools you’re familiar with
* The CouchApp project also ships with several javascript libraries that can help with development
* This is an example mustache.js HTML template
* The post’s title and body are included via template variables, seen here in green
* CSS files, javascript files, images, and other resources can be referenced using relative paths. Simply place them in the appropriate spot in the CouchApp directory structure, and it will just work.
* Now, this looks a lot more like what web developers are used to working with
* Templates can be rendered from show functions, list functions, or functions triggered by event callbacks
* This show function renders a blog post using the template in blog_post.html with the data in the data hash
* Backbone.js provides an MVC like structure to Javascript applications
* Some resources for finding out more about CouchDB