Real World CouchDB

Software Developer at Signal
Jun. 25, 2011

Editor's Notes

  1. * In this talk I’ll be discussing some of CouchDB’s key features and how they’re being used in the real world.
  3. * Signal provides a web-based service for conducting and managing digital marketing campaigns over SMS, email, the web, Facebook, Twitter, and other channels.
     * We have been working with NoSQL databases lately because we collect a lot of data while these campaigns are in progress.
     * In addition, our customers collect a lot of data about their own customers.
     * Managing all of this data has proven challenging, and we’re always looking for better ways to do it.
  5. * Documents must contain valid JSON.
     * The _id must be unique within the database.
     * Other than that, there are no requirements. Documents in the same database can be wildly different from one another.
     * Documents are self-contained. Relationships between documents are not supported.
     * Binary attachments can be stored in the _attachments property.
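To make these rules concrete, here is a minimal sketch of a document built as a JavaScript object. The field names (`type`, `first_name`, `phones`) and the attachment name `note.txt` are hypothetical. The inline-attachment shape is one way to embed an attachment when saving a document: a `content_type` plus base64-encoded `data`. The sketch assumes Node.js for `Buffer`.

```javascript
// A hypothetical contact document. _id must be unique within the database;
// every other field is free-form and can differ from document to document.
const doc = {
  _id: "contact-1",
  type: "contact",
  first_name: "John",
  phones: ["555-0100", "555-0199"],
  // Inline attachment: a content type plus base64-encoded data.
  _attachments: {
    "note.txt": {
      content_type: "text/plain",
      data: Buffer.from("Call after 5pm").toString("base64"),
    },
  },
};

// Whatever is sent to CouchDB must serialize to valid JSON.
const body = JSON.stringify(doc);
console.log(body);
```

Saving this document would be an HTTP `PUT` of `body` to `/<database>/contact-1`.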
  6. * Designed with replication and offline operation in mind.
     * Incredibly easy to replicate from one database to another.
  7. * All interaction is done via HTTP.
     * Administration (creating databases, triggering replication, triggering a compaction, etc.) is done via HTTP too.
     * Documents are treated as resources.
  8. * This simple map function emits the doc id and a “1” for every document with a first_name of “John”. The reduce function then sums those values, effectively giving me a count of all documents in the database with a first_name of “John”.
     * CouchDB does not support ad hoc queries.
     * To query data, CouchDB must first build an index by passing all of the documents in the database through a map function, and then optionally reducing the results.
     * Building the index with MapReduce can be slow on large datasets, but once the index is built, queries are fast.
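The slide's code did not survive extraction. The following is a minimal sketch of the map and reduce pair this note describes. The CouchDB-side functions use ES5 syntax. `runView` is a hypothetical stand-in for CouchDB's view server: it only supplies the global `emit()` and calls reduce once, with no indexing or sorting. The sample documents are made up.

```javascript
// Map: emit the document id and a 1 for every document whose first_name is "John".
function map(doc) {
  if (doc.first_name === "John") {
    emit(doc._id, 1);
  }
}

// Reduce: sum the emitted 1s. Summing partial sums gives the same answer,
// so this also works when CouchDB calls it with rereduce = true.
function reduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical harness (not CouchDB): provides emit(), collects rows, reduces once.
function runView(mapFn, reduceFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  // CouchDB passes keys to reduce as [key, docid] pairs.
  const keys = rows.map(r => [r.key, r.id]);
  const total = reduceFn(keys, rows.map(r => r.value), false);
  return { rows, total };
}

const docs = [
  { _id: "p1", first_name: "John", last_name: "Wood" },
  { _id: "p2", first_name: "Jane", last_name: "Doe" },
  { _id: "p3", first_name: "John", last_name: "Smith" },
];

const result = runView(map, reduce, docs);
console.log(result.total); // 2
```

CouchDB's built-in `_sum` reduce would do the same job as the hand-written reduce here.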
  9. * CouchDB has a crash-only design: the server does not go through a shutdown process.
     * You can kill the process at any time, and your data will be safe. In fact, killing the process is how you shut down CouchDB.
     * It uses multi-version concurrency control (MVCC) with an append-only storage format, so it never overwrites committed data; it always appends new data instead. This dramatically reduces the risk of data corruption.
  10. * Erlang/OTP has a strong emphasis on concurrency and fault tolerance, which CouchDB takes advantage of.
  11. * Views are how data is queried in CouchDB.
      * Views add structure back to unstructured data so it can be queried.
      * A view is made up of a map function and an optional reduce function, which aggregates the results of the map function.
  12. * Views build indexes of key/value pairs.
      * Keys can be single values or arrays of values.
      * Values can be just about anything, including single values, hashes, or arrays.
      * View results are sorted by key, in ascending or descending order.
      * View indexes are stored on disk separately from the database.
      * Once built, indexes are updated incrementally as documents are added, updated, or deleted. This is the biggest difference from Google/Hadoop-style MapReduce, which maps over the entire dataset on each execution.
  13. * This map function checks whether the document has a property called “dependents”. If it does, it loops through the elements of the dependents array, emitting each one. The reduce function then counts the number of dependents emitted.
      * Map functions have access to the document and can interrogate its data.
      * A single document may emit zero or more key/value pairs.
      * CouchDB has a few built-in reduce functions, like _count, _sum, and _stats.
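A minimal sketch of the dependents view described above, written in ES5 for the CouchDB side. It assumes each emitted row is keyed by the document id, which matches the grouped output discussed a few slides later. The reduce distinguishes the first pass from rereduce, which is the same contract the built-in `_count` follows. `runMap` is a hypothetical stand-in that only provides `emit()`, and the documents are made up.

```javascript
// Map: if the document has a dependents array, emit one row per dependent.
function map(doc) {
  if (Array.isArray(doc.dependents)) {
    for (var i = 0; i < doc.dependents.length; i++) {
      emit(doc._id, doc.dependents[i]);
    }
  }
}

// Reduce: on the first pass the values are dependents, so count them;
// on rereduce the values are earlier counts, so add them up.
function reduce(keys, values, rereduce) {
  if (rereduce) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  return values.length;
}

// Hypothetical harness (not CouchDB): provides emit() and collects rows.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows;
}

const docs = [
  { _id: "emp1", dependents: ["Ann", "Ben"] },
  { _id: "emp2", dependents: ["Cal"] },
  { _id: "emp3" }, // no dependents property: emits nothing
];

const rows = runMap(map, docs);
const total = reduce(rows.map(r => [r.key, r.id]), rows.map(r => r.value), false);
console.log(rows.length, total); // 3 3
```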
  14. * Views can do more than simply count and sum single values.
      * Views can get complicated, depending on the values they are working with.
      * This example sums the values in a hash.
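The slide's example is not recoverable. The following is only a sketch of a reduce that sums the values in a hash field by field. The counter names (`sms`, `email`, `web`) are hypothetical. Because the output has the same shape as the inputs, the same code can serve the first pass and rereduce.

```javascript
// Reduce: each value is a hash of counters; add them up field by field.
function reduce(keys, values, rereduce) {
  var totals = {};
  for (var i = 0; i < values.length; i++) {
    var hash = values[i];
    for (var field in hash) {
      if (Object.prototype.hasOwnProperty.call(hash, field)) {
        totals[field] = (totals[field] || 0) + hash[field];
      }
    }
  }
  return totals;
}

// First pass over values emitted by a (hypothetical) map function.
const partA = reduce(null, [{ sms: 3, email: 1 }, { sms: 2, web: 4 }], false);
const partB = reduce(null, [{ email: 5 }], false);
// Rereduce combines the partial results.
const combined = reduce(null, [partA, partB], true);
console.log(combined); // { sms: 5, email: 6, web: 4 }
```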
  15. * Views are stored in special documents called design documents.
      * Views stored in the same design document share a data structure on disk.
      * This is important to note, because a change that requires one view to be rebuilt will cause all views in the same design document to be rebuilt.
      * Also, view indexes can take up a lot of disk space. Grouping related views in the same design document is a way to save disk space.
      * Keep this in mind when designing views.
  16. * Here is some sample output from the map function we saw previously, which emitted information about a document’s dependents.
      * You can instruct CouchDB not to reduce the results via a query parameter (reduce=false).
  17. * Here is the reduced result, which is a count of all of the dependents in the database.
  18. * The view API also lets you group your results by key (group=true).
      * Here we see the number of dependents in each document, broken out by the document’s id.
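The query parameters from the last few slides are ordinary URL parameters on a view request. Here is a small sketch of a URL builder for them. The helper name `viewUrl`, the database `people`, the design document `stats` and the view `dependents` are hypothetical, and `localhost:5984` is CouchDB's default address. One point worth getting right: key-type parameters (`key`, `startkey`, `endkey`) must be JSON-encoded, so a string key goes out with its quotes.

```javascript
// View parameters whose values CouchDB expects as JSON.
const JSON_PARAMS = new Set(["key", "startkey", "endkey"]);

// Build /db/_design/ddoc/_view/name?... with properly encoded parameters.
function viewUrl(baseUrl, db, ddoc, view, params = {}) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value));
  }
  const path = `${baseUrl}/${encodeURIComponent(db)}/_design/` +
    `${encodeURIComponent(ddoc)}/_view/${encodeURIComponent(view)}`;
  const qs = query.toString();
  return qs ? `${path}?${qs}` : path;
}

const base = "http://localhost:5984";
// Unreduced map output:
console.log(viewUrl(base, "people", "stats", "dependents", { reduce: false }));
// Reduced result, grouped by key (one row per document id here):
console.log(viewUrl(base, "people", "stats", "dependents", { group: true }));
// Only the row for one key; the string is sent JSON-encoded:
console.log(viewUrl(base, "people", "stats", "dependents", { key: "emp1", group: true }));
```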
  19. * View indexes are stored on disk as a B+ tree.
      * B+ trees are nice because they are very flat data structures and need a minimal number of disk seeks to fetch data.
      * Leaf nodes store results; parent nodes store the reduction of their children’s results.
  20. * When CouchDB determines that a query would include all of the child nodes of a parent node, it simply pulls the value from the parent node, so it doesn’t have to rereduce all of the children.
      * It then pulls the values from the other nodes and runs the reduce function over all of the values it pulled.
      * Here, it pulls the “3” from the parent and the “1” from the element to the left, and reduces those values to come up with the result.
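This arithmetic can be checked directly. The sketch below reuses a counting reduce with rereduce support. The row values are hypothetical, and the "tree" is just two arrays standing in for an inner node and a neighbouring leaf. It shows why a reduce function must give the same answer however the rows are grouped: combining the cached 3 with the 1 has to match reducing all four rows from scratch.

```javascript
// Counting reduce with rereduce support (the same contract as _count).
function reduce(keys, values, rereduce) {
  if (rereduce) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
      total += values[i];
    }
    return total;
  }
  return values.length;
}

// Hypothetical rows: three under one inner node, one leaf to its left.
const underParent = ["Ann", "Ben", "Cal"];
const leftRow = ["Dee"];

// The inner node already stores the reduction of its children.
const parentValue = reduce(null, underParent, false); // 3
const leftValue = reduce(null, leftRow, false);       // 1

// At query time, rereduce the stored values instead of revisiting every leaf.
const fromTree = reduce(null, [parentValue, leftValue], true);
const fromScratch = reduce(null, leftRow.concat(underParent), false);
console.log(fromTree, fromScratch); // 4 4
```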
  24. * Replication is CouchDB’s bread and butter. This is what it was built for.
      * It synchronizes two copies of the same database, either on the same database server or on different database servers.
  25. * A replication request simply tells CouchDB which database to replicate, and where to replicate it.
      * Replication is designed to handle failure gracefully. It will simply pick up where it left off.
  26. * Replicates changes from one database to the other.
      * CouchDB compares the databases, finds documents that differ, and then submits batches of changed documents to the target until all changes are transferred.
      * CouchDB increments a sequence number every time the database is changed, and uses that sequence number to help find differences between two databases.
  27. * Bidirectional replication simply consists of two unidirectional replication requests, with the source and target switched.
      * When both are complete, the two databases will be in sync.
  28. * Continuous replication ("continuous": true in the request) keeps an HTTP connection open and streams changes to the target.
      * Replicates changes as they are committed.
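A sketch of the replication requests from these slides, expressed as the JSON bodies that are POSTed to `/_replicate`. The helper names and the data-center URLs are hypothetical. The sending function assumes a runtime with a global `fetch` (for example Node 18+), and it is defined but not called here.

```javascript
// Build the JSON body for POST /_replicate.
function replicationRequest(source, target, { continuous = false } = {}) {
  const body = { source, target };
  if (continuous) {
    body.continuous = true; // keep the connection open and stream changes
  }
  return body;
}

// Bidirectional replication: two unidirectional requests, source and target swapped.
function bidirectional(dbA, dbB, options) {
  return [replicationRequest(dbA, dbB, options), replicationRequest(dbB, dbA, options)];
}

// Send one request (assumes a global fetch; not executed in this example).
async function startReplication(couchUrl, body) {
  const res = await fetch(`${couchUrl}/_replicate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return res.json();
}

const requests = bidirectional(
  "http://dc1.example.com:5984/campaigns",
  "http://dc2.example.com:5984/campaigns",
  { continuous: true },
);
console.log(JSON.stringify(requests, null, 2));
```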
  29. * All documents eligible for replication are fed through a filter function. A document will only be replicated if the function returns true.
      * Provide the filter function and any query parameters in the request: "filter":"myddoc/myfilter", "query_params": {"key":"value"}
  30. * Filter functions also have access to the replication request, which can be used to support more dynamic behavior.
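A minimal sketch of a filter function, using the `myddoc/myfilter` naming from the request example above. The function would live under `filters` in that design document. The `type` field and its `"sms"` value are hypothetical. Letting deletions through is a design choice of this sketch, not a CouchDB requirement: a deleted document usually carries only `_id`, `_rev` and `_deleted`, so a type check would otherwise block deletions from replicating.

```javascript
// Filter: replicate only documents whose type matches the "type" query parameter.
function myfilter(doc, req) {
  if (doc._deleted) {
    return true; // deletion stubs have no type field; let them through
  }
  return doc.type === req.query.type;
}

// Matching replication request body (database URLs are hypothetical).
const replication = {
  source: "http://localhost:5984/campaigns",
  target: "http://backup.example.com:5984/campaigns",
  filter: "myddoc/myfilter",
  query_params: { type: "sms" },
};

// CouchDB hands query_params to the filter as req.query; simulate that here.
const req = { query: replication.query_params };
console.log(myfilter({ _id: "m1", type: "sms" }, req));   // true
console.log(myfilter({ _id: "m2", type: "email" }, req)); // false
```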
  31. * To replicate only specific documents, provide doc_ids in the request: "doc_ids":["foo","bar","baz"]
  32. * CouchDB was designed to operate offline. If both databases continue to process updates while “disconnected” from each other, conflicts are bound to happen. A conflict happens when the same document is updated in both databases.
      * During replication, CouchDB detects when there are multiple versions of the same document and records the conflict.
      * CouchDB chooses a winner, because it needs a “latest” document. The choice is deterministic, so every replica picks the same winner.
      * The losing revision is not discarded: it is kept in the document’s revision tree as a conflicting revision and is made available for merging.
  33. * Losing revisions are listed in the _conflicts property.
      * CouchDB does not attempt to merge conflicting documents.
  34. * The best way to find conflicting documents is via a view, like this one.
      * You could have a job that runs this view periodically and merges conflicting documents.
      * You can also resolve conflicts on read. By default, CouchDB does not return the _conflicts property when a document is fetched, but it will if you add conflicts=true to the HTTP request that fetches the document. With that information, you can take care of conflicts as you encounter them.
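The slide's view was lost in extraction. This is a sketch of the usual shape of such a view: map functions see a document's `_conflicts` field whenever conflicts exist, so the view emits the list of losing revisions for every conflicted document. `runMap` is a hypothetical stand-in that only supplies `emit()`, and the revisions are made up.

```javascript
// Map: emit every document that currently has conflicting revisions.
function map(doc) {
  if (doc._conflicts) {
    emit(doc._conflicts, null);
  }
}

// Hypothetical harness (not CouchDB): provides emit() and collects rows.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows;
}

const docs = [
  { _id: "c1", _rev: "3-aaa", _conflicts: ["3-bbb"], name: "Ann" },
  { _id: "c2", _rev: "1-ccc", name: "Ben" },
];

const rows = runMap(map, docs);
console.log(rows); // one row, for document c1
```

Each row's `id` tells a periodic merge job which document to fetch with `?conflicts=true`.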
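  35. * To resolve a conflict, save the merged content as a new revision of the document, and delete the conflicting revisions.
      * Compaction alone does not resolve conflicts; conflicting revisions survive it. Once the losing revisions have been deleted, compacting the database reclaims the space their content used.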
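The two resolution steps can be sent together in a single `POST /<database>/_bulk_docs` request. This is a minimal sketch of building that request body. The helper name `resolutionBatch` is hypothetical, as are the sample revisions and fields. The merge itself is application logic and is taken as given here. The winner is assumed to have been fetched with `?conflicts=true`, so its `_conflicts` list is present.

```javascript
// Build a _bulk_docs body that resolves a conflict:
// 1. write the merged content on top of the winning revision, and
// 2. delete each losing revision listed in _conflicts.
function resolutionBatch(winner, merged) {
  // Copy the merged content without _conflicts or any stale _id/_rev.
  const content = Object.assign({}, merged);
  delete content._conflicts;
  delete content._id;
  delete content._rev;

  const docs = [Object.assign(content, { _id: winner._id, _rev: winner._rev })];
  for (const rev of winner._conflicts || []) {
    docs.push({ _id: winner._id, _rev: rev, _deleted: true });
  }
  return { docs };
}

// Winner as fetched with ?conflicts=true (hypothetical values).
const winner = { _id: "c1", _rev: "3-aaa", _conflicts: ["3-bbb"], phone: "555-0100" };
const merged = Object.assign({}, winner, { email: "ann@example.com" });

const batch = resolutionBatch(winner, merged);
console.log(JSON.stringify(batch));
```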
  39. * There is enough replication going on here that these data centers could easily be mistaken for a rabbit farm.
      * Red line = CouchDB continuous replication.
      * Each primary server is capable of handling reads and writes, for scalability.
      * Matching primary servers in each data center continuously replicate changes to each other to keep them in sync.
      * In addition, each primary continuously replicates changes to its backup, so the backup is ready to take over at any point in time.
  40. * The _changes feed: an API for being notified of changes in the database.
  41. * The contents of an item in the change notification feed:
        seq - the update_seq number in the database that was created for this change
        id - the id of the document that was changed
        changes - the revision(s) of the document produced by this change
  42. * Polling pulls changes on request.
      * When no arguments are specified, all changes are returned.
  43. * You can easily get all changes since a given sequence number with the since query parameter.
  44. * You can include the document content in the change notification with the include_docs query parameter.
      * This avoids another call to CouchDB for the document.
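The polling options above, and the feed types on the next slides, are all query parameters on `/<database>/_changes`. This is a small sketch of a URL builder for them. The helper name `changesUrl`, the database `campaigns` and the sequence number 42 are hypothetical.

```javascript
// Build a _changes URL; option names mirror the query parameters
// (since, include_docs, feed, filter, plus any filter query parameters).
function changesUrl(baseUrl, db, options = {}) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(options)) {
    if (value !== undefined) {
      params.set(name, String(value));
    }
  }
  const qs = params.toString();
  return `${baseUrl}/${encodeURIComponent(db)}/_changes` + (qs ? `?${qs}` : "");
}

const base = "http://localhost:5984";
console.log(changesUrl(base, "campaigns"));                                       // every change
console.log(changesUrl(base, "campaigns", { since: 42, include_docs: true }));    // newer changes, with docs
console.log(changesUrl(base, "campaigns", { feed: "longpoll", since: 42 }));      // wait for the next change
```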
  45. * Long polling (feed=longpoll) suits less frequent updates.
      * It holds the connection open until an update arrives, then closes the connection once the update has been sent.
      * Using this avoids the need to continuously poll for changes.
  46. * The continuous feed (feed=continuous) holds the connection open indefinitely.
      * Results are streamed in as individual chunks of JSON, one per line, making them easy to parse.
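A sketch of consuming the continuous feed. The feed sends one JSON object per line and uses blank lines as heartbeats. Network chunks do not necessarily end on a line boundary, so the parser buffers the incomplete tail until the next chunk arrives. The helper name and sample data are hypothetical. Remembering the last `seq` lets a client reconnect later with `since`.

```javascript
// Incremental parser for a feed=continuous response body.
function createChangesParser(onChange) {
  let buffer = "";
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim() === "") {
        continue; // heartbeat
      }
      onChange(JSON.parse(line));
    }
  };
}

// Simulated chunks (hypothetical data), split in the middle of a line.
const received = [];
let lastSeq = null;
const push = createChangesParser(change => {
  received.push(change);
  lastSeq = change.seq; // resume point for a later ?since=...
});
push('{"seq":12,"id":"doc1","changes":[{"rev":"1-abc"}]}\n{"seq":13,"id":"do');
push('c2","changes":[{"rev":"2-def"}]}\n\n');
console.log(received.map(c => c.id), lastSeq); // [ 'doc1', 'doc2' ] 13
```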
  47. * As with replication, you can filter change notifications.
      * Only changed documents for which the specified filter function returns true will be sent to the client.
  48. * Filter functions have access to a request object, which includes any query parameters that were specified.
      * This provides the ability for more dynamic functionality.
  56. * One area where CouchDB really stands out is mobile device support.
  57. * CouchDB runs natively on iOS (iPhone and iPad) and Android.
      * webOS has a local storage solution capable of syncing with CouchDB.
  58. * Many native applications pull data over the network from a remote service.
      * This works fine when the network is nice.
  59. * But sometimes the network is mean. No bars. Spotty coverage. Too much traffic.
      * These apps either don’t work, only give you access to features that don’t require data from the network, or are incredibly slow when there are network issues.
  60. * If your app uses CouchDB, you can continue interacting with your data locally when the network is unavailable.
      * Data access will be quick and snappy regardless.
  61. * When the network becomes available, you can easily sync your changes to a remote CouchDB instance.
      * Your app doesn’t need to worry about replicating changes; CouchDB will do it for you.
      * Once synced, your changes will be available elsewhere.
  62. * CouchDB has minimal impact on battery life when idling.
      * It’s also well known that radio transmissions are very expensive in terms of power consumption. Apps that use local data will be much easier on the battery than apps that continuously pull data over the network.
  67. * CouchApps are HTML/JavaScript applications that are served right out of CouchDB.
      * I’ve read that one of CouchDB’s original design goals was to one day be able to serve web applications, which highlights how different it really is from traditional databases.
  68. * CouchDB speaks HTTP.
      * It can easily serve static files that are stored as attachments inside a document.
      * It also supports URL rewriting and virtual hosts.
  69. * Just because CouchDB can serve your web application does not mean it’s a full-blown web framework.
      * But some applications don’t need a fully featured web framework.
      * In fact, more and more applications fall into this category. Many believe that this is the future of web applications.
      * JavaScript frameworks like Backbone.js and Spine, along with templating libraries like mustache.js, make it much easier to write web applications entirely in JavaScript and HTML that run on the client, using the server only to fetch and store data. This is the sort of application that would be great as a CouchApp.
  70. * Traditional three-tier architecture.
      * Client machines, web servers, and a database.
  71. * CouchApps eliminate the need for separate web and database servers, since CouchDB can both serve the application and act as the database.
      * You can easily deploy multiple CouchDB instances, replicating changes between them, for scalability.
      * CouchDB provides an ETag header, which, if used properly, can further reduce the load on the servers.
      * If this architecture works, then...
  72. * If your application can talk to CouchDB remotely, it can just as easily talk to a local instance of CouchDB.
      * The only difference in the application is the URL used to connect to CouchDB.
      * This scaled-down architecture lets you replicate your application and data to a local instance of CouchDB and run your application in “offline” mode.
  73. * No need for a powerful client machine.
      * If your CouchApp is optimized for viewing on a mobile device, the same scaled-down architecture applies to mobile devices as well.
  74. * Not only are CouchApps open source by nature, they are also open data.
  75. * The Kabul War Diary CouchApp is a perfect example of this.
      * It’s a CouchApp created by one of the CouchDB committers.
      * It wraps WikiLeaks data in an application that lets you easily browse it.
      * Because it’s a CouchApp, the data and application can easily be replicated to any CouchDB instance with a single command.
  76. * How can CouchDB serve web applications when the documents are stored as JSON?
      * My web browser can’t make a pretty web page out of JSON. It needs HTML.
  77. * Show functions let you transform documents into some other format (HTML, XML, etc.).
      * This example simply puts the document’s title inside an HTML heading tag.
      * Show functions are stored in design documents, like views.
      * Show functions cannot update the database and are side-effect free. Hence, their output can easily be cached.
  78. * CouchDB lets you set response headers as well.
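A minimal sketch of the show function from the last two notes. It puts the title in an `<h1>` and sets a header by returning an object with `headers` and `body` instead of a plain string. The HTML escaping and the 404 response for a missing document are additions of this sketch. Helpers are defined inside the function because it is stored on its own in a design document (under `shows`) and served from `/<database>/_design/<ddoc>/_show/<name>/<docid>`.

```javascript
// Show function: render the document's title as an HTML heading.
function show(doc, req) {
  function escapeHtml(text) {
    return String(text)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;");
  }
  if (!doc) {
    return { code: 404, body: "Not found" }; // no document id, or no such document
  }
  return {
    headers: { "Content-Type": "text/html; charset=utf-8" },
    body: "<h1>" + escapeHtml(doc.title) + "</h1>",
  };
}

// Local check with a hypothetical document and an empty request.
const response = show({ _id: "post-1", title: "Relax & <Enjoy>" }, {});
console.log(response.body); // <h1>Relax &amp; &lt;Enjoy&gt;</h1>
```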
  79. * List functions let you transform view results into something else (HTML, CSV files, etc.).
      * The send function sends a chunk of HTML to the client.
      * The return statement sends the final chunk.
      * This function simply places the “values” of the view results inside an unordered list.
      * Since we’re sending data to the client programmatically, we can filter and aggregate view result data on the fly, tweaking the data before sending it to the client.
      * Like show functions, list functions cannot update the database and are side-effect free.
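A sketch of the list function this note describes. Inside CouchDB, `start()`, `send()` and `getRow()` are globals provided to list functions. Here `runList` is a hypothetical stand-in that stubs them and collects the chunks sent to the client. The view rows are made up, and escaping the values is an addition of this sketch.

```javascript
// List function: wrap each row's value in <li> inside a <ul>.
function list(head, req) {
  start({ headers: { "Content-Type": "text/html; charset=utf-8" } });
  send("<ul>");
  var row;
  while ((row = getRow())) {
    var text = String(row.value)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
    send("<li>" + text + "</li>");
  }
  return "</ul>"; // the returned string is sent as the final chunk
}

// Hypothetical harness (not CouchDB): stubs start/send/getRow, joins the chunks.
function runList(listFn, rows) {
  const chunks = [];
  let i = 0;
  globalThis.start = () => {};
  globalThis.send = chunk => chunks.push(chunk);
  globalThis.getRow = () => (i < rows.length ? rows[i++] : null);
  chunks.push(listFn({ total_rows: rows.length, offset: 0 }, {}));
  delete globalThis.start;
  delete globalThis.send;
  delete globalThis.getRow;
  return chunks.join("");
}

const html = runList(list, [
  { id: "a", key: "a", value: "Alpha" },
  { id: "b", key: "b", value: "Beta & Gamma" },
]);
console.log(html); // <ul><li>Alpha</li><li>Beta &amp; Gamma</li></ul>
```

Because rows are pulled one at a time with `getRow()`, the same loop is where rows could be skipped or aggregated before anything is sent.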
  80. * Right now there are probably a few questions running through your head.
  81. * Enables you to continue using the tools you’re familiar with.
  82. * The CouchApp project also ships with several JavaScript libraries that can help with development.
  83. * This is an example mustache.js HTML template.
      * The post’s title and body are included via template variables, seen here in green.
      * CSS files, JavaScript files, images, and other resources can be referenced using relative paths. Simply place them in the appropriate spot in the CouchApp directory structure, and it will just work.
      * Now, this looks a lot more like what web developers are used to working with.
  84. * Templates can be rendered from show functions, list functions, or functions triggered by event callbacks.
      * This show function renders a blog post using the template in blog_post.html with the data in the data hash.
  87. * Backbone.js provides an MVC-like structure to JavaScript applications.
  89. * Some resources for finding out more about CouchDB.