CouchDB at Its Core
Global Data Storage and Rich Incremental Indexing at Cloudant
Adam Kocoloski
StampedeCon 2013

CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant - StampedeCon 2013


At the StampedeCon 2013 Big Data conference in St. Louis, Adam Kocoloski, Co-Founder & CTO of Cloudant and Apache CouchDB committer, presented “CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant”. Cloudant operates database clusters comprising 100+ nodes based on BigCouch, the company’s fork of CouchDB. Key elements of CouchDB’s design have proven instrumental to success at this scale, including version histories, append-only storage, and multi-master replication. The talk covers lessons learned from running production CouchDB clusters bigger than many well-publicized Hadoop deployments, and how Cloudant’s experience at scale is informing development work on the next release of Apache CouchDB.


  1. CouchDB at Its Core: Global Data Storage and Rich Incremental Indexing at Cloudant
     Adam Kocoloski, StampedeCon 2013
  2. What is Cloudant?
     • Founded by “big data” scientists
     • Particle physicists @ MIT analyzing petabytes of collider data
     • Frustrated by inadequate tools, founders became experts in scaling CouchDB (“BigCouch”)
     • Started Cloudant in 2008 as a managed data layer
     • Premise: Apps should grow into their data layer, not out of it
     • Built: Scalable, global, fault-tolerant data layer managed service
     • Funded by Avalon, Devonshire (Fidelity), IQT, Rackspace, Samsung Ventures, Toba Capital, Y Combinator
  3. Cloudant Overview
     • Operational JSON document store
     • Web service
     • Advanced APIs
     • Replication & Sync
     • Full-text Search
     • Geospatial
     • Incremental MapReduce
     • Scalable, Highly Available Performance
     • Cross-data center data distribution & fail over
     • Geo load balancing
     • Multi-tenant and single-tenant clusters
     • Monitoring, admin & dev dashboards
     • Managed 24x7 by experts
  4. Cloudant: 34 locations on 5 hosting providers
  5. Anatomy of the Cloudant Data Network
     [Diagram labels: US-EAST “Node”; Single-tenant cluster; Multi-tenant cluster; HTTP POST, GET,… {JSON doc}; Edge Database Cluster; Mobile Devices; AP-JP; Filtered Replication & Sync; Secondary Data Centers (for DR & distributed access); EU-NL]
  6. How CouchDB Fits In
     [Diagram labels: Horizontal Clustering Framework; Visualization; Lucene Search; Chainable MapReduce; Management; Monitoring; IOQ; Fabric; Mem3; Rexi; Apache CouchDB; Docs: JSON, Attachments; Developer APIs; Prioritizing IO types, prevents “noisy neighbors” in multi-tenancy; Clustering API, Sharding, Intra-cluster messaging; GET/PUT docs, Views, Replication…; Geospatial Indexing; Geo-Load Balancing; Connects users to closest copy of data; Dashboards: Monitoring, Admin, Development]
  7. Why CouchDB?
     • Durable append-only storage engine
     • Sequence tree enabling incremental processing of updates
     • Data structures supporting eventual consistency
     • Sophisticated replication & synchronization
     The right primitives for a global data network
  8. Append-only Storage
  9. Append-only Storage
     • Rewrite path to root in each index on document update
     • Large sequential writes, smaller random reads
     • Wasted space must be periodically vacuumed
     • Disk is cheap
     • SSD-friendly access pattern
     • We build what we run ➜ we make things that are easy to run
     • (We automated the heck out of the compactor)
     This used to be controversial, now everyone does it
  10. Sequence Index
  11. Sequence Index
      Index entries: 1 foo, 2 bar, 3 baz, 4 bif
      GET /db/_changes
      {"seq":1, "id": "foo", "rev":"1-..."}
      {"seq":2, "id": "bar", "rev":"1-..."}
      {"seq":3, "id": "baz", "rev":"1-..."}
      {"seq":4, "id": "bif", "rev":"1-..."}
  12. Sequence Index
      Index entries: 1 foo, 2 bar, 3 baz, 4 bif, 5 bar
      GET /db/_changes
      {"seq":1, "id": "foo", "rev":"1-..."}
      {"seq":3, "id": "baz", "rev":"1-..."}
      {"seq":4, "id": "bif", "rev":"1-..."}
      {"seq":5, "id": "bar", "rev":"2-..."}
      OR
      GET /db/_changes?since=4
      {"seq":5, "id": "bar", "rev":"2-..."}
  13. Sequence Index
      • Index each document in order of most recent update
      • Allows incremental, resumable processing in the background
      • Originally, MapReduce views
      • First class API endpoint ➜ DIY integrations (cf. ElasticSearch)
      • Lucene-based text search
      • Geospatial indexes and querying
      • First class internal service ➜ add additional consumers as need arises
  14. Eventual Consistency
  15. Eventual Consistency
      • CAP theorem (Brewer)
      • Often over-simplified
      • I’ll offer my own oversimplification: “You must choose P”
      • When faced with a network partition, you optimize for consistency or availability
      • Cloudant is an ODS
      • Availability is paramount
      • Strong consistency across geographies introduces unacceptable latency*
      * Unless you’re Google and you install atomic clocks in your data centers
  16. Eventual Consistency: Hash Histories
      • Multiple concurrent versions of data will happen
      • Default strategy cannot be to discard user data
      • Hash histories track versions of a document
      • Baked into every document
      • Think git
      • Document versions derived from contents + edit history
      • Same series of edits, applied in same order, yield same version ID
      • History comparison detects divergences and how the versions fit into the “family tree”
      [Revision tree diagram: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...]
  17. Replication & Synchronization
  18. Replication & Sync
      /db1/foo: 1-5a4..., 2-ab6..., 3-085..., 4-7ba..., 4-8bf..., 5-d4e...
      /db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57...
  19. Replication & Sync
      /db1/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
      /db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57...
  20. Replication & Sync
      /db1/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
      /db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
  21. Replication & Sync
      • Not your RDBMS’ notion of replication
      • Transfers updates from any source DB to any target DB
      • Builds on earlier primitives
      • Leverages sequence index to determine what’s changed
      • Leverages hash histories to determine what’s missing on the target
      • Critical “anti-entropy” element in clusters
      • DBs are divided into partitions, copies of each partition are stored on multiple distinct nodes
      • Partition copies replicate with each other to ensure that documents are durably stored and that consistency is achieved ... eventually
  22. Why CouchDB Recap
      • Durable append-only storage engine
      • Sequence tree enabling incremental processing of updates
      • Data structures supporting eventual consistency
      • Sophisticated replication & synchronization
  23. What’s Next?
      • BigCouch ➜ CouchDB
      • Cloudant will continue development under ASF umbrella
      • Fewer code forks ➜ better velocity
      • New CouchDB web UI “Fauxton”
      • Better developer tooling for server-side code
      • Plugins for Cloudant-specific functionality
      • Cloudant is betting on data “at the edge”
  24. Thank You
      adam@cloudant.com
      @kocolosk
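
The Sequence Index slides describe an index that holds each document once, at the sequence number of its most recent update, so a consumer can resume from a checkpoint. The sketch below is a toy in-memory model of that idea, not CouchDB's storage code: the class `SequenceIndex` and its methods are hypothetical names, and the `"N-..."` revision strings only mirror the elided values on the slides.

```python
class SequenceIndex:
    """Toy by-sequence index: one entry per document, keyed by the
    sequence number of that document's most recent update.
    (Illustrative only; names here are not CouchDB APIs.)"""

    def __init__(self):
        self._last_seq = 0
        self._by_seq = {}  # seq -> (doc_id, revision generation)
        self._by_id = {}   # doc_id -> seq of its latest update

    def update(self, doc_id):
        """Record an update: drop the document's old entry, add a new one."""
        generation = 1
        old_seq = self._by_id.get(doc_id)
        if old_seq is not None:
            _, old_generation = self._by_seq.pop(old_seq)
            generation = old_generation + 1
        self._last_seq += 1
        self._by_seq[self._last_seq] = (doc_id, generation)
        self._by_id[doc_id] = self._last_seq
        return self._last_seq

    def changes(self, since=0):
        """Return entries with seq > since, in sequence order."""
        return [
            {"seq": seq, "id": doc_id, "rev": f"{generation}-..."}
            for seq, (doc_id, generation) in sorted(self._by_seq.items())
            if seq > since
        ]


index = SequenceIndex()
for doc_id in ["foo", "bar", "baz", "bif"]:
    index.update(doc_id)

checkpoint = index.changes()[-1]["seq"]  # a consumer remembers where it stopped
index.update("bar")                      # "bar" moves from seq 2 to seq 5

full_feed = index.changes()
incremental_feed = index.changes(since=checkpoint)
```

A background indexer built this way only needs to persist `checkpoint`; after a restart it asks for changes since that value and processes just the updated documents, which is the resumable behavior slide 13 describes.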
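
The Hash Histories and Replication & Sync slides say that version IDs are derived from contents plus edit history, and that comparing histories reveals both divergence and what is missing on a target. The following is a minimal sketch under stated assumptions: the hashing recipe (SHA-256 over the parent ID and JSON body, truncated to 8 hex characters) is made up for illustration and is not CouchDB's actual revision algorithm, and the helper names `next_rev`, `apply_edits`, `divergence_point`, and `missing_on_target` are hypothetical.

```python
import hashlib
import json


def next_rev(parent_rev, body):
    """Derive a revision ID from the parent revision and the new contents.
    Assumed scheme for illustration only, not CouchDB's real hashing."""
    generation = int(parent_rev.split("-")[0]) + 1 if parent_rev else 1
    payload = json.dumps([parent_rev, body], sort_keys=True).encode("utf-8")
    return f"{generation}-{hashlib.sha256(payload).hexdigest()[:8]}"


def apply_edits(bodies):
    """Apply a series of edits in order and return the resulting history."""
    history, parent = [], None
    for body in bodies:
        parent = next_rev(parent, body)
        history.append(parent)
    return history


def divergence_point(history_a, history_b):
    """Return the last revision shared by both histories, or None."""
    shared = None
    for rev_a, rev_b in zip(history_a, history_b):
        if rev_a != rev_b:
            break
        shared = rev_a
    return shared


def missing_on_target(source_revs, target_revs):
    """Revisions the source has that the target has not seen yet."""
    known = set(target_revs)
    return [rev for rev in source_revs if rev not in known]


# Two replicas applying the same edits in the same order agree on every ID.
replica_a = apply_edits([{"n": 1}, {"n": 2}, {"n": 3}])
replica_b = apply_edits([{"n": 1}, {"n": 2}, {"n": 3}])

# A third replica makes a different third edit, so its history branches.
replica_c = apply_edits([{"n": 1}, {"n": 2}, {"n": "other"}])

fork = divergence_point(replica_a, replica_c)
to_send = missing_on_target(replica_a, replica_c)
```

Because each ID folds in its parent, two histories that agree on a revision also agree on everything before it, so a replicator in this model can transfer only `to_send` and keep both branches rather than discarding either.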
