CouchDB at Its Core
Global Data Storage and Rich Incremental Indexing at Cloudant
Adam Kocoloski
StampedeCon 2013

CouchDB at its Core: Global Data Storage and Rich Incremental Indexing at Cloudant - StampedeCon 2013

At the StampedeCon 2013 Big Data conference in St. Louis, Adam Kocoloski, co-founder and CTO of Cloudant and an Apache CouchDB committer, presented “CouchDB at Its Core: Global Data Storage and Rich Incremental Indexing at Cloudant”. Cloudant operates database clusters comprising 100+ nodes based on BigCouch, the company’s fork of CouchDB. Key elements of CouchDB’s design have proven instrumental to success at this scale, including version histories, append-only storage, and multi-master replication. The talk covers lessons learned from running production CouchDB clusters bigger than many well-publicized Hadoop deployments, and how Cloudant’s experience at scale is informing development work on the next release of Apache CouchDB.

  1. CouchDB at Its Core: Global Data Storage and Rich Incremental Indexing at Cloudant
     Adam Kocoloski, StampedeCon 2013
  2. What is Cloudant?
     • Founded by “big data” scientists
     • Particle physicists @ MIT analyzing petabytes of collider data
     • Frustrated by inadequate tools, founders became experts in scaling CouchDB (“BigCouch”)
     • Started Cloudant in 2008 as a managed data layer
     • Premise: Apps should grow into their data layer, not out of it
     • Built: Scalable, global, fault-tolerant data layer managed service
     • Funded by Avalon, Devonshire (Fidelity), IQT, Rackspace, Samsung Ventures, Toba Capital, Y Combinator
  3. Cloudant Overview
     • Operational JSON document store
     • Web service
     • Advanced APIs
     • Replication & Sync
     • Full-text Search
     • Geospatial
     • Incremental MapReduce
     • Scalable, Highly Available Performance
     • Cross-data center data distribution & failover
     • Geo load balancing
     • Multi-tenant and single-tenant clusters
     • Monitoring, admin & dev dashboards
     • Managed 24x7 by experts
  4. Cloudant: 34 locations on 5 hosting providers
  5. Anatomy of the Cloudant Data Network
     [Diagram. Labels: US-EAST “Node” containing a single-tenant cluster and a multi-tenant cluster; HTTP POST, GET, … {JSON doc}; Edge Database Cluster; Mobile Devices; Filtered Replication & Sync; Secondary Data Centers (for DR & distributed access): AP-JP, EU-NL]
  6. How CouchDB Fits In
     [Layer diagram, top to bottom]
     • Developer APIs
     • Visualization, Lucene Search, Chainable MapReduce, Geospatial Indexing, Management, Monitoring
     • Dashboards: Monitoring, Admin, Development
     • Geo-Load Balancing: connects users to closest copy of data
     • Horizontal Clustering Framework
     • IOQ: prioritizing IO types; prevents “noisy neighbors” in multi-tenancy
     • Fabric, Mem3, Rexi: clustering API, sharding, intra-cluster messaging
     • Apache CouchDB: GET/PUT docs, Views, Replication…
     • Docs: JSON, Attachments
  7. Why CouchDB?
     • Durable append-only storage engine
     • Sequence tree enabling incremental processing of updates
     • Data structures supporting eventual consistency
     • Sophisticated replication & synchronization
     The right primitives for a global data network
  8. Append-only Storage
  9. Append-only Storage
     • Rewrite path to root in each index on document update
     • Large sequential writes, smaller random reads
     • Wasted space must be periodically vacuumed
     • Disk is cheap
     • SSD-friendly access pattern
     • We build what we run ➜ we make things that are easy to run
     • (We automated the heck out of the compactor)
     This used to be controversial, now everyone does it
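The "rewrite path to root" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses a plain binary search tree kept in a Python list as a stand-in for CouchDB's on-disk B+trees, and the names `insert`, `lookup`, and `live_offsets` are hypothetical, not CouchDB APIs.

```python
from typing import List, Optional, Set, Tuple

# A node is (key, value, left_offset, right_offset); offsets index into the log.
# Assumption: a binary tree in a list stands in for an append-only B+tree file.
Node = Tuple[str, str, Optional[int], Optional[int]]


def insert(log: List[Node], root: Optional[int], key: str, value: str) -> int:
    """Append fresh copies of every node on the path from the change to the root.

    Existing log entries are never modified. Returns the offset of the new root.
    """
    if root is None:
        log.append((key, value, None, None))
        return len(log) - 1
    k, v, left, right = log[root]
    if key == k:
        log.append((key, value, left, right))
    elif key < k:
        new_left = insert(log, left, key, value)
        log.append((k, v, new_left, right))
    else:
        new_right = insert(log, right, key, value)
        log.append((k, v, left, new_right))
    return len(log) - 1


def lookup(log: List[Node], root: Optional[int], key: str) -> Optional[str]:
    """Walk down from the given root; older roots still see older data."""
    while root is not None:
        k, v, left, right = log[root]
        if key == k:
            return v
        root = left if key < k else right
    return None


def live_offsets(log: List[Node], root: Optional[int]) -> Set[int]:
    """Offsets reachable from the root; everything else is reclaimable space."""
    if root is None:
        return set()
    _, _, left, right = log[root]
    return {root} | live_offsets(log, left) | live_offsets(log, right)


log: List[Node] = []
root: Optional[int] = None
for key, value in [("foo", "v1"), ("bar", "v1"), ("baz", "v1")]:
    root = insert(log, root, key, value)
old_root = root
root = insert(log, root, "bar", "v2")  # update appends a new path, nothing is overwritten
wasted = len(log) - len(live_offsets(log, root))
```

Every write is an append, so the old root remains a consistent snapshot, and the entries no longer reachable from the newest root (`wasted`) are what a compactor would vacuum.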
  10. Sequence Index
  11. Sequence Index
      [Diagram. Sequence tree: 1 foo, 2 bar, 3 baz, 4 bif]
      GET /db/_changes
      {"seq":1, "id": "foo", "rev":"1-..."}
      {"seq":2, "id": "bar", "rev":"1-..."}
      {"seq":3, "id": "baz", "rev":"1-..."}
      {"seq":4, "id": "bif", "rev":"1-..."}
  12. Sequence Index
      [Diagram. Sequence tree after updating bar: 1 foo, 2 bar (superseded), 3 baz, 4 bif, 5 bar]
      GET /db/_changes
      {"seq":1, "id": "foo", "rev":"1-..."}
      {"seq":3, "id": "baz", "rev":"1-..."}
      {"seq":4, "id": "bif", "rev":"1-..."}
      {"seq":5, "id": "bar", "rev":"2-..."}
      OR
      GET /db/_changes?since=4
      {"seq":5, "id": "bar", "rev":"2-..."}
  13. Sequence Index
      • Index each document in order of most recent update
      • Allows incremental, resumable processing in the background
      • Originally, MapReduce views
      • First-class API endpoint ➜ DIY integrations (cf. ElasticSearch)
      • Lucene-based text search
      • Geospatial indexes and querying
      • First-class internal service ➜ add consumers as the need arises
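The behaviour shown in the `_changes` slides can be modelled with a small in-memory sketch. The class `SequenceIndex` and its methods are hypothetical names for illustration, not CouchDB internals, and the `rev` values keep the elided `N-...` form used on the slides.

```python
from typing import Dict, List, Tuple


class SequenceIndex:
    """Toy model: each document appears once, at the seq of its latest update."""

    def __init__(self) -> None:
        self._seq = 0
        # doc id -> (sequence number of latest update, revision generation)
        self._latest: Dict[str, Tuple[int, int]] = {}

    def update(self, doc_id: str) -> int:
        """Record a write; the document moves to the end of the sequence."""
        self._seq += 1
        _, generation = self._latest.get(doc_id, (0, 0))
        self._latest[doc_id] = (self._seq, generation + 1)
        return self._seq

    def changes(self, since: int = 0) -> List[dict]:
        """Rows with seq greater than `since`, in update order."""
        rows = [
            {"seq": seq, "id": doc_id, "rev": f"{generation}-..."}
            for doc_id, (seq, generation) in self._latest.items()
            if seq > since
        ]
        return sorted(rows, key=lambda row: row["seq"])


index = SequenceIndex()
for doc_id in ["foo", "bar", "baz", "bif"]:
    index.update(doc_id)

# A consumer (an indexer, for example) processes everything and checkpoints.
checkpoint = index.changes()[-1]["seq"]

index.update("bar")  # bar moves from seq 2 to seq 5
full_feed = index.changes()
incremental = index.changes(since=checkpoint)
```

A consumer that stores only its last processed `seq` can stop and resume at any time, which is what makes background indexers cheap to add.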
  14. Eventual Consistency
  15. Eventual Consistency
      • CAP theorem (Brewer)
      • Often over-simplified
      • I’ll offer my own oversimplification: “You must choose P”
      • When faced with a network partition, you optimize for consistency or availability
      • Cloudant is an ODS
      • Availability is paramount
      • Strong consistency across geographies introduces unacceptable latency*
      * Unless you’re Google and you install atomic clocks in your data centers
  16. Eventual Consistency: Hash Histories
      • Multiple concurrent versions of data will happen
      • Default strategy cannot be to discard user data
      • Hash histories track versions of a document
      • Baked into every document
      • Think git
      • Document versions derived from contents + edit history
      • Same series of edits, applied in same order, yield same version ID
      • History comparison detects divergences and how the versions fit into the “family tree”
      [Diagram. Revision tree: 1-5a4... ➜ 2-ab6... ➜ branches 3-085... and 3-f57...; further revisions 4-7ba..., 4-8bf..., 5-d4e...]
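A minimal sketch of the hash-history idea follows. It assumes a revision ID of the form `generation-hash`, where the hash covers the parent revision and the new document body. CouchDB's actual hashing input differs in detail; `next_rev`, `history`, and `shared_prefix` are hypothetical helper names.

```python
import hashlib
import json
from typing import List, Optional


def next_rev(parent: Optional[str], body: dict) -> str:
    """Derive a revision ID from the parent revision plus the new contents."""
    generation = int(parent.split("-")[0]) + 1 if parent else 1
    payload = json.dumps([parent, body], sort_keys=True).encode("utf-8")
    return f"{generation}-{hashlib.md5(payload).hexdigest()}"


def history(edits: List[dict]) -> List[str]:
    """Apply a series of edits in order and return the resulting revision IDs."""
    revs: List[str] = []
    parent: Optional[str] = None
    for body in edits:
        parent = next_rev(parent, body)
        revs.append(parent)
    return revs


def shared_prefix(a: List[str], b: List[str]) -> int:
    """Number of leading revisions two histories have in common."""
    count = 0
    for rev_a, rev_b in zip(a, b):
        if rev_a != rev_b:
            break
        count += 1
    return count


# Two replicas that see the same edits in the same order agree on every ID.
replica_1 = history([{"n": 1}, {"n": 2}, {"n": 3}])
replica_2 = history([{"n": 1}, {"n": 2}, {"n": 3}])

# A replica that makes a different third edit diverges at generation 3.
replica_3 = history([{"n": 1}, {"n": 2}, {"n": 99}])
fork_after = shared_prefix(replica_1, replica_3)
```

Because the parent revision feeds into each hash, two histories that share an ID at some generation also share everything before it, so comparing IDs is enough to locate the point where the "family tree" forks.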
  17. Replication & Synchronization
  18. Replication & Sync
      [Diagram. Before replication]
      /db1/foo: 1-5a4..., 2-ab6..., 3-085..., 4-7ba..., 4-8bf..., 5-d4e...
      /db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57...
  19. Replication & Sync
      [Diagram. After replicating db2 to db1]
      /db1/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
      /db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57...
  20. Replication & Sync
      [Diagram. After replicating in both directions]
      /db1/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
      /db2/foo: 1-5a4..., 2-ab6..., 3-085..., 3-f57..., 4-7ba..., 4-8bf..., 5-d4e...
  21. Replication & Sync
      • Not your RDBMS’ notion of replication
      • Transfers updates from any source DB to any target DB
      • Builds on earlier primitives
      • Leverages sequence index to determine what’s changed
      • Leverages hash histories to determine what’s missing on the target
      • Critical “anti-entropy” element in clusters
      • DBs are divided into partitions, copies of each partition are stored on multiple distinct nodes
      • Partition copies replicate with each other to ensure that documents are durably stored and that consistency is achieved ... eventually
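How replication builds on the two earlier primitives can be sketched as follows. This is a toy model under stated assumptions: `Db` and `replicate` are hypothetical names, the update log keeps one entry per revision (a simplification of the real sequence index), and the revision labels such as `3-b` are invented placeholders rather than the elided IDs from the slides.

```python
from typing import Dict, List, Set, Tuple


class Db:
    """In-memory stand-in for a database: revision sets plus an update log."""

    def __init__(self) -> None:
        self.revs: Dict[str, Set[str]] = {}   # doc id -> known revision IDs
        self.log: List[Tuple[str, str]] = []  # (doc id, rev); position + 1 is the seq

    def put(self, doc_id: str, rev: str) -> bool:
        """Store a revision; return False if it was already present."""
        known = self.revs.setdefault(doc_id, set())
        if rev in known:
            return False
        known.add(rev)
        self.log.append((doc_id, rev))
        return True

    def changes(self, since: int = 0) -> List[Tuple[int, str, str]]:
        """Updates after `since`, as (seq, doc id, rev) in update order."""
        return [
            (seq, doc_id, rev)
            for seq, (doc_id, rev) in enumerate(self.log[since:], start=since + 1)
        ]


def replicate(source: Db, target: Db, checkpoint: int = 0) -> Tuple[int, int]:
    """Copy revisions the target lacks; return (new checkpoint, revisions copied)."""
    copied = 0
    last_seq = checkpoint
    for seq, doc_id, rev in source.changes(since=checkpoint):  # what changed?
        if rev not in target.revs.get(doc_id, set()):          # what is missing?
            target.put(doc_id, rev)
            copied += 1
        last_seq = seq
    return last_seq, copied


db1, db2 = Db(), Db()
for rev in ["1-a", "2-a", "3-a", "4-a", "4-b", "5-a"]:
    db1.put("foo", rev)
for rev in ["1-a", "2-a", "3-a", "3-b"]:
    db2.put("foo", rev)

checkpoint_1_to_2, copied_1_to_2 = replicate(db1, db2)
checkpoint_2_to_1, copied_2_to_1 = replicate(db2, db1)
# Resuming from the stored checkpoint only inspects updates made since then.
_, copied_again = replicate(db1, db2, checkpoint=checkpoint_1_to_2)
```

Running the same replication repeatedly is harmless, since revisions the target already holds are skipped. That idempotence is what lets partition copies use it as an anti-entropy pass.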
  22. Why CouchDB Recap
      • Durable append-only storage engine
      • Sequence tree enabling incremental processing of updates
      • Data structures supporting eventual consistency
      • Sophisticated replication & synchronization
  23. What’s Next?
      • BigCouch ➜ CouchDB
      • Cloudant will continue development under ASF umbrella
      • Fewer code forks ➜ better velocity
      • New CouchDB web UI “Fauxton”
      • Better developer tooling for server-side code
      • Plugins for Cloudant-specific functionality
      • Cloudant is betting on data “at the edge”
  24. Thank You
      adam@cloudant.com
      @kocolosk
