CouchDB is an open-source document-oriented NoSQL database that stores data in JSON format. It provides ACID support through multi-version concurrency control and a crash-only design that ensures data integrity even if the database or servers crash. CouchDB supports single node or clustered deployments and uses bidirectional replication to synchronize data across nodes. It prioritizes availability and partition tolerance according to the CAP theorem.
Apache CouchDB
1. What Is Apache CouchDB?
● A document-oriented NoSQL database
● Open source / Apache 2.0 license
● Written in Erlang, JavaScript, C, C++
● Stores documents using JSON
● Single node or cluster
● Takes an offline-first approach / uses bidirectional replication
● DB access via HTTP requests
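Because all access goes over HTTP, any HTTP library can act as a client. The sketch below builds (but does not send) the requests that create a database and store a JSON document. It assumes a local node on the default port 5984; the database name, document ID and helper names are made up for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local node, default port


def create_db_request(db):
    # PUT /{db} creates a database
    return urllib.request.Request(f"{BASE_URL}/{db}", method="PUT")


def put_doc_request(db, doc_id, doc):
    # PUT /{db}/{docid} stores a JSON document under the given ID
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = put_doc_request("demo", "doc1", {"title": "hello"})
print(req.get_method(), req.full_url)
# Sending it needs a running server and valid credentials:
# urllib.request.urlopen(req)
```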
2. How Does CouchDB Work?
● It provides ACID support (Atomicity, Consistency, Isolation, Durability)
● It has a crash-only design
– No shutdown, just termination
● CouchDB uses Multi-Version Concurrency Control (MVCC)
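● After an OS crash or power failure
– If it happens while data is being flushed, the partially flushed updates are simply forgotten on restart
– If it happens while the header is being committed, a surviving copy of the previous identical headers remains
– Either way, the coherency of all previously committed data is ensured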
● Crash friendly design
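The MVCC idea can be illustrated with a toy in-memory store: every update must name the revision it was based on, and an update based on a stale revision is rejected instead of overwriting newer data. This is only a sketch of the principle, not CouchDB's implementation. The class and method names are hypothetical, and revisions are plain integers here, whereas real CouchDB revisions are strings and a stale update is answered with HTTP 409 Conflict.

```python
class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class ToyStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (revision number, body)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current = self.docs.get(doc_id)
        current_rev = current[0] if current else None
        # The caller must present the revision it read; otherwise reject.
        if rev != current_rev:
            raise Conflict(f"expected rev {current_rev}, got {rev}")
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = (new_rev, body)
        return new_rev


store = ToyStore()
rev1 = store.put("doc1", {"count": 1})            # create, no rev needed
rev2 = store.put("doc1", {"count": 2}, rev=rev1)  # update based on rev 1
try:
    store.put("doc1", {"count": 99}, rev=rev1)    # stale rev is rejected
    outcome = "accepted"
except Conflict:
    outcome = "conflict"
print(rev1, rev2, outcome)  # prints: 1 2 conflict
```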
3. Cross Platform
● Available for
– Linux / Unix
– FreeBSD
– Windows
– macOS
– Cloud
– Mobile (iOS / Android – Lite version)
● Install from binary or source
● Install via Docker / Snap
● Install on Kubernetes
4. CouchDB Replication
● Synchronises two copies of the same database
● One source and one target database
● Can be on same or different CouchDB instances
● Can be one-way or bidirectional (master–master)
● Controlling which documents are replicated
– Local documents are never replicated
– Filter functions select documents
– Selector objects
● A query object used to test each document for replication
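A replication is described by a small JSON object naming the source, the target and, optionally, a selector. The sketch below builds such objects in Python; a bidirectional setup is simply two one-way replications in opposite directions. The database URLs, the "status" field and the helper name are assumptions for illustration. The resulting body would be POSTed to the /_replicate endpoint (or saved as a document in the _replicator database).

```python
import json


def replication_body(source, target, selector=None, continuous=False):
    # Minimal replication description: where to read from and write to.
    body = {"source": source, "target": target}
    if selector is not None:
        body["selector"] = selector  # only matching documents are copied
    if continuous:
        body["continuous"] = True    # keep replicating as changes arrive
    return body


a = "http://localhost:5984/orders"       # hypothetical source database
b = "http://localhost:5984/orders_copy"  # hypothetical target database
only_open = {"status": {"$eq": "open"}}  # hypothetical selector

# Bidirectional replication = one replication in each direction.
a_to_b = replication_body(a, b, selector=only_open)
b_to_a = replication_body(b, a, selector=only_open)
print(json.dumps(a_to_b, indent=2))
```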
5. CouchDB Cluster
● CouchDB can be single node or clustered
● Cluster defined by
– Number of shards or parts of database (q)
– Number of document copies / replicas (n)
● Since version 3.0 the defaults are q=2, n=3
– Each database (and secondary index) is split into 2 shards
– Each shard has 3 replicas
– For a total of 6 shard replica files
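The arithmetic behind these figures is just q multiplied by n. The sketch below computes it and also shows how q and n could be passed as query parameters when a database is created with PUT /{db}. The base URL, database name and function names are illustrative assumptions.

```python
from urllib.parse import urlencode


def shard_replica_files(q, n):
    # q shards, each stored n times
    return q * n


def create_db_url(base_url, db, q=2, n=3):
    # URL for PUT /{db}?q=..&n=.. (request is built here, not sent)
    return f"{base_url}/{db}?{urlencode({'q': q, 'n': n})}"


print(shard_replica_files(2, 3))                       # prints: 6
print(create_db_url("http://localhost:5984", "demo"))
# prints: http://localhost:5984/demo?q=2&n=3
```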
6. CouchDB Cluster
● Replicas add failure resistance
● Some nodes can be offline without everything crashing down
– n=1: all nodes must be up
– n=2: any 1 node can be down
– n=3: any 2 nodes can be down
● Using default values and a single database
– q x n = 2 x 3 = 6 shard replica files
– With one file per node, the database can spread over a maximum of six nodes
– This defines the maximum number of nodes for horizontal scaling
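The failure-tolerance figures can be checked with a small simulation. The sketch below places q x n shard replicas on nodes and tests whether every shard keeps at least one live replica for every possible set of failed nodes. The round-robin placement is a hypothetical stand-in, not CouchDB's actual placement logic, and the check ignores read/write quorums.

```python
from itertools import combinations


def place_replicas(q, n, node_count):
    # Hypothetical round-robin placement:
    # replica r of shard s goes to node (s*n + r) % node_count.
    if node_count < n:
        raise ValueError("need at least n nodes for distinct replicas")
    return {s: {(s * n + r) % node_count for r in range(n)} for s in range(q)}


def survives(placement, node_count, down):
    # True if, for every set of `down` failed nodes,
    # each shard still has at least one replica on a live node.
    for failed in combinations(range(node_count), down):
        failed = set(failed)
        if any(nodes <= failed for nodes in placement.values()):
            return False
    return True


placement = place_replicas(q=2, n=3, node_count=6)  # one replica file per node
print(survives(placement, 6, down=2))  # prints: True
print(survives(placement, 6, down=3))  # prints: False
```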
9. CouchDB + CAP Theorem
● The CAP theorem examines
– Consistency
● All database clients see the same data, even with concurrent updates.
– Availability
● All database clients are able to access some version of the data.
– Partition tolerance
● The database can be split over multiple servers
● CouchDB provides eventual consistency
– By favouring availability and partition tolerance over strict consistency
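Eventual consistency can be pictured with two replicas that both accept writes while they cannot reach each other, and agree again once they synchronise. The toy model below is an illustration only: replicas are plain dictionaries, and the winner rule (the higher (revision, value) tuple wins) is a simplification chosen for this sketch, not CouchDB's conflict-handling algorithm.

```python
def sync(a, b):
    """Toy bidirectional sync: both replicas adopt the same winner per document."""
    for doc_id in set(a) | set(b):
        candidates = [replica[doc_id] for replica in (a, b) if doc_id in replica]
        winner = max(candidates)  # simplified rule: higher (rev, value) wins
        a[doc_id] = b[doc_id] = winner


# Both replicas start in sync, then accept writes independently
# while a network partition separates them (availability is preserved).
node_a = {"doc1": (1, "draft")}
node_b = {"doc1": (1, "draft")}
node_a["doc1"] = (2, "edited on A")
node_b["doc1"] = (2, "edited on B")

print(node_a == node_b)  # prints: False (clients see different versions for now)
sync(node_a, node_b)     # the partition heals and replication runs
print(node_a == node_b)  # prints: True (the replicas have converged)
```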
11. Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
12. Connect
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration