CAP Theorem

Vikash Kodati
10/19/2016

CAP Theorem (also Brewer's theorem)
It is impossible for a distributed system to simultaneously provide all three of the
following guarantees (pick any two):
1. Consistency: all nodes see the same data at the same time; equivalently, every read
returns the latest value written by any client.
2. Availability: every request receives a response. The system accepts operations at all
times, and operations return quickly.
3. Partition tolerance: the system continues to operate despite arbitrary
partitioning caused by network failures.

Why is Availability Important?
Availability = reads and writes complete reliably and quickly.
• Data shows that a 500 ms increase in latency for operations at Amazon.com or at
Google.com can cause a 20% drop in revenue.
• At Amazon, each added millisecond of latency implies a $6M yearly loss.
• SLAs written by providers deal predominantly with the latencies that clients experience.

Why is Consistency Important?
Consistency = all nodes see the same data at any given time, or reads return the latest
value written by any client.
• When you access your bank or investment account from multiple clients, you want
updates made from one client to be visible to the others.
• When thousands of customers are trying to book a flight, all updates from any client
should be visible to the other clients.

Why is Partition Tolerance Important?
Partitions can happen across datacenters when the network gets disconnected:
• Internet router outages
• Undersea cables cut
• DNS not working
Partitions can also occur within a datacenter, e.g., a rack switch outage.
Even then, we want the system to continue functioning normally.

DESIGN FOR FAILURE
Typical first year for a new cluster:
~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packet loss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external VIPs for a couple of minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for DNS
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.
Note: Data taken from Jeff Dean’s slides (Google)

CAP Theorem Fallout
• Since partitions are inevitable in today's cloud computing systems, partition tolerance
is essential, and the CAP theorem implies that a system has to choose between
consistency and availability when a partition occurs.
• Cassandra (AP system, sacrifices consistency)
• Eventual (weak) consistency, availability, and partition tolerance.
• Traditional RDBMSs (CA system; partitions can't happen on a single node)
• Strong consistency over availability under a partition.
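
The choice between consistency and availability during a partition can be sketched with a toy two-replica store. This is a minimal illustration, not how Cassandra or any RDBMS is implemented: the class TwoReplicaStore, the exception Unavailable, and the "CP"/"AP" mode flag are all hypothetical names, and the CP mode is simplified so that both sides refuse requests while partitioned.

```python
class Unavailable(Exception):
    """Raised when the CP store refuses a request during a partition."""


class TwoReplicaStore:
    """Toy store (hypothetical) holding one value on two replicas, "a" and "b"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False  # True means replicas a and b cannot talk

    def write(self, via, value):
        if not self.partitioned:
            # Healthy network: replicate to both replicas before returning.
            for name in self.values:
                self.values[name] = value
            return
        if self.mode == "CP":
            # Consistency first: refuse rather than let replicas diverge.
            raise Unavailable("peer unreachable, write refused")
        # Availability first: accept locally; the other replica is now stale.
        self.values[via] = value

    def read(self, via):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("peer unreachable, read refused")
        return self.values[via]


if __name__ == "__main__":
    ap = TwoReplicaStore("AP")
    ap.write("a", "price=10")
    ap.partitioned = True
    ap.write("a", "price=12")
    print(ap.read("a"), ap.read("b"))  # price=12 price=10 (replica b is stale)

    cp = TwoReplicaStore("CP")
    cp.write("a", "price=10")
    cp.partitioned = True
    try:
        cp.write("a", "price=12")
    except Unavailable as exc:
        print("CP store is down for writes:", exc)
```

The AP store keeps answering but serves a stale price from replica b; the CP store never shows a wrong price but stops serving until the partition heals.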

NOSQL Landscape

Eventual Consistency
• If all writes (to a key) stop, then all its values (replicas) will converge eventually.
• If writes continue, then the system keeps trying to converge.
• A moving "wave" of updated values lags behind the latest values sent by
clients, but is always trying to catch up.
• This works well when there are a few periods of low writes: the system converges quickly.
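
Convergence after writes stop can be sketched as follows. This is a simplified model under stated assumptions: each replica holds a (timestamp, value) pair, conflicts are resolved by last-write-wins, and replicas exchange state with random peers. The function names merge, gossip_round, and rounds_to_converge are made up for this sketch and do not describe Cassandra's actual repair mechanisms.

```python
import random


def merge(local, remote):
    """Last-write-wins: keep the (timestamp, value) pair with the larger timestamp."""
    return max(local, remote)


def gossip_round(replicas, rng):
    """Each replica syncs with one randomly chosen peer; both keep the newer pair."""
    for i in range(len(replicas)):
        j = rng.randrange(len(replicas))
        newest = merge(replicas[i], replicas[j])
        replicas[i] = newest
        replicas[j] = newest


def rounds_to_converge(replicas, rng, max_rounds=1000):
    """Run gossip rounds until all replicas agree; return how many rounds it took."""
    rounds = 0
    while len(set(replicas)) > 1 and rounds < max_rounds:
        gossip_round(replicas, rng)
        rounds += 1
    return rounds


if __name__ == "__main__":
    rng = random.Random(42)
    replicas = [(0, "old")] * 5   # five replicas, all in agreement
    replicas[0] = (1, "new")      # one client write lands on replica 0
    print("stale replicas before:", sum(r != (1, "new") for r in replicas))
    print("rounds needed:", rounds_to_converge(replicas, rng))
    print("replicas after:", replicas)
```

Because a merge never discards the newest pair, the set of up-to-date replicas only grows, which is why the replicas end up agreeing once writes stop.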

RDBMS vs. Key-Value Stores
• RDBMSs provide ACID:
• Atomicity
• Consistency
• Isolation
• Durability
• Key-value stores like Cassandra provide BASE:
• Basically Available, Soft state, Eventual consistency
• Prefer availability over consistency

CONCLUSION
• Business vs. engineering decisions
• Would you rather be down or show wrong prices/inventory?
• Would you rather be slow or show wrong prices/inventory?
• New paradigm
• Industries are trying to run on availability
• Model systems to mimic the laws of physics
• Once done, can't revert. Why? Because we cannot go back in time.
• Once done, we can correct (if needed). This is eventual consistency.
• Telecommunication industry example
• Commissions team: pay advances now, charge back later
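
The "can't revert, but can correct" idea in the commissions example can be sketched as an append-only ledger in which a chargeback is a new compensating entry, not an edit to the original advance. The names CommissionLedger, Entry, pay_advance, and charge_back are hypothetical and only illustrate the pattern; they do not describe any real commissions system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    """One immutable ledger record; past entries are never edited or deleted."""
    rep: str
    amount: Decimal
    reason: str


class CommissionLedger:
    """Append-only ledger (hypothetical): corrections are new entries."""

    def __init__(self):
        self.entries = []

    def pay_advance(self, rep, amount):
        # Pay immediately instead of waiting for every check to finish.
        self.entries.append(Entry(rep, Decimal(amount), "advance"))

    def charge_back(self, rep, amount):
        # Correct later with a compensating negative entry.
        self.entries.append(Entry(rep, -Decimal(amount), "chargeback"))

    def balance(self, rep):
        return sum((e.amount for e in self.entries if e.rep == rep), Decimal("0"))


if __name__ == "__main__":
    ledger = CommissionLedger()
    ledger.pay_advance("rep-1", "100.00")
    ledger.charge_back("rep-1", "40.00")
    print(ledger.balance("rep-1"))  # 60.00
    print(len(ledger.entries))      # 2: the advance is still on record
```

The advance stays in the history even after the correction, so the ledger records what actually happened while the balance eventually reflects the right amount.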

THANK YOU & QA
Vikash Kodati
• Email: Vikash.Kodati@t-mobile.com
• Yammer: https://www.yammer.com/t-mobile.com/users/vikashkodati
• Github: https://github.com/vikashkodati
• LinkedIn: /in/vikashkodati
• Twitter: @vikashkodati
• Blog: https://tmobileusa.sharepoint.com/portals/hub/personal/vikashkodati


