Göteborg Distributed: Eventual Consistency in Apache Cassandra
A brief introduction to Cassandra and an overview of eventual consistency in Cassandra.

Transcript

  • 1. Eventual Consistency in Apache Cassandra. Jeremy Hanna, Support Engineer. ©2013 DataStax Confidential. Do not distribute without consent.
  • 2. Cassandra Design
      • Massive scalability
      • High performance
      • Reliability/availability
      • Ease of use
  • 3. Developer Friendly
      • CQL3
      • Collections (list, map, set)
      • User-defined types (2.1)
      • Cassandra native drivers
      • Native paging
      • Tracing
      • DataStax DevCenter tool
      • Atomic batches
      • Lightweight transactions
      • Triggers
  • 4. CQL3 examples

    CREATE KEYSPACE shire WITH REPLICATION =
      {'class': 'NetworkTopologyStrategy', 'eu': 3, 'us-east': 2};

    SELECT * FROM emp WHERE empID IN (130, 104) ORDER BY deptID DESC;

    INSERT INTO excelsior.clicks (userid, url, date, name)
    VALUES (
      3715e600-2eb0-11e2-81c1-0800200c9a66,
      'http://cassandra.apache.org',
      '2013-10-09', 'Mary')
    USING TTL 86400;

    UPDATE users SET email = 'charlie@wonka.com'
      WHERE login = 'cbucket64'
      IF email = 'cbucket@wonka.com';

    CREATE USER bombadil WITH PASSWORD 'goldberry4ever' SUPERUSER;

    GRANT ALTER ON KEYSPACE shire TO gandalf;
  • 5. Ops Friendly
      • Simple design: no special roles, no single point of failure
      • Lots of exposed metrics via JMX
      • Nodes and entire datacenters can go down with no loss of service
      • Rapid read protection
      • DataStax OpsCenter
          • Visual monitoring tool
          • REST interface to metric data
          • Free version
          • Hands-off services
  • 6. Some C* Users
  • 7. Cassandra Design
      • Massive scalability
          • Multi-datacenter
      • High performance
      • Reliability/availability
          • No SPOF, no special roles
      • Ease of use
  • 8. Fully Distributed
      • Distributed systems introduce complex problems
      • What is “down”?
          • Individual server is down
          • Network link is down
          • Long server pause (e.g. a GC pause)
          • Variable network latency
      • What do I do when a server is overloaded?
      • How can I stay available/reliable in such circumstances?
      • How can I maintain consistency?
      • How do I reconcile differences?
  • 9. CAP Theorem
      • Select two of: Consistency, Availability, Partition tolerance
  • 10. Eventual Consistency
      • Individual server durability
          • Write to the commitlog (batch or periodic sync)
          • Write to the memtable (which gets flushed to disk)
      • Achieving a consistency level
          • ONE, QUORUM, ALL
          • LOCAL_ONE, LOCAL_QUORUM
          • ANY, EACH_QUORUM (for writes)
      • Important to note: all replicas always get a copy of the write
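The interplay between consistency levels and replication factor can be sketched as follows. This is an illustrative helper, not Cassandra code: the names `quorum`, `level_to_replicas`, and `overlaps` are hypothetical. The underlying rule is real, though: a quorum is floor(RF/2) + 1 replicas, and reads are guaranteed to see the latest write when the read and write replica counts overlap, i.e. R + W > RF.

```python
def quorum(rf: int) -> int:
    """Replicas required for a QUORUM operation: floor(rf / 2) + 1."""
    return rf // 2 + 1

def level_to_replicas(level: str, rf: int) -> int:
    """Map a (simplified, single-DC) consistency level to a replica count."""
    return {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}[level]

def overlaps(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set intersects every write set (R + W > RF),
    so a read is guaranteed to include at least one up-to-date replica."""
    w = level_to_replicas(write_level, rf)
    r = level_to_replicas(read_level, rf)
    return r + w > rf

print(quorum(3))                        # 2
print(overlaps("QUORUM", "QUORUM", 3))  # True
print(overlaps("ONE", "ONE", 3))        # False
```

This is why QUORUM writes followed by QUORUM reads behave "strongly" at RF=3 (2 + 2 > 3), while ONE/ONE does not (1 + 1 < 3).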
  • 11. Stuff Happens
      • Overloaded node
      • “Down” node(s)
      • Network partition
      • Datacenter down
      • Outcome: inconsistency among replicas
  • 12. Continually Cleaning
      • Hinted handoff
          • Valid for a window of time
          • Replays back to a node restored to service
      • Read repair
          • After a read, check replicas for agreement on the data (via digests)
          • read_repair_chance defaults to 0.1
          • Also dclocal_read_repair_chance
      • Anti-entropy service (manual repair)
          • Checks for agreement on all data in range A-B
          • Run manual repair every gc_grace_seconds
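The reconciliation step behind read repair can be sketched in a few lines. This is a toy model, not Cassandra's implementation: `reconcile` and the tuple representation are hypothetical. It shows the last-write-wins rule Cassandra applies per cell: the coordinator keeps the value with the highest write timestamp and pushes it back to the stale replicas.

```python
def reconcile(replica_values):
    """Pick the winning (timestamp, value) pair across replicas:
    highest write timestamp wins (last-write-wins)."""
    return max(replica_values, key=lambda tv: tv[0])

# Three replicas disagree after a partial write:
replicas = [(100, "old@wonka.com"),
            (250, "charlie@wonka.com"),
            (100, "old@wonka.com")]

winner = reconcile(replicas)
print(winner[1])  # charlie@wonka.com

# Replicas holding anything other than the winner get repaired:
stale = [i for i, tv in enumerate(replicas) if tv != winner]
print(stale)  # [0, 2]
```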
  • 13. Advanced Repair
      • Manual repairs have limited resolution
          • “There is something different in these 1000 rows”
          • Therefore you have to stream all 1000 rows
          • Leads to overstreaming and waste
      • You can specify start/end keys
          • Gives row-level precision
          • More complicated to execute
      • DataStax has a repair service to help
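The overstreaming problem above can be made concrete with a small sketch (again illustrative, not Cassandra code; `range_digest` is a hypothetical helper). Replicas compare one digest per key range, so a single differing row makes the digests mismatch and forces the entire range to stream:

```python
import hashlib

def range_digest(rows):
    """One digest over an entire key range of {key: value} rows."""
    h = hashlib.sha256()
    for key in sorted(rows):
        h.update(f"{key}={rows[key]}".encode())
    return h.hexdigest()

a = {f"row{i}": "v1" for i in range(1000)}
b = dict(a)
b["row42"] = "v2"  # exactly one row differs between the replicas

# The digests disagree, but they don't say *which* row differs,
# so all 1000 rows in the range get streamed:
print(range_digest(a) == range_digest(b))  # False
```

Narrowing the start/end keys of a repair shrinks each range, which is how row-level precision reduces the streamed data at the cost of more repair operations.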
  • 14. Safely Consistent?
      • Are (LOCAL_)QUORUM reads and writes enough to be safe?
      • Ultimately depends on your requirements
      • Theoretical versus empirical
  • 15. Netflix Study
      • Two datacenters (US-East and US-West)
      • Wrote 500,000 records in each datacenter
      • 50k write operations per second in each DC
      • Wrote at consistency level ONE
      • All data read back correctly in the other DC
      • Tried 5 different runs, introducing failures along the way
      • See planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/
  • 16. Practical Consistency
      • ONE is not suitable for all cases
      • Review your requirements and SLA
      • Do your own testing to get comfortable
      • This flexibility translates into the best performance for your use case
  • 17. Questions?
