How can Apache Cassandra be run on the IaaS of a distributed organization? Which challenges can Cassandra solve?
One of the tools for measuring replication latency is available here:
https://github.com/gitaroktato/cassandra-replication-latency-tools
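As a rough illustration of what "replication latency" means here, the sketch below computes per-key lag from two sets of timestamps: when a key was written in one region and when it first became readable in another. This is not the linked tool's implementation. The function names and the millisecond timestamps are hypothetical, and comparing timestamps taken on different hosts assumes their clocks are synchronized.

```python
import math


def replication_lags_ms(write_ts_ms, visible_ts_ms):
    """Per-key lag between the write in region A and first visibility in region B.

    Keys that never became visible in the remote region are skipped.
    """
    return [visible_ts_ms[k] - t for k, t in write_ts_ms.items() if k in visible_ts_ms]


def percentile(samples, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical measurements (milliseconds since the start of the run).
writes = {"k1": 0, "k2": 10, "k3": 20, "k4": 30}
visible = {"k1": 230, "k2": 235, "k3": 260, "k4": 250}

lags = replication_lags_ms(writes, visible)
print(percentile(lags, 50))  # 225
print(percentile(lags, 99))  # 240
```

Reporting a high percentile rather than the mean matters for the same reason as the tail-latency notes below: a few slow replications dominate what users experience.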
3. ABOUT ME
Oresztész Margaritisz
• Java CC member since 2015
• Distributed / Cloud Computing
• NoSQL
• Agile
• DevOps
@gitaroktato gitaroktato https://www.linkedin.com/in/oreszteszgitaroktato
5. TYPICAL ISSUES WITH RDBMS
• EPAM needs global delivery of services
• 25 countries
• 4 continents
• 19,600 employees
• Data storage with traditional RDBMS can be cumbersome
• Configuration issues
• Migrating data between locations can be hard
• A master-slave configuration within a local site comes with a performance tradeoff
24. CAPACITY PLANNING
• Replication latency between regions
• Transactions per second for the whole cluster
• 3 MEDIUM instances in EPAM-BY1
• 3 MEDIUM instances in AWS-AP-NORTHEAST
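Replication latency between regions only hurts requests whose consistency level has to wait for a remote replica. The sketch below counts the replica acknowledgements Cassandra requires per consistency level (QUORUM = floor(sum of RFs / 2) + 1, LOCAL_QUORUM = floor(local RF / 2) + 1). It assumes NetworkTopologyStrategy with a replication factor of 3 in each region; the slide only states 3 nodes per region, so that RF is hypothetical.

```python
# ASSUMPTION: RF = 3 in each data center (hypothetical; slide lists 3 nodes per region).
replication = {"EPAM-BY1": 3, "AWS-AP-NORTHEAST": 3}


def quorum(rf: int) -> int:
    """Majority of rf replicas."""
    return rf // 2 + 1


def required_acks(level: str, local_dc: str) -> int:
    """Replica acknowledgements the coordinator waits for at a given consistency level."""
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        return quorum(replication[local_dc])
    if level == "QUORUM":
        return quorum(sum(replication.values()))
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in replication.values())
    if level == "ALL":
        return sum(replication.values())
    raise ValueError(f"unsupported level: {level}")


def must_cross_region(level: str, local_dc: str) -> bool:
    """True when more acks are needed than the local region has replicas."""
    return required_acks(level, local_dc) > replication[local_dc]


for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, "EPAM-BY1"), must_cross_region(level, "EPAM-BY1"))
```

Under these assumptions LOCAL_QUORUM reads and writes (2 + 2 > 3) stay consistent within a region without paying the inter-region round trip, while QUORUM (4 of 6) always needs at least one remote acknowledgement.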
34. References pt. 2
Cassandra’s Rapid Read Protection
http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2
Editor's Notes
1% chance of a response taking 1s
99% chance to respond in
For 100 requests, the chance that at least one response takes 1s is 1 - 0.99^100 ≈ 63%
Measured on client side
The top 25 most popular web pages were downloaded in the simulation
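The 63% figure follows from treating each request as an independent trial: if one request is slow with probability 1%, the probability that none of 100 requests is slow is 0.99^100, so at least one slow response happens with the complement. A minimal sketch, assuming independence between requests:

```python
def p_at_least_one_slow(p_slow: float, n_requests: int) -> float:
    """Probability that at least one of n independent requests is slow."""
    return 1 - (1 - p_slow) ** n_requests


print(f"{p_at_least_one_slow(0.01, 100):.0%}")  # 63%
```

The same formula shows why tail latency, not median latency, dominates pages that fan out into many requests.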
Typical issues…
Writes don't scale
Synchronous master-master replication over a link with 220ms latency is not feasible
When the master stops, the best case is being stuck in read-only mode
Replication drifts are hard to fix
Key collisions
You need a partitioning strategy
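To see why synchronous master-master replication across regions breaks down, consider a client issuing dependent writes one after another. The sketch assumes the 220ms figure is the round-trip time between the sites and that each commit must wait for at least one full round trip before it is acknowledged; real databases add further overhead on top of this lower bound.

```python
def max_sequential_commits_per_s(rtt_ms: float) -> float:
    """Upper bound on back-to-back synchronous commits for a single client.

    ASSUMPTION: each commit waits for one full round trip to the remote master.
    """
    return 1000.0 / rtt_ms


print(round(max_sequential_commits_per_s(220), 1))  # 4.5
```

Fewer than five sequential writes per second per client is the ceiling before any disk, locking, or conflict-resolution cost is counted.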
Any coordinator, any writer, any reader
Client connection distribution
Built-in multi-region deployment
DHT
Gossip
Tunable consistency
ACKs
Rapid Read Protection
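The "any coordinator" and DHT notes rest on every node being able to compute, locally, which nodes own a key. The sketch below is a simplified token ring: an MD5-based token similar in spirit to Cassandra's legacy RandomPartitioner (the default today is Murmur3Partitioner), one token per node derived from a hypothetical node name (no vnodes), and replicas taken as the next `rf` nodes clockwise, roughly like SimpleStrategy. It ignores racks and data centers, so it is not how NetworkTopologyStrategy places replicas.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """MD5-based token, similar in spirit to Cassandra's legacy RandomPartitioner."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class TokenRing:
    def __init__(self, nodes):
        # One token per node, derived from its (hypothetical) name; sorted clockwise.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int):
        """First rf nodes at or after the key's token, wrapping around the ring."""
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing([f"node{i}" for i in range(1, 7)])
print(ring.replicas("user:42", 3))
```

Because the placement depends only on the key and the ring membership, any node that receives a request can act as coordinator and forward it to the right replicas; gossip is what keeps each node's view of the membership current.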
We don't have answers to all your questions: we need you