Running Cassandra on Apache Mesos
across multiple datacenters at Uber
Abhishek Verma (verma@uber.com)
About me
● MS (2010) and PhD (2012) in Computer Science from University of Illinois
at Urbana-Champaign
● 2 years at Google, worked on Borg and Omega and first author of the
Borg paper
● ~ 1 year at TCS Research, Mumbai
● Currently at Uber working on running Cassandra on Mesos
© DataStax, All Rights Reserved. 2
“Transportation as reliable as running water,
everywhere, for everyone”
99.99%
efficient
Cluster Management @ Uber
● Statically partitioned machines across different services
● Move from custom deployment system to everything running on Mesos
● Gain efficiency by increasing machine utilization
○ Co-locate services on the same machine
○ Can lead to 30% fewer machines [1]
● Build stateful service frameworks to run on Mesos
[1] “Large-scale cluster management at Google with Borg”, EuroSys 2015
Apache Mesos
● Mesos abstracts CPU, memory, storage away from machines
○ program like it’s a single pool of resources
● Linear scalability
● High availability
● Native support for launching containers
● Pluggable resource isolation
● Two level scheduling
Apache Cassandra
● Horizontal scalability
○ Scales reads and writes linearly as new nodes are added
● High availability
○ Fault tolerant with tunable consistency levels
● Low latency, solid performance
● Operational simplicity
○ Homogeneous cluster, no SPOF
● Rich data model
DC/OS Cassandra Service
https://github.com/mesosphere/dcos-cassandra-service
Uber
● Abhishek Verma
● Karthik Gandhi
● Matthias Eichstaedt
● Varun Gupta
● Zhitao Li
Mesosphere
● Chris Lambert
● Gabriel Hartmann
● Keith Chambers
● Kenneth Owens
● Mohit Soni
Cassandra service architecture
[Architecture diagram. Labeled components: deployment system, Aurora (DC1) and Aurora (DC2); the dcos-cassandra-service framework with its web interface and control plane API; Mesos master (leader) and Mesos master (standby); a ZooKeeper quorum of three ZK nodes; three Mesos agents running C* nodes 1a, 2a, 1b, 2b and 1c, which belong to C* Cluster 1 and C* Cluster 2; a client app that uses the CQL interface to talk to the nodes.]
Cassandra Mesos primitives
● Mesos containerizer
● Override 5 ports in configuration (storage_port,
ssl_storage_port, native_transport_port, rpc_port, jmx_port)
● Use persistent volumes
○ Data stored outside of the sandbox directory
○ Offered to the same task if it crashes and restarts
● Use dynamic reservation
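A minimal sketch of the port override step, assuming the scheduler receives a set of free ports from a Mesos offer and hands one to each of the five settings named above. The function name assign_ports and the example port range are hypothetical; the service's actual assignment logic is not shown on the slide.

```python
# Settings named on the slide. The first four are cassandra.yaml options;
# jmx_port is normally set through JMX_PORT in cassandra-env.sh.
PORT_SETTINGS = [
    "storage_port",
    "ssl_storage_port",
    "native_transport_port",
    "rpc_port",
    "jmx_port",
]


def assign_ports(offered_ports):
    """Map each of the five port settings to a distinct offered port.

    offered_ports: iterable of port numbers (hypothetical input; a real
    scheduler would extract them from the "ports" ranges of an offer).
    """
    ports = sorted(set(offered_ports))
    if len(ports) < len(PORT_SETTINGS):
        raise ValueError("offer does not contain enough ports")
    return dict(zip(PORT_SETTINGS, ports))


overrides = assign_ports(range(31000, 31010))
for setting, port in overrides.items():
    print(f"{setting}: {port}")
```

Assigning ports per task rather than using the Cassandra defaults is what would let more than one Cassandra node share a machine without port conflicts.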
Custom seed provider
Each node asks the scheduler for its seeds at http://scheduler/seeds:
● Node 1 (10.0.0.1): { isSeed: true, seeds: [ ] }
● Node 2 (10.0.0.2): { isSeed: true, seeds: [ 10.0.0.1 ] }
● Node 3 (10.0.0.3): { isSeed: false, seeds: [ 10.0.0.1, 10.0.0.2 ] }
Number of Nodes = 3
Number of Seeds = 2
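The three replies in this example can be reproduced with a few lines of logic. This is a sketch under the assumption that the first nodes to start become seeds until the configured seed count is reached; seeds_response and max_seeds are hypothetical names, not the service's API.

```python
def seeds_response(existing_nodes, max_seeds=2):
    """Build the reply for the next node that asks the scheduler for seeds.

    existing_nodes: IPs of nodes already started, in start order.
    max_seeds: number of seeds to designate (2 in the slide's example).
    """
    return {
        # A node becomes a seed while fewer than max_seeds nodes exist.
        "isSeed": len(existing_nodes) < max_seeds,
        # Seeds are the earliest-started nodes, up to max_seeds of them.
        "seeds": list(existing_nodes[:max_seeds]),
    }


nodes = []
replies = {}
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    replies[ip] = seeds_response(nodes)
    nodes.append(ip)
    print(ip, replies[ip])
```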
Cassandra Service: Features
● Custom seed provider
● Increasing cluster size
● Changing Cassandra configuration
● Replacing a dead node
● Backup/Restore
● Cleanup
● Repair
Plan, Phases and Blocks
● Plan
○ Phases
■ Reconciliation
■ Deployment
■ Backup
■ Restore
■ Cleanup
■ Repair
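One way to picture the plan/phase/block hierarchy is as nested data. The sketch below assumes that phases run in order and that each block is one unit of work, such as a single node; the class names, block names and the serial ordering are illustrative assumptions rather than the framework's actual types.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    name: str
    complete: bool = False


@dataclass
class Phase:
    name: str
    blocks: List[Block] = field(default_factory=list)

    def is_complete(self) -> bool:
        return all(block.complete for block in self.blocks)


@dataclass
class Plan:
    phases: List[Phase] = field(default_factory=list)

    def next_block(self) -> Optional[Block]:
        """Return the first incomplete block, walking phases in order."""
        for phase in self.phases:
            for block in phase.blocks:
                if not block.complete:
                    return block
        return None


plan = Plan([
    Phase("Reconciliation", [Block("reconcile")]),
    Phase("Deployment", [Block("node-0"), Block("node-1"), Block("node-2")]),
])
plan.phases[0].blocks[0].complete = True
print(plan.next_block().name)  # node-0
```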
Spinning up a new Cassandra cluster
https://www.youtube.com/watch?v=gbYmjtDKSzs
Automate Cassandra operations
● Repair
○ Synchronize all data across replicas
■ Last write wins
○ Anti-entropy mechanism
○ Repair primary key range node-by-node
● Cleanup
○ Remove data whose ownership has changed
■ Because of addition or removal of nodes
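The "last write wins" rule used during repair can be illustrated with a small merge function. This is a simplified model that assumes each replica stores a (write timestamp, value) pair per column; it ignores tombstones and timestamp ties, which a real implementation has to handle.

```python
def merge_replicas(replicas):
    """Reconcile per-replica copies of a row with last-write-wins.

    replicas: list of dicts mapping column name -> (write_timestamp, value).
    Returns one dict holding, per column, the pair with the highest timestamp.
    On equal timestamps the first copy seen is kept (a simplification).
    """
    merged = {}
    for replica in replicas:
        for column, (timestamp, value) in replica.items():
            if column not in merged or timestamp > merged[column][0]:
                merged[column] = (timestamp, value)
    return merged


# Hypothetical row contents, for illustration only.
replica_a = {"status": (100, "requested"), "fare": (105, 12.5)}
replica_b = {"status": (110, "completed")}
merged = merge_replicas([replica_a, replica_b])
print(merged)
```

After such a merge, every replica would be brought to the merged state, which is the sense in which repair synchronizes data across replicas.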
Cleanup operation
https://www.youtube.com/watch?v=VxRLSl8MpYI
Failure scenarios
● Executor failure
○ Restarted automatically
● Cassandra daemon failure
○ Restarted automatically
● Node failure
○ Manual REST endpoint to replace node
● Scheduling framework failure
○ Existing nodes keep running, new nodes cannot be added
Experiments
Cluster startup
For each node in the cluster:
1. Receive and accept offer
2. Launch task
3. Fetch executor, JRE, Cassandra binaries from S3/HDFS
4. Launch executor
5. Launch Cassandra daemon
6. Wait for its mode to transition STARTING -> JOINING -> NORMAL
Cluster startup time
Framework can start ~ one new node per minute
Tuning JVM Garbage collection
Changed from CMS to G1 garbage collector
Left: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/cassandra-env.sh#L213
Right: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_tune_jvm_c.html?scroll=concept_ds_sv5_k4w_dk__tuning-java-garbage-collection
Tuning JVM Garbage collection
Measured with cassandra-stress using a 32-thread client:

Metric                            CMS         G1    Improvement factor (G1 over CMS)
op rate (ops/s)                  1951      13765     7.06
latency mean (ms)                 3.6        0.4     9.00
latency median (ms)               0.3        0.3     1.00
latency 95th percentile (ms)      0.6        0.4     1.50
latency 99th percentile (ms)        1        0.5     2.00
latency 99.9th percentile (ms)   11.6        0.7    16.57
latency max (ms)              13496.9     4626.9     2.92

The G1 garbage collector performs much better without any tuning.
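The factor column can be checked from the CMS and G1 columns. The sketch below assumes the factor is G1/CMS for op rate (higher is better) and CMS/G1 for latencies (lower is better), which is the reading that reproduces the published numbers.

```python
# (CMS, G1) pairs copied from the table; latencies are in milliseconds.
results = {
    "op rate": (1951, 13765),
    "latency mean": (3.6, 0.4),
    "latency median": (0.3, 0.3),
    "latency 95th percentile": (0.6, 0.4),
    "latency 99th percentile": (1.0, 0.5),
    "latency 99.9th percentile": (11.6, 0.7),
    "latency max": (13496.9, 4626.9),
}


def improvement(metric, cms, g1):
    """Return how many times better G1 is than CMS for one metric."""
    # Throughput improves when it goes up, latency when it goes down.
    return g1 / cms if metric == "op rate" else cms / g1


factors = {
    metric: round(improvement(metric, cms, g1), 2)
    for metric, (cms, g1) in results.items()
}
for metric, factor in factors.items():
    print(f"{metric}: {factor:.2f}")
```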
Cluster Setup
● 3 nodes
● Local DC
● 24 cores, 128 GB RAM, 2TB SAS drives
● Cassandra running on bare metal
● Cassandra running in a Mesos container
Read Latency
● Bare metal: Mean 0.38 ms, P95 0.74 ms, P99 0.91 ms
● Mesos: Mean 0.44 ms, P95 0.76 ms, P99 0.98 ms
Read Throughput
[Chart: read throughput, bare metal vs. Mesos]
Write Latency
● Bare metal: Mean 0.43 ms, P95 0.94 ms, P99 1.05 ms
● Mesos: Mean 0.48 ms, P95 0.93 ms, P99 1.26 ms
Write Throughput
[Chart: write throughput, bare metal vs. Mesos]
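To put the latency figures in perspective, the relative overhead of running in a Mesos container can be computed from the numbers above. This is plain arithmetic on the published values, assuming the first set of figures on each latency slide is bare metal and the second is Mesos; the helper name overhead_percent is ours.

```python
# (bare metal, Mesos container) latencies in milliseconds, from the slides.
latency_ms = {
    "read": {"mean": (0.38, 0.44), "p95": (0.74, 0.76), "p99": (0.91, 0.98)},
    "write": {"mean": (0.43, 0.48), "p95": (0.94, 0.93), "p99": (1.05, 1.26)},
}


def overhead_percent(bare_metal, mesos):
    """Relative change of the Mesos figure over bare metal, in percent."""
    return (mesos - bare_metal) / bare_metal * 100.0


for operation, stats in latency_ms.items():
    for stat, (bare_metal, mesos) in stats.items():
        change = overhead_percent(bare_metal, mesos)
        print(f"{operation} {stat}: {change:+.1f}%")
```

In absolute terms the differences are fractions of a millisecond, so percentages on values this small should be read with care.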
Running across datacenters
● Four datacenters
○ Each running dcos-cassandra-service instance
○ Sync datacenter phase
■ Periodically exchange seeds with external datacenters
● Cassandra nodes gossip topology
○ Discover nodes in other datacenters
Asynchronous cross-dc replication latency
● Write a row to dc1 using consistency level LOCAL_ONE
○ Write timestamp to a file when operation completed
● Spin in a loop to read the same row using consistency LOCAL_ONE in dc2
○ Write timestamp to a file when operation completed
● Difference between the two gives asynchronous replication latency
○ p50: 44.69 ms, p95: 46.38 ms, p99: 47.44 ms
● Round trip ping latency
○ 77.8 ms
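The measurement procedure above can be sketched as a write followed by a polling read. The version below is a simplification: it takes the write and read operations as callables (hypothetical stand-ins for LOCAL_ONE queries against dc1 and dc2) and uses a single clock, whereas the slide's method records timestamps to files on each side and therefore relies on synchronized clocks.

```python
import math
import time


def measure_replication_latency(write_row, read_row, clock=time.time):
    """Seconds between a write completing and the row becoming readable.

    write_row: callable performing the LOCAL_ONE write against dc1.
    read_row: callable performing a LOCAL_ONE read against dc2; returns
              the row, or None while it has not replicated yet.
    """
    write_row()
    written_at = clock()
    while read_row() is None:  # spin until the row shows up in dc2
        pass
    return clock() - written_at


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * p / 100))
    return ordered[rank - 1]


# In-memory stand-in for a cluster: the row is visible on the third read.
state = {"reads": 0}


def fake_write():
    state["reads"] = 0


def fake_read():
    state["reads"] += 1
    return {"id": 1} if state["reads"] >= 3 else None


latency = measure_replication_latency(fake_write, fake_read)
print(f"replication latency: {latency * 1000:.3f} ms")
```

Repeating the measurement many times and passing the samples to percentile would give p50, p95 and p99 figures of the kind reported above.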
Cassandra on Mesos in Production
● ~20 clusters replicating across two datacenters (west and east coast)
● ~300 machines across two datacenters
● Largest 2 clusters: more than a million writes/sec and ~100k reads/sec
● Mean read latency: 13 ms and write latency: 25 ms
● Mostly use LOCAL_QUORUM consistency level
Questions?
verma@uber.com
Aurora hogs offers
● Aurora is designed to be the only framework running on Mesos and to control all the machines
● It holds on to all received offers
○ It does not accept or reject them
● Mesos waits for the --offer_timeout duration and then rescinds the offer
● --offer_timeout config
○ Duration of time before an offer is rescinded from a framework. This helps fairness when running frameworks that hold on to offers, or frameworks that accidentally drop offers. If not set, offers do not timeout.
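The effect of --offer_timeout can be illustrated with a toy model. This is not Mesos code: the Offer class and rescind_expired function are hypothetical, and the sketch only shows how a timeout frees offers that a framework holds without answering, and how nothing is freed when the timeout is unset.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    offer_id: str
    framework: str
    offered_at: float  # seconds since an arbitrary start


def rescind_expired(outstanding, now, offer_timeout):
    """Split outstanding offers into (still_held, rescinded).

    offer_timeout=None models an unset --offer_timeout: offers never
    expire, so a framework that hoards them keeps them indefinitely.
    """
    if offer_timeout is None:
        return list(outstanding), []
    held = [o for o in outstanding if now - o.offered_at < offer_timeout]
    rescinded = [o for o in outstanding if now - o.offered_at >= offer_timeout]
    return held, rescinded


offers = [Offer("o1", "aurora", 0.0), Offer("o2", "aurora", 50.0)]
held, rescinded = rescind_expired(offers, now=60.0, offer_timeout=30.0)
print([o.offer_id for o in rescinded])  # ['o1']
```

Rescinded resources can then be offered to other frameworks, which is why the startup delay in this setup is tied to the timeout value.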
Long term solution: dynamic reservations
● Dynamically reserve all of the machines’ resources for the “cassandra” role
● Resources are offered only to cassandra frameworks
● Improves node startup time: 30s/node
● Node failure replacement or updates are much faster
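As an illustration, the resource description for such a reservation might be built as follows. The field layout follows the format documented for the Mesos /master/reserve operator endpoint in releases of that era; treat it as an assumption to verify against your Mesos version. The principal name and the resource amounts are hypothetical, and the sketch only builds the payload without sending it.

```python
import json


def reservation_resources(role, principal, cpus, mem_mb, disk_mb):
    """Build the `resources` list for a dynamic reservation request."""

    def scalar(name, value):
        return {
            "name": name,
            "type": "SCALAR",
            "scalar": {"value": value},
            "role": role,
            "reservation": {"principal": principal},
        }

    return [scalar("cpus", cpus), scalar("mem", mem_mb), scalar("disk", disk_mb)]


# Hypothetical amounts for one machine; adjust to the actual agent.
payload = reservation_resources(
    role="cassandra",
    principal="cassandra-principal",
    cpus=24,
    mem_mb=131072,
    disk_mb=2000000,
)
print(json.dumps(payload, indent=2))
```

Once resources are reserved for the role, they are no longer offered to frameworks outside that role, which is what removes the wait on another framework's unanswered offers.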
Using the Cassandra cluster
https://www.youtube.com/watch?v=qgqO39DteHo

Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* Summit 2016

  • 1. Running Cassandra on Apache Mesos across multiple datacenters at Uber Abhishek Verma (verma@uber.com)
  • 2. About me ● MS (2010) and PhD (2012) in Computer Science from University of Illinois at Urbana-Champaign ● 2 years at Google, worked on Borg and Omega and first author of the Borg paper ● ~ 1 year at TCS Research, Mumbai ● Currently at Uber working on running Cassandra on Mesos © DataStax, All Rights Reserved. 2
  • 3. “Transportation as reliable as running water, everywhere, for everyone”
  • 4. “Transportation as reliable as running water, everywhere, for everyone” 99.99%
  • 5. “Transportation as reliable as running water, everywhere, for everyone” efficient
  • 6. Cluster Management @ Uber ● Statically partitioned machines across different services ● Move from custom deployment system to everything running on Mesos ● Gain efficiency by increasing machine utilization ○ Co-locate services on the same machine ○ Can lead to 30% fewer machines1 ● Build stateful service frameworks to run on Mesos © DataStax, All Rights Reserved. 6 “Large-scale cluster management at Google with Borg”, EuroSys 2015
  • 7. Apache Mesos 7 ● Mesos abstracts CPU, memory, storage away from machines ○ program like it’s a single pool of resources ● Linear scalability ● High availability ● Native support for launching containers ● Pluggable resource isolation ● Two level scheduling
  • 8. Apache Cassandra
      ● Horizontal scalability
          ○ Scales reads and writes linearly as new nodes are added
      ● High availability
          ○ Fault tolerant with tunable consistency levels
      ● Low latency, solid performance
      ● Operational simplicity
          ○ Homogeneous cluster, no SPOF
      ● Rich data model
  • 9. DC/OS Cassandra Service
      Uber
      ● Abhishek Verma
      ● Karthik Gandhi
      ● Matthias Eichstaedt
      ● Varun Gupta
      ● Zhitao Li
      Mesosphere
      ● Chris Lambert
      ● Gabriel Hartmann
      ● Keith Chambers
      ● Kenneth Owens
      ● Mohit Soni
      https://github.com/mesosphere/dcos-cassandra-service
  • 10. Cassandra service architecture (diagram). Components shown:
      ● Framework: dcos-cassandra-service
      ● Web interface, Control plane API
      ● Mesos master (Leader), Mesos master (Standby)
      ● ZooKeeper quorum (three ZK nodes)
      ● Mesos agents running C*Node 1a, 1b, 1c (C*Cluster 1) and C*Node 2a, 2b (C*Cluster 2)
      ● Aurora (DC1), Aurora (DC2)
      ● Deployment system, DC2
      ● Client App uses CQL interface
  • 11. Cassandra Mesos primitives
      ● Mesos containerizer
      ● Override 5 ports in configuration (storage_port, ssl_storage_port, native_transport_port, rpc_port, jmx_port)
      ● Use persistent volumes
          ○ Data stored outside of the sandbox directory
          ○ Offered to the same task if it crashes and restarts
      ● Use dynamic reservation
  • 12. Custom seed provider
      Each node asks http://scheduler/seeds for its seed list:
      ● Node 1 (10.0.0.1): { isSeed: true, seeds: [ ] }
      ● Node 2 (10.0.0.2): { isSeed: true, seeds: [ 10.0.0.1 ] }
      ● Node 3 (10.0.0.3): { isSeed: false, seeds: [ 10.0.0.1, 10.0.0.2 ] }
      Number of Nodes = 3, Number of Seeds = 2
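
The three responses on slide 12 are consistent with a scheduler that promotes the first nodes that ask until the configured seed count is reached, and hands each node the seeds registered before it. A minimal sketch of that behaviour, assuming an in-memory registry; `SeedRegistry` and `seeds_for` are hypothetical names for illustration and not the dcos-cassandra-service API:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SeedRegistry:
    """Hypothetical stand-in for the scheduler's /seeds endpoint."""
    num_seeds: int
    seeds: List[str] = field(default_factory=list)

    def seeds_for(self, node_ip: str) -> Dict[str, object]:
        # Seeds already registered, excluding the node that is asking.
        known = [ip for ip in self.seeds if ip != node_ip]
        # A node is a seed if it already is one, or if seed slots remain.
        is_seed = node_ip in self.seeds or len(self.seeds) < self.num_seeds
        if is_seed and node_ip not in self.seeds:
            self.seeds.append(node_ip)
        return {"isSeed": is_seed, "seeds": known}


if __name__ == "__main__":
    registry = SeedRegistry(num_seeds=2)
    for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
        print(ip, registry.seeds_for(ip))
```

With three nodes and two seeds, the sketch hands out the same answers as the slide: the first node bootstraps with an empty seed list, and the third node joins as a non-seed that knows both seeds.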
  • 13. Cassandra Service: Features
      ● Custom seed provider
      ● Increasing cluster size
      ● Changing Cassandra configuration
      ● Replacing a dead node
      ● Backup/Restore
      ● Cleanup
      ● Repair
  • 14. Plan, Phases and Blocks
      ● Plan
          ○ Phases
              ■ Reconciliation
              ■ Deployment
              ■ Backup
              ■ Restore
              ■ Cleanup
              ■ Repair
  • 15. Spinning up a new Cassandra cluster: https://www.youtube.com/watch?v=gbYmjtDKSzs
  • 16. Automate Cassandra operations
      ● Repair
          ○ Synchronize all data across replicas
              ■ Last write wins
          ○ Anti-entropy mechanism
          ○ Repair primary key range node-by-node
      ● Cleanup
          ○ Remove data whose ownership has changed
              ■ Because of addition or removal of nodes
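
The "last write wins" rule used when repair synchronizes replicas can be illustrated with a small sketch. It assumes each replica's copy of a row maps a column name to a `(value, write_timestamp)` pair; the `reconcile` function is hypothetical, and it leaves out tombstones and the tie-breaking that Cassandra applies when timestamps are equal:

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[object, int]  # (value, write timestamp)


def reconcile(replica_rows: Iterable[Dict[str, Cell]]) -> Dict[str, Cell]:
    """Merge copies of one row, keeping the newest write for each column."""
    merged: Dict[str, Cell] = {}
    for row in replica_rows:
        for column, (value, timestamp) in row.items():
            # Keep the cell with the highest write timestamp.
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


if __name__ == "__main__":
    replica_a = {"city": ("SF", 10)}
    replica_b = {"city": ("NYC", 12), "fare": (7.5, 5)}
    print(reconcile([replica_a, replica_b]))
```

Resolution is per column rather than per row, so a row can end up combining cells that were last written on different replicas.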
  • 18. Failure scenarios
      ● Executor failure
          ○ Restarted automatically
      ● Cassandra daemon failure
          ○ Restarted automatically
      ● Node failure
          ○ Manual REST endpoint to replace node
      ● Scheduling framework failure
          ○ Existing nodes keep running, new nodes cannot be added
  • 20. Cluster startup
      For each node in the cluster:
      1. Receive and accept offer
      2. Launch task
      3. Fetch executor, JRE, Cassandra binaries from S3/HDFS
      4. Launch executor
      5. Launch Cassandra daemon
      6. Wait for its mode to transition STARTING -> JOINING -> NORMAL
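
One way to watch the mode transition in step 6 from outside the framework is to poll `nodetool netstats`, whose output begins with a `Mode:` line. The sketch below assumes `nodetool` is on the PATH of the machine running it. The slides do not say how the framework itself checks the mode, so this is an illustration only, and `parse_mode` and `wait_until_normal` are hypothetical helper names:

```python
import re
import subprocess
import time
from typing import Optional


def parse_mode(netstats_output: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line, e.g. 'JOINING' or 'NORMAL'."""
    match = re.search(r"^Mode:\s*(\w+)", netstats_output, re.MULTILINE)
    return match.group(1) if match else None


def wait_until_normal(poll_seconds: float = 5.0,
                      timeout_seconds: float = 600.0) -> bool:
    """Poll `nodetool netstats` until the node reports NORMAL or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(["nodetool", "netstats"],
                                capture_output=True, text=True)
        if parse_mode(result.stdout) == "NORMAL":
            return True
        time.sleep(poll_seconds)
    return False
```

Keeping the parser separate from the polling loop means it can be checked against captured output without a running node.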
  • 21. Cluster startup time: Framework can start ~ one new node per minute
  • 22. Tuning JVM Garbage collection
      Changed from CMS to G1 garbage collector
      Left: https://github.com/apache/cassandra/blob/cassandra-2.2/conf/cassandra-env.sh#L213
      Right: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_tune_jvm_c.html?scroll=concept_ds_sv5_k4w_dk__tuning-java-garbage-collection
  • 23. Tuning JVM Garbage collection
      Metric                           CMS        G1        G1 : CMS Factor
      op rate                          1951       13765     7.06
      latency mean (ms)                3.6        0.4       9.00
      latency median (ms)              0.3        0.3       1.00
      latency 95th percentile (ms)     0.6        0.4       1.50
      latency 99th percentile (ms)     1          0.5       2.00
      latency 99.9th percentile (ms)   11.6       0.7       16.57
      latency max (ms)                 13496.9    4626.9    2.92
      G1 garbage collector is much better without any tuning
      Measured using cassandra-stress with a 32-thread client
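
The factor column on slide 23 mixes two directions: for op rate it is G1 divided by CMS (higher is better), and for the latency rows it is CMS divided by G1 (lower is better). The short check below recomputes the column from the CMS and G1 values on the slide; the helper name `improvement_factor` is only illustrative:

```python
# (CMS, G1) values copied from the slide 23 table.
RESULTS = {
    "op rate": (1951, 13765),
    "latency mean (ms)": (3.6, 0.4),
    "latency median (ms)": (0.3, 0.3),
    "latency 95th percentile (ms)": (0.6, 0.4),
    "latency 99th percentile (ms)": (1, 0.5),
    "latency 99.9th percentile (ms)": (11.6, 0.7),
    "latency max (ms)": (13496.9, 4626.9),
}


def improvement_factor(metric: str, cms: float, g1: float) -> float:
    """How many times better G1 is than CMS for the given metric."""
    # Throughput improves when it goes up; latency improves when it goes down.
    return g1 / cms if metric == "op rate" else cms / g1


factors = {
    metric: round(improvement_factor(metric, cms, g1), 2)
    for metric, (cms, g1) in RESULTS.items()
}

if __name__ == "__main__":
    for metric, factor in factors.items():
        print(f"{metric:32s} {factor:6.2f}")
```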
  • 24. Cluster Setup
      ● 3 nodes
      ● Local DC
      ● 24 cores, 128 GB RAM, 2TB SAS drives
      ● Cassandra running on bare metal
      ● Cassandra running in a Mesos container
  • 25. Read Latency
      Bare metal: Mean 0.38 ms, P95 0.74 ms, P99 0.91 ms
      Mesos: Mean 0.44 ms, P95 0.76 ms, P99 0.98 ms
  • 26. Read Throughput (chart: bare metal vs. Mesos)
  • 27. Write Latency
      Bare metal: Mean 0.43 ms, P95 0.94 ms, P99 1.05 ms
      Mesos: Mean 0.48 ms, P95 0.93 ms, P99 1.26 ms
  • 28. Write Throughput (chart: bare metal vs. Mesos)
  • 29. Running across datacenters
      ● Four datacenters
          ○ Each running dcos-cassandra-service instance
          ○ Sync datacenter phase
              ■ Periodically exchange seeds with external datacenters
      ● Cassandra nodes gossip topology
          ○ Discover nodes in other datacenters
  • 30. Asynchronous cross-dc replication latency
      ● Write a row to dc1 using consistency level LOCAL_ONE
          ○ Write timestamp to a file when operation completed
      ● Spin in a loop to read the same row using consistency LOCAL_ONE in dc2
          ○ Write timestamp to a file when operation completed
      ● Difference between the two gives asynchronous replication latency
          ○ p50: 44.69 ms, p95: 46.38 ms, p99: 47.44 ms
      ● Round trip ping latency
          ○ 77.8 ms
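
The measurement on slide 30 can be sketched as a write followed by a polling read. To keep the example self-contained, the database calls are injected as plain callables: `write_row` and `read_row` are hypothetical stand-ins for a LOCAL_ONE write in dc1 and a LOCAL_ONE read in dc2. The sketch also runs both sides in one process against one clock, whereas the talk recorded timestamps to files on each side, which relies on the two hosts having synchronized clocks:

```python
import time
from typing import Callable, Optional


def replication_latency_ms(
    write_row: Callable[[], None],
    read_row: Callable[[], Optional[dict]],
    clock: Callable[[], float] = time.time,
    timeout_s: float = 5.0,
) -> float:
    """Time from write completion in one DC until the row is readable in another."""
    write_row()                      # e.g. INSERT in dc1 at LOCAL_ONE
    written_at = clock()             # timestamp once the write has completed
    while clock() - written_at < timeout_s:
        if read_row() is not None:   # e.g. SELECT in dc2 at LOCAL_ONE
            return (clock() - written_at) * 1000.0
    raise TimeoutError("row did not appear in the remote datacenter in time")
```

For scale, half of the reported 77.8 ms round trip is 38.9 ms, so the p50 of 44.69 ms is roughly 5.8 ms above a single one-way network trip, assuming the path is symmetric.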
  • 31. Cassandra on Mesos in Production
      ● ~20 clusters replicating across two datacenters (west and east coast)
      ● ~300 machines across two datacenters
      ● Largest 2 clusters: more than a million writes/sec and ~100k reads/sec
      ● Mean read latency: 13ms and write latency: 25ms
      ● Mostly use LOCAL_QUORUM consistency level
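
LOCAL_QUORUM requires acknowledgements from a majority of the replicas in the coordinator's own datacenter, so a request never waits on the cross-datacenter link. The quorum size follows from the local replication factor. The slides do not state which replication factor Uber uses, so the values below are examples only:

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Local replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 3, 5):  # example replication factors, not from the talk
        print(rf, quorum(rf), tolerated_replica_failures(rf))
```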
  • 33. Cluster startup
      For each node in the cluster:
      1. Receive and accept offer
      2. Launch task
      3. Fetch executor, JRE, Cassandra binaries from S3/HDFS
      4. Launch executor
      5. Launch Cassandra daemon
      6. Wait for its mode to transition STARTING -> JOINING -> NORMAL
      Aurora hogging offers
  • 34. Aurora hogs offers
      ● Aurora designed to be the only framework running on Mesos and controlling all the machines
      ● Holds on to all received offers
          ○ Does not accept or reject them
      ● Mesos waits for --offer_timeout time duration and rescinds offer
      ● --offer_timeout config
          ○ “Duration of time before an offer is rescinded from a framework. This helps fairness when running frameworks that hold on to offers, or frameworks that accidentally drop offers. If not set, offers do not timeout.”
  • 35. Long term solution: dynamic reservations
      ● Dynamically reserve all the machines’ resources to the “cassandra” role
      ● Resources are offered only to cassandra frameworks
      ● Improves node startup time: 30s/node
      ● Node failure replacement or updates are much faster
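
To put the two startup figures side by side: slide 21 reports about one new node per minute, and slide 35 reports 30 s per node with dynamic reservations. Assuming nodes are started strictly one after another, as the per-node startup loop suggests, total startup time scales linearly with cluster size. The 20-node cluster below is a made-up example, not a figure from the talk:

```python
def sequential_startup_minutes(nodes: int, seconds_per_node: float) -> float:
    """Total time to bring up a cluster when nodes start one at a time."""
    return nodes * seconds_per_node / 60.0


if __name__ == "__main__":
    nodes = 20  # hypothetical cluster size
    print("without reservations:", sequential_startup_minutes(nodes, 60), "min")
    print("with reservations:   ", sequential_startup_minutes(nodes, 30), "min")
```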
  • 36. Using the Cassandra cluster: https://www.youtube.com/watch?v=qgqO39DteHo