Orchestrating Cassandra with Kubernetes
Challenges and Opportunities
Raghavendra Prabhu, Software Engineer, Yelp
University of Cambridge Engineering Event
About
● Me: Raghavendra D Prabhu
○ rprabhu@yelp.com / @randomsurfer / me@rdprabhu.com
○ Senior Software Engineer, Yelp
● Team: Database and Reliability Engineering (DRE)
● Where: London, UK & San Francisco, CA
● What: Databases at Yelp: MySQL, Cassandra, ZooKeeper
Yelp’s Mission
Connecting people with great local businesses.
Overview
● Introduction
● Cassandra and Cassandra at Yelp
● Orchestration
● Cassandra Operator
● Opportunities and Challenges
● Conclusion
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
- Leslie Lamport
Desired Traits of Distributed Systems
● Reliability
● Scalability
● Maintainability
Distributed systems fallacies
● The network is reliable
● Latency is zero
● Bandwidth is infinite
● The network is secure
● Topology doesn't change
● There is one administrator
● Transport cost is zero
● The network is homogeneous
Powered by Cassandra
● Your Reviews
● Waitlist for diners
Cassandra (C*)
● Distributed wide-column NoSQL datastore
● Leaderless / Multi-master
● Written in Java
● Multi-region
● Tunable consistency
● Write optimized
● Cloud-aware: gossip, failure detection, topology-aware, handoffs
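The "Tunable consistency" bullet can be made concrete with a little arithmetic. The sketch below assumes a replication factor of 3 purely for illustration; the deck does not state the replication factor or consistency levels used at Yelp. It shows the usual rule of thumb: a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```go
package main

import "fmt"

// quorum returns how many replicas must acknowledge a QUORUM-level
// operation for the given replication factor: floor(RF/2) + 1.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// overlaps reports whether every read is guaranteed to contact at least
// one replica that acknowledged the latest write, i.e. R + W > RF.
func overlaps(readReplicas, writeReplicas, replicationFactor int) bool {
	return readReplicas+writeReplicas > replicationFactor
}

func main() {
	rf := 3 // assumed replication factor, for illustration only
	q := quorum(rf)
	fmt.Println("quorum for RF=3:", q)
	fmt.Println("QUORUM reads + QUORUM writes overlap:", overlaps(q, q, rf))
	fmt.Println("ONE reads + ONE writes overlap:", overlaps(1, 1, rf))
}
```

Lower consistency levels trade this overlap guarantee for lower latency and higher availability, which is the "tunable" part.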
C* at Yelp
● Both primary and derived data
● Use cases
● Deployed on Amazon Web Services (AWS)
● Network-attached storage
● Automated schema management
● Backups into S3
● ZooKeeper-based cluster coordination
Yelp Cassandra @ 100000 ft
[Diagram: Multi-region Cluster spanning us-west-2, us-west-1, us-east-1]
Yelp (EC2) Cassandra @ 10000 ft
Orchestration
Pets vs Cattle
Velocity
● Churn
● Feature
● Scale
Safety
● Reliable
● Consistent
● Available
Orchestration
● What is Orchestration
● Why Orchestrate
○ Reliability, Scalability, Maintainability
● Clean abstractions
● Extended Control Plane
○ Helps us sleep well ;)
● C* Operator + PaaS + Kubernetes
Orchestrator
Kubernetes / k8s
● Popular open-source container orchestration system
● Actively developed
● Organizes containers into pods
● Stateful and stateless applications
● Well-defined building blocks for distributed systems
● Integrates into our PaaS
○ k8s: generic but extensible
Orchestrator
● Yelp PaaSTA: Stateless Microservices on Mesos
○ Clusters on Kubernetes
● Few thousand microservices deployed and growing
● Hundreds of deployments every day
● Handles compute, storage and network abstractions
● Why PaaSTA
○ Clusterman
○ Spot and statically-reserved fleet
PaaSTA: Kubernetes/Mesos at Yelp
Orchestrator
Cassandra Operator
C* Operator: Intro
● Developed by DRE team
● Defines a custom resource for k8s
○ Statefulset, Container spec, Storage and more
● operator-sdk and controller-runtime
○ Ideal for non-trivial clustered services
● Extended control plane of our C* clusters
● Written in Go and deployed on k8s itself
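To illustrate what "defines a custom resource" might look like, here is a minimal sketch using plain Go structs. The type and field names (CassandraClusterSpec, Replicas, StorageGiB and so on) are hypothetical and are not Yelp's actual schema; a real operator built with operator-sdk would also embed the Kubernetes metadata types, which are left out here to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// CassandraClusterSpec is a hypothetical desired-state description of one
// C* cluster. An operator would translate it into a StatefulSet, a
// container spec and storage claims.
type CassandraClusterSpec struct {
	Name       string
	Replicas   int    // number of C* nodes (pods in the StatefulSet)
	Image      string // container image to run on each node
	StorageGiB int    // persistent volume size per node
}

// CassandraClusterStatus is a hypothetical record of what the operator
// last observed in the cluster.
type CassandraClusterStatus struct {
	ReadyReplicas int
}

// Validate rejects specs that the operator could not act on.
func (s CassandraClusterSpec) Validate() error {
	switch {
	case s.Name == "":
		return errors.New("name must be set")
	case s.Replicas < 1:
		return errors.New("replicas must be at least 1")
	case s.Image == "":
		return errors.New("image must be set")
	case s.StorageGiB < 1:
		return errors.New("storageGiB must be at least 1")
	}
	return nil
}

func main() {
	spec := CassandraClusterSpec{
		Name:       "example", // illustrative values only
		Replicas:   3,
		Image:      "cassandra:example-tag",
		StorageGiB: 100,
	}
	fmt.Println("validation error:", spec.Validate())
}
```

Keeping the desired state (spec) separate from the observed state (status) is what lets the operator act as an extended control plane: it repeatedly compares the two and works to close the gap.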
C* Operator: Responsibilities
● Creating cluster from specs
● Scaling the cluster up and down
● Safe and Reliable Change Deployments
● Lifecycle Management
● Multi-region coordination
● Credential management
● Optimizing resource usage
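Several of the responsibilities above (scaling up and down, safe change deployments) fit a reconcile-loop pattern: compare desired and observed state, then take one small step. The sketch below is an assumption about how such a decision function could look, not the operator's actual logic; the State fields and Action names are made up for illustration.

```go
package main

import "fmt"

// State is a hypothetical snapshot of what the operator observes.
type State struct {
	Nodes                int // nodes currently in the cluster
	ReadyNodes           int // nodes that are up and healthy
	NodesOnTargetVersion int // nodes already running the desired config
}

// Action is the single next step the operator should take.
type Action string

const (
	Wait       Action = "wait-for-healthy"
	AddNode    Action = "add-node"
	RemoveNode Action = "remove-node"
	RollNode   Action = "roll-one-node"
	None       Action = "none"
)

// nextAction picks at most one change per reconcile pass, and refuses to
// change anything while the cluster is not fully healthy.
func nextAction(desiredNodes int, s State) Action {
	switch {
	case s.ReadyNodes < s.Nodes:
		return Wait // never stack a change on top of an unhealthy node
	case s.Nodes < desiredNodes:
		return AddNode
	case s.Nodes > desiredNodes:
		return RemoveNode
	case s.NodesOnTargetVersion < s.Nodes:
		return RollNode
	}
	return None
}

func main() {
	desired := 5
	fmt.Println(nextAction(desired, State{Nodes: 3, ReadyNodes: 3, NodesOnTargetVersion: 3}))
	fmt.Println(nextAction(desired, State{Nodes: 4, ReadyNodes: 3, NodesOnTargetVersion: 4}))
	fmt.Println(nextAction(desired, State{Nodes: 5, ReadyNodes: 5, NodesOnTargetVersion: 4}))
	fmt.Println(nextAction(desired, State{Nodes: 5, ReadyNodes: 5, NodesOnTargetVersion: 5}))
}
```

Taking one step per pass keeps every change small and observable, which suits a stateful store where adding or removing a node also moves data.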
Yelp Cassandra @ 100000 ft + Operator
[Diagram: Multi-region Cluster spanning us-west-2, us-west-1, us-east-1]
[Diagram: Exclusive Leases, regions us-west-2, us-west-1, us-east-1]
Where we are heading
Opportunities & Challenges
Opportunities
Toil Avoidance
Lifecycle Management
Agile Deployment
Autoscaling
[Figure: Requests by day over two weeks (Wed to Sun), relative scale 0 to 3x, annotated "Thanksgiving / Holiday"]
Security
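For the Autoscaling item above, a sizing rule can be sketched as a pure function of load. The per-node capacity and the minimum and maximum node counts below are invented numbers for illustration; the deck gives only a relative request scale (0 to 3x), not absolute figures or the actual policy.

```go
package main

import (
	"fmt"
	"math"
)

// desiredNodes returns how many nodes are needed to serve the given load,
// assuming each node handles perNodeCapacity requests per second. The
// result is clamped to [minNodes, maxNodes].
func desiredNodes(requestsPerSec, perNodeCapacity float64, minNodes, maxNodes int) int {
	n := int(math.Ceil(requestsPerSec / perNodeCapacity))
	if n < minNodes {
		return minNodes
	}
	if n > maxNodes {
		return maxNodes
	}
	return n
}

func main() {
	const capacity = 1000.0 // hypothetical requests/sec one node can serve
	baseline := 3000.0      // hypothetical "1x" load
	for _, factor := range []float64{1, 2, 3} {
		load := baseline * factor
		fmt.Printf("%.0fx load -> %d nodes\n", factor, desiredNodes(load, capacity, 3, 12))
	}
}
```

The lower bound keeps enough replicas for availability even when traffic is quiet, and the upper bound caps cost during spikes.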
Challenges
Isolation
Performance
Config Management
Observability
Deployment Strategies
Distributed systems fallacies
● The network is reliable
● Latency is zero
● Bandwidth is infinite
● The network is secure
● Topology doesn't change
● There is one administrator
● Transport cost is zero
● The network is homogeneous
Distributed Systems: Building Reliable Systems Out of Unreliable Components
We're Hiring!
www.yelp.com/careers/
Software Engineering Interns:
London & Hamburg
Questions?
@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp
