SQL -> Cassandra -> K8S ->
Multi-cluster k8ssandra
Agenda
01 - Intro
02 - NoSQL/Cassandra
03 - k8ssandra
04 - Multi-cluster/cloud k8ssandra operator
05 - Demos
06 - What's next?
Raghavan "Rags" Srinivas
Developer Advocate @ragsns
➢ Developer/Architect
➢ Mechanical Engineer (so many moons ago)
➢ Distributed systems
➢ Love to teach and communicate
➢ Inner loop == developer productivity
DataStax Developers Crew
Cedrick Lunven, Rags Srinivas, Aleksandr Volochnev, Stefano Lottini, Jack Fryer, Ryan Welford, David Gilardi, Kirsten Hunter, Sonia Siganporia, Artem Chebotko, David Dieruf, Aaron Ploetz, Gary Harvey
02 - NoSQL/Cassandra
Origin of the term "NoSQL"
● Meetup name on June 11, 2009 in San Francisco
  ○ Catchy hashtag intended to refer to databases like BigTable and Dynamo
  ○ Meetup presentations: Cassandra, MongoDB, CouchDB, HBase, Voldemort, Dynomite, and Hypertable
● Sometimes referred to as "Not only SQL"
Relational vs. NoSQL
● Relational
  ○ Standard relational data model and language (SQL)
  ○ ACID transactions
  ○ Integration database
  ○ Designed for a single machine
  ○ Hard to scale
  ○ Impedance mismatch
● NoSQL
  ○ Variety of data models and languages
  ○ Lower-guarantee transactions
  ○ Application database
  ○ Designed for a cluster
  ○ Easy to scale
  ○ Better database-application compatibility
The CAP Theorem: pick two
● Consistency: every read receives the most recent write or an error
● Availability: always responds, but may not return the most recent write
● Partition tolerance: operates in the presence of network partition failures
A distributed system can guarantee at most two at once: CA, CP, or AP.
Master-less (Peer-to-Peer) Architecture
1. NO Single Point of Failure
2. Scales for writes and reads
3. Application can contact any node (in case of failure, just contact the next one)
Why partitioning?
Because scaling doesn't have to be [s]hard!
Big Data doesn't fit on a single server; by splitting it into chunks, we can easily spread it over dozens, hundreds, or even thousands of servers, adding more as needed.
Is Cassandra AP or CP?
Cassandra offers tunable consistency. At any moment, for any particular query, you can set the Consistency Level you require. It defines how many replica CONFIRMATIONS you wait for before the response is returned.
// Set the default consistency level for statements bound from this prepared statement
PreparedStatement pstmt = session.prepare(
    "INSERT INTO product (sku, description) VALUES (?, ?)"
);
pstmt.setConsistencyLevel(ConsistencyLevel.ONE);
cqlsh> CONSISTENCY
Current consistency level is QUORUM.
cqlsh> CONSISTENCY ALL
Consistency level set to ALL.
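For reference, QUORUM means a majority of replicas must acknowledge: quorum = ⌊RF / 2⌋ + 1, so with a replication factor of 3, two replicas must confirm a read or write before the response is returned.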
Cassandra's Biggest Users (and Developers)
dtsx.io/cassandra-at-netflix
And many others...
03 - k8ssandra
Can you run databases on K8s?
https://thenewstack.io/a-case-for-databases-on-kubernetes-from-a-former-skeptic/
K8ssandra: a cloud-native, scalable data tier with administration tools and easy data access
Installation
helm repo add k8ssandra https://helm.k8ssandra.io/
helm repo update
helm install k8ssandra k8ssandra/k8ssandra
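The chart can also be customized with a values file; a minimal sketch is shown below (key names reflect the k8ssandra 1.x chart as commonly documented and should be verified against the chart's own values.yaml):

# values.yaml -- pass with: helm install k8ssandra k8ssandra/k8ssandra -f values.yaml
cassandra:
  version: "4.0.1"        # Cassandra version to deploy
  datacenters:
    - name: dc1           # a single datacenter named dc1
      size: 3             # three Cassandra nodes
stargate:
  enabled: true           # CQL/REST/GraphQL data APIs via Stargate
  replicas: 1
reaper:
  enabled: true           # scheduled repairs
medusa:
  enabled: false          # enable and add bucket credentials for backups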
Local and Cloud Installation Options
● Local Installation (Development)
● Cloud Installation (Production)
K8ssandra components (architecture overview):
● Apache Cassandra®, managed by Cass-Operator
● Medusa (backup/restore to object storage: S3, GCP, ...)
● Metrics Collector, with Prometheus and Grafana UIs
● Reaper (repair), with the Reaper UI
● Stargate data APIs: CQL endpoint, GraphQL endpoint + Playground, REST/Document API + Swagger, Authentication endpoint
● Traefik UI (ingress)
04 - Multi-cluster/cloud k8ssandra operator
Pushing Helm to its limit
https://thenewstack.io/we-pushed-helm-to-the-limit-then-built-a-kubernetes-operator/
Why Multi-cluster
Cassandra is designed for multi-region
● Partition tolerant
● Each node in the cluster maintains the full topology
● Nodes automatically route traffic to nearby neighbors
● Data is automatically and asynchronously replicated
● The cluster is homogeneous
● Any node can service any client request
● Clients can be configured to automatically route traffic to the local datacenter
Kubernetes was not designed for multi-region
● Increased latencies
● Higher consensus-request latency when requests cross datacenter boundaries
● Loss of connectivity to etcd can cause outages
● Services should route traffic to nearby endpoints
K8ssandra Operator
We're getting there… Before the operator, users had vastly simplified operations at the datacenter level with K8ssandra, but still had to manage each datacenter independently of the others.
Overview of k8ssandra operator
● Operator for K8ssandra
● Supports Multi-datacenter/region Cassandra clusters
● K8ssandra Operator consists of a control plane and a data plane
● The control plane creates and manages objects that only exist in the API server
● The control plane can only be installed in a single cluster (for now)
● The data plane can be installed in any number of clusters
● The control plane cluster can also serve as a data plane (see the sketch below)
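As a rough sketch of how the two roles map to installs, assuming the k8ssandra-operator Helm chart exposes a controlPlane value (the key name is an assumption; verify against the chart's values.yaml):

# Control-plane cluster (runs the K8ssandraCluster controller)
# helm install k8ssandra-operator k8ssandra/k8ssandra-operator -f values-control.yaml
controlPlane: true    # assumed chart key

# Data-plane-only cluster (reconciles the CassandraDatacenter, Stargate, etc. created for it)
# helm install k8ssandra-operator k8ssandra/k8ssandra-operator -f values-data.yaml
controlPlane: false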
K8ssandraCluster
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    cluster: demo
    serverVersion: "4.0.1"
    datacenters:
      - metadata:
          name: dc1
        k8sContext: east
        size: 3
        stargate:
          size: 1
      - metadata:
          name: dc2
        k8sContext: west
        size: 3
        stargate:
          size: 1
● The operator creates a CassandraDatacenter for each object in the datacenters array (roughly as sketched below)
● The operator creates a Stargate for each stargate entry
  ○ Co-located with the CassandraDatacenter
● k8sContext tells the operator in which cluster to create the objects
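For illustration, each datacenters entry translates into a cass-operator resource roughly like the following (a simplified sketch; the real objects also carry storage, rack, and config settings generated by the operator):

apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: demo        # all DCs share the same Cassandra cluster name
  serverType: cassandra
  serverVersion: "4.0.1"
  size: 3                  # number of Cassandra nodes in this DC
  # storageConfig, racks, and podTemplateSpec omitted for brevity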
ClientConfig
apiVersion: config.k8ssandra.io/v1beta1
kind: ClientConfig
metadata:
  name: east
  namespace: k8ssandra-operator
spec:
  contextName: east
  kubeConfigSecret:
    name: east-config
● Basically a pointer to a secret that contains a kubeconfig for a remote cluster (see the sketch after this list)
● The operator queries for ClientConfigs at startup and, for each one:
  ○ Creates a Cluster object
  ○ Adds the Cluster to the Manager
  ○ Stores the remote Client object in a cache
  ○ Cache entries are keyed off of the k8s context name
  ○ This has to be done at startup (for now)
● The Cluster needs to be added to the Manager before the Manager starts
  ○ Otherwise the Cache won't be started
  ○ Fixed in https://github.com/kubernetes-sigs/controller-runtime/pull/1681
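A minimal sketch of the referenced Secret, assuming the kubeconfig is stored under a key named kubeconfig (the key name and exact layout are assumptions; check the k8ssandra-operator docs for the expected format):

apiVersion: v1
kind: Secret
metadata:
  name: east-config              # matches kubeConfigSecret.name above
  namespace: k8ssandra-operator
type: Opaque
stringData:
  kubeconfig: |                  # assumed key name
    # kubeconfig granting the control plane access to the "east" cluster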
05 - Demos
EKS using KubeFed
Multi-cluster with k8ssandra operator
Resources
https://k8ssandra.io
https://k8ssandra.io/blog/tutorials/how-to/multi-region-cassandra-on-eks-with-k8ssandra-and-kubefed/
https://www.socallinuxexpo.org/scale/19x/speakers/raghavan-srinivas
https://github.com/datastaxdevs/
DataStax Developers Discord (18k+): dtsx.io/discord (!discord)
Badges
Thank You!

Multi-cluster k8ssandra