Multi-Region Cassandra Clusters

Novel Multi-region Clusters
Cassandra Deployments Split Between Heterogeneous Data Centres
with NAT & DNS-SD
#CassandraSummit

Adam Zegelin
Co-founder & VP of Engineering
www.instaclustr.com
adam@instaclustr.com 
@adamzegelin

Instaclustr
• Instaclustr provides Cassandra-as-a-service in the cloud 
(Currently only on AWS — Google Cloud in private beta)
• We currently manage 50+ Cassandra nodes for various customers
• We often get requests to do cool things, and we try to make them happen!

Multi-DC @ Instaclustr
• Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud
• Works out-of-the-box today.
• Requires per-node public IP
• Private network clusters ⇄ Cloud clusters
• Easy if your private network allocates per-node public IP addresses
• VPNs
• Something else?

• Overview of multi-region/data centre clusters
• What is supported out-of-the-box
• Alternative solutions
• Supporting technology overview (NAT/PAT and DNS-SD)
• Implementation

Single Node
• What you get from running apt-get install cassandra
and /usr/bin/cassandra
• Fragile (no redundancy)
• Dev/test/sandbox only
[Diagram: a single C* node]

Multi-node, Single Data Centre
• Two or more servers running
Cassandra within one DC
• Replication of data
(redundancy)
• Increased capacity (storage +
throughput)
• Baseline for production
clusters
[Diagram: three C* nodes in one data centre]

Multi-node, Multi-DC
• Cassandra running in two or
more data centres
• Global deployments
• Data near your customers
(reduced latency)
• Supported out-of-the-box
[Diagram: three data centres, each with three C* nodes]

Snitches
• Understands data centres and racks
• Implementation may automatically determine node DC and rack 
(Ec2MultiRegionSnitch uses the AWS internal metadata service, GossipingPropertyFileSnitch loads
a .properties file)
• Node DC and rack is advertised via Gossip
• Determine node proximity (estimated link latency)
• Cluster may use a combination of Snitch implementations

Data Centres
• Collection of Racks
• Complete replications
• Geographically separate
• Possibly high-latency interconnects 
(e.g. East Coast US → Sydney, ~300ms round-trip)

Racks
• Collection of nodes
• May fail as a single unit
• Modelled on the traditional DC rack/cage 
(n servers running off a UPS)

☁
• Amazon Web Services 
(use Ec2MultiRegionSnitch)
• Data Centre ≡ AWS Region 
(e.g. US_East_1, AP_SOUTHEAST_2)
• Rack ≡ Availability Zone 
(e.g. us-east-1a, ap-southeast-2b)
• Google Cloud Platform 
(no out-of-the-box auto-configuring snitch: use GossipingPropertyFileSnitch, or roll your own!)
• Data Centre ≡ GCP Region 
(e.g. US, Europe)
• Rack ≡ Zone 
(e.g. us-central1-a, europe-west1-a)

Data Centre Aware
• Cassandra is data centre aware
• Only fetch data from a remote DC if absolutely required 
(remote data is more “expensive”)
• Clients can be made data centre aware
• If your app knows its DC, client will talk to the closest DC

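import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
// Prefer nodes in the application's own data centre.
Cluster cluster = Cluster.builder()
    .addContactPoint(…)
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
    .build();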

Multi DC Support
• Per-node public (internet-facing) IP address
• Optionally, per-node private IP address
• Per-node public address is used for inter-data centre connectivity
• Per-node private address is used for intra-data centre connectivity

Multi DC Support
• Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional
• Easy to set up per-node public and private addresses
• Private network clusters ⇄ Cloud clusters
• Private networks: 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 
(where often 𝑥 > 𝑛)
• Done via Network Address Translation

IPv4 Address Space Exhaustion
Source: http://www.potaroo.net/tools/ipv4/

Multi-DC Support
• IPv4
• Address exhaustion
• Over time, will become more expensive to purchase addresses
• Wasteful 
(being a good internet citizen)

Alternatives
• IPv6
• Java supports it ∴ Cassandra probably supports it 
(untested by us)
• Global IPv6 adoption is ~4% 
(according to Google — google.com/intl/en/ipv6/statistics.html)
• IPv6/IPv4 hybrid 
(Teredo, 6over4, et al.)
• AWS EC2 does not support IPv6. End of story. 
(Elastic Load Balancer does support IPv6)

Alternatives
• VPNs
• tinc, OpenVPN, etc.
• All private address space — no dual addressing
• Requires multiple links — between every DC and per client
• Address space overlaps between multiple VPNs
• Connectivity to multiple clusters an issue 
(for multi-cluster apps, centralised monitoring, etc)

Data Centres   Links
3              3
5              10
7              21
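
The figures in the table are consistent with a full mesh, where every data centre has a direct link to every other one, giving n(n − 1)/2 links for n data centres. A minimal sketch of that arithmetic, assuming a full mesh and ignoring the additional per-client links (the class and method names are illustrative only):

```java
// Sketch: count point-to-point links in a full mesh of data centres.
final class FullMeshLinks {
    // Each of the n data centres connects to the other n - 1,
    // and every link is shared by its two endpoints.
    static int links(int dataCentres) {
        if (dataCentres < 0) {
            throw new IllegalArgumentException("data centre count must not be negative");
        }
        return dataCentres * (dataCentres - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.println(n + " data centres -> " + links(n) + " links");
        }
    }
}
```

The link count grows quadratically, which is why the VPN approach becomes harder to manage as data centres are added.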

Alternatives
• Network Address Translation (NAT) 
(aka IP Masquerading or Port Address Translation (PAT))
• Deployed on most private networks
• Connectivity between private network clusters ⇄ Cloud clusters
• Supports client connectivity to multiple clusters

NAT Basics
• Re-maps IP address spaces 
(e.g. Public 96.31.81.80 ↔ Private 192.168.*.*)
• 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 
(where often 𝑛 = 1, 𝑥 > 𝑛)
• Port Address Translation
• Private port ↔ Public port
• Outbound connections only without port forwarding or NAT traversal
• Per DC gateway device — performs NAT and port forwarding

NAT with Inbound Connections
• Static port forwarding 
(configured on the gateway)
• Automatic port forwarding: UPnP, NAT-PMP/PCP 
(configured by the application, e.g. Cassandra)
• NAT Traversal — STUN, ICE, etc.

NAT + C∗
Situation: 𝑛 Cassandra nodes, 1 public address per data centre
• Port forward different public ports for each node
• Advertise assigned ports
• Modify Cassandra and client applications to connect to
advertised ports

Advertising Port Mappings
• Extend Cassandra Gossip
• Include port numbers in node address announcements
• Allow seed node addresses to include port numbers
• Allow multiple nodes to have identical public & private addresses 
(only port numbers differ per DC)
• How to bootstrap? SIP?
• Cassandra must be aware of the allocated ports in order to advertise
• Hard if C* is not directly responsible for the port mapping 
(e.g. static port forwarding)
• Too many modifications to internals

• DNS-SD — dns-sd.org 
(aka Bonjour/Zeroconf)
• Reads — works with existing DNS implementations 
(it’s just a DNS query)
• Even inside restrictive networks, DNS usually works
• Combination of DNS TXT, SRV and PTR records.
• Updates
• via DNS Update & TSIG (supported by BIND)
• via API — e.g. for AWS Route 53

• DNS-SD cont’d.
• SRV records contain hostname and port 
(i.e., hostname of the NAT gateway and public C* port)
• TXT records contain key=value pairs 
(useful for additional connection & config details)
• Modify C* connection code to look up foreign node port from DNS
• Modify client driver connection code to look up ports from DNS
• Can be queried & updated out-of-band 
(updated by the NAT device or central management server which knows which ports were mapped)
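
Since reading a DNS-SD record is just a DNS query, it can be done from plain Java. The following is a minimal sketch, not the implementation used in the talk: it assumes the JDK's bundled JNDI DNS provider is available and that SRV data arrives in the usual "priority weight port target" text form. The class and method names (DnsSdRecords, parseSrv, parseTxt, lookupSrv) are hypothetical.

```java
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.InitialDirContext;

final class DnsSdRecords {
    /** Host and port carried by an SRV record (here: NAT gateway and public C* port). */
    static final class Target {
        final String host;
        final int port;
        Target(String host, int port) { this.host = host; this.port = port; }
    }

    /** Parses SRV record data in text form: "priority weight port target". */
    static Target parseSrv(String rdata) {
        String[] parts = rdata.trim().split("\\s+");
        if (parts.length != 4) {
            throw new IllegalArgumentException("unexpected SRV data: " + rdata);
        }
        // Drop the trailing root dot of a fully qualified target name, if present.
        String host = parts[3].endsWith(".")
                ? parts[3].substring(0, parts[3].length() - 1)
                : parts[3];
        return new Target(host, Integer.parseInt(parts[2]));
    }

    /** Collects key=value TXT strings into a map; strings without a key are skipped. */
    static Map<String, String> parseTxt(List<String> strings) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String s : strings) {
            int eq = s.indexOf('=');
            if (eq > 0) {
                out.put(s.substring(0, eq), s.substring(eq + 1));
            }
        }
        return out;
    }

    /** Asks the system's configured DNS servers for the SRV record of a service name. */
    static Target lookupSrv(String serviceName) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        InitialDirContext ctx = new InitialDirContext(env);
        try {
            Attribute srv = ctx.getAttributes(serviceName, new String[] {"SRV"}).get("SRV");
            if (srv == null) {
                throw new NamingException("no SRV record for " + serviceName);
            }
            return parseSrv(srv.get().toString());
        } finally {
            ctx.close();
        }
    }
}
```

Keeping the parsing separate from the network lookup makes it possible to check the record handling without a live DNS zone.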

Advertised Details
• Each cluster is its own browse domain
• Each NAT gateway device has an A record in the browse domain
• Each DNS-SD service is named based on the private IP address
• Requires unique private IP addresses across data centres
• SRV port is the C* Thrift port
• Additional ports are advertised via TXT
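
The naming scheme above can be sketched as a small helper that turns a private IPv4 address into a DNS-SD instance name and a fully qualified service name. This assumes the dash-separated form seen in instance names such as 192-168-1-4 and the _cassandra._tcp service type. The class and method names are hypothetical, and the browse domain in the example is a placeholder.

```java
// Sketch: derive DNS-SD names from a node's private IPv4 address.
final class ServiceNames {
    static final String SERVICE_TYPE = "_cassandra._tcp";

    /** "192.168.1.4" becomes "192-168-1-4" (dots are not usable inside a single label). */
    static String instanceName(String privateIpv4) {
        return privateIpv4.replace('.', '-');
    }

    /** Builds "<instance>._cassandra._tcp.<browse domain>." */
    static String serviceName(String privateIpv4, String browseDomain) {
        String domain = browseDomain.endsWith(".") ? browseDomain : browseDomain + ".";
        return instanceName(privateIpv4) + "." + SERVICE_TYPE + "." + domain;
    }

    public static void main(String[] args) {
        System.out.println(serviceName("192.168.1.4", "clusters.example.invalid"));
    }
}
```

Because the instance name is derived only from the private address, two nodes in different data centres with the same private IP would collide, which is why unique private addresses are required.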

Configuration
• Cassandra is configured to only use private addresses
• On cluster creation
• Establish a new DNS-SD browse domain
• Create A records for each gateway device
• NAT gateway device is notified when a new C* node is started
• Allocates random public ports for C* and configures Port Forwarding
• Updates DNS-SD
• New SRV and TXT record
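
The port allocation step on the gateway might look like the sketch below. It is an illustration under stated assumptions, not the gateway's actual logic: the port range (49152 to 65535), the keying by private IP plus service name, and the class name PortAllocator are all hypothetical. Configuring the port forward and publishing the SRV and TXT records are left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

// Sketch: hand out a distinct random public port per (node, service) pair.
final class PortAllocator {
    // Assumed range; the real gateway may use any free ports.
    private static final int MIN_PORT = 49152;
    private static final int MAX_PORT = 65535;

    private final Random random;
    private final Set<Integer> inUse = new HashSet<>();
    private final Map<String, Integer> mappings = new HashMap<>();

    PortAllocator(Random random) {
        this.random = random;
    }

    /** Returns the public port for this node and service, allocating one on first use. */
    synchronized int allocate(String privateIp, String service) {
        String key = privateIp + "/" + service;
        Integer existing = mappings.get(key);
        if (existing != null) {
            return existing;
        }
        if (inUse.size() == MAX_PORT - MIN_PORT + 1) {
            throw new IllegalStateException("no free public ports left");
        }
        int port;
        do {
            port = MIN_PORT + random.nextInt(MAX_PORT - MIN_PORT + 1);
        } while (!inUse.add(port)); // retry until an unused port is drawn
        mappings.put(key, port);
        return port;
    }
}
```

The returned port is what would be written into the node's SRV record (for the main port) or a TXT key such as cqlport (for additional ports).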

$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Browsing for _cassandra._tcp
A/R Flags if Domain Service Type Instance Name
Add 3 0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-4
$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
version=2.0.7
cqlport=1237
$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Non-authoritative answer:
Name: aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
Address: 54.209.123.195
Output of dns-sd 
(Can also use avahi-browse, dig, or any other DNS query tool)

Java Driver Modifications
• AddressTranslater.translate() is usually a no-op 
(the default is IdentityTranslater)
• Modify translate() to perform a DNS-SD lookup.
• The address parameter is a node private IP address.
• Locate a service with a name = private IP address to determine
public IP/port.
public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}
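
A translate() implementation along these lines could look like the sketch below. To keep it self-contained, it declares a local stand-in for the AddressTranslater interface shown above rather than importing the driver, and it takes the DNS-SD lookup as a function argument. DnsSdAddressTranslater and the lookup function are hypothetical names, and falling back to the untranslated address when no record exists is an assumption.

```java
import java.net.InetSocketAddress;
import java.util.function.Function;

// Local stand-in mirroring the driver interface shown above.
interface AddressTranslater {
    InetSocketAddress translate(InetSocketAddress address);
}

// Sketch: map a node's private address to its public gateway address and port.
final class DnsSdAddressTranslater implements AddressTranslater {
    // Takes a DNS-SD instance name (e.g. "192-168-1-4") and returns the
    // public host/port, or null when no service is registered under that name.
    private final Function<String, InetSocketAddress> lookup;

    DnsSdAddressTranslater(Function<String, InetSocketAddress> lookup) {
        this.lookup = lookup;
    }

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        String privateIp = address.getAddress() != null
                ? address.getAddress().getHostAddress()
                : address.getHostString();
        InetSocketAddress mapped = lookup.apply(privateIp.replace('.', '-'));
        // Assumption: with no DNS-SD entry, behave like the identity translater.
        return mapped != null ? mapped : address;
    }
}
```

In a real deployment the lookup function would issue the SRV/TXT query and pick the CQL port from the TXT record, since the SRV port is the Thrift port.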

Modifying Cassandra
• OutboundTcpConnectionPool is responsible for managing Socket connections.
• Modify newSocket() to perform a DNS-SD lookup.
• The endpoint parameter is a node private IP address.
• Locate a service with a name = private IP address to determine
public IP/port
public class OutboundTcpConnectionPool
{
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}
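
One way the modified newSocket() might decide where to connect is sketched below. This is not Cassandra's code: OutboundEndpointResolver, the local-DC predicate, the lookup function, and the 2000 ms connect timeout are hypothetical, and connecting directly to same-DC nodes on the default storage port (7000) is an assumption based on private addresses being used for intra-data centre traffic.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.Function;
import java.util.function.Predicate;

// Sketch: choose a direct or NAT-translated target for an outbound connection.
final class OutboundEndpointResolver {
    static final int DEFAULT_STORAGE_PORT = 7000;

    private final Predicate<InetAddress> isLocalDc;
    private final Function<InetAddress, InetSocketAddress> dnsSdLookup;

    OutboundEndpointResolver(Predicate<InetAddress> isLocalDc,
                             Function<InetAddress, InetSocketAddress> dnsSdLookup) {
        this.isLocalDc = isLocalDc;
        this.dnsSdLookup = dnsSdLookup;
    }

    /** Same-DC nodes are reached directly; foreign nodes via their gateway mapping. */
    InetSocketAddress resolve(InetAddress endpoint) {
        if (isLocalDc.test(endpoint)) {
            return new InetSocketAddress(endpoint, DEFAULT_STORAGE_PORT);
        }
        InetSocketAddress mapped = dnsSdLookup.apply(endpoint);
        if (mapped == null) {
            throw new IllegalStateException("no DNS-SD mapping for " + endpoint.getHostAddress());
        }
        return mapped;
    }

    /** Opens a socket to the resolved target. */
    Socket newSocket(InetAddress endpoint) throws IOException {
        InetSocketAddress target = resolve(endpoint);
        Socket socket = new Socket();
        // Re-create the address so that an unresolved gateway hostname gets resolved.
        socket.connect(new InetSocketAddress(target.getHostString(), target.getPort()), 2000);
        return socket;
    }
}
```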

[Diagram: two groups of three C* nodes, each behind a NAT Gateway; a DNS (+ DNS-SD) Server (Route 53, self-hosted, etc.); a Client Application]

Thanks!
Questions?
adam@instaclustr.com
