Leveraging Docker and CoreOS to provide always available Cassandra at Instaclustr
Adam Zegelin
Founding Software Engineer & Co-founder of Instaclustr
adam@instaclustr.com ∙ @zegelin
Instaclustr
• Managed Apache Cassandra and DataStax Enterprise in the ☁ (AWS, Azure,
GCP, SoftLayer)
• Self-service dashboard — create, manage & monitor clusters
• Grew from a need for Cassandra in a project
• No one on the market offered what we wanted
• One service existed, but ran C* behind an HTTP/JSON API (SLOW!)
• Stopped the project, turned the ship around and sailed in a different direction
Ubuntu — The Early Years
• Initially we ran a custom Ubuntu AMI (Amazon Machine Image)
• Based on stock Ubuntu AMI
• Custom cloud-init scripts — RAID disks, fetch config, etc.
• Cassandra installed with apt-get install cassandra / dse
AWS
• We use instance-store-backed AWS instances
• Instance storage is fast (SSDs) and low latency (local disk) but is volatile: terminate the machine and it’s gone!
• The alternative, EBS (Elastic Block Store), is basically a SAN: slower, higher latency, and it shares the instance’s network bandwidth
• Only way to change AMIs is to start a new machine
• Not possible to use immutable images with persistent ephemeral data
• Only feasible solution for updates is apt-get install
CoreOS
• One of the first “Docker Operating Systems”
• Small and minimalist, not much userland (not even man, gah!)
• Other useful software: etcd, fleet, etc. (we currently don’t use them, but may in the future)
• In use by some big players (Rackspace, PlayStation, Instaclustr 😀)
• Recent funding from Google Ventures
• Available on GCP (Google Cloud Platform); oddly, Ubuntu wasn’t (huh?)
• Runs systemd (vs. Ubuntu’s at-the-time upstart) & dbus (more on this later)
CoreOS cont’d
• CoreOS is responsible for building images for AWS, Azure, GCP, etc., which is one less step in our build process
• In-place updates and rollback on failure
• 2 system partitions, USR-A and USR-B
• One is flagged active, the other inactive
• Updates are installed to the inactive partition and the active flags are swapped
• Failed updates are rolled back by swapping the active flag
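The two-partition update and rollback scheme can be sketched as a toy model. This is only an illustration of the flag-swapping idea from the slide, not CoreOS's updater; the class name, method names and version strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ABUpdater:
    """Toy model of a two-partition (USR-A / USR-B) update scheme."""
    versions: dict = field(default_factory=lambda: {"USR-A": "v1", "USR-B": None})
    active: str = "USR-A"

    @property
    def inactive(self) -> str:
        return "USR-B" if self.active == "USR-A" else "USR-A"

    def install(self, version: str) -> None:
        # The new version is written to the inactive partition only;
        # the running system on the active partition is untouched.
        self.versions[self.inactive] = version

    def swap(self) -> None:
        # Flip which partition is flagged active.
        self.active = self.inactive

    def update(self, version: str, boots_ok: bool) -> str:
        """Install, swap, and swap back if the new version fails."""
        self.install(version)
        self.swap()
        if not boots_ok:
            # Rollback is just swapping the active flag again.
            self.swap()
        return self.versions[self.active]


updater = ABUpdater()
print(updater.update("v2", boots_ok=True))   # now running from USR-B
print(updater.update("v3", boots_ok=False))  # failed update, still on USR-B
```

Because the old partition is never overwritten during an update, the previous system is always available to fall back to.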
Docker
• Container runtime + standardised image distribution & hosting + ecosystem
• Private image hosting options available, such as quay.io
• Immutable images — Yay! 🎉
• Images running in dev, test and production environments are equal
• Software installs, upgrades and uninstalls are clean
• Components are isolated — potentially conflicting components (different library
versions, JVM versions, etc.) can co-exist
• Even different userland layouts (Ubuntu, Debian, CentOS, etc)
Docker + CoreOS
• Docker gives us immutable images for our components without
instance replacement
• CoreOS handles the rest (OS-level) via in-place updates
• Docker is provider agnostic
• CoreOS runs on all major cloud providers and bare-metal
• Instaclustr-managed C* can run anywhere
Integration
• Cassandra data and configuration is persistent
• Survives container restart
• Cassandra data and configuration directories mounted from host

-v /var/lib/instaclustr/etc/cassandra:/etc/cassandra …
• We containerise everything — internal services, node management and monitoring
apps, and C*
• Single, well-understood image build and deploy process: docker build & docker push (psst! We script that via Makefiles, one target per image)
• Helps that all our internal apps are Java-based too
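As a rough sketch of the host-mount idea, the helper below assembles a `docker run` command line with bind mounts. The configuration mount is the one shown on the slide; the image name, tag, container name and the helper itself are hypothetical, and the remaining arguments elided on the slide are left out.

```python
def docker_run_args(image, tag, mounts, name):
    """Build an argv list for `docker run` with host directories bind-mounted."""
    args = ["docker", "run", "--detach", "--name", name]
    for host_dir, container_dir in mounts:
        # -v HOST:CONTAINER keeps the files on the host, so they survive
        # container restarts and replacement.
        args += ["-v", f"{host_dir}:{container_dir}"]
    args.append(f"{image}:{tag}")
    return args


# Image name, tag and container name are placeholders for illustration.
args = docker_run_args(
    image="apache-cassandra",
    tag="2.1.x",
    mounts=[("/var/lib/instaclustr/etc/cassandra", "/etc/cassandra")],
    name="cassandra",
)
print(" ".join(args))
```

Swapping the tag while keeping the same mounts is what lets a new image version pick up the existing data and configuration.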
[Diagram: Docker image layers]
Images: debian:jessie, common-base (common/utility packages: python, openssl, curl, bzip, etc.), base-openjdk, base-oraclejdk, instaclustr apps, cassandra-common, apache-cassandra, dse-cassandra
Layer sizes shown: ~120MB, ~100MB, ~300MB, ~100MB, ~20KB, ~300MB, ~40MB
Label: Embedded in AMI
DataStax OpsCenter
• 1 instance per cluster
• Accessible by users via our dashboard
• Segregated for security
• Hosted independently of the cluster
• 1 instance = 1 Docker container
• Multiple instances per host = cost effective
Cassandra Versioning
• We support multiple versions of Cassandra
• 2.0.x vs. 2.1.x
• Apache (ASF) vs. DataStax Enterprise
• Rollback for when new versions have serious bugs
• 1 docker image per C* distribution (ASF/DSE), 1 tag per version (e.g., 2.1.x)
• vs. one machine image per distribution version × provider region (e.g., on AWS, one C* version = 9 images, one per region)
• We currently support 2 distributions, with a total of 13 versions between them, on 3 providers, with a total of 29 regions (each requiring a separate image)
• 13 versions × 29 provider regions = 377 images! 😳
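The image-count comparison works out as follows. The numbers are taken from the slide; the variable names are only for illustration.

```python
versions = 13          # total C* versions across both distributions
provider_regions = 29  # regions across the 3 providers
distributions = 2      # ASF and DSE

# Machine-image approach: one image per version per provider region.
machine_images = versions * provider_regions

# Docker approach: one image per distribution, one tag per version.
docker_images = distributions
docker_tags = versions

print(machine_images)              # 377
print(docker_images, docker_tags)  # 2 13
```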
Versioning cont’d
• Every Instaclustr cluster has a specific C* version
• Selected by user at creation time
• Version = C* version + distribution (ASF/DSE)
• New & replaced nodes run the exact same version
• Known, sane configuration on every node cluster wide
Update Rollout
• Build docker image for new Cassandra version
• Deploy to our testing environments
• Perform clean installs and rolling upgrades of test clusters to verify reliability
• Enable in production to select customers (or internal support) for field testing
• Make generally available
• New clusters will run new version by default
• Liaise with customers to perform a rolling, cluster-wide upgrade
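A rolling, cluster-wide upgrade can be sketched as a loop that moves one node at a time to the new version. This is a simulation over an in-memory dict, not Instaclustr's tooling; the function, the health-check callback and the halt-on-first-failure policy are all assumptions made for illustration.

```python
def rolling_upgrade(nodes, new_version, is_healthy):
    """Upgrade nodes one at a time; stop at the first unhealthy node.

    nodes: dict of node name -> current version (mutated in place)
    is_healthy: callback(node, version) -> bool, a stand-in for a real check
    Returns the list of nodes that were upgraded successfully.
    """
    upgraded = []
    for node in sorted(nodes):
        previous = nodes[node]
        nodes[node] = new_version       # e.g. restart the container on the new tag
        if not is_healthy(node, new_version):
            nodes[node] = previous      # roll this node back and halt the rollout
            break
        upgraded.append(node)
    return upgraded


cluster = {"node1": "2.0.13", "node2": "2.0.13", "node3": "2.0.13"}
done = rolling_upgrade(cluster, "2.0.14", lambda node, version: True)
print(done, cluster)
```

Upgrading one node at a time keeps the rest of the cluster serving requests while each node restarts.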
[Diagram: upgrade paths compared, version by version]
apt-get: install 2.0.11, install 2.0.12, install 2.0.13
docker: docker run cas:2.0.10, cas:2.0.11, cas:2.0.12, cas:2.0.13
AMI: build ami 2.0.10, 2.0.11, 2.0.12, 2.0.13
apt-get: install 2.0.12, install 2.0.13
Docker:
docker run cassandra:2.0.9
docker run cassandra:2.0.10
docker run cassandra:2.0.14
😎 🍺 🎉

apt-get:
apt-get install cassandra=2.0.9
apt-get install cassandra=2.0.10
apt-get install cassandra=2.0.14
(hm, the 2.0.14 package changes a few things, and now there is junk left over from the 2.0.10 install, plus conflicts)
rm …; vim …
😫
Docker:
docker run cassandra:2.0.9
docker run cassandra:2.0.10
(oops, botched update)
docker run cassandra:2.0.9
(rollback!)
😎 🍺 🎉

apt-get:
apt-get install cassandra=2.0.9
apt-get install cassandra=2.0.10
(hm, now C* doesn’t start)
apt-get purge cassandra
apt-get autoremove --purge
(hope this fixes everything)
apt-get install cassandra=2.0.9
(ah crap, the package doesn’t exist any more)
😫
systemd
• CoreOS uses systemd for service management
• systemd supports inter-service dependencies (of course!)
• e.g. snapshotd.service requires cassandra.service
• aka, snapshotd only runs when cassandra is running
• systemd automatically restarts services
• Our services are fail-fast
• Cassandra not so much — in some cases
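The dependency and restart behaviour described above can be expressed in a unit file. The sketch below renders a minimal one as text; `Requires=`, `After=` and `Restart=` are standard systemd directives, while the description, the command line and the helper function are hypothetical.

```python
def render_unit(description, requires, exec_start):
    """Render a minimal systemd service unit as text."""
    lines = [
        "[Unit]",
        f"Description={description}",
        # Requires= pulls the other unit in; After= orders startup behind it.
        f"Requires={requires}",
        f"After={requires}",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        # Restart the service whenever it exits, which suits fail-fast services.
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


unit_text = render_unit(
    description="Snapshot daemon (hypothetical)",
    requires="cassandra.service",
    exec_start="/usr/bin/docker run --rm snapshotd",  # hypothetical command
)
print(unit_text)
```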
dbus
• RPC between applications/services
• Notifications
• Socket-based (typically UNIX sockets)
• Multiple language bindings, including Java
• systemd is controllable via dbus
• Control host systemd from inside a Docker container
• No need to fork/exec to run systemctl and co. (in fact, systemctl is a wrapper around dbus calls)
dbus-java
systemctl restart cassandra ➫ systemdManager.RestartUnit("cassandra.service", "replace")
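To make the underlying D-Bus call visible, the sketch below builds the equivalent `busctl` command line for `RestartUnit` on systemd's manager interface. Shelling out to `busctl` is a stand-in chosen for illustration and gives up the no-fork benefit of a native binding such as dbus-java; the helper names are hypothetical.

```python
import subprocess

SYSTEMD_DEST = "org.freedesktop.systemd1"
SYSTEMD_PATH = "/org/freedesktop/systemd1"
SYSTEMD_MANAGER = "org.freedesktop.systemd1.Manager"


def restart_unit_argv(unit, mode="replace"):
    """argv for calling Manager.RestartUnit(name, mode) on the system bus."""
    return [
        "busctl", "--system", "call",
        SYSTEMD_DEST, SYSTEMD_PATH, SYSTEMD_MANAGER,
        "RestartUnit",
        "ss",  # D-Bus signature: two string arguments
        unit, mode,
    ]


def restart_unit(unit):
    # Needs access to the host's system bus socket and sufficient privileges.
    subprocess.run(restart_unit_argv(unit), check=True)


print(" ".join(restart_unit_argv("cassandra.service")))
```

From inside a container this only works if the host's system bus socket has been made available to the container.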
systemd + dbus + C*
• Service status = “active” — process running, or something more?
• Cassandra java process running vs. C* accepting CQL connections
• systemd dependencies start when required units become active
• CQL clients are dependencies, but shouldn’t start until CQL is available
• Small clients could fail-fast on no connectivity
• Larger, complex clients require a reconnect loop
• Cassandra is more than just CQL — Thrift + JMX too.
systemd + dbus + C* cont’d
• Notify systemd when Cassandra is accepting CQL connections
• Has to be done from the same process — systemd restriction
• Java agent (java -javaagent:agent.jar …) is best
• Agent attempts connections to the CQL port. When successful, it notifies systemd via dbus
• No code modification. Works with DSE
• Timeout issues when C* bootstrap takes longer. Set TimeoutStartSec=0 (disables the start timeout)
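The readiness notification itself can be sketched as below. Note the swap: the slide describes the agent notifying systemd via dbus, whereas this sketch uses systemd's `sd_notify` datagram protocol (`READY=1` sent to the socket named in `$NOTIFY_SOCKET`, for units with `Type=notify`). It is an assumption about one possible mechanism, written in Python rather than as a Java agent, and is not the agent's actual code.

```python
import os
import socket


def sd_notify(state="READY=1"):
    """Send a state string to the socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset (not started by systemd
    as a notify-type service), True once the datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = b"\0" + path[1:].encode()
    else:
        address = path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), address)
    return True
```

A caller would invoke `sd_notify()` only after a connection to the CQL port has succeeded, so that dependent units start once CQL is really available.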
systemd + dbus + C* cont’d
• Simple service cassandra-cql inserted in the dependency chain
• Simple tool that watches a port for connectivity
• Active when connection succeeds
• Exits/inactive if connection fails or drops
• Shift the Java agent logic here
• Works for multiple ports — Thrift, JMX
Dependency chain: cassandra.service ➫ cassandra-cql.service ➫ client-app.service
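A port-watching tool of the kind described for cassandra-cql might look like the sketch below. It is a minimal illustration in Python, not the actual tool; the function names and the timing defaults are hypothetical.

```python
import socket
import time


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, attempts=30, delay=1.0):
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False


def watch(host, port, interval=5.0):
    """Block while the port stays reachable; return once it drops."""
    while port_open(host, port):
        time.sleep(interval)
```

A service wrapper could call `wait_for_port("127.0.0.1", 9042)` (9042 is Cassandra's default CQL port), report itself active on success, then run `watch(...)` and exit when the connection drops. The same functions work for the Thrift and JMX ports.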
Thanks
Questions?
Adam Zegelin
VP of Engineering & Co-founder of Instaclustr
adam@instaclustr.com ∙ @zegelin
