At Athena Health, we are creating a new performance management application for our clients, and one of its key components is Apache Druid. Since we are deploying this new application in the cloud, we needed an automated, CI/CD-based approach to create, update, and delete Druid clusters, and to scale individual node groups within a cluster based on expected load. In this talk, we go over how we implemented this process on AWS, using Terraform to deploy and update clusters within minutes.
Watch video: https://imply.io/virtual-druid-summit/automating-ci-cd-for-druid-clusters-at-athena-health
Automating CI/CD for Druid Clusters at Athena Health
1. Automating CI/CD for Druid clusters at Athena Health
April 2020
Shyam Mudambi, Sr. Architect, Athena intelligence, Athenahealth
Karthik Urs, Lead MTS, Athena intelligence, Athenahealth
Ramesh Kempanna, Principal MTS, Athena intelligence, Athenahealth
2. Overview
● Goals
● Druid architecture at Athena
● Why Terraform?
● Athena’s CI/CD processes
● Creating a Druid cluster
● Deployment demo
● Scale up and down example
● Conclusions and next steps
● Questions
3. Druid at Athena
● Druid will power a new self-service analytics environment
● Key features that led us to Druid
• Low latency – sub-second response on large datasets
• Horizontally scalable – supports hundreds of sessions in parallel
• Standard OLAP support – rollups on ingestion
• Built-in time series support – pros & cons
● Snowflake – low latency/high concurrency is not its sweet spot
● Cassandra – high dimensionality with many different query patterns
6. Motivation to automate
● A volatile environment as we are still in development
• A lot of build/destroy of Druid clusters
● Scaling up/down clusters involved a lot of (semi) manual work
• Tuning JVM for each type of machine
• Setting up and managing file systems for data & logs
● Governance around configuration changes
• Security groups
• Machine instance changes
● Monitoring/alerting capabilities
7. Why Terraform – Pros and Cons
● Terraform
• Declarative - separates specification from execution
• Support - large community support
• Multi-provider support in a single stack
• Composition – easy to incorporate existing stacks
• Modularity - robust module system for reusable code
• No lag between AWS rollout and Terraform parity
● State Management - Utility
• Buddy
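The "declarative" point on the slide above can be illustrated with a small sketch: the desired state is written down as data, and a separate step works out what has to change. This is only an illustration of the idea in Python, not how Terraform is implemented; the function name `plan_changes`, the node group names, and the counts are all hypothetical.

```python
from typing import Dict, List, Tuple


def plan_changes(current: Dict[str, int],
                 desired: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Compare current and desired node counts and return the actions needed.

    Each action is (verb, node_group, count). Nothing is executed here;
    the plan only describes what would change.
    """
    actions = []
    for group in sorted(set(current) | set(desired)):
        have = current.get(group, 0)
        want = desired.get(group, 0)
        if want > have:
            actions.append(("add", group, want - have))
        elif want < have:
            actions.append(("remove", group, have - want))
    return actions


# Hypothetical node groups and counts, for illustration only.
current_state = {"historical": 3, "middlemanager": 2, "query": 2}
desired_state = {"historical": 5, "middlemanager": 2, "query": 1}

plan = plan_changes(current_state, desired_state)
for verb, group, count in plan:
    print(f"{verb} {count} node(s) in group '{group}'")
```

Because the specification (`desired_state`) is kept apart from the execution step, scaling a node group up or down comes down to editing one number and re-running the plan.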
9. Creating a Druid cluster
[Diagram: components of the cluster creation flow]
● Jenkins – Create Env
● Config uploader
• tar.gz of all Druid service config
● S3 bucket
● PostgreSQL (RDS) – ~7-8 mins
● Lambda – create user and DB in PostgreSQL – 2 mins
● Zookeeper cluster
● Master cluster
● Query cluster
● Historical cluster
● MiddleManager cluster
● ALB
• Router service
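One way to reason about the creation flow on the slide above is as a dependency graph that is resolved into a build order. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The component names come from the slide, but the dependency edges are assumptions made for illustration, since the arrows of the original diagram are not available here.

```python
from graphlib import TopologicalSorter

# Map each step to the steps it is ASSUMED to depend on.
# The edges are an illustrative guess, not taken from the slide.
dependencies = {
    "postgresql_rds": set(),
    "config_upload_to_s3": set(),
    "zookeeper_cluster": set(),
    "lambda_create_user_and_db": {"postgresql_rds"},
    "master_cluster": {
        "lambda_create_user_and_db",
        "zookeeper_cluster",
        "config_upload_to_s3",
    },
    "historical_cluster": {"master_cluster"},
    "middlemanager_cluster": {"master_cluster"},
    "query_cluster": {"master_cluster"},
    "alb_router": {"query_cluster"},
}


def creation_order(deps):
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = creation_order(dependencies)
for position, step in enumerate(order, start=1):
    print(position, step)
```

Steps with no dependency between them (for example the Zookeeper cluster and the RDS instance under these assumed edges) could be created in parallel, which is the kind of ordering a tool like Terraform derives from resource references.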
10. Dissection of the Druid service instance creation
● CloudInit
• Download OS dependency packages
• Based on instance type + service
• Disk setup: log volume, data volume
• Format partition: log partition, data partition
• Mount partition: log directory, data directory
● Bootstrap bash script
• Download Druid binaries
• Download config files from S3 bucket
• Replace config file + supervise scripts
• Config file update based on resource limits: based on CPU cores, RAM size, and data volume
• Install + config Filebeat log forwarder & Prometheus
• Based on service name, start multiple services via supervise scripts
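The "config file update based on resource limits" step can be sketched as a function that derives JVM flags and Druid runtime properties from the instance's CPU cores, RAM, and data volume size. This is a minimal sketch, not the speakers' bootstrap script: the function name, the heap fraction, the 500 MiB buffer size, and the 80% disk fraction are assumptions. The property names and the direct-memory formula follow the Druid configuration documentation, and `druid.server.maxSize` applies to Historical nodes.

```python
def druid_runtime_settings(cpu_cores: int, ram_gib: int, data_volume_gib: int) -> dict:
    """Derive JVM flags and Druid properties from instance resources.

    All ratios below are illustrative assumptions, not tuned values.
    """
    # Assumed: one quarter of RAM for the heap, capped at 24 GiB.
    heap_gib = max(1, min(ram_gib // 4, 24))

    # Druid's documented defaults: cores - 1 processing threads,
    # and max(2, numThreads / 4) merge buffers.
    num_threads = max(1, cpu_cores - 1)
    num_merge_buffers = max(2, num_threads // 4)

    # Assumed processing buffer size of 500 MiB.
    buffer_bytes = 500 * 1024 * 1024

    # Documented minimum direct memory:
    # (numThreads + numMergeBuffers + 1) * buffer.sizeBytes
    direct_mib = (num_threads + num_merge_buffers + 1) * buffer_bytes // (1024 * 1024)

    # Assumed: leave 20% of the data volume free for headroom.
    max_size_bytes = int(data_volume_gib * 0.8) * 1024 ** 3

    return {
        "jvm": [
            f"-Xms{heap_gib}g",
            f"-Xmx{heap_gib}g",
            f"-XX:MaxDirectMemorySize={direct_mib}m",
        ],
        "runtime.properties": {
            "druid.processing.numThreads": num_threads,
            "druid.processing.numMergeBuffers": num_merge_buffers,
            "druid.processing.buffer.sizeBytes": buffer_bytes,
            "druid.server.maxSize": max_size_bytes,
        },
    }


# Hypothetical instance: 8 cores, 64 GiB RAM, 1000 GiB data volume.
settings = druid_runtime_settings(cpu_cores=8, ram_gib=64, data_volume_gib=1000)
print(" ".join(settings["jvm"]))
for key, value in settings["runtime.properties"].items():
    print(f"{key}={value}")
```

Computing these values at boot time means the same config templates can be reused when a node group is moved to a different instance type.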
13. Time for questions
Apache Druid is an independent project of The Apache Software Foundation. More information can be found at https://druid.apache.org.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
14. Register now for Druid Summit
November 2-4, 2020
San Francisco, CA
druidsummit.org