Apache Druid on Kubernetes
Apache Druid Database Overview
Kubernetes & Helm Charts
Apache Druid’s Helm Chart
Overview
Scaling Up and Down
Auto-Scaling Ingestion
What it Doesn’t (yet) Do
What is Apache Druid?
It is a database that is:
● Fully scalable
● Batch and real-time data
● Ad-hoc statistical queries
● Low latency delivery
(Diagram: three groups of capabilities.)
● log search: real-time ingest, flexible schema, text search
● timeseries: low latency ingest, time-based storage, time functions
● columnar: efficient storage, fast analytic queries, data distribution
High Performance Real-time Analytics
Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
The Architecture
The Druid Architecture
Overview & High Availability
(Diagram: Druid services in three groups, each shown with multiple instances.)
● Query Services: router, broker
● Data Services: middle-manager, historical
● Master Services: overlord, coordinator
● Deep Storage: HDFS; S3, GCP, Azure; local (test only)
The Druid Architecture
Data Ingestion Processing
(Diagram, built up over four slides: the same Query, Data and Master Services layout with Deep Storage. The builds add a REST API entry point, then Streaming data and Batch data inputs, then extra middle-managers alongside additional Streaming data inputs.)
The Druid Architecture
Data Management Processing
(Diagram: Query, Data and Master Services with Deep Storage, plus Streaming data and Batch data inputs.)
The Druid Architecture
Query Processing
(Diagram: Query, Data and Master Services with Deep Storage, Streaming data and Batch data inputs, and a REST API entry point.)
A Little History first… in their own words
Kubernetes
Kubernetes Cluster
High Level Functions
Kubernetes Control Plane
➔ Acquire/manage Nodes and Storage
➔ Accept new object requests
➔ Schedule and manage containers on Nodes
➔ Instantiate containers for object deployment
➔ Monitor object state
➔ Apply application policies
◆ Restart policy
◆ Upgrade
◆ Fault tolerance
(Diagram: one Kubernetes cluster hosting two namespaces. Every node runs an Operating System and a Container Runtime.)
namespace my-dev
● Dev Node: containers for zookeeper 2.1.4, postgresql 8.6.4, and druid 0.22.1 (coordinator, overlord, broker, router, historical, middle-manager)
namespace qa-test
● Master Nodes: containers for zookeeper 2.1.4, postgresql 8.6.4, and druid 0.22.1 (coordinator, overlord)
● Query Nodes: containers for druid 0.22.1 (router, broker)
● Data and Realtime Nodes: containers for druid 0.22.1 (historical, middle-manager)
Why Apache Druid on Kubernetes
Kubernetes provides Orchestration at Scale
● High Availability
○ Recovery - Actively monitors pods and restarts them when appropriate
○ AntiAffinity - Ensures no single point of failure by placing services on separate nodes
○ Persistent storage enables fast Historical recovery
● Scalability
○ Manage each component’s scale by changing one property
○ Autoscaling based on resource utilization
● Security
○ Encryption
○ Ingress control & network isolation
● Upgrades
○ Roll out changes automatically and with controlled disruption
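As a rough illustration of the AntiAffinity point on this slide, a pod template can ask the scheduler to keep replicas of the same service on different nodes. The fragment below is a minimal sketch using the standard Kubernetes podAntiAffinity fields. The labels app: druid and component: historical are assumptions made for illustration, not labels confirmed by the deck.

```yaml
# Fragment of a pod template spec (it sits under spec.template.spec of a workload).
affinity:
  podAntiAffinity:
    # Hard rule: never schedule two pods matching the selector onto the same node.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: druid             # hypothetical label
            component: historical  # hypothetical label
        topologyKey: kubernetes.io/hostname
```

A softer variant uses preferredDuringSchedulingIgnoredDuringExecution, which lets pods share a node when the cluster is too small to separate them.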
Helm Charts
A Parameterization of Complex Deployments
In general, a Helm chart is a set of templates that describe Kubernetes objects that, in turn, provide services & applications.
Apache Druid ® helm chart @ https://github.com/apache/druid/tree/master/helm/druid
- Dependencies - zookeeper, postgresql or mysql
- Templates for each microservice (historical, broker, middlemanager, etc.)
- Default values.yaml - these are the parameters for an installation.
Users override values to create different deployments with their own values.yaml:
  historical:
    replicaCount: 10 # scale of historical data
  middleManager:
    replicaCount: 6 # scale of real-time ingestion
Template Objects
Apache Druid Helm Chart
(Diagram: the Druid service layout of Query Services, Data Services and Master Services.)
● Deployment - manages a set of stateless pods and keeps them running
● Ingress - outside access
● Service - Logical persistent network access, HTTP(S) port
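To picture the Service bullet, here is a minimal sketch of a plain Kubernetes Service that gives a set of pods one stable address. The name druid-router follows the pod names shown later in the deck, 8888 is Druid's default router port, and the selector labels are assumptions made for illustration rather than the labels the chart actually renders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: druid-router
spec:
  type: ClusterIP            # reachable inside the cluster; an Ingress can expose it outside
  selector:
    app: druid               # hypothetical label
    component: router        # hypothetical label
  ports:
    - name: http
      port: 8888             # port clients connect to on the Service
      targetPort: 8888       # port the router container listens on
```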
Template Objects
Apache Druid Helm Chart
(Diagram: the Druid service layout of Query Services, Data Services and Master Services.)
● StatefulSet - local files hold intermediate ingestion files, so a StatefulSet helps jobs pick up where they left off.
● PodDisruptionBudget - determines how many pods can be offline at a time -> upgrades
● Service - Logical persistent access, HTTP(S) port
● Horizontal Pod Autoscaler - controls autoscaling
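The PodDisruptionBudget bullet can be sketched as a standalone manifest. This is only an illustration of the standard policy/v1 object, not the chart's own template: the name and the selector labels are hypothetical, and maxUnavailable: 1 is an example value.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: druid-middle-manager      # hypothetical name
spec:
  # During voluntary disruptions such as node drains or rolling upgrades,
  # allow at most one matching pod to be offline at a time.
  maxUnavailable: 1
  selector:
    matchLabels:
      app: druid                  # hypothetical label
      component: middle-manager   # hypothetical label
```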
Template Objects
Apache Druid Helm Chart
(Diagram: the Druid service layout of Query Services, Data Services and Master Services.)
● StatefulSet - persistent storage is extremely important at recovery time => very fast recovery.
● PodDisruptionBudget - determines how many pods can be offline at a time -> upgrades
● Service - Logical persistent access, HTTP(S) port
● No HPA - Autoscaling is not a good idea here.
A very simple example
How to Use - Druid Helm Chart
An example helps, so here is a vanilla deployment with the chart’s default values:
> git clone https://github.com/apache/druid
> cd druid
> helm dependency update helm/druid
> helm install a_druid helm/druid -n a_space --create-namespace
> kubectl get pods -n a_space
NAME READY STATUS RESTARTS AGE
druid-broker-744c5f46b7-5l4r7 1/1 Running 0 8m12s
druid-coordinator-7c79f9c6c9-9wlqk 1/1 Running 0 8m12s
druid-historical-0 1/1 Running 0 8m12s
druid-middle-manager-0 1/1 Running 0 8m12s
druid-postgresql-0 1/1 Running 0 8m12s
druid-router-84d7cc6d87-w546r 1/1 Running 0 8m12s
druid-zookeeper-0 1/1 Running 0 8m12s
druid-zookeeper-1 1/1 Running 0 7m41s
druid-zookeeper-2 1/1 Running 0 7m10s
A very simple example
How to Use - Druid Helm Chart
Create a change file like values_2_historicals.yaml:
  historical:
    replicaCount: 2 # scale of historical data
Best Practice (requires helm diff add-on):
> helm diff upgrade -C 2 a_druid helm/druid -n a_space -f values_2_historicals.yaml
reading three way merge from env
default, druid-historical, StatefulSet (apps) has changed:
...
  spec:
    serviceName: druid-historical
-   replicas: 1
+   replicas: 2
    selector:
      matchLabels:
A very simple example
How to Use - Druid Helm Chart
Apply the change:
> helm upgrade a_druid helm/druid -n a_space -f values_2_historicals.yaml
> kubectl get pods -n a_space
NAME READY STATUS RESTARTS AGE
druid-broker-744c5f46b7-5l4r7 1/1 Running 0 13m
druid-coordinator-7c79f9c6c9-9wlqk 1/1 Running 0 13m
druid-historical-0 1/1 Running 0 13m
druid-historical-1 0/1 Running 0 23s
druid-middle-manager-0 1/1 Running 0 13m
druid-postgresql-0 1/1 Running 0 13m
druid-router-84d7cc6d87-w546r 1/1 Running 0 13m
druid-zookeeper-0 1/1 Running 0 13m
druid-zookeeper-1 1/1 Running 0 13m
druid-zookeeper-2 1/1 Running 0 12m
Configuration with Helm Chart
(Diagram: the Druid service layout with its external dependencies.)
● Deep Storage: s3, local, hdfs
● Metadata DB: postgresql, mysql
my_values.yaml:
  configVars:
    druid_storage_type
(See also more properties @ https://druid.apache.org/docs/latest/configuration/index.html#deep-storage)
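To make the deep storage setting concrete, here is a sketch of a values override that points Druid at S3. It assumes the chart passes configVars entries to the Druid containers as environment variables, with underscores standing in for the dots of Druid property names (druid.storage.type, druid.storage.bucket, druid.storage.baseKey). The bucket and prefix are hypothetical, the extension list is only an example, and credentials are left out on purpose.

```yaml
configVars:
  # Assumption: setting the load list here replaces the chart default,
  # so every extension the cluster needs has to be listed again.
  druid_extensions_loadList: '["druid-s3-extensions", "postgresql-metadata-storage"]'
  druid_storage_type: s3
  druid_storage_bucket: my-druid-bucket    # hypothetical bucket name
  druid_storage_baseKey: druid/segments    # hypothetical key prefix
```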
Configuration with Helm Chart
my_values.yaml:
  configVars:
    druid_metadata_storage_type
    …connector_connectURI
    …connector_user
    …connector_password
Configuration with Helm Chart
(Diagram: the Druid service layout with Deep Storage on s3 and Metadata DB on postgresql.)
my_values.yaml:
  <service>:
    resources:
      requests:
        cpu: 250m
        memory: 1Gi
      limits:
        cpu: 1000m
        memory: 2Gi
(A great resource to determine good values is @ https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html)
Data Ingestion and Helm Chart
Data Ingestion Processing
(Diagram: the Druid service layout with Deep Storage and a REST API entry point.)
my_values.yaml:
  router:
    replicaCount: 2
    ingress:
      enabled: true
my_values.yaml:
  overlord:
    replicaCount: 2
  coordinator:
    replicaCount: 2
Data Ingestion and Helm Chart
Data Ingestion Processing
(Diagram: the Druid service layout with Deep Storage and a REST API entry point.)
my_values.yaml:
  middleManager:
    replicaCount: 2
    antiaffinity
    nodeSelector
    config:
      druid_indexer_runn…
      druid_indexer_fork…
Highly Available Data Ingestion
Data Ingestion Processing
(Diagram: the Druid service layout with Deep Storage, plus Streaming data and Batch data inputs.)
kafka_ingestion.json:
{
  …
  "ioConfig": {
    "taskCount": 2,
    "replicas": 2,
    "taskDuration": "PT1H"
  }
}
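With this ioConfig, the supervisor runs up to taskCount × replicas = 2 × 2 = 4 reading tasks, and the Druid Kafka ingestion documentation suggests leaving roughly twice that many task slots so that publishing tasks can finish while new reading tasks start. A sketch of matching middle-manager settings follows. It assumes the chart passes middleManager.config entries to Druid as environment variables, and druid_worker_capacity is the environment form of the druid.worker.capacity property. The numbers are illustrative.

```yaml
middleManager:
  replicaCount: 2            # two pods, so the two replica task sets can run on different workers
  config:
    # 2 pods x 4 slots = 8 task slots, about twice the 4 reading tasks
    druid_worker_capacity: "4"
```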
Data Ingestion and Helm Chart
Data Ingestion Processing
(Diagram: the Druid service layout with Deep Storage, Streaming data and Batch data inputs, and extra middle-managers alongside additional Streaming data inputs.)
my_values.yaml:
  middleManager:
    replicaCount: 6
my_values.yaml:
  middleManager:
    autoscaling:
      enabled: true
      minReplicas: 2
      maxReplicas: 6
      metrics:
        memory and cpu thresholds
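For a sense of what the autoscaling settings turn into, here is a sketch of a standalone HorizontalPodAutoscaler written against the standard autoscaling/v2 API. It is not the chart's rendered output: the target name druid-middle-manager is taken from the pod names shown earlier, and the 60% and 80% thresholds are hypothetical. Older API versions such as autoscaling/v2beta1 lay out the metric fields differently (targetAverageUtilization directly under resource), so the shape the chart expects depends on its version.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-middle-manager
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: druid-middle-manager
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # hypothetical threshold
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # hypothetical threshold
```

Utilization targets are measured against each container's resource requests, so the resources block shown earlier needs to be set for them to have any effect.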
Historicals and Helm Chart
Data Management Processing
(Diagram: the Druid service layout with Deep Storage, plus Streaming data and Batch data inputs.)
my_values.yaml:
  historical:
    replicaCount: 2
    antiaffinity
    nodeSelector
Query Processing & Helm Chart
(Diagram: the Druid service layout with Deep Storage, Streaming data and Batch data inputs, and a REST API entry point.)
my_values.yaml:
  broker:
    replicaCount: 3
    antiaffinity
    nodeSelector
Summarizing
● Apache Druid is a real-time OLAP database
● Kubernetes makes deploying and managing the database easier
○ Increased availability (monitored, auto-recovered, persistent)
○ Better RTO and RPO
○ Autoscaled components for ingestion and real-time query
● helm install makes it easy to deploy many different configurations:
○ Create and manage different values.yaml for each config:
■ dev-min-cluster.yaml
■ qa-ha-cluster.yaml
■ prod-ha-cluster-autoscaling.yaml
● Changes to the configs can be applied live
○ helm diff and helm upgrade
● Not just scaling
● Rolling upgrades too
What doesn’t it do?
● Metrics configuration - enable metrics collection and display
○ Metrics are part of Apache Druid
○ Metric-emitters have been contributed by the community
■ Influxdb-metrics-emitter, prometheus-emitter, kafka-emitter… and many more
○ Helm chart could use a set of options to turn on metrics and enable specific emitters.
● Multi-tier configurations are not yet enabled
○ Apache Druid supports multiple temperature levels, e.g.
■ High speed SSDs vs High volume HDDs
○ Helm chart could use a dynamic tier configuration mechanism
What can you do to help
● The Apache Druid Community:
○ You are invited!
○ Fork the repo at https://github.com/apache/druid
○ Make your changes
○ Submit a PR!
ASF Slack
#druid
Google Groups
https://groups.google.com/forum/#!forum/druid-user
Druid Meetups
https://www.meetup.com/pro/apache-druid/
Druid News & Info
@druidio #apachedruid @implydata
Druid Professionals Group
https://www.linkedin.com/groups/8791983/
Druid User Forum by Imply
https://www.druidforum.org
Imply Community Team
community@imply.io
&
Imply Training Program
https://learn.imply.io
Thank you

Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
devvsandy
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
SOCRadar
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
XfilesPro
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
GohKiangHock
 

Recently uploaded (20)

Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
Top 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptxTop 9 Trends in Cybersecurity for 2024.pptx
Top 9 Trends in Cybersecurity for 2024.pptx
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
socradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdfsocradar-q1-2024-aviation-industry-report.pdf
socradar-q1-2024-aviation-industry-report.pdf
 
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
SQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure MalaysiaSQL Accounting Software Brochure Malaysia
SQL Accounting Software Brochure Malaysia
 

Dok Talks #124 - Intro to Druid on Kubernetes

  • 1. on
  • 2. Apache Druid on Kubernetes: Apache Druid Database Overview; Kubernetes & Helm Charts; Apache Druid’s Helm Chart; Overview; Scaling Up and Down; Auto-Scaling Ingestion; What it Doesn’t (yet) Do
  • 3. What is Apache Druid? It is a database that is: fully scalable; batch and real-time data; ad-hoc statistical queries; low latency delivery.
  • 4. Fully scalable; batch and real-time data; ad-hoc statistical queries; low latency delivery. Log search: real-time ingest, flexible schema, text search.
  • 5. Fully scalable; batch and real-time data; ad-hoc statistical queries; low latency delivery. Log search: real-time ingest, flexible schema, text search. Timeseries: low latency ingest, time-based storage, time functions.
  • 6. Fully scalable; batch and real-time data; ad-hoc statistical queries; low latency delivery. Columnar: efficient storage, fast analytic queries, data distribution. Log search: real-time ingest, flexible schema, text search. Timeseries: low latency ingest, time-based storage, time functions.
  • 7. High Performance Real-time Analytics. Columnar: efficient storage, fast analytic queries, data distribution. Log search: real-time ingest, flexible schema, text search. Timeseries: low latency ingest, time-based storage, time functions.
  • 8. Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries. The Architecture
  • 9. The Druid Architecture: Overview & High Availability. [Diagram: Query Services (router, brokers); Data Services (middle-managers, historicals); Master Services (overlords, coordinators); Deep Storage: HDFS; S3, GCP, Azure; local (test only).]
  • 10. The Druid Architecture: Data Ingestion Processing. [Diagram: Query Services (router, brokers); Data Services (middle-managers, historicals); Master Services (overlords, coordinators); Deep Storage.]
  • 11. The Druid Architecture: Data Ingestion Processing. [Same diagram, with the REST API added.]
  • 12. The Druid Architecture: Data Ingestion Processing. [Same diagram, with streaming data and batch data inputs added.]
  • 13. The Druid Architecture: Data Ingestion Processing. [Same diagram, with additional middle-managers and additional streaming data inputs.]
  • 14. The Druid Architecture: Data Management Processing. [Diagram: Query Services (router, brokers); Data Services (middle-managers, historicals); Master Services (overlords, coordinators); Deep Storage; streaming data and batch data inputs.]
  • 15. The Druid Architecture: Query Processing. [Same diagram, with additional middle-managers and the REST API.]
  • 16.
  • 17. Kubernetes: A Little History first… in their own words
  • 18. Kubernetes Cluster High Level Functions. Kubernetes Control Plane:
      ➔ Acquire/manage Nodes and Storage
      ➔ Accept new object requests
      ➔ Schedule and manage containers on Nodes
      ➔ Instantiate containers for object deployment
      ➔ Monitor object state
      ➔ Apply application policies
          ◆ Restart policy
          ◆ Upgrade
          ◆ Fault tolerance
      [Diagram: each node runs an operating system and a container runtime. Namespace my-dev places every container on a single Dev node: zookeeper 2.1.4, postgresql 8.6.4, and druid 0.22.1 as coordinator, historical, overlord, broker, router, and middle-manager. Namespace qa-test spreads the same containers across Master nodes (zookeeper, coordinator, overlord, postgresql), Query nodes (router, broker), and Data/Realtime nodes (historical, middle-manager).]
  • 19. Why Apache Druid on Kubernetes: Kubernetes provides Orchestration at Scale
      ● High Availability
          ○ Recovery - actively monitors and restarts pods if appropriate
          ○ AntiAffinity - ensures no single point of failure by placing services on separate nodes
          ○ Persistent storage enables fast Historical recovery
      ● Scalability
          ○ Manage individual components’ scale by changing one property
          ○ Autoscaling based on resource utilization
      ● Security
          ○ Encryption
          ○ Ingress control & network isolation
      ● Upgrades
          ○ Roll out changes automatically and with controlled disruption
  • 20. Helm Charts: A Parameterization of Complex Deployments. In general, a Helm chart is a set of templates that describe Kubernetes objects that, in turn, provide services & applications. Apache Druid® helm chart @ https://github.com/apache/druid/tree/master/helm/druid
      - Dependencies - zookeeper, postgresql or mysql
      - Templates for each microservice (historical, broker, middlemanager, etc.)
      - Default values.yaml - these are the parameters for an installation.
      Users override values to create different deployments with their own values.yaml:
        historical:
          replicaCount: 10  # scale of historical data
        middleManager:
          replicaCount: 6   # scale of real-time ingestion
  • 21. Apache Druid Helm Chart: Template Objects. [Diagram: Query Services (router, brokers); Data Services (middle-managers, historicals); Master Services (overlords, coordinators).]
      ● Deployment - manages a set of stateless pods and keeps them running
      ● Ingress - outside access
      ● Service - logical persistent network access, HTTP(S) port
  • 22. Apache Druid Helm Chart: Template Objects. [Same diagram.]
      ● StatefulSet - local files hold intermediate ingestion files, so a StatefulSet helps jobs pick up where they left off
      ● PodDisruptionBudget - determines how many pods can be offline at a time -> upgrades
      ● Service - logical persistent access, HTTP(S) port
      ● Horizontal Pod Autoscaler - controls autoscaling
  • 23. Apache Druid Helm Chart: Template Objects. [Same diagram.]
      ● StatefulSet - persistent storage is extremely important at recovery time => very fast recovery
      ● PodDisruptionBudget - determines how many pods can be offline at a time -> upgrades
      ● Service - logical persistent access, HTTP(S) port
      ● No HPA - autoscaling is not a good idea here
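To make the PodDisruptionBudget bullet concrete, here is a minimal sketch of a standalone manifest of the kind a chart template could render. It uses the standard Kubernetes `policy/v1` API and assumes the slide describes the historical pods. The object name and the `app`/`component` labels are hypothetical, not taken from the Druid chart, and `maxUnavailable: 1` is an illustrative choice.

```yaml
# Hypothetical PodDisruptionBudget for the historical pods.
# The name and the label keys/values are assumptions for illustration only.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: druid-historical-pdb     # assumed name
spec:
  maxUnavailable: 1              # at most one matching pod may be voluntarily evicted at a time
  selector:
    matchLabels:
      app: druid                 # assumed pod label
      component: historical      # assumed pod label
```

With a budget like this, voluntary disruptions such as a node drain during an upgrade evict the matching pods one at a time rather than all at once.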
  • 24. How to Use - Druid Helm Chart: A very simple example. An example helps, so if we deploy vanilla:
      > git clone https://github.com/apache/druid
      > cd druid
      > helm dependency update helm/druid
      > helm install a_druid helm/druid -n a_space --create-namespace
      > kubectl get pods -n a_space
      NAME                                 READY   STATUS    RESTARTS   AGE
      druid-broker-744c5f46b7-5l4r7        1/1     Running   0          8m12s
      druid-coordinator-7c79f9c6c9-9wlqk   1/1     Running   0          8m12s
      druid-historical-0                   1/1     Running   0          8m12s
      druid-middle-manager-0               1/1     Running   0          8m12s
      druid-postgresql-0                   1/1     Running   0          8m12s
      druid-router-84d7cc6d87-w546r        1/1     Running   0          8m12s
      druid-zookeeper-0                    1/1     Running   0          8m12s
      druid-zookeeper-1                    1/1     Running   0          7m41s
      druid-zookeeper-2                    1/1     Running   0          7m10s
  • 25. How to Use - Druid Helm Chart: A very simple example. Create a change file like values_2_historicals.yaml:
        historical:
          replicaCount: 2  # scale of historical data
      Best practice (requires the helm diff add-on):
      > helm diff upgrade -C 2 a_druid helm/druid -n a_space -f values_2_historicals.yaml
      reading three way merge from env
      default, druid-historical, StatefulSet (apps) has changed:
      ...
        spec:
          serviceName: druid-historical
      -   replicas: 1
      +   replicas: 2
          selector:
            matchLabels:
  • 26. How to Use - Druid Helm Chart: A very simple example. Apply the change:
      > helm upgrade a_druid helm/druid -n a_space -f values_2_historicals.yaml
      > kubectl get pods -n a_space
      NAME                                 READY   STATUS    RESTARTS   AGE
      druid-broker-744c5f46b7-5l4r7        1/1     Running   0          13m
      druid-coordinator-7c79f9c6c9-9wlqk   1/1     Running   0          13m
      druid-historical-0                   1/1     Running   0          13m
      druid-historical-1                   0/1     Running   0          23s
      druid-middle-manager-0               1/1     Running   0          13m
      druid-postgresql-0                   1/1     Running   0          13m
      druid-router-84d7cc6d87-w546r        1/1     Running   0          13m
      druid-zookeeper-0                    1/1     Running   0          13m
      druid-zookeeper-1                    1/1     Running   0          13m
      druid-zookeeper-2                    1/1     Running   0          12m
  • 27. Configuration with Helm Chart. [Diagram: Druid services with Deep Storage (s3, local, hdfs) and Metadata DB (postgresql, mysql).]
      my_values.yaml:
        configVars:
          druid_storage_type
      (See also more properties @ https://druid.apache.org/docs/latest/configuration/index.html#deep-storage)
  • 28. Configuration with Helm Chart. [Same diagram.]
      my_values.yaml:
        configVars:
          druid_metadata_storage_type
          …connector_connectURI
          …connector_user
          …connector_password
      my_values.yaml:
        configVars:
          druid_storage_type
      (See also more properties @ https://druid.apache.org/docs/latest/configuration/index.html#deep-storage)
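As a sketch of how the two `configVars` blocks above might be filled in, here is one combination: S3 deep storage with a PostgreSQL metadata store. The property names are the ones in the Druid configuration reference, written with underscores instead of dots as on the slides. The bucket, key prefix, host, and credentials are placeholders, and a given chart version may need further settings, so check its default values.yaml.

```yaml
# my_values.yaml (illustrative; every value below is a placeholder)
configVars:
  # Deep storage on S3
  druid_storage_type: s3
  druid_storage_bucket: my-druid-bucket          # assumed bucket name
  druid_storage_baseKey: druid/segments          # assumed key prefix
  # Metadata store on PostgreSQL
  druid_metadata_storage_type: postgresql
  druid_metadata_storage_connector_connectURI: jdbc:postgresql://my-postgres:5432/druid  # assumed host and database
  druid_metadata_storage_connector_user: druid             # placeholder
  druid_metadata_storage_connector_password: change-me     # placeholder
```

S3 deep storage also requires the `druid-s3-extensions` extension to be loaded and AWS credentials to be available to the pods; neither is shown here.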
  • 29. Configuration with Helm Chart. [Diagram: Druid services with Deep Storage (s3) and Metadata DB (postgresql).]
      my_values.yaml:
        <service>:
          resources:
            requests:
              cpu: 250m
              memory: 1Gi
            limits:
              cpu: 1000m
              memory: 2Gi
      (A great resource to determine good values is @ https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html)
  • 30. Data Ingestion and Helm Chart: Data Ingestion Processing. [Diagram: Druid services, Deep Storage, and the REST API.]
      my_values.yaml:
        router:
          replicaCount: 2
          ingress:
            enabled: True
      my_values.yaml:
        overlord:
          replicaCount: 2
        coordinator:
          replicaCount: 2
  • 31. Data Ingestion and Helm Chart: Data Ingestion Processing. [Same diagram.]
      my_values.yaml:
        middleManager:
          replicaCount: 2
          antiaffinity
          nodeSelector
          config:
            druid_indexer_runn…
            druid_indexer_fork…
  • 32. Highly Available Data Ingestion: Data Ingestion Processing. [Diagram: Druid services and Deep Storage, with streaming data and batch data inputs.]
      kafka_ingestion.json:
        {
          …
          "ioConfig": {
            "taskCount": 2,
            "replicas": 2,
            "taskDuration": "PT1H"
          }
        }
  • 33. Data Ingestion and Helm Chart: Data Ingestion Processing. [Diagram: Druid services and Deep Storage, with additional middle-managers and additional streaming data inputs.]
      my_values.yaml:
        middleManager:
          replicaCount: 6
      my_values.yaml:
        middleManager:
          autoscaling:
            enabled: true
            minReplicas: 2
            maxReplicas: 6
            metrics: memory and cpu thresholds
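For reference, the autoscaling settings above correspond roughly to a plain Kubernetes HorizontalPodAutoscaler like the one below, written against the `autoscaling/v2` API. This is a sketch, not the chart's rendered output. The target name is inferred from the `druid-middle-manager-0` pod in the earlier listing, and the 60% thresholds are assumed values, since the slide does not give any.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: druid-middle-manager       # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: druid-middle-manager     # inferred from the pod name druid-middle-manager-0
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60   # assumed threshold
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60   # assumed threshold
```

Utilization targets are computed relative to each pod's resource requests, so the `resources.requests` values shown on the resources slide need to be set for these metrics to work.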
  • 34. Historicals and Helm Chart: Data Management Processing. [Diagram: Druid services and Deep Storage, with streaming data and batch data inputs.]
      my_values.yaml:
        historical:
          replicaCount: 2
          antiaffinity
          nodeSelector
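The antiaffinity and nodeSelector entries above are listed without values. As a hedged illustration of what they control, here is the standard Kubernetes pod-spec form of both settings. How the chart exposes them in values.yaml may differ, and the node label and pod labels below are hypothetical.

```yaml
# Pod spec fragment using standard Kubernetes fields.
# All label keys and values are assumptions for illustration only.
nodeSelector:
  druid-role: data                 # hypothetical node label: schedule only on nodes carrying it
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: druid             # assumed pod label
            component: historical  # assumed pod label
        topologyKey: kubernetes.io/hostname   # never place two matching pods on the same node
```

The "required" form refuses to schedule a second historical onto a node that already has one; the `preferredDuringSchedulingIgnoredDuringExecution` form expresses the same intent as a soft preference.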
  • 35. Query Processing & Helm Chart. [Diagram: Druid services and Deep Storage, with streaming data and batch data inputs, additional middle-managers, and the REST API.]
      my_values.yaml:
        broker:
          replicaCount: 3
          antiaffinity
          nodeSelector
  • 36. Summarizing
      ● Apache Druid is a real-time OLAP database
      ● Kubernetes makes deploying and managing the database easier
          ○ Increased availability (monitored, auto-recovered, persistent)
          ○ Better RTO and RPO
          ○ Autoscaled components for ingestion and real-time query
      ● helm install makes it easy to deploy many different configurations:
          ○ Create and manage different values.yaml for each config:
              ■ dev-min-cluster.yaml
              ■ qa-ha-cluster.yaml
              ■ prod-ha-cluster-autoscaling.yaml
      ● Changes to the configs can be applied live
              ■ helm diff and helm upgrade
      ● Not just scaling
      ● Rolling upgrades too
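As a sketch of one such per-environment file, the settings shown on the earlier slides could be collected into a qa-ha-cluster.yaml like the one below. Only keys that appear in this deck are used. The replica counts are the ones from the slides and are examples, not sizing advice, and the nesting of `ingress` under `router` is an assumption based on the flattened slide text.

```yaml
# qa-ha-cluster.yaml (illustrative; assembled from values shown on earlier slides)
router:
  replicaCount: 2
  ingress:
    enabled: true        # nesting under router is assumed
overlord:
  replicaCount: 2
coordinator:
  replicaCount: 2
broker:
  replicaCount: 3
historical:
  replicaCount: 2
middleManager:
  replicaCount: 2
```

A file like this would be applied with the release name first and the chart second, for example `helm upgrade a_druid helm/druid -n a_space -f qa-ha-cluster.yaml`, ideally after reviewing the change with `helm diff upgrade`.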
  • 37. What doesn’t it do? What can you do to help?
      ● Metrics configuration - enable metrics collection and display
          ○ Metrics are part of Apache Druid
          ○ Metric-emitters have been contributed by the community
              ■ Influxdb-metrics-emitter, prometheus-emitter, kafka-emitter… and many more
          ○ Helm chart could use a set of options to turn on metrics and enable specific emitters.
      ● Multi-tier configurations are not yet enabled
          ○ Apache Druid supports multiple temperature levels, e.g.
              ■ High speed SSDs vs high volume HDDs
          ○ Helm chart could use a dynamic tier configuration mechanism
      ● The Apache Druid Community:
          ○ You are invited!
          ○ Fork the repo at https://github.com/apache/druid
          ○ Make your changes
          ○ Submit a PR!
  • 38. Community resources:
      - ASF Slack: #druid
      - Google Groups: https://groups.google.com/forum/#!forum/druid-user
      - Druid Meetups: https://www.meetup.com/pro/apache-druid/
      - Druid News & Info: @druidio #apachedruid @implydata
      - Druid Professionals Group: https://www.linkedin.com/groups/8791983/
      - Druid User Forum by Imply: https://www.druidforum.org
      - Imply Community Team: community@imply.io
      - Imply Training Program: https://learn.imply.io
  • 39. Thank you