Don’t Forget about your past—
optimizing Apache Druid
performance with batch and real-time
Current 2022
Neil Buesing, Kinetic Edge
@nbuesing nbuesing
https://www.kineticedge.io
Goal
• Sleep as well as my dog, Katniss.
Goals
1. Technology Overview of Apache Druid and Apache Kafka
2. How to run Apache Druid and Apache Kafka locally
3. Druid Ingestion in real-time and batch
4. Query the data using Druid SQL Console
5. Configure Apache Druid Real-Time Ingestion to make it safe to reload historical segments
6. Real-Time and Batch Ingestion: working together
Apache Druid
Overview
Apache Druid
1. Apache Druid still uses the term "master" (I want it to be renamed).
2. The overlord runs within the coordinator if druid.coordinator.asOverlord.enabled=true.
3. Peons are processes; the incubating indexer effort uses threads instead.
4. The metadata store is PostgreSQL or MySQL.
[Diagram: Druid services by role. Query: broker, router. Command: coordinator, overlord. Data: middlemanager with peon(s), historical. Dependencies: metadata store, zookeeper. Storage. Numbered callouts 1 to 4 refer to the notes above.]
• File Format
• Segmentation
• Time
• Dimensions
• Metrics
__time dimensions metrics
Apache Druid
Apache Druid
• Time
• Segment Granularity
• Query Granularity
[Figure: two segments with columns __time, dimensions, metrics, starting at 2021-12-07T22:00:00Z and 2021-12-07T22:15:00Z. An event at 2021-12-07T22:18:34.123Z is stored with __time 2021-12-07T22:18:00.000Z. Rows at 22:15:00 and 22:18:00 fall in the later segment; rows at 22:05:00, 22:09:00, and 22:13:00 fall in the earlier one.]
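The two granularities can be illustrated by flooring one event timestamp twice. This is a minimal sketch, not Druid code: the 15-minute segment granularity and 1-minute query granularity are assumptions inferred from the timestamps on the slide, and `floor_to` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def floor_to(ts: datetime, step: timedelta) -> datetime:
    """Floor a UTC timestamp to a multiple of `step` since the Unix epoch."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return ts - ((ts - epoch) % step)


event = datetime(2021, 12, 7, 22, 18, 34, 123000, tzinfo=timezone.utc)

# Assumed granularities, inferred from the slide (not stated explicitly).
segment_start = floor_to(event, timedelta(minutes=15))  # segment the row lands in
stored_time = floor_to(event, timedelta(minutes=1))     # __time value kept in the row

print(segment_start.isoformat())
print(stored_time.isoformat())
```

Segment granularity decides which segment receives the row; query granularity decides how much timestamp precision the row keeps.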
Apache Druid
• Why Query Granularity?
• Partially Precomputed Aggregates
[Figure: query granularity truncates 2021-12-07T22:18:34.123Z to 2021-12-07T22:18:00.000Z, so the events at 22:18:34 and 22:18:49 roll up into one row at 22:18:00.]
__time    Store    SKU    COUNT  QTY
22:18:34  Cloud 9  1234A  1      4
22:18:49  Cloud 9  1234A  1      3
22:18:00  Cloud 9  1234A  2      7
With real-time ingestion, precomputed aggregates are not absolute: rows with the same time and dimension values may be stored more than once.
select sum("count") and select count("count") are not the same.
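The difference between the two counts can be sketched with the Cloud 9 / 1234A events from the slide. This is a simplified simulation, not Druid's implementation: `rollup` and `totals` are hypothetical helpers, and the minute query granularity is an assumption.

```python
from collections import defaultdict

# Raw events from the slide: (time, store, sku, quantity).
events = [
    ("22:18:34", "Cloud 9", "1234A", 4),
    ("22:18:49", "Cloud 9", "1234A", 3),
]


def rollup(rows):
    """Group rows by (minute, store, sku); keep a row count and a summed quantity."""
    out = defaultdict(lambda: {"count": 0, "quantity": 0})
    for time, store, sku, qty in rows:
        minute = time[:5] + ":00"  # minute query granularity (assumed)
        agg = out[(minute, store, sku)]
        agg["count"] += 1
        agg["quantity"] += qty
    return dict(out)


def totals(stored_rows):
    """Return (logical count, physical count) over the stored rows."""
    logical = sum(row["count"] for row in stored_rows)  # like sum("count")
    physical = len(stored_rows)                         # like count("count")
    return logical, physical


# Perfect rollup: both events collapse into one stored row.
perfect = list(rollup(events).values())
# Partial rollup (hypothetical): each event was persisted on its own.
partial = [row for e in events for row in rollup([e]).values()]

print(totals(perfect))
print(totals(partial))
```

In both cases the logical count is the number of ingested events; only the physical row count changes with how completely the rows were rolled up.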
Apache Druid
[Diagram: a query enters through the router (UI) and the broker. The broker reads from historicals, which serve segments (__time, dimensions, metrics) held in Deep Storage, and from middlemanager tasks, which serve active real-time segments only. The coordinator, overlord, metadata store, and zookeeper are shown alongside.]
Apache Druid
select
  DATE_TRUNC('DAY', __time) "TIME",
  storeId,
  sku,
  sum("count") "CNT",
  sum(quantity) "QTY"
from skus
group by 1, 2, 3
order by 1 desc, 4 desc
Apache Druid - Aggregates
• Rollable
  • Count
  • Sum
  • Min
  • Max
  • Unique Counts (Approximations) - super cool!
  • First (String)
  • Last (String)
• Non Rollable
  • Mean
  • First (Numeric)
  • Last (Numeric)
String First & Last aggregates store the actual timestamp on rollup.
Apache Druid - Unique Counts
Apache DataSketches: Theta, k=4
Uniform Random Hash onto 0.0 to 1.0:
• Star Trek: 0.590
• Quantum Leap: 0.698
• Firefly: 0.465
• X-Files: 0.335
• Mandalorian: 0.825
• Battlestar Galactica: 0.323
• Stranger Things: 0.238
k * (1 / theta)
4 * (1 / 0.465) = 8.6
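The slide's arithmetic can be reproduced in a few lines. This sketch follows the slide's simplified formula, k * (1 / theta), where theta is the k-th smallest hash value; the estimator in the Apache DataSketches library may differ in detail, and `theta_estimate` is a hypothetical name.

```python
def theta_estimate(hashes, k):
    """Estimate the distinct count from uniform hashes in (0, 1) as k / theta."""
    retained = sorted(set(hashes))[:k]  # keep only the k smallest hash values
    if len(retained) < k:
        return float(len(retained))     # fewer than k items seen: count is exact
    theta = retained[-1]                # the k-th smallest hash value
    return k * (1 / theta)


# Hash values as shown on the slide.
shows = {
    "Star Trek": 0.590,
    "Quantum Leap": 0.698,
    "Firefly": 0.465,
    "X-Files": 0.335,
    "Mandalorian": 0.825,
    "Battlestar Galactica": 0.323,
    "Stranger Things": 0.238,
}

estimate = theta_estimate(shows.values(), k=4)
print(round(estimate, 1))
```

With only four retained values, the estimate is about 8.6 for seven actual items; a larger k trades memory for accuracy.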
My All-Time Favorite Druid Query
Apache Druid - Rollup Factor
select sum("count") "Logical Count",
count("count") "Physical Count",
sum("count")/(count("count")*1.0) "Rollup Factor"
from datasource
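Outside the SQL console, the same query can be sent to Druid's SQL HTTP endpoint (`/druid/v2/sql`). The sketch below only builds the request; the base URL `http://localhost:8888` is an assumed local router address, and `build_sql_request` is a hypothetical helper.

```python
import json
import urllib.request

ROLLUP_SQL = """
select sum("count") "Logical Count",
       count("count") "Physical Count",
       sum("count") / (count("count") * 1.0) "Rollup Factor"
from datasource
"""


def build_sql_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for Druid's SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address of a local router; adjust to your environment.
request = build_sql_request("http://localhost:8888", ROLLUP_SQL)

# To run it against a live cluster:
#   with urllib.request.urlopen(request) as resp:
#       print(json.load(resp))
```

A rollup factor close to 1.0 means rollup is saving almost nothing; higher values mean more events per stored row.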
Apache Kafka
Overview
Apache Kafka
[Diagram labels: Kafka Raft; partitioner = murmur2_random; consumer B′]
Apache Kafka
• Kafka Connect (source)
• Kafka Connect (sink)
• Schema Registry
• Producer Application
• Apache Druid (Consumer)
• Kafka Streams Application
• ksqlDB
Apache Druid & Kafka
Overview
Apache Kafka & Druid
[Diagram: an Apache Kafka cluster of three brokers, each holding partitions a:0, a:1, and a:2. The druid supervisor runs task-0, task-1, and task-2 on the Druid Middle Manager; the tasks read partitions via assign() and build segments (__time, dimensions, metrics). Segments for 22:00:00Z and 23:00:00Z are stored in Deep Storage; the metadata store is shown alongside.]
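The diagram shows three tasks, each reading one partition through assign() rather than a consumer-group subscription. A static mapping of that kind can be sketched as follows, assuming a simple modulo rule; the rule and the name `assign_partitions` are illustrative and are not taken from the slides.

```python
def assign_partitions(partitions, task_count):
    """Map each Kafka partition to a task index using partition % task_count."""
    assignment = {task: [] for task in range(task_count)}
    for partition in sorted(partitions):
        assignment[partition % task_count].append(partition)
    return assignment


# Three partitions over three tasks, as in the diagram.
print(assign_partitions([0, 1, 2], task_count=3))
# Six partitions over two tasks: each task reads several partitions.
print(assign_partitions(range(6), task_count=2))
```

Because the mapping is fixed, no consumer-group rebalance is involved, and each task can track exactly which offsets it has read for its partitions.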
Apache Kafka & Druid
[Diagram: the druid supervisor runs task-0 and task-1 on the Druid Middle Manager. Events timestamped 23:10, 23:11, 22:59, 22:01, 23:55, and 24:55 arrive while segments for 23:00:00Z and 24:00:00Z are being built; an event timestamped 08:33 produces a segment for 08:00:00Z.]
Apache Kafka & Druid
[Diagram: a single task (task-1) on the Druid Middle Manager receives events timestamped 01:xx through 10:xx and builds a separate segment (__time, dimensions, metrics) for each hour, 01:00:00Z through 10:00:00Z.]
Apache Kafka & Druid
[Diagram: the druid supervisor's task-1 holds a separate segment (__time, dimensions, metrics) for each hour, 01:00:00Z through 10:00:00Z.]
Avoid:
• Fragmented Segments
  • storage costs
  • query performance
  • compaction cost
• Open File Handles
  • middle manager resources
Apache Kafka & Druid
[Diagram: one task holding a separate segment (__time, dimensions, metrics) for each hour, 01:00:00Z through 10:00:00Z.]
Apache Superset
Overview
Apache Superset
Apache Superset
Development
A Local Environment
Kafka Local
• https://github.com/kineticedge/dev-local
  • kafka
  • druid
  • kafka-connect
  • ksqlDB
  • mongo
  • grafana/prometheus dashboards
  • mysql
  • superset
  • and more
Apple Silicon?
• CP images (7.2.0+) support arm64/v8.
• For Druid, you need to build your own arm64/v8 images.
• Check an image's architecture: docker inspect image:version --format "{{.Architecture}}"
Kafka Local Demos
• https://github.com/kineticedge/dev-local-demos
• Uses dev-local Container Based Environment
• demos with up/setup/down scripts for easy execution
  • druid-late
  • key-mismatch
  • rdbms-cdc-nosql
  • mongo-cdc
  • … and more to come …
Today's Demo
Kafka Local / DEMO
cd dev-local-demos/druid-late
.
  README.md
  up.sh
  setup.sh
  druid.sh
  connect.sh
  producer/run.sh
• Apache Kafka
• Apache Druid
• Kafka Connect / S3 Sink
• Minio
• Java Producer - Fake Data
Kafka Local / DEMO
SELECT
(case is_realtime when 1 then 'REALTIME' else 'HISTORICAL' end) "TYPE",
count(*) "COUNT"
FROM sys.segments
GROUP BY 1
Apache Druid
Real-Time and Batch
Apache Druid - Real-Time
[Diagram: a query goes to the broker, which reads from a middlemanager task and from the historical; segments (__time, dimensions, metrics) sit in Deep Storage. Legend: real-time, batch, real-time (handed-off).]
Apache Druid
• Reject messages with timestamps earlier than a given period before the task was created
• lateMessageRejectionPeriod
• e.g. PT1H
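The rule above amounts to a single comparison. A minimal sketch, assuming the cutoff is the task creation time minus the period; `is_rejected` and the example timestamps are hypothetical and not part of Druid.

```python
from datetime import datetime, timedelta, timezone


def is_rejected(event_time, task_created, rejection_period):
    """True if the event is earlier than `rejection_period` before task creation."""
    return event_time < task_created - rejection_period


# Hypothetical task creation time; PT1H expressed as a timedelta.
task_created = datetime(2022, 9, 15, 12, 0, tzinfo=timezone.utc)
period = timedelta(hours=1)

on_time = datetime(2022, 9, 15, 11, 30, tzinfo=timezone.utc)
late = datetime(2022, 9, 15, 10, 59, tzinfo=timezone.utc)

print(is_rejected(on_time, task_created, period))
print(is_rejected(late, task_created, period))
```

With PT1H, the real-time task does not write into intervals older than one hour before it started, which is what makes it safe for a batch job to own those older intervals.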
Apache Druid - Real-Time & Batch
[Diagram: a query goes to the broker, which reads from two middlemanager tasks and from the historical; segments (__time, dimensions, metrics) sit in Deep Storage. Labels: lateMessageRejectionPeriod PT1H; Append or Reload. Legend: real-time, batch, real-time (handed-off).]
Apache Druid - Batch Task
...
"prefixes": [
  "s3://sku/topics/skus/y=2022/m=09/"
],
...
"intervals": [
  "2022-09-01T00:00:00/2022-10-01T00:00:00"
]
...
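To show where these two fragments sit, the sketch below assembles a skeleton of a native batch (`index_parallel`) task with an S3 input source. It is deliberately incomplete: a real spec also needs at least a timestampSpec and dimensionsSpec, which the slide elides. The `skus` datasource name, the JSON input format, and the `batch_spec` helper are assumptions for illustration.

```python
import json


def batch_spec(datasource, prefix, interval, append):
    """Build a skeleton index_parallel task spec (not a complete spec)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Restrict the task to the interval being loaded.
                "granularitySpec": {"intervals": [interval]},
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "s3", "prefixes": [prefix]},
                "inputFormat": {"type": "json"},  # assumed file format
                # False overwrites the interval (reload); True appends to it.
                "appendToExisting": append,
            },
        },
    }


spec = batch_spec(
    datasource="skus",  # assumed datasource name
    prefix="s3://sku/topics/skus/y=2022/m=09/",
    interval="2022-09-01T00:00:00/2022-10-01T00:00:00",
    append=False,
)
print(json.dumps(spec, indent=2))
```

Matching the S3 prefix (one month of files) to the interval (the same month) is what lets a reload replace exactly the segments it covers.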
Apache Druid
Demonstration
Apache Druid - Real-Time & Batch
[Diagram: a query goes to the broker, which reads from two middlemanager tasks and from the historical; segments (__time, dimensions, metrics) sit in Deep Storage. Labels: lateMessageRejectionPeriod PT1H; Append or Reload. Legend: real-time, batch, real-time (handed-off).]
https://github.com/kineticedge/dev-local-demos
Demonstration
@nbuesing nbuesing
Questions
@nbuesing nbuesing
https://github.com/kineticedge
dev-local - container ecosystem
dev-local-demos - demonstrations
druid-m1 - build arm64/v8 image for your Apple Silicon
… & more …
https://www.kineticedge.io

More Related Content

What's hot

Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
Jordan Halterman
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
confluent
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Druid
DruidDruid
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
Knoldus Inc.
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
HostedbyConfluent
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
Celine George
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Altinity Ltd
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
OCoderFest
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
Kashif Khan
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
Databricks
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
Amazon Web Services
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for Logs
Marco Pracucci
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
RomanKhavronenko
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
confluent
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
Arvind Kumar G.S
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
Syah Dwi Prihatmoko
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
Derrick Qin
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Databricks
 

What's hot (20)

Introduction to Apache Calcite
Introduction to Apache CalciteIntroduction to Apache Calcite
Introduction to Apache Calcite
 
Kafka At Scale in the Cloud
Kafka At Scale in the CloudKafka At Scale in the Cloud
Kafka At Scale in the Cloud
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Druid
DruidDruid
Druid
 
Monitoring With Prometheus
Monitoring With PrometheusMonitoring With Prometheus
Monitoring With Prometheus
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
SQL Extensions to Support Streaming Data With Fabian Hueske | Current 2022
 
Server monitoring using grafana and prometheus
Server monitoring using grafana and prometheusServer monitoring using grafana and prometheus
Server monitoring using grafana and prometheus
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Grafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for LogsGrafana Loki: like Prometheus, but for Logs
Grafana Loki: like Prometheus, but for Logs
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Getting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and GrafanaGetting Started Monitoring with Prometheus and Grafana
Getting Started Monitoring with Prometheus and Grafana
 
Orchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWSOrchestrating workflows Apache Airflow on GCP & AWS
Orchestrating workflows Apache Airflow on GCP & AWS
 
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark OperatorApache Spark Streaming in K8s with ArgoCD & Spark Operator
Apache Spark Streaming in K8s with ArgoCD & Spark Operator
 

Similar to Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Buesing | Current 2022

ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
Rafal Kwasny
 
Apache Spark v3.0.0
Apache Spark v3.0.0Apache Spark v3.0.0
Apache Spark v3.0.0
Jean-Georges Perrin
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on Kubernetes
DoKC
 
OSN_2022.pdf
OSN_2022.pdfOSN_2022.pdf
OSN_2022.pdf
Neil Buesing
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
Uwe Printz
 
Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013
Matt Ray
 
201304 chef for open stack overview
201304 chef for open stack overview201304 chef for open stack overview
201304 chef for open stack overview
OpenStack Foundation
 
201304 chef for open stack overview
201304 chef for open stack overview201304 chef for open stack overview
201304 chef for open stack overview
OpenStack Foundation
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
whoschek
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
Ceph Community
 
Achieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with ChefAchieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with Chef
Matt Ray
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Databricks
 
OpenStack Deployments with Chef
OpenStack Deployments with ChefOpenStack Deployments with Chef
OpenStack Deployments with Chef
Matt Ray
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
DataWorks Summit
 
Chef for OpenStack - OpenStack Fall 2012 Summit
Chef for OpenStack  - OpenStack Fall 2012 SummitChef for OpenStack  - OpenStack Fall 2012 Summit
Chef for OpenStack - OpenStack Fall 2012 Summit
Matt Ray
 
Chef for OpenStack- Fall 2012.pdf
Chef for OpenStack- Fall 2012.pdfChef for OpenStack- Fall 2012.pdf
Chef for OpenStack- Fall 2012.pdf
OpenStack Foundation
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」
SmartNews, Inc.
 
Puppet and Apache CloudStack
Puppet and Apache CloudStackPuppet and Apache CloudStack
Puppet and Apache CloudStack
Puppet
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStack
ke4qqq
 

Similar to Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Buesing | Current 2022 (20)

ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Apache Spark v3.0.0
Apache Spark v3.0.0Apache Spark v3.0.0
Apache Spark v3.0.0
 
Dok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on KubernetesDok Talks #124 - Intro to Druid on Kubernetes
Dok Talks #124 - Intro to Druid on Kubernetes
 
OSN_2022.pdf
OSN_2022.pdfOSN_2022.pdf
OSN_2022.pdf
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013
 
201304 chef for open stack overview
201304 chef for open stack overview201304 chef for open stack overview
201304 chef for open stack overview
 
201304 chef for open stack overview
201304 chef for open stack overview201304 chef for open stack overview
201304 chef for open stack overview
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Achieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with ChefAchieving Infrastructure Portability with Chef
Achieving Infrastructure Portability with Chef
 
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
Spark Saturday: Spark SQL & DataFrame Workshop with Apache Spark 2.3
 
OpenStack Deployments with Chef
OpenStack Deployments with ChefOpenStack Deployments with Chef
OpenStack Deployments with Chef
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
Chef for OpenStack - OpenStack Fall 2012 Summit
Chef for OpenStack  - OpenStack Fall 2012 SummitChef for OpenStack  - OpenStack Fall 2012 Summit
Chef for OpenStack - OpenStack Fall 2012 Summit
 
Chef for OpenStack- Fall 2012.pdf
Chef for OpenStack- Fall 2012.pdfChef for OpenStack- Fall 2012.pdf
Chef for OpenStack- Fall 2012.pdf
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」
 
Puppet and Apache CloudStack
Puppet and Apache CloudStackPuppet and Apache CloudStack
Puppet and Apache CloudStack
 
Infrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStackInfrastructure as code with Puppet and Apache CloudStack
Infrastructure as code with Puppet and Apache CloudStack
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...

Don’t Forget About Your Past—Optimizing Apache Druid Performance With Neil Buesing | Current 2022

  • 1. Don’t Forget about your past— optimizing Apache Druid performance with batch and real-time Current 2022 Neil Buesing, Kinetic Edge @nbuesing nbuesing https://www.kineticedge.io
  • 2. Goal • Sleep as well as my dog, Katniss.
  • 3. Goals 1. Technology Overview of Apache Druid and Apache Kafka 2. How to run Apache Druid and Apache Kafka locally 3. Druid Ingestion in real-time and batch 4. Query the data using Druid SQL Console 5. Configure Apache Druid Real-Time Ingestion to make it safe to reload historical segments 6. Real-Time and Batch Ingestion: working together
  • 5. Apache Druid 1. Apache Druid still uses the term master (I want it to be renamed) 2. runs with coordinator, if druid.coordinator.asOverlord.enabled=true 3. peons are processes; the incubating indexer effort uses threads instead 4. postgres or MySQL Query broker router Command coordinator overlord Data middlemanager historical Dependencies metadata store zookeeper peon(s) Storage 1 2 3 4
  • 6. • File Format • Segmentation • Time • Dimensions • Metrics __time dimensions metrics Apache Druid
  • 7. Apache Druid • Time • Segment Granularity • Query Granularity __time dimensions metrics __time dimensions metrics 2021-12-07T22:00:00Z 2021-12-07T22:15:00Z 2021-12-07T22:18:34.123Z 2021-12-07T22:18:00.000Z 22:15:00 22:18:00 22:15:00 22:18:00 22:09:00 22:09:00 22:05:00 22:13:00
  • 8. Apache Druid • Why Query Granularity? • Partially Precomputed Aggregates __time dimensions metrics 2021-12-07T22:18:34.123Z 2021-12-07T22:18:00.000Z 22:18:49 22:18:00 22:18:34 Cloud 9 1234A Store SKU 1 COUNT QTY Cloud 9 1234A 1 4 3 Cloud 9 1234A 2 7 With real-time ingestion precomputed aggregates are not absolute. select sum(count), count(count) are not the same
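The rollup idea on slide 8 can be sketched in a few lines of Python. This is only an illustration, not Druid's implementation: the `rollup` function and the pairing of quantities with timestamps (4 at 22:18:34, 3 at 22:18:49) are assumptions made for the example. Events whose timestamps truncate to the same query-granularity bucket and share the same dimensions collapse into one stored row.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup(events, granularity_seconds=60):
    """Collapse events that share a truncated timestamp and dimensions.

    Hypothetical helper for illustration; Druid does this at ingestion time.
    """
    rows = defaultdict(lambda: {"count": 0, "quantity": 0})
    for ts, store, sku, quantity in events:
        epoch = int(ts.timestamp())
        # Truncate the timestamp down to the query granularity.
        bucket = datetime.fromtimestamp(
            epoch - epoch % granularity_seconds, tz=timezone.utc
        )
        row = rows[(bucket, store, sku)]
        row["count"] += 1
        row["quantity"] += quantity
    return dict(rows)


# Two raw events in the same minute (quantity-to-timestamp pairing is assumed).
events = [
    (datetime(2021, 12, 7, 22, 18, 34, tzinfo=timezone.utc), "Cloud 9", "1234A", 4),
    (datetime(2021, 12, 7, 22, 18, 49, tzinfo=timezone.utc), "Cloud 9", "1234A", 3),
]

rows = rollup(events)
logical_count = sum(r["count"] for r in rows.values())  # like sum("count")
physical_count = len(rows)                              # like count("count")
```

Here the logical count is the number of raw events and the physical count is the number of stored rows, which is why the two SQL expressions on the slide return different values.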
  • 9. Apache Druid middlemanager Deep Storage __time dimensions metrics __time dimensions metrics __time dimensions metrics __time dimensions metrics __time dimensions metrics historical historical middlemanager task task task task broker query router ui coordinator overlord metadata store zookeeper active real-time segments only
  • 10. Apache Druid select DATE_TRUNC('DAY', __time) "TIME", storeId, sku, sum("count") "CNT", sum(quantity) "QTY" from skus group by 1, 2, 3 order by 1 desc, 4 desc
  • 11. Apache Druid - Aggregates • Rollable • Count • Sum • Min • Max • Unique Counts (Approximations) - super cool! • First (String) • Last (String) • Non Rollable • Mean • First (Numeric) • Last (Numeric) String First & Last aggregate on rollup store actual timestamp.
  • 12. Apache Druid - Unique Counts Apache Data Sketches : Theta : k=4 0.0 1.0 Star Trek : 0.590 Quantum Leap : 0.698 Firefly : 0.465 X-Files : 0.335 Mandalorian : 0.825 Battlestar Galactica : 0.323 4 * (1 / 0.465) = 8.6 k * (1 / theta) Uniform Random Hash Stranger Things : 0.238
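The arithmetic on slide 12 can be reproduced with a small sketch. It follows the slide's simplified formula, k * (1 / theta), where theta is the k-th smallest hash value; the function name `kmv_estimate` is hypothetical, and production sketch libraries differ in detail.

```python
def kmv_estimate(hashes, k):
    """Estimate the distinct count from uniform hashes in (0, 1).

    Uses the slide's simplified formula k / theta, where theta is the
    k-th smallest hash. Illustrative only.
    """
    smallest = sorted(set(hashes))[:k]
    if len(smallest) < k:
        # Fewer than k distinct values were seen, so the count is exact.
        return float(len(smallest))
    theta = smallest[-1]
    return k / theta


# Hash values taken from the slide.
hashes = {
    "Star Trek": 0.590,
    "Quantum Leap": 0.698,
    "Firefly": 0.465,
    "X-Files": 0.335,
    "Mandalorian": 0.825,
    "Battlestar Galactica": 0.323,
    "Stranger Things": 0.238,
}

estimate = kmv_estimate(hashes.values(), k=4)
```

With k=4 the four smallest hashes are 0.238, 0.323, 0.335, and 0.465, so theta is 0.465 and the estimate is about 8.6 for 7 actual items. A larger k would normally tighten the estimate at the cost of more storage.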
  • 13. My All-Time Favorite Druid Query Apache Druid - Rollup Factor select sum("count") "Logical Count", count("count") "Physical Count", sum("count")/(count("count")*1.0) "Rollup Factor" from datasource
  • 18. Apache Druid & Kafka Overview
  • 19. Apache Druid Middle Manager Apache Kafka & Druid Apache Kafka broker broker task-0 broker a:0 a:1 a:2 a:0 a:0 a:1 a:1 a:2 a:2 druid supervisor __time dimensions metrics __time dimensions metrics __time dimensions metrics task-1 task-2 assign() metadata store
  • 20. Druid Middle Manager Deep Storage __time dimensions metrics 23:00:00Z __time dimensions metrics 23:00:00Z __time dimensions metrics 22:00:00Z __time dimensions metrics 22:00:00Z Apache Kafka & Druid druid supervisor __time dimensions metrics 23:00:00Z __time dimensions metrics 24:00:00Z 23:10 23:11 22:59 22:01 23:55 24:55 task-0 task-1 task-0 08:33 task-1 __time dimensions metrics 08:00:00Z
  • 21. Druid Middle Manager Apache Kafka & Druid druid task-1 01:xx __time dimensions metrics 01:00:00Z 02:xx __time dimensions metrics 02:00:00Z 03:xx __time dimensions metrics 03:00:00Z 04:xx __time dimensions metrics 04:00:00Z 05:xx __time dimensions metrics 05:00:00Z __time dimensions metrics 06:00:00Z 06:xx __time dimensions metrics 07:00:00Z 07:xx __time dimensions metrics 08:00:00Z 08:xx __time dimensions metrics 09:00:00Z 09:xx __time dimensions metrics 10:00:00Z 10:xx task
  • 22. Druid Middle Manager Apache Kafka & Druid druid supervisor task-1 __time dimensions metrics 01:00:00Z __time dimensions metrics 02:00:00Z __time dimensions metrics 03:00:00Z __time dimensions metrics 04:00:00Z __time dimensions metrics 05:00:00Z __time dimensions metrics 06:00:00Z __time dimensions metrics 07:00:00Z __time dimensions metrics 08:00:00Z __time dimensions metrics 09:00:00Z __time dimensions metrics 10:00:00Z task avoid
  • 23. • Fragmented Segments • storage costs • query performance • compaction cost • Open File Handles • middle manager resources Apache Kafka & Druid __time dimensions metrics 01:00:00Z __time dimensions metrics 02:00:00Z __time dimensions metrics 03:00:00Z __time dimensions metrics 04:00:00Z __time dimensions metrics 05:00:00Z __time dimensions metrics 06:00:00Z __time dimensions metrics 07:00:00Z __time dimensions metrics 08:00:00Z __time dimensions metrics 09:00:00Z __time dimensions metrics 10:00:00Z task
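A rough way to see why slides 21 to 23 warn about this pattern: every distinct time bucket that a task receives events for becomes another segment it has to keep open. The sketch below assumes hourly segment granularity (suggested by the hourly labels on the slides) and uses a hypothetical helper, `open_segment_buckets`, to count those buckets.

```python
from datetime import datetime, timezone


def open_segment_buckets(event_times):
    """Return the distinct hourly buckets touched by a set of event times.

    Hypothetical helper; assumes hourly segment granularity.
    """
    return sorted(
        {t.replace(minute=0, second=0, microsecond=0) for t in event_times}
    )


# A task receiving late data spread over ten different hours ...
late_events = [
    datetime(2022, 9, 1, hour, 30, tzinfo=timezone.utc) for hour in range(1, 11)
]
# ... versus a task receiving only current data within one hour.
current_events = [
    datetime(2022, 9, 1, 10, minute, tzinfo=timezone.utc) for minute in (5, 20, 45)
]

late_bucket_count = len(open_segment_buckets(late_events))
current_bucket_count = len(open_segment_buckets(current_events))
```

In this example the late-data task touches ten hourly buckets while the current-data task touches one, which is the fragmentation the slide lists costs for.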
  • 28. Kafka Local • https://github.com/kineticedge/dev-local • kafka • druid • kafka-connect • ksqlDB • mongo • grafana/prometheus dashboards • mysql • superset • and more CP Images (7.2.0+) support arm64/v8 images; for druid you need to build your own arm64/v8 images docker inspect image:version --format "{{.Architecture}}" Apple Silicon?
  • 29. Kafka Local Demos • https://github.com/kineticedge/dev-local-demos • Uses dev-local Container Based Environment • demos with up/setup/down scripts for easy execution • druid-late • key-mismatch • rdbms-cdc-nosql • mongo-cdc • … and more to come … Today's Demo
  • 30. Kafka Local / DEMO cd dev-local-demo/druid-late . README.md up.sh setup.sh druid.sh connect.sh producer/run.sh Apache Kafka Apache Druid Kafka Connect / S3 Sink Minio Java Producer - Fake Data
  • 31. Kafka Local / DEMO SELECT (case is_realtime when 1 then 'REALTIME' else 'HISTORICAL' end) "TYPE", count(*) "COUNT" FROM sys.segments GROUP BY 1
  • 33. Apache Druid - Real-Time Deep Storage __time dimensions metrics __time dimensions metrics __time dimensions metrics historical middlemanager task broker query real-time batch real-time (handed-off)
  • 34. Apache Druid • reject messages earlier than period before the task was created • lateMessageRejectionPeriod • e.g. PT1H
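The rule on slide 34 can be expressed as a one-line check. This is a minimal sketch of the described behavior, not Druid's code: the function name `is_rejected` is hypothetical, the period is passed as a Python `timedelta` rather than parsed from the ISO-8601 string `PT1H`, and the strict comparison at the boundary is an assumption.

```python
from datetime import datetime, timedelta, timezone


def is_rejected(event_time, task_created, late_period=timedelta(hours=1)):
    """Return True if the event is older than the allowed late window.

    The window is measured back from the task's creation time, as the
    slide describes for lateMessageRejectionPeriod. Illustrative only.
    """
    return event_time < task_created - late_period


task_created = datetime(2022, 9, 15, 12, 0, tzinfo=timezone.utc)

too_old = is_rejected(
    datetime(2022, 9, 15, 10, 59, tzinfo=timezone.utc), task_created
)
recent_enough = is_rejected(
    datetime(2022, 9, 15, 11, 30, tzinfo=timezone.utc), task_created
)
```

With a one-hour period, an event stamped 10:59 is dropped by a task created at 12:00, while an event stamped 11:30 is ingested. Anything older than the window is left for batch ingestion to handle.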
  • 35. Apache Druid - Real-Time & Batch Deep Storage __time dimensions metrics __time dimensions metrics __time dimensions metrics broker lateMessageRejectionPeriod PT1H Append or Reload historical middlemanager task task query real-time batch real-time (handed-off)
  • 36. Apache Druid - Batch Task ... "prefixes": [ "s3://sku/topics/skus/y=2022/m=09/" ], ... "intervals": [ "2022-09-01T00:00:00/2022-10-01T00:00:00" ] ...
  • 38. Apache Druid - Real-Time & Batch Deep Storage __time dimensions metrics __time dimensions metrics __time dimensions metrics broker lateMessageRejectionPeriod PT1H Append or Reload historical middlemanager task task query real-time batch real-time (handed-off)
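The two fields shown on slide 36 have to agree: the S3 prefix selects one month of files and the interval covers that same month. A small sketch, assuming the `y=YYYY/m=MM` path layout and the bucket and topic names from the slide, can generate both together. The helper `month_batch_inputs` is hypothetical, and where these fields sit in the full ingestion spec is elided on the slide and not filled in here.

```python
from datetime import date


def month_batch_inputs(year, month, bucket="sku", topic="skus"):
    """Build a matching S3 prefix and interval for one calendar month.

    Hypothetical helper; path layout and names are taken from the slide.
    """
    start = date(year, month, 1)
    # The interval end is the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    prefix = f"s3://{bucket}/topics/{topic}/y={year:04d}/m={month:02d}/"
    interval = f"{start.isoformat()}T00:00:00/{end.isoformat()}T00:00:00"
    return {"prefixes": [prefix], "intervals": [interval]}


september = month_batch_inputs(2022, 9)
december = month_batch_inputs(2022, 12)
```

Deriving both values from the same year and month avoids reloading an interval from files that belong to a different month.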
  • 40. Questions @nbuesing nbuesing https://github.com/kineticedge dev-local - container ecosystem dev-local-demos - demonstrations druid-m1 - build arm64/v8 image for your Apple Silicon … & more … https://www.kineticedge.io