SlideShare a Scribd company logo
©2022, Imply
©2022, imply
The Trifecta of Real-Time Applications:
Apache Kafka, Flink, and Druid
1
©2022, Imply
Speakers
2
Gian Merlino
Co-founder and CTO of Imply
PMC Chair of Apache Druid
Kai Wähner
Field CTO of Confluent
©2022, Imply 3
Agenda 1. Overview of real-time applications
2. Kafka-Flink-Druid data architecture
3. Introduction to Apache Flink+use cases
4. Introduction to Apache Druid+use cases
5. Kafka-Flink-Druid case studies
6. Confluent and Imply integration
7. Q&A
©2022, Imply
©2021, imply
4
Data applications are going real-time
©2022, Imply 5
REAL-TIME APPLICATION
● >5 million events per second
● Interactive queries
● Customer-facing
● 250 queries per second
● Complex analytics
● Operational visibility
©2022, Imply 6
REAL-TIME APPLICATION
● Real-time monitoring
● Real-time alerting
● Real-time decisioning
● Interactive queries
● High dimensionality
©2022, Imply 7
REAL-TIME APPLICATION
● Real-time decisioning
● API-driven interactive queries
● Customer-facing
● 25 million users
● <100 ms query latency
● 100+ QPS
©2022, Imply 8
Building real-time apps with batch workflows doesn’t work
Waiting…
Data Collection
Data Processing
Data Ingestion
Data Analysis
Data Presentation
1 3
5
2 4
Waiting…
Waiting…
Waiting…
Waiting…
Latency measured in hours-to-days
©2022, Imply 9
Open source real-time data architecture
Data
Sources
Streaming Pipeline
Events/
Messages
Stream Processing
Real-Time
Analytics
Monitor/Alerting
Analytics
Visualization/
Decisioning
Real-Time
Applications
Historical Data / Context
+
Enrichment /
Transformation
©2022, Imply 10
timestamp sensor_id temperature
2023-07-10T10:00:00 SensorA 22.5
2023-07-10T10:01:00 SensorB 18.2
2023-07-10T10:02:00 SensorC 65.5
timestamp sensor_id location
2023-07-10T10:00:00 SensorA Room 101
2023-07-10T10:01:00 SensorB Room 101
2023-07-10T10:02:00 SensorC Room 101
Source
A common real-time KFD pattern
timestamp sensor_id location
tempe
rature
2023-07-10T10:00:00 SensorA Room 101 22.5
2023-07-10T10:01:00 SensorB Room 101 18.2
2023-07-10T10:02:00 SensorC Room 101 65.5
65.5
Current
©2022, Imply
©2021, imply
11
Introduction to Apache Flink
Flink growth has
mirrored the growth of
Kafka, the de facto
standard for streaming
data
>75% of the Fortune 500 estimated to
be using Kafka
>100,000+ orgs using Kafka
>41,000 Kafka meetup attendees
>750 Kafka Improvement Proposals
>12,000 Jiras for Apache Kafka
0
50,000
100,000
150,000
2020 2021 2022
2016 2017 2018
Flink
Kafka
Two Apache Projects, Born a
Few Years Apart
Monthly Unique Users
Developers choose Flink because of its
performance and rich feature set
Scalability and
Performance
Fault
Tolerance
Flink is a top 5 Apache project and boasts a robust developer community
Unified
Processing
Flink is capable of
supporting stream
processing workloads
at tremendous scale
Language
Flexibility
Flink's fault tolerance
mechanisms ensure it
can handle failures
effectively and provide
high availability
Flink supports Java,
Python, & SQL,
enabling developers to
work in their language
of choice
Flink supports stream
processing, batch
processing, and ad-hoc
analytics through one
technology
Flink supports unified stream and batch processing
● Entire pipeline must always be running ● Execution proceeds in stages, running as needed
● Input must be processed as it arrives ● Input may be pre-sorted by time and key
● Results are reported as they become ready ● Results are reported at the end of the job
● Failure recovery resumes from a recent snapshot ● Failure recovery does a reset and full restart
● Flink guarantees effectively exactly-once results
despite out-of-order data and restarts due to
failures, etc.
● Effectively exactly-once guarantees are more
straightforward
Effortlessly filter, join, and enrich your data streams with Apache Flink
Real-time processing
Power low-latency applications and pipelines that react to
real-time events and provide timely insights
Data reusability
Share consistent and reusable data streams widely with
downstream applications and systems
Data enrichment
Curate, filter, and augment data on-the-fly with additional
context to improve completeness, accuracy, & compliance
Efficiency
Improve resource utilization and cost-effectiveness by
avoiding redundant processing across silos
“With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors,
smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion
detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and
response to security incidents without any added operational burden.”
Process data streams in-flight to maximize actionability, fidelity, and portability
Blob
storage
3rd party
app
Databases Data
Warehouse
Database
SaaS app
Low latency apps
and data pipelines
Consistent, reusable
data products
Optimized resource
utilization
Enrich real-time data streams with Generative
AI directly from Flink SQL
INSERT INTO enriched_reviews
SELECT id
, review
,
invoke_openai(prompt,review) as
score
FROM product_reviews
;
K
N
Kate
4 hours ago
This was the worst decision ever.
Nikola
1 day ago
Not bad. Could have been cheaper.
K
N
B
Kate
★★★★★ 4 hours ago
This was the worst decision ever.
Nikola
★★★★★ 1 day ago
Not bad. Could have been cheaper.
Brian
★★★★★ 3 days ago
Amazing! Game Changer!
The Prompt
“Score the following text on a scale of 1
and 5 where 1 is negative and 5 is
positive returning only the number”
DATA STREAMING PLATFORM
B
Brian
3 days ago
Amazing! Game Changer!
©2022, Imply
©2021, imply
18
Introduction to Apache Druid
©2022, Imply 19
Applications
Analytics Applications
Druid is built for
the intersection
of analytics and
applications.
Apache
Druid
©2022, Imply 20
Apache Druid is a real-time analytics database
Sub-second queries at massive scale
Interactive analytics on TB-PBs of data
High concurrency at low cost
1000s QPS via highly efficient distributed query engine
Real-time and historical insights
True stream ingestion for Kafka and Kinesis
Plus, non-stop reliability with automated fault
tolerance and continuous backup
1
2
3
For analytics applications that require:
(Example of a Druid-powered UX)
©2022, Imply
Real-Time Ingestion
Ingestion Task 0
Ingestion Task 1
Ingestion Task M
…
Topic
Partition 0
Partition 1
Partition 2
Partition N
21
Druid’s stream ingestion scales right with Kafka
Producer
Producer
Producer
…
…
Query
Historical
Historical
Historical
©2022, Imply 22
22
1 2 3 4 5
Apache Druid is a database built for data in motion
Continuous backup
Guaranteed consistency
Real-time insights with
flexible historical queries
Event-based ingestion
High EPS scalability
Schema auto-detection
No connector needed
to druid
Real-time decisioning:
External-facing analytics:
Operational visibility at scale: Rapid data exploration:
For use cases where instant query response powers
automated rules engines and ML frameworks, including
real-time decisions and recommendations
For use cases that require instant response on interactive,
ad-hoc queries at scale on high-dimensional data such as
root cause diagnostics, ML training, and investigation.
For use cases where analytics are being delivered to
external stakeholders as a product or as a value add with
strict SLAs for performance under load and resiliency
For use cases that require real-time insights on big,
fast-moving event streams like observability, product
analytics, clickstream, IoT, and fraud detection
A high-performance, real-time analytics database
Supply chain Logistics Healthtech
Adtech Fintech Gaming Entertainment Retail
eCommerce
Operational visibility at scale
External-facing analytics
Rapid data exploration
Real-time decisioning
Just a few examples of the 1000s of Druid users
©2022, Imply 25
Selecting the right tool for the job
Apache Druid complements your analytics stack
● High QPS application/API usage
● Strict latency requirements
● Less elastic
● Ad-hoc, low concurrency usage
● Loose latency requirements
● Highly elastic
Ideal
workload
● Scatter-gather and shuffling engines
● Leverage broad array of indexes
● Guaranteed data caching
● Shuffling engines
● Focus on sequential data access
● Opportunistic data caching
Technical
properties
Snowflake/BigQuery/Trino
Cloud Data Warehouses + Query Engines
Apache Druid
Real-Time Analytics Database
Trusted technology with an awesome community
Companies using Druid
Active Contributors
YoY Increase in Community Activity
Community Members
1,900+ 150%
14,000+ 600+
©2022, Imply 27
And many more!
The data teams at leading organizations choose Druid
Advertising Entertainment Industrial Financial
Gaming Technology/Platform Technology/SaaS Security
©2022, Imply
©2021, imply
28
Case Studies
10+
GB /hr
3X
faster
queries
99.9%
availability
Ad Serving
Event Collector
METADATA
USER ACTIONS
Reddit Ad Serving Pipeline
30
Phone Service Amazon
Kinesis
JSON events
Batch processing
Viz tools
User
Lyft Analytics Pipeline
400+
queries /
minute
10X
faster
queries
<500ms
Latency on
99% of
queries
©2022, Imply 31
Summary: Real-time use cases for Kafka-Flink-Druid
ALERTING MONITORING DASHBOARDS EXPLORATION DECISIONING
State or stateless
triggered actions
Continuous
tracking of KPIs
User-facing
operational visibility
Ad-hoc rapid
data exploration
On high throughput Kafka streams
API-triggered
automated workflows
if X, then Y
©2022, Imply 32
What makes Kafka-Flink-Druid so popular?
Stream-Native
All 3 are designed
natively for streaming
data, supporting
exactly-once semantics
and event-by-event
processing.
Massively Scalable Fault Tolerant
All 3 can handle massive
event throughput into
the millions of events
per second across
delivery, processing, and
analytics.
All 3 work in tandem
for mission-critical use
cases with guaranteed
consistency, data and
node replication, and
data durability.
Complementary
All 3 provide distinct
capabilities that
together serve the
full-breadth of
real-time application
use cases.
©2022, Imply
©2021, imply
33
Cloud Services for KFD
"When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant
downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing
without the complexities of infrastructure management, enabling teams to focus on building real-time streaming
applications & pipelines that differentiate the business."
Enterprise-grade security
Secure stream processing with built-in identity and access
management, RBAC, and audit logs
Stream governance
Enforce data policies and avoid metadata duplication
leveraging native integration with Stream Governance
Monitoring
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Connectors
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Monitoring Connectors
Enterprise-grade
Security
Stream
Governance
Confluent Cloud: Unified platform for Kafka and Flink seamlessly integrated
©2022, Imply 35
Imply Polaris
The Cloud Database Service
for Apache Druid
Most
Affordable
Most
Secure
Best Time
to Value
And for OS Druid Users
©2022, Imply
©2022, imply
Thank you
36

More Related Content

Similar to A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid

Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Dataconomy Media
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
Biswajit Das
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformNatalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
matteo mazzeri
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
confluent
 
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceCloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
VMware Tanzu
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data edition
Daniel Zivkovic
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
VMware Tanzu
 
Splunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk MINT and Stream Breakout
Splunk MINT and Stream Breakout
Splunk
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
Databricks
 
A Real-Time Version of the Truth
 A Real-Time Version of the Truth A Real-Time Version of the Truth
A Real-Time Version of the Truth
Eric Kavanagh
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
Imply
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
Aerospike, Inc.
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
DataWorks Summit/Hadoop Summit
 
Splunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DaySplunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech Day
Zivaro Inc
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
Kostas Tzoumas
 
Denver Big Data Analytics Day
Denver Big Data Analytics DayDenver Big Data Analytics Day
Denver Big Data Analytics Day
Zivaro Inc
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT Breakout
Splunk
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
confluent
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
kgshukla
 

Similar to A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid (20)

Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformNatalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceCloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data edition
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
 
Splunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk MINT and Stream Breakout
Splunk MINT and Stream Breakout
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
A Real-Time Version of the Truth
 A Real-Time Version of the Truth A Real-Time Version of the Truth
A Real-Time Version of the Truth
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Splunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DaySplunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech Day
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Denver Big Data Analytics Day
Denver Big Data Analytics DayDenver Big Data Analytics Day
Denver Big Data Analytics Day
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT Breakout
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
HostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
HostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
HostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
HostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
HostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
HostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
HostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
HostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
HostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
HostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
HostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
HostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
HostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
HostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid

  • 1. ©2022, Imply ©2022, imply The Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid 1
  • 2. ©2022, Imply Speakers 2 Gian Merlino Co-founder and CTO of Imply PMC Chair of Apache Druid Kai Wähner Field CTO of Confluent
  • 3. ©2022, Imply 3 Agenda 1. Overview of real-time applications 2. Kafka-Flink-Druid data architecture 3. Introduction to Apache Flink+use cases 4. Introduction to Apache Druid+use cases 5. Kafka-Flink-Druid case studies 6. Confluent and Imply integration 7. Q&A
  • 4. ©2022, Imply ©2021, imply 4 Data applications are going real-time
  • 5. ©2022, Imply 5 REAL-TIME APPLICATION ● >5 million events per second ● Interactive queries ● Customer-facing ● 250 queries per second ● Complex analytics ● Operational visibility
  • 6. ©2022, Imply 6 REAL-TIME APPLICATION ● Real-time monitoring ● Real-time alerting ● Real-time decisioning ● Interactive queries ● High dimensionality
  • 7. ©2022, Imply 7 REAL-TIME APPLICATION ● Real-time decisioning ● API-driven interactive queries ● Customer-facing ● 25 million users ● <100 ms query latency ● 100+ QPS
  • 8. ©2022, Imply 8 Building real-time apps with batch workflows doesn’t work Waiting… Data Collection Data Processing Data Ingestion Data Analysis Data Presentation 1 3 5 2 4 Waiting… Waiting… Waiting… Waiting… Latency measured in hours-to-days
  • 9. ©2022, Imply 9 Open source real-time data architecture Data Sources Streaming Pipeline Events/ Messages Stream Processing Real-Time Analytics Monitor/Alerting Analytics Visualization/ Decisioning Real-Time Applications Historical Data / Context + Enrichment / Transformation
  • 10. ©2022, Imply 10 timestamp sensor_id temperature 2023-07-10T10:00:00 SensorA 22.5 2023-07-10T10:01:00 SensorB 18.2 2023-07-10T10:02:00 SensorC 65.5 timestamp sensor_id location 2023-07-10T10:00:00 SensorA Room 101 2023-07-10T10:01:00 SensorB Room 101 2023-07-10T10:02:00 SensorC Room 101 Source A common real-time KFD pattern timestamp sensor_id location tempe rature 2023-07-10T10:00:00 SensorA Room 101 22.5 2023-07-10T10:01:00 SensorB Room 101 18.2 2023-07-10T10:02:00 SensorC Room 101 65.5 65.5 Current
  • 12. Flink growth has mirrored the growth of Kafka, the de facto standard for streaming data >75% of the Fortune 500 estimated to be using Kafka >100,000+ orgs using Kafka >41,000 Kafka meetup attendees >750 Kafka Improvement Proposals >12,000 Jiras for Apache Kafka 0 50,000 100,000 150,000 2020 2021 2022 2016 2017 2018 Flink Kafka Two Apache Projects, Born a Few Years Apart Monthly Unique Users
  • 13. Developers choose Flink because of its performance and rich feature set Scalability and Performance Fault Tolerance Flink is a top 5 Apache project and boasts a robust developer community Unified Processing Flink is capable of supporting stream processing workloads at tremendous scale Language Flexibility Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability Flink supports Java, Python, & SQL, enabling developers to work in their language of choice Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
  • 14. Flink supports unified stream and batch processing ● Entire pipeline must always be running ● Execution proceeds in stages, running as needed ● Input must be processed as it arrives ● Input may be pre-sorted by time and key ● Results are reported as they become ready ● Results are reported at the end of the job ● Failure recovery resumes from a recent snapshot ● Failure recovery does a reset and full restart ● Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc. ● Effectively exactly-once guarantees are more straightforward
  • 15. Effortlessly filter, join, and enrich your data streams with Apache Flink Real-time processing Power low-latency applications and pipelines that react to real-time events and provide timely insights Data reusability Share consistent and reusable data streams widely with downstream applications and systems Data enrichment Curate, filter, and augment data on-the-fly with additional context to improve completeness, accuracy, & compliance Efficiency Improve resource utilization and cost-effectiveness by avoiding redundant processing across silos “With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security incidents without any added operational burden.”
  • 16. Process data streams in-flight to maximize actionability, fidelity, and portability Blob storage 3rd party app Databases Data Warehouse Database SaaS app Low latency apps and data pipelines Consistent, reusable data products Optimized resource utilization
  • 17. Enrich real-time data streams with Generative AI directly from Flink SQL INSERT INTO enriched_reviews SELECT id , review , invoke_openai(prompt,review) as score FROM product_reviews ; K N Kate 4 hours ago This was the worst decision ever. Nikola 1 day ago Not bad. Could have been cheaper. K N B Kate ★★★★★ 4 hours ago This was the worst decision ever. Nikola ★★★★★ 1 day ago Not bad. Could have been cheaper. Brian ★★★★★ 3 days ago Amazing! Game Changer! The Prompt “Score the following text on a scale of 1 and 5 where 1 is negative and 5 is positive returning only the number” DATA STREAMING PLATFORM B Brian 3 days ago Amazing! Game Changer!
  • 19. ©2022, Imply 19 Applications Analytics Applications Druid is built for the intersection of analytics and applications. Apache Druid
  • 20. ©2022, Imply 20 Apache Druid is a real-time analytics database Sub-second queries at massive scale Interactive analytics on TB-PBs of data High concurrency at low cost 1000s QPS via highly efficient distributed query engine Real-time and historical insights True stream ingestion for Kafka and Kinesis Plus, non-stop reliability with automated fault tolerance and continuous backup 1 2 3 For analytics applications that require: (Example of a Druid-powered UX)
  • 21. ©2022, Imply Real-Time Ingestion Ingestion Task 0 Ingestion Task 1 Ingestion Task M … Topic Partition 0 Partition 1 Partition 2 Partition N 21 Druid’s stream ingestion scales right with Kafka Producer Producer Producer … … Query Historical Historical Historical
  • 22. ©2022, Imply 22 22 1 2 3 4 5 Apache Druid is a database built for data in motion Continuous backup Guaranteed consistency Real-time insights with flexible historical queries Event-based ingestion High EPS scalability Schema auto-detection No connector needed to druid
  • 23. Real-time decisioning: External-facing analytics: Operational visibility at scale: Rapid data exploration: For use cases where instant query response powers automated rules engines and ML frameworks, including real-time decisions and recommendations For use cases that require instant response on interactive, ad-hoc queries at scale on high-dimensional data such as root cause diagnostics, ML training, and investigation. For use cases where analytics are being delivered to external stakeholders as a product or as a value add with strict SLAs for performance under load and resiliency For use cases that require real-time insights on big, fast-moving event streams like observability, product analytics, clickstream, IoT, and fraud detection A high-performance, real-time analytics database Supply chain Logistics Healthtech Adtech Fintech Gaming Entertainment Retail eCommerce Operational visibility at scale External-facing analytics Rapid data exploration Real-time decisioning
  • 24. Just a few examples of the 1000s of Druid users
  • 25. ©2022, Imply 25 Selecting the right tool for the job Apache Druid complements your analytics stack ● High QPS application/API usage ● Strict latency requirements ● Less elastic ● Ad-hoc, low concurrency usage ● Loose latency requirements ● Highly elastic Ideal workload ● Scatter-gather and shuffling engines ● Leverage broad array of indexes ● Guaranteed data caching ● Shuffling engines ● Focus on sequential data access ● Opportunistic data caching Technical properties Snowflake/BigQuery/Trino Cloud Data Warehouses + Query Engines Apache Druid Real-Time Analytics Database
  • 26. Trusted technology with an awesome community Companies using Druid Active Contributors YoY Increase in Community Activity Community Members 1,900+ 150% 14,000+ 600+
  • 27. ©2022, Imply 27 And many more! The data teams at leading organizations choose Druid Advertising Entertainment Industrial Financial Gaming Technology/Platform Technology/SaaS Security
  • 29. 10+ GB /hr 3X faster queries 99.9% availability Ad Serving Event Collector METADATA USER ACTIONS Reddit Ad Serving Pipeline
  • 30. 30 Phone Service Amazon Kinesis JSON events Batch processing Viz tools User Lyft Analytics Pipeline 400+ queries / minute 10X faster queries <500ms Latency on 99% of queries
  • 31. ©2022, Imply 31 Summary: Real-time use cases for Kafka-Flink-Druid ALERTING MONITORING DASHBOARDS EXPLORATION DECISIONING State or stateless triggered actions Continuous tracking of KPIs User-facing operational visibility Ad-hoc rapid data exploration On high throughput Kafka streams API-triggered automated workflows if X, then Y
  • 32. ©2022, Imply 32 What makes Kafka-Flink-Druid so popular? Stream-Native All 3 are designed natively for streaming data, supporting exactly-once semantics and event-by-event processing. Massively Scalable Fault Tolerant All 3 can handle massive event throughput into the millions of events per second across delivery, processing, and analytics. All 3 work in tandem for mission-critical use cases with guaranteed consistency, data and node replication, and data durability. Complementary All 3 provide distinct capabilities that together serve the full-breadth of real-time application use cases.
  • 34. "When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that differentiate the business." Enterprise-grade security Secure stream processing with built-in identity and access management, RBAC, and audit logs Stream governance Enforce data policies and avoid metadata duplication leveraging native integration with Stream Governance Monitoring Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services Connectors Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services Monitoring Connectors Enterprise-grade Security Stream Governance Confluent Cloud: Unified platform for Kafka and Flink seamlessly integrated
  • 35. ©2022, Imply 35 Imply Polaris The Cloud Database Service for Apache Druid Most Affordable Most Secure Best Time to Value And for OS Druid Users