SlideShare a Scribd company logo

A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid

"The data coming into Kafka is fresh and hot. And you can deliver a new level of operational visibility and intelligence fueling applications with it. But streaming data is no longer real-time when the sink is batch. So the challenge is processing it and analyzing it at scale and extracting those insights - before they go stale. So what’s the right architecture? Should you ingest streams into a data warehouse or data lake? Maybe use a stream processor or a database? Engineering teams love using Apache Flink, but they also love using Apache Druid, a popular real-time analytics database used by 1000s of companies like Confluent and Netflix. Do you need Flink and Druid? When does it make sense vs when does it not? Join this session to learn about Apache Druid and why companies use it in combination with Kafka and Flink for real-time applications. Learn how Apache Druid complements Flink and Kafka - and what makes it purpose-built for analyzing streams and events. This talk shows real-world examples from companies that use Apache Druid with Kafka and Flink in production today and the best-practices that every dev can take advantage of."

1 of 36
Download to read offline
©2022, Imply
©2022, imply
The Trifecta of Real-Time Applications:
Apache Kafka, Flink, and Druid
1
©2022, Imply
Speakers
2
Gian Merlino
Co-founder and CTO of Imply
PMC Chair of Apache Druid
Kai Wähner
Field CTO of Confluent
©2022, Imply 3
Agenda 1. Overview of real-time applications
2. Kafka-Flink-Druid data architecture
3. Introduction to Apache Flink+use cases
4. Introduction to Apache Druid+use cases
5. Kafka-Flink-Druid case studies
6. Confluent and Imply integration
7. Q&A
©2022, Imply
©2021, imply
4
Data applications are going real-time
©2022, Imply 5
REAL-TIME APPLICATION
● >5 million events per second
● Interactive queries
● Customer-facing
● 250 queries per second
● Complex analytics
● Operational visibility
©2022, Imply 6
REAL-TIME APPLICATION
● Real-time monitoring
● Real-time alerting
● Real-time decisioning
● Interactive queries
● High dimensionality

Recommended

Partner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - AprilPartner Connect APAC - 2022 - April
Partner Connect APAC - 2022 - Aprilconfluent
 
Digital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraDigital Business Transformation in the Streaming Era
Digital Business Transformation in the Streaming EraAttunity
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewVMware Tanzu
 
PCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM Vision 2019 Breakout: Quest Software
PCM Vision 2019 Breakout: Quest SoftwarePCM
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalVMware Tanzu Korea
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 

More Related Content

Similar to A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid

Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformNatalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformmatteo mazzeri
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceCloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceVMware Tanzu
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionDaniel Zivkovic
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewVMware Tanzu
 
Splunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastDatabricks
 
A Real-Time Version of the Truth
 A Real-Time Version of the Truth A Real-Time Version of the Truth
A Real-Time Version of the TruthEric Kavanagh
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsImply
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsAerospike, Inc.
 
Splunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DaySplunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DayZivaro Inc
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkKostas Tzoumas
 
Denver Big Data Analytics Day
Denver Big Data Analytics DayDenver Big Data Analytics Day
Denver Big Data Analytics DayZivaro Inc
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunk
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Implyconfluent
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analyticskgshukla
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Cloudera, Inc.
 
SQL + Hadoop: The High Performance Advantage�
SQL + Hadoop:  The High Performance Advantage�SQL + Hadoop:  The High Performance Advantage�
SQL + Hadoop: The High Performance Advantage�Actian Corporation
 

Similar to A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid (20)

Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platformNatalie Godec - AirFlow and GCP: tomorrow's health service data platform
Natalie Godec - AirFlow and GCP: tomorrow's health service data platform
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed ServiceCloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
Cloud-Native Patterns and the Benefits of MySQL as a Platform Managed Service
 
Google Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data editionGoogle Cloud Next '22 Recap: Serverless & Data edition
Google Cloud Next '22 Recap: Serverless & Data edition
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
 
Splunk MINT and Stream Breakout
Splunk MINT and Stream BreakoutSplunk MINT and Stream Breakout
Splunk MINT and Stream Breakout
 
Cloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and FastCloud Experience: Data-driven Applications Made Simple and Fast
Cloud Experience: Data-driven Applications Made Simple and Fast
 
A Real-Time Version of the Truth
 A Real-Time Version of the Truth A Real-Time Version of the Truth
A Real-Time Version of the Truth
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Splunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech DaySplunk Enterprise 6.3 - Splunk Tech Day
Splunk Enterprise 6.3 - Splunk Tech Day
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Denver Big Data Analytics Day
Denver Big Data Analytics DayDenver Big Data Analytics Day
Denver Big Data Analytics Day
 
SplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT BreakoutSplunkLive! London - Splunk App for Stream & MINT Breakout
SplunkLive! London - Splunk App for Stream & MINT Breakout
 
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and ImplyAchieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
 
Pivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream AnalyticsPivotal Real Time Data Stream Analytics
Pivotal Real Time Data Stream Analytics
 
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
Turning Petabytes of Data into Profit with Hadoop for the World’s Biggest Ret...
 
SQL + Hadoop: The High Performance Advantage�
SQL + Hadoop:  The High Performance Advantage�SQL + Hadoop:  The High Performance Advantage�
SQL + Hadoop: The High Performance Advantage�
 

More from HostedbyConfluent

Build Real-time Machine Learning Apps on Generative AI with Kafka Streams
Build Real-time Machine Learning Apps on Generative AI with Kafka StreamsBuild Real-time Machine Learning Apps on Generative AI with Kafka Streams
Build Real-time Machine Learning Apps on Generative AI with Kafka StreamsHostedbyConfluent
 
When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...
When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...
When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...HostedbyConfluent
 
Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...
Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...
Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...HostedbyConfluent
 
Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...
Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...
Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...HostedbyConfluent
 
Rule Based Asset Management Workflow Automation at Netflix
Rule Based Asset Management Workflow Automation at NetflixRule Based Asset Management Workflow Automation at Netflix
Rule Based Asset Management Workflow Automation at NetflixHostedbyConfluent
 
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...HostedbyConfluent
 
Indeed Flex: The Story of a Revolutionary Recruitment Platform
Indeed Flex: The Story of a Revolutionary Recruitment PlatformIndeed Flex: The Story of a Revolutionary Recruitment Platform
Indeed Flex: The Story of a Revolutionary Recruitment PlatformHostedbyConfluent
 
Forecasting Kafka Lag Issues with Machine Learning
Forecasting Kafka Lag Issues with Machine LearningForecasting Kafka Lag Issues with Machine Learning
Forecasting Kafka Lag Issues with Machine LearningHostedbyConfluent
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...HostedbyConfluent
 
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...HostedbyConfluent
 
Accelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered ApplicationsAccelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered ApplicationsHostedbyConfluent
 
Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...
Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...
Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...HostedbyConfluent
 
Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...
Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...
Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...HostedbyConfluent
 
Go Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at ScaleGo Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at ScaleHostedbyConfluent
 
What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2
What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2
What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2HostedbyConfluent
 
From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python
From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark PythonFrom Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python
From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark PythonHostedbyConfluent
 
Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...
Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...
Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...HostedbyConfluent
 
Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...
Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...
Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...HostedbyConfluent
 
From the Battlefield: Squeezing the Most From Your Kafka Infrastructure
From the Battlefield: Squeezing the Most From Your Kafka InfrastructureFrom the Battlefield: Squeezing the Most From Your Kafka Infrastructure
From the Battlefield: Squeezing the Most From Your Kafka InfrastructureHostedbyConfluent
 

More from HostedbyConfluent (20)

Build Real-time Machine Learning Apps on Generative AI with Kafka Streams
Build Real-time Machine Learning Apps on Generative AI with Kafka StreamsBuild Real-time Machine Learning Apps on Generative AI with Kafka Streams
Build Real-time Machine Learning Apps on Generative AI with Kafka Streams
 
When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...
When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...
When Only the Last Writer Wins We All Lose: Active-Active Geo-Replication in ...
 
Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...
Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...
Apache Kafka's Next-Gen Rebalance Protocol: Towards More Stable and Scalable ...
 
Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...
Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...
Using Kafka at Scale - A Case Study of Micro Services Data Pipelines at Evern...
 
Rule Based Asset Management Workflow Automation at Netflix
Rule Based Asset Management Workflow Automation at NetflixRule Based Asset Management Workflow Automation at Netflix
Rule Based Asset Management Workflow Automation at Netflix
 
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML...
 
Indeed Flex: The Story of a Revolutionary Recruitment Platform
Indeed Flex: The Story of a Revolutionary Recruitment PlatformIndeed Flex: The Story of a Revolutionary Recruitment Platform
Indeed Flex: The Story of a Revolutionary Recruitment Platform
 
Forecasting Kafka Lag Issues with Machine Learning
Forecasting Kafka Lag Issues with Machine LearningForecasting Kafka Lag Issues with Machine Learning
Forecasting Kafka Lag Issues with Machine Learning
 
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
Getting Under the Hood of Kafka Streams: Optimizing Storage Engines to Tune U...
 
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
Maximizing Real-Time Data Processing with Apache Kafka and InfluxDB: A Compre...
 
Accelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered ApplicationsAccelerating Path to Production for Generative AI-powered Applications
Accelerating Path to Production for Generative AI-powered Applications
 
Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...
Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...
Optimize Costs and Scale Your Streaming Applications with Virtually Unlimited...
 
Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...
Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...
Don’t Let Degradation Bring You Down: Automatically Detect & Remediate Degrad...
 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a Detail
 
Go Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at ScaleGo Big or Go Home: Approaching Kafka Replication at Scale
Go Big or Go Home: Approaching Kafka Replication at Scale
 
What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2
What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2
What's in store? Part Deux; Creating Custom Queries with Kafka Streams IQv2
 
From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python
From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark PythonFrom Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python
From Raw Data to an Interactive Data App in an Hour: Powered by Snowpark Python
 
Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...
Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...
Beyond Monoliths: Thrivent’s Lessons in Building a Modern Integration Archite...
 
Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...
Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...
Exactly-Once Semantics Revisited: Distributed Transactions across Flink and K...
 
From the Battlefield: Squeezing the Most From Your Kafka Infrastructure
From the Battlefield: Squeezing the Most From Your Kafka InfrastructureFrom the Battlefield: Squeezing the Most From Your Kafka Infrastructure
From the Battlefield: Squeezing the Most From Your Kafka Infrastructure
 

Recently uploaded

National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...MichaelBenis1
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...ShapeBlue
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...htrindia
 
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfQ4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfTejal81
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1Inbay UK
 
Large Language Models and Applications in Healthcare
Large Language Models and Applications in HealthcareLarge Language Models and Applications in Healthcare
Large Language Models and Applications in HealthcareAsma Ben Abacha
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Product School
 
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...ShapeBlue
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, TripadvisorProduct School
 
PrismCRM-RealEstate-SalesCRM_byCode5Company
PrismCRM-RealEstate-SalesCRM_byCode5CompanyPrismCRM-RealEstate-SalesCRM_byCode5Company
PrismCRM-RealEstate-SalesCRM_byCode5CompanyMustafa Kuğu
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...UiPathCommunity
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Product School
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Product School
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoProduct School
 
ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...
ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...
ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...SearchNorwich
 
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueCloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueShapeBlue
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxNeo4j
 
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerCentralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerSaiLinnThu2
 
AI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientAI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientKari Kakkonen
 

Recently uploaded (20)

National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...National Institute of Standards and Technology (NIST) Cybersecurity Framework...
National Institute of Standards and Technology (NIST) Cybersecurity Framework...
 
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
What’s New in CloudStack 4.19, Abhishek Kumar, Release Manager Apache CloudSt...
 
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
HBR SERIES METAL HOUSED RESISTORS POWER ELECTRICAL ABSORBS HIGH CURRENT DURIN...
 
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdfQ4 2023 Quarterly Investor Presentation - FINAL.pdf
Q4 2023 Quarterly Investor Presentation - FINAL.pdf
 
IT Nation Evolve event 2024 - Quarter 1
IT Nation Evolve event 2024  - Quarter 1IT Nation Evolve event 2024  - Quarter 1
IT Nation Evolve event 2024 - Quarter 1
 
Large Language Models and Applications in Healthcare
Large Language Models and Applications in HealthcareLarge Language Models and Applications in Healthcare
Large Language Models and Applications in Healthcare
 
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
Synergy in Leadership and Product Excellence: A Blueprint for Growth by CPO, ...
 
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
Elevating Cloud Infrastructure with Object Storage, DRS, VM Scheduling, and D...
 
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner,  Challenge Like a VC by former CPO, TripadvisorAct Like an Owner,  Challenge Like a VC by former CPO, Tripadvisor
Act Like an Owner, Challenge Like a VC by former CPO, Tripadvisor
 
PrismCRM-RealEstate-SalesCRM_byCode5Company
PrismCRM-RealEstate-SalesCRM_byCode5CompanyPrismCRM-RealEstate-SalesCRM_byCode5Company
PrismCRM-RealEstate-SalesCRM_byCode5Company
 
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
Dev Dives: Leverage APIs and Gen AI to power automations for RPA and software...
 
Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...Launching New Products In Companies Where It Matters Most by Product Director...
Launching New Products In Companies Where It Matters Most by Product Director...
 
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...Relationship Counselling: From Disjointed Features to Product-First Thinking ...
Relationship Counselling: From Disjointed Features to Product-First Thinking ...
 
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, MonzoRevolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
Revolutionizing The Banking Industry: The Monzo Way by CPO, Monzo
 
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...In sharing we trust. Taking advantage of a diverse consortium to build a tran...
In sharing we trust. Taking advantage of a diverse consortium to build a tran...
 
ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...
ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...
ChatGPT's Code Interpreter: Your secret weapon for SEO automation success - S...
 
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlueCloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
CloudStack Tooling Ecosystem – Kiran Chavala, ShapeBlue
 
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptxThe Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
The Art of the Possible with Graph by Dr Jim Webber Neo4j.pptx
 
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-ManagerCentralized TLS Certificates Management Using Vault PKI + Cert-Manager
Centralized TLS Certificates Management Using Vault PKI + Cert-Manager
 
AI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficientAI improves software testing to be more fault tolerant, focused and efficient
AI improves software testing to be more fault tolerant, focused and efficient
 

A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid

  • 1. ©2022, Imply ©2022, imply The Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid 1
  • 2. ©2022, Imply Speakers 2 Gian Merlino Co-founder and CTO of Imply PMC Chair of Apache Druid Kai Wähner Field CTO of Confluent
  • 3. ©2022, Imply 3 Agenda 1. Overview of real-time applications 2. Kafka-Flink-Druid data architecture 3. Introduction to Apache Flink+use cases 4. Introduction to Apache Druid+use cases 5. Kafka-Flink-Druid case studies 6. Confluent and Imply integration 7. Q&A
  • 4. ©2022, Imply ©2021, imply 4 Data applications are going real-time
  • 5. ©2022, Imply 5 REAL-TIME APPLICATION ● >5 million events per second ● Interactive queries ● Customer-facing ● 250 queries per second ● Complex analytics ● Operational visibility
  • 6. ©2022, Imply 6 REAL-TIME APPLICATION ● Real-time monitoring ● Real-time alerting ● Real-time decisioning ● Interactive queries ● High dimensionality
  • 7. ©2022, Imply 7 REAL-TIME APPLICATION ● Real-time decisioning ● API-driven interactive queries ● Customer-facing ● 25 million users ● <100 ms query latency ● 100+ QPS
  • 8. ©2022, Imply 8 Building real-time apps with batch workflows doesn’t work Waiting… Data Collection Data Processing Data Ingestion Data Analysis Data Presentation 1 3 5 2 4 Waiting… Waiting… Waiting… Waiting… Latency measured in hours-to-days
  • 9. ©2022, Imply 9 Open source real-time data architecture Data Sources Streaming Pipeline Events/ Messages Stream Processing Real-Time Analytics Monitor/Alerting Analytics Visualization/ Decisioning Real-Time Applications Historical Data / Context + Enrichment / Transformation
  • 10. ©2022, Imply 10 timestamp sensor_id temperature 2023-07-10T10:00:00 SensorA 22.5 2023-07-10T10:01:00 SensorB 18.2 2023-07-10T10:02:00 SensorC 65.5 timestamp sensor_id location 2023-07-10T10:00:00 SensorA Room 101 2023-07-10T10:01:00 SensorB Room 101 2023-07-10T10:02:00 SensorC Room 101 Source A common real-time KFD pattern timestamp sensor_id location tempe rature 2023-07-10T10:00:00 SensorA Room 101 22.5 2023-07-10T10:01:00 SensorB Room 101 18.2 2023-07-10T10:02:00 SensorC Room 101 65.5 65.5 Current
  • 12. Flink growth has mirrored the growth of Kafka, the de facto standard for streaming data >75% of the Fortune 500 estimated to be using Kafka >100,000+ orgs using Kafka >41,000 Kafka meetup attendees >750 Kafka Improvement Proposals >12,000 Jiras for Apache Kafka 0 50,000 100,000 150,000 2020 2021 2022 2016 2017 2018 Flink Kafka Two Apache Projects, Born a Few Years Apart Monthly Unique Users
  • 13. Developers choose Flink because of its performance and rich feature set Scalability and Performance Fault Tolerance Flink is a top 5 Apache project and boasts a robust developer community Unified Processing Flink is capable of supporting stream processing workloads at tremendous scale Language Flexibility Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability Flink supports Java, Python, & SQL, enabling developers to work in their language of choice Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
  • 14. Flink supports unified stream and batch processing ● Entire pipeline must always be running ● Execution proceeds in stages, running as needed ● Input must be processed as it arrives ● Input may be pre-sorted by time and key ● Results are reported as they become ready ● Results are reported at the end of the job ● Failure recovery resumes from a recent snapshot ● Failure recovery does a reset and full restart ● Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc. ● Effectively exactly-once guarantees are more straightforward
  • 15. Effortlessly filter, join, and enrich your data streams with Apache Flink Real-time processing Power low-latency applications and pipelines that react to real-time events and provide timely insights Data reusability Share consistent and reusable data streams widely with downstream applications and systems Data enrichment Curate, filter, and augment data on-the-fly with additional context to improve completeness, accuracy, & compliance Efficiency Improve resource utilization and cost-effectiveness by avoiding redundant processing across silos “With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security incidents without any added operational burden.”
  • 16. Process data streams in-flight to maximize actionability, fidelity, and portability Blob storage 3rd party app Databases Data Warehouse Database SaaS app Low latency apps and data pipelines Consistent, reusable data products Optimized resource utilization
  • 17. Enrich real-time data streams with Generative AI directly from Flink SQL INSERT INTO enriched_reviews SELECT id , review , invoke_openai(prompt,review) as score FROM product_reviews ; K N Kate 4 hours ago This was the worst decision ever. Nikola 1 day ago Not bad. Could have been cheaper. K N B Kate ★★★★★ 4 hours ago This was the worst decision ever. Nikola ★★★★★ 1 day ago Not bad. Could have been cheaper. Brian ★★★★★ 3 days ago Amazing! Game Changer! The Prompt “Score the following text on a scale of 1 and 5 where 1 is negative and 5 is positive returning only the number” DATA STREAMING PLATFORM B Brian 3 days ago Amazing! Game Changer!
  • 19. ©2022, Imply 19 Applications Analytics Applications Druid is built for the intersection of analytics and applications. Apache Druid
  • 20. ©2022, Imply 20 Apache Druid is a real-time analytics database Sub-second queries at massive scale Interactive analytics on TB-PBs of data High concurrency at low cost 1000s QPS via highly efficient distributed query engine Real-time and historical insights True stream ingestion for Kafka and Kinesis Plus, non-stop reliability with automated fault tolerance and continuous backup 1 2 3 For analytics applications that require: (Example of a Druid-powered UX)
  • 21. ©2022, Imply Real-Time Ingestion Ingestion Task 0 Ingestion Task 1 Ingestion Task M … Topic Partition 0 Partition 1 Partition 2 Partition N 21 Druid’s stream ingestion scales right with Kafka Producer Producer Producer … … Query Historical Historical Historical
  • 22. ©2022, Imply 22 22 1 2 3 4 5 Apache Druid is a database built for data in motion Continuous backup Guaranteed consistency Real-time insights with flexible historical queries Event-based ingestion High EPS scalability Schema auto-detection No connector needed to druid
  • 23. Real-time decisioning: External-facing analytics: Operational visibility at scale: Rapid data exploration: For use cases where instant query response powers automated rules engines and ML frameworks, including real-time decisions and recommendations For use cases that require instant response on interactive, ad-hoc queries at scale on high-dimensional data such as root cause diagnostics, ML training, and investigation. For use cases where analytics are being delivered to external stakeholders as a product or as a value add with strict SLAs for performance under load and resiliency For use cases that require real-time insights on big, fast-moving event streams like observability, product analytics, clickstream, IoT, and fraud detection A high-performance, real-time analytics database Supply chain Logistics Healthtech Adtech Fintech Gaming Entertainment Retail eCommerce Operational visibility at scale External-facing analytics Rapid data exploration Real-time decisioning
  • 24. Just a few examples of the 1000s of Druid users
  • 25. ©2022, Imply 25 Selecting the right tool for the job Apache Druid complements your analytics stack ● High QPS application/API usage ● Strict latency requirements ● Less elastic ● Ad-hoc, low concurrency usage ● Loose latency requirements ● Highly elastic Ideal workload ● Scatter-gather and shuffling engines ● Leverage broad array of indexes ● Guaranteed data caching ● Shuffling engines ● Focus on sequential data access ● Opportunistic data caching Technical properties Snowflake/BigQuery/Trino Cloud Data Warehouses + Query Engines Apache Druid Real-Time Analytics Database
  • 26. Trusted technology with an awesome community Companies using Druid Active Contributors YoY Increase in Community Activity Community Members 1,900+ 150% 14,000+ 600+
  • 27. ©2022, Imply 27 And many more! The data teams at leading organizations choose Druid Advertising Entertainment Industrial Financial Gaming Technology/Platform Technology/SaaS Security
  • 29. 10+ GB /hr 3X faster queries 99.9% availability Ad Serving Event Collector METADATA USER ACTIONS Reddit Ad Serving Pipeline
  • 30. 30 Phone Service Amazon Kinesis JSON events Batch processing Viz tools User Lyft Analytics Pipeline 400+ queries / minute 10X faster queries <500ms Latency on 99% of queries
  • 31. ©2022, Imply 31 Summary: Real-time use cases for Kafka-Flink-Druid ALERTING MONITORING DASHBOARDS EXPLORATION DECISIONING State or stateless triggered actions Continuous tracking of KPIs User-facing operational visibility Ad-hoc rapid data exploration On high throughput Kafka streams API-triggered automated workflows if X, then Y
  • 32. ©2022, Imply 32 What makes Kafka-Flink-Druid so popular? Stream-Native All 3 are designed natively for streaming data, supporting exactly-once semantics and event-by-event processing. Massively Scalable Fault Tolerant All 3 can handle massive event throughput into the millions of events per second across delivery, processing, and analytics. All 3 work in tandem for mission-critical use cases with guaranteed consistency, data and node replication, and data durability. Complementary All 3 provide distinct capabilities that together serve the full-breadth of real-time application use cases.
  • 34. "When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that differentiate the business." Enterprise-grade security Secure stream processing with built-in identity and access management, RBAC, and audit logs Stream governance Enforce data policies and avoid metadata duplication leveraging native integration with Stream Governance Monitoring Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services Connectors Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services Monitoring Connectors Enterprise-grade Security Stream Governance Confluent Cloud: Unified platform for Kafka and Flink seamlessly integrated
  • 35. ©2022, Imply 35 Imply Polaris The Cloud Database Service for Apache Druid Most Affordable Most Secure Best Time to Value And for OS Druid Users