"The data coming into Kafka is fresh and hot. And you can deliver a new level of operational visibility and intelligence fueling applications with it. But streaming data is no longer real-time when the sink is batch. So the challenge is processing it and analyzing it at scale and extracting those insights - before they go stale.
So what’s the right architecture? Should you ingest streams into a data warehouse or data lake? Maybe use a stream processor or a database? Engineering teams love using Apache Flink, but they also love using Apache Druid, a popular real-time analytics database used by thousands of companies, including Confluent and Netflix. Do you need both Flink and Druid? When does the combination make sense, and when does it not?
Join this session to learn about Apache Druid and why companies use it in combination with Kafka and Flink for real-time applications. Learn how Apache Druid complements Flink and Kafka, and what makes it purpose-built for analyzing streams and events. This talk shows real-world examples from companies that use Apache Druid with Kafka and Flink in production today, and the best practices that every dev can take advantage of."
12. Flink growth has mirrored the growth of Kafka, the de facto standard for streaming data
>75% of the Fortune 500 estimated to be using Kafka
>100,000 orgs using Kafka
>41,000 Kafka meetup attendees
>750 Kafka Improvement Proposals
>12,000 Jiras for Apache Kafka
[Chart: "Two Apache Projects, Born a Few Years Apart". Monthly unique users (axis 0 to 150,000) for Kafka, 2016 to 2018, and Flink, 2020 to 2022]
13. Developers choose Flink because of its performance and rich feature set
Flink is a top 5 Apache project and boasts a robust developer community
Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale
Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability
Language Flexibility: Flink supports Java, Python, & SQL, enabling developers to work in their language of choice
Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
14. Flink supports unified stream and batch processing
Streaming:
● Entire pipeline must always be running
● Input must be processed as it arrives
● Results are reported as they become ready
● Failure recovery resumes from a recent snapshot
● Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc.
Batch:
● Execution proceeds in stages, running as needed
● Input may be pre-sorted by time and key
● Results are reported at the end of the job
● Failure recovery does a reset and full restart
● Effectively exactly-once guarantees are more straightforward
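The snapshot-based recovery described on this slide can be illustrated with a toy model. This is a minimal sketch in plain Python, not Flink's API: the `run` function, the `snapshot_every` and `fail_at` parameters, and the event list are all hypothetical. It assumes the source can be re-read from a saved offset, and it saves the offset and the state together so that a restart rewinds both.

```python
from collections import Counter


def run(events, snapshot_every=3, fail_at=None):
    """Count events per key, snapshotting (offset, state) periodically.

    If fail_at is set, the job "crashes" once just before processing that
    offset, then resumes from the most recent snapshot.
    """
    snapshot = (0, Counter())  # (next offset to read, state at that point)
    offset, state = 0, Counter()
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Discard in-flight work; restore offset and state together.
            offset, state = snapshot[0], Counter(snapshot[1])
            continue
        state[events[offset]] += 1
        offset += 1
        if offset % snapshot_every == 0:
            snapshot = (offset, Counter(state))
    return state


events = ["a", "b", "a", "c", "a", "b", "c", "a"]
clean = run(events)                  # no failure
recovered = run(events, fail_at=5)   # crash, then resume from the snapshot
print(clean == recovered)            # True
```

Some events are processed twice after the restart, but because the state is rolled back along with the offset, each event is reflected in the result only once. That is the sense in which the results are "effectively" exactly-once.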
15. Effortlessly filter, join, and enrich your data streams with Apache Flink
Real-time processing: Power low-latency applications and pipelines that react to real-time events and provide timely insights
Data reusability: Share consistent and reusable data streams widely with downstream applications and systems
Data enrichment: Curate, filter, and augment data on-the-fly with additional context to improve completeness, accuracy, & compliance
Efficiency: Improve resource utilization and cost-effectiveness by avoiding redundant processing across silos
“With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security incidents without any added operational burden.”
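The filter, join, and enrich pattern this slide describes can be sketched in a few lines. The example below is plain Python rather than Flink code, and everything in it is an assumption made for illustration: the `enrich` function, the field names, the device table, and the threshold. It treats the lookup table as static, whereas a real streaming join would also have to handle updates to that table.

```python
def enrich(readings, devices, threshold):
    """Yield readings at or above threshold, joined with device metadata."""
    for r in readings:
        if r["value"] < threshold:
            continue  # filter: drop low-signal readings
        meta = devices.get(r["device_id"])
        if meta is None:
            continue  # inner join: drop readings with no matching device
        yield {**r, **meta}  # enrich: attach the device's context


# Hypothetical reference data and sensor readings.
devices = {"cam-1": {"site": "lobby"}, "cam-2": {"site": "dock"}}
readings = [
    {"device_id": "cam-1", "value": 0.2},
    {"device_id": "cam-2", "value": 0.9},
    {"device_id": "cam-9", "value": 0.95},
]
alerts = list(enrich(readings, devices, threshold=0.8))
print(alerts)  # [{'device_id': 'cam-2', 'value': 0.9, 'site': 'dock'}]
```

Because `enrich` is a generator, each reading is handled as it arrives, and the enriched stream can be shared by several downstream consumers instead of each one repeating the join.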
16. Process data streams in-flight to maximize actionability, fidelity, and portability
[Diagram: connected systems (blob storage, 3rd party app, databases, data warehouse, SaaS app) around in-flight stream processing. Outcomes: low-latency apps and data pipelines; consistent, reusable data products; optimized resource utilization]
17. Enrich real-time data streams with Generative AI directly from Flink SQL
INSERT INTO enriched_reviews
SELECT id,
       review,
       invoke_openai(prompt, review) AS score
FROM product_reviews;
[Illustration: product reviews flowing through the data streaming platform, shown before and after a star score is added]
Kate, 4 hours ago: This was the worst decision ever.
Nikola, 1 day ago: Not bad. Could have been cheaper.
Brian, 3 days ago: Amazing! Game Changer!
The Prompt: “Score the following text on a scale of 1 to 5, where 1 is negative and 5 is positive, returning only the number”
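To make the enrichment step concrete, here is a minimal Python sketch of what a scoring call like the one in the query has to do: combine the prompt with the review, call a model, and check that the reply really is a single number from 1 to 5. The names `score_review` and `fake_model` are hypothetical, and `fake_model` is a keyword-based stand-in rather than a real LLM client, so the scores it produces are illustrative and do not come from the slides.

```python
PROMPT = ("Score the following text on a scale of 1 to 5 where 1 is negative "
          "and 5 is positive returning only the number")


def score_review(review, model):
    """Send prompt + review to `model`; validate the reply is an int in 1..5."""
    reply = model(f"{PROMPT}\n\n{review}")
    score = int(reply.strip())  # raises ValueError if the reply is not a number
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score


def fake_model(text):
    """Hypothetical stand-in for an LLM call; keyword rules, not a real model."""
    review = text.split("\n\n", 1)[1].lower()
    if "worst" in review:
        return "1"
    if "amazing" in review:
        return " 5\n"  # stray whitespace, as a model reply might contain
    return "3"


reviews = [
    "This was the worst decision ever.",
    "Not bad. Could have been cheaper.",
    "Amazing! Game Changer!",
]
scores = [score_review(r, fake_model) for r in reviews]
print(scores)  # [1, 3, 5]
```

Validating the reply matters because the prompt only asks the model to return a number; nothing guarantees it will, and an unchecked reply would flow into the enriched stream as bad data.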
23. A high-performance, real-time analytics database
Operational visibility at scale: For use cases that require real-time insights on big, fast-moving event streams like observability, product analytics, clickstream, IoT, and fraud detection
Rapid data exploration: For use cases that require instant response on interactive, ad-hoc queries at scale on high-dimensional data such as root cause diagnostics, ML training, and investigation
External-facing analytics: For use cases where analytics are being delivered to external stakeholders as a product or as a value add with strict SLAs for performance under load and resiliency
Real-time decisioning: For use cases where instant query response powers automated rules engines and ML frameworks, including real-time decisions and recommendations
Industries: Supply chain, Logistics, Healthtech, Adtech, Fintech, Gaming, Entertainment, Retail, eCommerce
24. Just a few examples of the 1000s of Druid users
26. Trusted technology with an awesome community
Companies using Druid: 1,900+
YoY Increase in Community Activity: 150%
Community Members: 14,000+
Active Contributors: 600+
34. "When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that differentiate the business."
Enterprise-grade security: Secure stream processing with built-in identity and access management, RBAC, and audit logs
Stream governance: Enforce data policies and avoid metadata duplication leveraging native integration with Stream Governance
Monitoring: Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services
Connectors
Confluent Cloud: Unified platform for Kafka and Flink seamlessly integrated