Stream Processing with Apache Flink in the Cloud
and Stream Sharing
Kai Waehner
Field CTO, Confluent
Data Streaming is part of our everyday lives
Trending Shows >
Recommendations >
Popular TV >
……
Personalization
Popularity score
Pattern detection
Categorization
Curated features & virality
Streaming Data
Pipelines
Data
Sharing
Real-time
Analytics
Cybersecurity
IoT &
Telematics
ML & AI
Customer
360
Stream Processing
…
Core Kafka
Real-Time Applications
Streaming Apps
and Pipelines
Compute
Storage
Data Streaming with Confluent
Governance
Connectors
Platform Security Networking Observability
Kafka Streams
ksqlDB
Kafka Consumer and Producer
Stream Designer
Stream Processing for EVERYONE
Flexibility Simplicity
Stream Processing with Apache Flink
Serverless Flink as part of Confluent Cloud
• Stream data
• Process streams
Two Apache projects, born a few years apart
Immerok acquisition:
Accelerates our efforts to bring a cloud-native Flink service to our customers
● Also building a cloud-native Flink service
● Employs leading PMC members & committers
for Apache Flink
● Tackling some of the hardest problems in
cloud data infrastructure
Seamlessly process your data everywhere it resides with a Flink service that spans the three major cloud providers
Cloud-Native Complete Everywhere
Our Flink service will employ the same product principles
we’ve followed for Kafka
Deployment flexibility
Integrated platform
Leverage Flink fully integrated with
Confluent’s complete feature set,
enabling developers to build stream
processing applications quickly,
reliably, and securely
+
Serverless experience
Eliminate the operational burden
of managing Flink with a fully
managed, cloud-native service
that is simple, secure, and scalable
Why Stream Processing with Apache Flink?
Stream processing use cases
Data Exploration Data Pipelines Real-time Apps
Engineers and Analysts
both need to be able to
simply read and
understand the event
streams stored in Kafka
● Metadata discovery
● Throughput analysis
● Data sampling
● Interactive query
Data pipelines are used to
enrich, curate, and transform
event streams, creating new
derived event streams
● Filtering
● Joins
● Projections
● Aggregations
● Flattening
● Enrichment
Whole ecosystems of apps feed
on event streams automating
action in real-time
● Threat detection
● Quality of Service
● Fraud detection
● Intelligent routing
● Alerting
Data Exploration
SELECT * FROM input WHERE eventType='A'
Aggregates and Rich Temporal Functions
[Figure: event timeline from 00:00 to 01:20 with events of types A and B in tumbling 20-second windows]
WINDOWS:
SELECT EventType, COUNT(*)
FROM TUMBLE(..., INTERVAL '20' SECONDS)
GROUP BY EventType

TEMPORAL ANALYTICS FUNCTIONS:
COUNT(*) OVER (
  PARTITION BY EventType
  ORDER BY order_time
  RANGE BETWEEN INTERVAL '20' SECONDS PRECEDING AND CURRENT ROW
)
Windowing and temporal analytics functions offer a rich set of constructs for real-time processing and scenarios such as fraud detection.
● Windows (tumbling, hopping, etc.): results are produced at regular intervals
● Temporal analytics functions: results are produced immediately, per event
● Full composability of operators (windows of windows, aggregates of aggregates, etc.)
[Figure: per-event running counts A,1 A,2 A,3 A,4 B,1 B,2 versus per-window counts A,3 A,1 B,2]
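The contrast between the two result styles can be sketched in plain Python (the timestamps, event types, and function names here are illustrative, not Flink APIs):

```python
from collections import defaultdict

def tumbling_counts(events, size):
    """One result per window: count events per type in fixed,
    non-overlapping windows of `size` seconds."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, etype in events:
        windows[ts // size][etype] += 1
    return {w: dict(c) for w, c in windows.items()}

def over_range_counts(events, range_s):
    """One result per event: for each event, count events of the same
    type within the preceding `range_s` seconds, mirroring
    RANGE BETWEEN ... PRECEDING AND CURRENT ROW."""
    ordered = sorted(events)
    out = []
    for i, (ts, etype) in enumerate(ordered):
        n = sum(1 for t2, e2 in ordered[:i + 1]
                if e2 == etype and ts - t2 <= range_s)
        out.append((etype, n))
    return out

# Events matching the slide: four A events, then two B events
events = [(5, "A"), (12, "A"), (18, "A"), (25, "A"), (45, "B"), (50, "B")]
print(tumbling_counts(events, 20))    # {0: {'A': 3}, 1: {'A': 1}, 2: {'B': 2}}
print(over_range_counts(events, 20))  # [('A', 1), ('A', 2), ('A', 3), ('A', 4), ('B', 1), ('B', 2)]
```

The tumbling counts reproduce the per-window results (A,3 A,1 B,2), while the range-bounded over-counts emit an updated result for every incoming event (A,1 A,2 A,3 A,4 B,1 B,2).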
Data Enrichment
Orders            Currency rate          Enriched output
t1, 21.5 USD      t0, EUR:USD=1.01       t1, 21.5 USD
t3, 55 EUR        t2, EUR:USD=1.05       t3, 57.75 USD
t5, 35.3 EUR      t4, EUR:USD=1.10       t5, 38.83 USD
SELECT
  order_id,
  price,
  currency,
  conversion_rate,
  order_time
FROM orders
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF orders.order_time
ON orders.currency = currency_rates.currency;
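The FOR SYSTEM_TIME AS OF semantics (each order is joined against the exchange rate that was valid at the order's event time) can be sketched in plain Python; the tuple layouts and function name are assumptions for illustration:

```python
from bisect import bisect_right

def temporal_join(orders, rates):
    """For every order, look up the latest rate whose timestamp is
    <= the order's timestamp (an event-time "as of" join).
    Assumes a rate exists at or before every non-USD order."""
    rates = sorted(rates)                 # (time, EUR:USD rate)
    times = [t for t, _ in rates]
    enriched = []
    for ts, amount, currency in orders:
        if currency == "USD":             # already in the target currency
            enriched.append((ts, amount))
            continue
        i = bisect_right(times, ts) - 1   # latest rate at or before ts
        enriched.append((ts, round(amount * rates[i][1], 2)))
    return enriched

# The orders and rates from the slide
orders = [(1, 21.5, "USD"), (3, 55.0, "EUR"), (5, 35.3, "EUR")]
rates = [(0, 1.01), (2, 1.05), (4, 1.10)]
print(temporal_join(orders, rates))  # [(1, 21.5), (3, 57.75), (5, 38.83)]
```

Note that the order at t3 uses the rate from t2 (1.05), not the later rate from t4: the join is versioned by event time, so late-arriving orders still pick up the historically correct rate.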
Complex Event Processing (CEP) - Pattern Detection
[Figure: W-shaped price chart. A marks the start; B and D are downward runs (price<lag(price)); C and E are upward runs (price>lag(price))]
MATCH_RECOGNIZE (
  PARTITION BY stock_ticker
  MEASURES
    FIRST(A.price) AS firstvalue,
    LAST(E.price) AS lastvalue
  PATTERN (A B+ C+ D+ E+)
  DEFINE
    B AS price < LAST(price),
    C AS price > LAST(price),
    D AS price < LAST(price),
    E AS price > LAST(price) AND price > LAST(C.price)
)
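A simplified plain-Python sketch of the same W-shaped detection: it checks only the down/up run structure of A B+ C+ D+ E+ and omits the extra price > LAST(C.price) condition on E (the function name and input format are illustrative):

```python
def matches_w_pattern(prices):
    """Check whether a price series forms a W shape: a start value,
    then a down run, up run, down run, up run (each at least one step)."""
    if len(prices) < 5:                  # need at least one step per run
        return False
    # classify each step as "down" or "up" relative to the previous price
    moves = ["down" if b < a else "up" for a, b in zip(prices, prices[1:])]
    # collapse consecutive identical moves into runs (B+, C+, D+, E+)
    runs = []
    for m in moves:
        if not runs or runs[-1] != m:
            runs.append(m)
    return runs == ["down", "up", "down", "up"]

print(matches_w_pattern([10, 8, 9, 7, 11]))   # True: down, up, down, up
print(matches_w_pattern([10, 11, 9, 12, 8]))  # False: starts with an up move
```

A production CEP engine additionally handles partitioning per ticker, event-time ordering, and emitting measures such as the first and last matched price, which is what MATCH_RECOGNIZE provides out of the box.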
Read once, Write many
Fan-out queries using Flink SQL
INSERT INTO cluster1.topicA
SELECT * FROM input WHERE eventType='A'
INSERT INTO cluster1.topicB
SELECT * FROM input WHERE eventType='B'
…
[Figure: one Input topic fanned out to topicA, topicB, topicC, and topicD]
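The read-once, write-many idea behind these fan-out queries can be sketched in plain Python: a single pass over the input routes each event to every matching output topic (the topic names and predicates here are illustrative):

```python
from collections import defaultdict

def fan_out(input_events, routes):
    """Read the input stream once and append each event to every
    output topic whose predicate it satisfies."""
    topics = defaultdict(list)
    for event in input_events:
        for topic, predicate in routes.items():
            if predicate(event):
                topics[topic].append(event)
    return dict(topics)

events = [{"eventType": "A", "v": 1},
          {"eventType": "B", "v": 2},
          {"eventType": "A", "v": 3}]
routes = {
    "topicA": lambda e: e["eventType"] == "A",
    "topicB": lambda e: e["eventType"] == "B",
}
print(fan_out(events, routes))
```

Reading the source once and writing many outputs avoids re-consuming the input topic per derived stream, which is the point of grouping the INSERT INTO statements into one job.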
Support for multi-clusters
[Figure: Flink processing topics spread across Cluster1, Cluster2, Cluster3, and Cluster4]
Cross-cluster processing: Flink vs. ksqlDB vs. Kafka Streams
Serverless Apache Flink in Confluent Cloud
Product roadmap
We are planning to GA our Flink service in Q4 2023.
1. Early Access (Spring 2023). Feature highlights: new SQL capabilities, metadata integration, cross-cluster queries, Admin UI, CLI
2. Public Preview (Late Summer 2023). Feature highlights: OAuth and RBAC, metered usage, Cloud UX, notebook querying
3. Limited Availability (Late Fall 2023). Feature highlights: 99.99% uptime SLA, private networking, pricing & packaging
4. General Availability (Winter 2023/24). Feature highlights: GA in all three clouds, cluster autoscaling
Integration across Kafka Clusters
Data Exploration
Flink SQL Query against a Kafka Topic (ANSI SQL)
Complex Event Processing (Pattern Matching)
TL;DR - Serverless Flink in Confluent Cloud
Confluent Stream Sharing
Secure, trusted, real-time data sharing (available now: GA)
● Easy collaboration on live data with partners, customers, and vendors
  ✓ One-click sharing
  ✓ Trusted and governed
  ✓ Secure, granular, auditable
● Single, org-wide portal to discover data streams from trusted sources
Confluent Stream Sharing in a Decentralized Data Mesh
Internal and external data sharing in real-time
Faster time to market, better customer experience, and new business models
Generate an AsyncAPI Specification
Confluent Stream Sharing
Who is Stream Sharing for?
● Mainly for developers and architects in companies with medium-to-high Kafka maturity (Phases 3+ of the streaming maturity model) who have a use case for sharing their Kafka topics with external parties (e.g., vendors, partners, customers) or other internal teams (e.g., from different lines of business)
What pain points is it trying to solve?
● Out-of-sync data: most existing solutions dump data from Kafka to a sink in a batch process and then copy it onward to an external destination, which turns real-time data into stale data
● Operational complexities: setting up, maintaining, and scaling these sharing pipelines to meet security and privacy requirements requires complex integration work and is operationally taxing
● Vendor lock-in: most sharing solutions require both the Data Provider and Data Recipient to be on the
same platform, resulting in contractual complexity and vendor lock-in
Confluent Stream Sharing
What are the differences between Stream Sharing and Cluster Linking in their sharing capabilities? And which one should we recommend to customers?
Stream Sharing is our default data-sharing solution
There are three major differences between Cluster Linking and Stream Sharing
● 1) With Stream Sharing, Data Recipients can consume directly from Data Provider’s Kafka cluster without the
need to copy the data, saving cluster infra and provisioning efforts. Cluster Linking requires a destination
cluster ready for byte-by-byte replication
○ Sharing grants recipients access to the shared topic and Schema Registry subjects. 1 topic + n shared
Schema Registry subjects are included in the same share.
● 2) Data Recipients can use any platform to consume from Stream Sharing, whether it’s CC, CP, OSS Kafka,
MSK or Aiven. Cluster Linking requires the Data Recipient to be on CC Dedicated Cluster or CP
● 3) Only an email address is needed for the Data Provider to share via Stream Sharing, whereas Cluster Linking requires both parties' cluster IDs and API credentials across multiple setup and provisioning steps
