Stream Processing with
Apache Flink®
Flink Overview 101
Agenda
01 Intro: The world has changed… some news
02 Understanding the importance of stream processing
03 Why Apache Flink is becoming the de facto standard
04 Enhancing Apache Flink as a cloud-native service
05 Questions
Intro to the Confluent community
Some things you should know about us
Our landing page: https://discover.confluent.io/santander-and-confluent-better-together
There's more! Coming soon:
GenAI & Confluent
Webinar, 7 May 2024
Sign up now!
A quick update from Kafka Summit London
Apache Flink on Confluent Cloud reaches General Availability: We announced the general availability of our fully managed Confluent Cloud service for Apache Flink®, giving customers a truly multicloud solution for running event stream processing on any of the three major CSPs where their data and applications reside. You can find more details in the blog post.
Introducing Tableflow: We presented our vision for Tableflow, a new capability of the Confluent Cloud event streaming platform that lets users turn Apache Kafka topics and their associated schemas into Apache Iceberg® tables with a single click, to better feed data lakes and data warehouses. Tableflow will soon be available in private early access.
Kora, the Confluent Cloud engine, is faster than ever: We revealed that Kora, the engine powering Confluent Cloud's cloud-native Kafka service, is now 16x faster than open-source Kafka.
Connector improvements: New enhancements to Confluent's fully managed connector portfolio were introduced, including DNS forwarding and private egress access points. An improved 99.99% SLA was also announced.
Stream Governance improvements: The Stream Governance service in Confluent keeps evolving and is now enabled by default in all environments, with a 99.99% uptime SLA for Schema Registry. We also announced that regional coverage for Stream Governance will expand to all Confluent Cloud regions, providing smoother access to its exclusive governance capabilities.
Why…
What if we could unify both worlds?
Understanding the importance of
stream processing
Share: Enable frictionless access to up-to-date, trustworthy data products
Stream: Reimagine data streaming everywhere, on-prem and in every major public cloud
Govern: Make data in motion self-service, secure, compliant, and trustworthy
Process: Drive greater data reuse with always-on stream processing
Connect: Make it easy to on-ramp and off-ramp data from existing systems and apps
Stream processing is a critical part of data streaming
[Diagram: data in motion vs. data at rest. In motion: streaming applications (application layer), Apache Flink (compute layer), Apache Kafka (storage layer). At rest: web applications, traditional databases, file systems.]
Stream processing acts as the compute layer to
Kafka, powering real-time applications & pipelines
[Diagram: databases, custom apps, 3rd-party apps, and SaaS apps feed Kafka; each downstream destination (database, data warehouse, queries, analytics, interactions) runs its own processing step.]
Processing downstream of Kafka increases latency, adds costs and redundancy, and inhibits data reuse:
● Increased complexity from redundant processing
● Data systems & applications built on stale data
● Expensive & inefficient to clean and enrich data multiple times
[Diagram: the same sources feed Kafka (storage) with Flink (compute) processing at ingest, upstream of the database, data warehouse, queries, analytics, and interactions.]
Processing data at ingest improves latency, data portability, and cost effectiveness:
● Maximized data reusability & consistency
● Improved cost-efficiency from cleaning & enriching data once
● Real-time apps & data systems reflect current state
Stream Processing
Process your data once, process your data right
[Diagram: many example streams and systems around a central stream-processing hub. Sources include transactions, payments, clickstreams, IoT data, telemetry, inventory, weather, mainframe data, change logs, and customer data; consumers include fraud & SIEM systems, AI/ML engines, recommendation and personalization engines, notification and alerting engines, visualization apps, CRM, payroll, ITSM, incident management, and supply chain systems.]
Stream processing enables users to filter, join, and
enrich streams on-the-fly to drive greater data reuse
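As a small Flink SQL sketch of this idea (all table and column names here are illustrative, not from the deck), one query can filter, join, and enrich a stream in a single pass:

```sql
-- Illustrative: filter, join, and enrich an orders stream on-the-fly.
INSERT INTO enriched_orders
SELECT
  o.order_id,
  o.amount,
  c.segment,                          -- enrichment from a customer table
  o.amount * 1.21 AS amount_with_tax  -- on-the-fly derived value
FROM orders AS o
JOIN customers FOR SYSTEM_TIME AS OF o.order_time AS c
  ON o.customer_id = c.customer_id
WHERE o.amount > 0;                   -- filter out invalid orders
```

The enriched stream lands in a new topic, so every downstream consumer reuses the same cleaned data instead of repeating the work.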
Why Apache Flink is becoming
the de facto standard
[Chart: monthly unique users of two Apache projects born a few years apart. Flink (2020-2022) mirrors the earlier growth curve of Kafka (2016-2018), from 0 to roughly 150,000.]
Flink growth has mirrored the growth of Kafka, the de facto standard for streaming data:
● >75% of the Fortune 500 estimated to be using Kafka
● >100,000 orgs using Kafka
● >41,000 Kafka meetup attendees
● >750 Kafka Improvement Proposals
● >12,000 Jiras for Apache Kafka
Innovative companies have adopted both Kafka & Flink
Digital natives leverage Flink to disrupt markets and
gain competitive advantage
UBER: Real-time Pricing NETFLIX: Personalized Recs STRIPE: Real-time Fraud Detection
Let's have some fun!
Who is who? 463M · 30B · Gaming
WHY
● SQL drives 90% of use cases with less than 50% of the TTM
● SQL makes stream analytics accessible to all data scientists, enabling insights connected to revenue
● Keep and play with state to deep-dive on business questions (e.g., revenue per level, real-time game user counts)
● Extremely auto-scalable
● Low latency
● Temporal control of events
● Wide variety of window handling
● Fault tolerance via checkpoints & savepoints
Flink is a top 5 Apache project and boasts a robust developer community
Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale
Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability
Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice
Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology
Customers choose Flink because of its performance and rich feature set
Why are we even doing this?
“Shift Left” with Stream Processing
Expensive & Inefficient: The same data is cleaned, transformed, and enriched in multiple locations (often using legacy technologies).
Varying Degree of Staleness: Every downstream system receives and processes the same data at different times and intervals and provides slightly different semantics. This leads to inconsistencies and a degraded customer experience downstream.
Complex & Error Prone: The same business logic needs to be maintained in multiple places, and multiple processing engines need to be maintained and operated.
[Diagram: databases, custom apps, and SaaS feed Kafka; separate processing steps sit in front of the database, DWH, and data lake serving queries, analytics, and interactions.]
“Shift Left” with Stream Processing
[The same diagram, repeated with a “Shift Left” arrow: processing moves upstream, out of the individual downstream systems and into the stream.]
“Shift Left” with Stream Processing
Cost Efficient: Data is processed only once, in a single place, and it is processed continuously, spreading the work over time.
Fresh Data Everywhere: All applications are supplied with equally fresh data and reflect the current state of your business.
Reusable & Consistent: Data is processed continuously and meets the latency requirements of the most demanding consumers, which further increases reusability.
[Diagram: databases, custom apps, and SaaS feed Kafka; a single processing step in the stream serves the database, DWH, and data lake for queries, analytics, and interactions.]
What can we achieve?
What can I do with Flink?
Data Exploration: Engineers and analysts both need to be able to simply read and understand the event streams stored in Kafka
● Metadata discovery
● Throughput analysis
● Data sampling
● Interactive query
Data Pipelines: Data pipelines are used to enrich, curate, and transform event streams, creating new derived event streams
● Filtering
● Joins
● Projections
● Aggregations
● Flattening
● Enrichment
Real-time Apps: Whole ecosystems of apps feed on event streams, automating action in real time
● Account360
● Next best call
● Quality of service
● Fraud detection
● Intelligent routing
● Abandoned shopping cart
Build streaming
data pipelines to
inform real-time
decision making
Create new enriched and curated
streams of higher value using:
● Data transformations
● Streaming joins, temporal joins,
lookup joins, and versioned joins
● Fan out queries, multi-cluster queries
STREAMING DATA PIPELINES
[Example: an Orders stream (t1: 21.5 USD, t3: 55 EUR, t5: 35.3 EUR) is joined with a Currency rate stream (t0: EUR:USD=1.00, t2: EUR:USD=1.05, t4: EUR:USD=1.10) to produce t1: 21.5 USD, t3: 57.75 USD, t5: 38.83 USD.]
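The currency-conversion example above maps to a temporal (versioned) join in Flink SQL. A sketch, assuming illustrative `orders` and `currency_rates` tables, with `currency_rates` declared as a versioned table keyed by currency:

```sql
-- Join each order against the exchange rate valid at the order's event time.
SELECT
  o.order_time,
  o.amount * r.eur_usd AS amount_usd
FROM orders AS o
JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r
  ON o.currency = r.currency;
```

`FOR SYSTEM_TIME AS OF` picks the version of the rate row that was current at each order's timestamp, which is why the order at t3 converts with the t2 rate rather than a later one.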
Recognize patterns and react to events in a timely manner
[Chart: a price series over period & volume tracing a “double bottom” pattern, with falling segments where price < lag(price) and rising segments where price > lag(price).]
Develop applications using fine-grained
control over how time progresses and
data is grouped together using:
● Hopping, tumbling, session windows
● OVER aggregations
● Pattern matching with
MATCH_RECOGNIZE
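Hopping, tumbling, and session windows are expressed in Flink SQL with windowing table-valued functions. A sketch of a tumbling window (the `bid` table and its `bidtime` and `price` columns are illustrative):

```sql
-- Sum prices in non-overlapping 10-minute windows over a `bid` stream.
SELECT window_start, window_end, SUM(price) AS total_price
FROM TABLE(
  TUMBLE(TABLE bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
```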
Pattern Detections
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY rowtime
        MEASURES
            START_ROW.rowtime AS start_tstamp,
            LAST(PRICE_DOWN.rowtime) AS bottom_tstamp,
            LAST(PRICE_UP.rowtime) AS end_tstamp
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO LAST PRICE_UP
        PATTERN (START_ROW PRICE_DOWN+ PRICE_UP)
        DEFINE
            PRICE_DOWN AS
                (LAST(PRICE_DOWN.price, 1) IS NULL
                 AND PRICE_DOWN.price < START_ROW.price) OR
                PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1),
            PRICE_UP AS
                PRICE_UP.price > LAST(PRICE_DOWN.price, 1)
    ) MR;
Analyze real-time
data streams to
generate
important
business insights
Get up-to-date results to power
dashboards or applications requiring
continuous updates using:
● Materialized views
● Temporal analytic functions
● Interactive queries
[Example: a stream of transactions over time (Account A +$10, Account B +$12, Account C +$5, Account B -$10, Account C +$10, Account A -$5, Account A +$10) is continuously aggregated into an up-to-date balance table: A $15, B $2, C $15.]
REAL-TIME ANALYTICS
SELECT *
FROM (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num
    FROM ShopSales)
WHERE row_num <= 5
Why Flink?
Flink’s powerful runtime offers limitless scalability
[Diagram: a client submits a job to the Job Manager, which deploys, stops, and cancels tasks and triggers checkpoints across many task slots; data streams flow through the tasks and results return to the client.]
Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster
Leverage in-memory performance
[Diagram: tasks read input, apply logic against local state (in-memory or on-disk), and emit output; periodic, asynchronous, incremental snapshots are written to durable storage.]
Stateful Flink applications are optimized for fast access to local state by maintaining task state in in-memory or on-disk data structures, resulting in low-latency processing.
Flink checkpoints and savepoints enable fault tolerance and stateful processing
CHECKPOINTS: automatic snapshots created by Flink periodically
● Used to recover from failures
● Optimized for quick recovery
● Automatically created and managed by Flink
SAVEPOINTS: user-triggered snapshots at a specific point in time
● Enable manual operational tasks, such as upgrades
● Optimized for operational flexibility
● Created and managed by the user
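For self-managed Flink, checkpointing cadence is a standard configuration setting; a sketch as it would appear in the Flink SQL client (the values shown are illustrative, and Confluent Cloud manages this for you):

```sql
-- Enable periodic checkpoints every 10 seconds with exactly-once semantics.
SET 'execution.checkpointing.interval' = '10 s';
SET 'execution.checkpointing.mode' = 'EXACTLY_ONCE';
```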
Flink recovers from failures in a timely and efficient
manner
[Diagram: the same Job Manager / task-slot cluster as above, with one task slot marked as failed.]
If a task manager fails, the job manager will detect the failure and arrange for the job to be restarted from the most recent state snapshot
Flink offers layered APIs at different levels of abstraction to handle both common and specialized use cases
[Diagram: the API stack by level of abstraction and how the code is organized. Flink SQL and the Table API sit on an optimizer/planner; the DataStream API and ProcessFunction sit on the low-level stream operator API of the Apache Flink runtime.]
Flink SQL
High-level, declarative API that allows you to write SQL
queries to process data streams and batch data as dynamic
tables
Table API
Programmatic equivalent of Flink SQL, allowing you to
define your business logic in either Java or Python, or
combine it with SQL
DataStream API
Low-level, expressive API that exposes the building blocks
for stream processing, giving you direct access to things like
state and timers
ProcessFunction
The lowest-level API, allowing fine-grained processing of individual elements for complex event-driven processing logic and state management
Flink SQL is an ANSI-compliant SQL
engine that can define both simple
and complex queries, making it
well-suited for most stream processing
use cases, particularly building
real-time data products and pipelines.
[Diagram: an events stream is filtered (WHERE color <> orange), grouped (GROUP BY color), and counted (COUNT), continuously emitting updated per-color results such as 4 and 3.]
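The diagram above corresponds to a continuous query along these lines (a sketch; the `events` table and `color` column are illustrative):

```sql
-- Continuously count events per color, excluding orange.
SELECT color, COUNT(*) AS cnt
FROM events
WHERE color <> 'orange'
GROUP BY color;
```

As new events arrive, the per-color counts are updated in place rather than recomputed from scratch.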
Streaming made
simple.
INSERT INTO enriched_reviews
SELECT id,
       review,
       invoke_openai(prompt, review) AS score
FROM product_reviews;
[Before/after: product reviews from Kate (4 hours ago: “This was the worst decision ever.”), Nikola (1 day ago: “Not bad. Could have been cheaper.”), and Brian (3 days ago: “Amazing! Game Changer!”) are enriched with star ratings.]
The Prompt
“Score the following text on a scale of 1 to 5, where 1 is negative and 5 is positive, returning only the number”
DATA STREAMING PLATFORM
Enrich real-time data streams with Generative AI
directly from Flink SQL
COMING SOON
Flink supports unified stream and batch processing
Streaming:
● The entire pipeline must always be running
● Input must be processed as it arrives
● Results are reported as they become ready
● Failure recovery resumes from a recent snapshot
● Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc.
Batch:
● Execution proceeds in stages, running as needed
● Input may be pre-sorted by time and key
● Results are reported at the end of the job
● Failure recovery does a reset and full restart
● Effectively exactly-once guarantees are more straightforward
Flink SQL operators work across both stream and batch processing modes
STREAMING AND BATCH
• SELECT FROM [WHERE]
• GROUP BY [HAVING] (includes time-based windowing)
• OVER aggregations (including Top-N and deduplication queries)
• INNER + OUTER JOINs
• MATCH_RECOGNIZE (pattern matching)
• Set operations
• User-defined functions
• Statement sets
STREAMING ONLY
• ORDER BY time (ascending only)
• INNER JOIN with a temporal (versioned) table or an external lookup table
BATCH ONLY
• ORDER BY anything
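Switching a Flink SQL session between the two execution modes is a one-line setting (standard Flink configuration keys, shown as they would appear in the SQL client):

```sql
-- Run queries over bounded input as a batch job…
SET 'execution.runtime-mode' = 'batch';
-- …or over unbounded input as a streaming job (the default).
SET 'execution.runtime-mode' = 'streaming';
```

The same query text works in both modes; only the execution strategy and the timing of results change.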
Enhancing Apache Flink as a
cloud-native service
However, operating Flink on your own (along with Kafka) is difficult
Deployment Complexity: Setting up Flink requires a deep understanding of resource allocation and management
Management & Monitoring: Identifying relevant metrics can be overwhelming for DevOps teams
Incomplete Ecosystem: OSS Flink lacks pre-built integrations with observability, metadata management, data governance, and security tooling
Cost & Risk: Self-supporting Flink incurs significant costs & resources in terms of infra footprint and Dev & Ops FTEs
Go from zero to production in minutes versus months
Open Source Apache Flink (months): in-house development and maintenance without support
Cloud-hosted Flink services (weeks): manual Day 2 operations with basic tooling and/or support
Apache Flink on Confluent Cloud (minutes): fully managed, elastic, and automated product capabilities with zero overhead
Real-time processing
Power low-latency applications and pipelines that react to
real-time events and provide timely insights
Data reusability
Share consistent and reusable data streams widely with
downstream applications and systems
Data enrichment
Curate, filter, and augment data on-the-fly with additional
context to improve completeness, accuracy, & compliance
Efficiency
Improve resource utilization and cost-effectiveness by
avoiding redundant processing across silos
Effortlessly filter, join, and enrich your data streams with Apache Flink
“With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart
cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This
enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security
incidents without any added operational burden.”
"When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream
processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of
infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that
differentiate the business."
Enterprise-grade security
Secure stream processing with built-in identity and access
management, RBAC, and audit logs
Stream governance
Enforce data policies and avoid metadata duplication
leveraging native integration with Stream Governance
Monitoring
Ensure the health and uptime of your Flink queries in the
Confluent UI or via 3rd party monitoring services
Connectors
Stream data into and out of Flink using Confluent's portfolio of fully managed connectors
Experience Kafka and Flink seamlessly integrated as a unified platform
Fully managed
Easily develop Flink applications with a serverless, SaaS-based experience, instantly available & without ops burden
Elastic scalability
Automatically scale up or down to meet the demands of the
most complex workloads without overprovisioning
Usage-based billing
Pay only for resources used instead of infrastructure
provisioned, with scale-to-zero pricing
Continuous, no touch updates
Build using an always up-to-date platform with declarative,
versionless APIs and interfaces
Enable high-performance and efficient stream processing at any scale
[Chart: throughput over time, with provisioned capacity tracking demand.]
Different teams with different skills and needs can access stream processing using the interface of their choice:
● SQL client in the Confluent Cloud CLI
● Rich SQL editing user interface
Tap into a next-generation, serverless SQL experience…
1. Select region(s) to create a compute pool
2. Role bindings automatically created for you
3. Start processing in Flink
…automatically provisioned and instantly available
Santander Stream Processing with Apache Flink
Jax, FL Admin Community Group 05.14.2024 Combined Deck
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 
Microsoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdfMicrosoft365_Dev_Security_2024_05_16.pdf
Microsoft365_Dev_Security_2024_05_16.pdf
 
Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...
Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...
Salesforce Introduced Zero Copy Partner Network to Simplify the Process of In...
 
Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024Top Mobile App Development Companies 2024
Top Mobile App Development Companies 2024
 
A Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdfA Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdf
 
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
KLARNA -  Language Models and Knowledge Graphs: A Systems ApproachKLARNA -  Language Models and Knowledge Graphs: A Systems Approach
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCAOpenChain Webinar: AboutCode and Beyond - End-to-End SCA
OpenChain Webinar: AboutCode and Beyond - End-to-End SCA
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 
The Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test AutomationThe Strategic Impact of Buying vs Building in Test Automation
The Strategic Impact of Buying vs Building in Test Automation
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdf
 
OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024OpenChain @ LF Japan Executive Briefing - May 2024
OpenChain @ LF Japan Executive Briefing - May 2024
 
Malaysia E-Invoice digital signature docpptx
Malaysia E-Invoice digital signature docpptxMalaysia E-Invoice digital signature docpptx
Malaysia E-Invoice digital signature docpptx
 

Santander Stream Processing with Apache Flink

  • 1. Stream Processing with Apache Flink® Flink Overview 101 +
  • 2. Agenda: 01 Intro: the world has changed... some news · 02 Understanding the importance of stream processing · 03 Why Apache Flink is becoming the de facto standard · 04 Enhancing Apache Flink as a cloud-native service · 05 Questions
  • 4. Some things you should know about us: https://discover.confluent.io/santander-and-confluent-better-together (our landing page). And there's more! Coming soon: Gen AI & Confluent webinar, 7 May 2024. Sign up now!
  • 5. A quick update from Kafka Summit London
  • 6. Apache Flink on Confluent Cloud is now Generally Available: we announced the general availability of our fully managed Confluent Cloud service for Apache Flink®, giving customers a true multicloud solution for deploying event stream processing on any of the three major CSPs where their data and applications reside. See the blog post for details. Introducing Tableflow: we presented our vision for Tableflow, a new capability of the Confluent Cloud event streaming platform that lets users turn Apache Kafka topics and their associated schemas into Apache Iceberg® tables with a single click, to better feed data lakes and data warehouses. Tableflow will soon be available in private early access. Kora, the Confluent Cloud engine, is faster than ever: we revealed that Kora, the engine powering Confluent Cloud's cloud-native Kafka service, is now 16x faster than open-source Kafka. Connector improvements: new enhancements to Confluent's fully managed connector portfolio were presented, including DNS forwarding and private egress access points, along with an improved 99.99% SLA. Stream Governance improvements: the Stream Governance service in Confluent keeps evolving and is now enabled by default in all environments, with a 99.99% uptime SLA for Schema Registry. We also announced that regional coverage for Stream Governance will expand to all Confluent Cloud regions, providing smoother access to its exclusive governance features.
  • 7. Why? And why not unify both worlds?
  • 8. Understanding the importance of stream processing
  • 9. Enable frictionless access to up-to-date trustworthy data products Share Reimagine data streaming everywhere, on-prem and in every major public cloud Stream Make data in motion self-service, secure, compliant and trustworthy Govern Drive greater data reuse with always-on stream processing Process Make it easy to on-ramp and off-ramp data from existing systems and apps Connect Stream processing is a critical part of data streaming
  • 10. DATA IN MOTION Streaming Applications Apache Flink Apache Kafka DATA AT REST Application Layer Compute Layer Storage Layer Traditional Databases File Systems Web Applications Stream processing acts as the compute layer to Kafka, powering real-time applications & pipelines
  • 11. Processing Kafka Custom apps 3rd party apps Databases Database Data Warehouse SaaS app Queries Analytics Interactions Processing Processing Processing downstream of Kafka increases latency, adds costs and redundancy, and inhibits data reuse. Increased complexity from redundant processing. Data systems & applications built on stale data. Expensive & inefficient to clean and enrich data multiple times.
  • 12. Custom apps 3rd party apps Databases Database Data Warehouse SaaS app Queries Analytics Interactions Processing data at ingest improves latency, data portability, and cost effectiveness Maximized data reusability & consistency Improved cost-efficiency from cleaning & enriching data once Real-time apps & data systems reflect current state Kafka Storage Flink Compute Stream Processing Process your data once, process your data right
  • 13. Heatmap service Payment service Supply chain systems Watch lists Profile mgmt Incident mgmt Customer profile data ITSM systems Central log systems Fraud & SIEM systems Alerting systems AI/ML engines Visualization apps Threat vector Transactions Payments Mainframe data Inventory Weather Telemetry IoT data Notification engine Payroll systems CRM systems Mobile application Personalization Web application Clickstreams Customer loyalty Change logs Customer data Recommendation engine Stream processing enables users to filter, join, and enrich streams on-the-fly to drive greater data reuse
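The filter/join/enrich idea above can be sketched outside Flink with plain Python generators. This is an illustrative toy, not the Flink DataStream API; all names (`filter_stream`, `enrich_stream`, the sample events and profiles) are invented for the example:

```python
# Toy sketch of on-the-fly filter + enrichment (not the Flink API).
def filter_stream(events, predicate):
    """Keep only events matching the predicate (e.g. drop probe traffic)."""
    for e in events:
        if predicate(e):
            yield e

def enrich_stream(events, profiles):
    """Join each event with reference data (here, a customer profile map)."""
    for e in events:
        yield {**e, **profiles.get(e["customer_id"], {})}

clicks = [
    {"customer_id": 1, "page": "/checkout"},
    {"customer_id": 2, "page": "/health"},   # internal probe, filtered out
    {"customer_id": 1, "page": "/orders"},
]
profiles = {1: {"tier": "gold"}, 2: {"tier": "silver"}}

enriched = list(enrich_stream(
    filter_stream(clicks, lambda e: not e["page"].startswith("/health")),
    profiles,
))
for event in enriched:
    print(event)
```

The point of the sketch: cleaning and enrichment happen once, as events flow, and every downstream consumer sees the already-enriched stream.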
  • 14. Why Apache Flink is becoming the de facto standard
  • 15. 0 50,000 100,000 150,000 2020 2021 2022 2016 2017 2018 Flink Kafka Two Apache Projects, Born a Few Years Apart Monthly Unique Users Flink growth has mirrored the growth of Kafka, the de facto standard for streaming data >75% of the Fortune 500 estimated to be using Kafka >100,000+ orgs using Kafka >41,000 Kafka meetup attendees >750 Kafka Improvement Proposals >12,000 Jiras for Apache Kafka
  • 16. Innovative companies have adopted both Kafka & Flink
  • 17. Digital natives leverage Flink to disrupt markets and gain competitive advantage UBER: Real-time Pricing NETFLIX: Personalized Recs STRIPE: Real-time Fraud Detection
  • 18. Let's have some fun!!! Who is who? 463M 30B Gaming ● WHY ● SQL drives 90% of use cases with less than 50% of the time to market ● SQL makes stream analytics accessible to all data scientists, enabling insights connected to revenue ● Keep and play with states to deep dive on business questions ● E.g.: revenue per level ● Real-time game user counts ● Extremely auto-scalable ● Low latency ● Temporal event control ● Wide variety of window handling ● Fault tolerance via checkpoints & savepoints
  • 19. Flink is a top 5 Apache project and boasts a robust developer community. Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale. Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability. Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice. Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology. So, basically: customers choose Flink because of its performance and rich feature set.
  • 21. Why are we even doing this?
  • 22. “Shift Left” with Stream Processing. Expensive & Inefficient: the same data is cleaned, transformed, and enriched in multiple locations (often using legacy technologies). Varying Degrees of Staleness: every downstream system receives and processes the same data at different times and intervals, with slightly different semantics, leading to inconsistencies and a degraded customer experience downstream. Complex & Error Prone: the same business logic needs to be maintained in multiple places, and multiple processing engines need to be maintained and operated. Databases Custom Apps SaaS Processing Processing Database DWH Data Lake Queries Analytics Interactions
  • 23. “Shift Left” with Stream Processing. Expensive & Inefficient: the same data is cleaned, transformed, and enriched in multiple locations (often using legacy technologies). Varying Degrees of Staleness: every downstream system receives and processes the same data at different times and intervals, with slightly different semantics, leading to inconsistencies and a degraded customer experience downstream. Complex & Error Prone: the same business logic needs to be maintained in multiple places, and multiple processing engines need to be maintained and operated. Databases Custom Apps SaaS Processing Processing Database DWH Data Lake Queries Analytics Interactions Shift Left Processing
  • 24. “Shift Left” with Stream Processing. Cost Efficient: data is processed only once, in a single place, and it is processed continuously, spreading the work over time. Fresh Data Everywhere: all applications are supplied with equally fresh data and reflect the current state of your business. Reusable & Consistent: data is processed continuously and meets the latency requirements of the most demanding consumers, which further increases reusability. Databases Custom Apps SaaS Database DWH Data Lake Queries Analytics Interactions Processing
  • 25. What can we achieve?
  • 26. What can I do with Flink? 26 Data Exploration Data Pipelines Real-time Apps Engineers and Analysts both need to be able to simply read and understand the event streams stored in Kafka ● Metadata discovery ● Throughput analysis ● Data sampling ● Interactive query Data pipelines are used to enrich, curate, and transform events streams, creating new derived event streams ● Filtering ● Joins ● Projections ● Aggregations ● Flattening ● Enrichment Whole ecosystems of apps feed on event streams automating action in real-time ● Account360 ● Next Best Call ● Quality of Service ● Fraud detection ● Intelligent routing ● Abandoned Shopping Cart
  • 27. Build streaming data pipelines to inform real-time decision making Create new enriched and curated streams of higher value using: ● Data transformations ● Streaming joins, temporal joins, lookup joins, and versioned joins ● Fan out queries, multi-cluster queries 27 t1, 21.5 USD t3, 55 EUR t5, 35.3 EUR t0, EUR:USD=1.00 t2, EUR:USD=1.05 t4: EUR:USD=1.10 t1, 21.5 USD t3, 57.75 USD t5, 38.83 USD Currency rate Orders STREAMING DATA PIPELINES
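The currency example on this slide (each order converted using the exchange rate in effect at the order's timestamp) can be sketched as a toy temporal join in plain Python. This is illustrative only, not Flink's versioned-join implementation:

```python
# Toy temporal join: convert each order using the exchange rate that was
# valid at the order's event time (mirrors the slide's EUR/USD example).
def temporal_join(orders, rates):
    """orders: (ts, amount, currency); rates: (ts, eur_usd), sorted by ts."""
    out = []
    for ts, amount, currency in orders:
        if currency == "USD":
            out.append((ts, round(amount, 2)))
            continue
        # pick the latest rate whose timestamp is <= the order's timestamp
        rate = max((r for r in rates if r[0] <= ts), key=lambda r: r[0])[1]
        out.append((ts, round(amount * rate, 2)))
    return out

rates = [(0, 1.00), (2, 1.05), (4, 1.10)]
orders = [(1, 21.5, "USD"), (3, 55.0, "EUR"), (5, 35.3, "EUR")]
converted = temporal_join(orders, rates)
print(converted)  # [(1, 21.5), (3, 57.75), (5, 38.83)], as on the slide
```

In Flink SQL this corresponds to a `FOR SYSTEM_TIME AS OF` join against a versioned table; the sketch just makes the "rate valid at the order's time" semantics concrete.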
  • 28. Recognize patterns and react to events in a timely manner C price>lag(price) D price<lag(price) C price>lag(price) B price<lag(price) A Double Bottom Period & Volume Price Develop applications using fine-grained control over how time progresses and data is grouped together using: ● Hopping, tumbling, session windows ● OVER aggregations ● Pattern matching with MATCH_RECOGNIZE Pattern Detections SELECT * FROM Ticker MATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY rowtime MEASURES START_ROW.rowtime AS start_tstamp, LAST(PRICE_DOWN.rowtime) AS bottom_tstamp, LAST(PRICE_UP.rowtime) AS end_tstamp ONE ROW PER MATCH AFTER MATCH SKIP TO LAST PRICE_UP PATTERN (START_ROW PRICE_DOWN+ PRICE_UP) DEFINE PRICE_DOWN AS (LAST(PRICE_DOWN.price, 1) IS NULL AND PRICE_DOWN.price < START_ROW.price) OR PRICE_DOWN.price < LAST(PRICE_DOWN.price, 1), PRICE_UP AS PRICE_UP.price > LAST(PRICE_DOWN.price, 1) ) MR;
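As a rough illustration of what the MATCH_RECOGNIZE query above detects (a start row, one or more falling prices, then a rise), here is a toy Python scan for the same shape. It is a simplification for intuition, not a faithful re-implementation of the SQL pattern engine:

```python
# Toy pattern matcher: START_ROW, PRICE_DOWN+ (one or more falls), PRICE_UP.
def find_v_patterns(prices):
    """Return (start, bottom, end) index triples for down-then-up runs."""
    matches, i = [], 0
    while i < len(prices) - 2:
        j = i + 1
        while j < len(prices) and prices[j] < prices[j - 1]:
            j += 1                         # PRICE_DOWN+ : consume falling prices
        if j > i + 1 and j < len(prices) and prices[j] > prices[j - 1]:
            matches.append((i, j - 1, j))  # start row, bottom, first rise
            i = j                          # AFTER MATCH SKIP TO LAST PRICE_UP
        else:
            i += 1
    return matches

ticker = [12, 11, 10, 11, 12, 11, 13]
matches = find_v_patterns(ticker)
print(matches)  # [(0, 2, 3), (4, 5, 6)]: two V-shaped dips
```

A "double bottom" as on the slide would then be two such V patterns in succession, which is exactly what this ticker produces.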
  • 29. Analyze real-time data streams to generate important business insights Get up-to-date results to power dashboards or applications requiring continuous updates using: ● Materialized views ● Temporal analytic functions ● Interactive queries Account Balance A $15 B $2 C $15 Account A, +$10 Account B, +$12 Account C, +$5 Account B, -$10 Account C, +$10 Account A, -$5 Account A, +$10 Time REAL-TIME ANALYTICS SELECT * FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS row_num FROM ShopSales) WHERE row_num <= 5
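The account-balance materialized view on this slide can be sketched in a few lines of plain Python: each change event incrementally updates the view rather than recomputing it from scratch, which is the essence of a continuously updated result. A toy model, not Flink's implementation:

```python
# Toy materialized view: a running balance updated per change event.
from collections import defaultdict

def materialize_balances(changes):
    balances = defaultdict(int)
    for account, delta in changes:
        balances[account] += delta    # incremental update, no full recompute
    return dict(balances)

changes = [("A", 10), ("B", 12), ("C", 5), ("B", -10),
           ("C", 10), ("A", -5), ("A", 10)]
balances = materialize_balances(changes)
print(balances)  # {'A': 15, 'B': 2, 'C': 15}, matching the slide
```

A dashboard reading this view always sees the current state, without waiting for a periodic batch job.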
  • 31. Flink is a top 5 Apache project and boasts a robust developer community. Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale. Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability. Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice. Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology. Developers choose Flink because of its performance and rich feature set.
  • 32. Flink’s powerful runtime offers limitless scalability Job Manager Client . . . . . . Task Slot . . . . . . Task Slot . . . . . . Task Slot . . . . . . Task Slot Data Streams Deploy, Stop, Cancel Tasks Trigger Checkpoints Submit Job Results Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster
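Key-based partitioning is the mechanism that lets one logical stream be split across many parallel task slots. The sketch below is an invented helper for illustration (integer keys are used so the hash is deterministic), not Flink's actual key-group assignment:

```python
# Toy sketch of keyed partitioning: hash each event's key to pick a task
# slot, so all events for the same key are processed by the same task.
def partition(events, parallelism):
    slots = [[] for _ in range(parallelism)]
    for event in events:
        slots[hash(event["key"]) % parallelism].append(event)
    return slots

events = [{"key": k, "v": v} for k, v in [(1, "a"), (2, "b"), (3, "c"), (1, "d")]]
slots = partition(events, 2)
print([len(s) for s in slots])
```

Because routing depends only on the key, keyed state for a given key lives in exactly one task, which is what makes per-key aggregations scale horizontally.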
  • 33. Leverage in-memory performance . . . Durable Storage Logic State Logic State Logic State Input Tasks Output In-Memory or On-Disk State Local State Access Periodic, Asynchronous, Incremental Snapshots Stateful Flink applications are optimized for fast access to local state by maintaining task state in memory or on-disk data structures, resulting in low latency processing.
  • 34. Flink is a top 5 Apache project and boasts a robust developer community. Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale. Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability. Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice. Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology. Developers choose Flink because of its performance and rich feature set.
  • 35. Flink checkpoints and savepoints enable fault tolerance and stateful processing CHECKPOINTS SAVEPOINTS Automatic snapshot created by Flink periodically ● Used to recover from failures ● Optimized for quick recovery ● Automatically created and managed by Flink User-triggered snapshot at a specific point in time ● Enables manual operational tasks, such as upgrades ● Optimized for operational flexibility ● Created and managed by the user
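A minimal sketch of the checkpoint idea: a stateful operator periodically snapshots its state, and after a failure it restores from the last snapshot (in real Flink, events that arrived after the checkpoint are then replayed from Kafka to catch up). The class and method names here are invented for illustration:

```python
# Toy checkpointing: snapshot keyed state, crash, restore from snapshot.
import copy

class CountingOperator:
    def __init__(self):
        self.state = {}       # in-memory keyed state
        self.snapshot = {}    # last completed checkpoint

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1

    def checkpoint(self):
        self.snapshot = copy.deepcopy(self.state)

    def recover(self):
        self.state = copy.deepcopy(self.snapshot)

op = CountingOperator()
for key in ["a", "b", "a"]:
    op.process(key)
op.checkpoint()               # snapshot now holds {'a': 2, 'b': 1}
op.process("a")               # progress since the checkpoint...
op.state.clear()              # ...lost in a simulated failure
op.recover()                  # restart from the last checkpoint
print(op.state)
```

A savepoint works the same way mechanically; the difference on the slide is who triggers it (the user) and what it is used for (upgrades and other planned operations).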
  • 36. Flink recovers from failures in a timely and efficient manner Job Manager Client . . . . . . Task Slot . . . . . . Task Slot . . . . . . Task Slot . . . . . . Task Slot Data Streams Deploy, Stop, Cancel Tasks Trigger Checkpoints Submit Job Results If a task manager fails, the job manager will detect the failure and arrange for the job to be restarted from the most recent state snapshot
  • 37. Flink is a top 5 Apache project and boasts a robust developer community. Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale. Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability. Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice. Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology. Developers choose Flink because of its performance and rich feature set.
  • 38. Flink offers layered APIs at different levels of abstraction to handle both common and specialized use cases Flink SQL Table API DataStream API ProcessFunction Apache Flink Runtime Low-level Stream Operator API DataStream API ProcessFunction Table / SQL API Optimizer / Planner Level of Abstraction How the Code is Organized. Flink SQL: high-level, declarative API that allows you to write SQL queries to process data streams and batch data as dynamic tables. Table API: programmatic equivalent of Flink SQL, allowing you to define your business logic in either Java or Python, or combine it with SQL. DataStream API: low-level, expressive API that exposes the building blocks for stream processing, giving you direct access to things like state and timers. ProcessFunction: the lowest-level API, allowing fine-grained processing of individual elements for complex event-driven processing logic and state management.
  • 39. Flink SQL is an ANSI-compliant SQL engine that can define both simple and complex queries, making it well-suited for most stream processing use cases, particularly building real-time data products and pipelines. GROUP BY color events results COUNT WHERE color <> orange 4 3 Streaming made simple.
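The picture on this slide (COUNT per color, filtered with WHERE color <> orange, grouped by color) can be sketched as an incremental computation in plain Python: each arriving event updates its group's count and emits a new result, which is how a streaming GROUP BY behaves. Illustrative only:

```python
# Toy streaming GROUP BY: COUNT(*) per color WHERE color <> 'orange'.
from collections import Counter

def streaming_count(events, excluded="orange"):
    counts = Counter()
    updates = []                   # the changelog a downstream consumer sees
    for color in events:
        if color == excluded:      # WHERE color <> 'orange'
            continue
        counts[color] += 1
        updates.append((color, counts[color]))
    return counts, updates

counts, updates = streaming_count(
    ["blue", "orange", "green", "blue", "green", "blue", "green", "blue"])
print(dict(counts))  # {'blue': 4, 'green': 3}
```

Unlike a batch query, the result is never "done": every event appends a retraction/update pair to the changelog, and `counts` always reflects the stream so far.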
  • 40. INSERT INTO enriched_reviews SELECT id , review , invoke_openai(prompt,review) as score FROM product_reviews ; K N B Kate 4 hours ago This was the worst decision ever. Nikola 1 day ago Not bad. Could have been cheaper. Brian 3 days ago Amazing! Game Changer! K N B Kate ★★★★★ 4 hours ago This was the worst decision ever. Nikola ★★★★★ 1 day ago Not bad. Could have been cheaper. Brian ★★★★★ 3 days ago Amazing! Game Changer! The Prompt “Score the following text on a scale of 1 and 5 where 1 is negative and 5 is positive returning only the number” DATA STREAMING PLATFORM Enrich real-time data streams with Generative AI directly from Flink SQL COMING SOON
  • 41. Flink is a top 5 Apache project and boasts a robust developer community. Scalability and Performance: Flink is capable of supporting stream processing workloads at tremendous scale. Fault Tolerance: Flink's fault tolerance mechanisms ensure it can handle failures effectively and provide high availability. Language Flexibility: Flink supports Java, Python, & SQL with 150+ built-in functions, enabling devs to work in their language of choice. Unified Processing: Flink supports stream processing, batch processing, and ad-hoc analytics through one technology. Developers choose Flink because of its performance and rich feature set.
  • 42. Flink supports unified stream and batch processing. Streaming: ● Entire pipeline must always be running ● Input must be processed as it arrives ● Results are reported as they become ready ● Failure recovery resumes from a recent snapshot ● Flink guarantees effectively exactly-once results despite out-of-order data and restarts due to failures, etc. Batch: ● Execution proceeds in stages, running as needed ● Input may be pre-sorted by time and key ● Results are reported at the end of the job ● Failure recovery does a reset and full restart ● Effectively exactly-once guarantees are more straightforward
  • 43. Flink SQL operators work across both stream and batch processing modes. STREAMING AND BATCH: • SELECT FROM [WHERE] • GROUP BY [HAVING] (includes time-based windowing) • OVER aggregations (including Top-N and Deduplication queries) • INNER + OUTER JOINs • MATCH_RECOGNIZE (pattern matching) • Set Operations • User-Defined Functions • Statement Sets. STREAMING ONLY: • ORDER BY time ascending only • INNER JOIN with Temporal (versioned) table / External lookup table. BATCH ONLY: • ORDER BY anything
  • 44. Enhancing Apache Flink as a cloud-native service
  • 45. Deployment Complexity Setting up Flink requires a deep understanding of resource allocation and management Management & Monitoring Identifying relevant metrics can be overwhelming for DevOps teams Incomplete Ecosystem OSS Flink lacks pre-built integrations with observability, metadata management, data governance, and security tooling Cost & Risk Self-supporting Flink incurs significant costs & resources in terms of infra footprint and Dev & Ops FTEs However, operating Flink on your own (along with Kafka) is difficult
  • 46. Months Minutes Weeks Open Source Apache Flink In-house development and maintenance without support Cloud-hosted Flink services Manual Day 2 operations with basic tooling and/or support Apache Flink on Confluent Cloud Fully managed, elastic, and automated product capabilities with zero overhead Go from zero to production in minutes versus months
  • 47. Real-time processing Power low-latency applications and pipelines that react to real-time events and provide timely insights Data reusability Share consistent and reusable data streams widely with downstream applications and systems Data enrichment Curate, filter, and augment data on-the-fly with additional context to improve completeness, accuracy, & compliance Efficiency Improve resource utilization and cost-effectiveness by avoiding redundant processing across silos Effortlessly filter, join, and enrich your data streams with Apache Flink “With Confluent’s fully managed Flink offering, we can access, aggregate, and enrich data from IoT sensors, smart cameras, and Wi-Fi analytics, to swiftly take action on potential threats in real time, such as intrusion detection. This enables us to process sensor data as soon as the events occur, allowing for faster detection and response to security incidents without any added operational burden.”
  • 48. "When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that differentiate the business." Enterprise-grade security: Secure stream processing with built-in identity and access management, RBAC, and audit logs. Stream governance: Enforce data policies and avoid metadata duplication leveraging native integration with Stream Governance. Monitoring: Ensure the health and uptime of your Flink queries in the Confluent UI or via 3rd party monitoring services. Connectors: Move data in and out of your streams with Confluent's portfolio of fully managed connectors. Experience Kafka and Flink seamlessly integrated as a unified platform Monitoring Connectors Enterprise-grade Security Stream Governance
  • 49. Fully managed Easily develop Flink applications with a serverless, SaaS- based experience instantly available & without ops burden Elastic scalability Automatically scale up or down to meet the demands of the most complex workloads without overprovisioning Usage-based billing Pay only for resources used instead of infrastructure provisioned, with scale-to-zero pricing Continuous, no touch updates Build using an always up-to-date platform with declarative, versionless APIs and interfaces Enable high-performance and efficient stream processing at any scale Throughput Over Time Capacity Demand "When used in combination, Apache Flink & Apache Kafka can enable data reusability and avoid redundant downstream processing. The delivery of Flink & Kafka as fully managed services delivers stream processing without the complexities of infrastructure management, enabling teams to focus on building real-time streaming applications & pipelines that differentiate the business."
  • 50. SQL client in Confluent Cloud CLI Different teams with different skills and needs can access stream processing using the interface of their choice Rich SQL editing user interface Tap into a next-generation, serverless SQL experience …
  • 51. Select region(s) to create a compute pool 1 Role bindings automatically created for you 2 Start processing in Flink 3 …automatically provisioned and instantly available
  • 52. Flink