"The shift from batch processing to real-time processing of data is accelerating. Building real-time data applications is a necessity for many businesses as customers expect data to be always up-to-date and their apps to react to changes as they happen. However building and productizing real-time applications is often a complex and lengthy process due to limited serverless options to build such apps.
The introduction of AWS Lambda was a watershed moment in the world of cloud computing. It allowed developers to run “fully-managed” programs while paying only for the time the program ran. Serverless compute comes with three big advantages: improved scalability, reduced cost, and increased flexibility. We’re bringing this same powerful paradigm to real-time data processing with Flink in Confluent Cloud. Using this model, users can focus on writing business logic instead of managing nodes and other infrastructure.
Attendees will learn the benefits of serverless and see how it fits into the context of stream processing. We’ll then kick off a demo focused on a real-world production use case that uses Flink jobs to power an application with extremely low latency."
Why Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
1. Why Serverless Flink Matters - Blazing Fast Stream Processing Made Scalable
Mayank Juneja, Confluent
Jean-Sébastien Brunner, Confluent
2. Agenda
1. Stream Processing
a. Overview and Challenges
2. Why Flink?
a. Why is it so popular?
b. Challenges of self managed Flink
3. Serverless Flink + Demo
3. Real-time Data
Stream processing is computing over unbounded streams of data
Diagram labels:
● Real-time data: A Sale, A Shipment, A Trade, A Customer Experience
● Stream Processor
● Rich Front-End Customer Experiences
● Real-Time Backend Operations
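As a minimal sketch of what "computing over unbounded streams" can mean, the Python generator below keeps a running total per key and emits an updated result for every incoming event, without ever waiting for the input to end. The function name `running_totals` and the `(key, amount)` event shape are assumptions made for illustration; this is not Flink code.

```python
from typing import Dict, Iterable, Iterator, Tuple


def running_totals(events: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Emit an updated total for a key every time a new event arrives."""
    totals: Dict[str, float] = {}
    for key, amount in events:  # the input may never end
        totals[key] = totals.get(key, 0.0) + amount
        yield key, totals[key]  # results are emitted incrementally


if __name__ == "__main__":
    sample = [("sale", 10.0), ("trade", 5.0), ("sale", 2.5)]
    for key, total in running_totals(sample):
        print(key, total)
```

Because the function is a generator, it works the same way on a finite list and on an endless source such as a message consumer loop.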
4. Stream processing use cases
Data Exploration
Engineers and analysts both need to be able to simply read and understand the event streams stored in Kafka
● Metadata discovery
● Throughput analysis
● Data sampling
● Interactive query
Data Pipelines
Data pipelines are used to enrich, curate, and transform event streams, creating new derived event streams
● Filtering
● Joins
● Projections
● Aggregations
● Flattening
● Enrichment
Real-time Apps
Whole ecosystems of apps feed on event streams, automating action in real time
● Threat detection
● Quality of Service
● Fraud detection
● Intelligent routing
● Alerting
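To make the data pipeline operations listed on this slide more concrete, here is a small sketch that applies filtering, projection, and enrichment to a stream of order events and yields a new derived stream. The field names (`order_id`, `customer_id`, `amount`, `region`) and the function `enrich_large_orders` are hypothetical and chosen only for the example.

```python
from typing import Dict, Iterable, Iterator


def enrich_large_orders(
    orders: Iterable[dict],
    customers: Dict[str, dict],
    min_amount: float,
) -> Iterator[dict]:
    """Derive a new stream of large orders enriched with the customer's region."""
    for order in orders:
        if order["amount"] < min_amount:  # filtering
            continue
        customer = customers.get(order["customer_id"], {})
        yield {  # projection (keep a few fields) plus enrichment (add region)
            "order_id": order["order_id"],
            "amount": order["amount"],
            "region": customer.get("region", "unknown"),
        }


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "customer_id": "c1", "amount": 20.0, "note": "gift"},
        {"order_id": 2, "customer_id": "c2", "amount": 500.0, "note": ""},
    ]
    customers = {"c2": {"region": "EU"}}
    print(list(enrich_large_orders(orders, customers, min_amount=100.0)))
```

In a real pipeline the customer lookup would typically be a join against another stream or table rather than an in-memory dictionary.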
5. Challenges with Stream Processing
● Ordering and Timing: How do you handle out-of-order and late events?
● State Management: How do you manage state in a distributed environment?
● Fault Tolerance: Do you need exactly-once semantics?
● Scalability: How do you scale for unexpectedly large throughput?
7. What’s great about Apache Flink?
Scalability
Flink is capable of supporting stream processing workloads at hyper scale, as evidenced by its broad adoption by leading digital native companies
Language Flexibility
Flink supports Java, Python, & SQL without making major tradeoffs in functionality, enabling developers to work in their language of choice
Unified Processing
Flink supports stream processing and batch processing through one technology, rather than needing separate tools
Flink is a top 5 Apache project and is leveraged as the stream processing engine for >25% of Kafka users
8. Stream Processing with Flink
Challenges and the Flink features that address them:
Ordering and Timing
● Event time processing
● Watermarks
State Management
● Local and in-memory states for all computations
Fault Tolerance
● Exactly once semantics
● Distributed snapshots / checkpoints
Scalability & Performance
● Elastic scale out
● Network Traffic Optimization
● Backpressure Handling
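The idea behind event time processing and watermarks can be sketched in a few lines of Python. The example assumes a simple "bounded out-of-orderness" rule: the watermark trails the largest event time seen so far by a fixed delay, and any event older than the current watermark is treated as late. The function name `split_late_events` and the integer timestamps are assumptions for illustration; Flink's own watermark strategies are more elaborate.

```python
from typing import Iterable, List, Optional, Tuple


def split_late_events(
    events: Iterable[Tuple[int, str]],
    max_out_of_orderness: int,
) -> Tuple[List[str], List[str]]:
    """Classify events (event_time, payload), given in arrival order, as on time or late."""
    on_time: List[str] = []
    late: List[str] = []
    max_seen: Optional[int] = None
    for event_time, payload in events:
        # The watermark is derived from events seen before the current one.
        watermark = None if max_seen is None else max_seen - max_out_of_orderness
        if watermark is not None and event_time < watermark:
            late.append(payload)  # arrived after the watermark passed its timestamp
        else:
            on_time.append(payload)
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
    return on_time, late


if __name__ == "__main__":
    arrivals = [(10, "a"), (12, "b"), (9, "c"), (20, "d"), (11, "e")]
    print(split_late_events(arrivals, max_out_of_orderness=5))
```

Event "c" arrives out of order but within the allowed delay, so it is still on time; event "e" arrives after the watermark has moved past its timestamp and is classified as late.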
10. Self Managed Flink comes with its own challenges
● Configuration and Setup
● Monitoring
● Cost Management
● Security
11. Challenge #1 - Configuration and Setup
- Resource allocation: Provisioning resources (CPU, memory, storage) for each Flink job can be a complicated task
- Dependency management
  - Connectors, databases
- Configuration
  - Standalone vs k8s vs YARN, Application mode vs Session mode
12. Challenge #2 - Monitoring and Maintenance
- Metrics
  - Filtering down to the most relevant metrics for your application can be overwhelming
- Version upgrades
  - Upgrading Flink versions, especially while ensuring backward compatibility, is a pain
- Disaster recovery
  - Needs regular backups, checkpointing, savepoints
[Figure: Flink downloads, Mar 2023]
13. Challenge #3 - Total Cost of Ownership
- Hardware costs: Significant investment is required in hardware, which can end up underutilized
- Expertise: Hiring skilled professionals who can set up, manage, and maintain Flink
- Opportunity cost: Less time spent on developing the core product or service
14. Challenge #4 - Security
- RBAC: Flink lacks built-in capabilities for granular role-based access control
- Encryption: Data encryption at rest for Flink state backends does not come out of the box
- Multi-tenancy: Insufficient capabilities to support multi-tenancy within the same cluster
16. Powerful SQL Streaming Operators
Time windows
● Time-based windows
● Event-density windows
● Event-based windows: every single event can trigger a new window
Pattern Matching
● Complex Event Processing
Streaming Joins
● Stream-to-stream joins
● Temporal joins
● Lookup joins
● Versioned joins
etc.
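As an illustration of the time-based windows mentioned on this slide, the sketch below counts events per key in fixed-size, non-overlapping (tumbling) windows keyed by event time. It is plain Python rather than Flink SQL, and the function name `tumbling_window_counts` and the integer timestamps are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]],
    window_size: int,
) -> Dict[Tuple[int, str], int]:
    """Count events per (window_start, key) for fixed, non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event_time, key in events:
        # Align the event to the start of the window that contains it.
        window_start = event_time - (event_time % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    clicks = [(1, "a"), (4, "a"), (5, "b"), (9, "a"), (10, "a")]
    print(tumbling_window_counts(clicks, window_size=5))
```

A streaming engine would emit each window's result once the watermark passes the end of the window instead of returning everything at the end, but the window assignment logic is the same.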
17. Solution - Serverless Flink
- Evergreen runtime: once you submit a job it can run 24/7, and you don't need to take care of any upgrade (security patch, Flink, etc.); it just runs.
- Elastic autoscaling of the compute pools:
  - Elastic scale up of the pool, with a user-defined maximum
  - Elastic scale down of the pool, with scale-to-zero when nothing runs
- Usage-based billing
- Separation of compute (Flink) and storage (Kafka)
  - Scale independently to get the best cost and the best performance
  - Optimization of communications for even better cost/performance
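The scaling behavior described for compute pools (scale up to a user-defined maximum, scale down to zero when nothing runs) can be sketched as a simple sizing rule. The function `target_pool_size` and the notion of abstract "capacity units" are assumptions made for illustration; the actual sizing logic in Confluent Cloud is not described on the slide.

```python
import math


def target_pool_size(demand_units: float, max_units: int) -> int:
    """Return how many capacity units a pool should run for the current demand."""
    if demand_units <= 0:
        return 0  # scale to zero when nothing runs
    # Round demand up to whole units, but never exceed the user-defined maximum.
    return min(max_units, math.ceil(demand_units))


if __name__ == "__main__":
    for demand in (0, 3.2, 25):
        print(demand, "->", target_pool_size(demand, max_units=10))
```

With usage-based billing, a rule like this means an idle pool costs nothing, while the maximum caps the spend during traffic spikes.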
18. Autoscale and monitoring at the job level
● Per task dynamic scaling
○ Rescale based on backpressure and utilization of the vertices, not only
based on CPU or infrastructure-level metrics
○ Take into account the throughput from the source
● Job level metrics and monitoring
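One way to picture per-task scaling driven by utilization and backpressure, rather than CPU alone, is the sketch below. It assumes each job vertex reports a busy ratio (fraction of time spent processing) and a backpressure flag, and it sizes each vertex so that its utilization moves toward a target. The names `VertexMetrics` and `target_parallelism`, the 0.7 target, and the rule of never scaling down a backpressured vertex are all assumptions for illustration; source throughput is left out to keep the example short.

```python
import math
from dataclasses import dataclass


@dataclass
class VertexMetrics:
    name: str
    parallelism: int
    busy_ratio: float  # fraction of time subtasks spend processing, 0.0 to 1.0
    backpressured: bool  # True when the vertex is slowed down by downstream tasks


def target_parallelism(
    vertex: VertexMetrics,
    target_utilization: float = 0.7,
    max_parallelism: int = 64,
) -> int:
    """Suggest a parallelism that moves the vertex toward the target utilization."""
    desired = math.ceil(vertex.parallelism * vertex.busy_ratio / target_utilization)
    if vertex.backpressured:
        # A backpressured vertex looks idle only because it is being held back,
        # so its measured busy ratio understates its load: do not scale it down.
        desired = max(desired, vertex.parallelism)
    return max(1, min(max_parallelism, desired))


if __name__ == "__main__":
    for v in (
        VertexMetrics("source", 4, 0.2, True),
        VertexMetrics("aggregate", 4, 0.95, False),
        VertexMetrics("sink", 4, 0.2, False),
    ):
        print(v.name, "->", target_parallelism(v))
```

Each vertex gets its own recommendation, which is what distinguishes per-task scaling from resizing the whole job based on infrastructure-level metrics.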
19. Flink and Kafka closer together
Flink and Kafka are closer together, which reduces:
● Latency
● Network cost
With Fetch-from-follower the optimization can be done at the Availability Zone level.
[Diagram labels: High bandwidth, Low bandwidth; Confluent Cloud: High bandwidth]
21. Apache Flink in Confluent Cloud
Serverless Flink SQL + Rich Experience + Complete and Secure
● ANSI-SQL with powerful streaming operators
● Rich CLI Experience
● SQL Editor with "workspaces"
● Integration with Schema Registry and Governance
● Support for user authentication and service accounts
22. Support various use-cases and Personas
Developers
● Languages: Java & SQL
● Use Cases: Streaming Apps
Data Analysts
● Languages: ANSI SQL
● Use Cases: Data Exploration
Data Engineers
● Languages: SQL & Python
● Use Cases: Data Pipelines
Tools: IDE & SQL CLI, Notebooks, UI / BI / JDBC