Session Recording on YouTube
https://www.youtube.com/watch?v=uWPZQ_HMy10
- Session Description
Do you find yourself bombarded with buzzwords and overwhelmed by the rapid emergence of new technologies? "Stream Processing" is a tech buzzword that has been around for some time but is still unfamiliar to many. Join this session to discover its potential in software systems. I will share insights from Apache Flink, Apache Beam, Google Dataflow, and my experiences at Bol.com (the biggest e-commerce platform in the Netherlands) as we cover:
- Stream Processing overview: main concepts and features
- Apache Beam vs. Spring Boot comparison (see the sketch after this list)
- Key Considerations for Using Stream Processing
- Learning strategies to navigate this evolving landscape.
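To make the Beam side of that comparison concrete, here is a minimal, self-contained Beam pipeline in Java. It is a sketch only; the demo input values and the output path are assumptions for illustration, not material from the session:

    // Minimal Apache Beam pipeline (Beam Java SDK); input values and output path are illustrative.
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class BeamSketch {
        public static void main(String[] args) {
            Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
            p.apply(Create.of("click", "view", "click"))          // bounded demo input
             .apply(Count.perElement())                           // count occurrences per element
             .apply(MapElements.into(TypeDescriptors.strings())
                    .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
             .apply(TextIO.write().to("counts"));                 // write result shards to files
            p.run().waitUntilFinish();
        }
    }

Unlike a Spring Boot request/response service, the pipeline describes a dataflow graph that a runner (Flink, Dataflow, and so on) executes and scales.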
Meetup: Streaming Data Pipeline Development – Timothy Spann
In this interactive session, Tim will lead participants through how to best build streaming data pipelines. He will cover how to build applications from some common use cases and highlight tips, tricks, best practices and patterns.
He will show how to build the easy way and then dive deep into the underlying open source technologies including Apache NiFi, Apache Flink, Apache Kafka and Apache Iceberg.
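As a hedged illustration of the "dive deep" portion, the sketch below shows a minimal Flink job consuming a Kafka topic in Java; the broker address, topic name, and group id are placeholder assumptions:

    // Minimal Flink DataStream job reading from Kafka (Flink Kafka connector); names are placeholders.
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FlankSketch {
        public static void main(String[] args) throws Exception {
            KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")        // placeholder broker
                .setTopics("events")                          // placeholder topic
                .setGroupId("flank-demo")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-events")
               .map(String::toUpperCase)                      // stand-in transformation
               .print();                                      // sink to stdout for the demo
            env.execute("flank-sketch");
        }
    }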
If you wish to follow along, please download open source projects beforehand. You can also download this helpful streaming platform: https://docs.cloudera.com/csp-ce/latest/installation/topics/csp-ce-installing-ce.html
All source code and slides will be shared for those interested in building their own FLaNK Apps. https://www.flankstack.dev/
You can join the meeting virtually here:
https://cloudera.zoom.us/j/91603330726
Speaker - Tim Spann
Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
Let's Play Flink – Fun with Streaming in a Gaming Company – DataWorks Summit
Chocolate, ice cream and games are perhaps three of the most popular, universally understood words, able to bring joy to anyone between 5 and 60 years of age!
InnoGames is one of the world's leading developers and providers of online games. At InnoGames we not only have all three of those things; we have also built a powerful data infrastructure, because it's expensive to run your business blind. Being able to evaluate key performance indicators quickly to make good decisions, and to deliver personalized, relevant content to each and every gamer, is essential to being successful: it is how a customer becomes a fan.
Our data infrastructure mainly consists of a data pipeline that covers the streaming part and a data platform to perform batch processing. The latter is based on the Hadoop ecosystem, using technologies such as Hive, Spark, Hue, R and more to give our data scientists high flexibility. There were several evolutions of the data pipeline, starting with Kestrel and custom streaming applications. Later on we switched the base technologies to Apache Kafka and Apache Storm. Last year we recreated our streaming infrastructure on Apache Flink, an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.
Because having fun is the best way to learn, after a quick introduction to Flink and the Flink ecosystem this talk will focus on real-world use cases and translate the ideas behind those projects into live examples. This way, the audience will be part of a Flink-based experiment and internalize the experience we gained with Flink.
DataOps: An Agile Method for Data-Driven Organizations – Ellen Friedman
DataOps expands the DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration to achieve flexibility, fast time to value, and an agile workflow for data-intensive applications, including machine learning pipelines. (Strata Data San Jose, March 2018)
How to apply machine learning to your CI/CD pipeline – Alon Weiss
A quick introduction to AIOps: the business reasons why the CI/CD pipeline needs to constantly improve, and how this can be accomplished with data that's already available, using existing machine learning and other algorithms.
A work by Zhamak Dehghani, Principal Consultant, ThoughtWorks
https://martinfowler.com/articles/data-monolith-to-mesh.html
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Many enterprises are investing in their next-generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift away from the centralized paradigm of a lake, or its predecessor, the data warehouse, toward a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
Bridge to Cloud: Using Apache Kafka to Migrate to GCP – Confluent
Watch this talk here: https://www.confluent.io/online-talks/bridge-to-cloud-apache-kafka-migrate-gcp
Most companies start their cloud journey with a new use case or a new application. Sometimes these applications can run independently in the cloud, but often they need data from the on-premises datacenter. Existing applications will migrate slowly, but they will need a strategy and the technology to enable a multi-year migration.
In this session, we will share how companies around the world are using Confluent Cloud, a fully managed Apache Kafka® service, to migrate to Google Cloud Platform. By implementing a central-pipeline architecture using Apache Kafka to sync on-prem and cloud deployments, companies can accelerate migration times and reduce costs, as sketched in the example after the list below.
Register now to learn:
- How to take the first step in migrating to GCP
- How to reliably sync your on-premises applications using a persistent bridge to cloud
- How Confluent Cloud can make this daunting task simple, reliable and performant
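As a rough sketch of what the on-premises side of such a bridge can look like, here is a Kafka producer configured for a SASL-secured cloud cluster in Java; the endpoint, credentials, and topic are placeholders, and this is not Confluent's reference implementation:

    // Kafka producer pointed at a SASL_SSL-secured cluster (e.g., a managed cloud service).
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Properties;

    public class BridgeProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "BROKER.cloud.example:9092"); // placeholder
            props.put("security.protocol", "SASL_SSL");
            props.put("sasl.mechanism", "PLAIN");
            props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"API_KEY\" password=\"API_SECRET\";");                          // placeholders
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("orders", "order-1", "{\"status\":\"created\"}"));
            }
        }
    }

The same producer code runs against an on-prem broker by swapping the connection properties, which is what makes a single pipeline usable as a bridge during migration.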
Architect’s Open-Source Guide for a Data Mesh Architecture – Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of the core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges in implementing Data Mesh systems and focus on the role open-source projects play in it. Projects like Apache Spark can play a key part in a standardized infrastructure-platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to make Data Mesh more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent – Hosted by Confluent
Data mesh is a relatively recent term that describes a set of principles that good modern data systems uphold: a kind of “microservices” for the data-centric world. While the data mesh is not technology-specific as a pattern, the building of systems that adopt and implement data mesh principles has a relatively long history under different guises.
In this talk, we share our recommendations and picks of what every developer should know about building a streaming data mesh with Kafka. We introduce the four principles of the data mesh: domain-driven decentralization, data as a product, self-service data platform, and federated governance. We then cover the differences between working with event streams versus centralized approaches and highlight the key characteristics that make streams a great fit for implementing a mesh, such as their ability to capture both real-time and historical data. We’ll examine how to onboard data from existing systems into a mesh, how to model communication within the mesh, and how to deal with changes to your domain’s “public” data; we’ll give examples of global standards for governance and discuss the importance of taking a product-centric view on data sources and the data sets they share.
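One property mentioned above, streams serving both historical and real-time reads, can be sketched with a plain Kafka consumer that replays a topic from the beginning and then keeps consuming live. A minimal sketch, with broker and topic names as assumptions:

    // Kafka consumer replaying a topic's history, then following live data (names are placeholders).
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class DataProductReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "mesh-consumer");
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // start from history
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("customers.events"));            // hypothetical data product topic
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                        System.out.printf("offset=%d value=%s%n", rec.offset(), rec.value());
                    }
                }
            }
        }
    }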
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec... – Amazon Web Services
Amazon S3 is the central data hub for Netflix's big data ecosystem. We currently have over 1.5 billion objects and 60+ PB of data stored in S3. As we ingest, transform, transport, and visualize data, we find this data naturally weaving in and out of S3. Amazon S3 provides us the flexibility to use an interoperable set of big data processing tools like Spark, Presto, Hive, and Pig. It serves as the hub for transporting data to additional data stores and engines like Teradata, Redshift, and Druid, as well as exporting data to reporting tools like MicroStrategy and Tableau. Over time, we have built an ecosystem of services and tools to manage our data on S3. We have a federated metadata catalog service that keeps track of all our data. We have a set of data lifecycle management tools that expire data based on business rules and compliance. We also have a portal that allows users to see the cost and size of their data footprint. In this talk, we’ll dive into these major uses of S3, as well as many smaller cases, where S3 smoothly addresses an important data infrastructure need. We will also provide solutions and methodologies on how you can build your own S3 big data hub.
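The lifecycle-management idea in this abstract maps onto a standard S3 feature. As a hedged sketch (not Netflix's tooling), here is how a rule expiring old objects can be set with the AWS SDK for Java v2; the bucket name, prefix, and retention period are placeholders:

    // Setting an S3 lifecycle rule with the AWS SDK for Java v2 (bucket/prefix/days are placeholders).
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.*;

    public class ExpireOldData {
        public static void main(String[] args) {
            try (S3Client s3 = S3Client.create()) {
                LifecycleRule rule = LifecycleRule.builder()
                    .id("expire-old-events")
                    .filter(LifecycleRuleFilter.builder().prefix("events/").build()) // scope of the rule
                    .expiration(LifecycleExpiration.builder().days(30).build())      // business rule: 30 days
                    .status(ExpirationStatus.ENABLED)
                    .build();
                s3.putBucketLifecycleConfiguration(PutBucketLifecycleConfigurationRequest.builder()
                    .bucket("my-data-bucket")
                    .lifecycleConfiguration(BucketLifecycleConfiguration.builder().rules(rule).build())
                    .build());
            }
        }
    }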
Data Catalogs Are the Answer – What is the Question? – DATAVERSITY
Organizations with governed metadata made available through their data catalog can answer questions their people have about the organization’s data. These organizations get more value from their data, protect their data better, gain improved ROI from data-centric projects and programs, and have more confidence in their most strategic data.
Join Bob Seiner for this lively webinar where he will talk about the value of a data catalog and how to build the use of the catalog into your stewards’ daily routines. Bob will share how the tool must be positioned for success and viewed as a must-have resource that is a steppingstone and catalyst to governed data across the organization.
Data Lakehouse, Data Mesh, and Data Fabric (r1) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
Introduction to DataOps and AIOps (or MLOps) – Adrien Blind
This presentation introduces the audience to the DataOps and AIOps practices. It deals with organizational & tech aspects, and provides hints to start your data journey.
The Data Phoenix Events team invites everyone, on August 17 at 19:00, to the first webinar in "The A-Z of Data" series, which will be devoted to MLOps. In this introductory webinar, we will look at what MLOps is, its main principles and practices, the best tools, and possible architectures. We will start with a simple ML development lifecycle and end with a complex, maximally automated cycle that MLOps allows us to implement.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
Kafka for Real-Time Replication between Edge and Hybrid Cloud – Kai Wähner
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most importantly, how can Snowflake help?
Given in Montreal on 14-Dec-2021
Big data real-time architectures –
How do we do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages and pitfalls does each contain?
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re... – DATAVERSITY
Change is hard, especially in response to negative stimuli, or what is perceived as negative stimuli. So organizations need to reframe how they think about data privacy, security and governance, treating them as value centers to 1) ensure enterprise data can flow where it needs to, 2) prevent internal and external threats rather than merely react to them, and 3) comply with data privacy and security regulations.
Working together, these roles can accelerate faster access to approved, relevant and higher quality data – and that means more successful use cases, faster speed to insights, and better business outcomes. However, both new information and tools are required to make the shift from defense to offense, reducing data drama while increasing its value.
Join us for this panel discussion with experts in these fields as they discuss:
- Recent research about where data privacy, security and governance stand
- The most valuable enterprise data use cases
- The common obstacles to data value creation
- New approaches to data privacy, security and governance
- Their advice on how to shift from a reactive to resilient mindset/culture/organization
You’ll be educated, entertained and inspired by this panel and their expertise in using the data trifecta to innovate more often, operate more efficiently, and differentiate more strategically.
The Enterprise Knowledge Graph is a disruptive platform that combines emerging Big Data and Graph technologies to reinvent knowledge management inside organizations. This platform aims to organize and distribute the organization’s knowledge, making it centralized and universally accessible to every employee. The Enterprise Knowledge Graph is a central place to structure, simplify and connect the knowledge of an organization. By removing complexity, the knowledge graph brings more transparency, openness and simplicity into organizations. That leads to democratized communications and empowers individuals to share knowledge and to make decisions based on comprehensive knowledge. This platform can change the way we work, challenge the traditional hierarchical approach to getting work done, and help unleash human potential!
Learn to Use Databricks for the Full ML Lifecycle – Databricks
Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. In this talk, learn how to operationalize ML across the full lifecycle with Databricks Machine Learning.
The explosive growth of data and the value it creates calls on data professionals to level up their programs to build, demonstrate, and maintain trust. The days of fine print, pre-ticked boxes, and data hoarding are gone and strong collaboration from data, privacy, marketing and ethics teams is necessary to design trustworthy data-driven practices.
Join us for a discussion on the latest trends in trusted data and how you can take critical steps to build trust in data practices by:
- Embedding privacy by design into data operations
- Respecting individual choice and optimizing the ongoing relationship with consumers
- Preparing for future data challenges including responsible AI and sustainability
Data Architecture, Solution Architecture, Platform Architecture — What’s the ... – DATAVERSITY
A solid data architecture is critical to the success of any data initiative. But what is meant by “data architecture”? Throughout the industry, there are many different “flavors” of data architecture, each with its own unique value and use cases for describing key aspects of the data landscape. Join this webinar to demystify the various architecture styles and understand how they can add value to your organization.
DAS Slides: Building a Data Strategy - Practical Steps for Aligning with Busi... – DATAVERSITY
Developing a Data Strategy for your organization can seem like a daunting task. The opportunity in getting it right can be significant, however, as data drives many of the key initiatives in today’s marketplace: digital transformation, marketing, customer centricity, and more. This webinar will help de-mystify Data Strategy and Data Architecture and will provide concrete, practical ways to get started.
Best Practices for Streaming IoT Data with MQTT and Apache Kafka® – Confluent
Watch this talk here: https://www.confluent.io/online-talks/best-practices-for-streaming-iot-data-with-MQTT-and-apache-kafka-on-demand
Organizations today are looking to stream IoT data to Apache Kafka. However, connecting tens of thousands or even millions of devices over unreliable networks can create some architecture challenges.
In this session, we will identify and demo some best practices for implementing a large scale IoT system that can stream MQTT messages to Apache Kafka.
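A bare-bones version of such a bridge can be sketched with the Eclipse Paho MQTT client forwarding into a Kafka producer. This is a minimal sketch, not the session's demo: broker addresses and topic names are assumptions, and a production system would add buffering, retries, and security.

    // Minimal MQTT-to-Kafka bridge (Eclipse Paho + Kafka clients); endpoints are placeholders.
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.eclipse.paho.client.mqttv3.MqttClient;
    import java.util.Properties;

    public class MqttToKafkaBridge {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            MqttClient mqtt = new MqttClient("tcp://localhost:1883", "bridge-1");
            mqtt.connect();
            // Forward every MQTT message to one Kafka topic, keyed by its MQTT topic.
            mqtt.subscribe("sensors/#", (topic, msg) ->
                producer.send(new ProducerRecord<>("iot.telemetry", topic, new String(msg.getPayload()))));
        }
    }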
Analytics plays a critical role in supporting strategic business initiatives. Despite the apparent value of providing the data infrastructure for these initiatives, many executives question the economic feasibility of business intelligence and analytics. This requires information professionals to calculate and present the business value in terms business executives can understand.
Unfortunately, most IT professionals lack the knowledge required to develop comprehensive cost-benefit analyses and return on investment (ROI) measurements.
This session provides a framework to help IT professionals research, measure, and present the economic value of a proposed or existing analytics initiative. The session will provide practical advice about how to calculate ROI, the formulas in use, and how to collect necessary information.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is getting more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer such a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the data streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the past few years, another family of products appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open-source products/frameworks such as Apache Storm, Spark Streaming, Flink and Kafka Streams, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of Stream Processing, discuss the core properties a Stream Processing platform should provide, and highlight the differences you might find between the more traditional CEP and the more modern Stream Processing solutions.
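To ground the contrast, here is a small Kafka Streams sketch expressing a CEP-style rule ("more than 10 events per key per minute") as a streaming computation; the topic names and threshold are illustrative assumptions, not material from the talk:

    // Kafka Streams job flagging bursts of events per key (topics/threshold are illustrative).
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import java.time.Duration;
    import java.util.Properties;

    public class BurstDetector {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("payments", Consumed.with(Serdes.String(), Serdes.String()))
                   .groupByKey()
                   .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1))) // 1-minute windows
                   .count()
                   .toStream((windowedKey, count) -> windowedKey.key())              // drop window metadata
                   .filter((accountId, count) -> count > 10)                         // CEP-style condition
                   .to("suspicious-activity", Produced.with(Serdes.String(), Serdes.Long()));

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "burst-detector");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            new KafkaStreams(builder.build(), props).start();
        }
    }

The rule lives in ordinary stream operators rather than a dedicated CEP engine, which is the design trade-off the talk examines.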
Andreas Grabner maintains that most performance and scalability problems don’t need a large or long-running performance test or the expertise of a performance engineering guru. Don’t let anybody tell you that performance is too hard to practice, because it actually is not. You can take the initiative and find these often serious defects. Andreas analyzed and spotted the performance and scalability issues in more than 200 applications last year. He shares his performance testing approaches and explores the top problem patterns that you can learn to spot in your apps. By looking at key metrics found in log files and performance monitoring data, you will learn to identify most problems with a single functional test and a simple five-user load test. The problem patterns Andreas explains are applicable to any type of technology and platform. Try out your new skills in your current testing project and take the first step toward becoming a performance diagnostic hero.
This session takes an in-depth look at:
- Trends in stream processing
- How streaming SQL has become a standard
- The advantages of streaming SQL
- Ease of development with streaming SQL: graphical and streaming SQL query editors
- Business value of streaming SQL and its related tools: domain-specific UIs
- Scalable deployment of streaming SQL: distributed processing (see the sketch after this list)
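The sketch referenced in the list above is below: a windowed aggregation expressed in streaming SQL, run here through Flink's Table API as one possible engine. The datagen connector stands in for a real stream, and all table and column names are assumptions:

    // Streaming SQL via Flink's Table API; the datagen table is a stand-in for a real stream.
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class StreamingSqlSketch {
        public static void main(String[] args) {
            TableEnvironment env = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
            env.executeSql(
                "CREATE TABLE clicks (user_id STRING, ts AS PROCTIME()) " +
                "WITH ('connector' = 'datagen')");
            env.executeSql(
                "SELECT user_id, window_start, COUNT(*) AS clicks_per_min " +
                "FROM TABLE(TUMBLE(TABLE clicks, DESCRIPTOR(ts), INTERVAL '1' MINUTE)) " +
                "GROUP BY user_id, window_start, window_end").print();
        }
    }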
Top Java Performance Problems and Metrics To Check in Your Pipeline – Andreas Grabner
Why is performance important? What are the most common reasons applications don't scale and perform well? Which technical metrics should you look at? How can you check them automatically in the pipeline?
Presentation on the complete Datasmith warehousing solutions offering, including voice technology, middleware solutions, a WMS (Warehouse Management System), and a mobile store delivery application.
Data Ingestion in Big Data and IoT platforms – Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past few years some new tools have emerged which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache NiFi, StreamSets and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
Presenter: Kenn Knowles, Software Engineer, Google & Apache Beam (incubating) PPMC member
Apache Beam (incubating) is a programming model and library for unified batch & streaming big data processing. This talk will cover the Beam programming model broadly, including its origin story and vision for the future. We will dig into how Beam separates concerns for authors of streaming data processing pipelines, isolating what you want to compute from where your data is distributed in time and when you want to produce output. Time permitting, we might dive deeper into what goes into building a Beam runner, for example atop Apache Apex.
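The separation the talk describes (what you compute, where in event time, and when you emit output) shows up directly in Beam's windowing API. A hedged Java fragment, with the input collection assumed rather than taken from the talk:

    // Beam windowing: WHAT = Sum per key; WHERE = 1-minute event-time windows;
    // WHEN = early firings every 10s, plus late data accepted for up to 1 minute.
    import org.apache.beam.sdk.transforms.Sum;
    import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
    import org.apache.beam.sdk.transforms.windowing.AfterWatermark;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    public class WhatWhereWhen {
        // "input" is an assumed unbounded PCollection of (key, amount) events.
        static PCollection<KV<String, Integer>> windowedSums(PCollection<KV<String, Integer>> input) {
            return input
                .apply(Window.<KV<String, Integer>>into(FixedWindows.of(Duration.standardMinutes(1)))
                    .triggering(AfterWatermark.pastEndOfWindow()
                        .withEarlyFirings(AfterProcessingTime.pastFirstElementInPane()
                            .plusDelayOf(Duration.standardSeconds(10))))
                    .withAllowedLateness(Duration.standardMinutes(1))
                    .accumulatingFiredPanes())
                .apply(Sum.integersPerKey());                  // the "what", kept separate from window/trigger
        }
    }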
In this Meetup, Arik Lerner, LivePerson team lead of Java Automation, Performance & Resilience, will talk about how we measure our services with End2End testing, which has become one of the most critical monitoring tools at LP.
Over 200K test runs per day provide statistics and insights into problems as they happen.
Arik will go through different topics and stages of the journey and share details that led to the current results.
Among the topics on the menu: the awakening of End2End insights
• How we measure our services using synthetic user experience (a minimal Selenium sketch follows this list)
• Measuring through analytics & insights
• How we collect our data
• How we debug our services? Hint: video recording, HAR (HTTP Archive), Kibana, dashboard analytics & insights
• Future logs: app correlation with End2End data
• Our tools: Selenium, Jenkins, and cutting-edge technologies such as Kafka & ELK (Elasticsearch, Logstash and Kibana)
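As a hedged, minimal illustration of such a synthetic check (not LivePerson's actual harness), a Selenium test in Java that loads a page, verifies an element, and records latency might look like this; the URL and element id are placeholders:

    // Minimal synthetic-experience check with Selenium (URL and element id are placeholders).
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;
    import java.time.Duration;

    public class SyntheticCheck {
        public static void main(String[] args) {
            WebDriver driver = new ChromeDriver();
            try {
                driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));
                long start = System.currentTimeMillis();
                driver.get("https://example.com/login");
                boolean ok = !driver.findElements(By.id("login-form")).isEmpty();
                long elapsedMs = System.currentTimeMillis() - start;
                // A real harness would ship this result to Kafka/ELK for dashboards and alerting.
                System.out.printf("login-page check ok=%b latency=%dms%n", ok, elapsedMs);
            } finally {
                driver.quit();
            }
        }
    }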
In this Meetup, Arik will host Ali AbuAli, NOC Team Leader, who will talk about E2E usage in his day-to-day work.
Monitoring as an entry point for collaboration – Julien Pivotto
In recent years we have been building complex stacks made from lots of components, all of it backed by multiple teams. This talk will present how you can use monitoring to look at the business side and have everyone looking at the same dashboards, making cooperation a reality.
Similar to Why And When Should We Consider Stream Processing In Our Solutions – TEQnation 2023:
Essentials of Automations: The Art of Triggers and Actions in FME – Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GraphSummit Paris - The art of the possible with Graph Technology – Neo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient... – Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code – Aftab Hussain
Understanding variable roles in code has been found to be helpful to students in learning programming -- could variable roles help deep neural models in performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
E-commerce Application Development Company.pdf – Hornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
OpenMetadata Community Meeting - 5th June 2024 – OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Looking for a reliable mobile app development company in Noida? Look no further than Drona Infotech. We specialize in creating customized apps for your business needs.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
In the ever-evolving landscape of technology, enterprise software development is undergoing a significant transformation. Traditional coding methods are being challenged by innovative no-code solutions, which promise to streamline and democratize the software development process.
This shift is particularly impactful for enterprises, which require robust, scalable, and efficient software to manage their operations. In this article, we will explore the various facets of enterprise software development with no-code solutions, examining their benefits, challenges, and the future potential they hold.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Understanding Nidhi Software Pricing: A Quick Guide 🌟
Choosing the right software is vital for Nidhi companies to streamline operations. Our latest presentation covers Nidhi software pricing, key factors, costs, and negotiation tips.
📊 What You’ll Learn:
Key factors influencing Nidhi software price
Understanding the true cost beyond the initial price
Tips for negotiating the best deal
Affordable and customizable pricing options with Vector Nidhi Software
🔗 Learn more at: www.vectornidhisoftware.com/software-for-nidhi-company/
#NidhiSoftwarePrice #NidhiSoftware #VectorNidhi
2. Agenda
What is Stream Processing?
Frameworks & Platforms
Basic Concepts & Patterns
Demo Time
Benefits & Drawbacks + Considerations
Use Cases For Different Industries
How to Start?
3. This Talk is For
Software Developers
Tech Leads / Software Architects
Data Engineers / Data Scientists / AI Engineers
Product Owners / Product Managers / Business Analysts
4. $ whoami
I’m Soroosh Khodami
Full-Stack Developer at Bol.com & Code Nomads
Working with Stream Processing at Scale at Bol.com
Software Architecture Enthusiast
@SorooshKh linkedin.com/in/sorooshkhodami/
Slides & Code Repository Link Will Be Shared At The End
9. Stream (Data) Processing
Stream processing is a big data technique that focuses on continuously reading data, processing the data individually or joining it with related data sets in real time or near real time, and then sending the output to other applications, data stores, or systems.
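Taken literally, that definition is already a pipeline shape: read, process each element, send the output onward. A minimal sketch of that shape with the Apache Beam Java SDK from Kotlin; the file paths and the ERROR-filtering logic are illustrative assumptions, not part of the talk:

import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.transforms.Filter
import org.apache.beam.sdk.transforms.MapElements
import org.apache.beam.sdk.transforms.ProcessFunction
import org.apache.beam.sdk.values.TypeDescriptors

fun main() {
    val p = Pipeline.create()
    p.apply("Read", TextIO.read().from("/tmp/events/*.txt"))                      // assumed input location
        .apply("Keep errors", Filter.by(ProcessFunction { line: String -> "ERROR" in line }))
        .apply("Normalize", MapElements.into(TypeDescriptors.strings())
            .via(ProcessFunction { line: String -> line.trim().lowercase() }))    // process each element individually
        .apply("Write", TextIO.write().to("/tmp/out/errors"))                     // assumed output location; a real stream would sink to Pub/Sub, Kafka, etc.
    p.run().waitUntilFinish()
}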
18. Bounded Stream / Unbounded Stream
[Timeline diagram] An unbounded stream has a start but no defined end, continuing past "now" into the future. Bounded Stream #1 and Bounded Stream #2 each have a defined start and end on the same past-to-future timeline.
19. Event Time & Processing Time
[Diagram] Six user events (Login, Search, View, View, View, Play) plotted on two timelines: event time, when each event actually occurred, and processing time, when the system observed it. The two orderings and positions do not have to match.
20. Delivery Guarantees
At Most Once: messages can be lost, but never duplicated (fire & forget)
At Least Once: messages can be duplicated
Exactly Once: messages are delivered & processed exactly once
Learn More (Important)
Streaming Concepts - Exactly Once Fault Tolerance Guarantees - youtube.com/watch?v=9pRsewtSPkQ
Rundown of Flink's Checkpoints - youtube.com/watch?v=hoLeQjoGBkQ
Understanding exactly-once processing and windowing in streaming pipelines - youtube.com/watch?v=DraQGkARegE
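These guarantees are properties of the whole pipeline, source, processor, and sink together, but a Kafka producer is a concrete place to see the trade-off. A sketch, assuming a local broker and String messages; note that true end-to-end exactly-once additionally needs transactions and cooperating consumers:

import java.util.Properties
import org.apache.kafka.clients.producer.KafkaProducer

fun producerFor(guarantee: String): KafkaProducer<String, String> {
    val props = Properties()
    props["bootstrap.servers"] = "localhost:9092"   // assumed broker address
    props["key.serializer"] = "org.apache.kafka.common.serialization.StringSerializer"
    props["value.serializer"] = "org.apache.kafka.common.serialization.StringSerializer"
    when (guarantee) {
        "at-most-once" -> { props["acks"] = "0"; props["retries"] = 0 }   // fire & forget: no ack, no retry
        "at-least-once" -> { props["acks"] = "all"; props["retries"] = 5 } // retries can introduce duplicates
        "exactly-once" -> { props["acks"] = "all"; props["enable.idempotence"] = "true" } // idempotent producer; transactions still needed end-to-end
    }
    return KafkaProducer<String, String>(props)
}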
21. IoT Farm
Context
1000+ sensors
Multiple sensors per location
Unreliable internet connection
Large amounts of continuous sensor data
Requirements
Aggregated sensor data per location
Correct order of data
No duplicates
25. Windowing
[Diagram] A continuous stream of numbers on a timeline is grouped into a "window of data", e.g., five elements aggregated to Sum: 19, Count: 5.
• Divides an unbounded, continuous data stream into smaller, finite segments
• Allows performing operations and calculations on manageable chunks of data
• It's not feasible to load/keep the entire stream in memory
• Useful for analyzing data over specific time periods or fixed numbers of events
Learn More
Basics of Windowing - https://www.youtube.com/watch?v=oJ-LueBvOcM&t=1s
Advanced Windowing Concepts - https://www.youtube.com/watch?v=MuFA6CSti6M
26. Time, Size, and Time & Size Based Windows
[Diagrams] Three ways to cut the same stream of values:
Time Based Windows (Tumbling/Fixed): fixed 5-second windows with no overlap between window elements; e.g., consecutive windows yielding Sum: 11 / Count: 4, Sum: 19 / Count: 5, and Sum: 5 / Count: 2.
Size Based Windows: each window closes after a fixed number of elements; e.g., windows of four elements yielding Sum: 11, Sum: 17, and Sum: 13 (Count: 4 each).
Time & Size Based Windows: a window closes when either the 5-second timer fires or the size limit is reached; e.g., Sum: 11 / Count: 4, Sum: 17 / Count: 4, and Sum: 7 / Count: 3.
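As a concrete anchor for the time-based variant, here is a minimal tumbling-window aggregation in Beam; the 5-second size mirrors the slide, and the input PCollection<Long> is assumed. Beam expresses size-based windows through triggers rather than a dedicated window type, so only the time-based case is sketched:

import org.joda.time.Duration
import org.apache.beam.sdk.transforms.Sum
import org.apache.beam.sdk.transforms.windowing.FixedWindows
import org.apache.beam.sdk.transforms.windowing.Window
import org.apache.beam.sdk.values.PCollection

// values: the raw numbers from the stream, e.g. sensor readings.
fun sumPerTumblingWindow(values: PCollection<Long>): PCollection<Long> =
    values
        .apply("5s tumbling windows", Window.into<Long>(FixedWindows.of(Duration.standardSeconds(5))))
        .apply("Sum per window", Sum.longsGlobally().withoutDefaults())
        // Count.globally().withoutDefaults() would give the Count: N shown on the slide.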
27. Sliding Window
Time Based Windows with overlaps: each window covers the last 10 seconds and a new window starts every 5 seconds, so consecutive windows overlap.
[Diagram] Counting log events per window, e.g., Window #1: Success: 4, Warn: 0, Error: 0; Window #2: Success: 3, Warn: 0, Error: 1; Window #3: Success: 1, Warn: 2, Error: 1; ...; Window #N: Success: 0, Warn: 0, Error: 4.
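A sliding-window sketch of the log-level counting above, assuming the lines are already keyed by level ("Success", "Warn", "Error"):

import org.joda.time.Duration
import org.apache.beam.sdk.transforms.Count
import org.apache.beam.sdk.transforms.windowing.SlidingWindows
import org.apache.beam.sdk.transforms.windowing.Window
import org.apache.beam.sdk.values.KV
import org.apache.beam.sdk.values.PCollection

// logLines: each element is KV(level, rawLine).
fun levelCounts(logLines: PCollection<KV<String, String>>): PCollection<KV<String, Long>> =
    logLines
        .apply("Last 10s every 5s", Window.into<KV<String, String>>(
            SlidingWindows.of(Duration.standardSeconds(10)).every(Duration.standardSeconds(5))))
        .apply("Count per level", Count.perKey())   // one count per level per overlapping window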
28. Session Window
[Diagram] Playback events (Play, Heartbeat, Seek, Pause) per user; a session window closes when no event arrives within the gap duration of 10 seconds. User #1's events split into Window #1 and Window #2 around a 10-second gap; User #2's split into Window #1 and Window #2 around a 20-second gap.
Close the window based on GAP Duration = 10 sec
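The same idea in code: Beam's Sessions windowing closes a window once no event arrives within the gap. A sketch assuming playback events keyed by user ID:

import org.joda.time.Duration
import org.apache.beam.sdk.transforms.Count
import org.apache.beam.sdk.transforms.windowing.Sessions
import org.apache.beam.sdk.transforms.windowing.Window
import org.apache.beam.sdk.values.KV
import org.apache.beam.sdk.values.PCollection

// events: each element is KV(userId, action), e.g. KV("user-1", "Heartbeat").
fun eventsPerSession(events: PCollection<KV<String, String>>): PCollection<KV<String, Long>> =
    events
        .apply("Gap = 10s sessions", Window.into<KV<String, String>>(
            Sessions.withGapDuration(Duration.standardSeconds(10))))
        .apply("Events per user session", Count.perKey())   // sessions are computed per key (per user)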
31. Joining Streams & Enrichment Pattern
[Diagram] A temperature sensor stream and a moisture sensor stream are windowed and joined on device ID: a Window Inner Join keeps only devices present in both windows, while a Window Cross Join (CoGroup) pairs all elements across the two windows. Example inner join: "Device-2, Temp: 28" + "Device-2, Moisture: 876" → Device-2 { Temp: 28, Moisture: 876 }.
Learn More
Stream Join in Flink: from Discrete to Continuous - Xingcan Cui - https://www.youtube.com/watch?v=3YVRluJUKIw
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
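A hedged sketch of the windowed join using Beam's CoGroupByKey (the CoGroup named on the slide); the element types, window size, and output format are assumptions:

import org.joda.time.Duration
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.ParDo
import org.apache.beam.sdk.transforms.join.CoGbkResult
import org.apache.beam.sdk.transforms.join.CoGroupByKey
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple
import org.apache.beam.sdk.transforms.windowing.FixedWindows
import org.apache.beam.sdk.transforms.windowing.Window
import org.apache.beam.sdk.values.KV
import org.apache.beam.sdk.values.PCollection
import org.apache.beam.sdk.values.TupleTag

val TEMP = object : TupleTag<Int>() {}    // temperature readings, keyed by device ID
val MOIST = object : TupleTag<Int>() {}   // moisture readings, keyed by device ID

fun joinSensors(
    temps: PCollection<KV<String, Int>>,
    moists: PCollection<KV<String, Int>>
): PCollection<String> {
    val window = Window.into<KV<String, Int>>(FixedWindows.of(Duration.standardSeconds(5)))
    return KeyedPCollectionTuple.of(TEMP, temps.apply("Window temps", window))
        .and(MOIST, moists.apply("Window moists", window))
        .apply(CoGroupByKey.create())                       // groups both streams per key, per window
        .apply(ParDo.of(object : DoFn<KV<String, CoGbkResult>, String>() {
            @ProcessElement
            fun process(ctx: ProcessContext) {
                val t = ctx.element().value.getAll(TEMP).firstOrNull()
                val m = ctx.element().value.getAll(MOIST).firstOrNull()
                // Inner-join semantics: emit only when the device appears in both windows.
                if (t != null && m != null)
                    ctx.output("${ctx.element().key}, Temp: $t, Moisture: $m")
            }
        }))
}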
32. States & Stateful Stream Processing
[Diagram] A pipeline of operators over streams: most operators are stateless, while stateful operators keep state alongside the stream, so an element can be processed in the context of what came before it.
Learn More
Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger - https://www.youtube.com/watch?v=DkNeyCW-eH0
Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
33. States & Stateful Stream Processing
Brute Force Login Monitoring
[Diagram] Login Attempts → Read → Group By IP → Windowing (last 15 minutes) → Count → Filter Above Threshold → Enrich with previous breach and update last breach (State: Last Threshold Breach, nullable) → Sink (Security Alerts)
Learn More
Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger - https://www.youtube.com/watch?v=DkNeyCW-eH0
Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
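State in Beam terms: a sketch of a keyed, stateful DoFn that remembers the last breach per IP, in the spirit of the nullable "Last Threshold Breach" state above. The alert format and the choice to store the element timestamp are assumptions:

import org.apache.beam.sdk.state.StateSpec
import org.apache.beam.sdk.state.StateSpecs
import org.apache.beam.sdk.state.ValueState
import org.apache.beam.sdk.transforms.DoFn
import org.apache.beam.sdk.transforms.DoFn.ProcessElement
import org.apache.beam.sdk.transforms.DoFn.StateId
import org.apache.beam.sdk.values.KV

// Input: KV(ipAddress, attemptCount) coming out of the threshold filter.
class EnrichWithLastBreachFn : DoFn<KV<String, Long>, String>() {

    @StateId("lastBreach")
    private val lastBreachSpec: StateSpec<ValueState<Long>> = StateSpecs.value()

    @ProcessElement
    fun process(
        ctx: ProcessContext,
        @StateId("lastBreach") lastBreach: ValueState<Long>
    ) {
        val previous: Long? = lastBreach.read()     // null until this IP breaches for the first time
        ctx.output("ALERT ip=${ctx.element().key} attempts=${ctx.element().value} previousBreach=${previous ?: "none"}")
        lastBreach.write(ctx.timestamp().millis)    // remember when this breach happened, per key
    }
}
// Applied as ParDo.of(EnrichWithLastBreachFn()) after the per-IP count & filter steps.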
34. Group By Key / KeyBy [4Geeks]
[Diagram] A mixed stream of playback events (Play, Heartbeat, Seek) can be partitioned by different keys: Group By Action collects all Plays, all Seeks, and all Heartbeats together, while Group By Customer collects each customer's own events together.
Learn More
Apache Flink Specifying Keys - https://medium.com/big-data-processing/apache-flink-specifying-keys-81b3b651469
Branching & merging PCollections with Apache Beam - https://youtu.be/RYD40js20a4
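In Beam the same partitioning is WithKeys followed by GroupByKey; Flink's KeyBy plays the analogous role. A sketch with an assumed PlaybackEvent type; a real pipeline would also window the unbounded stream before grouping and register a coder for the type:

import org.apache.beam.sdk.transforms.GroupByKey
import org.apache.beam.sdk.transforms.SerializableFunction
import org.apache.beam.sdk.transforms.WithKeys
import org.apache.beam.sdk.values.KV
import org.apache.beam.sdk.values.PCollection
import org.apache.beam.sdk.values.TypeDescriptors
import java.io.Serializable

data class PlaybackEvent(val customerId: String, val action: String) : Serializable

fun groupTwoWays(events: PCollection<PlaybackEvent>) {
    // Group By Action: all Plays together, all Seeks together, ...
    val byAction: PCollection<KV<String, Iterable<PlaybackEvent>>> = events
        .apply("Key by action", WithKeys.of(SerializableFunction<PlaybackEvent, String> { it.action })
            .withKeyType(TypeDescriptors.strings()))
        .apply("Group by action", GroupByKey.create())

    // Group By Customer: each customer's events together.
    val byCustomer = events
        .apply("Key by customer", WithKeys.of(SerializableFunction<PlaybackEvent, String> { it.customerId })
            .withKeyType(TypeDescriptors.strings()))
        .apply("Group by customer", GroupByKey.create())
}

As the speaker notes point out, this grouping may cause a network shuffle that repartitions the stream across nodes.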
40. Order Enrichment With Customer Data [4Geeks]
Apache Beam + Dataflow vs Spring Boot
[Diagram] Customer Events (CDC) and Order Events feed an Enrich Order Data step that emits Enriched Orders With Customer Data.
Code Repository & Slides
@SorooshKh
41. Insights
Order Enrichment Test Results
Apache Beam + Dataflow: 1 Dataflow worker with default spec; 120k messages processed in 3 minutes (~700 msg/second); higher costs for keeping the job running.
Spring Boot: tested on minimum Kubernetes hardware on GCP; 120k messages processed in 5 minutes (~400 msg/second); lower costs for keeping the job running.
Note: Please note that the insights provided above are not derived from a fully accurate benchmark.
42. Order Enrichment With Customer Data [4Geeks]
Spring Boot + Redis
[Diagram] Customer CDC → Read → Store Customer in Redis; Orders → Read → Get Customer Information from Redis → Enrich Order With Customer Data → Sink (EnrichedOrder)
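A sketch of this variant with Spring Kafka and Spring Data Redis; the topic names, the Customer/Order shapes, and storing customers as JSON strings are assumptions, not the speaker's exact code. Note the inherent race: an order can arrive before its customer's CDC record, which is exactly the gap the stateful Beam version on the next slide closes:

import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.data.redis.core.StringRedisTemplate
import org.springframework.kafka.annotation.KafkaListener
import org.springframework.stereotype.Component

data class Customer(val id: String, val name: String)
data class Order(val id: String, val customerId: String)
data class EnrichedOrder(val order: Order, val customer: Customer?)

@Component
class OrderEnricher(
    private val redis: StringRedisTemplate,
    private val mapper: ObjectMapper
) {
    // Customer CDC stream: keep the latest version of each customer in Redis.
    @KafkaListener(topics = ["customers-cdc"])
    fun onCustomer(json: String) {
        val customer = mapper.readValue(json, Customer::class.java)
        redis.opsForValue().set("customer:${customer.id}", json)
    }

    // Order stream: look the customer up in Redis and emit an enriched order.
    @KafkaListener(topics = ["orders"])
    fun onOrder(json: String) {
        val order = mapper.readValue(json, Order::class.java)
        val customer = redis.opsForValue().get("customer:${order.customerId}")
            ?.let { mapper.readValue(it, Customer::class.java) }   // null if the CDC record hasn't arrived yet
        publish(EnrichedOrder(order, customer))
    }

    private fun publish(enriched: EnrichedOrder) { /* sink left abstract, e.g. KafkaTemplate.send(...) */ }
}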
43. Order Enrichment With Customer Data [4Geeks]
Apache Beam + Dataflow
[Diagram] Customer CDC → Read → KeyBy CustomerID; Orders → Read → KeyBy CustomerID; both feed CoGroupByKey → EnrichOrderWithCustomerData (State: Customer; update customer in state) → Sink (EnrichedOrder).
Example: Customer(123) becomes (123, Customer(123)); Order(1005, CustomerId=123) becomes (123, Order(1005, CustomerId=123)); the join emits OrderWithCustomerData { Order, Customer }.
Learn More
Stream Join in Flink: from Discrete to Continuous - Xingcan Cui - https://www.youtube.com/watch?v=3YVRluJUKIw
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
44. Why Should We Consider It?
Benefits, Drawbacks & Considerations
45. Benefits & Drawbacks
Benefits
Fast & high throughput
Easy to scale
Exactly-once processing / fault tolerant
Customizable
Advanced features at scale: windowing, watermarks, stateful functions, and more
Drawbacks
✖ Complexity
✖ Implementation & maintenance
✖ Testing & debugging is challenging
✖ Changing data pipelines is hard
✖ Error handling is not simple
✖ Data consistency is not easy
Stream Processing Frameworks
46. Stream Data Integration vs Stream Analytics
Stream Data Integration (Stream ETL): Reading Input, Map, Filter, Simple Enrich
Stream Analytics: Stateful Processing, Pattern Matching, Complex Calculations / Aggregations
Learn More
Stream Processing – Concepts and Frameworks (Guido Schmutz, Switzerland)
https://www.youtube.com/watch?v=vFshGQ2ndeg | https://www.slideshare.net/gschmutz/introduction-to-stream-processing-132881199
47. Considerations
Learning Curve: Stream Data Integration 1–2 weeks; Stream Analytics 2–3 months
Project Timeline: 3–4 engineers, 4–6 months from 0 to stability
Hard to Find Developers
Limited Docs/Resources
Community Support
Costs: cloud providers help a bit
Learn More (Important)
Apache Flink Worst Practices - Konstantin Knauf - https://www.youtube.com/watch?v=F7HQd3KX2TQ
50–51. When should we consider it in our solutions?
Case: Stream Data Integration
Context / Conditions
• Events/second < 1K
• Experience with stream processing: No
• Business queries are changing frequently
• Time to market: very tight
• 3–4 mid-senior developers
Learn More
Apache Flink Worst Practices - Konstantin Knauf - https://www.youtube.com/watch?v=F7HQd3KX2TQ
Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
52. When should we consider it in our solutions?
Case: Stream Analytics
Context / Conditions
• Events/second > 10K
• Experience with stream processing: No
• Business queries are clear and not changing frequently
• Real-time / near-real-time insights are crucial: Yes
• 3–4 mid-senior developers
Learn More
Apache Flink Worst Practices - Konstantin Knauf - https://www.youtube.com/watch?v=F7HQd3KX2TQ
Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
55. Video Platforms
Use cases: Playback Analytics, Content Provider Shares, Pay Per Minute, Fraud Detection, Personalized Recommendation
Learn More
Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis - youtube.com/watch?v=lC0d3gAPXaI
Custom, Complex Windows at Scale using Apache Flink - Matt Zimmer (Netflix) - youtube.com/watch?v=XUvqnsWm8yo
SF 2017: Monal Daxini - Stream Processing with Flink at Netflix - youtube.com/watch?v=sPB8w-YXX1s
Real-time Processing with Flink for Machine Learning at Netflix - Elliot Chow - youtube.com/watch?v=o4C7TDneH00
56. Gaming Industry
Use cases: Game Telemetry Analytics, In-Game Rewards, Live In-Game Changes (NPC, Quests, ...), IoT Integration, Loyalty Service, Anti-Cheat, Chat Service Monitoring, Matchmaking, Payment Fraud Detection, In-Game Recommendation, Advertisement, AI Training
Learn More
Kafka and Big Data Streaming Use Cases in the Gaming Industry - https://www.confluent.io/online-talks/kafka-and-big-data-streaming-use-cases-in-the-gaming-industry/
Let's Play Flink – Fun with Streaming in a Gaming Company - https://www.youtube.com/watch?v=8BNKEmt47UM
57. Application Analytics
Use cases
Learn More
Implementing Google Analytics: A Case Study - Making Sense of Stream Processing by Martin Kleppmann - https://www.oreilly.com/library/view/making-sense-of/9781492042563/ch01.html
Martin Kleppmann — Event Sourcing and Stream Processing at Scale - https://www.youtube.com/watch?v=avi-TZI9t2I
Singles Day 2018: Data in a Flink of an eye - https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye
58. Internet of Things
Use cases: Fleet Management / GPS Tracking, Anomaly Detection, Smart Home Automation, Energy Management, Environmental Monitoring, Predictive Maintenance, Self-Driving Cars
Learn More
7 Reasons to use Apache Flink for your IoT Project - https://www.youtube.com/watch?v=Q0LBTmT4W9o
59. Telecommunication
Use cases: Billing, Network Optimization, Security, Fraud Detection
Learn More
Maciej Próchniak - Stream processing in telco - case study based on Apache Flink & TouK Nussknacker @ Devoxx Poland - https://www.youtube.com/watch?v=WLfEB__fM-4
60. Financial Systems
Use cases: Fraud Detection, Algorithmic Trading, Risk Management, Real-Time Portfolio Analysis, Customer Analytics, Regulatory Compliance, Profit & Loss Insights
Learn More
Real Time Fraud Detection with Stateful Functions - https://www.youtube.com/watch?v=RxDlksbsdQ0
Fast Data at ING - Martijn Visser & Bas Geerdink (ING) - https://www.youtube.com/watch?v=e-_6gijUGAw
Stream ING Models – Real time model deployment of ML Capabilities - https://www.youtube.com/watch?v=Do7C4UJyWCM
62. How to start learning?
1. Google Cloud Apache Beam (Debi Cabrera) - https://youtu.be/65lmwL7rSy4
2. Apache Beam Step By Step (Atul Raina) - https://youtube.com/playlist?list=PL8bzd7vku-WhVHzJgmXoCxx3aB4PxTQLP
3. Beam Summit & Flink Forward - https://beamsummit.org/ | https://www.flink-forward.org/
4. Official Documentation - https://beam.apache.org/documentation/ | https://nightlies.apache.org/flink/flink-docs-stable/
IMPORTANT NOTE
Creating a stream processing service isn't as straightforward as crafting CRUD APIs. Relying solely on Google, development tools, Stack Overflow, and copy-pasting won't get you far. It's crucial to dedicate ample time to thoroughly learn and understand the underlying concepts.
63. Slides & Code Repository
Any questions?
Send me a message on Twitter or LinkedIn.
Thanks for your attention!
@SorooshKh linkedin.com/in/sorooshkhodami/
Please rate this session and share your feedback.
Editor's Notes
What is Stream Processing?
Why should we learn it?
Developer by day, furniture assembler by night. I learned that using the right tool is the most important part of assembling.
Question 1: Who has heard of these technologies a lot? Question 2: Who has used these technologies in production? Every day we wake up, we hear of some new Apache technology...
Okay, not for me. I'm not a fan of complex definitions; let's get to a simple one.
Reading data from multiple sources
Processing the data itself, the payload itself
Individually or joined with other data
Sending the output to another system
Event processing is a technique that focuses on listening for specific events or patterns of events within a system, enabling decision-making and triggering actions based on the information contained in the events.
Services communicate with events.
We need to chunk the data to make it feasible to process
Bounded stream example: processing last month's train check-in/check-out records for analysis purposes.
1 minute: You are watching Netflix on an airplane or the subway; your actions will be synced afterward.
We have three types of guarantees: no guarantee, at-least-once delivery, and exactly-once delivery. Flink -> Checkpointing
Don't forget to check the Learn More links.
Okay, wait. Hold your horses. That was a lot of definitions; what is the use case?
1 minute: We cannot carry two watermelons in one hand; we need to chunk the data to make it feasible to process. Okay, right, we should divide. But how are we going to divide the data?
It's very similar to a shuttle, isn't it?
Let's imagine that we are receiving request logs.
Watching video in the subway or during a flight
Phone Call
How can stream processing do this? The session window is based on Group By Key.
1 minute: There is too much to learn, so we make it easier with examples!
How can we do it in our current applications, without stream processing frameworks?
Sometimes we need to store some data and later look back at it, similar to what we used to do with Redis or a database.
KeyBy is the most common transformation; it partitions the data stream, similar to GROUP BY in SQL. Sometimes we need to group some of the data together.
Sometimes it may cause a network shuffle that will partition the stream across different nodes.
5 minute
// Speaker-note sketch of the brute-force login monitoring pipeline (Beam, Kotlin).
// The helper transforms (readFromPubSubSubscription, failedLoginWindowingStrategy, etc.)
// are defined elsewhere in the talk's code repository.
val failedLogins = p.apply("Read PubSub Messages", readFromPubSubSubscription())
val ipCounts = failedLogins
    .apply("Window", failedLoginWindowingStrategy())
    .apply("Map to KV <IP,MSG>", mapToKVIPAddr())
    .apply("Group by Key IP-Addr", GroupByKey.create())
    .apply("Count per IP", countNumberOfAttempts())
val alerts = ipCounts
    .apply("Filter by Threshold", isCountOfAttempAboveThresholdFilter())
    .apply("Enrich with Old Breaches Last Month", enrichWithOldBreachesLastMonth())
alerts.apply("Write Alerts to PubSub", publishToPubSubTopic())
Stream processing applications are not really easy, especially once you start to have stateful functions.
Complexity
Handling out-of-order events, windowing, and state management
Increased complexity compared to batch processing
Implementation and Maintenance
Expertise required in distributed systems, fault tolerance, and specific stream processing frameworks
Maintenance effort for business logic and data flow changes
Testing and Debugging
Complex testing scenarios and simulation of various events and failures
Difficulties in debugging due to real-time and distributed nature of processing
Error Handling
Managing errors and edge cases can be challenging
Recovery mechanisms and failure scenarios require careful consideration
Data Consistency
Ensuring exactly-once processing and data consistency can be challenging
Requires robust handling of distributed systems and failures
Learning Curve and Project Timeline
2–3 months for a mid-level developer to become proficient
4-6 months for a project to reach stability from start
Resource Intensiveness
Real-time processing may consume more resources than batch processing
Cloud services can help mitigate infrastructure costs
In short: Stream Data Integration is Map, Transform, Filter, Enrich. Stream Analytics additionally uses state, windowing, state management, and event patterns.
Learning Curve
Stream Data Integration : 1 – 2 weeks
Stream Analytics: 2 – 3 months
For a non-trivial project, expect 2–4 months from project initiation to stability.
It’s not easy to find developers with extensive stream processing experience.
For most stream processing frameworks, there is not much step-by-step documentation, and few Stack Overflow questions have working answers. You need to connect the dots yourself.
Decent community support available, but not as extensive as Spring or other popular frameworks
Stream processing can be resource-intensive (cloud services help us here).
Case Stream Data Integration (Map, Filter, Basic Enrichment / real-time ETL): you are not getting much out of using stream processing frameworks; you can achieve almost the same results with other tools, with the possibility to scale up. Case Stream Analytics: you should start investing in your stream processing solution and building a team with the help of professional consultants to lead, facilitate, and boost the process. In the meantime, you can use other available tools to support part of your business requirements (like BigQuery or monitoring tools).
Anomaly detection: Stream processing can help identify unusual patterns or behaviors in IoT device data, enabling early detection of potential issues or failures. For example, it can be used to monitor sensor data from industrial equipment or vehicles to detect anomalies that may indicate a malfunction or maintenance need.
Smart home automation: In a smart home environment, stream processing can be used to analyze data from various sensors and devices to trigger automated actions, such as adjusting lighting or temperature based on occupancy, time of day, or user preferences.
Fleet management: Stream processing can analyze data from GPS trackers, vehicle sensors, and other devices in real-time to optimize fleet operations. This may include route planning, vehicle maintenance scheduling, fuel efficiency analysis, or driver behavior monitoring.
Environmental monitoring: IoT devices can be deployed to monitor various environmental parameters, such as air quality, water levels, or temperature. Stream processing can be used to analyze this data in real-time, enabling rapid response to environmental changes or potential hazards.
Energy management: Stream processing can be used to analyze energy consumption data from smart meters, IoT devices, and sensors in real-time, helping to optimize energy usage and reduce costs. This can be applied to smart grids, microgrids, or individual buildings.
Predictive maintenance: By analyzing IoT sensor data in real-time, stream processing can help predict when a machine or equipment may require maintenance or is likely to fail. This allows for proactive maintenance scheduling, reducing downtime and increasing operational efficiency.