Why and when should we consider Stream Processing in our solutions
Soroosh Khodami
May 17 2023 @ Teqnation
Agenda
What is Stream Processing?
Frameworks & Platforms
Basic Concepts & Patterns
Demo Time
Benefits & Drawbacks + Considerations
Use Cases For Different Industries
How to start?
This Talk is For
Software Developers
Tech Leads / Software Architects
Data Engineers / Data Scientists / AI Engineers
Product Owners / Product Managers / Business Analysts
$ whoami
• I’m Soroosh Khodami
• Full-Stack Developer at Bol.com & Code Nomads
• Working with Stream Processing at Scale in Bol.com
• Software Architecture Enthusiast
@SorooshKh linkedin.com/in/sorooshkhodami/
Slides & Code Repository Link Will Be Shared At The End
RIGHT TOOL
FOR THE JOB
What is Stream Processing?
Event Processing?
Event Driven?
Ref: https://en.wikipedia.org/wiki/Stream_processing
Wikipedia Definition
Stream (Data) Processing
Stream processing is a big data technique that focuses on continuously reading data, processing the data individually or joining it with related data sets in real-time or near real-time, and then sending the output to other applications, data-stores, or systems.
Event Processing
Trigger Actions
Decision Making
Event
Payment Received
Event Driven Architecture
Stream Processing
Frameworks & Platforms
Stream Processing Universe
2023
Code will be executed on a Runner Standalone / Alongside other frameworks
Cloud Platforms
Hardened at Scale
Powered By Flink https://flink.apache.org/powered-by/
+ Examples
Stream Processing
Basic Concepts & Patterns
Bounded Stream / Unbounded Stream
[Figure: a timeline (Past, Now, Future). The unbounded stream has no defined end; Bounded Stream #1 and Bounded Stream #2 each have a Start and an End.]
Event Time & Processing Time
[Figure: six user events (1 Login, 2 Search, 3 View, 4 View, 5 View, 6 Play) placed on two timelines, Event Time and Processing Time, each marked 1 to 7.]
Delivery Guarantees
Learn More (Important)
Streaming Concepts - Exactly Once Fault Tolerance Guarantees youtube.com/watch?v=9pRsewtSPkQ
Rundown of Flink's Checkpoints - youtube.com/watch?v=hoLeQjoGBkQ
Understanding exactly-once processing and windowing in streaming pipelines - youtube.com/watch?v=DraQGkARegE
At Most Once: Messages can be lost, but never duplicated (Fire & Forget)
At Least Once: Messages are never lost, but can be duplicated
Exactly Once: Messages are delivered & processed exactly once
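One common way to get an exactly-once effect on top of at-least-once delivery is to make the consumer idempotent. The sketch below is a minimal illustration under assumptions that are not from the slides: every message carries a unique ID, and the in-memory `seen_ids` set stands in for a durable deduplication store.

```python
def process_at_least_once(messages):
    """Sum message values while ignoring redelivered messages.

    messages: iterable of (message_id, value) pairs in arrival order.
    seen_ids is a hypothetical stand-in for a persistent dedup store.
    """
    seen_ids = set()
    total = 0
    for message_id, value in messages:
        if message_id in seen_ids:
            continue  # duplicate delivery: already processed, skip it
        seen_ids.add(message_id)
        total += value
    return total


# Hypothetical deliveries: "m2" arrives twice, as can happen after a retry.
deliveries = [("m1", 10), ("m2", 5), ("m2", 5), ("m3", 1)]
print(process_at_least_once(deliveries))
```

Without the `seen_ids` check the duplicate would be counted twice, which is the behaviour a plain at-least-once pipeline exhibits.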
IoT Farm
Context
• 1000+ Sensors
• Multiple Sensors per location
• Unreliable internet connection
• Large amount of continuous sensor data
Requirements
• Aggregated Sensor Data Per Location
• Correct Order Of Data
• No Duplicates
Basic Building Blocks
[Figure: Read Source -> Operators & Transforms (one or more operators) -> Sink]
IoT Farm Example
[Figure: Read Soil Moisture Sensors, Read Optical Sensors and Read Temperature Sensors -> Filter Selected Locations -> Join & Aggregate -> Sink]
Images From:
http://ibmstreams.github.io/streamsx.documentation/docs/spl/quick-start/qs-2/
Analyzing tweets using Cloud Dataflow pipeline templates https://cloud.google.com/blog/products/gcp/analyzing-tweets-using-cloud-dataflow-pipeline-templates/
Windowing
[Figure: numeric events on a timeline; one window of data is highlighted with Sum: 19, Count: 5.]
• Divides an unbounded, continuous data stream into smaller, finite segments.
• Allows operations and calculations to be performed on manageable chunks of data.
• It is not feasible to load or keep an entire stream in memory.
• Useful for analyzing data over specific time periods or fixed numbers of events.
Learn More
Basics of Windowing - https://www.youtube.com/watch?v=oJ-LueBvOcM&t=1s
Advanced Windowing Concepts - https://www.youtube.com/watch?v=MuFA6CSti6M
Time Based Windows
Tumbling/Fixed Window
No overlap between the elements of different windows
[Figure: numeric events on a timeline split into consecutive 5-second windows. Per-window results: Sum: 11, Count: 4 | Sum: 19, Count: 5 | Sum: 5, Count: 2.]
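A tumbling window can be sketched in a few lines of plain Python. This is an illustration, not the Beam or Flink API: the function name is hypothetical, and the timestamps and values are made up, chosen only so that the per-window results match the sums and counts shown on the slide.

```python
from collections import defaultdict


def tumbling_windows(events, size_seconds):
    """Group (timestamp_seconds, value) events into fixed, non-overlapping windows.

    Returns {window_start: (sum, count)} ordered by window start.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // size_seconds) * size_seconds
        buckets[window_start].append(value)
    return {start: (sum(vals), len(vals)) for start, vals in sorted(buckets.items())}


# Hypothetical events as (event time in seconds, value).
events = [(0, 5), (1, 2), (3, 3), (4, 1),
          (5, 4), (6, 4), (7, 7), (8, 2), (9, 2),
          (11, 4), (13, 1)]
print(tumbling_windows(events, size_seconds=5))
```

Because the window start is derived from the timestamp alone, no event can land in two windows, which is the "no overlap" property of tumbling windows.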
Size Based Windows
[Figure: the events 5, 2, 3, 1, 4, 7, 2, 4, 4, 2, 6, 1 grouped into windows of four events each. Per-window results: Sum: 11, Count: 4 | Sum: 17, Count: 4 | Sum: 13, Count: 4.]
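A size-based (count) window closes after a fixed number of events, regardless of time. The sketch below uses the event values from the slide; the function name is hypothetical and the code is plain Python rather than a framework API.

```python
def count_windows(values, size):
    """Split values into consecutive windows of `size` events.

    Returns a list of (sum, count) per window; the last window may be smaller.
    """
    windows = [values[i:i + size] for i in range(0, len(values), size)]
    return [(sum(window), len(window)) for window in windows]


slide_values = [5, 2, 3, 1, 4, 7, 2, 4, 4, 2, 6, 1]
print(count_windows(slide_values, size=4))
```

In a real unbounded stream the final, partially filled window would stay open until enough events arrive, which is one reason to combine a size limit with a time limit.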
Time & Size Based Windows
[Figure: numeric events on a timeline; a window closes after 5 seconds or after four events. Per-window results: Sum: 11, Count: 4 | Sum: 17, Count: 4 | Sum: 7, Count: 3.]
Sliding Window
Time Based Windows
Last 10 seconds, evaluated every 5 seconds, so consecutive windows overlap
[Figure: log events (Success, WARN, Error) on a timeline covered by Window #1, Window #2, Window #3 ... Window #N, Window #N+1. Per-window counts: Success: 4, Warn: 0, Error: 0 | Success: 3, Warn: 0, Error: 1 | Success: 1, Warn: 2, Error: 1 | ... | Success: 0, Warn: 0, Error: 4.]
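The "last 10 seconds, every 5 seconds" behaviour can be sketched as follows. This is a simplified illustration with hypothetical names and made-up events: windows start at multiples of the slide interval beginning at 0, whereas a real framework would also emit the partial window that starts before the first event.

```python
from collections import Counter


def sliding_window_counts(events, size_seconds, slide_seconds):
    """Count labels per sliding window.

    events: list of (timestamp_seconds, label).
    Each window covers [start, start + size_seconds); starts advance by slide_seconds.
    """
    if not events:
        return {}
    last_timestamp = max(ts for ts, _ in events)
    results = {}
    start = 0
    while start <= last_timestamp:
        end = start + size_seconds
        results[start] = Counter(label for ts, label in events if start <= ts < end)
        start += slide_seconds
    return results


# Hypothetical log events as (event time in seconds, level).
log_events = [(1, "Success"), (3, "Success"), (6, "Error"), (8, "Success"), (12, "WARN")]
for window_start, counts in sliding_window_counts(log_events, 10, 5).items():
    print(window_start, dict(counts))
```

The events at seconds 6 and 8 are counted in both the window starting at 0 and the window starting at 5, which is the overlap that distinguishes sliding from tumbling windows.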
Session Window
[Figure: playback events (Play, Heartbeat, Seek, Pause) for User #1 and User #2 on a timeline. For each user, a pause in activity (10 sec for User #1, 20 sec for User #2) separates Window #1 from Window #2.]
The window is closed based on the gap duration = 10 sec
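Session windows can be sketched by splitting one user's events wherever the silence between two events exceeds the gap duration. The function name and timestamps below are hypothetical, and treating a silence of exactly the gap duration as part of the same session is a choice made for this sketch; frameworks define that boundary in their own way.

```python
def session_windows(timestamps, gap_seconds):
    """Split one user's event timestamps into sessions.

    A new session starts when the time since the previous event
    is larger than gap_seconds.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_seconds:
            sessions[-1].append(ts)  # still within the gap: same session
        else:
            sessions.append([ts])  # silence exceeded the gap: new session
    return sessions


# Hypothetical heartbeat timestamps (seconds) for a single user.
user_events = [0, 3, 8, 12, 30, 34]
print(session_windows(user_events, gap_seconds=10))
```

Unlike tumbling or sliding windows, the window boundaries here depend on the data itself, so each user (key) ends up with sessions of different lengths.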
Watermarks
[Figure: two panels, each showing events 1, 2, 3, 4 and 7 across two 5-second windows (Window #1, Window #2); the second panel shows an additional event 4 drawn apart from the others.]
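Assuming the figure illustrates an event with timestamp 4 arriving after the event with timestamp 7 (an interpretation, not something stated on the slide), a watermark decides whether that event still counts for Window #1. The sketch below is a simplified model with hypothetical names: the watermark trails the highest timestamp seen by a configurable delay, and a window is treated as closed once the watermark reaches its end.

```python
def assign_with_watermark(arrivals, window_size, max_out_of_orderness):
    """Assign event timestamps to tumbling windows and flag late events.

    arrivals: event-time timestamps in the order they arrive.
    Returns (on_time, late): on_time maps window_start -> timestamps,
    late lists timestamps whose window was already closed.
    """
    watermark = float("-inf")
    on_time, late = {}, []
    for ts in arrivals:
        window_start = (ts // window_size) * window_size
        window_end = window_start + window_size
        if window_end <= watermark:
            late.append(ts)  # the window has already fired
        else:
            on_time.setdefault(window_start, []).append(ts)
        # The watermark never moves backwards.
        watermark = max(watermark, ts - max_out_of_orderness)
    return on_time, late


arrivals = [1, 2, 3, 7, 4]  # event 4 arrives after event 7
print(assign_with_watermark(arrivals, window_size=5, max_out_of_orderness=0))
print(assign_with_watermark(arrivals, window_size=5, max_out_of_orderness=3))
```

With no tolerance the event is dropped as late; with a tolerance of 3 seconds it still lands in the first window. The trade-off is that a larger tolerance delays every window's result.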
Basic Concepts & Patterns
• Bounded Stream / Unbounded Stream
• Operators & Transforms
• Event Time & Processing Time
• Event Delivery Guarantee
• Windowing (Fixed, Sliding, Session, Watermark)
• States & Stateful Stream Processing
• Joining Streams & Enrichment Pattern
Learn More
Stream Join in Flink: from Discrete to Continuous - Xingcan Cui https://www.youtube.com/watch?v=3YVRluJUKIw
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
Joining Streams & Enrichment Pattern
[Figure: a Temperature Sensor Stream and a Moisture Sensor Stream joined per window; panels show a Window Inner Join and a Window Cross Join (CoGroup).]
Inner Join example:
Device-2, Temp: 28 + Device-2, Moisture: 876 -> Device-2 (Moisture: 876, Temp: 28)
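The inner join of two sensor streams within one window can be sketched as a keyed lookup. The Device-2 readings come from the slide; the function name and the Device-1 and Device-3 readings are hypothetical additions to show that unmatched keys are dropped.

```python
def window_inner_join(temperature_events, moisture_events):
    """Inner-join two windows of (device_id, value) readings on device_id."""
    moisture_by_device = {}
    for device_id, moisture in moisture_events:
        moisture_by_device.setdefault(device_id, []).append(moisture)

    joined = []
    for device_id, temp in temperature_events:
        # Devices without a moisture reading in this window produce no output.
        for moisture in moisture_by_device.get(device_id, []):
            joined.append({"device": device_id, "temp": temp, "moisture": moisture})
    return joined


# Readings that fell into the same window.
temperature_window = [("Device-1", 25), ("Device-2", 28)]
moisture_window = [("Device-2", 876), ("Device-3", 640)]
print(window_inner_join(temperature_window, moisture_window))
```

Only Device-2 appears in the output because it is the only key present on both sides of the window.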
States & Stateful Stream Processing
Learn More
Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger https://www.youtube.com/watch?v=DkNeyCW-eH0
Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
[Figure: streams flowing through chains of stateless operators and stateful operators; each stateful operator reads and writes its own State.]
States & Stateful Stream Processing
Brute Force Login Monitoring
[Figure: pipeline: Read Login Attempts -> Group By IP -> Windowing (Last 15 Minutes) -> Count -> Filter Above Threshold -> Enrich With Previous Breach and Update Last Breach -> Sink (Security Alerts).]
State: Last Threshold Breach (nullable)
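The brute-force monitoring pipeline can be approximated in plain Python to show where state enters the picture. This sketch is not the demo's Apache Beam code: the function name, sample IPs and threshold are hypothetical, and it uses tumbling 15-minute windows for simplicity where the slide says "last 15 minutes".

```python
from collections import defaultdict


def detect_brute_force(attempts, window_seconds, threshold):
    """Emit one alert per (IP, window) whose login attempts exceed threshold.

    attempts: list of (timestamp_seconds, ip).
    last_breach is the per-IP state: window start of the previous breach, if any.
    """
    counts = defaultdict(int)  # (ip, window_start) -> number of attempts
    for ts, ip in attempts:
        window_start = (ts // window_seconds) * window_seconds
        counts[(ip, window_start)] += 1

    last_breach = {}
    alerts = []
    # Process windows in time order so the state reflects earlier breaches.
    for (ip, window_start), count in sorted(counts.items(), key=lambda kv: kv[0][1]):
        if count > threshold:
            alerts.append({
                "ip": ip,
                "window_start": window_start,
                "attempts": count,
                "previous_breach": last_breach.get(ip),  # None on first breach
            })
            last_breach[ip] = window_start
    return alerts


# Hypothetical login attempts as (event time in seconds, source IP).
attempts = ([(t, "10.0.0.1") for t in range(6)]
            + [(910 + t, "10.0.0.1") for t in range(4)]
            + [(5, "10.0.0.2")])
for alert in detect_brute_force(attempts, window_seconds=900, threshold=3):
    print(alert)
```

The counting and filtering steps are stateless across windows; only the "previous breach" lookup needs state that survives from one window to the next, which is what makes this a stateful pipeline.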
Group By Key / KeyBy [4Geeks]
[Figure: a mixed stream of playback events (Play, Heartbeat, Seek) regrouped in two ways: Group By Action and Group By Customer.]
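Grouping by key amounts to partitioning a stream so that all events with the same key are handled together. The sketch below uses a hypothetical `key_by` helper and made-up user IDs to show the same events grouped by customer and by action.

```python
from collections import defaultdict


def key_by(events, key_fn):
    """Partition events into per-key lists, preserving arrival order within a key."""
    groups = defaultdict(list)
    for event in events:
        groups[key_fn(event)].append(event)
    return dict(groups)


# Hypothetical playback events as (customer, action).
playback_events = [
    ("user-1", "Play"), ("user-2", "Play"), ("user-1", "Seek"),
    ("user-1", "Heartbeat"), ("user-2", "Heartbeat"),
]
by_customer = key_by(playback_events, key_fn=lambda event: event[0])
by_action = key_by(playback_events, key_fn=lambda event: event[1])
print(by_customer)
print(by_action)
```

The choice of key determines which events can share state and windows: keying by customer supports per-user sessions, while keying by action supports per-action counts.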
Learn More
Apache Flink Specifying Keys https://medium.com/big-data-processing/apache-flink-specifying-keys-81b3b651469
Branching & merging PCollections with Apache Beam - https://youtu.be/RYD40js20a4
DEMO TIME
Apache Beam Code
IP Monitoring ( Apache Beam )
What You Just Saw
Hidden Code Behind
The Functions
Order Enrichment With Customer Data [4Geeks]
Apache Beam + Dataflow vs Spring Boot
Customers Events (CDC)
Orders Events
Enriched Orders With
Customer Data
Enrich Order Data
Code Repository & Slides
@SorooshKh
Insights
Order Enrichment Test Results
Apache Beam + Dataflow
• 1 Dataflow worker with default spec
• 120k messages processed in 3 minutes (~700 msg/second)
• Higher costs for keeping the job running
Spring Boot
• Tested on minimum Kubernetes hardware on GCP
• 120k messages processed in 5 minutes (~400 msg/second)
• Lower costs for keeping the job running
Note: The insights above are not derived from a fully accurate benchmark.
Order Enrichment With Customer Data [4Geeks]
Spring Boot + Redis
[Figure: Customer CDC -> Read -> Store Customer in Redis; Orders -> Read -> Get Customer Information from Redis -> Enrich Order With Customer Data -> Sink (EnrichedOrder).]
Order Enrichment With Customer Data [4Geeks]
[Figure: Customer CDC -> Read -> KeyBy CustomerID -> Update Customer in State; Orders -> Read -> KeyBy CustomerID; both keyed streams -> CoGroupByKey -> EnrichOrderWithCustomerData -> Sink (EnrichedOrder). State: Customer.]
Example records:
Customer(123) -> (123, Customer(123))
Order(1005, CustomerId=123) -> (123, Order(1005, CustomerId=123))
Output: OrderWithCustomerData
- Order
- Customer
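The enrichment pattern can be sketched without any framework: customer change events update per-key state, and each order is joined with the latest customer record for its key. This is an illustration only, not the demo's Beam code; the function name, the event tuple shape and the `name` field are hypothetical, while the IDs 123 and 1005 come from the slide.

```python
def enrich_orders(events):
    """Enrich order events with the latest customer record seen so far.

    events: list of ("customer", dict) or ("order", dict) in arrival order.
    customer_state plays the role of the keyed state in the diagram.
    """
    customer_state = {}  # customer_id -> latest customer record
    enriched = []
    for kind, payload in events:
        if kind == "customer":
            customer_state[payload["id"]] = payload  # update state
        elif kind == "order":
            enriched.append({
                "order": payload,
                # None if no customer record has arrived yet for this key.
                "customer": customer_state.get(payload["customer_id"]),
            })
    return enriched


events = [
    ("customer", {"id": 123, "name": "Alice"}),
    ("order", {"id": 1005, "customer_id": 123}),
]
print(enrich_orders(events))
```

An order that arrives before its customer record is emitted with `customer` set to `None` in this sketch; a production pipeline has to decide whether to buffer, retry or emit such orders unenriched.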
Apache Beam + Dataflow
Why Should We Consider It
Benefits, Drawbacks & Considerations
Benefits & Drawbacks
Benefits
✔ Fast & High-Throughput
✔ Easy to Scale
✔ Exactly Once Processing / Fault Tolerant
✔ Customizable
✔ Advanced features at scale: Windowing, Watermarks, Stateful Functions, and more
Drawbacks
✖ Complexity
✖ Implementation & Maintenance
✖ Testing & debugging are challenging
✖ Changing data pipelines is hard
✖ Error handling is not simple
✖ Data consistency is not easy
Stream Processing Frameworks
Stream Data Integration vs Stream Analytics
Learn More
Stream Processing – Concepts and Frameworks (Guido Schmutz, Switzerland)
https://www.youtube.com/watch?v=vFshGQ2ndeg | https://www.slideshare.net/gschmutz/introduction-to-stream-processing-132881199
Stream Data Integration (Stream ETL)
• Reading Input
• Map
• Filter
• Simple Enrich
Stream Analytics
• Stateful Processing
• Pattern Matching
• Complex Calculations / Aggregations
Considerations
Learn More ( Important )
Apache Flink Worst Practices - Konstantin Knauf - https://www.youtube.com/watch?v=F7HQd3KX2TQ
• Learning Curve
• Project Timeline
• Hard to Find Developers
• Limited Docs/Resources
• Community Support
• Costs
Stream Data Integration
1 – 2 Weeks
Stream Analytics
2 – 3 Months
3 – 4 Engineers
4 – 6 Months
0 -> Stability
Cloud Providers Help a Bit
Stream Processing
When should we consider it in our solutions?
Decision-Making Factors
• Requirements (FRs + NFRs + Roadmap)
• Development Cost (Capex)
• Maintenance Cost (Opex)
• Complexity
• Limitations
• Industry Best Practices
When should we consider it in our solutions?
Case: Stream Data Integration
Context / Conditions
• Events / second: < 1K
• Stream processing experience: No
• Business queries change frequently
• Time to market: Very tight
• Team: 3 – 4 Mid-Senior Developers
Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
When should we consider it in our solutions?
Case: Stream Analytics
Context / Conditions
• Events / second: > 10K
• Stream processing experience: No
• Business queries are clear and do not change frequently
• Real-time or near-real-time insights are crucial: Yes
• Team: 3 – 4 Mid-Senior Developers
Quick Look On
Stream Processing Use Cases
Use Cases
• Video Streaming: Playback Analytics
• IoT: GPS Tracking
• Telecom: Billing / Charging System
• Finance: Fraud Detection
• E-Commerce: User Analytics
• Gaming Industry: Anti-Cheat
Video Platforms
Use cases
• Playback Analytics
• Content Provider Shares
• Pay Per Minute
• Fraud Detection
• Personalized Recommendation
Learn More
Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis youtube.com/watch?v=lC0d3gAPXaI
Custom, Complex Windows at Scale using Apache Flink - Matt Zimmer (Netflix) youtube.com/watch?v=XUvqnsWm8yo
SF 2017: Monal Daxini - Stream Processing with Flink at Netflix youtube.com/watch?v=sPB8w-YXX1s
Real-time Processing with Flink for Machine Learning at Netflix - Elliot Chow youtube.com/watch?v=o4C7TDneH00
Gaming Industry
Use cases
Learn More
Kafka and Big Data Streaming Use Cases in the Gaming Industry
https://www.confluent.io/online-talks/kafka-and-big-data-streaming-use-cases-in-the-gaming-industry/
Let's Play Flink – Fun with Streaming in a Gaming Company
https://www.youtube.com/watch?v=8BNKEmt47UM
• Game Telemetry Analytics
• Rewards (In-Game)
• Live In-Game Changes (NPC, Quests, ..)
• IoT Integration
• Loyalty Service
• Anti-Cheat
• Chat Service Monitoring
• Match Making
• Payment Fraud Detection
• In-Game Recommendation
• Advertiseme
• AI Training
• Payment
Application Analytics
Use cases
Learn More
Implementing Google Analytics: A Case Study - Making Sense of Stream Processing by Martin Kleppmann
https://www.oreilly.com/library/view/making-sense-of/9781492042563/ch01.html
Martin Kleppmann — Event Sourcing and Stream Processing at Scale https://www.youtube.com/watch?v=avi-TZI9t2I
Singles Day 2018: Data in a Flink of an eye https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye
Learn More
7 Reasons to use Apache Flink for your IoT Project
https://www.youtube.com/watch?v=Q0LBTmT4W9o
Fleet management / GPS Tracking
Anomaly detection
Smart home automation
Energy management
Environmental monitoring
Predictive maintenance
Self-Driving Cars
Internet Of Things
Use cases
Billing
Network Optimization
Security
Fraud Detection
Learn More
Maciej Próchniak - Stream processing in telco - case study based on Apache Flink & TouK Nussknacker @ Devoxx Poland
https://www.youtube.com/watch?v=WLfEB__fM-4
Telecommunication
Use cases
Fraud detection
Algorithmic trading
Risk management
Real-time portfolio analysis
Customer analytics
Regulatory compliance
Profit & Loss Insights
Learn More
Real Time Fraud Detection with Stateful Functions https://www.youtube.com/watch?v=RxDlksbsdQ0
Fast Data at ING - Martijn Visser & Bas Geerdink (ING) https://www.youtube.com/watch?v=e-_6gijUGAw
Stream ING Models – Real time model deployment of ML Capabilities https://www.youtube.com/watch?v=Do7C4UJyWCM
Financial Systems
Use cases
Stream Processing
How to start learning?
[1] Google Cloud Apache Beam (Debi Cabrera): https://youtu.be/65lmwL7rSy4
[2] Apache Beam Step By Step (Atul Raina): https://youtube.com/playlist?list=PL8bzd7vku-WhVHzJgmXoCxx3aB4PxTQLP
[3] Beam Summit & Flink Forward: https://beamsummit.org/ | https://www.flink-forward.org/
[4] Official Documentation: https://beam.apache.org/documentation/ | https://nightlies.apache.org/flink/flink-docs-stable/
IMPORTANT NOTE
Creating a Stream Processing service isn't as straightforward as crafting CRUD APIs. Relying solely on Google, development tools, Stack Overflow, and copy-pasting won't get you far. It's crucial to dedicate ample time to thoroughly learn and understand the underlying concepts.
Slides & Code Repository
Any Questions?
Send me a message on Twitter or LinkedIn
Thanks for your Attention !
@SorooshKh linkedin.com/in/sorooshkhodami/
Please Rate This Session
And Share Your Feedback

BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
vrstrong314
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Nidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, TipsNidhi Software Price. Fact , Costs, Tips
Nidhi Software Price. Fact , Costs, Tips
 

Why And When Should We Consider Stream Processing In Our Solutions Teqnation 2023

  • 1. Why and when should we consider Stream Processing in our solutions. Soroosh Khodami, May 17 2023 @ Teqnation
  • 2. Agenda: What is Stream Processing?; Frameworks & Platforms; Basic Concepts & Patterns; Demo Time; Benefits & Drawbacks + Considerations; Use Cases For Different Industries; How to start?
  • 3. This Talk is For: Software Developers; Tech Leads / Software Architects; Data Engineers / Data Scientists / AI Engineers; Product Owners / Product Managers / Business Analysts
  • 4. $ whoami: I’m Soroosh Khodami. Full-Stack Developer at Bol.com & Code Nomads. Working with Stream Processing at Scale in Bol.com. Software Architecture Enthusiast. @SorooshKh linkedin.com/in/sorooshkhodami/ Slides & Code Repository Link Will Be Shared At The End
  • 5.
  • 7. What is Stream Processing? Event Processing? Event Driven?
  • 9. Stream (Data) Processing: Stream processing is a big data technique that focuses on continuously reading data, processing the data individually or joining it with related data sets in real-time or near real-time, and then sending the output to other applications, data-stores, or systems.
  • 10. Event Processing [Diagram: an Event (Payment Received) leads to Decision Making and Trigger Actions]
  • 14. Stream Processing Universe 2023 Code will be executed on a Runner Standalone / Alongside other frameworks
  • 16. Hardened at Scale Powered By Flink https://flink.apache.org/powered-by/
  • 17. + Examples Stream Processing Basic Concepts & Patterns
  • 18. Bounded Stream / Unbounded Stream [Figure: timelines running from Past through Now to Future, showing an Unbounded Stream and two Bounded Streams (#1 and #2), each with a Start and an End]
  • 19. Event Time & Processing Time [Figure: the events 1 Login, 2 Search, 3 View, 4 View, 5 View, 6 Play placed on an Event Time axis and on a Processing Time axis, each with ticks 1 to 7]
  • 20. Delivery Guarantees. At Most Once: messages can be lost, but never duplicated (Fire & Forget). At Least Once: messages can be duplicated. Exactly Once: messages are delivered & processed exactly once. Learn More (Important): Streaming Concepts - Exactly Once Fault Tolerance Guarantees youtube.com/watch?v=9pRsewtSPkQ; Rundown of Flink's Checkpoints - youtube.com/watch?v=hoLeQjoGBkQ; Understanding exactly-once processing and windowing in streaming pipelines - youtube.com/watch?v=DraQGkARegE
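To make the difference between the guarantees concrete, here is a minimal sketch in plain Python. It is not how Flink or Beam implement exactly-once (they rely on mechanisms such as checkpointing); it only illustrates, under assumed names (`deliver_at_least_once`, `IdempotentConsumer`) and made-up messages, how at-least-once delivery can duplicate a message and how an idempotent consumer can still produce an effectively-once result.

```python
# Sketch only: the function, class, and message fields below are hypothetical.

def deliver_at_least_once(messages, redeliver_ids):
    """Yield every message; ids listed in redeliver_ids are delivered twice."""
    for msg in messages:
        yield msg
        if msg["id"] in redeliver_ids:
            yield msg  # simulated retry after a lost acknowledgement


class IdempotentConsumer:
    """Remembers processed ids so a redelivered message has no second effect."""

    def __init__(self):
        self.seen_ids = set()
        self.total = 0

    def process(self, msg):
        if msg["id"] in self.seen_ids:
            return False  # duplicate delivery: skip it
        self.seen_ids.add(msg["id"])
        self.total += msg["amount"]
        return True


messages = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 5},
]
delivered = list(deliver_at_least_once(messages, redeliver_ids={2}))

# A naive consumer counts the duplicate as well.
naive_total = sum(m["amount"] for m in delivered)

consumer = IdempotentConsumer()
for msg in delivered:
    consumer.process(msg)

print(len(delivered), naive_total, consumer.total)
```

The naive total includes the redelivered message, while the idempotent consumer counts each id once.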
  • 21. IoT Farm. Context: +1000 Sensors; Multiple Sensors per location; Not reliable internet connection; Large amount of continuous sensor data. Requirements: Aggregated Sensor Data Per Location; Correct Order Of Data; No Duplicates
  • 22. Basic Building Blocks [Diagram: Read (Source), then one or more Operator(s) (Operators & Transforms), then Sink]
  • 23. IoT Farm Example [Diagram: Read Soil Moisture Sensors, Read Optical Sensors, and Read Temperature Sensors feed Operators & Transforms (Filter Selected Locations, then Join & Aggregate), then Sink]
  • 24. Operators & Transform Images From: http://ibmstreams.github.io/streamsx.documentation/docs/spl/quick-start/qs-2/ Analyzing tweets using Cloud Dataflow pipeline templates https://cloud.google.com/blog/products/gcp/analyzing-tweets-using-cloud-dataflow-pipeline-templates/
  • 25. Windowing • Divides an unbounded, continuous data stream into smaller, finite segments • Allows operations and calculations to be performed on manageable chunks of data • It’s not feasible to load or keep the entire stream in memory • Useful for analyzing data over specific time periods or fixed numbers of events [Figure: values on a Time axis with one Window of Data highlighted; Sum: 19, Count: 5] Learn More: Basics of Windowing - https://www.youtube.com/watch?v=oJ-LueBvOcM&t=1s Advanced Windowing Concepts - https://www.youtube.com/watch?v=MuFA6CSti6M
  • 26. Tumbling/Fixed Window: no overlaps between the elements of different windows. Time Based Windows (5 seconds each): Sum: 11, Count: 4 | Sum: 19, Count: 5 | Sum: 5, Count: 2. Size Based Windows: Sum: 11, Count: 4 | Sum: 17, Count: 4 | Sum: 13, Count: 4. Time & Size Based Windows (5 seconds each): Sum: 11, Count: 4 | Sum: 17, Count: 4 | Sum: 7, Count: 3.
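A time-based tumbling window can be sketched in a few lines of plain Python. The timestamps below are invented for illustration (the slide does not give them); the values are chosen so that the three 5-second windows reproduce the sums and counts listed for the time-based case (11/4, 19/5, 5/2). The function name `tumbling_windows` is an assumption, not a framework API.

```python
from collections import defaultdict

WINDOW_SECONDS = 5


def tumbling_windows(events, size):
    """Group (timestamp, value) pairs into non-overlapping windows of `size` seconds."""
    windows = defaultdict(list)
    for ts, value in events:
        window_start = (ts // size) * size  # each event belongs to exactly one window
        windows[window_start].append(value)
    return {
        start: {"sum": sum(values), "count": len(values)}
        for start, values in sorted(windows.items())
    }


# Hypothetical (timestamp in seconds, value) events.
events = [
    (0, 5), (1, 2), (3, 3), (4, 1),          # window [0, 5)
    (5, 4), (6, 7), (7, 2), (8, 4), (9, 2),  # window [5, 10)
    (11, 4), (13, 1),                        # window [10, 15)
]

result = tumbling_windows(events, WINDOW_SECONDS)
for start, stats in result.items():
    print(f"[{start}, {start + WINDOW_SECONDS}) -> {stats}")
```

Because the window start is derived from the timestamp alone, every event lands in exactly one window, which is the "no overlaps" property of tumbling windows.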
  • 27. Sliding Window (Time Based Windows): Last 10 Seconds, Every 5 Seconds, with Overlaps Between Windows. [Figure: a stream of Success, WARN, and Error log events on a Time axis, covered by overlapping windows Window #1, Window #2, Window #3, ..., Window #N, Window #N+1. Counts shown: Success: 4, Warn: 0, Error: 0; Success: 3, Warn: 0, Error: 1; Success: 1, Warn: 2, Error: 1; ...; Success: 0, Warn: 0, Error: 4]
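The "last 10 seconds, every 5 seconds" idea can be sketched as follows. This is plain Python with invented log events, and `sliding_windows` is a hypothetical helper rather than a framework call; the point is only that one event is counted in every window that overlaps its timestamp.

```python
from collections import Counter, defaultdict


def sliding_windows(events, size, slide):
    """Count statuses per window [start, start + size), with starts every `slide` seconds."""
    windows = defaultdict(Counter)
    for ts, status in events:
        start = (ts // slide) * slide  # latest window start that contains ts
        # Walk back through earlier window starts that still cover ts.
        while start > ts - size and start >= 0:
            windows[start][status] += 1
            start -= slide
    return dict(sorted(windows.items()))


# Hypothetical (timestamp in seconds, status) request-log events.
events = [
    (1, "SUCCESS"), (2, "SUCCESS"), (4, "SUCCESS"), (6, "SUCCESS"),
    (8, "ERROR"), (11, "WARN"), (13, "WARN"), (14, "ERROR"),
]

result = sliding_windows(events, size=10, slide=5)
for start, counts in result.items():
    print(f"[{start}, {start + 10}) -> {dict(counts)}")
```

With a size of 10 and a slide of 5, the event at second 6 is counted both in the window starting at 0 and in the window starting at 5, which is the overlap the slide refers to.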
  • 28. Session Window: close the window based on a gap duration (Gap Duration = 10 sec). [Figure: on a Time axis, User #1 events (Play, Heartbeat, Seek, Pause) are split into Window #1 and Window #2, with an interval labelled 10 sec; User #2 events (Play, Heartbeat, Seek) are split into Window #1 and Window #2, with an interval labelled 20 sec]
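A gap-based session window can be sketched like this in plain Python. The events and timestamps are made up, `session_windows` is a hypothetical helper, and treating a silence strictly longer than the gap as the session boundary is an assumption of this sketch (frameworks define the exact boundary themselves).

```python
from collections import defaultdict

GAP_SECONDS = 10  # gap duration from the slide


def session_windows(events, gap):
    """Split each user's events into sessions; a silence longer than `gap` starts a new one."""
    by_user = defaultdict(list)
    for user, ts, action in sorted(events, key=lambda e: e[1]):
        by_user[user].append((ts, action))

    sessions = defaultdict(list)
    for user, user_events in by_user.items():
        current = [user_events[0]]
        for prev, nxt in zip(user_events, user_events[1:]):
            if nxt[0] - prev[0] > gap:
                sessions[user].append(current)  # close the current session
                current = []
            current.append(nxt)
        sessions[user].append(current)
    return dict(sessions)


# Hypothetical (user, timestamp in seconds, action) playback events.
events = [
    ("user-1", 0, "Play"), ("user-1", 3, "Heartbeat"), ("user-1", 6, "Seek"),
    ("user-1", 9, "Heartbeat"), ("user-1", 25, "Seek"), ("user-1", 28, "Pause"),
    ("user-2", 2, "Play"), ("user-2", 5, "Heartbeat"),
    ("user-2", 30, "Heartbeat"), ("user-2", 33, "Seek"),
]

sessions = session_windows(events, GAP_SECONDS)
for user, user_sessions in sessions.items():
    print(user, [[action for _, action in s] for s in user_sessions])
```

Sessions are computed per user, which matches the speaker's remark that a session window is based on a group-by-key.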
  • 29. Watermarks [Figure: events 1, 2, 3, 4, 7 on a timeline split into Window #1 and Window #2 of 5 seconds each, shown twice; in the second timeline an additional event 4 is shown separately] Learn More: Basics of Windowing - https://www.youtube.com/watch?v=oJ-LueBvOcM&t=1s Advanced Windowing Concepts - https://www.youtube.com/watch?v=MuFA6CSti6M
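One way to picture a watermark: it is the pipeline's estimate of how far event time has progressed, and a window is emitted once the watermark passes its end. The sketch below is a simplified model in plain Python, not the Flink or Beam implementation. The function name, the `max_out_of_orderness` parameter, and the event values are all assumptions; the event times 1, 2, 3, 7, 4 mirror the out-of-order event 4 in the figure.

```python
def run_with_watermark(events, window_size, max_out_of_orderness):
    """Process (event_time, value) pairs in arrival order.

    The watermark trails the highest event time seen by `max_out_of_orderness`.
    A window [start, start + window_size) is emitted once the watermark reaches
    its end; an event whose window is already behind the watermark is late.
    """
    open_windows = {}  # window start -> values collected so far
    emitted = {}       # window start -> sum of values at emission time
    late = []
    watermark = float("-inf")

    for event_time, value in events:
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            late.append((event_time, value))  # window already closed
        else:
            open_windows.setdefault(start, []).append(value)

        watermark = max(watermark, event_time - max_out_of_orderness)
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                emitted[s] = sum(open_windows.pop(s))
    return emitted, open_windows, late


# Hypothetical arrival order: the event with event time 4 arrives after 7.
events = [(1, 10), (2, 20), (3, 30), (7, 70), (4, 40), (12, 120)]

strict = run_with_watermark(events, window_size=5, max_out_of_orderness=0)
tolerant = run_with_watermark(events, window_size=5, max_out_of_orderness=3)
print("strict:  ", strict)
print("tolerant:", tolerant)
```

With no tolerance the first window closes as soon as event 7 arrives, so event 4 is late. With a tolerance of 3 seconds the window stays open long enough to include it, at the cost of emitting the result later.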
  • 30. Basic Concepts & Patterns  Bounded Stream / Unbounded Stream  Operators & Transforms  Event Time & Processing Time  Event Delivery Guarantee  Windowing ( Fixed , Sliding, Session, Watermark )  States & Stateful Stream Processing  Joining Streams & Enrichment Pattern
  • 31. Joining Streams & Enrichment Pattern [Figure: a Temperature Sensor Stream and a Moisture Sensor Stream, each cut into windows, combined with a Window Inner Join and a Window Cross Join (CoGroup)]. Inner Join example: (Device-2, Temp: 28) joined with (Device-2, Moisture: 876) gives Device-2, Moisture: 876, Temp: 28. Learn More: Stream Join in Flink: from Discrete to Continuous - Xingcan Cui https://www.youtube.com/watch?v=3YVRluJUKIw Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
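A windowed inner join can be sketched in plain Python as below. The Device-2 readings (Temp 28, Moisture 876) come from the slide; the other readings, the timestamps, the 10-second window, and the helper name `windowed_inner_join` are assumptions made for the example.

```python
from collections import defaultdict


def windowed_inner_join(left, right, window_size):
    """Inner-join two streams of (timestamp, device_id, value) on device id,
    pairing only readings that fall into the same tumbling window."""
    right_index = defaultdict(list)
    for ts, device, value in right:
        right_index[(ts // window_size, device)].append(value)

    joined = []
    for ts, device, temp in left:
        for moisture in right_index.get((ts // window_size, device), []):
            joined.append({"device": device, "temp": temp, "moisture": moisture})
    return joined


# Hypothetical readings, except the Device-2 values shown on the slide.
temperature = [(1, "Device-1", 25), (3, "Device-2", 28), (12, "Device-3", 30)]
moisture = [(2, "Device-2", 876), (4, "Device-4", 500), (6, "Device-3", 640)]

joined = windowed_inner_join(temperature, moisture, window_size=10)
print(joined)
```

Device-3 has readings in both streams but in different windows, so it does not appear in the output; only keys present on both sides within the same window are joined.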
  • 32. States & Stateful Stream Processing [Diagram: Streams flowing through a chain of Stateless Operators and Stateful Operators; each Stateful Operator is attached to its own State] Learn More: Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger https://www.youtube.com/watch?v=DkNeyCW-eH0 Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
  • 33. States & Stateful Stream Processing: Brute Force Login Monitoring [Diagram: Read Login Attempts, then Windowing (Last 15 Minutes), then Group By IP, then Count, then Filter Above Threshold, then Enrich With Previous Breach and Update Last Breach (State: Last Threshold Breach, Nullable), then Sink Security Alerts] Learn More: Introduction to Stateful Stream Processing with Apache Flink - Robert Metzger https://www.youtube.com/watch?v=DkNeyCW-eH0 Webinar: Deep Dive on Apache Flink State - Seth Wiesman - https://www.youtube.com/watch?v=9GF8Hwqzwnk
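The monitoring flow on this slide can be approximated without any framework. The sketch below is plain Python and makes several assumptions: a fixed (tumbling) 15-minute window instead of whatever windowing strategy the real pipeline uses, a threshold of 3, invented IP addresses and timestamps, and a plain dictionary standing in for the keyed state that holds the last breach per IP.

```python
from collections import Counter

WINDOW_SECONDS = 15 * 60  # "Last 15 Minutes" from the slide
THRESHOLD = 3             # hypothetical alert threshold


def detect_breaches(failed_logins, window_start, last_breach):
    """Count failed logins per IP inside one window and alert above the threshold.

    `last_breach` maps ip -> end time of the previous breaching window; it is
    read to enrich the alert and then updated in place.
    """
    window_end = window_start + WINDOW_SECONDS
    counts = Counter(
        ip for ts, ip in failed_logins if window_start <= ts < window_end
    )
    alerts = []
    for ip, attempts in sorted(counts.items()):
        if attempts > THRESHOLD:
            alerts.append(
                {"ip": ip, "attempts": attempts, "previous_breach": last_breach.get(ip)}
            )
            last_breach[ip] = window_end  # update state for the next window
    return alerts


# Hypothetical (timestamp in seconds, ip) failed-login events.
failed_logins = (
    [(t, "10.0.0.1") for t in (10, 20, 30, 40, 50)]
    + [(t, "10.0.0.2") for t in (15, 600)]
    + [(t, "10.0.0.1") for t in (910, 920, 930, 940)]
)

state = {}
first = detect_breaches(failed_logins, 0, state)
second = detect_breaches(failed_logins, WINDOW_SECONDS, state)
print(first)
print(second)
```

The second alert for the same IP carries the earlier breach time, which is the part that needs state: the operator has to remember something across windows.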
  • 34. Group By Key / KeyBy [4Geeks] [Figure: a stream of Play, Heartbeat, and Seek events regrouped twice, once with Group By Action and once with Group By Customer] Learn More: Apache Flink Specifying Keys https://medium.com/big-data-processing/apache-flink-specifying-keys-81b3b651469 Branching & merging PCollections with Apache Beam - https://youtu.be/RYD40js20a4
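Keying a stream is conceptually the same as grouping by a key function. A minimal plain-Python sketch, with invented customers and a hypothetical `key_by` helper (in a real framework the groups would additionally be spread across workers):

```python
from collections import defaultdict


def key_by(events, key_fn):
    """Partition events into groups that share the same key."""
    groups = defaultdict(list)
    for event in events:
        groups[key_fn(event)].append(event)
    return dict(groups)


# Hypothetical (customer, action) playback events.
events = [
    ("customer-1", "Play"), ("customer-2", "Play"), ("customer-1", "Heartbeat"),
    ("customer-1", "Seek"), ("customer-2", "Heartbeat"), ("customer-2", "Seek"),
]

by_action = key_by(events, key_fn=lambda e: e[1])
by_customer = key_by(events, key_fn=lambda e: e[0])
print(by_action)
print(by_customer)
```

The same stream yields different partitions depending on the key: grouping by action suits per-action counts, while grouping by customer suits per-user logic such as sessions.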
  • 36.
  • 37. IP Monitoring ( Apache Beam )
  • 38. IP Monitoring ( Apache Beam )
  • 39. What You Just Saw Hidden Code Behind The Functions
  • 40. Order Enrichment With Customer Data [4Geeks] Apache Beam + Dataflow vs Spring Boot Customers Events (CDC) Orders Events Enriched Orders With Customer Data Enrich Order Data Code Repository & Slides @SorooshKh
  • 41. Order Enrichment Test Results (Insights). Apache Beam + Dataflow: 1 Dataflow Worker with Default Spec; 120k messages processed in 3 minutes (~700 msg/second); Higher Costs For Keeping Job Running. Spring Boot: Tested on Minimum Kubernetes Hardware on GCP; 120k messages processed in 5 minutes (~400 msg/second); Lower Costs For Keeping Job Running. Note: the insights provided above are not derived from a fully accurate benchmark.
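The throughput figures follow directly from the message counts and durations on the slide. A quick check in Python (the helper name `throughput` is just for this example):

```python
def throughput(messages, minutes):
    """Average messages per second over the whole run."""
    return messages / (minutes * 60)


beam_dataflow = throughput(120_000, 3)  # Apache Beam + Dataflow run
spring_boot = throughput(120_000, 5)    # Spring Boot run

print(f"Beam + Dataflow: {beam_dataflow:.0f} msg/s")
print(f"Spring Boot:     {spring_boot:.0f} msg/s")
```

The Dataflow run works out to roughly 667 msg/second, which the slide rounds to about 700; the Spring Boot run is exactly 400 msg/second.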
  • 42. Order Enrichment With Customer Data [4Geeks]: Spring Boot + Redis [Diagram: Customer CDC: Read, then Store Customer in Redis. Orders: Read, then Get Customer Information from Redis, then Enrich Order With Customer Data, then Sink EnrichedOrder]
  • 43. Order Enrichment With Customer Data [4Geeks]: Apache Beam + Dataflow [Diagram: Customer CDC: Read, then KeyBy CustomerID, then Update Customer in State (State: Customer). Orders: Read, then KeyBy CustomerID. Both keyed streams go into CoGroupByKey, then EnrichOrderWithCustomerData, then Sink EnrichedOrder. Example: Customer(123) becomes (123, Customer(123)); Order(1005, CustomerId=123) becomes (123, Order(1005, CustomerId=123)); the output is OrderWithCustomerData (Order, Customer)] Learn More: Stream Join in Flink: from Discrete to Continuous - Xingcan Cui https://www.youtube.com/watch?v=3YVRluJUKIw Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf - https://www.youtube.com/watch?v=cJS18iKLUIY
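The enrichment pattern in this diagram boils down to: keep the latest customer record per key in state, and look it up when an order with the same key arrives. The sketch below models that in plain Python with a dictionary as the state. It is not the Beam pipeline from the talk; the event shape, the customer name, and the second order are invented, while Customer 123 and Order 1005 mirror the example on the slide.

```python
def enrich_orders(events):
    """Process interleaved ("customer", {...}) and ("order", {...}) records in arrival order.

    Customer records update the per-customer state; each order is emitted
    together with whatever customer record is in state at that moment.
    """
    customer_state = {}  # customer id -> latest customer record
    enriched = []
    for kind, payload in events:
        if kind == "customer":
            customer_state[payload["id"]] = payload
        elif kind == "order":
            customer = customer_state.get(payload["customer_id"])
            enriched.append({"order": payload, "customer": customer})
    return enriched


# Hypothetical event stream; the name "Alice" and order 1006 are made up.
events = [
    ("customer", {"id": 123, "name": "Alice"}),
    ("order", {"id": 1005, "customer_id": 123}),
    ("order", {"id": 1006, "customer_id": 456}),
]

enriched = enrich_orders(events)
for item in enriched:
    print(item)
```

Order 1006 arrives before any record for customer 456 exists, so it is emitted with `None`. A real pipeline has to decide what to do in that case (buffer, retry, or emit partially enriched data), which is part of the data-consistency difficulty the talk lists among the drawbacks.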
  • 44. Why Should We Consider It Benefits, Drawbacks & Considerations
  • 45. Benefits & Drawbacks of Stream Processing Frameworks. Benefits: Fast & High-Throughput; Easy to Scale; Exactly Once Processing / Fault Tolerant; Customizable; Advanced features at scale: Windowing, Watermarks, Stateful Functions and more. Drawbacks: ✖ Complexity ✖ Implementation & Maintenance ✖ Testing & Debugging are challenging ✖ Changing the data pipelines is hard ✖ Error handling is not simple ✖ Data consistency is not easy
  • 46. Stream Data Integration vs Stream Analytics. Stream Data Integration (Stream ETL): Reading Input, Map, Filter, Simple Enrich. Stream Analytics: Stateful Processing, Pattern Matching, Complex Calculations / Aggregations. Learn More: Stream Processing – Concepts and Frameworks (Guido Schmutz, Switzerland) https://www.youtube.com/watch?v=vFshGQ2ndeg | https://www.slideshare.net/gschmutz/introduction-to-stream-processing-132881199
  • 47. Considerations. Learning Curve: Stream Data Integration 1 – 2 Weeks; Stream Analytics 2 – 3 Months. Project Timeline: 3 – 4 Engineers, 4 – 6 Months from 0 to Stability. Hard to Find Developers. Limited Docs/Resources. Community Support. Costs: Cloud Providers Help a Bit. Learn More (Important): Apache Flink Worst Practices - Konstantin Knauf - https://www.youtube.com/watch?v=F7HQd3KX2TQ
  • 48. Stream Processing When should we consider it in our solutions?
  • 49. DECISION MAKING FACTORS Requirements (FRs + NFRs + Roadmap) Development Cost (Capex) Maintenance Cost (Opex) Complexity Limitations Industry Best Practices
  • 50. When should we consider it in our solutions? Case: Stream Data Integration Context / Conditions
  • 51. When should we consider it in our solutions? Case: Stream Data Integration Context / Conditions • Events / second < 1K • Experience of Stream processing : No • Business queries are changing frequently • Time to market : Very tight • 3 – 4 Mid-Senior Developers Learn More Apache Flink Worst Practices - Konstantin Knauf https://www.youtube.com/watch?v=F7HQd3KX2TQ Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
  • 52. When should we consider it in our solutions? Learn More Apache Flink Worst Practices - Konstantin Knauf https://www.youtube.com/watch?v=F7HQd3KX2TQ Context / Conditions Case: Stream Analytics • Events / second > 10K • Experience of Stream processing : No • Business queries are clear and not changing frequently • Real time/near real time insights are crucial ? Yes • 3 – 4 Mid-Senior Developers Note: The cases incorporated within this presentation are designed to demonstrate the reasoning process.
  • 53. Quick Look On Stream Processing Use Cases
  • 54. Use Cases. Video Streaming: Playback Analytics. IoT: GPS Tracking. Telecom: Billing / Charging System. Finance: Fraud Detection. E-Commerce: User Analytics. Gaming Industry: Anti-Cheat.
  • 55. Video Platforms Use cases Playback Analytics Content Provider Shares Pay Per Minute Fraud Detection Personalized Recommendation Learn More Massive Scale Data Processing at Netflix using Flink - Snehal Nagmote & Pallavi Phadnis youtube.com/watch?v=lC0d3gAPXaI Custom, Complex Windows at Scale using Apache Flink - Matt Zimmer (Netflix) youtube.com/watch?v=XUvqnsWm8yo SF 2017: Monal Daxini - Stream Processing with Flink at Netflix youtube.com/watch?v=sPB8w-YXX1s Real-time Processing with Flink for Machine Learning at Netflix - Elliot Chow youtube.com/watch?v=o4C7TDneH00
  • 56. Gaming Industry Use cases Learn More Kafka and Big Data Streaming Use Cases in the Gaming Industry https://www.confluent.io/online-talks/kafka-and-big-data-streaming-use-cases-in-the-gaming-industry/ Let's Play Flink – Fun with Streaming in a Gaming Company https://www.youtube.com/watch?v=8BNKEmt47UM Game Telemetry Analytics Rewards (In-Game) Live In-Game Changes (NPC, Quests, .. ) IoT Integration Loyalty Service Anti-Cheat Chat Service Monitoring Match Making Payment Fraud Detection In-Game Recommendation Advertiseme AI Training Payment
  • 57. Application Analytics Use cases Learn More Implementing Google Analytics: A Case Study - Making Sense of Stream Processing by Martin Kleppmann https://www.oreilly.com/library/view/making-sense-of/9781492042563/ch01.html Martin Kleppmann — Event Sourcing and Stream Processing at Scale https://www.youtube.com/watch?v=avi-TZI9t2I Singles Day 2018: Data in a Flink of an eye https://www.ververica.com/blog/singles-day-2018-data-in-a-flink-of-an-eye
  • 58. Learn More 7 Reasons to use Apache Flink for your IoT Project https://www.youtube.com/watch?v=Q0LBTmT4W9o Fleet management / GPS Tracking Anomaly detection Smart home automation Energy management Environmental monitoring Predictive maintenance Self-Driving Cars Internet Of Things Use cases
  • 59. Billing Network Optimization Security Fraud Detection Learn More Maciej Próchniak - Stream processing in telco - case study based on Apache Flink & TouK Nussknacker @ Devoxx Poland https://www.youtube.com/watch?v=WLfEB__fM-4 Telecommunication Use cases
  • 60. Financial Systems Use cases: Fraud detection; Algorithmic trading; Risk management; Real-time portfolio analysis; Customer analytics; Regulatory compliance; Profit & Loss Insights. Learn More: Real Time Fraud Detection with Stateful Functions https://www.youtube.com/watch?v=RxDlksbsdQ0 Fast Data at ING - Martijn Visser & Bas Geerdink (ING) https://www.youtube.com/watch?v=e-_6gijUGAw Stream ING Models – Real time model deployment of ML Capabilities https://www.youtube.com/watch?v=Do7C4UJyWCM
  • 61. Stream Processing How to start learning ?
  • 62. How to start learning? 1. Google Cloud Apache Beam, Debi Cabrera: https://youtu.be/65lmwL7rSy4 2. Apache Beam Step By Step, Atul Raina: https://youtube.com/playlist?list=PL8bzd7vku-WhVHzJgmXoCxx3aB4PxTQLP 3. Beam Summit & Flink Forward: https://beamsummit.org/ and https://www.flink-forward.org/ 4. Official Documentation: https://beam.apache.org/documentation/ and https://nightlies.apache.org/flink/flink-docs-stable/ IMPORTANT NOTE: Creating a Stream Processing service isn't as straightforward as crafting CRUD APIs. Relying solely on Google, development tools, Stack Overflow, and copy-pasting won't get you far. It's crucial to dedicate ample time to thoroughly learn and understand the underlying concepts.
  • 63. Slides & Code Repository. Any questions? Send me a message on Twitter or LinkedIn. Thanks for your attention! @SorooshKh linkedin.com/in/sorooshkhodami/ Please Rate This Session And Share Your Feedback

Editor's Notes

  1. What is Stream Processing? Why Should We Learn It?
  2. Developer by day, furniture assembler by night. I learned that using the right tool is the most important part of assembling.
  3. Question 1: Who has heard of these technologies a lot? Question 2: Who has used these technologies in production? Every day that we wake up, we hear about some new Apache technologies...
  4. Okay, not for me. I'm not a fan of complex definitions. Let's get to a simple definition.
  5. Reading data from multiple sources; processing the data itself (the payload itself), individually or joined with other data; sending it out to another system.
  6. Event processing is a technique that focuses on listening for specific events or patterns of events within a system, enabling decision-making and triggering actions based on the information contained in the events.
  7. Services communicate through events.
  8. We need to chunk the data to make it feasible to process
  9. Bounded Stream example: processing the list of last month's train check-in and check-out records for analysis purposes.
  10. 1 minute: You are watching Netflix on an airplane or in the subway. Your actions will be synced afterward.
  11. We have three types of guarantees: no guarantee (at most once), at-least-once delivery, and exactly-once delivery. Flink -> Checkpointing. Don’t forget to check the Learn More links.
  12. OK, wait. Hold your horses. So you gave a lot of definitions; what is the use case?
  13. Transforms are Filter, Map, Aggregate, Join, and Custom Functions.
  14. 30s – 1 Minute
  15. 1 minute
  16. 1 minute. We cannot carry two watermelons with one hand. We need to chunk the data to make it feasible to process. OK, right, we should divide. But how are we going to divide the data?
  17. It’s very similar to a shuttle, isn’t it?
  18. Let’s imagine that we are receiving request logs
  19. Watching a video in the subway or during a flight. A phone call. How can stream processing do this? A session window is based on Group By Key.
  20. 1 minute: There is too much that we need to learn, so we make it easier with examples! How can we do it in our current applications, without stream processing frameworks?
  21. Sometimes we need to store some data and later look back at the stored data, similar to what we used to do with Redis or a database.
  22. Key By is the most common transformation. It partitions the data stream, similar to GROUP BY in SQL. Sometimes we need to group some of the data together. Sometimes it may cause a network shuffle that partitions the stream across different nodes.
  23. 5 minute
  24. // Read the login messages from a Pub/Sub subscription.
      val failedLogins = p.apply("Read PubSub Messages", readFromPubSubSubscription())

      // Window the stream, key each message by IP address, and count attempts per IP.
      val ipCounts = failedLogins
          .apply("Window", failedLoginWindowingStrategy())
          .apply("Map to KV <IP,MSG>", mapToKVIPAddr())
          .apply("Group by Key IP-Addr", GroupByKey.create())
          .apply("Count per IP", countNumberOfAttempts())

      // Keep only the IPs above the threshold and enrich them with earlier breaches.
      val alerts = ipCounts
          .apply("Filter by Threshold", isCountOfAttempAboveThresholdFilter())
          .apply("Enrich with Old Breaches Last Month", enrichWithOldBreachesLastMonth())

      // Publish the resulting alerts to a Pub/Sub topic.
      alerts.apply("Write Alerts to PubSub", publishToPubSubTopic())
  26. Stream processing applications are not really easy, especially once you start to have stateful functions.
  27. Complexity: handling out-of-order events, windowing, and state management; increased complexity compared to batch processing. Implementation and Maintenance: expertise required in distributed systems, fault tolerance, and specific stream processing frameworks; maintenance effort for business logic and data flow changes. Testing and Debugging: complex testing scenarios and simulation of various events and failures; difficulties in debugging due to the real-time and distributed nature of processing. Error Handling: managing errors and edge cases can be challenging; recovery mechanisms and failure scenarios require careful consideration. Data Consistency: ensuring exactly-once processing and data consistency can be challenging; requires robust handling of distributed systems and failures. Learning Curve and Project Timeline: 2-3 months for a mid-level developer to become proficient; 4-6 months for a project to reach stability from start. Resource Intensiveness: real-time processing may consume more resources than batch processing; cloud services can help mitigate infrastructure costs.
  28. In short, Stream Data Integration is Map, Transform, Filter, Enrich. Stream Analytics also uses States, Windowing, State Management, and Event Patterns.
  29. Learning Curve: Stream Data Integration: 1 – 2 weeks; Stream Analytics: 2 – 3 months. For a project that is not very basic, expect 2-4 months from project initiation to reach stability. It’s not easy to find developers with extensive stream processing experience. For most stream processing frameworks, there is not much step-by-step documentation, and there are few Stack Overflow questions with working answers. You need to connect the dots yourself. Decent community support is available, but it is not as extensive as for Spring or other popular frameworks. Stream processing can be resource-intensive (cloud services help us here).
  30. Case Stream Data Integration (Map, Filter, Basic Enrichment): You are not getting much out of using stream processing frameworks. You can achieve almost the same results with other tools, with the possibility to scale up. Case Stream Analytics: You should start investing in your stream processing solution and building a team, with the help of professional consultants to lead, facilitate, and boost the process. In the meantime, you can use other available tools to support part of your business requirements (like BigQuery or monitoring tools).
  32. Case Stream Data Integration (Real time ETL): You are not getting much out of using stream processing frameworks. You can achieve almost the same results with other tools, with the possibility to scale up. Case Stream Analytics: You should start investing in your stream processing solution and building a team, with the help of professional consultants to lead, facilitate, and boost the process. In the meantime, you can use other available tools to support part of your business requirements (like BigQuery or monitoring tools).
  33. Anomaly detection: Stream processing can help identify unusual patterns or behaviors in IoT device data, enabling early detection of potential issues or failures. For example, it can be used to monitor sensor data from industrial equipment or vehicles to detect anomalies that may indicate a malfunction or maintenance need. Smart home automation: In a smart home environment, stream processing can be used to analyze data from various sensors and devices to trigger automated actions, such as adjusting lighting or temperature based on occupancy, time of day, or user preferences. Fleet management: Stream processing can analyze data from GPS trackers, vehicle sensors, and other devices in real-time to optimize fleet operations. This may include route planning, vehicle maintenance scheduling, fuel efficiency analysis, or driver behavior monitoring. Environmental monitoring: IoT devices can be deployed to monitor various environmental parameters, such as air quality, water levels, or temperature. Stream processing can be used to analyze this data in real-time, enabling rapid response to environmental changes or potential hazards. Energy management: Stream processing can be used to analyze energy consumption data from smart meters, IoT devices, and sensors in real-time, helping to optimize energy usage and reduce costs. This can be applied to smart grids, microgrids, or individual buildings. Predictive maintenance: By analyzing IoT sensor data in real-time, stream processing can help predict when a machine or equipment may require maintenance or is likely to fail. This allows for proactive maintenance scheduling, reducing downtime and increasing operational efficiency.