Complex Event Processing:
Use Cases & FlinkCEP Library
Gordon Tai - @tzulitai
July 19, 2016 @ Flink.tw Meetup
00 This Talk is About ...
● How FlinkCEP got me interested in Flink
● CEP use cases & applications
○ Use case study #1: tracking an order process
○ Use case study #2: advertisement targeting
● A look at the API
00 Me & Flink
● 戴資力 (Gordon)
● Data Engineer @ VMFive
● Java, Scala
● Using Flink as a user on VMFive's adtech platform
● Enjoys working on distributed computing systems
● Works on Flink in his free time
● Contributor: Flink Kinesis Consumer connector
Tale of a data engineer trying to figure out how to build a streaming analytics pipeline ...
1. First lesson: non-trivial streaming applications are never stateless
2. Second lesson: stateful streaming topologies are a pain
1. Exactly-once state updates on failures, for correctness
2. Idempotence w.r.t. external state stores
3. Out-of-order events
4. Aggregating on time windows
5. Rapid application development
Applications I was working on: streaming aggregation for reporting, and conversion patterns for alerting
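Items 3 and 4 interact: a windowed count is only correct if each event lands in the window of its own timestamp, not of its arrival time. The following plain-Scala sketch (no Flink involved; the `Impression` type, its field names and the window size are hypothetical) assigns events to tumbling event-time windows and counts them per campaign:

```scala
// Hypothetical raw event: a campaign impression with its event time in milliseconds.
final case class Impression(campaignId: String, tStamp: Long)

object TumblingCounts {
  // Start of the tumbling window containing `ts`; floorMod keeps negative timestamps correct.
  def windowStart(ts: Long, sizeMs: Long): Long = ts - Math.floorMod(ts, sizeMs)

  // Count impressions per (window start, campaign). The result depends only on
  // event timestamps, not on arrival order.
  def count(events: Seq[Impression], sizeMs: Long): Map[(Long, String), Int] =
    events
      .groupBy(e => (windowStart(e.tStamp, sizeMs), e.campaignId))
      .map { case (key, es) => key -> es.size }
}
```

Because grouping uses `tStamp` alone, shuffling the input yields the same map; a streaming engine still has to decide when a window is complete, which is the job of watermarks.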
TL;DR. It isn't fun. At all.
● Reference: "Building a Stream Processing System for Playable Ads Data at VMFive" @ HadoopCon 2015
● Redis was used as an external state store
● All state updates had to be idempotent
● Exactly-once & replay on failover were implemented with Storm's tuple-acking mechanism
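Why idempotence matters under replay can be shown with a small simulation. This is a sketch, not the VMFive implementation: `KvStore` is a hypothetical in-memory stand-in for Redis, and the replay simply re-delivers the tail of the input, as an acking-based scheme may do after a failure. A counter increment (compare Redis `INCR`) double-counts replayed events, while adding event IDs to a set (compare `SADD`) does not:

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for an external key-value store such as Redis.
final class KvStore {
  val counters = mutable.Map.empty[String, Long]
  val sets = mutable.Map.empty[String, mutable.Set[String]]

  // Non-idempotent: applying the same event twice increments twice.
  def incr(key: String): Unit = counters(key) = counters.getOrElse(key, 0L) + 1L

  // Idempotent: adding the same event ID twice leaves the set unchanged.
  def addMember(key: String, member: String): Unit =
    sets.getOrElseUpdate(key, mutable.Set.empty[String]) += member

  def cardinality(key: String): Int = sets.get(key).map(_.size).getOrElse(0)
}

// Hypothetical raw event with a unique ID.
final case class Click(eventId: String, campaignId: String)

object Replay {
  // Process all events, then re-deliver the last `replayed` ones (simulated failover).
  def run(events: Seq[Click], replayed: Int): KvStore = {
    val store = new KvStore
    val delivered = events ++ events.takeRight(replayed)
    delivered.foreach { c =>
      store.incr(s"clicks:${c.campaignId}")
      store.addMember(s"clickIds:${c.campaignId}", c.eventId)
    }
    store
  }
}
```

The price of the idempotent variant is keeping one entry per event ID, which is part of why hand-rolled exactly-once on top of an external store gets painful.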
01 Complex Event Processing
● Generate derived events when a specified pattern of raw events occurs in a data stream
○ if A and then B → infer complex event C
● Goal: identify meaningful event patterns and respond to them as quickly as possible
● Places heavy demands on the stream processor: robust state handling & support for out-of-order events, while keeping latency low and throughput high
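The "if A and then B → infer complex event C" rule can be sketched in a few lines of plain Scala. This illustrates the semantics only: the object name and signature are hypothetical, it scans a finite sequence in quadratic time, and it has no time window. FlinkCEP itself evaluates patterns incrementally, with an NFA over Flink-managed state, as events arrive.

```scala
object FollowedBy {
  // For each event matching `isA`, pair it with the first later event matching `isB`
  // and derive a complex event from the pair. A events with no later B produce nothing.
  def infer[E, C](events: Seq[E])(isA: E => Boolean, isB: E => Boolean)(derive: (E, E) => C): Seq[C] =
    events.zipWithIndex.collect {
      case (a, i) if isA(a) =>
        events.drop(i + 1).find(isB).map(b => derive(a, b))
    }.flatten
}
```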
02 Apache Flink CEP Library
● Built on top of Flink's DataStream API
● Lets users define patterns, apply them to event streams, and generate new event streams from the matches
● Leverages Flink's exactly-once semantics to guarantee correct results
eCommerce Order Process Tracking
Use case study #1
** Note: the illustrations & content in this section are from Data Artisans' presentation "Streaming Analytics & CEP - Two Sides of the Same Coin?"
03 Order Tracking Data Model
● Order(orderId, tStamp, “received”) extends Event
● Shipment(orderId, tStamp, “shipped”) extends Event
● Delivery(orderId, tStamp, “delivered”) extends Event
04 Real-Time Warnings for SLAs
New inferred events:
● ProcessSucc(orderId, tStamp, duration)
● ProcessWarn(orderId, tStamp)
● DeliverySucc(orderId, tStamp, duration)
● DeliveryWarn(orderId, tStamp)
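05 Glimpse at the FlinkCEP API
// Scala API as used at the time of this talk (2016); `input: DataStream[Event]` is defined on the next slide
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// An Order followed by a "shipped" event, both within one hour
val processingPattern = Pattern
  .begin[Event]("orderReceived").subtype(classOf[Order])
  .followedBy("orderShipped").where(_.status == "shipped")
  .within(Time.hours(1))

// Keying by orderId matches the pattern independently per order
val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler: partial match pP expired at `timestamp`
      ProcessWarn(pP("orderReceived").orderId, timestamp)
  } {
    fP => // Select function: complete match fP
      ProcessSucc(
        fP("orderReceived").orderId,
        fP("orderShipped").tStamp,
        fP("orderShipped").tStamp - fP("orderReceived").tStamp)
  }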
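To make the semantics of `processingPattern` and the two `select` callbacks concrete, here is a batch approximation in plain Scala. It is an assumption-laden sketch rather than FlinkCEP's behavior: it sees all events at once instead of reacting to watermarks, treats the one-hour bound as inclusive, and stamps each `ProcessWarn` at order time plus the window, which is where this sketch assumes the timeout fires. The data model follows the earlier slides; `ProcessingSla.evaluate` is a hypothetical name.

```scala
// Raw events, mirroring the slide's data model; each subtype fixes its status string.
sealed trait Event { def orderId: String; def tStamp: Long; def status: String }
final case class Order(orderId: String, tStamp: Long) extends Event { val status = "received" }
final case class Shipment(orderId: String, tStamp: Long) extends Event { val status = "shipped" }
final case class Delivery(orderId: String, tStamp: Long) extends Event { val status = "delivered" }

// Inferred events.
final case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
final case class ProcessWarn(orderId: String, tStamp: Long)

object ProcessingSla {
  val OneHourMs: Long = 60L * 60L * 1000L

  // Per orderId, an Order followed by a "shipped" event within `withinMs` yields
  // ProcessSucc(id, ship time, duration); otherwise the partial match "times out"
  // and yields ProcessWarn stamped at order time + withinMs (an assumption).
  def evaluate(events: Seq[Event], withinMs: Long = OneHourMs): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (id, evs) =>
      val sorted = evs.sortBy(_.tStamp)
      sorted.collect { case o: Order => o }.map { o =>
        val shipped = sorted.find { e =>
          e.status == "shipped" && e.tStamp >= o.tStamp && e.tStamp - o.tStamp <= withinMs
        }
        val result: Either[ProcessWarn, ProcessSucc] = shipped match {
          case Some(s) => Right(ProcessSucc(id, s.tStamp, s.tStamp - o.tStamp))
          case None    => Left(ProcessWarn(id, o.tStamp + withinMs))
        }
        result
      }
    }
}
```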
06 Glimpse at the FlinkCEP API
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
val input: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(...))

val processingPattern = Pattern.begin(...)...
val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)
val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select(...)

procResult.addSink(new RedisSink(...))
// .addSink(new FlinkKafkaProducer09(...))
// .addSink(new ElasticsearchSink(...))
// .map(new MapFunction{...})
// … anything you'd like to continue to do with the inferred event stream

env.execute()
07 Glimpse at the FlinkCEP API
val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
// Use event time: timestamps come from the events themselves, not from arrival
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val input: DataStream[Event] = env
  .addSource(new FlinkKafkaConsumer09(...))
  // CustomExtractor (user code) extracts event timestamps and emits watermarks
  .assignTimestampsAndWatermarks(new CustomExtractor)

val processingPattern = Pattern.begin(...)...
val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)
val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select(...)

procResult.addSink(new RedisSink(...))
// .addSink(new FlinkKafkaProducer09(...))
// .addSink(new ElasticsearchSink(...))
// .map(new MapFunction{...})

env.execute()
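`CustomExtractor` is user code that the slide does not show; in Flink it would typically extend a bounded-out-of-orderness assigner such as `BoundedOutOfOrdernessTimestampExtractor`, whose watermark trails the largest timestamp seen so far by a fixed delay. The plain-Scala sketch below imitates that idea to show why it tames out-of-order input: events are buffered and released in timestamp order once the watermark has passed them. `Ev` and `Reorderer` are hypothetical names, and events already behind the watermark are simply released immediately here, whereas a real pipeline needs an explicit policy for such late data.

```scala
import scala.collection.mutable

// Hypothetical timestamped event (stands in for `Event` in the slides).
final case class Ev(id: String, tStamp: Long)

// Watermark = highest timestamp seen so far minus `maxLatenessMs`.
final class Reorderer(maxLatenessMs: Long) {
  private var maxSeen = Long.MinValue
  // Reverse ordering turns Scala's max-heap into a min-heap on timestamps.
  private implicit val byTimeAsc: Ordering[Ev] = Ordering.by[Ev, Long](_.tStamp).reverse
  private val buffer = mutable.PriorityQueue.empty[Ev]

  def watermark: Long = if (maxSeen == Long.MinValue) Long.MinValue else maxSeen - maxLatenessMs

  // Accept one event; return the buffered events now at or behind the watermark, oldest first.
  def onEvent(e: Ev): Seq[Ev] = {
    maxSeen = math.max(maxSeen, e.tStamp)
    buffer.enqueue(e)
    val ready = Seq.newBuilder[Ev]
    while (buffer.nonEmpty && buffer.head.tStamp <= watermark) ready += buffer.dequeue()
    ready.result()
  }
}
```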
08 Combining Stream SQL & CEP
● Further reading: Streaming Analytics & CEP - Two Sides of the Same Coin?
Ad Targeting based on User Attribution
Use case study #2
** Note: the content in this section is heavily based on my experience at VMFive
09 Ad Targeting 101
● What an ad server does, in a nutshell → for each incoming ad request, determine an appropriate advertisement, chosen from an advertisement campaign pool
[Diagram: (1) request advertisement → AdServer; (2) AdServer returns appropriate advertisement info from the Campaign Pool]
● "Appropriate": fulfills the targeting rules of its campaign
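A request-time targeting check is essentially a predicate per campaign. The sketch below (hypothetical `AdRequest` and `Campaign` types and a first-match selection policy, neither taken from VMFive's ad server) covers only the fundamental rule types, which need nothing beyond the request itself:

```scala
// Hypothetical request shape; field names are illustrative only.
final case class AdRequest(userId: String, city: String, deviceType: String)

// A campaign's targeting rules are predicates over the incoming request.
final case class Campaign(id: String, rules: Seq[AdRequest => Boolean]) {
  def fulfilledBy(req: AdRequest): Boolean = rules.forall(rule => rule(req))
}

object AdServer {
  // Return the first campaign in the pool whose rules all hold for this request.
  def select(pool: Seq[Campaign], req: AdRequest): Option[Campaign] =
    pool.find(_.fulfilledBy(req))
}
```

A real ad server would rank eligible campaigns rather than take the first match; `find` is just the simplest policy to illustrate.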
11 Ad Targeting Rule Types
● Fundamental campaign targeting rule types:
○ Target users' current location, e.g. users in Taipei
○ Target a specific user device type, e.g. tablet or phone
○ ...
→ Do not require event aggregation: these rules can be matched simply from the information available at request time
● Advanced campaign targeting rule types:
○ Target users' past location trace, e.g. in Taipei for the past 7 days
○ Target users entering / departing countries
○ Target users with a specific attribution, e.g. viewed
○ ...
→ Require aggregation of historical events, and aggregating at request time would be far too slow
→ Require inferring complex events from patterns in the raw event stream → CEP to the rescue!
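The entering / departing rule shows why the advanced types need CEP: no single request says that a user changed country, but two consecutive sightings of the same user do. A plain-Scala sketch over a finite, per-user event history (the `Seen` event and the crossing types are hypothetical, and how locations are derived is out of scope):

```scala
// Hypothetical location sighting of a user, e.g. derived from an ad request.
final case class Seen(userId: String, tStamp: Long, country: String)

// Inferred complex events for the "entering / departing countries" rule type.
sealed trait Crossing { def userId: String; def tStamp: Long }
final case class Departed(userId: String, tStamp: Long, country: String) extends Crossing
final case class Entered(userId: String, tStamp: Long, country: String) extends Crossing

object CountryCrossings {
  // Per user, compare consecutive sightings in event-time order; a change of country
  // yields Departed(old country) and Entered(new country), stamped at the later sighting.
  def infer(events: Seq[Seen]): Seq[Crossing] =
    events.groupBy(_.userId).toSeq.sortBy(_._1).flatMap { case (_, evs) =>
      evs.sortBy(_.tStamp).sliding(2).toSeq.flatMap {
        case Seq(prev, next) if prev.country != next.country =>
          Seq[Crossing](Departed(next.userId, next.tStamp, prev.country), Entered(next.userId, next.tStamp, next.country))
        case _ => Seq.empty[Crossing]
      }
    }
}
```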
12 Basic Ad Targeting Architecture
[Diagram, shown in three build steps. Components: AdServer, WebService, Ad Targeter, Event Logger, Campaign Pool, Targeting Cache, Data Warehouse, Raw Logs, Event Bus Service, and reporting & analytics services (batch, streaming, ...). Labels: register ad campaigns; (1) initial connection; (2) fetch ad; (3) event tracking]
13 Advanced Ad Targeting Architecture
[Diagram, shown in four build steps. Extends the basic architecture with a RulesService, a CEP component, a pool of CEP-Rule Templates (Entry / Depart, User Attribution, ...), and a Rule Fulfillment Cache (Redis)]
● Rules side:
(1) Inject a rule to start matching on the event stream
(2) Return a Rule ID
(3) Submit the CEP topology
(4) When the CEP pattern is fulfilled, write to the cache: UID → RuleID
(5) Look up whether a UID has fulfilled a RuleID
● Ad-serving side:
(1) Register a rule for a campaign
(2) Look up whether the user fulfills the rule
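Steps (4) and (5) reduce request-time targeting to a membership lookup. The sketch below uses an in-memory map as a stand-in for the Redis-backed Rule Fulfillment Cache; in Redis, one plausible layout would be a set per UID written with `SADD` and queried with `SISMEMBER`, but the slides do not specify the layout. All class and method names are hypothetical.

```scala
import scala.collection.mutable

// In-memory stand-in for the Rule Fulfillment Cache: UID -> fulfilled rule IDs.
final class RuleFulfillmentCache {
  private val fulfilled = mutable.Map.empty[String, mutable.Set[String]]

  // Step (4): the CEP job records that `uid` matched the pattern behind `ruleId`.
  def markFulfilled(uid: String, ruleId: String): Unit =
    fulfilled.getOrElseUpdate(uid, mutable.Set.empty[String]) += ruleId

  // Step (5): a single key lookup at request time, no aggregation.
  def hasFulfilled(uid: String, ruleId: String): Boolean =
    fulfilled.get(uid).exists(_.contains(ruleId))
}

// A campaign lists the rule IDs registered for it.
final case class TargetedCampaign(id: String, requiredRules: Set[String])

object AdTargeter {
  // IDs of campaigns whose required rules are all fulfilled by this user.
  def eligible(cache: RuleFulfillmentCache, uid: String, pool: Seq[TargetedCampaign]): Seq[String] =
    pool.filter(c => c.requiredRules.forall(r => cache.hasFulfilled(uid, r))).map(_.id)
}
```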
14 Some Discussion
● Why a fixed pool of CEP-Rule Templates?
○ Prevents rogue rules from being matched, e.g. rules that would consume too many resources
○ It's a lot less work and complication ;)
● It would be very nice to have a freestyle rule service
○ Pattern matching across the different event streams of an organization
○ For BI, analysts will want to monitor arbitrary complex events / patterns
● Further study of a similar use case: King's RBEA
○ RBEA: Rule-Based Event Aggregator
○ https://techblog.king.com/rbea-scalable-real-time-analytics-king/
○ http://data-artisans.com/rbea-scalable-real-time-analytics-at-king/
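A fixed template pool can be enforced with a simple registry that refuses unknown templates and unexpected parameters before anything is submitted. This is a hypothetical sketch of steps (1) and (2) of the RulesService: the template names and the `rule-N` ID format are made up, and submitting the CEP topology (step (3)) is left out.

```scala
// A vetted template and the exact parameter names it accepts.
final case class RuleTemplate(name: String, params: Set[String])

// A registered, parameterized instance of a template.
final case class RuleInstance(ruleId: String, template: String, args: Map[String, String])

final class RulesService(templates: Seq[RuleTemplate]) {
  private val byName = templates.map(t => t.name -> t).toMap
  private var nextId = 0
  private var registered = Vector.empty[RuleInstance]

  // Validate and register a rule; Right(rule ID) on success, Left(reason) otherwise.
  def inject(template: String, args: Map[String, String]): Either[String, String] =
    byName.get(template) match {
      case None => Left(s"unknown template: $template")
      case Some(t) if args.keySet != t.params =>
        Left(s"template $template expects parameters ${t.params.toSeq.sorted.mkString(", ")}")
      case Some(t) =>
        nextId += 1
        val id = s"rule-$nextId"
        registered :+= RuleInstance(id, t.name, args)
        Right(id)
    }

  def rules: Seq[RuleInstance] = registered
}
```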
Closing
XX Closing
● Complex Event Processing is an emerging way to draw insights from data streams, and it demands exactly-once semantics from the underlying stream processor to stay correct
● FlinkCEP builds on the DataStream API to make this possible and easy
