Complex Event Processing: Use Cases & FlinkCEP Library (Flink.tw Meetup 2016/07/19)

Presenter: Gordon Tai
Video Link: https://www.youtube.com/watch?v=Uho24uN1YZQ
Flink.tw Meetup Event (2016/07/19):
"Stream Processing with Apache Flink w/ Flink PMC Robert Metzger"

  1. Complex Event Processing: Use Cases & FlinkCEP Library
      Gordon Tai (@tzulitai), July 19, 2016 @ Flink.tw Meetup
  2. 00 This Talk is About ...
      ● How FlinkCEP got me interested in Flink
      ● CEP use cases & applications
         ○ Use case study #1: tracking an order process
         ○ Use case study #2: advertisement targeting
      ● A look at the API
  3. 00 Me & Flink
      ● 戴資力 (Gordon)
      ● Data Engineer @ VMFive
      ● Java, Scala
      ● Using Flink as a user on VMFive's Adtech platform
      ● Enjoys working on distributed computing systems
      ● Works on Flink in free time
      ● Contributor: Flink Kinesis Consumer connector
  4. Tale of a data engineer trying to figure out how to build a streaming analytics pipeline ...
      1. First lesson: non-trivial streaming applications are never stateless
      2. Second lesson: stateful streaming topologies are a pain
  5. Applications I was working on: streaming aggregation for reporting & conversion patterns for alerting
      1. Exactly-once state updates on failures, for correctness
      2. Idempotence w.r.t. external state stores
      3. Out-of-order events
      4. Aggregating on time windows
      5. Rapid application development
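To make "aggregating on time windows" concrete, here is a minimal plain-Scala sketch (not Flink code) of a tumbling event-time window count. The `Impression` type and the per-campaign key are hypothetical. Each event is assigned to the window starting at `ts - ts % size`, so the result does not depend on arrival order; what the sketch leaves out is the hard streaming part, i.e. deciding when a window is complete and keeping its state fault-tolerant.

```scala
// Hypothetical input event: an ad impression for a campaign at an event-time timestamp (ms).
case class Impression(campaignId: String, ts: Long)

object TumblingCount {
  // Start of the tumbling window that contains `ts` (assumes ts >= 0).
  def windowStart(ts: Long, sizeMs: Long): Long = ts - (ts % sizeMs)

  // Counts impressions per (campaignId, windowStart). Arrival order does not matter,
  // because each event is assigned to a window purely by its own timestamp.
  def counts(events: Seq[Impression], sizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy(e => (e.campaignId, windowStart(e.ts, sizeMs)))
      .map { case (key, evs) => key -> evs.size }
}
```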
  6. TL;DR. It isn't fun. At all.
      ● Reference: Building a Stream Processing System for Playable Ads Data at VMFive @ HadoopCon 2015
      ● Redis was used as an external state store
      ● All state updates had to be idempotent
      ● Exactly-once & replay on failover implemented with Storm's tuple acking mechanism
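A minimal sketch of what "all state updates had to be idempotent" can mean, assuming every event carries a unique ID: the store remembers which IDs it has already applied, so an event replayed after a failover does not change the state twice. The class name is hypothetical, and an in-memory map stands in for Redis.

```scala
import scala.collection.mutable

// In-memory stand-in for an external store such as Redis (assumption for illustration).
// Each update carries a unique event ID; replaying the same event leaves the state unchanged.
class IdempotentCounterStore {
  private val counts  = mutable.Map.empty[String, Long]
  private val applied = mutable.Set.empty[String] // event IDs already applied

  /** Adds `delta` to `key` unless `eventId` was already applied. Returns true if applied. */
  def increment(eventId: String, key: String, delta: Long): Boolean =
    if (applied.contains(eventId)) false
    else {
      counts(key) = counts.getOrElse(key, 0L) + delta
      applied += eventId
      true
    }

  def get(key: String): Long = counts.getOrElse(key, 0L)
}
```

Against a real external store, the "already applied?" check and the update would have to be atomic, and the set of applied IDs needs some form of expiry; both are left out of this single-threaded sketch.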
  7. 01 Complex Event Processing
      ● Generate derived events when a specified pattern of raw events occurs in a data stream
         ○ if A and then B → infer complex event C
      ● Goal: identify meaningful event patterns and respond to them as quickly as possible
      ● Demanding on the stream processor: robust state handling & out-of-order event support, while keeping latency low and throughput high
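The "if A and then B → infer complex event C" idea can be sketched in plain Scala. The `Raw` and `Complex` types are hypothetical, and this is neither the FlinkCEP API nor its internal matching machinery. Per key, the only state kept is the timestamp of a pending A. The sketch assumes events arrive in order and applies no time bound, which is exactly where a real stream processor has to do more work.

```scala
import scala.collection.mutable

// Hypothetical raw event: a key (e.g. user or order ID), a type tag and a timestamp.
case class Raw(key: String, kind: String, ts: Long)
// Derived (complex) event inferred when "A followed by B" occurs for the same key.
case class Complex(key: String, aTs: Long, bTs: Long)

object FollowedBy {
  // Scans events in arrival order. For each key it remembers the first pending "A";
  // when a "B" arrives for a key with a pending "A", it emits C and clears that state.
  def detect(events: Seq[Raw]): Seq[Complex] = {
    val pendingA = mutable.Map.empty[String, Long] // per-key state: timestamp of pending A
    val out = Seq.newBuilder[Complex]
    for (e <- events) e.kind match {
      case "A" => if (!pendingA.contains(e.key)) pendingA(e.key) = e.ts
      case "B" => pendingA.remove(e.key).foreach(aTs => out += Complex(e.key, aTs, e.ts))
      case _   => () // unrelated events are ignored
    }
    out.result()
  }
}
```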
  8. 02 Apache Flink CEP Library
      ● Built upon Flink's DataStream API
      ● Allows users to define patterns, inject them into event streams, and generate new event streams from the matches
      ● Exploits Flink's exactly-once semantics for guaranteed correctness
  9. Use case study #1: eCommerce Order Process Tracking
      ** Note: the illustrations & content in this section are from Data Artisans' presentation "Streaming Analytics & CEP - Two Sides of the Same Coin?"
  10. 03 Order Tracking Data Model
      ● Order(orderId, tStamp, "received") extends Event
      ● Shipment(orderId, tStamp, "shipped") extends Event
      ● Delivery(orderId, tStamp, "delivered") extends Event
  11. 04 Real-Time Warnings for SLAs
      New inferred events:
      ● ProcessSucc(orderId, tStamp, duration)
      ● ProcessWarn(orderId, tStamp)
      ● DeliverySucc(orderId, tStamp, duration)
      ● DeliveryWarn(orderId, tStamp)
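One possible Scala encoding of the raw and inferred events from the two data-model slides. The field types (String IDs, epoch-millisecond timestamps and durations) are assumptions; the slides only give the field names. The `status` field is fixed per subtype, matching the "received" / "shipped" / "delivered" values.

```scala
// Raw events: each subtype fixes its status string.
sealed trait Event {
  def orderId: String
  def tStamp: Long
  def status: String
}
case class Order(orderId: String, tStamp: Long) extends Event { val status: String = "received" }
case class Shipment(orderId: String, tStamp: Long) extends Event { val status: String = "shipped" }
case class Delivery(orderId: String, tStamp: Long) extends Event { val status: String = "delivered" }

// Inferred events for SLA tracking (duration in ms).
case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
case class ProcessWarn(orderId: String, tStamp: Long)
case class DeliverySucc(orderId: String, tStamp: Long, duration: Long)
case class DeliveryWarn(orderId: String, tStamp: Long)
```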
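  12. 05 Glimpse at the FlinkCEP API

        import org.apache.flink.cep.scala.CEP
        import org.apache.flink.cep.scala.pattern.Pattern
        import org.apache.flink.streaming.api.scala._
        import org.apache.flink.streaming.api.windowing.time.Time

        // an order "received" followed by a "shipped" event for the same order, within 1 hour
        val processingPattern = Pattern
          .begin[Event]("orderReceived").subtype(classOf[Order])
          .followedBy("orderShipped").where(_.status == "shipped")
          .within(Time.hours(1))

        // apply the pattern per order (keyed stream)
        val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)

        val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
          processingPatternStream.select {
            (pP, timestamp) => // Timeout handler: partial match pP did not complete in time
              ProcessWarn(pP("orderReceived").orderId, timestamp)
          } {
            fP => // Select function: complete match fP
              ProcessSucc(
                fP("orderReceived").orderId,
                fP("orderShipped").tStamp,
                fP("orderShipped").tStamp - fP("orderReceived").tStamp)
          }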
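To see what the processing pattern on this slide computes, here is a plain-Scala model of it over a finite batch of events. It assumes events are flattened into a single hypothetical `OrderEvent` type and is not how FlinkCEP evaluates patterns. Stamping a warning with the deadline (order time + 1 hour) and treating the window end as exclusive are this sketch's own choices, not documented FlinkCEP behaviour.

```scala
// Flattened event type for this sketch (the slides use Order/Shipment/Delivery subtypes).
case class OrderEvent(orderId: String, tStamp: Long, status: String)
case class ProcessSucc(orderId: String, tStamp: Long, duration: Long)
case class ProcessWarn(orderId: String, tStamp: Long)

object ProcessingSla {
  val WindowMs: Long = 60L * 60 * 1000 // one hour, mirroring within(Time.hours(1))

  // Result for one order: success if shipped before the deadline, a warning once the
  // current event time `now` has passed the deadline, otherwise nothing yet.
  private def resultFor(id: String, evs: Seq[OrderEvent], now: Long): Option[Either[ProcessWarn, ProcessSucc]] = {
    val sorted = evs.sortBy(_.tStamp)
    sorted.find(_.status == "received").flatMap { recv =>
      val deadline = recv.tStamp + WindowMs
      val shipped = sorted.find(e => e.status == "shipped" && e.tStamp >= recv.tStamp && e.tStamp < deadline)
      shipped match {
        case Some(ship)              => Some(Right(ProcessSucc(id, ship.tStamp, ship.tStamp - recv.tStamp)))
        case None if now >= deadline => Some(Left(ProcessWarn(id, deadline)))
        case None                    => None // still inside the window
      }
    }
  }

  // Evaluates every order (sorted by ID for a deterministic output order).
  def evaluate(events: Seq[OrderEvent], now: Long): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (id, evs) => resultFor(id, evs, now) }
}
```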
  13. 06 Glimpse at the FlinkCEP API

        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
        val input: DataStream[Event] = env.addSource(new FlinkKafkaConsumer09(...))

        val processingPattern = Pattern.begin(...)...
        val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)
        val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
          processingPatternStream.select(...)

        procResult.addSink(new RedisSink(...))
        // .addSink(new FlinkKafkaProducer09(...))
        // .addSink(new ElasticsearchSink(...))
        // .map(new MapFunction{...})
        // ... anything you'd like to continue to do with the inferred event stream

        env.execute()
  14. 07 Glimpse at the FlinkCEP API

        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime) // event-time semantics

        val input: DataStream[Event] = env
          .addSource(new FlinkKafkaConsumer09(...))
          .assignTimestampsAndWatermarks(new CustomExtractor) // user-defined: timestamps & watermarks

        val processingPattern = Pattern.begin(...)...
        val processingPatternStream = CEP.pattern(input.keyBy("orderId"), processingPattern)
        val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
          processingPatternStream.select(...)

        procResult.addSink(new RedisSink(...))
        // .addSink(new FlinkKafkaProducer09(...))
        // .addSink(new ElasticsearchSink(...))
        // .map(new MapFunction{...})

        env.execute()
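`CustomExtractor` is not shown in the talk. One common strategy for such an extractor is a bounded-out-of-orderness watermark: the watermark trails the largest timestamp seen so far by a fixed bound, declaring that no older events are still expected. The plain-Scala class below (hypothetical name, no Flink dependency) models only that logic; Flink also ships a ready-made `BoundedOutOfOrdernessTimestampExtractor` for this purpose.

```scala
// Plain-Scala model of a bounded-out-of-orderness watermark (hypothetical class).
class BoundedOutOfOrdernessModel(maxOutOfOrdernessMs: Long) {
  private var maxSeenTs: Long = Long.MinValue

  /** Takes the event's timestamp and updates the running maximum. */
  def onEvent(eventTs: Long): Long = {
    maxSeenTs = math.max(maxSeenTs, eventTs)
    eventTs
  }

  /** Current watermark; Long.MinValue until the first event arrives. */
  def currentWatermark: Long =
    if (maxSeenTs == Long.MinValue) Long.MinValue else maxSeenTs - maxOutOfOrdernessMs

  /** An event counts as late if its timestamp is at or below the current watermark. */
  def isLate(eventTs: Long): Boolean = eventTs <= currentWatermark
}
```

The bound is a trade-off: a larger value tolerates more disorder but delays results such as the SLA warnings, because the 1-hour window can only be considered closed once the watermark passes it.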
  15. 08 Combining Stream SQL & CEP
      ● Further reading: Streaming Analytics & CEP - Two Sides of the Same Coin?
  16. Use case study #2: Ad Targeting based on User Attribution
      ** Note: the content in this section is heavily based on my experience at VMFive
  17. 09 Ad Targeting 101
      ● What an ad server does, in a nutshell → for each incoming ad request, determine an appropriate advertisement chosen from an advertisement campaign pool
         [Diagram: AdServer and Campaign Pool; (1) request advertisement, (2) return appropriate advertisement info from campaign pool]
      ● "appropriate": fulfills the targeting rules of the campaign
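The ad-serving step on this slide can be sketched as a filter over the campaign pool. `AdRequest`, `Campaign` and `AdServer.choose` are hypothetical names. Rules are modelled as predicates on request-time attributes (the "fundamental" rule types), and picking the first eligible campaign stands in for whatever ranking a real ad server applies.

```scala
// Hypothetical request attributes available at request time.
case class AdRequest(userId: String, city: String, deviceType: String)

// A campaign is eligible when all of its targeting rules accept the request.
case class Campaign(id: String, rules: Seq[AdRequest => Boolean]) {
  def eligible(req: AdRequest): Boolean = rules.forall(rule => rule(req))
}

object AdServer {
  // Returns the first eligible campaign in pool order (placeholder for real ranking).
  def choose(pool: Seq[Campaign], req: AdRequest): Option[Campaign] =
    pool.find(_.eligible(req))
}
```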
  19. 11 Ad Targeting Rule Types
      ● Fundamental campaign targeting rule types:
         ○ Target users' current location, e.g. users in Taipei
         ○ Target a specific user device type, e.g. tablet or phone
         ○ ...
         → Does not require event aggregation: the rules can be matched simply based on info at request time
      ● Advanced campaign targeting rule types:
         ○ Target users' past location trace, e.g. in Taipei for the past 7 days
         ○ Target users entering / departing countries
         ○ Target users with a specific attribution, e.g. viewed
         ○ ...
         → Requires aggregation of historical events; aggregating at request time would be far too slow
         → Requires inferring complex events from patterns in the raw event stream → CEP to the rescue!
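As an example of an advanced rule, "users entering / departing countries" boils down to comparing each location event of a user with that user's previous one. The sketch below uses hypothetical types and plain Scala over a finite batch; it shows only the detection logic and is neither VMFive's implementation nor FlinkCEP pattern code.

```scala
// Hypothetical location event produced by the event logger.
case class LocationEvent(userId: String, country: String, ts: Long)
// Derived event: the user departed `from` and entered `to`.
case class CountryChange(userId: String, from: String, to: String, ts: Long)

object EntryDepart {
  // Per user, compares each location event with the previous one (by timestamp)
  // and emits a CountryChange whenever the country differs.
  def detect(events: Seq[LocationEvent]): Seq[CountryChange] =
    events.groupBy(_.userId).toSeq.sortBy(_._1).flatMap { case (_, evs) =>
      val sorted = evs.sortBy(_.ts)
      sorted.zip(sorted.drop(1)).collect {
        case (prev, cur) if prev.country != cur.country =>
          CountryChange(cur.userId, prev.country, cur.country, cur.ts)
      }
    }
}
```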
  20-22. 12 Basic Ad Targeting Architecture
      [Diagram, built up over three slides. Components: AdServer, WebService, Ad Targeter, Campaign Pool, Targeting Cache, Event Logger, Data Warehouse, Raw Logs, Event Bus Service, Reporting & analytics services (Batch, Streaming, ...). Labelled steps: "register ad campaigns"; (1) initial connection; (2) fetch ad; (3) event tracking.]
  23-26. 13 Advanced Ad Targeting Architecture
      [Diagram, built up over four slides. The basic architecture is extended with a RulesService and CEP on the event stream, backed by CEP-Rule Templates (Entry / Depart, User Attribution, ...) and a Rule Fulfillment Cache (Redis).]
      ● RulesService / CEP detail:
         (1) Inject a rule to start matching on the event stream
         (2) Return Rule ID
         (3) Submit CEP topology
         (4) When a CEP pattern is fulfilled, write to cache: UID → RuleID
         (5) Look up whether a UID has fulfilled a RuleID
      ● Ad targeting view:
         (1) Register rule for campaign
         (2) Look up whether user fulfils a rule
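The "write UID → RuleID when a pattern is fulfilled" and "look up whether a UID has fulfilled a RuleID" steps amount to a set-membership cache. Below is a minimal sketch with an in-memory map standing in for the Redis Rule Fulfillment Cache; all class and method names are hypothetical. In Redis this could map to one set per UID (`SADD` on fulfillment, `SISMEMBER` on lookup), but that mapping is an assumption, not something the talk specifies.

```scala
import scala.collection.mutable

// In-memory stand-in for the Redis-backed Rule Fulfillment Cache (assumption).
// The CEP job marks UID -> RuleID on a completed pattern; the targeter queries it.
class RuleFulfillmentCache {
  private val fulfilled = mutable.Map.empty[String, mutable.Set[String]] // uid -> rule IDs

  def markFulfilled(uid: String, ruleId: String): Unit =
    fulfilled.getOrElseUpdate(uid, mutable.Set.empty[String]) += ruleId

  def hasFulfilled(uid: String, ruleId: String): Boolean =
    fulfilled.get(uid).exists(_.contains(ruleId))
}

// Hypothetical campaign registered with a rule ID.
case class RuleCampaign(id: String, ruleId: String)

object RuleTargeter {
  // IDs of campaigns whose rule the given user has fulfilled.
  def eligible(cache: RuleFulfillmentCache, uid: String, pool: Seq[RuleCampaign]): Seq[String] =
    pool.filter(c => cache.hasFulfilled(uid, c.ruleId)).map(_.id)
}
```

The point of this split is that the expensive work (matching patterns over history) happens continuously in the CEP job, so the request path only does a constant-time lookup.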
  27. 14 Some Discussion
      ● Why a fixed pool of CEP-Rule Templates?
         ○ Prevents rogue rules from being matched, e.g. rules that would consume too many resources
         ○ It's a lot less work and complication ;)
      ● It would be very nice to have a freestyle rule service
         ○ Pattern matching across the different event streams of an organization
         ○ For BI, there will be arbitrary complex events / patterns analysts want to monitor
      ● Further study for a similar use case: King's RBEA
         ○ RBEA: Rule-Based Event Aggregator
         ○ https://techblog.king.com/rbea-scalable-real-time-analytics-king/
         ○ http://data-artisans.com/rbea-scalable-real-time-analytics-at-king/
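A fixed template pool can be enforced with a closed set of rule types plus parameter validation. The sketch below is hypothetical throughout (template names, the `maxDays` bound, the `rule-N` ID format); it shows one way a rules service might reject rules that would be too expensive to match and hand back a Rule ID for the ones it accepts.

```scala
import scala.collection.mutable

// Hypothetical template catalogue: only these kinds of CEP rules can be instantiated.
sealed trait RuleTemplate
case class StayedIn(city: String, days: Int) extends RuleTemplate // e.g. in Taipei for the past 7 days
case class EnteredCountry(country: String) extends RuleTemplate   // entry / depart
case class ViewedAd(campaignId: String) extends RuleTemplate      // user attribution

class RulesService(maxDays: Int) {
  private val rules = mutable.LinkedHashMap.empty[String, RuleTemplate]
  private var nextId = 0

  // Validates an instantiated template and returns a new Rule ID, or an error message.
  // Bounding the look-back window is one way to keep per-rule state small.
  def register(t: RuleTemplate): Either[String, String] = t match {
    case StayedIn(_, d) if d <= 0 || d > maxDays => Left(s"days must be in 1..$maxDays")
    case _ =>
      nextId += 1
      val id = s"rule-$nextId"
      rules(id) = t
      Right(id)
  }

  def lookup(id: String): Option[RuleTemplate] = rules.get(id)
}
```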
  28. Closing
  29. XX Closing
      ● Complex Event Processing is an emerging way to draw insights from data streams, and it demands exactly-once semantics from the underlying stream processor for correctness
      ● FlinkCEP builds on the DataStream API to make this possible and easy
