Streaming Analytics & CEP - Two sides of the same coin?

Talk I gave together with Fabian Hueske at the Berlin Buzzwords 2016 conference.

The talk demonstrates how we can combine streaming analytics and complex event processing (CEP) on the same execution engine, namely Apache Flink. This combination opens up a new field of applications in which we can easily combine aggregations with temporal pattern detection.

1. Streaming Analytics & CEP - Two sides of the same coin?
   Till Rohrmann, trohrmann@apache.org, @stsffap
   Fabian Hueske, fhueske@apache.org, @fhueske
2. Streams are Everywhere
   • Most data is continuously produced as streams
   • Processing data as it arrives is becoming very popular
   • Many diverse applications and use cases
3. Complex Event Processing
   • Analyzing a stream of events and drawing conclusions
     - Detect patterns and assemble new events
   • Applications
     - Network intrusion detection
     - Process monitoring
     - Algorithmic trading
   • Demanding requirements on the stream processor
     - Low latency!
     - Exactly-once semantics & event-time support
4. Batch Analytics
   • The batch approach to data analytics
5. Streaming Analytics
   • Online aggregation of streams
     - No delay – continuous results
   • Stream analytics subsumes batch analytics
     - A batch is a finite stream
   • Demanding requirements on the stream processor
     - High throughput
     - Exactly-once semantics
     - Event-time & advanced window support
6. Apache Flink™
   • Platform for scalable stream processing
   • Meets the requirements of CEP and stream analytics
     - Low latency and high throughput
     - Exactly-once semantics
     - Event-time & advanced windowing
   • Core DataStream API available for Java & Scala
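As a quick impression of that DataStream API, here is a minimal word-count sketch in Scala (not taken from the slides; the socket source on localhost:9999 and the object name are placeholder assumptions):

    import org.apache.flink.streaming.api.scala._

    object DataStreamSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // placeholder source; any DataStream[String] works the same way
        val lines: DataStream[String] = env.socketTextStream("localhost", 9999)

        // split lines into words and count occurrences per word
        val counts: DataStream[(String, Int)] = lines
          .flatMap(_.toLowerCase.split("\\W+"))
          .filter(_.nonEmpty)
          .map((_, 1))
          .keyBy(0)
          .sum(1)

        counts.print()
        env.execute("DataStream API sketch")
      }
    }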
7. This Talk is About
   • Flink’s new APIs for CEP and Stream Analytics
     - DSL to define CEP patterns and actions
     - Stream SQL to define queries on streams
   • Integration of CEP and Stream SQL
   • Early stage → Work in progress
8. Use Case: Tracking an Order Process
9. Order Fulfillment Scenario
10. Order Events
    • Process is reflected in a stream of order events
    • Order(orderId, tStamp, "received")
    • Shipment(orderId, tStamp, "shipped")
    • Delivery(orderId, tStamp, "delivered")
    • orderId: Identifies the order
    • tStamp: Time at which the event happened
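A minimal sketch of how these event types might look in Scala (the shared Event supertype, the field types, and the status defaults are assumptions; the slides only give the field names, and the pattern on slide 21 matches on a common Event type with a status field):

    // Hypothetical Scala model of the order events; field names follow the slide,
    // types and the common Event trait are assumptions.
    sealed trait Event {
      def orderId: Long   // identifies the order
      def tStamp: Long    // time at which the event happened (epoch millis assumed)
      def status: String  // "received", "shipped", or "delivered"
    }

    case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
    case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
    case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event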
11. Stream Analytics: Aggregating Massive Streams
12. Stream Analytics
    • Traditional batch analytics
      - Repeated queries on finite and changing data sets
      - Queries join and aggregate large data sets
    • Stream analytics
      - “Standing” query produces continuous results from an infinite input stream
      - Query computes aggregates on high-volume streams
    • How to compute aggregates on infinite streams?
13. Compute Aggregates on Streams
    • Split the infinite stream into finite “windows”
      - Usually split by time
    • Tumbling windows
      - Fixed size & consecutive
    • Sliding windows
      - Fixed size & may overlap
    • Event time is mandatory for correct & consistent results!
14. Example: Count Orders by Hour
15. Example: Count Orders by Hour

    SELECT STREAM
      TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS hour,
      COUNT(*) AS cnt
    FROM events
    WHERE status = 'received'
    GROUP BY TUMBLE(tStamp, INTERVAL '1' HOUR)
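The same hourly count can be sketched directly against the DataStream API (assuming the Event type sketched above, with event-time timestamps already assigned upstream; the helper name is hypothetical):

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.time.Time

    // events: stream of order events with event-time timestamps assigned upstream (assumption)
    def countReceivedOrdersPerHour(events: DataStream[Event]): DataStream[(String, Int)] =
      events
        .filter(_.status == "received")
        .map(e => (e.status, 1))
        .keyBy(0)
        // tumbling one-hour window; a sliding variant would pass a size and a slide,
        // e.g. Time.hours(1), Time.minutes(10)
        .timeWindow(Time.hours(1))
        .sum(1)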
16. Stream SQL Architecture
    • Flink features SQL on static and streaming tables
    • Parsing and optimization by Apache Calcite
    • SQL queries are translated into native Flink programs
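How the pieces fit together can be sketched with the same calls that appear on slide 25 (toTable, registerTable, tableEnv.sql); the pre-existing tableEnv and events values, and the table name, are assumptions:

    // tableEnv: a stream TableEnvironment and events: DataStream[Event] are assumed to exist
    val eventsTable: Table = events.toTable(tableEnv)
    tableEnv.registerTable("events", eventsTable)

    // the hourly count from slide 15, issued through the SQL interface
    // and translated by Calcite into a native Flink streaming program
    val ordersPerHour = tableEnv.sql(
      """SELECT STREAM
        |  TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS hour,
        |  COUNT(*) AS cnt
        |FROM events
        |WHERE status = 'received'
        |GROUP BY TUMBLE(tStamp, INTERVAL '1' HOUR)""".stripMargin)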
17. Complex Event Processing: Pattern Matching on Streams
18. Real-time Warnings
19. CEP to the Rescue
    • Define processing and delivery intervals (SLAs)
    • ProcessSucc(orderId, tStamp, duration)
    • ProcessWarn(orderId, tStamp)
    • DeliverySucc(orderId, tStamp, duration)
    • DeliveryWarn(orderId, tStamp)
    • orderId: Identifies the order
    • tStamp: Time when the event happened
    • duration: Duration of the processing/delivery
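These result events could be modeled as plain Scala case classes (a sketch; field names follow the slide, the numeric types are assumptions):

    case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)   // processing met the SLA
    case class ProcessWarn(orderId: Long, tStamp: Long)                   // processing interval exceeded
    case class DeliverySucc(orderId: Long, tStamp: Long, duration: Long)  // delivery met the SLA
    case class DeliveryWarn(orderId: Long, tStamp: Long)                  // delivery interval exceeded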
20. CEP Example
21. Processing: Order → Shipment

    val processingPattern = Pattern
      .begin[Event]("received").subtype(classOf[Order])
      .followedBy("shipped").where(_.status == "shipped")
      .within(Time.hours(1))

    val processingPatternStream = CEP.pattern(
      input.keyBy("orderId"),
      processingPattern)

    val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
      processingPatternStream.select {
        (pP, timestamp) =>  // Timeout handler
          ProcessWarn(pP("received").orderId, timestamp)
      } {
        fP =>  // Select function
          ProcessSucc(
            fP("received").orderId,
            fP("shipped").tStamp,
            fP("shipped").tStamp - fP("received").tStamp)
      }
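Because select is given both a timeout handler and a select function, the result stream carries Either values: Left for patterns that timed out within the hour, Right for completed matches. They can be split back into separate streams in the same way slide 25 does for deliveries (a sketch):

    val procWarnings: DataStream[ProcessWarn] = procResult.flatMap(_.left.toOption)
    val procSuccesses: DataStream[ProcessSucc] = procResult.flatMap(_.right.toOption)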
22. Integrated Stream Analytics with CEP (… and both at the same time!)
23. Count Delayed Shipments
24. Compute Avg Processing Time
25. CEP + Stream SQL

    // complex event processing result
    val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = …

    val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption)

    val deliveryWarningTable: Table = delWarn.toTable(tableEnv)
    tableEnv.registerTable("deliveryWarnings", deliveryWarningTable)

    // calculate the delayed deliveries per day
    val delayedDeliveriesPerDay = tableEnv.sql(
      """SELECT STREAM
        |  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS day,
        |  COUNT(*) AS cnt
        |FROM deliveryWarnings
        |GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)""".stripMargin)
26. CEP-enriched Stream SQL

    SELECT
      TUMBLE_START(tStamp, INTERVAL '1' DAY) as day,
      AVG(duration) as avgDuration
    FROM (
      -- CEP pattern
      SELECT (b.tStamp - a.tStamp) as duration, b.tStamp as tStamp
      FROM inputs
      PATTERN a FOLLOW BY b
      PARTITION BY orderId
      ORDER BY tStamp
      WITHIN INTERVAL '1' HOUR
      WHERE a.status = 'received' AND b.status = 'shipped'
    )
    GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)
27. Conclusion
    • Apache Flink handles CEP and analytical workloads
    • Apache Flink offers intuitive APIs
    • Integrating CEP and Stream SQL enables a new class of applications
28. Flink Forward 2016, Berlin
    Submission deadline: June 30, 2016
    Early bird deadline: July 15, 2016
    www.flink-forward.org
29. We are hiring!
    data-artisans.com/careers
