Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL and CEP

566 views

Published on

http://flink-forward.org/kb_sessions/declarative-stream-processing-with-streamsql-and-cep/

Complex event processing (CEP) and stream analytics are commonly treated as distinct classes of stream processing applications. While CEP workloads identify patterns from event streams in near real-time, stream analytics queries ingest and aggregate high-volume streams. Both types of use cases have very different requirements which resulted in diverging system designs. CEP systems excel at low-latency processing whereas engines for stream analytics achieve high throughput. Recent advances in open source stream processing yielded systems that can process several millions of events per second at sub-second latency. Systems like Apache Flink enable applications that include typical CEP features as well as heavy aggregations. In this talk we will show how Apache Flink unifies CEP and stream analytics workloads. Guided by examples, we introduce Flink’s CEP-enriched StreamSQL interface and discuss how queries are compiled, optimized, and executed on Flink.

Published in: Data & Analytics
  • Be the first to comment

Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL and CEP

  1. 1. Till Rohrmann trohrmann@apache.org @stsffap Fabian Hueske fhueske@apache.org @fhueske Streaming Analytics & CEP Two sides of the same coin?
  2. 2. Streams are Everywhere  Most data is continuously produced as stream  Processing data as it arrives is becoming very popular  Many diverse applications and use cases 2
  3. 3. Complex Event Processing  Analyzing a stream of events and drawing conclusions • Detect patterns and assemble new events  Applications • Network intrusion • Process monitoring • Algorithmic trading  Demanding requirements on stream processor • Low latency! • Exactly-once semantics & event-time support 3
  4. 4. Batch Analytics 4  The batch approach to data analytics
  5. 5. Streaming Analytics  Online aggregation of streams • No delay – Continuous results  Stream analytics subsumes batch analytics • Batch is a finite stream  Demanding requirements on stream processor • High throughput • Exactly-once semantics • Event-time & advanced window support 5
  6. 6. Apache Flink®  Platform for scalable stream processing  Meets requirements of CEP and stream analytics • Low latency and high throughput • Exactly-once semantics • Event-time & advanced windowing  Core DataStream API available for Java & Scala 6
  7. 7. This Talk is About  Flink’s new APIs for CEP and Stream Analytics • DSL to define CEP patterns and actions • Stream SQL to define queries on streams  Integration of CEP and Stream SQL  Early stage  Work in progress 7
  8. 8. Tracking an Order Process Use Case 8
  9. 9. Order Fulfillment Scenario 9
  10. 10. Order Events  Process is reflected in a stream of order events  Order(orderId, tStamp, “received”)  Shipment(orderId, tStamp, “shipped”)  Delivery(orderId, tStamp, “delivered”)  orderId: Identifies the order  tStamp: Time at which the event happened 10
  11. 11. Aggregating Massive Streams Stream Analytics 11
  12. 12. Stream Analytics  Traditional batch analytics • Repeated queries on finite and changing data sets • Queries join and aggregate large data sets  Stream analytics • “Standing” query produces continuous results from infinite input stream • Query computes aggregates on high-volume streams  How to compute aggregates on infinite streams? 12
  13. 13. Compute Aggregates on Streams  Split infinite stream into finite “windows” • Split usually by time  Tumbling windows • Fixed size & consecutive  Sliding windows • Fixed size & may overlap  Event time mandatory for correct & consistent results! 13
  14. 14. Example: Count Orders by Hour 14
  15. 15. Example: Count Orders by Hour 15 SELECT STREAM TUMBLE_START(tStamp, INTERVAL ‘1’ HOUR) AS hour, COUNT(*) AS cnt FROM events WHERE status = ‘received’ GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ HOUR)
  16. 16. Stream SQL Architecture  Flink features SQL on static and streaming tables  Parsing and optimization by Apache Calcite  SQL queries are translated into native Flink programs 16
  17. 17. Pattern Matching on Streams Complex Event Processing 17
  18. 18. Real-time Warnings 18
  19. 19. CEP to the Rescue  Define processing and delivery intervals (SLAs)  ProcessSucc(orderId, tStamp, duration)  ProcessWarn(orderId, tStamp)  DeliverySucc(orderId, tStamp, duration)  DeliveryWarn(orderId, tStamp)  orderId: Identifies the order  tStamp: Time when the event happened  duration: Duration of the processing/delivery 19
  20. 20. CEP Example 20
  21. 21. Processing: Order  Shipment 21 val processingPattern = Pattern .begin[Event]("received").subtype(classOf[Order]) .followedBy("shipped").where(_.status == "shipped") .within(Time.hours(1)) val processingPatternStream = CEP.pattern( input.keyBy("orderId"), processingPattern) val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select { (pP, timestamp) => // Timeout handler ProcessWarn(pP("received").orderId, timestamp) } { fP => // Select function ProcessSucc( fP("received").orderId, fP("shipped").tStamp, fP("shipped").tStamp – fP("received").tStamp) }
  22. 22. … and both at the same time! Integrated Stream Analytics with CEP 22
  23. 23. Count Delayed Shipments 23
  24. 24. Compute Avg Processing Time 24
  25. 25. CEP + Stream SQL 25 // complex event processing result val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = … val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption) val deliveryWarningTable: Table = delWarn.toTable(tableEnv) tableEnv.registerTable(”deliveryWarnings”, deliveryWarningTable) // calculate the delayed deliveries per day val delayedDeliveriesPerDay = tableEnv.sql( """SELECT STREAM | TUMBLE_START(tStamp, INTERVAL ‘1’ DAY) AS day, | COUNT(*) AS cnt |FROM deliveryWarnings |GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ DAY)""".stripMargin)
  26. 26. CEP-enriched Stream SQL 26 SELECT TUMBLE_START(tStamp, INTERVAL '1' DAY) as day, AVG(duration) as avgDuration FROM ( // CEP pattern SELECT (b.tStamp - a.tStamp) as duration, b.tStamp as tStamp FROM inputs PATTERN a FOLLOW BY b PARTITION BY orderId ORDER BY tStamp WITHIN INTERVAL '1’ HOUR WHERE a.status = ‘received’ AND b.status = ‘shipped’ ) GROUP BY TUMBLE(tStamp, INTERVAL '1’ DAY)
  27. 27. Conclusion  Apache Flink handles CEP and analytical workloads  Apache Flink offers intuitive APIs  New class of applications by CEP and Stream SQL integration  27

×