
Extending Spark Streaming to Support Complex Event Processing


In this talk, we introduce two extensions of Spark Streaming: (1) SQL-based query processing and (2) elastic-seamless resource allocation.

First, we explain how we support window queries and query chains. Last year, Grace Huang and Jerry Shao introduced the concept of "StreamSQL", which processes streaming data with SQL-like queries by adapting Spark SQL to Spark Streaming. Building on their efforts, we advanced the support for complex event processing (CEP). In detail, we implemented the sliding window concept to support time-based streaming data processing at the SQL level. To reduce the aggregation time of large windows, we generate an efficient query plan that computes partial results by evaluating only the data entering or leaving the window, and then obtains the current result by merging the previous result with the partial ones. Next, to support query chains, we added an "INSERT INTO" query that turns the result of a streaming query into a table; that is, stream queries can be applied to the results of other queries.

Second, we explain how we allocate resources to streaming applications dynamically, enabling them to meet a given deadline. As the rate of incoming events varies over time, the resources allocated to an application must be adjusted to keep utilization high. However, Spark's current resource allocation features are not suitable for streaming applications: allocated resources are never freed while new data keeps arriving, even when the volume of new data is very small. To resolve this problem, we monitor the applications' resource utilization. If utilization is low, we choose victim nodes to be killed, and we stop feeding new data into the victims so that killing them does not trigger useless recovery. Accordingly, we can scale the resources in and out seamlessly.



  1. Extending Spark Streaming to Support Complex Event Processing • Byungjin Kim, Ohchan Kwon • Samsung Electronics • 2015. 10. 27.
  2. Agenda • Background • Motivation • Streaming SQL • Auto Scaling • Future Work
  3. Background - Spark • Apache Spark is a high-performance data processing platform • Uses a distributed in-memory cache (RDD*) • Processes batch and stream data on a common platform • Developed by 700+ contributors • 3x faster than Storm (stream, 30 nodes); 100x faster than Hadoop (batch, 50 nodes, clustering algorithm) *RDD: Resilient Distributed Datasets
  4. Background – Spark RDD • Resilient Distributed Datasets • RDDs are immutable • Transformations are lazy operations that build an RDD's lineage graph • E.g.: map, filter, join, union • Actions launch a computation to return a value or write data • E.g.: count, collect, reduce, save • The scheduler builds a DAG of stages to execute • If a task fails, Spark re-runs it on another node as long as its stage's parents are still available
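As a minimal, self-contained illustration of the lazy-transformation/action split (the data and app name are made up for this sketch):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the RDD model: transformations only build the lineage
// graph; the action at the end triggers the actual computation.
object RddSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("rdd-sketch").setMaster("local[*]"))
    val nums    = sc.parallelize(1 to 1000)   // base RDD
    val evens   = nums.filter(_ % 2 == 0)     // transformation: lazy
    val doubled = evens.map(_ * 2)            // transformation: lazy
    println(doubled.count())                  // action: runs the DAG
    sc.stop()
  }
}
```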
  5. Background - Spark Modules • Spark Streaming • Divides stream data into micro-batches • Computes the batches with fault tolerance • Spark SQL • Supports SQL to handle structured data • Provides a common way to access a variety of data sources (Hive, Avro, Parquet, ORC, JSON, and JDBC) [Diagram: a live data stream enters Spark Streaming, is divided into batches of X seconds (DStreams of RDDs), and is processed by Spark with map, reduce, count, … into results]
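As a minimal illustration of the micro-batch model (the 1-second interval, socket source, and word count are our own example, not from the talk):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: the live stream is cut into 1-second micro-batches
// (DStreams of RDDs), each processed with ordinary transformations.
object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))   // batch interval
    val lines = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()   // runs once per batch
    ssc.start()
    ssc.awaitTermination()
  }
}
```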
  6. Motivation – Support CEP in Spark • The current Spark is not enough to support CEP* • No continuous query language for processing stream data • No auto-scaling for elastic resource allocation • We have solved these issues: • Extended Intel's Streaming SQL package • Improved performance by optimizing time-based windowed aggregation • Supported query chains by implementing "Insert Into" queries • Implemented elastic-seamless resource allocation • Enables automatic scale-in/out *Complex Event Processing
  7. Streaming SQL
  8. Streaming SQL - Intel • Streaming SQL is a third-party library on Spark Packages • Built on top of Spark Streaming and the Spark SQL Catalyst • Manipulates stream data like static structured data in a database • Processes queries continuously • Supports time-based windowed join/aggregation queries http://spark-packages.org/package/Intel-bigdata/spark-streamingsql
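For instance, a continuous windowed aggregation can be expressed directly in SQL. The sketch below assumes a StreamSQLContext-style entry point (the exact API and its construction are assumptions about Intel's package); the WINDOW/SLIDE syntax matches the queries shown later in the evaluation:

```scala
// Hedged sketch: issuing a continuous windowed query through the
// streaming SQL package. StreamSQLContext and its constructor are
// assumptions; ssc and sqlContext come from the application setup.
val streamSqlContext = new StreamSQLContext(ssc, sqlContext)
val result = streamSqlContext.sql(
  """SELECT t.word, COUNT(t.word), MIN(t.num), MAX(t.num)
    |FROM (SELECT * FROM t_kafka)
    |OVER (WINDOW '20' SECONDS, SLIDE '1' SECONDS) AS t
    |GROUP BY t.word""".stripMargin)
// The query re-evaluates every slide over the last 20 seconds of data.
```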
  9. Streaming SQL - Plans • Logical Plan • Modify the Streaming SQL optimizer • Add a windowed aggregate logical plan • Physical Plan • Add a windowed aggregate physical plan that calls the new WindowedStateDStream class • Implement new expression functions (Count, Sum, Average, Min, Max, Distinct) • Develop a FixedSizedAggregator for efficiency, a concept adopted from IBM InfoSphere Streams [Diagram: Streaming SQL query → schema DStream → optimized logical plan → physical plan; Samsung's stream plan extends Intel's logical and physical plans]
  10. Streaming SQL – Windowed State • WindowedStateDStream class • Modifies the StateDStream class to support windowed computing • Adds an inverse update function to retract the values of elements leaving the window • Adds a filter function to remove obsolete keys [Diagram: Intel's windowed DStream recomputes all elements in the window at each interval, while Samsung's WindowedStateDStream computes only the delta, applying Update to entering elements and InvUpdate to leaving ones; the state is an Array[AggregateFunction] covering Count, Sum, Avg, Min, and Max]
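To make the Update/InvUpdate idea concrete, here is a minimal sketch of delta-based window maintenance (our own illustration, not the actual WindowedStateDStream code):

```scala
// Illustrative delta-based window maintenance: the previous state is
// carried over and corrected, instead of recomputing every element
// currently inside the window.
final case class CountState(count: Long) {
  def update(entering: Seq[Any]): CountState =    // batch entering the window
    CountState(count + entering.size)
  def invUpdate(leaving: Seq[Any]): CountState =  // batch leaving the window
    CountState(count - leaving.size)
}

// Per slide interval, only two small batches are evaluated:
//   newState = prevState.update(newBatch).invUpdate(expiredBatch)
```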
  11. Streaming SQL – Fixed Sized Aggregator • Fixed-Sized Aggregator* of IBM InfoSphere Streams • Uses a fixed-size binary tree • Maintains the leaf nodes as a circular buffer using front and back pointers • Very efficient for non-invertible expressions (e.g., Min, Max) • Example - Min Aggregator [Diagram: a binary tree whose leaves form a circular buffer; as elements enter and leave, only the path from the affected leaf to the root is recomputed to keep the current minimum] *K. Tangwongsan, M. Hirzel, S. Schneider, K-L. Wu, "General Incremental Sliding-Window Aggregation," in VLDB, 2015.
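A compact sketch of how such a fixed-size tree aggregator can work, in the spirit of the cited FlatFAT structure (names and details are illustrative, not the project's actual FixedSizedAggregator):

```scala
// Sketch of a fixed-size min aggregator: a complete binary tree in an
// array, with internal nodes at 1..capacity-1 and the leaves at
// capacity..2*capacity-1 used as a circular buffer. Index 0 is unused.
class FixedSizedMinAggregator(capacity: Int) {
  private val tree = Array.fill(2 * capacity)(Int.MaxValue)
  private var front = 0 // oldest occupied leaf slot
  private var back  = 0 // next free leaf slot

  // Recompute only the ancestors of one changed leaf: O(log capacity).
  private def fixUp(leaf: Int): Unit = {
    var i = leaf / 2
    while (i >= 1) {
      tree(i) = math.min(tree(2 * i), tree(2 * i + 1))
      i /= 2
    }
  }

  def insert(v: Int): Unit = {          // newest element enters the window
    tree(capacity + back) = v
    fixUp(capacity + back)
    back = (back + 1) % capacity
  }

  def evictOldest(): Unit = {           // oldest element leaves the window
    tree(capacity + front) = Int.MaxValue
    fixUp(capacity + front)
    front = (front + 1) % capacity
  }

  def min: Int = tree(1)                // the root holds the window minimum
}
```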
  12. Streaming SQL – Aggregate Functions • Windowed Aggregate Functions • Add an invUpdate function • Reduce object size for efficient serialization • Implement the Fixed-Sized Aggregator for the Min and Max functions
  Function | State | Update | InvUpdate
  Count | countValue: Long | increase countValue | decrease countValue
  Sum | sumValue: Any | add to sumValue | subtract from sumValue
  Average | sumValue: Any, countValue: Long | increase countValue, add to sumValue | decrease countValue, subtract from sumValue
  Min | fat: FixedSizedAggregator | insert minimum into fat | remove the oldest from fat
  Max | fat: FixedSizedAggregator | insert maximum into fat | remove the oldest from fat
  CountDistinct | distinctMap: mutable.HashMap[Row, Long] | increase count of distinct value in map | decrease count of distinct value in map
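The table above suggests a common interface for invertible aggregates; a sketch of what it might look like (the trait and the Sum implementation are illustrative assumptions, not the package's actual classes):

```scala
// Illustrative interface shared by invertible windowed aggregates.
trait AggregateFunction[T] {
  def update(v: T): Unit      // apply an element entering the window
  def invUpdate(v: T): Unit   // retract an element leaving the window
  def result: T
}

// Sum keeps one running value; entering data is added,
// leaving data is subtracted.
class SumFunction extends AggregateFunction[Long] {
  private var sumValue = 0L
  def update(v: Long): Unit    = sumValue += v
  def invUpdate(v: Long): Unit = sumValue -= v
  def result: Long             = sumValue
}
```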
  13. Streaming SQL – Insert Into • Support the "Insert Into" query • Implement the Data Sources API • Implement the insert function in the Kafka relation • Modify some physical plans to support streaming • Modify the physical planning strategies to assign a specific plan • Convert the current RDD to a DataFrame and then insert it • Support query chaining: the result of a query can be reused by multiple queries • Example: INSERT INTO TABLE Example1 SELECT duid, time FROM ParsedTable; INSERT INTO TABLE Example2 SELECT duid, COUNT(*) FROM Example1 GROUP BY duid [Diagram: ParsedTable → Example1 → Example2, each backed by a Kafka relation; the InsertIntoTable logical plan maps to InsertIntoStreamSource / StreamExecutedCommand physical plans over a Kafka relation and DataFrame]
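A sketch of how such a chain could be issued, reusing the assumed StreamSQLContext entry point from the earlier sketch; the queries themselves are the ones on the slide:

```scala
// Hedged sketch of query chaining via INSERT INTO; the entry point
// is an assumption, the two queries come from the slide.
streamSqlContext.sql(
  "INSERT INTO TABLE Example1 SELECT duid, time FROM ParsedTable")
streamSqlContext.sql(
  """INSERT INTO TABLE Example2
    |SELECT duid, COUNT(*) FROM Example1 GROUP BY duid""".stripMargin)
// Example1 materializes the first query's stream as a table (backed by
// Kafka here), so the second query can consume it like any other stream.
```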
  14. Streaming SQL – Evaluation • Experimental Environment • Processing node: Intel i5 2.67 GHz, 4 cores, 4 GB per node; Spark cluster: 7 nodes; Kafka cluster: 3 nodes • Test query: SELECT t.word, COUNT(t.word), SUM(t.num), AVG(t.num), MIN(t.num), MAX(t.num) FROM (SELECT * FROM t_kafka) OVER (WINDOW 'x' SECONDS, SLIDE '1' SECONDS) AS t GROUP BY t.word • Default setting: 100 EPS, 100 keys, 20 s window, 1 s slide, 10 executors, 10 reducers [Diagram: a test agent feeds the Kafka cluster; the Spark cluster processes the stream and is monitored via the Spark UI]
  15. Streaming SQL – Evaluation • Test Result • Shows low processing delays despite heavy event loads and large windows • Needs memory optimization [Charts: delay (ms) and memory (KB) vs. window size (20-100 s) and vs. event rate (10-10,000 EPS), comparing Intel's implementation with Samsung's]
  16. Auto Scaling
  17. Time-varying Event Rate in the Real World • Spark Streaming • Data can be ingested from many sources such as Kafka, Flume, Twitter, etc. • Live input data streams are divided into batches, and the batches are processed • In streaming applications, the event rate may change frequently over time • How can this be dealt with? [Charts: event rate and batch processing time over time; at high event rates, batches can no longer be processed in real time]
  18. Spark AS-IS • Spark currently supports dynamic resource allocation on YARN (SPARK-3174) and coarse-grained Mesos (SPARK-6287) • The existing dynamic resource allocation is not optimized for streaming • Even when the event rate drops, executors may not be removed due to the scheduling policy • It is difficult to determine appropriate configuration parameters • Upper/lower bounds for the number of executors: spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors • Scale-out condition: schedulerBacklogTimeout < staying time of a task in the task queue • Scale-in condition: executorIdleTimeout < staying time of an executor in the idle state • Backpressure (SPARK-7398) • Enables Spark Streaming to control the receiving rate dynamically to handle bursty input streams • Not real-time [Diagram: the driver's task queue dispatching tasks to executors #1..#n]
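For concreteness, a sketch of how the existing dynamic allocation is configured; the listed keys are standard Spark settings, and the values are illustrative:

```scala
// Sketch: enabling Spark's existing dynamic allocation. As argued
// above, these timeouts are hard to tune for streaming workloads.
val conf = new org.apache.spark.SparkConf()
  .set("spark.shuffle.service.enabled", "true")  // required by dynamic allocation
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "2")   // illustrative bounds
  .set("spark.dynamicAllocation.maxExecutors", "20")
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
```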
  19. Elastic-seamless Resource Allocation • Goal • Allocate resources to streaming applications dynamically as the rate of incoming events varies over time • Enable the applications to meet real-time deadlines • Utilize resources efficiently • Cloud Architecture [Diagram: the cloud architecture managed by Flint*] *Flint: our Spark job manager
  20. Spark Deployment Architecture • Spark currently supports three cluster managers • Standalone, Apache Mesos, Hadoop YARN • Spark on Mesos • Fine-grained mode • Each Spark task runs as a separate Mesos task • The launching overhead is large, so it is not suitable for streaming • Coarse-grained mode • Launches only one long-running Spark task on each Mesos machine • Cannot scale out beyond the number of Mesos machines • Cannot control executor resources • Only the total resource amount is configurable • Both modes have problems that prevent achieving our goal
  21. Flint Architecture Overview • Spark on on-demand YARN • Built with Marathon, ZooKeeper, etcd, and HDFS • Flint components: REST Manager, Scheduler, Application Manager, Application Handlers, Deploy Manager, Log Manager [Diagram: a Mesos cluster of slaves hosting one YARN cluster per application; each YARN cluster's NodeManagers run the executors of one Spark application, all managed by Flint]
  22. Flint Architecture: Job Submission • Job submission process 1. Request deployment of a dockerized YARN cluster via Marathon 2. Launch the ResourceManager and NodeManagers for the Spark driver/executors 3. Check the ResourceManager status 4. Submit the Spark job and check the job status 5. Watch the Spark driver status and get the Spark endpoint [Diagram: first the YARN cluster is created on the Mesos slaves, then the Spark job is submitted to it]
  23. Flint Architecture: Scale-out • Scale-out process 1. Request an additional NodeManager instance via Marathon 2. Launch the new NodeManager 3. Scale out the Spark executors [Diagram: a new NodeManager (NM 2) joins the YARN cluster, then a new executor (EXEC 2) is launched on it]
  24. Flint Architecture: Scale-in • Scale-in process 1. Get executor info and select Spark victims 2. Inactivate the Spark victims and kill them after n x batch interval 3. Get the YARN victims and decommission their NodeManagers via the ResourceManager 4. Get the Mesos task IDs of the YARN victims and kill their Mesos tasks [Diagram: the victim executor (EXEC 2) is removed first, then its NodeManager (NM 2) is decommissioned]
  25. Flint Architecture: Auto-scaling • Auto-scaling Mechanism • Real-time constraint: batch processing time < batch interval • Scale-out condition: batch processing delay > α x batch interval • Scale-in condition: batch processing delay < β x batch interval • ( 0 < β < α ≤ 1 ) [Diagram: timeline from 0 to the batch interval; delays below β x batch interval trigger scale-in, delays above α x batch interval trigger scale-out]
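A minimal sketch of this decision rule; the function and the scale-out/scale-in stubs are our illustration, not Flint's actual code:

```scala
// Sketch of Flint's auto-scaling rule; scaleOut/scaleIn are stubs
// standing in for the Marathon/YARN orchestration described above.
object AutoScaler {
  def scaleOut(): Unit = println("add NodeManager + executor")    // stub
  def scaleIn(): Unit  = println("inactivate victim, then kill")  // stub

  def decide(delayMs: Long, batchIntervalMs: Long,
             alpha: Double, beta: Double): Unit = {
    require(0 < beta && beta < alpha && alpha <= 1)
    if (delayMs > alpha * batchIntervalMs) scaleOut()       // falling behind
    else if (delayMs < beta * batchIntervalMs) scaleIn()    // over-provisioned
    // otherwise the delay is in the stable band: do nothing
  }
}
```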
  26. Spark Issues: Scale-in (1/3) • Timeout & Retry for Killed Executors • Even after an executor is killed by the administrator, Spark keeps retrying to connect to it • spark.shuffle.io.maxRetries = 3 • spark.shuffle.io.retryWait = 5s [Diagram: after scale-in, a surviving executor's RDD request to the killed executor fails only after three retries, inflating the scheduling and processing delays]
  27. Spark Issues: Scale-in (2/3) • Timeout & Retry for Killed Executors • Solution: advertise the killed executor • Inactivate the executor before scale-in • Inactivate executors and kill them after n x batch interval • During the inactive stage, the executor receives no tasks from the driver [Diagram: the driver advertises the dead executor so RDD requests to it fail fast instead of retrying]
  28. Spark Issues: Scale-in (3/3) • Data Locality • The Spark scheduler considers data locality • Process/Node/Rack locality • spark.locality.wait = 3s • However, the waiting time for locality is a big burden for streaming applications • The waiting time should be much less than the batch interval for streaming • Otherwise, the stream may not be processed in real time • Thus, Flint overrides spark.locality.wait to a very small value if the application type is streaming [Diagram: the scheduler's locality wait is cut from 3 s to at most ~10 ms before resource allocation]
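A sketch of that override; spark.locality.wait is a standard Spark setting, while the 10 ms value and the streaming flag are illustrative:

```scala
import org.apache.spark.SparkConf

// Sketch: Flint-style override of the locality wait for streaming
// applications; the isStreaming flag is our illustrative assumption.
def tuneLocality(conf: SparkConf, isStreaming: Boolean): SparkConf =
  if (isStreaming) conf.set("spark.locality.wait", "10ms") // default: 3s,
  else conf                                                // far too long for
                                                           // short batch intervals
```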
  29. Spark Issues: Scale-out (1/2) • Data Localization • During the scale-out process, a new YARN container localizes some data (e.g., the Spark jar and the application jar) • Data localization incurs high disk I/O, degrading co-located applications • Solution • Prefetch if possible • Disk isolation, SSDs [Diagram: a new application's localization accesses the disk at 50-70 MB/sec, degrading the performance of an existing application]
  30. Spark Issues: Scale-out (2/2) • RDD Replication • When a new executor is added, a receiver requests replication of received RDD blocks to that executor • A new executor is not ready to receive RDD blocks during initialization • In this situation, the receiver waits until the new executor is ready • Solution • The receiver does not replicate RDD blocks to a new executor during its initialization • E.g., spark.blockmanager.peer.bootstrap = 10s [Diagram: with replication factor 2, the receiver replicates RDD blocks from Executor 1 to Executor 2 but skips the still-initializing new executor]
  31. Demo (1/2) • Demo Environment • Data source: Kafka • Auto-scaling parameters: α = 0.9, β = 0.4 • SQL query: SELECT t.word, COUNT(DISTINCT t.num), SUM(t.num), AVG(t.num), MIN(t.num), MAX(t.num) FROM (SELECT * FROM t_kafka) OVER (WINDOW 300 SECONDS, SLIDE 3 SECONDS) AS t GROUP BY t.word [Diagram: a producer feeds Kafka; the receiver and tasks process the stream while Flint monitors the application and scales it in/out]
  32. Demo (2/2)
  33. Future Work • Streaming SQL • Apply the Tungsten framework • Small-sized objects, code generation, GC-free memory management • Share RDDs • Map stream tables to RDDs • Auto Scaling • Dynamically allocate resources for batch (non-streaming) applications • Support unified schedulers for heterogeneous applications • Batch, streaming, ad-hoc
  34. THANK YOU! https://github.com/samsung/spark-cep
