Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Building Robust, Adaptive Streaming
Apps with Spark Streaming
Tathagata “TD” Das
@tathadas
Spark Summit East 2016
Who am I?
Project ManagementCommittee (PMC) member of Spark
Started Spark Streaming in grad school - AMPLab, UC Berkeley
C...
Streaming Apps in the Real World
Building a high volume stream processingsystem in
production has many challenges
3
Fast a...
Streaming Apps in the Real World
Building a high volume stream processingsystem in
production has many challenges
4
Fast a...
Streaming Apps in the Real World
Building a high volume stream processingsystem in
production has many challenges
5
Fast a...
Streaming Apps in the Real World
Building a high volume stream processingsystem in
production has many challenges
6
Fast a...
Adaptive Streaming Apps
Processing conditions can changedynamically
Sudden surges in data rates
Diurnal variationsin proce...
8
Backpressure
Make apps robust
againstdata surges
Elastic Scaling
Make apps scale with
load variations
9
Backpressure
Make apps robust
againstdata surges
Motivation
Stability condition for any streaming app
Receive data only as fast as the system can processit
Stability condi...
Stable micro-batch operation
11
batch interval
1s 1s 1s 1s
batch processing time <=
Previousbatch isprocessed before next ...
Unstable micro-batch operation
12
batch processing time > batch interval
2.1s 2.1s 2.1s 2.1s
scheduling delay builds up
Ba...
Backpressure: Feedback Loop
13
Backpressureintroducesa feedback loop to dynamically
adapt the system and avoid instability
Backpressure: Dynamic rate limiting
14
Batch processingtimes
Scheduling delays
Batch processingtimes and schedulingdelays ...
Backpressure: Dynamic rate limiting
15
∑
Batch processingtimes
Scheduling delays
Max stable processingrate estimated with ...
Backpressure: Dynamic rate limiting
16
∑
Batch processingtimes
Scheduling delays
Rate limits on
ingestion
Accordingly, the...
Data buffered in Kafka, ensures
Spark Streaming stays stable
Backpressure: Dynamic rate limiting
17
If HDFS ingestion slow...
Backpressure: Configuration
Available since Spark 1.5
Enabled through SparkConf,set
spark.streaming.backpressure.enabled=t...
19
Elastic Scaling
Make apps scale with
load variations
Elastic Scaling (aka Dynamic Allocation)
Scaling the number of Spark executorsaccording to the load
Spark alreadysupports ...
Scaling policy with Streaming
21
Load is very low
Lots idle time between batches
Cluster resources wasted
stable but
ineff...
Scaling policy with Streaming
22
scale down
the cluster
scaling down increases batch processing times
stable operation wit...
Scaling policy with Streaming
23
scaling up reduces batch processing times
unstable system becomes stable again
scale up
t...
Elastic Scaling
24
Data buffered in Kafka starts draining,
allowing app adapt to any data rate
If Kafka gets data faster t...
Elastic Scaling: Configuration
Will be available in Spark 2.0
Enabled through SparkConf,set
spark.streaming.dynamicAllocat...
Elastic Scaling: Configuration
Make surethere is enoughparallelismto take advantage of
max cluster size
# of partitions in...
27
Backpressure Elastic Scaling
28
Awesome
Adaptive
Apps
29
Demo
How app behaveswhen
the data rate suddenly
increases20x
Demo
30
Processing time increases
with data rate, until equal to
batch interval
Backpressurelimits the
ingestion rate lowe...
Demo
31
Elastic Scaling detects heavy
load and increasescluster size
Processing times reducesas
more resourcesavailable
Demo
32
Backpressurerelaxeslimits to
allow higheringestion rate
But still less than 20x as
cluster is fullyutilized
Takeaways
Backpressure:makes apps robust to sudden changes
Elastic Scaling: makes apps adapt to slower changes
Backpressur...
Upcoming SlideShare
Loading in …5
×

Building Robust, Adaptive Streaming Apps with Spark Streaming

3,897 views

Published on

As the adoption of Spark Streaming increases rapidly, the community has been asking for greater robustness and scalability from Spark Streaming applications in a wider range of operating environments. To fulfill these demands, we have steadily added a number of features in Spark Streaming. We have added backpressure mechanisms which allows Spark Streaming to dynamically adapt to changes in incoming data rates, and maintain stability of the application. In addition, we are extending Spark’s Dynamic Allocation to Spark Streaming, so that streaming applications can elastically scale based on processing requirements. In my talk, I am going to explore these mechanisms and explain how developers can write robust, scalable and adaptive streaming applications using them. Presented by Tathagata "TD" Das from Databricks.

Published in: Engineering
  • Dating for everyone is here: ♥♥♥ http://bit.ly/2ZDZFYj ♥♥♥
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Dating direct: ❤❤❤ http://bit.ly/2ZDZFYj ❤❤❤
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Get Paid To Manage Facebook Fan Pages! Facebook Fan Page Workers Required - Start Immediately. ▲▲▲ https://tinyurl.com/rbrfd6j
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Building Robust, Adaptive Streaming Apps with Spark Streaming

  1. 1. Building Robust, Adaptive Streaming Apps with Spark Streaming Tathagata “TD” Das @tathadas Spark Summit East 2016
  2. 2. Who am I? Project ManagementCommittee (PMC) member of Spark Started Spark Streaming in grad school - AMPLab, UC Berkeley Currenttechnical lead of Spark Streaming Software engineerat Databricks 2
  3. 3. Streaming Apps in the Real World Building a high volume stream processingsystem in production has many challenges 3 Fast and Scalable Spark Streaming is fast, distributed and scalableby design! Running in production at many places with largeclusters and high data volumes
  4. 4. Streaming Apps in the Real World Building a high volume stream processingsystem in production has many challenges 4 Fast and Scalable Easy to program Spark Streaming makes it easy to express complex streaming businesslogic Interoperateswith Spark RDDs, Spark SQL DataFrames/Datasets and MLlib
  5. 5. Streaming Apps in the Real World Building a high volume stream processingsystem in production has many challenges 5 Fast and Scalable Easy to program Fault-tolerant Spark Streaming is fullyfault-tolerant and can provide end-to-end semantic guarantees See my previous Spark Summit talk for moredetails
  6. 6. Streaming Apps in the Real World Building a high volume stream processingsystem in production has many challenges 6 Fast and Scalable Easy to program Fault-tolerant Adaptive Focus of this talk
  7. 7. Adaptive Streaming Apps Processing conditions can changedynamically Sudden surges in data rates Diurnal variationsin processing load Unexpected slowdownsin downstreamdata stores Streaming apps should be able to adapt accordingly 7
  8. 8. 8 Backpressure Make apps robust againstdata surges Elastic Scaling Make apps scale with load variations
  9. 9. 9 Backpressure Make apps robust againstdata surges
  10. 10. Motivation Stability condition for any streaming app Receive data only as fast as the system can processit Stability condition for Spark Streaming’s “micro-batch” model Finish processing previous batch before next one arrives 10
  11. 11. Stable micro-batch operation 11 batch interval 1s 1s 1s 1s batch processing time <= Previousbatch isprocessed before next one arrives => stable Spark Streaming runs micro-batchesat fixed batchintervals time 0s 2s 4s 6s 8s
  12. 12. Unstable micro-batch operation 12 batch processing time > batch interval 2.1s 2.1s 2.1s 2.1s scheduling delay builds up Batchescontinuously getsdelayed and backlogged => unstable time Spark Streaming runs micro-batchesat fixed batchintervals 0s 2s 4s 6s 8s
  13. 13. Backpressure: Feedback Loop 13 Backpressureintroducesa feedback loop to dynamically adapt the system and avoid instability
  14. 14. Backpressure: Dynamic rate limiting 14 Batch processingtimes Scheduling delays Batch processingtimes and schedulingdelays used to continuouslyestimate currentprocessing rates
  15. 15. Backpressure: Dynamic rate limiting 15 ∑ Batch processingtimes Scheduling delays Max stable processingrate estimated with PID Controllertheory Well known feedbackmechanismused in industrial control systems PID Controller
  16. 16. Backpressure: Dynamic rate limiting 16 ∑ Batch processingtimes Scheduling delays Rate limits on ingestion Accordingly, the system dynamically adapts the limits on the data ingestion rates PID Controller
  17. 17. Data buffered in Kafka, ensures Spark Streaming stays stable Backpressure: Dynamic rate limiting 17 If HDFS ingestion slows down, processingtimes increase SS lowers rate limits to slow down receiving from Kafka
  18. 18. Backpressure: Configuration Available since Spark 1.5 Enabled through SparkConf,set spark.streaming.backpressure.enabled=true 18
  19. 19. 19 Elastic Scaling Make apps scale with load variations
  20. 20. Elastic Scaling (aka Dynamic Allocation) Scaling the number of Spark executorsaccording to the load Spark alreadysupports Dynamic Allocation for batch jobs Scale down if executorsare idle Scale up if tasks are queueing up Streaming “micro-batch” jobs needdifferent scaling policy No executoris idle for a long time! 20
  21. 21. Scaling policy with Streaming 21 Load is very low Lots idle time between batches Cluster resources wasted stable but inefficient
  22. 22. Scaling policy with Streaming 22 scale down the cluster scaling down increases batch processing times stable operation with less resources wasted stable but inefficient stable and efficient
  23. 23. Scaling policy with Streaming 23 scaling up reduces batch processing times unstable system becomes stable again scale up the cluster unstable stable
  24. 24. Elastic Scaling 24 Data buffered in Kafka starts draining, allowing app adapt to any data rate If Kafka gets data faster than what backpressureallows SS scalesup clusterto increaseprocessing rate
  25. 25. Elastic Scaling: Configuration Will be available in Spark 2.0 Enabled through SparkConf,set spark.streaming.dynamicAllocation.enabled = true More parameters will be in the online programming guide 25
  26. 26. Elastic Scaling: Configuration Make surethere is enoughparallelismto take advantage of max cluster size # of partitions in reduce,join, etc. # of Kafka partitions # of receivers Gives usualfault-tolerance guaranteeswith files, Kafka Direct, Kinesis, and receiver-basedsourceswith WAL enabled 26
  27. 27. 27 Backpressure Elastic Scaling
  28. 28. 28 Awesome Adaptive Apps
  29. 29. 29 Demo How app behaveswhen the data rate suddenly increases20x
  30. 30. Demo 30 Processing time increases with data rate, until equal to batch interval Backpressurelimits the ingestion rate lower than 20k recs/secto keep app stable
  31. 31. Demo 31 Elastic Scaling detects heavy load and increasescluster size Processing times reducesas more resourcesavailable
  32. 32. Demo 32 Backpressurerelaxeslimits to allow higheringestion rate But still less than 20x as cluster is fullyutilized
  33. 33. Takeaways Backpressure:makes apps robust to sudden changes Elastic Scaling: makes apps adapt to slower changes Backpressure+ Elastic Scaling = AwesomeAdaptive Apps Follow me @tathadas

×