Spark Meetup: DataScience@Concur - Reacting to real-time events to control throughput
1. Task Flow Rate Management
via Spark Streaming
Steve Hastings & Anikate Singh
DataScience@Concur
2. Problem Domain
• Dynamically flag transactions while controlling their throughput based on various parameters
• Decouple the task creation process from the task flagging process
• Real-time visibility into task flow rates and the ability to run analytics on them
(Diagram: Task Queue, Flag Tasks, Service Center)
5. Spark Streaming Agenda
1. Flagging Strategies
– Random Sampling (dynamic sampling rates)
– Transaction analysis (dynamic analysis)
2. Maintaining State
– In Spark or elsewhere
3. Getting data in/out
6. Flagging - Random Sampling
• Random txn flagging
– Each txn individually sampled
• Two types of rate change
– "Panic Button" - instant drop in rate
– Re-equilibrate - slow rise in rate
Rate Change Formulae
R_0 = T
R_{t+1} = R_t + 0.25 (T - R_t)   (re-equilibrate)
or
R_{t+1} = 0.9 R_t   (PANIC)
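A minimal Scala sketch of the two rate updates and the per-transaction sampling decision follows. It assumes R is a sampling probability between 0 and 1 and T is the target rate. The names RateControl, nextRate and shouldFlag are hypothetical and not taken from the talk; only the constants 0.25 and 0.9 come from the formulae above.

```scala
object RateControl {
  // Constants taken from the slide's formulae.
  val Gain = 0.25       // fraction of the gap to the target closed per step
  val PanicDecay = 0.9  // multiplier applied on each panic step

  /** One rate update: decay on panic, otherwise move toward the target. */
  def nextRate(current: Double, target: Double, panic: Boolean): Double =
    if (panic) PanicDecay * current
    else current + Gain * (target - current)

  /** Each transaction is sampled individually at the current rate. */
  def shouldFlag(rate: Double, rng: scala.util.Random): Boolean =
    rng.nextDouble() < rate
}
```

Starting from a rate of 0.1 with a target of 0.2, successive non-panic steps give 0.125, then 0.14375, approaching the target without overshooting it.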
7. Flagging - Transaction Analysis
• Example: outliers
– Flag txns with some value > 95th percentile
– But with streaming updates to percentiles
• Using Twitter Algebird QTree
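Streaming percentile updates rely on a summary that can be merged across batches. The sketch below is not Algebird's QTree. It swaps in a plain fixed-width histogram (hypothetical names Hist, add, merge, quantileUpperBound) purely to illustrate the merge-then-query shape with no external dependency. It assumes non-negative values and a known bucket width.

```scala
// Stand-in for a mergeable quantile summary (NOT Algebird's QTree):
// a fixed-width histogram whose last bucket absorbs any overflow.
final case class Hist(counts: Vector[Long], bucketWidth: Double) {
  /** Add one observation, clamping it into the available buckets. */
  def add(x: Double): Hist = {
    val i = math.min(counts.length - 1, math.max(0, (x / bucketWidth).toInt))
    copy(counts = counts.updated(i, counts(i) + 1))
  }

  /** Merge two summaries built on different batches by adding counts. */
  def merge(other: Hist): Hist = {
    require(other.bucketWidth == bucketWidth && other.counts.length == counts.length)
    copy(counts = counts.zip(other.counts).map { case (a, b) => a + b })
  }

  /** Upper edge of the bucket holding the p-quantile (an approximation).
    * Returns +Infinity for an empty summary so that nothing is flagged. */
  def quantileUpperBound(p: Double): Double = {
    val total = counts.sum
    if (total == 0L) Double.PositiveInfinity
    else {
      val needed = math.ceil(p * total).toLong
      val cumulative = counts.scanLeft(0L)(_ + _).tail
      (cumulative.indexWhere(_ >= needed) + 1) * bucketWidth
    }
  }
}

object Hist {
  def empty(buckets: Int, bucketWidth: Double): Hist =
    Hist(Vector.fill(buckets)(0L), bucketWidth)
}
```

A transaction would then be flagged when its value exceeds summary.quantileUpperBound(0.95), with the summary updated by merging in each new batch. A QTree plays the same role with tighter error bounds and no need to fix the value range in advance.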
9. Maintaining State (cont)
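def updateState(values: Seq[(Int, Int)],
                state: Option[Double]): Option[Double] = {
  // values: this batch's records for the key; state: previous state (None for a new key)
  val newState: Double = state match {
    case Some(old) => old // placeholder: fold values into the old state here
    case None => 0.0 // placeholder: build the initial state from values here
  }
  Some(newState)
}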
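One way the placeholders might be filled in is sketched below. It assumes, hypothetically, that each value is a (flagged, total) count pair and that the state is a smoothed observed flag rate. The smoothing weight Alpha and the object name FlagRateState are illustrative choices, not taken from the talk. A function with this signature is what Spark Streaming's updateStateByKey expects on a pair DStream, and that operation requires checkpointing to be enabled.

```scala
object FlagRateState {
  // Hypothetical smoothing weight for blending a batch into the state.
  val Alpha = 0.25

  /** values: assumed (flagged, total) counts for one key in this batch.
    * state: previous smoothed flag rate, None the first time a key is seen. */
  def updateState(values: Seq[(Int, Int)],
                  state: Option[Double]): Option[Double] = {
    val flagged = values.map(_._1).sum
    val total = values.map(_._2).sum
    if (total == 0) state // no new data this batch: keep the previous state
    else {
      val batchRate = flagged.toDouble / total
      val newState = state match {
        case Some(old) => old + Alpha * (batchRate - old) // blend into old state
        case None => batchRate                            // first observation
      }
      Some(newState)
    }
  }
}
```

Returning None from such a function tells Spark to drop the state for that key, which is why the no-data branch returns the previous state unchanged instead.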
10. Getting Data In/Out
• Kafka Input
– Fairly easy to get up and going
• Kafka (or any) output
– Create your own connections
– Watch where your code is running
(Diagram: Source, Spark Streaming with three Executors, Destination)
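The "create your own connections" and "watch where your code is running" points can be sketched as follows. Connections generally cannot be serialized from the driver, so a common pattern is to open one per partition on the executor. The Sink trait and PartitionWriter below are hypothetical stand-ins for a Kafka producer or any other client, written without Spark dependencies so the logic can be run on its own.

```scala
// Hypothetical sink interface standing in for a Kafka producer or other client.
trait Sink {
  def send(record: String): Unit
  def close(): Unit
}

object PartitionWriter {
  /** Sends one partition's records through a connection that is opened
    * where this function runs. Returns the number of records sent. */
  def writePartition(records: Iterator[String], connect: () => Sink): Int =
    if (!records.hasNext) 0 // empty partition: do not open a connection
    else {
      val sink = connect()
      try {
        var sent = 0
        records.foreach { r => sink.send(r); sent += 1 }
        sent
      } finally sink.close() // always release the connection
    }
}

// In a Spark Streaming job this might be wired up roughly as:
//   stream.foreachRDD { rdd =>
//     rdd.foreachPartition { it => PartitionWriter.writePartition(it, connect) }
//   }
// The body of foreachRDD runs on the driver, while the body of
// foreachPartition runs on the executors, so connect() is called there.
```

Opening one connection per partition, not one per record, keeps connection overhead bounded. A pooled or lazily created connection per executor is a further refinement of the same idea.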