
Marton Balassi – Stateful Stream Processing


Flink Forward 2015


  1. Stateful Stream Processing
     Márton Balassi, mbalassi@apache.org
     Gábor Hermann, ghermann@ilab.sztaki.hu
  2. This talk
     • Stateful stream processing by example
     • Open source stream processors
     • Runtime architecture
     • Fault tolerance
     • Stateful processing
     • Closing
     Flink Forward, 2015-10-13
  3. Stateful stream processing by example
  4. Streaming applications
     • ETL-style operations
         • Filter incoming data, log analysis
         • No or minimal state
     • Window aggregations
         • Trending tweets, user sessions, stream joins
         • State: window buffer
     [Figure labels: Input, Process/Enrich]
  5. Streaming applications
     • Machine learning
         • Fitting trends to the evolving stream, stream clustering
         • State: model
     • Pattern recognition
         • Fraud detection, triggering signals based on activity
         • State: finite state machine
  6. State in streaming programs
     • Almost all non-trivial streaming programs are stateful
     • Stateful operators (in essence): f: (in, state) ⟶ (out, state′)
     • State persists between records and can be read and modified as the stream evolves
     • Goal: get as close as possible to this model while maintaining scalability and fault tolerance
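The operator signature on the slide above can be made concrete with a small sketch. The class and method names (`StatefulOperatorSketch`, `Step`, `run`, `runningSum`) are hypothetical and belong to no framework; the point is only that each call receives the previous state and returns the next one together with its output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Plain-Java illustration with hypothetical names; not any framework's API.
final class StatefulOperatorSketch {

    /** Result of one step: the emitted output plus the updated state. */
    static final class Step<O, S> {
        final O out;
        final S state;

        Step(O out, S state) {
            this.out = out;
            this.state = state;
        }
    }

    /** Feeds every input through f, threading the state from one call to the next. */
    static <I, O, S> List<O> run(List<I> inputs, S initial, BiFunction<I, S, Step<O, S>> f) {
        List<O> outputs = new ArrayList<>();
        S state = initial;
        for (I in : inputs) {
            Step<O, S> step = f.apply(in, state);
            outputs.add(step.out);
            state = step.state;
        }
        return outputs;
    }

    /** Running sum: the output and the new state are both old state plus input. */
    static List<Integer> runningSum(List<Integer> inputs) {
        return run(inputs, 0, (in, sum) -> new Step<Integer, Integer>(sum + in, sum + in));
    }

    public static void main(String[] args) {
        System.out.println(runningSum(java.util.Arrays.asList(1, 2, 3)));
    }
}
```

In a real stream processor the loop never ends and the state has to survive failures, which is what the fault tolerance part of the talk is about.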
  7. Open source stream processors
  8. Apache streaming landscape
  9. Apache Storm
     • Started in 2010, development driven by BackType, then Twitter
     • Pioneer in large-scale stream processing
     • Distributed dataflow abstraction (spouts and bolts)
  10. Apache Flink
      • Started in 2008 as a research project (Stratosphere) at European universities
      • Unique combination of low-latency streaming and high-throughput batch analysis
      • Flexible operator states and windowing
      [Figure labels: Stream data (Kafka, RabbitMQ, ...), Batch data (HDFS, JDBC, ...)]
  11. Apache Samza
      • Developed at LinkedIn, open sourced in 2013
      • Builds heavily on Kafka’s log-based philosophy
      • Pluggable messaging system and execution backend
  12. Apache Spark
      • Started in 2009 at UC Berkeley, Apache since 2013
      • Very strong community, wide adoption
      • Unified batch and stream processing over a batch runtime
      • Good integration with batch programs
  13. Runtime and programming model
  14. Native streaming
  15. Distributed dataflow runtime
      • Storm, Samza and Flink
      • General properties
          • Long-standing operators
          • Pipelined execution
          • Usually possible to create cyclic flows
      • Pros
          • Full expressivity
          • Low-latency execution
          • Stateful operators
      • Cons
          • Fault tolerance is hard
          • Throughput may suffer
          • Load balancing is an issue
  16. Micro-batching
  17. Micro-batch runtime
      • Implemented by Apache Spark
      • General properties
          • Computation broken down into time intervals
          • Load-aware scheduling
          • Easy interaction with batch
      • Pros
          • Easy to reason about
          • High throughput
          • Fault tolerance comes for “free”
          • Dynamic load balancing
      • Cons
          • Latency depends on batch size
          • Limited expressivity
          • Stateless by nature
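To illustrate how breaking computation into time intervals ties latency to the batch size, here is a minimal sketch. It is plain Java with hypothetical names (`MicroBatchSketch`, `Event`, `toBatches`, `earliestProcessingTimeMs`) and does not reflect how Spark schedules its batches internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration with hypothetical names; not Spark's API.
final class MicroBatchSketch {

    /** An input record with an arrival time in milliseconds. */
    static final class Event {
        final long timestampMs;
        final String value;

        Event(long timestampMs, String value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** Groups events into consecutive intervals of batchMs; the key is the interval index. */
    static Map<Long, List<String>> toBatches(List<Event> events, long batchMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long index = e.timestampMs / batchMs;
            batches.computeIfAbsent(index, k -> new ArrayList<>()).add(e.value);
        }
        return batches;
    }

    /** A batch can only be processed once its interval has closed. */
    static long earliestProcessingTimeMs(long batchIndex, long batchMs) {
        return (batchIndex + 1) * batchMs;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(100, "a"), new Event(4900, "b"), new Event(5000, "c"));
        System.out.println(toBatches(events, 5000));
        System.out.println(earliestProcessingTimeMs(0, 5000));
    }
}
```

Under these assumptions an event arriving at 100 ms in a 5-second batch waits until at least 5000 ms before any processing starts, whereas a pipelined runtime could handle it on arrival.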
  18. Fault tolerance
  19. Fault tolerance intro
      • Fault tolerance in streaming systems is inherently harder than in batch
          • Can’t just restart computation
          • State is a problem
          • Fast recovery is crucial
          • Streaming topologies run 24/7 for a long period
      • Fault tolerance is a complex issue
          • No single point of failure is allowed
          • Guaranteeing input processing
          • Consistent operator state
          • Fast recovery
          • At-least-once vs exactly-once semantics
  20. Storm record acknowledgements
      • Track the lineage of tuples as they are processed (anchors and acks)
      • Special “acker” bolts track each lineage DAG (efficient XOR-based algorithm)
      • Replay the root of failed (or timed-out) tuples
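The XOR trick behind the acker can be sketched in a few lines. This is a simplified illustration with hypothetical names (`XorAckerSketch`, `emitRoot`, `ack`), not Storm's implementation: it assumes every tuple id is a random non-zero 64-bit value (small constants are used below only for readability) and omits timeouts and replay.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration with hypothetical names; not Storm's acker code.
final class XorAckerSketch {

    // Root id -> XOR of every tuple id reported so far for that tuple tree.
    private final Map<Long, Long> pending = new HashMap<>();

    /** The spout registers the id of the root tuple when it emits it. */
    void emitRoot(long rootId, long tupleId) {
        pending.put(rootId, tupleId);
    }

    /**
     * A bolt acks one input tuple and reports the ids of the tuples it anchored to it.
     * Each id is XORed in once when created and once when acked, so the value
     * returns to zero exactly when the whole tree has been processed.
     */
    boolean ack(long rootId, long ackedId, long... childIds) {
        long update = ackedId;
        for (long child : childIds) {
            update ^= child;
        }
        long value = pending.getOrDefault(rootId, 0L) ^ update;
        if (value == 0L) {
            pending.remove(rootId);
            return true; // tuple tree fully processed
        }
        pending.put(rootId, value);
        return false;
    }

    boolean isPending(long rootId) {
        return pending.containsKey(rootId);
    }

    public static void main(String[] args) {
        XorAckerSketch acker = new XorAckerSketch();
        acker.emitRoot(1L, 0xAL);
        System.out.println(acker.ack(1L, 0xAL, 0xBL, 0xCL)); // two children still open
        System.out.println(acker.ack(1L, 0xBL));
        System.out.println(acker.ack(1L, 0xCL));
    }
}
```

The appeal of this design is that the acker keeps a single 64-bit value per root tuple no matter how large the lineage DAG grows.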
  21. Samza offset tracking
      • Exploits the properties of a durable, offset-based messaging layer
      • Each task maintains its current offset, which moves forward as it processes elements
      • The offset is checkpointed and restored on failure (some messages might be repeated)
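The following sketch shows why restoring a checkpointed offset can repeat messages. All names (`OffsetCheckpointSketch`, `processNext`, `checkpoint`, `failAndRestore`) are hypothetical, and an in-memory list stands in for the durable, offset-based log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration with hypothetical names; not Samza's API.
final class OffsetCheckpointSketch {

    private final List<String> log;       // stand-in for the durable, offset-addressable input
    private int checkpointedOffset = 0;   // last durably saved position
    private int currentOffset = 0;        // in-memory position, lost on failure
    final List<String> processed = new ArrayList<>();

    OffsetCheckpointSketch(List<String> log) {
        this.log = log;
    }

    /** Processes the message at the current offset and moves forward. */
    void processNext() {
        processed.add(log.get(currentOffset));
        currentOffset++;
    }

    /** Durably records how far the task has read. */
    void checkpoint() {
        checkpointedOffset = currentOffset;
    }

    /** Simulates a crash: progress since the last checkpoint is forgotten. */
    void failAndRestore() {
        currentOffset = checkpointedOffset;
    }

    public static void main(String[] args) {
        OffsetCheckpointSketch task = new OffsetCheckpointSketch(Arrays.asList("a", "b", "c"));
        task.processNext();
        task.checkpoint();
        task.processNext();
        task.failAndRestore();
        task.processNext();
        task.processNext();
        System.out.println(task.processed);
    }
}
```

The message read after the last checkpoint is processed twice, which is the at-least-once behaviour the slide describes.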
  22. Flink checkpointing
      • Based on consistent global snapshots
      • Algorithm designed for stateful dataflows (minimal runtime overhead)
      • Exactly-once semantics
  23. Spark RDD recomputation
      • Immutable data model with repeatable computation
      • Failed RDDs are recomputed using their lineage
      • Checkpoint RDDs to reduce lineage length
      • Parallel recovery of failed RDDs
      • Exactly-once semantics
  24. Stateful stream processing
  25. State in streaming programs
      • Almost all non-trivial streaming programs are stateful
      • Stateful operators (in essence): f: (in, state) ⟶ (out, state′)
      • State persists between records and can be read and modified as the stream evolves
      • Goal: get as close as possible to this model while maintaining scalability and fault tolerance
  26. Storm (Trident)
      • States available only in the Trident API
      • Dedicated operators for state updates and queries
      • State access methods
          • stateQuery(…)
          • partitionPersist(…)
          • persistentAggregate(…)
      • It’s very difficult to implement transactional states
      • Exactly-once guarantee
  27. Storm Trident Word Count
  28. Spark Streaming
      • Stateless runtime by design
          • No continuous operators
          • UDFs are assumed to be stateless
      • State can be generated as a separate stream of RDDs: updateStateByKey(…)
        f: (Seq[in_k], state_k) ⟶ state′_k
      • f is scoped to a specific key
      • Exactly-once semantics
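The per-key update function above can be illustrated outside Spark. The sketch below is plain Java with hypothetical names (`UpdateStateByKeySketch`, `updateBatch`, `countWords`); it assumes the update is invoked once per batch for every key that has either old state or new values, and that returning an empty `Optional` drops the key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;

// Plain-Java illustration with hypothetical names; not Spark's API.
final class UpdateStateByKeySketch {

    /** Applies f once per key: (values of this batch, old state) -> new state. */
    static <K, V, S> Map<K, S> updateBatch(
            Map<K, S> oldState,
            List<Map.Entry<K, V>> batch,
            BiFunction<List<V>, Optional<S>, Optional<S>> f) {
        // Keys with old state but no new values still get an (empty) value list.
        Map<K, List<V>> grouped = new HashMap<>();
        for (K key : oldState.keySet()) {
            grouped.put(key, new ArrayList<>());
        }
        for (Map.Entry<K, V> e : batch) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // The result is a new map; the old state is never mutated.
        Map<K, S> newState = new HashMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            Optional<S> previous = Optional.ofNullable(oldState.get(e.getKey()));
            f.apply(e.getValue(), previous).ifPresent(s -> newState.put(e.getKey(), s));
        }
        return newState;
    }

    /** Word count expressed as a per-key state update over one batch of words. */
    static Map<String, Integer> countWords(Map<String, Integer> oldCounts, List<String> words) {
        List<Map.Entry<String, Integer>> batch = new ArrayList<>();
        for (String w : words) {
            batch.add(new AbstractMap.SimpleEntry<String, Integer>(w, 1));
        }
        return updateBatch(oldCounts, batch, (values, state) -> {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            return Optional.of(sum + state.orElse(0));
        });
    }

    public static void main(String[] args) {
        Map<String, Integer> first = countWords(new HashMap<>(), java.util.Arrays.asList("a", "b", "a"));
        System.out.println(countWords(first, java.util.Arrays.asList("b")));
    }
}
```

Because each batch produces a fresh state map rather than mutating the old one, the state itself behaves like another immutable dataset.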
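  29. Spark Streaming Word Count

          import org.apache.spark.HashPartitioner

          // ssc, wordDstream and initialRDD are assumed to be defined elsewhere.
          // Per-key update: add this batch's counts to the previous total.
          val updateFunc = (values: Seq[Int], state: Option[Int]) => {
            val currentCount = values.sum
            val previousCount = state.getOrElse(0)
            Some(currentCount + previousCount)
          }

          // Iterator-based wrapper used by this overload (not shown on the slide).
          val newUpdateFunc = (iterator: Iterator[(String, Seq[Int], Option[Int])]) =>
            iterator.flatMap(t => updateFunc(t._2, t._3).map(s => (t._1, s)))

          val stateDstream = wordDstream.updateStateByKey[Int](
            newUpdateFunc,
            new HashPartitioner(ssc.sparkContext.defaultParallelism),
            true,
            initialRDD)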
  30. Samza
      • Stateful dataflow operators (any task can hold state)
      • State changes are stored as a log by Kafka
      • Custom storage engines can be plugged in to the log
      • f is scoped to a specific task
      • At-least-once processing semantics
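A minimal sketch of log-backed state, assuming an in-memory list as a stand-in for the Kafka changelog. The names (`ChangelogStoreSketch`, `put`, `get`) are hypothetical and this is not Samza's `KeyValueStore` implementation: every write goes to the local store and is appended to the log, and a restarted task rebuilds its store by replaying that log.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration with hypothetical names; not Samza's API.
final class ChangelogStoreSketch {

    private final List<Map.Entry<String, Integer>> changelog; // stand-in for the durable log
    private final Map<String, Integer> local = new HashMap<>();

    /** Rebuilds the local store by replaying every change recorded in the log. */
    ChangelogStoreSketch(List<Map.Entry<String, Integer>> changelog) {
        this.changelog = changelog;
        for (Map.Entry<String, Integer> change : changelog) {
            local.put(change.getKey(), change.getValue());
        }
    }

    /** Writes locally and appends the change to the log. */
    void put(String key, int value) {
        local.put(key, value);
        changelog.add(new SimpleImmutableEntry<String, Integer>(key, value));
    }

    /** Reads are served from the local store only. */
    Integer get(String key) {
        return local.get(key);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> log = new ArrayList<>();
        ChangelogStoreSketch task = new ChangelogStoreSketch(log);
        task.put("a", 1);
        task.put("a", 2);
        // A replacement task pointed at the same log sees the latest values.
        ChangelogStoreSketch restored = new ChangelogStoreSketch(log);
        System.out.println(restored.get("a"));
    }
}
```

Reads never touch the log, so state access stays local and fast; only recovery pays the replay cost.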
  31. Samza Word Count

          public class WordCounter implements StreamTask, InitableTask {
            // Some omitted details…
            private KeyValueStore<String, Integer> store;

            public void process(IncomingMessageEnvelope envelope,
                                MessageCollector collector,
                                TaskCoordinator coordinator) {
              // Get the current count
              String word = (String) envelope.getKey();
              Integer count = store.get(word);
              if (count == null) count = 0;
              // Increment, store and send
              count += 1;
              store.put(word, count);
              collector.send(
                  new OutgoingMessageEnvelope(OUTPUT_STREAM, word, count));
            }
          }
  32. Flink 0.9.1
      • Stateful dataflow operators (conceptually similar to Samza)
      • Two state access patterns
          • Local (task) state
          • Partitioned (key) state
      • Proper API integration
          • Java: OperatorState interface
          • Scala: mapWithState, flatMapWithState…
      • Exactly-once semantics by checkpointing
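The difference between the two state access patterns can be sketched as follows. This is plain Java with hypothetical names (`StateScopesSketch`, `Task`, `taskFor`), not Flink's `OperatorState` API: local state is one value per parallel task, while partitioned state is one value per key, and hash routing guarantees that all records of a key reach the same task.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration with hypothetical names; not Flink's API.
final class StateScopesSketch {

    /** One parallel instance of an operator. */
    static final class Task {
        long localCount = 0;                                      // local (task) state
        final Map<String, Integer> keyedCounts = new HashMap<>(); // partitioned (key) state

        void process(String key) {
            localCount++;                            // one value for the whole task
            keyedCounts.merge(key, 1, Integer::sum); // one value per key
        }
    }

    private final Task[] tasks;

    StateScopesSketch(int parallelism) {
        tasks = new Task[parallelism];
        for (int i = 0; i < parallelism; i++) {
            tasks[i] = new Task();
        }
    }

    /** Routes by key hash so every record with the same key reaches the same task. */
    Task taskFor(String key) {
        return tasks[Math.floorMod(key.hashCode(), tasks.length)];
    }

    void process(String key) {
        taskFor(key).process(key);
    }

    public static void main(String[] args) {
        StateScopesSketch operator = new StateScopesSketch(2);
        for (String word : new String[] {"a", "b", "a"}) {
            operator.process(word);
        }
        System.out.println(operator.taskFor("a").keyedCounts);
        System.out.println(operator.taskFor("a").localCount);
    }
}
```

Key-scoped state can be redistributed along with its keys, whereas task-scoped state is tied to one parallel instance.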
  33. Flink Word Count

          words.keyBy(x => x).mapWithState {
            (word, count: Option[Int]) => {
              val newCount = count.getOrElse(0) + 1
              val output = (word, newCount)
              (output, Some(newCount))
            }
          }
  34. Local State Example (Java)

          public class MySource extends RichParallelSourceFunction {
            // Omitted details
            private OperatorState<Long> offset;

            @Override
            public void run(SourceContext ctx) {
              Object checkpointLock = ctx.getCheckpointLock();
              isRunning = true;
              while (isRunning) {
                // Update state under the checkpoint lock
                synchronized (checkpointLock) {
                  offset.update(offset.value() + 1);
                  // ctx.collect(next);
                }
              }
            }
          }
  35. Flink 0.10
      • Internal operators are checkpointed:
          • Aggregations
          • Window operators
          • …
      • Key-value state
          • Easing common access patterns
      • Flexible state backend interface
      • Removes non-partitioned operator state
  36. Performance (Fault tolerance)
  37. Performance (Statefulness)
      [Figure parameters: CheckpointInterval: 5 sec, BatchDuration: 5 sec]
  38. Closing
  39. Summary
      • Storm (Trident)
          + Consistent state accessible from outside
          – Only works well with idempotent states
          – States are not part of the operators
      • Spark
          + Integrates well with the system guarantees
          – Limited expressivity
          – Immutability increases update complexity
      • Samza
          + Efficient log-based state updates
          + States are well integrated with the operators
          – Lack of exactly-once semantics
          – State access is not fully transparent
      • Flink
          + Lightweight, exactly-once distributed snapshots
          + Flexible checkpointing and state backend interfaces
          – Has to coordinate with a persistent source
          – Internal operators not checkpointed in 0.9.1
  40. Thank you!
  41. List of Figures (in order of usage)
      • https://upload.wikimedia.org/wikipedia/commons/thumb/2/2a/CPT-FSM-abcd.svg/326px-CPT-FSM-abcd.svg.png
      • https://storm.apache.org/images/topology.png
      • https://databricks.com/wp-content/uploads/2015/07/image11-1024x655.png
      • https://databricks.com/wp-content/uploads/2015/07/image21-1024x734.png
      • https://people.csail.mit.edu/matei/papers/2012/hotcloud_spark_streaming.pdf, page 2
      • http://www.slideshare.net/ptgoetz/storm-hadoop-summit2014, pages 69-71
      • http://samza.apache.org/img/0.9/learn/documentation/container/checkpointing.svg
      • https://databricks.com/wp-content/uploads/2015/07/image41-1024x602.png
      • https://storm.apache.org/documentation/images/spout-vs-state.png
      • http://samza.apache.org/img/0.9/learn/documentation/container/stateful_job.png
