
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action


Large-scale data stream processing has come a long way to where it is today. It combines all the essential requirements of modern data analytics: sub-second latency, high throughput and, impressively, strong consistency. Apache Flink is a system that serves as a proof of concept of these characteristics, and it is well known mainly for its lightweight fault tolerance. Data engineers and analysts can now let the system handle terabytes of computational state without worrying about failures that can potentially occur.
This presentation describes the fundamental challenges behind exactly-once processing guarantees in large-scale streaming in a simple and intuitive way. Furthermore, it demonstrates the basic and extended versions of Flink's state-of-the-art snapshotting algorithm, tailored to the needs of a dataflow graph.

Published in: Data & Analytics


  1. 1. Exactly Once, Large-Scale Stream Processing in Action Not Less, Not More Paris Carbone Committer @ Apache Flink PhD Candidate @ KTH @FOSDEM 2017 @SenorCarbone
  2. 2. Data Stream Processors Data Stream Processor can set up any data pipeline for you http://edge.alluremedia.com.au/m/l/2014/10/CoolingPipes.jpg
  3. 3. With Data Stream Processors
  4. 4. • sub-second latency and high throughput finally coexist With Data Stream Processors
  5. 5. • sub-second latency and high throughput finally coexist • late data* is handled gracefully With Data Stream Processors * http://dl.acm.org/citation.cfm?id=2824076
  6. 6. • sub-second latency and high throughput finally coexist • late data* is handled gracefully • and btw data stream pipelines run 365/24/7 consistently without any issues in general. With Data Stream Processors * http://dl.acm.org/citation.cfm?id=2824076
  7. 7. • sub-second latency and high throughput finally coexist • late data* is handled gracefully • and btw data stream pipelines run 365/24/7 consistently without any issues in general. • wait…what? With Data Stream Processors * http://dl.acm.org/citation.cfm?id=2824076
  8. 8. So…what about
  9. 9. So…what about handling failures! application updates reconfiguring/upgrading the system adding more/less workers
  10. 10. So…what about handling failures! application updates reconfiguring/upgrading the system adding more/less workers is it realistic to expect the thing to run forever correctly?
  11. 11. we cannot eliminate entropy
  12. 12. we cannot eliminate entropy but in a fail-recovery model …we can turn back time and try again
  13. 13. Let’s talk about Guarantees
  14. 14. Let’s talk about Guarantees guaranteed tuple processing exactly once! transactional processing processing output idempotent writes end-to-end resilient state high availability deterministic processing delivery at-least once at-most once
  15. 15. What Guarantees
  16. 16. 1) Processing • Output • Delivery • End-to-End 2) system outside world What Guarantees
  17. 17. Processing Guarantees
  18. 18. Processing Guarantees • Why should we care? • Processing creates side effects inside the system’s internal state. • Less or more processing sometimes means incorrect internal state.
  19. 19. Processing Guarantees
  20. 20. Processing Guarantees • At-Most Once: the system might process less (e.g., ignoring data if overloaded) • At-Least Once: the system might process more (e.g., replaying input) • Exactly Once: the system behaves as if input data are processed exactly once
  21. 21. At-Least Once Processing • Useful when repetition can be tolerated. • Already offered by logs (e.g., Kafka, Kinesis) • Manual Logging & Bookkeeping (Storm < v.2)
  22. 22. At-Least Once Processing
  23. 23. At-Least Once Processing 1 1 1 1
  24. 24. At-Least Once Processing 1 1 1
  25. 25. At-Least Once Processing 1 1 1
  26. 26. At-Least Once Processing 12 2 2
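The double-counting in the build-up above can be sketched as a toy simulation (all names are illustrative, not a real stream processor's API): a counting operator crashes before committing its input offset, so the replayable log is re-read from the start and the surviving records leave a second side effect in the state.

```python
# Toy at-least-once simulation: replaying input after a failure
# double-counts records in the operator's state.

def run_with_replay(log, fail_after):
    counts = {}  # operator state: per-key counters
    # First attempt: crash after `fail_after` records, before the
    # input offset is ever committed.
    for record in log[:fail_after]:
        counts[record] = counts.get(record, 0) + 1
    # Recovery: the state survives, but since no offset was committed
    # the whole log is replayed from offset 0.
    for record in log:
        counts[record] = counts.get(record, 0) + 1
    return counts

result = run_with_replay(["a", "b", "a"], fail_after=2)
# "a" and "b" from the first attempt were counted twice: at-least-once.
print(result)  # {'a': 3, 'b': 2}
```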
  27. 27. Exactly Once Processing
  28. 28. Exactly Once Processing It is a bit trickier. We need to make sure that 1. Data leaves side effects only once 2. Failure Recovery/Re-Scaling does not impact the correct execution of the system
  29. 29. Exactly Once Processing A fine-grained solution… Maintain a log for each operation* persistent store * http://dl.acm.org/citation.cfm?id=2536229 1 1 2 1 2 3
  30. 30. Exactly Once Processing A fine-grained solution… Maintain a log for each operation*
  31. 31. Exactly Once Processing A fine-grained solution… Maintain a log for each operation* • It allows for fine-grained failure recovery and trivial reconfiguration. • Can be optimised to batch writes However: • It requires a finely-tuned, performant store • Can cause aggressive write/append congestion
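One way to picture the fine-grained approach (a hedged sketch, not the cited system's actual design): every state update is appended to a durable per-operator log keyed by a sequence number, so a redelivered record whose sequence number is already logged leaves no second side effect. The `store` dict below stands in for the persistent store.

```python
# Sketch of fine-grained exactly-once via a per-operator write log.
# Names are illustrative; a real store would be external and durable.

class LoggedOperator:
    def __init__(self):
        self.state = 0
        self.store = {}  # durable log: sequence number -> applied update

    def process(self, seq, value):
        if seq in self.store:  # duplicate delivery: skip the side effect
            return
        self.state += value
        self.store[seq] = value  # append to the log (batched in practice)

op = LoggedOperator()
for seq, value in [(1, 10), (2, 5), (1, 10)]:  # (1, 10) is redelivered
    op.process(seq, value)
print(op.state)  # 15, not 25: the duplicate left no side effect
```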
  32. 32. Remember cassettes?
  33. 33. Remember cassettes?
  34. 34. Remember cassettes?
  35. 35. Remember cassettes? durable logs similarly allow you to rollback input
  36. 36. Remember cassettes? durable logs similarly allow you to rollback input
  37. 37. Remember cassettes? durable logs similarly allow you to rollback input in parallel
  38. 38. Remember cassettes? durable logs similarly allow you to rollback input in parallel from specific offsets
  39. 39. Remember cassettes? durable logs similarly allow you to rollback input in parallel from specific offsets
  40. 40. Remember cassettes? durable logs similarly allow you to rollback input in parallel from specific offsets
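The cassette analogy can be made concrete with a small sketch (illustrative only, not the Kafka/Kinesis API): each parallel partition of a durable log rewinds independently to the offset recorded at the last snapshot.

```python
# Durable logs let each parallel consumer rewind to a stored offset,
# like a cassette. Hypothetical partition layout for illustration.

log_partitions = {
    0: ["a0", "a1", "a2", "a3"],
    1: ["b0", "b1", "b2"],
}
committed = {0: 2, 1: 1}  # per-partition offsets from the last snapshot

def resume(partitions, offsets):
    # Each partition rolls back independently to its own offset.
    return {p: records[offsets[p]:] for p, records in partitions.items()}

print(resume(log_partitions, committed))
# {0: ['a2', 'a3'], 1: ['b1', 'b2']}
```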
  41. 41. Exactly Once Processing Now a more coarse-grained approach… Turn continuous computation into a series of transactions
  42. 42. Exactly Once Processing Now a more coarse-grained approach… Turn continuous computation into a series of transactions part 1
  43. 43. Exactly Once Processing Now a more coarse-grained approach… Turn continuous computation into a series of transactions part 1 part 2
  44. 44. Exactly Once Processing Now a more coarse-grained approach… Turn continuous computation into a series of transactions part 1 part 2 part 3 part 4
  45. 45. Exactly Once Processing Now a more coarse-grained approach… Turn continuous computation into a series of transactions part 1 part 2 part 3 part 4 each part either completes or repeats
  46. 46. Exactly Once Processing Now a more coarse-grained approach… Turn continuous computation into a series of transactions part 1 part 2 part 3 part 4 each part either completes or repeats also got to capture the global state of the system after processing each part to resume completely if needed
  47. 47. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) System State Store snap Snap after each part has been processed
  48. 48. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) prepare part 1 System State Store snap Snap after each part has been processed
  49. 49. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) prepare part 1 System State Store snap Snap after each part has been processed
  50. 50. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) prepare part 1 System State Store snap Snap after each part has been processed
  51. 51. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) prepare part 2 System State Store snap Snap after each part has been processed
  52. 52. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) prepare part 2 System State Store snap Snap after each part has been processed
  53. 53. Coarse Grained Fault Tolerance - Illustrated Type 1 Discrete Execution (micro-batch) prepare part 2 System State Store snap Snap after each part has been processed
  54. 54. Exactly Once Processing *http://shivaram.org/drafts/drizzle.pdf
  55. 55. Exactly Once Processing Micro-batching: • A fine example of discretely emulating continuous processing as a series of transactions. • However: • It enforces a somewhat inconvenient think-like-a-batch logic for a continuous processing programming model. • Causes unnecessarily high periodic scheduling latency (can be traded for higher reconfiguration latency by pre-scheduling multiple micro-batches* ) *http://shivaram.org/drafts/drizzle.pdf
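The micro-batch idea above can be sketched in a few lines (a toy model, not Spark Streaming's or any real engine's API): the stream is cut into parts, state is snapshotted after each completed part, and a failed part is simply re-run from the last snapshot.

```python
# Micro-batch sketch: each part either completes (and its result is
# snapshotted atomically) or is discarded and retried from the last
# snapshot. `fail_on` injects one simulated crash for illustration.

def run_micro_batches(batches, fail_on=None):
    snapshots = [0]  # durable snapshot after each completed part
    for i, batch in enumerate(batches):
        while True:
            result = snapshots[-1] + sum(batch)  # process this part
            if i == fail_on:
                fail_on = None  # simulated crash: discard the attempt
                continue        # retry the part from the snapshot
            break
        snapshots.append(result)  # part committed
    return snapshots[-1]

# Part 1 crashes once and is retried; the total is still exact.
print(run_micro_batches([[1, 2], [3, 4], [5]], fail_on=1))  # 15
```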
  56. 56. Type 2 Long-Running Synchronous System State Store snap Coarse Grained Fault Tolerance - Illustrated Snap while each part is being processed
  57. 57. Type 2 Long-Running Synchronous System State Store snap halt & snap! Coarse Grained Fault Tolerance - Illustrated Snap while each part is being processed
  58. 58. Type 2 Long-Running Synchronous System State Store snap + halt & snap! in-transit events to replay Coarse Grained Fault Tolerance - Illustrated Snap while each part is being processed
  59. 59. Type 2 Long-Running Synchronous System State Store snap + halt & snap! in-transit events to replay Coarse Grained Fault Tolerance - Illustrated Snap while each part is being processed
  60. 60. we want to capture distributed state without ✴ enforcing it in the API or ✴ disrupting the execution
  61. 61. we want to capture distributed state without ✴ enforcing it in the API or ✴ disrupting the execution also, do we really need those in-transit events?
  62. 62. “The global-state-detection algorithm is to be superimposed on the underlying computation: it must run concurrently with, but not alter, this underlying computation” Leslie Lamport
  63. 63. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  64. 64. Type 3 Long Running Pipelined* System State Store snap insert markers * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  65. 65. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  66. 66. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  67. 67. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 align to prioritise records of part 1 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  68. 68. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 align to prioritise records of part 1 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  69. 69. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  70. 70. Type 3 Long Running Pipelined* System State Store snap * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
  71. 71. Type 3 Long Running Pipelined* System State Store snap got a full snapshot! with no records in-transit * https://arxiv.org/abs/1506.08603 Coarse Grained Fault Tolerance - Illustrated Snap just in time!
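A minimal sketch of the marker alignment pictured above, for a single two-input summing operator (illustrative only; in Flink, barriers are injected at the sources and forwarded through the dataflow, which this toy model skips): once a channel's marker arrives, that channel is blocked until markers have been seen on all inputs, then the state is snapped and all channels resume.

```python
# Marker (barrier) alignment sketch for one operator with two input
# channels. Pre-marker records land in the snapshot; post-marker
# records belong to the next part. Names are illustrative.

MARKER = "|"

def align_and_snapshot(channels):
    state = 0
    pending = {name: list(events) for name, events in channels.items()}
    blocked = set()
    snapshot = None
    while any(pending.values()):
        for name, queue in pending.items():
            if not queue or name in blocked:
                continue  # channel empty, or buffered during alignment
            event = queue.pop(0)
            if event == MARKER:
                blocked.add(name)  # align: stop reading this channel
                if blocked == set(pending):
                    snapshot = state  # markers on all inputs: snap!
                    blocked.clear()   # resume every channel
            else:
                state += event
    return snapshot, state

# Records 1, 2, 3 precede the markers and make it into the snapshot;
# records 4 and 5 are counted only in the next part's state.
snap, final = align_and_snapshot({"a": [1, 2, MARKER, 4], "b": [3, MARKER, 5]})
print(snap, final)  # 6 15
```

Note there are no in-transit records to store: alignment guarantees the snapshot cuts the stream cleanly between parts, which is exactly the "full snapshot with no records in-transit" claim on the slide.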
  72. 72. Facts about Flink’s Snapshotting *http://lamport.azurewebsites.net/pubs/chandy.pdf
  73. 73. Facts about Flink’s Snapshotting • It pipelines naturally with the data-flow (respecting back-pressure etc.) • We can get at-least-once processing guarantees by simply dropping the alignment step (try it) • Tailors Chandy-Lamport’s original approach* to dataflow graphs (with minimal snapshot state & messages) • It can also work for cycles (with a minor modification) *http://lamport.azurewebsites.net/pubs/chandy.pdf
  74. 74. Supporting Cycles
  75. 75. Supporting Cycles Problem: we cannot wait indefinitely for records in cycles
  76. 76. Supporting Cycles Problem: we cannot wait indefinitely for records in cycles Solution: log those records as part of the snapshot. Replay upon recovery. https://github.com/apache/flink/pull/1668
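The fix for cycles can be pictured with a tiny sketch (illustrative, not Flink's implementation; see the linked pull request for the real one): records still circulating on the back-edge when the marker arrives cannot be drained by alignment, so they are logged into the snapshot itself and re-injected into the loop on recovery.

```python
# Cycle-support sketch: the loop operator's snapshot contains both its
# state and the in-transit back-edge records, which are replayed on
# recovery. All names are hypothetical.

def snapshot_with_cycle(state, back_edge_in_transit):
    # Log the circulating records as part of the snapshot.
    return {"state": state, "back_edge_log": list(back_edge_in_transit)}

def recover(snapshot):
    # Restore state, then re-inject the logged records into the loop.
    return snapshot["state"], list(snapshot["back_edge_log"])

snap = snapshot_with_cycle(42, ["r7", "r8"])
state, replayed = recover(snap)
print(state, replayed)  # 42 ['r7', 'r8']
```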
  77. 77. Output Guarantees
  78. 78. Is this a thing? Output Guarantees
  79. 79. 1. Can’t, it’s distributed Is this a thing? Output Guarantees
  80. 80. 1. Can’t, it’s distributed 2. Yep easy Is this a thing? Output Guarantees
  81. 81. 1. Can’t, it’s distributed 2. Yep easy 3. It depends ;) Is this a thing? Output Guarantees
  82. 82. 1. Can’t, it’s distributed 2. Yep easy 3. It depends ;) Is this a thing? Output Guarantees
  83. 83. • Idempotency ~ repeated operations give the same output result. (e.g., Flink’s Cassandra sink*) • Rolling Files ~ Pipeline output is bucketed and committed when a checkpoint is complete; otherwise we roll it back. (see Flink’s HDFS RollingSink**) *https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/cassandra.html **https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html in-progress pending pending committed Exactly Once Output
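The idempotency point can be sketched as follows (a toy in-memory stand-in illustrating the idea behind a keyed upsert sink, not the actual Cassandra sink API): because every write is an upsert on a key, re-emitting the same records after a replay leaves the external store unchanged.

```python
# Idempotent sink sketch: writes are keyed upserts, so at-least-once
# delivery into the sink still yields exactly-once output.

class UpsertSink:
    def __init__(self):
        self.table = {}  # stands in for an external keyed store

    def write(self, key, value):
        self.table[key] = value  # overwrite: repeating is harmless

sink = UpsertSink()
# The second half of the stream is a replay of the first.
for key, value in [("a", 1), ("b", 2), ("a", 1), ("b", 2)]:
    sink.write(key, value)
print(sink.table)  # {'a': 1, 'b': 2}
```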
  84. 84. so no design flaws possible… right? Sir, about that Job Manager… shoot here to detonate
  85. 85. > Abort Mission! They have HA
  86. 86. High Availability Zookeeper State • Current Leader (elected) • Pending Pipeline Metadata • State Snapshot Metadata zab JM JM JM
  87. 87. Perks of using Flink today (v1.2) • Key-space partitioning and key group allocation • Job Rescaling - from snapshots ;) • Async state snapshots in RocksDB • Managed State Structures - ListState (append only), ValueState, ReducingState • Externalised Checkpoints for cherry-picking which snapshot to roll back to. • Ad-hoc checkpoints (savepoints)
  88. 88. Coming up next Autoscaling Incremental Snapshots Durable Iterative Processing
  89. 89. Acknowledgements • Stephan Ewen, Ufuk Celebi, Aljoscha Krettek (and more folks at dataArtisans) • Gyula Fóra (King.com) and all contributors who have put code, effort and thought into building this unique state management system.
  90. 90. Exactly Once, Large-Scale Stream Processing in Action Not Less, Not More @SenorCarbone
