Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - Building End-to-End Exactly-Once Applications with Flink

865 views

Published on

Getting data in and out of Flink is by far the most important aspect, and an everyday typical requirement of building Flink applications. Doing so in an end-to-end exactly-once manner, however, can be tricky. Being able to reliably consume data from the outside world without any duplicate processing and guaranteeing consistent distributed state, and at the same time provide computed results back to the outside world also without introducing duplicates, is crucial for the consistency and correctness of applications built upon stream processors. In this talk, we will talk about how end-to-end exactly-once guarantees can be achieved with Apache Flink. We will talk about Flink’s checkpointing mechanism, and how exactly to leverage it when consuming and producing data from your Flink streaming pipelines. In particular, we will be having a detailed review on how our supported connectors do so, with the aim to provide reference implementations for your own custom consumers and sinks.

Published in: Data & Analytics
  • Be the first to comment

Flink Forward Berlin 2017: Piotr Nowojski - "Hit me, baby, just one time" - Building End-to-End Exactly-Once Applications with Flink

  1. 1. Piotr Nowojski piotr@data-artisans.com @PiotrNowojski “Hit me, baby, just one time” Building End-to-End Exactly-Once Applications with Flink
  2. 2. Overview of delivering guarantees 2
  3. 3. No guarantees and failures 3
  4. 4. At-Least-Once and failures 4
  5. 5. Exactly-Once and failures 5
  6. 6. How to achieve Exactly-Once 6
  7. 7. Exactly-Once on a single node ▪ Simple transactions ▪ Write data in transaction ▪ Include read offsets in transaction ▪ On success - commit transaction ▪ On failure - abort transaction and restart from last committed read offset 7
  8. 8. Exactly-Once on a single node ▪ What about application state? ▪ For example running average ▪ Include in the transaction, by adding as a separate file/table just before commit 8
  9. 9. Exactly-Once in a distributed system ▪ Persistent communication channels 9
  10. 10. Exactly-Once in a distributed system ▪ Persistent communication channels ▪ Allow to re-process messages from last committed read offsets ▪ High costs of communication ▪ Processes operate independent of each other 10
  11. 11. Exactly-Once in Flink 1 1
  12. 12. Exactly-Once two phase commit ▪ Can we do better? Yes! Two phase commit protocol 12 Input data Process 1 Process 2 Output data
  13. 13. Exactly-Once two phase commit 13 Input data Process 1 Process 2 Output data State backend Job manager
  14. 14. Exactly-Once two phase commit 14 Input data Process 1 Process 2 Output data State backend Job manager Start external transaction
  15. 15. Exactly-Once two phase commit 15 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Pre-commit (checkpoint starts)
  16. 16. Exactly-Once two phase commit 16 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Snapshot offsets & state (2) Pass checkpoint barrier (2) Pre-commit
  17. 17. Exactly-Once two phase commit 17 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Snapshot offsets & state (2) Pass checkpoint barrier (2) Snapshot state (3) Pre-commit external transaction (3) Pre-commit
  18. 18. Exactly-Once two phase commit ▪ Without side effects (only internal state) •Example: calculating running average •State managed by Flink ▪ With side effects •Operator must manage external state on it’s own 18
  19. 19. Exactly-Once two phase commit 19 Input data Process 1 Process 2 Output data State backend Job manager Inject checkpoint barrier (1) Snapshot offsets & state (2) Pass checkpoint barrier (2) Snapshot state (3) Pre-commit external transaction (3) Pre-commit
  20. 20. Exactly-Once two phase commit 20 Input data Process 1 Process 2 Output data State backend Job manager Notify checkpoint completed (1) Commit
  21. 21. Exactly-Once two phase commit 21 Input data Process 1 Process 2 Output data State backend Job manager Notify checkpoint completed (1) Commit external transaction (2) Commit
  22. 22. Exactly-Once two phase commit ▪ Once all processes successfully complete pre- commit, commit is issued ▪ If at least one pre-commit fails others are aborted ▪ After a successful pre-commit, commit MUST be guaranteed to eventually succeed ▪ This guarantees that all processes agree on the final outcome 22
  23. 23. Two phase commit example implementation ▪ Begin transaction - create a temporary file ▪ Write element - write data to that file ▪ Pre-commit - flush the file ▪ And begin new transaction for subsequent writes ▪ Commit - move the file to a target directory ▪ Can increase latency 23
  24. 24. Exactly-Once two phase commit ▪ In case of system crash, we can restore state of the application to the latest snapshot ▪ We could/should abort previous “pending” transactions •Delete temporary files 24
  25. 25. TwoPhaseCommitSinkFunction ▪ Flink’s snapshots/checkpoints are very similar to two phase commit ▪ There is a TwoPhaseCommitSinkFunction which maps checkpoint calls to beginTransaction, preCommit, commit and abort calls 25
  26. 26. Exactly-Once producers in Flink ▪ BucketingSink ▪ Pravega connector ▪ Kafka 0.11 connector ▪ Kafka 0.11 introduced transactions ▪ Implemented on top of the TwoPhaseCommitSinkFunction ▪ Low overhead 26
  27. 27. Summarizon ▪ Many ways to achieve Exactly-Once ▪ Flink checkpointing is similar to the two phase commit protocol ▪ No materialization of the data in transit ▪ Lower latency ▪ Higher throughput 27
  28. 28. Questions? 2 8
  29. 29. 29 Thank you! @PiotrNowojski @ApacheFlink @dataArtisans
  30. 30. We are hiring! data-artisans.com/careers

×