Advertisement

Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4

Flink Forward
Sep. 13, 2017
Advertisement

More Related Content

Slideshows for you(20)

Similar to Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4(20)

Advertisement

More from Flink Forward(20)

Advertisement

Flink Forward Berlin 2017: Till Rohrmann - From Apache Flink 1.3 to 1.4

  1. Till Rohrmann till@data-artisans.com @stsffap From Apache Flink® 1.3 to 1.4
  2. 2 Original creators of Apache Flink® Providers of dA Platform 2, including open source Apache Flink + dA Application Manager
  3. Overview Apache Flink 1.3 – Previously on Apache Flink Apache Flink 1.4 – What’s happening now? Apache Flink 1.5+ – Next on Apache Flink 3
  4. Previously on Apache Flink Apache Flink 1.3
  5. Apache Flink 1.3 in Numbers 141 contributors (no deduplication) 1400 commits >= 680 resolved JIRA issues +261813 / -65646 LOC 7
  6. Evolution of Flink’s API 8 Flink 1.0.0 State API (ValueState ReducingState, ListState) Flink 1.1.0 Session Windows Late arriving events Flink 1.2.0 ProcessFunction (access to state, timers, events) Flink 1.3.0 Side outputs Access to per-window state
  7. Side Outputs  Additional outputs for a stream  Late events  Corrupted input data  More expressive APIs  FLINK-4460 9 Process Function Main output Side output
  8. Evolution of Large State Handling 11 Flink 1.0.0 RocksDB for out-of-core state support Flink 1.1.0 Fully async RocksDB snapshots Flink 1.2.0 Rescalable keyed and non-partitioned state Flink 1.3.0 Incremental checkpoints Fine-grained recovery
  9. G H C D Full Checkpoints 12 Checkpoint 1 Checkpoint 2 Checkpoint 3 I E A B C D A B C D A F C D E @t1 @t2 @t3 A F C D E G H C D I E
  10. G H C D Incremental Checkpoints 13 Checkpoint 1 Checkpoint 2 Checkpoint 3 I E A B C D A B C D A F C D E E F G H I @t1 @t2 @t3
  11. Incremental Checkpoints 14 Checkpoint 1 Checkpoint 2 Checkpoint 3 Checkpoint 4 C1 C3C1 C1 Chunk 1 Chunk 2 Chunk 3 Chunk 4 Storage C2 C4C3
  12. Incremental Checkpointing Contd. Currently supported for RocksDB state backend FLINK-5053 Faster and smaller checkpoints 15 Full checkpoint Incremental checkpoint Size 60 GB 1 – 30 GB Time 180 s 3 – 30 s “A Look at Flink’s Internal Data Structures and Algorithms for Efficient Checkpointing” by Stefan Richter, Tomorrow @ 12:20 pm Maschinenhaus
  13. Evolution of High Level APIs 16 Flink 1.0.0 CEP library added Table API v1 Flink 1.1.0 Table API overhaul Integration with Apache Calcite Flink 1.2.0 Tumbling, sliding and session group-windows for Table API Flink 1.3.0 Rescalable CEP operators Retractions in Table API/SQL
  14. Enriched CEP Language Support for quantifiers (+, *, ?) FLINK-3318 Iterative conditions FLINK-6197 Not operator FLINK-3320 17 “Complex Event Processing With Flink: The State of FlinkCEP” by Kostas Kloudas, Today @ 2:30 pm Maschinenhaus
  15. What’s Happening Now? Apache Flink 1.4
  16. Event Driven I/O 23 Rework of Flink’s network stack Event driven network I/O Use full available capacity Near perfect latency behaviour TCP Buffer capacity left flush
  17. Flow Control  Flow control for TaskManager communication  Single channel no longer stalls other multiplexed channels  Fine-grained backpressure control  Improves checkpoint alignments 24 “Building a Network Stack for Optimal Throughput / Low-Latency Trade-Offs” by Nico Kruber, Today @ 2:00 pm Palais Atelier Receiver Sender #1 Sender #2 Give credit Send credited data
  18. New Deployment Model Rework of Flink’s distributed architecture Ready for multitude of deployment scenarios Support for dynamic scaling 25 “Flink in Containerland” by Patrick Lucas, Tomorrow @ 3:20 pm Maschinenhaus
  19. Producing Exactly Once with Kafka 0.11 Support for Kafka 0.11 First Kafka producer with exactly once processing guarantees 26 “Hit Me, Baby, Just One Time – Building End-to-End Exactly Once Applications With Flink” by Piotr Nowojski, Today @ 3:20 pm Palais Atelier Consuming Producing End-to-End exactly once processing
  20. Operational Robustness Drop Java 7 Support Scala 2.12 Avoid dependency hell Child first class loading Relocation of dependencies De-Hadoopification 28
  21. Next on Apache Flink Apache Flink 1.5+
  22. Side Inputs  Additional input for operator  Join with static data set  Feeding of externally trained ML model  Window joins  Flip-17 design document: https://goo.gl/W4yMEu 30 Process Function Main input Side input
  23. State Management & Evolution Eager state declaration State type, serializer and name known at pre-flight time Flip-22 design document: https://goo.gl/trFiSi Evolving existing state Schema updates Serializer upgrades 31 “Managing State in Apache Flink” by Tzu-Li Tai, Today @ 4:30 pm Kesselhaus
  24. State Replication Replicate state between TaskManagers Faster recovery in case of failures High throughput queryable state 32 TaskManager TaskManager Change log stream Input State
  25. Programmatic Job Control Improve client to give better job control Run concurrent jobs from the same program Trigger savepoints programmatically Better testing facilities 33
  26. JobClient & ClusterClient 34 StreamExecutionEnvironment env = ...; // define program JobClient jobClient = env.execute(); CompletableFuture<Acknowledge> savepointFuture = jobClient.takeSavepoint(savepointPath); // wait for the savepoint completion savepointFuture.get(); CompletableFuture<JobExecutionResult> resultFuture = jobClient.getResultFuture(); // cancel the job jobClient.cancelJob(); // get the execution result --> should be canceled JobExecutionResult result = resultFuture.get(); // get list of all still running jobs on the cluster ClusterClient clusterClient = jobClient.getClusterClient(); CompletableFuture<List<JobInfo>> jobInfosFuture = clusterClient.getJobInfos(); List<JobInfo> jobInfos = jobInfosFuture.get();
  27. TL;DL Apache Flink one of the most innovative open source stream processing platforms Stay tuned what’s happening next  Visit the in depths talks to learn more about Flink’s internals 36
  28. 37 Thank you! @stsffap @ApacheFlink @dataArtisans
  29. We are hiring! data-artisans.com/careers 38

Editor's Notes

  1. A little bit about myself, I am a committer for Apache Flink and a software engineer for data Artisans, the original creators of Apache Flink and the providers of the dA Platform.
  2. Flink as a library Containerized execution
  3. Relocated dependencies: asm, guava, jackson, netty
Advertisement