
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing

Apache Flink (incubating) is one of the latest additions to the Apache family of data processing engines. In short, Flink's design aims to be as fast as in-memory engines while providing the reliability of Hadoop. Flink contains (1) APIs in Java and Scala for both batch-processing and data streaming applications, (2) a translation stack for transforming these programs into parallel data flows, and (3) a runtime that supports both true streaming and batch processing for executing these data flows on large compute clusters.

Flink's batch APIs build on functional primitives (map, reduce, join, coGroup, etc.) and augment them with dedicated operators for iterative algorithms, as well as support for logical, SQL-like key attribute referencing (e.g., groupBy("WordCount.word")). The Flink streaming API extends the primitives of the batch API with flexible window semantics.
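
As a hedged illustration of this by-name key referencing (the Author/Book types below are hypothetical, not from the talk), grouping can address a field nested inside a case class:

    import org.apache.flink.api.scala._

    case class Author(name: String, city: String)
    case class Book(title: String, author: Author, copiesSold: Int)

    val env = ExecutionEnvironment.getExecutionEnvironment
    val books = env.fromElements(
      Book("Stream Processing", Author("Ada", "Berlin"), 100),
      Book("Batch Processing", Author("Ada", "Berlin"), 50))

    // Group by the nested field "author.name" and sum the sales counter.
    books.groupBy("author.name").sum("copiesSold").print()
    env.execute()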

Internally, Flink transforms user programs into distributed data stream programs. In the course of this transformation, Flink analyzes functions and data types (using Scala macros and reflection) and picks physical execution strategies with a cost-based optimizer. Flink's runtime is a true streaming engine that supports both batch and stream processing. Flink operates on a serialized data representation with memory-adaptive out-of-core algorithms for sorting and hashing. This lets Flink match the performance of in-memory engines on memory-resident datasets while scaling robustly to larger, disk-resident datasets.

Finally, Flink is compatible with the Hadoop ecosystem. Flink runs on YARN, reads data from HDFS and HBase, and supports mixing existing Hadoop Map and Reduce functions into Flink programs. Ongoing work is adding Apache Tez as an additional runtime backend.
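
As a minimal sketch of that compatibility (the paths below are placeholders), a Flink program can read from and write back to HDFS like any other source or sink:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment

    // hdfs:// URIs are resolved through the Hadoop file system classes.
    val logs = env.readTextFile("hdfs:///data/access.log")
    logs.filter(_.contains("ERROR"))
        .writeAsText("hdfs:///results/errors")
    env.execute("HDFS round trip")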

This talk presents Flink from a user perspective. We introduce the APIs and highlight the most interesting design points behind Flink, discussing how they contribute to the goals of performance, robustness, and flexibility. We finally give an outlook on Flink’s development roadmap.


  1. Kostas Tzoumas, Stephan Ewen
     Flink committers; co-founders, data Artisans
     @kostas_tzoumas @StephanEwen
     Apache Flink
  2. What is Flink
      Collection programming APIs for batch and real-time streaming analysis
      Backed by a very robust execution backend
       • with true streaming capabilities,
       • a custom memory manager,
       • native iteration execution,
       • and a cost-based optimizer.
  3. The case for Flink
      Performance and ease of use
       • Exploits in-memory execution and pipelining; language-embedded logical APIs
      Unified batch and real-time streaming
       • Batch and Stream APIs on top of a streaming engine
      A runtime that "just works" without tuning
       • C++-style memory management inside the JVM
      Predictable and dependable execution
       • Bird's-eye view of what runs and how, and what failed and why
  4. Example: WordCount

     case class Word(word: String, frequency: Int)

     val env = ExecutionEnvironment.getExecutionEnvironment
     env.readTextFile(...)
       .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
       .groupBy("word").sum("frequency").print()
     env.execute()

     Flink has mirrored Java and Scala APIs that offer the same functionality, including by-name addressing.
  5. Example: Window WordCount

     case class Word(word: String, frequency: Int)

     val env = StreamExecutionEnvironment.getExecutionEnvironment
     val lines = env.fromSocketStream(...)
     lines
       .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
       .window(Count.of(100)).every(Count.of(10))
       .groupBy("word").sum("frequency").print()
     env.execute()
  6. Defining windows
      Trigger policy
       • When to trigger the computation on the current window
      Eviction policy
       • When data points should leave the window
       • Defines the window width/size
      E.g., count-based policy
       • evict when #elements > n
       • start a new window every n-th element
      Built-in: Count, Time, and Delta policies (see the time-based sketch below)
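
     A hedged sketch of a time-based policy, following the Count.of(...) helper from the previous slide; Time.of(...) is assumed here to be the analogous time-based helper:

         import java.util.concurrent.TimeUnit

         lines
           .flatMap { line => line.split(" ").map(word => Word(word, 1)) }
           // Eviction keeps the window 5 seconds wide; the trigger fires
           // the computation on the current window once per second.
           .window(Time.of(5, TimeUnit.SECONDS)).every(Time.of(1, TimeUnit.SECONDS))
           .groupBy("word").sum("frequency").print()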
  7. Flink API in a nutshell
      map, flatMap, filter, groupBy, reduce, reduceGroup, aggregate, join, coGroup, cross, project, distinct, union, iterate, iterateDelta, ...
      All Hadoop input formats are supported
      The API is similar for data sets and data streams, with slightly different operator semantics
      Window functions for data streams
      Counters, accumulators, and broadcast variables (a broadcast sketch follows below)
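
     A minimal sketch of broadcast variables (the name "stopWords" and the data are illustrative): a small data set is shipped to every parallel instance of an operator and read in open():

         import org.apache.flink.api.common.functions.RichMapFunction
         import org.apache.flink.api.scala._
         import org.apache.flink.configuration.Configuration
         import scala.collection.JavaConverters._

         val env = ExecutionEnvironment.getExecutionEnvironment
         val stopWords = env.fromElements("the", "a", "an")
         val words = env.fromElements("the", "quick", "fox")

         words
           .map(new RichMapFunction[String, (String, Boolean)] {
             private var stop: Set[String] = _
             override def open(parameters: Configuration): Unit = {
               // The broadcast set is available to every parallel task.
               stop = getRuntimeContext
                 .getBroadcastVariable[String]("stopWords").asScala.toSet
             }
             override def map(w: String): (String, Boolean) = (w, stop(w))
           })
           .withBroadcastSet(stopWords, "stopWords")
           .print()
         env.execute()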
  8. Flink stack
      APIs and libraries: Scala API, Java API, Python API (upcoming), Graph API (Gelly), Apache MRQL
      Translation layers: Common API, Flink Optimizer, Flink Stream Builder
      Execution backends: Flink Local Runtime, Apache Tez
      Environments: embedded (Java collections), local (for debugging), remote (regular cluster execution); single-node execution, standalone or YARN cluster (a sketch follows below)
      Data storage: HDFS, files, S3, JDBC, HBase, Kafka, Flume, RabbitMQ, ...
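
     A short sketch of switching between environments (host, port, and jar path are placeholders); the program itself stays unchanged:

         import org.apache.flink.api.scala._

         // Local execution in a single JVM, handy for debugging.
         val local = ExecutionEnvironment.createLocalEnvironment()

         // Remote execution against a running cluster; 6123 is the
         // default JobManager port.
         val remote = ExecutionEnvironment.createRemoteEnvironment(
           "jobmanager-host", 6123, "/path/to/program.jar")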
  9. Technology inside Flink
      Technology inspired by compilers + MPP databases + distributed systems
      For ease of use, reliable performance, and scalability

     case class Path(from: Long, to: Long)
     val tc = edges.iterate(10) { paths: DataSet[Path] =>
       val next = paths
         .join(edges)
         .where("to")
         .equalTo("from") { (path, edge) => Path(path.from, edge.to) }
         .union(paths)
         .distinct()
       next
     }

      Building blocks, spread across pre-flight (client), master, and workers: cost-based optimizer, type extraction stack, memory manager, out-of-core algorithms, real-time streaming, task scheduling, recovery metadata, data serialization stack, streaming network stack, ...
  10. Notable runtime features
      1. Pipelined data transfers
      2. Management of memory
      3. Native iterations
      4. Program optimization
  11. Pipelined data transfers
  12. Staged (batch) execution
      Example: load a log ("Romeo, Romeo, where art thou Romeo?") and grep it for three strings (str1, str2, str3)
      Stage 1: create/cache the Log
      Subsequent stages: grep the log for matches
      Caching in memory, and on disk if needed
  13. Pipelined execution
      The same grep job, but with a single stage: deploy and start all operators at once
      Data transfer in memory, and via disk if needed
      Note: the Log DataSet is never "created"! (A sketch of the program follows below.)
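
     A sketch of the grep program from these two slides (paths and search strings are placeholders); under pipelined execution the intermediate Log data set is never materialized:

         import org.apache.flink.api.scala._

         val env = ExecutionEnvironment.getExecutionEnvironment

         // One shared source feeding three consumers.
         val log = env.readTextFile("hdfs:///logs/input.txt")
         log.filter(_.contains("str1")).writeAsText("hdfs:///out/grep1")
         log.filter(_.contains("str2")).writeAsText("hdfs:///out/grep2")
         log.filter(_.contains("str3")).writeAsText("hdfs:///out/grep3")

         // The three filters consume the source lines as they are read.
         env.execute("pipelined grep")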
  14. Pipelining in Flink
      Currently the default mode of operation
       • Much better performance in many cases: no need to materialize large data sets
       • Supports both batch and real-time streaming
      Pluggable in the future
       • Batch will use a combination of blocking and pipelining
       • Streaming will use pipelining
       • Interactive will use blocking
  15. Memory management
  16. Memory management in Flink

     public class WC {
       public String word;
       public int count;
     }

     [Diagram: a pool of memory pages; the managed part serves sorting, hashing, caching, shuffling, and broadcasts, while user code objects live in unmanaged memory.]

     Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly through an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.
  17. Configuring Flink
      Per job
       • Parallelism
      System config
       • Total JVM heap size (-Xmx)
       • % of the total JVM heap for the Flink runtime
       • Memory for network buffers (soon not needed)
      That's all you need; the system will not throw an OOM exception at you. (An illustrative config fragment follows below.)
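
     An illustrative flink-conf.yaml fragment covering these knobs (key names as found in the Flink docs of that era; the values are made up):

         # Total JVM heap per TaskManager, in MB (-Xmx).
         taskmanager.heap.mb: 4096
         # Fraction of the heap handed to Flink's managed runtime memory.
         taskmanager.memory.fraction: 0.7
         # Network buffers (the setting this slide says will soon not be needed).
         taskmanager.network.numberOfBuffers: 2048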
  18. Benefits of managed memory
      More reliable and stable performance (fewer GC effects, easy to spill to disk)
  19. Native iterative processing
  20. Example: Transitive Closure

     case class Path(from: Long, to: Long)

     val env = ExecutionEnvironment.getExecutionEnvironment
     val edges = ...

     val tc = edges.iterate(10) { paths: DataSet[Path] =>
       val next = paths
         .join(edges).where("to").equalTo("from") { (path, edge) => Path(path.from, edge.to) }
         .union(paths).distinct()
       next
     }

     tc.print()
     env.execute()
  21. Iterate natively
      [Diagram: an initial solution seeds a partial solution; each iteration applies the step function X, optionally joining other data sets Y, and replaces the partial solution until the iteration result is produced.]
  22. Iterate natively with deltas
      [Diagram: as before, but the step function consumes the partial solution together with a workset and produces a delta set and a new workset; the deltas are merged into the solution rather than replacing it. A sketch of the API follows below.]
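
     A minimal connected-components-style sketch of the iterateDelta API (the vertex and edge data are made up); the solution set is keyed on field 0, and only changed elements flow into the next superstep:

         import org.apache.flink.api.scala._

         val env = ExecutionEnvironment.getExecutionEnvironment

         // (vertexId, componentId) pairs and directed edges.
         val vertices = env.fromElements((1L, 1L), (2L, 2L), (3L, 3L))
         val edges = env.fromElements((1L, 2L), (2L, 3L))

         val components = vertices.iterateDelta(vertices, 10, Array(0)) {
           (solution, workset) =>
             // Propagate component ids to neighbors, keep the smallest candidate.
             val candidates = workset
               .join(edges).where(0).equalTo(0) { (v, e) => (e._2, v._2) }
               .groupBy(0).min(1)
             // Only vertices whose component id improves become deltas.
             val deltas = candidates
               .join(solution).where(0).equalTo(0)
               .flatMap { pair =>
                 val (candidate, current) = pair
                 if (candidate._2 < current._2) Seq(candidate) else Seq()
               }
             (deltas, deltas) // update the solution set and form the next workset
         }
         components.print()
         env.execute()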
  23. Effect of delta iterations
      [Chart: # of elements updated per iteration; y-axis 0 to 45,000,000, x-axis iterations 1 to 61.]
  24. Iteration performance
      [Chart: iteration runtime compared against MapReduce.]
  25. Closing
  26. Flink roadmap for 2015
      Unify batch and streaming
      Machine learning library and Mahout
      Graph processing library improvements
      Interactive programs and Zeppelin
      Logical queries and SQL
      And many more
  27. Flink community
      [Chart: # of unique contributors by git commits (without manual de-dup), from Aug 2010 through early 2015, on a scale of 0 to 120.]
  28. flink.apache.org @ApacheFlink
