
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next

Streaming data is now the new trend, and for very good reasons. Most data is produced continuously, and it makes sense that it is processed and analyzed continuously. Whether it is the need for more real-time products, adopting micro-services, or building continuous applications, stream processing technology offers to simplify the data infrastructure stack and reduce the latency to decisions. Before Apache Flink, users of stream processing frameworks had to make hard choices and trade off latency, throughput, or result accuracy. Flink was the first open source framework (and still the only one) that has been demonstrated to deliver (1) throughput on the order of tens of millions of events per second in moderate clusters, (2) sub-second latency that can be as low as a few tens of milliseconds, (3) guaranteed exactly-once semantics for application state, as well as exactly-once end-to-end delivery with supported sources and sinks (e.g., pipelines from Kafka to Flink to HDFS or Cassandra), and (4) accurate results in the presence of out-of-order data arrival through its support for event time. In this talk, I will cover the basics of Flink: why the project exists, where it came from, what gap it fills, how it differs from all the other stream processing projects, and what it is being used for. I will also cover recent developments in the Flink community, what the community is working on currently, and touch upon a longer-term vision for Flink.
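To make point (4) concrete, here is a minimal conceptual sketch in plain Python of how event-time windows and a watermark can produce correct counts when events arrive out of order. It is not Flink's API or implementation. The function name, the integer timestamps, and the rule "watermark = largest timestamp seen minus a fixed out-of-orderness bound" are assumptions chosen for illustration.

```python
from collections import defaultdict


def tumbling_event_time_counts(event_times, window_size, max_out_of_orderness):
    """Count events per tumbling event-time window (illustrative sketch only).

    event_times: event timestamps listed in *arrival* order (may be out of order).
    A window [start, start + window_size) is emitted once the watermark reaches
    its end. Events arriving for an already-emitted window are counted as late.
    Returns (emitted, dropped_late), where emitted is a list of (start, count).
    """
    open_windows = defaultdict(int)   # window start -> count so far
    emitted = []
    dropped_late = 0
    watermark = float("-inf")         # "no event older than this is expected"

    for event_time in event_times:
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            dropped_late += 1         # window already closed: event is too late
        else:
            open_windows[start] += 1

        # Assumed watermark rule: trail the largest timestamp by a fixed bound.
        watermark = max(watermark, event_time - max_out_of_orderness)

        # Emit every window whose end the watermark has reached.
        for ws in sorted(open_windows):
            if ws + window_size <= watermark:
                emitted.append((ws, open_windows.pop(ws)))

    # End of input: flush whatever is still open.
    for ws in sorted(open_windows):
        emitted.append((ws, open_windows.pop(ws)))
    return emitted, dropped_late


# Timestamps 9 and 2 arrive after 11 but still land in window [0, 10);
# timestamp 5 arrives after that window was emitted and is counted as late.
print(tumbling_event_time_counts([1, 4, 3, 11, 9, 12, 2, 21, 5], 10, 3))
# -> ([(0, 5), (10, 2), (20, 1)], 1)
```

The out-of-orderness bound is the knob that trades latency for completeness in this sketch: a larger bound delays results but drops fewer late events.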

Published in: Data & Analytics

  1. 1. Apache Flink®: State of the Union and What's Next. Kostas Tzoumas (@kostas_tzoumas), Strata + Hadoop World NYC 2016, September 29, 2016
  2. 2. What I'd like to talk about • Some highlights from Flink Forward 2016 • Streaming ecosystem evolution and Flink • What's coming up in Flink
  3. 3. Original creators of Apache Flink®. Providers of the dA Platform, the supported Flink distribution
  4. 4. Flink Forward 2016
  5. 5. Flink Forward 2016
  6. 6. 7 sponsors
  7. 7. Speaker organizations
  8. 8. Retail, e-commerce: better product recommendations, process monitoring, inventory management. Finance: differentiation via tech, push-based products, fraud detection. Telco, IoT, infrastructure: infrastructure monitoring, anomaly detection. Internet & mobile: personalization, user behavior monitoring, analytics.
  9. 9. • 30 Flink applications in production for more than one year; 10 billion events (2 TB) processed daily • Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees • Largest job has > 20 operators, runs on > 5000 vCores in a 1000-node cluster, processes millions of events per second
  11. 11. Streaming ecosystem and Flink
  12. 12. Streaming technology is enabling the obvious: continuous processing on data that is continuously produced. Hint: you already have streaming data.
  13. 13. [Diagram; labels: collect, log, analyze, query, app state, history, log]
  14. 14. (Aside: streaming and "batch") [Timeline diagram of hourly partitions from 2016-3-1 12:00 am to 2016-3-12 3:00 am; labels: Stream (low latency), Batch (bounded stream), Stream (high latency)]
  15. 15. What is Flink's unique contribution in the streaming data ecosystem?
  16. 16. Before Flink, users had to make hard choices between volume, latency, and accuracy
  17. 17. Flink eliminates these tradeoffs • 10s of millions of events per second for stateful applications • Sub-second latency, as low as single-digit milliseconds • Accurate computation results
  18. 18. A broader definition of accuracy: the results that I want, when I want them. (1) Accurate under failures and downtime (2) Accurate under out of order data (3) Results when you need them (4) Accurate modeling of the world
  19. 19. (1) Failures and downtime: checkpoints & savepoints, exactly-once guarantees. (2) Out of order and late data: event time support, watermarks. (3) Results when you need them: low latency, triggers. (4) Accurate modeling: true streaming engine, sessions and flexible windows.
  20. 20. (5) Batch + streaming: one engine, dedicated APIs. (6) Reprocessing: high throughput, event time support, and savepoints. (7) Ecosystem: rich connector ecosystem and 3rd party packages. (8) Community support: one of the most active projects with over 200 contributors. Command shown on the slide: flink -s <savepoint> <job>
  21. 21. Having a dependable framework enables more stateful applications to run as streaming applications
  22. 22. What's coming up in Flink
  23. 23. • Provide state of the art streaming capabilities (✔) • Operate in the largest infrastructures of the world • Open up to a wider set of enterprise users • Broaden the scope of stream processing
  24. 24. Flink's unique combination of features [diagram labels: Low latency; High Throughput; Well-behaved flow control (back pressure); Consistency; Works on real-time and historic data; Performance; Event Time; APIs; Libraries; Stateful Streaming; Savepoints (replays, A/B testing, upgrades, versioning); Exactly-once semantics for fault tolerance; Windows & user-defined state; Flexible windows (time, count, session, roll-your own); Complex Event Processing; Fluent API; Out-of-order events; Fast and large out-of-core state]
  25. 25. Flink v1.1: Connectors; Metric System; (Stream) SQL; Session Windows; Library enhancements
  26. 26. Flink v1.1 + current threads. Flink v1.1: Connectors; Session Windows; (Stream) SQL; Library enhancements; Metric System. Current threads [diagram labels]: Metrics & Visualization Dynamic Scaling Savepoint compatibility Checkpoints to savepoints More connectors Stream SQL Windows Large state Maintenance Fine grained recovery Side in-/outputs Window DSL Security Mesos & others Dynamic Resource Management Authentication Queryable State
  27. 27. Flink v1.1 + current threads. Flink v1.1: Connectors; Session Windows; (Stream) SQL; Library enhancements; Metric System. Current threads [diagram labels, now grouped under category headings]: Operations Ecosystem Application Features Metrics & Visualization Dynamic Scaling Savepoint compatibility Checkpoints to savepoints More connectors Stream SQL Windows Large state Maintenance Fine grained recovery Side in-/outputs Window DSL Broader Audience Security Mesos & others Dynamic Resource Management Authentication Queryable State
  29. 29. Security / Authentication: No unauthorized data access. Secured clusters with Kerberos-based authentication • Kafka, ZooKeeper, HDFS, YARN, HBase, … No unencrypted traffic between Flink processes • RPC, Data Exchange, Web UI. Prevent malicious users from hooking into Flink jobs. Largely contributed by [contributor logo not captured]
  30. 30. Checkpoints / Savepoints: Recover a running job into a new job. Recover a running job onto a new cluster. Application state backwards compatibility • Flink 1.0 made the APIs backwards compatible • Now making the savepoints backwards compatible • Applications can be moved to newer versions of Flink even when state backends or internals change [diagram labels: v1.x, v2.0, v1.y]
  31. 31. Dynamic scaling: Changing load bears changing resource requirements • Need to adjust parallelism of running streaming jobs. Re-scaling stateless operators is trivial. Re-scaling stateful operators is hard (windows, user state) • Efficiently re-shard state. Re-scaling Flink jobs preserves exactly-once guarantees. [Chart labels: time, Workload, Resources]
  32. 32. Cluster management: Series of improvements to seamlessly interoperate with various cluster managers • YARN, Mesos, Docker, Standalone, … Driven by Mesos integration contributed by and [contributor logos not captured]
  33. 33. Stream SQL: SQL is the standard high-level query language. A natural way to open up streaming to more people. Problem: There is no Streaming SQL standard • At least beyond the basic operations • Challenging: Incorporate windows and time semantics. Flink community working with Apache Calcite to draft a new model
  34. 34. State in stream processing: Stateless Streaming (Apache Storm); Stateful Streaming (Apache Samza); Accurate Stateful Streaming (Apache Flink). State sizes in Flink today: 10s of gigabytes per operator. How to scale this to many terabytes? • Queryable State • Data driven triggers over large state
  35. 35. Large-state streaming: How to scale the stream processor state? … and maintain fast checkpoint intervals? … and have very fast recovery on machine failures? More and more database techniques coming into Flink
  36. 36. I wrote a book! Get it at mapr.com/introduction-to-apache-flink
  37. 37. Thank you! We are hiring! @kostas_tzoumas | @ApacheFlink | @dataArtisans
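
The dynamic scaling slide says that re-scaling stateful operators is hard because keyed state has to be re-sharded efficiently. The following plain-Python sketch shows one possible way to do that: hash each key into a fixed number of "key groups" and assign contiguous ranges of groups to parallel subtasks, so that changing the parallelism only moves whole groups. This is an illustration under stated assumptions, not something taken from the slides and not Flink's implementation. The constant NUM_KEY_GROUPS and all function names are hypothetical.

```python
import zlib

NUM_KEY_GROUPS = 128  # assumed fixed upper bound on parallelism (hypothetical value)


def key_group(key: str) -> int:
    """Map a key to a key group with a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % NUM_KEY_GROUPS


def owner(group: int, parallelism: int) -> int:
    """Assign contiguous ranges of key groups to subtasks 0..parallelism-1."""
    return group * parallelism // NUM_KEY_GROUPS


def shard_state(state: dict, parallelism: int) -> list:
    """Split a key -> value state map into one dict per parallel subtask."""
    shards = [dict() for _ in range(parallelism)]
    for key, value in state.items():
        shards[owner(key_group(key), parallelism)][key] = value
    return shards


def rescale(shards: list, new_parallelism: int) -> list:
    """Redistribute existing shards over a new number of subtasks.

    A real system would move whole key groups between machines rather than
    merging everything in one place; merging keeps this sketch short.
    """
    merged = {}
    for shard in shards:
        merged.update(shard)
    return shard_state(merged, new_parallelism)


# Example: per-user counters spread over 4 subtasks, then scaled out to 6.
state = {f"user-{i}": i for i in range(1000)}
scaled = rescale(shard_state(state, 4), 6)
print([len(shard) for shard in scaled])
```

Because a key's group never changes, the subtask that owns a key after re-scaling depends only on the new parallelism, which is what lets every entry be handed over exactly once.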
