Aljoscha Krettek, Sensei Software Engineer
(Past), Present, and Future of
Apache Flink®
© 2018 data Artisans2
What is Apache Flink?
Batch Processing: process static and historic data
Data Stream Processing: real-time results from data streams
Event-driven Applications: data-driven actions and services
Stateful Computations Over Data Streams
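As a rough illustration of what "stateful computations over data streams" means, the sketch below keeps a running count per key while consuming events one at a time. It is plain Python rather than Flink code, and the names `running_counts` and `events` are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def running_counts(events: Iterable[Hashable]) -> Iterator[Tuple[Hashable, int]]:
    """Yield (key, count so far) for every incoming event."""
    # Per-key state that survives from one event to the next.
    state: Dict[Hashable, int] = defaultdict(int)
    for key in events:
        state[key] += 1
        yield key, state[key]


# Hypothetical input: a small, finite stand-in for an event stream.
events = ["click", "view", "click", "click", "view"]
results = list(running_counts(events))
print(results)
```

Each result depends on the events seen before it, which is the part a stream processor has to keep fault tolerant.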
What is Apache Flink?
Stateful computations over streams, real-time and historic
fast, scalable, fault tolerant, in-memory, event time, large state, exactly-once
[Diagram labels: Queries, Applications, Devices, etc.; Database, Stream, File / Object Storage; Historic Data; Streams; Application]
Present – New in Flink 1.5
• FLIP-6
‒ Tighter integration with the resource manager (YARN, Mesos, Kubernetes)
‒ Enables dynamic management of resources
‒ Rework of the client/cluster communication to be REST-based
• Localised Failure Recovery
‒ Failures don't require restoring all state from distributed storage
‒ TaskManagers keep a copy of the state on their local machines
‒ Recovery is faster when the failure was not caused by a machine failure
• 50% Network Stack Rewrite
‒ Better throughput at very low latencies
‒ Much improved backpressure handling
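The localised recovery idea can be sketched as a simple lookup order: use the copy of the state that is still on the machine if there is one, and fall back to distributed storage otherwise. This is a plain-Python sketch, not Flink's recovery code; `restore_state`, `local_copies` and `remote_storage` are hypothetical names.

```python
from typing import Dict, Tuple

State = Dict[str, int]


def restore_state(
    task_id: str,
    local_copies: Dict[str, State],
    remote_storage: Dict[str, State],
) -> Tuple[State, str]:
    """Return the state for a task and where it was restored from."""
    local = local_copies.get(task_id)
    if local is not None:
        # The machine survived, so the local copy avoids a remote download.
        return local, "local"
    # The machine (and its local copy) is gone: read from distributed storage.
    return remote_storage[task_id], "remote"


# Assumed scenario: task-0's machine is still up, task-1's machine was lost.
remote_storage = {"task-0": {"count": 7}, "task-1": {"count": 3}}
local_copies = {"task-0": {"count": 7}}

print(restore_state("task-0", local_copies, remote_storage))
print(restore_state("task-1", local_copies, remote_storage))
```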
Present – New in Flink 1.5
• Broadcast State
‒ API that enables new use cases such as applying dynamic CEP patterns on a stream or join
• SQL CLI
‒ An interactive command-line interface for executing SQL queries on Flink
• Unified Table Sources
‒ A new interface for defining sources for a Table API/SQL program, which also allows sources to be defined in a configuration file
• Loads more automated testing/release verification
‒ Streamlined testing which will lead to lower overhead for releases
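To make the broadcast state use case more concrete, here is a minimal sketch in plain Python (not the Flink API): rule updates are copied to every parallel instance, while data events go to a single instance and are checked against whatever rules that instance currently holds. The event layout, the threshold rules and the function name `apply_dynamic_rules` are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple

# (kind, name, value): kind is "rule" (set a threshold) or "data" (a measurement).
Event = Tuple[str, str, int]


def apply_dynamic_rules(events: Iterable[Event], parallelism: int = 2) -> List[Tuple[str, int]]:
    """Return the data events that exceed the threshold in force when they arrive."""
    # One rule table per parallel instance.
    rules: List[Dict[str, int]] = [dict() for _ in range(parallelism)]
    alerts: List[Tuple[str, int]] = []
    for kind, name, value in events:
        if kind == "rule":
            # Broadcast: every instance receives the same rule update.
            for table in rules:
                table[name] = value
        else:
            # Data events are routed to exactly one instance (deterministic routing).
            instance = sum(name.encode("utf-8")) % parallelism
            threshold = rules[instance].get(name)
            if threshold is not None and value > threshold:
                alerts.append((name, value))
    return alerts


events = [
    ("rule", "temp", 30),
    ("data", "temp", 25),
    ("data", "temp", 35),
    ("data", "load", 99),  # no rule for "load" yet, so nothing is emitted
    ("rule", "load", 90),
    ("data", "load", 95),
    ("rule", "temp", 40),  # rule changes while the stream is running
    ("data", "temp", 35),
]
alerts = apply_dynamic_rules(events)
print(alerts)
```

Because every instance holds the full rule table, the result does not depend on which instance a data event lands on.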
Future – Flink 1.6 and Beyond
• Autoscaling
‒ Automatic and dynamic changes in the parallelism of Flink programs and individual operators
• Hot-standby replication
‒ Replication of the state of operations to multiple machines so that we can instantly migrate computation in case of failures
• Zero-downtime scaling and upgrades
‒ Parallelism changes, framework upgrades and user-code updates without any downtime
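One way to see why parallelism changes are tractable for keyed state: if keys are first mapped to a fixed number of key groups, a change in parallelism only moves whole groups between instances. The sketch below is loosely modelled on that key-group idea; the hash function (`zlib.crc32`), the constant `MAX_PARALLELISM = 128` and all function names are assumptions for illustration, not Flink's implementation.

```python
import zlib
from typing import Dict, Iterable, List

MAX_PARALLELISM = 128  # assumed number of key groups, fixed for the job's lifetime


def key_group(key: str) -> int:
    """Map a key to a key group; this never changes when the job is rescaled."""
    return zlib.crc32(key.encode("utf-8")) % MAX_PARALLELISM


def operator_index(group: int, parallelism: int) -> int:
    """Map a key group to a parallel instance; contiguous groups share an instance."""
    return group * parallelism // MAX_PARALLELISM


def assign(keys: Iterable[str], parallelism: int) -> Dict[int, List[str]]:
    """Group keys by the parallel instance that would own their state."""
    owners: Dict[int, List[str]] = {i: [] for i in range(parallelism)}
    for key in keys:
        owners[operator_index(key_group(key), parallelism)].append(key)
    return owners


keys = ["user-%d" % i for i in range(20)]
before = assign(keys, 2)  # running with parallelism 2
after = assign(keys, 4)   # rescaled to parallelism 4
print(before)
print(after)
```

With this mapping, doubling the parallelism splits each instance's range of key groups in two, so state only has to be handed over in whole groups rather than key by key.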
Future – Flink 1.6 and Beyond
• More Table API/SQL connectors, integration with databases
‒ Dynamic Tables based on a database, not a stream
• End-to-end batch/streaming integration
‒ Unification of the DataStream and DataSet APIs
‒ Efficient execution of batch programs and streaming programs
‒ Dynamic switching of execution modes based on workload
• Support for more programming languages
‒ Upcoming: Python and Go (via Apache Beam)
‒ TensorFlow for Machine Learning and AI (also via Apache Beam)
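The batch/streaming unification can be pictured as running the same logic over a bounded input and an unbounded one. A minimal plain-Python sketch follows, with a made-up `running_sum` operator; it says nothing about how Flink will implement the unified API.

```python
import itertools
from typing import Iterable, Iterator


def running_sum(values: Iterable[int]) -> Iterator[int]:
    """Emit the sum of everything seen so far, one result per input."""
    total = 0
    for value in values:
        total += value
        yield total


# Batch: the input is bounded, so the last emitted value is the final answer.
batch_result = list(running_sum([1, 2, 3, 4]))[-1]

# Streaming: the input never ends, so results are consumed incrementally.
stream = running_sum(itertools.count(1))
first_four = list(itertools.islice(stream, 4))

print(batch_result, first_four)
```

A batch job is then just the special case where the stream happens to end.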
Wrap up
• Despite all the work still ahead, Flink is already the best open-source stream processing system and runs in production at a ton of companies
• Flink 1.5 has exciting new features
• There are even more exciting features coming up
Thank you!
@aljoscha
@dataArtisans
@ApacheFlink
We are hiring!
data-artisans.com/careers
About Data Artisans
Original creators of
Apache Flink®
Open Source Apache Flink
+ dA Application Manager
dA platform
data-artisans.com/download
Powered by Apache Flink
Download the free book
info.data-artisans.com/book

Editor's Notes

  • #10 (Keep this slide up during the Q&A part of your talk. Having this up in the final 5-10 minutes of the session gives the audience something useful to look at.)
  • #11 • data Artisans was founded by the original creators of Apache Flink • We provide dA Platform, a complete stream processing infrastructure with open-source Apache Flink
  • #12 • Also included is the Application Manager, which turns dA Platform into a self-service platform for stateful stream processing applications. • dA Platform is generally available, and you can download a free trial today!
  • #13 • These companies are among many users of Apache Flink, and during this conference you’ll meet folks from some of these companies as well as others using Flink. • If your company would like to be represented on the “Powered by Apache Flink” page, email me.
  • #14 (Optional slide – may not be appropriate for advanced audience. Helps us capture leads.)