Streaming data is the new trend, and for good reason: most data is produced continuously, so it makes sense to process and analyze it continuously. Whether the goal is more real-time products, adopting microservices, or building continuous applications, stream processing technology promises to simplify the data infrastructure stack and reduce the latency to decisions.

Before Apache Flink, users of stream processing frameworks had to make hard choices and trade off latency, throughput, or result accuracy. Flink was the first open source framework (and still the only one) demonstrated to deliver (1) throughput on the order of tens of millions of events per second on moderate clusters, (2) sub-second latency that can be as low as a few tens of milliseconds, (3) guaranteed exactly-once semantics for application state, as well as exactly-once end-to-end delivery with supported sources and sinks (e.g., pipelines from Kafka through Flink to HDFS or Cassandra), and (4) accurate results in the presence of out-of-order data arrival through its support for event time.

In this talk, I will cover the basics of Flink: why the project exists, where it came from, what gap it fills, how it differs from other stream processing projects, and what it is being used for. I will also cover recent developments in the Flink community, what the community is working on currently, and touch upon a longer-term vision for Flink.
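To make the event-time idea concrete, here is a toy sketch (plain Java, not the Flink API; the class and method names are illustrative): each event carries its own timestamp, and windows are assigned by that timestamp, so the result is the same even when events arrive out of order.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of event-time windowing (NOT the Flink API):
// events carry their own timestamps, and windows are assigned by
// those timestamps, so out-of-order arrival does not change results.
public class EventTimeSketch {

    // One event: the time it occurred (event time) and a payload.
    record Event(long timestampMillis, String value) {}

    // Assign each event to a tumbling window of the given size and
    // count events per window, keyed by the window's start time.
    static Map<Long, Integer> countPerWindow(List<Event> events, long windowMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            long windowStart = (e.timestampMillis / windowMillis) * windowMillis;
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Arrival order differs from event-time order:
        // the event stamped 5 ms arrives last.
        List<Event> arrivals = List.of(
            new Event(12, "b"), new Event(17, "c"),
            new Event(25, "d"), new Event(5, "a"));
        // Tumbling 10 ms windows: [0,10) -> 1 event, [10,20) -> 2, [20,30) -> 1.
        System.out.println(countPerWindow(arrivals, 10)); // {0=1, 10=2, 20=1}
    }
}
```

In a real Flink job the same grouping happens in the engine, with watermarks deciding when a window's result may be emitted despite stragglers.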