The Apache Flink community has pushed (and continues to push) the boundary of Stream Processing in recent years, following the understanding that Stream Processing is a unifying paradigm for building data processing applications, beyond real-time analytics.
The latest major effort in the Flink community is nothing less than re-architecting the API and runtime stack, with the goals of naturally supporting the full spectrum of analytics and data-driven applications, unifying the APIs for batch and streaming (Table API and DataStream API), and building a streaming runtime that is state-of-the-art not only in stream processing but also in batch processing performance.
In this keynote, we give an overview of the goals and technology behind this effort, and look at the adoption of Apache Flink for Stream Processing and "beyond streaming" use cases, as well as various efforts in the community to support the growth in users, applications, and ecosystem.
KEYNOTE Flink Forward San Francisco 2019: From Stream Processor to a Unified Data Processing System - Stephan Ewen & Xiaowei Jiang & Robert Metzger
Table API is the choice for analytics workloads
Shared implementation with Flink SQL
Modular, composable code
Dynamic queries
Interactive Programming
Ease of Use
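To make the "shared implementation with Flink SQL" point concrete: both the Table API and SQL front ends compile down to the same logical plan, which is what makes the programmatic API composable. The sketch below mimics that idea with a toy, plain-Java fluent builder; all class and method names here are invented for illustration and are not Flink's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a Table API: each call appends to a logical plan,
// the same intermediate form a SQL front end could target.
// All names are invented for illustration, not Flink's real classes.
public class ToyTableApi {
    static class Table {
        private final List<String> plan = new ArrayList<>();
        Table(String source) { plan.add("scan(" + source + ")"); }
        Table filter(String predicate) { plan.add("filter(" + predicate + ")"); return this; }
        Table select(String fields) { plan.add("select(" + fields + ")"); return this; }
        String explain() { return String.join(" -> ", plan); }
    }

    // A SQL front end could parse SELECT fields FROM source WHERE predicate
    // into the very same plan the fluent API builds:
    static Table fromSql(String source, String predicate, String fields) {
        return new Table(source).filter(predicate).select(fields);
    }

    public static void main(String[] args) {
        Table apiQuery = new Table("orders").filter("amount > 100").select("id, amount");
        Table sqlQuery = fromSql("orders", "amount > 100", "id, amount");
        // Both front ends produce the identical logical plan:
        System.out.println(apiQuery.explain());
        System.out.println(apiQuery.explain().equals(sqlQuery.explain())); // prints true
    }
}
```

Because the plan, not the surface syntax, is the shared artifact, optimizations implemented once in the planner benefit both front ends.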
2019.2-2019.6: FLIP-32, refactor the Table API, decouple the API from the implementation, side-by-side support for multiple runners
2019.3-2019.6 (started concurrently, one month later): initial merge of the Blink runner; supports the major functionality of Blink SQL, production ready
2019.6-2019.7: Flink 1.9.0 released
2019.7-2019.10: continue to merge and improve the Blink runner; fully merge all Blink code
2019.10-2019.11: new Flink release (1.10.0/2.0.0?)
Supported execution modes: Local, Remote, and YARN
Supports TableAPI, Batch SQL and Stream SQL
Visualize static and dynamic tables
Supports savepoints in Flink jobs
Supports advanced functionalities in ZeppelinContext
Unified interface for algorithm functions
Table serves as input and output of ML algorithms
Support ML Pipeline
Implemented some major ML algorithms
Design and refine the algorithm implementations for higher performance
Support large scale data, running and validating in production environment
Find and solve performance and stability problems
Get equal or better performance than Spark ML
Accumulated utilities for Flink ML developer
Basic local library: linear algebra, probability, etc.
Distributed computing library: statistics, parallel sort, etc
Refine data transmission, for example: caching training data in memory
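The pipeline idea above — tables serve as both the input and output of ML algorithms, so stages compose — can be sketched in plain Java without assuming anything about Flink ML's real API. Here a "table" is just a list of rows, an Estimator fits on a table to produce a Transformer, and Transformers map tables to tables; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy ML pipeline: a "table" is a List of rows (double[]), estimators
// learn from a table, and transformers map tables to tables so that
// stages chain. All names are illustrative, not Flink ML's classes.
public class ToyPipeline {
    interface Transformer { List<double[]> transform(List<double[]> table); }
    interface Estimator { Transformer fit(List<double[]> table); }

    // A scaler that learns the max of column 0 and divides rows by it.
    static class MaxScaler implements Estimator {
        public Transformer fit(List<double[]> table) {
            double max = table.stream().mapToDouble(r -> r[0]).max().orElse(1.0);
            return t -> t.stream()
                    .map(r -> new double[] { r[0] / max })
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        List<double[]> train = new ArrayList<>();
        train.add(new double[] {2.0});
        train.add(new double[] {4.0});
        // fit() consumes a table; transform() produces a table, which
        // could feed the next pipeline stage.
        Transformer scaler = new MaxScaler().fit(train);
        List<double[]> out = scaler.transform(train);
        System.out.println(out.get(0)[0] + " " + out.get(1)[0]); // prints 0.5 1.0
    }
}
```

Because every stage speaks "table in, table out", a fitted pipeline is itself just another Transformer, which is what makes pipelines of preprocessing plus model stages composable.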
Like the codebase, Flink’s community is growing
Now among the top 5 projects within Apache, in terms of community size
All numbers are looking very positive
1st number: dev@ activity – developer community growth
More contributions
More discussions about the project’s future
2nd number: pageviews – user community growth
Constant growth over the past years
We expect these to grow in the coming years
Also because we’ve started onboarding the huge Chinese Flink community …
… 1300 people attended Flink Forward China in December this year
There’s been an effort in China to translate the Flink docs. We are now collaborating on translating Flink’s official material.
Flink website almost done
Flink docs coming now
great opportunity for Flink: documentation (and contribution opportunities) for one of the most active Flink communities in the world
Flink has thousands of users / WeChat (or DingTalk) group with thousands of members in China
foster further growth of Flink in China
bring them into the Apache world / offer infrastructure in the Apache way
make the Chinese community more visible to other Flink users and within the ASF
English obviously stays the official language for all developer discussions, and we are open to adding more languages
Better Jira processes
cleanup and reorganization of the components
Flinkbot, helping with the review of pull requests, and labeling PRs into components
discussion about a stricter Jira workflow, to make sure people work on things that have consensus to be accepted
the PMC is actively working on onboarding new committers to help with the amount of incoming pull requests, and the overall growing project.
Started a project for a Flink Ecosystem Portal, where people can submit their own Flink extensions for connectors, metrics reporters, file systems, API utilities etc.
This is hosted at Apache Infra, run by the Flink project and based on open source
reach out to me during the conference if you want to talk community, such as starting your own meetup group, ideas how to improve the community etc.