"In my talk, we will examine all the stages of building our self-service Streaming Data Platform based on Apache Flink and Kafka Connect, from the selection of a solution for stateful streaming data processing, right up to the successful design of a robust self-service platform, covering the challenges that we’ve met.
I will share our experience in providing non-Java developers with a company-wide self-service solution, which allows them to quickly and easily develop their streaming data pipelines.
Additionally, I will highlight specific business use cases that would not have been implemented without our platform.0 characters0 characters"
2. Data as Bedrock
Kafka Summit 2024, 20/03/2024
● Exness is the largest CFD broker by
trading volume and active clients
● Every millisecond counts
● As Exness delved deeper into event-driven
architecture, the need for processing
streaming data became paramount
● Each team had to deal with processing streaming data on its own, solving all the problems around:
○ Scalability
○ Fault tolerance
○ State management
○ Security
4. Why Apache Flink?
● Support for several Kafka instances
● Performance
● Fault tolerance
● Support for very large state
● Java-based framework
5. What challenges have we faced?
01. How to provide Python and Go developers with a self-service platform based on a Java framework?
02. How to provide developers with a unified, simple deployment process for all the components?
03. How to ensure security?
04. How to flexibly manage and isolate resources between teams?
7. Flink SQL challenges
● Perfect for simple cases:
○ Aggregating data;
○ Unioning data;
○ Flattening data.
● Doesn’t work so well for complex cases:
○ Many enrichments;
○ Complex business logic;
○ No tracing support.
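As a sketch of the "simple case" where Flink SQL shines, here is a hypothetical aggregation query over a trades stream; the table names, columns, and window size are illustrative assumptions, not the actual Exness pipelines.

```sql
-- Per-symbol trade stats over a 1-minute tumbling window (illustrative).
INSERT INTO trade_stats
SELECT
  symbol,
  COUNT(*)    AS trades,
  SUM(volume) AS total_volume,
  TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start
FROM trades
GROUP BY
  symbol,
  TUMBLE(ts, INTERVAL '1' MINUTE);
```

A few lines of SQL cover the whole job here; it is the multi-step enrichment and custom business-logic cases where this declarative style stops being a good fit.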
9. PyFlink challenges
● Lack of Apple silicon support (M1 / M2)
● No tracing support out of the box
● Need to use both the Table API and the
DataStream API in one PyFlink job
How we addressed them:
● Flink version 1.17 or later
● Tracing using:
○ OpenTelemetry;
○ Jaeger.
● StreamTableEnvironment to work with
both the Table and DataStream APIs
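The last point can be sketched as a single PyFlink job that hops between the two APIs via `StreamTableEnvironment`. This is a minimal sketch, not the actual platform code: it assumes Flink ≥ 1.17 with the `apache-flink` package and the Kafka SQL connector on the classpath, and the topic, fields, and connector settings are illustrative.

```python
# Sketch: mixing the Table API and the DataStream API in one PyFlink job.
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(env)

# 1. Define a Kafka source declaratively with the Table API (DDL).
#    Topic and server names below are hypothetical.
t_env.execute_sql("""
    CREATE TABLE events (
        user_id STRING,
        amount  DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'events',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json'
    )
""")

# 2. Drop down to the DataStream API for logic that is awkward in SQL,
#    e.g. custom enrichment calls (identity map as a placeholder here).
ds = t_env.to_data_stream(t_env.from_path("events"))
enriched = ds.map(lambda row: row)

# 3. Lift the stream back into the Table API to continue declaratively.
t_env.create_temporary_view("enriched", t_env.from_data_stream(enriched))
```

The same `StreamTableEnvironment` instance carries both worlds, which is what makes the "both APIs in one job" requirement workable.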
10. Unified deployment process
● Main components of the deployment
process:
○ Terraform;
○ GitLab pipeline;
○ K8S operators.
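For the K8s-operator piece, a deployment rendered by the pipeline could look like the following `FlinkDeployment` resource for the Apache Flink Kubernetes Operator. This is a hedged sketch: the names, namespace, image, and resource figures are illustrative assumptions, not the actual platform manifests.

```yaml
# Sketch: a FlinkDeployment as the Flink Kubernetes Operator consumes it,
# e.g. rendered per team by Terraform in a GitLab pipeline.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: team-a-pipeline        # hypothetical job name
  namespace: team-a            # one namespace per team
spec:
  image: registry.example.com/flink-jobs/pipeline:1.0.0
  flinkVersion: v1_17
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  job:
    jarURI: local:///opt/flink/usrlib/pipeline.jar
    parallelism: 2
    upgradeMode: savepoint     # stateful upgrades via savepoints
```

Keeping this as a generated artifact rather than hand-written YAML is what makes the process uniform across teams.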
12. Streaming data replication using Terraform over Flink
● Templated Terraform modules
instead of multiple and similar
SQL artefacts
● One module defines
configuration of the whole
pipeline from Kafka topic to S3
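Such a templated module might be instantiated like this; the module path, name, and variables are hypothetical stand-ins for the real Exness module, shown only to illustrate the "one module per pipeline" idea.

```hcl
# Sketch: one Terraform module call describing a whole replication
# pipeline from a Kafka topic to S3 (all names illustrative).
module "trades_to_s3" {
  source = "./modules/kafka-to-s3-pipeline"

  kafka_topic   = "trades"
  s3_bucket     = "data-lake-raw"
  s3_prefix     = "trades/"
  format        = "parquet"
  flink_cluster = "team-a"
  parallelism   = 2
}
```

Adding a new replication pipeline then becomes a small, reviewable diff instead of another copy-pasted SQL artefact.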
13. One team – one Flink cluster
● Security
● Resource management
● Own development environment
● Observability
14. Monitoring and alerting
● Separate monitoring
for each team
● Slack channels with alerts
for each Flink cluster
● One Slack channel for technical
support of all the users
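The per-cluster Slack routing could be expressed with Prometheus Alertmanager roughly as below. This is an assumption-laden sketch: the `flink_cluster` label, channel names, and receiver names are invented for illustration, and the Slack webhook configuration (`api_url`) is omitted.

```yaml
# Sketch: route each Flink cluster's alerts to its team channel,
# with a shared support channel as the default (names hypothetical).
route:
  receiver: platform-support
  routes:
    - matchers:
        - flink_cluster = "team-a"
      receiver: team-a-alerts
receivers:
  - name: platform-support
    slack_configs:
      - channel: "#streaming-platform-support"
  - name: team-a-alerts
    slack_configs:
      - channel: "#team-a-flink-alerts"
```

One default route plus a matcher per cluster keeps team alerts isolated while funnelling everything else to the shared support channel.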
15. The most important projects delivered on our self-service platform
● Trading data processing lag cut
from 2 hours to 2 minutes
during peak times
● 1M+ bot activity events
prevented in real time
● Fraud and abuse prevention
based on real-time data
● Marketing campaigns based
on real-time data
16. Special thanks to:
https://medium.com/exness-blog
● Alexey Perminov
● Ilya Soin
● Yury Smirnov
● Igor Matcko