Stream processing is a powerful paradigm, especially when backed by a system like Apache Flink. With each release and each year, we see Flink being used for more challenging use cases and applications.
But beyond the individual application (though it may be grand and challenging in itself), stream processing is a much broader building block: the foundational piece of a platform that brings together the different parts of a data architecture. Such a platform integrates data analytics, data ingestion, SQL, machine learning, data provenance, databases, and other aspects of a data-driven infrastructure in a meaningful way.
In this keynote, we look at what goes into building a stream processing platform that is more than the sum of its parts.
8. Recall last Flink Forward…
[Diagram: Classic tiered architecture (compute layer, database layer) next to Streaming architecture (compute + stream storage and snapshot storage (backup)). State labels: application state; application working state + historic state.]
9. Changing the Two Tier Architecture
Classic tiered architecture: reads/writes across tier boundary
Streaming architecture: all modifications are local; asynchronous writes of large blobs
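The contrast on this slide can be illustrated with a toy sketch. It is not Flink's snapshot mechanism; `LocalStateOperator`, the `snapshot_store` dictionary (standing in for remote blob storage), and the snapshot id are hypothetical names chosen for illustration. Updates touch only in-process state, and a copy of that state is written out in the background.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class LocalStateOperator:
    """Toy operator: state lives in-process, snapshots are written asynchronously."""

    def __init__(self, snapshot_store):
        self.counts = {}                      # working state, local to the compute
        self.snapshot_store = snapshot_store  # assumption: stands in for blob storage
        self._writer = ThreadPoolExecutor(max_workers=1)

    def process(self, key):
        # All modifications are local: nothing crosses a tier boundary here.
        self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self, snapshot_id):
        # Copy the state first so later updates cannot leak into the snapshot,
        # then hand the (potentially large) blob to a background writer.
        frozen = copy.deepcopy(self.counts)
        return self._writer.submit(
            self.snapshot_store.__setitem__, snapshot_id, frozen
        )

    def close(self):
        self._writer.shutdown(wait=True)


store = {}
op = LocalStateOperator(store)
for key in ["a", "b", "a"]:
    op.process(key)
pending = op.snapshot("snap-1")
op.process("a")      # processing continues while the snapshot is being written
pending.result()     # wait for the asynchronous write to finish
op.close()
```

In this sketch the snapshot reflects the state at the moment `snapshot` was called, while the operator's own state keeps advancing.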
14. What about stateful containers?
Kubernetes
• Example: Scaling down a replicated database
• 3 replicas, 4 node scale down
need to move or reorganize data before container shutdown
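A rough sketch of why a stateful container cannot simply be stopped: before a node leaves, every replica it holds has to be placed somewhere else. The sketch assumes one reading of the example (four nodes, three replicas per partition, one node removed); `plan_scale_down`, the partition names, and the node names are hypothetical.

```python
def plan_scale_down(placement, leaving):
    """Return (moves, new_placement) needed before `leaving` can be shut down.

    `placement` maps a partition name to the set of nodes holding a replica.
    """
    nodes = set().union(*placement.values())
    remaining = nodes - {leaving}
    # Number of replicas each remaining node already hosts.
    load = {n: sum(n in reps for reps in placement.values()) for n in remaining}
    moves, new_placement = [], {}
    for partition, replicas in sorted(placement.items()):
        if leaving not in replicas:
            new_placement[partition] = set(replicas)
            continue
        # A replica may only move to a node that does not hold this partition yet.
        candidates = remaining - replicas
        if not candidates:
            raise ValueError(f"no node left to host a replica of {partition}")
        target = min(candidates, key=lambda n: (load[n], n))
        load[target] += 1
        moves.append((partition, leaving, target))
        new_placement[partition] = (replicas - {leaving}) | {target}
    return moves, new_placement


placement = {
    "p0": {"n1", "n2", "n3"},
    "p1": {"n2", "n3", "n4"},
    "p2": {"n3", "n4", "n1"},
    "p3": {"n4", "n1", "n2"},
}
moves, new_placement = plan_scale_down(placement, leaving="n4")
```

Here three of the four partitions need a data move before `n4` can be shut down, which is work a stateless container would never require.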
15. Stateful Questions
consistent stateful upgrades
• application evolution and bug fixes
migration of application state
• cluster migration, A/B testing
re-processing and reinstatement
• fix corrupt results, bootstrap new applications
state evolution (schema evolution)
18. Versioned Applications, not Jobs/Jars
[Diagram: a Stream Processing Application with Version 1, Version 2, and Version 3, linked by upgrade steps and labeled "Code and Application Snapshot"; a New Application with Version 2a and Version 3a, created via fork / duplicate.]
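One way to picture versioned applications is as a small data model in which each version pairs code with the state snapshot it started from. This is only an illustrative sketch; `AppVersion`, `Application`, and the `take_snapshot` callback are hypothetical names, not part of any Flink or dA Platform API, and forking from Version 2 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class AppVersion:
    label: str
    code: str                # reference to the code artifact (hypothetical)
    snapshot: Optional[str]  # snapshot this version was started from, if any


@dataclass
class Application:
    name: str
    versions: List[AppVersion] = field(default_factory=list)

    def upgrade(self, label: str, code: str,
                take_snapshot: Callable[[AppVersion], str]) -> AppVersion:
        """Snapshot the current version, then start the new code from it."""
        snapshot = take_snapshot(self.versions[-1]) if self.versions else None
        version = AppVersion(label, code, snapshot)
        self.versions.append(version)
        return version

    def fork(self, new_name: str, label: str, code: str,
             take_snapshot: Callable[[AppVersion], str]) -> "Application":
        """Start a separate application from a snapshot of the current version."""
        snapshot = take_snapshot(self.versions[-1])
        return Application(new_name, [AppVersion(label, code, snapshot)])


def take(version: AppVersion) -> str:
    # Stand-in for triggering a real snapshot of the running version.
    return f"snapshot-of-{version.label}"


app = Application("orders")
app.upgrade("1", "code-v1", take)
app.upgrade("2", "code-v2", take)
forked = app.fork("orders-experiment", "2a", "code-v2a", take)
app.upgrade("3", "code-v3", take)
```

Tracking the pair of code and snapshot per version is what makes an upgrade or a fork reproducible, rather than just a redeployed jar.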
21. The Usual Suspects
Role-based access control
Metadata management
Cross Datacenter Failover / Disaster Recovery
22. Support for Batch Processing
Everything is a stream. Finite applications as a special case.
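A small sketch of the idea, assuming nothing about Flink's actual APIs: the same incremental logic runs unchanged on a finite input (the batch case) and on an unbounded one. The names `running_counts` and `unbounded_events` are hypothetical.

```python
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit an updated count for every incoming event."""
    counts: Dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
        yield event, counts[event]


def unbounded_events() -> Iterator[str]:
    """An endless source of events."""
    for i in itertools.count():
        yield "even" if i % 2 == 0 else "odd"


# Finite application: the stream simply ends, and so does the computation.
finite_result = list(running_counts(["a", "b", "a"]))

# Unbounded application: same logic, but we can only ever look at a prefix.
stream_prefix = list(itertools.islice(running_counts(unbounded_events()), 4))
```

The batch run differs only in that its input terminates; no separate code path is needed for it.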
23. Periodic Bursty Stream Processing
[Diagram: a bursty event stream over time (events only at end-of-day), with a Checkpoint / Savepoint Store.]
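One plausible reading of this slide is that the application only needs to run while a burst arrives: it restores its state from the savepoint store, processes the burst, writes a new savepoint, and releases its resources. The sketch below models that cycle with a JSON file standing in for the store; `run_burst` and the file layout are assumptions for illustration, not how Flink stores savepoints.

```python
import json
import os
import tempfile


def run_burst(events, savepoint_path):
    """Restore state, process one burst of events, save state, and exit."""
    state = {}
    if os.path.exists(savepoint_path):
        with open(savepoint_path) as f:
            state = json.load(f)          # resume where the last burst stopped
    for key, amount in events:
        state[key] = state.get(key, 0) + amount
    with open(savepoint_path, "w") as f:
        json.dump(state, f)               # persist state before shutting down
    return state


savepoint = os.path.join(tempfile.mkdtemp(), "savepoint.json")
day1 = run_burst([("a", 1), ("b", 2)], savepoint)   # first end-of-day burst
day2 = run_burst([("a", 5)], savepoint)             # next day, new process
```

Between the two calls nothing is running; the second run continues from the totals the first run left behind.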
24. Support a Broad Developer Audience
Streaming Data Platform
…
25. Use Case Vertical Libraries
Streaming Data Platform
SQL, CEP, Machine Learning, …
26. Apache Flink: Stateful stream processing
Kubernetes: Container platform
Logging
Metrics
dA Application Manager: Application lifecycle management
dA Platform is a turnkey solution for stateful stream processing with Apache Flink.