Slides from the talk "The State of Streaming & Modern Data" by Pavan Keshavamurthy of Platformatory at the Bengaluru Streams June 2023 meetup.
A navigator for both the initiated and the uninitiated in the world of streaming, covering the breadth of drivers, reference architectures, blueprints and tooling within the Apache Fn ecosystem.
3. TOC
- Fast data beats slow data
- Some fundamental shifts in data engineering
- The modern data stack
  - Hint: it has streaming in between
- A tale of two architectures
  - Lambda
  - Kappa
- A view of the streaming ecosystem
  - Kafka is the CNS (central nervous system)
  - Data movement
  - Stream processing will intersect and converge the operational and analytical planes
  - Streaming databases are where many analytical and BI workloads will move
  - Data Mesh is the new architecture paradigm for a modern data estate
  - The greatest beneficiary will be AI/ML
5. Fast Data > Slow Data
- MTTI = Mean Time To Insight
- MTTA = Mean Time To (insight-driven, hopefully useful) Action
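One way to make MTTI and MTTA concrete is to average, over a set of events, the lag from when an event happened to when an insight was derived from it and to when an action was taken. The sketch below assumes each event carries three hypothetical timestamps (`occurred_at`, `insight_at`, `action_at`) and measures both metrics from the originating event; the slide itself does not define how these are measured.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class EventTrace:
    # Hypothetical per-event timestamps; not defined in the talk.
    occurred_at: datetime  # when the event happened at the source
    insight_at: datetime   # when an insight derived from it became available
    action_at: datetime    # when an action was taken on that insight


def mtti(traces):
    """Mean Time To Insight: average seconds from event to insight."""
    return mean((t.insight_at - t.occurred_at).total_seconds() for t in traces)


def mtta(traces):
    """Mean Time To Action: average seconds from event to action."""
    return mean((t.action_at - t.occurred_at).total_seconds() for t in traces)


t0 = datetime(2023, 6, 1, 12, 0, 0)
traces = [
    EventTrace(t0, t0 + timedelta(seconds=2), t0 + timedelta(seconds=5)),
    EventTrace(t0, t0 + timedelta(seconds=4), t0 + timedelta(seconds=9)),
]
print(mtti(traces), mtta(traces))  # 3.0 7.0
```

Under this reading, a pipeline that only lands data in hourly batches cannot get MTTI much below the batch interval, which is the gap the "fast data beats slow data" argument targets.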
6. Traditional data architecture* just can't keep up with the explosion of data
* Includes:
- Warehouses
- Marts
- Lakes
- Swamps
7. A few foundational shifts for the modern data-driven enterprise
1. Absolutely everything leads to the cloud
2. Real-time processing will be relevant in almost all mission-critical use cases
3. Best-of-breed platforms beat packaged platforms
4. Data fan-out at scale over point-to-point connectivity
5. Domain-based architecture is the only way to break the data monolith
6. A product approach to data is not only useful but also necessary
McKinsey: How to build a data architecture to drive innovation
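The "fan-out over point-to-point" shift can be illustrated with a toy, in-memory append-only log: producers write once, and any number of consumers read the same records independently, each tracking its own offset, instead of the producer maintaining a dedicated connection per consumer. This is a minimal sketch of the idea behind a Kafka topic partition, not Kafka's API; the `Log` and `Consumer` classes are hypothetical.

```python
class Log:
    """A toy append-only log, loosely modelled on a single topic partition."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Producers write once; the returned offset identifies the record.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        # Reads never remove data, so many consumers can share the log.
        return self._records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own position (offset) in the shared log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


orders = Log()
for order_id in ("o1", "o2", "o3"):
    orders.append({"order_id": order_id})

billing = Consumer("billing", orders)
analytics = Consumer("analytics", orders)

print([r["order_id"] for r in billing.poll()])     # ['o1', 'o2', 'o3']
print([r["order_id"] for r in analytics.poll(2)])  # ['o1', 'o2']
print([r["order_id"] for r in analytics.poll()])   # ['o3']
```

With N producers and M consumers, point-to-point integration tends toward N×M connections, whereas a shared log needs roughly N + M, and a new consumer can be added without touching any producer.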
8. Streaming is hard, but it is worth it
1. Stream as a core primitive across operational and analytical planes
2. Data Sourcing & Movement
3. Storage
4. Processing
5. Querying
6. Cross-cutting concerns (Security, Observability, Governance, etc.)
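As an illustration of the processing layer, the sketch below computes a per-key count over tumbling (fixed-size, non-overlapping) event-time windows, the kind of aggregation that engines such as Flink or streaming SQL systems express declaratively. It is a single-process toy that assumes events arrive as hypothetical `(timestamp_seconds, key)` tuples and ignores late or out-of-order data.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) tuples (hypothetical shape).
    window_size: window length in seconds.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align each event to the start of the window that contains it.
        window_start = ts - (ts % window_size)
        counts[(window_start, key)] += 1
    return dict(counts)


clicks = [(0, "home"), (12, "home"), (59, "cart"), (61, "home"), (75, "cart")]
print(tumbling_window_counts(clicks, window_size=60))
# {(0, 'home'): 2, (0, 'cart'): 1, (60, 'home'): 1, (60, 'cart'): 1}
```

Production engines add what this toy omits: watermarks for out-of-order events, fault-tolerant state, and parallelism across partitions.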
14. Streaming is hotter than ever
- Apache Kafka: the de facto protocol for eventing
- Stream processing engines have finally come of age: Apache Flink, Spark Streaming, KSQL, Materialize, RisingWave and a whole host of streaming SQL engines
- Lakehouse architectures are open: Apache Hudi, Apache Iceberg
- Real-time analytics now comes with a modern flavour: Apache Pinot, Apache Druid, ClickHouse…
- AI/ML-centric ops will increasingly converge into streaming
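Streaming SQL systems such as Materialize and RisingWave keep query results up to date as changes arrive, rather than recomputing them on every read. The toy sketch below shows the underlying idea, incremental view maintenance, for a `SUM ... GROUP BY` over a changelog of inserts and deletes; the `(op, key, amount)` change format and the `SumByKeyView` class are hypothetical and do not reflect either product's internals.

```python
from collections import defaultdict


class SumByKeyView:
    """Incrementally maintained equivalent of:
    SELECT key, SUM(amount) FROM changes GROUP BY key
    """

    def __init__(self):
        self._sums = defaultdict(int)
        self._counts = defaultdict(int)  # row counts, used to drop empty groups

    def apply(self, op, key, amount):
        # Each change touches only the affected group: O(1) per change.
        # Assumes a "-" only retracts a row previously inserted with "+".
        if op == "+":
            self._sums[key] += amount
            self._counts[key] += 1
        elif op == "-":
            self._sums[key] -= amount
            self._counts[key] -= 1
            if self._counts[key] == 0:
                del self._sums[key]
                del self._counts[key]
        else:
            raise ValueError(f"unknown op: {op!r}")

    def snapshot(self):
        return dict(self._sums)


view = SumByKeyView()
changelog = [("+", "IN", 100), ("+", "US", 40), ("+", "IN", 25), ("-", "US", 40)]
for change in changelog:
    view.apply(*change)
print(view.snapshot())  # {'IN': 125}
```

Recomputing the query from scratch would rescan every row on each read; the incremental version pays a small, constant cost per change instead.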
A practitioner’s view and closing notes