“The State of Streaming”
Presented at: Bengaluru Streams Meetup, 17 June 2023
A practitioner’s guide to modern data architecture
whoami
● ಬೆಂಗಳೂರು (Bengaluru) boy.
● Cofounder, handyman @ platformatory.io
● OSS → Arch Linux, Envoy, Apache Kafka, Kong (amongst others)
● Functional programming, distributed systems, the Himalayas, Karnataka music
- https://in.linkedin.com/in/pavankmurthy
- https://grahana.net/
- https://twitter.com/p6
TOC
- Fast data beats slow data
- Some fundamental shifts in data engineering
- The modern data stack
  - Hint: it has streaming in between
- A tale of two architectures
  - Lambda
  - Kappa
- A view of the streaming ecosystem
- Kafka is the CNS (central nervous system)
- Data movement
- Stream processing will converge the operational and analytical planes
- Streaming databases are where a lot of analytical and BI loads will move
- Data Mesh is the new architecture paradigm for a modern data estate
- The greatest beneficiary will be AI/ML
328.77 million TB/day
120 ZB/year
Protip: big data is getting bigger and faster.
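The two volume figures are consistent with each other. Assuming decimal (SI) units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and a 365-day year, 328.77 million TB per day works out to roughly 120 ZB per year. A minimal sketch of the conversion:

```python
# Assumption: decimal (SI) units and a 365-day year.
TB = 10**12  # bytes in a terabyte
ZB = 10**21  # bytes in a zettabyte
DAYS_PER_YEAR = 365


def tb_per_day_to_zb_per_year(tb_per_day: float) -> float:
    """Convert a daily volume in terabytes to a yearly volume in zettabytes."""
    bytes_per_year = tb_per_day * TB * DAYS_PER_YEAR
    return bytes_per_year / ZB


daily_tb = 328.77e6  # 328.77 million TB/day
yearly_zb = tb_per_day_to_zb_per_year(daily_tb)
print(f"{yearly_zb:.1f} ZB/year")
```

With binary units (TiB, ZiB) the result would differ slightly, so the units are worth stating whenever these numbers are quoted.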
Fast Data > Slow Data
- MTTI = Mean Time To Insight
- MTTA = Mean Time To (Insight-Driven, hopefully useful) Action
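One way to make these two metrics concrete is to average, per event, the delay from when the event happened to when an insight (or an action) followed. The slide does not define the starting point, so measuring both from the event's occurrence is an assumption here, and the `EventTimeline` type and its timestamps are hypothetical, made up purely for illustration:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EventTimeline:
    """Hypothetical per-event timestamps, in seconds since an arbitrary epoch."""
    occurred_at: float  # when the event happened at the source
    insight_at: float   # when an insight derived from it became available
    action_at: float    # when an action based on that insight was taken


def mtti(events: list[EventTimeline]) -> float:
    """Mean Time To Insight: average of (insight_at - occurred_at)."""
    return mean(e.insight_at - e.occurred_at for e in events)


def mtta(events: list[EventTimeline]) -> float:
    """Mean Time To Action: average of (action_at - occurred_at)."""
    return mean(e.action_at - e.occurred_at for e in events)


# Illustrative, made-up timelines.
events = [
    EventTimeline(occurred_at=0.0, insight_at=2.0, action_at=5.0),
    EventTimeline(occurred_at=10.0, insight_at=14.0, action_at=19.0),
]
print(mtti(events), mtta(events))
```

In a batch architecture the insight delay includes waiting for the next scheduled job, which is exactly the term streaming tries to shrink.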
Traditional data architecture* just can't keep up with the explosion of data
* Includes:
- Warehouses
- Marts
- Lakes
- Swamps
A few foundational shifts for the modern data-driven enterprise
1. Absolutely everything leads to the cloud
2. Real-time processing will be relevant in almost all mission-critical use cases
3. Best-of-breed platforms beat packaged platforms
4. Data fan-out at scale over point-to-point connectivity
5. Domain-based architecture is the only way to break the data monolith
6. A product approach to data is not only useful but also necessary
McKinsey: How to build a data architecture to drive innovation
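Shift 4 (fan-out over point-to-point) can be illustrated with a toy in-memory log: producers append once, and each consumer reads independently by tracking its own offset, so adding a consumer never touches the producer. This is a sketch of the idea behind log-based brokers such as Kafka, not Kafka's API; the `Log` class and its methods are hypothetical:

```python
class Log:
    """Toy append-only log: producers append, consumers read by offset."""

    def __init__(self) -> None:
        self._records: list[str] = []
        self._offsets: dict[str, int] = {}  # next offset to read, per consumer

    def append(self, record: str) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer: str) -> list[str]:
        """Return records this consumer has not seen yet, then advance its offset."""
        start = self._offsets.get(consumer, 0)  # in this toy, new consumers start at 0
        batch = self._records[start:]
        self._offsets[consumer] = len(self._records)
        return batch


def point_to_point_links(producers: int, consumers: int) -> int:
    """Direct integrations needed if every producer talks to every consumer."""
    return producers * consumers


def fan_out_links(producers: int, consumers: int) -> int:
    """Connections needed if everyone talks to one shared log instead."""
    return producers + consumers


log = Log()
log.append("order-1")
log.append("order-2")

analytics = log.poll("analytics")  # each downstream system reads the same records
billing = log.poll("billing")

log.append("order-3")
search_index = log.poll("search-index")  # a late joiner still replays history
analytics_next = log.poll("analytics")   # only what analytics has not seen yet
```

The link counts show why point-to-point does not scale: with 5 producers and 8 consumers that is 40 integrations versus 13 connections to a shared log. In a real broker, whether a new consumer replays history depends on retention and its starting-offset configuration.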
Streaming is hard, but it is worth it
1. Stream as a core primitive across operational and analytical planes
2. Data Sourcing & Movement
3. Storage
4. Processing
5. Querying
6. Cross-cutting concerns (Security, Observability, Governance, etc.)
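As a small illustration of the processing and querying layers, here is a tumbling-window count over timestamped events, the kind of aggregation stream processors such as Flink or Spark Streaming run continuously. It is a plain-Python sketch over a finite list: it deliberately ignores late data, watermarks, and state persistence, which real engines must handle, and the click data is made up:

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_counts(
    events: Iterable[tuple[float, str]], window_seconds: float
) -> dict[tuple[float, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows.

    Each event is (event_time_seconds, key). An event at time t belongs to the
    window starting at floor(t / window_seconds) * window_seconds.
    """
    counts: dict[tuple[float, str], int] = defaultdict(int)
    for event_time, key in events:
        window_start = (event_time // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Illustrative click stream: (event time in seconds, page).
clicks = [(0.5, "home"), (3.0, "home"), (4.9, "cart"), (5.0, "home"), (9.99, "cart")]
result = tumbling_window_counts(clicks, window_seconds=5.0)
```

Windows are half-open, so the event at exactly 5.0 s lands in the second window. A streaming engine would emit each window's counts once it considers the window complete, which is what makes them queryable downstream.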
Some unified data infrastructure archetypes emerge (courtesy a16z):
- Modern BI
- Enterprise multi-modal processing
- AI/ML
The Stories
- 20 trillion events/day
- 400 billion events/day
- 1 trillion+ events/day
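Daily totals hide the sustained throughput these systems handle. Dividing by the 86,400 seconds in a day gives the average rate, assuming (simplistically) that events are spread evenly; peak rates will be higher:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Average sustained rate, assuming events are spread evenly across the day."""
    return events_per_day / SECONDS_PER_DAY


for label, per_day in [("20 trillion", 20e12), ("400 billion", 400e9), ("1 trillion", 1e12)]:
    print(f"{label}/day ≈ {per_second(per_day):,.0f} events/s")
```

So 20 trillion events/day is on the order of 230 million events per second on average.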
- Streaming is hotter than ever
- Apache Kafka: the de facto protocol for eventing
- Stream processing engines have finally come of age: Apache Flink, Spark Streaming, KSQL, Materialize, RisingWave and a whole host of streaming SQL engines
- Lakehouse architectures are open: Apache Hudi, Apache Iceberg
- Real-time analytics now comes with a modern flavour: Apache Pinot, Druid, ClickHouse…
- AI/ML-centric ops will increasingly converge into streaming
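Streaming SQL systems such as Materialize and RisingWave keep query results up to date incrementally rather than recomputing them from scratch. Below is a toy version of that idea for a `SELECT key, SUM(amount), COUNT(*) ... GROUP BY key` view, where inserts and deletes arrive as +1/-1 diffs. It illustrates the concept only and is not how either engine is implemented; the `GroupBySumView` class is hypothetical:

```python
from collections import defaultdict


class GroupBySumView:
    """Toy incrementally maintained view: key, SUM(amount), COUNT(*) GROUP BY key.

    Each change is (key, amount, diff): diff is +1 for an insert and -1 for a
    delete, so the view is updated per change instead of rescanning the table.
    """

    def __init__(self) -> None:
        self._sum: defaultdict[str, float] = defaultdict(float)
        self._count: defaultdict[str, int] = defaultdict(int)

    def apply(self, key: str, amount: float, diff: int) -> None:
        self._sum[key] += diff * amount
        self._count[key] += diff
        if self._count[key] == 0:  # empty group disappears, as with SQL GROUP BY
            del self._sum[key]
            del self._count[key]

    def rows(self) -> dict[str, tuple[float, int]]:
        return {k: (self._sum[k], self._count[k]) for k in self._count}


view = GroupBySumView()
view.apply("bengaluru", 100.0, +1)
view.apply("bengaluru", 50.0, +1)
view.apply("mysuru", 20.0, +1)
view.apply("bengaluru", 100.0, -1)  # a deleted row is retracted, not recomputed
view.apply("mysuru", 20.0, -1)
```

Representing every change as a signed diff is what lets deletes and updates (a delete plus an insert) flow through the same path as inserts.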
A practitioner's view and closing notes