“The State of Streaming”
Presented at: Bengaluru Streams Meetup - 17 June, 2023
A practitioner’s guide to modern data architecture
whoami
● ಬೆಂಗಳೂರು boy.
● Cofounder, handyman @ platformatory.io
● OSS → ArchLinux, Envoy, Apache Kafka, Kong (amongst others)
● Functional Programming, Distributed systems, Himalayas, Karnataka Music
- https://in.linkedin.com/in/pavankmurthy
- https://grahana.net/
- https://twitter.com/p6
TOC
- Fast data beats slow data
- Some fundamental shifts in data engineering
- The modern data stack
- Hint, it has streaming in between
- A tale of two architectures
- Lambda
- Kappa
- A view of the streaming ecosystem
- Kafka is the CNS
- Data Movement
- Stream processing will converge the operational and analytical planes
- Streaming databases are where a lot of analytical and BI loads will move to
- Data Mesh is the new architecture paradigm for a modern data estate
- The greatest beneficiary will be AI/ML
328.77 M TB/d
120 ZB/y
*Protip: Big data is getting bigger and faster.
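The two headline figures agree with each other, which a quick conversion shows. This is a minimal sketch, assuming decimal (SI) units (1 TB = 10^12 bytes, 1 ZB = 10^21 bytes) and a 365-day year; neither assumption is stated on the slide.

```python
# Sanity check: does 328.77 million TB per day add up to roughly 120 ZB per year?
# Assumptions (not from the slide): SI units and a 365-day year.
TB = 10**12  # bytes in a terabyte
ZB = 10**21  # bytes in a zettabyte

daily_bytes = 328.77e6 * TB          # 328.77 M TB/d
yearly_zb = daily_bytes * 365 / ZB   # convert to ZB per year

print(f"{yearly_zb:.1f} ZB/year")    # prints "120.0 ZB/year"
```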
Fast Data > Slow Data
- MTTI = Mean Time To Insight
- MTTA = Mean Time To (Insight Driven, hopefully useful) Action
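One way to make MTTI and MTTA concrete is to measure both from the moment an event is produced. The sketch below assumes that starting point, and the record fields (`produced`, `insight`, `action`) and timestamps are hypothetical sample data, not figures from the talk.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: when the data was produced, when an insight was
# derived from it, and when that insight was acted upon.
events = [
    {"produced": datetime(2023, 6, 17, 10, 0, 0),
     "insight":  datetime(2023, 6, 17, 10, 0, 5),
     "action":   datetime(2023, 6, 17, 10, 0, 20)},
    {"produced": datetime(2023, 6, 17, 10, 1, 0),
     "insight":  datetime(2023, 6, 17, 10, 1, 15),
     "action":   datetime(2023, 6, 17, 10, 1, 40)},
]

def mean_seconds(records, start_key, end_key):
    """Average elapsed seconds between two timestamps across all records."""
    return mean((r[end_key] - r[start_key]).total_seconds() for r in records)

mtti = mean_seconds(events, "produced", "insight")  # Mean Time To Insight
mtta = mean_seconds(events, "produced", "action")   # Mean Time To Action

print(f"MTTI = {mtti:.0f}s, MTTA = {mtta:.0f}s")    # prints "MTTI = 10s, MTTA = 30s"
```

In a batch pipeline both numbers are bounded below by the batch interval; a streaming pipeline removes that floor.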
Traditional Data Architecture* just can’t keep up with the explosion of data
* includes
- Warehouses
- Marts
- Lakes
- Swamps
A few foundational shifts for the modern data-driven enterprise
1. Absolutely everything leads to the cloud
2. Real-time processing will be relevant in almost all mission-critical use-cases
3. Best-in-breed platforms beat packaged platforms
4. Data fan-out at scale over point-to-point connectivity
5. Domain-based architecture is the only way to break the data monolith
6. A product approach to data is not only useful but also necessary
McKinsey: How to build a data architecture to drive innovation
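Shift 4 (fan-out over point-to-point) can be illustrated with a toy in-memory log. This is a sketch only: the `Topic` class, the group names and the `link_count` helper are hypothetical and stand in for what a real broker such as Kafka does with topics and consumer groups.

```python
from collections import defaultdict

class Topic:
    """Toy append-only log; each consumer group tracks its own read offset."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)  # group name -> next offset to read

    def publish(self, event):
        self.log.append(event)

    def poll(self, group):
        """Return every event this group has not seen yet, then advance its offset."""
        start = self.offsets[group]
        batch = self.log[start:]
        self.offsets[group] = len(self.log)
        return batch

def link_count(producers, consumers):
    """Integrations needed when wiring systems directly vs. through a shared log."""
    return {"point_to_point": producers * consumers,
            "fan_out": producers + consumers}

orders = Topic()
orders.publish({"order_id": 1, "amount": 250})
orders.publish({"order_id": 2, "amount": 90})

# The producer writes once; every group independently receives the full stream.
billing_batch = orders.poll("billing")
analytics_batch = orders.poll("analytics")

print(len(billing_batch), len(analytics_batch))  # prints "2 2"
print(link_count(5, 8))  # prints "{'point_to_point': 40, 'fan_out': 13}"
```

Adding a new consumer to the log costs one integration and no change to the producers, which is the practical argument behind the shift.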
Streaming is hard, but it is worth it
1. Stream as a core primitive across operational and analytical planes
2. Data Sourcing & Movement
3. Storage
4. Processing
5. Querying
6. Cross-cutting concerns (Security, Observability, Governance, etc.)
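To give a feel for the processing and querying layers, here is a minimal sketch of a tumbling-window count, one of the most common stream-processing operations. The function name and the sample click events are hypothetical. The sketch uses the timestamp carried by each event and ignores the hard parts that real engines handle: out-of-order events, watermarks and state recovery.

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_in_seconds, key) pairs.
    """
    counts = Counter()
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        counts[(window_start, key)] += 1
    return counts

# Hypothetical click stream: (seconds since start, page)
clicks = [(0, "home"), (12, "cart"), (59, "home"),
          (60, "home"), (75, "cart"), (119, "cart")]

view = tumbling_window_counts(clicks, window_seconds=60)

# The result behaves like a queryable table keyed by window and page.
print(view[(0, "home")], view[(60, "cart")])  # prints "2 2"
```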
Some unified data infrastructure archetypes emerge: Courtesy A16z
Modern BI
Enterprise Multi-Modal processing
AI/ML
The Stories
20 trillion events/day
400 billion events/day
1 trillion+ events/day
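Daily totals of this size are easier to reason about as per-second rates. The conversion below is a sketch that assumes traffic is spread evenly over the day, so these are averages; real peaks will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(events_per_day):
    """Average events per second, assuming a uniform rate across the day."""
    return events_per_day / SECONDS_PER_DAY

for label, daily in [("20 trillion", 20e12),
                     ("1 trillion", 1e12),
                     ("400 billion", 400e9)]:
    print(f"{label} events/day ~ {per_second(daily) / 1e6:.1f} million events/s")
# prints:
# 20 trillion events/day ~ 231.5 million events/s
# 1 trillion events/day ~ 11.6 million events/s
# 400 billion events/day ~ 4.6 million events/s
```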
- Streaming is hotter than ever
- Apache Kafka: The de facto protocol for eventing
- Stream Processing Engines have finally come of age: Apache Flink, Spark Streaming, KSQL, Materialize, RisingWave and a whole host of streaming SQL engines
- Lake-house architectures are open: Apache Hudi, Iceberg
- Real Time Analytics now comes with a modern flavour: Apache Pinot, Druid, Clickhouse…
- AI/ML centric ops will increasingly converge into streaming
A practitioner’s view and closing notes
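The idea that streaming SQL engines and streaming databases share is incremental view maintenance: the result of a query is updated per event instead of being recomputed from scratch. Below is a minimal sketch of that idea for an `AVG(amount) GROUP BY key` view. The `RunningAverage` class and the sample keys are hypothetical and do not reflect the API of any engine named above.

```python
class RunningAverage:
    """Incrementally maintained AVG(amount) GROUP BY key."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def apply(self, key, amount, retract=False):
        """Fold one change into the view; a retraction undoes an earlier insert."""
        sign = -1 if retract else 1
        self.sums[key] = self.sums.get(key, 0) + sign * amount
        self.counts[key] = self.counts.get(key, 0) + sign

    def query(self, key):
        """Read the current average in O(1), without rescanning the input."""
        n = self.counts.get(key, 0)
        return self.sums[key] / n if n else None

view = RunningAverage()
view.apply("blr", 100)
view.apply("blr", 300)
before = view.query("blr")             # 200.0

view.apply("blr", 100, retract=True)   # the first row is corrected away
after = view.query("blr")              # 300.0

print(before, after)                   # prints "200.0 300.0"
```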
