Evolution of BigData Messaging:
A Look Back and Path Forward
Kartik Paramasivam,
Director Eng. @LinkedIn
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
Things Started Simple
Client
Database
Then came (web)Services
Diagram: Client, (Web) Service, Database
Request-Response Communication
Client waits for the request to get processed
Mid-Tier and Nearline Event Processing
Diagram: Client, (Web) Service, Database, Queue, Mid-Tier Service
Client doesn’t wait for all the processing to be done.
Competing Consumer Model for scaling Mid-Tier Service
Queues supported significantly higher insert rates than Databases, e.g. 100K writes/sec
Enterprise Messaging - Early Winners
● IBM MQ Series (later called MQ)
● Awesome Integration capabilities
● Client in every language/platform
● Microsoft Message Queue (MSMQ)
After Queues came Pub-Sub ...
Diagram: Client, (Web) Service, Topic, Book Flight Service, Book Hotel Service, Logging Service
● Decouple producing apps from consuming apps.
● 1->* messaging
● Agreement is only on the schema of the message.
Pub-Sub - Early Winners
● IBM MQ
● TIBCO
● WebLogic
● Microsoft BizTalk Server
Pub-Sub - with broker side filters
● Typically based on Message Headers
● SQL like expressiveness
● Hierarchical Filters (PO.*.US vs PO.Books.*)
Diagram filters on the subscriptions: Msg.type = Flight, Msg.type = *, Msg.type = Hotel
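The hierarchical filters above (PO.*.US vs PO.Books.*) can be sketched as a small matcher. This is only an illustration: it assumes dot-separated subjects in which `*` matches exactly one segment, which is one common convention but not the only one, and the names `subject_matches` and `route` are hypothetical rather than any broker's API.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a filter pattern.

    Assumption: '*' matches exactly one segment; real brokers vary on this.
    """
    pattern_parts = pattern.split(".")
    subject_parts = subject.split(".")
    if len(pattern_parts) != len(subject_parts):
        return False
    return all(p == "*" or p == s for p, s in zip(pattern_parts, subject_parts))


def route(subject: str, subscriptions: dict) -> list:
    """Return the names of subscriptions whose filter matches the subject."""
    return [name for name, pattern in subscriptions.items()
            if subject_matches(pattern, subject)]


subscriptions = {"us-orders": "PO.*.US", "book-orders": "PO.Books.*"}
print(route("PO.Books.US", subscriptions))   # matches both filters
print(route("PO.Toys.US", subscriptions))    # matches only PO.*.US
```

Evaluating the filter in the broker means a subscriber never receives messages it would discard anyway.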
Enterprise Messaging - Features Galore !
● Competing Consumers (Lock, Process, Ack each message)
● Poison Message Handling
○ Automatic retries
○ Automatic deadlettering of messages
● Message Headers, TTL, Priorities
● Support for Message Grouping (Sessions) by Key
● Support for Transacted Session State
● Browse messages (e.g. supporting a UI.. )
● Request Response Pattern (ReceiveByID, ReceiveByCorrelationId)
● SubQueues - Deferred Processing of Messages
● Security (ACLs, Encryption of data)
● Availability (Active-Passive clustering)
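The lock, process, ack cycle with automatic retries and dead-lettering can be sketched with an in-memory queue. This is a toy model under stated assumptions: `InMemoryQueue` and `max_deliveries` are hypothetical names, popping a message stands in for taking a lock on it, and a single loop stands in for several competing consumers.

```python
from collections import deque


class InMemoryQueue:
    """Toy queue with delivery counting and a dead-letter list (illustrative only)."""

    def __init__(self, max_deliveries: int = 3):
        self.max_deliveries = max_deliveries
        self.pending = deque()     # (message, delivery_count) pairs
        self.dead_letters = []     # messages that exhausted their retries

    def send(self, message):
        self.pending.append((message, 0))

    def receive_and_process(self, handler) -> bool:
        """Deliver one message; ack on success, retry or dead-letter on failure."""
        if not self.pending:
            return False
        message, count = self.pending.popleft()   # stands in for locking the message
        count += 1
        try:
            handler(message)       # success: the message is acked by not requeueing it
        except Exception:
            if count >= self.max_deliveries:
                self.dead_letters.append(message)       # poison message
            else:
                self.pending.append((message, count))   # automatic retry
        return True


processed = []


def handler(message):
    if message == "poison":
        raise ValueError("cannot process")
    processed.append(message)


queue = InMemoryQueue(max_deliveries=3)
for m in ["a", "poison", "b"]:
    queue.send(m)
while queue.receive_and_process(handler):
    pass
print(processed, queue.dead_letters)
```

The delivery count travels with the message, so a message that always fails is set aside after a bounded number of attempts instead of blocking the queue.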
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
Rise of Big-Data Messaging : Partitioning !!
Diagram: Producer sends with PartitionKey to a Kafka Topic (Partition1 to Partition5 on Broker1 and Broker2), consumed by Processors 1, 2, 3
● Client Side Offsets
● Time based Message Retention
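Sending with a PartitionKey and keeping offsets on the client side can be sketched as follows. This is a minimal model, not Kafka's client API: `PartitionedTopic` and `Consumer` are hypothetical classes, and `zlib.crc32` is an assumed stand-in for whatever hash a real client uses.

```python
import zlib


class PartitionedTopic:
    """Toy topic: each partition is an append-only list (a log)."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, partition_key: str, value) -> int:
        """Append to the partition chosen by hashing the key; return its index."""
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index

    def read(self, partition: int, offset: int, max_items: int = 10):
        """Return messages from an offset; the topic keeps no per-consumer state."""
        return self.partitions[partition][offset:offset + max_items]


class Consumer:
    """Consumer that owns its own offset for one partition."""

    def __init__(self, topic: PartitionedTopic, partition: int):
        self.topic = topic
        self.partition = partition
        self.offset = 0

    def poll(self):
        batch = self.topic.read(self.partition, self.offset)
        self.offset += len(batch)
        return batch


topic = PartitionedTopic(num_partitions=5)
p = topic.send("device-42", "t=20")
assert topic.send("device-42", "t=21") == p   # same key, same partition

consumer = Consumer(topic, p)
print(consumer.poll())   # both messages, in send order
consumer.offset = 0      # replay is just resetting the client-side offset
print(consumer.poll())
```

Because the broker does not track which messages each consumer has acknowledged, reading is a cheap sequential scan and replay needs no broker-side support.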
Messaging Products ..
Category | Enterprise Messaging | Big Data Messaging
Open Source | RabbitMQ, ActiveMQ ... | Apache Kafka, Pulsar ...
Azure | ServiceBus Queues/Topics | EventHub
AWS | SQS, SNS | Kinesis
Google | Cloud Pub-Sub | ~
These cloud native messaging systems are also ‘Big-Data’ systems per se
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
Ordered Processing - Big Data Messaging Style
Diagram: Kafka Topic (Partition1 to Partition5 on Broker1 and Broker2) read by a Partitioned Processor with instances 1, 2, 3
● Every Partition is a log of events
● Unique Processor per Partition
Ordered Processing - Enterprise Messaging Style
Diagram: Azure ServiceBus Queue holding Session1, Session2, Session3; Application Instances Processor1, Processor2, Processor3
● Processors compete on Sessions (message groups)
● Ordered processing within a session
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
Stateful Processing - with Big Data Messaging
Diagram: Kafka Topic (Partition1 to Partition5 on Broker1 and Broker2), Partitioned Processing by an Apache Samza App (with local state) running Tasks 1, 2, 3
● Efficient Stateful Processing (e.g. alert on avg(device-temp) > 50C)
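The avg(device-temp) > 50C alert can be sketched as a task that keeps its state locally. This is an illustration only, not Samza's API: `TemperatureTask` is a hypothetical class, and it assumes all events for a device arrive at the same task because the device id is used as the partition key.

```python
from collections import defaultdict


class TemperatureTask:
    """One task per partition; keeps per-device running state in local memory."""

    def __init__(self, threshold_c: float = 50.0):
        self.threshold_c = threshold_c
        self.state = defaultdict(lambda: [0.0, 0])   # device -> [sum, count]

    def process(self, device_id: str, temp_c: float):
        """Update local state; return an alert string if the average exceeds the threshold."""
        entry = self.state[device_id]
        entry[0] += temp_c
        entry[1] += 1
        average = entry[0] / entry[1]
        if average > self.threshold_c:
            return f"ALERT {device_id}: avg {average:.1f}C"
        return None


task = TemperatureTask()
events = [("d1", 40.0), ("d1", 70.0), ("d2", 30.0)]
alerts = [a for a in (task.process(d, t) for d, t in events) if a]
print(alerts)
```

No remote lookup is needed per event, which is what makes partitioned stateful processing efficient. The sketch averages over all readings seen so far; a production job would more likely use a time window.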
Stateful Processing - with Any Message Broker
Diagram: Topic (Google Pub-Sub/Azure ServiceBus/Kafka/Kinesis), “Reader Stage” 1, 2, 3, Reshuffle, Actors/Tasks
● Actor Model (Orleans/Akka)
● Stream Processing Frameworks
● ..
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
c. Delayed Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
Delayed Processing - Enterprise Messaging Style
Diagram: Producer, Enterprise Message Broker (e.g. Azure ServiceBus Queues) with Main Q and Deferred Q, Consumer calling fetch()
Defer messages that can’t be processed
Delayed Processing - Big-Data Messaging Style
Diagram: Producer, Kafka, Consumer, Database (local or remote) to hold un-processed requests
● Windowed Processing :
○ Messages are held till the window is complete
● Event Time based Processing and Late Arrivals :
○ Messages need to be held for a much longer period of time even after window is ‘complete’.
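Holding messages until a window is complete, and a little longer for late arrivals, can be sketched with a tumbling event-time window. The assumptions are loud ones: `TumblingWindowCounter` is a hypothetical class, the watermark is simply the highest event time seen so far, and events that arrive after the allowed lateness are dropped. Real frameworks offer richer choices on all three points.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per event-time window; emits a window once it can no longer change."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.held = defaultdict(int)   # window start -> count (the held state)
        self.watermark = 0             # highest event time seen so far

    def _is_final(self, start: int) -> bool:
        return start + self.window_size + self.allowed_lateness <= self.watermark

    def add(self, event_time: int):
        """Add one event; return the (window_start, count) pairs that became final."""
        start = event_time - event_time % self.window_size
        if self._is_final(start):
            return []                  # too late: its window was already emitted
        self.held[start] += 1
        self.watermark = max(self.watermark, event_time)
        closed = sorted(s for s in self.held if self._is_final(s))
        return [(s, self.held.pop(s)) for s in closed]


counter = TumblingWindowCounter(window_size=10, allowed_lateness=5)
for t in [1, 4, 12, 8, 16, 3]:
    print(t, counter.add(t))
```

The event at time 8 arrives after time 12 but is still counted in window 0, because that window is held until the watermark reaches 15. The event at time 3 arrives after that point and is dropped.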
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
c. Delayed Processing
d. Exactly Once Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
The Exactly Once Processing Problem
Diagram: Client, (Web) Service, Database, Queue, Mid-Tier Service
Two Phase Commit Needed ?
● Enterprise Messaging Brokers and Databases started supporting Distributed Transactions in the 90s
● Slower and Harder to Maintain
Standard Solution : De-duplication or Idempotent Processing
Diagram: Client, (Web) Service, Database, Queue, Mid-Tier Service with Main Table and History Table
Use Local Database Transactions
● Works with Big Data Messaging !!
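The Main Table plus History Table approach can be sketched with SQLite from the Python standard library. The table names `main_table` and `history_table`, the `process_once` helper, and the account example are all hypothetical. The point is only that the message id and the business update are written in one local transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main_table (account TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE history_table (message_id TEXT PRIMARY KEY)")
conn.execute("INSERT INTO main_table VALUES ('alice', 0)")
conn.commit()


def process_once(conn, message_id: str, account: str, amount: int) -> bool:
    """Apply a message at most once; return False if it was a duplicate."""
    try:
        with conn:  # one local transaction: commits on success, rolls back on error
            # The primary key on history_table rejects an already-seen message id.
            conn.execute("INSERT INTO history_table VALUES (?)", (message_id,))
            conn.execute(
                "UPDATE main_table SET balance = balance + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(process_once(conn, "msg-1", "alice", 10))
print(process_once(conn, "msg-1", "alice", 10))   # redelivery of the same message
print(conn.execute("SELECT balance FROM main_table").fetchone()[0])
```

The broker may redeliver as often as it likes: a duplicate fails the history insert and the balance update is rolled back with it, so no two phase commit between queue and database is needed.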
Processing Pipelines and Exactly Once Processing
Diagram: Client, (Web) Service, Mid-Tier Service, Mid-Tier Service
Two Phase Commit ??
Local Transactions - Between Two Queues
Diagram labels: producer, Broker1, Q1, Q1.TRANSFER, Consumer Stage-I, Broker2, Q2, Consumer Stage-II
Acknowledge, State, Send message
A local transaction covers these operations (Azure ServiceBus)
Exactly once transfer
Did we forget dupes in the First Stage ?
Diagram: Client, (Web) Service, Mid-Tier Service, Mid-Tier Service, History Table
ID based de-duplication in the Broker (e.g. Azure ServiceBus)
Did we forget dupes in the First Stage ?
Diagram: Client, (Web) Service, Mid-Tier Service, Mid-Tier Service, History Table
ID based de-duplication in the Application Tier with Local Or Remote Database
Works with Big Data Messaging !!
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
c. Delayed Processing
d. Exactly Once Processing
e. Protocols and APIs
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
Evolution of Protocols - Towards Standardization..
● Started with Proprietary protocols (MQSeries, MSMQ)
● AMQP (Advanced Message Queuing Protocol)
○ AMQP 1.0 is an OASIS Standard
○ Supported by ActiveMQ, Qpid, Azure ServiceBus/EventHub, RabbitMQ etc.
● MQTT (started by IBM)
○ Optimized for and Popular in the IoT space (Azure/AWS IoT offerings)
● WebSockets
○ No need to punch a hole in firewall.
○ Needs AMQP/MQTT or other messaging protocol on top
● Proprietary Protocols still rule (e.g. Kafka)
Evolution of APIs - Standardization ??
● Started with proprietary API (MQSeries, MSMQ etc.)
● JMS (Java Message Service) APIs were the first real standard
○ Widely Supported
● Proprietary APIs (e.g. Kafka) continue to thrive along with the product
Stream Processing Frameworks..
Diagram: Kafka, Kinesis, EventHub, DynamoDB Streams; Framework (e.g. Apache Samza, Storm, Beam); Application Logic
● APIs for the Broker don’t really matter
● Application Logic coded to the Stream Processing Framework
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. The Cloud Factor
4. Fusion of Databases and Messaging
5. Path Ahead
Challenges of Cloud - Extreme Multi-tenancy
Routing Tier
Broker Tier
Applications
Quotas/DOS prevention
Per application caches/IO limits etc.
Cloud Native vs Open Source
Cloud Native Messaging | Open Source Messaging in a Cloud Environment
e.g. AWS Kinesis, SQS, SNS; Azure EventHub, ServiceBus; Google Cloud Pub-Sub | e.g. Kafka/RabbitMQ/ActiveMQ on Azure/AWS
Lower $/iops with support for extreme multi-tenancy | More expensive (you will end up provisioning for peaks)
Lower TCO (total cost of ownership) | Higher TCO
Maybe Cloud Portable (via standardized protocols/apis) | Cloud Portable
Ode to Performance : Speed always Wins !
● Optimizations
○ Batching, Pipelining, Compression, Prefetching ..
○ Less bookkeeping in Broker is good !
○ Fire and Forget (Best effort)
● Performance<->Durability TradeOff
○ In memory replication with Lazy Flush to disk
■ Secret Weapon used by Kafka
○ Most other brokers typically flush to disk before Ack
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. The Cloud Factor
4. Fusion of Databases and Messaging
5. Path Ahead
Database as an Event Source
Diagram (Espresso @ LinkedIn): Client, Database Front-End, Database, DB Change Events, Kafka Topic, Application 1, Application 2
● Espresso@LinkedIn exposed as a Kafka event stream
● AWS DynamoDB Streams
● Azure CosmosDB Changes
● Oracle GoldenGate with BigData connectors
● ...
The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. The Cloud Factor
4. Fusion of Databases and Messaging
5. Path Ahead
‘Featurification’ of Big-Data Messaging: Kafka
Recent Past
● Coordination without ZK
● Message Headers
● Log Compaction
● Distributed Transactions
● On Demand Message Purge
Future
● Where there are headers there will be filtering :)
● Better support for Global Topics - write once - read everywhere.
Big Data Messaging is a Storage Problem
● Reliability Issues :
○ Hot Partitions
● Operating At Scale
○ Easy expansion of clusters
○ Automated Handling of Hardware Failures
○ Easy mechanism to keep machine usage balanced in a cluster
● Cost
○ Efficient Storage - Erasure Coding ?
○ Efficient Storage - Background Compression ?
Need for Cloud Native Open Source Messaging (e.g. Kafka on Azure)
● Kafka brokers running on Azure compute nodes, each attached to a managed Azure disk
● 9X disk space : 3 way replication done by each Azure disk on top of Kafka replication (3 x 3 = 9 with three Kafka replicas) $$
Diagram: Leader writes to Azure disk; Replica followers fetch and write to Azure Disk; each Azure Disk is 3-way replicated
Ultra Low Latency Messaging
Diagram: Broker-less P2P Messaging between a Producer and two Consumers
● Popular in the Financial Sector
● Microsecond latencies
● Typically Best effort messaging (e.g. ZeroMQ)
● Guaranteed messaging flavors exist (e.g. 29West)
Industry Trend : Non Volatile Memory will change the game
IOT Friendly Messaging
Billions of Devices
● Based on IOT friendly protocols (e.g. MQTT)
● Billions of Small Queues
Thank You !

Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1DianaGray10
 

Recently uploaded (20)

Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 

Evolution of Big Data Messaging

  • 1. Evolution of BigData Messaging: A Look Back and Path Forward Kartik Paramasivam, Director Eng. @LinkedIn
  • 2. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 5. Then came (web)Services Client Database (Web) Service Request-Response Communication Client waits for the request to get processed
  • 6. Mid-Tier and Nearline Event Processing Database (Web)-Service Queue Mid-Tier Service Client Client doesn’t wait for all the processing to be done. Competing Consumer Model for scaling Mid-Tier Service
  • 7. Mid-Tier and Nearline Event Processing Database (Web)-Service Queue Mid-Tier Service Client Queues supported significantly higher insert rates than Databases, e.g. 100K writes/sec
  • 8. Enterprise Messaging - Early Winners Database (Web)-Service Queue Mid-Tier Service Client ● IBM MQ Series (later called MQ) ● Awesome Integration capabilities ● Client in every language/platform ● Microsoft Message Queue (MSMQ)
  • 9. After Queues came Pub-Sub ... (Web)- Service Topic Book Flight Service Client Book Hotel Service Logging Service ● Decouple producing apps from consuming apps. ● 1->* messaging ● Agreement is only on the schema of the message.
  • 10. Pub-Sub - Early Winners (Web)- Service Topic Book Flight Service Client Book Hotel Service Logging Service ● IBM MQ ● TIBCO ● WebLogic ● Microsoft BizTalk Server
  • 11. Pub-Sub - with broker side filters (Web)-Service Topic Book Flight Service Client Book Hotel Service Logging Service ● Typically based on Message Headers ● SQL-like expressiveness ● Hierarchical Filters (PO.*.US vs PO.Books.*) Msg.type = Flight Msg.type = * Msg.type = Hotel
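
The hierarchical filters on slide 11 (PO.*.US vs PO.Books.*) can be illustrated with a minimal sketch. The wildcard semantics here are an assumption: `*` is taken to match exactly one dot-separated segment, which is not necessarily how any particular broker defines it. The names `matches` and `route` are hypothetical.

```python
from typing import Dict, List


def matches(pattern: str, subject: str) -> bool:
    """Return True if a dot-separated subject matches a dot-separated pattern.

    Assumption: '*' matches exactly one segment; every other segment
    must be equal to the corresponding subject segment.
    """
    p_parts = pattern.split(".")
    s_parts = subject.split(".")
    if len(p_parts) != len(s_parts):
        return False
    return all(p == "*" or p == s for p, s in zip(p_parts, s_parts))


def route(subject: str, subscriptions: Dict[str, str]) -> List[str]:
    """Return the names of the subscriptions whose filter matches the subject."""
    return [name for name, pattern in subscriptions.items() if matches(pattern, subject)]
```

With these assumed semantics, a message published as `PO.Books.US` reaches both a `PO.*.US` subscriber and a `PO.Books.*` subscriber, while `PO.Books.UK` reaches only the latter.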
  • 12. Enterprise Messaging - Features Galore ! ● Competing Consumers (Lock, Process, Ack each message) ● Poison Message Handling ○ Automatic retries ○ Automatic deadlettering of messages ● Message Headers, TTL, Priorities ● Support for Message Grouping (Sessions) by Key ● Support for Transacted Session State ● Browse messages (e.g. supporting a UI) ● Request Response Pattern (ReceiveByID, ReceiveByCorrelationId) ● SubQueues - Deferred Processing of Messages ● Security (ACLs, Encryption of data) ● Availability (Active-Passive clustering)
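
The poison message handling listed on slide 12 (automatic retries, then automatic deadlettering) can be sketched roughly as below. This is an in-memory illustration only, not any broker's API: `drain`, `max_attempts` and the attempt counter are hypothetical, and message bodies are assumed to be unique so they can serve as keys.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def drain(queue: Deque[str], handler: Callable[[str], None], max_attempts: int = 3) -> List[str]:
    """Process every message in the queue; return the dead-lettered ones.

    Assumption: message bodies are unique, so the body doubles as the key
    for counting delivery attempts.
    """
    dead_letters: List[str] = []
    attempts: Dict[str, int] = {}
    while queue:
        msg = queue.popleft()      # take ("lock") the next message
        try:
            handler(msg)           # returning normally stands in for the ack
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] >= max_attempts:
                dead_letters.append(msg)   # poison message: stop retrying
            else:
                queue.append(msg)          # put it back for a later retry
    return dead_letters
```

A handler that always fails on one message sees that message `max_attempts` times before it lands in the dead-letter list, while healthy messages are processed once.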
  • 13. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 16. Rise of Big-Data Messaging : Partitioning !! Broker1 Broker2 Partition1 Partition2 Partition3 Partition4 Partition5 Kafka Topic Processors 1 2 3 Producer Send with PartitionKey ● Client Side Offsets ● Time based Message Retention
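
A minimal sketch of the two ideas on slide 16, sending with a partition key and keeping offsets on the client side. `PartitionedLog` is a hypothetical in-memory stand-in, not the Kafka client API, and CRC32 is used here only as a convenient stable hash; it is not a claim about the hash any real producer uses. Time based retention is not modeled.

```python
import zlib
from typing import List


class PartitionedLog:
    """Toy topic: a fixed set of append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(num_partitions)]

    def send(self, partition_key: str, value: str) -> int:
        # A stable hash of the key picks the partition, so all messages
        # with the same key land in the same partition, in send order.
        index = zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index

    def read(self, partition: int, offset: int) -> List[str]:
        # The log keeps no per-consumer cursor; the caller supplies the offset.
        return self.partitions[partition][offset:]
```

Because the reader owns the offset, re-reading from an earlier position is just a matter of passing a smaller number, which is one reason the broker needs so little per-message bookkeeping.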
  • 18. Messaging Products .. Enterprise Messaging vs. Big Data Messaging ● Open Source: RabbitMQ, ActiveMQ ... vs. Apache Kafka, Pulsar ... ● Azure: ServiceBus Queues/Topics vs. EventHub ● AWS: SQS, SNS vs. Kinesis ● Google: Cloud Pub-Sub vs. ~ ● These cloud native messaging systems are also ‘Big-Data’ systems per se
  • 20. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging a. Ordered Processing 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 21. Ordered Processing - Big Data Messaging Style Broker1 Broker2 Partition1 Partition2 Partition3 Partition4 Partition5 Kafka Topic Partitioned Processor 1 2 3 ● Every Partition is a log of events ● Unique Processor per Partition
  • 22. Ordered Processing - Enterprise Messaging Style Azure ServiceBus Queue Processor1 Processor2 Processor3 ● Processors compete on Sessions (message groups) ● Ordered processing within a session Session1 Session2 Session3 Application Instances
  • 23. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging a. Ordered Processing b. Stateful Processing 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 24. Stateful Processing - with Big Data Messaging Broker1 Broker2 Partition1 Partition2 Partition3 Partition4 Partition5 Kafka Topic Partitioned Processing Tasks 1 2 3 Apache Samza App (with local state) ● Efficient Stateful Processing (e.g. alert on avg(device-temp) > 50C)
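
The slide's example, alerting when avg(device-temp) exceeds 50C, can be sketched as a task that keeps its state locally. This is not the Samza API; `TemperatureAlertTask` and its method names are hypothetical. The sketch assumes the topic is partitioned by device id, so every reading for a given device reaches the same task and no remote lookup is needed.

```python
from typing import Dict, List, Optional


class TemperatureAlertTask:
    """One task per partition, holding a running average per device."""

    def __init__(self, threshold_c: float = 50.0) -> None:
        self.threshold_c = threshold_c
        # Local state: device id -> [running sum, reading count]
        self.state: Dict[str, List[float]] = {}

    def process(self, device_id: str, temp_c: float) -> Optional[str]:
        entry = self.state.setdefault(device_id, [0.0, 0])
        entry[0] += temp_c
        entry[1] += 1
        average = entry[0] / entry[1]
        if average > self.threshold_c:
            return f"ALERT {device_id}: avg {average:.1f}C"
        return None
```

The efficiency argument is that each update touches only in-process memory; a real deployment would also need the state to be durable across restarts, which this sketch leaves out.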
  • 25. Stateful Processing - with Any Message Broker Google Pub-Sub/ Azure ServiceBus/ Kafka/Kinesis Topic “Reader Stage” Reshuffle Actors/Tasks 1 2 3 ● Actor Model (Orleans/Akka) ● Stream Processing Frameworks ● ..
  • 26. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging a. Ordered Processing b. Stateful Processing c. Delayed Processing 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 27. Delayed Processing - Enterprise Messaging Style Producer Consumer Main Q Deferred Q Enterprise Message Broker (e.g. Azure ServiceBus Queues) fetch() Defer messages that can’t be processed
  • 28. Delayed Processing - Big-Data Messaging Style Database (local or remote) Hold un-processed requests Producer Kafka Consumer ● Windowed Processing : ○ Messages are held till the window is complete ● Event Time based Processing and Late Arrivals : ○ Messages need to be held for a much longer period of time even after window is ‘complete’.
  • 29. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging a. Ordered Processing b. Stateful Processing c. Delayed Processing d. Exactly Once Processing 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 30. The Exactly Once Processing Problem Database (Web)-Service Queue Mid-Tier Service Client Two Phase Commit Needed ?
  • 31. The Exactly Once Processing Problem Database (Web)-Service Queue Mid-Tier Service Client Two Phase Commit ● Enterprise Messaging Brokers and Databases started supporting Distributed Transactions in the 90s ● Slower and Harder to Maintain
  • 32. Standard Solution : De-duplication or Idempotent Processing Database (Web)-Service Queue Mid-Tier Service Client Main Table History Table Use Local Database Transactions ● Works with Big Data Messaging !!
  • 33. Processing Pipelines and Exactly Once Processing Client (Web) Service Mid-Tier Service Mid-Tier Service Two Phase Commit ??
  • 34. Local Transactions - Between Two Queues Broker1 producer Q1 Broker2 Q2 Consumer- Stage-I Consumer - Stage-II Q1.TRANSFER Acknowledge, State Send message Exactly once transfer A local transaction covers these operations (Azure ServiceBus)
  • 35. Did we forget dupes in the First Stage ? Client (Web) Service Mid-Tier Service Mid-Tier Service History Table ID based de-duplication in the Broker (e.g. Azure ServiceBus)
  • 36. Did we forget dupes in the First Stage ? Client (Web) Service Mid-Tier Service Mid-Tier Service History Table ID based de-duplication in the Application Tier with Local Or Remote Database Works with Big Data Messaging !!
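
The ID based de-duplication with a Main Table and a History Table can be sketched with a single local database transaction. The example below uses SQLite from the Python standard library purely for illustration; the table layout, the `apply_once` helper and the account/amount payload are assumptions, not part of the slides.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a main table and a history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main (account TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE history (message_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def apply_once(conn: sqlite3.Connection, message_id: str, account: str, amount: int) -> bool:
    """Apply a message's effect unless its ID was already recorded.

    Both writes happen in one local transaction: either the history row
    and the main-table change are committed together, or neither is.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # The primary key on message_id rejects a redelivered message.
            conn.execute("INSERT INTO history (message_id) VALUES (?)", (message_id,))
            cur = conn.execute(
                "UPDATE main SET balance = balance + ? WHERE account = ?",
                (amount, account),
            )
            if cur.rowcount == 0:
                conn.execute(
                    "INSERT INTO main (account, balance) VALUES (?, ?)",
                    (account, amount),
                )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: nothing was changed
```

Because the duplicate check and the state change share one transaction, at-least-once delivery from the broker still yields an exactly-once effect in the database, with no two phase commit between broker and database.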
  • 37. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. Explore Hard Problems in Messaging a. Ordered Processing b. Stateful Processing c. Delayed Processing d. Exactly Once Processing e. Protocols and APIs 4. The Cloud Factor 5. Fusion of Databases and Messaging 6. Path Ahead
  • 38. Evolution of Protocols - Towards Standardization.. ● Started with Proprietary protocols (MQSeries, MSMQ) ● AMQP (Advanced Message Queuing Protocol) ○ AMQP 1.0 is an OASIS Standard ○ Supported by ActiveMQ, Qpid, Azure ServiceBus/EventHub, RabbitMQ etc. ● MQTT (started by IBM) ○ Optimized for and Popular in the IoT space (Azure/AWS IoT offerings) ● WebSockets ○ No need to punch a hole in firewall. ○ Needs AMQP/MQTT or other messaging protocol on top ● Proprietary Protocols still rule (e.g. Kafka)
  • 39. Evolution of APIs - Standardization ?? ● Started with proprietary API (MQSeries, MSMQ etc.) ● JMS (Java Message Service) APIs were the first real standard ○ Widely Supported ● Proprietary APIs (e.g. Kafka) continue to thrive along with the product
  • 40. Stream Processing Frameworks.. Framework (e.g. Apache Samza, Storm, Beam) Application Logic Kafka Kinesis EventHub ● APIs for the Broker don’t really matter ● Application Logic coded to the Stream Processing Framework DynamoDB Streams
  • 41. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. The Cloud Factor 4. Fusion of Databases and Messaging 5. Path Ahead
  • 42. Challenges of Cloud - Extreme Multi-tenancy Routing Tier Broker Tier Applications
  • 43. Challenges of Cloud - Extreme Multi-tenancy Routing Tier Broker Tier Applications Quotas/DoS prevention
  • 44. Challenges of Cloud - Extreme Multi-tenancy Routing Tier Broker Tier Applications Per application caches/IO limits etc.
  • 45. Cloud Native vs Open Source ● Cloud Native Messaging (e.g. AWS Kinesis, SQS, SNS; Azure EventHub, ServiceBus; Google Cloud Pub-Sub) vs. Open Source Messaging in a Cloud Environment (e.g. Kafka/RabbitMQ/ActiveMQ on Azure/AWS) ● Cost: lower $/IOPS with support for extreme multi-tenancy vs. more expensive (you will end up provisioning for peaks) ● TCO (total cost of ownership): lower vs. higher ● Portability: maybe cloud portable (via standardized protocols/APIs) vs. cloud portable
  • 46. Ode to Performance : Speed always Wins ! ● Optimizations ○ Batching, Pipelining, Compression, Prefetching .. ○ Less bookkeeping in Broker is good ! ○ Fire and Forget (Best effort) ● Performance<->Durability TradeOff ○ In memory replication with Lazy Flush to disk ■ Secret Weapon used by Kafka ○ Most other brokers typically flush to disk before Ack
  • 47. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. The Cloud Factor 4. Fusion of Databases and Messaging 5. Path Ahead
  • 48. Espresso @ LinkedIn Database as an Event Source Client Database Front-End Application 1 Application 2 Database Front-End DB Change Events Kafka Topic
  • 49. Database as an Event Source ● Espresso@LinkedIn exposed as a Kafka event stream ● AWS DynamoDB Streams ● Azure CosmosDB Changes ● Oracle GoldenGate with BigData connectors ● ...
  • 50. The Script 1. Evolution of Messaging (and Application Architecture) 2. Rise of Big Data Messaging 3. The Cloud Factor 4. Fusion of Databases and Messaging 5. Path Ahead
  • 51. ‘Featurification’ of Big-Data Messaging: Kafka Recent Past ● Coordination without ZK ● Message Headers ● Log Compaction ● Distributed Transactions ● On Demand Message Purge Future ● Where there are headers there will be filtering :) ● Better support for Global Topics - write once - read everywhere.
  • 52. Big Data Messaging is a Storage Problem ● Reliability Issues : ○ Hot Partitions ● Operating At Scale ○ Easy expansion of clusters ○ Automated handling of Hardware Failures ○ Easy mechanism to keep machine usage balanced in a cluster ● Cost ○ Efficient Storage - Erasure Coding ? ○ Efficient Storage - Background Compression ?
  • 53. Need for Cloud Native Open Source Messaging (e.g. Kafka on Azure) ● Kafka brokers running on Azure compute nodes, each attached to a managed Azure disk ● 9X disk space ($$): 3-way replication done by each Azure disk on top of Kafka replication ● Leader writes to Azure disk (3-way replicated) ● Replica followers fetch and write to Azure disk
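
The 9X figure on slide 53 is the product of two replication layers. The sketch below assumes a Kafka replication factor of 3, which the slide implies but does not state, on top of the 3-way replication the slide attributes to each Azure disk. The function names are hypothetical.

```python
def physical_copies(kafka_replication_factor: int, disk_replication_factor: int) -> int:
    """Number of physical copies stored for each logical message."""
    # Each Kafka replica writes to its own disk, and each disk
    # independently replicates what it receives.
    return kafka_replication_factor * disk_replication_factor


def physical_bytes(logical_bytes: int, kafka_replication_factor: int, disk_replication_factor: int) -> int:
    """Raw storage consumed for a given amount of logical data."""
    return logical_bytes * physical_copies(kafka_replication_factor, disk_replication_factor)
```

Under these assumptions, 1 TB of topic data occupies 9 TB of raw storage, which is the cost the slide flags.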
  • 55. Ultra Low Latency Messaging Producer Consumer Consumer Broker-less P2P Messaging ● Popular in the Financial Sector ● Microsecond latencies ● Typically Best effort messaging (e.g. ZeroMQ) ● Guaranteed messaging flavors exist (e.g. 29West) Industry Trend : Non Volatile Memory will change the game
  • 56. IoT Friendly Messaging Billions of Devices ● Based on IoT friendly protocols (e.g. MQTT) ● Billions of Small Queues