The story of how the industry evolved from enterprise messaging to big-data messaging, and the path ahead.
Along the way we discuss some of the hard problems in messaging (exactly-once processing, transactions, ordering, partitioning, etc.).
Evolution of Big Data Messaging
1. Evolution of Big Data Messaging:
A Look Back and Path Forward
Kartik Paramasivam,
Director Eng. @LinkedIn
2. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
5. Then came (web)Services
[Diagram: Client, (Web) Service, Database]
Request-Response Communication
Client waits for the request to get processed
6. Mid-Tier and Nearline Event Processing
[Diagram: Client, (Web) Service, Queue, Mid-Tier Service, Database]
Client doesn’t wait for all the processing to be done.
Competing Consumer Model for scaling Mid-Tier Service
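The competing consumer model can be sketched in a few lines. This is a minimal in-process illustration, assuming a thread-safe `queue.Queue` stands in for the broker queue and threads stand in for Mid-Tier Service instances; the function name `run_competing_consumers` is hypothetical.

```python
import queue
import threading


def run_competing_consumers(messages, num_workers=3):
    """Drain a shared queue with several workers; each message goes to one worker."""
    q = queue.Queue()
    for message in messages:
        q.put(message)

    processed = []            # (worker_id, message) pairs
    lock = threading.Lock()   # protects the shared result list

    def worker(worker_id):
        while True:
            try:
                message = q.get_nowait()   # the queue hands a message to one caller only
            except queue.Empty:
                return                     # nothing left to compete for
            with lock:
                processed.append((worker_id, message))
            q.task_done()                  # stands in for the per-message ack

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Adding workers raises throughput without any change on the producing side, which is the scaling property the slide refers to. Which worker receives which message is not deterministic.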
7. Mid-Tier and Nearline Event Processing
[Diagram: Client, (Web) Service, Queue, Mid-Tier Service, Database]
Queues supported significantly higher insert rates than databases, e.g. 100K writes/sec.
8. Enterprise Messaging - Early Winners
[Diagram: Client, (Web) Service, Queue, Mid-Tier Service, Database]
● IBM MQSeries (later called IBM MQ)
● Awesome Integration capabilities
● Client in every language/platform
● Microsoft Message Queue (MSMQ)
9. After Queues came Pub-Sub ...
[Diagram: Client, (Web) Service, Topic; subscribers: Book Flight Service, Book Hotel Service, Logging Service]
● Decouple producing apps from consuming apps.
● 1->* messaging
● Agreement is only on the schema of the message.
10. Pub-Sub - Early Winners
[Diagram: Client, (Web) Service, Topic; subscribers: Book Flight Service, Book Hotel Service, Logging Service]
● IBM MQ
● TIBCO
● WebLogic
● Microsoft BizTalk Server
11. Pub-Sub - with broker side filters
[Diagram: Client, (Web) Service, Topic; subscribers with filters: Book Flight Service (Msg.type = Flight), Logging Service (Msg.type = *), Book Hotel Service (Msg.type = Hotel)]
● Typically based on Message Headers
● SQL-like expressiveness
● Hierarchical Filters (PO.*.US vs PO.Books.*)
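Broker-side filtering can be illustrated with a small sketch. It assumes subscriptions are predicates over message headers and that `*` in a hierarchical filter matches exactly one dot-separated segment; `matches` and `Topic` are hypothetical names, not any broker's API.

```python
def matches(pattern, subject):
    """Hierarchical filter: '*' matches exactly one dot-separated segment."""
    pattern_parts = pattern.split(".")
    subject_parts = subject.split(".")
    if len(pattern_parts) != len(subject_parts):
        return False
    return all(p == "*" or p == s for p, s in zip(pattern_parts, subject_parts))


class Topic:
    """Toy topic that evaluates each subscription's filter on the broker side."""

    def __init__(self):
        self._subscriptions = []   # (predicate, inbox) pairs

    def subscribe(self, predicate):
        inbox = []
        self._subscriptions.append((predicate, inbox))
        return inbox

    def publish(self, headers, body):
        # Every subscriber whose filter accepts the headers gets its own copy.
        for predicate, inbox in self._subscriptions:
            if predicate(headers):
                inbox.append(body)
```

A subscriber such as the Book Flight Service would register `lambda h: h.get("type") == "Flight"`, while the Logging Service would register `lambda h: True` to receive everything.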
12. Enterprise Messaging - Features Galore !
● Competing Consumers (Lock, Process, Ack each message)
● Poison Message Handling
○ Automatic retries
○ Automatic deadlettering of messages
● Message Headers, TTL, Priorities
● Support for Message Grouping (Sessions) by Key
● Support for Transacted Session State
● Browse messages (e.g. supporting a UI.. )
● Request Response Pattern (ReceiveByID, ReceiveByCorrelationId)
● SubQueues - Deferred Processing of Messages
● Security (ACLs, Encryption of data)
● Availability (Active-Passive clustering)
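Poison message handling from the list above (automatic retries followed by dead-lettering) might look roughly like this. The sketch assumes a delivery count tracked per message and a fixed retry limit; `process_with_dead_lettering` and `max_deliveries` are hypothetical names.

```python
from collections import deque


def process_with_dead_lettering(messages, handler, max_deliveries=3):
    """Retry failing messages; move them to a dead-letter list after max_deliveries."""
    main_queue = deque((message, 0) for message in messages)
    completed = []
    dead_letter = []
    while main_queue:
        message, deliveries = main_queue.popleft()
        try:
            handler(message)
            completed.append(message)      # success: the message is acked and gone
        except Exception:
            deliveries += 1
            if deliveries >= max_deliveries:
                dead_letter.append(message)              # poison message: stop retrying
            else:
                main_queue.append((message, deliveries))  # automatic retry later
    return completed, dead_letter
```

The dead-letter list keeps a poison message from blocking the queue while still preserving it for inspection.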
13. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
16. Rise of Big-Data Messaging : Partitioning !!
[Diagram: Producer sends with PartitionKey to a Kafka Topic; Partitions 1-5 spread across Broker 1 and Broker 2; Processors 1, 2, 3]
● Client Side Offsets
● Time based Message Retention
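Sending with a partition key and reading with client-side offsets can be sketched as follows. This is an in-memory illustration under stated assumptions: CRC32 is chosen only because it is stable across runs and is not the hash any particular broker uses, and `partition_for` and `PartitionedTopic` are hypothetical names.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a partition key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedTopic:
    """Toy topic: each partition is an append-only list, indexed by offset."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        partition = partition_for(key, len(self.partitions))
        self.partitions[partition].append((key, value))
        return partition

    def read(self, partition, offset, max_messages=10):
        # The broker keeps no per-consumer cursor; the client supplies the offset.
        return self.partitions[partition][offset:offset + max_messages]
```

Because all messages with the same key land in the same partition, their relative order is preserved, and a consumer can rewind simply by reading from an earlier offset.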
17. Messaging Products ..
              Enterprise Messaging         Big Data Messaging
Open Source   RabbitMQ, ActiveMQ ...       Apache Kafka, Pulsar ...
Azure         ServiceBus Queues/Topics     EventHub
AWS           SQS, SNS                     Kinesis
Google        Cloud Pub-Sub                ~
18. Messaging Products ..
These cloud-native messaging systems are also ‘Big-Data’ systems per se.
20. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
21. Ordered Processing - Big Data Messaging Style
[Diagram: Kafka Topic with Partitions 1-5 on Broker 1 and Broker 2; Partitioned Processors 1, 2, 3]
● Every Partition is a log of events
● Unique Processor per Partition
23. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
24. Stateful Processing - with Big Data Messaging
[Diagram: Kafka Topic with Partitions 1-5 on Broker 1 and Broker 2; Partitioned Processing by Tasks 1, 2, 3 of an Apache Samza App (with local state)]
● Efficient Stateful Processing (e.g. alert on avg(device-temp) > 50C)
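The alerting example above can be sketched as a task that keeps its state locally. This is plain Python rather than the Apache Samza API; `TemperatureTask` is a hypothetical name, and the sketch assumes every reading for a given device is routed to the same task, which is what partitioned processing provides.

```python
from collections import defaultdict


class TemperatureTask:
    """Keeps a running average per device in local state and flags hot devices."""

    def __init__(self, threshold_c=50.0):
        self.threshold_c = threshold_c
        self.state = defaultdict(lambda: [0.0, 0])   # device_id -> [sum, count]

    def process(self, device_id, temp_c):
        """Update local state; return the average if it exceeds the threshold."""
        entry = self.state[device_id]
        entry[0] += temp_c
        entry[1] += 1
        average = entry[0] / entry[1]
        return average if average > self.threshold_c else None
```

No remote lookup is needed per message, because the task that owns a partition also owns the state for the keys in it.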
25. Stateful Processing - with Any Message Broker
[Diagram: Topic (Google Pub-Sub / Azure ServiceBus / Kafka / Kinesis), “Reader Stage”, Reshuffle, Actors/Tasks 1, 2, 3]
● Actor Model (Orleans/Akka)
● Stream Processing Frameworks
● ..
26. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
c. Delayed Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
27. Delayed Processing - Enterprise Messaging Style
[Diagram: Producer, Enterprise Message Broker (e.g. Azure ServiceBus Queues) with Main Q and Deferred Q, Consumer calling fetch()]
Defer messages that can’t be processed
28. Delayed Processing - Big-Data Messaging Style
[Diagram: Producer, Kafka, Consumer; Database (local or remote) to hold un-processed requests]
● Windowed Processing :
○ Messages are held till the window is complete
● Event Time based Processing and Late Arrivals :
○ Messages need to be held for a much longer period of time even after window is ‘complete’.
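Holding messages until a window completes, and a while longer for late arrivals, can be sketched as below. The sketch assumes integer event times, tumbling (fixed, non-overlapping) windows, and a caller-supplied watermark that signals how far event time has progressed; the watermark idea and the names `TumblingWindowBuffer` and `allowed_lateness` are assumptions for illustration, not something the slides specify.

```python
from collections import defaultdict


class TumblingWindowBuffer:
    """Holds events per window until the watermark passes window end + lateness."""

    def __init__(self, window_size, allowed_lateness=0):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.pending = defaultdict(list)   # window start -> held values
        self.watermark = 0

    def _deadline(self, window_start):
        return window_start + self.window_size + self.allowed_lateness

    def add(self, event_time, value):
        """Hold an event; return False if its window has already been emitted."""
        window_start = (event_time // self.window_size) * self.window_size
        if self._deadline(window_start) <= self.watermark:
            return False
        self.pending[window_start].append(value)
        return True

    def advance(self, watermark):
        """Move the watermark forward and emit every window that is now final."""
        self.watermark = max(self.watermark, watermark)
        ready = sorted(s for s in self.pending if self._deadline(s) <= self.watermark)
        return [(s, self.pending.pop(s)) for s in ready]
```

In a real deployment the `pending` map is the part that would live in the local or remote database, since it can grow large when lateness is generous.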
29. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
c. Delayed Processing
d. Exactly Once Processing
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
30. The Exactly Once Processing Problem
[Diagram: Client, (Web) Service, Queue, Mid-Tier Service, Database]
Two Phase Commit Needed ?
31. The Exactly Once Processing Problem
[Diagram: Client, (Web) Service, Queue, Mid-Tier Service, Database; Two Phase Commit]
● Enterprise Messaging Brokers and Databases started supporting Distributed Transactions in the 90s
● Slower and Harder to Maintain
32. Standard Solution : De-duplication or Idempotent Processing
[Diagram: Client, (Web) Service, Queue, Mid-Tier Service, Database with Main Table and History Table]
Use Local Database Transactions
● Works with Big Data Messaging !!
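The main-table and history-table approach can be sketched with a local SQLite transaction. This is a minimal illustration, assuming each message carries a unique ID and the business effect is a balance update; the table layout and the names `open_db` and `apply_once` are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a main table and a history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main (account TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE history (message_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def apply_once(conn, message_id, account, amount):
    """Apply a message's effect at most once; return False for a duplicate."""
    try:
        with conn:  # one local transaction: commit on success, rollback on error
            # Recording the ID fails with IntegrityError if it was seen before.
            conn.execute("INSERT INTO history (message_id) VALUES (?)", (message_id,))
            conn.execute("INSERT OR IGNORE INTO main (account, balance) VALUES (?, 0)", (account,))
            conn.execute(
                "UPDATE main SET balance = balance + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the history insert and the main-table update commit together, a redelivered message either finds its ID already recorded or finds no trace of a partial update. No coordination with the broker is required, which is why the approach carries over to big-data messaging.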
33. Processing Pipelines and Exactly Once Processing
[Diagram: Client, (Web) Service, Mid-Tier Service, Mid-Tier Service]
Two Phase Commit ??
34. Local Transactions - Between Two Queues
[Diagram: producer, Broker1 (Q1), Q1.TRANSFER, Broker2 (Q2), Consumer Stage-I, Consumer Stage-II]
Acknowledge, State, Send message
Exactly once transfer
A local transaction covers these operations (Azure ServiceBus)
35. Did we forget dupes in the First Stage ?
[Diagram: Client, (Web) Service, Mid-Tier Service, Mid-Tier Service, History Table]
ID based de-duplication in the Broker (e.g. Azure ServiceBus)
36. Did we forget dupes in the First Stage ?
[Diagram: Client, (Web) Service, Mid-Tier Service, Mid-Tier Service, History Table]
ID based de-duplication in the Application Tier with Local or Remote Database
Works with Big Data Messaging !!
37. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. Explore Hard Problems in Messaging
a. Ordered Processing
b. Stateful Processing
c. Delayed Processing
d. Exactly Once Processing
e. Protocols and APIs
4. The Cloud Factor
5. Fusion of Databases and Messaging
6. Path Ahead
38. Evolution of Protocols - Towards Standardization..
● Started with Proprietary protocols (MQSeries, MSMQ)
● AMQP (Advanced Message Queuing Protocol)
○ AMQP 1.0 is an OASIS Standard
○ Supported by ActiveMQ, Qpid, Azure ServiceBus/EventHub, RabbitMQ etc.
● MQTT (started by IBM)
○ Optimized for and popular in the IoT space (Azure/AWS IoT offerings)
● WebSockets
○ No need to punch a hole in firewall.
○ Needs AMQP/MQTT or other messaging protocol on top
● Proprietary Protocols still rule (e.g. Kafka)
39. Evolution of APIs - Standardization ??
● Started with proprietary API (MQSeries, MSMQ etc.)
● JMS (Java Message Service) APIs were the first real standard
○ Widely Supported
● Proprietary APIs (e.g. Kafka) continue to thrive along with the product
40. Stream Processing Frameworks..
[Diagram: Application Logic on top of a Framework (e.g. Apache Samza, Storm, Beam), on top of Kafka, Kinesis, EventHub, DynamoDB Streams]
● APIs for the Broker don’t really matter
● Application Logic coded to the Stream Processing Framework
41. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. The Cloud Factor
4. Fusion of Databases and Messaging
5. Path Ahead
44. Challenges of Cloud - Extreme Multi-tenancy
[Diagram: Applications, Routing Tier, Broker Tier]
Per application caches/IO limits etc.
45. Cloud Native vs Open Source
Cloud Native Messaging (e.g. AWS Kinesis, SQS, SNS; Azure EventHub, ServiceBus; Google Cloud Pub-Sub)
● Lower $/iops with support for extreme multi-tenancy
● Lower TCO (total cost of ownership)
● Maybe Cloud Portable (via standardized protocols/APIs)
Open Source Messaging in a Cloud Environment (e.g. Kafka/RabbitMQ/ActiveMQ on Azure/AWS)
● More expensive (you will end up provisioning for peaks)
● Higher TCO
● Cloud Portable
46. Ode to Performance : Speed always Wins !
● Optimizations
○ Batching, Pipelining, Compression, Prefetching ..
○ Less bookkeeping in Broker is good !
○ Fire and Forget (Best effort)
● Performance <-> Durability Trade-off
○ In memory replication with Lazy Flush to disk
■ Secret Weapon used by Kafka
○ Most other brokers typically flush to disk before Ack
47. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. The Cloud Factor
4. Fusion of Databases and Messaging
5. Path Ahead
48. Database as an Event Source
[Diagram: Espresso @ LinkedIn: Client, Database Front-Ends, Database; DB Change Events published to a Kafka Topic; Application 1, Application 2]
49. Database as an Event Source
● Espresso@LinkedIn exposed as a Kafka event stream
● AWS DynamoDB Streams
● Azure Cosmos DB change feed
● Oracle GoldenGate with BigData connectors
● ...
50. The Script
1. Evolution of Messaging (and Application Architecture)
2. Rise of Big Data Messaging
3. The Cloud Factor
4. Fusion of Databases and Messaging
5. Path Ahead
51. ‘Featurification’ of Big-Data Messaging: Kafka
Recent Past
● Coordination without ZK
● Message Headers
● Log Compaction
● Distributed Transactions
● On Demand Message Purge
Future
● Where there are headers there will be filtering :)
● Better support for Global Topics - write once - read everywhere.
52. Big Data Messaging is a Storage Problem
● Reliability Issues :
○ Hot Partitions
● Operating At Scale
○ Easy expansion of clusters
○ Automated handling of hardware failures
○ Easy mechanism to keep machine usage balanced in a cluster
● Cost
○ Efficient Storage - Erasure Coding ?
○ Efficient Storage - Background Compression ?
53. Need for Cloud Native Open Source Messaging (e.g. Kafka on Azure)
● Kafka brokers running on Azure compute nodes, each attached to a managed Azure disk
● 9X disk space $$ : 3-way replication done by each Azure disk on top of Kafka replication (3 x 3 = 9 copies, assuming a Kafka replication factor of 3)
[Diagram: Leader writes to its Azure Disk (3-way replicated); replica followers fetch and write to their own Azure Disks]
54. Ultra Low Latency Messaging
[Diagram: Producer, Consumer, Consumer; Broker-less P2P Messaging]
● Popular in the Financial Sector
● Microsecond latencies
● Typically best-effort messaging (e.g. ZeroMQ)
● Guaranteed messaging flavors exist (e.g. 29West)
55. Ultra Low Latency Messaging
Industry Trend : Non-Volatile Memory will change the game